Distributed Transactions and Virtualization

A suggestion was made on mygreatwindowsazureidea.com for Windows Azure Service Bus to support distributed transactions. The item isn’t very popular on the site, with 7 votes, but I know that it’s a topic near and dear to the heart of many folks writing business solutions. We on the Service Bus team own MSMQ, the Workflow team next door owns DTC, and we’re getting enough requests now that we’ll start working on better guidance around transactions in the coming months, some of which will come in the form of clips on my Subscribe blog on Channel 9.

What’s not likely going to happen is that we will provide a magic “it just works” solution that brings DTC and the 2PC model to the cloud. Why? Because 2PC isn’t doing well in that world. Here is my reply to the post on mygreatwindowsazureidea.com for better linking:

Hi, I'm on the Service Bus team and I very much appreciate the intent of this suggestion.

I wish we could enable that easily, but unfortunately this is a hard problem.

The distributed transaction model, built on the common 2-phase-commit protocol with a central coordinator, is well suited as a convenient error management mechanism for physical single-node systems and for small clusters of a few physical nodes that are close together. As you get very serious about scale, virtualization and high availability, the very foundation of that model starts shaking.

For 2PC to work, the coordinator’s availability, both in terms of compute and in terms of network reachability, must be close to perfect. If you lose the coordinator, or lose sight of it, while resources are in the ‘prepared’ state, 2PC gives those resources no reasonable mechanism to break their promise and back out. On premises, the solution is to cluster DTC with the Windows Clustering services on a shared, redundant disk array and to have redundant networking to all resources. Unless you do exactly that, you’re not likely building a solution that survives a DTC hardware component failure without running into major trouble on the software side. Once you step into virtualized environments, a lot of the underlying assumptions of that cluster setup start to break down as the virtualization environment and placement strategies introduce new risk into the relationship between the clustered resources.
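
To make the blocking problem concrete, here is a minimal Python sketch of a toy 2PC run. It is not how DTC is implemented; the names `Participant`, `run_two_phase_commit` and `coordinator_fails_after_prepare` are made up for illustration, and `holds_lock` merely stands in for whatever locks a real resource manager would hold.

```python
from enum import Enum


class State(Enum):
    ACTIVE = "active"
    PREPARED = "prepared"
    COMMITTED = "committed"
    ABORTED = "aborted"


class Participant:
    """Toy resource manager (hypothetical); holds_lock stands in for real locks."""

    def __init__(self, name):
        self.name = name
        self.state = State.ACTIVE
        self.holds_lock = True

    def prepare(self):
        # Voting "yes" is a promise to commit if the coordinator says so.
        self.state = State.PREPARED
        return True

    def finish(self, commit):
        self.state = State.COMMITTED if commit else State.ABORTED
        self.holds_lock = False

    def on_timeout(self):
        # Before voting, a participant may still abort on its own.
        # After voting it has to wait, because other participants
        # may already have been told to commit.
        if self.state is State.ACTIVE:
            self.finish(commit=False)


def run_two_phase_commit(participants, coordinator_fails_after_prepare=False):
    votes = [p.prepare() for p in participants]  # phase 1: collect votes
    if coordinator_fails_after_prepare:
        return  # the decision never reaches anyone
    decision = all(votes)
    for p in participants:  # phase 2: broadcast the decision
        p.finish(decision)


# A healthy run: everyone commits and releases its locks.
healthy = [Participant("queue"), Participant("database")]
run_two_phase_commit(healthy)

# The coordinator disappears after phase 1: timeouts do not help.
stranded = [Participant("queue"), Participant("database")]
run_two_phase_commit(stranded, coordinator_fails_after_prepare=True)
for p in stranded:
    p.on_timeout()
```

In the second run both participants remain in the prepared state and keep holding their locks until a coordinator comes back with the outcome, which is exactly the situation the clustered DTC setup is meant to prevent.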

Likewise, the resource managers themselves are moved further away. You no longer have a tightly controlled system where everything runs in a rack and is on the same network segment with negligible latency. Things run scattered over many racks. The bias in virtualization environments and the cloud is toward system availability (i.e. the majority of nodes in a system are available) and not single-node reliability (i.e. nodes don't go down).

The 2PC model largely assumes that individual transactions go wrong due to intermittent issues and not due to losing random nodes completely and without notice. It obviously does provide a lifeline for when resource managers run into serious system issues while transactions are in progress, but it’s generally not very suitable for a world where workloads span many nodes and where things go up and down and move all the time for the sake of overall system availability, especially when that also includes the coordinator.

The result of using distributed transactions spanning multiple nodes in such an environment is, at worst and as explained by the CAP theorem, a complete gridlock as locks get placed and held and either take very long to resolve or end up leaving transactions in doubt requiring intervention.

Ultimately, MSDTC is a single-node/cluster and local-network technology. That also shows in its security model, which is fairly difficult to adapt to a multitenant cloud system.

Mind that I am by no means looking to cast any doubt on anyone's use of MSDTC within its design scope. MSDTC is proven and rock-solid reliable within those limits. When all resources are on one node or are close together, belong to a single tenant/app, and sit within a single trust domain, it is and remains a great choice because of the simplicity it provides around failure management, even for work spanning multiple resources inside a Windows Azure VM.

Due to these considerations, it's hard for us to support classic distributed transactions with DTC enlistment because people would justifiably expect them to "just work" - and it's hard to see how they would. Beyond that, I have serious concerns around system availability and security if locks on Service Bus' internal resources could be impacted by third parties by way of having them enlisted in a transaction, even if we owned the coordinator.

That all said, we do have DTC support for MSMQ, which is also owned by the Service Bus team. The way to get DTC support for Service Bus is to proxy it with a local MSMQ queue and then do a reliable handoff to Service Bus with a pump. We already have a sample for that and we will build that out further into a framework:

http://code.msdn.microsoft.com/windowsazure/Service-Bus-Durable-Sender-0763230d
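
The shape of that proxy-and-pump handoff can be sketched in a few lines of Python. This is not the code from the linked sample: a SQLite table stands in for the local transactional MSMQ queue, and `send` is a hypothetical callable standing in for the actual Service Bus send. The names `open_outbox`, `place_order` and `pump_once` are made up for illustration.

```python
import sqlite3
import uuid


def open_outbox(path=":memory:"):
    # SQLite is only a stand-in for a local, durable, transactional queue.
    db = sqlite3.connect(path)
    db.execute("CREATE TABLE IF NOT EXISTS orders (id TEXT PRIMARY KEY, item TEXT)")
    db.execute(
        "CREATE TABLE IF NOT EXISTS outbox ("
        "seq INTEGER PRIMARY KEY AUTOINCREMENT, message_id TEXT, body TEXT)"
    )
    db.commit()
    return db


def place_order(db, item):
    """Local transaction: the business write and the enqueue commit together."""
    order_id = str(uuid.uuid4())
    with db:  # commits on success, rolls back if an exception is raised
        db.execute("INSERT INTO orders VALUES (?, ?)", (order_id, item))
        db.execute(
            "INSERT INTO outbox (message_id, body) VALUES (?, ?)",
            (order_id, item),
        )
    return order_id


def pump_once(db, send):
    """Forward the oldest pending message; delete it only after send succeeds."""
    row = db.execute(
        "SELECT seq, message_id, body FROM outbox ORDER BY seq LIMIT 1"
    ).fetchone()
    if row is None:
        return False
    seq, message_id, body = row
    try:
        send(message_id, body)  # hypothetical remote send; may raise
    except ConnectionError:
        return False  # the message stays queued for the next attempt
    with db:
        db.execute("DELETE FROM outbox WHERE seq = ?", (seq,))
    return True


# Demo with a remote endpoint that is unreachable on the first attempt.
delivered = []
attempts = {"count": 0}


def flaky_send(message_id, body):
    attempts["count"] += 1
    if attempts["count"] == 1:
        raise ConnectionError("cloud endpoint unreachable")
    delivered.append((message_id, body))


db = open_outbox()
order_id = place_order(db, "widget")
first_try = pump_once(db, flaky_send)   # fails, message stays in the outbox
second_try = pump_once(db, flaky_send)  # succeeds, message is removed
remaining = db.execute("SELECT COUNT(*) FROM outbox").fetchone()[0]
```

The point of the design is that the only transaction involved is local. The remote send sits outside of it and is simply retried, which gives at-least-once delivery: if the pump fails between the send and the delete, the same message goes out again, so the receiving side has to tolerate duplicates, for instance by checking the message id.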

The considerations for Service Bus for Windows Server are similar.

Clemens Vasters