Transactions in Windows Azure (with Service Bus) - An Email Discussion

I had a email discussion late last weekend and through this weekend on the topic of transactions in Windows Azure. One of our technical account managers asked me on behalf of their clients how the client could migrate their solution to Windows Azure without having to make very significant changes to their error management strategy – a.k.a. transactions. In the respective solution, the customer has numerous transactions that are interconnected by queuing and they’re looking for a way to preserve the model of taking data from a queue or elsewhere, performing an operation on a data store and writing to a queue as a result as an atomic operation.

I’ve boiled down the question part of the discussion into single sentences and edited out the customer specific pieces, but left my answers mostly intact, so this isn’t written as a blog article.

The bottom line is that Service Bus, specifically with its de-duplication features for sending and with its reliable delivery support using Peek-Lock (which we didn’t discuss in the thread, but see here and also here) is a great tool to compensate for the lack of coordinator support in the cloud. I also discuss why using DTC even in IaaS may not be an ideal choice:

Q: How do I perform distributed, coordinated transactions in Windows Azure?

2PC in the cloud is hard for all sorts of reasons. 2PC as implemented by DTC effectively depends on the coordinator and its log and connectivity to the coordinator to be very highly available. It also depends on all parties cooperating on a positive outcome in an expedient fashion. To that end, you need to run DTC in a failover cluster, because it’s the Achilles heel of the whole system and any transaction depends on DTC clearing it.

In cloud environments, it’s a very tall order to create a cluster that’s designed in a way similar to what you can do on-premises by putting a set of machines side-by-side and interconnecting them redundantly. Even then, use of DTC still put you into a CAP-like tradeoff situation as you need to scale up.

Since the systems will be running in a commoditized environment where the clustered assets may quite well be subject to occasional network partitions or at least significant congestion and the system will always require – per 2PC rules – full consensus by all parties about the transaction outcome, the system will inevitably grind to a halt whenever there are sporadic network partitions. That risk increases significantly as the scale of the solution and the number of participating nodes increases.

There are two routes out of the dilemma. The first is to localize any 2PC work onto a node and scale up, which lets you stay in the classic model, but will limit the benefits of using the cloud to having externalized hosting. The second is to give up on 2PC and use per-resource transaction support (i.e. transactions in SQL or transactions in Service Bus) as a foundation and knit components together using reliable messaging, sagas/compensation for error management and, with that, scale out.

Q: Essentially you are saying that there is absolutely no way of building a coordinator in the cloud?

I’m not saying it’s absolutely impossible. I’m saying you’d generally be trading a lot of what people expect out of cloud (HA, scale) for a classic notion of strong consistency unless you do a lot of work to support it.

The Azure storage folks implement their clusters in a very particular way to provide highly-scalable, highly-available, and strongly consistent storage – and they are using a quorum based protocol (Paxos) rather than classic atomic TX protocol to reach consensus. And they do while having special clusters that are designed specifically to that architecture – because they are part of the base platform. The paper explains that well.

Since the storage system and none of the other components trust external parties to be in control of their internal consistency model and operations – which would be case if they’d enlist in distributed transactions – any architecture built on top of those primitives will either have to follow a similar path to what the storage folks have done, or start making trades.

You can stick to the classic DTC model with IaaS; but you will have to give up using the PaaS services that do not support it, and you may face challenges around availability traded for consistency as your resources get distributed across the datacenter and fault domains for – ironically – availability. So ultimately you’ll be hosting a classic workload in IaaS without having the option of controlling the hardware environment tightly to increase intra-cluster reliability.

The alternative is to do what the majority of large web properties do and that is to deal with these constraints and achieve reliability by combining per-resource transactions, sagas, idempotency, at-least-once messaging, and eventual consistency.

Q: What are the chances that you will build something that will support at least transactional handoffs between Service Bus the Azure SQL database?

We can’t directly couple a SQL DB and Service Bus because SQL, like storage, doesn’t allow transactions that span databases for the reasons I cited earlier.

But there is a workaround using Service Bus that gets you very close. If the customer’s solution DB had a table called “outbox” and the transactions would write messages into that table (including the destination queue name and the desired message-id), they can get full ACID around their DB transactions. With storage, you can achieve a similar model with batched writes into singular partitions.

We can’t make that “outbox” table, because it needs to be in the solution’s own DB and inside their schema. A background worker can then poll that table (or get post-processing handoff from the transaction component) and then replicate the message into SB.

If SB has duplicate detection turned on, even intermittent send failures or commit issues on deleting sent messages from the outbox won’t be a problem, so this simple message transfer doesn’t require 2PC since the message is 100% replicable including its message-id and thus the send is idempotent towards SB – while sending to SB in the context of the original transaction wouldn’t have that.

With that, they can get away without compensation support, but they need to keep the transactions local to SQL and the “outbox” model gives the necessary escape hatch to do that.

Q: How does that work with the duplicate detection?

The message-id is a free-form string that the app can decide on and set as it likes. So that can be an order-id, some contextual transaction identifier or just a Guid. That id needs to go into the outbox as the message is written.

If the duplicate detection in Service Bus is turned on for the particular Queue, we will route the first message and drop any subsequent message with the same message-id during the duplicate detection time window. The respective message(s) is/are swallowed by Service Bus and we don’t specifically report that fact.

With that, you can make the transfer sufficiently robust.

Duplicate Detection Sample: http://code.msdn.microsoft.com/windowsazure/Brokered-Messaging-c0acea25#content

Clemens Vasters