May 4, 2012
@ 03:15 PM

I’m toying around with very small and very constrained embedded devices right now. When you make millions of a small thing, every byte in code footprint and any processing cycle you can save saves real money. An XML parser is a big chunk of code. So is a JSON parser. Every HTTP stack already has a key/value pair parser for headers. We can use that.

NHTTP stands for NoHyperText Transfer Protocol. Yes, I made that up. No, this is not an April Fool’s joke. Hear me out.

All rules of RFC2616 apply, except for section 7.2, meaning there is must never an entity body on any request or reply. Instead we rely entirely on section 7.1 and its extensibility rule:

  • The extension-header mechanism allows additional entity-header fields to be defined without changing the protocol, but these fields cannot be assumed to be recognizable by the recipient. Unrecognized header fields SHOULD be ignored by the recipient and MUST be forwarded by transparent proxies.

All property payloads are expressed as key/value pairs that are directly mapped onto HTTP headers. No value can exceed 2KB in size and you can’t have more than 32 values per message so that we stay comfortably within common HTTP infrastructure quotas. To avoid collisions with existing headers and to allow for easy enumeration, each property key is prefixed with “P-“

POST /foo HTTP/1.1
Host: example.com
Content-Length: 0
P-Name: “Clemens”

HTTP/1.1 200 OK
Content-Length: 0
P-Greeting: “Hello, Clemens”

(The fun bit is that the Windows Azure Service Bus HTTP API for sending and receiving messages already supports this exact model since we map custom message properties to headers and the HTTP entity body to the body of broker messages and those can be empty)

Categories: Architecture | Technology

I just wrote this email on a private mailing list and thought it may make sense to share it. The context of the discussion was overuse of the term “REST” in a document discussing an HTTP API:


REST is a set of architectural principles. REST describes how state flows and describes the shape of relationships between the parties in a distributed system. HTTP is a protocol with a variety of stacks supporting it, and the REST principles were born out of developing HTTP. There could, in theory, a broad variety of protocols that also embody REST architecture, but  there are, in fact, very few (if any) that aren’t just variations of HTTP.

“The client sends …”, “The server receives …”, “The server provides an interface for …” are all statements about implementation and, thus, HTTP. It commonly starts making talking about REST specifically when debating whether a system is actually following the principles according to the 5.3.3 “Data View” section in [1], since everything up to that point in Fielding’s dissertation you get generally for free with HTTP. 

[1] http://www.ics.uci.edu/~fielding/pubs/dissertation/rest_arch_style.htm


Bottom line: HTTP APIs are HTTP APIs. REST is about how things hang together. The terms aren’t interchangeable. In most technical discussions about interfaces or methods or URIs and most other implementation details, HTTP API is the right term.

Categories: Architecture

March 5, 2012
@ 11:00 PM

Greg says what it’s not, and since he didn’t use the opportunity to also succinctly express what it is, I helped him out in the comments:

CQRS ("Command-Query Responsibility Segregation") is a simple pattern that strictly segregates the responsibility of handling command input into an autonomous system from the responsibility of handling side-effect-free query/read access on the same system. Consequently, the decoupling allows for any number of homogeneous or heterogeneous query/read modules to be paired with a command processor and this principle presents a very suitable foundation for event sourcing, eventual-consistency state replication/fan-out and, thus, high-scale read access. In simple terms: You don’t service queries via the same module of a service that you process commands through. For REST heads: GET wires to a different thing from what PUT/POST/DELETE wire up to.

Martin Fowler has a nice discussion here, with pictures. Udi Dahan has another nice description, also with pictures. To say it in yet another way, the key point of the pattern is that the read and write paths in a system are entirely separate.

Categories: Architecture | SOA

February 13, 2012
@ 11:04 PM

(tl;dr https://github.com/SignalR/SignalR.WindowsAzureServiceBus)

Our friends over in the ASP.NET team are working on a very nice, lightweight web-browser eventing technology called SignalR. SignalR allows server-pushed events into the browser with a variety of transport options and a very simple programming model. You can get it via NuGet and/or see growing and get the source of on github. There is also a very active community around SignalR chatting on Jabbr.net, a chat system whose user model is derived from IRC, but that runs – surprise – on top of SignalR.

For a primer, check out the piece that Scott Hanselman wrote about SignalR a while back.

At the core, SignalR is a lightweight message bus that allows you to send messages (strings) identified by a key. Ultimately it’s a key/value bus. If you’re interested in messages with one or more particular keys, you walk up and ask for them by putting a (logical) connection into the bus – you create a subscription. And while you are maintaining that logical connection you get a cookie that acts as a cursor into the event stream keeping track of what you have an have not seen, which is particularly interesting for connectionless transports like long polling.

SignalR implements this simple pub/sub pattern as a framework and that works brilliantly and with great density, meaning that you can pack very many concurrent notification channels on a single box.

What SignalR, out-of-the-box, doesn’t (or didn’t) provide yet is a way to stretch its message bus across multiple nodes for even higher scale and for failover safety.

That’s where Service Bus comes in.

Last week, I built a Windows Azure Service Bus backplane for SignalR that allows deploying SignalR solutions to multiple nodes with message distribution across those nodes and ensuring proper ordering on a per-sender basis as well as node-to-node correctness and consistency for the cursor cookies. That code is Apache licensed and now available on github.

You can use this backplane irrespective where you host solutions that use SignalR, as long as your backend host has access to a Service Bus namespace. That’s obviously best in one of the Windows Azure datacenters, but will work just as well anywhere else, albeit with a few msec more latency.

If you want to try it out, here are the steps (beyond getting the code):

  1. Make a small SignalR app or take one from the SignalR samples (caveat below)
  2. Make a Windows Azure account and a Service Bus namespace.  For that, follow the same steps as outlined in the Multi-Tier apps tutorial on MSDN.
  3. Compile the extension project and add it to your SignalR solution
  4. At initialization time (global.asax, startup, etcetc), you need to reference (using)the SignalR.WindowsAzureServiceBus namespace and then add  the following initialization code: AspNetHost.DependencyResolver.UseWindowsAzureServiceBus(“{namespace}“,”{account}”, “{key}”, "{appname}", 2);
  5. Compile, run

In the above example, {namespace} is the Service Bus namespace you created following the tutorial steps, {account} is likely “owner” (to boot) and {key} is the default key you copied from the portal. {appname} is some string, without spaces, that disambiguates your app from other apps on the same namespace and 2 stands for splitting the Service Bus traffic across 2 topics. 

Most of the SignalR samples don’t quite work yet in a scale-out mode since they hold local, per-node state. That’s getting fixed.

If you want to see SignalR and Service Bus in action right this second, you can hop into the Azure chat room on our test deployment of jabbr that runs across 4 nodes.

Categories:

The term “Enterprise-Class Software” was always a funny one since it’s been used to describe an amorphous class of software that’s somehow “more serious” than any other software – it’s “enterprise class”. But what is “enterprise class” software? What’s the “enterprise” that the term refers to? A 3,000 employee business? Or a 30,000 employee business? Can 3 people in an office do “serious” business that requires “enterprise class” software?

In the last few of years, I’ve seen the term slide from being merely amorphous to being outright poisonous and pointless.

We’re at a point where the private needs of an “enterprise”, and that includes the biggest ones, doesn’t represent much of an interesting frontier for software anymore, at least not from an architectural perspective. The industry has learned and is continuing to expand the knowledge on how to build systems that scale well beyond the need of an individual enterprise. We have learned how to deal with failure in ways that no longer require huge boxes with built-in triple redundancy.

The term “Enterprise Software” is poisonous because it implies a scalability ceiling for a solution serving a few 10,000 people that used to be ambitious, but that’s no longer the case. It’s also no longer the hardest class of software solutions – not at all.

A simple multi-player game for a mobile device that allows a quarter million or more players in groups of 16 to play against each other is killing the vast majority of “enterprise class” solutions in terms of required sophistication of its messaging infrastructure. It needs to scale to  an audience size that even the largest enterprises in the world won’t get on a system concurrently and that audience is going to be enormously unhappy if the infrastructure is dropping messages or is slow. Just search for “MW3 lag” and you’ll find plenty of opinion about latency that’s louder and much more impactful on business than a few hundred employees lamenting about the speed of an internal website, because here you’ve got customers who are individually putting money on the table.

To suggest that consumer-facing scenarios are not “enterprise class” isn’t just arrogant, it’s also dangerous. Enterprises that don’t embrace mobile devices and start deeply interacting with consumers at “consumer scale” will find themselves in deep trouble. Software infrastructure vendors will find themselves in deep trouble if they still think of “enterprise scale” as a key capability or aspiration because their own customers, enterprises, will have to turn outward and embrace consumer scale or face the consequences. There’s also no reliability monopoly that “enterprise” holds in some way. When a systems needs to perform millions of transactions directly interacting with consumers, there’s no room for sweeping failures under the rug. It is actually easier to handle failures and provide high reliability in a tightly controlled, low-scale system than in a system that is the front-door of a business where each hiccup may personally upset a paying customer. It is also easier to shut down a system over the weekend or over a long holiday break than to keep it running 365/7/24. And it’s also easier to secure an insulated internal system than one that’s exposed to the outside world and where a single breach can tarnish the entire business’s reputation.  

“Enterprise software” is something where a single sale can generate a ton of revenue. But “enterprise-class” doesn’t represent anything interesting on the engineering side – except an outdated set of goals and a level of scalability that’s several magnitudes below what’s needed for an environment where more and more consumers have a personal computing device in their pocket and another on the coffee table.

PS: Put “Web scale” on the same heap of useless terms. You ought to scale to your audience, not the web.

Categories:

I've long wanted to write an architecture book. The problem with books is that writing them robs you of at least half a year of your life, and that half a year is a very painful one. I've done it. So instead of a book, I talked to the folks at Pluralsight and made an on-demand video course.  

"The Elements of Distributed Architecture" is about the foundational elements of distributed architecture and about the ‘physics’ that affect distributed software designs. The goal of this course, which is designed to be independent of specific languages, technologies, and products, is to provide software teams with a shared baseline of concepts and terminologies in the areas of information management, communication, presentation, processing, failure management, security, and safety.

I think of this course as a baseline and there's plenty of runway for more in-depth material. If you like this one that will give me motivation to spend more private time (this course is not related to my Microsoft job) to create architectural material.  

If you don't have a Pluralsight account, you can sign up for the free trial here

Categories: Architecture

December 1, 2011
@ 05:38 PM

I answered 4 questions in Richard Seroter’s series of interviews with folks working on connect systems. See the Q&A here.

Categories: Architecture | SOA

Elastic and dynamic multitenant cloud environments have characteristics that make traditional failure management mechanisms using coordinated 2-phase transactions a suboptimal choice. The common 2-phase commit protocols depend on a number of parties enlisted into a transaction making hard promises on the expected outcome of their slice a transaction. Those promises are difficult to keep in an environment where systems may go down at any time with their local state vanishing, where not all party trust each other, where significant latency may be involved, and network connectivity cannot be assumed to be reliable. 2-phase-commit is also not a good choice for operations that take significant amounts of time and span a significant amount of resources, because such a coordinated transaction may adversely affect the availability of said resources, especially in cases where the solution is a high-density multitenant solution where virtualized, and conceptually isolated resources are collocated on the same base resources. In such a case, database locks and locks on other resources to satisfy coordinated transaction promises may easily break the isolation model of a multitenant system and have one tenant affect the other.

Therefore, failure management – and this is ultimately what transactions are about – requires a somewhat different approach in cloud environments and other scalable distributed systems with similar characteristics.

To find a suitable set of alternative approaches, let’s quickly dissect what goes on in a distributed transaction:

To start, two or more parties ‘enlist’ into a shared transaction scope performing some coordinated work that’s commonly motivated by a shared notion of a ‘job’ that  needs to be executed. The goal of having a shared transaction scope is that the overall system will remain correct and consistent in both the success and the failure cases. Consistency in the success case is trivial. All participating parties could complete their slice of the job that had to be done. Consistency in the failure case is more interesting. If any party fails in doing their part of the job, the system will end up in a state that is not consistent. If you were trying to book a travel package and ticketing with the airline failed, you may end up with a hotel and a car, but no flight. In order to prevent that, a ‘classic’ distributed transaction asks the participants to make promises on the outcome of the transaction as the transaction is going on.

As all participating parties have tentatively completed but not finalized their work, the distributed transaction goes into a voting phase where every participant is asked whether it could tentatively complete its portion of the job and whether it can furthermore guarantee with a very high degree of certainty that it can finalize the job outcome and make it effective when asked to do so. Imagine a store clerk who puts an item on the counter that you’d like to purchase – you’ll show him your $10 and ask for a promise that he will hand you the item if you give him the money – and vice versa.

Finally, once all parties have made their promises and agreed that the job can be finalized, they are told to do so.

There are two big interesting things to observe about the 2-phase-commit (2PC) distributed transaction model that I just described: First, It’s incredibly simple from a developer’s perspective because the transaction outcome negotiation is externalized and happens as ‘magic’. Second, it’s not resembling anything that happens in real life and that should be somewhat suspicious. You may have noticed that there was no neutral escrow agent present when you bought the case of beverages at the store for $10 two paragraphs earlier.

The grand canonical example for 2PC transactions is a bank account transfer. You debit one account and credit another. These two operations need to succeed or fail together because otherwise you are either creating or destroying money (which is illegal, by the way). So that’s the example that’s very commonly used to illustrate 2PC transactions. The catch is – that’s not how it really works, at all. Getting money from one bank account to another bank account is a fairly complicated affair that touches a ton of other accounts. More importantly, it’s not a synchronous fail-together/success-together scenario. Instead, principles of accounting apply (surprise!). When a transfer is initiated, let’s say in online banking, the transfer is recorded in form of a message for submission into the accounting system and the debit is recorded in the account as a ‘pending’ transaction that affects the displayed balance. From the user’s perspective, the transaction is ’done’, but factually nothing has happened, yet. Eventually, the accounting system will get the message and start performing the transfer, which often causes a cascade of operations, many of them yielding further messages, including booking into clearing accounts and notifying the other bank of the transfer. The principle here is that all progress is forward. If an operation doesn’t work for some technical reason it can be retried once the technical reason is resolved. If operation fails for a business reason, the operation can be aborted – but not by annihilating previous work, but by doing the inverse of previous work. If an account was credited, that credit is annulled with a debit of the same amount. For some types of failed transactions,  the ‘inverse’ operation may not be fully symmetric but may result in extra actions like imposing penalty fees. In fact, in accounting, annihilating any work is illegal – ‘delete’ and ‘update’ are a great way to end up in prison.

As all the operations occur that eventually lead to the completion or failure of the grand complex operation that is a bank transfer, the one thing we’ll be looking to avoid is to be in any kind of ‘doubt’ of the state of the system. All participants must be able to have a great degree of confidence in their knowledge about the success or failure of their respective action. No shots into the dark. There’s no maybe. Succeed or fail.

That said, “fail” is a funny thing is distributed systems because it happens quite a bit. In many cases “fail” isn’t something that a bit of patience can’t fix. Which means that teaching the system some patience and tenacity is probably a good idea instead of giving up too easily. So if an operation fails because it runs into a database deadlock or the database is offline or the network is down or the local machine’s network adapter just got electrocuted that’s all not necessarily a reason to fail the operation. That’s a reason to write an alert into a log and call for help for someone to fix the environment condition.

If we zoom into an ‘operation’ here, we might see a message that we retrieve from some sort of reliable queue or some other kind of message store and subsequently an update of system state based on message. Once the state has been successfully updated, which may mean that we’ve inserted a new database record, we can tell the message system that the message has been processed and that it can be discarded. That’s the happy case.

Let’s say we take the message and as the process wants to walk up the database the power shuts off. Click. Darkness. Not a problem. Assuming the messaging system supports a ‘peek/lock’ model that allows the process to first take the message and only remove it from the queue once processing has been completed, the message will reappear on the queue after the lock has expired and the operation can be retried, possibly on a different node. That model holds true for all failures of the operation through to and in the database. If the operation fails due to some transient condition (including the network card smoking out, see above), the message is either explicitly abandoned by the process or returns into the queue by ways of a lock timeout.  If the operation fails because something is really logically wrong, like trying to ship a product out of the inventory that’s factually out of stock, we’ll have to take some forward action to deal with that. We’ll get to that in a bit.

Assuming the operation succeeded, the next tricky waypoint is failure after success, meaning that the database operation succeeded, but the message subsequently can’t be flagged as completed and thus can’t be removed from the queue. That situation would potentially lead to another delivery of the message even though the job has already been completed and therefore would cause the job to be executed again – which is only a problem if the system isn’t expecting that, or, in fancier terms, if it’s not ‘idempotent’. If the job is updating a record to absolute values and the particular process/module/procedure is the only avenue to perform that update (meaning there are no competing writers elsewhere), doing that update again and again and again is just fine. That’s natural idempotency. If the job is inserting a record, the job should contain enough information, such as a causality or case or logical transaction identifier that allows the process to figure out whether the desired record has already been inserted and if that’s the case it should do nothing, consider its own action a duplicate and just act as if it succeeded.

Checkpoint: With what I said in the last two paragraphs, you can establish pretty good confidence about failure or success of individual operations that are driven by messages. You fail and retry, you fail and take forward action, or you succeed and take steps to avoid retrying even if the system presents the same job again. There’s very little room for doubt. So that’s good.

The ‘forward action’ that results from failure is often referred to as ‘compensation’, but that’s a bit simplistic. The forward action resulting from running into the warehouse with the belief that there’s still product present while the shelf is factually empty isn’t to back out and cancel the order (unless you’re doing a firesale of a touch tablet your management just killed). Instead, you notify the customer of the shipping delay, flag a correction of the inventory levels, and put the item on backorder. For the most part, pure ‘compensation’ doesn’t really exist. With every action, the system ends up in a consistent state. It’s just that some states are more convenient than others and there are some state for which the system has a good answer and some states for which it doesn’t. If the system ends up in a dead end street and just wants to sit down and cry because nobody told it what to do now, it should phone home and ask for human intervention. That’s fine and likely a wise strategy in weird edge cases.

Initiating the ‘forward action’ and, really, any action in a system that’s using messaging as its lifeline and as a backplane for failure resilience as I’m describing it here is not entirely without failure risk in itself. It’s possible that you want to initiate an action and can’t reach the messaging system or sending the message fails for some other reason. Here again, patience and tenacity are a good idea. If we can’t send, our overall operation is considered failed and we won’t flag the initiating message as completed. That will cause the job to show up again, but since we’ve got idempotency in the database that operation will again succeed (even if by playing dead) or fail and we will have the same outcome allowing us to retry the send. If it looks like we can send but sending fails sometime during the operation, there might be doubt about whether we sent the message. Since doubt is a problem and we shouldn’t send the same message twice, duplicate detection in the messaging system can help suppressing a duplicate so that it never shows up at the receiver. That allows the sender to confidently resend if it’s in doubt about success in a prior incarnation of processing the same message.

Checkpoint: We now also can establish pretty good confidence about initiating forward action or any other action in the system given if the ‘current’ action is following the principles described above.

So far I’ve talked about individual actions and also about chains of actions, albeit just in the failure case. Obviously the same applies to success cases where you want to do something ‘next’ once you’re done with ‘this’.

Now let’s assume you want to do multiple things in parallel, like updating multiple stores as part of executing a single job – which gets us back to the distributed transaction scenario discussed earlier. What helps in these cases is if the messaging system supports ‘topics’ that allow dropping a message (the job) into the messaging system once and serve the message to each participant in the composite activity via their own subscription on the topic. Since the messaging system is internally transactional it will guarantee that each message that is successfully submitted will indeed appear on each subscription so it ensures the distribution. With that, the failure handling story for each slice of the composite job turns into the same model that I’ve been explaining above. Each participant can be patient and tenacious when it comes to transient error conditions. In hard failure cases, the forward action can be a notification to the initiator that will then have to decide how to progress forward, including annulling or otherwise invoking forward actions activities that have been executed in parallel. In the aforementioned case of a ticketing failure that means that the ticketing module throws its hands up and the module responsible for booking the travel package either decides to bubble the case to the customer or an operator leaving the remaining reservations intact or to cancel the reservations for the car and the hotel that have been made in parallel. Should two out of three or more participants’ operations fail and each report up to the initiator, the initiator can either keep track of whether it already took corrective forward action on the third participant or, in doubt, the idempotency rule should avoid doing the same thing twice.

The model described here is loosely based on the notion of ‘Sagas’, which were first described in a 1987 ACM paper by Hector Garcia-Molina and Kenneth Salem, so this isn’t grand news. However, the notion of such Sagas is only now really gaining momentum with long-running and far distributed transactions becoming more commonplace, so it’s well worth to drag the model further out into the limelight and give it coverage. The original paper on Sagas is still assuming that the individual steps can be encapsulated in a regular transaction, which may not even be the case in the cloud and with infrastructures that don’t have inherent transaction support. The role of the messaging system with the capabilities mentioned above is to help compensate for the absence of that support.

… to be continued …

Categories: Architecture | SOA