If you are a developer and don't live in the Netherlands (where SOA is well known to stand for "Sexueel Overdraagbare Aandoeningen" = "sexually transmitted diseases"), you may have heard by now that SOA stands for "service oriented architectures".

What's really interesting is that talks and articles about SOA (including the ones I gave on this year's Microsoft Architect's Tour) tend to focus almost exclusively on the glue between services: how the use of registries and dynamic binding, message and service contract exchange and negotiation, and standard protocols and data exchange formats promises greater flexibility for enterprise architectures. Little is said about the characteristics of the "services" themselves.

So, here's a bit of my current thinking around services; by now I think I could probably fill a book on the topic, so a blog entry cannot come even close to giving the complete picture. Also, I don't claim to say anything new here; I just want to have it all in one place on my blog. So here we go:

***

Very broadly speaking, a service is an autonomous unit that is responsible for the transformation, storage and/or retrieval of data. Services never interact with other services by side effect, meaning there is no notion of inter-service (application) state that is not explicitly exchanged through messages. Services are accessed through well-defined public access points that are governed by contracts that tightly define the set of supported messages, the message content and the applicable service policies.
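
To make that description a little more tangible, here is a minimal sketch (in Python, with made-up message and class names that don't come from any real system) of a service whose only access point is a message handler governed by explicit message types:

```python
from dataclasses import dataclass

# Hypothetical message types; the names are illustrative only.
@dataclass(frozen=True)
class StoreCustomerRequest:
    customer_id: str
    name: str

@dataclass(frozen=True)
class StoreCustomerReply:
    customer_id: str
    accepted: bool

class CustomerStorageService:
    """All interaction happens through messages at one public entry point;
    no caller ever reaches into the service's internal state."""

    def __init__(self) -> None:
        self._records = {}  # private state, never shared by side effect

    def handle(self, message: StoreCustomerRequest) -> StoreCustomerReply:
        # The contract (the message types) defines everything the service
        # accepts and everything it hands back.
        self._records[message.customer_id] = message.name
        return StoreCustomerReply(customer_id=message.customer_id, accepted=True)

service = CustomerStorageService()
print(service.handle(StoreCustomerRequest("12345678", "Peter Miller")))
```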

To explain services and their motivation, I first have to write about objects. The basic idea of exclusive data ownership is not too dissimilar to the idealistic view of domain objects. There, you have an object "Customer 12345678 Peter Miller" and that object has its own "save data" and "load data" capability. To activate (load) an object from persistent storage, you go through some sort of factory that takes the object identity as an argument; from there, all you talk to is the object, and the object's inner implementation worries about the details of storage all by itself.
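
As a concrete (and entirely hypothetical) illustration of that object-centric picture, a sketch of such a self-loading, self-saving domain object behind a factory might look like this:

```python
class Customer:
    """Idealized domain object: data and persistence code live together."""

    def __init__(self, customer_id: str, name: str = "") -> None:
        self.customer_id = customer_id
        self.name = name

    def load(self, storage: dict) -> None:
        # The object itself knows how to pull its state from storage...
        self.name = storage[self.customer_id]

    def save(self, storage: dict) -> None:
        # ...and how to push it back.
        storage[self.customer_id] = self.name

class CustomerFactory:
    """The factory hands out an activated object for a given identity."""

    def __init__(self, storage: dict) -> None:
        self._storage = storage

    def get(self, customer_id: str) -> Customer:
        customer = Customer(customer_id)
        customer.load(self._storage)
        return customer

storage = {"12345678": "Peter Miller"}
customer = CustomerFactory(storage).get("12345678")
print(customer.name)  # from here on, callers talk only to the object
```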

However, in contrast to the object notion of having data and code in one place and at one location, services strictly separate code from data. The customer record mentioned above is neither a uniquely identifiable in-memory entity nor an addressable on-disk entity that's known throughout the system; data simply flows through the system, and the same record may exist in multiple places at the same time. In a service world, there are no objects, there is just data.
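
By contrast, a sketch of the same customer record in the service world is just a value that flows around and may exist as any number of equal copies; the record type below is again purely illustrative:

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class CustomerRecord:
    customer_id: str
    name: str

record = CustomerRecord("12345678", "Peter Miller")

# The same record can sit in a queue message, a cache and a reply at the same
# time; copies are just copies, and nothing points back to "the" one object.
in_queue = record
in_cache = replace(record)  # an independent copy with the same values
print(in_queue == in_cache)  # True: records compare by value, not by identity
```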

The idea of "self-contained" domain objects, while thought to be an ideal modularization model, indeed most often fails to provide modularity. A data record can be stored to and retrieved from many different data sources within the same application, for a variety of reasons. It may be stored in an offline "isolated storage" replica on a mobile machine, inside a queue message that is processed only during an hourly or daily batch run, in a SQL database, in an in-memory caching structure and in many more places, depending on its character. The character of a data record includes, for instance, how often and by how many concurrent activities it is likely to be changed, and therefore how safe it is to create read-only replicas of the record and how long these replicas can be regarded as valid and accurate, or even just "good enough" to base further processing on. Likewise, a data record can be rendered for presentation to both human and machine consumers in a vast variety of ways, ranging from an XML fragment through HTML rendering to sophisticated 3D graphics visualizations.

Although the idea of object-centric storage, object self-responsibility and universal object identity is fantastically attractive, a single object implementation that attempts to accommodate all these requirements simply results in a monolithic application block that is anything but modular. Even when it comes to "business logic", the implementation of the rules that govern the contextual correctness and integrity of an object, putting all rules that result from the requirements of an entire system into a single class breaks the separation of subsystems. Creating a single "customer" object class for a bank's loan, investment and financial collections business is essentially impossible because of conflicting rules and requirements and a different perception of "customer" in each of these businesses. Still, it is standard procedure to have a central database with data records that hold the customer information shared by these systems, along with a service that governs this data store. Not infrequently, that service goes by the name "host communication" and shifts data records on and off the mainframe via CICS transactions.

The consequence of this thinking about domain objects, and generally about the notion of object identity and self-responsibility (and I am sure that a lot of people will disagree violently with me on that), is that not only is there no proper way of realizing the dream of "true objects", there is indeed no way of defining any method on a domain object that doesn't result in a monolith spanning multiple concerns, with methods that are inappropriate or wrong to use in certain contexts.

However, this statement explicitly excludes property access methods that enforce rules like "value must be greater than or equal to 0 and less than or equal to 100", because the value in question may represent an expression in percent. Now, one could argue that such property access methods are a clear example of why domain objects do indeed make sense, because these methods implement fundamental business logic, but in my view they don't. The fact that property access methods enforcing such rules must exist simply compensates for an inherent weakness in the type system of most mainstream programming languages. The rule [0<=x<=100] is a property of the "percentage" data type, but that doesn't readily map into most languages. Hence, it's the job of explicit coding to fix that limitation and provide stronger types. The type description language XML Schema (and siblings like Schematron and Relax NG) provides facilities to define data types of the desired strength, and infrastructures supporting these description formats are capable of either enforcing these rules without specific coding or generating the code required to enforce them. Property access methods are just a way to overcome programming model limitations and enforce contracts; they are not an object feature or business logic. At least I don't see them that way.
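
To illustrate what a "stronger type" means here, the following is a minimal, hypothetical sketch of a percentage type that carries the [0<=x<=100] rule itself, in the spirit of an XML Schema restriction on a decimal:

```python
class Percentage:
    """A stronger numeric type that carries the rule 0 <= value <= 100,
    much like an XML Schema restriction on a decimal would."""

    def __init__(self, value: float) -> None:
        if not 0 <= value <= 100:
            raise ValueError(f"percentage must be within [0, 100], got {value}")
        self._value = value

    @property
    def value(self) -> float:
        return self._value

discount = Percentage(15)    # fine
print(discount.value)
try:
    Percentage(140)          # rejected by the type itself, not by a domain object
except ValueError as error:
    print(error)
```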

So, if you’re still reading after I’ve slaughtered the idea of “objects” for a modular, layered and even distributed system, it’s not so far to go from here to the essence of what a service is.

To recap the initial statement, a service is an autonomous unit. The autonomous character of a service results from the combination of exclusive responsibility for certain operations on data and a strict definition of the contracts for both the messages it receives and the messages it provides.

Exclusive responsibility means that there is exactly one service in a given system that may perform a certain operation on data, for instance storing data in and retrieving data from a certain set of tables in a certain database. Any other service that requires access to this data must use the responsible service. This serves to guarantee that only a single implementation of (for instance) data consistency rules exists, but it also helps to eliminate assumptions that hinder (again, for instance) scalability. One of these problematic assumptions is that all records of a given type are co-located in the same database or in the same location. That assumption is okay as long as you don't have to deal with a massive data volume or very high concurrency with very frequent transactional writes. In these cases, it may be beneficial to break up the storage into multiple tables or even across multiple databases, which may or may not be directly supported by your database system. If it isn't, or doesn't work with the desired flexibility, it's nearly impossible to introduce this scalability technique once everyone is permitted to access backend storage directly. (To get an idea of this sort of parallelism and partitioning, check out this PPT by Jim Gray.)
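
As a rough illustration of why exclusive responsibility preserves that freedom, here is a hypothetical sketch in which only the owning service knows that its records are spread across several partitions (plain dictionaries standing in for separate databases):

```python
class CustomerStore:
    """The one service responsible for this data; only it knows (or cares)
    how the records are distributed across partitions."""

    def __init__(self, partitions: int = 4) -> None:
        # Plain dicts stand in for separate tables or databases.
        self._partitions = [dict() for _ in range(partitions)]

    def _partition_for(self, customer_id: str) -> dict:
        return self._partitions[hash(customer_id) % len(self._partitions)]

    def save(self, customer_id: str, record: dict) -> None:
        self._partition_for(customer_id)[customer_id] = record

    def load(self, customer_id: str) -> dict:
        return self._partition_for(customer_id)[customer_id]

store = CustomerStore()
store.save("12345678", {"name": "Peter Miller"})
print(store.load("12345678"))
# Callers never see the partitioning, so it can change without breaking them.
```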

Ruling out that state is implicitly shared between services (in memory or on disk) is a direct consequence of this and also serves the scalability purpose, because it further eliminates co-location assumptions about services and enables clustering. Note that this isn't about "stateless" or "stateful". Everything is stateful while it runs.

Strong contracts and operational guarantees further allow you to rely on (trust) the service to perform a given task without passing the caller an error that it likely can't handle anyway. If the message contract and the description of types are sufficiently precise, a service should never have to come back to the caller with an "invalid argument" exception. If input is compliant with a contract, it's the receiving service's own problem to deal with any issues it has with the data, even if that involves manual resolution by an operator. If the contract doesn't sufficiently express the constraints, the sender (client) won't have any additional information or implemented measures to fix the input. Operational guarantees like transactional processing and reliable transport make sure that the data passed on to a service does not get lost on the way or when a processing attempt is unsuccessful. If a service (A) can trust that an invoked service (B) will be able to handle a set of data contained in a message, and can trust that processing will occur without further intervention by (A), the processing can occur asynchronously and the message sent from (A) to (B) can be queued and load balanced.
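
A much simplified sketch of that handoff might look like the following; the message type is made up, and an in-memory queue stands in for a reliable, transactional transport (which it of course is not):

```python
import queue
import threading
from dataclasses import dataclass

@dataclass(frozen=True)
class PaymentMessage:
    account: str
    amount_cents: int  # the contract: a non-negative integer amount

def validate(message: PaymentMessage) -> None:
    # The sender's only obligation: make sure the message complies with the contract.
    if message.amount_cents < 0:
        raise ValueError("contract violation: amount_cents must be >= 0")

inbox = queue.Queue()  # stand-in for a durable, transactional queue

def service_b() -> None:
    # From here on, any trouble with the data is service B's own problem.
    message = inbox.get()
    print(f"processing {message.amount_cents} cents for account {message.account}")

worker = threading.Thread(target=service_b)
worker.start()

message = PaymentMessage(account="12345678", amount_cents=250)
validate(message)   # compliant input: A can hand it off and forget about it
inbox.put(message)  # one-way send; no response is awaited
worker.join()
```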

This is not only true for one-way storage operations, but also for requesting data. If (A) can trust that service (B) will not simply fail a request operation and die, but is able (with reasonable probability) to recover from any problem it may run into and send a response, and (A) passes (B) a reply-to entry point to drop the request result into, then (A) can safely end or suspend processing until that reply arrives or, if required, a timeout occurs. This type of asynchronous "call me back when you're ready" interaction between services is called a "dialog" and is much better suited to fair load distribution in distributed systems than request/response. In essence, dialogs turn the call trees resulting from request/response operations into a sequence of one-way operations. [A further important aspect in this context is 2PC vs. compensating transactions, but I won't go into that here and now]
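
A bare-bones sketch of such a dialog, with in-memory queues standing in for the messaging infrastructure and made-up message shapes, could look like this:

```python
import queue
import threading

# Two correlated one-way legs instead of one blocking request/response.
requests = queue.Queue()  # service B's inbox
replies = queue.Queue()   # service A's reply-to entry point

def service_b() -> None:
    request = requests.get()
    # B does its work on its own schedule...
    result = {"customer_id": request["customer_id"], "quote": 42.0}
    # ...and drops the answer into the reply-to address when it is ready.
    request["reply_to"].put(result)

threading.Thread(target=service_b).start()

# Service A sends a one-way message that carries its reply-to address...
requests.put({"customer_id": "12345678", "reply_to": replies})
# ...and suspends (or does other work) until the reply arrives or a timeout occurs.
print(replies.get(timeout=5))
```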

Asynchronous and parallel operation is a key element both of scalable systems and of systems that operate well in the presence of substantial communication constraints such as network latency and the processing required by strong security boundaries. The vision of Web services as an integration tool of global scale exhibits these and other constraints, making it necessary to adopt asynchronous behavior and parallel processing as a core principle of mainstream application design rather than leaving them as a specialty of the high-performance and super-computing space.

Summarizing, services and service oriented architectures are, in a sense, a return to quite a few of the good old principles of structured programming and batch processing. Data and code are kept separate in order to allow cross-organization, cross-platform modularization, and asynchronous processing is better than synchronous processing if you want your systems to scale. But service oriented architectures also mean that we rely much more on the abstraction and tighter definition that data contracts provide compared to what can be expressed in a programming model. Message contracts expressed in a rich, cross-platform type description language such as XML Schema are much more powerful and precise than any IDL file you could ever write, and they are independent of the implementation platform that's chosen for a particular subsystem. Service policy contracts provide a similar abstraction for the operational requirements and guarantees that can be mandated or given in order to establish the required level of trust between services, independent of the platform they are implemented on.

[For some answers to reader comments go here]
