“Services” in SOA posting: Comments and Responses

Javier Gonzalez sent me a mail today on my most recent SOA post and says that it resonates with his experience:

I just read your article about services and find it very interesting. I have been using OOP languages to build somewhat complex systems for the last 5 years and even if I have had some degree of success with them, I usually find myself facing those same problems u mention (why, for instance, do I have to throw an exception to a module that doesn't know how to deal with it?). Yes, objects in a well designed OOP systems are *supposed* to be loosely coupled, but then, is that really possible to completely achieve? So I do agree with u SOA might be a solution to some of my nightmares. Only one thing bothers me, and that is service implementation. Services, and most of all Web Services only care about interfaces, or better yet, contracts, but the functionality that those contracts provide have to be implemented in some way, right? Being as I am an "object fan" I would use an OO language, but I would like to hear your opinions on the subject. Also, there's something I call "service feasibility". Web Services and SOA in general do "sound" a very nice idea, but then, on real systems they tend to be sluggish, to say the least. They can put a network on its knees if the amount of information transmitted is only fair. SAOP is a very nice idea when it comes to interoperability, but the messages are *bloated* and the system's performance tend to suffer. -- I'd love to hear your opinions on this topics.

Here’s my reply to Javier:

Within a service, OOP stays as much of a good idea as it always was, because it gives us all the qualities of pre-built infrastructure reuse that we've learned to appreciate in recent years. I don't see much realistic potential for business logic or business object reuse, but OOP as a tool is alive and well.

Your point about services being sluggish has some truth to it if you look at system components in isolation. There is no doubt that a Porsche 911 is faster than a Ford Focus. However, if you look at a larger system as a whole (to stay with the picture, let's take a bridge crossing a river at rush hour), the Focus and the 911 move at the same speed because of congestion -- a congestion that would occur even if everyone on that bridge were driving a 911. The primary goal is thus to make the bridge wider, not to give everyone a Porsche.

Maximizing throughput always tops optimizing raw performance. The idea of SOA in conjunction with autonomous computing networks decouples subsystems in such a way that you get largely independent processing islands connected by one-way roads to which you can add arbitrary numbers of lanes (and arbitrary numbers of identical islands). So while an individual operation may indeed take a bit longer and the bandwidth requirements may be higher, the overall system can keep scaling its capacity and throughput by adding lanes and islands.
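To put rough numbers on the bridge picture, here is a minimal sketch in Python. It assumes an idealized model in which every island handles one operation at a time and islands never wait on each other; the latencies and island counts are made-up illustration values, not measurements.

```python
def throughput_ops_per_s(islands: int, latency_ms: int) -> float:
    """Idealized throughput of `islands` independent workers.

    Assumption: each island processes one operation at a time and the
    islands do not contend with each other for any shared resource.
    """
    return islands * 1000 / latency_ms


# One tightly coupled "fast" component: 10 ms per operation, cannot be cloned.
fast_single = throughput_ops_per_s(islands=1, latency_ms=10)

# A "slower" service: 25 ms per operation, but eight identical islands.
slow_scaled = throughput_ops_per_s(islands=8, latency_ms=25)

print(f"single fast component: {fast_single:.0f} ops/s")
print(f"eight slower islands:  {slow_scaled:.0f} ops/s")
```

In this toy model each individual operation is 2.5 times slower, yet the system as a whole moves more traffic. Real systems have coordination costs that the model ignores.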

Still, for a quick reality check: Have you looked at the size of the packets IIOP or DCOM produce on the wire and at the number of network roundtrips they require for protocol negotiation? The scary thing about SOAP is that it is really very much in our face and relatively easy to comprehend. Thus people tend to pay more attention to it. If you compare common binary protocols to SOAP (considering a realistic mix of payloads), SOAP doesn't look all that horrible. Also, XML compresses really well and much better than binary data. All that being said, I know that the vendors (specifically Microsoft) are looking very closely at how to reduce the wire footprint of SOAP and I expect them to come up with proposals in the not too distant future.
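The compression point is easy to try out. The following is a small sketch using Python's standard zlib module on a synthetic XML payload. The element names and the record count are invented for illustration, and such a repetitive document flatters the ratio, so treat the result as an indication rather than a benchmark. It also says nothing about how a binary protocol would fare on the same data.

```python
import zlib


def build_sample_xml(count: int) -> bytes:
    """Build a synthetic, repetitive XML document (hypothetical element names)."""
    rows = "".join(
        f"<customer><id>{i}</id><name>Customer {i}</name>"
        f"<city>Springfield</city></customer>"
        for i in range(count)
    )
    return f"<customers>{rows}</customers>".encode("utf-8")


raw = build_sample_xml(500)
packed = zlib.compress(raw)          # DEFLATE at the default compression level
ratio = len(packed) / len(raw)

print(f"raw: {len(raw)} bytes, compressed: {len(packed)} bytes, ratio: {ratio:.2f}")
```

The repeated tag names are exactly what a dictionary-based compressor removes, which is why verbose markup tends to shrink so well.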

Over in the comment view of that article, Stu Charlton raises some concerns and posts some questions. Here are some answers:

1) "No shared application state, everything must be passed through messages." Every "service" oriented system I have ever witnessed has stated this as a goal, and eventually someone got sick of it and implemented a form of shared state. The GIT in COM, session variables in PL/SQL packages, ASP[.NET] Sessions, JSP HttpSession, common areas in CICS, Linda/JavaSpaces, Stateful Session Beans, Scratchpads / Blackboards, etc. Concern: No distributed computing paradigm has ever eliminated transient shared state, no matter how messy or unscalable it is.

Sessions are scoped to a conversation; what I mean is application-scoped state shared across sessions. Some of the examples you give are about session state, some are about application state. Session state can’t be avoided (although it can sometimes be piggybacked onto the message flow) and is owned by a particular service. If you’ve started a conversation with a service, you need to go back to that service to continue the conversation. If the service itself is implemented using a local (load-balancing and/or failover) cluster, that’s great, but you shouldn’t need to know about it. Application state that’s shared between multiple services provided by an application leads to co-location assumptions and is therefore bad.

2) "A customer record isn't uniquely identifiable in-memory and even not an addressable on-disk entity that's known throughout the system" -- Question: This confuses me quite a bit. Are you advocating the abolishment of a primary key for a piece of shared data? If not, what do you mean by this: no notion of global object identity (fair), or something else?

I am saying that not all data can and should be treated alike. There is shared data whose realistic frequency of change is so low that it simply doesn’t deserve uniqueness (and identification by a primary key in a central store). There is shared data for which a master copy exists, but of which many concurrent on-disk replicas and in-memory copies may safely float throughout the system as long as there is an understanding about the temporal accuracy requirements as well as about the potential for concurrent modification. While there is always a theoretical potential for concurrent data modification, the reality of many systems is that records in many tables can and will never be concurrently accessed, because the information causing the change does not surface at two places at the same time. How many call center agents will realistically attempt to change a single customer’s address information at the same time? Lastly, there is data that should only be touched within a transaction and can and may only exist in a single place.

I am not abandoning the idea of “primary key” or a unique customer number. I am saying that reflecting that uniqueness in in-memory state is rarely the right choice and rarely worth the hassle. Concurrent modification of data is rare, and there are techniques to eliminate it in many cases by introducing chronologies. Even if you are booking into a financial account, you are just adding information to a uniquely identifiable set of data. You are not modifying the account itself; you are adding information to it. Counterexample: If you have an object that represents a physical device such as a printer, a sensor, a network switch or a manufacturing robot, in-memory identity immediately reflects the identity of the physical entity you are dealing with. These are cases where objects and object identity make sense. That direct correspondence rarely exists in business systems. Those deal with data about things, not things.
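The account example can be sketched as an append-only chronology. This is a minimal illustration in Python, not a description of any particular system: the names Booking and Ledger are hypothetical, and a real ledger would live in a durable store rather than in a list.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from decimal import Decimal
from typing import List


@dataclass(frozen=True)
class Booking:
    """One immutable entry in the chronology (hypothetical shape)."""
    account_id: str
    amount: Decimal
    booked_at: datetime


@dataclass
class Ledger:
    """Append-only chronology: entries are added, never updated in place."""
    entries: List[Booking] = field(default_factory=list)

    def book(self, account_id: str, amount: Decimal) -> Booking:
        entry = Booking(account_id, amount, datetime.now(timezone.utc))
        self.entries.append(entry)
        return entry

    def balance(self, account_id: str) -> Decimal:
        # The balance is derived from the chronology, not stored as mutable state.
        return sum(
            (e.amount for e in self.entries if e.account_id == account_id),
            Decimal("0"),
        )


ledger = Ledger()
ledger.book("ACC-1", Decimal("100.00"))
ledger.book("ACC-1", Decimal("-30.00"))
print(ledger.balance("ACC-1"))
```

Two parties booking into the same account at the same moment each add their own entry. Neither one overwrites a shared "current balance" record, so there is nothing to conflict over.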

3) "In a services world, there are no objects, just data". – […] Anyway, I don't think anyone [sane] has advocated building fine-grained object model distributed systems for quite a few years. […] But the object oriented community has known that for quite some time, hence the "Facade" pattern, and the packaging/reuse principles from folks such as Robert C. Martin. Domain models may still exist in the implementation of the service, depending on the complexity of the service.

OOP is great for the inner implementation of a service (see above) and I am in line with you here. There are, however, plenty of people who still believe in object purity and that’s why I am saying what I am saying.

4) "data record stored & retrieved from many different data sources within the same application out of a variety of motivations" --- I assume all of these copies of data are read-only, with one service having responsibility for updates. I also assume you mean that some form of optimistic conflict checking would be involved to ensure no lost updates. Concern: Traditionally we have had serializable transaction isolation to protect us from concurrent anomalies. Will we still have this sort of isolation in the face of multiple cached copies across web services?

I think that absolute temporal accuracy is severely overrated and is more an engineering obsession than anything else. Amazon.com basically lies to the faces of millions of users each day by saying “only 2-4 items left in stock” or “Usually ships within 24 hours”. Can they give you to-the-second accurate information from their backend warehouse? Of course they can’t. They won’t even tell you when your stuff ships once you’re through checkout and have given them your money. They’ll do so later – by email.

I also think that the risk of concurrent updates to records is – as outlined above – very low if you segment your data along the lines of the business use cases and not so much along the lines of what a DBA thinks is perfect form.

I’ll skip 5) and 6) (the answers are “Ok” and “If you want to see it that way”) and move on to 7):

7) "Problematic assumptions regarding single databases vs. parallel databases for scalability" -- I'm not sure what the problem is here from an SOA perspective? Isn't this a physical data architecture issue, something encapsulated by your database's interface? As far as I know it's pretty transparent to me if Oracle decides to use a parallel query, unless I dig into the SQL plan. […]

“which may or may not be directly supported by your database system” is the half sentence to consider here as well. The Oracle cluster does it, SQL Server does it too, but there are other database systems out there, and there are also other ways of storing and accessing data than an RDBMS.

8) "Strong contracts eliminate "illegal argument" errors" Question: What about semantic constraints? Or referential integrity constraints? XML Schemas are richer than IDL, but they still don't capture rich semantic constraints (i.e. "book a room in this hotel, ensuring there are no overlapping reservations" -- or "employee reporting relationships must be hierarchical"). […]

“Book a room in this hotel” is a message to the service. The requirements-motivated answer to this message is either “yes” or “no”. “No overlapping reservations” is a local concern of that service and even “Sorry, we don’t know that hotel” is. The employee reporting relationships for a message relayed to an HR service can indeed be expressed by referential constraints in XSD; the validity of merging the message into the backend store is an internal concern of the service. The answer is “can do that” or “can’t do that”.

What you won’t get are failures like “the employee name has more than 80 characters and we don’t know how to deal with that”. Stronger contracts and automatic enforcement of these contracts reduce the number of stupid errors, side-effects and the combination of stupid errors and side effects to look for – at either endpoint.
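The split between contract checks and service-local decisions can be sketched as follows. This is an illustration in plain Python rather than XSD: BookRoom, ContractViolation, HotelService and the 80-character limit are hypothetical names and values chosen to mirror the examples above.

```python
from dataclasses import dataclass
from datetime import date

MAX_NAME_LENGTH = 80  # hypothetical limit stated in the contract


class ContractViolation(ValueError):
    """Raised at the boundary, before the service logic ever sees the message."""


@dataclass(frozen=True)
class BookRoom:
    hotel_id: str
    guest_name: str
    check_in: date
    check_out: date

    def __post_init__(self):
        # Contract enforcement: malformed messages cannot be constructed at all.
        if not self.guest_name or len(self.guest_name) > MAX_NAME_LENGTH:
            raise ContractViolation("guest_name must be 1..80 characters")
        if self.check_out <= self.check_in:
            raise ContractViolation("check_out must be after check_in")


class HotelService:
    def __init__(self, known_hotels):
        self._reservations = {hotel: [] for hotel in known_hotels}

    def handle(self, msg: BookRoom) -> bool:
        """Answer "can do that" (True) or "can't do that" (False)."""
        booked = self._reservations.get(msg.hotel_id)
        if booked is None:
            return False  # "Sorry, we don't know that hotel"
        # Overlap check is a local concern of the service, not of the contract.
        if any(msg.check_in < out and start < msg.check_out for start, out in booked):
            return False
        booked.append((msg.check_in, msg.check_out))
        return True


service = HotelService({"H1"})
first = service.handle(BookRoom("H1", "Ada", date(2024, 5, 1), date(2024, 5, 4)))
clash = service.handle(BookRoom("H1", "Bob", date(2024, 5, 3), date(2024, 5, 6)))
print(first, clash)
```

The caller only ever sees a contract rejection or a yes/no answer. How the service detects overlapping reservations stays behind the boundary.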

9) "The vision of Web services as an integration tool of global scale exhibits these and other constraints, making it necessary to enable asynchronous behavior and parallel processing as a core principle of mainstream application design and don’t leave that as a specialty to the high-performance and super-computing space." -- Concern: Distributed/concurrent/parallel computing is hard. I haven't seen much evidence that SOA/ web services makes this any easier. It makes contracts easier, and distributing data types easier. But it's up to the programming model (.NET, J2EE, or something else) to make the distributed/concurrent/parallel model easier. There are some signs of improvement here, but I'm skeptical there will be anything that breaks this stuff into the "mainstream" (I guess it depends on what one defines as mainstream)...

Oh, I wouldn’t be too sure about that. There are lots of things going on in that area that I know of but can’t talk about at present.

While SOA as a means of widespread systems integration is a solid idea, the dream of service-oriented "grid" computing isn't really economically viable unless the computation is very expensive. Co-locating processing & filtering as close as possible to the data source is still the key principle to an economic & performing system. (Jim Gray also has a recent paper on this on his website). Things like XQuery for integration and data federations (service oriented or not) still don't seem economically plausible until distributed query processors get a lot smarter and WAN costs go down.

Again, if the tools were up to speed, it would be economically feasible to do so. That’s going to be fixed. Even SOA-based grids apparently sound much less like science fiction to me than they do to you.

Clemens Vasters