Reading Fujitsu’s WS-Reliability

Fujitsu, along with Sun, NEC, Oracle and Sonic Software, has published a proposal for reliable messaging with SOAP. I assume this has been reported all over blogland already, but I am a bit "connection challenged" right now since I've been on the road, so I haven't been able to do much blog reading. I've had time to look at this spec offline, though, and I think it is indeed interesting -- because it is so familiar.

Comparing WS-Reliability to the two-year-old BizTalk Framework 2.0, which first defined the reliability mechanism also known as SRMP -- SOAP Reliable Messaging Protocol, shows that there's unfortunately very little new to see in WS-Reliability, with the notable exception of ordered delivery of message sequences through the use of unique message identifiers. What's also interesting -- and doesn't really make me happy -- is that we're seeing the invention of yet another message header carrying a message-id. WS-Routing, for instance, already defines one, and I don't get why there needs to be yet another header to establish message identity when WS-Routing has been around for such a long time already. I would think that reliable messaging is something that doesn't really work without a solid understanding of where to send a message, so WS-Reliability certainly could and probably should pick up that header instead of defining its own.

A bigger problem, though, is that WS-Reliability carries forward quite a few shortcomings of the BizTalk Framework and introduces a whole set of new problems due to the spec's choice of language.

Because WS-Reliability is unaware of and not integrated with WS-Routing, it is only useful as a point-to-point mechanism. While routing from the sender to the receiver will likely be possible, the "ReplyTo" address to which the acknowledgement message is sent is a plain URL and doesn't allow integration with a reverse path as per WS-Routing. This means that unless the ACK message can be piggybacked on a synchronous response (the luckiest of all circumstances), the spec requires either direct connectivity from the receiver back to the sender, which may be impossible due to firewalls and NAT, or some form of acknowledgement message dispatcher gateway at the sender's site, which in turn requires some form of central service deployment. In short: This doesn't really work for a desktop PC wishing to reliably deliver a message to an external service from within the corporate firewall.
There are quite a few problems to be solved with regard to simple sequence numbers and resends of an unaltered carbon copy (2.2.2) of the original message, considering the accuracy of message timestamps, digital signatures, context coordination and techniques to avoid replay attacks. Sending the exact same message again may be entirely impossible, even if the original couldn't be delivered properly, and in that case the "MUST" requirement of 2.2.2 cannot be fulfilled. Also, in 2.2.2 there's a reference to a "specified number of resend attempts" -- who specifies them?
The spec rightfully calls for persistent storage of messages (2.2.3), but doesn't spell out rules for when in the process messages must be written to persistent storage (obviously that should happen before sending on the sender's side and, on the receiver's side, after receiving but before acknowledging and forwarding).
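
The persistence ordering I'd expect, together with resends under a caller-chosen attempt limit, can be sketched roughly as follows. This is a toy illustration in Python, not anything defined by WS-Reliability or the BizTalk Framework: the names `Sender`, `Receiver` and `lossy_ack_transport` are made up, plain dictionaries stand in for persistent storage, and the message is resent unchanged, which conveniently ignores the timestamp and signature issues mentioned above.

```python
import uuid


class Receiver:
    """Hypothetical receiver: persists before acknowledging, ignores duplicates."""

    def __init__(self):
        self.store = {}      # stand-in for persistent storage, keyed by message id
        self.delivered = []  # payloads handed on to the application

    def receive(self, message_id, payload):
        if message_id not in self.store:
            self.store[message_id] = payload  # 1. persist first
            self.delivered.append(payload)    # 2. then forward to the application
        return message_id                     # 3. acknowledge last (also for resends)


class Sender:
    """Hypothetical sender: persists before sending, resends until acknowledged."""

    def __init__(self, transport, max_attempts=3):
        self.transport = transport        # callable(message_id, payload) -> ack or None
        self.max_attempts = max_attempts  # assumption: chosen by the sending application
        self.outbox = {}                  # stand-in for persistent storage

    def send(self, payload):
        message_id = str(uuid.uuid4())
        self.outbox[message_id] = payload  # persist before the first send attempt
        for attempt in range(1, self.max_attempts + 1):
            ack = self.transport(message_id, payload)
            if ack == message_id:
                del self.outbox[message_id]  # safe to forget only after the ACK
                return attempt
        raise RuntimeError("no acknowledgement for message " + message_id)


def lossy_ack_transport(receiver, lose_first_n_acks):
    """Delivers every message, but drops the first N acknowledgements."""
    lost = 0

    def transport(message_id, payload):
        nonlocal lost
        ack = receiver.receive(message_id, payload)
        if lost < lose_first_n_acks:
            lost += 1
            return None  # the ACK got lost on the way back
        return ack

    return transport


receiver = Receiver()
sender = Sender(lossy_ack_transport(receiver, lose_first_n_acks=1))
attempts_needed = sender.send("purchase order")
```

Here the first acknowledgement is lost, so the sender resends once; the receiver recognizes the message id and doesn't hand the payload to the application a second time. If the attempt limit runs out, the message stays in the sender's outbox, which is exactly the state where sender and receiver no longer agree on whether delivery happened.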

What I also find very noteworthy is that the authors say they have yet to address synchronization between sender and receiver, that is, establishing a common understanding between the two about whether the message was properly delivered (meaning that the send/ack cycle was fully completed). I assume that once they do so, they'll throw the synchronous, piggybacked reply on top of HTTP out of the window, because it creates an in-doubt situation for the acknowledging party.

Clemens Vasters