There is good reason to be worried about the "Internet of Things" on its current course and trajectory. Both the IT industry and manufacturers of "smart products" seem to look at connected special-purpose devices and sensors as a mere variation of information technology assets like servers, PCs, tablets, or phones. That stance is problematic as it neglects important differences between the kinds of interactions that we're having with a phone or PC and the interactions we're having with special-purpose devices like a gas valve, a water heater, a glass-break sensor, a vehicle immobilizer, or a key fob.

Before I get to a proposal for how to address the differences, let's take a look at the state of things on the Web and elsewhere.

Information Devices

PCs, phones, and tablets are primarily interactive information devices. Phones and tablets are explicitly optimized around maximizing battery lifetime, and they preferably turn themselves off, at least partially, when not immediately interacting with a person or providing services like playing music or guiding their owner to a particular location. From a systems perspective, these information technology devices are largely acting as proxies towards people. They are "people actuators" suggesting actions and "people sensors" collecting input.

People can, for the most part, tell when something is grossly silly and/or could even put them into a dangerous situation. Even though there is precedent of someone driving off a cliff when told to do so by their navigation system, those cases are the rarest exceptions.

Their role as information-gathering devices, allowing people to browse the Web and use a broad variety of services, requires these devices to be "promiscuous" towards network services. The design of the Web, our key information tool, centers on aggregating, combining, and cross-referencing information from a myriad of different systems. As a result, the Web's foundation for secure communication is aligned with the goal of this architecture. At the transport protocol level, Web security largely focuses on providing confidentiality and integrity for fairly short-lived connections.

User authentication and authorization are layered on top, mostly at the application layer. The basic transport layer security model, including server authentication, builds on a notion of federated trust anchored in everyone (implicitly and largely involuntarily) trusting in a dozen handfuls of certification authorities (CAs) chosen by their favorite operating system or browser vendor. If one of those CAs deems an organization trustworthy, it can issue a certificate that will then be used to facilitate secure connections, which also expresses an assurance to the user that they are indeed talking to the site they expect to be talking to. To that end, the certificate can be inspected by the user. If they know and care where to look.

This federated trust system is not without issues. First, if the signing key of one of the certification authorities were to be compromised, potentially undetected, whoever is in possession of the key can now make technically authentic and yet forged certificates and use those to intercept and log communication that is meant to be protected. Second, the system is fairly corrupt as it takes all of $3 per year to buy a certification authority's trust with minimal documentation requirements. Third, the vast majority of users have no idea that this system even exists.

Yet, it all somehow works out halfway acceptably, because people do, for the most part, have common sense enough to know when something's not quite right, and it takes quite a bit of work to trick people into scams in huge numbers. You will trap a few victims, but not very many and not for very long. The system is flawed and some people get tricked, but that can also happen at the street corner. Ultimately, the worst that can happen – without any intent to belittle the consequences – is that people get separated from some of their money, or their identities get abused until the situation is corrected by intervention and, often, some insurance steps in to rectify these not entirely unexpected damages.

Special-Purpose Devices

Special-purpose devices, from simple temperature sensors to complex factory production lines with thousands of components inside them, are different. The devices are much more scoped in purpose, and even if they may provide some level of a people interface, they're largely scoped to interfacing with assets in the physical world. They measure and report environmental circumstances, turn valves, control servos, sound alarms, switch lights, and do many other tasks. They help do work for which an information device is either too generic, too expensive, too big, or too brittle.

If something goes wrong with automated or remote controllable devices that can influence the physical world, buildings may burn down and people may die. That's a different class of damage than someone maxing out a stolen credit-card's limit. The security bar for commands that make things move, and also for sensor data that eventually results in commands that cause things to move, ought to be, arguably, higher than in an e-commerce or banking scenario.

What doesn't help on the security front is that machines, unlike most people, don't have a ton of common sense. A device that goes about its day in its programmed and scheduled ways has no notion of figuring out when something is not quite right. If you can trick a device into talking to a malicious server or intermediary, or into following a network protocol redirection to one, it'll dutifully continue doing its work unless it's explicitly told to never do so.

Herein lies one of the challenges. A lot of today's network programming stacks and Web protocols are geared towards the information-oriented Web and excellently enable building promiscuous clients by default. In fact, the whole notion of REST rests on the assumption that the discovery and traversal of resources is performed through hypertext links included in the returned data. As the Web stacks are geared towards that model, there is extra work required to make a Web client faithful to a particular service and to validate, for instance, the thumbprint of the TLS certificate returned by the permitted servers. As long as you get to interact with the Web stack directly, that's usually okay, but the more magic libraries you use on top of the Web stack basics, the harder that might get. And, not to be underestimated in complexity, you of course have to teach the device the right thumbprint(s) and thus effectively manage and distribute an allow-list.
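
To make that concrete, here is a minimal C# sketch – purely illustrative, with placeholder thumbprint and host values – of what a "faithful" client could look like at the socket level, assuming the device has been provisioned out-of-band with the thumbprint(s) of the server certificate(s) it is allowed to talk to:

// Sketch: accept only servers whose certificate thumbprint is on a locally provisioned allow-list,
// instead of trusting anything that chains up to a CA in the platform root store.
using System;
using System.Net.Security;
using System.Net.Sockets;

static class FaithfulClient
{
    // Thumbprints provisioned to the device out-of-band; the value here is a placeholder.
    static readonly string[] AllowedThumbprints = { "A1B2C3D4E5F60718293A4B5C6D7E8F9012345678" };

    public static SslStream Connect(string host, int port)
    {
        var tcp = new TcpClient(host, port);
        var ssl = new SslStream(tcp.GetStream(), false,
            (sender, certificate, chain, errors) =>
                Array.IndexOf(AllowedThumbprints, certificate.GetCertHashString()) >= 0);
        ssl.AuthenticateAsClient(host); // TLS handshake; throws if the pinned-thumbprint check rejects the cert
        return ssl;
    }
}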

Generally, device operators will not want to allow unobserved and non-interactive devices that emit telemetry and receive remote commands to be able to stray from a very well-defined set of services they're peered with. They should not be promiscuous. Quite the opposite.

Now – if the design goal is to peer a device with a particular service, the federated certificate circus turns into more of a burden than a desired protocol-suite feature. As the basic assumptions about promiscuity towards services are turned on their head, the 3-6 KByte and 2 network roundtrips of certificate exchange chatter slow things down and also may cost quite a bit of real money paying for precious, metered wireless data volume. Even though everyone currently seems to assume that Transport Layer Security (TLS) is the only secure channel protocol we'll ever need, it's far from being ideal for the 'faithful' connected devices scenario.

If you allow me to take you into the protocol basement for a second: That may be somewhat different if we could seed clients with TLS RFC5077 session resumption tickets in an out-of-band fashion, and have a TLS mode that never falls back to certs. Alas, we do not.

Bi-Directional Addressing

Connected and non-interactive devices not only differ in terms of the depth of their relationship with backend services, they also differ very much in terms of the interaction patterns with these services when compared to information-centric devices. I generally classify the interaction patterns for special-purpose devices into the categories Telemetry, Inquiries, Commands, and Notifications.

  • Telemetry is unidirectionally flowing information which the device volunteers to a collecting service, either on a schedule or based on particular circumstances. That information represents the current or temporally aggregated state of the device or the state of its environment, like readings from sensors that are associated with it.
  • With Inquiries, the device solicits information about the state of the world beyond its own reach and based on its current needs; an inquiry can be a singular request, but might also ask a service to supply ongoing updates about a particular information scope. A vehicle might supply a set of geo-coordinates for a route and ask for continuous traffic alert updates about that particular route until it arrives at the destination.
  • Commands are service-initiated instructions sent to the device. Commands can tell a device to provide information about its state, or to change the state of the device, including activities with effects on the physical world. That includes, for instance, sending a command from a smartphone app to unlock the doors of your vehicle, whereby the command first flows to an intermediating service and from there it's routed to the vehicle's onboard control system.
  • Notifications are one-way, service-initiated messages that inform a device or a group of devices about some environmental state they'll otherwise not be aware of. Wind parks will be fed weather forecast information, cities may broadcast information about air pollution and suggest that fossil-fueled systems throttle their CO2 output, or a vehicle may want to show weather or news alerts or text messages to the driver. A minimal code sketch of these four patterns follows below.
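
Purely as an illustration of the taxonomy – the type and member names are my own and not any particular product's API – the four patterns could be modeled as simple message contracts:

// Illustrative sketch only: the four interaction patterns as simple message contracts.
using System;

abstract class DeviceMessage
{
    public string DeviceId;
    public DateTime Timestamp;
}
// Device-initiated, one-way: current or aggregated state of the device or its environment.
class Telemetry : DeviceMessage { public string Metric; public double Value; }
// Device-initiated request, optionally asking for ongoing updates about a scope of interest.
class Inquiry : DeviceMessage { public string Scope; public bool ContinuousUpdates; }
// Service-initiated instruction to report state or to change it, possibly with physical effects.
class Command : DeviceMessage { public string Name; public string[] Arguments; }
// Service-initiated, one-way: environmental information pushed to one or many devices.
class Notification : DeviceMessage { public string Topic; public string Payload; }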

While Telemetry and Inquiries are device-initiated, their mirrored pattern counterparts, Commands and Notifications, are service-initiated – which means that there must be a network path for messages to flow from the service to the device, and that requirement bubbles up a set of important technical questions:

  • How can I address a device on a network in order to route commands and notifications to it?
  • How can I address a roaming and/or mobile device on a network in order to route commands and notifications to it?
  • How can I address a power constrained device on a network in order to route commands and notifications to it?
  • How can I send commands or notifications with latency that's acceptable for my scenario?
  • How can I ensure that the device only accepts legitimate commands and trustworthy notifications?
  • How can I ensure that the device is not easily susceptible to denial-of-service attacks that render it inoperable towards the greater system? (not good for building security sensors, for instance)
  • How can I do this with several hundred thousand or millions of devices attached to a telemetry and control system?

Most current approaches that I'm running into are trying to answer the basic addressing question with traditional network techniques. That means that the device either gets a public network address or it is made part of a virtual network and then listens for incoming traffic using that address, acting like a server. For using public addresses the available options are to give the device a proper public IPv4 or IPv6 address or to map it uniquely to a well-known port on a network address translation (NAT) gateway that has a public address. As the available pool of IPv4 addresses has been exhausted and network operators are increasingly under pressure to move towards providing subscribers with IPv6 addresses, there's hope that every device could eventually have its very own routable IPv6 address. The virtual network approach is somewhat similar, but relies on the device first connecting to some virtual network gateway via the underlying native network, and then getting an address assigned within the scope of the virtual network, which it shares with the control system that will use the virtual network address to get to the device.

Both of those approaches are reasonable from the perspective of answering the first, basic addressing question raised above – if you pretend for a moment that opening inbound ports through a residential edge firewall is acceptable. However, things get tricky enough once we start considering the other questions, like devices not being in the house, but on the road.

Roaming is tricky for addressing and even trickier if the device is switching networks or even fully mobile and thus hopping through networks and occasionally dropping connections as it gets out of radio range. There are "Mobile IP" roaming standards for both IPv4 (RFC3344) and IPv6 (RFC6275), but those standards rely on a notion of traffic relaying through agents, and those are problematic at scale with very large device populations, as the relay will have to manage and relay traffic for very many routes and also needs to keep track of the devices hopping across foreign networks. Relaying obviously also has significant latency implications with global roaming. What even the best implementations of these standards-based approaches for roaming can't solve is that you can't connect to a device that's outside of radio coverage and therefore not connected at all.

The very same applies to the challenge of how to reliably deliver commands and notifications to power-constrained devices. Those devices may need to survive on battery power for extended periods (in some cases for years) between battery recharges, or their external power source, like "power stealing" circuits employed in home building automation devices, may not yield sufficient power for sustained radio connectivity to a base station. Even a vehicle battery isn't going to like powering an always-on radio when parked in the long-term airport garage while you're on vacation for 2 weeks.

So if a device design aims to conserve power by only running the radio occasionally, or if the device is mobile and frequently in and out of radio coverage or hopping networks, it gets increasingly difficult to reach it naively by opening a network connection to it and then hoping for that to remain stable, even if you're lucky enough to catch a moment when the device is indeed ready to talk. That all assumes the device actually has a stable network address provided by one of the cited "Mobile IP" standards, or that the device registers with an address registration/lookup service every time it comes online with a new address so that the control service can locate it.

All these approaches aiming to provide end-to-end network routes between devices and their control services are almost necessarily brittle. As it tries to execute a command, the service needs to locate the device, establish a connection to it, issue the command, and collect the command feedback, all while, say, a vehicle drives through a series of tunnels. Not only does this model rely on the device being online and available at the required moment, it also introduces a high number of tricky-to-diagnose failure points (such as the device flipping networks right after the service resolved its address) with associated security implications (who gets that newly orphaned address next?). It also has inherent reliability issues at the application layer, since any fault that occurs after the control system has sent the command introduces doubt in the control system about whether the command could be successfully executed – and not all commands are safe to just blindly retry, especially when they have physical consequences.

For stationary power-constrained or wirelessly connected devices, the common approach to bridging the last meters/yards is a hub device that's wired to the main network and can bridge to the devices that live on a local network. The WLAN hub(s) in many homes and buildings are examples of this, as there is obviously a need to bridge between devices roaming around the house and the ISP network. From an addressing perspective, these hubs don't change the general challenge much, as they themselves need to be addressable for commands they then ought to forward to the targeted device, and that means you're still opening up a hole in the residential firewall, either by explicit configuration or via (don't do this) UPnP.

If all this isn't yet challenging enough for your taste, there's still security. Sadly, we can't have nice and simple things without someone trying to exploit them for malice or stupid "fun".

Trustworthy Communication

All information that's being received from and sent to a device must be trustworthy if anything depends on that information – and why would you send it otherwise? "Trustworthy communication" means that information is of verifiable origin, correct, unaltered, timely, and cannot be abused by unauthorized parties in any fashion. Even telemetry from a simple sensor that reports a room's temperature every five minutes can't be left unsecured. If you have a control system reacting on that input or do anything else with that data, the device and the communication paths from and to it must be trustworthy.

"Why would anyone hack temperature sensors?" – sometimes "because they can", sometimes because they want to inflict monetary harm on the operator or physical harm on the facility and what's in it. Neglecting to protect even one communication path in a system opens it up for manipulation and consequential harm.

If you want to believe in the often-cited projection of 50 billion connected devices by 2020, the vast majority of those will not be classic information devices, and they will not be $500 or even $200 gadgets. Very many of these connected devices will rather be common consumer or industry goods that have been enriched with digital service capabilities. Or they might even just be super inexpensive sensors hung off the side of buildings to collect environmental information. Unlike apps on information devices, most of these services will have auxiliary functions. Some of these capabilities may even be largely invisible. If you have a device with built-in telemetry delivery that allows the manufacturer or service provider to sense an oncoming failure and proactively get in touch with you for service – which is something manufacturers plan to do – and then the device just never breaks, you may never even know such a capability exists, especially if the device doesn't rely on connectivity through your own network. In most cases, these digital services will have to be priced into the purchase price of the product, or even be monetized through companion apps and services, as it seems unlikely that consumers will pay for 20 different monthly subscriptions for connected appliances. It's also reasonable to expect that many devices sold will have the ability to connect, but their users will never intentionally take advantage of these features.

On the cost side, a necessary result of all this is that the logic built into many products will (continue to) use microcontrollers that require little power, have a small footprint, and are significantly less expensive than the high-powered processors and ample memory in today's information devices – trading compute power for much reduced cost. But trading compute power and memory for cost savings also means trading away cryptographic capability and, more generally, resilience against potential attacks.

The horror-story meme "if you're deep in the forest nobody will hear your screams" is perfectly applicable to unobserved field-deployed devices under attack. If a device were to listen for unsolicited traffic, meaning it listens for incoming TCP connections or UDP datagrams or some form of UDP-datagram-based sessions and thus acts as a server, it would have to accept and then triage those connection attempts into legitimate and illegitimate ones.

With TCP, even enticing the device to accept a connection is already a very fine attack vector, because a TCP connection burns memory in the form of a receive buffer. So if the device were to use a network protocol circuit like, for instance, the WizNet W5100 used on the popular enthusiast tinker platform Arduino Ethernet, the device's communication capability is saturated at just 4 connections, which an attacker could then service in a slow byte-per-packet fashion and thus effectively take the device out. As that happens, the device now also wouldn't have a path to scream for help through, unless it made – assuming the circuit supports it – an a priori reservation of resources for an outbound connection to whoever plays the cavalry.

If we were to leave the TCP-based resource exhaustion vector out of the picture, the next hurdle is to establish a secure baseline over the connection and then triage connections into good and bad. As the protocol world stands, TLS (RFC5246) and DTLS (RFC6347) are the kings of the security protocol hill, and I've discussed the issues with their inherent client promiscuity assumption above. If we were indeed connecting from a control service to a device in an outbound fashion, and the device were to act as server, the model may be somewhat suitable, as the control service will indeed have to speak to very many and potentially millions of devices. But contrary to the Web model, where the browser has no idea where the user will send it, the control system has a very firm notion of the devices it wants to speak to. There are many of those, but there is no promiscuity going on. If they play server, each device needs to have its own PKI certificate (there is a specified option to use TLS without certificates, but that does not matter much in practice) with its own private key, since it's acting as a server and since you can't leak shared private keys into untrusted physical space, which is where most of these devices will end up living.

The strategy of using the standard TLS model and having the device play server has a number of consequences. First, whoever provisions the devices will have to be a root or intermediate PKI certification authority. That's easy to do, unless there were any need to tie into the grand PKI trust federation of today's Web, which is largely anchored in the root certificate store contents of today's dominant client platforms. If you had the notion that "Internet of Things" were to mean that every device could be a web server to everyone, you would have to buy yourself into the elite circle of intermediate CA authorities by purchasing the necessary signing certificates or services from a trusted CA, and that may end up being fairly expensive as the oligopoly is protective of its revenues. Second, those certificates need to be renewed and the renewed ones need to be distributed securely. And when devices get stolen or compromised or the customer opts out of the service, these certificates also need to get revoked, and that revocation service needs to be managed and run and will have to be consulted quite a bit.

Also, the standard configuration of most application protocol stacks' usage of TLS ties into DNS for certificate validation, and it's not obvious that DNS is the best choice for associating name and network address for devices that rapidly hop networks when roaming – unless of course you had a stable "home network" address as per Mobile IPv6. But that would mean you are now running a Mobile IPv6 relay. The alternative is to validate the certificate by some other means, but then you'll be using a different validation criterion in the certificate subject and will no longer be aligned with the grand PKI trust federation model. Thus, you're back to effectively managing an isolated PKI infrastructure, with all the bells and whistles like a revocation service, and you will do so while you're looking for the exact opposite of the promiscuous security session model all that enables.

Let's still assume none of that would matter and (D)TLS with PKI dragged in its wake were okay and the device could use those and indeed act as a server accepting inbound connections. Then we're still faced with the fact that cryptography computation is not cheap. Moving crypto into hardware is very possible, but impacts the device cost. Doing crypto in software requires that the device deals with it inside of the application or underlying frameworks. And for a microcontroller that costs a few dollars that's non-negligible work. So the next vector to keep the device from doing its actual work is to keep it busy with crypto. Present it with untrusted or falsely signed client certificates (if it were to expect those). Create a TLS link (even IPSec) and abandon it right after the handshake. Nice ways to burn some Watts.

Let's still pretend none of this were a problem. We're now up at the application level with transport layer security underneath. Who is authorized to talk to the device and which of the connections that pop up through that transport layer are legitimate? And if there is an illegitimate connection attempt, where do you log these and if that happens a thousand times a minute, where do you hold the log and how do you even scream for help if you're pegged on compute by crypto? Are you keeping an account store in the device? Quite certainly not in a system whose scope is more than one device. Are you then relying on an external authentication and authorization authority issuing authorization tokens? That's more likely, but then you're already running a token server.

The truth, however inconvenient, is that non-interactive special-purpose devices residing in untrusted physical spaces are, without getting external help from services, essentially indefensible when acting as network servers. And this is all just on top of the basic fact that devices that live in untrusted physical space are generally susceptible to physical exploitation and that protecting secrets like key material is generally difficult.

Here's the recipe to eradicate most of the mess I've laid out so far: Devices don't actively listen on the network for inbound connections. Devices act as clients. Mostly.

Link vs. Network vs. Transport vs. Application

What I've discussed so far are considerations around the Network and Transport layers (RFC1122, 1.1.3), as I'm making a few general assumptions about connectivity between devices and control and telemetry collection systems, as well as about the connectivity between devices when they're talking in a peer-to-peer fashion.

First, I have so far assumed that devices talk to other systems and devices through a routable (inter-)network infrastructure whose scope goes beyond a single Ethernet hub, WLAN hotspot, Bluetooth PAN, or cellular network tower. Therefore I am also assuming the usage of the only viable routable network protocol suite and that is the Internet Protocol (v4 and v6) and with that the common overlaid transport protocols UDP and TCP.

Second, I have so far assumed that the devices establish a transport-level and then also application-level network relationship with their communication peers, meaning that the device commits resources to accepting, preprocessing, and then maintaining the connection or relationship. That is specifically true for TCP connections (and anything riding on top of it), but is also true for Network-level links like IPSec and session-inducing protocols overlaid over UDP, such as setting up agreements to secure subsequent datagrams as with DTLS.

The reason for assuming a standards-based Network and Transport protocol layer is that everything at the Link Layer (including physical bits on wire or through space) is quite the zoo, and one that I see growing rather than shrinking. The Link Layer will likely continue to be a space of massive proprietary innovation around creative use of radio frequencies, even beyond what we've seen in cellular network technology, where bandwidth has grown from basic GSM's 9.6 Kbit/s to today's 100+ MBit/s on LTE over the last 25 years. There are initiatives to leverage the new "white space" spectrum opened up by the shutdown of analog TV, there are services leveraging ISM frequency bands, and there might be well-funded contenders for licensed spectrum emerging that use wholly new stacks. There is also plenty of action on the short-range radio front, specifically around suitable protocols for ultra-low power devices. And there are obviously also many "wired" transport options over fiber and copper that have made significant progress and will continue to do so and are essential for device scenarios, often in conjunction with a short-range radio hop for the last few meters/yards. Just as much as it was a losing gamble to specifically bet on TokenRing or ARCnet over Ethernet in the early days of Local Area Networking, it isn't yet clear what to bet on in terms of protocols and communication service infrastructures as the winners for the "Internet of Things" – not even today's mobile network operators.

Betting on a particular link technology for inter-device communication is obviously reasonable for many scenarios where the network is naturally scoped by physical means like reach by ways of radio frequency and transmission power, the devices are homogeneous and follow a common and often regulation-imposed standard, and latency requirements are very narrow, bandwidth requirements are very high, or there is no tolerance for failure of intermediaries. Examples for this are in-house device networks for home automation and security, emerging standards for Vehicle-To-Vehicle (V2V) and Vehicle-To-Infrastructure (V2I) communication, or Automatic Dependent Surveillance (ADS, mostly ADS-B) in Air Traffic Control. Those digital radio protocols essentially form peer meshes where everyone listens to everything in range and filters out what they find interesting or addressed specifically at them. And if the use of the frequencies gets particularly busy, coordinated protocols impose time slices on senders.

What such link-layer or direct radio information transfers have generally struggled with is trustworthiness – allow me to repeat: verifiable origin, correct, unaltered, timely, and cannot be abused by unauthorized parties in any fashion.

Of course, by its nature, all radio based communication is vulnerable to jamming and spoofing, which has a grand colorful military history as an offensive or defensive electronic warfare measure along with fitting countermeasures (ECM) and even counter-countermeasures (ECCM). Radio is also, especially when used in an uncoordinated fashion, subject to unintended interference and therefore distortion.

ADS-B, which is meant to replace radar in Air Traffic Control, doesn't even have any security features in its protocol. The stance of the FAA is that they will detect spoofing by triangulation of the signals, meaning they can tell whether a plane that says it's at a particular position is actually there. We should assume they have done their ECM and ECCM homework.

IEEE 1609 for Wireless Access in Vehicular Environments, which is aiming to facilitate ad-hoc V2V and V2I communication, spells out an elaborate scheme to manage, use, and roll X.509 certificates, but relies on the broad distribution of certificate revocation lists to ban once-issued certificates from the system. Vehicles are sold, have their telematics units replaced due to malfunction or crash damage, may be tampered with, or might be stolen. I can see the PKI's generally overly optimistic stance on revocations being challenging at the scale of tens if not hundreds of millions of vehicles, where churn will be very significant. The Online Certificate Status Protocol (OCSP, RFC6960) might help IEEE 1609 deal with the looming CRL caching issues due to size, but then requires a very scalable validation server infrastructure that needs to be reachable whenever two vehicles want to talk, which is also not acceptable.

Local radio link protocols such as Bluetooth, WLAN (802.11x with 802.11i/WPA2-PSK), or Zigbee often assume that participants in a local link network share a common secret, and can keep that secret secret. If the secret leaks, all participants need to be rolled over to a new key. IEEE 802.1X, which is the foundation for the RADIUS Authentication and Authorization of participants in a network, and the basis of "WPA2 Enterprise" offers a way out of the dilemma of either having to rely on a federated trust scheme that has a hard time dealing with revocations of trust at scale, or on brittle pre-shared keys. 802.1X introduces the notion of an Authentication (and Authorization) server, which is a neutral third party that makes decisions about who gets to access the network.

Unfortunately, many local radio link protocols are not only weak at managing access, they also have a broad history of having weak traffic protection. WLAN's issues got largely cleaned up with WPA2, but there are plenty of examples across radio link protocols where the broken WEP model or equivalent schemes are in active use, or the picture is even worse. Regarding the inherent security of cellular network link-level protection, it ought to be sufficient to look at the recent scolding of politicians in Europe for their absent-minded use of regular GSM/UMTS phones without extra protection measures – and the seemingly obvious result of dead-easy eavesdropping by foreign intelligence services. Ironically, mobile operators make some handsome revenue by selling "private access points" (private APNs) that terminate cellular device data traffic in a VPN and that the customer then tunnels into across the hostile Internet to meet the devices on this fenced-off network, somehow pretending that the mobile network isn't just another operator-managed public network and therefore more trustworthy.

Link-layer protection mechanisms are largely only suitable for keeping unauthorized local participants (i.e. intruders) from getting link-layer data frames up to any higher-level network logic. In link-layer-scoped peer-to-peer network environments, the line between link-layer data frames and what's being propagated to the application is largely blurred, but the previous observation stays true. Even if employed, link-layer security mechanisms are not much help on providing security on the network and transport layers, as many companies are learning the hard way when worms and other exploits sweep through the inside of their triply-firewalled, WPA2 protected, TPM-tied-IPSec-protected networks, or as travelers can learn when they don't have a local firewall up on their machine or use plaintext communication when connecting to the public network at a café, airport, or hotel.

Of course, the insight that public networks are not trustworthy has led many companies interconnecting sites and devices down the path of using virtual private network (VPN) technology. VPN technology, especially when coming in the form of a shiny appliance, makes it very easy to put a network tunnel terminator on either end of a communication path made up of a chain of untrustworthy links and networks. The terminator on either end conveniently surfaces as a link-layer network adapter. VPN can fuse multiple sites into a single link-layer network, and it is a fantastic technology for that. But like all the other technologies I discussed above, link-layer protection is a zoning mechanism; the security mechanisms that matter to protect digital assets and devices sit at the layers above it. There is no "S" for Security in "VPN". VPN provides secure virtual network cables; it doesn't make the virtual hub they plug into any more secure. Also, in the context of small devices as discussed above, VPN is effectively a non-starter due to its complexity.

What none of these link-layer protection mechanisms help with, including VPN, is to establish any notion of authentication and authorization beyond their immediate scope. A network application that sits on the other end of a TCP socket, where a portion of the route is facilitated by any of these link-layer mechanisms, is and must be oblivious to their existence. What matters for the trustworthiness of the information that travels from the logic on the device to a remote control system not residing on the same network, as well as for commands that travel back up to the device, is solely a fully protected end-to-end communication path spanning networks, where the identity of the parties is established at the application layer, and nothing else. The protection of the route at the transport layer by ways of signature and encryption is established as a service for the application layer either after the application has given its permission (e.g. certificate validation hooks) or just before the application layer performs an authorization handshake, prior to entering into any conversations. Establishing end-to-end trust is the job of application infrastructure and services, not of networks.

Service Assisted Communication

The findings from this discussion so far can be summarized in a few points:

  • Remote controllable special-purpose devices have a fundamentally different relationship to network services compared to information devices like phones and tablets and require an approach to security that enables exclusive peering with a set of services or a gateway.
  • Devices that take a naïve approach to connectivity by acting like servers and expecting to accept inbound connections pose a number of network-related issues around addressing and naming, and even greater problems around security, exposing themselves to a broad range of attack vectors.
  • Link-layer security measures have varying effectiveness at protecting communication between devices at a single network scope, but none is sufficient to provide a trustworthy communication path between the device and a cloud-based control system or application gateway.
  • The PKI trust model is fundamentally flawed in a variety of ways, including being too static and geared towards long-lived certificates, and it's too optimistic about how well certificates are and can be protected by their bearers. Its use in the TLS context specifically enables the promiscuous client model, which is the opposite of the desired model for special-purpose devices.
  • Approaches to security that provide a reasonable balance between system throughput, scalability, and security protection generally rely on third-party network services that validate user credentials against a central pool, issue security tokens, or validate assurances made by an authority for their continued validity.

The conclusion I draw from these findings is an approach I call "Service Assisted Communication" (SAC). I'm not at all claiming that the principles and techniques are an invention, as most are already broadly implemented and used. But I do believe there is value in putting them together here and giving them a name so that they can be effectively juxtaposed with the approaches I've discussed above.

The goal of Service Assisted Communication is to establish trustworthy and bi-directional communication paths between control systems and special-purpose devices that are deployed in untrusted physical space. To that end, the following principles are established:

  • Security trumps all other capabilities. If you can't implement a capability securely, you must not implement it. You identify threats and mitigate them or you don't ship product. If you employ a mitigation without knowing what the threat is you don't ship product, either.
  • Devices do not accept unsolicited network information. All connections and routes are established in an outbound-only fashion.
  • Devices generally only connect to or establish routes to well-known services that they are peered with. In case they need to feed information to or receive commands from a multitude of services, devices are peered with a gateway that takes care of routing information downstream and of ensuring that commands are only accepted from authorized parties before routing them to the device.
  • The communication path between device and service or device and gateway is secured at the application protocol layer, mutually authenticating the device to the service or gateway and vice versa. Device applications do not trust the link-layer network.
  • System-level authorization and authentication must be based on per-device identities, and access credentials and permissions must be near-instantly revocable in case of device abuse.
  • Bi-directional communication for devices that are connected sporadically due to power or connectivity concerns may be facilitated through holding commands and notifications to the devices until they connect to pick those up.
  • Application payload data may be separately secured for protected transit through gateways to a particular service.

The manifestation of these principles is the simple diagram on the right. Devices generally live in local networks with limited scope. Those networks are reasonably secured, with link-layer access control mechanisms, against intrusion to prevent low-level brute-force attacks such as flooding them with packets and, for that purpose, also employ traffic protection. The devices will obviously observe link-layer traffic in order to triage out solicited traffic, but they do not react to unsolicited connection attempts that would cause any sort of work or resource consumption from the network layer on up.

All connections to and from the device are made via or at least facilitated via a gateway, unless the device is peered with a single service, in which case that service takes on the role of the gateway. Eventual peer-to-peer connections are acceptable, but only if the gateway permits them and facilitates a secure handshake. The gateway that the device peers with may live on the local network and thus govern local connections. Towards external networks, the local gateway acts as a bridge towards the devices and is itself connected by the same set of principles discussed here, meaning it's acting like a device connected to an external gateway.

When the device connects to an external gateway, it does so by creating and maintaining an outbound TCP socket across a network address translation boundary (RFC2663), or by establishing a bi-directional UDP route, potentially utilizing the RFC5389 session traversal utilities for NAT, aka STUN. Even though I shouldn't have to, I will explicitly note that the WebSocket protocol (RFC6455) rides on top of TCP and gets its bi-directional flow capability from there. There's quite a bit of bizarre information on the Interwebs on how the WebSocket protocol somehow newly and uniquely enables bi-directional communication, which is obviously rubbish. What it does is to allow port-sharing, so that WebSocket aware protocols can share the standard HTTP/S ports 80 (RFC2616) and 443 (RFC2818) with regular web traffic and also piggyback on the respective firewall and proxy permissions for web traffic. The in-progress HTTP 2.0 specification will expand this capability further.
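
As a sketch of the outbound-only pattern in C# – the gateway address is a placeholder and error handling is reduced to the bare minimum – a device client might establish and maintain its connection roughly like this:

// Sketch: the device initiates and maintains an outbound WebSocket connection to its gateway.
// The gateway address is a placeholder; backoff and error handling are reduced to the essentials.
using System;
using System.Net.WebSockets;
using System.Text;
using System.Threading;
using System.Threading.Tasks;

class DeviceLink
{
    public static async Task RunAsync()
    {
        while (true) // reconnect loop: the device, as the originator, re-establishes dropped routes
        {
            try
            {
                using (var ws = new ClientWebSocket())
                {
                    // Keep-alive frequency trades battery and metered data against failure-detection latency.
                    ws.Options.KeepAliveInterval = TimeSpan.FromMinutes(5);
                    await ws.ConnectAsync(new Uri("wss://gateway.example.com/devices/4711"), CancellationToken.None);
                    var buffer = new byte[4096];
                    while (ws.State == WebSocketState.Open)
                    {
                        // Commands and notifications arrive on the connection the device itself opened.
                        var result = await ws.ReceiveAsync(new ArraySegment<byte>(buffer), CancellationToken.None);
                        if (result.MessageType == WebSocketMessageType.Close) break;
                        Console.WriteLine(Encoding.UTF8.GetString(buffer, 0, result.Count));
                    }
                }
            }
            catch (WebSocketException) { /* network dropped; fall through and reconnect */ }
            await Task.Delay(TimeSpan.FromSeconds(10)); // simple backoff before reconnecting
        }
    }
}

Note that the keep-alive interval set on the connection is exactly the tradeoff discussed in the next paragraphs: longer intervals save power and metered data, shorter intervals detect broken routes faster.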

By only relying on outbound connectivity, the NAT/Firewall device at the edge of the local network will never have to be opened up for any unsolicited inbound traffic.

The outbound connection or route is maintained by either client or gateway in such a fashion that intermediaries such as NATs will not drop it due to inactivity. That means that either side might send some form of a keep-alive packet periodically, or, even better, periodically send a payload packet that then doubles as a keep-alive packet. Under most circumstances it will be preferable for the device to send keep-alive traffic, as it is the originator of the connection or route and can and should react to a failure by establishing a new one.

As TCP connections are endpoint concepts, a connection will only be declared dead if the route is considered collapsed, and the detection of this fact requires packet flow. A device and its gateway may therefore sit idle for quite a while believing that the route and connection are still intact before the lack of acknowledgement of the next packet proves that assumption wrong. There is a tricky tradeoff decision to be made here. So-called carrier-grade NATs (or Large Scale NATs) employed by mobile network operators permit very long periods of connection inactivity, and mobile devices that get direct IPv6 address allocations are not forced through a NAT at all. The push notification mechanisms employed by all popular smartphone platforms utilize this to dramatically reduce the power consumption of the devices by exercising the route very infrequently, once every 20 minutes or more, and therefore being able to largely remain in sleep mode with most systems turned off while idly waiting for payload traffic. The downside of infrequent keep-alive traffic is that the time to detection of a bad route is, in the worst case, as long as the keep-alive interval. Ultimately it's a tradeoff between battery power, traffic-volume cost (on metered subscriptions), and acceptable latency for commands and notifications in case of failures. The device can obviously be proactive in detecting potential issues and abandon the connection and create a new one when, for instance, it hops to a different network or when it recovers from signal loss.

The connection from the device to the gateway is protected end-to-end, ignoring any underlying link-level protection measures. The gateway authenticates with the device and the device authenticates with the gateway, so neither is anonymous towards the other. In the simplest case, this can occur through the exchange of some proof of possession of a previously shared key. It can also happen via a (heavy) X.509 certificate exchange as performed by TLS, or via a combination of a TLS handshake with server authentication where the device subsequently supplies credentials or an authorization token at the application level. The privacy and integrity protection of the route is also established end-to-end, ideally as a byproduct of the authentication handshake, so that a potential attacker cannot waste cryptographic resources on either side without producing proof of authorization.
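
For the simplest of these options – proof of possession of a previously shared key – a bare-bones C# sketch could look like the following; a real handshake would also derive fresh session keys and carry more context, so treat this as an illustration of the idea rather than a protocol:

// Sketch: both sides prove possession of a pre-shared device key via an HMAC over exchanged nonces.
using System;
using System.Linq;
using System.Security.Cryptography;
using System.Text;

static class ProofOfPossession
{
    // Each party sends a fresh random nonce, then answers with an HMAC over its role and both nonces.
    public static byte[] ComputeProof(byte[] presharedKey, string role, byte[] myNonce, byte[] peerNonce)
    {
        using (var hmac = new HMACSHA256(presharedKey))
        {
            var input = Encoding.UTF8.GetBytes(role).Concat(myNonce).Concat(peerNonce).ToArray();
            return hmac.ComputeHash(input);
        }
    }

    // Constant-time comparison so an attacker can't learn how many leading bytes matched.
    public static bool VerifyProof(byte[] expected, byte[] received)
    {
        if (expected.Length != received.Length) return false;
        int diff = 0;
        for (int i = 0; i < received.Length; i++) diff |= expected[i] ^ received[i];
        return diff == 0;
    }
}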

The current reality is that we don't have many if any serious alternatives to TLS/DTLS or SSH for securing this application-level connection or route today. TLS is far from being a perfect fit for many reasons I laid out here, and not least because of the weight in footprint and compute effort of the TLS stack which is too heavy for inexpensive circuitry. SSH is a reasonable alternative from the existing popular protocol suites, but suffers from lack of a standardized session resumption gesture. My hope is that we as an industry fix either of those to make it a better fit for the connected devices scenarios or we come up with something better. Here's a summary of criteria.

The result of the application-level handshake is a secure peer connection between the device and a gateway that only the gateway can feed. The gateway can, in turn, now provide one or even several different APIs and protocol surfaces that can be translated to and from the primary bi-directional protocol used by the device. The gateway also provides the device with a stable address in the form of an address projected onto the gateway's protocol surface, and therefore also with location transparency and location hiding.

The device may speak only AMQP or MQTT or some proprietary protocol, and yet have a full HTTP/REST interface projection at the gateway, with the gateway taking care of the required translation and also of enrichment, where responses from the device can be augmented with reference data, for instance. The device can connect from any context and can even switch contexts, yet its projection into the gateway and its address remain completely stable. The gateway can also be federated with external identity and authorization services, so that only callers acting on behalf of particular users or systems can invoke particular device functions. The gateway therefore provides basic network defense, API virtualization, and authorization services all combined into one.

The gateway model gets even better when it includes or is based on an intermediary messaging infrastructure that provides a scalable queuing model for both ingress and egress traffic.

Without this intermediary infrastructure, the gateway approach would still suffer from the issue that devices must be online and available to receive commands and notifications when the control system sends them. With a per-device queue or per-device subscription on a publish/subscribe infrastructure, the control system can drop a command at any time, and the device can pick it up whenever it's online. If the queue provides time-to-live expiration alongside a dead-lettering mechanism for such expired messages, the control system can also know immediately when a message has not been picked up and processed by the device in the allotted time.
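
As a sketch using the Windows Azure Service Bus brokered messaging API (connection string, queue path, and command payload are placeholders), a control system might drop an expiring command into a per-device queue like this:

// Sketch: per-device command queue with expiration and dead-lettering (Windows Azure Service Bus API).
// The connection string, queue path, and command payload are placeholders.
using System;
using Microsoft.ServiceBus;
using Microsoft.ServiceBus.Messaging;

static class CommandDrop
{
    public static void SendCommand(string connectionString, string deviceId)
    {
        var namespaceManager = NamespaceManager.CreateFromConnectionString(connectionString);
        var queuePath = "commands/" + deviceId;
        if (!namespaceManager.QueueExists(queuePath))
        {
            // Expired commands move to the dead-letter sub-queue instead of silently vanishing.
            namespaceManager.CreateQueue(new QueueDescription(queuePath)
            {
                EnableDeadLetteringOnMessageExpiration = true
            });
        }
        var client = QueueClient.CreateFromConnectionString(connectionString, queuePath);
        var command = new BrokeredMessage("unlock-door")
        {
            TimeToLive = TimeSpan.FromMinutes(2) // if the device doesn't pick it up in time, it dead-letters
        };
        client.Send(command);
        client.Close();
    }
}

When the device doesn't pick the command up within its time-to-live, the message lands in the queue's dead-letter sub-queue, which the control system can monitor to learn that the command was never processed.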

The queue also ensures that the device can never be overtaxed with commands or notifications. The device maintains one connection into the gateway and it fetches commands and notifications on its own schedule. Any backlog forms in the gateway and can be handled there accordingly. The gateway can start rejecting commands on the device's behalf if the backlog grows beyond a threshold or the cited expiration mechanism kicks in and the control system gets notified that the command cannot be processed at the moment.

On the ingress side (from the gateway perspective), using a queue has the same kind of advantages for the backend systems. If devices are connected at scale and input from the devices comes in bursts or has significant spikes around certain hours of the day, as with telematics systems in passenger cars during rush hour, having the gateway deal with the traffic spikes is a great idea to keep the backend system robust. The ingestion queue also allows telemetry and other data to be held temporarily when the backend systems or their dependencies are taken down for service or suffer from service degradation of any kind. You can find more on the usage of brokered messaging infrastructures for these scenarios in an MSDN Magazine article I wrote a year back.

Conclusion

An "Internet of Things" where devices reside in unprotected physical space and where they can interact with the physical world is a very scary proposition if we solely rely on naïve link and network-level approaches to connectivity and security, which are the two deeply interwoven core aspects of the "I" in "IoT". Special-purpose devices don't benefit from constant human oversight as phones and tablets and PCs do, and we struggle even to keep those secure. We have to do a better job, as an industry, to keep the devices secure that we want to install in the world without constant supervision.

"Trustworthy communication" means that information exchanged between devices and control systems is of verifiable origin, correct, unaltered, timely, and cannot be abused by unauthorized parties in any fashion. Such trust cannot be established at scale without employing systems that are designed for the purpose and keep the "bad guys" out. If we want smarter devices around us that helping to improve our lives and are yet power efficient and affordable, we can't leave them alone in untrustworthy physical space taking care of their own defenses, because they won't be able to.

Does this mean that the refrigerator cannot talk to the laundry washing machine on the local network? Yes, that is precisely what that means. Aside from that idea being somewhat ludicrous, how else does the washing machine defend itself from a malicious refrigerator if not via a gateway that can? Devices that are unrelated and are not part of a deeply integrated system meet where they ought to meet: on the open Internet, not "behind the firewall".

Categories: Architecture | Technology

I have an immediate job opening for an open standard or multivendor transport layer security protocol that

  1. does NOT rely on or tie into PKI and especially
  2. doesn’t require the exchange of X.509 certificates for an initial handshake,
  3. supports session resumption, and
  4. can be used with a minimal algorithm suite that is microcontroller friendly (AES-256, SHA-256, ECDH)

Because

  1. For “service assisted connectivity” where a device relies on a gateway to help with any defensive measures from the network layer on up, the device ought to be paired with exactly one (cluster of) gateway(s). Also, an unobserved device should not pose any threat to a network that it is deployed into (see the fridges abused as spam bots or local spies) and therefore outbound communication should be funneled through the gateway as well. TLS/PKI specifically enables promiscuous clients that happily establish sessions with any “trustworthy” (per CA) server, often under direction of an interactive user. Here, I want to pair a device with a gateway, meaning that the peers are known a priori and thus
  2. the certificate exchange is 3-6kb of extra baggage that’s pure overhead if the parties have an existing and well-known peer relationship.
  3. Session resumption is required because devices will get disconnected while roaming and on radio or will temporarily opt to turn off the radio, which might tear sockets. It’s also required because the initial key exchange is computationally very expensive and imposes significant latency overhead due to the extra roundtrips.
  4. Microcontroller-based devices are often very constrained with regards to program storage and can’t lug a whole litany of crypto algorithms around. So the protocol must allow for a compliant implementation to only support a small set of algos that can be implemented on MCUs in firmware or in silicon.

Now, TLS 1.2 with a minimal crypto suite profile might actually be suitable if one could cheat around the whole cert exchange and supply clients with an RFC5077 session resumption ticket out-of-band in such a way that it effectively acts as a long-term connection authN/Z token. Alas, you can't. SSH is also a candidate but it doesn't have session resumption.

Ideas? Suggestions? clemensv@microsoft.com or Twitter @clemensv

Categories: Technology

Just replied yet again to someone whose customer thinks they're adding security by blocking outbound network traffic to cloud services using IP-based allow-lists. They don't.

Service Bus and many other cloud services are multitenant systems that are shared across a range of customers. The IP addresses we assign come from a pool and that pool shifts as we optimize traffic from and to datacenters. We may also move clusters between datacenters within one region for disaster recovery, should that be necessary. The reason why we cannot give every feature slice an IP address is also that the world has none left. We’re out of IPv4 address space, which means we must pool workloads.

The last points are important ones and also show how antiquated the IP-address lockdown model is relative to current practices for datacenter operations. Because of the IPv4 shortage, pools get acquired and traded and change. Because of automated and semi-automated disaster recovery mechanisms, we can provide service continuity even if clusters or datacenter segments or even entire datacenters fail, but a client system that’s locked to a single IP address will not be able to benefit from that. As the cloud system packs up and moves to a different place, the client stands in the dark due to its firewall rules. The same applies to rolling updates, which we perform using DNS switches.

The state of the art of no-downtime datacenter operations is that workloads are agile and will move as required. The place where you have stability is DNS.

Outbound Internet IP lockdowns add nothing in terms of security because workloads increasingly move into multitenant systems or systems that are dynamically managed, as I’ve illustrated above. As there is no warning, a rule may be correct right now and point to a foreign system the next moment. The firewall will not be able to tell. The only proper way to ensure security is by making the remote system prove that it is the system you want to talk to, and that happens at the transport security layer. If the system can present the expected certificate during the handshake, the traffic is legitimate. The IP address per se proves nothing. Also, IP addresses can be spoofed and malicious routers can redirect the traffic. The firewall won’t be able to tell.

With most cloud-based services, traffic runs via TLS. You can verify the thumbprint of the certificate against the cert you can either set yourself, obtain from the vendor out-of-band, or acquire by hitting a documented endpoint (in Windows Azure Service Bus, it's the root of each namespace). With our messaging system in Service Bus, you are furthermore encouraged to use any kind of cryptographic mechanism to protect payloads (message bodies). We do not evaluate those for any purpose. We evaluate headers and message properties for routing. Neither of those are logged beyond having them in the system for temporary storage in the broker.
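
A sketch of such a thumbprint check in .NET could look like the following; the thumbprint value is a placeholder for the one you obtain out-of-band, and keep in mind that this particular callback applies process-wide:

// Sketch: verify the remote certificate's thumbprint during the TLS handshake.
// The thumbprint value is a placeholder; obtain the real one out-of-band from the vendor.
using System;
using System.Net;
using System.Net.Security;

static class CertificatePinning
{
    const string ExpectedThumbprint = "0123456789ABCDEF0123456789ABCDEF01234567";

    public static void Install()
    {
        ServicePointManager.ServerCertificateValidationCallback =
            (sender, certificate, chain, sslPolicyErrors) =>
                sslPolicyErrors == SslPolicyErrors.None &&
                string.Equals(certificate.GetCertHashString(), ExpectedThumbprint, StringComparison.OrdinalIgnoreCase);
    }
}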

A server that needs access to Service Bus should get outbound Internet access based on the server’s identity or the running process’s identity. This can be achieved using IPsec between the edge and the internal system. Constraining traffic to the Microsoft datacenter ranges is possible, but those ranges shift and expand without warning.

The bottom line here is that there is no way to make outbound IP address constraints work with cloud systems or high availability systems in general.

Categories: Technology

I just got off the call with a customer and had a bit of a déjà vu from a meeting at the beginning of the week, so it looks like the misconception I'll explain here is a bit more common than I expected.

In both cases, the folks I talked to had roughly the equivalent of the following code in their app:

var qc = factory.CreateQueueClient(…);
for( int i = 0; i < 1000; i++ )
{
… create message …
qc.BeginSend( msg, null, null );
}
qc.Close();

In both cases, the complaint was that messages were lost and strange exceptions occurred in the logs – which is because, well, this doesn't do what they thought it does.

BeginSend in the Service Bus APIs, just like BeginWrite on the file system or other networking APIs, isn't really doing the requested work right away. It is putting a job into a job queue – the job queue of the I/O thread scheduler.

That means that once the code reaches qc.Close() – and you have also been mighty lucky – a few messages may indeed have been sent, but the remaining messages will still sit in that job queue, scheduled for an object that the code just forced to close. The result is that every subsequent send operation that is queued but hasn't been scheduled yet will throw, since you're trying to send on a disposed object. Those messages will fail out and be lost inside the sender's process.

What's worse is that such code stuffs a queue that is both out of the app's control and out of the app's sight, and all the arguments (which can be pretty big when we talk about messages) dangle on those jobs, filling up memory. Also, since the app doesn't call EndSend(), it doesn't pick up whatever exceptions are potentially raised by the Send operation and flies completely blind. If there is an EndXXX method for an async operation, you _must_ call that method even if it doesn't return any values, because it may well throw back at you whatever went wrong.

So how should you do it? Don't throw messages blindly into the job queue. It's ok to queue up a few to make sure there's a job in the queue as another one completes (which is just slightly trickier than what I want to illustrate here), but generally you should make subsequent sends depend on previous sends completing. In .NET 4.5 with async/await that's a lot easier now:

var qc = factory.CreateQueueClient(…);
for( int i = 0; i < 1000; i++ )
{
… create message …
await Task.Factory.FromAsync(qc.BeginSend, qc.EndSend, msg, null);
}
qc.Close();

Keep in mind that the primary goal of async I/O is to not waste threads and lose time through excessive thread switching as threads hang on I/O operations. It's not making the I/O magically faster per se. We achieve that goal in the above example because the compiler breaks that code up into distinct methods, so that the loop continues on an I/O thread callback once the Send operation has completed.
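As a side note, here's one way to keep a bounded number of sends in flight rather than strictly one at a time. This is just a sketch under the assumption of the Service Bus QueueClient/BrokeredMessage API and .NET 4.5, not the exact approach alluded to above; maxInFlight is an arbitrary illustrative value:

using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.ServiceBus.Messaging;

static class BoundedSender
{
    // Keep at most maxInFlight sends pending: enough to keep the pipe busy,
    // but bounded so neither memory nor the I/O job queue fills up unchecked.
    public static async Task SendAllAsync(QueueClient qc, IEnumerable<BrokeredMessage> messages, int maxInFlight = 10)
    {
        var throttle = new SemaphoreSlim(maxInFlight);
        var inFlight = new List<Task>();

        foreach (var msg in messages)
        {
            await throttle.WaitAsync();             // wait for a free slot
            inFlight.Add(SendOneAsync(qc, msg, throttle));
        }

        await Task.WhenAll(inFlight);               // observe every outcome, exceptions included
        qc.Close();                                 // only close once nothing is in flight anymore
    }

    static async Task SendOneAsync(QueueClient qc, BrokeredMessage msg, SemaphoreSlim throttle)
    {
        try
        {
            await Task.Factory.FromAsync(qc.BeginSend, qc.EndSend, msg, null);
        }
        finally
        {
            throttle.Release();                     // free the slot whether the send succeeded or failed
        }
    }
}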

Summary:

  1. Don't stuff the I/O scheduler queue with loads of blind calls to BeginXXX without consideration for how the work gets done and completed and that it can actually fail
  2. Always call End and think about how many operations you want to have in flight and what happens to the objects that are attached to the in-flight jobs
Categories: Architecture | Technology

September 6, 2012
@ 07:08 PM

As I thumb through some people's code on GitHub, I see a fairly large number of "catch all" exception handling cases. It's difficult to blame folks for that, since there's generally (and sadly) very little discipline about exception contracts and exception masking, i.e. wrapping exceptions to avoid bubbling through failure conditions of underlying implementation details.

If you're calling a function that sits on a mountain of dependencies and folks don't care about masking exceptions, there are many dozens of candidate exceptions that can bubble back up to you and there's little chance of dealing with them all or even knowing them. Java has been trying to enforce more discipline in that regard, but people cheat there with "catch all" as well. There's also the question of what the right way to deal with most exceptions is. In many cases, folks implement "intercept, shrug and log" and mask the failure by telling users that something went wrong. In other common cases, folks implement retries. It's actually fairly rare to see deeply customized and careful reactions to particular exceptions. Again - things are complicated and exceptions are supposed to be exceptional (reminder: throwing exceptions as part of the regular happy path is horrifyingly bad for performance and terrible from a style perspective), so these blanket strategies are typically an efficient way of dealing with things.

That all said ...

Never, never ever do this:

try
{
    Work();
}
catch
{
}

And not even this:

try
{
    Work();
}
catch(Exception e)
{
    Trace.TraceError(e.ToString());
}

Those examples are universally bad. (Yes, you will probably find examples of that type even in the archive of this blog and some of my public code. Just goes to show that I've learned some better coding practices here at Microsoft in the past 6 1/2 years.)

The problem with them is that they catch not only the benign stuff, but they also catch and suppress the C# runtime equivalent of the Zombie Apocalypse. If you get thread-abort, out-of-memory, or stack-overflow exceptions thrown back at you, you don't want to suppress those. Once you run into these, your code has ignored all the red flags and exhausted its resources, and whatever it was that you called didn't get its job done and likely sits there as a zombie in an undefined state. That class of exceptions is raining down your call stack like a shower of knife blades. They must not happen. Your code must be written defensively enough to never run into that situation and overtax resources in that way; if it does, and you don't know what the root cause is, this is an automatic "Priority 0", "drop-everything-you're-working-on" class bug. It certainly is if you're writing services that need to stay up 99.95%+.

What do we do? If we see any of those exceptions, it's an automatic death penalty for the process. Once you see an unsafe out-of-memory exception or stack overflow, you can't trust the state of the respective part of the system and likely not the stability of the system as a whole. Mind that there's also an "it depends" here; I would follow a different strategy if I were talking about software for an autonomous Mars Rover that can't crash even if it's gravely ill. There I would likely spend a few months on the exception design and "what could go wrong here" before even thinking about functionality, so that's a different ballgame. In a cloud system, booting a cluster machine that has the memory flu is a good strategy.

Here's a variation of the helper we use:

public static bool IsFatal(this Exception exception)
{
    while (exception != null)
    {
        // InsufficientMemoryException derives from OutOfMemoryException but is
        // deliberately not treated as fatal here.
        if ((exception is OutOfMemoryException && !(exception is InsufficientMemoryException)) ||
            exception is ThreadAbortException ||
            exception is AccessViolationException ||
            exception is SEHException ||
            exception is StackOverflowException)
        {
            return true;
        }

        if (exception is TypeInitializationException || exception is TargetInvocationException)
        {
            // These are wrappers; the fatal condition may be hiding inside.
            exception = exception.InnerException;
        }
        else
        {
            break;
        }
    }

    return false;
}

If you put this into a static utility class, you can use this on any exception as an extension. And whenever you want to do a "catch all", you do this:

try
{
    DoWork();
}
catch (Exception e)
{
    if (e.IsFatal())
    {
        throw;
    }
    Trace.TraceError(..., e);
}

If the exception is fatal, you simply throw it up as high as you can. Eventually it'll end up at the bottom of whatever thread it happens on (where you might log and rethrow) and will hopefully take the process with it. Threads marked as background threads don't do that, so it's actually not a good idea to use those. These exceptions are unhandled, process-terminating disasters with a resulting process crash dump that you want to force in a 24/7 system so that you can weed them out one by one.
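As a rough sketch of that "log at the thread bottom, then rethrow" idea (RunWorkLoop is just a placeholder for whatever the thread actually does):

using System;
using System.Diagnostics;
using System.Threading;

static class Worker
{
    public static void Start()
    {
        // Use a foreground thread so a rethrown fatal exception actually
        // terminates the process and produces a crash dump.
        var thread = new Thread(ThreadBottom) { IsBackground = false };
        thread.Start();
    }

    static void ThreadBottom()
    {
        try
        {
            RunWorkLoop(); // placeholder for the thread's real work
        }
        catch (Exception e)
        {
            // Log for the post-mortem, then rethrow so the process dies loudly.
            Trace.TraceError("Unhandled exception at thread bottom: {0}", e);
            throw;
        }
    }

    static void RunWorkLoop()
    {
        // ... the actual work would go here ...
    }
}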

(Update) As Richard Blewett pointed out after reading this post, the StackOverflowException can't be caught in .NET 2.0+, at all, and the ThreadAbortException automatically rethrows even if you try to suppress it. There are two reasons for them to be on the list: first, to shut up any code review debates about which of the .NET stock exceptions are fatal and ought to be there; second, because code might (ab-)use these exceptions as fail-fast exceptions and fake-throw them, or the exceptions might be blindly rethrown when marshaled from a terminated background thread where they were caught at the bottom of the thread. However they show up, it's always bad for them to show up.

If you catch a falling knife, rethrow.

Categories: Technology | CLR

We get a ton of inquiries along the lines of “I want to program my firewall using IP ranges to allow outbound access only to my cloud-based apps”. If you (or the IT department) insist on doing this with Windows Azure, there is even a downloadable and fairly regularly updated list of the IP ranges on the Microsoft Download Center in a straightforward XML format.

Now, we do know that there are a lot of customers who keep insisting on using IP address ranges for that purpose, but that strategy is not a recipe for success.

The IP ranges shift and expand on a very frequent basis and cover all of the Windows Azure services. Thus, a customer will open their firewall for traffic for the entire multitenant range of Azure, which means that the customer’s environment can reach their own apps and the backend services for the “Whack A Panda” game just the same. With apps in the cloud, there is no actual security gain from these sorts of constraints; pretty much all the advantages of automated, self-service cloud environments stem from shared resources (shared networking, shared gateways, and the ability to do dynamic failover, including cross-DC failover), which means that there aren’t any reservations at the IP level that last forever.

The best way to handle this is to do the exact inverse of what’s being tried with these rules, and rather limit access to outside resources to a constrained set of services based on the services’ or users’ identity as it is done on our Microsoft corporate network. At Microsoft, you can’t get out through the NAT/Proxy unless you have an account that has external network privileges. If you are worried about a service or user abusing access to the Internet, don’t give them Internet. If you think you need to have tight control, make a DMZ – in the opposite direction of how you usually think about a DMZ.

Using IP-address based outbound firewall access rules constraining access to public cloud computing resources is probably getting a box on a check-list ticked, but it doesn’t add anything from a security perspective. It’s theater. IMHO.

Categories: Architecture | Technology

May 4, 2012
@ 03:15 PM

I’m toying around with very small and very constrained embedded devices right now. When you make millions of a small thing, every byte in code footprint and any processing cycle you can save saves real money. An XML parser is a big chunk of code. So is a JSON parser. Every HTTP stack already has a key/value pair parser for headers. We can use that.

NHTTP stands for NoHyperText Transfer Protocol. Yes, I made that up. No, this is not an April Fool’s joke. Hear me out.

All rules of RFC2616 apply, except for section 7.2, meaning there must never be an entity body on any request or reply. Instead we rely entirely on section 7.1 and its extensibility rule:

  • The extension-header mechanism allows additional entity-header fields to be defined without changing the protocol, but these fields cannot be assumed to be recognizable by the recipient. Unrecognized header fields SHOULD be ignored by the recipient and MUST be forwarded by transparent proxies.

All property payloads are expressed as key/value pairs that are directly mapped onto HTTP headers. No value can exceed 2KB in size and you can’t have more than 32 values per message, so that we stay comfortably within common HTTP infrastructure quotas. To avoid collisions with existing headers and to allow for easy enumeration, each property key is prefixed with “P-“.

POST /foo HTTP/1.1
Host: example.com
Content-Length: 0
P-Name: “Clemens”

HTTP/1.1 200 OK
Content-Length: 0
P-Greeting: “Hello, Clemens”

(The fun bit is that the Windows Azure Service Bus HTTP API for sending and receiving messages already supports this exact model, since we map custom message properties to headers and the HTTP entity body to the body of broker messages, and those can be empty.)
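To make the shape of the exchange concrete, here's a minimal client sketch using HttpWebRequest; the endpoint and the P- property names are made up for illustration:

using System;
using System.Net;

class NhttpClientSample
{
    static void Main()
    {
        // Hypothetical endpoint; all payload travels in P- headers, the body stays empty.
        var request = (HttpWebRequest)WebRequest.Create("http://example.com/foo");
        request.Method = "POST";
        request.ContentLength = 0;
        request.Headers["P-Name"] = "Clemens";

        using (var response = (HttpWebResponse)request.GetResponse())
        {
            // Enumerate the P- properties that came back on the reply.
            foreach (string key in response.Headers.AllKeys)
            {
                if (key.StartsWith("P-", StringComparison.OrdinalIgnoreCase))
                {
                    Console.WriteLine("{0}: {1}", key, response.Headers[key]);
                }
            }
        }
    }
}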

Categories: Architecture | Technology

I’ll admit this is an odd topic for me to write about since my job is pretty far away from that part of the world, but our PM team at MS is building a set of demos for which we need some semi-random and fun input data that doesn’t change all that rapidly. So we thought that reading the temperature off hard drives would be a nice input. But how to get at it?

The solution is to use the WMI interface for ATAPI to get at the SMART data. Binging the subject you’ll find a ton of little snippets that have one thing in common: ‘magic’. Somehow, you get at the ‘VendorSpecific’ structure of the SMART data using WMI and then you believe that byte number 115 is the one that holds the temperature. Of course that’s not what someone who’s doing protocol work in their day job would ever settle for. So I’ve been digging around a little, found a description of the structure, grabbed the attribute value list from Wikipedia, shook it all up a little, and out came the little program below.

The app grabs the vendor specific array from the ATAPI data, shreds it into a set of structures, and dumps it out. Code here, zip file at the bottom.

// (c) Microsoft Corporation
// Author: Clemens Vasters (clemensv@microsoft.com)
// Code subject to MS-PL: http://opensource.org/licenses/ms-pl.html 
// SMART Attributes and Background: http://en.wikipedia.org/wiki/S.M.A.R.T.
// SMART Attributes Overview: http://www.t13.org/Documents/UploadedDocuments/docs2005/e05171r0-ACS-SMARTAttributes_Overview.pdf
 
namespace SmartDataApp
{
    using System;
    using System.Collections.Generic;
    using System.Management;
    using System.Runtime.InteropServices;
 
    public enum SmartAttributeType : byte
    {
        ReadErrorRate = 0x01,
        ThroughputPerformance = 0x02,
        SpinUpTime = 0x03,
        StartStopCount = 0x04,
        ReallocatedSectorsCount = 0x05,
        ReadChannelMargin = 0x06,
        SeekErrorRate = 0x07,
        SeekTimePerformance = 0x08,
        PowerOnHoursPOH = 0x09,
        SpinRetryCount = 0x0A,
        CalibrationRetryCount = 0x0B,
        PowerCycleCount = 0x0C,
        SoftReadErrorRate = 0x0D,
        SATADownshiftErrorCount = 0xB7,
        EndtoEnderror = 0xB8,
        HeadStability = 0xB9,
        InducedOpVibrationDetection = 0xBA,
        ReportedUncorrectableErrors = 0xBB,
        CommandTimeout = 0xBC,
        HighFlyWrites = 0xBD,
        AirflowTemperatureWDC = 0xBE,
        TemperatureDifferencefrom100 = 0xBE,
        GSenseErrorRate = 0xBF,
        PoweroffRetractCount = 0xC0,
        LoadCycleCount = 0xC1,
        Temperature = 0xC2,
        HardwareECCRecovered = 0xC3,
        ReallocationEventCount = 0xC4,
        CurrentPendingSectorCount = 0xC5,
        UncorrectableSectorCount = 0xC6,
        UltraDMACRCErrorCount = 0xC7,
        MultiZoneErrorRate = 0xC8,
        WriteErrorRateFujitsu = 0xC8,
        OffTrackSoftReadErrorRate = 0xC9,
        DataAddressMarkerrors = 0xCA,
        RunOutCancel = 0xCB,
        SoftECCCorrection = 0xCC,
        ThermalAsperityRateTAR = 0xCD,
        FlyingHeight = 0xCE,
        SpinHighCurrent = 0xCF,
        SpinBuzz = 0xD0,
        OfflineSeekPerformance = 0xD1,
        VibrationDuringWrite = 0xD3,
        ShockDuringWrite = 0xD4,
        DiskShift = 0xDC,
        GSenseErrorRateAlt = 0xDD,
        LoadedHours = 0xDE,
        LoadUnloadRetryCount = 0xDF,
        LoadFriction = 0xE0,
        LoadUnloadCycleCount = 0xE1,
        LoadInTime = 0xE2,
        TorqueAmplificationCount = 0xE3,
        PowerOffRetractCycle = 0xE4,
        GMRHeadAmplitude = 0xE6,
        DriveTemperature = 0xE7,
        HeadFlyingHours = 0xF0,
        TransferErrorRateFujitsu = 0xF0,
        TotalLBAsWritten = 0xF1,
        TotalLBAsRead = 0xF2,
        ReadErrorRetryRate = 0xFA,
        FreeFallProtection = 0xFE,
    }
 
    public class SmartData
    {
        readonly Dictionary<SmartAttributeType, SmartAttribute> attributes;
        readonly ushort structureVersion;
 
        public SmartData(byte[] arrVendorSpecific)
        {
            attributes = new Dictionary<SmartAttributeType, SmartAttribute>();
            for (int offset = 2; offset < arrVendorSpecific.Length; )
            {
                var a = FromBytes<SmartAttribute>(arrVendorSpecific, ref offset, 12);
                // Attribute values 0x00, 0xfe, 0xff are invalid
                if (a.AttributeType != 0x00 && (byte)a.AttributeType != 0xfe && (byte)a.AttributeType != 0xff)
                {
                    attributes[a.AttributeType] = a;
                }
            }
            structureVersion = (ushort)(arrVendorSpecific[0] * 256 + arrVendorSpecific[1]);
        }
 
        public ushort StructureVersion
        {
            get
            {
                return this.structureVersion;
            }
        }
 
        public SmartAttribute this[SmartAttributeType v]
        {
            get
            {
                return this.attributes[v];
            }
        }
 
        public IEnumerable<SmartAttribute> Attributes
        {
            get
            {
                return this.attributes.Values;
            }
        }
 
        static T FromBytes<T>(byte[] bytearray, ref int offset, int count)
        {
            IntPtr ptr = IntPtr.Zero;
 
            try
            {
                ptr = Marshal.AllocHGlobal(count);
                Marshal.Copy(bytearray, offset, ptr, count);
                offset += count;
                return (T)Marshal.PtrToStructure(ptr, typeof(T));
            }
            finally
            {
                if (ptr != IntPtr.Zero)
                {
                    Marshal.FreeHGlobal(ptr);
                }
            }
        }
    }
 
    [StructLayout(LayoutKind.Sequential)]
    public struct SmartAttribute
    {
        public SmartAttributeType AttributeType;
        public ushort Flags;
        public byte Value;
        [MarshalAs(UnmanagedType.ByValArray, SizeConst = 8)]
        public byte[] VendorData;
 
        public bool Advisory
        {
            get
            {
                return (Flags & 0x1) == 0x0; // Bit 0 unset?
            }
        }
        public bool FailureImminent
        {
            get
            {
                return (Flags & 0x1) == 0x1; // Bit 0 set?
            }
        }
        public bool OnlineDataCollection
        {
            get
            {
                return (Flags & 0x2) == 0x2; // Bit 1 set?
            }
        }
 
    }
 
    public class Program
    {
        public static void Main()
        {
            try
            {
                var searcher = new ManagementObjectSearcher("root\\WMI", "SELECT * FROM MSStorageDriver_ATAPISmartData");
 
                foreach (ManagementObject queryObj in searcher.Get())
                {
                    Console.WriteLine("-----------------------------------");
                    Console.WriteLine("MSStorageDriver_ATAPISmartData instance");
                    Console.WriteLine("-----------------------------------");
 
                    var arrVendorSpecific = (byte[])queryObj.GetPropertyValue("VendorSpecific");
 
                    // Create SMART data from 'vendor specific' array
                    var d = new SmartData(arrVendorSpecific);
                    foreach (var b in d.Attributes)
                    {
                        Console.Write("{0} :{1} : ", b.AttributeType, b.Value);
                        foreach (byte vendorByte in b.VendorData)
                        {
                            Console.Write("{0:x} ", vendorByte);
                        }
                        Console.WriteLine();
                    }
 
                }
            }
            catch (ManagementException e)
            {
                Console.WriteLine("An error occurred while querying for WMI data: " + e.Message);
            }
        }
    }
}

 

SmartDataProgram.zip (2.25 KB)
Categories: Technology

For programmers who are writing distributed systems and are not using queues in them just yet. If you are a message-oriented middleware veteran - move along ;-)

Categories: Technology | MSMQ

My PDC10 session is available online (it was pre-recorded). I talk about the new ‘Labs’ release that we released into the datacenter this week and about a range of future capabilities that we’re planning for Service Bus. Some of those future capabilities that are a bit further out are about bringing back some popular capabilities from back in the .NET Services incubation days (like Push and Service Orchestration), some are entirely new.

One important note about the new release at http://portal.appfabriclabs.com – for Service Bus, this is a focused release that mostly provides new features and doesn’t offer the full capability scope of the production system and SDK. The goal here is to provide insight into an ongoing development process and an opportunity for feedback as we’re continuing to evolve AppFabric. So don’t derive any implications from this release about what we’re going to do with the capabilities already in production.

Click here to go to the talk.

Categories: AppFabric | Azure | Technology | Web Services

As our team was starting to transform our parts of the Azure Services Platform from a CTP ‘labs’ service exploring features into a full-on commercial service, it started to dawn on us that we had set ourselves up for writing a bunch of ‘enterprise apps’. The shiny parts of Service Bus and Access Control that we parade around are all about user-facing features, but if I look back at the work it took us to go from a toy service to a commercial offering, I’d guess that 80%-90% of the effort went into aspects like infrastructure, deployment, upgradeability, billing, provisioning, throttling, quotas, security hardening, and service optimization. The lesson there was: when you’re boarding the train to shipping a V1, you don’t load new features on that train – rather, you throw some off.

The most interesting challenge for these infrastructure apps sitting on the backend was that we didn’t have much solid ground to stand on. Remember – these were very early days, so we couldn’t use SQL Azure since the folks over in SQL were on a pretty heroic schedule themselves and didn’t want to take on any external dependencies, even from close friends. We also couldn’t use any of the capabilities of our own bits, because building infrastructure for your features on your features would just be plain dumb. And while we could use capabilities of the Windows Azure platform we were building on, a lot of those parts still had rough edges as those folks were going through a lot of the same things we went through. In those days, the table store would be very moody, the queue store would sometimes swallow or duplicate messages, and the Azure fabric controller would occasionally go around and kill things. All normal – bugs.

So under those circumstances we had to figure out the architecture for some subsystems where we need to perform a set of coordinated actions across a distributed set of resources – a distributed transaction or saga of sorts. The architecture had a few simple goals: when we get an activation request, we must not fumble that request under any circumstance; we must run the job to completion for all resources; and, at the same time, we need to minimize any potential for required operator intervention – i.e. if something goes wrong, the system had better know how to deal with it; at best it should self-heal.

My solution to that puzzle is a pattern I call “Scheduler-Agent-Supervisor Pattern” or, short, “Supervisor Pattern”. We keep finding applications for this pattern in different places, so I think it’s worth writing about it in generic terms – even without going into the details of our system.

The pattern rests on two seemingly odd and very related assumptions: ‘the system is perfect’ and ‘all error conditions are transient’. As a consequence, the architecture has some character traits of a toddler. It’s generally happily optimistic and gets very grumpy, very quickly when things go wrong – to the point that it will simply drop everything and run away screaming. It’s very precisely like that, in fact.

[Figure: Scheduler, Job Store, and Agents (the happy path)]

The first picture here shows all key pieces except the Supervisor that I’ll introduce later. At the core we have a Scheduler that manages a simple state machine made up of Jobs and those jobs have Steps. The steps may have a notion of interdependency or may be completely parallelizable. There is a Job Store that holds jobs and steps and there are Agents that execute operations on some resource.  Each Agent is (usually) fronted by a queue and the Scheduler has a queue (or service endpoint) through which it receives reply messages from the Agents.

Steps are recorded in a durable storage table of some sort that has at least the following fields: Current State (say: Disabled, Active), Desired State (say: Disabled, Active), LockedUntil (Date/Time value), and Actor plus any step specific information you want to store and eventually submit with the job to the step agent.
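As a minimal sketch, such a step record might look like this in C#; the names (including the job and step identifiers I've added) are purely illustrative and not an actual schema:

using System;

public enum ResourceState { Disabled, Active }

public class StepRecord
{
    public string JobId { get; set; }            // illustrative identifiers, not part of the description above
    public string StepId { get; set; }
    public ResourceState CurrentState { get; set; }
    public ResourceState DesiredState { get; set; }
    public DateTime LockedUntil { get; set; }    // the ultimatum handed to the agent
    public string Actor { get; set; }            // which agent (queue) executes this step
    public string StepData { get; set; }         // step-specific payload submitted with the job
}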

When Things Go Right

The initial flow is as follows:

(1)a – Submit a new job into the Scheduler (and wait)
(2)a – The Scheduler creates a new job and steps with an initial current state (‘Disabled’) in the job store 
(2)b – The Scheduler sets ‘desired state’ of the job and of all schedulable steps (dependencies?) to the target state (‘Active’) and sets the ‘locked until’ timeout of the step to a value in the near future, e.g. ‘Now’ + 2 minutes.
(1)b – Job submission request unblocks and returns

If all went well, we now have a job record and, here in this example, two step records in our store. They have a current state of ‘Disabled’ and a desired state of ‘Active’. If things didn’t go well, we’d have incomplete or partially wedged records or nothing in the job store, at all. The client would also know about it since we’ve held on to the reply until we have everything done – so the client is encouraged to retry. If we have nothing in the store and the client doesn’t retry – well, then the job probably wasn’t all that important, after all. But if we have at least a job record, we can make it all right later. We’re optimists, though; let’s assume it all went well.

For the next steps we assume that there’s a notion of dependencies between the steps and the second steps depends on the first. If that were not the case, the two actions would just be happening in parallel.

(3) – Place a step message into the queue for the actor for the first step; Agent 1 in this case. The message contains all the information about the step, including the current and desired state and also the LockedUntil that puts an ultimatum on the activity. The message may further contain an action indicator or arguments that are taken from the step record.
(4) – After the agent has done the work, it places a completion record into the reply queue of the Scheduler.
(5) – The Scheduler records the step as complete by setting the current state from ‘Disabled’ to ‘Active’; as a result the desired and the current state are now equal.
(6) – The Scheduler sets the next step’s desired state to the target state (‘Active’) and sets the LockedUntil timeout of the step to a value in the near future, e.g. ‘Now’ + 1 minute. The lock timeout value is an ultimatum for when the operation is expected to be complete and reported back as being complete in a worst-case success case. The actual value therefore depends on the common latency of operations in the system. If operations usually complete in milliseconds and at worst within a second, the lock timeout can be short – but not too short. We’ll discuss this  value in more detail a bit later.
(7), (8), (9) are equivalent to (3), (4), (5).

Once the last step’s current state is equal to its desired state, the job’s current state gets set to the desired state and we’re done. So that was the “99% of the time” happy path.

[Figure: Agent 2 failing while processing its step]

When Things Go Wrong

So what happens when anything goes wrong? Remember the principle ‘all errors are transient’. What we do in the error case – anywhere – is to log the error condition and then promptly drop everything and simply hope that time, a change in system conditions, human or divine intervention, or – at worst – a patch will heal matters. That’s what the second principle ‘the system is perfect’ is about; the system obviously isn’t really perfect, but if we construct it in a way that we can either wait for it to return from a wedged state into a functional state or where we enable someone to go in and apply a fix for a blocking bug while preserving the system state, we can consider the system ‘perfect’ in the sense that pretty much any conceivable job that’s already in the system can be driven to completion.

In the second picture, we have Agent 2 blowing up as it is processing the step it got handed in (7). If the agent just can’t get its work done since some external dependency isn’t available – maybe a database can’t be reached or a server it’s talking to spews out ‘server too busy’ errors – it may be able to back off for a moment and retry. However, it must not retry past the LockedUntil ultimatum that’s in the step record. When things fail and the agent is still breathing, it may, as a matter of courtesy, notify the scheduler of the fact and report that the step was completed with no result, i.e. the desired state and the achieved state don’t match. That notification may also include diagnostic information. Once the LockedUntil ultimatum has passed, the Agent no longer owns the job and must drop it. It must not even report failure state back to the Scheduler past that point.

If the agent keels over and dies as it is processing the step (or right before or right after), it is obviously no longer in a position to let the scheduler know about its fate. Thus, there won’t be any message flowing back to the scheduler and the job is stalled. But we expect that. In fact, we’re ok with any failure anywhere in the system. We could lose or fumble a queue message, we could get a duplicate message, we could have the scheduler die a fiery death (or just being recycled for patching at some unfortunate moment) – all of those conditions are fine since we’ve brought the doctor on board with us: the Supervisor. 

[Figure: The Supervisor recovering an expired step]

The Supervisor

The Supervisor is a schedule-driven process (or thread) of which one or a few instances may run occasionally. The frequency depends very much on the average duration of operations and the expected overall latency for completion of jobs.

The Supervisor’s job is to recover steps or jobs that have failed – and we’re assuming that failures are due to some transient condition. Whether it’s a good strategy to expect a transient resource failure that prevented a job from completing just a second ago to be healed two seconds later depends on the kind of system and resource. What’s described here is a pattern, not a solution, so getting the timing right for when to retry operations once they fail depends on the concrete scenario.

This desired back-off time manifests in the LockedUntil value. When a step gets scheduled, the Scheduler needs to state how long it is willing to wait for that step to complete; this includes some back-off time padding. Once that ultimatum has passed and the step is still in an inconsistent state (desired state doesn’t equal the current state), the Supervisor can pick it up at any time and schedule it again.

(1) – Supervisor queries the job store for any inconsistent steps whose LockedUntil value has expired.
(2) – The Supervisor schedules the step again by setting the LockedUntil value to a new timeout and submitting the step into the target actor’s queue
(3) – Once the step succeeds, the step is reported as complete on the regular path back to the Scheduler  where it completes normally as in steps (8), (9) from the happy-path scenario above. If it fails, we simply drop it again. For failures that allow reporting an error back to the Scheduler it may make sense to introduce an error counter that round-trips with the step so that the system could detect poisonous steps that fail ‘forever’ and have the Supervisor ignore those after some threshold.
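Here's a minimal sketch of that recovery sweep. IJobStore and IAgentQueue are assumed abstractions over the job store and the agents' queues (reusing the illustrative StepRecord from above); none of this is a shipped API:

using System;
using System.Collections.Generic;

public interface IJobStore
{
    // steps whose desired state != current state and whose LockedUntil has expired
    IEnumerable<StepRecord> FindExpiredInconsistentSteps(DateTime now);
    void Update(StepRecord step);
}

public interface IAgentQueue
{
    void Enqueue(StepRecord step);
}

public class Supervisor
{
    readonly IJobStore store;
    readonly Func<string, IAgentQueue> resolveAgentQueue;
    readonly TimeSpan lockTimeout = TimeSpan.FromMinutes(2); // includes back-off padding

    public Supervisor(IJobStore store, Func<string, IAgentQueue> resolveAgentQueue)
    {
        this.store = store;
        this.resolveAgentQueue = resolveAgentQueue;
    }

    public void RunOnce()
    {
        // (1) find steps that are inconsistent and whose ultimatum has passed
        foreach (var step in store.FindExpiredInconsistentSteps(DateTime.UtcNow))
        {
            // (2) re-arm the ultimatum and hand the step back to its agent's queue
            step.LockedUntil = DateTime.UtcNow + lockTimeout;
            store.Update(step);
            resolveAgentQueue(step.Actor).Enqueue(step);
            // (3) completion flows back to the Scheduler on the regular reply path
        }
    }
}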

The Supervisor can pursue a range of strategies for recovery. It can just take a look at individual steps and recover them by rescheduling them – assuming the steps are implemented as idempotent operations. If it were a bit cleverer, it might consider error information that a cooperative (and breathing) agent has submitted back to the Scheduler, and even go as far as firing an alert to an operator if the error condition requires intervention, taking the step out of the loop by marking it and setting its LockedUntil value to some longer timeout so that someone can take a look.

At the job-scope, the Supervisor may want to perform recovery such that it first schedules all previously executed steps to revert back to the initial state by performing compensation work (all resources that got set to active are getting disabled again here in our example) and then scheduling another attempt at getting to the desired state.

In step (2)b up above, we’ve been logging current and desired state at the job-scope and with that we can also always find inconsistent jobs where all steps are consistent and wouldn’t show up in the step-level recovery query. That situation can occur if the Scheduler were to crash between logging one step as complete and scheduling the next step. If we find inconsistent jobs with all-consistent steps, we just need to reschedule the next step in the dependency sequence whose desired state isn’t matching the desired state of the overall job.

To be thorough, we could now take a look at all the places where things can go wrong in the system. I expect that survey to yield that, as long as we can successfully get past step (2)b from the first diagram, the Supervisor is always in a position to either detect that a job isn’t making progress and help with recovery, or at least call for help. The system always knows what its current intent is, i.e. which state transitions it wants to drive, and never forgets about that intent since the intent is logged in the job store at all times and all progress against it is logged as well. The submission request (1) depends on the outcome of (2)a/b to guard against failures while putting a job and its steps into the system, so that a client can take corrective action. In fact, once the job record is marked as inconsistent in step (2)b, the scheduler could already report success back to the submitting party even before the first step is scheduled, because the Supervisor would pick up that inconsistency eventually.

 

Categories: Architecture | SOA | Azure | Technology

In case you need a refresher or update about the things our team and I work on at Microsoft, go here for a very recent and very good presentation by my PM colleague Maggie Myslinska from TechEd Australia 2010 about Windows Azure AppFabric, with Service Bus demos and a demo of the new Access Control V2 CTP.

Categories: AppFabric | SOA | Azure | Technology | ISB | WCF | Web Services

Room 398, Tuesday June 8
3:15pm-4:30pm

Session Type: Breakout Session
Track: Application Server & Infrastructure
Speaker(s): Maggie Myslinska
Level: 200 – Intermediate

Come learn how to use Windows Azure AppFabric (with Service Bus and Access Control) as building block services for Web-based and hosted applications, and how developers can leverage services to create applications in the cloud and connect them with on-premises systems.

If you are planning on seeing Juval’s and my talk ASI304 at TechEd and/or if you need to know more about how Windows Azure AppFabric enables federated cloud/on-premise applications and a range of other scenarios, you should definitely put Maggie’s talk onto your TechEd schedule as well. 

Categories: AppFabric | Talks | TechEd US | Technology

I put the slides for my talks at NT Konferenca 2010 on SkyDrive. The major difference from my APAC slides is that I had to put compute and storage into one deck due to the conference schedule, but instead of purely consolidating and cutting down the slide count,  I also incorporated some common patterns coming out from debates in Asia and added slides on predictable and dynamic scaling as well as on multitenancy. Sadly, I need to rush through all that in 45 minutes today.

 

Categories: AppFabric | Architecture | Azure | Talks | Technology | Web Services

Anyone using the .NET Service Bus should take a good look at the SocketShifter project started by Rob Blackwell and Richard Prodger from AWS in the UK. AWS stands for Active Web Solutions, not for the "other" AWS. The full project is up on Codeplex.

What makes SocketShifter significant is that it takes the network abstraction of SOAP, WS-Addressing, and the Service Bus full circle and layers the very bottom of that stack - plain TCP connections - as a virtualization on top of the stack. In other words: SocketShifter allows you to create full-fidelity, bi-directional socket connections through the .NET Service Bus.

We created something very similar to SocketShifter last year (we're using it for a few internal purposes), but haven't made it public so far. I'm glad that the AWS folks built this, so that you get to play with it.

Categories: .NET Services | Architecture | Technology

seht Euch mal die Wa an, wie die Wa ta kann. Auf der Mauer auf der Lauer sitzt ‘ne kleine Wa!.

It’s a German children’s song. The song starts out with “… sitzt ‘ne kleine Wanze” (bedbug) and with each verse you leave off a letter: Wanz, Wan, Wa, W, – silence.

I’ll do the same here, but not with a bedbug:

Let’s sing:

<soap:Envelope xmlns:soap=”” xmlns:wsaddr=”” xmlns:wsrm=”” xmlns:wsu=”” xmlns:app=””>
   <soap:Header>
         <wsaddr:Action>http://tempuri.org/1.0/Status.set</wsaddr:Action>
         <wsrm:Sequence>
              <wsrm:Identifier>urn:session-id</wsrm:Identifier>
              <wsrm:MessageNumber>5</wsrm:MessageNumber>
          </wsrm:Sequence>
          <wsse:Security xmlns:wsse=”…”>
               <wsse:BinarySecurityToken ValueType="
http://tempuri.org#CustomToken"
                                          EncodingType="...#Base64Binary" wsu:Id="MyID">
                          FHUIORv...
                </wsse:BinarySecurityToken>
               <ds:Signature>
                  <ds:SignedInfo>
                      <ds:CanonicalizationMethod Algorithm="
http://www.w3.org/2001/10/xml-exc-c14n#"/>
                      <ds:SignatureMethod Algorithm="http://www.w3.org/2000/09/xmldsig#md5"/>
                      <ds:Reference URI="#MyId">
                            <ds:DigestMethod  Algorithm="
http://www.w3.org/2000/09/xmldsig#md5"/> 
                            <ds:DigestValue>LyLsF0Pi4wPU...</ds:DigestValue>
                      </ds:Reference>
                 </ds:SignedInfo>  
                 <ds:SignatureValue>DJbchm5gK...</ds:SignatureValue>
                 <ds:KeyInfo> 
                  <wsse:SecurityTokenReference> 
                    <wsse:Reference URI="#MyID"/>
                   </wsse:SecurityTokenReference>
               </ds:KeyInfo>
             </ds:Signature>
         </wsse:Security>
         <app:ResponseFormat>Xml</app:ResponseFormat>
         <app:Key wsu:Id=”AppKey”>27729912882….</app:Key>
    </soap:Header>
    <soap:Body wsu:Id=”MyId”>
          <app:status>Hello, I’m good</app:status>
     </soap:Body>
</soap:Envelope>

Not a very pretty song, I’ll admit. Let’s drop some stuff. Let’s assume that we don’t need to tell the other party that we’re looking to give it an MD5 signature, but let’s say that’s implied, and so is the canonicalization algorithm. Let’s also assume that the other side already knows the security token and the key. Since we only have a single signature digest here and yield a single signature, we can just collapse to the signature value. Heck, you may not even know what all that means. Verse 2:

<soap:Envelope xmlns:soap=”” xmlns:wsaddr=”” xmlns:wsrm=”” xmlns:wsu=”” xmlns:app=””>
   <soap:Header>
         <wsaddr:Action>http://tempuri.org/1.0/Status.set</wsaddr:Action>
         <wsrm:Sequence>
              <wsrm:Identifier>urn:session-id</wsrm:Identifier>
              <wsrm:MessageNumber>5</wsrm:MessageNumber>
          </wsrm:Sequence>
          <wsse:Security xmlns:wsse=”…”>
               <ds:Signature>
                  <ds:SignatureValue>DJbchm5gK...</ds:SignatureValue>
             </ds:Signature>
         </wsse:Security>
         <app:ResponseFormat>Xml</app:ResponseFormat>
         <app:Key wsu:Id=”AppKey”>27729912882….</app:Key>
    </soap:Header>
    <soap:Body wsu:Id=”MyId”>
          <app:status>Hello, I’m good</app:status>
     </soap:Body>
</soap:Envelope>

Better. Now let’s strip all these extra XML namespace decorations since there aren’t any name collisions as far as I can see. We’ll also collapse the rest of the security elements into one element since there’s no need for three levels of nesting with a single signature. Verse 3:

<Envelope>
   <Header>
         <Action>http://tempuri.org/1.0/Status.set</Action>
         <Sequence>
              <Identifier>urn:session-id</Identifier>
              <MessageNumber>5</MessageNumber>
          </Sequence>
          <SignatureValue>DJbchm5gK...</SignatureValue>
          <ResponseFormat>Xml</ResponseFormat>
          <Key>27729912882….</Key>
    </Header>
    <Body>
       <status>Hello, I’m good</status>
     </Body>
</Envelope>

Much better. The whole angle-bracket stuff and the nesting seem semi-gratuitous and repetitive here, too. Let’s make that a bit simpler. Verse 4:

         Action=http://tempuri.org/1.0/Status.set
         Sequence-Identifier=urn:session-id
         Sequence-MessageNumber=5
         SignatureValue=DJbchm5gK...
         ResponseFormat=Xml
         Key=27729912882….
         status=Hello, I’m good

Much, much better. Now let’s get rid of that weird URI up there, split up the action and the version info, make some of these keys a little more terse, and turn that into a format that’s easily transmittable over HTTP. For what we have here, application/x-www-form-urlencoded would probably be best. Verse 5:

         method=Status.set
         &v=1.0
         &session_key=929872172..
         &call_id=5
         &sig=DJbchm5gK...
         &format=Xml
         &api_key=27729912882….
         &status=Hello,%20I’m%20good

Oops. Facebook’s Status.set API. How did that happen? I thought that was REST?

Now play the song backwards. The “new thing” is largely analogous to where we started before the WS-* Web Services stack and its CORBA/DCE/DCOM predecessors came around, and there are, believe it or not, good reasons for having that additional “overhead”: a common way to frame message content and the related control data, a common way to express complex data structures and distinguish between data domains, a common way to deal with addressing in multi-hop or store-and-forward messaging scenarios, an agreed notion of sessions and message sequencing, a solid mechanism for protecting the integrity of messages and parts of messages. This isn’t all just stupid.

It’s well worth discussing whether messages need to be expressed as XML 1.0 text on the wire at all times. I don’t think they need to and there are alternatives that aren’t as heavy. JSON is fine and encodings like the .NET Binary Encoding or Fast Infoset are viable alternatives as well. It’s also well worth discussing whether WS-Security and the myriad of related standards that were clearly built by security geniuses for security geniuses really need to be that complicated or whether we could all live with a handful of simple profiles and just cut out 80% of the options and knobs and parameters in that land.

I find it very sad that the discussion isn’t happening. Instead, people use the “REST” moniker as the escape hatch to conveniently ignore any existing open standard for tunnel-through-HTTP messaging and completely avoid the discussion.

It’s not only sad, it’s actually a bit frustrating. As one of the people responsible for the protocol surface of the .NET Service Bus, I am absolutely not at liberty to ignore what exists in the standards space. And this isn’t a mandate handed down to me, but something I do because I believe it’s the right thing to live with the constraints of the standards frameworks that exist.

When we’re sitting down and talking about a REST API, we’re designing a set of resources – which may result in splitting a thing like a queue into two resources, head and tail – and then we put RFC2616 on the table and try to be very precise in picking the appropriate predefined HTTP method for a given semantic and in mapping the HTTP 2xx, 3xx, 4xx, 5xx status codes to success and error conditions. We’re also trying to avoid inventing new ways to express things for which standards exist. There’s a standard for how to express and manage lists with ATOM and APP, and hence we use that as a foundation. We use the designed extension points to add data to those lists whenever necessary.

When we’re designing an RPC SOAP API, we’re intentionally trying to avoid inventing new protocol surface and will try to leverage as much from the existing and standardized stack as we possibly can – at a minimum we’ll stick with established patterns such as the Create/GetInfo/Renew/Delete pattern for endpoint factories with renewal (which is used in several standards). I’ll add that we are – ironically – a bit backlogged on the protocol documentation for our SOAP endpoints and have more info on the REST endpoint in the latest SDK, but we’ll make that up in the near future.

So - can I build “REST” (mind the quotes) protocols that are as reduced as Facebook’s, Twitter’s, Flickr’s, etc.? Absolutely. There wouldn’t be much new work. It’s just a matter of how we put messages on and pluck messages off the wire. It’s really mostly a matter of formatting, and we have a lot of the necessary building blocks in the shipping WCF bits today. I would just omit a bunch of decoration as things go out and make a bunch of assumptions on things that come in.

I just have a sense that I’d be hung upside down from a tree by the press and the blogging, twittering, facebooking community if I, as someone at Microsoft, didn’t follow the existing open and agreed standards, or at least use protocols that we’ve published under the OSP, and instead just started to do my own interpretative dance – even if that looked strikingly similar to what the folks down in the Valley are doing. At the very least, someone would call it a rip-off.

What do you think? What should I/we do?

Categories: .NET Services | Architecture | Azure | Technology | ISB | Web Services

We've got a discussion forum up on MSDN where you can ask questions about Microsoft .NET Services (Service Bus, Workflow, Access Control): http://social.msdn.microsoft.com/Forums/en-US/netservices/threads/

 

Categories: Talks | Technology | ISB | WCF

A flock of pigs has been doing aerobatics high up over Microsoft Campus in Redmond in the past three weeks. Neither City of Redmond nor Microsoft spokespeople returned calls requesting comments in time for this article. A Microsoft worker who requested anonymity and has seen the pigs flying overhead commented that "they are as good as the Blue Angels at Seafair, just funnier" and "they seem to circle over building 42 a lot, but I wouldn't know why".

In related news ...

We wrapped up the BizTalk Services "R11" CTP this last Thursday and put the latest SDK release up on http://labs.biztalk.net/. As you may or may not know, "BizTalk Services" is the codename for Microsoft's cloud-based Identity and Connectivity services - with a significant set of further services in the pipeline. The R11 release is a major milestone for the data center side of BizTalk Services, but we've also added several new client-facing features, especially on the Identity services. You can now authenticate using a certificate in addition to username and CardSpace authentication, we have enabled support for 3rd party managed CardSpace cards, and there is extended support for claims based authorization.

Now the surprising bit:

Only about an hour before we locked down the SDK on Thursday, we checked a sample into the samples tree that has a rather unusual set of prerequisites for something coming out of Microsoft:

Runtime: Java EE 5 on Sun Glassfish v2 + Sun WSIT/Metro (JAX-WS extensions), Tool: Netbeans 6.0 IDE.

The sample shows how to use the BizTalk Services Identity Security Token Service (STS) to secure the communication between a Java client and a Java service providing federated authentication and claims-based authorization.

The sample, which you can find in ./Samples/OtherPlatforms/StandaloneAccessControl/JavaEE5 once you installed the SDK, is a pure Java sample not requiring any of our bits on either the service or client side. The interaction with our services is purely happening on the wire.

If you are a "Javahead", it might seem odd that we're shipping this sample inside a Windows-only MSI installer and I will agree that that's odd. It's simply a function of timing and the point in time when we knew that we could get it done (some more on that below). For the next BizTalk Services SDK release I expect there to be an additional .jar file for the Java samples.

It's important to note that this isn't just a thing we did as a one-time thing and because we could. We have done a significant amount of work on the backend protocol implementations to start opening up a very broad set of scenarios on the BizTalk Services Connectivity services for platforms other than .NET. We already have a set of additional Java EE samples lined up for when we enable that functionality on the backend. However, since getting security and identity working is a prerequisite for making all other services work, that's where we started. There'll be more and there'll be more platform and language choice than Java down the road.

Just to be perfectly clear: Around here we strongly believe that .NET and the Windows Communication Foundation in particular is the most advanced platform to build services, irrespective of whether they are of the WS-* or REST variety. If you care about my personal opinion, I'll say that several months of research into the capabilities of other platforms has only reaffirmed that belief for me and I don't even need to put a Microsoft hat on to say that.

But we recognize and respect that there are a great variety of individual reasons why people might not be using .NET and WCF. The obvious one is "platform". If you run on Linux or Unix and/or if your deployment target is a Java Application Server, then your platform is very likely not .NET. It's something else. If that's your world, we still think that our services are something that's useful for your applications and we want to show you why. And it is absolutely not enough for us to say "here is the wire protocol documentation; go party!". Only Code is Truth.

I'm writing "Only Code is Truth" also because we've found - perhaps not too surprisingly - that there is a significant difference between reading and implementing the WS-* specs and having things actually work. And here I get to the point where a round of public "Thank You" is due:

The Metro team over at Sun Microsystems has made a very significant contribution to making this all work. Before we started making changes to accommodate Java, there would have been very little hope for anyone to get this seemingly simple scenario to work. We had to make quite a few changes even though our service did follow the specs.

While we were adjusting our backend STS accordingly, the Sun Metro team worked on a set of issues that we identified on their end (with fantastic turnaround times) and worked those into their public nightly builds. The Sun team also 'promoted' a nightly build of Metro 1.2 to a semi-permanent download location (the first 1.2 build that got that treatment), because it is the build tested to successfully interop with our SDK release, even though that build is known to have some regressions for some of their other test scenarios. As they work towards wrapping up their 1.2 release and fixing those other bugs, we'll continue to test and talk to help ensure that the interop scenarios keep working.

As a result of this collaboration, Metro 1.2 is going to be a better and more interoperable release for Sun's customers and the greater Java community, and BizTalk Services as well as our future identity products will be better and more interoperable, too. Win-Win. Thank you, Sun.

As a goodie, I put some code into the Java sample that might be useful even if you don't even care about our services. Since configuring the Java certificate stores for standalone applications can be really painful, I added some simple code that's using a week-old feature of the latest Metro 1.2 bits that allows configuring the Truststores/Keystores dynamically and pull the stores from the client's .jar at runtime. The code also has an authorization utility class that shows how to get and evaluate claims on the service side by pulling the SAML token out of the context and pulling the correct attributes from the token.

Have fun.

[By the way, this is not an April Fool's joke, in case you were wondering]

Categories: Architecture | IT Strategy | Technology | CardSpace | ISB | WCF

August 28, 2007
@ 12:58 AM
Categories: Technology | CardSpace

Having an Internet Service Bus up in the cloud is not very entertaining unless there are services in the bus. Therefore, I built one (and already showed some of the code basics) that’s hopefully fun to play with, and I will share the first version with you soon, after some scrubbing and pending a few updates to the ISB that will optimize the authentication process. It’s a 0.1 version and an experiment. The code download should be ready in the next two weeks, including those adjustments. But you can actually play with parts of it today without compiling or installing anything. The info is at the bottom of this post.

To make matters really interesting, this sample not only shows how to plug a service into the cloud and call it from some Console app, but is a combo of two rather unusual hosts for WCF services: A Windows Live Messenger Add-In that acts as the server, and a Windows Vista Sidebar gadget that acts as the client.

Since the Silicon Valley scene is currently all over Twitter and clones of Twitter are apparently popping up somewhere every day, I thought I could easily provide fodder to the proponents of the alleged Microsoft tradition of purely relying on copying other’s ideas and clone them as well ;-)  Well, no, maybe not. This is a bit different.

TweetieBot is an example of a simple personal service. If you choose to host it, you own it, you run it, you control it. The data is held nowhere but on your personal machine, and it uses the BizTalk Services ISB to stick its head up into the cloud at a stable endpoint so that it’s easily reachable for a circle of friends, bridging the common obstacles of dynamic IPs, firewalls and NAT. No need to use UPnP or open up ports on your router. If you choose to do so, you can encrypt traffic so that neither anyone looking at our ISB nor anyone else can see what’s actually going across the wire.

Right now, lots of the Web 2.0 world lives on the assumption that everything needs to live at central places and that community forms around ad-driven hubs. The mainframe folks had a similar stance in the 70s and 80s and then Personal Computers came along. The pendulum is always swinging and I have little doubt that it will swing back to “personal” once more and that the federation of personal services will seriously challenge the hub model.

So what does the sample do? As indicated, TweetieBot is a bot that plugs into Windows Live Messenger using a simple Add-In. Bart De Smet has a brilliant summary of how to build such Add-Ins. When the Add-In is active and someone chats the bot, it answers politely and remembers the chat line, time and sender. The bird has a leaky long term memory, though. It forgets everything past the last 40 lines.
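The add-in code itself isn't published here, but a minimal sketch of that 40-line memory might look like the class below. This is hypothetical, not the actual implementation; the Tweet type and its Time property are the ones from the service contract further down the page.

using System;
using System.Collections.Generic;

// hypothetical sketch of the bot's leaky memory: only the 40 most recent chat lines survive
class TweetLog
{
    const int Capacity = 40;
    readonly Queue<Tweet> tweets = new Queue<Tweet>();

    public void Add(Tweet tweet)
    {
        lock (tweets)
        {
            tweets.Enqueue(tweet);
            if (tweets.Count > Capacity)
            {
                tweets.Dequeue();   // forget everything past the last 40 lines
            }
        }
    }

    public IList<Tweet> Snapshot(DateTime? since)
    {
        lock (tweets)
        {
            List<Tweet> result = new List<Tweet>();
            foreach (Tweet tweet in tweets)
            {
                if (since == null || tweet.Time > since.Value)
                {
                    result.Add(tweet);
                }
            }
            return result;
        }
    }
}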

Where it gets interesting is that the Add-In can stick three endpoints into the BizTalk Services ISB (a rough sketch of the hosting side follows the list):

  • A Request/Response Web Service that allows retrieving the list of the last 40 (or fewer) “tweets” and also allows clients to submit tweets programmatically.
  • An RSS service that allows (right now) anyone to peek into the chat log of the last 40 tweets.
  • An Event service that allows subscribers to get real-time notifications whenever a new tweet is recorded.
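I'm not publishing the add-in's hosting code just yet, but a rough sketch of what sticking those endpoints into the ISB might look like is below, assuming the BizTalk Services SDK's RelayBinding (the same binding used in the client sample further down). TweetieBotService is a hypothetical implementation of the ITweetieBot contract, the addresses follow the same scheme as the client sample, and the RSS endpoint is omitted.

// hypothetical addresses following the same scheme as the client sample further down
string serviceAddress = String.Format("sb://{0}/services/tweetiebot/{1}/service",
    RelayBinding.DefaultRelayHostName, Uri.EscapeDataString("tweetiebot@hotmail.com"));
string eventsAddress = String.Format("sb://{0}/services/tweetiebot/{1}/events",
    RelayBinding.DefaultRelayHostName, Uri.EscapeDataString("tweetiebot@hotmail.com"));

// request/response endpoint: the bot answers GetTweets/Tweet calls through the relay
ServiceHost botHost = new ServiceHost(typeof(TweetieBotService));
botHost.AddServiceEndpoint(typeof(ITweetieBot), new RelayBinding(), serviceAddress);
botHost.Open();

// events endpoint: in relayed multicast mode the bot is just another participant in the
// group and pushes OnTweet notifications to whoever happens to be subscribed
ChannelFactory<ITweetieEvents> eventsFactory = new ChannelFactory<ITweetieEvents>(
    new RelayBinding(RelayConnectionMode.RelayedMulticast),
    new EndpointAddress(eventsAddress));
ITweetieEvents events = eventsFactory.CreateChannel();

// whenever the Messenger add-in records a new chat line:
// events.OnTweet(newlyRecordedTweet);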

The accompanying Sidebar Gadget, which is implemented using WPF, is a client for two of these services.

 When you drop the Gadget on the Sidebar, it will prompt for the IM address of the TweetieBot service you’d like to subscribe to. Once you’ve authenticated at the relay using your registered Information Card, the gadget will pull and show the current list of Tweets and subscribe to the Events service for real-time updates. And whenever someone chats the bot, the Sidebar gadget will immediately show the new entry. So even though the Gadget lives on some client machine that’s hidden between several layers of firewalls and behind NAT, it can actually get push-style event notifications through the cloud!

“How do I send events to clients?” must be one of the most frequent questions that I’ve been asked about Web Services in the past several years. Well, this is your answer right here.

While I’m still toying around with the code and the guys on the 1st floor in my building are doing some tweaks on the ISB infrastructure to make multi-endpoint authentication simpler, you can already play with the bot and help me a bit:

Using Windows Live Messenger you can chat (click here) tweetiebot@hotmail.com now. Drop a few lines. If the bot is online (which means that I’m not tinkering with it) it will reply. Then look at this RSS feed [1] and you can see what you and everyone else have been telling the bot recently. Enjoy.

[1] http://connect.biztalk.net/services/tweetiebot/tweetiebot%40hotmail.com/rss

Categories: Technology | BizTalk | ISB | WCF

I wrote a slightly Twitter-inspired, fun app over the weekend that's using the BizTalk Services Connectivity service and relay. In the spirit of Software+Services I'm going to give you half of it [for now] ;-)   You must have the BizTalk Services SDK installed to run the sample.

The server app, which I'm keeping to myself for the next few days as part of the experiment, is an extension (add-in) to Windows Live Messenger. The Messenger add-in monitors all chats with tweetiebot@hotmail.com and keeps a circular buffer with the last 40 incoming messages. Using the client (which is in the attached archive), you can get a list of "Tweets" and add a new one (same as chatting):

[ServiceContract(Name = "TweetieBot", Namespace = http://samples.vasters.com/2007/05/tweetiebot)]
public interface ITweetieBot
{
  [OperationContract]
  IList<Tweet> GetTweets(DateTime? since);
  [OperationContract]
  void Tweet(string nickname, string text);
}

or you can subscribe to new tweets and get them as they arrive

[ServiceContract(Name = "TweetieEvents", Namespace = http://samples.vasters.com/2007/05/tweetiebot)]
public interface ITweetieEvents
{
  [OperationContract(IsOneWay=true)]
  void OnTweet(Tweet tweet);
}

The client application hooks up to the service (which lives right on my desktop machine) through the BizTalk Services ISB and the server fires events back through the ISB relay into the client as new tweets arrive. So when you run the attached client app, you'll find that it starts with a dump of the current log of the bot and then keeps spitting out events as they arrive.

The client is actually pretty simple. The EventsClient is the subscriber for the pub/sub service (RelayConnectionMode.RelayedMulticast) that writes out the received events to the console. The rest all happens in Main (parsing and validating the command line argument) and in Run.

    class Program
    {
        class EventsClient : ITweetieEvents
        {
            public void OnTweet(Tweet tweet)
            {
                Console.WriteLine("[{0}] {1}:{2}", tweet.Time, tweet.User, tweet.Text);
            }
        }

        static void Main(string[] args)
        {
            string usageMessage = "Usage: IMBotClient <messenger-email-address>";
            if (args.Length == 0)
            {
                Console.WriteLine(usageMessage);
            }
            else
            {
                if (!Regex.IsMatch(args[0], @"^([\w\-\.]+)@((\[([0-9]{1,3}\.){3}[0-9]{1,3}\])|(([\w\-]+\.)+)([a-zA-Z]{2,4}))$"))
                {
                    Console.WriteLine(usageMessage);
                    Console.WriteLine("'{0}' is not a valid email address", args[0]);
                }
                else
                {
                    Run(args[0]);
                }
            }
        }

        private static void Run(string emailAddress)
        {
            EndpointAddress serviceAddress =
                new EndpointAddress(String.Format("sb://{0}/services/tweetiebot/{1}/service",
                                    RelayBinding.DefaultRelayHostName, Uri.EscapeDataString(emailAddress)));
            EndpointAddress eventsAddress =
                new EndpointAddress(String.Format("sb://{0}/services/tweetiebot/{1}/events",
                                    RelayBinding.DefaultRelayHostName, Uri.EscapeDataString(emailAddress)));

The URI scheme for services that hook into the ISB is "sb:" and the default address of the relay is encoded in the SDK assemblies. We set up two endpoints here. One for the client channel to fetch the initial list and one for the event subscriber. 

            RelayBinding relayBinding = new RelayBinding();

            ServiceHost eventsHost = new ServiceHost(typeof(EventsClient));
            RelayBinding eventBinding = new RelayBinding(RelayConnectionMode.RelayedMulticast);
            eventsHost.AddServiceEndpoint(typeof(ITweetieEvents), eventBinding, eventsAddress.ToString());
            eventsHost.Open();

            ChannelFactory<TweetieBotChannel> channelFactory =
                new ChannelFactory<TweetieBotChannel>(relayBinding, serviceAddress);
            TweetieBotChannel channel = channelFactory.CreateChannel();
            channel.Open();

The two *.Open() calls will each prompt for a CardSpace authentication, so you will have to be registered to run the sample. Once you have opened the channels (and my service is running), you'll be able to pull the list of current tweets. Meanwhile, whenever a new event pops up, the EventsClient above will write out a new line.

            DateTime? lastTime = null;   // null fetches the bot's full current log
            IList<Tweet> tweets = channel.GetTweets(lastTime);
            foreach (Tweet tweet in tweets)
            {
                Console.WriteLine("[{0}] {1}:{2}", tweet.Time, tweet.User, tweet.Text);
            }

            Console.WriteLine("Press ENTER to quit at any time");
            Console.ReadLine();

            eventsHost.Close();
            channel.Close();
            channelFactory.Close();
        }
    }


So when you run the app, you can chat (anyone can, you don't need to be a buddy) tweetiebot@hotmail.com through Live Messenger and you'll see your chat lines (and potentially others') popping out as events from the service bus.

To run the sample with my bot, you need to call the client with "IMBotClient tweetiebot@hotmail.com" and select your BizTalk Services Information Card twice as you are prompted.

Privacy notice: I'm anonymizing the name of the contact only insofar as I'm clipping anything including and following the "at" sign of the user that chats the bot. So whatever you say is published as "emailname: text line"
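A tiny, hypothetical helper matching that clipping rule (not necessarily the add-in's actual code):

// drop everything from the '@' on before publishing the chat line
static string AnonymizeSender(string imAddress)
{
    int at = imAddress.IndexOf('@');
    return at >= 0 ? imAddress.Substring(0, at) : imAddress;
}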

IMBotClient.zip (3.61 KB)
Categories: Technology | BizTalk | CardSpace | ISB

We love WS-* as much as we do love Web-Style services. I say "Web-style", fully knowing that the buzzterm is REST. Since REST is an architectural style and not an implementation technology, it makes sense to make a distinction and, also, claiming complete RESTfulness for a system is actually a pretty high bar to aspire to. So in order to avoid monikers like POX or Lo-REST/Hi-REST, I just call it what this is all about to mere mortals who don't have an advanced degree in HTTP Philosophy: Services that work like the Web - or Web-Style. That's not to say that a Web-Style service cannot be fully RESTful. It surely can be. But if all you want to do is GET to serve up data into mashups and manipulate your backend resources in some other way, that's up to you. Anyways....

Tomorrow at 10:00am (Session DEV03, Room Delfino 4101A), our resident Lo-REST/Hi-REST/POX/Web-Style Program Manager Steve Maine and our Architect Don Box will explain to you how to use the new Web-Style "Programmable Web" features that we're adding to the .NET Framework 3.5 to implement the server magic and the service-client magic to power all the user experience goodness you've seen here at MIX.

Navigating the Programmable Web
Speaker(s): Don Box - Microsoft, Steve Maine
Audience(s): Developer
RSS. ATOM. JSON. POX. REST. WS-*. What are all these terms, and how do they impact the daily life of a developer trying to navigate today’s programmable Web? Join us as we explore how to consume and create Web services using a variety of different formats and protocols. Using popular services (Flickr, GData, and Amazon S3) as case studies, we look at what it takes to program against these services using the Microsoft platform today and how that will change in the future.
If you are in Vegas for MIX, come see the session. I just saw the demo, it'll be good.
Categories: Talks | Technology | WCF | Web Services

Steve has a great analysis of what BizTalk Services means for Corzen and how he views it in the broader industry context.

Categories: Architecture | SOA | IT Strategy | Technology | BizTalk | WCF | Web Services

December 20, 2006
@ 11:07 PM

It's been slashdotted and also otherwise widely discussed that Google has deprecated their SOAP API. A deadly blow for SOAP as people are speculating? Guess not.

What I find striking are the differences in the licenses between the AJAX API and the SOAP API. That's where the beef is. While the results obtained through the SOAP API can be used (for non-commercial purposes) practically in any way except that "you may not use the search results provided by the Google SOAP Search API service with an existing product or service that competes with products or services offered by Google.", the AJAX API is constrained to use with web sites with the terms of use stating that "The API is limited to allowing You to host and display Google Search Results on your site, and does not provide You with the ability to access other underlying Google Services or data."

The AJAX API is a Web service that works for Google because its terms of use are very prescriptive for how to build a service that ensures Google's advertising machine gets exposure and clicks. That's certainly a reasonable business decision, but has nothing to do with SOAP vs. REST or anything else technical. There's just no money in application-to-application messaging for Google (unless they'd actually set up an infrastructure to charge for software as a service and provide support and a proper SLA that says more than "we don't make any guarantees whatsoever"), while there's a lot of money for them in being able to get lots and lots of people to give them a free spot on their own site onto which they can place their advertising. That's what their business is about, not software.

Categories: IT Strategy | Technology | Web Services

I was sad when "Indigo" and "Avalon" went away. It'd be great if we'd have a pool of cool legal-approved code-names for which we own the trademark rights and which we could stick to. Think Delphi or Safari. "Indigo" was cool insofar as it was very handy to refer to the technology set, but was removed far enough from the specifics that it doesn't create a sharply defined, product-like island within the larger managed-code landscape or has legacy connotations like "ADO.NET".  Also, my talks these days could be 10 minutes shorter if I could refer to Indigo instead of "Windows Communications Foundation". Likewise, my job title wouldn't have to have a line wrap on the business card of I ever spelled it out in full.

However, when I learned about the WinFX name going away (several weeks before the public announcement) and the new "Vista Wave" technologies (WPF/WF/WCF/WCS) being rolled up under the .NET Framework brand, I was quite happy. Ever since it became clear in 2004 that the grand plan to put a complete, covers-all-and-everything managed API on top (and on quite a bit of the bottom) of everything Windows would have to wait until significantly after Vista and that therefore the Win16>Win32>WinFX continuity would not tell the true story, that name made only limited sense to stick to. The .NET Framework is the #1 choice for business applications and a well established brand. People refer to themselves as being "dotnet" developers. But even though the .NET Framework covers a lot of ground and "Indigo", "Avalon", "InfoCard", and "Workflow" are overwhelmingly (or exclusively) managed-code based, there are still quite a few things in Windows Vista that still require using P/Invoke or COM/Interop from managed code or unmanaged code outright. That's not a problem. Something has to manage the managed code and there's no urgent need to rewrite entire subsystems to managed code if you only want to add or revise features. 

So now all the new stuff is part of the .NET Framework. That is a good, good, good change. This says what it all is.

Admittedly confusing is the "3.0" bit. What we'll ship is a Framework 3.0 that rides on top of the 2.0 CLR and includes the 2.0 versions of the Base-Class Library, Windows Forms, and ASP.NET. It doesn't include the formerly-announced-as-to-be-part-of-3.0 technologies like VB9 (there you have the version number consistency flying out the window outright), C# 3.0, and LINQ. Personally, I think that it might be a tiny bit less confusing if the Framework had a version-number neutral name such as ".NET Framework 2006" which would allow doing what we do now with less potential for confusion, but only a tiny bit. Certainly not enough to stage a war over "2006" vs. "3.0".

It's a matter of project management reality and also one of platform predictability that the ASP.NET, or Windows Forms teams do not and should not ship a full major-version revision of their bits every year. They shipped Whidbey (2.0) in late 2005 and hence it's healthy for them to have boarded the scheduled-to-arrive-in-2007 boat heading to Orcas. We (the "WinFX" teams) subscribed to the Vista ship docking later this year and we bring great innovation which will be preinstalled on every copy of it. LINQ as well as VB9 and C# incorporating it on a language-level are very obviously Visual Studio bound and hence they are on the Orcas ferry as well. The .NET Framework is a steadily growing development platform that spans technologies from the Developer Division, Connected Systems, Windows Server, Windows Client, SQL Server, and other groups, and my gut feeling is that it will become the norm that it will be extended off-cycle from the Developer Division's Visual Studio and CLR releases. Whenever a big ship docks in the port, may it be Office, SQL, BizTalk, Windows Server, or Windows Client, and as more and more of the still-unmanaged Win32/Win64 surface area gets wrapped, augmented or replaced by managed-code APIs over time and entirely new things are added, there might be bits that fit into and update the Framework.  

So one sane way to think about the .NET Framework version number is that it merely labels the overall package and not the individual assemblies and components included within it. Up to 2.0 everything was pretty synchronized, but given the ever-increasing scale of the thing, it's good to think of that being a lucky (even if intended) coincidence of scheduling. This surely isn't the first time that packages were versioned independently of their components. There was and is no reason for the ASP.NET team to gratuitously recompile their existing bits with a new version number just to have the GAC look pretty and to create the illusion that everything is new - and to break Visual Studio compatibility in the process.

Of course, once we cover 100% of the Win32 surface area, we can rename it all into WinFX again ;-)  (just kidding)

[All the usual "personal opinion" disclaimers apply to this post]

Update: Removed reference to "Win64".

Categories: IT Strategy | Technology | ASP.NET | Avalon | CLR | Indigo | Longhorn | WCF | Windows

Just so that you know: In addition to the regular breakout sessions, we have a number of interactive chalk talks scheduled here at the Connected Systems Technical Learning Center in the Expo Hall. Come by.

Categories: TechEd US | Technology | Indigo | WCF | Workflow

June 12, 2006
@ 12:48 PM

This is my first TechEd! - as a Microsoft employee. It's of course not my first tech event in my new job (Egypt, Jordan, UK, France, Switzerland, Holland, Belgium, Denmark, Las Vegas/USA, Slovenia, and Israel are on the year-to-date list - on top of three long-distance commutes to Redmond), but the big TechEds are always special. It'll be fun. Come by the Connected Systems area in the exhibition hall and find me to chat if you are here in Boston.

Frankly, I didn't expect a Sunday night keynote to be nearly as well attended as it was, but it looks like that experiment mostly worked. The theme of the keynote was Microsoft's 4 Core Promises for IT Pros and Developers nicely wrapped into a video story based on the TV show "24" and with that show's IT superwoman Chloe O'Brian (actress Mary Lynn Rajskub) up on stage with Bob Muglia (our team's VP far up above in my chain of command), who acted as the MC for the show. Finally we got an apology from a Hollywood character for all the IT idiocy they put up on screen. Thanks, Chloe.

Our team has a lot of very cool stuff to talk about at this show. The first highlight is John Justice's WCF Intro talk (Session CON208, Room 157ABC) today at 5:00pm with a "meet the team" panel Q&A session at the end. Block the time.

Categories: Technology | Indigo | WCF

May 30, 2006
@ 11:09 PM

I may work for the firm, but ... As a good corporate citizen I just installed the Windows Update item that got pushed out to me. The "Windows Genuine Advantage Notification" tool that's supposed to notify me -- I am paraphrasing the description that I clicked away already -- whether my copy of Windows is genuine and to help me acquire a legal copy if it finds out that it's not (whatever that might do). I think I have a bit of an understanding why there's such a tool and why that Windows Genuine Advantage program exists. Like it or not, we make software that we ask people to buy.

But why, why, why does that particular update want me to reboot my machine after the install?  

Categories: Technology

Inside the big house....

Back in December of last year and about two weeks before I publicly announced that I would be working for Microsoft, I started a nine-part series on REST/POX* programming with Indigo WCF. (1, 2, 3, 4, 5, 6, 7, 8, 9). Since then, the WCF object model has seen quite a few feature and usability improvements across the board and those are significant enough to justify that I rewrite the entire series to get it up to the February CTP level and I will keep updating it through Vista/WinFX Beta2 and as we are marching towards our RTM. We've got a few changes/extensions in our production pipeline to make the REST/POX story for WCF v1 stronger and I will track those changes with yet another re-release of this series.

Except on one or two occasions, I haven't re-posted a reworked story on my blog. This here is quite a bit different, because of its sheer size and the things I learned in the process of writing it and developing the code along the way. So even though it is relatively new, it's already due for an end-to-end overhaul to represent my current thinking. It's also different, because I am starting to cross-post content to http://blogs.msdn.com/clemensv with this post; however http://friends.newtelligence.net/clemensv remains my primary blog since that runs my engine ;-)

Listening

The "current thinking" is of course very much influenced by now working for the team that builds WCF instead of being a customer looking at things from the outside. That changes the perspective quite a bit. One great insight I gained is how non-dogmatic and customer-oriented our team is. When I started the concrete REST/POX work with WCF back in last September (on the customer side still working with newtelligence), the extensions to the HTTP transport that enabled this work were just showing up in the public builds and they were sometimes referred to as the "Tim/Aaaron feature". Tim Ewald and Aaron Skonnard had beat the drums for having simple XML (non-SOAP) support in WCF so loudly that the team investigated the options and figured that some minimal changes to the HTTP transport would enable most of these scenarios**. Based on that feature, I wrote the set of dispatcher extensions that I've been presenting in the V1 of this series and newtellivision as the applied example did not only turn out to be a big hit as a demo, it also was one of many motivations to give the REST/POX scenario even deeper consideration within the team.

REST/POX is a scenario we think about as a first-class scenario alongside SOAP-based messaging - we are working with the ASP.NET Atlas team to integrate WCF with their AJAX story and we continue to tweak the core WCF product to enable those scenarios in a more straightforward fashion. Proof for that is that my talk (PPT here) at the MIX06 conference in Las Vegas two weeks ago was entirely dedicated to the non-SOAP scenarios.

What does that say about SOAP? Nothing. There are two parallel worlds of application-level network communication that live in peaceful co-existence:

  • Simple point-to-point, request/response scenarios with limited security requirements and no need for "enterprise features" along the lines of reliable messaging and transaction integration.
  • Rich messaging scenarios with support for message routing, reliable delivery, discoverable metadata, out-of-band data, transactions, one-way and duplex, etcetc.

The Faceless Web

The first scenario is the web as we know it. Almost. HTTP is an incredibly rich application protocol once you dig into RFC2616 and look at the methods in detail and consider response codes beyond 200 and 404. HTTP is strong because it is well-defined, widely supported and designed to scale; HTTP is weak because it is effectively constrained to request/response, there is no story for server-to-client notifications, and it abstracts away the inherent reliability of the transmission-control protocol (TCP). These pros and cons lists are not exhaustive.

What REST/POX does is to elevate the web model above the "you give me text/html or */* and I give you application/x-www-form-urlencoded" interaction model. Whether the server punts up markup in the form of text/html or text/xml or some other angle-bracket dialect or some raw binary isn't too interesting. What's changing the way applications are built and what is really creating the foundation for, say, AJAX is that the path back to the server is increasingly XML'ised. PUT and POST with a content-type of text/xml is significantly different from application/x-www-form-urlencoded. What we are observing is the emancipation of HTTP from HTML to a degree that the "HT" in HTTP is becoming a misnomer. Something like IXTP ("Interlinked XML Transport Protocol" - I just made that up) would be a better fit by now.
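To make the content-type point concrete, here is a small sketch; the http://example.org/customers URL and the payloads are stand-ins, not a real endpoint. It shows the same logical POST once in the classic form-encoded shape and once with the XML'ised body described above.

using System;
using System.IO;
using System.Net;
using System.Text;

class ContentTypeIllustration
{
    static void Post(string url, string contentType, string body)
    {
        // plain HttpWebRequest; nothing framework-specific here
        HttpWebRequest request = (HttpWebRequest)WebRequest.Create(url);
        request.Method = "POST";
        request.ContentType = contentType;
        byte[] bytes = Encoding.UTF8.GetBytes(body);
        using (Stream stream = request.GetRequestStream())
        {
            stream.Write(bytes, 0, bytes.Length);
        }
        using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
        {
            Console.WriteLine("{0} -> {1}", contentType, (int)response.StatusCode);
        }
    }

    static void Main()
    {
        // the "browser gives you application/x-www-form-urlencoded" shape
        Post("http://example.org/customers", "application/x-www-form-urlencoded",
             "name=Clemens&city=Vienna");

        // the XML'ised path back to the server that the paragraph above describes
        Post("http://example.org/customers", "text/xml",
             "<customer><name>Clemens</name><city>Vienna</city></customer>");
    }
}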

The astonishing bit in this is that there has been no fundamental technology change that has been driving this. The only thing I can identify is that browsers other than IE are now supporting XMLHTTP and therefore created the critical mass for broad adoption. REST/POX rips the face off the web and enables a separation of data and presentation in a way that mashups become easily possible and we're driving towards a point where the browser cache becomes more of an application repository than merely a place that holds cacheable collateral. When developing the newtellivision application I have spent quite a bit of time on tuning the caching behavior in a way that HTML and script are pulled from the server only when necessary and as static resources and all actual interaction with the backend services happens through XMLHTTP and in REST/POX style. newtellivision is not really a hypertext website, it's more like a smart client application that is delivered through the web technology stack.

Distributed Enterprise Computing

All that said, the significant investments in SOAP and WS-* that were made by Microsoft and industry partners such as Sun, IBM, Tibco and BEA have their primary justification in the parallel universe of highly interoperable, feature-rich intra and inter-application communication as well as in enterprise messaging. Even though there was a two-way split right through the industry in the 1990s with one side adopting the Distributed Computing Environment (DCE) and the other side driving the Common Object Request Broker Architecture (CORBA), both of these camps made great advances towards rich, interoperable (within their boundaries) enterprise communication infrastructures. All of that got effectively killed by the web gold-rush starting in 1994/1995 as the focus (and investment) in the industry turned to HTML/HTTP and to building infrastructures that supported the web in the first place and everything else as a secondary consideration. The direct consequence of the resulting (even if big) technology islands that sit underneath the web and the neglect of inter-application communication needs was that inter-application communication has slowly grown to become one of the greatest industry problems and cost factors. Contributing to that is that the average yearly number of corporate mergers and acquisitions has tripled compared to 10-15 years ago (even though the trend has slowed in recent years) and the information technology dependency of today's corporations has grown to become one of the deciding if not the deciding competitive factor for an ever increasing number of industries.

What we (the industry as a whole) are doing now and for the last few years is that we're working towards getting to a point where we're both writing the next chapter of the story of the web and we're fixing the distributed computing story at the same time by bringing them both onto a commonly agreed platform. The underpinning of that is XML; REST/POX is the simplest implementation. SOAP and the WS-* standards elevate that model up to the distributed enterprise computing realm.

If you compare the core properties of SOAP+WS-Addressing and the Internet Protocol (IP) in an interpretative fashion side-by-side and then also compare the Transmission Control Protocol (TCP) to WS-ReliableMessaging it may become quite clear to you what a fundamental abstraction above the networking stacks and concrete technology coupling the WS-* specification family has become. Every specification in the long list of WS-* specs is about converging and unifying formerly proprietary approaches to messaging, security, transactions, metadata, management, business process management and other aspects of distributed computing into this common platform.

Convergence

The beauty of that model is that it is an implementation superset of the web. SOAP is the out-of-band metadata container for these abstractions. The key feature of SOAP is SOAP:Header, which provides a standardized facility to relay the required metadata alongside payloads. If you are willing to constrain out-of-band metadata to one transport or application protocol, you don't need SOAP.

There is really very little difference between SOAP and REST/POX in terms of the information model. SOAP carries headers and HTTP carries headers. In HTTP they are bolted to the protocol layer and in SOAP they are tunneled through whatever carries the envelope. [In that sense, SOAP is calculated abuse of HTTP as a transport protocol for the purpose of abstraction.] You can map WS-Addressing headers from and to HTTP headers.
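As a tiny illustration of that point, WCF's Message API makes the "metadata rides in the envelope" model quite tangible. This is just a sketch; the action URI and the addresses are made up, and it isn't claiming to be how any particular product maps things.

using System;
using System.ServiceModel;
using System.ServiceModel.Channels;
using System.Xml;

class AddressingIllustration
{
    static void Main()
    {
        // SOAP flavor: the addressing metadata lives in SOAP:Header, independent of
        // whatever transport ends up carrying the envelope.
        Message message = Message.CreateMessage(
            MessageVersion.Soap12WSAddressing10, "urn:findCustomer", "payload");
        message.Headers.To = new Uri("http://example.org/customers");
        message.Headers.MessageId = new UniqueId();
        message.Headers.ReplyTo = new EndpointAddress("http://example.org/callback");

        // Dumping the message shows To/MessageID/ReplyTo as headers inside the envelope;
        // in the plain-HTTP case the same information is bolted to the protocol layer
        // instead: the request URI, the HTTP headers, and the connection itself.
        Console.WriteLine(message);
    }
}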

The SOAP/WS-* model is richer, more flexible and more complex. The SOAP/WS-* set of specifications is about infrastructure protocols. HTTP is an application protocol and therefore it is naturally more constrained - but has inherently defined qualities and features that require an explicit protocol implementation in the SOAP/WS-* world; one example is the inherent CRUD (create, read, update, delete) support in HTTP that is matched by the explicitly composed-on-top WS-Transfer protocol in SOAP/WS-*

The common platform is XML. You can scale down from SOAP/WS-* to REST/POX by putting the naked payload on the wire and rely on HTTP for your metadata, error and status information if that suits your needs. You can scale up from REST/POX to SOAP/WS-* by encapsulating payloads and leverage the WS-* infrastructure for all the flexibility and features it brings to the table. [It is fairly straightforward to go from HTTP to SOAP/WS-*, and it is harder to go the other way. That's why I say "superset".]

Doing the right thing for a given scenario is precisely what we are enabling in WCF. There is a place for REST/POX for building the surface of the mashed and faceless web and there is a place for SOAP for building the backbone of it - and some may choose to mix and match these worlds. There are many scenarios and architectural models that suit them. What we want is

One Way To Program

* REST=REpresentational State Transfer; POX="Plain-Old XML" or "simple XML"

Categories: Architecture | SOA | MIX06 | Technology | Web Services

March 14, 2006
@ 02:17 PM

I kicked off quite a discussion with my recent post on O/R mapping. Some people think I am completely wrong, some say that it resonates with their experience, some say I wrote this in mean spirit, some are jubilating. I particularly liked the "Architectural Truthiness" post by David Ing and the comment by "Scott E" in my comments section who wrote:

I've hiked up the learning curve for Hibernate (the Java flavor) only to find that what time was saved in mapping basic CRUD functionality got eaten up by out-of-band custom data access (which always seems to be required) and tuning to get performance close to what it would have been with a more specialized, hand-coded DAL.

As always, it's a matter of perspective. Here is mine: I went down the O/R mapping route in a project in '98/'99 when my group at the company I was working for at the time was building a new business framework. We wrote a complete, fully transparent O/R mapper in C++. You walked up to a factory which dehydrated objects and you could walk along the association links and the object graph would either incrementally dehydrate or dehydrate in predefined segments. We had filtering capabilities that allowed us to constrain 1:N collections with large N's, we could auto-resolve N:M relationships, had support for inheritance, and all that jazz. The whole framework was written with code generation in mind. Our generators were fed with augmented UML class diagrams and spit out the business layer, whereby we had a "partial classes" concept where we'd keep the auto-gen'd code in one tree and the parts that were supposed to be filled manually in another part of the code tree. Of course we'd preserve changes across re-gen's. Pure OO nirvana.

While the platforms have evolved substantially in the past 7 years, the fundamental challenges for transparent (fully abstracted) mapping of data to objects remain essentially the same.

  • Given metadata to do the mapping, implementing CRUD functionality with an O/R mapper is quite easy. We had to put lots of extra metadata into our C++ classes back in the day, but with .NET and Java the metadata is all there and therefore CRUD O/R mapping is very low-hanging fruit on both platforms. That's why there's such a large number of projects and products.
  • Defining and resolving associations is difficult. 1:N is hard, because you need to know what your N looks like. You don't want to dehydrate 10000 objects to find a value in one of them or to calculate a sum over a column. That's work that's, quite frankly, best left in the database. I realize that some people worry how that leads to logic bleeding into the database, but for me that's a discussion about pureness vs. pragmatism. If the N is small, grabbing all related objects is relatively easy - unless you support polymorphism, which forces the mapper into all sorts of weird query trees. 1:N is so difficult because an object model is inherently about records, while SQL is about sets. N:M is harder.
  • "Object identity" is a dangerous lure. Every object has its own identifier. In memory that is its address, on disk that's some form of unique identifier. The idea of making the persistent identifier also the in-memory identifier often has the design consequence of an in-memory "running object table" with the goal of avoiding to load the same object twice but rather linking it appropriately into the object graph. That's a fantastic concept, but leads to all sort of interesting concurrency puzzles: What do you do if you happen to find an object you have already loaded as you resolve an 1:N association and realize that the object has meanwhile changed on disk? Another question is what the scope of the object identity is. Per appdomain/process, per machine or even a central object server (hope not)?
  • Transactions are hard. Databases are doing a really good job with data concurrency management, especially with stored procedures. If you are loading and managing data as object-graphs, how do you manage transaction isolation? How do you identify the subtree that's being touched by a transaction? How do you manage rollbacks? What is a transaction, anyways?
  • Changing the underlying data model is hard. I've run into several situations where existing applications had to be integrated with existing data models, with the customer willing to put money on the table. O/R mapping is relatively easy if the data model falls out of the object model. If an existing data model bubbles up against an object model, you often end up writing a DAL or the O/R in stored procedures.
  • Reporting and data aggregation is hard. I'll use an analogy for that: It's really easy to write an XPath query against an XML document, but it is insanely difficult to do the same navigating the DOM (a small code sketch of that contrast follows this list).
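As a small sketch of that analogy (the document shape here is entirely made up), compare one declarative XPath expression with the hand-rolled DOM walk that answers the same question:

using System;
using System.Collections.Generic;
using System.Globalization;
using System.Xml;

class XPathVersusDomWalk
{
    static void Main()
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml("<orders><order><item price='80'/><item price='120'/></order>" +
                    "<order><item price='250'/></order></orders>");

        // declarative: "which items cost more than 100?" in one expression
        XmlNodeList viaXPath = doc.SelectNodes("//order/item[@price > 100]");
        Console.WriteLine("XPath: {0} items", viaXPath.Count);

        // navigational: the same question, walking the tree by hand
        List<XmlElement> viaDomWalk = new List<XmlElement>();
        foreach (XmlElement order in doc.DocumentElement.ChildNodes)
        {
            foreach (XmlElement item in order.ChildNodes)
            {
                if (double.Parse(item.GetAttribute("price"), CultureInfo.InvariantCulture) > 100)
                {
                    viaDomWalk.Add(item);
                }
            }
        }
        Console.WriteLine("DOM walk: {0} items", viaDomWalk.Count);
    }
}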

That said, I am not for or against O/R mapping. There are lots of use cases with a lot of CRUD work where O/R saves a lot of time. However, it is a leaky abstraction. In fact it is so leaky that we ended up not using all that much of the funkyness we put into our framework, because "special cases" kept popping up. I am pointing out that there are a lot of fundamental differences between what an RDBMS does with data and how OOP treats data. The discussion is in part a discussion about ISAM vs. RDBMS.

The number of brain cycles that needs to be invested for a clean O/R mapping of a complex object model in the presence of the fundamental challenges I listed here (and that list isn't exhaustive) is not automatically smaller than for a plain-old data layer. It may be more. YMMV.

Now you can (and some already have) ask how all of that plays with LINQ and, in particular, DLINQ. Mind that I don't work in the LINQ team, but I think I am observing a subtle but important difference between LINQ and O/R*: 

  • O/R is object->relational mapping.
  • LINQ is relational->object mapping.

LINQ acknowledges the relational nature of the vast majority of data, while O/R attempts to deny it. LINQ speaks about entities, relations and queries and maps result-sets into the realm of objects, even cooking up classes on the fly if it needs to. It's bottom up and the data (from whatever source) is king. Objects and classes are just tooling. For O/R mapping, the database is just tooling.

Categories: Architecture | Technology

January 14, 2006
@ 04:11 PM

private void exitMenuItem_Click(object sender, EventArgs e)
{
    // runs on the Windows Forms STA thread (yellow)
    if (runtime != null && runtime.Running)
    {
        // the stop operation can't run on the Windows Forms thread; push it to a pool thread (blue)
        ThreadPool.QueueUserWorkItem(delegate(object obj)
        {
            runtime.Stop();
            // ...and marshal back onto the Windows Forms thread to close the form (yellow)
            Invoke(new ThreadStart(delegate()
            {
                this.Close();
            }));
        });
    }
    else
    {
        this.Close();
    }
}

Just caught myself coding this up. The shown method is a Windows Forms event handler for a menu item. The task at hand is to check a local variable and if that’s set to a specific value, to switch to a different thread and perform an action (that particular job can’t be done on the Windows Forms STA thread), and to close the form back on the main STA thread once that’s done. I colored the executing threads; yellow is the Windows Forms STA thread, blue is an arbitrary pool thread. Works brilliantly. Sick, eh?

 

[Mind that I am using ThreadStart just because it’s a convenient void target(void) delegate]

Categories: Technology

My blogging efforts this year aren’t really impressive, are they? Well, the first half of the year I was constantly on the road at a ton of conferences and events and didn’t really get the time to blog much. After TechEd Europe, I was simply burned out, took three weeks of vacation to recover somewhat and since then I’ve been trying to get some better traction with areas of Visual Studio that I hadn’t really looked at well enough since Beta 2 came out. And of course there’s WinFX with Avalon and Indigo that I need to track closely.

Now, of course I have the luxury of being able to dedicate a lot of time to learning, because taking all the raw information in, distilling it down and making it more accessible to folks who can’t spend all that time happens to be my job. However, I am finding myself in this rather unfortunate situation again that I will have to throw some things overboard and will have to focus on a set of areas.

At some point rather early in my career I decided that I just can’t track all hardware innovations anymore. I certainly still know what’s generally going on, but when I buy a new machine I am just a client for Dell, Siemens or Alienware like the next guy. I don’t think I could make a qualified enough choice on good hardware components to build my own PC nowadays and I actually have little interest to do so. All that stuff comes wrapped in a case and if I don’t really have to open it to put an extension card in, I have no interest to look inside. The same goes for x86 assembly language. The platform is still essentially the same even after more than a decade, and whenever Visual Studio is catching an exception right in the middle of “the land of no source code”, I can – given I have at least the debug symbols loaded – actually figure out where I am and, especially if it’s unmanaged code, often figure out what’s failing. But if someone were to ask me about the instruction set innovations of the Pentium 4 processor generation I’ll happily point to someone else. In 2001, I wrote a book on BizTalk and probably knew the internals and how everything fits together as well as someone outside of Microsoft possibly could. BizTalk 2006 is so far away from me now that I’d have a hard time giving you the feature delta between the 2004 and 2006 versions. Over time, many things went over board that way; hardware and assembly being the first and every year it seems like something else needs to go.

The reason is very simple: Capacity. There’s a limit to how much information an individual can process and I think that by now, Microsoft managed to push the feature depth to the point where I can’t fit Visual Studio and related technologies into my head all at once any longer. In my job, it’s reasonable for people to expect that whenever I get up on stage or write an article I give them the 10%-20% of pre-distilled “essentials” that they need 80% of the time out of a technology and that I know close to 100% of the stuff that’s underneath, so that they can ask me questions and I can give them good, well founded answers.  In the VS2003/NETFX 1.1 wave, I did something (even if it was just a demo) with every single public namespace and I am confident that I can answer a broad range of customer questions without having to guess.

Enter VS2005 and the summary of trying to achieve the same knowledge density is: “Frustrating”.

There is so much more to (have to) know, especially given that there’s now Team System and the Team Foundation Server (TFS). TFS is “designed to be customized” and the product makes that clear wherever you look. It is a bit like an ERP system in that way. You don’t really have a solution unless you set up a project to customize the product for your needs. Hence, “Foundation” is a more than fitting name choice.

I’ve been in a planning project for the past two weeks where the customer has a well thought out idea about their analysis, design and development processes and while TFS seems like a great platform for them, they will definitely need customized events, a custom-tailored version of MSF Agile with lots of new fields and custom code analysis and check-in policies, integration with and bi-directional data flow from/into “satellite products” such as a proper requirements analysis system, a help-desk solution and a documentation solution, and probably will even want to get into building their own domain specific language (DSL).  All of that is possible and the extensibility of Visual Studio and TFS is as broad as the customer would need it to be, but … who would know? The Team System Extensibility Kit has documentation to extend and customize process templates, work items, source control, the build system, the team explorer, test tools, and reporting and that’s just the headlines. Add the tools for custom domain specific languages (huge!) and the class designer extensibility and you’ve got more than enough stuff to put your head into complete overflow mode.

And at that point you haven’t even looked at the news in Windows Forms (where I like all the new data binding capabilities a lot) and much less at ASP.NET 2.0, which is an entire planet all by itself. Oh, and of course there is the new deployment model (aka “ClickOnce”), SQL 2005, with all those new features (whereby SQL/CLR is the least interesting to me) and BizTalk 2006 and, and, and …

And of course, my core interest is really with the Windows Vista WinFX technology wave including of course Indigo (don’t make me use “WCF”) and for me to a lesser degree Avalon (yes, yes: “WPF”) for which knowing the VS2005/NETFX 2.0 foundation is of course a prerequisite.

What kills me with Avalon, for instance, is that I’ve got quite a bit of the 2D stuff cornered and know how to do things even with just an XML editor in hands, but that the 3D stuff is nicely integrated and sits right there in front of me and I just don’t have the necessary knowledge depth about building 3D apps to do the simplest thing and not the time to acquire that knowledge. And I’ve got such great ideas for using that stuff.

It looks like it’s time to take some things off the table again and that’s an intensely frustrating decision to make.

Don’t get me wrong … I am not complaining about Microsoft adding too many features to the platform. Au contraire! I think that we’re seeing a wave of innovation that’s absolutely fantastic and will enable us out here to build better, more featured applications.

But for a development team to benefit from all these technologies, specialization is absolutely needed. The times when development teams had roughly the same technology knowledge breadth and everyone could do everything are absolutely coming to an end. And the number of generalists who have a broad, qualified overview on an entire platform is rapidly shrinking.

And the development teams will change shape. Come Avalon, and game developers (yes, game developers) will be in great demand in places that are as far away from gaming as you could imagine. I’ve just had meetings with a very conservative and large investment management company and they are thinking hard about adding multi-layer, alpha-blended, 3D data visualizations complete with animations and all that jazz to their trading system UIs, and they’ve got the business demand for it. Of course, the visualization experts won’t be data mining and data analysis or software integration specialists; that’s left for others to do.

For “generalists” like me, these are hard and frustrating times if they’re trying to stay generalists. Deep and consequent specialization is a great opportunity for everyone and the gamble is of course to pick the right technology to dig into and become “the expert” in. If that technology or problem space becomes the hottest thing everyone must have – you win your bet. Otherwise you might be in trouble.

Here are some ideas and “predictions” for such sought-after specialists – but, hey, that’s just my unqualified opinion:

·         Cross-platform Web service wire doctors. As much as all the vendors will try to make their service platforms such as Indigo and WebSphere and WebLogic and Apache interoperable, customers will try hard to break it all by introducing “absolutely necessary” whacky protocol extensions and by using every extensibility point fathomable. As if that wasn’t hard enough already today where most interop happens with more or less bare-bones SOAP envelopes, just wait until bad ideas get combined with the full WS-* stack, including reliable messaging, security and routing. These folks of course will have to know everything about security aspects like federation, single-sign-on, etc.  

·         Visualization Developers. Avalon is a bit like an advanced 3D gaming engine for business application developers. While that seems like a whacky thing to say – just wait and see what’ll happen. Someone will come along and build a full-blown ERP or CRM package whose data visualization capabilities and advanced data capture methods will blow everything existing out of the water and everything with white entry fields on a gray background with boring fonts and some plain bar charts will suddenly look not much better than a green-screen app. In 3 years you will have people modeling 3D wire-frames on your development teams or you are history – and the type of app doesn’t really matter much.

·         Development Tool and Process Customization Specialists:  I expect Team System to become the SAP R/3 of software development. No deployment without customization, even if that only happens over time. Brace for the first product update that comes around and changes and extends the foundation’s data structures. I fully expect Team System and the Team Foundation Server to gain substantial market share and I fully expect that there’ll be a lot of people dying to get customization assistance.

 

That said: I am off to learn more stuff.

Categories: IT Strategy | Technology

In the past months I’ve been throwing ideas back and forth with some of my friends and we’re slowly realizing that “Service Oriented Architecture” doesn’t really exist.

The term “Service Oriented Architecture” implies that there is something special about architecture when it comes to service orientation, Web services, XML, loose coupling and all the wonderful blessings of the past 5 years in this wave. But if you look at it, there really isn’t much special about the good, old, proven architectural principles once you throw services into the picture.

I’ll try to explain what I mean. There are five pillars of software architecture (this deserves more elaboration, but I will keep it short for now):

·        Edges: Everything that talks about how the network edge of a software system is shaped, designed, and implemented. SOAP, WSDL, WS-*, IIOP, RMI, DCOM are at home here, along with API and message design and ideas about coupling, versioning, and interoperability.

·        Protocols: Which information do you exchange between two layers of a system or between systems and how is that communication shaped? What are the communication patterns, what are the rules of communication? There are low-level protocols that are technically motivated, there are high-level protocols that are about punting business documents around. Whether you render a security token as a binary thing in DCOM or as an angle brackets thing is an edge concern. The fact that you do and when and in which context is a protocol thing. Each protocol can theoretically be implemented on any type of edge. If you were completely insane, you could implement TCP on top of SOAP and WS-Addressing and some other transport.

·        Runtimes: How do you implement a protocol? You pick an appropriate runtime, existing class or function libraries, and a programming language. That’s an architectural decision, really. There are good reasons why people pick C#, Java, Visual Basic, or FORTRAN, and not all of them are purely technical. Technically, the choice of a runtime and language is orthogonal to the choice of a protocol and the edge technology/design. That’s why I list it as another pillar. You could choose to do everything in Itanium assembly language and start from scratch. Theoretically, nothing stops you from doing that, it’s just not very pragmatic.

·        Control Flow: For a protocol to work and really for any program to work, you need concepts like uni- and bidirectional communication and their flavors such as datagrams, sockets, and queues, which support communication styles such as monologues, dialogues, multicast, or broadcast. You need ideas like parallelization and synchronization, and iterations and sequences. All of these are abstract ideas. You can implement those on any runtime. They are not dependent on a special edge. They support protocols, but don’t require them. Another pillar.

·        State: This is why we write software (most of it, at least). We write software to transform a system from one state to the next. Press the trigger button and a monster in Halo turns into a meatloaf, and you score. Send a message to a banking system and $100,000 change owners. Keeping track of state, keeping it isolated, current, and consistent are things to consider. Is it ok to have it far away or do you need it close by? Do you cache, and replicate it for the purpose? Is it reference data or business data? Consolidated, preprocessed, or raw? How many concurrent clients have access to the data and how do you deal with the concurrency? All these are questions that have to do with state, and only state. None of this depends on having a special technology that is being talked through way up above at the edge.

Service orientation only speaks about the edge. Its tenets are about loose coupling, about independent evolution and versioning of contracts, and about technology-agnostic metadata exchange. All this is important to make systems interoperate better and to create systems where the effects of changes to one of its parts to any other part are minimized.

But none of the SO tenets really speaks about architecture [Sidenote: The “autonomy” is about autonomous development teams and not about autonomous computing]. When you look at what’s being advertised as “service oriented architecture”, you see either the marketing-glorified repackaging of Ethernet, TCP/IP, and LDAP (“Enterprise Service Bus”), or architectural blueprints that look strikingly similar to things that people have been doing for a long time with DCE, CORBA, J2EE, COM, or mainframe technologies. What’s different now is that it is easier, cheaper and likely more productive to create bridges between systems. And even that comes at a significant price at this point. Realistically, the (web) services stacks have yet to catch up with these “proprietary” stacks in terms of reliability, security, and performance.

There is Service Orientation – and that’s good. There is appropriate architecture for a problem solution – and that’s good too. These are two things. Combining the two is excellent. But “Service Oriented Architecture” is not an isolated practice. I’ve started to use “SO/A” to make clear that I mean architecture that benefits from service orientation.

I understand that there is an additional architectural tier of “service orientation” that sits at the business/technology boundary. On that meta-level, there could indeed be something like “service oriented architecture” along the lines of the service convergence that Rafal, Pat and myself were discussing on stage at TechEd Europe last year. But when I see or hear SOA discussed, people speak mostly about technology and software architecture. In that context, selling “SOA” as a completely new software architecture school does not (no longer) make sense to me.

Or am I missing something?

Categories: Technology

The little series I am currently writing here on my blog has inspired me to write way more code than actually necessary to get my point across ;-) So by now I've got my own MSMQ transport for WSE 2.0 (yes, I know that others have written that already, but I am shooting for an "enterprise strength" implementation), a WebRequest/WebResponse pair to smuggle under arbitrary ASMX proxies and I am more than halfway done with a server-side host for MSMQ-to-ASMX (spelled out: ASP.NET Web Services).

What bugs me is that WSE 2.0's messaging model is "asynchronous only" and that it always performs a push/pull translation and that there is no way to push a message through to a service on the receiving thread. Whenever I grab a message from the queue and put it into my SoapTransport's "Dispatch()" method, the message gets queued up in an in-memory queue and that is then, on a concurrent thread, pulled (OnReceiveComplete) by the SoapReceivers collection and submitted into ProcessMessage() of the SoapReceiver (like any SoapService derived implementation) matching the target endpoint. So while I can dequeue from MSMQ within a transaction scope (ServiceDomain), that transaction scope doesn't make it across onto the thread that will actually execute the action inside the SoapReceiver/SoapService.

So now I am sitting here, contemplating and trying to figure out a workaround that doesn't require me to rewrite a big chunk of WSE 2.0 (which I am totally not shy of if that is what it takes). Transaction marshaling, thread synchronization, ah, I love puzzles. Once I know how to solve this and have made the adjustments, I'll post the queue listener I promised to wrap up the series. The other code I've written in the process will likely surface in some other way.

I was a little off when I compared my problem here to a tail call. Gordon Weakliem corrected me with the term "continuation".

The fact that the post got 28 comments shows that this seems to be an interesting problem and, naming aside, it is indeed a tricky thing to implement in a framework when the programming language you use (C# in my case) doesn't support the construct. What's specifically tricky about the concrete case that I have is that I don't know where I am yielding control to at the time when I make the respective call.

I'll recap. Assume there is the following call

CustomerService cs = new CustomerService();
cs.FindCustomer(customerId);

FindCustomer is a call that will not return any result as a return value. Instead, the invoked service comes back into the caller's program at some completely different place such as this:

[WebMethod]
public void FindCustomerReply(Customer[] result)
{
   ...
}

So what we have here is a "duplex" conversation. The result of an operation initiated by an outbound message (call) is received, some time later, through an inbound message (call), but not on the same thread and not on the same "object". You could say that this is a callback, but that's not precisely what it is, because a "callback" usually happens while the initiating call (as above FindCustomer) has not yet returned to its scope, or at least while the initiating object (or an object passed by some sort of reference) is still alive. Here, instead, processing of the FindCustomer call may take a while, and the initiating thread and the initiating object may be long gone when the answer is ready.

Now, the additional issue I have is that at the time when the FindCustomer call is made, it is not known which "FindCustomerReply" message handler is going to be processing the result, and it is really not known what's happening next. The decision about what happens next and which handler is chosen depends on several factors, including the time that it takes to receive the result. If FindCustomer is called from a web page and the service providing FindCustomer drops a result at the caller's doorstep within 2-3 seconds [1], the FindCustomerReply handler can go and hijack the initial call's thread (and HTTP context) and render a page showing the result. If the reply takes longer, the web page (the caller) may lose its patience [2] and choose to continue by rendering a page that says "We are sending the result to your email account." In that case the message handler will not throw HTML into an HTTP response on an open socket, but rather render the result into an email and send it via SMTP, and maybe even alert the user through his/her Instant Messenger when/if the result arrives.

[1] HTTP Request => FindCustomer() =?> "FindCustomerReply" => yield to CustomerList.aspx => HTTP Response
[2] HTTP Request => FindCustomer() =?> Timeout!            => yield to YouWillGetMail.aspx => HTTP Response
                               T+n =?> "FindCustomerReply" => SMTP Mail
                                                           => IM Notification

So, in case [1] I need to correlate the reply with the request and continue processing on the original thread. In case [2], the original thread continues on a "default path" without an available reply and the reply is processed on (possibly two) independent threads and using two different notification channels.
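
The correlation piece can be sketched with nothing more than a synchronized dictionary and a wait handle (the class and names below are purely my illustration of the idea, not the actual infrastructure; obvious races and cleanup details are glossed over):

using System;
using System.Collections;
using System.Threading;

// Illustration only: correlate an asynchronous reply with the request that initiated it,
// with a timeout that sends the request thread down the "default path".
class PendingReply
{
    public ManualResetEvent Arrived = new ManualResetEvent(false);
    public object Result;
}

class ReplyCorrelator
{
    static Hashtable pending = Hashtable.Synchronized(new Hashtable());

    // Request thread: register interest, then wait for the reply or give up.
    public static object WaitForReply(string correlationId, TimeSpan timeout)
    {
        PendingReply entry = new PendingReply();
        pending[correlationId] = entry;
        try
        {
            if (entry.Arrived.WaitOne(timeout, false))
                return entry.Result;    // case [1]: hijack this thread, render the result page
            return null;                // case [2]: continue on the default path
        }
        finally
        {
            pending.Remove(correlationId);
        }
    }

    // Reply thread: hand the result to a waiting request thread, or fall back.
    public static void OnReply(string correlationId, object result)
    {
        PendingReply entry = pending[correlationId] as PendingReply;
        if (entry != null)
        {
            entry.Result = result;
            entry.Arrived.Set();        // the original thread is still waiting
        }
        else
        {
            // Nobody is waiting anymore: deliver via SMTP / IM notification instead.
        }
    }
}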

A slightly different angle. Consider a workflow application environment in a bank, where users are assigned tasks and simply fetch the next thing from the to-do list (by clicking a link in an HTML-rendered list). The reply that results from "LookupAndDoNextTask" is a message that contains the job that the user is supposed to do.  

[1] HTTP Request => LookupAndDoNextTask() =?> Job: "Call Customer" => yield to CallCustomer.aspx => HTTP Response
[2] HTTP Request => LookupAndDoNextTask() =?> Job: "Review Credit Offer" => yield to ReviewCredit.aspx => HTTP Response
[3] HTTP Request => LookupAndDoNextTask() =?> Job: "Approve Mortgage" => yield to ApproveMortgage.aspx => HTTP Response
[4] HTTP Request => LookupAndDoNextTask() =?> No Job / Timeout => yield to Solitaire.aspx => HTTP Response

In all of these cases, calls to "FindCustomer()" and "LookupAndDoNextTask()" that are made from the code that deals with the incoming request will (at least in the theoretical model) never return to their caller, and the thread will continue to execute in a different context that is "TBD" at the time of the call. By the time the call stack is unwound and the initiating call (like FindCustomer) indeed returns, the request is therefore fully processed and the caller may not perform any further actions.

So the issue at hand is to make that fact clear in the programming model. In ASP.NET, there is a single construct called "Server.Transfer()" for that sort of continuation, but it's very specific to ASP.NET and requires that the caller knows where it wants to yield control to. In the case I have here, the caller knows that it is surrendering the thread to some other handler, but it doesn't know to whom, because that is dynamically determined by the underlying frameworks. All that's visible and should be visible in the code is a "normal" method call.

cs.FindCustomer(customerId) might therefore not be a good name, because it looks "too normal". And of course I don't have the powers to invent a new statement for the C# language like continue(cs.FindCustomer(customerId)) that would result in a continuation that simply doesn't return to the call location. Since I can't do that, there has to be a different way to flag it. Sure, I could put an attribute on the method, but Intellisense wouldn't show that, would it? So it seems the best way is to have a convention of prefixing the method name.

There were a bunch of ideas in the comments for method-name prefixes. Here is a selection:

  • cs.InitiateFindCustomer(customerId)
  • cs.YieldFindCustomer(customerId)
  • cs.YieldToFindCustomer(customerId)
  • cs.InjectFindCustomer(customerId)
  • cs.PlaceRequestFindCustomer(customerId)
  • cs.PostRequestFindCustomer(customerId)

I've got most of the underlying correlation and dispatch infrastructure sitting here, but finding a good programming model for that sort of behavior is quite difficult.
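
Just to make the convention tangible, a hypothetical hand-written proxy method following the prefix idea might look like the snippet below; the attribute is the standard ASMX way of marking a one-way call, and the "Yield" prefix is the convention under discussion:

using System.Web.Services.Protocols;

public class CustomerService : SoapHttpClientProtocol
{
    // "Yield" signals: no result comes back here; a FindCustomerReply message
    // will arrive elsewhere, later, on some other thread.
    [SoapDocumentMethod(OneWay = true)]
    public void YieldFindCustomer(string customerId)
    {
        this.Invoke("FindCustomer", new object[] { customerId });
    }
}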

[Of course, this post won't make it on Microsoft Watch, eWeek or The Register]

Categories: Architecture | SOA | Technology | ASP.NET | CLR

June 22, 2004
@ 07:34 AM

Achim and I are currently in a series of very quick rev-cycles for the first public release of the Microsoft/newtelligence FABRIQ project that we did with and for Microsoft EMEA HQ and that was conceived, driven and brilliantly managed by my architect colleague Arvindra Sehmi, who gave me the lead architect role for this project.

[Reminder/Disclaimer: this is not a product, but rather a pretty elaborate "how-to" architecture example that comes with an implementation. Hence it's not a supported Microsoft or newtelligence "framework" or an attempt at some general, definitive guidance on how to write services. FABRIQ is an optimized architecture for fast, one-way message processing within network-distributed nodes consisting of sequences of dynamically composed primitive processing steps. This isn't even trying to get anywhere near the guidance aspirations of Shadowfax, let alone all the guidance we're getting from the Indigo team or even the parallel work I've been doing for MS by building Proseware.]

We've settled on build 1.0.4173 (yesterday) to be the TechEd version, but we still found a last-minute issue where we weren't using WSE 2.0 correctly (not setting the SoapEnvelope.Context.Destination property for use with a bare WSE2 Pipeline in the presence of policy), and when I reassembled the distribution I didn't reset an option that I use for debugging on my machine, which caused installation hiccups over at Achim's machine. Achim commented on the hour-long bug hunt with "Ah, you gotta love software!".

There will be hands-on labs at TechEd Europe led by Achim and Jörg that let you play with what we (very much including our friends at Microsoft Argentina and Microsoft EMEA) have built. And even if you don't have a proper use for a one-way queuing network architecture, it actually turned into a fun thing to play with. 

I'll be starting to explain aspects of the spec over the upcoming days and will explain how the architecture works, how you configure it and what its potential uses are. Already posted is some relevant information about the great idea of an XmlReader-based message design (which I designed inspired by the Indigo PDC build) and our use of lightweight transactions.

I am in the boot phase for the next software project right now (proprietary work) and I have identified very many good uses for the FABRIQ model in there already (hint).

Once all parties involved are giving their "thumbs up", we'll also make the source code drop and the binaries available to the public (you) and from there we're looking forward to your input (and contributions?).

Categories: Architecture | TechEd Europe | Technology | FABRIQ

June 8, 2004
@ 08:05 PM

You read it here first. Kimberly Tripp blogs (rss). If you do anything with SQL Server: Subscribe!

Categories: Blog | Technology

Microsoft urgently needs to consolidate all the APIs that are required for provisioning services or sites. The amount of knowledge you need to have and the number of APIs you need to use in order to lock down a Web service or Enterprise Services application programmatically at installation time, so that it runs under an isolated user account (with a choice of local or domain account) that has the precise rights to do what it needs to do (but nothing else), is absolutely insane.

You need to set ACLs on the file system and the registry, you need to modify the local machine's security policy, you need to create accounts and add them to local groups, you must adhere to password policies with your auto-generated passwords, you need to configure identities on Enterprise Services applications and IIS application pools, you need to set ACLs on Message Queues (if you use them), and you need to write WS-Policy documents to secure your WS front. Every single one of these tasks uses a different API (and writing policies has none), and most of these jobs require explicit Win32 or COM interop. I have a complete wrapper for that functionality for my app now (which took way too long to write), but that really needs to be fixed on a platform level.
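
Just as one example of how scattered this is: creating the isolated local account and putting it into a group already means dropping down to ADSI via System.DirectoryServices (a sketch with made-up names; error handling and password-policy checks omitted):

using System.DirectoryServices;

class ProvisioningSketch
{
    // Creates a local user and adds it to a local group via the WinNT ADSI provider.
    static void CreateServiceAccount(string machine, string userName, string password, string group)
    {
        DirectoryEntry computer = new DirectoryEntry("WinNT://" + machine + ",computer");

        DirectoryEntry user = computer.Children.Add(userName, "user");
        user.Invoke("SetPassword", new object[] { password });
        user.Invoke("Put", new object[] { "Description", "Isolated service account" });
        user.CommitChanges();

        DirectoryEntry localGroup = computer.Children.Find(group, "group");
        localGroup.Invoke("Add", new object[] { user.Path });
    }
}

And that covers exactly one of the tasks in the list above; ACLs, COM+ identities, IIS application pools and message queues each need yet another API.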

Categories: Technology | ASP.NET | Enterprise Services

June 5, 2004
@ 05:07 PM

It's inevitable, its security improvements are absolutely necessary and it might break your code. I would strongly suggest that you install a test box with XP SP2 now if you haven't already done so. I've had some interesting surprises today.

Categories: Technology

If you are even nearly as ignorant as every other developer including myself about any administrative aspect of SQL Server 2000 beyond the default install, this tool may be for you. I just installed it and I hate the tool already for what it tells me. Good sign.  (Thanks to still-blogless SQL Goddess Kimberly Tripp for the link)

Categories: Technology

June 2, 2004
@ 08:46 AM

Ted Neward has a crusade against DataSets going on on his blog. At this point in time, I really only ever use them inside a service and only at times when I am horribly lazy or when I code under the influence. Otherwise I just go through the rather quick and mostly painless process of mapping plain data structures (generated from schema) to and from stored procedure calls myself. More control, more interoperability, less weight. I really like when my code precisely states how my app interacts with one of the most important components: the data store.

I don't even use DataSets on ASP.NET web pages anymore. The data binding logic allows binding against anything, and if I have a public or protected property "Customer" on my page class that is a data structure, I can simply have an expression like <%# Customer.Name %> on my page and all is good. Likewise, a DataGrid happily binds against anything that is an ICollection (Array, ArrayList, ...) and the DataGridItem.DataItem property will then contain the individual element. It's just that the design-time support in VS.NET is very DataSet focused and messes things up when you click the wrong things.
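
A minimal sketch of what I mean (hypothetical page and type names):

using System;
using System.Web.UI;
using System.Web.UI.WebControls;

// No DataSet anywhere: a plain data structure bound to the page and a grid.
public class CustomerPage : Page
{
    protected DataGrid CustomerGrid;
    private Customer currentCustomer;

    // Referenced on the page as <%# Customer.Name %>
    public Customer Customer
    {
        get { return currentCustomer; }
    }

    protected override void OnLoad(EventArgs e)
    {
        base.OnLoad(e);
        Customer[] customers = LoadCustomers();   // stand-in for the schema-to-sproc mapping
        currentCustomer = customers[0];

        CustomerGrid.DataSource = customers;      // any ICollection works as a DataGrid source
        DataBind();                               // DataGridItem.DataItem is then a Customer
    }

    private Customer[] LoadCustomers()
    {
        return new Customer[] { new Customer() };
    }
}

public class Customer
{
    public string Name = "Contoso";
}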

DataSets are really cool for Windows Forms apps. By now I've reached a point where I simply conclude that the DataSet class should be banned from the server-side.

Categories: Technology | ASP.NET

May 19, 2004
@ 09:51 PM

It's rare that I give "must have" tool recommendations, but here is one: If you do any regular expressions work with the .NET Framework, go and get Roy Osherove's Regulator. Roy consolidated a lot of the best things from various free regex tools and added his own wizardry into a pretty cool "RegEx IDE".

Categories: Technology

This here reminds me of the box that's quietly humming in my home office and serves as my domain controller, firewall, RAS and DSL gateway. I upgraded the machine (a rather old 400 MHz Compaq) to Windows Server 2003 the day before I flew to TechEd Malaysia last year (August 23rd, 2003). I configured it to auto-update from Windows Update and reboot at 3:00AM in case updates have been applied.

Guess what: I got back home from that trip (which included 4 days touring the Angkor temples in Cambodia and another 10 days hanging out at the beach on Thailand's Ko Samui island) and realized that I forgot the Administrator password. Tried to get in to no avail. I've got rebuilding the box on my task list, but there's no rush. I haven't really touched or switched off the machine ever since. It keeps patching itself every once in a while and otherwise simply does its job.

Categories: Technology

I am not a “smart client” programmer and probably not even a smart client programmer and this trick has probably been around for ages, but …

For someone who’s been doing WPARAM and LPARAM acrobatics for years and still vividly recalls what (not necessarily good) you can do with WM_NCPAINT and WM_NCMOUSEMOVE (all that before I discovered the blessings of the server-side), it’s pretty annoying that Windows Forms doesn’t bubble events – mouse events specifically. It is actually hard to believe that that wouldn’t work. But I’ve read somewhere that bubbling events is “new in Whidbey”, so it is probably not my ignorance. Anyways … include the following snippet in your form (add MouseDown, MouseUp, … variants at your leisure), bind the respective events of all labels, panels and all the other “dead stuff” to this very handler (yes, all the controls share that handler) and that’ll have the events bubble up to your form in case you need them. I am just implementing custom resizing and repositioning for some user controls in a little tool and that’s how I got trapped into this. Voilà. Keep it.


protected void BubbleMouseMove(object sender, System.Windows.Forms.MouseEventArgs e)
{
      // Translate the sending control's client coordinates into this form's client coordinates
      Point pt = this.PointToClient(((Control)sender).PointToScreen(new Point(e.X,e.Y)));
      MouseEventArgs me = new MouseEventArgs(e.Button,e.Clicks,pt.X,pt.Y,e.Delta);
      // Control.OnMouseMove takes only the event args; raising it here "bubbles" the event to the form
      OnMouseMove(me);
}
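
Wiring it up (control names here are just examples) is then a matter of pointing the events of each “dead” control at the shared handler:

// e.g. in InitializeComponent or the form's constructor
label1.MouseMove += new MouseEventHandler(BubbleMouseMove);
panel1.MouseMove += new MouseEventHandler(BubbleMouseMove);
pictureBox1.MouseMove += new MouseEventHandler(BubbleMouseMove);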

Categories: Technology

I didn't spend much time on anything except writing, coding, travel, speaking and being at geek parties in the past weeks. Hence, I am sure I am the last one to notice, but I find it absolutely revolutionary that the Microsoft Visual C++ 2003 command line compiler (Microsoft C/C++ Version 13.1) is now a freebie.

Categories: Technology | CLR

I talked about transactions at several events in the last few weeks, and the sample that I use to illustrate that transactions are more than just a database technology is the little tile puzzle that I wrote a while back. For those interested who can't find it, here's the link again. The WorkSet class that is included in the puzzle is a fully functional, lightweight, in-memory two-phase-commit transaction manager that's free for you to use.
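
The shape of such a lightweight, in-memory two-phase-commit manager boils down to something like the following (purely illustrative interface and coordinator, not the actual WorkSet code):

using System.Collections;

// Illustrative only: the minimal contract and driver for an in-memory 2PC coordinator.
public interface ITwoPhaseParticipant
{
    bool Prepare();   // phase 1: promise that Commit() cannot fail
    void Commit();    // phase 2a: make the tentative change visible
    void Rollback();  // phase 2b: undo the tentative change
}

public class TwoPhaseCoordinator
{
    private ArrayList participants = new ArrayList();

    public void Enlist(ITwoPhaseParticipant p) { participants.Add(p); }

    public bool Run()
    {
        // Phase 1: ask everyone to prepare; a single veto aborts the whole unit of work.
        foreach (ITwoPhaseParticipant p in participants)
        {
            if (!p.Prepare())
            {
                foreach (ITwoPhaseParticipant q in participants) q.Rollback();
                return false;
            }
        }
        // Phase 2: everyone promised, so commit all.
        foreach (ITwoPhaseParticipant p in participants) p.Commit();
        return true;
    }
}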

Categories: Architecture | Technology

Sometimes you’re trying to fix a problem for ages (months in our case) and while the solution is really simple, you only find it by complete accident and while looking for something completely different.

(And yes, I do think that we need to finally get a network admin to take care of those things) 

For several months, our Exchange server “randomly” denied communicating with several of our partners’ mail servers. Those partners were not able to send us email and their messages would always bounce, although we could communicate wonderfully with the rest of the world. What was stunning was that there wasn’t any apparent commonality between the denied senders, and the problem came and went; sometimes it would work and sometimes it wouldn’t.

First we thought that something was broken about our DNS entries and specifically about our MX record and how it was mapped to the actual server host record. So we reconfigured that – to no avail. Then we thought it’d be some problem with the SMTP filters in the firewall and spent time analyzing that. When that didn’t go anywhere, we suspected something was fishy about the network routing – it wasn’t any of that either. I literally spent hours looking at network traces trying to figure out what the problem was – nothing.

Yesterday, while looking for something totally different, I found the issue. Some time ago, during one of the email worm floods, we put in an explicit “deny” access control entry into the SMTP service for one Korean and one Japanese SMTP server that were sending us hundreds of messages per minute. The error that we made was to deny access by the server DNS name and not by their concrete IP address.

What happened was that because of this setting our SMTP server would turn around and try to resolve every sender’s IP address back to a host name to perform that check and that’s independent of the “Perform reverse DNS lookup on incoming messages” setting in the “Delivery”/“Advanced Delivery” dialog. It would then simply deny access to all those servers for which it could not find a host name by reverse lookup. I removed those two entries and now it all works again.

Of course, the error isn’t really ours, but the problem was. What’s broken is that the whole reverse DNS lookup story is something that seems (is) really hard to set up, and quite a few mail servers simply don’t reverse-resolve to any host name. DNS sucks.

Categories: Technology

On our 4 hour taxi ride from Portoroz in Slovenia to Zagreb in Croatia, I decided to make some significant changes to my Indigo slide deck for the tour. David Chappell called my talk an “impossible problem”, mostly because the scope of the talks we are doing is so broad, ranging from the big picture of Longhorn through Avalon and WinFS to the Whidbey innovations, and I am stuck in the middle with a technology that solves problems most event attendees don’t consider themselves to have.

So I took a rather dramatic step: I dropped almost all of the slides that explain how Indigo works. What’s left is mostly only the Service Model’s programming surface. For the eight slides I dropped, I added and modified six slides from the “Scalability” talk written by Steve Swartz and myself for last year’s “Scalable Applications Tour”, which now front the talk. Until about 20 minutes into the “new” talk, I don’t speak about Indigo, at all. And that turned out to be a really good idea.

As I’ve written before, many people who attend the events on this tour have little or no experience in writing distributed applications. In reality, the classic 2-tier client/server model where all user code sits on one tier (be it Windows Forms, VB6, ASP or ASP.NET) and the other tier is the database does still rule the world. And, no, the browser doesn’t count as a tier for me; it’s just a “remote display surface” for the presentation tier.

Instead of talking about features, I now talk about motivation. Using two use-case scenarios and high-level architectural overviews modeled after Hotmail and Amazon (which everybody knows), I explain the reasons why distributing work across multiple systems is a good thing, how such systems can be separated so that each of them can scale independently, and what sort of services infrastructure is needed to implement them. And it works great. Once I have the audience nodding to the obvious goodness I can continue and map the requirements to Indigo features and explain the respective aspects of the service model. The flow of the talk is much better and the attendees get more and immediate value out of it. If I weren’t so time constrained I would probably map it to Enterprise Services (now) and Indigo (future) all in the same talk and also show how to do the transition. I am sure that I can do that sort of talk at some event this year.

Lesson learned: Less features, more why. With the majority of developers the challenge isn’t about showing them how distributed systems are being improved; it’s about getting them to understand and possibly adopt the idea in the first place.

Categories: Talks | EMEA Longhorn Preview | Technology | Indigo

January 24, 2004
@ 09:45 PM

Don says that BEA's Deputy CTO has missed the cluetrain. I absolutely agree with Don's opinion on this article and what's even worse than the things said is what the article implies. If that is BEA's official position, this is nothing less than an outing that they are passengers in the backseat of a car that is driven by IBM and Microsoft (switching drivers every once in a while) and that they're neither behind the spirit of the whole undertaking nor do they fully understand the specifications they have put their names on. Integration or standardization on the API level has failed miserably in countless attempts and any middleware company (including BEA) that is out there to compete on features must go beyond the least common denominator approach to win over customers. Does BEA have Indigo envy?

Categories: Technology | Indigo

The evolution of the in-memory message concept in the managed Microsoft Web Services stack(s) is quite interesting to look at. When you compare the concepts of System.Web.Services (ASMX), Microsoft.Web.Services (WSE) and System.MessageBus (Indigo M4), you'll find that this most fundamental element has undergone some interesting changes and that the Indigo M4 incarnation of "Message" is actually a bit surprising in its design.

ASMX

In the core ASP.NET Web Services model (nicknamed ASMX), the concept of an in-memory message doesn't really surface anywhere in the programming model unless you use the ASMX extensibility mechanism. The abstract SoapMessage class, which comes in concrete SoapClientMessage and SoapServerMessage flavors, has two fundamental states that depend on the message stage at which the message is inspected: the message is either unparsed or parsed (some say "cracked").

If it's parsed, you can get at the parameters that are being passed to the server or are about to be returned to the client, but the original XML data stream of the message is no longer available, and all headers have likewise either been mapped onto objects or lumped into an "unknown headers" array. If the message is unparsed, all you get is a text stream that you'll have to parse yourself. If you want to add, remove or modify headers while processing a message in an extension, you will have to read and parse your copy of the input stream (the message text) and write the resulting message to an output stream that's handed onwards to the next extension or to the infrastructure. In essence that means that if you had two or three ASMX-style SOAP extensions that implement security, addressing and routing functionality, you'd be parsing the message three times and serializing it three times just so that the infrastructure would parse it yet again. Not so good.
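
For illustration, this is the skeleton every ASMX SoapExtension follows (modeled on the usual trace-extension pattern): it only ever sees streams in ChainStream, so any header work in ProcessMessage means parsing the text and writing it back out again:

using System.IO;
using System.Web.Services.Protocols;

// Skeleton of an ASMX SoapExtension. Each extension in the chain sees only streams,
// so every one that touches headers parses and re-serializes the message text.
public class LoggingExtension : SoapExtension
{
    private Stream oldStream;
    private Stream newStream;

    public override Stream ChainStream(Stream stream)
    {
        oldStream = stream;
        newStream = new MemoryStream();
        return newStream;
    }

    public override void ProcessMessage(SoapMessage message)
    {
        switch (message.Stage)
        {
            case SoapMessageStage.BeforeDeserialize:
                // Read oldStream, parse the XML, modify headers, write the result to newStream.
                Copy(oldStream, newStream);
                newStream.Position = 0;
                break;
            case SoapMessageStage.AfterSerialize:
                // Same dance in the other direction for outgoing messages.
                newStream.Position = 0;
                Copy(newStream, oldStream);
                break;
        }
    }

    private void Copy(Stream from, Stream to)
    {
        StreamWriter writer = new StreamWriter(to);
        writer.Write(new StreamReader(from).ReadToEnd());
        writer.Flush();
    }

    public override object GetInitializer(System.Type serviceType) { return null; }
    public override object GetInitializer(LogicalMethodInfo methodInfo, SoapExtensionAttribute attribute) { return null; }
    public override void Initialize(object initializer) { }
}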

WSE

The Web Services Enhancements (WSE) have a simple but very effective fix for that problem. The WSE team needed to use the ASMX extensibility point, but found that if they built all their required extensions using the ASMX model, they'd run into that obvious performance problem. Therefore, WSE has its own pipeline and its own extensibility mechanism that plugs as one big extension into ASMX, and when you write extensions (handlers) for WSE, you don't get a stream but an in-memory infoset in the form of a SoapEnvelope (which is derived from System.Xml.XmlDocument and is therefore a DOM). Parsing the XML text just once and having all processing steps work on a shared in-memory object model seems optimal. Can it really get any better than "parse once" as WSE does it?
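
For comparison, a WSE 2.0 filter works against the already-parsed envelope; from memory it looks roughly like this (treat the exact base class and members as approximate):

using Microsoft.Web.Services2;

// WSE 2.0-style input filter sketch: the envelope is a DOM (SoapEnvelope derives
// from XmlDocument), so headers can be inspected and changed without reparsing.
public class TimestampCheckFilter : SoapInputFilter
{
    public override void ProcessMessage(SoapEnvelope envelope)
    {
        if (envelope.Header != null)
        {
            // Work directly against the shared in-memory infoset; no serialization here.
            System.Xml.XmlNodeList headers = envelope.Header.ChildNodes;
            // ... inspect / add / remove header elements ...
        }
    }
}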

Indigo

When you look at the Indigo concept of Message (the Message class in the next milestone will be the same in spirit, similar in concept and different in detail and simpler as a result), you'll find that it doesn't contain a reference to an XmlDocument or some other DOM-like structure. The Indigo message contains a collection of headers (which in the M4 milestone also come in an "in-memory only" flavor) and a content object, which has, as its most important member, an XmlReader-typed Reader property.

When I learned about this design decision a while ago, I was a bit puzzled why that's so. It appeared clear to me that if you kept the message parsed in a DOM, you'd have a good solution if you want to hand the message down a chain of extensibility points, because you don't need to reparse. The magic sentence that woke me up was "We need to support streaming". And then it clicked.

Assume you want to receive a 1GB video stream over an Indigo TCP multicast or UDP connection (even if you think that's a silly idea - work with me here). Because Indigo will represent the message containing that video as an XML Infoset (mind that this doesn't imply that we're talking about base64-encoded content in a UTF-8 angle-bracket document and therefore 2GB on the wire), we'd have some problems if there were a DOM-based solution. A DOM like XmlDocument is only ready for business when it has seen the end tag of its source stream. This is not so good for streams of that size, because you surely would want to see the video stream as it downloads and, if the video stream is a live broadcast, there may simply be no defined end: the message may have a virtually infinite size with the "end-tag" being expected just shortly before judgment day.

There's something philosophically interesting about a message relaying a 24*7*365 video stream where the binary content inside the message body starts with the current video broadcast bits as of the time the message is generated and then never ends. The message can indeed be treated as being well-formed XML because there is always a theoretical end to it. The end-tag just happens to be a couple of "bit-years" away.

Back to the message design: When Indigo gets its hands on a transport stream, it layers a Message object over the raw bits available on that stream using an XmlReader. Then it peeks into the message and parses soap:Envelope and everything inside soap:Header. The headers it finds go into the in-memory header collection. Once it sees soap:Body, Indigo stops and backs off. The result of this is a partially parsed in-memory message for which all headers are available in memory and the body of the message is left sitting in an XmlReader. When the XmlReader sits on top of a NetworkStream, we now have a construct where Indigo can already work on the message and its control information (headers) while the network socket is still open and the rest of the message is still arriving (or portions haven't even been sent by the other party).
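
The effect is easy to approximate with the plain XmlReader API: buffer everything up to soap:Body and then stop, leaving the reader parked there (my sketch, not Indigo code; error handling trimmed):

using System.Collections;
using System.Xml;

class HeaderScanSketch
{
    const string SoapNs = "http://schemas.xmlsoap.org/soap/envelope/";

    // Reads the soap:Header children into memory and returns with the reader
    // positioned on soap:Body; the body bits stay behind on the (network) stream.
    static ArrayList ReadHeaders(XmlReader reader)
    {
        ArrayList headers = new ArrayList();

        reader.MoveToContent();                       // soap:Envelope
        reader.ReadStartElement("Envelope", SoapNs);
        reader.MoveToContent();

        if (reader.LocalName == "Header" && reader.NamespaceURI == SoapNs)
        {
            reader.ReadStartElement("Header", SoapNs);
            while (reader.MoveToContent() == XmlNodeType.Element)
            {
                headers.Add(reader.ReadOuterXml());   // buffer each header element
            }
            reader.ReadEndElement();                  // </soap:Header>
            reader.MoveToContent();
        }
        // The reader now sits on soap:Body; nothing behind it has been pulled yet.
        return headers;
    }
}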

Unless an infrastructure extension must touch the body (in-message body encryption or signature do indeed spoil the party here), Indigo can process the message, just ignore the body portion and hand it to the application endpoint for processing as-is. When the application endpoint reads the message through the XmlReader it therefore pulls the bits directly off the wire. Another variant of this, and the case where it really gets interesting, is that using this technique, arbitrary large data streams can be routed over multiple Indigo hops using virtualized WS-Addressing addressing where every intermediary server just forwards the bits to the next hop as they arrive. Combine this with publish and subscribe services and Indigo's broadcasting abilities and this is getting really sexy for all sorts of applications that need to traverse transport-level obstacles such as firewalls or where you simply can't use IP.     

For business applications, this support for very large messages is not only very interesting but actually vital for a lot of applications. In our BizTalk workshops we've had quite a few customers who exchange catalogs for engineering parts with other parties. These catalogs easily exceed 1GB in size on the wire. If you want to expand those messages up into a DOM you've got a problem. Consequently, neither WSE nor ASMX nor BizTalk Server nor any other DOM based solution that isn't running on a well equipped 64-bit box can successfully handle such real-customer-scenario messages. Once messages support streaming, you have that sort of flexibility.

The problem that remains with XmlReader is that once you touch the body, things get a bit more complex than with a DOM representation. The XmlReader is a "read once" construct that usually can't be reset to its initial state. That is specifically true if the reader sits on top of a network stream and returns the translated bits as they arrive. Once you touch the message content in the infrastructure, the message is therefore "consumed" and can't be used for further processing. The good news is, though, that if you buffer the message content into a DOM, you can layer an XmlNodeReader over the DOM's document element and forward the message with that reader. If you only need to read parts of the message or if you don't want to use the DOM, you can layer a custom XML reader over a combination of your buffered data and the original XmlReader.
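
In code, that buffering trick is only a few lines (sketch):

using System.Xml;

class BodyBufferSketch
{
    // Consumes the body from the original (read-once) reader into a DOM and
    // hands back a fresh reader over the buffered content for the next hop.
    static XmlReader BufferAndReplay(XmlReader bodyReader)
    {
        XmlDocument buffer = new XmlDocument();
        buffer.Load(bodyReader);                           // this drains the original reader

        // ... inspect or transform 'buffer' here ...

        return new XmlNodeReader(buffer.DocumentElement);  // replayable reader for forwarding
    }
}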

Categories: Technology | Indigo | Web Services

December 23, 2003
@ 09:32 AM

The suggested answer to the first question is incorrect and illustrates a security problem.

Categories: Technology

October 28, 2003
@ 09:31 PM

Stephen's world is ruled by tables, rows and columns. That's fine. WinFS uses the power of Yukon to index anything it stores; but what it stores doesn't end up in rows and columns. There's goodness in mixing these things. I'll keep working on my relational friend. I'll succeed.

Categories: Technology

Brad More is asking whether and why he should use Enterprise Services.

Brad, if you go to the PDC, you can get the definitive, strategic answer on that question in this talk:

“Indigo”: Connected Application Technology Roadmap
Track: Web/Services   Code: WSV203
Room: Room 409AB   Time Slot: Wed, October 29 11:30 AM-12:45 PM
Speakers: Angela Mills, Joe Long

Joe Long is Product Unit Manager for Enterprise Services at Microsoft, a product unit that is part of the larger Indigo group. The Indigo team owns Remoting, ASP.NET Web Services, Enterprise Services, all of COM/COM+ and everything that has to do with Serialization.

And if you want to hear the same song sung by the technologyspeakmaster, go and hear Don:

“Indigo": Services and the Future of Distributed Applications
Track: Web/Services   Code: WSV201
Room: Room 150/151/152/153   Time Slot: Mon, October 27 4:45 PM-6:00 PM
Speaker: Don Box

If you want to read the core message right now, just scroll down here. I've been working directly with the Indigo folks on the messaging for my talks at TechEd in Dallas earlier this year as part of the effort of setting the stage for Indigo's debut at the PDC.

I'd also suggest that you don't implement your own ES clone using custom channel sinks, context sinks, or formatters, and that you ignore the entire context model of .NET Remoting, if you want to play in Indigo-Land without having to rewrite a great deal of your apps. The lack of security support in Remoting is not a missing feature; Enterprise Services is layered on top of Remoting and provides security. The very limited scalability of Remoting on any transport but cross-appdomain is not a real limitation; if you want to scale, use Enterprise Services. Check out this page from my old blog for a few intimate details on transport in Enterprise Services.

ASMX is the default, ES is the fall-back strategy if you need the features or the performance, and Remoting is the cheap, local ORPC model.

If you rely on ASMX and ES today, you'll have a pretty smooth upgrade path. Take that expectation with you and go to Joe's session.

[PS: What I am saying there about ES marshaling not using COM/Interop is true except for two cases that I found later: Queued Components and calls with isomorphic call signatures where the binary representation of COM and the CLR is identical - like with a function that passes and returns only ints. The reason why COM/Interop is used in those cases is very simple: it's a lot faster.] 

Categories: PDC 03 | Technology | COM | Enterprise Services | Indigo

October 1, 2003
@ 01:43 PM

We just had a short discussion here at the office on the goodness and badness of using Reflection to cheat around private and protected and cases where it does and doesn't work (it's of course a System.Security.Permissions.ReflectionPermission thing). The discussion brought back memories of that old C/C++ hack that I've been using for almost any application back in my Borland C++ and OWL days:

#define private public
#define protected public
#include <owl.h>


Categories: Technology | CLR

Dan Farino, who wrote the CLR-based, Regex extended stored procedure on which I put a warning sign yesterday, wrote me an email back (I notified him that I wrote that blog entry) and told me that he just uploaded an unmanaged version. Haven't downloaded it yet, but it seems to be functionally equivalent. If it's stable and quick, I can think of 2 bazillion uses for this -- including, of course, Regex-based XML parsing inside SQL Server (while we wait for Yukon).

Categories: CLR | Technology | XML

August 10, 2003
@ 07:21 AM

Tim Bray, author of the Namespace spec, enlightens us here that I am technically wrong by using the term "empty namespace". Yes. Absolutely. But the "not member of any namespace" vs. "member of the empty namespace" distinction becomes meaningless once you start coding against an XML infrastructure, because in the programming models, there is pretty much always a namespace qualifier. In the .NET Framework, for instance, the NamespaceURI is an empty string in such cases (... and it should really be null).
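
A two-line check shows the behavior:

using System;
using System.Xml;

class EmptyNamespaceCheck
{
    static void Main()
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml("<item><title>just some tags</title></item>");

        // NamespaceURI is the empty string (not null) for elements in no namespace: prints []
        Console.WriteLine("[" + doc.DocumentElement.NamespaceURI + "]");
    }
}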

Categories: Technology | XML

Jon Udell writes in his most recent column that some think that there is a "controversy" about the use of XML namespaces. This seems to stem from the sad fact that RSS never got a proper namespace assigned to it and yet is one of the hottest schema specs in the XML space right now. Sorry, there may be people in disbelief, but the XML Namespaces spec is normative and referenced in the current XML 1.0 (Second Edition) spec. The empty namespace is a namespace.

Some notable experts — including Sean McGrath, CTO of Propylon in Dublin, Ireland — argue that namespaces should be avoided for that reason.

You can't avoid namespaces; they are automatic if you use XML today. If you don't declare one for your vocabulary/schema, you are contributing to a large cloud of "stuff" sitting in the "not part of any" namespace. The "empty" namespace (which essentially says "not part of any namespace") is the XML equivalent of the "these are just some tags" garbage dump.

Categories: Web Services | Technology | XML

A good deal of yesterday and some of this morning I've been fiddling around with nested ASP.NET DataGrids. Binding nested grids is pretty easy and they show all you want, but editing items in a nested grid just doesn't work as easily as editing in a simple grid. In fact, it doesn't work at all. What happens is that you can put a nested grid into edit mode, but you never seem to be able to catch any Update/Cancel events from the edited item.

I tried to look for a solution by asking Google, but the answers that I found were very unsatisfactory, since there was no explanation on why exactly it doesn't work. So, here's why ... and it's very, very simple: Nested DataGrids lose all of their ViewState on any roundtrip. That seems to be some sort of problem that's actually related to how the entire TemplateControl infrastructure works, but that's what it is.

Since that's the case, the EditItemIndex isn't preserved across the roundtrip and the DataGrid doesn't know how to dispatch the Update event. Now, how do I work around it? Again, pretty simple: You need to store the EditItemIndex (and SelectedItemIndex, etc.) of the nested data-grid in the Page's ViewState whenever they change (Edit event, Cancel event, etc.), keyed by the UniqueID of the DataGrid and a matching suffix. When you reload the sub-grid on a roundtrip, recover the value(s) from the ViewState of the page and DataBind(). 
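
In code, the workaround amounts to something like this (a sketch; BindNestedGrid and the event wiring are hypothetical helpers of mine):

using System;
using System.Web.UI;
using System.Web.UI.WebControls;

public class NestedGridPage : Page
{
    // Remember the nested grid's edit index in the *page's* ViewState,
    // because the nested grid loses its own ViewState on every roundtrip.
    private void NestedGrid_EditCommand(object source, DataGridCommandEventArgs e)
    {
        DataGrid nestedGrid = (DataGrid)source;
        ViewState[nestedGrid.UniqueID + "_EditItemIndex"] = e.Item.ItemIndex;
        BindNestedGrid(nestedGrid);
    }

    private void NestedGrid_CancelCommand(object source, DataGridCommandEventArgs e)
    {
        DataGrid nestedGrid = (DataGrid)source;
        ViewState.Remove(nestedGrid.UniqueID + "_EditItemIndex");
        BindNestedGrid(nestedGrid);
    }

    // Called whenever the nested grid is (re)bound, e.g. from the parent grid's ItemDataBound.
    private void BindNestedGrid(DataGrid nestedGrid)
    {
        object editIndex = ViewState[nestedGrid.UniqueID + "_EditItemIndex"];
        nestedGrid.EditItemIndex = (editIndex != null) ? (int)editIndex : -1;
        // nestedGrid.DataSource = ...;   // whatever the sub-grid's data is
        nestedGrid.DataBind();
    }
}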

I've put the workaround into my current working copy for dasBlog (the OPML editor gets a hierarchical editor now) and it works great. Next build that gets released, you can look at it.


Categories: Technology | ASP.NET

July 24, 2003
@ 10:37 AM

The UDDI OPML list is now being cached in memory and refreshed every three minutes on a background thread. Unless the AppDomain recycles, the responses should be instantaneous now, unless you are doing a query on categories; that needs to happen live, because I don't want to cache each and every request combination on our server here and have it take up too much memory for this demo.

I looked at IBM's UDDI service yesterday and at least the web interface is several orders of magnitude quicker (from here). I should probably play with their service a little to improve response times. ( ... just did, IBM and SAP aren't any faster)

Categories: Technology | UDDI

July 22, 2003
@ 05:05 PM
Category-based UDDI RSS search. Done.
Categories: Technology | UDDI

July 22, 2003
@ 03:18 PM

Matevz Gacnik got it and put himself on the list 

(Note: I set the output cache to expire every 180 seconds and therefore it mostly takes a little while to rebuild by getting the fresh data from UDDI) 

Categories: Technology | UDDI

July 22, 2003
@ 09:57 AM
I wrote a little OPML renderer that grabs all RSS feeds that are registered in Microsoft's UDDI registry
Categories: Technology | UDDI

July 21, 2003
@ 10:10 PM
A quick overview about my changes to BlogX and why it just isn't BlogX anymore. And, yes, you'll get it.
Categories: Blog | Technology | ASP.NET

BloggerAPI, MT API, MetaWeblog API, Comment API, Pingback API, Trackback  ...  are you nuts?

I must admit that until last week I didn't really pay much close attention to all the blogging related APIs and specs beyond "keeping myself informed". Today I copied my weekend's work over to this server and now I have all of them implemented as client and server versions. Sam's and Mark's validator is happy with my RSS 2.0 feed and the experimental Atom (Pie/Echo) feed.

I have to say ... the state of affairs in this space is absolutely scary. Most of the specs, especially for the APIs, are lacking proper detail, are often too informal with too much room for ambiguities, and you need to be lucky to find a reasonably recent one. Sam laments that people don't read specs carefully and I agree, but I would argue that the specs need to be written carefully, too. It also seems that because the documentation on expected behavior is so thin, everybody implements their own flavor and extensions; not only do the APIs have huge overlap, but it seems like any random selection of offline blogging tools will use its own arbitrary selection of these APIs in any random order. Since my implementation didn't "grow" over time but was written in one shot, essentially only since last Thursday, I had to look at all of this at once, and what I found was just saddening. All of this has to be consolidated, and it will be.

I am all for the Atom project and creating a consolidated, SOAP-based API for all blogging functions that the aforementioned APIs offer. XML-RPC was a good thing to start with but its time is up.  I am also for replacing RSS x.x with a spec that's open and under the umbrella of a recognized standards body and not of a law school, that's XML as of ca. 2003 and not as of ca. 1998, and that's formally documented (with a proper schema). What's there right now smells all like "let's hack something up" and not very much like serious software engineering. Ok, it's proven that it all works, but how about dumping the prototypes now?


Categories: Blog | Technology | ASP.NET | Weblogs | Atom

July 19, 2003
@ 08:39 AM

This morning I got up early (I'm going to be picked up to play paintball in an hour or so) and implemented image and attachment uploads for the blogging site. This is the test for the live site.

[Here's a copy of the SoapExtension Wizard for Visual Studio.NET: ASPNETSoapExtensionWizard.zip (53.82 KB)]


Categories: Technology | ASP.NET | Blog

July 18, 2003
@ 08:28 PM

Productivity and ASP.NET

It took me less than an hour to implement, test and deploy pingback support for this blog here using ASP.NET and XML-RPC.NET (and that includes reading the spec). Yesterday and today, it took me less than 2hrs total (including addressing two comments/suggestions/corrections from Sam Ruby) to get (n)echo/pie/atom support working so that it can be validated.

Categories: Technology | ASP.NET

A little IHttpModule implementation for ASP.NET that maps between URLs using regular expressions. In use here.
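
The core of such a module is small; a minimal sketch (mapping rule and names invented for illustration) looks like this:

using System;
using System.Text.RegularExpressions;
using System.Web;

// Minimal sketch of a regex-based URL mapping module (not the actual implementation).
public class RegexUrlMappingModule : IHttpModule
{
    public void Init(HttpApplication application)
    {
        application.BeginRequest += new EventHandler(OnBeginRequest);
    }

    public void Dispose() { }

    private void OnBeginRequest(object sender, EventArgs e)
    {
        HttpApplication application = (HttpApplication)sender;
        string path = application.Request.Path;

        // Example rule: /2003/07/22.aspx -> /archive.aspx?year=2003&month=07&day=22
        Match match = Regex.Match(path, @"/(\d{4})/(\d{2})/(\d{2})\.aspx$");
        if (match.Success)
        {
            application.Context.RewritePath(
                "/archive.aspx?year=" + match.Groups[1].Value +
                "&month=" + match.Groups[2].Value +
                "&day=" + match.Groups[3].Value);
        }
    }
}
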
Categories: Technology | ASP.NET | Blog

I couldn't find one, so I made a WS-PolicyAttachment UDDI bootstrap file for import into Windows UDDI Services.

When I put that together, I ran into a bug in the spec. Point 5.1 shows the tModel for the remote policy reference. The tModelKey shown there is

<tModel tModelKey="uuid:0b1b5a47-bebf-3b7d-9802-f2dd80a91adebd3966a8-faa5-416e-9772-128554343571">

which is a bit long for a uuid, isn't it? The correct key is the following (as the spec later explains):

<tModel tModelKey="uuid:0b1b5a47-bebf-3b7d-9802-f2dd80a91ade">

The bug even survived the revision from 1.0 to 1.1, which makes me wonder whether anyone ever reads these specs in any depth.

Categories: Web Services | Technology | UDDI

The current output of this is:

---------------
Feed Name: Clemens Vasters: Enterprise Development & Alien Abductions
Access Point:
http://radio.weblogs.com/0108971/rss.xml
RSS Version: 2.0
Description: Clemens Vasters' Weblog, Language: en
Press any key to continue

Categories: Technology | UDDI

Getting serious about UDDI, Step 1

If you search uddi.microsoft.com for "Services" by the "RSS - Version 2.0" tModel, there's exactly one entry at this time. Mine. Fix that.

Using the UDDI SDK 2.0 (part of the Platform SDK), the following snippet is a simple console app that lists all weblogs registered with the RSS 2.0 tModel in the Microsoft UDDI registry [thanks to Karsten Januszewski for his doc, tModel bootstrap and code]. The advantage? If aggregators were able to remember my service key instead of, or in addition to, my absolute access point (right now http://radio.weblogs.com/0108971/rss.xml), I could move the RSS feed around to any arbitrary location without any pain and clients would still be able to find it using a simple lookup into the registry. And the infrastructure is all there. No future thing.

using System;
using Microsoft.Uddi;
using Microsoft.Uddi.TModels;
using Microsoft.Uddi.Services;

namespace ListRSS20Feeds
{
    class MainApp
    {
        static void Main(string[] args)
        {
            UddiConnection uddiConnection = new UddiConnection();
            /* RSS 2.0 tModel key from config */
            string RSS20TModelKey = "uuid:bacbe300-4b2b-11d7-bc51-000629dc0a53";
            /* setup the UDDI parameters on the connection */
            uddiConnection.InquireUrl = "http://uddi.microsoft.com/inquire";


            /* create a UDDI FindService and ServiceList objects */

            FindService fs = new FindService();
            ServiceList sl = new ServiceList();
            /* add the rss tModel key */
            fs.TModelBag.Add( RSS20TModelKey );


            try
            {
                /* send to uddi */
                sl = fs.Send(uddiConnection);
            }
            catch ( Exception ex )
            {
                Console.WriteLine( ex.Message );
                return;
            }

            /* create FindBinding and BindingDetail objects */
            FindBinding fb = new FindBinding();
            BindingDetail bd = new BindingDetail();
            fb.TModelBag.Add( RSS20TModelKey );

            /* get the bindings */
            foreach ( ServiceInfo si in sl.ServiceInfos  )
            {
                /* set the serviceKey */
                fb.ServiceKey = si.ServiceKey;
                try
                {
                    /* send to UDDI */
                    bd = fb.Send(uddiConnection);
                }
                catch ( Exception ex )
                {
                    Console.WriteLine( ex.Message );
                    return;
                }
                foreach ( BindingTemplate bt in bd.BindingTemplates )
                {
                    Console.WriteLine(

                        "---------------\n"+
                        "Feed Name: {0}\n"+
                        "Access Point: {1}", 
                        si.Names[0].Text, 
                        bt.AccessPoint.Text);

                    /* get out the tModelInstanceInfo for the tModelKey 
                        that represents RSS 2.0 and get the version info */
                    foreach ( TModelInstanceInfo tmii in bt.TModelInstanceInfos )
                    {
                        if ( tmii.TModelKey == RSS20TModelKey ) 
                        {
                            if ( tmii.InstanceDetails.InstanceParameters != null )
                                Console.WriteLine("RSS Version: {0}", 
                                  tmii.InstanceDetails.InstanceParameters);
                            else
                                Console.WriteLine("RSS Version: n/a");
                        }
                    }
                    
                    if ( bt.Descriptions.Count > 0 )
                    {
                        Console.WriteLine( "Description: {0}, Language: {1}", 
                           bt.Descriptions[0].Text, 
                           bt.Descriptions[0].IsoLanguageCode);
                    }
                }
            }
        }
    }
}

Categories: Technology | UDDI

H2/2003, moving up one notch on the WS stack.

Yesterday, all the travel madness of H1/2003, which began in January, officially ended. I have a couple of weeks at the office ahead of me and that's, even if it may sound odd, a fantastic thing. During the first half of the year, and quite a bit of last year too, I spent most of my research time working deep down in the public and not-so-public extensibility points of Enterprise Services and Web Services, trying to understand the exact details of how they work, figuring out how to inject more and tweak existing functionality, and exploring whether certain development patterns such as AOP could enhance the development experience and productivity of my clients (and all of you out there who are reading my blog). I've been in 21 countries in this first half of the year alone and at about 40 different events, talking about what I found while working with these technologies on some more and some less serious projects. Doing that and speaking to people, I learned a lot, and I also think that I helped to inspire quite a few people's thinking.

Now it's time to move on and focus on the bigger picture. Starting with version 2.0 of the Microsoft Web Service Enhancements, due out by the end of this summer, Web Services will finally become less Web and more Services. The WSE 2.0 stack will break the tie between HTTP and SOAP by enabling other transports, and it will add support for some of the most important WS-* specs such as WS-Policy, WS-Addressing and related specs. The now released UDDI services in Windows Server 2003 put a serious local UDDI registry at my fingertips. BizTalk Server 2004's new orchestration engine looks awesome. There's a lot of talk about Service Oriented Architectures, but too little to see and touch for everyone to believe that this stuff is real. I think that's a good job description for H2/2003. My UDDI provider key: 7f0baedf-3f0d-4de1-b5e7-c35f668964d5

Categories: Web Services | Technology | UDDI