I just read the post Privacy in the Smart Home - Why we need an Intranet of Things by Kai Kreuzer from the openHAB.org project, in which he advocates an "Intranet of Things" enabled by a local integration hub, a model I refer to as "local gateway" in my "Service Assisted Communication for Connected Devices" post:

All connections to and from the device are made via or at least facilitated via a gateway, unless the device is peered with a single service, in which case that service takes on the role of the gateway. Eventual peer-to-peer connections are acceptable, but only if the gateway permits them and facilitates a secure handshake. The gateway that the device peers with may live on the local network and thus govern local connections. Towards external networks, the local gateway acts as a bridge towards the devices and is itself connected by the same set of principles discussed here, meaning it's acting like a device connected to an external gateway.

OpenHAB is an integration hub and automation software for home automation that runs on top of the JVM across a range of platforms and also scales down to the Raspberry Pi. A motivation for Kreuzer's post seems to be to announce the new companion service:

To cater for secure remote access, we have furthermore just started a private beta of a new service: my.openHAB will provide you the ability to connect to your openHAB over the Internet, securely, through commercial SSL certificates, without a need for making any holes in your home router and without a need for a static IP or dynamic DNS service. It does not store any data, but simply acts as a proxy that blindly forwards the communication.

The reason I'm picking up the post and commenting on it here is twofold: First, the way openHAB acts towards devices and federates with its "my.openHAB" service is a splendid illustration of the "Service Assisted Communication" principles I spelled out in my write-up. Mind that I explicitly mentioned there that they're broadly implemented already, and this is supporting evidence. Second, while I agree with the architectural foundation and do find a pure "Intranet of Things" notion interesting, I don't think that's how things will play out in the long run, and I also believe there is, very unfortunately, a bit too much fear-mongering involved in trying to bring the point home. I also think there's a discussion to be had about explicit privacy tradeoffs.

The key concerns that are being raised are the following:

  • You are not the owner of your data; everything is sent to the cloud server, if you wish it or not. What happens with the data is not decided by yourself, but by the cloud service. You will only receive results of the data mining processes, be it as "smart" actions being triggered or as a colorful time series chart. I always thought of this as a no-go and wondered that other people did not mind this fact. […]
  • Even if you have full trust the cloud service company, the NSA affair should have shown you that your data is sniffed and stored in dubious places around the world. […]
  • Every device that creates a connection to a cloud service is a potential security risk. Most of these devices are embedded systems and many lack the possibility of receiving firmware updates for vulnerabilities. There are already many examples where such systems have been hacked - e.g. for heating systems or IP cameras. […]

Let's look at these.

First, whether or not you are the owner of your data when using a cloud service is a matter of the service's clear and explicit privacy policy as well as of legal regulation. I am personally an advocate of regulatory frameworks governing the use of telemetry, and I point out the importance of implementing clear privacy policies, including ways for customers to opt out of data collection and to have any previously collected data provably destroyed.

But I also believe that telemetry data collected by manufacturers of devices will yield better products and will help make these products more reliable as we use them.

The privacy problem is not one of "cloud". The problem is whether you trust the manufacturer and service provider and whether you understand the policies. If the privacy policy is 5 pages of 5pt legalese, return the product to the store or don't connect it to a network, ever. Because however good your intentions about keeping things private are, if a regular consumer buys a network-enabled appliance and connects it to a local network, that device will, in very many cases, promptly phone home to the manufacturer saying at least that it has been activated, and it will do so regardless of whether there's a home hub on the network. That is not a cloud problem. That is a device problem. What is the device gesture to opt into the "customer experience improvement program"?

I strongly believe that very many customers, indeed the vast majority, will gladly make a privacy tradeoff if they see obvious benefits, if the service provider is honest and transparent about what is being collected and what the customer's rights are, and if the customer can trust that an opt-out leads to an effective destruction of the raw data they've contributed and of any data that could further be traced to their identity. There's obviously a gray zone on aggregate data. Opting out now clearly won't change the count of "How many dishwashers were activated in the city of Mönchengladbach in January 2014". Earning trust with concerned customers means drawing the line on that gray zone clearly. What if the manufacturer cheats? Sue them along with 10,000 of your best friends.

The way we can make this scale is by supervision. I believe it would be possible to have a globally standardized and auditable privacy-practices seal along the lines of ISO 900x by 2018, and ways to anchor this privacy seal in the consumer hive-mind by that time. "If it doesn't carry this label, don't buy this product." The existence of that seal will also make competitors keep a very close eye on each other's practices and speak up loudly if they see infringements.

Once there is clarity and an auditable process on privacy practices, and data collection is opt-in, only then can we even get to the question of consumer choice. All of this is a prerequisite for even enabling consumers to make a choice between a local hub and a cloud service to connect their devices to. Without such a framework, manufacturers can largely do whatever they like once you give the devices network access.

What benefits would customers trade some of their data privacy for? Remote control of devices around the home, energy-efficiency management for their heating and cooling systems, avoiding utility grid black-/brownouts with service credit for opting in, device feature updates, general usage statistics, seamless home/mobile/work user experiences, rental property management, and more. Most scenarios that go beyond simple remote control and local stats do require pooling data in the cloud and producing insights on top of which manufacturers, service providers, and utilities can provide higher-level services. Some people will find it creepy to get a notification that the grinder in their coffee-maker is about to fail due to wear and tear, asking whether they want it replaced – I, for one, would welcome that with open arms.

Kreuzer's second point about the NSA and other government agencies is one that I'm sympathetic with, but it's also a sad one to bring up, because he's announcing a service that falls into the same category as all cloud services, and he's assuming that an Intranet is generally safe from snooping. Let me preface this with the reminder that I'm speaking for myself and not at all for my employer here. The fact of the matter is that when the government of the country where the gateway service is hosted walks in with a court warrant, the good intentions come to a screeching halt or the service does. It is in the best commercial interest of all public cloud providers to keep customers' data private, just as it is in the altruistic best interest of openHAB. The motivations may differ, but the goal is the same. We all want to lock the spies out and will do so until the Gewaltmonopol (the state's monopoly on physical force) shows up. The state's ability to force providers to act against their will and goals also extends to the telecom operators, and has for decades. If you bring up "NSA" as an argument for keeping things on the Intranet, you will also have to allow the conspiracy theory that operator-supplied cable and DSL modems can be abused as bridgeheads into local area networks.

With this I am not defending, belittling, or justifying anything that we've learned about recently from the Snowden disclosures. I believe we've been betrayed by the governments, but fixing this is a political cleanup task and not a technical one. If the state shows up with a court order (even secretly if allowed by law) they're entitled to whatever that order says. If there's no such order, the government is clearly acting against the law – which computer systems can't read and interpret. What we can do is tighten security across the board, but it's an illusion to consider the "Intranet" a safe haven.

Which gets me to the third point about "every device that creates a connection to a cloud service is a potential security risk" which I consider to be tragically shortsighted. If we broaden the scope, though, it becomes instantly true: "every device that creates a connection is a potential security risk".

Home Intranets are the least defended and most negligently secured network spaces in existence. If you connect a Blu-ray player, a smart TV, or the legendary Refrigerator to your home network, that device has a very broad bouquet of options for seeing things and talking to things. And you will have no idea what it actually does unless you're skilled enough to use a tool like Wireshark for traffic analysis, which is only true of total network geeks.

In all actuality, it frightens me much less that the Refrigerator sends an hourly health-status packet to the manufacturer than that the Refrigerator has any access to anything on my network without me explicitly approving it. For the exact reasons that Kreuzer cites: most of these devices are embedded systems, and many lack the possibility of receiving firmware updates for vulnerabilities.

I want those devices off my private network rather than on it for those exact reasons. Exactly contrary to the "Intranet" mantra, I would want devices that want to piggyback on my home network to be banned from talking to anything but the outside network, either by way of a special flag in the MAC address and forced routing rules, and/or by forcing them into an IPsec tunnel with the network gateway device. And I will only unblock them when I want to. Otherwise I'm perfectly fine with those devices carrying their own GSM SIM or other long-range RF circuit and communicating with an external network once I have agreed to a policy allowing that and/or have explicitly enabled that functionality. I personally prefer for devices to rendezvous in public network space, where they are treated as potentially hostile to each other.

I believe that the notion of by-default privileged mutual access for an arbitrary hodgepodge of devices, by the sole fact that they are plugged into the same network, is asking for trouble. Tricking devices into downloading and executing malicious payloads will be the favorite mass-exploitation vector for getting a local bridgehead into the home. Going through a local hub will help with that, but it requires that all devices use it, which I consider wishful thinking at best. My second-most favorite vector, and the one with the potential to inflict direct physical or monetary harm, is parking a van in front of the house and going straight through poorly protected local radio traffic based on flawed standards with weak protection, of which there are still many in home automation. That's not out of reach for a skilled stalker, a would-be burglar, or a private investigator doing a "background check". So now you've got someone on the "Intranet".

I believe in the model of having federations of local and external gateways help with protecting and governing access to devices, and I laid this out in my previous post in great detail. But I also believe that we can't trust any of the devices we bring home from the store, and that the notion of an "Intranet" is naively dangerous and will become worse as we connect more devices. The privacy issue is one we need to tackle by (self-)regulatory means and by establishing a model that allows consumers to make informed decisions about whether a product is trustworthy, and we need to establish measures to audit this and to sanction violations. Privacy is not nearly as simple as cloud versus local. Privacy is about trust, trustworthiness, and betrayal.


There is good reason to be worried about the "Internet of Things" on its current course and trajectory. Both the IT industry and manufacturers of "smart products" seem to look at connected special-purpose devices and sensors as a mere variation of information technology assets like servers, PCs, tablets, or phones. That stance is problematic, as it neglects important differences between the kinds of interactions we're having with a phone or PC and the interactions we're having with special-purpose devices like a gas valve, a water heater, a glass-break sensor, a vehicle immobilizer, or a key fob.

Before I get to a proposal for how to address the differences, let's take a look at the state of things on the Web and elsewhere.

Information Devices

PCs, phones, and tablets are primarily interactive information devices. Phones and tablets are explicitly optimized around maximizing battery lifetime, and they preferably turn off partially when not immediately interacting with a person, or when not providing services like playing music or guiding their owner to a particular location. From a systems perspective, these information technology devices are largely acting as proxies towards people. They are "people actuators" suggesting actions and "people sensors" collecting input.

People can, for the most part, tell when something is grossly silly and/or could even put them into a dangerous situation. Even though there is precedent of someone driving off a cliff when told to do so by their navigation system, those cases are the rarest exceptions.

Their role as information-gathering devices, allowing people to browse the Web and to use a broad variety of services, requires these devices to be "promiscuous" towards network services. The design of the Web, our key information tool, centers on aggregating, combining, and cross-referencing information from a myriad of different systems. As a result, the Web's foundation for secure communication is aligned with the goal of this architecture. At the transport protocol level, Web security largely focuses on providing confidentiality and integrity for fairly short-lived connections.

User authentication and authorization are layered on top, mostly at the application layer. The basic transport-layer security model, including server authentication, builds on a notion of federated trust anchored in everyone (implicitly and largely involuntarily) trusting a dozen handfuls of certification authorities (CAs) chosen by their favorite operating-system or browser vendor. If one of those CAs deems an organization trustworthy, it can issue a certificate that will then be used to facilitate secure connections, also meaning to express an assurance to the user that they are indeed talking to the site they expect to be talking to. To that end, the certificate can be inspected by the user – if they know and care where to look.

This federated trust system is not without issues. First, if the signing key of one of the certification authorities were to be compromised, potentially undetected, whoever is in possession of the key can now make technically authentic and yet forged certificates and use those to intercept and log communication that is meant to be protected. Second, the system is fairly corrupt as it takes all of $3 per year to buy a certification authority's trust with minimal documentation requirements. Third, the vast majority of users have no idea that this system even exists.

Yet, it all somehow works out halfway acceptably, because people do, for the most part, have common sense enough to know when something's not quite right, and it takes quite a bit of work to trick people into scams in huge numbers. You will trap a few victims, but not very many and not for very long. The system is flawed and some people get tricked, but that can also happen at the street corner. Ultimately, the worst that can happen – without any intent to belittle the consequences – is that people get separated from some of their money, or their identities get abused until the situation is corrected by intervention and, often, some insurance steps in to rectify these not entirely unexpected damages.

Special-Purpose Devices

Special-purpose devices, from simple temperature sensors to complex factory production lines with thousands of components inside them, are different. These devices are much more scoped in purpose, and even if they provide some level of people interface, they're largely scoped to interfacing with assets in the physical world. They measure and report environmental circumstances, turn valves, control servos, sound alarms, switch lights, and do many other tasks. They help do work for which an information device is either too generic, too expensive, too big, or too brittle.

If something goes wrong with automated or remote controllable devices that can influence the physical world, buildings may burn down and people may die. That's a different class of damage than someone maxing out a stolen credit-card's limit. The security bar for commands that make things move, and also for sensor data that eventually results in commands that cause things to move, ought to be, arguably, higher than in an e-commerce or banking scenario.

What doesn't help on the security front is that machines, unlike most people, don't have a ton of common sense. A device that goes about its day in its programmed and scheduled ways has no notion of figuring out when something is not quite right. If you can trick a device into talking to a malicious server or intermediary, or into following a network protocol redirection to one, it'll dutifully continue doing its work unless it's explicitly told never to do so.

Herein lies one of the challenges. A lot of today's network programming stacks and Web protocols are geared towards the information-oriented Web and excel at enabling promiscuous clients by default. In fact, the whole notion of REST rests on the assumption that the discovery and traversal of resources is performed through hypertext links included in the returned data. As the Web stacks are geared towards that model, extra work is required to make a Web client faithful to a particular service and to validate, for instance, the signature thumbprint of the TLS certificate returned by the permitted servers. As long as you get to interact with the Web stack directly, that's usually okay, but the more magic libraries you use on top of the Web stack basics, the harder that might get. And you have, of course – and not to be underestimated in complexity – to teach the device the right thumbprint(s) and thus effectively manage and distribute an allow-list.
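To make that concrete, here's a minimal Python sketch of what such a thumbprint allow-list check could look like, layered on top of a normal TLS handshake. Everything here – the function names, the SHA-256 choice, and the idea of distributing the allow-list at provisioning time – is my own illustration, not a mechanism any particular Web stack prescribes:

```python
import hashlib
import socket
import ssl

def fingerprint(der_cert: bytes) -> str:
    """SHA-256 thumbprint of a DER-encoded certificate."""
    return hashlib.sha256(der_cert).hexdigest()

def is_pinned(der_cert: bytes, allowed: set) -> bool:
    """True if the peer's certificate is on the device's allow-list
    (which would be distributed to the device out-of-band)."""
    return fingerprint(der_cert) in allowed

def connect_pinned(host: str, allowed: set, port: int = 443) -> ssl.SSLSocket:
    """Open a TLS connection, then refuse to proceed unless the peer's
    certificate thumbprint is pinned, on top of normal CA validation."""
    context = ssl.create_default_context()
    sock = context.wrap_socket(socket.create_connection((host, port)),
                               server_hostname=host)
    der_cert = sock.getpeercert(binary_form=True)
    if not is_pinned(der_cert, allowed):
        sock.close()
        raise ssl.SSLError(f"unpinned certificate: {fingerprint(der_cert)}")
    return sock
```

The point of the sketch is that `connect_pinned` rejects a server even when CA validation succeeds; the client stays faithful to the thumbprints it was taught rather than trusting whatever any CA signed.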

Generally, device operators will not want to allow unobserved and non-interactive devices that emit telemetry and receive remote commands to be able to stray from a very well-defined set of services they're peered with. They should not be promiscuous. Quite the opposite.

Now – if the design goal is to peer a device with a particular service, the federated certificate circus turns into more of a burden than a desired protocol-suite feature. As the basic assumptions about promiscuity towards services are turned on their head, the 3-6 KByte and 2 network roundtrips of certificate-exchange chatter slow things down and may also cost quite a bit of real money in precious, metered wireless data volume. Even though everyone currently seems to assume that Transport Layer Security (TLS) is the only secure channel protocol we'll ever need, it's far from ideal for the 'faithful' connected-devices scenario.

If you allow me to take you into the protocol basement for a second: That may be somewhat different if we could seed clients with TLS RFC5077 session resumption tickets in an out-of-band fashion, and have a TLS mode that never falls back to certs. Alas, we do not.

Bi-Directional Addressing

Connected and non-interactive devices not only differ in terms of the depth of their relationship with backend services, they also differ very much in terms of the interaction patterns with these services when compared to information-centric devices. I generally classify the interaction patterns for special-purpose devices into the categories Telemetry, Inquiries, Commands, and Notifications.

  • Telemetry is unidirectionally flowing information which the device volunteers to a collecting service, either on a schedule or based on particular circumstances. That information represents the current or temporally aggregated state of the device or the state of its environment, like readings from sensors that are associated with it.
  • With Inquiries, the device solicits information about the state of the world beyond its own reach, based on its current needs; an inquiry can be a singular request, but might also ask a service to supply ongoing updates about a particular information scope. A vehicle might supply a set of geo-coordinates for a route and ask for continuous traffic-alert updates about that particular route until it arrives at the destination.
  • Commands are service-initiated instructions sent to the device. Commands can tell a device to provide information about its state, or to change the state of the device, including activities with effects on the physical world. That includes, for instance, sending a command from a smartphone app to unlock the doors of your vehicle, whereby the command first flows to an intermediating service and from there it's routed to the vehicle's onboard control system.
  • Notifications are one-way, service-initiated messages that inform a device or a group of devices about some environmental state they'll otherwise not be aware of. Wind parks may be fed weather-forecast information, cities may broadcast information about air pollution suggesting that fossil-fueled systems throttle CO2 output, or a vehicle may want to show weather or news alerts or text messages to the driver.
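For illustration, the four patterns can be captured in a small message taxonomy. This is a hypothetical Python sketch – the type and field names are mine, not part of any particular protocol:

```python
from dataclasses import dataclass, field
from enum import Enum
import time

class Initiator(Enum):
    DEVICE = "device"    # Telemetry, Inquiries flow device -> service
    SERVICE = "service"  # Commands, Notifications flow service -> device

@dataclass
class Telemetry:
    device_id: str
    readings: dict                 # e.g. {"temperature_c": 21.5}
    timestamp: float = field(default_factory=time.time)
    initiator = Initiator.DEVICE   # class-level marker, not a field

@dataclass
class Inquiry:
    device_id: str
    scope: str                     # e.g. "traffic-alerts:route-42"
    continuous: bool = False       # singular request vs ongoing updates
    initiator = Initiator.DEVICE

@dataclass
class Command:
    device_id: str
    action: str                    # e.g. "unlock-doors"
    requires_ack: bool = True      # commands usually demand feedback
    initiator = Initiator.SERVICE

@dataclass
class Notification:
    scope: str                     # e.g. "weather:nw-region", one-to-many
    payload: dict
    initiator = Initiator.SERVICE
```

The `initiator` marker is the crux: the two service-initiated types are exactly the ones that need a network path from the service down to the device.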

While Telemetry and Inquiries are device-initiated, their mirrored pattern counterparts, Commands and Notifications, are service-initiated – which means there must be a network path for messages to flow from the service to the device, and that requirement bubbles up a set of important technical questions:

  • How can I address a device on a network in order to route commands and notifications to it?
  • How can I address a roaming and/or mobile device on a network in order to route commands and notifications to it?
  • How can I address a power constrained device on a network in order to route commands and notifications to it?
  • How can I send commands or notifications with latency that's acceptable for my scenario?
  • How can I ensure that the device only accepts legitimate commands and trustworthy notifications?
  • How can I ensure that the device is not easily susceptible to denial-of-service attacks that render it inoperable towards the greater system? (not good for building security sensors, for instance)
  • How can I do this with several hundred thousand or millions of devices attached to a telemetry and control system?

Most current approaches that I'm running into are trying to answer the basic addressing question with traditional network techniques. That means that the device either gets a public network address or it is made part of a virtual network and then listens for incoming traffic using that address, acting like a server. For using public addresses the available options are to give the device a proper public IPv4 or IPv6 address or to map it uniquely to a well-known port on a network address translation (NAT) gateway that has a public address. As the available pool of IPv4 addresses has been exhausted and network operators are increasingly under pressure to move towards providing subscribers with IPv6 addresses, there's hope that every device could eventually have its very own routable IPv6 address. The virtual network approach is somewhat similar, but relies on the device first connecting to some virtual network gateway via the underlying native network, and then getting an address assigned within the scope of the virtual network, which it shares with the control system that will use the virtual network address to get to the device.

Both of those approaches are reasonable from the perspective of answering the first, basic addressing question raised above – if you pretend for a moment that opening inbound ports through a residential edge firewall is acceptable. However, things get tricky enough once we start considering the other questions, like devices not being in the house, but on the road.

Roaming is tricky for addressing, and even trickier if the device is switching networks or fully mobile, hopping through networks and occasionally dropping connections as it gets out of radio range. There are "Mobile IP" roaming standards for both IPv4 (RFC3344) and IPv6 (RFC6275), but those standards rely on a notion of traffic relaying through agents, and those are problematic at scale with very large device populations, as the relay has to manage and relay traffic for very many routes and also needs to keep track of the devices hopping across foreign networks. Relaying obviously also has significant latency implications with global roaming. What even the best implementations of these standards-based approaches for roaming can't solve is that you can't connect to a device that's outside of radio coverage and therefore not connected at all.

The very same applies to the challenge of how to reliably deliver commands and notifications to power-constrained devices. Those devices may need to survive on battery power for extended periods (in some cases for years) between battery recharges, or their external power source, like "power stealing" circuits employed in home building automation devices, may not yield sufficient power for sustained radio connectivity to a base station. Even a vehicle battery isn't going to like powering an always-on radio when parked in the long-term airport garage while you're on vacation for 2 weeks.

So if a device design aims to conserve power by only running the radio occasionally, or if the device is mobile and frequently in and out of radio coverage or hopping networks, it gets increasingly difficult to reach it naively by opening a network connection to it and then hoping for that to remain stable, even if you're lucky enough to catch a moment when the device is indeed ready to talk. That's all assuming that the device indeed had a stable network address provided by one of the cited "Mobile IP" standards, or that it registered with an address registration/lookup service every time it came online with a new address so that the control service can locate it.

All these approaches aiming to provide end-to-end network routes between devices and their control services are almost necessarily brittle. As it tries to execute a command, the service needs to locate the device, establish a connection to it, issue the command, and collect the command feedback, all while, say, a vehicle drives through a series of tunnels. Not only does this model rely on the device being online and available at the required moment; it also introduces a high number of tricky-to-diagnose failure points (such as the device flipping networks right after the service resolved its address) with associated security implications (who gets that newly orphaned address next?), and it has inherent reliability issues at the application layer, since any fault that occurs after the control system has sent the command introduces doubt in the control system about whether the command was successfully executed; and not all commands are safe to just blindly retry, especially when they have physical consequences.
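One common way to take the sting out of that retry ambiguity – my sketch here, not something the text above prescribes – is to make commands idempotent: the service attaches a unique id to each command and reuses it for every retry, and the device deduplicates, so a retried command never executes its physical side effect twice:

```python
import uuid

class CommandProcessor:
    """Device-side deduplication: executing the same command id twice is a
    no-op that replays the cached result, so the service can safely retry
    after an ambiguous failure (timeout, dropped connection, tunnel)."""
    def __init__(self):
        self._seen = {}  # command id -> cached result

    def execute(self, command_id: str, action) -> str:
        if command_id in self._seen:
            return self._seen[command_id]  # replay: no second side effect
        result = action()                  # the side effect happens once
        self._seen[command_id] = result
        return result

# Service side: mint a fresh id once, reuse it for every retry attempt.
def new_command_id() -> str:
    return str(uuid.uuid4())
```

In a real system the dedup cache would need an expiry policy and persistence across device restarts; the sketch only shows the core contract.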

For stationary power-constrained or wirelessly connected devices, the common approach to bridging the last meters/yards is a hub device that's wired to the main network and can bridge to the devices that live on a local network. The WLAN hub(s) in many homes and buildings are examples of this, as there is obviously a need to bridge between devices roaming around the house and the ISP network. From an addressing perspective, these hubs don't change the general challenge much, as they themselves need to be addressable for commands they then ought to forward to the targeted device, and that means you're still opening up a hole in the residential firewall, either by explicit configuration or via (don't do this) UPnP.

If all this isn't yet challenging enough for your taste, there's still security. Sadly, we can't have nice and simple things without someone trying to exploit them for malice or stupid "fun".

Trustworthy Communication

All information that's being received from and sent to a device must be trustworthy if anything depends on that information – and why would you send it otherwise? "Trustworthy communication" means that information is of verifiable origin, correct, unaltered, timely, and cannot be abused by unauthorized parties in any fashion. Even telemetry from a simple sensor that reports a room's temperature every five minutes can't be left unsecured. If you have a control system reacting on that input or do anything else with that data, the device and the communication paths from and to it must be trustworthy.
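As a hedged illustration of what "verifiable origin, unaltered, timely" can mean even for a humble temperature sensor, here is a Python sketch using a pre-shared per-device key and an HMAC over a timestamped reading. The key-distribution scheme and the 5-minute freshness window are my assumptions, not something the argument above mandates:

```python
import hashlib
import hmac
import json
import time

def sign_reading(device_key: bytes, device_id: str, reading: dict) -> dict:
    """Wrap a sensor reading with a timestamp and an HMAC so the receiver
    can check origin, integrity, and freshness."""
    body = {"device_id": device_id, "reading": reading, "ts": time.time()}
    payload = json.dumps(body, sort_keys=True).encode()
    mac = hmac.new(device_key, payload, hashlib.sha256).hexdigest()
    return {"body": body, "mac": mac}

def verify_reading(device_key: bytes, message: dict,
                   max_age_s: float = 300.0) -> bool:
    """Recompute the HMAC and reject forged, altered, or stale messages."""
    payload = json.dumps(message["body"], sort_keys=True).encode()
    expected = hmac.new(device_key, payload, hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, message["mac"]):
        return False  # wrong key or tampered payload
    return (time.time() - message["body"]["ts"]) <= max_age_s  # staleness
```

A symmetric per-device key is only one option, and replay protection would additionally need a nonce or sequence number; the sketch just shows that even five-minute temperature telemetry can carry a verifiable envelope cheaply.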

"Why would anyone hack temperature sensors?" – sometimes "because they can", sometimes because they want to inflict monetary harm on the operator or physical harm on the facility and what's in it. Neglecting to protect even one communication path in a system opens it up for manipulation and consequential harm.

If you want to believe the often-cited projection of 50 billion connected devices by 2020, the vast majority of those will not be classic information devices, and they will not be $500 or even $200 gadgets. Very many of these connected devices will rather be common consumer or industry goods that have been enriched with digital service capabilities. Or they might even just be super inexpensive sensors hung off the side of buildings to collect environmental information. Unlike apps on information devices, most of these services will have auxiliary functions. Some of these capabilities may even be largely invisible. If you have a device with built-in telemetry delivery that allows the manufacturer or service provider to sense an oncoming failure and proactively get in touch with you for service – which is something manufacturers plan to do – and then the device just never breaks, you may not even know such a capability exists, especially if the device doesn't rely on connectivity through your own network. In most cases, these digital services will have to be priced into the purchase price of the product or even be monetized through companion apps and services, as it seems unlikely that consumers will pay for 20 different monthly subscriptions for connected appliances. It's also reasonable to expect that many devices sold will have the ability to connect, but their users will never intentionally take advantage of these features.

On the cost side, a necessary result of all this is that the logic built into many products will (continue to) use microcontrollers that require little power, have a small footprint, and are significantly less expensive than the high-powered processors and ample memory in today's information devices – trading compute power for much reduced cost. But trading compute power and memory for cost savings also means trading away cryptographic capability and, more generally, resilience against potential attacks.

The horror-story meme "if you're deep into the forest nobody will hear your screams" is perfectly applicable to unobserved field-deployed devices under attack. If a device were to listen for unsolicited traffic, meaning it listens for incoming TCP connections or UDP datagrams or some form of UDP-datagram-based sessions and thus acts as a server, it would have to accept and then triage those connection attempts into legitimate and illegitimate ones.

With TCP, even enticing the device to accept a connection is already a very fine attack vector, because a TCP connection burns memory in the form of a receive buffer. So if the device were to use a network protocol circuit like, for instance, the WizNet W5100 used on the popular enthusiast tinker platform Arduino Ethernet, the device's communication capability is saturated at just 4 concurrent connections, which an attacker could then service in a slow byte-per-packet fashion and thus effectively take the device out. As that happens, the device also wouldn't have a path to scream for help through, unless it made – assuming the circuit supports it – an a priori reservation of resources for an outbound connection to whoever plays the cavalry.
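To make the exhaustion scenario concrete, here is a toy model of a device-side connection pool with a hard limit of 4 slots, as on the W5100. This is not the chip's actual firmware logic, just an illustration of why a handful of held-open connections locks legitimate peers out:

```python
# Toy model (not W5100 firmware): a fixed pool of 4 connection slots.
# Once an attacker holds all slots open by trickling bytes, legitimate
# clients can no longer reach the device at all.

class ConnectionPool:
    def __init__(self, slots=4):
        self.slots = slots
        self.active = []              # peers currently holding a slot

    def accept(self, peer):
        """Return True if the connection was accepted."""
        if len(self.active) >= self.slots:
            return False              # pool exhausted: device unreachable
        self.active.append(peer)
        return True

    def close(self, peer):
        self.active.remove(peer)


pool = ConnectionPool(slots=4)

# Attacker opens 4 connections and never closes them.
accepted = [pool.accept(f"attacker-{i}") for i in range(4)]
assert all(accepted)

# A legitimate control service can no longer get through.
assert pool.accept("control-service") is False
```

The point of the sketch is only that the defense budget is fixed and tiny; the attacker's budget is not.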

If we were to leave the TCP-based resource exhaustion vector out of the picture, the next hurdle is to establish a secure baseline over the connection and then triage connections into good and bad. As the protocol world stands, TLS (RFC5246) and DTLS (RFC6347) are the kings of the security protocol hill and I've discussed the issues with their inherent client promiscuity assumption above. If we were indeed connecting from a control service to a device in an outbound fashion, and the device were to act as server, the model may be somewhat suitable as the control service will indeed have to speak to very many and potentially millions of devices. But contrary to the Web model where the browser has no idea where the user will send it, the control system has a very firm notion of the devices it wants to speak to. There are many of those, but there is no promiscuity going on. If they play server, each device needs to have its own PKI certificate (there is a specified option to use TLS without certificates, but that does not matter much in practice), each with its own private key, since they're acting as servers and since you can't leak shared private keys into untrusted physical space, which is where most of the devices will end up living.

The strategy of using the standard TLS model and having the device play server has a number of consequences. First, whoever provisions the devices will have to be a root or intermediate PKI certification authority. That's easy to do, unless there were any need to tie into the grand PKI trust federation of today's Web, which is largely anchored in the root certificate store contents of today's dominant client platforms. If you had the notion that "Internet of Things" were to mean that every device could be a web server to everyone, you would have to buy yourself into the elite circle of intermediate CA authorities by purchasing the necessary signing certificates or services from a trusted CA, and that may end up being fairly expensive as the oligopoly is protective of their revenues. Second, those certificates need to be renewed and the renewed ones need to be distributed securely. And when devices get stolen or compromised or the customer opts out of the service, these certificates also need to get revoked, and that revocation service needs to be managed and run and will have to be consulted quite a bit.

Also, the standard configuration of most application protocol stacks' usage of TLS ties into DNS for certificate validation, and it's not obvious that DNS is the best choice for associating name and network address for devices that rapidly hop networks when roaming – unless of course you had a stable "home network" address as per IPv6 Mobile IP. But that would mean you are now running an IPv6 Mobile relay. The alternative is to validate the certificate by some other means, but then you'll be using a different validation criterion in the certificate subject and will no longer be aligned with the grand PKI trust federation model. Thus, you'll be back to effectively managing an isolated PKI infrastructure, with all the bells and whistles like a revocation service, and you will do so while you're looking for the exact opposite of the promiscuous security session model all that enables.

Let's still assume none of that would matter and (D)TLS with PKI dragged in its wake were okay and the device could use those and indeed act as a server accepting inbound connections. Then we're still faced with the fact that cryptography computation is not cheap. Moving crypto into hardware is very possible, but impacts the device cost. Doing crypto in software requires that the device deals with it inside of the application or underlying frameworks. And for a microcontroller that costs a few dollars that's non-negligible work. So the next vector to keep the device from doing its actual work is to keep it busy with crypto. Present it with untrusted or falsely signed client certificates (if it were to expect those). Create a TLS link (even IPSec) and abandon it right after the handshake. Nice ways to burn some Watts.

Let's still pretend none of this were a problem. We're now up at the application level with transport layer security underneath. Who is authorized to talk to the device and which of the connections that pop up through that transport layer are legitimate? And if there is an illegitimate connection attempt, where do you log these and if that happens a thousand times a minute, where do you hold the log and how do you even scream for help if you're pegged on compute by crypto? Are you keeping an account store in the device? Quite certainly not in a system whose scope is more than one device. Are you then relying on an external authentication and authorization authority issuing authorization tokens? That's more likely, but then you're already running a token server.
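If the device leans on an external token server rather than an account store, the device-side logic can stay tiny. Here is a minimal sketch of that idea with Python's stdlib; the token format, field names, and the single shared key between token server and device are all my own illustrative assumptions, not any particular product's scheme:

```python
# Sketch of a minimal authorization token (hypothetical format): the
# token server and the device share one key, so the device can validate
# tokens without keeping an account store. Payload and MAC are joined
# with a dot; both halves are base64-encoded.
import base64
import hashlib
import hmac
import json
import time

DEVICE_TOKEN_KEY = b"shared-with-token-server"   # assumed provisioning secret

def issue_token(key, subject, ttl=300):
    payload = json.dumps({"sub": subject, "exp": time.time() + ttl}).encode()
    mac = hmac.new(key, payload, hashlib.sha256).digest()
    return base64.b64encode(payload) + b"." + base64.b64encode(mac)

def validate_token(key, token):
    payload_b64, mac_b64 = token.split(b".")
    payload = base64.b64decode(payload_b64)
    expected = hmac.new(key, payload, hashlib.sha256).digest()
    if not hmac.compare_digest(base64.b64decode(mac_b64), expected):
        return None                              # forged or tampered
    claims = json.loads(payload)
    return claims if claims["exp"] > time.time() else None   # or expired

token = issue_token(DEVICE_TOKEN_KEY, "control-system-1")
assert validate_token(DEVICE_TOKEN_KEY, token)["sub"] == "control-system-1"
assert validate_token(DEVICE_TOKEN_KEY, token[:-4] + b"AAAA") is None
```

The device does two HMAC operations and a comparison per request; all account management, revocation, and policy lives with the token server.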

The truth, however inconvenient, is that non-interactive special-purpose devices residing in untrusted physical spaces are, without getting external help from services, essentially indefensible when acting as network servers. And this is all just on top of the basic fact that devices that live in untrusted physical space are generally susceptible to physical exploitation and that protecting secrets like key material is generally difficult.

Here's the recipe to eradicate most of the mess I've laid out so far: Devices don't actively listen on the network for inbound connections. Devices act as clients. Mostly.

Link vs. Network vs. Transport vs. Application

What I've discussed so far are considerations around the Network and Transport layers (RFC1122, 1.1.3) as I'm making a few general assumptions about connectivity between devices and control and telemetry collection systems, as well as about the connectivity between devices when they're talking in a peer-to-peer fashion.

First, I have so far assumed that devices talk to other systems and devices through a routable (inter-)network infrastructure whose scope goes beyond a single Ethernet hub, WLAN hotspot, Bluetooth PAN, or cellular network tower. Therefore I am also assuming the usage of the only viable routable network protocol suite and that is the Internet Protocol (v4 and v6) and with that the common overlaid transport protocols UDP and TCP.

Second, I have so far assumed that the devices establish a transport-level and then also application-level network relationship with their communication peers, meaning that the device commits resources to accepting, preprocessing, and then maintaining the connection or relationship. That is specifically true for TCP connections (and anything riding on top of it), but is also true for Network-level links like IPSec and session-inducing protocols overlaid over UDP, such as setting up agreements to secure subsequent datagrams as with DTLS.

The reason for assuming a standards-based Network and Transport protocol layer is that everything at the Link Layer (including physical bits on wire or through space) is quite the zoo, and one that I see growing rather than shrinking. The Link Layer will likely continue to be a space of massive proprietary innovation around creative use of radio frequencies, even beyond what we've seen in cellular network technology, where bandwidth grew from basic GSM's 9.6 KBit/s to today's 100+ MBit/s on LTE over the last 25 years. There are initiatives to leverage the new "white space" spectrum opened up by the shutdown of analog TV, there are services leveraging ISM frequency bands, and there might be well-funded contenders for licensed spectrum emerging that use wholly new stacks. There is also plenty of action on the short-range radio front, specifically also around suitable protocols for ultra-low power devices. And there are obviously also many "wired" transport options over fiber and copper that have made significant progress and will continue to do so and are essential for device scenarios, often in conjunction with a short-range radio hop for the last few meters/yards. Just as much as it was a losing gamble to specifically bet on TokenRing or ARCnet over Ethernet in the early days of Local Area Networking, it isn't yet clear what to bet on in terms of protocols and communication service infrastructures as the winners for the "Internet of Things", not even today's mobile network operators.

Betting on a particular link technology for inter-device communication is obviously reasonable for many scenarios where the network is naturally scoped by physical means like reach by ways of radio frequency and transmission power, the devices are homogeneous and follow a common and often regulation-imposed standard, and latency requirements are very narrow, bandwidth requirements are very high, or there is no tolerance for failure of intermediaries. Examples for this are in-house device networks for home automation and security, emerging standards for Vehicle-To-Vehicle (V2V) and Vehicle-To-Infrastructure (V2I) communication, or Automatic Dependent Surveillance (ADS, mostly ADS-B) in Air Traffic Control. Those digital radio protocols essentially form peer meshes where everyone listens to everything in range and filters out what they find interesting or addressed specifically at them. And if the use of the frequencies gets particularly busy, coordinated protocols impose time slices on senders.

What such link-layer or direct radio information transfers have generally struggled with is trustworthiness – allow me to repeat: verifiable origin, correct, unaltered, timely, and cannot be abused by unauthorized parties in any fashion.

Of course, by its nature, all radio based communication is vulnerable to jamming and spoofing, which has a grand colorful military history as an offensive or defensive electronic warfare measure along with fitting countermeasures (ECM) and even counter-countermeasures (ECCM). Radio is also, especially when used in an uncoordinated fashion, subject to unintended interference and therefore distortion.

ADS-B, which is meant to replace radar in Air Traffic Control, doesn't even have any security features in its protocol. The stance of the FAA is that they will detect spoofing by triangulation of the signals, meaning they can tell whether a plane that says it's at a particular position is actually there. We should assume they have done their ECM and ECCM homework.

IEEE 1609 for Wireless Access in Vehicular Environments, which is aiming to facilitate ad-hoc V2V and V2I communication, spells out an elaborate scheme to manage, use, and roll X.509 certificates, but relies on the broad distribution of certificate revocation lists to ban once-issued certificates from the system. Vehicles are sold, have the telematics units replaced due to malfunction or crash damage, may be tampered with, or might be stolen. I can see the PKI's generally overly optimistic stance on revocations being challenging at the scale of tens if not hundreds of millions of vehicles, where churn will be very significant. The Online Certificate Status Protocol (OCSP, RFC6960) might help IEEE 1609 deal with the looming CRL caching issues due to size, but then requires very scalable validation server infrastructure that needs to be reachable whenever two vehicles want to talk, which is also not acceptable.
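A back-of-envelope calculation shows why CRL size looms here. Every number below is an assumption of mine for illustration, not IEEE 1609 data; the per-entry size is a rough estimate for a DER-encoded serial number, revocation date, and extensions:

```python
# Back-of-envelope CRL size; all numbers are assumed, for illustration only.
vehicles          = 150_000_000   # assumed fleet size
annual_churn      = 0.08          # fraction revoked per year (sold/stolen/repaired)
years_accumulated = 5             # how long revocations stay on the list
bytes_per_entry   = 40            # rough DER estimate per CRL entry

entries  = int(vehicles * annual_churn * years_accumulated)
crl_size = entries * bytes_per_entry

print(f"{entries:,} entries, ~{crl_size / 1e9:.1f} GB CRL")
# With these assumptions: 60,000,000 entries, ~2.4 GB to distribute
# to every vehicle -- before any delta-CRL optimizations.
```

Even if the assumed churn is off by an order of magnitude, distributing and caching a list of this shape in every vehicle is a hard problem.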

Local radio link protocols such as Bluetooth, WLAN (802.11x with 802.11i/WPA2-PSK), or Zigbee often assume that participants in a local link network share a common secret, and can keep that secret secret. If the secret leaks, all participants need to be rolled over to a new key. IEEE 802.1X, which is the foundation for the RADIUS Authentication and Authorization of participants in a network, and the basis of "WPA2 Enterprise" offers a way out of the dilemma of either having to rely on a federated trust scheme that has a hard time dealing with revocations of trust at scale, or on brittle pre-shared keys. 802.1X introduces the notion of an Authentication (and Authorization) server, which is a neutral third party that makes decisions about who gets to access the network.

Unfortunately, many local radio link protocols are not only weak at managing access, they also have a broad history of having weak traffic protection. WLAN's issues got largely cleaned up with WPA2, but there are plenty of examples across radio link protocols where the broken WEP model or equivalent schemes are in active use, or the picture is even worse. Regarding the inherent security of cellular network link-level protection, it ought to be sufficient to look at the recent scolding of politicians in Europe for their absent-mindedness to use regular GSM/UMTS phones without extra protection measures – and the seemingly obvious result of dead-easy eavesdropping by foreign intelligence services. Ironically, mobile operators make some handsome revenue by selling "private access points" (private APNs) that terminate cellular device data traffic in a VPN and that the customer then tunnels into across the hostile Internet to meet the devices on this fenced-off network, pretending that the mobile network isn't just another operator-managed public network and therefore somehow more trustworthy.

Link-layer protection mechanisms are largely only suitable for keeping unauthorized local participants (i.e. intruders) from getting link-layer data frames up to any higher-level network logic. In link-layer-scoped peer-to-peer network environments, the line between link-layer data frames and what's being propagated to the application is largely blurred, but the previous observation stays true. Even if employed, link-layer security mechanisms are not much help on providing security on the network and transport layers, as many companies are learning the hard way when worms and other exploits sweep through the inside of their triply-firewalled, WPA2 protected, TPM-tied-IPSec-protected networks, or as travelers can learn when they don't have a local firewall up on their machine or use plaintext communication when connecting to the public network at a café, airport, or hotel.

Of course, the insight that public networks are not trustworthy has led many companies interconnecting sites and devices down the path of using virtual private network (VPN) technology. VPN technology, especially when coming in the form of a shiny appliance, makes it very easy to put a network tunnel terminator on either end of a communication path made up of a chain of untrustworthy links and networks. The terminator on either end conveniently surfaces up as a link-layer network adapter. VPN can fuse multiple sites into a single link-layer network and it is a fantastic technology for that. But like all the other technologies I discussed above, link-layer protection is a zoning mechanism; the security mechanisms that matter to protect digital assets and devices sit at the layers above it. There is no "S" for Security in "VPN". VPN gives you secure virtual network cables; it doesn't make the virtual hub that those cables plug into any more secure. Also, in the context of small devices as discussed above, VPN is effectively a non-starter due to its complexity.

What none of these link-layer protection mechanisms help with, including VPN, is to establish any notion of authentication and authorization beyond their immediate scope. A network application that sits on the other end of a TCP socket, where a portion of the route is facilitated by any of these link layer mechanisms, is and must be oblivious to their existence. What matters for the trustworthiness of the information that travels from the logic on the device to a remote control system not residing on the same network, as well as for commands that travel back up to the device, is solely a fully protected end-to-end communication path spanning networks, where the identity of the parties is established at the application layer, and nothing else. The protection of the route at the transport layer by ways of signature and encryption is established as a service for the application layer either after the application has given its permission (e.g. certificate validation hooks) or just before the application layer performs an authorization handshake, prior to entering into any conversations. Establishing end-to-end trust is the job of application infrastructure and services, not of networks.

Service Assisted Communication

The findings from this discussion so far can be summarized in a few points:

  • Remote controllable special-purpose devices have a fundamentally different relationship to network services compared to information devices like phones and tablets and require an approach to security that enables exclusive peering with a set of services or a gateway.
  • Devices that take a naïve approach to connectivity by acting like servers and expecting to accept inbound connections pose a number of network-related issues around addressing and naming, and even greater problems around security, exposing themselves to a broad range of attack vectors.
  • Link-layer security measures have varying effectiveness at protecting communication between devices at a single network scope, but none is sufficient to provide a trustworthy communication path between the device and a cloud-based control system or application gateway.
  • The PKI trust model is fundamentally flawed in a variety of ways, including being too static and geared towards long-lived certificates, and it's too optimistic about how well certificates are and can be protected by their bearers. Its use in the TLS context specifically enables the promiscuous client model, which is the opposite of the desired model for special-purpose devices.
  • Approaches to security that provide a reasonable balance between system throughput, scalability, and security protection generally rely on third party network services that validate user credentials against a central pool, issue security tokens, or validate assurances made by an authority for their continued validity.

The conclusion I draw from these findings is an approach I call "Service Assisted Communication" (SAC). I'm not at all claiming that these principles and techniques are an invention, as most are already broadly implemented and used. But I do believe there is value in putting them together here and giving them a name so that they can be effectively juxtaposed with the approaches I've discussed above.

The goal of Service Assisted Communication is to establish trustworthy and bi-directional communication paths between control systems and special-purpose devices that are deployed in untrusted physical space. To that end, the following principles are established:

  • Security trumps all other capabilities. If you can't implement a capability securely, you must not implement it. You identify threats and mitigate them or you don't ship product. If you employ a mitigation without knowing what the threat is you don't ship product, either.
  • Devices do not accept unsolicited network information. All connections and routes are established in an outbound-only fashion.
  • Devices generally only connect to or establish routes to well-known services that they are peered with. In case they need to feed information to or receive commands from a multitude of services, devices are peered with a gateway that takes care of routing information downstream and of ensuring that commands are only accepted from authorized parties before routing them to the device.
  • The communication path between device and service or device and gateway is secured at the application protocol layer, mutually authenticating the device to the service or gateway and vice versa. Device applications do not trust the link-layer network.
  • System-level authorization and authentication must be based on per-device identities, and access credentials and permissions must be near-instantly revocable in case of device abuse.
  • Bi-directional communication for devices that are connected sporadically due to power or connectivity concerns may be facilitated through holding commands and notifications to the devices until they connect to pick those up.
  • Application payload data may be separately secured for protected transit through gateways to a particular service.

The manifestation of these principles is the simple diagram on the right. Devices generally live in local networks with limited scope. Those networks are reasonably secured against intrusion, with link-layer access control mechanisms, to prevent low-level brute-force attacks such as flooding them with packets, and, for that purpose, they also employ traffic protection. The devices will obviously observe link-layer traffic in order to triage out solicited traffic, but they do not react to unsolicited connection attempts that would cause any sort of work or resource consumption from the network layer on up.

All connections to and from the device are made via or at least facilitated via a gateway, unless the device is peered with a single service, in which case that service takes on the role of the gateway. Eventual peer-to-peer connections are acceptable, but only if the gateway permits them and facilitates a secure handshake. The gateway that the device peers with may live on the local network and thus govern local connections. Towards external networks, the local gateway acts as a bridge towards the devices and is itself connected by the same set of principles discussed here, meaning it's acting like a device connected to an external gateway.

When the device connects to an external gateway, it does so by creating and maintaining an outbound TCP socket across a network address translation boundary (RFC2663), or by establishing a bi-directional UDP route, potentially utilizing the RFC5389 session traversal utilities for NAT, aka STUN. Even though I shouldn't have to, I will explicitly note that the WebSocket protocol (RFC6455) rides on top of TCP and gets its bi-directional flow capability from there. There's quite a bit of bizarre information on the Interwebs on how the WebSocket protocol somehow newly and uniquely enables bi-directional communication, which is obviously rubbish. What it does is to allow port-sharing, so that WebSocket aware protocols can share the standard HTTP/S ports 80 (RFC2616) and 443 (RFC2818) with regular web traffic and also piggyback on the respective firewall and proxy permissions for web traffic. The in-progress HTTP 2.0 specification will expand this capability further.

By only relying on outbound connectivity, the NAT/Firewall device at the edge of the local network will never have to be opened up for any unsolicited inbound traffic.

The outbound connection or route is maintained by either client or gateway in a fashion that intermediaries such as NATs will not drop it due to inactivity. That means that either side might send some form of a keep-alive packet periodically, or even better sends a payload packet periodically that then doubles as a keep-alive packet. Under most circumstances it will be preferable for the device to send keep-alive traffic as it is the originator of the connection or route and can and should react to a failure by establishing a new one.
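The device-side half of this can be sketched in a few lines. The transport object below is hypothetical (its `send`/`poll`/`reconnect` methods are my own invention, not a real library), and the interval is the figure cited for mobile push services; the point is only that the device originates the route, keeps it alive, and re-establishes it on failure:

```python
# Device-side keep-alive loop against a hypothetical transport object
# (send/poll/reconnect are assumed methods, not a real library's API).
# The device owns the outbound route: it pings periodically so NATs keep
# the mapping alive, and re-establishes the connection itself on failure.
import time

KEEPALIVE_INTERVAL = 1200          # seconds; push services use ~20 minutes

def run_device_loop(transport, clock=time.monotonic, iterations=4):
    last_sent = clock()
    for _ in range(iterations):    # bounded here only for the sketch
        try:
            if clock() - last_sent >= KEEPALIVE_INTERVAL:
                transport.send(b"PING")   # doubles as keep-alive traffic
                last_sent = clock()
            transport.poll()              # pick up pending commands
        except ConnectionError:
            transport.reconnect()         # the device, not the gateway, reconnects
            last_sent = clock()
```

Sending an actual payload packet on the same schedule makes the dedicated PING unnecessary, as noted above.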

As TCP connections are endpoint concepts, a connection will only be declared dead if the route is considered collapsed, and the detection of this fact requires packet flow. A device and its gateway may therefore sit idle for quite a while, believing that the route and connection are still intact, before the lack of acknowledgement of the next packet proves that assumption incorrect. There is a tricky tradeoff decision to be made here. So-called carrier-grade NATs (or Large Scale NAT) employed by mobile network operators permit very long periods of connection inactivity, and mobile devices that get direct IPv6 address allocations are not forced through a NAT at all. The push notification mechanisms employed by all popular Smartphone platforms utilize this to dramatically reduce the power consumption of the devices by maintaining the route very infrequently, once every 20 minutes or more, and therefore being able to largely remain in sleep mode with most systems turned off while idly waiting for payload traffic. The downside of infrequent keep-alive traffic is that the time to detection of a bad route is, in the worst case, as long as the keep-alive interval. Ultimately it's a tradeoff between battery-power and traffic-volume cost (on metered subscriptions) and acceptable latency for commands and notifications in case of failures. The device can obviously be proactive in detecting potential issues and abandon the connection and create a new one when, for instance, it hops to a different network or when it recovers from signal loss.

The connection from the device to the gateway is protected end-to-end, ignoring any underlying link-level protection measures. The gateway authenticates with the device and the device authenticates with the gateway, so neither is anonymous towards the other. In the simplest case, this can occur through the exchange of some proof of possession of a previously shared key. It can also happen via a (heavy) X.509 certificate exchange as performed by TLS, or a combination of a TLS handshake with server authentication where the device subsequently supplies credentials or an authorization token at the application level. The privacy and integrity protection of the route is also established end-to-end, ideally as a byproduct of the authentication handshake, so that a potential attacker cannot waste cryptographic resources on either side without producing proof of authorization.
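The "simplest case" above, mutual proof of possession of a previously shared key, can be sketched with stdlib HMAC. The message framing here is my own illustrative invention; real protocols add replay protection, key derivation, and channel binding on top:

```python
# Mutual proof-of-possession over a previously shared key, sketched with
# stdlib HMAC. Each side challenges the other with a fresh random nonce;
# the key itself never travels over the wire. Framing is hypothetical.
import hashlib
import hmac
import secrets

def prove(key, nonce):
    return hmac.new(key, nonce, hashlib.sha256).digest()

def verify(key, nonce, proof):
    return hmac.compare_digest(prove(key, nonce), proof)  # constant-time

shared_key = secrets.token_bytes(32)       # provisioned out of band

# Gateway challenges the device...
gw_nonce = secrets.token_bytes(16)
device_proof = prove(shared_key, gw_nonce)
assert verify(shared_key, gw_nonce, device_proof)

# ...and the device challenges the gateway, so neither side is anonymous.
dev_nonce = secrets.token_bytes(16)
gateway_proof = prove(shared_key, dev_nonce)
assert verify(shared_key, dev_nonce, gateway_proof)

# An impostor without the key cannot produce a valid proof.
assert not verify(shared_key, gw_nonce, prove(b"wrong-key", gw_nonce))
```

Since both challenges use fresh nonces, a captured proof is useless for replay against a later challenge.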

The current reality is that we don't have many if any serious alternatives to TLS/DTLS or SSH for securing this application-level connection or route today. TLS is far from being a perfect fit for many reasons I laid out here, and not least because of the weight in footprint and compute effort of the TLS stack which is too heavy for inexpensive circuitry. SSH is a reasonable alternative from the existing popular protocol suites, but suffers from lack of a standardized session resumption gesture. My hope is that we as an industry fix either of those to make it a better fit for the connected devices scenarios or we come up with something better. Here's a summary of criteria.

The result of the application-level handshake is a secure peer connection between the device and a gateway that only the gateway can feed. The gateway can, in turn, now provide one or even several different APIs and protocol surfaces that can be translated to the primary bi-directional protocol used by the device. The gateway also provides the device with a stable address in the form of an address projected onto the gateway's protocol surface, and therefore also with location transparency and location hiding.

The device might speak only AMQP or MQTT or some proprietary protocol, and yet have a full HTTP/REST interface projection at the gateway, with the gateway taking care of the required translation and also of enrichment, where responses from the device can be augmented with reference data, for instance. The device can connect from any context and can even switch contexts, yet its projection into the gateway and its address remain completely stable. The gateway can also be federated with external identity and authorization services, so that only callers acting on behalf of particular users or systems can invoke particular device functions. The gateway therefore provides basic network defense, API virtualization, and authorization services all combined in one.
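The API-virtualization-plus-authorization combination can be reduced to a few lines. Everything here is illustrative: the caller and device names, the authorization table, and the compact "device wire protocol" are all invented for the sketch:

```python
# Sketch of API virtualization at the gateway: callers see a stable
# per-device operation surface, while the gateway checks authorization
# first and then translates to the device's own (invented) compact
# wire protocol. All names and formats are illustrative.
AUTHORIZED = {
    ("alice", "device-42"): {"set_temp"},   # who may invoke what, per device
}

def gateway_invoke(caller, device_id, operation, arg):
    """Return (status, wire_command); unauthorized calls never reach the device."""
    if operation not in AUTHORIZED.get((caller, device_id), set()):
        return (403, None)                  # rejected at the gateway
    wire_cmd = f"{operation.upper()}:{arg}".encode()   # device-side protocol
    return (200, wire_cmd)

status, cmd = gateway_invoke("alice", "device-42", "set_temp", 21)
assert (status, cmd) == (200, b"SET_TEMP:21")
assert gateway_invoke("mallory", "device-42", "set_temp", 35) == (403, None)
```

The device never sees Mallory's attempt at all; the rejection is absorbed by infrastructure that is built to take that load.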

The gateway model gets even better when it includes or is based on an intermediary messaging infrastructure that provides a scalable queuing model for both ingress and egress traffic.

Without this intermediary infrastructure, the gateway approach would still suffer from the issue that devices must be online and available to receive commands and notifications when the control system sends them. With a per-device queue or per-device subscription on a publish/subscribe infrastructure, the control system can drop a command at any time, and the device can pick it up whenever it's online. If the queue provides time-to-live expiration alongside a dead-lettering mechanism for such expired messages, the control system can also know immediately when a message has not been picked up and processed by the device in the allotted time.

The queue also ensures that the device can never be overtaxed with commands or notifications. The device maintains one connection into the gateway and it fetches commands and notifications on its own schedule. Any backlog forms in the gateway and can be handled there accordingly. The gateway can start rejecting commands on the device's behalf if the backlog grows beyond a threshold or the cited expiration mechanism kicks in and the control system gets notified that the command cannot be processed at the moment.
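The queue behavior from the last two paragraphs, time-to-live with dead-lettering plus a backlog cap, fits in a short in-memory sketch. A real gateway would use a brokered messaging system; the names and numbers below are illustrative:

```python
# Per-device command queue with time-to-live and dead-lettering,
# sketched in memory. A real gateway would delegate this to a brokered
# messaging system; names and numbers are illustrative.
import collections
import time

class DeviceQueue:
    def __init__(self, ttl=60.0, max_backlog=100, clock=time.monotonic):
        self.ttl, self.max_backlog, self.clock = ttl, max_backlog, clock
        self.pending = collections.deque()
        self.dead_letters = []        # expired commands, visible to the backend

    def enqueue(self, command):
        if len(self.pending) >= self.max_backlog:
            return False              # gateway rejects on the device's behalf
        self.pending.append((command, self.clock() + self.ttl))
        return True

    def dequeue(self):
        """Called by the device, on its own schedule."""
        while self.pending:
            command, deadline = self.pending.popleft()
            if self.clock() <= deadline:
                return command
            self.dead_letters.append(command)   # backend learns of the miss
        return None

t = [0.0]
q = DeviceQueue(ttl=10.0, max_backlog=2, clock=lambda: t[0])
assert q.enqueue("reboot") and q.enqueue("set-temp")
assert q.enqueue("overflow") is False   # backlog cap protects the device
t[0] = 15.0                             # device comes online too late
assert q.dequeue() is None and q.dead_letters == ["reboot", "set-temp"]
```

The control system learns from the dead-letter path that its commands went unprocessed, instead of silently assuming success.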

On the ingress side (from the gateway perspective), using a queue has the same kind of advantages for the backend systems. If devices are connected at scale and input from the devices comes in bursts or has significant spikes around certain hours of the day, as with telematics systems in passenger cars during rush hour, having the gateway deal with the traffic spikes is a great idea to keep the backend system robust. The ingestion queue also allows telemetry and other data to be held temporarily when the backend systems or their dependencies are taken down for service or suffer from service degradation of any kind. You can find more on the usage of brokered messaging infrastructures for these scenarios in an MSDN Magazine article I wrote a year back.


An "Internet of Things" where devices reside in unprotected physical space and where they can interact with the physical world is a very scary proposition if we solely rely on naïve link and network-level approaches to connectivity and security, which are the two deeply interwoven core aspects of the "I" in "IoT". Special-purpose devices don't benefit from constant human oversight as phones and tablets and PCs do, and we struggle even to keep those secure. We have to do a better job, as an industry, to keep the devices secure that we want to install in the world without constant supervision.

"Trustworthy communication" means that information exchanged between devices and control systems is of verifiable origin, correct, unaltered, timely, and cannot be abused by unauthorized parties in any fashion. Such trust cannot be established at scale without employing systems that are designed for the purpose and keep the "bad guys" out. If we want smarter devices around us that help improve our lives while remaining power-efficient and affordable, we can't leave them alone in untrustworthy physical space to take care of their own defenses, because they won't be able to.

Does this mean that the refrigerator cannot talk to the laundry washing machine on the local network? Yes, that is precisely what it means. Aside from that idea being somewhat ludicrous, how else would the washing machine defend itself from a malicious refrigerator if not through a gateway that can? Devices that are unrelated and not part of a deeply integrated system meet where they ought to meet: on the open Internet, not "behind the firewall".

Categories: Architecture | Technology

I have an immediate job opening for an open standard or multivendor transport layer security protocol that

  1. does NOT rely on or tie into PKI and especially
  2. doesn’t require the exchange of X.509 certificates for an initial handshake,
  3. supports session resumption, and
  4. can be used with a minimal algorithm suite that is microcontroller friendly (AES-256, SHA-256, ECDH)


  1. For “service assisted connectivity”, where a device relies on a gateway to help with any defensive measures from the network layer on up, the device ought to be paired with exactly one (cluster of) gateway(s). Also, an unobserved device should not pose any threat to a network it is deployed into (see the fridges abused as spam bots or local spies), and therefore outbound communication should be funneled through the gateway as well. TLS/PKI specifically enables promiscuous clients that happily establish sessions with any “trustworthy” (per CA) server, often under direction of an interactive user. Here, I want to pair a device with a gateway, meaning that the peers are known a priori and thus
  2. the certificate exchange is 3-6kb of extra baggage that’s pure overhead if the parties have an existing and well-known peer relationship.
  3. Session resumption is required because devices will get disconnected while roaming and on radio, or will temporarily opt to turn off the radio, which may tear down sockets. It’s also required because the initial key exchange is computationally very expensive and imposes significant latency overhead due to the extra roundtrips.
  4. Microcontroller-based devices are often very constrained with regard to program storage and can’t lug a whole litany of crypto algorithms around. So the protocol must allow a compliant implementation to support only a small set of algorithms that can be implemented on MCUs in firmware or in silicon.

Now, TLS 1.2 with a minimal crypto suite profile might actually be suitable if one could cheat around the whole cert exchange and supply clients with an RFC5077 session resumption ticket out-of-band in such a way that it effectively acts as a long-term connection authN/Z token. Alas, you can't. SSH is also a candidate but it doesn't have session resumption.
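
To illustrate the shape of the handshake I'm after, here is a deliberately naïve sketch: the device and gateway share a pre-provisioned symmetric key, exchange fresh nonces, and prove key possession via HMAC, with no certificate exchange at all. This is not a vetted protocol and all names are made up; it has no forward secrecy (no ECDH), no replay window, and no record encryption, so treat it strictly as an illustration of the "peers known a priori" idea:

```python
import hmac
import hashlib
import secrets

# Provisioned out-of-band: both peers know this key a priori,
# so no certificate exchange is needed at connect time.
PAIRED_KEY = secrets.token_bytes(32)

def tag(key, *parts):
    """HMAC-SHA256 over the concatenated parts; a single MAC and hash
    algorithm keeps the footprint MCU-friendly."""
    mac = hmac.new(key, digestmod=hashlib.sha256)
    for part in parts:
        mac.update(part)
    return mac.digest()

def handshake(key):
    # Each peer contributes a fresh nonce ...
    device_nonce = secrets.token_bytes(16)
    gateway_nonce = secrets.token_bytes(16)

    # ... and proves possession of the paired key by MACing both nonces.
    device_proof = tag(key, b"device", device_nonce, gateway_nonce)
    gateway_proof = tag(key, b"gateway", gateway_nonce, device_nonce)

    # Each side verifies the other's proof in constant time.
    assert hmac.compare_digest(device_proof, tag(key, b"device", device_nonce, gateway_nonce))
    assert hmac.compare_digest(gateway_proof, tag(key, b"gateway", gateway_nonce, device_nonce))

    # Both sides can now derive the same session key; a resumption ticket
    # could be handed out the same way, keyed off this session.
    return tag(key, b"session", device_nonce, gateway_nonce)

session_key = handshake(PAIRED_KEY)
```

The fresh nonces ensure a new session key per connection even though the long-term pairing key never changes.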

Ideas? Suggestions? clemensv@microsoft.com or Twitter @clemensv

Categories: Technology

Terminology that loosely ring-fences a group of related technologies is often very helpful in engineering discussions – until the hype machine gets a hold of them. “Cloud” is a fairly obvious victim of this. Initially conceived to describe large-scale, highly-available, geo-redundant, and professionally-managed Internet-based services that are “up there and far away” without the user knowing of or caring about particular machines or even datacenter locations, it’s now come so far that a hard drive manufacturer sells a network attached drive as a “cloud” that allows storing content “safely at home”. Thank you very much. For “cloud”, the dilution of the usefulness of the term took probably a few years and included milestones like the labeling of datacenter virtualization as “private cloud” and more recently the broad relabeling of practically all managed hosting services or even outsourced data center operations as “cloud”.

The term “Internet of Things” is being diluted into near nonsense even faster. It was initially meant to describe, as a sort of visionary lighthouse, the interconnection of sensors and physical devices of all kinds into a network much like the Internet, in order to allow for gaining new insights about and allow new automated interaction with the physical world – juxtaposed with today’s Internet that is primarily oriented towards human-machine interaction. What we’ve ended up with in today’s discussions is that the term has been made synonymous with what I have started to call “Thing on the Internet”.

A refrigerator with a display and a built-in browser that allows browsing the next super-market’s special offers including the ability to order them may be cool (at least on the inside, even when the gadget novelty has worn off), but it’s conceptually and even technically not different from a tablet or phone – and that would even be true if it had a bar code scanner with which one could obsessively check the milk and margarine in and out (in which case professional help may be in order). The same is true for the city guide or weather information functions in a fancy connected car multimedia system or today’s top news headline being burnt into a slice of bread by the mythical Internet toaster. Those things are things on the Internet. They’re the long oxidized fuel of the 1990s dotcom boom and fall. Technically and conceptually boring. Islands. Solved problems.

The challenge is elsewhere.

“Internet of Things” ought to be about internetworked things, about (responsibly) gathering and distributing information from and about the physical world, about temperature and pollution, about heartbeats and blood pressure, about humidity and mineralization, about voltages and amperes, about liquid and gas pressures and volumes, about seismic activity and tides, about velocity, acceleration, and altitude – it’s about learning about the world’s circumstances, drawing conclusions, and then acting on those conclusions, often again affecting the physical world. That may include the “Smart TV”, but not today’s.

The “Internet of Things” isn’t really about things. It’s about systems. It’s about gathering information in certain contexts or even finding out about new contexts and then improving the system as a result. You could, for instance, run a bus line from the suburbs into town on a sleepy Sunday morning with a promise that no passenger ever waits for more than, say, 10 minutes, and make public transport vastly more attractive than a fixed schedule of every 60-90 minutes on that morning, if the bus system only knew where the prospective passengers were and could dynamically dispatch and route a few buses along a loose route.
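
The dispatch idea can be reduced to a toy greedy assignment. Everything here is hypothetical (the names, the travel-time model, the single-pass greediness); a real dispatcher would also re-plan as buses move:

```python
def dispatch(waiting_stops, buses, travel_time, max_wait_minutes=10):
    """Greedily assign each waiting stop to the bus that can reach it
    soonest, honoring the promised maximum wait. Illustrative toy.

    waiting_stops: stop ids where prospective passengers were detected
    buses: dict of bus id -> current position
    travel_time: function (from_position, to_stop) -> minutes
    """
    assignments = {}
    for stop in waiting_stops:
        # Pick the bus with the lowest projected arrival time.
        bus, eta = min(
            ((b, travel_time(pos, stop)) for b, pos in buses.items()),
            key=lambda pair: pair[1],
        )
        if eta <= max_wait_minutes:
            # The ETA is what gets fed back to the display at the stop.
            assignments[stop] = (bus, eta)
    return assignments
```

Stops that no bus can reach within the promised wait simply stay unassigned, which is the signal to dispatch an additional vehicle.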

“Let’s make an app” is today’s knee-jerk approach to realizing such an idea. I would consider it fair if someone were to call that discriminating and elitist as it excludes people too poor to afford a $200 pocket computer with a service plan, as well as many children, and very many elderly people who went through their lives without always-on Internet and have no interest in dealing with it now.

It’s also unnecessary complication, because the bus stop itself can, with a fairly simple (thermographic) camera setup, tell the system whether anyone’s waiting and also easily tell whether they’re actually staying around or wandering away, and the system can feed back the currently projected arrival time to a display at the bus stop – which can be reasonably protected against vandalism attempts by shock and glass-break sensors triggering alarms, as well as by remote-recording any such incidents with the camera. The thermographic camera won’t tell us which bus line the prospective passenger wants to take, but a simple button might. It does, however, make it easy to tell when a rambunctious 10-year-old pushes all the buttons and runs away.

Projecting the bus’ arrival time and planning the optimal route can be aided by city-supplied traffic information, collected by induction loops and camera systems in streets and on traffic lights at crossings, that can yield statistical projections by day of the week and time of day as well as ad-hoc data about current traffic disturbances or diversions and street conditions due to rain, ice, or fog – the latter also supplied by the buses themselves (‘floating car data’) as they’re moving along in traffic. The route is also informed by the bus driver’s shift information, the legal and work-agreement-based needs for rest times during the day, as well as the bus’ fuel or battery level or other operational health parameters that may require a stop at a depot.

All that data informs the computation of the optimal route, which is provided to the bus stops, to the bus (-driver), and to those lucky passengers who can afford a $200 pocket computer with a service plan and have asked to be notified when it’s time to leave the corner coffee shop in order to catch the next bus in comfort. What we have in this scenario is a set of bidirectional communication paths from and to bus, bus driver, bus stop, and passengers, aided by sensor data in streets and lights, all connecting up to an interconnected set of control and information systems making decisions based on a combination of current input and past experience. Such systems need to ingest, process, and distribute information from and to tens of thousands of sources at the municipal level, and for them to be economically viable for operators and vendors they need to scale across thousands of municipalities. And the scenario I just laid out here is just one slice out of one particular vertical.

Those systems are hard, complex, and pose challenges in terms of system capacity, scalability, reliability, and – towering above all – security that are at the cutting edge or often still beyond the combined manufacturing (and IT) industry’s current abilities and maturity.

“Internet of Things” is not about having a little Arduino box fetch the TV schedule and sounding an alarm when your favorite show is coming on.

That is cool, but it’s just a thing on the Internet.

Categories: Architecture

Just replied yet again to someone whose customer thinks they're adding security by blocking outbound network traffic to cloud services using IP-based allow-lists. They don't.

Service Bus and many other cloud services are multitenant systems that are shared across a range of customers. The IP addresses we assign come from a pool, and that pool shifts as we optimize traffic from and to datacenters. We may also move clusters between datacenters within one region for disaster recovery, should that be necessary. Another reason why we cannot give every feature slice its own IP address is that the world has none left: we’re out of IPv4 address space, which means we must pool workloads.

The last points are important ones and also show how antiquated the IP-address lockdown model is relative to current datacenter operations practices. Because of the IPv4 shortage, pools get acquired, traded, and changed. Because of automated and semi-automated disaster recovery mechanisms, we can provide service continuity even if clusters, datacenter segments, or even whole datacenters fail, but a client system that’s locked to a single IP address will not be able to benefit from that. As the cloud system packs up and moves to a different place, the client stands in the dark due to its firewall rules. The same applies to rolling updates, which we perform using DNS switches.

The state of the art of no-downtime datacenter operations is that workloads are agile and will move as required. The place where you have stability is DNS.

Outbound Internet IP lockdowns add nothing in terms of security, because workloads increasingly move into multitenant systems or systems that are dynamically managed, as I’ve illustrated above. As there is no warning, a rule may be correct right now and point to a foreign system the next moment. The firewall will not be able to tell. The only proper way to ensure security is to make the remote system prove that it is the system you want to talk to, and that happens at the transport security layer. If the system can present the expected certificate during the handshake, the traffic is legitimate. The IP address per se proves nothing. Also, IP addresses can be spoofed and malicious routers can redirect traffic. The firewall won’t be able to tell.

With most cloud-based services, traffic runs via TLS. You can verify the thumbprint of the certificate against the cert you can either set yourself, obtain from the vendor out-of-band, or acquire by hitting a documented endpoint (in Windows Azure Service Bus, it's the root of each namespace). With the messaging system in Service Bus, you are furthermore encouraged to use any kind of cryptographic mechanism to protect payloads (message bodies). We do not evaluate those for any purpose. We evaluate headers and message properties for routing. Neither is logged beyond being held temporarily in the broker.
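
Checking such a thumbprint at connect time can be sketched with Python's standard library. The host and the pinned value are placeholders you'd substitute with your endpoint and the thumbprint obtained out-of-band; the choice of SHA-256 as the fingerprint hash is mine, not prescribed by the service:

```python
import ssl
import socket
import hashlib

def thumbprint(der_bytes):
    """Hex SHA-256 thumbprint of a DER-encoded certificate."""
    return hashlib.sha256(der_bytes).hexdigest()

def peer_cert_thumbprint(host, port=443):
    """Connect via TLS and return the thumbprint of the peer's certificate."""
    context = ssl.create_default_context()
    with socket.create_connection((host, port)) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            der = tls.getpeercert(binary_form=True)  # raw DER bytes
    return thumbprint(der)

def verify_pinned(host, pinned_thumbprint, port=443):
    # Compare the live thumbprint against the one obtained out-of-band;
    # a mismatch means you are not talking to the system you expect.
    return peer_cert_thumbprint(host, port) == pinned_thumbprint.lower()
```

Unlike an IP allow-list, this check keeps working when the service moves between clusters or datacenters, because it verifies the endpoint's identity rather than its address.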

A server that needs access to Service Bus should be granted outbound Internet access based on the server's identity or the running process's identity. This can be achieved using IPsec between the edge and the internal system. Constraining traffic to the Microsoft datacenter ranges is possible, but those ranges shift and expand without warning.

The bottom line here is that there is no way to make outbound IP address constraints work with cloud systems or high availability systems in general.

Categories: Technology

Messaging with Windows Azure Service Bus

Windows Azure Service Bus offers a rich set of messaging capabilities in the cloud as well as on-premises. This session discusses some of the advanced messaging capabilities in Service Bus. Join us to learn about publish-subscribe patterns, using Service Bus sessions, interoperability with AMQP, scaling with Service Bus, and messaging strategies for server-to-cloud federation. >> Channel 9

Connected Clients and Continuous Services with Windows Azure Service Bus

Most applications today involve “connected clients”—smart phones, tablets and special-purpose devices—that extend the applications’ reach out to users or assets that can be located anywhere. In this session, we will explore the common messaging challenges associated with building such apps—including mobile user engagement, location transparency/addressability, integrating with a diverse set of client platforms, and providing a common model for client auth. Taking these challenges one-by-one, we will delve into the rich options provided by the Azure Service Bus for building “connected clients”. >> Channel 9

Categories: TechEd Europe

I published a new video over on Subscribe about the "Internet of Things". Check it out.

Categories: Architecture

We're talking a lot about "Mobile" solutions in the industry, but the umbrella that this moniker casts has become far too big to be useful and doesn't represent any particular scenario subset that's useful for planning services for "mobile" devices. Nearly every personal computing scenario that consumers encounter today is "mobile".

This post is a personal perspective on "mobile" applications and how applications that run on devices labeled under this umbrella really enable a range of very different scenarios, and do require a different set of backend services, depending on the core scenario.

From this perspective, I present a taxonomy for these experiences that may be helpful with regards to their relationship to cloud services: Mobile, Outside, Inside, and Attached.

  • Mobile applies only to scenarios where I am literally mobile. I'm moving.
  • Outside applies to all scenarios where I'm away from my office or my house, but at rest.
  • Inside applies to all scenarios where I'm at the office or at the house, but potentially roaming.
  • Attached applies to all scenarios where the experience is immovably attached to a physical location, device, appliance, or other machinery.


As soon as I get into my car and drive, my $700 phone is not really all that useful. Things will beep and ring and try to catch my attention, but they do that without respecting my physical world situation where I probably shouldn't pay much if any attention.

That said – the phone does integrate with my car's entertainment system, so that I can listen to music, podcasts, and Internet streams, and the phone functionality also integrates for a hands-free experience. It also reads text messages to me out loud as they arrive when the phone is paired with my car. That all happens because the OS supports these core functions directly.

In the case of my particular car, a 2013 Audi A6 with MMI Touch and Audi Connect services (I'm not at all meaning to boast here), the entertainment system even has its own phone SIM, so all my phone gets to contribute is its address book and the music/audio link for playing songs from the phone. Text messages, phone communication, and the car's built-in navigation features, including live traffic data, are all natively supported by the vehicle without needing the phone's help.

To help make sure that the traffic data is accurate, the vehicle sends – today; this isn't science fiction – anonymous motion information, so-called "floating car data" telemetry, into a data pool where it gets analyzed and yields real-time information about slowdowns and traffic jams, complementing stationary systems.

If you need to catch my attention while I am mobile in the sense of 'in motion' you either have to call me and hope that I choose to pick up and am not talking to someone else already, leave me a voice mail, send me a text message and hope I'll call back or, otherwise, wait. A text message will reach me right in the central dashboard display of my car.

If you send me a Twitter message or send me something on Facebook or send me an Email I most certainly won't see it until it's safe for me to take my eyes off the street – because it is much less targeted and not integrated in the experience that my personal safety depends on in that situation.

When I'm walking on the Microsoft campus or I'm at an airport in line to board a plane, it's very similar. You can try to reach me via any of these channels, but it's not too unlikely that I'll make you wait when the immediate circumstances demand my attention. Boarding that plane or getting to the next building in time for a meeting with 20 people while it's raining outside is likely of higher urgency than your message – I'm sorry.

A 'mobile' experience is one that supports my mobility and the fact that my primary focus is and must be elsewhere. It can augment that experience but it must not attempt to take center stage because that ought not to be its role. The "My Trips" app for TripIt.com on Windows Phone is a near perfect example of an experience that is truly tailored to mobility. The app doesn't make me ask questions. It knows my itinerary and anticipates what info I will need the next time I look at the live tile.

When I'm arriving at an airport, it will have looked up my connecting flight and will have sent a notification (or repeatedly tried to send one) to fill the Live Tile with information about the connecting flight's status and gate. I don't even have to open the app. If there are critical disruptions, it will send me a Toast notification that comes with an audible alarm and vibration to help get my attention.

Avis, the rental car company, does the same thing via email and also their app, since I'm a "Preferred" customer. Just before the scheduled pick-up time, which they can also adjust since I give them my flight info, I get a timely email with all the information I need to proceed straight to the stall where my rental car is parked, and I will find that email within the last handful as my plane lands. I proceed to the rental car facility, get into the vehicle, and get the rental agreement slip as I exit the facility, presenting my driver's license. No need to ask for anything; the system anticipates what I'll need and it excels at that.

The phone's calendar is obviously similar. It will show me the next relevant appointment including the location info so that's available at a glance when I just look at the phone while I'm walking to another building; and it will provide the most recent updates so if the meeting gets moved between rooms as I'm on my way then I'll see that reflected on the lock screen.

All these mobile experiences that I'm using today as I'm traveling share that they are decoupled, asynchronous, often time-driven, and message-based. I don't ask for things. I answer and react to what needs my urgent attention, and otherwise I observe and then "get to it" when I truly have time to focus on something other than getting from A to B and being mobile. Mobility is driven by messaging, not by request/response.


Being on the road doesn't literally mean driving all the time, of course. Once I sit down and indeed start interacting with a device in order to read email, go through my other messages, read or watch news, or get some work done, I am still outside of the office or the house, but I am not on the move. I am at rest in relative safety and can pay closer attention to the interaction with my information device.

The shape of that interaction differs from the pure mobile experience in that I commonly ask questions and interact with the device, with focus on the device experience. That includes everything from browsing the news, to researching on Wikipedia, to watching training videos, to enjoying a movie. Listening to podcasts or the radio is also one of those experiences, even if we often do so while on the move, i.e. walking or driving, since we're instantly able to turn our attention to more important matters as needed – like an approaching ambulance – if we manage the audio volume appropriately for the situation.

The outside experience is one where I can indeed get at most of my data assets, as much of it is readily accessible from anywhere since it's stored on the cloud or networked and accessible via VPN. Whether the device I am using to access that data is connected via 3G, LTE, WLAN, or wired Ethernet, and whether the screen is 5" or 27" is largely a question of what sort of an experience I'm looking for, and how big of a device I want to carry to where I'm going.

For many, if not most consumers, this outside experience is often the preferred interaction mode with their devices – and when they own only a single device it's largely indistinguishable from the Inside experience that I'll expand on in the next section. They sit in a cafe or elsewhere comfortable with connectivity, make notes, write email, hatch plans, capture snippets of their life in photos or videos and share them with friends through Instagram, Twitter, or Facebook.

For me, the Outside experience is, however, quite different from the Inside experience, because it's constrained in two key ways: First, while and when connectivity is available, it's commonly either metered or provided on someone else's terms, whether free or paid, which means I don't get a say in bandwidth and quality, and the bandwidth may be seriously constrained, as it is, for instance, in most hotels.

What Outside also often means is that connectivity is sparse or non-existent. If I'm traveling and outside the country where I have my primary data contract, I will pay a platinum-coated-bits premium for data. Therefore I find myself Hotspot-hopping quite a bit. Outside may also mean that I'm going away from the core coverage zones of wireless networks, which means that I might quite well end up with no reliable access to network services because I'm either in a remote valley or inside the Faraday-Cage hull of a ship. It might also mean that I am in a stadium with 52,000 other people who are trying to use the same set of cell towers – which is the case about every two weeks for me.

Second, what I am connecting to is a shared network that I cannot trust, which is not well suited for easy discovery and sharing scenarios that rely on UPnP/SSDP and similar protocols.

From an infrastructure perspective, apps that focus on Outside experiences work best if they can deal with varying quality and availability of connectivity, and if they are built to hold and/or access data in a way that is independent – for better and worse – of the scope and sandboxing provided by the local network that I'm connecting to. Thus, Outside experiences are best suited for using cloud-based services.


The Inside experience is much like the Outside one but with the key difference that I either directly own the environment that I'm connecting into or that I at least have reason to trust the owners and anyone else they allow to connect to the environment. That's true for my home network and it's also true, even though with a few caveats, for the office network.

The Outside/Inside split and a further differentiation of Inside into work and home environments is also what the Windows Firewall uses to categorize networks. The public, outside networks are on the lowest trust level, domain networks are a notch higher, and private networks are most trusted.

The experiences that I use on my Inside network at home are indeed different from the experiences I use when Outside. Xbox Smart Glass is a pure Inside experience that pairs my mobile device with my Xbox as a companion experience. Xbox connects to my Windows Media Center to make my DVB-S2 SmartCard tuner available in the guest room; I have a remote control on my phone for my Onkyo A/V receiver; I have IPTV apps with which I can tune into HDTV streams available through my Internet service; and I use file sharing to access my multi-TB local photo archive.

A great Inside-experience needs services that are very similar to those of Outside experiences, including state-roaming between devices, and even more so support for seamless multi-device "continuous client" experiences – but they are not necessarily cloud-bound.


Some of the latter Inside experiences, especially the photo archive, are on the brink of being Attached scenarios. Since I'm shooting photos in RAW and video in 1080p/50, I easily bring home 30 GB or more from a day out at a museum or air show, and I tend to keep everything. That much data develops quite a bit of gravitational pull, meaning it's not easily moved around.

What's not easily moved around, at all, are experiences that depend on a particular physical asset located at a particular place. The satellite dish at my house is something I need to be close to or go to (in the network sense) in order to get at content that is exclusively delivered via that channel. It also has to be decoded with that one precious smart card that I rent from the Pay-TV provider.

If I had surveillance cameras and motion sensors around the house (I'll let you speculate on whether I really do), those cameras and sensors are location locked and I need to go to them. I can conceivably take a WLAN hub and my Xbox when I go on a vacation trip (and some people do) to make an Inside experience at a hotel room, but I can hardly take the satellite dish and the cameras.

In the business world, even when interacting with consumers, there are plenty of these immobile experiences. An ATM is a big and heavy money safe designed to be as immobile as possible that is equipped with a computer that controls how much cash I can take from that safe. A check-in terminal at an airport makes sense there as a shared experience because it gives me a printout of a document – the boarding pass – that I can use to document my authorization to travel on a particular flight. That's convenient, since paper doesn't run out of battery.

What's particularly noteworthy is that some attached experiences, such as the huge center screen in Tesla Motors' Model S, are attached and inseparable from the larger context, and yet fulfill a mobility role at times – and at other times they function like an outside appliance.

We encounter "attached" experiences while we are mobile, but they're stationary in their own context. That context may, however, be mobile if the attached experience is an in-flight entertainment system or an information terminal on a train.


The Mobile, Inside, Outside, Attached terminology may be a tad bit factual and dry, but I believe it's a useful taxonomy nevertheless. If you have a set of catchier monikers I'm all ears. Let me know whether you find this useful.

Categories: Architecture