Internet of Things: Is VPN a False Friend?
"Internet of Things" (IoT) is the grand catchphrase for network-enabling everyday objects and leveraging the new connectivity to collect information from the devices, allowing network-side control, and supplying information to those objects that allows them to do new tricks – like telling a toaster about the day's weather forecast so that it can burn a sun or a cloud into your morning slice of bread.
The opportunities and the use-cases in this space are almost limitless. Network-enabled commercial vehicles, or even subsystems like engines or brake systems, can leverage the connectivity for conveying servicing information and predictive failure analysis, for route optimization, and for driver-safety programs. Devices attached to power grid components can provide deeper insight into the health of even decades-old equipment, and they can help with managing capacity now that consumers turn producers with wind-power generators and the flow of electricity has become a two-way flow. Smart devices will also help consumers to closely track their own energy consumption and, obviously, automate aspects of their households.
The "Internet of Things" wave will span many more industries and I could easily go on for many pages with more examples of which none are science-fiction. They're real and either already deployed in small scale or on the drawing boards of engineers under active development.
What's decidedly different about this new Internet wave is that it is driven by an entirely different class of people and companies than the wave of the consumer Web. The "Internet of Things" is (surprise!) about "Things" and the drivers are the makers of such things: everyday devices and machines that consumers and businesses are buying today and that the manufacturers are looking to make better for tomorrow.
The Web is the grand success that it is because it was built for and on people-centric, general-purpose computing devices. It's also largely focused on people's interaction with information stores and sources, meaning that the impulse for interaction is usually coming from the end-user. Where user-initiated requests and the subsequent, hypertext-driven requests rooting from the same impulse are the overwhelming traffic motivation, the dominant HTTP protocol with its focus on request/response exchanges initiated by the client is a logical best fit.
Web technology also provides for an enormously low barrier of entry for new commercial providers of software-based services, as those are running on commodity hardware and solution builders can pick from a great choice of commoditized software platforms.
The Internet of Things is not the Web
There are some good reasons to believe that the Internet of Things will be different from the Web.
To play in this space, companies typically start with a set of existing physical products, a set of use-cases around these products, and very often an established and loyal customer base that's looking for new capabilities in the things they already use for their business or personal lives. There are a good number of startups operating in this space, but disrupting and displacing incumbents in the maritime industries, commercial vehicles, specialized production machinery, or high-speed train manufacturing may quite well happen at the component supplier level, but is harder to see coming at the product level.
If you are operating fleets of seagoing vessels, taxis, or trucks, or you run an electricity grid or manage street-lamps, there's a clear set of primary use-cases, like shipping things from A to B, along with well-established industries supplying machinery for these use-cases, and the digital component is a value add, not a purpose in itself.
A 40,000-foot view onto the topology of a network of devices may look very similar to the topology of the network of the Web, with an interconnected core of services and a peripheral cloud of clients. As you get closer, though, the similarities start waning. While the Web is geared towards a primary interaction pattern where the clients initiate all activities and interaction is typically some form of information exchange – be that a query being traded for a result list or a data-set update traded for a receipt – the interaction patterns are more differentiated for special-purpose devices where direct human interaction is not in focus.
I generally classify the interaction patterns for 'things' into four major categories: Telemetry, Inquiries, Commands, and Notifications.
- Telemetry is the flow of information about the current or temporally aggregated state of the device or the state of its environment (e.g. readings from its sensors) from the device to some other party. The information flow is unidirectional and away from the device.
- Inquiries are questions that the device has about the state of the outside world based on its current circumstances; an inquiry can be a singular query akin to a database lookup, but it might also ask a service to supply a steady flow of information. For instance, the aforementioned toaster will ask for the weather and get a singular response, but a vehicle might supply a set of geo-coordinates for a route and ask for continuous traffic alert updates about that particular route until it arrives at the destination. Only the former of these cases is the regular request/response case that HTTP is geared towards.
- Commands are service-initiated instructions sent to the device. Commands can tell a device to send information about its state – either as a point-in-time observation or continuously over some period – or to change the state of the device, including performing activities with effects in the physical world. That includes, for instance, sending a command from a smartphone app to unlock the doors of your vehicle, whereby the command first flows to an intermediating service and from there is routed to the vehicle's onboard control system.
- Notifications are one-way, service-initiated messages that inform a device or a group of devices about some environment state they're otherwise not aware of. Cities may broadcast information about air pollution alerts suggesting fossil-fueled systems to throttle CO2 output – or, more simply, a car may want to show weather or news alerts or text messages to the driver.
For all four interaction categories it is, except for that one request-response sub-scenario, a clear requirement to have bi-directional information flow that can be client-initiated or server-initiated, depending on the particular function's need.
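To make the classification a bit more tangible, here is a minimal sketch of the four categories as data. The type names (Party, Reply, Pattern) and the needs_service_push helper are made up for this illustration, and treating a Command as having a single reply is a simplification of my own. The sketch only shows which patterns, as I read the list above, need a path on which the service can push messages to the device.

```python
from dataclasses import dataclass
from enum import Enum


class Party(Enum):
    DEVICE = "device"
    SERVICE = "service"


class Reply(Enum):
    NONE = "none"      # one-way, nothing flows back
    SINGLE = "single"  # one response per request
    STREAM = "stream"  # a flow of messages over some period


@dataclass(frozen=True)
class Pattern:
    name: str
    initiator: Party
    reply: Reply


# Illustrative encoding of the four categories; Inquiries are split into
# their two sub-scenarios. Command replies are simplified to SINGLE.
PATTERNS = [
    Pattern("Telemetry", Party.DEVICE, Reply.NONE),
    Pattern("Inquiry (single lookup)", Party.DEVICE, Reply.SINGLE),
    Pattern("Inquiry (continuous updates)", Party.DEVICE, Reply.STREAM),
    Pattern("Command", Party.SERVICE, Reply.SINGLE),
    Pattern("Notification", Party.SERVICE, Reply.NONE),
]


def needs_service_push(pattern: Pattern) -> bool:
    """True if the service must be able to send to the device unprompted."""
    return pattern.initiator is Party.SERVICE or pattern.reply is Reply.STREAM


push_patterns = [p.name for p in PATTERNS if needs_service_push(p)]
print(push_patterns)
```

Under these assumptions, three of the five rows need a service-to-device path, which is exactly what plain client-initiated request/response doesn't give you.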
A requirement that indirectly grows out of the need for server-initiated flow (like telling your vehicle to toot a tune on its horn using a smartphone app when you've forgotten in what corner of which level of the structure you parked at the airport a week ago) is that there must be a continuously maintained traffic route towards the client. That route may only have to carry a few messages per week, but whenever a message is sent, the expectation is that the latency is in the order of a few seconds.
Because the scenarios around 'things' are quite different from those of the Web, where the focus is on people and their interactions, there's quite a bit of a risk that its well-known technologies turn into false friends. Are they a fit because they're ubiquitous?
VPN to the Rescue?
The standard Web interaction model, where the client initiates and a service responds, is just one of several different patterns that are required for connected and non-interactive 'things', and this presents quite a bit of a challenge. How would I send a service-originated command or a notification to a connected device?
One option would indeed be HTTP long-polling or Web Sockets. The client could establish such a connection and the service would hold on to it and subsequently route all service-originated messages for the client through the established channel. That's a reasonable strategy, even if that introduces a solvable, but tricky service-side routing challenge of how the service will route to that ephemeral socket or pending request across a multi-node fabric.
But because that's tricky, many people in the devices space seem to go down a different route today: They're turning the devices into servers – and suddenly that routing problem is magically solved. If I can send a notification or command to a device by way of issuing an HTTP request to it, I could use off-the-shelf components and HTTP to implement both the client-to-service Telemetry/Inquiry path and the service-to-client Command/Notification path. Or even if I'm not into HTTP, I could just use whatever standard or proprietary protocol I like, and still treat either party as a server from the respective other party's perspective.
That's enormously attractive. It also poses a new challenge. How do I turn a truck or an off-shore wind-turbine into an addressable server endpoint? The answer to that question is, across many industries and companies, in unison, "VPN".
Virtual Private Networks (VPN) provide a link layer integration model between network participants. Expressed in a more pedestrian fashion, a VPN is akin to hooking everyone connected to the VPN onto the same Ethernet hub, whereby secured public Internet connections act as the network cables. Because the VPN illusion is created down at the link level and largely equivalent to having a network adapter on that network, participants on the VPN can speak practically any protocol, including but not limited to IPv4 and IPv6, and all protocols that ride on top of those two.
The steps for making a field-deployed device network addressable – assuming it supports VPN – are fairly straightforward: The device first establishes an external network identity that allows it to connect either to the public Internet or, as it is sometimes done for GPRS/3G/LTE devices, to a carrier-provided closed network by way of a dedicated in-network access point. Then it establishes the VPN tunnel by connecting to the VPN gateway's endpoint, which either resides on the Internet or on the closed network. Once the tunnel is established, the device is connected to a separate, second network: the VPN.
Assuming that network is an IP network, the device either already has a pre-assigned address or requests an address lease from the network's DHCP service and is then a fully addressable network participant within the private address space of the VPN. If we further assume that the service that wants to address the device has direct or routed access to the VPN's address space, the service can now directly address the device and talk to any endpoints the device may be listening on. And because all of the tunnels into the VPN are secure, all the traffic exchanged between any of the parties is automatically secure without taking any extra precautions. Perfect solution. Is it?
Where's the Catch?
The biggest issue with the VPN approach for field-deployed devices lies where many people would expect it least: Security. That might be a surprise, as VPN is often seen as almost synonymous with a "secure network space", which is not a proper way to look at it.
VPN provides a virtualized and private (isolated) network space. The secure tunnels are a mechanism to achieve an appropriately protected path into that space, but the space per-se is not secured, at all. It is indeed a feature that the established VPN space is fully transparent to all protocol and traffic above the link layer.
In the two predominant use-cases for VPN technology, its transparency is clearly desirable: The first use-case is the integration of corporate satellite assets like notebooks into secure networks. The second key use-case is inter-datacenter links fusing datacenter or application-scope networks over the public Internet. In the latter case, the connected parties are presumably following datacenter best-practices for physical and network access control. In the former case, the client is commonly in the possession of an authorized employee or vendor, requires individual user credentials for access, is often protected with a smartcard, is subject to device-level encryption, and often allows some degree of remote control including remote wipe if the asset becomes compromised. In both cases, the assets connecting into the VPN are either under immediate control of personnel authorized by the VPN owner, or there are several layers of safeguards in place to prevent access to the VPN should the assets become compromised.
The security of a virtual network space solely depends on controlling and securing all assets that connect into it, which obviously includes physical access security.
Now imagine you're an energy utility company planting a farm of wind-turbines into a field on a remote hill. Or imagine you're a city planting environmental sensors for pollution, humidity, barometric pressure, and temperature onto rooftops. Or imagine you're a manufacturer selling network-attachable kitchen appliances to the general public.
And now imagine that the way you're creating bi-directional connectivity to these devices and to make them addressable is by mapping them into a VPN, together with your services and any other such device – at the link layer.
How much can you trust or control that these devices don't get physically hijacked and compromised? What's the attack surface area of your services and the neighboring devices in case that were to happen?
It's one of the key principles of security that whoever has physical possession of a device owns ("pwns") the device from a security perspective. If you're handing complete strangers networked devices that can log themselves into your VPN based on secrets present on the device, you should expect that you'll eventually have unauthorized link-level visitors in the private network that you will have to be prepared to defend against – and you'll have to defend the device's neighbors as much as the services you map into the same private network space.
The security measures you'll have to put in place for this eventuality are largely equivalent to securing the services and devices as if they were directly attached to the public Internet. If you get uninvited visitors who exploit a device, you will have to assume malicious intent; these intruders will not show up by accident. Therefore, you'll have to firewall all devices, and you have to put authentication and access control measures on all exposed service endpoints. You'll also have to ensure that whatever service software stack is running on the device is "Internet hardened" and that you have an appropriate avenue to promptly distribute security updates, should that become necessary.
In addition to the security challenges, only some advanced VPN setups, such as IPsec with IKEv2 and the MOBIKE extension (RFC 4555), allow for seamless handling of connection failure scenarios, client network roaming, and reconnect. With devices on unreliable or highly congested networks, or devices that are used in mobility scenarios where connections may be interrupted because of signal interruptions, a VPN client without this support will incur the cost of having to reestablish the tunnel and the VPN session whenever the connection drops. That, in turn, can lead to routing confusion when a client drops, reconnects, and shows up on a different load-balanced VPN router while some service-side component wants to send data to the device.
Lastly, VPN is very resource hungry for establishing data tunnels to hundreds of thousands or more small devices that send and receive relatively few and usually fairly small messages each. It's demanding on the client in terms of the required stack and the processing needs, which may be a problem for small embedded devices. It's also enormously resource consuming on the service-side. Current, dedicated hardware for managing 10,000 simultaneous VPN tunnels can easily cost over $100,000 USD with single redundancy for one site.
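To put that number into perspective, here is a back-of-the-envelope sketch. It takes the figures from the paragraph above as a lower bound and assumes, purely for illustration, that cost scales linearly in units of 10,000 tunnels, with no volume discounts and a single site. The function name and the fleet size of 500,000 devices are made up.

```python
import math

TUNNELS_PER_UNIT = 10_000    # capacity figure from the text
COST_PER_UNIT_USD = 100_000  # lower bound from the text ("over $100,000")


def gateway_cost_floor_usd(devices: int) -> int:
    """Lower-bound hardware cost for one site, assuming linear scaling."""
    units = math.ceil(devices / TUNNELS_PER_UNIT)
    return units * COST_PER_UNIT_USD


fleet = 500_000  # hypothetical device population
total = gateway_cost_floor_usd(fleet)
print(f"{fleet} devices: at least ${total:,} in VPN gateways, "
      f"or ${total / fleet:.2f} per device")
```

Under these assumptions the gateway hardware alone comes to at least $10 per device for one site, before operations, bandwidth, or a second site for disaster recovery.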
As you contemplate the complexity consequences, it is possible you'll come to the conclusion that creating a VPN for the connected devices scenario may not be the obvious best choice that it seemed to be at first.
Alternatives
Even with the laid out constraints, VPN might be a viable model to enable two-way connectivity, if you're willing to make the right security investments on top, and if the devices are capable enough.
If a VPN solution and its consequential complexities turn out to be too heavyweight for you, you'll again have the problem of how to make the devices individually addressable in order to send a service-originated command or a notification to a connected device.
As mentioned, one possible alternative would be an HTTP long-polling or Web Sockets based gateway, where the client establishes a connection that the service holds on to, and subsequently routes service-originated messages through the established channel back to the client.
The advantage of this model is that the client will not have to be directly addressable. If the gateway is hosted on a public-facing Internet address, the client can establish a connection coming through any layers of NATs and/or Proxies and/or Firewalls and the service can route information back over that established link through these intermediary infrastructures.
What this model still wouldn't solve well is the case of clients that get occasionally disconnected due to weak wireless signals or congested networks. The client can park one of these sockets on the gateway and make itself known, but if the connection collapses and the device is out of reach for a little while and a message arrives for it, where does that message go?
Also, once the client comes back it may connect to a different gateway machine by way of a load balancer, so if you're retaining that message on the original gateway node, you'd now have to route it to the current gateway node. Because the 'current' node may shift if the device repeatedly connects and disconnects when it is, for instance, at the edge of the wireless coverage area, chasing the client gets fairly complicated. And that's something you'd have to build.
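The following toy model illustrates the problem. It is not a real gateway: the GatewayNode class, its methods, and the device name "truck-17" are invented for this sketch, and each node simply keeps its connections and parked messages in local memory, which is the naive design the paragraphs above warn about.

```python
from collections import defaultdict


class GatewayNode:
    """One gateway machine; sockets and parked messages live in local memory."""

    def __init__(self, name):
        self.name = name
        self.connected = set()
        self.parked = defaultdict(list)

    def connect(self, device_id):
        """Device parks a socket here; flush whatever THIS node has parked."""
        self.connected.add(device_id)
        return self.parked.pop(device_id, [])

    def disconnect(self, device_id):
        self.connected.discard(device_id)

    def send(self, device_id, message):
        """Deliver if the device is connected to this node, else park locally."""
        if device_id in self.connected:
            return True
        self.parked[device_id].append(message)
        return False


node_a = GatewayNode("a")
node_b = GatewayNode("b")

node_a.connect("truck-17")        # device parks its socket on node a
node_a.disconnect("truck-17")     # signal drops
delivered = node_a.send("truck-17", "unlock doors")  # parked on node a

# The load balancer sends the reconnecting device to node b instead.
received = node_b.connect("truck-17")
stranded = list(node_a.parked["truck-17"])

print(delivered, received, stranded)
```

The message sits on node a while the device waits on node b. Fixing that means adding shared state or node-to-node forwarding, and that is the part you'd have to build yourself.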
Messaging
A very practical and fairly straightforward solution to the entire problem space is to use a scalable and Internet-facing messaging system using a bi-directional and multiplexing protocol like AMQP to facilitate the outbound as well as the inbound device traffic.
If each device is assigned an exclusive queue or a filtered subscription on a pub/sub topic for messages to it, the addressing problem is moved from the edge, where the device (VPN) or its connection (Web Sockets) must be identified, to one where messages for a device get routed to a well-known and stable location in the system from which the device can pick them up as it can, depending on connectivity state. When a message is sent and the device is connected and waiting for a message, the message can be delivered in a matter of a few milliseconds. If the device is temporarily offline, it can pick up messages whenever it regains network access – unless the message expires, which is an option in most common messaging systems so that a command like "unlock door" isn't executed to everyone's surprise a day later if the device was disconnected for that long.
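The 'mailbox' idea with message expiration can be sketched in a few lines. This is an in-memory stand-in, not the API of any actual messaging system or AMQP client; the Mailboxes class and its send and receive methods are hypothetical, and time is passed in explicitly so the behavior is easy to follow.

```python
from collections import defaultdict, deque
from dataclasses import dataclass


@dataclass
class Message:
    body: str
    expires_at: float  # absolute time in seconds; later than this, drop it


class Mailboxes:
    """One queue per device at a stable, well-known location."""

    def __init__(self):
        self._queues = defaultdict(deque)

    def send(self, device_id, body, now, ttl):
        """Enqueue for the device whether or not it is currently connected."""
        self._queues[device_id].append(Message(body, now + ttl))

    def receive(self, device_id, now):
        """Drain the device's queue, silently discarding expired messages."""
        queue = self._queues[device_id]
        fresh = [m.body for m in queue if m.expires_at > now]
        queue.clear()
        return fresh


boxes = Mailboxes()
# The device is offline while both messages are sent at t = 0.
boxes.send("car-42", "unlock door", now=0, ttl=30)          # 30 seconds
boxes.send("car-42", "service reminder", now=0, ttl=86400)  # one day

# The device regains connectivity an hour later.
picked_up = boxes.receive("car-42", now=3600)
print(picked_up)
```

The sender never needs to know where the device is or whether it is online. It only needs the name of the mailbox and a sensible time-to-live for each kind of message.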
Since the devices will pull messages from the messaging system and send messages on the same path, and with AMQP also over the very same multiplexed connection, the communication path – likely enveloped with SSL/TLS – is as secure as a VPN tunnel or an HTTPS-wrapped Web Socket. It also has the same advantages as the Web Socket path in terms of not exposing the client to unwanted traffic, because all connections are established outbound from behind existing protection layers.
From a scalability perspective, a scalable pub/sub system with addressable entities and well-known scale characteristics also provides a good structure to allow for cleanly partitioning devices and device-groups across as many queues and topics as needed to accommodate a large device population.
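One simple way to spread a large device population across a fixed set of queues or topics is to hash the device identifier. This is just one possible scheme and an assumption on my part, not something a particular messaging system prescribes; the partition_for function, the "devices-NN" naming, and the count of 16 partitions are made up for the example.

```python
import hashlib


def partition_for(device_id: str, partitions: int) -> int:
    """Map a device id to a stable partition index in [0, partitions)."""
    digest = hashlib.sha256(device_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % partitions


def topic_for(device_id: str, partitions: int = 16) -> str:
    """Hypothetical naming convention for the topic a device subscribes to."""
    return f"devices-{partition_for(device_id, partitions):02d}"


# Every sender and the device itself compute the same topic name.
print(topic_for("truck-17"))

# Rough check of how evenly 10,000 made-up ids spread over 16 partitions.
counts = [0] * 16
for i in range(10_000):
    counts[partition_for(f"device-{i}", 16)] += 1
print(min(counts), max(counts))
```

The catch with plain modulo hashing is that changing the number of partitions remaps most devices, so you'd either pick the count generously up front or keep an explicit device-to-partition directory instead.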
Conclusion
Using VPNs for device connectivity is viable if the solution addresses the inherent security issues. Using a VPN doesn't equate to creating a secure network space. It creates a virtual network space with full fidelity at the link layer with protected paths into that network space. That's a big difference. The tax that needs to be paid for VPN support on the client is not insignificant, and securing the virtualized network doesn't pose a smaller challenge than securing Internet-exposed devices, especially when those devices are outside the manufacturer's or operator's immediate physical control.
After weighing these costs, a solution that builds purely on simple, client-originated connectivity with overlaid transport layer security is not only much simpler, but also doesn't carry the same security risks or infrastructure tax.
Using a messaging system as the gateway technology for these client-originated connections where each client has a designated 'mailbox' in form of a queue or topic also elegantly solves the addressability issue with the added benefit of being resilient against occasional connection loss – while not causing significant extra latency cost or overhead.
For a walkthrough on how to architect a system of this kind, I recommend taking a look at my June 2012 MSDN Magazine article (which may have been published a year before its time). You can expect more on this topic here in the upcoming months.