I commented on this blog post on LinkedIn last week; beyond the platform claims, it’s worth spending a little time on the data claims as well. Among the author’s arguments is the claim that IoT data must be “liberated”.
First, the author clearly misunderstands how today’s IoT platforms and the cloud are designed to handle data. This quote stands nicely by itself without further comment:
The client-server model underlying today’s computing systems greatly compounds the problem. Regardless of data-structure, information in today’s computing systems is machine-centric because its life is tied to the life of a physical machine and can easily become extinct.
Amazingly, in the very same article, the author “visionarily” praises a future where “dispersed computing devices” become authoritative data stores:
Dispersed computing devices will become unified application platforms from which to provide services to devices and users where the applications run, where the data is turned into information, where storage takes place, and where the browsing of information ultimately takes place too.
… today’s systems will not be able to scale and interact effectively where there are billions of nodes involved. The notion that all these “things” and devices will produce streaming data that has to be processed in some cloud will simply not work.
The described “dispersed computing devices” are obviously not physical machines that can fail. Or that need more storage. Or that need more compute. Or that need data redundancy. Or any other sort of fallback mechanism if they happen to go poof.
The notion that serious distributed business systems that need to fulfill reliability goals and must be operated securely will happily rely on broadly “dispersed computing devices” that hold authoritative information is naive. The underlying notion of “dispersed” reflects a deeply outdated worldview, signaled in the quote above by “where”.
For building broadly distributed systems, location may be a factor, but only as a data point. If we’re building traffic infrastructure, we care about the totality of the traffic for a particular region, not just a street corner. And we can’t tolerate the traffic system being blinded by a truck running over a box standing by the side of the street. We can’t tolerate the traffic infrastructure being fed false data because someone with physical access owned the allegedly authoritative data store in that box. In broadly distributed IoT systems, the devices and field-deployed infrastructure consist, without exception, of devices that are subject to physical attack, that are subject to weather and wear and tear, that are subject to destruction by some accident. None of these places are where you want to keep authoritative data.
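The point about regional totality can be sketched in a few lines. This is a minimal, illustrative example (all names and numbers are hypothetical, not from the original article): by deriving regional state from many field sensors and taking a median, one destroyed or compromised roadside box can neither blind nor poison the overall picture.

```python
import statistics

def regional_speed(readings_kmh: dict[str, float]) -> float:
    """Derive the regional traffic speed from many field sensors.

    readings_kmh maps a sensor id to its reported average speed.
    Using the median means a single failed or tampered sensor
    cannot drag the regional figure to an arbitrary value.
    """
    if not readings_kmh:
        raise ValueError("no sensors reporting")
    return statistics.median(readings_kmh.values())

# One compromised box reporting nonsense barely moves the result:
regional_speed({"a": 50.0, "b": 52.0, "c": 51.0, "tampered": 999.0})
```

The design choice here is the essence of the argument: authority comes from the aggregate maintained in the backend, not from any one box by the side of the road.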
You will want that data in a highly redundant, geo-distributed fabric of backend services that can help compensate for the loss or compromise of devices that are exposed to the elements and interference of all kinds. Those backend services exist today in the form of public cloud and sophisticated private clouds, combined with on-site device and gateway capabilities that allow local communication and local processing. And there is ample headroom in terms of capacity in these cloud systems (they are the backend for the mobile web made up of billions of devices today, after all), and the buildout pace across providers is keeping up nicely.
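The division of labor described above, local processing on the gateway, authority in the redundant backend, can be sketched as follows. This is a hedged, minimal illustration (the `send` callable and buffer sizing are assumptions, not any particular platform’s API): the gateway buffers readings while the backend is unreachable and forwards them once it returns, so local storage is only ever a cache, never the authoritative store.

```python
from collections import deque

class GatewayBuffer:
    """Field-gateway sketch: buffer readings locally, forward to a
    redundant backend. The gateway is expendable; the backend is
    the authoritative store."""

    def __init__(self, send, max_buffered=1000):
        self.send = send  # callable that forwards one reading to the backend
        # Bounded buffer: drop the oldest readings if the backend
        # is unreachable for too long.
        self.buffer = deque(maxlen=max_buffered)

    def record(self, reading):
        self.buffer.append(reading)
        return self.flush()

    def flush(self):
        # Forward everything we have; on failure, keep the remainder
        # buffered and retry on the next record() or flush().
        while self.buffer:
            try:
                self.send(self.buffer[0])
            except ConnectionError:
                return False
            self.buffer.popleft()
        return True
```

Note what the sketch does not do: it never treats the gateway’s buffer as the system of record. If the box goes poof, at most the bounded buffer is lost, and the fabric behind it still holds the data.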
Per the author, information will not only be stored in (and retrieved from) a fully dispersed, peer-to-peer infrastructure; it also needs freedom:
With today’s IoT platforms information is not free (and that’s free as in “freedom,” not free as in “free of charge”). In fact, thanks to today’s platforms and information architectures, it’s not free to easily merge with other information and enable any kind of systemic intelligence.
Data consists of captured facts. Those facts are not a novel thing. Every machine in existence, from steam engines to rocket ships, creates facts continuously. Our bodies produce facts continuously. Everything produces facts, all the time.
Capturing facts and turning them into a data stream generally requires access to the source and permission to do so. There’s no “free as in freedom” here, because facts stem from contexts that have an owner with an interest in protecting privacy and trade/military secrets, and often also responsibility for what happens with that data. Access to the data is not and cannot be any more free than access to the source of the data, and there is no “freedom” since consensus is required for any new use. Do you want your body temperature readings “liberated”?
The capture is done with sensors, and the data is transferred into storage or immediate processing. All of that is explicit effort that is (hopefully) done with a purpose and application in mind, not for sheer hoarding.
Any debate about rights on the resulting data can only be had amongst participants in capture and processing, like manufacturers, component providers, service providers, owners, medical patients, and operators of machines and devices. And the participants are also those who agree on formats and data semantics for the uses they have for the data. And as it turns out, openness is a required foundation for that agreement to happen. And therefore it happens. Today. It happens in the context of dozens of industry associations and standardization bodies that productively work to create more and more consensus.
What would truly liberated information be like? It might help to think of the atoms and molecules of the physical world. They have distinct identities, of course, but they are also capable of bonding with other atoms and molecules to create entirely different kinds of matter.
As these devices and systems become more and more intelligent, the data they produce will become like neurons of the brain, or ants in an anthill, or human beings in a society, as well as information devices connected to each other.
This is smart-sounding, pretend-visionary chatter that has absolutely no relevance for anyone who needs to get work done today.
We need clear notions of sources of data, we need clear definitions of the semantics of data, we need clear understanding of how data flows. We need to be able to reason about data so that we can combine it with reference information and use it meaningfully in analytics and also in Machine Learning and AI contexts. We need agreement and there will be a lot of work invested in such agreement across many industries in the coming years and beyond. Agreement is platform. Agreement is “liberation”.
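What “clear definitions of the semantics of data” looks like in practice can be sketched concretely. The schema below is purely illustrative (the field names, units, and the `to_kelvin` helper are my assumptions, not any standard): meaning comes from explicit, agreed-upon fields, and data from different systems can only be combined because source, quantity, and unit are part of the contract.

```python
from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass(frozen=True)
class SensorReading:
    source_id: str         # agreed identifier for the capturing device
    quantity: str          # e.g. "air_temperature"
    unit: str              # e.g. "degC"; units are part of the contract
    value: float
    observed_at: datetime  # UTC timestamps, by agreement

def to_kelvin(r: SensorReading) -> SensorReading:
    # Merging or converting data is only safe because the unit
    # semantics are explicit; nothing here relies on data being "free".
    if r.unit != "degC":
        raise ValueError(f"expected degC, got {r.unit}")
    return SensorReading(r.source_id, r.quantity, "K",
                         r.value + 273.15, r.observed_at)

reading = SensorReading("station-42", "air_temperature", "degC",
                        20.0, datetime.now(timezone.utc))
kelvin = to_kelvin(reading)
```

That is the unglamorous work the industry associations and standardization bodies actually do: agreeing on what the fields mean, so that combining data becomes possible at all.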
Complaints about a lack of “freedom” around data generally raise a grand red flag: someone who has no rights to the data, who didn’t participate in its capture or movement, and who didn’t participate in creating agreement is trying to argue their way into getting access to it, because it is somehow their natural right. It’s not.
Why do I spend two posts on this? Because a lot of the engineers and researchers in and around “IoT”, me included, are getting quite tired of these sorts of fluffy “vision” and “projecting competence” pieces that contribute nothing new to the conversation (we knew about AI being driven by data, and we also knew about edge compute, thanks) while pissing all over the town square where we all try to get some concrete work done.
Summary: Don’t take advice from people stuck in the 1990s who tell you today’s technology is unsuitable for some science-fiction “vision” of the future.