The sirens are singing: O/R mapping

To (O/R) map or not to map.

The monthly discussion about the benefits and dangers of O/R mapping is making rounds on one of the mailing lists that I am signed up to. One big problem in this space - from my experience of discussing this through with a lot of people over and over – is that O/R mapping is one of those things where the sheer wish for an elegant solution to the data/object schism obscures most of the rational argumentation. If an O/R mapper provides a nice programming or tooling experience, developers (and architects) are often willing to accept performance hits and a less-than-optimal tight coupling to the data model, because they are lured by the aesthetics of the abstraction.

Another argument I keep hearing is that O/R mapping yields a significant productivity boost. However, if that were the case and if using O/R mapping would shorten the average development cost in a departmental development project by – say – a quarter or more, O/R mapping would likely have taken over the world by now. It hasn't. And it's not that the idea is new. It’s been around for well more than a decade.

To me, O/R mapping is one of the unfortunate consequences of trying to apply OOP principles to anything and everything. For "distributed objects", we’re fixing that with the service orientation idea and the consequential constraints when we talk about the network edge of applications. It turns out that the many of the same principles apply to the database edge as well. The list below is just for giving you the idea. I could write a whole article about this and I wish I had the time:

Boundaries are explicit => Database access is explicit
Services avoid coupling (autonomy) => Database schema and in-process data representation are disjoint and mapped explicitly
Share schema not code => Query/Sproc result sets and Sproc inputs form data access schema (aliased result sets provide a degree of separation from phys. schema)

In short, I think the dream of transparent O/R mapping is the same dream that fueled the development of fully transparent distributed objects in the early days of DSOM, CORBA and (D)COM when we all thought that'd just work and were neglecting the related issues of coupling, security, bandwidth, etc.

Meanwhile, we’ve learned the hard way that even though the idea was fantastic, it was rather naïve to apply local development principles to distributed systems. The same goes for database programming. Data is the most important thing in the vast majority of applications. Every class of data items (table) surround special considerations: read-only, read/write, insert-only; update frequency, currency and replicability; access authorization; business relevance; caching strategies; etcetc.

Proper data management is the key to great architecture. Ignoring this and abstracting data access and data management away just to have a convenient programming model is … problematic.

And in closing: Many of the proponents of O/R mapping that I run into (and that is a generalization and I am not trying to offend anyone – just an observation) are folks who don't know SQL and RDBMS technology in any reasonable depth and/or often have no interest in doing so. It may be worth exploring how tooling can better help the SQL-challenged instead of obscuring all data access deep down in some framework and make all data look like a bunch of local objects. If you have ideas, shoot. Comment section is open for business.

Clemens Vasters