I kicked off quite a discussion with my recent post on O/R mapping. Some people think I am completely wrong, some say that it resonates with their experience, some say I wrote it in a mean spirit, some are jubilant. I particularly liked the "Architectural Truthiness" post by David Ing and the comment by "Scott E" in my comments section, who wrote:

I've hiked up the learning curve for Hibernate (the Java flavor) only to find that what time was saved in mapping basic CRUD functionality got eaten up by out-of-band custom data access (which always seems to be required) and tuning to get performance close to what it would have been with a more specialized, hand-coded DAL.

As always, it's a matter of perspective. Here is mine: I went down the O/R mapping route in a project in '98/'99 when my group at the company I was working for at the time was building a new business framework. We wrote a complete, fully transparent O/R mapper in C++. You walked up to a factory which dehydrated objects and you could walk along the association links and the object graph would either incrementally dehydrate or dehydrate in predefined segments. We had filtering capabilities that allowed us to constrain 1:N collections with large N's, we could auto-resolve N:M relationships, had support for inheritance, and all that jazz. The whole framework was written with code generation in mind. Our generators were fed with augmented UML class diagrams and spit out the business layer, whereby we had a "partial classes" concept where we'd keep the auto-gen'd code in one tree and the parts that were supposed to be filled in manually in another part of the code tree. Of course we'd preserve changes across re-gen's. Pure OO nirvana.

While the platforms have evolved substantially in the past 7 years, the fundamental challenges for transparent (fully abstracted) mapping of data to objects remain essentially the same.

  • Given metadata to do the mapping, implementing CRUD functionality with an O/R mapper is quite easy. We had to put lots of extra metadata into our C++ classes back in the day, but with .NET and Java the metadata is all there, and therefore CRUD O/R mapping is very low-hanging fruit on both platforms; a small sketch of metadata-driven CRUD follows this list. That's why there's such a large number of projects and products.
  • Defining and resolving associations is difficult. 1:N is hard, because you need to know what your N looks like. You don't want to dehydrate 10000 objects just to find a value in one of them or to calculate a sum over a column. That's work that's, quite frankly, best left in the database; a sketch contrasting the two approaches follows this list. I realize that some people worry that this leads to logic bleeding into the database, but for me that's a discussion about purity vs. pragmatism. If the N is small, grabbing all related objects is relatively easy - unless you support polymorphism, which forces the mapper into all sorts of weird query trees. 1:N is so difficult because an object model is inherently about records, while SQL is about sets. N:M is harder still.
  • "Object identity" is a dangerous lure. Every object has its own identifier. In memory that is its address, on disk that's some form of unique identifier. The idea of making the persistent identifier also the in-memory identifier often has the design consequence of an in-memory "running object table" with the goal of avoiding to load the same object twice but rather linking it appropriately into the object graph. That's a fantastic concept, but leads to all sort of interesting concurrency puzzles: What do you do if you happen to find an object you have already loaded as you resolve an 1:N association and realize that the object has meanwhile changed on disk? Another question is what the scope of the object identity is. Per appdomain/process, per machine or even a central object server (hope not)?
  • Transactions are hard. Databases do a really good job with data concurrency management, especially with stored procedures. If you are loading and managing data as object graphs, how do you manage transaction isolation? How do you identify the subtree that's being touched by a transaction? How do you manage rollbacks? What is a transaction, anyway?
  • Changing the underlying data model is hard. I've run into several situations where existing applications had to be integrated with existing data models, with the customer willing to put money on the table. O/R mapping is relatively easy if the data model falls out of the object model. If an existing data model bumps up against an object model, you often end up writing a DAL by hand or doing the O/R mapping in stored procedures.
  • Reporting and data aggregation are hard. I'll use an analogy for that: it's really easy to write an XPath query against an XML document, but it is insanely difficult to do the same thing by navigating the DOM.
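
To make the CRUD point a bit more concrete, here is a minimal sketch of what metadata-driven CRUD amounts to on a platform with rich reflection. The @Table/@Column annotations and the Customer class are made up for illustration and are not taken from any particular mapper; a real product adds caching, type handling, identity generation, and so on.

```java
// Minimal sketch: the class itself carries enough metadata to generate SQL.
// All names here (@Table, @Column, Customer) are invented for illustration.

import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.util.StringJoiner;

@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.TYPE)
@interface Table { String name(); }

@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.FIELD)
@interface Column { String name(); }

@Table(name = "CUSTOMER")
class Customer {
    @Column(name = "ID")   long id;
    @Column(name = "NAME") String name;
    @Column(name = "CITY") String city;
}

public class CrudSketch {
    // Builds a parameterized INSERT from the class's own metadata.
    // Assumes the type is annotated; a real mapper would validate this.
    static String insertSqlFor(Class<?> type) {
        Table table = type.getAnnotation(Table.class);
        StringJoiner columns = new StringJoiner(", ");
        StringJoiner params = new StringJoiner(", ");
        for (Field f : type.getDeclaredFields()) {
            Column c = f.getAnnotation(Column.class);
            if (c != null) {
                columns.add(c.name());
                params.add("?");
            }
        }
        return "INSERT INTO " + table.name() + " (" + columns + ") VALUES (" + params + ")";
    }

    public static void main(String[] args) {
        // Prints something like: INSERT INTO CUSTOMER (ID, NAME, CITY) VALUES (?, ?, ?)
        System.out.println(insertSqlFor(Customer.class));
    }
}
```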
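
And here is the contrast behind the "sum over a column" point: pulling every child row across the wire and summing in memory, versus leaving the aggregation in the database. The ORDERS table and its TOTAL and CUSTOMER_ID columns are hypothetical; the shape of the code is what matters.

```java
// Two ways to answer "what is the order total for this customer?"
// Table and column names are made up for illustration.

import java.math.BigDecimal;
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

public class AggregationSketch {

    // What a naive object-graph traversal amounts to: pull every order
    // across the wire just to add up one column.
    static BigDecimal sumInMemory(Connection con, long customerId) throws SQLException {
        BigDecimal sum = BigDecimal.ZERO;
        try (PreparedStatement ps = con.prepareStatement(
                "SELECT TOTAL FROM ORDERS WHERE CUSTOMER_ID = ?")) {
            ps.setLong(1, customerId);
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {                       // potentially 10000 rows
                    sum = sum.add(rs.getBigDecimal("TOTAL"));
                }
            }
        }
        return sum;
    }

    // The same question left in the database: one row comes back.
    static BigDecimal sumInDatabase(Connection con, long customerId) throws SQLException {
        try (PreparedStatement ps = con.prepareStatement(
                "SELECT SUM(TOTAL) FROM ORDERS WHERE CUSTOMER_ID = ?")) {
            ps.setLong(1, customerId);
            try (ResultSet rs = ps.executeQuery()) {
                rs.next();
                BigDecimal sum = rs.getBigDecimal(1);     // null if there are no rows
                return sum != null ? sum : BigDecimal.ZERO;
            }
        }
    }
}
```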
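
Finally, a bare-bones sketch of the "running object table" from the object-identity bullet: a per-session identity map that hands back the already-loaded instance instead of loading a second copy. Note what it deliberately does not solve: if the row changed on disk after the first load, the cached instance is silently stale, which is exactly the concurrency puzzle described above. The types and the loader are stand-ins.

```java
// Per-session identity map: one in-memory object per persistent identifier.
// Customer and the loader function are stand-ins for a real entity and a real
// database load.

import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

public class IdentityMapSketch {

    static class Customer {
        final long id;
        String name;
        Customer(long id, String name) { this.id = id; this.name = name; }
    }

    static class Session {
        private final Map<Long, Customer> identityMap = new HashMap<>();
        private final Function<Long, Customer> loader;

        Session(Function<Long, Customer> loader) { this.loader = loader; }

        // Same id -> same in-memory object, for the lifetime of this session.
        // Does nothing about the row changing on disk after the first load.
        Customer find(long id) {
            return identityMap.computeIfAbsent(id, loader);
        }
    }

    public static void main(String[] args) {
        Session session = new Session(id -> new Customer(id, "Customer #" + id));
        Customer a = session.find(42);
        Customer b = session.find(42);        // resolved via the identity map
        System.out.println(a == b);           // true: one identity per session
    }
}
```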

That said, I am neither for nor against O/R mapping. There are lots of use cases with a lot of CRUD work where O/R mapping saves a lot of time. However, it is a leaky abstraction. In fact, it is so leaky that we ended up not using much of the funkiness we had put into our framework, because "special cases" kept popping up. I am pointing out that there are a lot of fundamental differences between what an RDBMS does with data and how OOP treats data. The discussion is, in part, ISAM vs. RDBMS all over again.

The number of brain cycles that need to be invested for a clean O/R mapping of a complex object model, in the presence of the fundamental challenges I listed here (and that list isn't exhaustive), is not automatically smaller than for a plain-old data layer. It may well be larger. YMMV.

Now you can ask (and some already have) how all of that plays with LINQ and, in particular, DLINQ. Mind that I don't work on the LINQ team, but I think I am observing a subtle but important difference between LINQ and O/R*:

  • O/R is object->relational mapping.
  • LINQ is relational->object mapping.

LINQ acknowledges the relational nature of the vast majority of data, while O/R attempts to deny it. LINQ speaks about entities, relations and queries and maps result-sets into the realm of objects, even cooking up classes on the fly if it needs to. It's bottom up and the data (from whatever source) is king. Objects and classes are just tooling. For O/R mapping, the database is just tooling.
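
LINQ itself is a C#/VB feature, so take this only as a rough Java analogy of the "relational -> object" direction: the query and its result set come first, and the object is a throwaway projection of one row rather than a mapped domain entity. Table, column, and type names are invented for the example.

```java
// Result set first, objects as projections of it (rough analogy only;
// table, columns, and the CustomerCity record are made up).

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

public class ProjectionSketch {

    // A shape cooked up for this one query, not a mapped domain entity (Java 16+ record).
    record CustomerCity(String name, String city) { }

    static List<CustomerCity> customersIn(Connection con, String country) throws SQLException {
        List<CustomerCity> result = new ArrayList<>();
        try (PreparedStatement ps = con.prepareStatement(
                "SELECT NAME, CITY FROM CUSTOMER WHERE COUNTRY = ?")) {
            ps.setString(1, country);
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    // Each row becomes a small projection object; the query is king.
                    result.add(new CustomerCity(rs.getString("NAME"), rs.getString("CITY")));
                }
            }
        }
        return result;
    }
}
```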

Updated: