March 14, 2006
@ 02:17 PM

I kicked off quite a discussion with my recent post on O/R mapping. Some people think I am completely wrong, some say that it resonates with their experience, some say I wrote it in a mean spirit, and some are jubilating. I particularly liked the "Architectural Truthiness" post by David Ing and the comment by "Scott E" in my comments section, who wrote:

I've hiked up the learning curve for Hibernate (the Java flavor) only to find that what time was saved in mapping basic CRUD functionality got eaten up by out-of-band custom data access (which always seems to be required) and tuning to get performance close to what it would have been with a more specialized, hand-coded DAL.

As always, it's a matter of perspective. Here is mine: I went down the O/R mapping route in a project in '98/'99 when my group at the company I was working for at the time was building a new business framework. We wrote a complete, fully transparent O/R mapper in C++. You walked up to a factory which dehydrated objects, and you could walk along the association links and the object graph would either incrementally dehydrate or dehydrate in predefined segments. We had filtering capabilities that allowed us to constrain 1:N collections with large N's, we could auto-resolve N:M relationships, had support for inheritance, and all that jazz. The whole framework was written with code generation in mind. Our generators were fed with augmented UML class diagrams and spit out the business layer, whereby we had a "partial classes" concept where we'd keep the auto-gen'd code in one tree and the parts that were supposed to be filled in manually in another part of the code tree. Of course we'd preserve changes across re-gens. Pure OO nirvana.

While the platforms have evolved substantially in the past 7 years, the fundamental challenges for transparent (fully abstracted) mapping of data to objects remain essentially the same.

  • Given metadata to do the mapping, implementing CRUD functionality with an O/R mapper is quite easy. We had to put lots of extra metadata into our C++ classes back in the day, but with .NET and Java the metadata is all there and therefore CRUD O/R mapping is very low-hanging fruit on both platforms. That's why there's such a large number of projects and products.
  • Defining and resolving associations is difficult. 1:N is hard, because you need to know what your N looks like. You don't want to dehydrate 10000 objects to find a value in one of them or to calculate a sum over a column. That's work that's, quite frankly, best left in the database. I realize that some people worry that this leads to logic bleeding into the database, but for me that's a discussion about pureness vs. pragmatism. If the N is small, grabbing all related objects is relatively easy - unless you support polymorphism, which forces the mapper into all sorts of weird query trees. 1:N is so difficult because an object model is inherently about records, while SQL is about sets. N:M is harder.
  • "Object identity" is a dangerous lure. Every object has its own identifier. In memory that is its address, on disk that's some form of unique identifier. The idea of making the persistent identifier also the in-memory identifier often has the design consequence of an in-memory "running object table", whose goal is to avoid loading the same object twice and instead link the already-loaded instance into the object graph. That's a fantastic concept, but it leads to all sorts of interesting concurrency puzzles: What do you do if you happen to find an object you have already loaded as you resolve a 1:N association and realize that the object has meanwhile changed on disk? Another question is what the scope of the object identity is. Per appdomain/process, per machine or even a central object server (hope not)?
  • Transactions are hard. Databases are doing a really good job with data concurrency management, especially with stored procedures. If you are loading and managing data as object-graphs, how do you manage transaction isolation? How do you identify the subtree that's being touched by a transaction? How do you manage rollbacks? What is a transaction, anyways?
  • Changing the underlying data model is hard. I've run into several situations where existing applications had to be integrated with existing data models, with the customer willing to put money on the table. O/R mapping is relatively easy if the data model falls out of the object model. If an existing data model bubbles up against an object model, you often end up writing a DAL or implementing the O/R mapping in stored procedures.
  • Reporting and data aggregation are hard. I'll use an analogy for that: It's really easy to write an XPath query against an XML document, but it is insanely difficult to do the same by navigating the DOM.

That said, I am not for or against O/R mapping. There are lots of use cases with a lot of CRUD work where O/R saves a lot of time. However, it is a leaky abstraction. In fact it is so leaky that we ended up not using all that much of the funkiness we put into our framework, because "special cases" kept popping up. I am pointing out that there are a lot of fundamental differences between what an RDBMS does with data and how OOP treats data. The discussion is in part a discussion about ISAM vs. RDBMS.

The number of brain cycles that need to be invested in a clean O/R mapping of a complex object model, in the presence of the fundamental challenges I listed here (and that list isn't exhaustive), is not automatically smaller than for a plain-old data layer. It may be larger. YMMV.

Now you may ask (and some already have) how all of that plays with LINQ and, in particular, DLINQ. Mind that I don't work on the LINQ team, but I think I am observing a subtle but important difference between LINQ and O/R*: 

  • O/R is object->relational mapping.
  • LINQ is relational->object mapping.

LINQ acknowledges the relational nature of the vast majority of data, while O/R attempts to deny it. LINQ speaks about entities, relations and queries and maps result-sets into the realm of objects, even cooking up classes on the fly if it needs to. It's bottom up and the data (from whatever source) is king. Objects and classes are just tooling. For O/R mapping, the database is just tooling.
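The relational->object direction can be sketched in a few lines of Python. This is not LINQ or DLINQ, and it does not reproduce their API; it is a hypothetical `query_objects` helper, assuming the standard-library `sqlite3` module, in which the query decides the shape of the result and a record type is cooked up on the fly from the result set's column names. The `customer` and `orders` tables are made up for the example.

```python
import sqlite3
from collections import namedtuple


def query_objects(conn: sqlite3.Connection, sql: str, params=()):
    """Run a query and map each row of its result set to an ad hoc record type.

    The type is derived from the query's own column list (cursor.description),
    so no class has to be declared up front: the data shapes the objects.
    """
    cursor = conn.execute(sql, params)
    columns = [col[0] for col in cursor.description]
    # rename=True replaces invalid or duplicate column names with positional names.
    Row = namedtuple("Row", columns, rename=True)
    return [Row(*values) for values in cursor.fetchall()]


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER);
    INSERT INTO customer VALUES (1, 'Contoso'), (2, 'Fabrikam');
    INSERT INTO orders VALUES (1, 1), (2, 1), (3, 2);
    """
)

# A projection with a join and an aggregate; no matching class is declared anywhere.
rows = query_objects(
    conn,
    "SELECT c.name AS name, COUNT(o.id) AS order_count "
    "FROM customer c LEFT JOIN orders o ON o.customer_id = c.id "
    "GROUP BY c.id, c.name ORDER BY c.name",
)
for row in rows:
    print(row.name, row.order_count)  # Contoso 2, then Fabrikam 1
```

The join and the aggregation stay in SQL, where they are cheap to express; the objects only exist to carry the result into the program.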

Wednesday, March 15, 2006 12:59:15 AM UTC
That's one of the main reasons why, on my current project, we skipped Object Relational Mapping and used Relational Object Mapping instead. iBATIS is a great tool for this, mapping stored procedures and SQL statements to objects.

For those who have invested in Object Relational Mapping, your thinking certainly doesn't come out of the blue.
Sunday, March 19, 2006 6:32:47 AM UTC
Question: What do you do if you happen to find an object you have already loaded as you resolve a 1:N association and realize that the object has meanwhile changed on disk?

Answer:
[Optimistic concurrency]
Well, if you were looking at the desired object graph as a whole unit, you'd have to read it transactionally (to be consistent). Otherwise, you'd never be sure whether or not somebody else had altered data that is part of your object graph during your read.
Let's suppose this is not the requirement. You load object A (which points to object B but the association is not loaded yet). B is already loaded in the "running object table" (do I smell (D)COM?). You navigate to B via A. Plumbing sees the association to B is not loaded and hence loads B from the database (in case it requires a query). If B is not the same version as the one we read earlier then there are several options:
1. Throw some kind of concurrency exception (abort early)
2. Ignore the difference in version. We know we're working with a copy, so let's behave accordingly. The concurrency exception may come later (if we choose to persist B).
3. Handle concurrent versions: load the new version and hand that out.
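The three options can be sketched as a small "running object table" (a pattern commonly called an identity map) in Python. Everything here is hypothetical: `IdentityMap`, the `OnConflict` policy and the `load_row` callable, which stands in for whatever query the plumbing issues and is assumed to return a `(version, data)` pair for a key. The map hands out one instance per key and compares the stored version with the cached one each time the association is resolved again.

```python
import enum
from dataclasses import dataclass
from typing import Callable, Dict, Tuple


class ConcurrencyError(Exception):
    """Raised when the stored row no longer matches the cached object's version."""


class OnConflict(enum.Enum):
    ABORT = 1        # option 1: throw a concurrency exception early
    KEEP_CACHED = 2  # option 2: ignore the difference; conflicts surface when persisting
    REFRESH = 3      # option 3: pick up the new version


@dataclass
class Entity:
    key: int
    version: int
    data: dict


class IdentityMap:
    def __init__(self, load_row: Callable[[int], Tuple[int, dict]], policy: OnConflict):
        self._load_row = load_row  # hypothetical: issues the query for one key
        self._policy = policy
        self._objects: Dict[int, Entity] = {}

    def get(self, key: int) -> Entity:
        version, data = self._load_row(key)
        cached = self._objects.get(key)
        if cached is None:
            # First time we see this key: register it so later lookups share the instance.
            entity = Entity(key, version, data)
            self._objects[key] = entity
            return entity
        if cached.version == version or self._policy is OnConflict.KEEP_CACHED:
            return cached
        if self._policy is OnConflict.ABORT:
            raise ConcurrencyError(
                f"object {key}: cached version {cached.version}, stored version {version}"
            )
        # REFRESH: update the cached instance in place so existing references see the new state.
        cached.version, cached.data = version, data
        return cached


# Simulated storage: key -> (version, data).
store = {1: (1, {"name": "B"})}
objects = IdentityMap(lambda key: store[key], OnConflict.REFRESH)
b = objects.get(1)
store[1] = (2, {"name": "B, changed by someone else"})
print(objects.get(1) is b, b.version)  # True 2
```

Refreshing in place is a design choice: it keeps the one-object-per-identity promise, but any code already holding `b` sees its state change underneath it. Handing out a fresh instance instead would break identity. Under `KEEP_CACHED`, the conflict would typically surface only on save, for instance when an `UPDATE ... WHERE version = ?` matches no row.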

These are indeed the same questions I would ask myself if I were riding the SOE (Service Orient Express). Boundaries at different levels, but boundaries nonetheless.
Tuesday, March 21, 2006 10:59:55 PM UTC
... I'm still waiting for that OR mapper that actually does some mapping. I don't consider copying a flat table to a round object mapping. In my view a perfect mapper would allow me to define an application data model: a logical data structure for my specific domain. I model data in such a way that it makes the most sense inside my domain or application. Then I tell the mapper from what data stores (yes, plural!) it should take this data and "MAP" it to my convenient little domain data model.

Of course, if you start to think about what you need to make this happen, you will realize that it is a mix of distributed data aggregation, record-matching engines and a 'command interpreter' (I think SQL is up for an overhaul too ;-) that lives outside a database.

Oh, did I forget to mention that with my perfect mapper any data source goes, whether it's a 'real' database or a (web) service or the file system or the registry or...

You can't keep a man from dreaming...

Great piece, Clemens
Obiwan Jacobi