It's 2008. Where's my flying car?
 Tuesday, March 14, 2006

I kicked off quite a discussion with my recent post on O/R mapping. Some people think I am completely wrong, some say that it resonates with their experience, some say I wrote this in mean spirit, some are jubilating. I particularly liked the "Architectural Truthiness" post by David Ing and the comment by "Scott E" in my comments section who wrote:

I've hiked up the learning curve for Hibernate (the Java flavor) only to find that what time was saved in mapping basic CRUD functionality got eaten up by out-of-band custom data access (which always seems to be required) and tuning to get performance close to what it would have been with a more specialized, hand-coded DAL.

As always, it's a matter of perspective. Here is mine: I went down the O/R mapping route in a project in '98/'99 when my group at the company I was working for at the time was building a new business framework. We wrote a complete, fully transparent O/R mapper in C++. You walked up to a factory which dehydrated objects and you could walk along the association links and the object graph would either incrementally dehydrate or dehydrate in predefined segments. We had filtering capabilities that allowed us to constrain 1:N collections with large N's, we could auto-resolve N:M relationships, had support for inheritance, and all that jazz. The whole framework was written with code generation in mind. Our generators were fed with augmented UML class diagrams and spat out the business layer, whereby we had a "partial classes" concept where we'd keep the auto-gen'd code in one tree and the parts that were supposed to be filled in manually in another part of the code tree. Of course we'd preserve changes across re-gens. Pure OO nirvana.

While the platforms have evolved substantially in the past 7 years, the fundamental challenges for transparent (fully abstracted) mapping of data to objects remain essentially the same.

  • Given metadata to do the mapping, implementing CRUD functionality with an O/R mapper is quite easy. We had to put lots of extra metadata into our C++ classes back in the day, but with .NET and Java the metadata is all there and therefore CRUD O/R mapping is very low-hanging fruit on both platforms. That's why there's such a large number of projects and products.
  • Defining and resolving associations is difficult. 1:N is hard, because you need to know what your N looks like. You don't want to dehydrate 10000 objects to find a value in one of them or to calculate a sum over a column. That's work that's, quite frankly, best left in the database. I realize that some people worry how that leads to logic bleeding into the database, but for me that's a discussion about pureness vs. pragmatism. If the N is small, grabbing all related objects is relatively easy - unless you support polymorphism, which forces the mapper into all sorts of weird query trees. 1:N is so difficult because an object model is inherently about records, while SQL is about sets. N:M is harder.
  • "Object identity" is a dangerous lure. Every object has its own identifier. In memory that is its address, on disk that's some form of unique identifier. The idea of making the persistent identifier also the in-memory identifier often has the design consequence of an in-memory "running object table" whose goal is to avoid loading the same object twice and instead link it appropriately into the object graph. That's a fantastic concept, but it leads to all sorts of interesting concurrency puzzles: What do you do if you happen to find an object you have already loaded as you resolve a 1:N association and realize that the object has meanwhile changed on disk? Another question is what the scope of the object identity is. Per appdomain/process, per machine or even a central object server (hope not)?
  • Transactions are hard. Databases are doing a really good job with data concurrency management, especially with stored procedures. If you are loading and managing data as object-graphs, how do you manage transaction isolation? How do you identify the subtree that's being touched by a transaction? How do you manage rollbacks? What is a transaction, anyways?
  • Changing the underlying data model is hard. I've run into several situations where existing applications had to be integrated with existing data models, with the customer willing to put money on the table. O/R mapping is relatively easy if the data model falls out of the object model. If an existing data model bubbles up against an object model, you often end up writing a DAL or the O/R in stored procedures.
  • Reporting and data aggregation are hard. I'll use an analogy for that: It's really easy to write an XPath query against an XML document, but it is insanely difficult to do the same navigating the DOM.
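
To make the 1:N point from the list above a bit more concrete, here is a minimal sketch in Python using the standard sqlite3 module. The table `order_lines`, the `OrderLine` class and the function names are all made up for illustration; this is not how any particular O/R mapper works, just the two styles side by side: materializing N objects to compute a sum versus asking the database for the sum.

```python
import sqlite3
from dataclasses import dataclass


@dataclass
class OrderLine:
    """In-memory object for one row of the hypothetical order_lines table."""
    id: int
    order_id: int
    amount: int  # amount in cents, to keep the arithmetic exact


def make_db(n_lines: int) -> sqlite3.Connection:
    """Create an in-memory database with one order (id 1) that has n_lines lines."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE order_lines ("
        "id INTEGER PRIMARY KEY, order_id INTEGER, amount INTEGER)"
    )
    conn.executemany(
        "INSERT INTO order_lines (order_id, amount) VALUES (?, ?)",
        [(1, 100 + i) for i in range(n_lines)],
    )
    return conn


def total_via_objects(conn: sqlite3.Connection, order_id: int) -> int:
    """Mapper style: materialize every related object, then sum in memory."""
    rows = conn.execute(
        "SELECT id, order_id, amount FROM order_lines WHERE order_id = ?",
        (order_id,),
    )
    lines = [OrderLine(*row) for row in rows]  # N objects are created here
    return sum(line.amount for line in lines)


def total_via_sql(conn: sqlite3.Connection, order_id: int) -> int:
    """Set style: the database computes the sum and returns a single row."""
    (total,) = conn.execute(
        "SELECT COALESCE(SUM(amount), 0) FROM order_lines WHERE order_id = ?",
        (order_id,),
    ).fetchone()
    return total


if __name__ == "__main__":
    conn = make_db(10_000)
    print(total_via_objects(conn, 1), total_via_sql(conn, 1))
```

Both functions return the same number. The difference is that the first one drags 10000 rows across the boundary and turns them into objects to get there, while the second one moves a single value.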

That said, I am not for or against O/R mapping. There are lots of use cases with a lot of CRUD work where O/R saves a lot of time. However, it is a leaky abstraction. In fact it is so leaky that we ended up not using all that much of the funkiness we put into our framework, because "special cases" kept popping up. I am pointing out that there are a lot of fundamental differences between what an RDBMS does with data and how OOP treats data. The discussion is in part a discussion about ISAM vs. RDBMS.

The number of brain cycles that need to be invested for a clean O/R mapping of a complex object model in the presence of the fundamental challenges I listed here (and that list isn't exhaustive) is not automatically smaller than for a plain-old data layer. It may be larger. YMMV.

Now you can (and some already have) ask how all of that plays with LINQ and, in particular, DLINQ. Mind that I don't work in the LINQ team, but I think I am observing a subtle but important difference between LINQ and O/R*:

  • O/R is object->relational mapping.
  • LINQ is relational->object mapping.

LINQ acknowledges the relational nature of the vast majority of data, while O/R attempts to deny it. LINQ speaks about entities, relations and queries and maps result-sets into the realm of objects, even cooking up classes on the fly if it needs to. It's bottom up and the data (from whatever source) is king. Objects and classes are just tooling. For O/R mapping, the database is just tooling.
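
The "relational->object" direction can be sketched outside of LINQ as well. The following is a rough Python analogue, not LINQ and not anything the LINQ team ships: the query comes first, and a class is cooked up on the fly from whatever columns the result set happens to have. The tables, the column aliases and the helper `query_as_objects` are assumptions made up for this example, and the column names must be valid Python identifiers for `namedtuple` to accept them.

```python
import sqlite3
from collections import namedtuple


def query_as_objects(conn, sql, params=()):
    """Run a query and return each row as an instance of a class
    that is derived from the shape of this particular result set."""
    cur = conn.execute(sql, params)
    # cur.description holds one entry per result column; item 0 is its name.
    column_names = [col[0] for col in cur.description]
    Row = namedtuple("Row", column_names)
    return [Row(*values) for values in cur.fetchall()]


def demo():
    """Build a tiny hypothetical schema and map an aggregate query to objects."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute(
        "CREATE TABLE orders ("
        "id INTEGER PRIMARY KEY, customer_id INTEGER, amount INTEGER)"
    )
    conn.executemany(
        "INSERT INTO customers VALUES (?, ?)", [(1, "Ada"), (2, "Grace")]
    )
    conn.executemany(
        "INSERT INTO orders (customer_id, amount) VALUES (?, ?)",
        [(1, 10), (1, 15), (2, 7)],
    )
    # The query is the starting point; the object shape follows from it.
    return query_as_objects(
        conn,
        "SELECT c.name AS customer, SUM(o.amount) AS total "
        "FROM customers c JOIN orders o ON o.customer_id = c.id "
        "GROUP BY c.id, c.name ORDER BY c.name",
    )


if __name__ == "__main__":
    for row in demo():
        print(row.customer, row.total)
```

There is no class model that the database has to be bent around. The relational side decides what comes back, and the object side just gives it a convenient shape.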

Tuesday, March 14, 2006 7:17:53 AM (Pacific Daylight Time, UTC-07:00)
Architecture | Technology
 Tuesday, March 07, 2006

To (O/R) map or not to map.

The monthly discussion about the benefits and dangers of O/R mapping is making the rounds on one of the mailing lists that I am signed up to. One big problem in this space, from my experience of discussing this with a lot of people over and over, is that O/R mapping is one of those things where the sheer wish for an elegant solution to the data/object schism obscures most of the rational argumentation. If an O/R mapper provides a nice programming or tooling experience, developers (and architects) are often willing to accept performance hits and a less-than-optimal tight coupling to the data model, because they are lured by the aesthetics of the abstraction.

Another argument I keep hearing is that O/R mapping yields a significant productivity boost. However, if that were the case and if using O/R mapping cut the average development cost in a departmental development project by – say – a quarter or more, O/R mapping would likely have taken over the world by now. It hasn't. And it's not that the idea is new. It’s been around for well more than a decade.

To me, O/R mapping is one of the unfortunate consequences of trying to apply OOP principles to anything and everything. For "distributed objects", we’re fixing that with the service orientation idea and the consequential constraints when we talk about the network edge of applications. It turns out that many of the same principles apply to the database edge as well. The list below is just to give you the idea. I could write a whole article about this and I wish I had the time:

  • Boundaries are explicit => Database access is explicit
  • Services avoid coupling (autonomy) => Database schema and in-process data representation are disjoint and mapped explicitly
  • Share schema not code => Query/Sproc result sets and Sproc inputs form data access schema (aliased result sets provide a degree of separation from the physical schema)
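
A minimal sketch of what the three points above could look like in code, assuming a made-up physical table `t_cust` with cryptic column names. Everything here (table, columns, `CustomerSummary`, the function names) is hypothetical. The query is visible at the call site, the in-process type is not a mirror of the table, and the aliases in the result set are the only contract between the two.

```python
import sqlite3
from dataclasses import dataclass


@dataclass(frozen=True)
class CustomerSummary:
    """In-process representation; deliberately not a mirror of the table."""
    customer_id: int
    display_name: str
    is_active: bool


# The aliases form the data access schema. The physical column names
# (cust_no, cust_nm, stat_cd) never leave this query.
CUSTOMERS_BY_STATUS_SQL = """
    SELECT cust_no AS customer_id,
           cust_nm AS display_name,
           stat_cd AS status_code
    FROM t_cust
    WHERE stat_cd = ?
    ORDER BY cust_no
"""


def load_customers_by_status(conn, status_code):
    """Explicit database access: a query is issued here and nowhere else."""
    cur = conn.cursor()
    cur.row_factory = sqlite3.Row  # rows become addressable by alias
    cur.execute(CUSTOMERS_BY_STATUS_SQL, (status_code,))
    # Explicit mapping from the aliased result set to the in-process type.
    return [
        CustomerSummary(
            customer_id=row["customer_id"],
            display_name=row["display_name"],
            is_active=row["status_code"] == "A",
        )
        for row in cur.fetchall()
    ]


def make_demo_db():
    """Create the hypothetical physical schema with a few rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE t_cust (cust_no INTEGER PRIMARY KEY, cust_nm TEXT, stat_cd TEXT)"
    )
    conn.executemany(
        "INSERT INTO t_cust VALUES (?, ?, ?)",
        [(1, "Ada", "A"), (2, "Bob", "X"), (3, "Cy", "A")],
    )
    return conn


if __name__ == "__main__":
    for customer in load_customers_by_status(make_demo_db(), "A"):
        print(customer)
```

If the physical table is renamed or split later, only the SQL text changes. The aliases, and therefore everything above the data access function, stay put.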

In short, I think the dream of transparent O/R mapping is the same dream that fueled the development of fully transparent distributed objects in the early days of DSOM, CORBA and (D)COM when we all thought that'd just work and were neglecting the related issues of coupling, security, bandwidth, etc.

Meanwhile, we’ve learned the hard way that even though the idea was fantastic, it was rather naïve to apply local development principles to distributed systems. The same goes for database programming. Data is the most important thing in the vast majority of applications. Every class of data items (table) is surrounded by special considerations: read-only, read/write, insert-only; update frequency, currency and replicability; access authorization; business relevance; caching strategies; etc.

Proper data management is the key to great architecture. Ignoring this and abstracting data access and data management away just to have a convenient programming model is … problematic.

And in closing: Many of the proponents of O/R mapping that I run into (and that is a generalization and I am not trying to offend anyone – just an observation) are folks who don't know SQL and RDBMS technology in any reasonable depth and/or often have no interest in doing so. It may be worth exploring how tooling can better help the SQL-challenged instead of obscuring all data access deep down in some framework and making all data look like a bunch of local objects. If you have ideas, shoot. Comment section is open for business.

Tuesday, March 07, 2006 3:17:47 AM (Pacific Standard Time, UTC-08:00)
Architecture | SOA
 Friday, February 24, 2006

It was only my second day out in Redmond and what happens? Doug and I got stuck in an elevator for seven minutes.

Friday, February 24, 2006 10:59:00 AM (Pacific Standard Time, UTC-08:00)
MIX06
 Wednesday, February 22, 2006

The fabulous Ed Pinto has blogged about our breaking changes for the February CTP. Exhaustive list here.

Wednesday, February 22, 2006 9:55:02 AM (Pacific Standard Time, UTC-08:00)
Indigo

The WinFX Runtime Components February CTP and the SDK and the VS extensions that go with them just hit the download sites. Go get it:

Wednesday, February 22, 2006 9:37:15 AM (Pacific Standard Time, UTC-08:00)
Avalon | Indigo
 Monday, February 20, 2006

I just got a comment from Oran about the lack of durable messaging in WCF and the need for a respective extensibility point. Well... the thing is: Durable messaging is there; use the MSMQ bindings. One of the obvious "problems" with durable messaging that's only based on WS-ReliableMessaging is that that spec (intentionally) does not make any assertions about the behavior of the respective endpoints.

There is no rule saying: "the received message MUST be written to disk". WS-ReliableMessaging is as reliable as TCP (and as unreliable in the case of very long-lasting network failures or an endpoint outright crashing) and plays the same role. The mapping is actually pretty straightforward: WS-Addressing = IP, WS-ReliableMessaging = TCP.

So if you do durable messaging on one end and the other end doesn't do it, the sum of the gained reliability doesn't add up to anything more than it was before. MSMQ is fully in control of both ends of the wire and makes assertions about the endpoint behavior and was therefore the logical choice for our durable messaging strategy in V1, because it already ships with Windows and there is (as of yet) no agreed interoperable set of behavioral assertions for WS-RM around how endpoints must deal with received messages except ACKing them.
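
The difference between "acknowledged" and "durable" can be shown with a toy model. To be clear, this is neither the WS-ReliableMessaging protocol nor the MSMQ API. It is a deliberately simplified Python sketch with made-up class names, in which two receivers both answer "ACK", but only one of them writes the message to disk before doing so.

```python
import json
import os
import tempfile


class InMemoryReceiver:
    """Acknowledges on receipt but keeps messages only in process memory."""

    def __init__(self):
        self.messages = []

    def receive(self, message):
        self.messages.append(message)
        return "ACK"

    def crash_and_restart(self):
        # Everything that was not on disk is gone after a crash.
        self.messages = []


class DurableReceiver:
    """Appends each message to a log file and syncs it before acknowledging."""

    def __init__(self, log_path):
        self.log_path = log_path
        self.messages = self._replay()

    def _replay(self):
        if not os.path.exists(self.log_path):
            return []
        with open(self.log_path, encoding="utf-8") as log:
            return [json.loads(line) for line in log]

    def receive(self, message):
        with open(self.log_path, "a", encoding="utf-8") as log:
            log.write(json.dumps(message) + "\n")
            log.flush()
            os.fsync(log.fileno())  # force the write to disk before the ACK
        self.messages.append(message)
        return "ACK"

    def crash_and_restart(self):
        # In-memory state is lost, but the log survives and is replayed.
        self.messages = self._replay()


def demo():
    """Send one message to each receiver, crash both, count what is left."""
    volatile = InMemoryReceiver()
    with tempfile.TemporaryDirectory() as tmp:
        durable = DurableReceiver(os.path.join(tmp, "inbox.log"))
        for receiver in (volatile, durable):
            assert receiver.receive({"id": 1, "body": "hello"}) == "ACK"
            receiver.crash_and_restart()
        return len(volatile.messages), len(durable.messages)


if __name__ == "__main__":
    print(demo())
```

From the sender's point of view both receivers look identical, because both sent an ACK. Only an assertion about endpoint behavior tells them apart, which is the assertion that is missing on the WS-RM side.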

See Shy's comments.

Monday, February 20, 2006 12:13:32 PM (Pacific Standard Time, UTC-08:00)
Indigo | MSMQ

Just read this on Robert Hurlbut's blog (via Dominick, source is Doug)

As Doug indicates, the issue here is not "we don't want to do it", but that we need to ship. 

The problem is that partial trust is incredibly hard (and very time consuming) to test for a communication platform that is supposed to have rock solid security (no paradox here) and shall perform well. It's just as hard to provide meaningful exceptions (and messages) in case we'd stumble into a CAS exception. You wouldn't want us to just bubble up some arbitrary security exception, but instead will want us to tell you what's causing the problem and how you could fix it. There are (give or take some) 20 base permissions in the framework, most of them allow parameterization, and the system is extensible with custom permissions as well. You can do the math for where that takes you in terms of required combinations and test cases for achieving satisfying test coverage across the whole of Indigo, let alone all the special casing in the actual product code-base.
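
For those who would rather not do the math by hand, here is a back-of-the-envelope sketch in Python. It assumes the roughly 20 base permissions mentioned above and treats each one as either absent or granted in one of a fixed number of parameterized forms. The figure of three forms per permission is purely an assumption for illustration, and custom permissions are ignored entirely.

```python
def permission_combinations(base_permissions: int, variants_per_permission: int = 1) -> int:
    """Count the possible permission sets when every permission is either
    absent or granted in one of `variants_per_permission` parameterized forms."""
    return (variants_per_permission + 1) ** base_permissions


if __name__ == "__main__":
    # Plain grant/deny for 20 permissions: 2 ** 20 combinations.
    print(permission_combinations(20))
    # Assuming three parameterized forms each: 4 ** 20 combinations.
    print(permission_combinations(20, 3))
```

Plain grant/deny already yields 1,048,576 permission sets, and three forms per permission yields 1,099,511,627,776. Any real test plan prunes that space heavily, but the starting point explains why this does not fit into a release schedule as an afterthought.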

I wonder how many applications written to support partial trust actually take that complexity into account in their test strategy (hint, hint) ;-)

That said, I will clarify once more that this doesn't mean "we will never do that". It's just not possible to fit this into our V1 schedule in a way that yields an outcome both we and you would find acceptable.

Monday, February 20, 2006 11:00:02 AM (Pacific Standard Time, UTC-08:00)
Indigo
 Wednesday, February 15, 2006

I am spending my first week at Redmond with the WCF team. All new, all interesting, and a lot to learn.

Amongst the little things I learned is that I'll be speaking at MIX06 with Doug Purdy in a joint session on REST, POX, RSS, AJAX, Web2.0, Media Convergence, and general black magic with WCF/Indigo. That'll be fun. Guess what the demo will be! Yep, right .... shhhh! don't tell anyone. 

Wednesday, February 15, 2006 5:00:37 PM (Pacific Standard Time, UTC-08:00)
MIX06
 Friday, February 10, 2006

Dear MDC Attendees! It was a great pleasure to talk to so many of you directly, and it was very interesting to learn about the many projects in which you are using our technologies.

Below you can download my scribbles from the first presentation in HTML format (they are likely “out of context” for everyone not attending the session). The PowerPoint presentations are available at http://windowscommunication.net

MDC2006IntroPresentation.zip (656.95 KB)
Friday, February 10, 2006 4:42:40 AM (Pacific Standard Time, UTC-08:00)
MDC2006
 Saturday, February 04, 2006

If you have a blog and you post stuff around WCF/Indigo and you think that I don't have you in my aggregator, please post a comment below with your blog URL. And it totally doesn't matter whether you blog in English, Italian, French, Spanish, Dutch, German, Arabic, Chinese, Russian, or any other language ... I want to know.

Saturday, February 04, 2006 2:37:51 AM (Pacific Standard Time, UTC-08:00)
Indigo
About the author/Disclaimer

The contents of this site are my own personal opinions and do not represent my employer's view in any way. In addition, my thoughts and opinions often change, and as a weblog is intended to provide a semi-permanent point-in-time snapshot, you should not consider out-of-date posts to reflect my current thoughts and opinions.

© Copyright 2008
Clemens Vasters