A Farewell to ORMs

2009-10-12

j I've been using ORMs for years, starting with my own hand-hacked library back in the days before there were good ORMs for Python, and more recently settling into a comfortable reliance on SQLAlchemy. Over time, though, my initially rosy feelings towards ORMs have begun to sour. I gradually realised I was spending a disproportionate amount of time trying to coax the ORM into doing my bidding - and when I succeeded, the results were often ugly, slow and needlessly opaque. Analysing the performance of some of the more complicated portions of my data access layer was often painful, and I spent cumulative hours poring over generated SQL, trying to figure out what the ORM was doing and why. Usually, improving performance involved side-stepping the ORM altogether. Recently, a particularly gnarly performance issue prompted me to ditch the ORM from a project altogether, with surprisingly pleasant results.

Impedance mismatch

Ask any programmer why they use an ORM, and the answer is likely to be "impedance mismatch". This is a lovely phrase from a rhetorical point of view - hovering at the edge of meaning, but nicely avoiding asserting anything that can actually be quantified. The usual hand-wave is that impedance mismatch arises from the tension between table-oriented relational data, and object oriented conceptual thinking. Your Bicycle class - a subclass, naturally, of Vehicle - might have to be reconstructed from data scattered across six different tables, and it's a distressing possibility that none of those tables might be called Bicycle, or indeed Vehicle. What we should aim for, the argument goes, is a programmer's Shangri-La where where we can transparently persist and restore our objects and have the storage taken care of by some magical plumbing. Whether or not the magical plumbing is worthwhile depends largely on how often the abstraction breaks down. The ORM approach does so frequently. Yes, I can use an ORM and think at the object level in the common case, but whenever I need to do anything remotely complicated - optimising a query, say - I'm back in the land of tables and foreign keys. In the end, the structure of data is something fundamental that can't be simplified or abstracted away. The ORM doesn't resolve the impedance mismatch, it just postpones it.

A lighter abstraction

So, if ORMs are at best a very partial solution to the ill-defined impedance mismatch problem, why do so many programmers swear by them? It's not that they're all fools, it's just that ORMs solve ANOTHER practical problem much more successfully. Most programmers who use ORMs do so simply to avoid re-writing endless nearly identical CRUD operations for every persistable object in their project. This isn't about any fundamental object-relational impedance mismatch - it's simply a problem of query generation. So, this brings me to my own difficult-to-quantify contribution to the miasma of fuzzy thinking that already surrounds this issue: 90% of the benefit most people derive from ORMs can be gained more simply and more transparently through unashamedly table-oriented query generation. All we need is a nice programmatic way to generate and manipulate SQL statements... Luckily we have just such a tool in the SQLAlchemy SQLAlchemy SQL expression language - a good, simple and nearly complete language for working with SQL expressions from Python.

Pursuing this line of thought, I've ditched the ORM from a few of my projects. Instead, I'm using a defter abstraction - a simple, lightweight framework that uses SQLAlchemy's SQL expression language to auto-generate most queries. This framework is unashamedly table-oriented, and exists to manipulate data at a relational level. It clocks in at less than 150 lines of code. The database schema is no longer defined by the ORM - instead, helper objects are built through schema reflection. The result has been satisfying - my data layers are better encapsulated, database interaction is more transparent, and the conceptual complexity is much reduced. Since nothing happens magically behind the scenes, it's easier to analyse performance, and since there is no session layer (few projects really need one) a whole chunk of complexity has gone away. Using reflection rather than defining the schema in code has made schema evolution much less of a chore. I also retain other benefits usually attributed to ORMs - the expression language abstracts away flavour differences between databases, so I can still, for example, run a large fraction of my unit tests against in-memory SQLite databases and deploy on PostgreSQL. I'm now gradually migrating all my projects to this way of working.