The decline and fall of the relational database

By Sean McGrath

What have the Roman Empire and the relational database got in common? Not too much at the moment, I would suggest, but in a few short years I think we will be seeing an interesting similarity in their life histories. Pretend we are in the next decade somewhere ... Here is how it might look in Wikipedia if I'm right. "Like the Roman Empire, the Relational Database grew gradually to be an extremely dominant force. Like the Roman Empire, it suffered gradual collapse, the causes of which are debated by historians to this day ... "

Of course, from the current vantage point of 2009 -- still some years away, I suspect, from the collapse of the relational database hegemony -- it is difficult to predict how history will be written. Will the relational database have the equivalent of the Romulus Augustulus abdication moment? Will some historians attribute its demise to some modern-day IT equivalent of lead poisoning? Will I prove to be utterly wrong in the grand tradition of technology prognosticators?

I am predicting that the fall of the relational database will indeed happen and furthermore, I am predicting that history will not point to any one single event as the trigger. Rather, I suspect that a combination of forces, a variety of ideological movements and technological developments, acting contemporaneously, will create a perfect storm for the relational database. I see seven main forces at work. I have given each force a somewhat whimsical name in what follows. Here is a list of all seven with a brief explanation of each. The rest of the article goes into a little more detail on each one.

  • The hierarchicalists: Promote the benefits of hierarchical information models such as XML and the emergence of viable mechanisms for querying and processing large corpora of such information. The poster boy here is the XQuery language and to a lesser extent Microsoft's Xlinq.
  • The chaoticians: Promote the benefits of ex-post-facto information structuring. Rather than worry about finding a perfect structured model for information, chaoticians keep it all loose – maybe as a set of office documents, HTML pages or spreadsheets – and then use tools to retrofit or reverse engineer structure on top of the purposely loose, chaotic corpus. The poster boy of this movement is, of course, the Google search engine, which manages to achieve a high degree of findability from a low level of explicit information structure. In fact, Google makes a public point of not using the metadata capabilities of HTML, capabilities that could be used to add structure on top of a corpus of HTML. Also in the chaotician camp are products such as Microsoft SharePoint, Lotus Notes and Autonomy, which purposely blur the distinction between structured and un-structured information and support findability of both within their information retrieval features.
  • The steganographers: Promote the benefits of sprinkling structure information inside largely unstructured information and then using parsing software to dig it out and synthesize it, thus creating a graph of structure on top of a set of unstructured information artifacts. Microformats and RDFa fit into this movement. The concept of a mashup and the "linked data" initiatives are emerging as the poster children here. The steganographers and the chaoticians have much in common but steganographers are more inclined to try to work out in advance, what the data model should look like.
  • The democriticians: Promote the benefits of decomposing information into triples and treating all higher levels of structure as derived from this atomic level. Web 3.0, OWL and the semantic web are all in this category.
  • The parallelizers: Promote the benefits of simple key/value information models and like to point out how enormous compute power and storage can be brought to bear cheaply to derive higher order information from simple key/value models. Google's MapReduce is the poster child here.
  • The agilists: Promote the benefits of iterative development of information models. Rather than the classic relational approach of building your data model and then building the applications around it, agilists often hold the view that the model must be as fluid as the applications built around it. The relational model, as implemented in many database management systems, has many positive attributes, but fluidity is not one of them.
  • The temporalists: Point to the weakness of the relational model when it comes to one of the most frequent concerns in IT systems. Namely, how information changes over time. Although the time dimension can be factored into relational systems, it is not something that the model itself promotes. In fact, it can be argued that relational data normalization is antithetical to the common requirement of capturing "point in time" views of a business process or a corpus of content.

The rise of the hierarchicalists

XML, like SGML before it, takes the view that much information is naturally hierarchical in form. SGML never really caught on outside of some niche areas but XML – its successor – is slowly but surely carving out a following in the mainstream database management arena. For all its faults, the W3C XML Schema Language (XSD) is one reason for this but I think it is XQuery that is really responsible for the shift to center stage. Not many technologists under the age of, say, forty will remember but, believe it or not, there was life in the field of hierarchical database management, in the form of IBM's IMS, long before SGML and XML came along. It can be argued that technology and adoption are really just catching up with an idea that is now over forty years old.

The intriguing thing about XQuery is that it sets out not so much to kill the relational database but more to extend it. It does this in the time-honored way of treating the enemy as a mere special case of a more powerful data modelling abstraction. I.e. to the hierarchicalists, relational tables are merely very regular, shallow and non-recursive hierarchies.
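The "table as a shallow hierarchy" view can be made concrete with a small sketch. It uses Python's standard ElementTree module rather than XQuery itself, and its limited path syntax is only a stand-in for a real XQuery engine. The `customer` table, its columns and its rows are hypothetical, invented purely for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical rows, as they might come back from a relational "customer" table.
rows = [
    {"id": "1", "name": "Acme", "city": "Dublin"},
    {"id": "2", "name": "Globex", "city": "Galway"},
]

def table_to_tree(table_name, rows):
    """Represent a flat table as a regular, two-level, non-recursive hierarchy."""
    root = ET.Element(table_name)
    for row in rows:
        record = ET.SubElement(root, "row")
        for column, value in row.items():
            ET.SubElement(record, column).text = value
    return root

tree = table_to_tree("customer", rows)

# A path query standing in for: SELECT name FROM customer WHERE city = 'Dublin'
dublin_names = [r.findtext("name") for r in tree.findall("./row[city='Dublin']")]
print(dublin_names)
```

Every row has the same children and nothing nests more than two levels deep, which is exactly the "regular, shallow and non-recursive" special case. The same tree type could just as easily hold irregular or deeply nested records, which is the generalization the hierarchicalists are after.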

The rise of the democriticians

Democritus commonly gets credit for being the first person to speculate that matter is really all made up of super small indivisible particles: atoms. In information theory, it is common to think of a triple consisting of a subject, a predicate and an object as being an atom of information. Democriticians hold that the best way to create data models is to start at this triple level and build everything up from there. The idea has a long, long history. It can be argued that Prolog explored this approach in the Seventies. It can also be argued that the CODASYL model popular in the days of COBOL -- with its network approach to data modelling -- also covered this territory. Indeed, the philosopher C.S. Peirce was arguably drawing RDF diagrams with a quill pen back in 1885.
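To see what "building everything up from triples" might look like, here is a minimal sketch in plain Python. It is not RDF or any real triple store; the facts, the predicate names and the helper functions `match` and `employees_in` are all hypothetical.

```python
from collections import namedtuple

# One "atom" of information: subject, predicate, object.
Triple = namedtuple("Triple", "subject predicate obj")

# Hypothetical facts, invented for illustration.
triples = [
    Triple("alice", "works_for", "acme"),
    Triple("bob", "works_for", "acme"),
    Triple("acme", "located_in", "dublin"),
]

def match(store, subject=None, predicate=None, obj=None):
    """Return the triples that fit a pattern; None acts as a wildcard."""
    return [
        t for t in store
        if (subject is None or t.subject == subject)
        and (predicate is None or t.predicate == predicate)
        and (obj is None or t.obj == obj)
    ]

def employees_in(store, city):
    """Derive a higher-level structure by joining two triple patterns."""
    companies = {t.subject for t in match(store, predicate="located_in", obj=city)}
    return sorted(
        t.subject for t in match(store, predicate="works_for") if t.obj in companies
    )

print(employees_in(triples, "dublin"))
```

Nothing here declares an "employee" or "company" table. Those higher-level shapes are derived on demand from the atoms, and a new kind of fact can be added without touching any schema.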

The rise of the parallelizers

It is pretty evident, I think, that we are entering the age of the parallel -- whether we like it or not. It seems that Moore's law is slowing down. Individual processor cores are not getting faster at the rate we have become used to. Instead, more and more of them are being crammed into each chip. The term "CPU" is becoming increasingly inaccurate. The Von Neumann architecture that has served so long as the fundamental abstraction no longer fits the facts. The facts increasingly consist of umpteen virtualized machines, bottomless pits of storage and an increasingly "on demand" approach to compute resource allocation.

It is true that some developers creating cloud computing platforms, or utilizing the cloud ecosystems being created by companies such as IBM, Microsoft, Amazon and Google, start by firing up a relational database but a goodly number are jumping straight for designs based on MapReduce or Hadoop or CouchDB to name but three.
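The key/value style the parallelizers favour can be sketched in a few lines. This is a single-process toy that only mimics the map, shuffle and reduce phases; it is not Google's MapReduce or Hadoop's API, and the order data and function names are hypothetical.

```python
from collections import defaultdict

def map_phase(records, mapper):
    """Apply the mapper to every record, yielding (key, value) pairs."""
    for record in records:
        yield from mapper(record)

def shuffle(pairs):
    """Group values by key, as a framework would do between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups, reducer):
    """Collapse each key's list of values into a single result."""
    return {key: reducer(key, values) for key, values in groups.items()}

# Hypothetical order log: (customer, amount).
orders = [("acme", 120), ("globex", 75), ("acme", 30)]

def mapper(order):
    customer, amount = order
    yield customer, amount

def reducer(customer, amounts):
    return sum(amounts)

totals = reduce_phase(shuffle(map_phase(orders, mapper)), reducer)
print(totals)
```

The result is what a relational system would compute with a GROUP BY. The appeal of this shape is that the mapper and reducer share no state, so in a real framework each phase can be spread over as many cheap machines as are available.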

The rise of the chaoticians

Over the last decade, we have seen an increasingly jaundiced eye being turned toward what I would call the library sciences. Foundational concepts such as controlled vocabularies, taxonomies, data types, part/whole relationships, relational algebras etc. all require data modelling work up front. Often, they require a lot of work up front. The theory goes that the time spent up front on the information design will allow the rest of the project to proceed waterfall-style and will lead to systems that are optimal in terms of performance and accuracy.

There are a number of problems with this theory, say the chaoticians. Firstly, you never know up front what the information model will need to be. Rather, you discover it as you go along (a world view that resonates with the agilists also). Secondly, it is no longer such a big deal to have an optimal model in terms of storage or performance. Who cares if a more chaotic model entails some extra processing or eats some more storage or creates some "false positives" in the results? An abundance of cheap processing power and storage density deals with the former and human nature deals with the latter. Look at a set of results from a search engine. Some are false positives. In fact, the majority of them may be false positives. The search "hits" are statistical in nature or, put another way, wrong. The chaoticians argue that it often makes more sense to deal with the statistical nature of the results than slave for years to find the perfect, normalized, relational model.
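The trade-off can be illustrated with a toy keyword index over completely unmodelled text. The memos and the scoring rule below are hypothetical, and real search engines use far more sophisticated ranking; this is only a sketch of the "structure after the fact" idea.

```python
import re
from collections import defaultdict

# Hypothetical free-text memos with no data model at all.
documents = {
    "memo-1": "Customer asked about the Jaguar lease renewal.",
    "memo-2": "Zoo trip: the jaguar enclosure was closed.",
    "memo-3": "Lease paperwork filed for the new office.",
}

def build_index(docs):
    """Map each lower-cased word to the set of documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in re.findall(r"[a-z]+", text.lower()):
            index[word].add(doc_id)
    return index

def search(index, query):
    """Rank documents by how many query words they contain."""
    scores = defaultdict(int)
    for word in re.findall(r"[a-z]+", query.lower()):
        for doc_id in index.get(word, ()):
            scores[doc_id] += 1
    # Highest score first; ties broken by document id for a stable order.
    return sorted(scores, key=lambda d: (-scores[d], d))

index = build_index(documents)
results = search(index, "jaguar lease")
print(results)
```

Someone chasing a car lease gets the right memo first, followed by a zoo trip and an office lease: two false positives out of three hits. The chaotician position is that a human skims past those in seconds, which is cheaper than designing a schema that would have kept them out.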

The chaoticians often point to the memo fields and blob storage layers in relational databases as evidence to support their cause. Over time, it is not unusual for memo fields to end up as repositories of very rich information in a relational database. Once outside the rigorously controlled, rigorously data-typed table/field structure, that information is "in the database" but it effectively bypasses the data model. "If a lot of the good stuff is going to end up in memo fields", say the chaoticians, "why bother building an elaborate table/field structure that will atrophy over time anyway?"

The rise of the steganographers

This movement overlaps to some extent with the democriticians and the chaoticians. Steganographers point out that "structured" is really a subset of "un-structured" when it comes to information. If there are a few identifiable integer, date and dollar fields to be found in all the invoices, curriculum vitae, recipes, product descriptions etc., why not simply smuggle them inside the word processor or HTML pages that hold all the rest of the information? To the steganographers, much of the world's information is semi-structured at best. Their position is that it is better to start with an open-ended design in which anything goes, i.e. the use of very loose data models such as text fields, word processor files, spreadsheets etc., and layer on whatever islands of structure you can. Aiding their cause is the emergence of tools and techniques to effectively index large corpora of semi-structured text. Many search engines support the creation of "fields" that can be embedded into otherwise unstructured documents, and these are indexed and queried in a way closely analogous to how relational databases function. The primary difference, say the steganographers, is that the messy, irregular real-world documents remain the real deal and the indexing sub-system is simply a finding aid - not the repository per se.
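A rough sketch of "smuggling" fields inside a document and digging them out again follows. It uses Python's standard html.parser module. The class names `invoice-date` and `invoice-total` are hypothetical markers in the spirit of microformats, not part of any published vocabulary, and the extractor assumes a marked element contains only text.

```python
from html.parser import HTMLParser

class FieldExtractor(HTMLParser):
    """Collect the text of elements whose class attribute names a known field."""

    def __init__(self, field_names):
        super().__init__()
        self.field_names = set(field_names)
        self.fields = {}
        self._current = None  # name of the field currently being read, if any

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        for name in classes:
            if name in self.field_names:
                self._current = name
                self.fields[name] = ""

    def handle_data(self, data):
        if self._current is not None:
            self.fields[self._current] += data

    def handle_endtag(self, tag):
        # Simplifying assumption: marked elements hold text only, no nested tags.
        self._current = None

# A hypothetical, mostly unstructured document with two islands of structure.
document = """
<p>Thanks for your business. Invoice dated
<span class="invoice-date">2009-07-01</span> for a total of
<span class="invoice-total">149.50</span> dollars.</p>
"""

extractor = FieldExtractor(["invoice-date", "invoice-total"])
extractor.feed(document)
print(extractor.fields)
```

The prose stays the master copy. The extracted dictionary is the finding aid, and it can be rebuilt from the documents at any time if the set of interesting fields changes.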

The rise of the agilists

Agilists prize one thing above all else and that is the speed with which an IT system can change shape over time. Back in the Eighties, the world experimented with 4GLs, many of which had the notion of evolving a data model hand in glove with the applications built on top of it. More recently, web-oriented database application development systems like Django and Ruby on Rails have done much to promote the idea that the application level data structures are really primary and that the relational data model "falls out" as just one possible way to represent the application model at the storage layer.

The intriguing thing about this is that the data modelling language is not predicated on a relational storage model. It just so happens that the first back-ends for these frameworks have been relational. The very fact that both frameworks speak of relational databases as one possible "back end" speaks volumes for what is going to happen in this space. Namely, we will see more and more back-ends for these frameworks that are not relational at all. 
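The idea that the application model comes first and the storage layer is interchangeable can be sketched without any framework. The code below is not Django's or Rails' API; the `Customer` model and both backend classes are hypothetical, with SQLite standing in for a relational store and a list of JSON strings standing in for a document store.

```python
import json
import sqlite3
from dataclasses import asdict, dataclass, fields

@dataclass
class Customer:
    """The application-level model: the only definition the developer writes."""
    name: str
    city: str

class DocumentBackend:
    """Stores each object as a schema-free JSON document."""

    def __init__(self):
        self._docs = []

    def save(self, obj):
        self._docs.append(json.dumps(asdict(obj)))

    def load_all(self, cls):
        return [cls(**json.loads(doc)) for doc in self._docs]

class RelationalBackend:
    """Derives a table from the model, so the schema 'falls out' of the class."""

    def __init__(self):
        self._db = sqlite3.connect(":memory:")
        self._tables = set()

    def save(self, obj):
        cls = type(obj)
        columns = [f.name for f in fields(cls)]
        if cls.__name__ not in self._tables:
            self._db.execute(f"CREATE TABLE {cls.__name__} ({', '.join(columns)})")
            self._tables.add(cls.__name__)
        placeholders = ", ".join("?" for _ in columns)
        self._db.execute(
            f"INSERT INTO {cls.__name__} VALUES ({placeholders})",
            [getattr(obj, column) for column in columns],
        )

    def load_all(self, cls):
        columns = ", ".join(f.name for f in fields(cls))
        rows = self._db.execute(f"SELECT {columns} FROM {cls.__name__}")
        return [cls(*row) for row in rows]

def round_trip(backend):
    """Application code: identical whichever backend it is handed."""
    backend.save(Customer("Acme", "Dublin"))
    return backend.load_all(Customer)

print(round_trip(RelationalBackend()))
print(round_trip(DocumentBackend()))
```

`round_trip` never mentions tables or documents. Once application code looks like that, swapping the relational backend for something else is a configuration decision rather than a redesign.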

The rise of the temporalists

Many, many real-world systems have to fight the realities of time's arrow. How many systems do you know that have to store data that changes through time? Or report on how data has changed over time? Or allow modification to themselves over time? A sizable subset I suspect. And yet, the concept of time is not primary in the relational model. Of course, it is possible to model time in a relational database and implement a layer on top that adds the time dimension but it is not the relational database's strong point. Indeed, the concept of data normalization and the removal of duplication in general, has a nasty habit of making point-in-time reporting very problematic indeed. Consider the classic example of normalizing a design that contains customer information. You want to store a single copy of the customer's contact information - or so the standard wisdom holds. But what if you need to find out who used to be the contact before the current person took over? In many classically designed relational information models your only recourse is to backups or historical reports. In this day and age, when storage is effectively free and technology has developed to the point where storing information deltas between time points can be done very efficiently, does it really make sense to throw any historical information away? Does it make sense to have to manually account for time's arrow in every data model?
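The customer contact example can be sketched as an append-only history that answers "who was the contact on a given day?". This is a plain Python illustration, not a feature of any particular database; the `ContactHistory` class, the names and the dates are hypothetical.

```python
import bisect
from datetime import date

class ContactHistory:
    """Append-only record of a customer's contact, keyed by effective date."""

    def __init__(self):
        self._dates = []  # effective dates, kept sorted
        self._names = []  # contact effective from the date at the same index

    def set_contact(self, effective, name):
        """Record a change without overwriting anything that came before."""
        index = bisect.bisect_right(self._dates, effective)
        self._dates.insert(index, effective)
        self._names.insert(index, name)

    def contact_on(self, day):
        """Return the contact in effect on the given day, or None if none yet."""
        index = bisect.bisect_right(self._dates, day)
        return self._names[index - 1] if index else None

history = ContactHistory()
history.set_contact(date(2007, 1, 15), "Alice")
history.set_contact(date(2009, 3, 1), "Bob")

print(history.contact_on(date(2008, 6, 30)))
print(history.contact_on(date(2009, 7, 1)))
```

A normalized design that stores a single contact per customer would have overwritten Alice with Bob. Here the current value is just the latest entry, and the earlier one is still there to be asked about.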

One of the reasons why this dimension of data modeling is under scrutiny is that software developers are increasingly used to the highly time-oriented data management approaches used in source code control systems such as Subversion, Git, Mercurial, Darcs, Microsoft Visual SourceSafe and Perforce. Managing a complex corpus of source code has much in common with managing a complex corpus of product data, manufacturing data, personnel data ... The similarities are not lost on developers, who are becoming increasingly used to being able to mix and match structured and un-structured information and manage it all under a system that makes the time dimension easy to access and exploit.

Tentative conclusions

I do not think that any one of the above camps can deliver a killer blow to the pre-eminence of the relational database but, taken together, I think they have enough momentum to topple the giant. For years I thought that the relational database was unassailable. After all, the last time a challenger entered the fray -- the object database that accompanied the object oriented analysis and design revolution -- it was summarily dismissed. This time it is different: the enemy is diverse and attacking from all sides. People are revisiting the writings of the early heretics. Terms like NOSQL are being coined. The term "schema free" is accruing acceptability. Open source projects in this space are appearing at a rate of knots: MongoDB, Cassandra, CouchDB to name but three. The phrase 'non-relational persistence' produces quite a few hits in search engines nowadays. Also, I am increasingly detecting the use of the word "legacy" in connection with relational databases!

As Gandhi said, "first they ignore you, then they laugh at you, then they fight you, then you win". I think we are now in stage 3 of that progression.

Pass the popcorn.

5 comments

    Sean McGrath 2 years ago
    Hello all, Some might find these presentations interesting http://blog.oskarsson.nu/2009/06/nosql-debrief.html regards, Sean
    Anonymous 2 years ago
    The hierarchicalists, The chaoticians, The steganographers - you left out the wafflers. Working in I.T security means I have seen this scare mongering before... the sky is falling - the world will end... both these statements are true - The servers are unstable -, because I have not given them a timeline when they become untrue - I simply need to sit back and wait and wait and wait... eventually the sky will fall... - the server will crash - and eventually the world will end. Unfortunately no one will be down the pub with me to help me bask in the glory of my amazing prediction. Personally I think its a way of validating positions in IT - coming up with theory's and witch craft... I predict fluid hardrives that will operate at minus 20 degrees and formatting will consist of leaving them out in the sun. As long as we are making predictions and leaving out the time factor - I challenge anyone to prove me wrong. Imagine the savings in the hard disk, simply scrape the surface of the deoxygenated water for a 1 or polish it for a zero... magic. Well not magic - science.
    Anonymous 2 years ago
    You predict the demise of relational databases, but only show how alternatives are better applied in some circumstances. I could have told you that years ago. The relational model is the best available IN SOME APPLICATIONS! I have built databases under a number of models, each dependent on the suitability of the model for the application. I have also been required to force data and applications into grossly inappropriate database models. Discovering that relational databases are not a silver bullet does not equate to their demise. It only relegates them, properly, to the tool chest where they will be retrieved when appropriate.
    Anonymous 2 years ago
    It wasn't just object databases, it is also 'XML databases' that failed, and the failings of hierarchical databases were made clear with the failure of CODASYL. It is no accident that all of hierarchical, 'key-value', graphs and time series data can be managed relationally.
