From: www.itworld.com
November 29, 2006 —
Sometimes, typos can be useful. Earlier today I wrote "rush of codd to the hand" in an e-mail when I should have written "rush of code to the hand". The e-mail concerned a not-very-pretty system of my acquaintance (I wrote it), in which the programmer (moi) had started coding way too early in the development cycle out of sheer youthful enthusiasm.
In my experience, a rush of code to the hand is a very common problem in software development, and one that can only be tackled adequately by applying large amounts of experience. Looking back now on the programmer I was then (we are talking about the 1992-93 time frame here), I simply was not experienced enough to make the right call. You live, you make mistakes, you learn, you move on.
Speaking of moving on, it is about time I re-vectored this article to its proper subject matter, which is, believe it or not, relational databases. The "codd" in the serendipitous phrase "rush of codd to the hand" is Edgar F. Codd[1]. Codd was a British computer scientist who is best remembered as one of the founding fathers of the science of relational databases. Most students of relational databases will, sooner or later, come across his name, most likely cloaked in an acronym such as BCNF. BCNF is short for Boyce-Codd Normal Form[2]. The acronym pops up a lot in database design and, in particular, in an important soul-cleansing endeavor known as database normalization[3].
In my experience, sheer enthusiasm can lead designers to introduce a relational database into their designs way too early in the development cycle. The thought process seems to go something like this:
"We need code for the algorithms. OK, let's use Java/C#/Python/PHP (whatever). We have data that the algorithms will work on, so we need a database. OK, let's use Oracle/SQL Server/MySQL/Postgres (whatever)."
The key word here is "data". Many forms of data fit the relational database model like a glove, and these forms of data are extremely commonplace: customer details, invoice line items, product inventories, etc. It is no wonder that relational databases are as popular as they are.
However, the problem starts when the word "data" is used as a catch-all for every type of data there is. Take documents, for example. Documents are clearly data. Does it follow that they fit the relational database model like a glove? Not at all. In fact, the opposite tends to be the case. And yet, all over the map in my travels through the software development world, I see documents bludgeoned into databases. One system of my acquaintance has to manage 120 small HTML documents in a hierarchy. The developer spent a significant amount of time, starting on day one of the project, figuring out how to represent the hierarchy he needed inside a set of relational database tables. Each record then contained a CLOB field[4] holding the HTML. As part of the system design, the developer then had to code all the necessary CRUD functions[5] so that mere mortals could create new content, edit existing content, delete existing content and so on.
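To make the cost of that approach concrete, here is a minimal sketch of the kind of machinery involved. It assumes SQLite via Python's standard `sqlite3` module. The table name `documents`, its columns, and the helper functions are hypothetical; they are not taken from the system described above. The hierarchy is encoded as an adjacency list, where each row points at its parent. Reconstructing a path, or deleting a branch, then requires recursive SQL.

```python
import sqlite3

# Hypothetical schema: one row per HTML document. parent_id points at the
# parent row (an "adjacency list"), which is how the hierarchy is encoded.
# SQLite gives a column declared CLOB text affinity, so the HTML is stored as text.
SCHEMA = """
CREATE TABLE documents (
    id        INTEGER PRIMARY KEY,
    parent_id INTEGER REFERENCES documents(id),
    name      TEXT NOT NULL,
    body      CLOB
)
"""


def create_doc(conn, parent_id, name, html):
    """Create: insert a document under an optional parent; return its id."""
    cur = conn.execute(
        "INSERT INTO documents (parent_id, name, body) VALUES (?, ?, ?)",
        (parent_id, name, html),
    )
    return cur.lastrowid


def read_doc(conn, doc_id):
    """Read: fetch the HTML body of one document (None if it does not exist)."""
    row = conn.execute("SELECT body FROM documents WHERE id = ?", (doc_id,)).fetchone()
    return row[0] if row else None


def update_doc(conn, doc_id, html):
    """Update: replace the HTML body in place."""
    conn.execute("UPDATE documents SET body = ? WHERE id = ?", (html, doc_id))


def delete_doc(conn, doc_id):
    """Delete: remove a document and its whole subtree so no children are orphaned."""
    conn.execute(
        """
        WITH RECURSIVE subtree(id) AS (
            SELECT ?
            UNION ALL
            SELECT d.id FROM documents d JOIN subtree s ON d.parent_id = s.id
        )
        DELETE FROM documents WHERE id IN (SELECT id FROM subtree)
        """,
        (doc_id,),
    )


def path_of(conn, doc_id):
    """Rebuild a folder-style path by walking parent links up to the root."""
    rows = conn.execute(
        """
        WITH RECURSIVE ancestors(id, parent_id, name, depth) AS (
            SELECT id, parent_id, name, 0 FROM documents WHERE id = ?
            UNION ALL
            SELECT d.id, d.parent_id, d.name, a.depth + 1
            FROM documents d JOIN ancestors a ON d.id = a.parent_id
        )
        SELECT name FROM ancestors ORDER BY depth DESC
        """,
        (doc_id,),
    ).fetchall()
    return "/".join(name for (name,) in rows)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    guide = create_doc(conn, None, "guide", "<h1>Guide</h1>")
    intro = create_doc(conn, guide, "intro.html", "<p>Hello</p>")
    print(path_of(conn, intro))  # guide/intro.html
```

Even this toy version needs a schema, four CRUD functions, and two recursive queries before a single page of content has been written. A file system gives you every one of those for free.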
These days, I tend to store documents in a file system, at least until the volume gets into the many tens of thousands of individual documents. I use plain vanilla folders to represent hierarchy. I use plain vanilla naming conventions on filenames to provide simple "views" over the data. Where necessary, I use a search engine of some form to provide full-text search. I use plain vanilla file system utilities for the CRUD functions and so on.
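The next sketch shows how little code the file-system route needs. It is written in Python using only the standard library (`pathlib`, `shutil`). The folder layout, the `faq-` filename prefix used as a "view", and the helper names are all hypothetical. The brute-force `naive_search` is only a stand-in for a real search engine and would not be suitable at large volumes.

```python
import shutil
import tempfile
from pathlib import Path


def create_doc(root: Path, rel_path: str, html: str) -> Path:
    """Create (or, by overwriting, update) a document; folders carry the hierarchy."""
    path = root / rel_path
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(html, encoding="utf-8")
    return path


def read_doc(root: Path, rel_path: str) -> str:
    """Read a document's HTML."""
    return (root / rel_path).read_text(encoding="utf-8")


def delete_doc(root: Path, rel_path: str) -> None:
    """Delete one document, or a whole sub-hierarchy if rel_path is a folder."""
    path = root / rel_path
    if path.is_dir():
        shutil.rmtree(path)
    else:
        path.unlink()


def view(root: Path, pattern: str) -> list[str]:
    """A naming-convention "view": every file under root matching a glob pattern."""
    return sorted(
        p.relative_to(root).as_posix() for p in root.rglob(pattern) if p.is_file()
    )


def naive_search(root: Path, term: str) -> list[str]:
    """Brute-force, case-insensitive full-text scan; a stand-in for a search engine."""
    term = term.lower()
    return sorted(
        p.relative_to(root).as_posix()
        for p in root.rglob("*.html")
        if p.is_file() and term in p.read_text(encoding="utf-8").lower()
    )


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        create_doc(root, "guide/faq-install.html", "<p>Install steps</p>")
        create_doc(root, "guide/intro.html", "<p>Hello</p>")
        print(view(root, "faq-*.html"))  # ['guide/faq-install.html']
```

The hierarchy, the "views", and deletion of a whole branch all come straight from the operating system. The same files can just as easily be managed with `ls`, `mv`, `grep` and friends, without writing any code at all.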
To some developers I meet on my travels, this feels wrong. "Surely", the thought goes, "in order to be properly managed, the content needs to be in a database?" Not so. That is true in many cases, but not in all of them. Some of the best-managed content I have ever seen sits on a Unix file system, and some of the worst-managed content I have ever seen sits in a big, honking, breathtakingly expensive relational database.
Data management is as much a people-and-process matter as it is a technology matter. That is hard to keep in mind when marketing materials for relational databases keep piling up on your desk, especially since no such marketing material ever arrives extolling the power of your file system, which comes free with your operating system.
Remember that managing data through a simple file system does not make you a bad person. Beware of a rush of Codd to the hand. It is as dangerous as a rush of code to the hand but not as easily detected.
[1] http://en.wikipedia.org/wiki/Edgar_F._Codd
[2] http://en.wikipedia.org/wiki/Boyce-Codd_normal_form
[3] http://en.wikipedia.org/wiki/Database_normalization
[4] http://en.wikipedia.org/wiki/CLOB
[5] http://en.wikipedia.org/wiki/CRUD_%28acronym%29