A Mission From Codd

I first used a relational DBMS in 1984 while working at Lawrence Berkeley Lab. At the time Oracle (then called Relational Software, Inc.) was a $15M company. It was about 2 minutes after sun-up on the dawn of the relational era.

I liked Ingres, the RDBMS I used at the lab, and moved on to work at the vendor (Relational Technology, Inc.) when I finished school. Soon thereafter, I discovered my first DBMS hero: Ted Codd. A mathematician at IBM, Ted had invented the relational model. I loved that, unlike prior-generation DBMSs, relational systems rested on a rigorous mathematical foundation. As a math guy myself, I found the whole idea of deriving anything in computing from mathematical first principles both cool and logical.

We often joked as we proselytized relational DBMSs that we were on a mission from Codd.

In my estimation, relational would never have succeeded with Codd alone. It was Chris Date, not Codd, who popularized the idea. Date wrote numerous papers and books about relational databases, ranging from a standard college text (An Introduction to Database Systems) to articles debunking relational myths to diatribes against the choice of SQL as the preferred relational query language. (Date hated SQL then, and still does today.)

Date was a skilled writer with a gift for argument. While Codd invented the relational model, Date sold it. He did more for the RDBMS than the marketing departments of all the vendors combined. I believe that for years Oracle, Ingres, Informix, and others simply backfilled the demand that Date created.

At Ingres, I had the pleasure of working with Michael Stonebraker, the person who I believe has done the most to improve on the relational model since its introduction. Distributed RDBMSs, gateways to non-relational systems, statistical query optimization, abstract datatypes / universal databases are just a few of the ideas that came from Stonebraker and Ingres or Postgres. For nearly two decades, Stonebraker set the technology agenda in the RDBMS market. Other Stonebraker ideas, like database time travel, have yet to be seen in commercial implementations, but I suspect they will one day.

Thinking about these folks got me wondering: where are they today, and what are they doing?

Codd passed away in 2003. His obituary on the IBM web site can be found here.

Date is still writing, but based on this interview about his latest book I worry that he has become an anachronism. He appears to argue that most problems with relational databases are due to impure implementations of the pure relational model and that if we went back, dumped SQL, and made real relational databases, the world would be a better place. I’ve not yet read his new book, but based on the interview I’m pretty sure what I’ll find. If I’m surprised, I’ll make a posting to revise my opinion.

Overall, I think Date missed the point. Relational wasn’t successful because of its mathematical foundation. It was successful because of queries. Prior-generation DBMSs required you to know what queries you were going to run in advance and design the database accordingly. Once designed, if you had an unanticipated query, you were either SOL or had to re-design, dump, and reload your databases.
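To make the point concrete, here is a minimal sketch using Python's built-in sqlite3 module. The customers and orders tables, their columns, and the sample rows are all hypothetical, invented purely for illustration. The schema is designed with no particular question in mind, and a question nobody planned for (total order amount by region) is then answered with a single query, with no redesign, dump, or reload.

```python
import sqlite3

# Hypothetical schema and data, invented for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, region TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
    INSERT INTO customers VALUES
        (1, 'Acme', 'West'), (2, 'Globex', 'East'), (3, 'Initech', 'West');
    INSERT INTO orders VALUES
        (10, 1, 250.0), (11, 2, 100.0), (12, 1, 50.0), (13, 3, 75.0);
""")


def total_by_region(connection):
    """Answer an unanticipated question by joining and grouping at query time."""
    rows = connection.execute("""
        SELECT c.region, SUM(o.amount)
        FROM customers AS c
        JOIN orders AS o ON o.customer_id = c.id
        GROUP BY c.region
        ORDER BY c.region
    """).fetchall()
    # Each row is a (region, total) pair, so it converts directly to a dict.
    return dict(rows)


print(total_by_region(conn))
```

Nothing in the table definitions anticipates a by-region rollup. The join and the grouping are expressed entirely in the query, which is the property I'm arguing mattered most.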

This is also why BI has been so successful. Users don’t want databases. They want answers to questions. As Harvard marketing guru Theodore Levitt always said: “purchasing agents buy quarter-inch holes, not quarter-inch bits.”

By the way, this is a key reason why I believe Mark Logic is succeeding. Search engines are one-trick ponies. They know how to run one query against content: return a link to all documents containing [word | phrase | Boolean]. That’s it. Most extensions beyond this are hacks. Mark Logic delivers full database queries against content. I believe that once more people understand this ability and see how it can be applied, a whole new generation of content applications will be created.

After the Ingres train wreck, Stonebraker had commercial success with Illustra (sold to Informix) and then Cohera (sold to PeopleSoft). Now he is into stream processing and is CTO of a company called StreamBase. A while back he wrote an otherwise excellent retrospective with Joe Hellerstein of Berkeley that suffered from one key flaw: it dismisses all content applications in a single paragraph on semi-structured data. There, they equate virtually all our customers’ applications, which manage scientific journal articles, textbooks, websites, flight manuals, project reports, government notices, etc., with “semi-structured [stuff] like want ads and resumes.” Such is the blind spot of those too focused on data. They can’t see content. Or when they do see it, they say there isn’t very much of it anyway.

Of the generation of my original DBMS heroes, one person seems to have changed with the times: IBM fellow Don Chamberlin.

While Date still grumbles about relational impurities and Stonebraker has moved on to streams, Chamberlin was a major force in shaping XQuery. He strikes me as the only one of the original relational apostles who evolved and used lessons learned from the prior model (e.g., SQL) to improve the new one (e.g., XQuery). For example, click here for an article he wrote on design influences on XQuery. Or here for his introduction to XQuery.

Here’s to evolution. And here’s to Don.
