As frequent readers know, one of my memes is the rise of special-purpose databases, whether they be data warehouse appliances like Netezza, stream databases like Streambase, or OLAP (aka multi-dimensional) databases like Essbase, recently purchased by Oracle through the Hyperion acquisition.
I believe that MarkLogic is one of a class of special-purpose DBMSs that will be necessary to handle new requirements that were never envisioned when the RDBMS was born. The relational database is now pushing 40 years old (and pushing 30 since its first implementations in commercial products).
An easy way of seeing the problem is to think about the computers you used even 20 years ago: their disk and memory configuration, their network connection speed, the types of data they managed, and the applications they ran. For me, that would be a 1-MIPS MicroVAX II with 8 MB of memory and 256 MB of disk serving 40 users (among other things, I was the sysadmin). We used it to run a technical support call-tracking system at Ingres, then known as Relational Technology, Inc.
While RDBMSs have proven remarkably extensible, for certain classes of applications (e.g., ultra-low latency trading) and databases (e.g., managing tens to hundreds of terabytes of XML documents), they are simply not appropriate.
As it turns out, I’m not the only person who sees this problem. Michael Stonebraker, noted computer science professor (formerly of UC Berkeley and now of MIT), serial entrepreneur (a founder of Ingres, Illustra, Cohera, Streambase, and Vertica), and general database visionary, thinks the same thing.
Towards that end, he co-authored two papers:
- One Size Fits All: An Idea Whose Time Has Come and Gone. This paper makes the argument that the relational database cannot be extended ad infinitum, demonstrates how RDBMSs are inappropriate for several new applications, and argues that the DBMS market will fragment into a series of special-purpose engines, perhaps unified by a common front-end parser.
- One Size Fits All: Part 2, Benchmarking Results. This paper buttresses the first with benchmark results for relational vs. special-purpose databases in several applications. Interestingly and pragmatically, Stonebraker argues that most people won’t even consider a special-purpose database (largely due to inertia) unless it is at least 10x faster than relational for a given application. He then demonstrates several applications where you can see 10–100x gains in performance. (Large text and XML contentbases are one of the cases he discusses, citing Google’s creation of its own file system and software stack to deal with Internet-scale documentbases.)
I have always found Stonebraker’s work very clear; he’s one of the few authors of academic computer science literature whose work I can always read and understand. Take a look at the articles.
If you’re not up for the papers, then here’s an interview in Red Hat Magazine that hits many of the key points. (But bear in mind he’s doing PR for Vertica here, so the examples are a bit biased towards column-orientation, and I’m sure the webinar mentioned at the bottom is a Vertica one.)