Adam Smith’s invisible hand never rests. Just five years ago, the database market looked like a static, three-player $10B/year oligopoly where the primary forces were inertia and profit-taking. Today, we have two major forces disrupting the comfortable stasis that has developed over the past 30 years.
- One force is DBMS specialization: while the general-purpose RDBMS is useful for a broad range of applications, it is optimal for few of them. The RDBMS has slowly become expensive bloatware that is functionally a jack of all trades, master of none. MIT’s Michael Stonebraker calls the RDBMS a “one-size-fits-all” solution.
- The other force is NoSQL, an organic and rapidly-growing industry movement away from relational databases, driven by a number of factors including both technology and cost.
The purpose of this post is to share my thoughts on NoSQL. Make no mistake, like the Tea Party Movement, NoSQL is a rebellion; just look at the name. But like most demonstrations, not everyone is marching for the same reasons. Here are some of the things I think various members of the NoSQL crowd are marching against:
- Table-oriented, 1970s-era database technology: RDBMSs were designed for handling data and short-text fields, necessitate mapping programmatic objects to tables (i.e., the impedance mismatch), and require the use of an increasingly stone-age query language, SQL.
- Scalability: relational databases were not designed to handle, and do not generally cope well with, Internet-scale “big data” applications. Most of the big Internet companies (e.g., Google, Yahoo, Facebook) do not rely solely on RDBMS technology for this reason.
- High prices and the heavy-handed treatment of customers: both stem from the underlying oligopoly and the lack of credible alternative suppliers.
- Closed source: the inability to customize the internals of the DBMS engine to meet specific needs.
- Bloatware: ironically, while RDBMSs are perceived as light on requirements that matter (e.g., scalability), they are also seen as over-engineered for features that don’t. (ACID transactions are a favorite target in this department.)
- DBA supremacy. For years, corporate DBAs called the shots on where strategic data assets would be stored, and thus how they would be accessed. This created headaches for the programmers of the world who, in response, have done as much as possible to abstract away the database (e.g., Ruby on Rails).
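To make the impedance mismatch from the list above concrete, here is a minimal Python sketch. It stores the same nested object two ways: split across relational tables (using the standard-library `sqlite3` module) and as a single document. The `order` object, the table names, and the plain dict standing in for a document store are all hypothetical choices made for illustration, not any particular product's API.

```python
import json
import sqlite3

# A hypothetical application object: an order with nested line items.
order = {
    "id": 1,
    "customer": "Acme",
    "items": [
        {"sku": "A-100", "qty": 2},
        {"sku": "B-200", "qty": 1},
    ],
}

conn = sqlite3.connect(":memory:")

# Relational route: the nested object has to be split across two tables.
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT)")
conn.execute("CREATE TABLE order_items (order_id INTEGER, sku TEXT, qty INTEGER)")
conn.execute("INSERT INTO orders VALUES (?, ?)", (order["id"], order["customer"]))
conn.executemany(
    "INSERT INTO order_items VALUES (?, ?, ?)",
    [(order["id"], item["sku"], item["qty"]) for item in order["items"]],
)


def load_order_relational(conn, order_id):
    """Reassemble the nested object from its rows (this is the mapping code)."""
    oid, customer = conn.execute(
        "SELECT id, customer FROM orders WHERE id = ?", (order_id,)
    ).fetchone()
    rows = conn.execute(
        "SELECT sku, qty FROM order_items WHERE order_id = ? ORDER BY rowid",
        (order_id,),
    ).fetchall()
    return {
        "id": oid,
        "customer": customer,
        "items": [{"sku": sku, "qty": qty} for sku, qty in rows],
    }


# Document route: the object is stored and retrieved as a whole.
# A plain dict stands in for a document store here (an assumption).
documents = {order["id"]: json.dumps(order)}


def load_order_document(store, order_id):
    """Fetch the document and parse it back into the original shape."""
    return json.loads(store[order_id])


relational_copy = load_order_relational(conn, 1)
document_copy = load_order_document(documents, 1)
```

Both routes return the same object. The difference is how much hand-written mapping code sits between the program and the store, and that code grows with every level of nesting.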
On the flip side, there are things the NoSQL crowd are fighting for:
- Open source, implying control. The ability that open source software provides to customize product functionality.
- Open source, implying free. The often-flawed notion that the absence of software license fees results in a reduced lifetime cost of ownership.
- Coolness, or the “I want to be like Google” effect. If Google’s got BigTable, Yahoo’s got Hadoop, and Facebook’s got Cassandra, then we should build our own, too. Our app is hard; we’re smart guys, too.
- Vengeance, or the “I’m so mad at Oracle that I’ll do anything” effect. Yes, some folks are just plain mad enough at Oracle to either go write their own DBMS, or take on the support of a very low-level infrastructure technology.
So, if you’re considering a NoSQL solution — a class in which I include MarkLogic — you need to figure out what you’re marching against, what you’re fighting for, and ultimately what will meet your needs at the lowest total cost of ownership.
My first recommendation is to detect and, where applicable, kill off the coolness effect. Google is swimming in money and PhDs. They can build anything they want regardless of whether they should and, right or wrong, for Google it just doesn’t matter. So unless you have Google’s business model and talent pool, you probably shouldn’t copy their development tendencies.
Heck, I get the coolness attraction. I think infrastructure software is cool, too. That’s why I was an OS geek early on and have spent my career around databases. But I surely don’t think that F1000 companies and government agencies should build their own DBMSs, nor fall into the trap of thinking that open source low-level stores are a free and easy way to avoid Oracle license fees. Cool shouldn’t be in the equation. Technology suitability and total cost should be. Period.
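One way to keep cool out of the equation is to write the cost comparison down. The sketch below is a deliberately simple model, assuming total cost is just license fees plus support plus staffing over a fixed horizon. The function name and every figure in it are hypothetical placeholders, not market data; plug in your own numbers.

```python
def total_cost_of_ownership(license_per_year, support_per_year,
                            engineers, cost_per_engineer_year, years):
    """Sum license, support, and staffing costs over the given horizon."""
    yearly = (license_per_year + support_per_year
              + engineers * cost_per_engineer_year)
    return yearly * years


# All figures below are hypothetical placeholders, not market data.
commercial = total_cost_of_ownership(
    license_per_year=200_000, support_per_year=40_000,
    engineers=1, cost_per_engineer_year=150_000, years=3)

open_source = total_cost_of_ownership(
    license_per_year=0, support_per_year=0,
    engineers=3, cost_per_engineer_year=150_000, years=3)

print(f"commercial:  ${commercial:,}")   # license fees, small team
print(f"open source: ${open_source:,}")  # no license fees, larger team
```

With these made-up inputs the no-license option comes out more expensive because of the extra engineers. Change the staffing assumption and the result flips, which is the point: the answer depends on your numbers, not on whether the license is free.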
My second recommendation is to orthogonalize the open source question, making it independent of functional requirements. (This breaks if source customization is a requirement, but remember that this requirement is often fictional: most open source users don’t customize.) If you’re struggling with an RDBMS on a given application problem, you shouldn’t say: we need an open source, NoSQL type thing. You should say: we need to look at relational database alternatives. Those alternatives include open source database projects (e.g., MongoDB, CouchDB) and distributed computing frameworks (e.g., Hadoop), but they also include commercial software offerings such as specialized DBMSs like StreamBase (for real-time streams), Aster (for analytics on big data), and MarkLogic (for semi-structured data). Don’t throw out the commercial-software-benefits baby with the RDBMS bathwater.
My personal take on this issue is that:
- Relational databases, like the mainframe in 1985, are entering the autumn of their lives. They won’t die quickly, and the mainframe isn’t dead even today, but their best days are behind them.
- Our kids will see SQL the way we see COBOL. Some people can’t stand when I say this, but I think they’re in denial. There is no logical reason to assume that the relational database and the SQL language are the endpoints in database evolution. Yes, Larry Ellison is powerful. But Adam Smith is more so.
- Our kids will see no data/document dichotomy. They will just see digital information. We need to understand and remember that the data/document dichotomy is an artifact of the limitations of the tools and technologies with which we grew up.
- Some of the NoSQL hype is an over-reaction to the database oligopoly. I believe there are organizations out there who should be using alternative commercial databases, but instead are using open source NoSQL-type projects due to coolness, anger, or a mistaken belief that open source always has a lower total cost of ownership. I believe rationality will return to these people. One day management will say: “Holy cow! Why in the world are we paying programmers to write and support software at this low a level?” (This is potentially avoidable if you can mentally project yourself into the future now and imagine how you will look back at the coming three years.)
- Some of the NoSQL hype is a valid reaction to the technological limits of relational databases and the impedance mismatch in programming on them.
In the end, I think it’s great that the NoSQL movement is happening. It’s awakening people to traditional RDBMS alternatives. It’s making people understand that they don’t have to write big checks for commodity software. It’s helping people solve problems that they can’t solve, or solve efficiently, on relational technology.
My axe to grind is simple: just because you’re throwing out Oracle, don’t throw out all DBMSs and all commercial software with it. Take a breath. Look at all your alternatives. Study total costs and technology applicability. And make your best decision.
Interesting Writings on NoSQL
- Wikipedia NoSQL entry
- The NoSQL Discussion Has Nothing to do with SQL by Michael Stonebraker
- The Legit Part of the NoSQL Idea by Curt Monash
- The MyNoSQL Blog by Alex Popescu
- Jason Hunter’s presentation on MarkLogic Server to NoSQL Oakland
- Announcing the Release of HadoopDB by Daniel Abadi
- Seeking a Database that Doesn’t Suck on Ambient Irony
- Twitter Growth Prompts Switch from MySQL to NoSQL Database by Eric Lai (Computerworld)
- No to SQL: Anti-Database Movement Gains Steam by Eric Lai