The Database Tea Party: The NoSQL Movement

Adam Smith’s invisible hand never rests.  Just five years ago, the database market looked like a static, three-player $10B/year oligopoly where the primary forces were inertia and profit-taking.  Today, we have two major forces disrupting the comfortable stasis that has developed over the past 30 years.

  • One force is DBMS specialization:  while the general-purpose RDBMS is useful for a broad range of applications, it is optimal for few of them.  The RDBMS has slowly become expensive bloatware that is functionally a jack of all trades, master of none.  MIT’s Michael Stonebraker calls the RDBMS a one size fits all solution.
  • The other force is NoSQL, an organic and rapidly-growing industry movement away from relational databases, driven by a number of factors including both technology and cost.

The purpose of this post is to share my thoughts on NoSQL.  Make no mistake, like the Tea Party Movement, NoSQL is a rebellion; just look at the name.  But like most demonstrations, not everyone is marching for the same reasons.  Here are some of the things I think various members of the NoSQL crowd are marching against:

  • Table-oriented, 1960s-era database technology:  RDBMSs were designed for handling data and short-text fields, necessitate mapping programmatic objects to tables (i.e., the impedance mismatch), and require the use of an increasingly stone-age query language, SQL.
  • Scalability:  relational databases were not designed to handle and do not generally cope well with Internet-scale, “big data” applications.  Most of the big Internet companies (e.g., Google, Yahoo, Facebook) do not rely on RDBMS technology for this reason.
  • High prices and the heavy-handed treatment of customers:  both stem from the underlying oligopoly and the lack of credible alternative suppliers.
  • Closed source:  the inability to customize the internals of the DBMS engine to meet specific needs.
  • Bloatware:  ironically, while RDBMSs are perceived as light in requirements that matter (e.g., scalability), they are also seen as over-engineered for features that don’t.  (ACID transactions are a favorite target in this department.)
  • DBA supremacy.  For years, corporate DBAs called the shots on where strategic data assets would be stored, and thus how they would be accessed.  This created headaches for the programmers of the world who, in response, have done as much as possible to abstract away the database (e.g., Ruby on Rails).

On the flip side, there are things the NoSQL crowd are fighting for:

  • Open source, implying control.  The ability that open source software provides to customize product functionality.
  • Open source, implying free.  The often-flawed notion that the absence of software license fees results in a reduced lifetime cost of ownership.
  • Coolness, or the “I want to be like Google” effect.  If Google’s got BigTable,  Yahoo’s got Hadoop, and Facebook’s got Cassandra, then we should build our own, too.  Our app is hard; we’re smart guys, too.
  • Vengeance, or the “I’m so mad at Oracle that I’ll do anything” effect.  Yes, some folks are just plain mad enough at Oracle to either go write their own DBMS, or take on the support of a very low-level infrastructure technology.

So, if you’re considering a NoSQL solution — a class in which I include MarkLogic — you need to figure out what you’re marching against, what you’re fighting for, and ultimately what will meet your needs at the lowest total cost of ownership.

My first recommendation is to detect and, where applicable, kill off the coolness effect.  Google is swimming in money and PhDs.  They can build anything they want regardless of whether they should and, right or wrong, for Google it just doesn’t matter.  So unless you have Google’s business model and talent pool, you probably shouldn’t copy their development tendencies.

Heck, I get the coolness attraction.  I think infrastructure software is cool, too.  That’s why I was an OS geek early on and have spent my career around databases.  But I surely don’t think that F1000 companies and government agencies should build their own DBMSs, nor fall into the trap of thinking that open source low-level stores are a free and easy way to avoid Oracle license fees.  Cool shouldn’t be in the equation.  Technology suitability and total cost should be.  Period.

My second recommendation is to orthogonalize the open source question, making it independent of functional requirements.  (This breaks if source customization is a requirement, but remember that requirement is often fictional:  most open source users don’t customize.)  If you’re struggling with an RDBMS on a given application problem you shouldn’t say:  we need an open source, NoSQL type thing.  You should say:  we need to look at relational database alternatives.  Those alternatives include open source database projects (e.g., MongoDB, CouchDB) and distributed computing frameworks (e.g., Hadoop), but they also include commercial software offerings such as specialized DBMSs like Streambase (for real-time streams), Aster (for analytics on big data), and MarkLogic (for semi-structured data).  Don’t throw out the commercial-software-benefits baby with the RDBMS bathwater.

My personal take on this issue is that:

  • Relational databases, like the mainframe in 1985, are entering the autumn of their lives.  They won’t die quickly, and the mainframe isn’t dead today, but their best days are behind them.
  • Our kids will see SQL the way we see COBOL.  Some people can’t stand when I say this, but I think they’re in denial.  There is no logical reason to assume that the relational database and the SQL language are the endpoints in database evolution.  Yes, Larry Ellison is powerful.  But Adam Smith is more so.
  • Our kids will see no data/document dichotomy.  They will just see digital information.  We need to understand and remember that the data/document dichotomy is an artifact of the limitations of the tools and technologies with which we grew up.
  • Some of the NoSQL hype is an over-reaction to the database oligopoly.  I believe there are organizations out there who should be using alternative commercial databases, but instead are using open source NoSQL-type projects due to coolness, anger, or a mistaken belief that open source always has a lower total cost of ownership.  I believe rationality will return to these people.  One day management will say:  “Holy cow!  Why in the world are we paying programmers to write and support software at this low a level?”  (This is potentially avoidable if you can mentally project yourself into the future now and imagine how you will look back at the coming three years.)
  • Some of the NoSQL hype is a valid reaction to the technological limits of relational databases and the impedance mismatch in programming on them.

In the end, I think it’s great that the NoSQL movement is happening.  It’s awakening people to traditional RDBMS alternatives.  It’s making people understand that they don’t have to write big checks for commodity software.  It’s helping people solve problems that they can’t solve, or solve efficiently, on relational technology.

My axe to grind is simple:  just because you’re throwing out Oracle, don’t throw out all DBMSs and all commercial software with it.  Take a breath.  Look at all your alternatives.  Study total costs and technology applicability.  And make your best decision.

Interesting Writings on NoSQL

48 responses to “The Database Tea Party: The NoSQL Movement”

  1. Pingback: The NoSQL Movement: The Object – RDMBS Incompatability : HalWebGuy. Online Media Geek.

  2. A very interesting write-up with one little oversight: you’re wrong.

    I am part of a large program to write a NoSQL database for military applications. It’s not a backlash against paying Oracle (the DoD has a blanket license for Oracle installations) or a philosophical stance by the hippies in the defense arena; it’s the fact that RDBMSs are built in a different space in the CAP trades (see http://www.julianbrowne.com/article/viewer/brewers-cap-theorem).

    Google, Amazon, Facebook, and DARPA all recognized that when you scale systems large enough, you can never put enough iron in one place to get the job done (and you wouldn’t want to, to prevent a single point of failure). Once you accept that you have a distributed system, you need to give up consistency or availability, which the fundamental transactionality of traditional RDBMSs cannot abide. Based on the realization that something fundamentally different needed to be built, a lot of Very Smart People tackled the problem in a variety of different ways, making different trades along the way. Eventually, we all started getting together and trading ideas, and we realized that we needed some moniker to call all of these different databases that were not the traditional relational databases. The NoSQL name was coined more along the lines of “anything outside of the SQL part of the Venn diagram” rather than “opposed to SQL”.

    So – the NoSQL databases are a pragmatic response to growing scale of databases and the falling prices of commodity hardware. It’s not a noble counterculture movement (although it does attract the sort that have a great deal of mental flexibility), it’s just a way to get business done cheaper.
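The trade described in this comment can be sketched with a toy model. This is only an illustration under loud assumptions: the `Replica` and `TwoNodeCluster` classes, the "CP"/"AP" mode labels, and the `balance` key are all hypothetical names invented here, and no real database is implemented this way. It shows one thing: when the link between two copies of the data is cut, a system either refuses the write (keeping the copies consistent) or accepts it (staying available while the copies diverge).

```python
class Replica:
    """One copy of the data, held in a plain in-memory dict."""

    def __init__(self, name):
        self.name = name
        self.data = {}


class TwoNodeCluster:
    """Toy two-replica cluster; mode is 'CP' or 'AP' (hypothetical labels)."""

    def __init__(self, mode):
        if mode not in ("CP", "AP"):
            raise ValueError("mode must be 'CP' or 'AP'")
        self.mode = mode
        self.a = Replica("a")
        self.b = Replica("b")
        self.partitioned = False  # True when the replicas cannot talk

    def write(self, replica, key, value):
        """Return True if the write was accepted, False if it was refused."""
        other = self.b if replica is self.a else self.a
        if not self.partitioned:
            # Healthy network: update both copies together.
            replica.data[key] = value
            other.data[key] = value
            return True
        if self.mode == "CP":
            # Prefer consistency: refuse rather than let the copies diverge.
            return False
        # Prefer availability: accept locally and reconcile later.
        replica.data[key] = value
        return True


cp = TwoNodeCluster("CP")
cp.partitioned = True
cp_accepted = cp.write(cp.a, "balance", 100)

ap = TwoNodeCluster("AP")
ap.partitioned = True
ap_accepted = ap.write(ap.a, "balance", 100)
```

In the "CP" run the write is refused and both replicas stay identical; in the "AP" run the write succeeds but only one replica sees it until the partition heals.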

    • Gregor,

      Thanks for your opinion. I am happy to hear the story of why you and your project were drawn to a NoSQL solution. And thanks for your reference to Brewer’s CAP “theorem” though I think this reference is a little more straightforward (and less colorful) than the one you provided. For those interested, here’s the original talk.

      Based on what you commented, I’d classify you as someone who went NoSQL because of one of the reasons I cited: ACID transactions are overkill / inappropriate / must be traded off for what you are trying to do. That’s one of about 8 reasons I give for why people go NoSQL and I’m certainly not trying to suggest either [1] that every person joins for every reason or [2] that every person who joins considers themselves in a movement.

      By the way, for someone disagreeing with me, you’re agreeing a lot: you did exactly what I suggested which was: step away from the hype, consider your alternatives, and pick the best solution for you and at the best cost. So I think your primary beef is me calling it a movement, and here I’d say we simply disagree. I do believe NoSQL is a movement, though I didn’t mean to suggest anything “noble” about it.

      By the way, those DoD Oracle licenses may have been “free” for your project if you chose to use them, but they were certainly not “free” for the DoD. Oracle does roughly $5B/year in DBMS revenues generating >50% operating margins. And while your project may have had “free” access to Oracle, someone up the chain is indeed paying for those licenses and those people are starting to issue directives to look at cheaper solutions/alternatives.

  3. Pingback: Saying yes to NoSQL — Too much information

  4. I think you are over-analyzing. There’s nothing wrong with relational theory; it’s good stuff. NoSQL is a kickback against SQL, plain and simple.

    The problem is that SQL is a poor way of interfacing a program to a database. What’s more it hasn’t evolved since about 1985, whereas programming languages have.

    The NoSQL movement has lumped together a host of disparate solutions that have just one common factor. No SQL. The reason they all exist is because the existing SQL based databases were not up to the job and the main thing holding back Oracle and the like is that they are wedded to SQL.

    People are now able to start thinking about the relationship between their code and its data in a fresh and innovative way. It’ll be interesting (and exciting) to see how this plays out.

    Fwiw, my money is on REST and JSON with who knows what behind that.

    • George,

      Thanks for your comment. Relational theory is fine (see prior comment response). I think for a combination of the reasons listed in the post many people are tired of RDBMSs. For some it’s the pricing. For others, it’s the programming language interfaces (but, btw, object databases had wonderful programming interfaces). To me the NoSQL alternatives are a mixed bag, as are the reasons that people are attracted to the NoSQL movement.

  5. About eight months ago, I attempted to formalize how XML documents could be represented in an SQL database using the schema definition (I named it XSD to DDL).

    I have summarized it here: http://www.web21th.com/schemas/xsd-ddl.htm
    I quickly stopped because the work became very complex.
    At the time I was trying to use XForms with a relational database.
    It was actually a little silly, but the experience convinced me that I had to give up on the RDBMS and adopt native XML.

    I have not devoted as much time to it as I wanted.
    But in the end, the new style comes to the fingertips (and the brain too) very quickly if you find the useful information and get in touch with experts and advanced users.

    And about the autumn of SQL, remember Cullinet.

    I now use both eXist-db and Mark Logic, with XSLTForms in front.

    One must of course make an effort to deepen one’s knowledge of XML, including XQuery / XForms / XML Schema.
    Also read the databases’ technical documentation (administration, programming interfaces).

    You find very quickly that there is not only a gain in development volume, but that the application naturally follows the objectives and design, and that large savings in administration and database maintenance follow as well.

  6. You may want to do some more investigation as to why SQL really has nothing to do with the Relational Model of data.

    Part of the reason that SQL DBMSs are having difficulty is that they don’t obey the principles of the Relational model.

    This just goes to show that everything old in the computer industry gets to be renamed and become new again. A lot of these “new” projects are based on database ideas that were rejected in the 1960s precisely because they don’t work.

    • Jacobus,

      I’m aware that SQL and the relational model are *not* two sides of the same coin, but in practice SQL and today’s relational databases are. That is, as a matter of theory you are correct, but as a matter of practice SQL is the language in which one speaks to an RDBMS. People interested more in relational theory will enjoy this book by Chris Date.

  7. Pingback: NoSQL explained correctly (finally) « Otaku, Cedric's weblog

  8. Wow. “Tea Party”. Really? Where do I even start…they’re idiot reactionaries. NoSQL folks are engineers who want to make life easier :)

    “NoSQL” is not about being anti-SQL, it’s about using the right tool for the job. And it’s not just about “big data”, it’s about ease of use and effortless scalability.

    That’s why we built the Drawn to Scale platform. (http://www.drawntoscalehq.com)

    The RDBMS was designed over 40 years ago for hardware and software that just doesn’t make sense for every use case out there. It’s difficult to scale, doesn’t work with modern data

    I agree: study *all* your technology, and make the best choice. Now, thanks to NoSQL, that’s a possibility.

    • Bradford, Thanks. I wasn’t trying to extend the tea party metaphor to anything other than the generic idea of rebellion; beyond the basic notion of rebelling against the status quo, I wasn’t trying to draw any other similarities. Plus, hey, a good headline is a good headline.

  9. Oh… and Hadoop isn’t a key-value store. It’s a distributed filesystem combined with batch-processing framework.

    HBase, the database built on top of Hadoop, is modeled after BigTable. It’s a distributed, column-oriented DB where you store and retrieve data by a key. It’s also used by dozens of companies in production.
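The data model described here (rows stored and retrieved by key, with columns grouped into families) can be pictured with a small in-memory sketch. Everything below is an assumption made for illustration: `ToyColumnStore`, its methods, and the sample row keys are hypothetical and are not the HBase client API, and a real store distributes sorted key ranges across many servers rather than keeping one dict.

```python
class ToyColumnStore:
    """Toy model: row key -> {(column family, qualifier): value}."""

    def __init__(self):
        self.rows = {}

    def put(self, row_key, family, qualifier, value):
        # Create the row on first write, then set one cell in it.
        self.rows.setdefault(row_key, {})[(family, qualifier)] = value

    def get(self, row_key, family):
        """Return all cells of one column family for a single row."""
        row = self.rows.get(row_key, {})
        return {q: v for (f, q), v in row.items() if f == family}

    def scan(self, start_key, stop_key):
        """Yield rows whose key falls in [start_key, stop_key), in key order."""
        for row_key in sorted(self.rows):
            if start_key <= row_key < stop_key:
                yield row_key, self.rows[row_key]


store = ToyColumnStore()
store.put("user#0001", "profile", "name", "Ada")
store.put("user#0001", "profile", "age", 21)
store.put("user#0002", "profile", "name", "Grace")
store.put("video#0001", "meta", "title", "Intro")

profile = store.get("user#0001", "profile")
# "$" sorts immediately after "#", so this range covers every "user#..." key.
user_keys = [key for key, _ in store.scan("user#", "user$")]
```

Because lookups and range scans are driven entirely by the row key, choosing that key well does much of the work a query planner would do in an RDBMS.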

    • Bradford,

      On Hadoop, I think you’re right and Wikipedia largely agrees with you: Apache Hadoop is a Java software framework that supports data-intensive distributed applications under a free license. More here.

  10. You left off the .com in the link to your own company on this post. Just thought you might like to know. You can delete my comment after you see this.

  11. Pingback: SQL at 40: Ready for Retirement? : Beyond Search

  12. I recently blogged about the rise of NoSQL in the cloud and the threat to the SQL world, with predictions on how it will play out. http://scaledb.blogspot.com/2010/02/will-nosql-movement-unseat-database.html

  13. Pingback: PLM Platforms: Retirement Or noSQL Knock-Out? « Daily PLM Think Tank Blog

  14. Pingback: NoSQL y el fin de las bases de datos como las conocíamos | Incognitosis

  15. I don’t think the traditional RDBMS will die. How many people need to handle the amount of data that requires discarding Oracle? Just a few. There will be a market for Oracle, MySQL, etc., for a long time.

    • I don’t disagree. Things in IT take a very long time to die. The question is not literally whether something will go away; it’s whether its best days are ahead or behind. As I think I mentioned in some post, the mainframe started “dying” in the mid 1980s (and COBOL along with it). But yes, both are still here today. So it’s not binary. But my belief is that the general-purpose RDBMS is and will be challenged (which for the past 25 or so years it really hasn’t been, save for OLAP servers) and the market will change from “which of the big three do I use” to a much broader set of alternatives.

  16. I don’t think it makes sense for you to say the NoSQL crowd is fighting against closed source.

    There are plenty of open source choices for RDBMS: PostgreSQL, MySQL, Firebird. Yes, they are all free and have huge user bases (especially MySQL).

    The companies you cited as “not relying on RDBMS technology” (Google, Yahoo, Facebook) are all in reality heavy users of MySQL. Google and Facebook even have their own patches for MySQL.

    The use of NoSQL vs. SQL databases simply comes down to this: use the right tool for the right job.

  17. Hi Andy,

    To be clear, I think the NoSQL thing is a movement/protest (which some commenters clearly disagree with), but like all protests people “in the march” are there for different reasons.

    So when I made the list of things people were protesting against, I did not mean to imply that all protestors were protesting for all items on the list.

    I’m sure FB, Google et alia use RDBMSs in some places (hey, anyone who uses salesforce.com is indirectly using Oracle). But for core infrastructure for their web apps they are using a lot of in-house-built stuff that they’ve then open sourced.

    Agree on the right tool for the right job thing, but remember to consider all alternatives and total cost. To me, it’s not just Oracle vs. Hadoop. It’s a bunch of RDBMSs and a bunch of database alternatives.

  18. Dave,

    It’s been a few years (20+) since we closed the bar at the Redmond Marriott after many hours of solving the world’s problems. I’m delighted to see you’ve still got the enthusiasm for this generally cynical industry.

    Keep up the thought provoking dialogue.

  19. Chris,

    Great to hear from you and thanks for reading.

  20. Mark,
    Excellent post. As usual, you have captured all elements of the arguments succinctly. I would like to corroborate your conclusions with one additional point. Would be curious to know your take on that.

    One thing we cannot overlook is the deep penetration of SQL within the business/end user community. However sophisticated the BI tools an organization might have, you often find business users rolling up their sleeves and introspecting data using SQL.
    SQL is declarative and Turing complete and offers a data introspection paradigm that relates to an analyst’s thought process. It may not necessarily be elegant for many classes of problems but at least it bears familiarity.
    In order to drive No-SQL to the same level of adoption within the business user community, we will need a fairly sophisticated set of tools that abstract the nitty-gritty of the language. People have been trying to do that for the last 25 years for procedural languages and have not succeeded. We have simplified the programmatic access (e.g. Ruby on Rails) but it’s still a skill barrier that most end users find too daunting to overcome.

    In my opinion we might end up with a more hybrid approach towards the SQL limitations. Many of the specialized massively parallel data warehousing technologies are beginning to provide hooks and IDEs to write procedural routines, that can be executed in parallel, and invoked from SQL.
    So it’s likely that the next evolution of SQL might be “Mixed-SQL” where one preserves the simplicity of SQL but amplifies its capabilities through the salient features of a No-SQL-like movement.

  21. Pingback: Search Facets » Let’s not let “NoSQL” go the way of “Web 2.0”

  22. Dave, I think you might want to know that Date renamed the book Database in Depth and updated the information in: http://amzn.com/0596523068

    I read both those books back to back, and I can say that while a lot of the information is similar, the new one does go into a lot more detail in several areas.

  23. Pingback: SQL or NoSQL? « TechLedger

  24. The Tea Party is an unfortunate metaphor. Are NoSQL people a movement funded by rich millionaires from Texas who are willing to destroy their own party to win symbolic victories? I don’t think so.

    Besides the unfortunate political metaphor, this is a good article. Thanks for writing it.

    • Hugo,

      If you’re not quite so literal, I think the metaphor still works. While I won’t get into the funding of the Tea Party, or its political beliefs, I do view it as a grass-roots populist movement, which I think NoSQL also is. That said, in a sense NoSQL is funded by billionaire companies — Google built BigTable, the grand-daddy of many of the NoSQL systems, Facebook built Cassandra, LinkedIn built Voldemort … maybe the metaphor is even more apt due to your observation than I thought.

  25. Very interesting post, Dave. To be honest, I too was drawn to NoSQL for many of the reasons you cited, but that was quite a long time ago (CouchDB was still XML-based back then). Though I have to say your characterisation of open source (colored as it may be by your employ) is a bit unfair. Evaluating the open source options doesn’t always come from a knee-jerk screw the man mentality: it’s not always about “free as in speech” or “free as in beer”, when it comes to dealing with vendor licensing sometimes it’s just about “free as in unchained”. Look no further than the kinds of restrictions you put on your licensees to see what I mean — does it really matter how many cores are in the machine? What’s the difference between an “offline” app and an “online” app anyway?! Arbitrary restrictions are arbitrary.

    None the less, as you said, the future isn’t one of these options, it’s all of them, even SQL (in its ever-growing number of forms). The interesting challenge is in finding the primitives that bind all of these stores together and exposing them in a unified interface. This is what we’re doing with Persevere [1] — and in fact, I have a MarkLogic store implementation that lets me use it over the same HTTP interface as an RDBMS, Mongo, Couch — anything that can store data really. If you get the interface right you can still dip down into the more specialized vendor-specific features where needed. And with a little more work I can even generalize this code to work with other XQuery databases (loosening the cuffs, so to speak).

    But as far as actually interfacing with your data? Just like George James in an earlier comment, my money’s also on REST and JSON, even if under the hood my data’s really in XML. In fact, I’d go as far as to say REST and JSON have already won, wouldn’t you?

    [1] http://persvr.org

    • Hi Dean,

      Thanks for your comment. All licensed software is licensed along some dimension: # of computers, # of CPUs, # of cores, # of users, amount of data, amount of data processed, something. Remember that Xerox machines (and I’m guessing this must have been during some patent protection period) were sold with a per-copy fee (as opposed to just buying the machine). So I’d argue that software is either free or it isn’t. And if it’s not free, then it’s priced along some [inherently] arbitrary dimension. I’ve never found it worthwhile to try to justify any particular dimension as more worthy than others. At one time, people joked about East Coast licensing (based on how much money you appeared to have based on the computers you bought) vs. West Coast licensing (based on how many users got to participate in the database experience). In the end, the market ends up picking one of the “arbitrary” dimensions, it becomes standard, and then, I’d argue, it is no longer arbitrary. Because of this process, we try to price like Oracle: it’s a database, we’re a database, and the market has said that databases are priced this way. Yes, that leaves open the possibility of business model disruptors who want to price differently, if at all. But that’s not the business MarkLogic is in; we are a technology disruptor.

      As for REST and JSON, I think REST is winning but I’m not at all sure that JSON has already won — I think that represents the programmer’s perspective, not the data architect’s. But, frankly, I’m not the best person at MarkLogic to debate this particular point with, either.

  26. Pingback: NoSQL Daily – Wed Sep 22 › PHP App Engine

  27. Pingback: Nuno Job’s NoSQL Frankfurt Presentation | Kellblog

  28. How About "Database Renaissance"?

    Any NoSQL proponents who’d like to shoot themselves in the foot should continue to refer to it as the database “Tea Party.” Most of those “tea party” folks think their mousepad is a foot pedal and have long since broken the “cupholders” off their PC towers!

  29. The Tea Party is an unfortunate metaphor. Are NoSQL people a movement funded by rich millionaires from Texas who are willing to destroy their own party to win symbolic victories? I don’t think so.

    Besides the unfortunate political metaphor, this is a good article. Thanks for writing it.

    • Thanks. Many people didn’t like the tea party analogy. I wasn’t thinking Texas billionaires when I picked it; I was thinking grass-roots uprising. Perhaps the political reality is that the Tea Party is a “grass-roots” uprising, funded by billionaires, but this isn’t a politics blog, so please excuse me if I got any part of it wrong. What I meant to imply was grass-roots uprising and nothing more.

  30. Pingback: Best of Kellblog 2010 | Kellblog

  31. Pingback: Max Schireson Appointed President of MongoDB Company, 10gen | Kellblog

  32. Pingback: MarkLogic Needs to Harness the NoSQL Movement « Creating a Web Startup on MarkLogic

  33. Pingback: Bases de Datos: RDBMS vs No-SQL, una R-Evolución « Mis Ideas

  34. The use of the Tea Party is apt. Both the No-SQL movement and the Tea Party avoid the centralization requirements used by their predecessors.

    A better name for No-SQL would have been No-ACID, but that would confuse more people. Current relational database technology is great for most classes of data but for website data defined as read-mainly with loose consistency, then using many cached copies or partitioned data servers makes sense.
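The partitioning idea in this comment can be sketched in a few lines. This is a minimal illustration, assuming a fixed number of partitions and plain in-memory dicts standing in for separate data servers; `partition_for`, `PartitionedStore`, and the sample key are hypothetical names, and real systems also have to handle rebalancing, replication, and failures.

```python
import hashlib


def partition_for(key, num_partitions):
    """Map a key to a partition index using a stable hash of the key."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_partitions


class PartitionedStore:
    """Toy key-value store that spreads keys across several partitions."""

    def __init__(self, num_partitions):
        # Each dict stands in for a separate data server.
        self.partitions = [{} for _ in range(num_partitions)]

    def _partition(self, key):
        return self.partitions[partition_for(key, len(self.partitions))]

    def put(self, key, value):
        self._partition(key)[key] = value

    def get(self, key, default=None):
        # Only the one partition that owns the key is consulted.
        return self._partition(key).get(key, default)


store = PartitionedStore(num_partitions=4)
store.put("user:42", {"name": "Ada"})
found = store.get("user:42")
```

Each read or write touches a single partition, which is why read-mostly data with loose consistency needs spreads so easily across many cheap servers.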

  35. Hi Dave;
    I have worked with COBOL, RDBMSs, and NoSQL. I don’t agree that NoSQL will kill SQL; I think each has its place. (NoSQL reminds me of COBOL: the concept is nearly the same, and its problems are close to COBOL’s problems: data integrity, consistency, portability, etc.) Whether to use NoSQL or SQL depends, in my opinion, on the type of task you want to do and what fits it.
    For example, which of the following looks simpler?
    1-SQL
    SELECT * FROM USERS WHERE AGE = 21

    2-NO SQL
    function (doc)
    {
      // Emit the ids of "users" documents whose age is 21.
      if (doc.objType == "users") {
        if (doc.age == 21) {
          emit(doc._id, null);
        }
      }
    }
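To make the comparison concrete, both styles can be run side by side. This is a sketch under stated assumptions: the sample documents are made up, Python's standard `sqlite3` module stands in for an RDBMS, and `map_users_aged_21` with its `emit` callback only imitates the shape of a document-store map function rather than any product's actual API.

```python
import sqlite3

# Hypothetical sample data, invented for illustration.
docs = [
    {"_id": "u1", "objType": "users", "age": 21},
    {"_id": "u2", "objType": "users", "age": 34},
    {"_id": "o1", "objType": "orders", "age": 21},
]

# 1. SQL: declare the result you want and let the engine find it.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id TEXT PRIMARY KEY, age INTEGER)")
conn.executemany(
    "INSERT INTO users (id, age) VALUES (?, ?)",
    [(d["_id"], d["age"]) for d in docs if d["objType"] == "users"],
)
sql_ids = [row[0] for row in conn.execute("SELECT id FROM users WHERE age = 21")]
conn.close()


# 2. Map style: a function runs against every document and emits matches.
def map_users_aged_21(doc, emit):
    if doc.get("objType") == "users" and doc.get("age") == 21:
        emit(doc["_id"], None)


emitted = []
for doc in docs:
    map_users_aged_21(doc, lambda key, value: emitted.append((key, value)))
map_ids = [key for key, _ in emitted]
```

Both approaches select the same document. The SQL version states what is wanted, while the map version spells out how each document is tested, and it must also check the document type because a document store has no table to scope the query.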

    • And C didn’t kill COBOL. It’s all a matter of perspective. By my definition it did, subject to the first overriding rule of technology: things die very slowly. The mainframe has been dying since the mid-1980s.
