Category Archives: Stonebraker

My Thoughts on the NoSQL Database "Tea Party" Post

Without a doubt, the most controversial post I’ve written on this blog was last month’s The Database Tea Party: The NoSQL Movement.  I know this both from the comment stream, but also from the volume and tenor of emails I received from friends and colleagues over the past few weeks.

The first thing I learned is that standing in the middle is a great way to get attacked from all sides.

  • My database buddies blasted me, treating me like a reckless turncoat:   how dare you endorse these BASE people?  (Humorous double entendre intended.)
  • The NoSQL folks blasted me, generally misunderstanding the protest march metaphor I was using (see below).
  • The above-it-all crowd blasted me for oversimplifying the issue, suggesting that I was endorsing simplistic views such as it’s about batch vs. online or it’s about scaling vs. not scaling.

All of which, of course, only confirmed the religious nature of the movement and that was indeed a movement, regardless of whether any given participant identified himself as such.

(Several folks also blasted me for using the Tea Party Movement as a metaphor.  Let me clarify that this is not a political blog so I won’t debate politics.  I intended the metaphor to cover only concepts like “rebellion” and “grass roots,” both of which I do believe apply to NoSQL.)

First, I’ll clarify the protest march metaphor which was easily the most misunderstood aspect of the post.   Let me share my first rule of protests:  not everyone is there for the same reason.  Some people are there for the stated cause, which I’ll call cause A.  But others participate on behalf of a group; their signs will say “Group 1’s for Cause A.”  Others are there for cause B which lacks enough support to generate its own march, so they tag along with signs saying “Cause A and Cause B.”  If you’ve ever been at a rally, you’ve invariably winced when speakers attempted to hijack the agenda, turning to some personal cause, all while pretending to speak on behalf of the group.

It was this sense of chaos and disorder that I was trying to portray.  When I made the list of reasons why I thought people were on the NoSQL march, it was neither to say that I agreed with them or that all people were there for all reasons.  I was doing the equivalent of asking protesters on the  UC Berkeley Anti-Grenada March why they were there.  To which the replies might have been:

  • To protest the Grenada invasion (cause A)
  • To remind people about [insert group here] rights, all while protesting against Grenada
  • To protest about UC budget cuts, all while protesting about something the Government did, which I’m pretty sure was bad.
  • Because I go to every march; what’s this one about?
  • Because I was at Top Dog when it went by and had nothing better to do

So hopefully, the intent of my NoSQL-reasons-list is now clear.   Some folks are there because they don’t want ACID transactions.  Some folks are there because they are dealing with Internet scale.  Some folks are there because they hate the SQL impedance mismatch.  Some folks are there because they’re tired of paying oligopoly prices to Oracle.   (I particularly liked the comment that said Oracle was free because they had an enterprise license that most certainly wasn’t, ignoring the possibility that recent enterprise/agency directives to look at open source could result directly from the size of the last Oracle check.)

And yes, some folks are there because it’s cool and they want to be like Twitter, Google, and Facebook, but getting them to admit that is a virtual impossibility.

One irony is that I actually agree with one of the fiercest commenters:

A very interesting write-up with one little oversight: you’re wrong.

I am part of a large program to write a NoSQL database for military applications. [It’s not about …] It’s [about]  the fact that RDBMSs are built in a different space in the CAP trades.

Google, Amazon, Facebook, and DARPA all recognized that when you scale systems large enough, you can never put enough iron in one place to get the job done (and you wouldn’t want to, to prevent a single point of failure). Once you accept that you have a distributed system, you need to give up consistency or availability, which the fundamental transactionality of traditional RDBMSs cannot abide. Based on the realization that something fundamentally different needed to be built, a lot of very smart people tackled the problem in a variety of different ways, making different trades along the way. […]

So – the NoSQL databases are a pragmatic response to growing scale of databases and the falling prices of commodity hardware. It’s not a noble counterculture movement (although it does attract the sort that have a great deal of mental flexibility), it’s just a way to get business done cheaper.

To respond to the commenter:

  • Thank you for the clear definition of why you moved to NoSQL.
  • Your comment was picked up by the Otaku blog in an post called NoSQL Explained Correctly (Finally), so congrats and I’m glad I could help facilitate “the conversation.”
  • Disorder and chaos is what I was trying to portray in the protest march metaphor, not hippies or nobility
  • You are clearly using NoSQL for two of the reasons on my list:  scalability and bloatware (i.e., perhaps not the best word choice, but the idea was undesired, included functionality — e.g., ACID transactions)
  • You did exactly what I said to do:  consider all alternatives and do what’s right for your business
  • So, why are we disagreeing again?

I think some people didn’t like my putting “coolness” on the table as a factor and the notion of a “movement.”  I believe those are both very real and ironically those who disagreed with me loudest were effectively screaming:  it’s not a movement and I’m not doing it to be cool; I’m doing it because it’s right for my business.  If so, great.  But why does it hit such a nerve?

In the end, when it comes to NoSQL I am trying to:

  • Provide an overview of why I think people are considering and/or using NoSQL solutions
  • Provide good background references and readings (see bottom of my first post)
  • Remind mangers to keep an eye out for the “bad reasons” to go NoSQL — i.e,. coolness and Google wannabeism
  • Remind people not to confuse NoSQL with NoDatabase.  Special-purpose databases (e.g., MarkLogic) are optimized for specific applications (e.g., semi-structured data) and handle them far better than a general-purpose RDBMS.  So in your haste to move off Oracle, don’t advance directly to an open source key-value store; there might be alternative DBMSs that meet your needs more effectively.
  • Remind people not to confuse NoSQL with NoCommercialSoftware.  While people seem to dislike when I say it, the RDBMS market is an oligopoly and the big vendors’ pricing, margins, and heavy-handed customer relationships are all consistent with that market structure.  But you can find other classes of commercial software where the vendors are hungrier and more customer centric.

Thoughts on Category Creation and Information Access Platforms [Revised]

[Revised 8/2/08; still working on cleaning up this consciousness stream.]

Back in the old days, it seemed easy to create a category in software. Look at the database market, for example:

  • IBM invents the relational DBMS (RDBMS) category
  • Oracle, Ingres, and Informix enter in a largely undifferentiated way, though Informix eventually drifts towards the low-end/cheap segment
  • Sybase creates the derivative category of high-performance OLTP RDBMS.
  • Arbor re-christens the failed multi-dimensional DBMS as the OLAP Server
  • Tandem creates the non-stop RDBMS with its superb fault tolerance
  • Illustra launches the universal DBMS and is quickly acquired by Informix
  • Sybase launches the bitmap-indexed DBMS with SybaseIQ
  • Teradata launches the data-warehouse DBMS category

And you can find just as many examples outside database-land.

  • ASK defines the manufacturing resource planning (MRP) category
  • SAP hijacks MRP, redefines it as ERP, and goes on to become the world’s largest applications software company
  • PeopleSoft invents the HRMS category
  • Gartner Group’s Howard Dresner invents the business intelligence (BI) category, re-christening and re-framing what was formally known as DSS or EIS.
  • Siebel pioneers the sales force automation (SFA) category
  • Scopus pioneers call center automation (CCA)
  • Companies like Rubric pioneer enterprise marketing automation (EMA)
  • Siebel, through acquisition, coalesces SFA, CCA, and EMA into a single category called customer relationship management (CRM)
  • Oracle and SAP work to coalesce CRM back into ERP. Such is the ebb and flow of categories.

(And I could go on and on — BPM, KM, CMS, WCM, ECM, LMS, DRM, SCM, PLM, ETL, DI, EII — but I think I’ll stop here with the initials list.)

People are still creating categories today, and sometimes it looks easy. Uber-categories have been quite popular in the past decade as people have focused on different ways of developing and delivering software:

  • SaaS as an uber-category has worked well, with a variety offerings in various SaaS sub-categories (e.g., Salesforce, NetSuite)
  • Appliances have done pretty much the same thing — i.e., offering an appliance alternative for a wide variety of existing categories (e.g., a data warehouse appliance a la Netezza)
  • Open source has also done the same thing — again serving as a different flavor/dimension for a wide variety of largely existing software categories.

Only a few genuinely new categories have emerged, virtualization being the most obvious example. (Though you could argue that virtualization is itself an uber-category covering storage virtualization, server virtualization, et cetera.)

Companies are still working to carve new categories, particularly in the database market:

Sometimes vendors and/or the analysts who cover them try to impose either a straight name change (e.g., from MD-DBMS to OLAP) or a strategic shift (e.g., from BI to analytic applications) in category. Sometimes they’re just bored. Sometimes a vendor’s trying to redefine the market in line with its strengths. Sometimes an analyst is trying to make his/her mark on the industry and earn the coveted “father/mother of [category name],” much as Howard Dresner successfully did with BI.

BI got bored with its name several times during my tenure at Business Objects. At one point both the analysts and Informatica were trying to re-dub the category “analytic applications” in an attempt to get a fresh name and raise the abstraction level from tools to applications. Informatica nearly died on that hill.

Later, analysts tried to redefine the category, dubbing it corporate performance management (CPM), and arguing that business intelligence needed to link with financial planning systems. While knowing actuals is good, knowing actuals compared to the plan is better, and using actuals to drive the future plan better still. Cognos nearly tripped over itself repositioning around the CPM, ultimately acquiring Adaytum, which in turn lead to SRC’s eventual acquisition by Business Objects.

In an art-imitates-life sort of way, one wonders if the analysts predicted a move in the market or provoked it? My chips are on the latter.

This stream-of-consciousness is a long way of winding up to a single question: are enterprise search vendors successfully repositioning themselves as “information access platforms” or not?

Background: the enterprise-search-related vendors (e.g., Fast/Microsoft, Endeca) and search/content analysts who cover them are in the midst of an attempted category repositioning:

  • The word “enterprise search” is now seemingly dead, having been contaminated by the Google Appliance. When a shark gets in the water, all the fish jump out.
  • The word “information” is increasingly being used as a unifying term to describe both data and content (aka, unstructured data)
  • Enterprise search vendors are increasingly calling themselves “information access platforms” (though not generally abbreviated as IAP, I will do so here for brevity).

For example, consider Endeca’s corporate boilerplate:

Endeca’s innovative information access software that helps people explore, analyze, and understand complex information, guiding them to unexpected insights and better dec
isions. The Endeca Information Access Platform, built around a new class of access-optimized database, powers applications that combine the ease of searching and browsing with the analytical power of business intelligence.

I have a number of concerns on and related to this attempted shift:

  • The important thing about categories is that they exist in the mind of the customer. Analysts and vendors can try to put them there — but they have to stick. In my mind, IAP is not sticking. I have never heard a customer say: “I need to go out and get an IAP.”
  • I do, however, believe that “information” might well stick as an overall term, meaning both data and content (aka, structured and unstructured data).
  • It is not clear to me why someone who desires a unified platform for “information” would turn to a search vendor. Search engines were designed as read-only indexes to help people find documents containing tokens; hardly ideal as an application development platform.
  • In my estimation, someone managing “special” data should turn to a database vendor. While databases have classically not handled “special” data well, databases were designed as application platforms, and there is a whole new class of specialized databases emerging for handling various “special” types of data.
  • While I think a unified platform is a dandy vision, I think no one is close to delivering a unified platform that handles all types of data equally well. Bolting Lucene and MySQL together isn’t a platform. Relational databases still do a poor job with both content and many types of data (e.g., sparse, hierarchical, or semi-structured). XML servers (like MarkLogic) handle XML brilliantly, but need work before they can match RDBMSs at classical relational data.
  • I believe that someone who needs a crawl-and-index the intranet value proposition should use the Google Appliance; so I think the search vendors are correct in their desire to flee, I don’t think that “information access platform” is a good refuge.

Overall, my chips remain on the don’t come line for the attempted category repositioning from “enterprise search” to “information access platform.” You can find my stack on the come line for the emerging “special-purpose database” category and “XML servers” as an instance of them.

The End of an Architectural Era

I picked up via this post on the High Scalability Blog a new paper by Michael Stonebraker, Nabil Hachem, and Pat Helland entitled The End of an Era (It’s Time for a Complete Rewrite) presented at the VLDB 2007 conference in Austria on September 23rd through 27th.

From the paper’s summary:

“In the last quarter of a century, there has been a dramatic shift in:

1. DBMS markets: from business data processing to a collection of markets, with varying requirements
2. Necessary features: new requirements include shared nothing support and high availability
3. Technology: large main memories, the possibility of hot standbys, and the web change most everything

The result is:

1. The predicted demise of “one size fits all”
2. The inappropriateness of current relational implementations for any segment of the market
3. The necessity of rethinking both data models and query languages for the specialized engines, which we expect to be dominant in the various vertical markets”

As you know, I’m a big believer in the special-purpose DBMS meme. Any database historian knows what Codd was thinking, and more importantly — what he wasn’t — when he designed the relational model. Again, from the paper:

“Ted Codd’s idea of normalizing data into flat tables has served our community well over the subsequent 30 years. However, there are now other markets, whose needs must be considered. These include data warehouses, web-oriented search, real-time analytics, and semi-structured data markets.”

The complete paper can be found here.