We are in the middle of one of our periodic analyst tours at MarkLogic, where we meet about 50 top software industry analysts focused on areas like enterprise search, enterprise content management, and database management systems. The NoSQL movement is one of four key topics we are covering, and while I’d expected some lively discussions about it, we have spent most of the time educating people about NoSQL.
In this post, I’ll share the six key points we’re making about NoSQL on the tour.
Our first point is that NoSQL systems come in many flavors and it’s not just about key/value stores. These flavors include:
- Key/value stores (e.g., Hadoop)
- Document databases (e.g., MarkLogic, CouchDB)
- Graph databases (e.g., AllegroGraph)
- Distributed caching systems (e.g., Memcached)
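To make the difference between the first three flavors concrete, here is a minimal sketch using nothing but plain Python data structures. It is purely illustrative: the records, the helper names `find_documents` and `neighbors`, and the layouts are my own assumptions, and none of it is the API of any product named above.

```python
# Illustrative only: plain Python structures standing in for three data models.
# Nothing here is the API of any real product.

# Key/value: the store sees an opaque value, looked up by a single key.
kv_store = {"customer:42": b'{"name": "Ada", "city": "London"}'}

# Document: the store understands the structure, so fields can be queried.
documents = [
    {"id": 42, "name": "Ada", "city": "London", "orders": [{"sku": "A1", "qty": 2}]},
    {"id": 43, "name": "Grace", "city": "New York", "orders": []},
]

# Graph: data is modeled as nodes plus labeled edges between them.
edges = [("Ada", "KNOWS", "Grace"), ("Ada", "BOUGHT", "A1")]


def find_documents(docs, field, value):
    """Return every document whose top-level field equals value."""
    return [d for d in docs if d.get(field) == value]


def neighbors(edge_list, node, label):
    """Return the nodes reachable from node over edges with the given label."""
    return [dst for src, lbl, dst in edge_list if src == node and lbl == label]


londoners = find_documents(documents, "city", "London")
friends = neighbors(edges, "Ada", "KNOWS")
```

The key/value entry can only be fetched whole by its key, the documents can be filtered by what is inside them, and the graph is queried by following relationships.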
Our second point is that NoSQL is part of a broader trend in database systems: specialization. The jack-of-all-trades relational database (e.g., Oracle, DB2) works reasonably well for a broad range of applications — but it is a master of none. For any specific application, you can design a specialized DBMS that will outperform Oracle by 10 to 1000 times. Specialization represents, in aggregate, the biggest threat to the big-three DBMS oligopolists. Examples of specialized DBMSs include:
- Streambase, Skyler: real-time stream processing
- MarkLogic: semi-structured data
- Vertica, Greenplum: mid-range data warehousing
- Aster: large-scale (aka “big data”) analytic data warehousing
- VoltDB: high volume transaction processing
- MATLAB: scientific data management
Our third point is that NoSQL is largely orthogonal to specialization. There are specialized NoSQL databases (e.g., MarkLogic) and there are specialized SQL databases (e.g., Aster, VoltDB). The only category where I think there are zero examples is general-purpose NoSQL systems. While I’m sure many of the NoSQL crowd would argue that their systems can do everything, is anyone *really* going to run general ledger or opportunity management on Hadoop? I don’t think so.
Our fourth point is that NoSQL isn’t about open source. The software-wants-to-be-free crowd wants to build open source into the definition of NoSQL and I believe that is both incorrect and a mistake. It’s incorrect because systems like MarkLogic (which uses an XML data model and XQuery) are indisputably NoSQL without being open source. And it’s a mistake because technology movements should be about technology, not business models. (The open source NoSQL gang can solve its problem simply by affiliating with both the NoSQL technology movement and the open source business model movement.)
Our fifth point, which I make as CEO of a company that’s invested a lot of energy in supporting standards, is that, rather ironically, most open source NoSQL systems have proprietary interfaces. People shouldn’t confuse “can access the source code” with “can write applications that call standard interfaces” and can therefore swap components easily. If you take offense at the word proprietary, that’s fine. You can call them unique instead. But the point is that an application written on Cassandra cannot practically be moved to Couch, regardless of whether you can access the source code of both Couch and Cassandra.
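Here is a small sketch of what that lack of portability looks like in practice. The two classes and their method names are hypothetical, invented for this illustration. They are not the real Cassandra or CouchDB client APIs, just stand-ins for two stores whose interfaces differ.

```python
# Hypothetical interfaces, invented for illustration. These are NOT the real
# Cassandra or CouchDB client APIs.

class ColumnStoreA:
    """Stand-in for a store with a row/column style interface."""

    def __init__(self):
        self._rows = {}

    def insert(self, row_key, column, value):
        self._rows.setdefault(row_key, {})[column] = value

    def get_column(self, row_key, column):
        return self._rows[row_key][column]


class DocumentStoreB:
    """Stand-in for a store with a whole-document style interface."""

    def __init__(self):
        self._docs = {}

    def put_document(self, doc_id, document):
        self._docs[doc_id] = dict(document)

    def fetch_document(self, doc_id):
        return self._docs[doc_id]


def save_user_city(store, user_id, city):
    # Application code written against ColumnStoreA's interface.
    store.insert(user_id, "city", city)


a = ColumnStoreA()
save_user_city(a, "u1", "London")  # works on the store it was written for

b = DocumentStoreB()
try:
    save_user_city(b, "u1", "London")  # same application code, different store
    ported = True
except AttributeError:
    ported = False  # DocumentStoreB has no insert() method
```

The source for both classes is sitting right there on the page, and the application still cannot move from one to the other without being rewritten.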
Our sixth point is that we think MarkLogic provides a best-of-both-worlds option between open source NoSQL systems and traditional DBMSs. Like open source NoSQL systems, MarkLogic provides shared-nothing clustering on inexpensive hardware, superior support for unstructured data, document-orientation, and high performance. But like traditional databases, MarkLogic speaks a high-level query language, implements industry standards, and is commercial-grade, supported software. This means that customers can scale applications on inexpensive computers and storage, avoid the pains of normalization and joins, have systems that run fast and can be implemented by normal database programmers, and feel safe that their applications are built on a standard query language (XQuery) that is supported by scores of vendors.