The other day I noticed a taxonomy used on one of the NoSQL Database blogs that went like this:
Types of NoSQL systems
- Core NoSQL Systems
  - Wide column stores
  - Document stores
  - Key-value / tuple stores
  - Eventually consistent key-value stores
  - Graph databases
- Soft NoSQL Systems (not the original intention …)
  - Object databases
  - Grid database solutions
  - XML databases
  - Other NoSQL-related databases
I, perhaps obviously, take some umbrage at having MarkLogic (acceptably classified as an XML database) being declared “soft NoSQL.” In this post I’ll explain why.
Who decided that being open source was a requirement to be a real NoSQL system? More importantly, who gets to decide? NoSQL – like the Tea Party – is a grass-roots, effectively leaderless movement towards relational database alternatives. Anyone arguing the original intent of the founders is misguided because there is no small group of clearly identified founders to ask. In reality, all you can correctly argue is what you think was the intent of the initial NoSQL developers and early adopters, or (perhaps more customarily) why you were drawn to them yourself, disguised or confused as original founder intent.
As mentioned here, movements often appear homogeneous when they are indeed heterogeneous. What looks like a long line of demonstrators protesting a single cause is in fact a rugby scrum of different groups pushing in only generally aligned directions. For example, for each of the following potential motivations, I am certain that I can find some set of NoSQL advocates that are motivated by it:
- Anger at Oracle’s heavy-handed licensing policies
- The need to store unstructured or semi-structured data that doesn’t fit well into relations
- The impedance mismatch with relational databases
- A need and/or desire to use open source
- An attempt to reduce total cost
- A desire to land at a different point in the Brewer CAP Theorem triangle of consistency, availability, and partition tolerance
- Coolness / wannabe-ism, as in, I want to be like Google or Facebook
(Since this was a source of confusion in prior posts, note that this is not to claim the inverse: that all NoSQL advocates are motivated by all of the possible motivations.)
I’d like to advocate a simple idea: that NoSQL means NoSQL. That a NoSQL system is defined as:
A structured storage system that is not based on relational database technology and does not use SQL as its primary query language
In short, my proposed definition means that NoSQL (broadly) = NoSQL (literally) + NoRelational. In other words: relational database alternatives. It does not mean:
- NoDBMS. We should not take NoSQL to exclude systems we would traditionally define as DBMSs. For example, supporting ACID transactions or supporting a non-SQL query language (e.g., XQuery) should not be exclusion criteria for NoSQL.
- NoCommercialSoftware. While many of the flagship NoSQL projects (e.g., Hadoop, CouchDB) are open source projects, that should not be a defining criterion. NoSQL should be a technological, not a delivery- or business-model, classification. Technology and delivery model are orthogonal dimensions. We should be able to speak of traditionally licensed, open source licensed, and cloud-hosted NoSQL systems if for no other reason than that understanding the nuances of the various business/delivery models is a major task unto itself. Do you mean open source or open core? Is it open source or faux-pen source? Under which open source license? How should I think of a hosted subscription service that is based on, or a derivative of, an open source project?
Recently, I’ve heard a piece of backpedaling that I’ve found rather irritating: that NoSQL was never intended to mean “no SQL,” it was actually intended to mean “not only SQL.” Frankly, this strikes me as hogwash: uh oh, I’m afraid that people are seeing us as disruptors and it’s probably easier to penetrate the enterprise as complementary, not competitive, so let’s turn what was a direct assault into a flanking attack.
To me, it’s simple: NoSQL means NoSQL. No SQL query language and no relational database management system. Yes, it’s disruptive and — by some measures — “crazy talk” but no, we shouldn’t hide because there are lots of perfectly valid (and now socially acceptable) reasons to want to differ from the relational status quo.
In effect, my definition of NoSQL is relational database alternative. Such options include both alternative databases (e.g., MarkLogic) and database alternatives (e.g., key/value stores). This, of course, then cuts at your definition of database management system where I (for now at least) still require the support of a query language and the option to have ACID transactions.
By the way, I understand the desire to exclude various bandwagon-jumpers from the NoSQL cause. Like most, I have no interest in including thrice-reborn object databases in the discussion, but if the cost of excluding them is excluding systems like MarkLogic then I think that cost is too high. Many people contemplating the top-of-mind NoSQL systems (e.g., Hadoop) could be better served using MarkLogic which addresses many typical NoSQL concerns, including:
- Vast scale
- High performance
- Highly parallel shared-nothing clusters
- Support for unstructured and semi-structured data
All with all the pros (and cons) of being a commercial software package and without requiring reduced consistency: losing a few Tweets won’t kill Twitter, but losing a few articles, records, or individuals might well kill a patient, bank, or counter-terrorism agency. BASE is fine for some; many others still need ACID. Michael Stonebraker has some further points on this idea in this CACM post.
I’d like to suggest that we should combine the ideas in this post with the ideas in my prior one, Classifying Database Management Systems. That post says the correct way to classify DBMSs is by their native modeling element (e.g., table, class, hypercube). This post says that NoSQL is semi-orthogonal – i.e., I can imagine a table-oriented database that doesn’t use SQL as its query language, but I doubt that any exist. Applying my various rules, the combined posts say that:
- Aster is a SQL database optimized for analytics on big data
- MarkLogic is an XML [document] database optimized for large quantities of semi-structured information and a NoSQL system
- CouchDB is a document database and a NoSQL system
- Redis is a key/value store and a NoSQL system
- VoltDB is a SQL database optimized to solve one of the two core problems that NoSQL systems are built for (i.e., high-volume simple processing)
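The two rules combined above (classify by native modeling element, then test for NoSQL separately) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `System` record, its field names, and the `is_nosql` helper are hypothetical, and the values entered for each product (for example, treating Aster and VoltDB as relational, or describing CouchDB's and Redis's query interfaces) are simplifications rather than authoritative product facts.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class System:
    """Hypothetical record used only for this illustration."""
    name: str
    modeling_element: str        # native modeling element, e.g. "table", "document"
    relational: bool             # assumed: built on relational database technology?
    primary_query_language: str  # assumed label for the main query interface


def is_nosql(system: System) -> bool:
    """NoSQL (broadly) = NoSQL (literally) + NoRelational."""
    uses_sql = system.primary_query_language.strip().upper() == "SQL"
    return not system.relational and not uses_sql


# Entries mirror the list above; the field values are assumptions.
SYSTEMS = [
    System("Aster", "table", True, "SQL"),
    System("MarkLogic", "XML document", False, "XQuery"),
    System("CouchDB", "document", False, "MapReduce views"),
    System("Redis", "key/value pair", False, "commands"),
    System("VoltDB", "table", True, "SQL"),
]


def describe(system: System) -> str:
    """Combine both dimensions into one line of classification."""
    label = "NoSQL system" if is_nosql(system) else "SQL database"
    return f"{system.name}: models data as {system.modeling_element}; {label}"


if __name__ == "__main__":
    for s in SYSTEMS:
        print(describe(s))
```

Note that the modeling element and the NoSQL test are kept as separate fields, which matches the claim that the two dimensions are semi-orthogonal: nothing in the sketch prevents a table-oriented system with a non-SQL query language, even if no such product comes to mind.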
Finally, I’d conclude that even with these rules I have trouble classifying MarkLogic because of multiple inheritance: MarkLogic is both a document database and an XML database, it is difficult to pick one over the other, and there certainly are non-document-oriented XML database systems. Similar issues exist with classifying the various hybrids of document databases and key/value stores. So while I may have more work to do on building an overall taxonomy, I am absolutely sure about one thing: MarkLogic is a NoSQL system.
—
* The “Yes, Virginia” phrase comes from an 1897 story in the New York Sun. For more, see here.
To me it’s simple – XML is a means, not a purpose.
Its purpose might be to describe documents. Talking about XML databases makes little sense to me; from a technical standpoint, the important part is that the underlying data model allows you to store tree-like structures.
From a business perspective what’s important is the fact that you can store your business objects in a model that is suited to fast retrieval and insertion of those objects.
So drop the XML label, drop the JSON label. They can both describe the same (on a side discussion I would say JSON is better for describing objects and XML for annotations and documents). But what really matters is what can be achieved with these models that cannot with a simple table. The shift from data to information.
My 2 cents – by the way great article!
Dear Dave,
thanks for your posting and your ideas. I really do appreciate your opinion. Nevertheless, I will allow myself to write down the rationale behind my decision, for the world and for a fruitful discussion.
(I included your blog in the NoSQL archive)
You addressed some points from my NoSQL-database.org website:
> Who decided that being open source was a requirement to be real NoSQL system?
No one decided this. I didn’t write this. What I wrote was: “…mostly address some of the points”. So it’s clear that I mean a good mix of all points makes it NoSQL.
Then I distinguish between “core” and “soft” NoSQL at the nosql archive. First of all, I was very glad that other non relational databases joined the NoSQL movement. And I was happy to include them on the website. I received several hundred emails and it was great to see how people loved to see the growing space. And get the freedom of choice back. Even people from the surroundings of Oracle and IBM emailed me to join the boat. And no system was left excluded.
I have followed the NoSQL movement since early 2009, after some talks with Jan Lehnardt (CouchDB), who worked not far from my home. Perhaps this is why I personally feel that the readers of my website should see that the inventors of the “NoSQL” word had a very specific area of databases in mind. If you recall the history of the new term NoSQL, it was “invented” by Johan Oskarsson (I know that the ‘true’ history goes back to the 1970s…) and first blogged by Eric Evans. They set up a conference for a specific group of databases (NoSQL) designed mostly to back specific websites. So it was not about being “cool and they want to be like Twitter, Google, and Facebook” as you mentioned. It was a conference about real databases like HBase, CouchDB, Cassandra etc. and for websites with real needs and people interested in this.
And the readers of nosql-database.org should see that there are concrete ‘core’ databases these people had in mind. And of course that there are other great non-relational databases which support NoSQL too (my soft part).
Perhaps the word “intention” or “soft” I have chosen in this sentence “soft nosql: not the original intention of ‘NoSQL’ …” is not perfect. Do you have a better proposal? I promise I will think seriously about it.
And the sentence I wrote above continues (you left this out in your citation): “…worth a look for great non relational solutions”. From this you can easily see that I highly appreciate other great databases, such as XML databases, object databases, or others, in the broad sense of NoSQL.
Best Regards and good luck to MarkLogic
Stefan Edlich
Just saw your post today. Only yesterday someone asked me about NoSQL databases, and I told them that XML databases are the original NoSQL databases. They didn’t like my answer (it wasn’t the one they were after), but I agree with you all the way that for many situations, XML databases are a great NoSQL option.
Cheers, Tony.
Your interpretation of NoSQL falls into the binary camp where the “No” means “No”. I prefer the alternate view of “not only” SQL rather than the “F* relational” view.
I find that many complaints about relational theory are there because of a lack of understanding of the concepts and of SQL, and exposure to poorly constructed and poorly performing databases.
The other complaints related to hierarchical constructs, networks, objects, partitioned high scale operation, etc. are all valid. Since I focus more on the analysis side, I prefer “not only” where many of my peers working on web applications are doing form/transaction style work and “no” is a better fit.
The problem with classifying all these information storage technologies and models is similar to the problem early libraries had (and still have) regarding how to organize and structure their collections. In the end, I prefer Cutter’s solution over Dewey’s because an attribute model is more flexible than a fixed taxonomy. You’re applying the fixed taxonomy model to the databases. I think an attribute model might be better suited because then you can map it to the type of use or specific requirements needed by different people*
The problem is that attribute-based organization doesn’t lend itself to simple lists or tables of “is-a this”, e.g. MarkLogic is a document store and XML database.
*Best argument for not having mutual exclusion in product classification is the floor wax – dessert topping debate.
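The attribute model this comment describes can be pictured with a small Python sketch. It is an illustration only: the `ATTRIBUTES` table and the `systems_with` helper are hypothetical names, and the attribute labels are limited to what the post itself says about each product.

```python
# Each system carries a set of attributes instead of one slot in a fixed taxonomy.
# Hypothetical data: labels are taken from the post's own classifications.
ATTRIBUTES = {
    "Aster": {"SQL database", "analytics"},
    "CouchDB": {"document store", "NoSQL"},
    "MarkLogic": {"document store", "XML database", "NoSQL"},
    "Redis": {"key/value store", "NoSQL"},
    "VoltDB": {"SQL database"},
}


def systems_with(required: set) -> list:
    """Return the names of systems that have every required attribute."""
    return sorted(
        name for name, attrs in ATTRIBUTES.items() if required <= attrs
    )


if __name__ == "__main__":
    # A system can appear under several attributes at once.
    print(systems_with({"document store"}))
    print(systems_with({"document store", "XML database"}))
```

Because membership is a subset test rather than a single "is-a" slot, MarkLogic shows up both as a document store and as an XML database, which is exactly the case a fixed taxonomy struggles with.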
Via De Morgan’s Law, the inverse of the set of bullets is:
If you are not a NoSQL supporter, then you are interested in none of the bulleted items.
-Todd