Open Source Archives

Yes, Virginia, MarkLogic is a NoSQL System

Posted on April 11, 2010 | 12 comments

The other day I noticed a taxonomy used on one of the NoSQL Database blogs that went like this:

Types of NoSQL systems

Core NoSQL Systems
  Wide column stores
  Document stores
  Key-value / tuple stores
  Eventually consistent key-value stores
  Graph databases

Soft NoSQL Systems (not the original intention …)
  Object databases
  Grid database solutions
  XML databases
  Other NoSQL-related databases

I, perhaps obviously, take some umbrage at seeing MarkLogic (acceptably classified as an XML database) declared “soft NoSQL.” In this post I’ll explain why.

Who decided that being open source was a requirement to be a real NoSQL system? More importantly, who gets to decide? NoSQL, like the Tea Party, is a grass-roots, effectively leaderless movement towards relational database alternatives. Anyone arguing the original intent of the founders is misguided because there is no small group of clearly identified founders to ask. In reality, all you can correctly argue is what you think was the intent of the initial NoSQL developers and early adopters, or (perhaps more customarily) why you were drawn to them yourself, disguised or confused as original founder intent.

As mentioned here, movements often appear homogeneous when they are in fact heterogeneous. What looks like a long line of demonstrators protesting a single cause is in fact a rugby scrum of different groups pushing in only generally aligned directions. For example, for each of the following potential motivations, I am certain that I can find some set of NoSQL advocates who are motivated by it:

Anger at Oracle’s heavy-handed licensing policies
The need to store unstructured or semi-structured data that doesn’t fit well into relations
The impedance mismatch with relational databases
A need and/or desire to use open source
An attempt to reduce total cost
A desire to land at a different point in the Brewer CAP Theorem triangle of consistency, availability, and partition tolerance
Coolness / wannabe-ism, as in, I want to be like Google or Facebook

(Since this was a source of confusion in prior posts, note that this is not to claim the inverse: that all NoSQL advocates are motivated by all of the possible motivations.)

I’d like to advocate a simple idea: that NoSQL means NoSQL, and that a NoSQL system is defined as:

A structured storage system that is not based on relational database technology and does not use SQL as its primary query language

In short, my proposed definition means that NoSQL (broadly) = NoSQL (literally) + NoRelational. Put differently: relational database alternatives. It does not mean:

NoDBMS. We should not take NoSQL to exclude systems we would traditionally define as DBMSs. For example, supporting ACID transactions or supporting a non-SQL query language (e.g., XQuery) should not be exclusion criteria for NoSQL.

NoCommercialSoftware. While many of the flagship NoSQL projects (e.g., Hadoop, CouchDB) are open source projects, that should not be a defining criterion. NoSQL should be a technological, not a delivery- or business-model, classification. Technology and delivery model are orthogonal dimensions. We should be able to speak of traditionally licensed, open source licensed, and cloud-hosted NoSQL systems if for no other reason than that understanding the nuances of the various business/delivery models is a major task unto itself. Do you mean open source or open core? Is it open source or faux-pen source? Under which open source license? How should I think of a hosted subscription service that is based on or a derivative of an open source project?

Recently, I’ve heard a piece of backpedaling that I’ve found rather irritating: that NoSQL was never intended to mean “no SQL,” it was actually intended to mean “not only SQL.” Frankly, this strikes me as hogwash: uh oh, I’m afraid that people are seeing us as disruptors and it’s probably easier to penetrate the enterprise as complementary, not competitive, so let’s turn what was a direct assault into a flanking attack.

To me, it’s simple: NoSQL means NoSQL. No SQL query language and no relational database management system. Yes, it’s disruptive and — by some measures — “crazy talk” but no, we shouldn’t hide because there are lots of perfectly valid (and now socially acceptable) reasons to want to differ from the relational status quo.

In effect, my definition of NoSQL is relational database alternative. Such options include both alternative databases (e.g., MarkLogic) and database alternatives (e.g., key/value stores). This, of course, then cuts at your definition of database management system where I (for now at least) still require the support of a query language and the option to have ACID transactions.

By the way, I understand the desire to exclude various bandwagon-jumpers from the NoSQL cause. Like most, I have no interest in including thrice-reborn object databases in the discussion, but if the cost of excluding them is excluding systems like MarkLogic then I think that cost is too high. Many people contemplating the top-of-mind NoSQL systems (e.g., Hadoop) could be better served using MarkLogic, which addresses many typical NoSQL concerns, including:

Vast scale
High performance
Highly parallel shared-nothing clusters
Support for unstructured and semi-structured data

All of this comes with the pros (and cons) of being a commercial software package and without requiring reduced consistency: losing a few Tweets won’t kill Twitter, but losing a few articles, records, or individuals might well kill a patient, bank, or counter-terrorism agency. BASE is fine for some; many others still need ACID. Michael Stonebraker has some further points on this idea in this CACM post.

I’d like to suggest that we should combine the ideas in this post with the ideas in my prior one, Classifying Database Management Systems. That post says the correct way to classify DBMSs is by their native modeling element (e.g., table, class, hypercube). This post says that NoSQL is semi-orthogonal – i.e., I can imagine a table-oriented database that doesn’t use SQL as its query language, but I doubt that any exist. Applying my various rules, the combined posts say that:

Aster is a SQL database optimized for analytics on big data
MarkLogic is an XML [document] database optimized for large quantities of semi-structured information and a NoSQL system
CouchDB is a document database and a NoSQL system
Redis is a key/value store and a NoSQL system
VoltDB is a SQL database optimized to solve one of the two core problems that NoSQL systems are built for (i.e., high-volume simple processing)
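
The combined rules can be sketched in a few lines of Python. This is only an illustration, not a definitive taxonomy: the `System` record, its field names, and the `is_nosql` helper are hypothetical names introduced here, and the query-language labels for CouchDB and Redis are simplified assumptions rather than anything stated in the post. The check itself restates the proposed definition: not based on relational technology, and SQL is not the primary query language.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class System:
    """Hypothetical record: one row per system in the list above."""
    name: str
    modeling_element: str        # native modeling element, e.g. "table"
    relational: bool             # built on relational database technology?
    primary_query_language: str  # simplified label, an assumption for some rows


def is_nosql(system: System) -> bool:
    """Proposed definition: not relational and SQL is not the primary language."""
    uses_sql = system.primary_query_language.upper() == "SQL"
    return not system.relational and not uses_sql


SYSTEMS = [
    System("Aster", "table", True, "SQL"),
    System("MarkLogic", "XML document", False, "XQuery"),
    System("CouchDB", "document", False, "MapReduce views"),
    System("Redis", "key/value", False, "commands"),
    System("VoltDB", "table", True, "SQL"),
]

for s in SYSTEMS:
    label = "NoSQL" if is_nosql(s) else "SQL"
    print(f"{s.name}: {s.modeling_element} store, {label}")
```

Keeping the modeling element and the NoSQL test as separate fields mirrors the point that the two classifications are semi-orthogonal.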

Finally, I’d conclude that even with these rules I have trouble classifying MarkLogic because of multiple inheritance: MarkLogic is both a document database and an XML database, it is difficult to pick one over the other, and there certainly are non-document-oriented XML database systems. Similar issues exist with classifying the various hybrids of document databases and key/value stores. So while I may have more work to do on building an overall taxonomy, I am absolutely sure about one thing: MarkLogic is a NoSQL system.

—
* The “Yes, Virginia” phrase comes from an 1897 story in the New York Sun. For more, see here.

Posted in Open Source, Uncategorized

Dear CIO: Stop Writing Big Checks for Commodity (Database) Software

Posted on October 14, 2009 | 3 comments

Dear CIO,

What’s wrong with this picture?

At 50%+, Oracle’s operating margins have never been higher.

The differentiation of Oracle’s database technology, however, has never been lower and the number of both core and specialized alternatives has never been greater.

So what’s going on? You, kind Sir or Madam, are being milked. What’s worse is that you, in an example of collective behavioral dysfunction, have inadvertently played a role in setting up the milking. What happened?

Like all smart CIOs you followed a bit of herd mentality when it came to core technology. Pity the poor fools who, back in the day, bet big on Ingres or Sybase. You played it safe and went with Oracle, IBM, or if your requirements weren’t too heavy, Microsoft.

The problem is, of course, that everyone executed the same strategy you did. Hence, the market created a system of increasing returns where the strong vendors got stronger and the weak ones died. The result: the RDBMS market is worth on the order of $10B/year and is structured as an oligopoly with 3 players. Most other software markets worked out the same way.

You were focused on standardization. You realized that through a combination of decentralized IT decision making and growth-by-acquisition your organization had become a kitchen sink of enterprise software. You had everything. In order to reduce the administrative, training, and license acquisition costs, you fought tooth and nail with your divisions to standardize the environment. You said, “Heck, it’s all the same stuff in the end, folks, so let’s make Oracle our DBMS standard, Business Objects our BI standard, Documentum our ECM standard, and SAP our ERP standard.”

And you won. Mostly. There’s still some Cognos in finance. And marketing didn’t totally give up on Interwoven. But, for the most part, you won. You reduced the entropy of your IT environment and drove cost savings for your organization.

The problem is you’ve won the battle but lost the war. Why? Because if, as you say, the “stuff really is all the same” you shouldn’t standardize on the most expensive product. You should standardize on the cheapest.

Do you really need to be paying those big fees to Oracle for enterprise licenses? Wouldn’t MySQL do?

Are you really using all the functionality of that $1M/year Documentum ECM system? Wouldn’t SharePoint or Alfresco do?

For BI, do you need all the bells and whistles of BusinessObjects? Wouldn’t Pentaho or Qlikview do a fine job, at a fraction of the cost?

But these alternatives are obvious. Heck, even “the establishment” (i.e., Gartner) says it’s safe to tread in the open source water. So the question is, what’s holding you back?

Switching costs. It’s hard to move off Oracle or Documentum and you don’t want to pay the nut to do so.

Organizational inertia. Your whippersnapper DBAs who were in their 30s in the 1980s are now in their 50s. They’re thinking that change devalues their knowledge and experience; some just want to cruise into retirement. But that’s their personal agenda, not your enterprise one.

Accounting. You made it free for your divisions to keep using Documentum, Oracle, or BusinessObjects because you bought an enterprise license. While this appeared to “save” you money on a per-license basis, and it helped support your standardization initiative, it squashed innovation in your divisions, reinforced the organizational inertia, and left a lot of people using the wrong tool for the job, resulting in projects that require more hardware, or more expensive hardware, than necessary (Oracle is good at this), that take too long to develop, or that simply fail.

So, what do I recommend doing about all this? I suggest that you adopt these policies, which, for full disclosure, are at least partially in the self-interest of this blog’s author:

Stop writing big checks for commodity software. Every time a big check comes along, ask yourself: is this software differentiated or commoditized? Be willing to pay a premium for differentiated software, and price shop commodity software. Call a group of your smartest staff together periodically to help you make the commodity versus differentiated call.

When you see a big check coming for commodity software, make a migration plan. My hunch is that most of the time, you can create a nice 3-year ROI in the transition from premium to cheaper software. (This reminds me of the time I visited an investment bank’s CIO asking about their Documentum strategy. The answer: “our Documentum strategy is to get off Documentum,” because they were paying too much and using too little.)

Stop doing enterprise agreements that create poor economic incentives within your organization. Don’t pay $XM at the enterprise level, spread that as a “tax” across your divisions, and then make use of certain software “free.” It distorts project reality, creates false incentives, squashes innovation, and generates lots of hidden costs. If you want to negotiate a master agreement and discount rate, that’s fine. Shoot for centralized discounts without central planning.

Don’t worry that the prior policies will create mayhem. While I understand that you don’t want arbitrary taste differences increasing the entropy of your enterprise software portfolio, recognize that with the first policy you’ve solved that problem already. If you deem a category (e.g., core RDBMS, enterprise search) commoditized, then you are going to force people to choose on cost. You’ll get standardization on the commodity categories, just on the least expensive alternatives. The only entropy you’ll need to manage will be on the differentiated software which, having dispatched the commodity majority, you’ll have time to explore, study, and exploit.
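
The migration-plan policy above comes down to simple arithmetic. As a rough sketch, with every figure a hypothetical placeholder rather than a real price quote (only the $1M/year scale echoes the Documentum example earlier), a three-year ROI comparison might look like this in Python. The function name and parameters are invented for illustration.

```python
def three_year_roi(current_annual_cost, new_annual_cost, migration_cost, years=3):
    """Return (net_savings, roi) for a move to cheaper software.

    net_savings = cumulative annual savings minus the one-time migration cost
    roi         = net_savings divided by the migration cost
    """
    cumulative_savings = (current_annual_cost - new_annual_cost) * years
    net_savings = cumulative_savings - migration_cost
    return net_savings, net_savings / migration_cost


# Hypothetical inputs: $1M/year today, $200K/year after, $1.2M to migrate.
net, roi = three_year_roi(1_000_000, 200_000, 1_200_000)
print(f"Net 3-year savings: ${net:,}  ROI: {roi:.0%}")
```

If the net figure comes out negative for a given category, the switching costs outweigh the savings over that horizon and the migration plan may need a longer window.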

Why am I taking the time to write this note to you? Back in the 1980s I was a foot soldier in the relational database revolution, and today I’m the CEO of one specialized DBMS company and on the board of another.

Mark Logic makes an XML server which can save great amounts of time and money in creating applications against unstructured information, replacing the combination of an RDBMS, an enterprise search engine, and an application server. Not only can Mark Logic manage hundreds of terabytes of XML, the system eliminates the object/relational/hierarchical impedance mismatch between Java, SQL, and XML that hampers developer productivity. Mark Logic was recently named the fourth fastest-growing IT company in Silicon Valley.

Aster Data makes a specialized data warehouse DBMS that runs on low-cost commodity hardware with a shared nothing architecture and leverages in-database MapReduce technology for parallelism and high scalability.

And during the past 25 years or so I’ve watched the market evolve. While I fully understand the policies and market forces that have led us to where we are, I feel like we’ve come full circle. Vendor power is now concentrated in the big three. Vendor margins top 50%. Big vendors don’t innovate; they consolidate. Inertia has set in at customer organizations. And there’s a major platform shift in progress; last time it was mainframe to minicomputer, this time it’s cloud.

Things feel to me a lot like they did in 1985, just past the dawn of the relational revolution. So in one way I’m writing to point out the oft-overlooked obvious: stop paying premium prices for commodity items. And in another way I’m saying, take the money you save in so doing and invest it in innovative technologies that:

Drive competitive advantage (which will matter again as we come out of the Great Recession)

Enable the Internet-scale applications you’ll need to face the coming information deluge

Reform the application development stack in ways that make sense for the coming generation of information applications, not that made sense for the last generation of data-centric ones.

Thank you for reading my note. If you have any questions or comments, please give me a ping at dave-dot-kellogg-at-marklogic-com or comment on this post.

Sincerely,

Dave Kellogg

Posted in Databases, Open Source