People sometimes ask: what’s the argument for special-purpose databases like MarkLogic, as opposed to general-purpose databases like DB2, Oracle, or SQL Server? While I have written much on this topic, in the end I think it boils down to one word: performance.
The big 3 database oligopoly has proven that the general-purpose database management system (DBMS) can indeed be bloated into a wide scope of functionality (today’s RDBMSs are so bloated that most analysts now drop the R, because they’ve long since stopped being relational).
So while the big 3 can bloat the DBMS, what they can’t do is optimize it for each special case. By definition, the general-purpose DBMS needs to be optimized for general purposes. When trade-offs are encountered, you must design for the general case.
That’s what creates the opening for specialized DBMSs. For example, MarkLogic is not optimized for the general case: a bit of transaction processing, a bit of data warehousing, a bit of analytics, a bit of text, a bit of XML, a bit of spatial indexing, a bit of data mining, a bit of huge deployments, a bit of tiny ones, a bit of OLAP, a bit of memory-residency, and so on.
MarkLogic is optimized for the specific case of large amounts of semi-structured XML data, typically containing lots of text. The result: performance numbers that simply crush the competition when they’re playing in our house.
For example, while I can’t go into specifics, one of our technical staff sent out an email this morning that read:
Another 100x Win Against XXXXX
Today, I indexed XML in 137 seconds which took XXXXX 4 hours, even though they were running on beefier hardware. Due to other pressing deadlines [and the already clear victory], I didn’t have time to optimize the MarkLogic side. Had I been able to do threading and cache tuning, I’m quite sure I could have sped up the MarkLogic side by 4x.
Is this magic? No.
While I think the world of our engineering team and I do believe they have built a tremendous product, there’s no magic. It’s simply the combination of a great implementation and a focus on a specific XML-based use case. No general-purpose player can beat that.