The Specialized Database Argument: Performance

People sometimes ask: what’s the argument for special-purpose databases like MarkLogic, as opposed to general-purpose databases like DB2, Oracle, or SQL Server? While I have written much on this topic, in the end I think it boils down to one word: performance.

The big 3 database oligopoly has proven that the general-purpose database management system (DBMS) can indeed be bloated into a wide scope of functionality (today’s RDBMSs are so bloated that most analysts now drop the R, because they’ve long since stopped being relational).

So while the big 3 can bloat the DBMS, what they can’t do is optimize it for each special case. By definition, the general-purpose DBMS needs to be optimized for general purposes. When trade-offs are encountered, you must design for the general case.

That’s what creates the opening for specialized DBMSs. For example, MarkLogic is not optimized for the general case — a bit of transaction processing, a bit of data warehousing, a bit of analytics, a bit of text, a bit of XML, a bit of spatial indexing, a bit of data mining, a bit of huge deployments, a bit of tiny ones, a bit of OLAP, a bit of memory-residency, and so on.

MarkLogic is optimized for the specific case of large amounts of semi-structured XML data, typically containing lots of text. The result: performance numbers that simply crush the competition when they’re playing in our house.

For example, while I can’t go into specifics, one of our technical staff sent an email out this morning that went like:

Another 100x Win Against XXXXX

Today, I indexed XML in 137 seconds which took XXXXX 4 hours, even though they were running on beefier hardware. Due to other pressing deadlines [and the already clear victory], I didn’t have time to optimize the MarkLogic side. Had I been able to do threading and cache tuning, I’m quite sure I could have sped up the MarkLogic side by 4x.
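
As a quick sanity check on the arithmetic in that email, here is a minimal Python sketch. The 137 seconds and the 4 hours come straight from the email; the extra 4x is only the sender's untested estimate, so the second figure is hypothetical rather than a measured result.

```python
# Figures quoted in the email above; "XXXXX" is the unnamed competitor.
marklogic_seconds = 137
competitor_seconds = 4 * 60 * 60  # 4 hours expressed in seconds

# Ratio of the two indexing times.
speedup = competitor_seconds / marklogic_seconds
print(f"Observed speedup: {speedup:.0f}x")

# Hypothetical: assumes the sender's guess of a further 4x from
# threading and cache tuning actually held. It was never measured.
tuned_speedup = speedup * 4
print(f"Speedup if the 4x guess held: {tuned_speedup:.0f}x")
```

So the "100x" headline is, if anything, slightly conservative for the run as described.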

Is this magic? No.

While I think the world of our engineering team and I do believe they have built a tremendous product, there’s no magic. It’s simply the combination of a great implementation focused on a specific XML-based use case. No general-purpose player can beat that.

Startup Zeitgeist

Seedcamp, a London-based, week-long camp for European entrepreneurs, recently did an interesting exercise. They took the several hundred applications they received for their event and made tag clouds. Here’s what they found.

What are you creating?


How will you make money?


What tools will you use?

(I’d love to see XQuery in the toolset, but happy to see that database, server, and XML are already there.)

And who says you can’t do interesting analytics on content? I thought this was fascinating. Check out Seedcamp’s blog post about the exercise, here.

Krugman on the Grateful Dead as a Business Model

Back in June, Paul Krugman wrote a nice op-ed piece in the New York Times entitled Bits, Bands, and Books which looks at the changes in the information and media business (e.g., publishing, music) and compares them to what I call the Grateful Dead business model.

Having been to, shall we say “more than one,” Grateful Dead concert, I’ve always believed the Dead were the role model for Web 2.0. Consider the business model:

  • Give away the (digital) product. Encourage live taping (bootlegging) and tape sharing. I’ve been at shows where they stopped and waited until someone moved their microphones so they could get a better recording.
  • Make money by selling concert tickets. To my knowledge they made more money touring than any band in history.
  • Make money by selling paraphernalia (in the sense of t-shirts and such).
  • Build a strong community. Need I say more?

So, all the while the music industry was freaking out over the copy-ability of digital media, I kept asking myself — why doesn’t anyone study the Dead? (And, yes, part of the answer is that all those concerts were hard work compared to replicating albums or CDs.)

As a business-oriented Dead fan, I’d always thought this. I was just happy to find someone, er, respectable, who thought the same thing. You can read Krugman’s piece, here.

The Never-Ending Fast Search Story

I’ve already spent a lot of space covering the financial issues at Fast Search & Transfer. In part that was because, prior to the Microsoft acquisition, we competed fairly often with Fast, particularly in our publishing practice. Part was because the company reminded me of MicroStrategy, against whom we had to compete at Business Objects. Part was driven by my personal interest in international software companies and the issues that un-level the reporting playing field (e.g., GAAP vs. IFRS reporting).

Anyway, I took a crack at a post earlier today based on a story in a Norwegian business weekly, Dagens Næringsliv, that in turn has prompted posts from CMS Watch to TechCrunch to Stephen Arnold to Curt Monash.

I burned several hours, posted something, got in the car, drove home, and deleted the post just after I arrived. Somehow, despite considerable effort, I couldn’t find what I thought was a satisfactory and appropriate way to editorialize.

Ergo, I decided simply to present the story. You can see it by pressing this link or looking at the Scribd iPaper below.

Disclaimers: I don’t speak Norwegian and can’t attest to the quality of the translation. I don’t know either Norwegian culture or Norwegian business publications, so I can’t vouch for either the legitimacy of the source publication itself or for any cultural slant present in the story.

Beneath the translated text are images of the original story with Norwegian body copy.


Thoughts on Category Creation and Information Access Platforms [Revised]

[Revised 8/2/08; still working on cleaning up this consciousness stream.]

Back in the old days, it seemed easy to create a category in software. Look at the database market, for example:

  • IBM invents the relational DBMS (RDBMS) category
  • Oracle, Ingres, and Informix enter in a largely undifferentiated way, though Informix eventually drifts towards the low-end/cheap segment
  • Sybase creates the derivative category of high-performance OLTP RDBMS.
  • Arbor re-christens the failed multi-dimensional DBMS as the OLAP Server
  • Tandem creates the non-stop RDBMS with its superb fault tolerance
  • Illustra launches the universal DBMS and is quickly acquired by Informix
  • Sybase launches the bitmap-indexed DBMS with SybaseIQ
  • Teradata launches the data-warehouse DBMS category

And you can find just as many examples outside database-land.

  • ASK defines the manufacturing resource planning (MRP) category
  • SAP hijacks MRP, redefines it as ERP, and goes on to become the world’s largest applications software company
  • PeopleSoft invents the HRMS category
  • Gartner Group’s Howard Dresner invents the business intelligence (BI) category, re-christening and re-framing what was formerly known as DSS or EIS.
  • Siebel pioneers the sales force automation (SFA) category
  • Scopus pioneers call center automation (CCA)
  • Companies like Rubric pioneer enterprise marketing automation (EMA)
  • Siebel, through acquisition, coalesces SFA, CCA, and EMA into a single category called customer relationship management (CRM)
  • Oracle and SAP work to coalesce CRM back into ERP. Such is the ebb and flow of categories.

(And I could go on and on — BPM, KM, CMS, WCM, ECM, LMS, DRM, SCM, PLM, ETL, DI, EII — but I think I’ll stop here with the initials list.)

People are still creating categories today, and sometimes it looks easy. Uber-categories have been quite popular in the past decade as people have focused on different ways of developing and delivering software:

  • SaaS as an uber-category has worked well, with a variety of offerings in various SaaS sub-categories (e.g., Salesforce, NetSuite)
  • Appliances have done pretty much the same thing — i.e., offering an appliance alternative for a wide variety of existing categories (e.g., a data warehouse appliance a la Netezza)
  • Open source has also done the same thing — again serving as a different flavor/dimension for a wide variety of largely existing software categories.

Only a few genuinely new categories have emerged, virtualization being the most obvious example. (Though you could argue that virtualization is itself an uber-category covering storage virtualization, server virtualization, et cetera.)

Companies are still working to carve new categories, particularly in the database market.

Sometimes vendors and/or the analysts who cover them try to impose either a straight name change (e.g., from MD-DBMS to OLAP) or a strategic shift (e.g., from BI to analytic applications) in category. Sometimes they’re just bored. Sometimes a vendor is trying to redefine the market in line with its strengths. Sometimes an analyst is trying to make his/her mark on the industry and earn the coveted title of “father/mother of [category name],” much as Howard Dresner successfully did with BI.

BI got bored with its name several times during my tenure at Business Objects. At one point both the analysts and Informatica were trying to re-dub the category “analytic applications” in an attempt to get a fresh name and raise the abstraction level from tools to applications. Informatica nearly died on that hill.

Later, analysts tried to redefine the category, dubbing it corporate performance management (CPM), and arguing that business intelligence needed to link with financial planning systems. While knowing actuals is good, knowing actuals compared to the plan is better, and using actuals to drive the future plan better still. Cognos nearly tripped over itself repositioning around CPM, ultimately acquiring Adaytum, which in turn led to SRC’s eventual acquisition by Business Objects.

In an art-imitates-life sort of way, one wonders whether the analysts predicted a move in the market or provoked it. My chips are on the latter.

This stream-of-consciousness is a long way of winding up to a single question: are enterprise search vendors successfully repositioning themselves as “information access platforms” or not?

Background: the enterprise-search-related vendors (e.g., Fast/Microsoft, Endeca) and search/content analysts who cover them are in the midst of an attempted category repositioning:

  • The term “enterprise search” is now seemingly dead, having been contaminated by the Google Appliance. When a shark gets in the water, all the fish jump out.
  • The word “information” is increasingly being used as a unifying term to describe both data and content (aka, unstructured data)
  • Enterprise search vendors are increasingly calling themselves “information access platforms” (though not generally abbreviated as IAP, I will do so here for brevity).

For example, consider Endeca’s corporate boilerplate:

Endeca’s innovative information access software that helps people explore, analyze, and understand complex information, guiding them to unexpected insights and better decisions. The Endeca Information Access Platform, built around a new class of access-optimized database, powers applications that combine the ease of searching and browsing with the analytical power of business intelligence.

I have a number of concerns about this attempted shift:

  • The important thing about categories is that they exist in the mind of the customer. Analysts and vendors can try to put them there — but they have to stick. In my mind, IAP is not sticking. I have never heard a customer say: “I need to go out and get an IAP.”
  • I do, however, believe that “information” might well stick as an overall term, meaning both data and content (aka, structured and unstructured data).
  • It is not clear to me why someone who desires a unified platform for “information” would turn to a search vendor. Search engines were designed as read-only indexes to help people find documents containing tokens; hardly ideal as an application development platform.
  • In my estimation, someone managing “special” data should turn to a database vendor. While databases have classically not handled “special” data well, databases were designed as application platforms, and there is a whole new class of specialized databases emerging for handling various “special” types of data.
  • While I think a unified platform is a dandy vision, I think no one is close to delivering a unified platform that handles all types of data equally well. Bolting Lucene and MySQL together isn’t a platform. Relational databases still do a poor job with both content and many types of data (e.g., sparse, hierarchical, or semi-structured). XML servers (like MarkLogic) handle XML brilliantly, but need work before they can match RDBMSs at classical relational data.
  • I believe that someone who needs a crawl-and-index-the-intranet value proposition should use the Google Appliance. So while I think the search vendors are correct in their desire to flee, I don’t think that “information access platform” is a good refuge.

Overall, my chips remain on the don’t come line for the attempted category repositioning from “enterprise search” to “information access platform.” You can find my stack on the come line for the emerging “special-purpose database” category and “XML servers” as an instance of them.