Positioning MarkLogic Server

Here’s a great picture from our VP of engineering, Ron Avnur, on how he positions MarkLogic Server relative to other software categories. It’s an elegant and simple way of explaining where we fit.

The two dimensions are structure and query type. Structure can be either predefined or ad hoc. (Often, in the document world, a predefined structure exists that no one actually follows, which makes it de facto ad hoc.) Query types can likewise be predefined (i.e., known in advance) or ad hoc (i.e., not known in advance).

Let’s look at the quadrants that result:

  • Bottom left is where both structure and queries are predefined. Hierarchical DBMSs, like IMS, live in this quadrant. In these (now legacy) systems, the structure of the data is rigidly defined, as are the queries that may be run against it. These databases provide high performance, but their inflexibility became their Achilles’ heel.
  • Bottom right is where structure is predefined but queries are ad hoc. This quadrant defines the relational database, which brought unprecedented flexibility to database querying, eventually enabling the modern BI market. Data structure is predefined through the creation of tables with defined names and columns to hold the data. Queries are ad hoc: in a well-designed relational database, the system can provide the results for almost any imaginable query. (And with the right indexes, it can provide those results fairly quickly.)
  • Top left is where queries are predefined but structure is not. This (and this is non-obvious to most people) is the zone of the enterprise search engine. People tend to think of search engines as providing high flexibility because you can type any word in the search box. In reality, seen from a database viewpoint, search engines provide a small number of parametrized queries. (It’s the parametrization that gives the impression of flexibility.) That small set of queries includes: (1) return the list of documents that contain a word or phrase; (2) return the list of documents in which a given field contains a word or phrase; and (3) query (1) or (2) with the word or phrase replaced by an expression in the search engine’s basically Boolean primitive query language (i.e., AND, OR, NOT).
  • The top right is the tricky zone where neither queries nor structure are defined in advance. This is the zone of the XML Server, like MarkLogic. In these systems, content can be ingested “as is” without adherence to any predefined structure. Queries are ad hoc and written in XQuery with full-text extensions. Given the proper indexes, these systems can run virtually any query against the content with high performance.
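To make the top-left versus top-right distinction concrete, here is a minimal Python sketch. It is a deliberately simplified model of the three search-engine query forms listed above, not how MarkLogic or any real search engine is implemented; the names `Term`, `And`, `Or`, `Not`, `search`, and `ad_hoc`, and the sample documents, are hypothetical. The documents are plain dicts with no shared schema (ad hoc structure), yet every `search` call is one of the three parametrized forms, whereas `ad_hoc` accepts an arbitrary predicate over a document’s structure and values.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Union

# Hypothetical model for illustration only (not a real engine's API).
# A document is a dict of field name -> text; documents need not share fields.
Doc = Dict[str, str]


def tokenize(text: str) -> List[str]:
    """Lowercase and split on whitespace."""
    return text.lower().split()


def contains_phrase(text: str, phrase: str) -> bool:
    """True if the words of `phrase` appear contiguously in `text`."""
    words, target = tokenize(text), tokenize(phrase)
    n = len(target)
    return n > 0 and any(words[i:i + n] == target for i in range(len(words) - n + 1))


@dataclass(frozen=True)
class Term:
    """Forms (1) and (2): a phrase anywhere, or a phrase within one named field."""
    phrase: str
    field: Optional[str] = None


@dataclass(frozen=True)
class And:
    left: "Query"
    right: "Query"


@dataclass(frozen=True)
class Or:
    left: "Query"
    right: "Query"


@dataclass(frozen=True)
class Not:
    operand: "Query"


# Form (3): Term queries combined with the Boolean primitives AND, OR, NOT.
Query = Union[Term, And, Or, Not]


def matches(doc: Doc, q: Query) -> bool:
    if isinstance(q, Term):
        if q.field is None:
            texts = list(doc.values())
        else:
            texts = [doc[q.field]] if q.field in doc else []
        return any(contains_phrase(t, q.phrase) for t in texts)
    if isinstance(q, And):
        return matches(doc, q.left) and matches(doc, q.right)
    if isinstance(q, Or):
        return matches(doc, q.left) or matches(doc, q.right)
    if isinstance(q, Not):
        return not matches(doc, q.operand)
    raise TypeError(f"unsupported query: {q!r}")


def search(docs: List[Doc], q: Query) -> List[int]:
    """The engine's only entry point: positions of documents matching `q`."""
    return [i for i, d in enumerate(docs) if matches(d, q)]


def ad_hoc(docs: List[Doc], predicate: Callable[[Doc], bool]) -> List[int]:
    """An ad hoc query: any predicate over a document's structure and values."""
    return [i for i, d in enumerate(docs) if predicate(d)]


docs = [
    {"title": "XML databases", "body": "Storing XML documents as is", "year": "2008"},
    {"title": "Relational tuning", "body": "Indexes for XML columns", "year": "2007"},
    {"body": "Full-text search over XML documents"},  # no title or year field
]

print(search(docs, Term("xml documents")))                      # form (1) -> [0, 2]
print(search(docs, Term("xml", field="title")))                 # form (2) -> [0]
print(search(docs, And(Term("xml"), Not(Term("documents")))))   # form (3) -> [1]
# Not expressible with the three forms: absent field, numeric comparison.
print(ad_hoc(docs, lambda d: "title" not in d or int(d.get("year", "0")) > 2007))  # -> [0, 2]
```

The last query (documents from after 2007, or with no title at all) tests for a missing field and compares a value numerically, which none of the three parametrized forms can express. Supporting that kind of question over content with no fixed schema is what the top-right quadrant asks of a system.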

Hopefully this sheds some light on my soundbite that: “at Mark Logic, we are doing for (XML) documents what the relational database did for data.”


4 responses to “Positioning MarkLogic Server”

  1. Stephan H. Wissel

    Interesting positioning. Need to show that to a few RDBMS zealots. Where would you put DB/2 PureXML into that picture? :-) stw

    Disclaimer: I work for IBM/Lotus

  2. Thanks for noting your affiliation, which seems fair since everybody knows mine.

    DB2 pureXML is what I call an “XML column” or XML datatype in a relational database model. Unlike the first few whacks by the RDBMS vendors, so-called “native XML storage” introduces the ability to store XML natively (i.e., without shredding) in a column in a relational database.

    So it muddies the water a bit, which, obviously, is their intent. The RDBMS vendors want to argue that “XML is a feature” and that XML is just another datatype you want to put in your RDBMS. XML Server vendors want to argue that XML warrants a category: a new type of DBMS optimized for, and only for, building XML-based web applications.

    Simply put, if you’re building XML-based web apps that are all about slicing and delivering XML documents, then why would you want a DBMS that includes lots of features for processing SQL, optimizing table joins, optimizing data warehouse query performance, driving hundreds of TPS, et cetera?

    The RDBMS guys will argue that all that extra code does no harm. We’ll argue it’s bloatware and that you should use tools and DBMSs optimized for the task at hand.

  3. Thanks… just the sound-bite I needed for level 2 of “what does Dave’s company do again?”

  4. Dave, I’ve taken the liberty of extrapolating your thinking a little, as I can still see a technology gap that XQuery does not yet fill. In short, I have introduced the RDF question into the picture. I would like XQuery and RDF for Christmas!
