Here’s a great picture from our VP of engineering, Ron Avnur, on how he positions MarkLogic Server relative to other software categories. It’s an elegant and simple way of explaining where we fit.
The two dimensions are structure and query type. Structure can be either predefined or ad hoc (and often, in the document world, there is a predefined structure that no one actually uses, which makes it de facto ad hoc). Query types can be either predefined (i.e., known in advance) or ad hoc (i.e., not known in advance).
Let’s look at the quadrants that result:
- Bottom left is where both structure and queries are predefined. Hierarchical DBMSs, like IMS, live in this quadrant. In these (now legacy) systems, the structure of the data is rigidly defined, as are the queries that may be run against it. These databases provide high performance, but their inflexibility became their Achilles’ heel.
- Bottom right is where structure is predefined but queries are ad hoc. This quadrant defines the relational database, which brought unprecedented flexibility to database querying, eventually enabling the modern BI market. Data structure is predefined through the creation of tables with defined names and columns to hold the data. Queries are ad hoc: in a well-designed relational database, the system can provide the results for almost any imaginable query. (And with the right indexes, it can provide those results fairly quickly.)
- Top left is where queries are predefined but structure is not. This (and it is non-obvious to most people) is the zone of the enterprise search engine. People tend to think of search engines as providing high flexibility because you can type any word in the search box. In reality, seen from a database viewpoint, search engines provide a small number of parametrized queries. (It’s the parametrization that gives the impression of flexibility.) These queries include (1) return a list of documents where the document contains a word or phrase, (2) return a list of documents where a field in the document contains a word or phrase, and (3) either query (1) or (2) where the word or phrase is replaced with an expression in the search engine’s basically Boolean primitive query language (i.e., AND, OR, NOT).
- The top right is the tricky zone where neither queries nor structure is defined in advance. This is the zone of the XML server, like MarkLogic. In these systems, content can be ingested “as is” without adherence to any predefined structure. Queries are ad hoc, and written in XQuery with full-text extensions. Given the proper indexes, these systems can run virtually any query against the content with high performance.
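
To make the top-left quadrant more concrete, here is a rough sketch in Python of what "a small number of parametrized queries" might look like. Everything in it is hypothetical: the toy corpus, the function names, and the tuple-based Boolean syntax are assumptions made for illustration, not the API of any real search engine or of MarkLogic.

```python
# Hypothetical toy corpus: each document is a free-form dict of fields.
# No schema is shared across documents (structure is ad hoc).
DOCS = {
    "doc1": {"title": "Quarterly report", "body": "revenue grew in the third quarter"},
    "doc2": {"title": "Release notes", "author": "R. Smith",
             "body": "fixed the quarterly export bug"},
    "doc3": {"subject": "Lunch", "body": "pizza on friday"},
}


def _has(text, phrase):
    """Case-insensitive substring match (a stand-in for real text indexing)."""
    return phrase.lower() in text.lower()


def docs_containing(phrase):
    """Query (1): documents where any field contains the phrase."""
    return sorted(
        doc_id for doc_id, fields in DOCS.items()
        if any(_has(value, phrase) for value in fields.values())
    )


def docs_with_field_containing(field, phrase):
    """Query (2): documents where the named field contains the phrase."""
    return sorted(
        doc_id for doc_id, fields in DOCS.items()
        if field in fields and _has(fields[field], phrase)
    )


def boolean_query(expr, field=None):
    """Query (3): query (1) or (2) with the phrase replaced by a Boolean expression.

    expr is either a phrase (str) or a nested tuple:
    ("AND", a, b), ("OR", a, b), or ("NOT", a).
    """
    everything = set(DOCS)

    def evaluate(e):
        if isinstance(e, str):
            if field is None:
                return set(docs_containing(e))
            return set(docs_with_field_containing(field, e))
        op, *args = e
        if op == "AND":
            return evaluate(args[0]) & evaluate(args[1])
        if op == "OR":
            return evaluate(args[0]) | evaluate(args[1])
        if op == "NOT":
            return everything - evaluate(args[0])
        raise ValueError(f"unknown operator: {op}")

    return sorted(evaluate(expr))
```

The user can vary the parameters freely, for example `boolean_query(("AND", "quarterly", ("NOT", "revenue")))`, but the shape of the query is fixed by these three functions. Anything outside those shapes, such as an aggregate or a join across documents, would need new code, which is the sense in which the queries are predefined.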