One of the fun things about Mark Logic is that we unite people from different computing backgrounds: database people, search engine people, content management people, the odd computational linguistics person, and — of course — document/XML people.
Aside: one of my big theses of computing life is that individuals tend to stovepipe into a single computing camp early on, fail to cross-breed / cross-read, and thus the camps end up quite in-bred and incommunicado over time. That’s one reason why I deliberately “jumped camps” in leaving Business Objects four years ago, hopping from BI into unstructured data / content / documents / XML.
But I digress.
We recently hired Norm Walsh, a pretty big guy in the document camp, which elicited comments such as the following from his fellow camp members:
I’m wondering how in the hell some obscure “XQuery Content” company stole Norm Walsh away from Sun. […] Anyone care to provide some insight? Is Mark Logic really *that* good?
That was fun.
But what’s been even more fun is taking someone who is clearly distinguished in one camp and introducing him to another. Towards that end, I’m happy to report that Norm is now officially certified in what I call rule 1 of database performance: push constraints to data, don’t move data to constraints.
Believe it or not, rule 1 appears quite counter-intuitive to document people, who seem to innately want to materialize DOM trees and then process them in a middle tier.
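For readers from the document camp, here is a minimal sketch of rule 1. It uses Python's built-in sqlite3 module as a stand-in for any DBMS. The `emp` table, its columns, and the sample rows are all invented for illustration. Both functions return the same answer; the difference is how many rows have to leave the database to get it.

```python
import sqlite3

# Hypothetical table and data, invented for this illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (emp_id INTEGER PRIMARY KEY, name TEXT, dept TEXT)")
conn.executemany(
    "INSERT INTO emp (emp_id, name, dept) VALUES (?, ?, ?)",
    [(i, f"employee-{i}", "fieldmkt" if i % 100 == 0 else "other")
     for i in range(1, 10001)],
)


def move_data_to_constraint(conn):
    """Violates rule 1: fetch every row, then filter in the application tier."""
    rows = conn.execute("SELECT emp_id, name, dept FROM emp").fetchall()
    matches = [row for row in rows if row[2] == "fieldmkt"]
    return len(rows), matches  # (rows moved out of the database, result)


def push_constraint_to_data(conn):
    """Follows rule 1: the database applies the constraint and returns only matches."""
    rows = conn.execute(
        "SELECT emp_id, name, dept FROM emp WHERE dept = ?", ("fieldmkt",)
    ).fetchall()
    return len(rows), rows


moved, slow_result = move_data_to_constraint(conn)
pushed, fast_result = push_constraint_to_data(conn)
print(f"rows moved without pushdown: {moved}, with pushdown: {pushed}")
```

In this toy data set the first approach moves 10,000 rows to find 100; the second moves only the 100.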
Because I’m so wed to the database viewpoint, I have trouble expressing it in a document-person way. That’s why I’m happy that Norm has recounted his journey here, in a post entitled Thinking Differently about XML.
I think that most people now correctly perceive our product, MarkLogic Server, as an XML content server, a special-purpose DBMS designed specifically for handling XML marked-up content. That’s the good news.
The better news is that many of these same people are figuring out what that means when it comes to developing web applications – specifically, that you can use an XML content server to build web applications using XML top-to-bottom. No Java required. No relational tables required. No application server required. (And no expense for all those supporting products.)
Don’t get me wrong. Many customers choose to use MarkLogic as the XML repository and query system in their architecture, building their applications in Java, using an application server, and making calls out to MarkLogic to process XML queries. Lots of people use the product in that way. That’s fine.
But people soon realize that when you have a DBMS and a query language (XQuery) that directly outputs XML (e.g., XHTML) that a browser can render directly, and when that “query” language is really a misnamed and underpositioned programming language easily capable of developing entire applications, you can say:
“Wait a minute. My content’s in XML. My browser speaks XML. Why not build my whole app top-to-bottom in XML and XQuery?”
Good question. And the answer is you can. And in many cases, you probably should. What are the advantages of doing so?
- Use of a high-level, standard, powerful programming language, XQuery. High-level and powerful translate to greater development and maintenance productivity. Standard translates to risk reduction and freedom of choice. (Aside: While XQuery is not a big-hype, overnight-success type of technology like Ajax, XQuery continues to march along with a certain inevitability. In my mind, there is no question that XQuery will be the database programming language of the future – it is superior to SQL, it is more general than SQL and ergo applicable to a broader class of problems, and all major DBMS vendors are already committed to it. The question is not whether XQuery will become mainstream, but when.)
- Elimination of three impedance mismatches: Java/XML, XML/relational, and Java/relational. Java is object-oriented, XML is hierarchical, and relational databases are tabular. The mapping between these three different data models generates a lot of zero-value-added work in developing an application. When you’re XML top-to-bottom, poof, that work’s all gone.
- Elimination of tiers. I had lunch a while back with a top engineer at Oracle who told me that he believed the limiting factor on database application performance was becoming scheduling. That is, hardware and databases are becoming so fast that scheduling work across tiers is becoming the limiting factor in performance. His suggested solution? Eliminate tiers. Well, top-to-bottom XML does exactly that.
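To make the impedance-mismatch point concrete, here is a rough sketch of the glue code a three-model stack tends to need. Python stands in for Java, and sqlite3 for the relational database. The `Article` class, the `article` table, and the sample document are all hypothetical. Note that every function below only converts between representations; none of it is application logic.

```python
import sqlite3
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass
class Article:
    """Object model (hypothetical)."""
    article_id: int
    title: str
    author: str


def article_from_xml(xml_text):
    """XML -> object mapping."""
    root = ET.fromstring(xml_text)
    return Article(int(root.get("id")), root.findtext("title"), root.findtext("author"))


def save_article(conn, article):
    """Object -> relational mapping."""
    conn.execute(
        "INSERT INTO article (article_id, title, author) VALUES (?, ?, ?)",
        (article.article_id, article.title, article.author),
    )


def load_article(conn, article_id):
    """Relational -> object mapping."""
    row = conn.execute(
        "SELECT article_id, title, author FROM article WHERE article_id = ?",
        (article_id,),
    ).fetchone()
    return Article(*row)


def article_to_xml(article):
    """Object -> XML mapping."""
    root = ET.Element("article", id=str(article.article_id))
    ET.SubElement(root, "title").text = article.title
    ET.SubElement(root, "author").text = article.author
    return ET.tostring(root, encoding="unicode")


# Sample document, used only to exercise the round trip.
source = ('<article id="7"><title>Thinking Differently about XML</title>'
          '<author>Norm Walsh</author></article>')

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE article (article_id INTEGER PRIMARY KEY, title TEXT, author TEXT)")
save_article(conn, article_from_xml(source))
round_tripped = article_to_xml(load_article(conn, 7))
print(round_tripped)
```

Four mapping functions and a schema, just to get one small document in and back out again. In an XML top-to-bottom design, that layer is what goes away.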
Here’s a link to a post by Matt Turner on his Discovering XQuery blog that discusses Publishing 2.0 and content logic.
In this post Matt discusses what I call the “thick middle tier” problem with most search-engine-based content applications.
Here’s the issue. Search engines (1) return lists of links to documents and (2) allow only fairly basic “query” (and I’m reluctant to even call them that) predicates to be applied in the search engine.
As a result, a typical search-engine-based application ends up with a thick middle tier of Java code that (1) systematically materializes each document in the returned list as a DOM tree and then (2) does subsequent processing on that document using Java.
As Matt points out, you might be tempted to think of this work as “your application” or “business logic,” but in reality it’s not. It’s content processing, not business or application processing. This approach is bad for several reasons:
- Productivity is negatively impacted because you have to do low-level content processing yourself, and typically in a relatively low-level language, like Java.
- Performance is negatively impacted because you end up with an architecture that violates “rule 1” of database performance: push processing to the data, don’t bring data to the processing.
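The thick-middle-tier pattern described above can be sketched roughly as follows. Python stands in for the Java middle tier. The `search_engine` function and the three tiny documents are hypothetical stand-ins, not any real search product's API. The point to notice is that every document in the hit list gets materialized as a DOM tree, even though the structural test (is the term in the abstract?) rejects some of them.

```python
from xml.dom.minidom import parseString

# Hypothetical stand-in for the files sitting behind a search engine.
DOCUMENTS = {
    "doc1.xml": "<article><title>XQuery basics</title>"
                "<abstract>Intro to XQuery.</abstract></article>",
    "doc2.xml": "<article><title>Indexing</title>"
                "<abstract>How search indexes work.</abstract>"
                "<body>XQuery is mentioned here.</body></article>",
    "doc3.xml": "<article><title>Tiers</title>"
                "<abstract>XQuery in the middle tier.</abstract></article>",
}


def search_engine(term):
    """Hypothetical keyword search: returns links to documents containing the term anywhere."""
    return [link for link, xml in DOCUMENTS.items() if term in xml]


def text_of(node):
    """Concatenate the direct text children of a DOM element."""
    return "".join(
        child.data for child in node.childNodes if child.nodeType == child.TEXT_NODE
    )


def titles_with_term_in_abstract(term):
    """Middle-tier content processing: parse every hit, then apply the real predicate."""
    parsed = 0
    titles = []
    for link in search_engine(term):
        dom = parseString(DOCUMENTS[link])  # materialize the whole document as a DOM tree
        parsed += 1
        abstract = dom.getElementsByTagName("abstract")[0]
        if term in text_of(abstract):
            titles.append(text_of(dom.getElementsByTagName("title")[0]))
    return parsed, titles


parsed, titles = titles_with_term_in_abstract("XQuery")
print(f"documents materialized: {parsed}, documents that qualified: {len(titles)}")
```

Here all three documents are parsed and only two qualify. With real collections the ratio is usually far worse, and the parsing happens far from the content.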
All DBMSs strive for compliance with rule 1.
- Query optimizers generally try to apply the most restrictive predicate first (e.g., apply emp-id = 178 before sex = female).
- Query optimizers generally try to drive lookup joins from the table with the most restrictive predicates on it (where dept.dname = “fieldmkt” as opposed to emp.name = “*stein*”).
- It’s why everyone loves stored procedures. Not only do they minimize client/server interaction and allow pre-compilation, most importantly, they push processing to the data.
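Here is a toy model of why predicate order matters. It is not how any real optimizer works (real ones rely on indexes and statistics); it just counts how many predicate evaluations a naive filter performs in each order. The employee rows are invented for illustration.

```python
# Hypothetical employee rows: 1,000 employees, half of them female.
rows = [
    {"emp_id": i, "sex": "female" if i % 2 == 0 else "male"}
    for i in range(1, 1001)
]


def is_emp_178(row):
    return row["emp_id"] == 178


def is_female(row):
    return row["sex"] == "female"


def filter_in_order(rows, predicates):
    """Apply predicates one after another, counting every predicate evaluation."""
    evaluations = 0
    survivors = rows
    for predicate in predicates:
        evaluations += len(survivors)  # each surviving row is tested once
        survivors = [row for row in survivors if predicate(row)]
    return survivors, evaluations


restrictive_first, cost_a = filter_in_order(rows, [is_emp_178, is_female])
restrictive_last, cost_b = filter_in_order(rows, [is_female, is_emp_178])
print(f"emp-id first: {cost_a} evaluations; sex first: {cost_b} evaluations")
```

Both orders find the same single row, but applying the emp-id predicate first leaves only one row for the second predicate to examine, instead of five hundred.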
I’m not going to criticize people who built systems this way historically. Prior to products like MarkLogic, the thick-middle-tier architecture was the best you could do. DBMSs couldn’t handle content, so your only real option was to leave your content in files (or stuff it in BLOBs), index it with a search engine, and then build these thick-middle-tier applications.
But in the future it doesn’t have to be this way. With systems like MarkLogic, you can now build content applications using a standard query language (XQuery) and the “correct” allocation of processing across tiers. This has the following benefits:
- Improved productivity because XQuery is a relatively high-level language.
- Greatly improved performance because you can thin out the middle tier and push content processing to the XML content server (which is both optimized to do it and close to the content).
- Openness and standardization, which makes it easier to find skilled resources, eliminates vendor lock-in, and makes software integration generally easier.
- Flexibility. Typically, with enough smarts in the middle layer, you can hack something together that runs one query fast. The trick is when you want to run many and/or new queries fast: in that case, you really need the right architecture, i.e., one that pushes processing to the content instead of bringing content to the processing.