One of our consultants is doing some testing of MarkLogic vs. XML-extended relational databases, and he sent me an example of the kinds of queries you need to write when you’re mixing SQL and XQuery/XPath. Here is an example:
SELECT XMLQUERY( ‘$p/Citation/Index/ConceptCodeList/ConceptCode’ PASSING P.XMLDATA AS “p”)FROM AllCitations AS p WHERE contains (XMLDATA,‘(SECTION( “/Citation/Index/ChemicalData/ChemicalList/ChemicalName”) “leucovorin”)&(SECTION( “/Citation/Index/ConceptCodeList/ConceptCode”) “Pharmacology”)’) = 1;
A few things spring to mind when I see queries like this:
- This is why people made XQuery — so you wouldn’t have to write stuff like this.
- Why in the world do you need to mix XPath and SQL in this way? In a theoretically bi-lingual SQL/XQuery database, can I just write document-oriented queries purely in XQuery and not mess around with selecting columns that are themselves XMLQUERYs? Answer: in DB2’s ironically named pureXML, you need to use SQL as the outer framework if you want to use full-text indexing; so yes, you must do this.
- Are there more than 10 people in the world who will understand what the answer to this query is supposed to be? SQL and XQuery each have their own semantics, and few people deeply understand them. How many people understand not only both SQL and XQuery semantics, but also how they interact? (It reminds me of trying to find a tax guy in France who could do both the US and French systems at the same time.) I watched two world-class experts debate what the correct answer was to such a query for 20 minutes. Does Joe Programmer even have a chance?
Here are some excerpts from a comment I got internally at Mark Logic via email.1. I really like the phrase “XML-extended relational database”2. I think the tax analogy would be closer if you were looking for someone to do your US taxes in French3. I disagree that this is obviously DB2 – both XMLQUERY and CONTAINS are part of the SQL standard (SQL/XML and SQL/MM respectively), and IBM does a pretty good job of sticking to the standards.4. The semantics of this query are not what you think, because there’s no link between the XMLQUERY in the SELECT clause and the CONTAINS predicate in the WHERE clause. The WHERE clause gets you to a list of rows, and the SELECT clause pulls stuff out of those rows – but when the SELECT pulls stuff out, it has no knowledge of what it was about those rows that matched, i.e. it only sees the rows. That’s because SQL was built for data where each cell (row/column intersection) has only a single value (“hello world”, or 42, or 2007-11-08) – it’s not suited to dealing with XML, where there are lots of values inside a complex structure at each cell.Even if full-text is not involved, to guarantee the right semantics you have to do an XMLEXISTS in the WHERE clause and repeat the predicate you’ve got in the SELECT clause.Here’s an example from the Oracle 11g documentation. I like it because it’s the simplest possible case – Purchase Orders data, and no full-text.SELECT XMLCast(XMLQuery(‘$p/PurchaseOrder/Reference’ PASSING OBJECT_VALUE AS “p”RETURNING CONTENT)AS VARCHAR2(30)) “Reference”FROM purchaseorderWHERE XMLExists(‘$p/PurchaseOrder[User=”SBELL”]’ PASSING OBJECT_VALUE AS “p”);Of course, this means the same as something like $p/PurchaseOrder[User=”SBELL”]/Reference/text().And there are some situations where this kind of Frankenquery will give you unexpected (“wrong”) results.