Thanks to a Google Alert, I stumbled onto an interesting post entitled "The Content Imperative: Unlearning the Relational Model" on another CEO blog, that of Joel Amoussou of Montreal-based Efasoft.
The following are some fundamental differences between content and relational data:
- Content is created to be human readable
- Content can be rendered in multiple presentation formats such as print, web, and wireless devices. Therefore it is very important to cleanly separate content from presentation
- Content can have an inherent deep hierarchical structure. For example, think about the book/part/chapter/section/subsection/paragraph hierarchy
- The relationships between content items are expressed through hierarchical containment and hyperlinks
- Content is often mixed (in the sense of mixed content in XML). For example inside a paragraph, some words are italicized, in bold, or underlined to indicate special meaning
- Content can have multi-valued properties such as the authors of a document. Multi-valued properties are not supported by SQL.
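To make a few of these points concrete, here is a minimal sketch using Python's standard `xml.etree.ElementTree`. The document, its element names, and the author names are all hypothetical. It shows a deep hierarchy (book/chapter/section/para), mixed content (text interleaved with inline `<i>` and `<b>` markup), and a multi-valued property expressed as repeated `<author>` elements.

```python
import xml.etree.ElementTree as ET

# Hypothetical book fragment: nested structure, mixed content inside
# <para>, and a multi-valued "authors" property as repeated elements.
DOC = """
<book>
  <author>Alice</author>
  <author>Bob</author>
  <chapter>
    <section>
      <para>XML keeps <i>structure</i> and <b>meaning</b> together.</para>
    </section>
  </chapter>
</book>
"""

root = ET.fromstring(DOC.strip())

# Multi-valued property: just collect every repeated child element.
authors = [a.text for a in root.findall("author")]

# Mixed content: itertext() yields all text nodes in document order,
# including text inside the inline elements and the text between them.
para = root.find("chapter/section/para")
plain_text = "".join(para.itertext())

# The inline markup is still addressable as elements in its own right.
emphasized = [(child.tag, child.text) for child in para]

print(authors)      # ['Alice', 'Bob']
print(plain_text)   # XML keeps structure and meaning together.
print(emphasized)   # [('i', 'structure'), ('b', 'meaning')]
```

Flattening the same paragraph into relational columns would mean either storing the markup as an opaque string or splitting it into rows of text runs, which is exactly the friction the list describes.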
He continues, starting an argument in favor of XML:
The problem with unstructured content is that it cannot be processed and queried like the well-structured relational data stored by the RDBMS on which your ERP and CRM systems sit. XML goes beyond tags (in the web 2.0 sense), taxonomies, full-text search, and content categorization to provide fine-grained content discovery, query, and processing capabilities. With XML, the document becomes the database. If your business is content (you are a media company, a publisher, or the technical documentation department of a manufacturing company), then you should seriously consider the benefits of XML in terms of content longevity, reuse, repurposing, and cross-media publishing.
And goes on to discuss XQuery:
The relational data model is based on set theory and predicate logic. Data is represented as n-ary relations and manipulated with relational algebra. CMS vendors and even standard bodies have tried to fork SQL in order to support hierarchies and multi-value properties. It is clear however that XQuery is a superior alternative, specifically designed to address those content-related concerns.
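As a rough illustration of what querying by hierarchy looks like, the sketch below uses the limited XPath subset in Python's `xml.etree.ElementTree` as a stand-in, since the standard library has no XQuery engine. The document and its `id` values are hypothetical; a roughly equivalent XQuery expression appears in a comment for comparison only and is not executed.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: parts contain chapters, chapters contain sections.
DOC = """<book>
  <part id="p1">
    <chapter id="c1">
      <title>Getting Started</title>
      <section id="s1"><title>Installing</title></section>
      <section id="s2"><title>Configuring</title></section>
    </chapter>
  </part>
  <part id="p2">
    <chapter id="c2">
      <title>Reference</title>
      <section id="s3"><title>Options</title></section>
    </chapter>
  </part>
</book>"""

root = ET.fromstring(DOC)

# One path expression walks the containment hierarchy: every section
# title at any depth beneath the part whose id is "p1".
titles = [t.text for t in root.findall("./part[@id='p1']//section/title")]

# Roughly equivalent XQuery (not executed here):
#   for $t in /book/part[@id = "p1"]//section/title return string($t)

print(titles)  # ['Installing', 'Configuring']
```

In a purely relational schema the same question typically needs a parent/child table plus a recursive join, whereas the path expression states the containment directly.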
And finally, he argues in favor of XML databases over a JCR repository when dealing with large amounts of content:
You should seriously consider a native XML database when dealing with large quantities of document-oriented XML documents.
I couldn’t agree more. (Hey, I think I like this guy). The post also includes some discussion of data vs. content modeling and some interesting parallel history between SGML/XML and the RDBMS.