Unlearning the Relational Model

Thanks to a Google Alert, I stumbled onto an interesting post entitled "The Content Imperative: Unlearning the Relational Model" on another CEO blog, that of Joel Amoussou of Montreal-based Efasoft.

Says Joel:

The following are some fundamental differences between content and relational data:

  • Content is created to be human readable
  • Content can be rendered in multiple presentation formats such as print, web, and wireless devices. Therefore it is very important to cleanly separate content from presentation
  • Content can have an inherent deep hierarchical structure. For example, think about the book/part/chapter/section/subsection/paragraph hierarchy
  • The relationships between content items are expressed through hierarchical containment and hyperlinks
  • Content is often mixed (in the sense of mixed content in XML). For example inside a paragraph, some words are italicized, in bold, or underlined to indicate special meaning
  • Content can have multi-valued properties such as the authors of a document. Multi-valued properties are not supported by SQL.
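To make a few of those points concrete, here is a minimal sketch in Python using the standard library's `xml.etree.ElementTree`. The document and its element names (`book`, `author`, `chapter`, `section`, `para`) are my own made-up illustration, not anything from Joel's post. It shows a multi-valued property (two authors), hierarchical containment, and mixed content inside a paragraph.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: element names and text are illustrative assumptions.
DOC = """
<book>
  <author>Ada Example</author>
  <author>Bob Sample</author>
  <chapter>
    <section>
      <para>Some words are <i>italicized</i> or in <b>bold</b> for emphasis.</para>
    </section>
  </chapter>
</book>
"""

root = ET.fromstring(DOC)

# Multi-valued property: the same element simply repeats under its parent.
authors = [a.text for a in root.findall("author")]

# Hierarchical containment: walk book/chapter/section/para by path.
para = root.find("chapter/section/para")

# Mixed content: text and inline elements are interleaved inside one paragraph.
plain_text = "".join(para.itertext())
inline_tags = [child.tag for child in para]

print(authors)
print(plain_text)
print(inline_tags)
```

In a relational schema the authors would typically move to a separate table and the inline markup would be stored as an opaque string; here both stay part of the document structure.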

He continues, starting an argument in favor of XML:

The problem with unstructured content is that it cannot be processed and queried like the well-structured relational data stored by the RDBMS on which your ERP and CRM systems sit. XML goes beyond tags (in the web 2.0 sense), taxonomies, full-text search, and content categorization to provide fine-grained content discovery, query, and processing capabilities. With XML, the document becomes the database. If your business is content (you are a media company, a publisher, or the technical documentation department of a manufacturing company), then you should seriously consider the benefits of XML in terms of content longevity, reuse, repurposing, and cross-media publishing.
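As a rough illustration of "the document becomes the database," the sketch below queries a small XML document directly. It uses Python's `xml.etree.ElementTree`, which supports only a limited subset of XPath and is not a native XML database or an XQuery engine; the `manual` document and its `title` and `audience` attributes are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical technical manual; structure and attribute names are assumptions.
DOC = """
<manual>
  <chapter title="Installation">
    <section title="Requirements" audience="admin"><para>Check disk space.</para></section>
    <section title="Quick start" audience="user"><para>Run the installer.</para></section>
  </chapter>
  <chapter title="Operation">
    <section title="Backups" audience="admin"><para>Schedule nightly backups.</para></section>
  </chapter>
</manual>
"""

root = ET.fromstring(DOC)

# Find every section aimed at administrators, at any depth in the hierarchy.
admin_sections = [
    s.get("title") for s in root.findall(".//section[@audience='admin']")
]

print(admin_sections)
```

The query addresses individual sections inside the document rather than the document as a whole, which is the kind of fine-grained discovery the quote describes.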

And goes on to discuss XQuery:

The relational data model is based on set theory and predicate logic. Data is represented as n-ary relations and manipulated with relational algebra. CMS vendors and even standard bodies have tried to fork SQL in order to support hierarchies and multi-value properties. It is clear however that XQuery is a superior alternative, specifically designed to address those content-related concerns.

And then finally argues in favor of XML databases over a JCR repository when dealing with large amounts of content:

You should seriously consider a native XML database when dealing with large quantities of document-oriented XML documents.

I couldn’t agree more. (Hey, I think I like this guy.) The post also includes some discussion of data vs. content modeling and some interesting parallel history between SGML/XML and the RDBMS.
