In creating my presentation for this past December’s Mark Logic 2010 Digital Publishing Summit, I had a “creative moment” when I made a slide that made me think: wow, perhaps I’m onto something here.
The slide was a list of six things that publishers should be able to do with their content. For this blog post, I’d say the scope includes publishers of any ilk: professional publishers whose content is their business, and “accidental” publishers, i.e., enterprises whose primary business is not content publishing but where content nevertheless plays a mission-critical role (e.g., doctrine for the Army, in-flight manuals for airlines, or maintenance procedures for medical devices such as PET scanners).
So, if content either is your business or is mission-critical to it, then here are the six things you should be able to do with it:
- Integrate it. Content is more valuable when it’s integrated with other content. Typically this means putting it in one place and then transforming it, over time, to a common structure/schema. Note that many systems impose a “big bang” approach that requires 100% cleansed content as the first step. This artificial technology constraint dooms many projects to failure because that first step’s a doozy and is typically never completed before the business runs out of budgetary patience. Instead of trying to clean the Augean Stables as step one, adopt a lazy approach to content transformation, cleansing, and enrichment.
- Enrich it. Content can be made more valuable by enriching it: using text-mining tools to identify entities such as people and places, phone numbers and credit card numbers, geopolitical organizations, or diseases and symptoms. No matter which entities are important to your content, the odds are you can find a text-mining tool that will identify them. But, whatever you do, don’t extract the entities from your content by loading them into relational tables that say “document 17 mentions Paris.” Instead, enrich the content itself through the addition of in-line markup that says <city>Paris</city> directly in the text. In-line markup allows for much more powerful queries than entity extraction because the entity stays attached to the text and structure around it. So don’t extract from your content; enrich it, instead.
- Slice and dice it. Much as good business intelligence tools let you slice and dice data, so should good content tools let you slice and dice content. Slicing and dicing means pulling content in any way that you want. You want all the section headers to dynamically build a table of contents? Great. You want only the figures and captions? Great. You want all the chapters in a corpus sorted by relevancy to a specific phrase? Great. You want the abstracts of articles written by a given author in a certain time period? Great. Slicing and dicing content means querying it along any dimension you want, instantly. When you can slice and dice content, you can repurpose it into new products in virtually unlimited ways.
- Deliver it. You should be able to deliver content from one repository to all of your distribution channels: web, print, feeds (e.g., RSS, Atom), BlackBerries, iPhones, the Kindle, other e-readers, 508-compliant readers, other phones, and, heck, even the rumored iTablet. The point of multi-channel publishing is to fully separate formatting from structure so that you can dynamically render content from a central repository to all of the various forms (some existing and many not yet existing) that your content consumers want. The rapid transformation of XML is key to delivering on this vision.
- Analyze it. Today’s readers don’t just want to consume content; they want to surf it and analyze it. They want dynamic wordclouds or tagclouds. They want to do frequency analysis. They may want to analyze co-occurrence, e.g., between side-effects and drugs or symptoms and diseases. They want to count results and to slice and dice those counts using facets. They want to be able to feed visualization tools to create interfaces such as hyperbolic trees. It’s no longer enough to simply locate and deliver content: your consumers and your internal producers alike want statistics, both to learn more from the content and to determine who’s reading what to assist in future planning.
- Contextualize it. The Holy Grail of publishing is to put content in context. For example, rather than teaching a pilot a table of information about descent rates at various altitudes, give him the one descent rate recommended for the specific airplane he’s flying at a specific altitude. Instead of dumping a tome of slides under various stains on a pathologist, give him an application that walks him through the process of differential diagnosis of a given tumor. Instead of dumping documentation on service personnel, give them a laptop that outlines the exact steps, specific to a given make, model, and unit, for performing maintenance on an expensive medical device. Instead of a generic lesson for a student, intermix content and exercises in a way that’s specific and optimal for their apparent knowledge.
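To make the “enrich” and “slice and dice” points a bit more concrete, here is a minimal sketch in Python using the standard library’s `xml.etree.ElementTree`. Everything in it is an assumption for illustration: the sample document, the element names (`doc`, `section`, `title`, `para`, `city`, `person`, `figure`, `caption`), and the helper functions are hypothetical, not a real schema or any particular product’s API. It only shows that once entities live in-line in the content, the same document can answer structural questions (titles, captions) and entity questions (which sections mention Paris, how often each city appears) without a separate extraction table.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical enriched content: the entity markup (<city>, <person>) sits
# in-line in the text rather than in a separate "document 17 mentions Paris" table.
SAMPLE = """
<doc>
  <section>
    <title>Arrival</title>
    <para><person>Ada Lovelace</person> flew to <city>Paris</city> in May.</para>
    <figure><caption>Route map</caption></figure>
  </section>
  <section>
    <title>Departure</title>
    <para>The team left <city>Lyon</city> for <city>Paris</city>.</para>
  </section>
</doc>
"""


def table_of_contents(root):
    """Slice: pull every section title, in document order."""
    return [title.text for title in root.findall("./section/title")]


def captions(root):
    """Slice: pull only the figure captions."""
    return [caption.text for caption in root.iter("caption")]


def sections_mentioning(root, tag, value):
    """Combine structure and entities: titles of sections containing <tag>value</tag>."""
    hits = []
    for section in root.findall("./section"):
        if any((e.text or "").strip() == value for e in section.iter(tag)):
            hits.append(section.findtext("title"))
    return hits


def entity_counts(root, tag):
    """Simple frequency analysis over one kind of in-line entity."""
    return dict(Counter((e.text or "").strip() for e in root.iter(tag)))


root = ET.fromstring(SAMPLE)
print(table_of_contents(root))
print(captions(root))
print(sections_mentioning(root, "city", "Lyon"))
print(entity_counts(root, "city"))
```

A real repository would run queries like these across a whole corpus with an index behind them; the sketch just walks one small tree in memory.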
When information providers can do these six things with their content, they are ready to move successfully to the “post web 2.0” online age.