Category Archives: digital publishing

Six Things Publishers Should Be Able To Do With Content

In creating my presentation for this past December’s Mark Logic 2010 Digital Publishing Summit, I had a “creative moment” when I made a slide that made me think: wow, perhaps I’m onto something here.

The slide was a list of six things that publishers should be able to do with their content. For this blog post, I’d say the scope includes publishers of any ilk: professional publishers whose content is their business, and “accidental” publishers, i.e., enterprises whose primary business is not content publishing but where content nevertheless plays a mission-critical role (e.g., doctrine for the Army, in-flight manuals for airlines, or maintenance procedures for medical devices such as PET scanners).

So, if content either is your business or is mission-critical to it, then here are the six things you should be able to do with it:

  • Integrate it. Content is more valuable when it’s integrated with other content. Typically this means putting it in one place and then transforming it, over time, to a common structure/schema. Note that many systems impose a “big bang” approach that demands 100% cleansed content as the first step. This artificial technology constraint dooms many projects to failure because that first step’s a doozy and is typically never completed before the business runs out of budgetary patience. Instead of trying to clean the Augean Stables as step one, adopt a lazy approach to content transformation, cleansing, and enrichment.
  • Enrich it. Content can be made more valuable by enriching it: using text-mining tools to identify entities such as people and places, phone numbers and credit card numbers, geopolitical organizations, or diseases and symptoms. No matter which entities are important to your content, the odds are you can find a text-mining tool that will identify them. But, whatever you do, don’t extract the entities from your content by loading them into relational tables that say “document 17 mentions Paris.” Instead, enrich the content itself through the addition of in-line markup that says <city>Paris</city> directly in the text. In-line markup allows for much more powerful queries than entity extraction. So don’t extract from your content; enrich it, instead.
  • Slice and dice it. Much as good business intelligence tools let you slice and dice data, so should good content tools let you slice and dice content. Slicing and dicing means pulling content in any way that you want. You want all the section headers to dynamically build a table of contents? Great. You want only the figures and captions? Great. You want all the chapters in a corpus sorted by relevance to a specific phrase? Great. You want the abstracts of articles written by a given author in a certain time period? Great. Slicing and dicing content means querying it along any dimension you want, instantly. When you can slice and dice content, you can repurpose it into new products in virtually unlimited ways.
  • Deliver it. You should be able to deliver content from one repository to all of your distribution channels: web, print, feeds (e.g., RSS, Atom), BlackBerries, iPhones, the Kindle, other e-readers, 508-compliant readers, other phones, and, heck, even the rumored iTablet. The point of multi-channel publishing is to fully separate formatting from structure so that you can dynamically render content from a central repository to all of the various forms, some existing and many not yet existing, that your content consumers want. The rapid transformation of XML is key to delivering on this vision.
  • Analyze it. Today’s readers don’t just want to consume content; they want to surf it and analyze it. They want dynamic wordclouds or tagclouds. They want to do frequency analysis. They may want to analyze co-occurrence, e.g., between side-effects and drugs or symptoms and diseases. They want to count results and to slice and dice those counts using facets. They want to be able to feed visualization tools to create interfaces such as hyperbolic trees. It’s no longer enough to simply locate and deliver content: both your consumers and your internal producers want statistics, both to learn more from the content and to determine who’s reading what to assist in future planning.
  • Contextualize it. The Holy Grail of publishing is to put content in context. For example, rather than teaching a pilot a table of information about descent rates at various altitudes, give him the one descent rate recommended for the specific airplane he’s flying at a specific altitude. Instead of dumping a tome of slides under various stains on a pathologist, give him an application that walks him through the process of differential diagnosis of a given tumor. Instead of dumping documentation on service personnel, give them a laptop that outlines the exact steps, specific to a given make, model, and unit, for performing maintenance on an expensive medical device. Instead of a generic lesson for a student, intermix content and exercises in a way that’s specific and optimal for their apparent knowledge.
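
To make the "enrich" and "slice and dice" points a little more concrete, here is a minimal sketch in Python using only the standard library's xml.etree.ElementTree. Everything in it is an assumption made for illustration: the sample document, the element names (book, chapter, title, para, figure, caption, person, city), and the helper function names are all hypothetical, and a real content repository would use its own schema and a proper XML query engine rather than an in-memory tree.

```python
import xml.etree.ElementTree as ET

# Hypothetical enriched document. The <person> and <city> tags stand in for
# in-line markup that a text-mining tool might add; the schema is invented.
DOC = """
<book>
  <chapter>
    <title>Arrivals</title>
    <para><person>Ada</person> flew to <city>Paris</city> in May.</para>
    <figure><caption>Map of the route</caption></figure>
  </chapter>
  <chapter>
    <title>Departures</title>
    <para>The train left <city>Lyon</city> at noon.</para>
  </chapter>
</book>
"""

root = ET.fromstring(DOC)


def table_of_contents(book):
    """Slice: pull every chapter title to build a table of contents."""
    return [t.text for t in book.findall("./chapter/title")]


def captions(book):
    """Slice: pull only the figure captions, wherever they occur."""
    return [c.text for c in book.iter("caption")]


def chapters_mentioning(book, tag, value):
    """Dice: titles of chapters containing an entity such as <city>Paris</city>."""
    return [
        ch.findtext("title")
        for ch in book.findall("chapter")
        if any(e.text == value for e in ch.iter(tag))
    ]


def paragraphs_with_both(book, tag_a, tag_b):
    """Paragraphs whose text contains both kinds of entity, e.g. a person and a city."""
    return [
        "".join(p.itertext())
        for p in book.iter("para")
        if p.find(f".//{tag_a}") is not None and p.find(f".//{tag_b}") is not None
    ]


print(table_of_contents(root))
print(captions(root))
print(chapters_mentioning(root, "city", "Paris"))
print(paragraphs_with_both(root, "person", "city"))
```

The last function is the kind of question that, as I see it, a "document 17 mentions Paris" table struggles with: because the markup sits in the text, a query can ask where in the document two entities appear together, not just whether each appears somewhere.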

When information providers can do these six things with their content, they are ready to move successfully to the “post web 2.0” online age.

Two Great Posts on Media Industry Disruption

I’ve been off filling my brain at the Stanford Graduate School of Business for the past two weeks, so I haven’t been able to post much. I have nevertheless managed to keep my Tweetstream going, so if you’re not already following me on Twitter, you may wish to consider doing so. I am changing my sharing pattern to include more Tweets, having realized that bit.ly makes it very easy to do so and that I only blog on somewhere between 5% and 25% of the topics that I throw on my to-blog list.

On digging through the deluge of RSS articles I found on my return, I located two particularly interesting posts on disruption of the media industry.

The first is a post by Michael Nielsen, a quantum information theorist and seemingly very smart fellow, entitled Is Scientific Publishing About To Be Disrupted?, which includes links to some great posts about the challenges facing newspapers, and provides not only a great general discussion of how industry disruption happens, but also a specific look at media overall and scientific publishing in particular. I’d never heard of Nielsen before, but I’ve already subscribed to his blog because he strikes me as a real Renaissance individual working on fascinating projects like a book on The Future of Science and a series of posts on Google’s Technology Stack, along with the odd post on things like Why The World Needs Quantum Mechanics.

The second is a two-part post on ReadWriteWeb entitled Bits of Destruction Hit the Book Publishing Business (Part 1 and Part 2). These posts focus on three waves rocking the publishing industry (Google Book Search, e-books, and print on demand) and their consequences for various participants in the book publishing value chain. In the end they predict that future book revenues will end up being split 33/33/33 among the author, the (web) publisher, and the e-book or print-on-demand deliverer.

Excerpt:

Here is a bookstore owner’s nightmare. Customer walks in; browses around; has grand old time in this temple of knowledge; peruses a book that costs $27; takes out Kindle and orders it for $17, right there in front of your nose, using your wi-fi connection. Aaagh!

You wake up sweating at 3:00 in the morning

Both posts are well worth reading, so set aside some time for them and be sure to hit lots of the links embedded in the Nielsen post.

The Downturn: Accelerating the Digital Publishing Transition

As part of my company’s focus on the media industry, I sit on a few industry groups where I have the opportunity to spend quality time with senior media and publishing industry executives.

Like any CEO, I have a natural tendency to believe that my company is, if not totally counter-cyclical, at least somewhat immune to the effects of the economic downturn. I’ve heard enough CEOs make the claim (cf: this query), often where it’s patently absurd, that I have to ask myself whether I have a case of CEO denial. Am I arguing something akin to “the rise in bedbugs is good for the hotel industry”?

So when a publishing executive group I sit on recently started to discuss the economic downturn, I turned up my defenses to make sure I didn’t have my happy ears on.

But executive after executive said that they believed the downturn is accelerating the digital publishing transformation. Not because I said it. Not because, as a technology supplier that helps companies transition, I want it to be true. But because about a dozen senior folks from many different publishing sectors said it.

Why?

  • Foot-dragging in some publishing sectors has already gone on almost a decade, slowly whittling away at the traditional models and those who support them.
  • As the decade has passed, the top brass at publishers has continued to change, with more tech-savvy executives slowly replacing less tech-savvy ones.
  • Enough time has passed that there are now examples of both new and traditional publishing companies that have successfully transitioned business models. The “it can’t be done” rationalization starts to wear thin.
  • Hands are being forced. Seeking to cut costs, publishers are forced to make real trade-offs between investing in the future and preserving the past. When forced, most executives will bet on the future.

Now that I see the picture, it’s clear: after roughly a decade of fence-sitting, the downturn is forcing publishers of all ilks to move. The downturn is accelerating the transition to digital publishing. And that’s not happy ears.

Semantic Technologies at Dow Jones

Matt Turner, a principal consultant in our Information and Media practice, attended the recent New York Semantic Web Meetup and told me about this interesting presentation from Christine Connors, Global Director of Semantic Technology Solutions at Dow Jones. (First off, it’s kinda cool that Dow Jones even has a director of semantic technology.)

Her presentation, entitled An Overview of Semantic Technologies at Dow Jones, follows: