Category Archives: Information and Media

Matt Turner 2010 Predictions from the Information Industry Summit

Just a quick post to highlight a nice one-minute video of Mark Logic’s own Matt Turner, captured making a few predictions at the SIIA’s 2010 Information Industry Summit in snowy New York City.

I’ve embedded it below:

Six Things Publishers Should Be Able To Do With Content

In creating my presentation for this past December’s Mark Logic 2010 Digital Publishing Summit, I had a “creative moment” when I made a slide that made me think: wow, perhaps I’m onto something here.

The slide was a list of six things that publishers should be able to do with their content. For this blog post, I’d say the scope includes publishers of any ilk: professional publishers whose content is their business, and “accidental” publishers, i.e., enterprises whose primary business is not content publishing but where content nevertheless plays a mission-critical role (e.g., doctrine for the Army, in-flight manuals for airlines, or maintenance procedures for medical devices such as PET scanners).

So, if content either is your business or is mission-critical to it, then here are the six things you should be able to do with it:

  • Integrate it. Content is more valuable when it’s integrated with other content. Typically this means putting it in one place and then transforming it, over time, to a common structure/schema. Note that many systems impose a “big bang” approach that demands 100% cleansed content as the first step. This artificial technology constraint dooms many projects to failure because that first step is a doozy and is typically never completed before the business runs out of budgetary patience. Instead of trying to clean the Augean Stables as step one, adopt a lazy (that is, incremental and as-needed) approach to content transformation, cleansing, and enrichment.
  • Enrich it. Content can be made more valuable by enriching it: using text-mining tools to identify entities such as people and places, phone numbers and credit card numbers, geopolitical organizations, or diseases and symptoms. No matter which entities are important to your content, the odds are you can find a text-mining tool that will identify them. But, whatever you do, don’t extract the entities from your content by loading them into relational tables that say “document 17 mentions Paris.” Instead, enrich the content itself through the addition of in-line markup that says <city>Paris</city> directly in the text. In-line markup allows for much more powerful queries than entity extraction, because each entity keeps its position and context in the text. So don’t extract from your content; enrich it, instead.
  • Slice and dice it. Much as good business intelligence tools let you slice and dice data, so should good content tools let you slice and dice content. Slicing and dicing means pulling content in any way that you want. You want all the section headers to dynamically build a table of contents? Great. You want only the figures and captions? Great. You want all the chapters in a corpus sorted by relevancy to a specific phrase? Great. You want the abstracts of articles written by a given author in a certain time period? Great. Slicing and dicing content means querying it along any dimension you want, instantly. When you can slice and dice content, you can repurpose it into new products in virtually unlimited ways.
  • Deliver it. You should be able to deliver content from one repository to all of your distribution channels: web, print, feeds (e.g., RSS, Atom), BlackBerries, iPhones, the Kindle, other e-readers, 508-compliant readers, other phones, and (heck) even the rumored iTablet. The point of multi-channel publishing is to fully separate formatting from structure so that you can dynamically render content from a central repository to all of the various forms, some existing and many not yet existing, that your content consumers want. The rapid transformation of XML is key to delivering on this vision.
  • Analyze it. Today’s readers don’t just want to consume content, they want to surf it and analyze it. They want dynamic wordclouds or tagclouds. They want to do frequency analysis. They may want to analyze co-occurrence — e.g., between side-effects and drugs or symptoms and diseases. They want to count results and to slice and dice those counts using facets. They want to be able to feed visualization tools to create interfaces such as hyperbolic trees. It’s no longer enough to simply locate and deliver content: both your consumers and your internal producers want statistics both to learn more from the content and to determine who’s reading what to assist in future planning.
  • Contextualize it. The Holy Grail of publishing is to put content in context. For example, rather than teaching a pilot a table of information about descent rates at various altitudes, give him the one descent rate recommended for the specific airplane he’s flying at a specific altitude. Instead of dumping a tome of slides under various stains on a pathologist, give him an application that walks him through the process of differential diagnosis of a given tumor. Instead of dumping documentation on service personnel, give them a laptop that outlines the exact steps, specific to a given make, model, and unit, for performing maintenance on an expensive medical device. Instead of giving a student a generic lesson, intermix content and exercises in a way that’s specific and optimal for their apparent knowledge.
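
To make the “enrich it” and “slice and dice it” points a bit more concrete, here is a minimal sketch in Python using only the standard library. The document, the element names (article, section, header, para, city, person), and the queries are all made up for illustration; this is not MarkLogic’s API or any particular product’s schema, just one way to see what in-line markup buys you.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical enriched article: entities are tagged in-line, inside the prose.
DOC = """
<article>
  <title>Spring Travel Notes</title>
  <section>
    <header>Arrival</header>
    <para>We landed in <city>Paris</city> and called <person>Anne Dupont</person>.</para>
  </section>
  <section>
    <header>Onward</header>
    <para>The train to <city>Lyon</city> left from <city>Paris</city> at noon.</para>
  </section>
</article>
"""

root = ET.fromstring(DOC)

# Slice: pull every section header to build a table of contents.
toc = [h.text for h in root.iter("header")]

# Because the markup lives in the text, a query can combine an entity with
# its location: which sections mention the city Paris?
paris_sections = [
    s.findtext("header")
    for s in root.iter("section")
    if any(c.text == "Paris" for c in s.iter("city"))
]

# Co-occurrence: paragraphs that mention both a city and a person.
city_and_person = [
    "".join(p.itertext())
    for p in root.iter("para")
    if p.find("city") is not None and p.find("person") is not None
]

# Simple facet-style count of city mentions across the document.
city_counts = Counter(c.text for c in root.iter("city"))

print(toc)
print(paris_sections)
print(city_and_person)
print(city_counts)
```

A relational table that only records “document 17 mentions Paris” could answer the last count, but not the section-level or same-paragraph questions, since the position of each mention has been thrown away.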

When information providers can do these six things with their content, they are ready to move successfully to the “post web 2.0” online age.

The Downturn: Accelerating the Digital Publishing Transition

As part of my company’s focus on the media industry, I sit on a few industry groups where I have the opportunity to spend quality time with senior media and publishing industry executives.

Like any CEO, I have a natural tendency to believe that my company is, if not totally counter-cyclical, at least somewhat immune to the effects of the economic downturn. I’ve heard enough CEOs make the claim (cf: this query), often where it’s patently absurd, that I should ask myself whether I have a case of CEO denial. Am I arguing something akin to “the rise in bedbugs is good for the hotel industry”?

So when a recent publishing executive group I sit on started to discuss the economic downturn, I turned up my defenses to make sure I didn’t have my happy ears on.

But executive after executive said that they believed the downturn is accelerating the digital publishing transformation. Not because I said it. Not because, as a technology supplier that helps companies transition, I want it to be true. But because about a dozen senior folks from many different publishing sectors said it.

Why?

  • Foot-dragging in some publishing sectors has already gone on almost a decade, slowly whittling away at the traditional models and those who support them.
  • As the decade has passed, the top brass at publishers continues to change, slowly replacing less tech savvy executives with more tech savvy ones.
  • Enough time has passed that there are now examples of both new and traditional publishing companies who have successfully transitioned business models. The “it can’t be done” rationalization starts to wear thin.
  • Hands are being forced. Seeking to cut costs, publishers are forced to make real trade-offs between investing in the future and preserving the past. When forced, most executives will bet on the future.

Now that I see the picture, it’s clear: after roughly a decade of fence-sitting, the downturn is forcing publishers of all ilks to move. The downturn is accelerating the transition to digital publishing. And that’s not happy ears.

Hard Times Strategies for Publishers

I just stumbled into this pithy post from Greenhouse Associates, a boutique strategy consultancy that serves firms in the information and media market. The post, entitled Counter-Intuitive Tactics for Bad Times, lists seven non-obvious tactics that companies should consider when managing through tough times.

The list is below, along with a brief parenthetic comment on each item:

  • Invest in product development, not sales. (We like this one since MarkLogic Server is often sold to publishers as a platform for new product development.)
  • Turn salespeople into consultants. (A good idea at any time, but a necessary one in tough times.)
  • Put your customer first. (Ditto. Information companies have such a long history of product-centricity that the transition to customer/solution-centricity is a big one.)
  • Build value through relationships as well as products. (Complement product with service and the relationships built in the process.)
  • Look for evergreen and counter-cyclical sectors. (Example: bankruptcy and foreclosure lawyers are having a field day.)
  • Cut costs with a scalpel, not a hatchet. (My first reaction to an across-the-board cut is that management either couldn’t or didn’t take the time to figure out a more strategic way to do its job.)
  • Be ready for black swans. (Life is discontinuous. Yes.)

The full article is here.

StartWithXML Early Survey Results

I’ve previously blogged about the StartWithXML project that O’Reilly is working on with the folks at Idea Logical.

Overall, the project reminds me of the California Milk Advisory Board: get a bunch of dairy farmers together to push an idea they can all agree on, namely, eat California cheese. (Which, by the way, was articulated in my favorite way through the famous Grandma commercial.)

Here, instead of dairy farmers, it’s content and publishing vendors (e.g., codeMantra, Jouve / Publishing Dimensions, Klopotek, Firebrand, and Really Strategies). But the idea is similar — get a bunch of vendors together who can agree on one thing — in this case, starting with XML — and go push that idea.

Towards that end, the project is doing a few things. First, they’re hosting a one-day forum in New York City on January 13, 2009. Second, they recently ran an educational webcast entitled Essential Tools of an XML Workflow, slides below.

They’ve run a survey and are producing a research report as well. Below are some slides that highlight selected results from the survey.

Some takeaways:

  • Remember that “trade” publishers in this context means book publishers.
  • Note that digital publishing is “very important” to 40% of non-trade publishers, but only 18% of trade publishers. This is scary on both sides. It’s a bit sobering to think that it’s 2009 and only 1 in 5 book publishers thinks digital is very important.
  • 43% of trade publishers say they are trying to “understand the importance” of digital publishing. Another yikes.
  • Trade publishers are twice as likely to ignore downstream use and twice as likely to edit with a print focus.
  • Trade publishers are half as likely as others to use XML in the production process.

Now, none of this is a big surprise to those who work with the information and media market. The clear leaders in XML adoption were STM publishers (e.g., Elsevier), followed by those in other segments like education and B2B trade. At the mid tier, you see folks like legal, tax, and regulatory publishers and market researchers. Bringing up the rear you have consumer magazines, news, and trade publishers.

While some trade publishers (e.g., Simon and Schuster) are strong adopters of XML, it seems that most others are way behind. This will get increasingly dangerous as the Kindle takes off (I’m a user and a big fan) and the Google Books settlement turns Google into an Amazon-rival online bookseller, overnight.

If you can’t output for the Kindle, pretty soon a lot of people won’t be buying your books. Right now, a quick search reveals about 200K titles for the Kindle out of 24M total on Amazon, but that number will be increasing fast. And if you can’t output in the appropriate format for Google Books to ingest your content, then for many customers, your books won’t even exist.

Trade publishers need to get moving to enable flexible output to both different print formats (e.g., large print, library editions) and e-book formats. The good news? 46% of trade publishers believe their business will benefit from publishing in more e-book formats, and nearly 70% say print-to-web processes are problematic or need to be fixed soon.

I wonder if moving from scrolls to paper was as difficult. Well, I suppose it was.

XML: Why You Should Care

The folks at O’Reilly Media have created an excellent blog around their ToC (Tools of Change for Publishing) meme and event. As part of that, they are running a series called StartWithXML that has some excellent material on the topic of XML and publishing.

One of the first posts in the StartWithXML project is entitled Why You Should Care About XML by Andrew Savikas, with whom I had the pleasure of speaking on a panel at the Gilbane conference in San Francisco a few months back. Excerpt:

But there are several reasons why it’s really really important for publishers to start paying attention to XML right now, and across their entire workflow:

  • XML is here to stay, for the reasonably foreseeable future. While it’s always dangerous to attempt to predict expiration dates on technology, I think it’s fair to assume XML will have a shelf life at least as long as ASCII, which has been with us for more than 40 years, and isn’t going anywhere soon.
  • Web publishing and print publishing are converging, and writing and production for print will be much more influenced by the Web than vice-versa. It will only get harder to succeed in publishing without putting the Web on par with (or ahead of) print as the primary target. The longer you wait to get that content into Web-friendly and re-usable XML, the worse.

Many in publishing balk at bringing XML “up the stack” to the production, editing, or even the authoring stage. And with good reason; XML isn’t really meant to be created or edited by hand (though a nice feature is that in a pinch it easily can be). There are two places to look for useful clues about how XML will actually fit into a publisher’s workflow: Web publishing and the “alpha geeks.”

He then goes on to examine both web publishing and alpha geek behavior in order to provide a lay of the future publishing land. See the post for more.

O’Reilly is also hosting a StartWithXML one-day forum in New York City on 1/13/09 at the McGraw-Hill Auditorium.

Mark Logic in EContent Magazine Dynamic Navigation Story

A rather overdue post to highlight that Mark Logic was featured a few months back in an EContent Magazine story entitled Reaping Information: Dynamic Navigation Helps Users (PDF).

Excerpts:

Delivering information in ways that make the most sense to users is a key characteristic of MarkLogic Server, an XML Server that allows users to store, manage, manipulate, and deliver information

Indeed, a key use-case for MarkLogic is as an information delivery platform. More:

Media company ALM uses MarkLogic Server for its enterprise content repository, which holds more than 2 decades worth of news and analysis for and about the legal market.

ALM was acquired by Incisive Media a while back but nevertheless remains a customer. More:

Oxford University Press has organized its reference works on African-Americans into a central repository it calls the African American Studies Center (AASC), which allows researchers the ability to search through images and articles, arranging them in chronological order.

AASC is not only a very cool MarkLogic-based application, but also — perhaps more importantly — it’s just one slice of Oxford’s content.

Once a publisher builds their content application platform, it is relatively easy to take different slices of their content to build new and different information products. For example, Oxford Islamic Studies Online (OISO) is built on the same platform as the AASC, and I’m sure the OISO’s marginal development cost was reduced because it could leverage the fixed costs invested in the development of OUP’s (MarkLogic-based) publishing platform.