Category Archives: content architecture

Mark Logic in EContent Magazine Dynamic Navigation Story

A rather overdue post to highlight that Mark Logic was featured a few months back in an EContent Magazine story entitled Reaping Information: Dynamic Navigation Helps Users (PDF).

Excerpts:

Delivering information in ways that make the most sense to users is a key characteristic of MarkLogic Server, an XML Server that allows users to store, manage, manipulate, and deliver information

Indeed, a key use-case for MarkLogic is as an information delivery platform. More:

Media company ALM uses MarkLogic Server for its enterprise content repository, which holds more than 2 decades worth of news and analysis for and about the legal market.

ALM was acquired by Incisive Media a while back but nevertheless remains a customer. More:

Oxford University Press has organized its reference works on African-Americans into a central repository it calls the African American Studies Center (AASC), which allows researchers the ability to search through images and articles, arranging them in chronological order.

AASC is not only a very cool MarkLogic-based application, but also — perhaps more importantly — it’s just one slice of Oxford’s content.

Once a publisher builds their content application platform, it is relatively easy to take different slices of their content to build new and different information products. For example, Oxford Islamic Studies Online (OISO) is built on the same platform as the AASC, and I’m sure the OISO’s marginal development cost was reduced because it could leverage the fixed costs invested in the development of OUP’s (MarkLogic-based) publishing platform.

Lazy XML Enrichment

One of my big gripes with most content-oriented software is that it requires a big bang approach (see The First Step’s a Doozy). The basic premise behind most content software is roughly:

1. If you do all this hard work to perfectly standardize the schema of your content, perfectly tag it, and possibly perfectly shred it, then

2. You can do cool stuff like content repurposing, content integration, multi-channel content delivery, and custom publishing.

The problem is, of course, that the first step is lethal. Many content software projects blow up on the launchpad because they can’t get beyond step 1. Our first customer had been stuck on step 1 for 18 months with Oracle before they found Mark Logic. (We loaded their content in a week.) At a recent Federal tradeshow, we had dinner with some folks from Booz Allen who’d been trying to load some semi-structured message traffic data into a relational database for months. We told them to swing by our booth the next day. Our sales engineer then loaded their content over a cup of coffee while eating a muffin and built a basic application in an hour. They couldn’t believe it.

In most companies, even publishers, content is a mess. It’s in 100 different places in 15 different formats, and each defined format is usually more of an aspiration than a standard. Once, at a multi-billion dollar publisher, one of our technical guys actually found this sentence in some internal documentation: “it is believed that this tag is used to …” Only folklore describes the schema.

So when it comes to the general problem of making XML richer (i.e., having more tags that indicate more meaning), many people take the same big-bang approach: “Well, step 1 would be to put all the content into a single schema (which alone could kill you) and run it through a dozen different entity, fact, sentiment, concept, and summarization ‘extractors’ that can mark up the content and fragments of it with lots of new and powerful tags (which alone could cost millions).”

Again, step 1 becomes lethal.

At Mark Logic we advocate that people consider the opposite approach. Instead of:

  • Step 1: make the content perfect so you can enable any application you want to build
  • Step 2: build an application

We say:

  • Step 1: figure out the application you want to build
  • Step 2: figure out which portions of your markup need to be improved to build that application
  • Step 3: improve only that markup, sometimes manually, sometimes with extraction software, and sometimes with heuristics (i.e., rules of thumb) coded in XQuery
  • Step 4: build your application and get some business value from it
  • Step 5: repeat the process, driven by subsequent application requirements

I call this lazy XML enrichment. You could call it application-driven, as opposed to infrastructure-driven, content cleanup. I think it’s an infinitely better approach because it delivers business results faster and eliminates the risk of either never finishing the first step because it’s impossible, or having funding yanked by the business because it runs out of patience with an IT project that’s showing no visible progress.

At this point, I’d like to direct those of technical heart to Matt Turner’s Discovering XQuery blog, where he provides a detailed post (code included) that shows an example of lazy, heuristic-based XML enrichment.

  • Matt’s example shows lazy enrichment because the only markup he needs for his desired application is related to weapons, so that’s all he adds.
  • Matt’s example is heuristic-based because he devises a way to find weapons in XQuery, and then uses XQuery to tag them as such.
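
To give a feel for what heuristic-based enrichment looks like in practice, here is a minimal sketch. It is not Matt’s code and it is written in Python rather than XQuery. The element names (para, weapon), the term list, and the function names are all assumptions made up for illustration. The heuristic is deliberately crude: match a short list of known terms and wrap each hit in a new tag, touching nothing else in the document.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical term list: a real project would derive it from what the
# target application actually needs to find.
WEAPON_TERMS = ["rifle", "grenade", "mortar"]
PATTERN = re.compile(
    r"\b(" + "|".join(map(re.escape, WEAPON_TERMS)) + r")s?\b",
    re.IGNORECASE,
)


def tag_weapons(elem):
    """Wrap each weapon term in elem's text in a <weapon> child element.

    Only handles elements that have plain text and no children, which
    keeps the mixed-content bookkeeping simple for this sketch.
    """
    text = elem.text or ""
    matches = list(PATTERN.finditer(text))
    if not matches:
        return 0
    # Text before the first match stays as the element's leading text.
    elem.text = text[:matches[0].start()]
    for i, match in enumerate(matches):
        weapon = ET.Element("weapon")
        weapon.text = match.group(0)
        # Text between this match and the next becomes the new tag's tail.
        next_start = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        weapon.tail = text[match.end():next_start]
        elem.insert(i, weapon)
    return len(matches)


def enrich(xml_text):
    """Add <weapon> markup to every childless <para>; leave the rest alone."""
    root = ET.fromstring(xml_text)
    for para in list(root.iter("para")):
        if len(para) == 0:
            tag_weapons(para)
    return ET.tostring(root, encoding="unicode")


sample = "<report><para>Two rifles and a mortar were found.</para></report>"
print(enrich(sample))
```

The point of the sketch is the scope, not the regular expression: only the markup the application needs gets added, and the rest of the content stays exactly as messy as it was.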

What's At The Center of Your Content Architecture?

I had dinner a few weeks back in Boston with the folks from Harvard Business School Publishing (HBSP).

The restaurant, Mare, had the absolutely unique positioning of “organic, coastal Italian.” (Try the spaghettoni and the bread pudding.) The wine list was super but they lacked cocktails, presumably a victim of the Boston liquor license shortage.

We had an interesting conversation about “content architecture”: that is, in a complete multi-channel publishing system, what software and tools should be used where, and in which roles, to achieve business ends. Put concretely, where and how do you handle everything from:

  • Authoring
  • Workflow
  • Content management
  • Content transformation
  • Content enrichment
  • Content delivery
  • Rights management
  • Subscription and billing management
  • Merchandising and cross-selling
  • Search
  • Database

At one point, the HBSP folks asked me a seemingly simple question: “Dave, what do you think should be at the center of your content architecture?”

Sensing a trick question, I hesitated. “Content?” I dared.

“That seems logical to me, too,” they said. “But you know what? Content is almost never at the center of a content architecture. The center is always about some relational database or some ecommerce system or some rights management package.”

Then it struck me: this is another thing that MarkLogic enables: we let you put content at the center of your content architecture.

That’s what Elsevier does. That’s what O’Reilly does. That’s what Oxford University Press does.

Amazingly, in large part due to the type of tools available, content has ended up a second-class citizen in a content architecture. It was one of those observations that was so obvious I’d never seen it before.

So, question for you: what’s at the center of your content architecture?