Everyone knows that content is largely unmanaged, untapped as an enterprise resource. We’ve all heard the soundbite (usually attributed to IDC) that 80% of enterprise information is unstructured and not kept in a database. We all believe there is a lot of value in tapping into that content.
So, we’re all on the same mission, right? We’re all seeking the same Grail. Or are we?
As a long-time data guy, I know well the Grail that data people seek — it’s the data/content integration Grail. The stump speech I used at Business Objects went something like this.
“Oh yeah, there’s that other stuff — content, I think it’s called — that doesn’t fit in your relational databases, so you can’t access it with our BI tools. Well, I guess it’s obvious what you want to do with that content — structure it up and summarize it, so you can shove it in the warehouse right alongside your data … and then make reports on it using our tools.”
When you have a $1B data business, you see content through data-colored glasses.
The example I used in my past life was customer service emails. “What you really want is to take the hundreds of feedback emails that flow in every week and make a cross-tab report that groups them along two dimensions: by product and by tone (i.e., positive, negative, neutral). In that way, you can take a mass of otherwise useless, unstructured content and use it to enrich your existing dashboards and reports.”
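To make that cross-tab concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the product names, the sample emails, and especially the keyword lists, which are a toy stand-in for what a real text analytics tool would do.

```python
import re
from collections import Counter

# Hypothetical keyword lists; a real text analytics tool would be far more robust.
POSITIVE = {"love", "great", "thanks"}
NEGATIVE = {"broken", "refund", "terrible"}
PRODUCTS = ("WidgetA", "WidgetB")  # made-up product names


def tone(text):
    """Classify an email as positive, negative, or neutral by keyword count."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    score = len(words & POSITIVE) - len(words & NEGATIVE)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def product(text):
    """Return the first known product mentioned in the email, else 'unknown'."""
    lowered = text.lower()
    for name in PRODUCTS:
        if name.lower() in lowered:
            return name
    return "unknown"


def crosstab(emails):
    """Count emails along two dimensions: (product, tone)."""
    return Counter((product(e), tone(e)) for e in emails)


# Invented sample emails, standing in for the weekly feedback inbox.
emails = [
    "I love WidgetA, great job!",
    "WidgetB arrived broken. I want a refund.",
    "When does WidgetA ship?",
    "WidgetA is terrible.",
]

table = crosstab(emails)
for (name, label), count in sorted(table.items()):
    print(f"{name:10} {label:10} {count}")
```

The resulting counts are ordinary rows of structured data, which is exactly why they can be loaded into a warehouse and reported on like any other table.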
It’s a good example. It’s a real example. Lots of people want to do it. But it is not the only example. Integration with data is not the sole reason to unlock content. Many important content applications have little or no data/content integration angle:
- Custom publishing
- Content delivery
- Contract management
- Content integration
- Knowledge management
- RFP management
- Technical publications
- Financial publishing
- Search and discovery
- Archiving
- Content intelligence
Just to name a few.
If you have a 2 TB data warehouse and you want to summarize some unstructured content into data and load it alongside your existing tables, then you should acquire a text analytics tool to do the summarization and store the resulting data in your data warehouse.
If, on the other hand, you are working on systems that need to query, manipulate, and render content (such as those listed above) then I’d argue you are seeking a different Grail. It’s about content for content’s sake … and not about turning content into data.
So the question is not “Do you seek the Grail?” but rather “Which Grail do you seek?”
Great points made. I think the grail for unstructured and semi-structured data is to be able to translate it in such a way that others in the value chain can re-use it for their purposes. It could be reports, data feeds, integration with other applications, etc.