Forbes Interview: Corporate Pack Rats

I recently had a conversation with Ed Sperling of Forbes, who runs their “CIO Chat” column. Today, Forbes ran a story, entitled Corporate Pack Rats, resulting from that interview.

It’s hard to talk about content these days without talking about e-discovery, email archive search (wouldn’t MarkMail be wonderful at that?), and compliance. So the story starts out with a chat about that.

I then go on to one of my new rants: why does everyone want to play offense (think: business intelligence) with their data, but simply play defense (think: records management, e-discovery) with their content? Yes, not going to jail is important, but don’t you believe there’s value in your corporate documents and content that can help you build better products, serve customers better, and improve the efficiency of your operations?

This excerpt summarizes it well:

Do CIOs get this?

Most CIOs? No. The vast majority are still in a place where they’re trying to avoid getting in trouble with their documents.

We later started talking about one of my favorite topics, XML, where there’s another nice excerpt:

Does all the content have to be tagged with XML, because there’s a lot of content that predates XML?

The better the tags, the better the queries. If you want to find all documents that contain the words “bird strike,” any text search engine can do that. If you want to find all documents that classify procedures related to approach, if all that is tagged, you can get a pinpointed result. Without tags, you may learn that somewhere in the 300-page PDF are the words “bird strike.” That’s not very helpful. With the tags, you can increase the precision of searches and their granularity.
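To make the contrast in that excerpt concrete, here is a minimal sketch in Python. The sample manual, its element names (procedure, title, step, note) and its phase attribute are all invented for illustration and are not taken from the interview or from any real schema. It only shows that a text search can say the phrase occurs somewhere, while a tag-based query can return the specific procedures for a given flight phase.

```python
import xml.etree.ElementTree as ET

# Hypothetical tagged manual: every element and attribute name here is an
# assumption made up for this example.
MANUAL = """
<manual>
  <procedure phase="approach" topic="bird strike">
    <title>Bird strike on final approach</title>
    <step>Maintain approach speed and continue to landing.</step>
  </procedure>
  <procedure phase="takeoff" topic="engine failure">
    <title>Engine failure after takeoff</title>
    <step>Follow the engine failure checklist.</step>
  </procedure>
  <note>Report any bird strike to maintenance after landing.</note>
</manual>
"""

root = ET.fromstring(MANUAL)


def text_search(doc, phrase):
    """Plain text search: only reports whether the phrase occurs anywhere."""
    return phrase.lower() in "".join(doc.itertext()).lower()


def procedures_for_phase(doc, phase):
    """Tag-based query: titles of the procedures tagged with the given phase."""
    matches = doc.findall(f".//procedure[@phase='{phase}']")
    return [p.findtext("title") for p in matches]


found = text_search(root, "bird strike")
approach_titles = procedures_for_phase(root, "approach")

print(found)            # the phrase is present, but not where or in what role
print(approach_titles)  # only the procedures tagged for the approach phase
```

The text search answers yes or no for the whole document. The tagged query returns a pinpointed list, which is the gain in precision and granularity the excerpt describes.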

Finally, another nice excerpt related to the slow, inexorable move towards XML:

There will be transition issues, but over the next three to five years we’re going to move from a “.doc” world to “.docx.” Right now, rounding up it’s 1%. But in five years, rounding down it will be 100%.

Indeed. And that’s one big change.

3 responses to “Forbes Interview: Corporate Pack Rats”

  1. regarding docx, I don’t see how a flat list of elements (what does tagged with XML mean?) in document order will help pinpoint data.

  2. if you could tag everything in a document in the following way, imagine the queries you could run. consider a document that has markup like: [drug]rituximab, [caption]five-year survival rates, [citation]another-article, [author]bob ekstrom, [date]11/1/07. i could run queries like: return the abstracts of all articles that talk about the drug rituximab, with “survival rate” in the caption of a figure, that are cited more than 5 times by other articles in the corpus and were published in 2007. basically, the goal/vision is to have both structural and semantic markup [how you get that is another topic] for every interesting item in a document and then run database-like queries against documents, which is basically impossible today.

  3. Sure, but docx does not do this (it is possible, but not probable). It is a simple list of p elements with (usually) a nested t element. Here is an example of the docx markup:

     <w:p w:rsidR="00A41EAD" w:rsidRDefault="00A41EAD" w:rsidP="00A41EAD">
       <w:pPr>
         <w:pStyle w:val="NormalWeb"/>
       </w:pPr>
       <w:r>
         <w:t xml:space="preserve">In another moment down went </w:t>
       </w:r>
       <w:smartTag w:uri="urn:schemas-microsoft-com:office:smarttags" w:element="City">
         <w:smartTag w:uri="urn:schemas-microsoft-com:office:smarttags" w:element="place">
           <w:r>
             <w:t>Alice</w:t>
           </w:r>
         </w:smartTag>
       </w:smartTag>
       <w:r>
         <w:t xml:space="preserve"> after it, never once considering how in the world she was to get out again. </w:t>
       </w:r>
     </w:p>
     <w:p w:rsidR="00A41EAD" w:rsidRDefault="00A41EAD" w:rsidP="00A41EAD">
       <w:pPr>
         <w:pStyle w:val="NormalWeb"/>
       </w:pPr>
       <w:r>
         <w:t xml:space="preserve">The rabbit-hole went straight on like a tunnel for some way, and then dipped suddenly down, so suddenly that </w:t>
       </w:r>
       <w:smartTag w:uri="urn:schemas-microsoft-com:office:smarttags" w:element="City">
         <w:smartTag w:uri="urn:schemas-microsoft-com:office:smarttags" w:element="place">
           <w:r>
             <w:t>Alice</w:t>
           </w:r>
         </w:smartTag>
       </w:smartTag>
       <w:r>
         <w:t xml:space="preserve"> had not a moment to think about stopping herself before she found herself falling down a very deep well. </w:t>
       </w:r>
     </w:p>
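As a rough illustration of the point in this thread, the sketch below parses an abbreviated version of that fragment with Python's standard library. The w:body wrapper and the namespace declaration are added here so the fragment parses on its own, and the helper name paragraph_text is hypothetical. It suggests what a query can actually reach in this kind of markup: paragraph text, a presentation style name, and whatever smart tags the editor happened to insert.

```python
import xml.etree.ElementTree as ET

# WordprocessingML main namespace. The wrapper element below is an assumption
# added so that the shortened fragment is a standalone, parseable document.
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W}

FRAGMENT = f"""
<w:body xmlns:w="{W}">
  <w:p>
    <w:pPr><w:pStyle w:val="NormalWeb"/></w:pPr>
    <w:r><w:t xml:space="preserve">In another moment down went </w:t></w:r>
    <w:smartTag w:uri="urn:schemas-microsoft-com:office:smarttags" w:element="place">
      <w:r><w:t>Alice</w:t></w:r>
    </w:smartTag>
    <w:r><w:t xml:space="preserve"> after it.</w:t></w:r>
  </w:p>
</w:body>
"""

body = ET.fromstring(FRAGMENT)


def paragraph_text(element):
    """Join the text of every w:t descendant, in document order."""
    return "".join(t.text or "" for t in element.iter(f"{{{W}}}t"))


# What the markup exposes: paragraph text and a presentation style name.
paragraphs = [paragraph_text(p) for p in body.findall("w:p", NS)]
styles = [s.get(f"{{{W}}}val") for s in body.findall(".//w:pStyle", NS)]

# The only semantic hint is the smart tag, and here it labels "Alice" a place.
smart_tags = [
    (st.get(f"{{{W}}}element"), paragraph_text(st))
    for st in body.iter(f"{{{W}}}smartTag")
]

print(paragraphs)
print(styles)
print(smart_tags)
```

Nothing in this structure distinguishes a drug name from a caption or a citation, so the database-like queries described in the second comment would need semantic markup layered on top of it.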
