
StartWithXML Early Survey Results

I’ve previously blogged about the StartWithXML project that O’Reilly is working on with the folks at Idea Logical.

Overall, the project reminds me of the California Milk Advisory Board: get a bunch of dairy farmers together to push an idea they can all agree on, namely “eat California cheese.” (Which, by the way, was articulated in my favorite way through the famous Grandma commercial.)

Here, instead of dairy farmers, it’s content and publishing vendors (e.g., codeMantra, Jouve / Publishing Dimensions, Klopotek, Firebrand, and Really Strategies). But the idea is similar — get a bunch of vendors together who can agree on one thing — in this case, starting with XML — and go push that idea.

Toward that end, the project is doing a few things. First, they’re hosting a one-day forum in New York City on January 13, 2009. Second, they recently ran an educational webcast entitled Essential Tools of an XML Workflow (slides below).

They’ve run a survey and are producing a research report as well. Below are some slides that highlight selected results from the survey.

Some takeaways:

  • Remember that “trade publishers” in this context means book publishers.
  • Note that digital publishing is “very important” to 40% of non-trade publishers, but only 18% of trade publishers. Both numbers are worrying, but the trade figure is especially sobering: it’s 2009, and fewer than 1 in 5 book publishers think digital is very important.
  • 43% of trade publishers say they are trying to “understand the importance” of digital publishing. Another yikes.
  • Trade publishers are twice as likely as others to ignore downstream use, and twice as likely to edit with a print focus.
  • Trade publishers are half as likely as others to use XML in the production process.

Now, none of this is a big surprise to those who work in the information and media market. The clear leaders in XML adoption were STM (scientific, technical, and medical) publishers such as Elsevier, followed by those in other segments like education and B2B trade. In the middle tier, you see legal, tax, and regulatory publishers and market researchers. Bringing up the rear are consumer magazines, news, and trade publishers.

While some trade publishers (e.g., Simon & Schuster) are strong adopters of XML, most others seem to be way behind. This will get increasingly dangerous as the Kindle takes off (I’m a user and a big fan) and as the Google Books settlement turns Google, overnight, into an online bookseller that rivals Amazon.

If you can’t output for the Kindle, pretty soon a lot of people won’t be buying your books. Right now, a quick search reveals about 200K Kindle titles out of 24M total on Amazon, but that number will grow fast. And if you can’t output in a format that Google Books can ingest, then for many customers, your books won’t even exist.

Trade publishers need to get moving to enable flexible output to both alternate print formats (e.g., large print, library editions) and e-book formats. The good news? 46% of trade publishers believe their business will benefit from publishing in more e-book formats, and nearly 70% say print-to-web processes are problematic or need to be fixed soon.

I wonder if moving from scrolls to paper was as difficult. Well, I suppose it was.

Notes from Tim O'Reilly Keynote Address

Tim O’Reilly gave a fascinating, information-loaded, 105-slide keynote address this morning at the 2007 Mark Logic User Conference. Tidbits include:

  • O’Reilly’s mission is to change the world by spreading the knowledge of innovators: watch the alpha geeks.
  • Web 2.0, which could be called Publishing 2.0, is about information businesses: it’s a data revolution.
  • What did the survivors of the dot-com bust have in common? They all used the network as a platform.
  • User-generated content (UGC) and harnessing collective intelligence aren’t the same thing. UGC is one way of harnessing collective intelligence, but there are others as well. For example, every time a webmaster links to a site, they are telling Google that the site they’re linking to is important.
  • Harnessing collective intelligence is about growing a database whose value grows with the number of participants.
  • Data is the next Intel Inside. (Or, as we prefer to say at Mark Logic: content is the next Intel Inside.)
  • The top-placed ad on Google isn’t based solely on the highest bid: it’s based on the highest bid times the expected click-through rate. That both serves the user and makes Google more money.
  • You should include network effects by default. On Flickr, the default is to share.
  • Human collaboration beats the machine/algorithm: consider Last.fm’s success relative to Pandora, which built a “Music Genome Project” to analyze songs and predict what you’ll like.
  • Everyone should read Kathy Sierra’s Creating Passionate Users blog.
  • Lessons from Google Maps: if your users are not surprising you with what they’re doing, then you’re not open enough. If they are, then try to learn from them. Half of all mashups leverage Google Maps. (See ProgrammableWeb for a mashup directory.)
  • We see content as a database and web services as a platform.
  • Remember this quote from Ray Kurzweil: “an invention needs to make sense in the world in which it’s finished, not the world in which it’s started.”
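The ad-ranking tidbit boils down to a simple formula: rank = bid × expected click-through rate. Here is a minimal Python sketch of that rule as stated in the keynote. The advertisers, bids, and CTR estimates are hypothetical, and this simplified model ignores any other signals a real ad auction may weigh. It shows how a lower bid with a higher expected CTR can outrank the highest bid.

```python
from dataclasses import dataclass


@dataclass
class Ad:
    advertiser: str
    bid: float           # hypothetical maximum price per click, in dollars
    expected_ctr: float  # hypothetical estimated probability of a click

    @property
    def expected_value(self) -> float:
        # Expected revenue per impression: bid times expected click-through rate
        return self.bid * self.expected_ctr


def rank_ads(ads):
    """Order ads by bid x expected CTR, highest first (simplified model)."""
    return sorted(ads, key=lambda ad: ad.expected_value, reverse=True)


# Hypothetical advertisers: A bids the most but is least likely to be clicked
ads = [
    Ad("A", bid=2.00, expected_ctr=0.01),
    Ad("B", bid=1.00, expected_ctr=0.03),
    Ad("C", bid=0.50, expected_ctr=0.05),
]

for ad in rank_ads(ads):
    print(ad.advertiser, round(ad.expected_value, 4))
```

Advertiser A bids the most but lands last, because its expected revenue per impression (2.00 × 0.01 = 0.02) is lower than B's (1.00 × 0.03 = 0.03) or C's (0.50 × 0.05 = 0.025). Ranking on expected value rather than the raw bid puts ads users are more likely to click on top. Under this model, that also earns more per impression.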

Tim also mentioned O’Reilly’s upcoming conference, Tools of Change for Publishing, which runs June 18-20 in San Jose. (Mark Logic User Conference attendees were given a discount code worth about 20% off.) Everyone else can save $200 by registering before the early-bird deadline on May 21.