O'Reilly Labs: Possibilities With Database Content

Frequent readers will know that O’Reilly Media used MarkLogic to build a very cool, Web 2.0 style application called SafariU that lets professors build custom textbooks for college courses and then share what they’ve created with other professors.

I’ve written frequently about SafariU on this blog because:

  • It’s a cool application that’s easy to understand
  • It is a canonical custom publishing system (from which you can extrapolate to other custom publishing examples)
  • It’s an example of disaggregation. O’Reilly is selling pages for $0.16 instead of books for $52, much as iTunes sells songs for $0.99 instead of CDs for $15.
  • It’s an example of what we in Silicon Valley call “eating your own dog food” (as the driver behind the Web 2.0 concept, it’s only fitting that O’Reilly build a Web 2.0 style app).
  • It’s a creative response to real business problems in the textbook market (e.g., resistance to multiple book purchases, used book cannibalism, price sensitivity, course readers)

Today’s quick post (after yesterday’s record-length monster) is primarily about serendipity.

One of the by-products of building a system like SafariU is that you end up with your content in a MarkLogic database, and therefore can do anything you’d like using XQuery. O’Reilly Labs is an example of just that. Here’s an excerpt from the O’Reilly Radar post on the project:

So Ryan Grimm and Andy Bruno started asking themselves what else they could do with all that content. A couple of their initial projects are up on our new O’Reilly Labs site. The first, Code Search, lets you search through the more than 2.6 million lines of example code from almost 700 O’Reilly books. You can limit your search to a particular book, a particular category (e.g. Perl, or Java), or a particular author.

In addition, the site has a content statistics section. Again, an excerpt from the O’Reilly post:

Want to know how many total pages there are in all O’Reilly books? (309,647) How many examples? (123,439) Do our Java books or our Perl books have more lines of code per page, on average? (Java) How many lines? (14.76 vs. 10.97 for Perl.) How many index entries are there in an average O’Reilly book? (1,783)

The stats are linked to the search box, so changing the search refigures the stats for the books matching the search result. There’s also a cool tag cloud of the most commonly appearing technical terms across all O’Reilly books… and clicking on a term takes you to a listing of all the books containing the term. From there, you can click to a content statistics page for each book.
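To make the "search refigures the stats" idea concrete, here is a minimal sketch in Python. It is not how O’Reilly Labs is built (the real site runs XQuery against a MarkLogic database); the `Book` record, the `search` and `lines_per_page` helpers, and the three toy books are all hypothetical, made up purely for illustration. The point is only that the stats are a function of whichever books match the current search.

```python
from dataclasses import dataclass


@dataclass
class Book:
    # Hypothetical record; the real content lives as XML in MarkLogic.
    title: str
    category: str
    pages: int
    code_lines: int


def search(books, term):
    """Return the books whose title contains `term` (case-insensitive)."""
    term = term.lower()
    return [b for b in books if term in b.title.lower()]


def lines_per_page(books):
    """Return average lines of code per page for each category in `books`."""
    totals = {}
    for b in books:
        pages, lines = totals.get(b.category, (0, 0))
        totals[b.category] = (pages + b.pages, lines + b.code_lines)
    # Skip categories with zero pages to avoid dividing by zero.
    return {cat: lines / pages for cat, (pages, lines) in totals.items() if pages}


# Toy data, invented for this sketch; not real O'Reilly figures.
BOOKS = [
    Book("Toy Java Guide", "Java", 100, 1500),
    Book("Toy Java Cookbook", "Java", 200, 2700),
    Book("Toy Perl Guide", "Perl", 100, 1100),
]

all_stats = lines_per_page(BOOKS)                     # stats over every book
guide_stats = lines_per_page(search(BOOKS, "guide"))  # stats over the search result
```

Changing the search term changes the set of books passed to `lines_per_page`, so the numbers are recomputed for just that subset, which is the behavior the Labs stats page exposes.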

The stats application could be useful not only for marketing and merchandising books, but also for product planning. For example, if you had seen Ruby popping up in programming-language books a few years ago, it would have been an early indicator of the opportunity for books on Ruby. This starts to enter the realm of content analytics, a space that is only now starting to form.

The Labs site is not a sexy, polished application like SafariU. But it does show the cool things a couple of smart guys can do using XQuery, quickly, once they’ve got their content loaded into a MarkLogic database.
