Category Archives: Enterprise Search

Thoughts on Category Creation and Information Access Platforms [Revised]

[Revised 8/2/08; still working on cleaning up this consciousness stream.]

Back in the old days, it seemed easy to create a category in software. Look at the database market, for example:

  • IBM invents the relational DBMS (RDBMS) category
  • Oracle, Ingres, and Informix enter in a largely undifferentiated way, though Informix eventually drifts towards the low-end/cheap segment
  • Sybase creates the derivative category of high-performance OLTP RDBMS.
  • Arbor re-christens the failed multi-dimensional DBMS as the OLAP Server
  • Tandem creates the non-stop RDBMS with its superb fault tolerance
  • Illustra launches the universal DBMS and is quickly acquired by Informix
  • Sybase launches the bitmap-indexed DBMS with Sybase IQ
  • Teradata launches the data-warehouse DBMS category

And you can find just as many examples outside database-land.

  • ASK defines the manufacturing resource planning (MRP) category
  • SAP hijacks MRP, redefines it as ERP, and goes on to become the world’s largest applications software company
  • PeopleSoft invents the HRMS category
  • Gartner Group’s Howard Dresner invents the business intelligence (BI) category, re-christening and re-framing what was formerly known as DSS or EIS.
  • Siebel pioneers the sales force automation (SFA) category
  • Scopus pioneers call center automation (CCA)
  • Companies like Rubric pioneer enterprise marketing automation (EMA)
  • Siebel, through acquisition, coalesces SFA, CCA, and EMA into a single category called customer relationship management (CRM)
  • Oracle and SAP work to coalesce CRM back into ERP. Such is the ebb and flow of categories.

(And I could go on and on — BPM, KM, CMS, WCM, ECM, LMS, DRM, SCM, PLM, ETL, DI, EII — but I think I’ll stop here with the initials list.)

People are still creating categories today, and sometimes it looks easy. Uber-categories have been quite popular in the past decade as people have focused on different ways of developing and delivering software:

  • SaaS as an uber-category has worked well, with a variety of offerings in various SaaS sub-categories (e.g., Salesforce, NetSuite)
  • Appliances have done pretty much the same thing — i.e., offering an appliance alternative for a wide variety of existing categories (e.g., a data warehouse appliance a la Netezza)
  • Open source has also done the same thing — again serving as a different flavor/dimension for a wide variety of largely existing software categories.

Only a few genuinely new categories have emerged, virtualization being the most obvious example. (Though you could argue that virtualization is itself an uber-category covering storage virtualization, server virtualization, et cetera.)

Companies are still working to carve new categories, particularly in the database market:

Sometimes vendors and/or the analysts who cover them try to impose either a straight name change (e.g., from MD-DBMS to OLAP) or a strategic shift (e.g., from BI to analytic applications) in category. Sometimes they’re just bored. Sometimes a vendor’s trying to redefine the market in line with its strengths. Sometimes an analyst is trying to make his/her mark on the industry and earn the coveted “father/mother of [category name],” much as Howard Dresner successfully did with BI.

BI got bored with its name several times during my tenure at Business Objects. At one point both the analysts and Informatica were trying to re-dub the category “analytic applications” in an attempt to get a fresh name and raise the abstraction level from tools to applications. Informatica nearly died on that hill.

Later, analysts tried to redefine the category, dubbing it corporate performance management (CPM), and arguing that business intelligence needed to link with financial planning systems. While knowing actuals is good, knowing actuals compared to the plan is better, and using actuals to drive the future plan better still. Cognos nearly tripped over itself repositioning around CPM, ultimately acquiring Adaytum, which in turn led to SRC’s eventual acquisition by Business Objects.

In an art-imitates-life sort of way, one wonders whether the analysts predicted a move in the market or provoked it. My chips are on the latter.

This stream-of-consciousness is a long way of winding up to a single question: are enterprise search vendors successfully repositioning themselves as “information access platforms” or not?

Background: the enterprise-search-related vendors (e.g., Fast/Microsoft, Endeca) and search/content analysts who cover them are in the midst of an attempted category repositioning:

  • The term “enterprise search” is now seemingly dead, having been contaminated by the Google Appliance. When a shark gets in the water, all the fish jump out.
  • The word “information” is increasingly being used as a unifying term to describe both data and content (aka, unstructured data)
  • Enterprise search vendors are increasingly calling themselves “information access platforms” (though not generally abbreviated as IAP, I will do so here for brevity).

For example, consider Endeca’s corporate boilerplate:

Endeca’s innovative information access software that helps people explore, analyze, and understand complex information, guiding them to unexpected insights and better decisions. The Endeca Information Access Platform, built around a new class of access-optimized database, powers applications that combine the ease of searching and browsing with the analytical power of business intelligence.

I have a number of concerns about this attempted shift:

  • The important thing about categories is that they exist in the mind of the customer. Analysts and vendors can try to put them there — but they have to stick. In my mind, IAP is not sticking. I have never heard a customer say: “I need to go out and get an IAP.”
  • I do, however, believe that “information” might well stick as an overall term, meaning both data and content (aka, structured and unstructured data).
  • It is not clear to me why someone who desires a unified platform for “information” would turn to a search vendor. Search engines were designed as read-only indexes to help people find documents containing tokens; hardly ideal as an application development platform.
  • In my estimation, someone managing “special” data should turn to a database vendor. While databases have classically not handled “special” data well, databases were designed as application platforms, and there is a whole new class of specialized databases emerging for handling various “special” types of data.
  • While I think a unified platform is a dandy vision, I think no one is close to delivering a unified platform that handles all types of data equally well. Bolting Lucene and MySQL together isn’t a platform. Relational databases still do a poor job with both content and many types of data (e.g., sparse, hierarchical, or semi-structured). XML servers (like MarkLogic) handle XML brilliantly, but need work before they can match RDBMSs at classical relational data.
  • I believe that anyone who needs the crawl-and-index-the-intranet value proposition should use the Google Appliance. So while I think the search vendors are right to want to flee, I don’t think that “information access platform” is a good refuge.
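To illustrate the point that a search engine is, at heart, a read-only index for finding documents containing tokens, here is a toy inverted index. It is a minimal sketch under simplifying assumptions (naive whitespace tokenizing, boolean AND matching, made-up documents) and is not how any particular vendor’s engine works. Note what it lacks: no updates, no transactions, no joins, none of the things an application platform needs.

```python
from collections import defaultdict

PUNCT = ".,;:!?\"'()"

def tokenize(text: str) -> list[str]:
    """Lowercase whitespace-split words with surrounding punctuation stripped."""
    words = (w.strip(PUNCT).lower() for w in text.split())
    return [w for w in words if w]

def build_index(docs: dict[str, str]) -> dict[str, frozenset[str]]:
    """Map each token to the set of document ids that contain it."""
    postings: dict[str, set[str]] = defaultdict(set)
    for doc_id, text in docs.items():
        for token in tokenize(text):
            postings[token].add(doc_id)
    # Freeze the postings: the index is built in one batch and afterwards only read.
    return {token: frozenset(ids) for token, ids in postings.items()}

def search(index: dict[str, frozenset[str]], query: str) -> set[str]:
    """Return ids of documents containing every query token (boolean AND)."""
    tokens = tokenize(query)
    if not tokens:
        return set()
    hits = set(index.get(tokens[0], frozenset()))
    for token in tokens[1:]:
        hits &= index.get(token, frozenset())
    return hits

# Hypothetical sample documents
docs = {
    "d1": "Quarterly revenue report",
    "d2": "Revenue forecast for Q3",
    "d3": "Travel policy",
}
index = build_index(docs)
print(sorted(search(index, "revenue")))         # ['d1', 'd2']
print(sorted(search(index, "revenue report")))  # ['d1']
```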

Overall, my chips remain on the don’t come line for the attempted category repositioning from “enterprise search” to “information access platform.” You can find my stack on the come line for the emerging “special-purpose database” category and “XML servers” as an instance of them.

Kawasaki Interviews Ballmer at Mix08

Check out this post with notes from the Guy Kawasaki keynote interview with Steve Ballmer at the Mix08 conference this week in Las Vegas.

In the interview Ballmer talks about Google, the Yahoo! deal, Apple, his three types of day [see below], Silverlight, the Facebook investment, Fast Search & Transfer [see below], the number of emails he gets per day (~60), and he even gives out his email address: steveb@microsoft.com.

On his three types of day:

  1. With customers. From 7:30 AM to 8:00 PM, then get on a [private] plane to the next city.
  2. Doctor in office. Wall to wall meetings all day. “Exhausting.”
  3. Think, write, and research.

On Fast Search & Transfer:

Fast is company had internet and website/corporate products. Sold off web search. They have great for high end search on enterprise and engines that can search web sites. Tech fantastic and team is great. Anxious to build both ways. Love company/people. Great integration plan – more to say.

This is consistent with my thesis for why Microsoft bought Fast (to fend off the Google Appliance in high-end enterprise search, aka, the best defense is a good offense). However, I’d not previously heard the message that they want to build Fast out “both ways” — i.e., in enterprise search and in their Internet search offerings.

The only part of the acquisition that continues to amaze me is the ~8x revenue-run-rate price. That kind of multiple is in line for high flyers, i.e., for healthy, high-growth enterprise software companies. But Fast was in the midst of unwinding a world-class accounting mess, complete with lots of AR write-offs and a revenue restatement. I’d think companies in that situation are usually lucky to trade for 1-2x revenues.

Much as the price SAP paid for Business Objects wasn’t surprising until you noticed that Business Objects was about to announce a quarterly miss, Microsoft’s price for Fast isn’t surprising until you consider the not-so-easy-to-overlook financial mess. Personally, I would have guessed a sale in the $300M to $500M range, proving that I’m not always right.

My current speculation is that there must have been a bidding war for the price to get so high. The fun question then becomes who else was bidding, why did they want it so badly, and what are they going to do now that they’ve lost?

Google and Autonomy Spat, Round II

Autonomy and Google are at it again. Per this InformationWeek story:

For the second time in six months, Google has publicly challenged a white paper from enterprise search rival Autonomy, claiming the latest document contains “significant inaccuracies.”

“For customers with demanding needs, the Google appliance lacks the necessary security and connectivity models,” Mike Lynch, chief executive of Autonomy, said in an emailed statement. “It is not possible to make successful high-end enterprise search solutions without mapped security and productized connectors to repositories.”

I’ve not yet had time to dig into the detail of this, so I’m sharing it more as a news item for now and will — if it proves interesting — come back with analysis later.

Google’s rebuttal is here on the Google Enterprise blog.

My free PR advice for Google is to avoid a spat and simply create a low-key white paper that responds to any claims they believe are incorrect. In my experience, in PR wars the big guy never wins. Sometimes the little guy wins. Sometimes both companies lose. So when you’re the leader the best strategy is not to fight. Much as you want to.

The Relevancy Quest

In the classic book The Innovator’s Dilemma, Clayton Christensen concludes that a key reason leading companies fail is that they spend too much energy on sustaining innovations that continuously improve their products for their existing customers. Seemingly paradoxically, he points out that these sustaining innovations can involve very advanced and very expensive technology. That is, it’s not the nature of the technology (e.g., advanced or simple) that makes an innovation sustaining or disruptive; it’s whom the technology is designed to serve and in what uses.

I think search vendors need to dust off their copies of The Innovator’s Dilemma. Why? Because, for the most part, they seem wedged in the following paradigm, which I’d call the relevancy quest:

  • Search is about grunting a few keywords
  • The answer is a list of links
  • The quest is then magically inducing the most relevant links given a few grunts
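To make the paradigm concrete, here is a minimal sketch of the keywords-in, ranked-links-out loop. It scores documents with a smoothed TF-IDF formula, a textbook stand-in chosen for brevity. It is not PageRank or any vendor’s actual algorithm, and the sample documents are made up. Every player on the relevancy quest is essentially competing over what goes inside the scoring line.

```python
import math
from collections import Counter

def tokenize(text: str) -> list[str]:
    words = (w.strip(".,;:!?").lower() for w in text.split())
    return [w for w in words if w]

def rank(docs: dict[str, str], query: str) -> list[str]:
    """Return ids of matching docs, most relevant first, scored by TF-IDF."""
    n = len(docs)
    tf = {doc_id: Counter(tokenize(text)) for doc_id, text in docs.items()}
    df: Counter = Counter()  # number of documents containing each term
    for counts in tf.values():
        df.update(counts.keys())
    scores: dict[str, float] = {}
    for doc_id, counts in tf.items():
        score = 0.0
        for term in tokenize(query):
            if term in counts:
                # Smoothed IDF: rarer terms count for more and never drop to zero
                score += counts[term] * math.log(1 + n / df[term])
        if score > 0:
            scores[doc_id] = score
    # Highest score first; ties broken by id so the order is deterministic
    return sorted(scores, key=lambda d: (-scores[d], d))

# Hypothetical documents
docs = {
    "a": "Paris Hilton hotel review",
    "b": "Paris travel guide",
    "c": "Hotel chains",
}
print(rank(docs, "paris hilton"))  # ['a', 'b']
```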

And it’s not a bad paradigm. Heck, it made Google worth $140B and bought Larry and Sergey a nice 767. But can we do better?

Some folks, like the much-hyped Powerset, think so. They’re challenging the grunting part of the equation, arguing that “keyword-ese” is the problem and natural language is the solution. They seem unfazed both by Ask Jeeves’ failure to dominate search and by more than 20 years of failed attempts to provide natural language interfaces to database data for business intelligence (BI). As I often say, if natural language were the key to BI user interfaces, then Business Objects would have been purchased by Microsoft years ago for a pittance and Natural Language Inc.’s DataTalker would rule BI. (Instead of the other way around.)

But I respect Powerset because at least they’re challenging the paradigm and taking a different approach to the problem. And, while I sure don’t understand the cost model, I also respect guys like ChaCha because they’re challenging the paradigm, too. In ChaCha’s case, they’re delivering human-powered search where you can literally chat with a live guide who helps you refine your search.

I can also respect the social search guys, including the recently launched Mahalo, because they’re challenging the paradigm as well, using Wisdom of Crowds / Web 2.0 / Wikipedia-style collaboration to create “hand-written results pages” for topics such as the always-searchable “Paris Hilton.”

The folks I have trouble understanding are those on the algorithmic relevancy quest: companies like Hakia, a semantic search vendor (interviewed here by Read/Write Web) whose schtick is meaning-based search, and which comes complete with an algorithm called SemanticRank ™, a rip-off of the PageRank ™ name. Or Ask, which recently launched a $100M advertising campaign about “the algorithm.” These people remind me of the disk drive manufacturers who invested millions in very advanced technologies for improved 8-inch disk drives (to serve their existing customers) while missing the market for the 5.25-inch disk drives required by different customers (i.e., PC manufacturers).

Are the Hakias of the world answering the right question? Should we be grunting keywords into search boxes and relying on SomethingRank ™ to do the best job of determining relevancy? Is the search battle of the future really about “my rank’s better than your rank” or, equivalently, “my PhD’s smarter than your PhD”? Aren’t these guys fighting the last war?

As usual, I think there are separate answers for Internet and enterprise search.

On the Internet side, I certainly think search engines can use more “magic” to improve search relevancy. For example, they can use recent queries and a user profile to impute intent. They can use dynamic clustering and iterative query refinement (e.g., faceted navigation) to help users incrementally improve the precision of their queries.
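Faceted navigation, mentioned above, boils down to two operations. The first counts how many current results carry each value of an attribute, and the second narrows the results to the value the user clicks. Here is a minimal sketch over hypothetical result records (the `type` and `city` fields are assumptions for illustration) showing two rounds of refinement.

```python
from collections import Counter

def facet_counts(results: list[dict], facet: str) -> Counter:
    """Count how many current results carry each value of a facet."""
    return Counter(r[facet] for r in results if facet in r)

def refine(results: list[dict], facet: str, value: str) -> list[dict]:
    """Narrow the result set to items whose facet equals the chosen value."""
    return [r for r in results if r.get(facet) == value]

# Hypothetical hits for the query "paris hilton"
results = [
    {"id": 1, "type": "hotel", "city": "Paris"},
    {"id": 2, "type": "news", "city": "Los Angeles"},
    {"id": 3, "type": "hotel", "city": "Paris"},
    {"id": 4, "type": "news", "city": "New York"},
    {"id": 5, "type": "news", "city": "Los Angeles"},
]
print(facet_counts(results, "type").most_common())  # [('news', 3), ('hotel', 2)]
news = refine(results, "type", "news")              # user clicks "news"
print(facet_counts(news, "city").most_common())     # [('Los Angeles', 2), ('New York', 1)]
print([r["id"] for r in refine(news, "city", "Los Angeles")])  # [2, 5]
```

Each click shrinks the result set and recomputes the counts, so precision improves one step at a time without the user ever rewriting the query.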

More practically, I think vertical search and community sites are a great way of improving search results. The context of the site you’re on provides a great clue to what you’re looking for. Typing “Paris Hilton” into Expedia means you’re probably looking for a hotel, whereas typing it into E! Online means you’re looking for information on the jailed debutante.

Of course, there are a host of Web 2.0 style techniques to improve search like diggs and wikis which can be put to work as well.

Increasingly, our publishing and media customers are going well beyond “improving search” and changing the paradigm to “content applications” — systems that combine software and content to help specific users accomplish specific tasks. See Elsevier’s PathConsult as a concrete example.

On the enterprise search side, I think the answer is different. As I’ve often mentioned, on the enterprise side you lack the rich link structure of the web, effectively lobotomizing PageRank and robbing Google of its once-special (and now increasingly gamed and hacked) sauce.

When I look for answers on how to improve search in an enterprise context, I look back to BI, where we have decades of history to guide us in the quest to enable end-user access to corporate data.

  • Typing SQL (once seriously considered as the answer) failed. Too complex. While SQL itself was the great enabler of the BI industry, end users could never code it.
  • Creating reports in 4GL languages failed. Too complex.
  • Having other people create reports and deliver them to end users was a begrudging success. While this created a report treadmill/backlog for IT and buried end-users in too much information, it was probably the most widely used paradigm.
  • Natural language interfaces failed. Too hard to express what you really want. Too much precision required. Too much iteration required.
  • End users using graphical tools linked directly to the database schema failed. While these tools hid the complexities of SQL, they failed to hide the complexity of the database schema.

It was only when Business Objects invented a graphical, SQL-generating tool that hid all underlying database complexity and enabled users to compose an arbitrary query that the BI market took off. Simply put, there were two keys:

1. The ability to phrase an arbitrary query of arbitrary complexity (not a highly constrained search).

2. The ability to hide the complexity of the database from the end user.
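The two keys can be sketched as a tiny semantic layer. It maps business-facing names to tables and SQL expressions, then generates the joins and grouping itself, so the user picks “Region” and “Revenue” and never sees the schema. This is a minimal, hypothetical sketch: the table names, columns, and join are invented, it handles only direct joins between selected tables (no multi-hop join-path search), and it is not Business Objects’ actual implementation.

```python
# Hypothetical semantic layer: business-facing names mapped to the table
# that holds them and the SQL expression that computes them.
OBJECTS = {
    "Customer": ("customers", "customers.name"),
    "Region": ("customers", "customers.region"),
    "Revenue": ("orders", "SUM(orders.amount)"),
}
MEASURES = {"Revenue"}  # aggregated objects; everything else is a dimension
JOINS = {frozenset({"customers", "orders"}): "customers.id = orders.customer_id"}

def generate_sql(selected: list[str]) -> str:
    """Build a SQL query from business object names; the user never sees the schema."""
    tables = sorted({OBJECTS[name][0] for name in selected})
    columns = [f'{OBJECTS[name][1]} AS "{name}"' for name in selected]
    dimensions = [OBJECTS[name][1] for name in selected if name not in MEASURES]
    # Add a join condition for every pair of selected tables with a known join
    # (a sketch: direct joins only, no multi-hop join-path search).
    joins = [
        JOINS[frozenset({t1, t2})]
        for i, t1 in enumerate(tables)
        for t2 in tables[i + 1:]
        if frozenset({t1, t2}) in JOINS
    ]
    sql = f"SELECT {', '.join(columns)} FROM {', '.join(tables)}"
    if joins:
        sql += " WHERE " + " AND ".join(joins)
    # Group by the dimensions only when a measure is being aggregated
    if dimensions and len(dimensions) < len(selected):
        sql += " GROUP BY " + ", ".join(dimensions)
    return sql

print(generate_sql(["Region", "Revenue"]))
```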

While no one has yet built such a tool for an arbitrary XML contentbase (and while I think building one will be hard given that XML does not require a defined schema), MarkLogic customers use our product every day to build content applications that generate complex queries against large contentbases and completely hide XQuery from the end user.
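As a rough illustration of hiding XQuery, here is a sketch of the kind of form-to-query translation a content application might do. The collection name `articles` and the `article`, `title`, and `year` elements are hypothetical. The generated query uses only standard XQuery 1.0 functions (`collection`, `contains`, `lower-case`, `xs:integer`) rather than any MarkLogic-specific API, and this is not a description of how any MarkLogic customer actually builds its applications.

```python
from typing import Optional

def xq_string(value: str) -> str:
    """Quote a value as an XQuery string literal (escape & and double quotes)."""
    return '"' + value.replace("&", "&amp;").replace('"', '""') + '"'

def build_query(keyword: str, min_year: Optional[int] = None) -> str:
    """Turn simple form fields into an XQuery FLWOR expression."""
    conditions = [f"contains(lower-case($a/title), {xq_string(keyword.lower())})"]
    if min_year is not None:
        conditions.append(f"xs:integer($a/year) ge {int(min_year)}")
    return (
        'for $a in collection("articles")/article\n'
        f"where {' and '.join(conditions)}\n"
        "order by xs:integer($a/year) descending\n"
        "return $a/title"
    )

# The end user fills in a keyword box and a year picker; they never see this text.
print(build_query("Merger", 2005))
```

Escaping user input into the string literal is the important design choice here. Without it, a quote or ampersand typed into a form would break, or rewrite, the generated query.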

Simply put, it’s not about improving search. It’s about delivering query. That’s the game-changer.

Enterprise Search Crisis

Check out this blog post on ZDnet, entitled “Enterprise Search: Why it’s a Crisis and Googzilla will Strike,” which is a series of takeaways from Stephen Arnold’s recent presentation at Enterprise Search Summit in New York.

The parts I agree with are:

  • It’s a crisis. I continue to believe there is generally low customer satisfaction with enterprise search and it seems to come from a combination of expectations management and post-sale delivery. As one enterprise search alumnus I know says: “At [enterprise search vendor X], we sold a Ferrari. However, we just dumped the pieces on your driveway and you had to assemble it.”
  • Over-positioning enterprise search as a silver bullet. I see a lot of this. Enterprise search vendors claim today that they can do everything from finding documents (the original purpose) to detecting money laundering to BI reporting to merchandising (so you can sell more polo shirts) to data warehousing to legal compliance and beyond. One vendor pitches 30 different solutions, each, I suppose, as a silver bullet.
  • Complexity and cost. Enterprise search vendors charge a lot of money for their wares and they are certainly complex to configure, use — and in some cases — understand. (Think Bayesian inferencing.)
  • That all this creates a great opportunity for Google to step in and sweep up some serious market share.

However, I have a different take at the macro level. Simply put, I think enterprise search is stuck between a rock and a hard place.

  • The rock is database management systems. Many search solutions are integrations of relational databases with search engines along with templates for specific applications. While many search vendors are trying to reposition their products as application platforms, they’re not. Tying together MySQL, search engine X, and some pre-processing logic so you can properly feed the search engine indexer is not a great “platform” on which to build applications. Databases are much better application platforms and the real problem has been that databases, until recently, didn’t do content. But as new generations of database management systems — like MarkLogic — emerge, it will become increasingly clear that the platform for content applications should not be an enterprise search engine (bolted to other things), but instead a database management system built to natively handle content.
  • The hard place is the Google Appliance. There will always be a need for “Google inside your company” type search. I call this the “crawl and index” value proposition. Given cost and complexity, I can’t see why Google won’t sweep up most of the market here. (I just wish they could do better with PDFs and email.)

You can find the full text of Stephen’s speech here.