Category Archives: Enterprise Search

Quick Take on the Dassault Systèmes Acquisition of Exalead

Today, in what I consider a surprising move, French PLM and CAD vendor Dassault Systèmes announced the acquisition of French enterprise search vendor Exalead for €135M or, according to my calculator, $161M.  Here is my quick take on the deal:

  • While I don’t have precise revenue figures, my guess is that Exalead was aiming at around $25M in 2010 revenues, putting the price/sales multiple at 6.4x current-year sales, which strikes me as pretty good given what I’m guessing is around a 25% growth rate.  (This source says $21M in software revenue, though the year is unclear and it’s not clear if software means software-license or software-related.  This source, which I view as quite reliable, says $22.7M in total revenue in 2009 and implies around 25% growth.  Wikipedia says €15.5M in 2008 revenues, which equals exactly $22.7M at the average exchange rate.  This French site says €12.5M in 2008 revenues.  The Qualis press release — presumably an excellent source — says €14M ($19.5M) in 2009 revenues.  Such is the nature of detective work.)
  • I am surprised that Dassault would be interested in search-based applications, Exalead’s latest focus.  While PLM vendors have always had an interest in content delivery and life-cycle documentation (e.g., a repair person entering feedback on documentation that directly feeds into future product requirements) , I’d think they want to buy a more enterprise techpubs / DITA vendor than a search vendor to do so as in the PTC / Arbortext deal of 2005.  Nevertheless, Dassault President and CEO Bernard Charlès said that with Exalead they could build “a new class of search-based applications for collaborative communities.”  There is more information, including a fairly cryptic video which purports to explain the deal, on a Dassault micro-site devoted to the Exalead acquisition, which ends with the phrase:  search-based applications for lifelike experience.  Your guess as to what that means is as good as mine.
  • A French investment firm called SCA Qualis owned 83% of Exalead steadily building up its position from 51% in 2005 to 83% in 2008, through successive rounds of €5M, €12M and €5M in 2005, 2006, and 2008 respectively.  This causes me to question the CrunchBase’s profile that Exalead had raised a total of $15.6M.  (You can see €22M since 2005 and the company was founded in 2000.  I’m guessing there was $40M to $50M invested in total, though some reports are making me think it’s twice that.)
  • The prior bullet suggests that Qualis took $133M of the sale price and everybody else split $27M, assuming there were no active liquidation preferences on the Qualis money.
  • Given the European-focus, the search-focus, and the best-and-brightest angle (Exalead had more than its share of impressive grandes écoles graduates), one wonders why Autonomy didn’t end up owning Exalead, as opposed to a PLM/CAD company.  My guess is Autonomy took a look, but the deal got too pricey for them because they are less interested in paying up for great technology and more interested in buying much larger revenue streams at much lower multiples.  In some sense, Autonomy’s presumed “pass” on this deal is more proof that they are no longer a technology company and instead a CA-like, Oracle-like financial consolidation play.  (By the way, there’s nothing wrong with being a financial play in my view; I just dislike pretending to be one thing when you’re actually another.)
  • One wonders what role, if any, the other French enterprise search vendor, Sinequa, played in this deal.  They, too, have some great talent from France’s famed Ecole Polytechnique, and presumably some nice technology to go along with it.

Here are some links to other coverage of the deal

IDC’s Definiton of Search-Based Applications

Sue Feldman and the team over at IDC are talking about a new category / trend called search-based applications, and I think they may well be onto something.

Because I believe that IDC puts real thought and rigor into definitions, I pay attention when I see them attempting to define something. From past experience, IDC was about 10 years ahead of the market in predicting the convergence of BI and enterprise applications with — even in the mid 1990s — a single analyst covering both ERP and BI.

Here’s how IDC describes search-based applications.

Search-based applications combine search and/or text analytics with collaborative technologies, workflow, domain knowledge, business intelligence, or relevant Web services. They deliver a purpose-designed user interface tailored to support a particular task or workflow. Examples of such search-based applications include e-Discovery applications, search marketing/advertising dashboards, government intelligence analysts’ workstations, specialized life sciences research software, e-commerce merchandising workbenches, and premium publishing subscriber portals in financial services or healthcare.

There are many investigative or composite, text- and data-centric analysis activities in the enterprise that are candidates for innovative discovery and decision-support applications. Many of these activities are carried out manually today. Search-based applications provide a way to bring automation to a broad range of information worker tasks.

Some vendors are jumping whole hog into the nascent category. For example, French Internet and enterprise search vendor Exalead has jumped in with both feet, making search-based applications a key war cry in their marketing. In addition, Exalead’s chief science officer, Gregory Grefenstette, seems a like match to the “Ggrefen” credited in Wikipedia with the creation of the search-based applications page.

Another vendor jumping in hard is Endeca, with the words “search applications” meriting the largest font on their homepage.

While you could argue that this is yet-another, yet-another focus for Endeca, clearly the folks in marketing — at least — are buying into the category.

At Mark Logic, we are not attempting to redefine ourselves around search-based applications. Our product is an XML server. Our vision is to provide infrastructure software for the next generation of information applications. We believe that search-based applications are one such broad class of information applications. That is, they are yet another class of applications that are well suited for development on MarkLogic Server.

So, if you’re thinking about building something that you consider a search-based application, then be sure to include us on your evaluation list.

XML: YAFF, YADT, or Whole World?

If you have a bunch of XML and are looking for of a place to put it, then I think I may have come up with a simple test that might be helpful.

In talking with prospective vendors of XML repositories (definition: software that lets you store, search, analyze and deliver XML), try to establish what I’ll call “XML vision compatibility.” Quite simply, try to figure out if the vendor’s vision of XML is consistent with your own. To help with that exercise, I’ll define what I see as the three common XML vendor visions:

  • YAFF (yet another file format)
  • YADT (yet another data type)
  • Whole world

YAFF Vendors
Vendors with the YAFF vision view XML as yet another file format. ECM vendors clearly fall into this category (“oh yes, XML is one of the 137 file formats you can manage in our system”). So do enterprise search vendors (“oh yes, we have filters for XML formatted files which clear out all those nasty tags and feed our indexing engine the lovely text.”)

For example, let’s look at how EMC Documentum — one of the more XML-aggressive ECM vendors — handles XML on its website.

Hmm. There’s no XML on that page. But lots of information about records management, digital asset management, document capture, collaboration and document managent (it’s not there either). Gosh, I wonder where it is? SAP integration? Don’t think so. Hey, let’s try Documentum Platform, whatever that is.

Not there, either. Now that’s surprising because I really have no idea where else it might be. Oh, wait a minute. I didn’t scroll the page down. Let’s try that.

There we go. We finally found it. I knew they were committed to XML. What’s going on here is that EMC has a huge, largely vendor consolidation-driven (e.g., Documentum, Captiva, Document Sciences, x-Hive, Kazeon) vision of what content management is. And XML is just one tiny piece of that vision. XML is, well, yet another file format among the scores that they have manage, archive, capture, and provide workflow, compliance, and process management against. The vision isn’t about XML. It’s about content. That’s nice if you have an ECM problem (and a lot of money to solve it); t’s not so nice if you have an XML problem, or more precisely a problem that can be solved with XML.

YADT Vendors
Vendors with the YADT vision view XML as yet another data type. These are the relational database management system vendors (e.g., Oracle) who have decided that the best way to handle XML is to make it a valid datatype for a column in a table.

The roots of this approach go back to the late 1980s and Ingres 6.3 (see this semi-related blast from the past) which was the first commercial DBMS to provide support for user-defined datatypes. All the primitives for datatyping were isolated from the core server code and made extensible through standard APIs. So, for example, if you wanted to store complex numbers of the form (a, bi) all you had to do was to write some primitives so the server would know:

  • What they look like — i.e., (a, bi)
  • Any range constraints (the biggest, the smallest)
  • What operators should be available (e.g., +, -)
  • How to implement those operators — (a, bi) + (c, di) = (a+c, (b+d)i)

It was — far as I remember — yet another clever idea from the biggest visionary in database management systems after Codd himself: Michael Stonebraker then of UC Berkeley and now of MIT. After founding Ingres, Stonebraker went on found Illustra which was all about “datablades” — a sexy new name for user-defined types. Datablades, in turn, became sexy bait for Informix to buy the company with an eye towards leveraging the technology towards unseating Oracle from its leadership position. It didn’t happen.

User-defined datatypes basically didn’t work. There were two key problems:

  • You had user-written code running in the same address space as the database server. This made it nearly impossible to determine fault when the server crashed. Was it a database server bug, or did the customer cause problem in implementing a UDT? While RDBMS customers were well qualified to write applications and SQL, writing server-level was quite another affair. This was a bad idea.
  • Indexing and query processing performance. It’s fairly simple to say that, for example, a text field looks like a string of words and the + operator means concatenate. It’s basically impossible for a end customer to tell the query optimizer how to process queries involving those text fields and how to build indexes that maximize query performance. If getting stuff into UDTs was a level-5 challenge, getting stuff back out quickly was a level-100 one.

So while the notion of end users adding types to a DBMS basically failed, when XML came along the database vendors dusted off this approach, in saying effectively: let use all those hooks we put in to build support for XML types ourselves. And they did. Hence what I call the “XML column” approach to storing XML in a relational database.

After all, if your only data modeling element’s a table, then every problem looks like a column.

Now this approach isn’t necessarily bad. If, for example, you have a bunch of resumes and want to store attribute data in columns (e.g., name, address, phone, birthdate) and keep an XML copy of the resume alongside, then this might be a reasonable way to do things. That is, if you have a lot of data and a touch of XML, this may be the right way to do things.

So again, it comes down to vision alignment. If XML is just another type of data that you want to store in a column, then this might work for you. Bear in mind you’ll:

  • Probably have to setup separate text and pre-defined XML path indexes (a hassle on regular schemas, an impossibility on irregular ones),
  • Face some limitations in how those indexes can be combined and optimized in processing queries,
  • Need to construct frankenqueries that mix SQL and XQuery, whose mixed-language semantics are sometimes so obscure that I’ve seen experts argue for hours about what the “correct” answer for a given queries is,
  • And suffer from potentially crippling performance problems as you scale to large amounts of XML.

But if those aren’t problems, then this approach might work for you.

This is what it looks like when a vendor has a YADT vision. Half the fun in storing XML in an RDBMS is figure out which query language and which store options you want to use. See the table that starts on page 9, spans four pages, and considers nearly a dozen criteria to help you decide which of the three primary storage options you should use:

See this post from IBM for more Oracle-poking on the complexity of storage options available. Excerpt:

Oracle has long claimed that the fact that Oracle Database has multiple different ways to store XML data is an advantage. At last count, I think they have something like seven different options:

  • Unstructured
  • XML-Object-Relational, where you store repeating elements in CLOBs
  • XML-Object-Relational, where you store repeating elements in VARRAY as LOBs
  • XML-Object-Relational, where you store repeating elements in VARRAY as nested tables
  • XML-Object-Relational, where you store repeating elements in VARRAY as XMLType pointers to BLOBs
  • XML-Object-Relational, where you store repeating elements in VARRAY as XMLType pointers to nested tables
  • XML-Binary

Their argument is that XML has diverse use cases and you need different storage methods to handle those diverse use cases. I don’t know about you, but I find this list to be a little bewildering. How do you decide among the options? And what happens if you change your mind and want to change storage method?

Such is life in the land of putting XML in tables because your database management system has columns.

Whole World Vendors
Vendors with the whole world vision view XML as, well, their whole world.

And when I say XML, I don’t mean information that’s already in XML. I mean information that is either already in XML (e.g., documents, information in any horizontal or industry-specific XML standard) or that is best modeled in XML (e.g., sparse data, irregular information, semi-structured information, information in no, multiple, and/or time-varying schemas).

“Whole world” vendors don’t view XML as one format, but as a plethora: docbook, DITA, s1000d, xHMTL, TEI, XBRL, the HL7 standards in healthcare, the Acord standards in insurance, Microsoft’s Open Office XML format, Open Document Format, Adobe’s IDML, chemical markup lanuage, MathML, the DoD’s DDMS metadata standard, semantic web standards like RDF and OWL, and scores of others.

Whole world vendors don’t view XML tags as “something that get in the way of the text” and thus they don’t provide filters for XML files. Nor do they require schema adherence because they know that XML schema compliance, in real life, tends to be more of an aspiration than a reality. So they allow you load and index XML, as is, avoiding the first step’s a doozy problem, and enabling lazy clean-up of XML information.

Whole world vendors don’t try to model XML in tables simple because they have a legacy tabular data model. Instead, their native modeling element (NME) is the XML document. That is:

  • In a hierarchical DBMS the NME is the hierarchy
  • In a network DBMS the NME is the graph
  • In a relational DBMS the NME is the table
  • In an object DBMS the NME is the object class hierarchy
  • In an OLAP, or multi-dimensional, DBMS the NME is the hypercube
  • And in an XML server, or native XML, DBMS the NME is the XML document

Whole world vendors don’t bolt a search engine to a DBMS because they know XML is often document-centric, making search an integral function, and requiring a fundamentally hybrid search/database — as opposed to a bolted-together search/database — approach.

Here is what it looks like when you encounter a whole world vendor:

Reblog this post [with Zemanta]

Stephen Arnold on Search and Enterprise Publishing

Just a quick note to highlight this post on the ever-colorful Beyond Search blog by Stephen Arnold. Entitled New Info Mix: Search and Enterprise Publishing, the post provides a discussion of professional and enterprise publishing from the downstream perspective of print production / composition shops.

Arnold argues that most of the high-tech enterprise publishing crowd is overlooking both the size and strategic importance of the opportunity to come at publishing from the other end of the market. Excerpt:

I think that dismissing this story is a bad idea, particularly for companies in the search, content processing, and text analytics business. Here’s why:

  • Most vendors of enterprise search have not entered the enterprise publishing sector. Some of the firms with which I have had contact are generally unaware of these systems, their inclusion of search as a utility, and the systems’ ability to output Web pages, reports, and invoices. This cloud of unknowing is one that should be dispelled but the ostrich approach to business is often a favorite of search vendors, their advisers, and the conference organizers who seem indifferent to this major shift in enterprise information systems.
  • Enterprise publishing systems carry hefty price tags. Because the systems are mission critical and make it possible to cross sell or run ads in most output from the system, seven or eight figure deals are not uncommon. Enterprise search and content processing systems that purport to index “all information” for the organization may gain credibility in some parts of an organization, but at the CFO level, enterprise publishing gets the attention of the woman who writes the checks.
  • The end-to-end model seems to becoming popular. I may be reacting to news stories that flow through my intelligence system. […] The Exstream Software deals are, as I understood the briefing I got earlier this year, end-to-end. The question becomes, “Where do specialist search, content processing, and text mining companies fit in?”

An excellent point, worth pondering, and ponder I will.

Reblog this post [with Zemanta]

Fun Google Parody Video: Complexity is Good

I stumbled into this video while reading Stephen Arnold’s recent post, Google Search Appliance: Showing Some Fangs. In the post, Stephen offers a pretty comprehensive look at the Google search appliance (GSA) prompted, I believe, by a new release that includes features such as personalized search results, alerts, and broader language support.

If you’re interested in the new features, see this video here.

If you want to have some fun, check out this video which portrays Google’s view of a typical enterprise search software sale, complete with the cheesy salesperson.

As I’ve repeatedly maintained (e.g., 1, 2, 3, and 4), I think the GSA is going to consume the “crawl and index the intranet” segment of the search market, pushing classical enterprise search vendors up-market, and eventually into an un-winnable conflict with DBMS vendors.

Thoughts on Category Creation and Information Access Platforms [Revised]

[Revised 8/2/08; still working on cleaning up this consciousness stream.]

Back in the old days, it seemed easy to create a category in software. Look at the database market, for example:

  • IBM invents the relational DBMS (RDBMS) category
  • Oracle, Ingres, and Informix enter in a largely undifferentiated way, though Informix eventually drifts towards the low-end/cheap segment
  • Sybase creates the derivative category of high-performance OLTP RDBMS.
  • Arbor re-christens the failed multi-dimensional DBMS as the OLAP Server
  • Tandem creates the non-stop RDBMS with its superb fault tolerance
  • Illustra launches the universal DBMS and is quickly acquired by Informix
  • Sybase launches the bitmap-indexed DBMS with SybaseIQ
  • Teradata launches the data-warehouse DBMS category

And you can find just as many examples outside database-land.

  • ASK defines the manufacturing resource planning (MRP) category
  • SAP hijacks MRP, redefines it as ERP, and goes on to become the world’s largest applications software company
  • PeopleSoft invents the HRMS category
  • Gartner Group’s Howard Dresner invents the business intelligence (BI) category, re-christening and re-framing what was formally known as DSS or EIS.
  • Siebel pioneers the sales force automation (SFA) category
  • Scopus pioneers call center automation (CCA)
  • Companies like Rubric pioneer enterprise marketing automation (EMA)
  • Siebel, through acquisition, coalesces SFA, CCA, and EMA into a single category called customer relationship management (CRM)
  • Oracle and SAP work to coalesce CRM back into ERP. Such is the ebb and flow of categories.

(And I could go on and on — BPM, KM, CMS, WCM, ECM, LMS, DRM, SCM, PLM, ETL, DI, EII — but I think I’ll stop here with the initials list.)

People are still creating categories today, and sometimes it looks easy. Uber-categories have been quite popular in the past decade as people have focused on different ways of developing and delivering software:

  • SaaS as an uber-category has worked well, with a variety offerings in various SaaS sub-categories (e.g., Salesforce, NetSuite)
  • Appliances have done pretty much the same thing — i.e., offering an appliance alternative for a wide variety of existing categories (e.g., a data warehouse appliance a la Netezza)
  • Open source has also done the same thing — again serving as a different flavor/dimension for a wide variety of largely existing software categories.

Only a few genuinely new categories have emerged, virtualization being the most obvious example. (Though you could argue that virtualization is itself an uber-category covering storage virtualization, server virtualization, et cetera.)

Companies are still working to carve new categories, particularly in the database market:

Sometimes vendors and/or the analysts who cover them try to impose either a straight name change (e.g., from MD-DBMS to OLAP) or a strategic shift (e.g., from BI to analytic applications) in category. Sometimes they’re just bored. Sometimes a vendor’s trying to redefine the market in line with its strengths. Sometimes an analyst is trying to make his/her mark on the industry and earn the coveted “father/mother of [category name],” much as Howard Dresner successfully did with BI.

BI got bored with its name several times during my tenure at Business Objects. At one point both the analysts and Informatica were trying to re-dub the category “analytic applications” in an attempt to get a fresh name and raise the abstraction level from tools to applications. Informatica nearly died on that hill.

Later, analysts tried to redefine the category, dubbing it corporate performance management (CPM), and arguing that business intelligence needed to link with financial planning systems. While knowing actuals is good, knowing actuals compared to the plan is better, and using actuals to drive the future plan better still. Cognos nearly tripped over itself repositioning around the CPM, ultimately acquiring Adaytum, which in turn lead to SRC’s eventual acquisition by Business Objects.

In an art-imitates-life sort of way, one wonders if the analysts predicted a move in the market or provoked it? My chips are on the latter.

This stream-of-consciousness is a long way of winding up to a single question: are enterprise search vendors successfully repositioning themselves as “information access platforms” or not?

Background: the enterprise-search-related vendors (e.g., Fast/Microsoft, Endeca) and search/content analysts who cover them are in the midst of an attempted category repositioning:

  • The word “enterprise search” is now seemingly dead, having been contaminated by the Google Appliance. When a shark gets in the water, all the fish jump out.
  • The word “information” is increasingly being used as a unifying term to describe both data and content (aka, unstructured data)
  • Enterprise search vendors are increasingly calling themselves “information access platforms” (though not generally abbreviated as IAP, I will do so here for brevity).

For example, consider Endeca’s corporate boilerplate:

Endeca’s innovative information access software that helps people explore, analyze, and understand complex information, guiding them to unexpected insights and better dec
isions. The Endeca Information Access Platform, built around a new class of access-optimized database, powers applications that combine the ease of searching and browsing with the analytical power of business intelligence.

I have a number of concerns on and related to this attempted shift:

  • The important thing about categories is that they exist in the mind of the customer. Analysts and vendors can try to put them there — but they have to stick. In my mind, IAP is not sticking. I have never heard a customer say: “I need to go out and get an IAP.”
  • I do, however, believe that “information” might well stick as an overall term, meaning both data and content (aka, structured and unstructured data).
  • It is not clear to me why someone who desires a unified platform for “information” would turn to a search vendor. Search engines were designed as read-only indexes to help people find documents containing tokens; hardly ideal as an application development platform.
  • In my estimation, someone managing “special” data should turn to a database vendor. While databases have classically not handled “special” data well, databases were designed as application platforms, and there is a whole new class of specialized databases emerging for handling various “special” types of data.
  • While I think a unified platform is a dandy vision, I think no one is close to delivering a unified platform that handles all types of data equally well. Bolting Lucene and MySQL together isn’t a platform. Relational databases still do a poor job with both content and many types of data (e.g., sparse, hierarchical, or semi-structured). XML servers (like MarkLogic) handle XML brilliantly, but need work before they can match RDBMSs at classical relational data.
  • I believe that someone who needs a crawl-and-index the intranet value proposition should use the Google Appliance; so I think the search vendors are correct in their desire to flee, I don’t think that “information access platform” is a good refuge.

Overall, my chips remain on the don’t come line for the attempted category repositioning from “enterprise search” to “information access platform.” You can find my stack on the come line for the emerging “special-purpose database” category and “XML servers” as an instance of them.

Kawaski Interviews Balmer at Mix08

Check out this post with notes from the Guy Kawaski keynote interview with Steve Balmer at the Mix08 conference this week in Las Vegas.

In the interview Balmer talks about Google, the Yahoo! deal, Apple, his three types of day [see below], Silverlight, the Facebook investment, Fast Search & Transfer [see below], the number of emails he gets per day (~60), and he even gives out his email address:

On his three types of day:

  1. With customers. From 730 AM to 800 PM and then get on [private] plane to next city.
  2. Doctor in office. Wall to wall meetings all day. “Exhausting.”
  3. Think, write, and research.

On Fast Search & Transfer:

Fast is company had internet and website/corporate products. Sold off web search. They have great for high end search on enterprise and engines that can search web sites. Tech fantastic and team is great. Anxious to build both ways. Love company/people. Great integration plan – more to say.

This is consistent with my thesis for why Microsoft bought Fast (to fend off the Google Appliance in high-end enterprise search, aka, the best defense is a good offense). However, I’d not previously heard the message that they want to build Fast out “both ways” — i.e., in enterprise search and in their Internet search offerings.

The only part of the acquisition that continues to amaze me is the ~8x revenue-run-rate price. That kind of multiple is in-line for high flyers, i.e., for healthy, high growth enterprise software companies. But Fast was in the midst of unwinding a world-class accounting mess, complete with lots of AR write-offs and a revenue restatement. I’d think companies in that situation are usually lucky to trade for 1-2x revenues.

Much as the the price SAP paid for Business Objects wasn’t surprising until you noticed that Business Objects was about to announce a quarterly miss, nor is Microsoft’s price for Fast surprising until you consider the not so easy to overlook financial mess. Personally, I would have guessed a sale in the $300M to $500M price range, proving that I’m not always right.

My current speculation is that there must have been a bidding war for the price to get so high. The fun question then becomes who else was bidding, why did they want it so bad, and what are they going to do now that they’ve lost?