Category Archives: Search Engines

Internet Search: The Reality of Link-Buying and Comment Spam

Google search today has, in my opinion, degenerated roughly to the point of keyword search a decade ago.  Most searches, particularly those with commercial intent, have been search-engine-optimized, spammed, link-farmed, or content-farmed to the point of uselessness.

As Michael Arrington succinctly put it:  Search Still Sucks.  I’d actually quibble with the “still” — it’s taken a decade of cat-and-mouse to make Google as bad today as AltaVista was in 2000.

One of the many reasons search has degenerated is link-buying.  One of the benefits of running a blog is that you get to see tactics like link-buying and comment spam first-hand.  In this post, I thought I’d share that first-hand look.

Here is an email I received today which is an example of link buying.

That’s it.  If you write a post and link to my client, I’ll pay you.  It can’t be easy for Google to figure out algorithmically which links I’ve put in naturally and which ones I’ve been paid to insert.  It’s not obvious that it’s even possible, though getting close probably is.

For comment spam, here is what the comment dashboard looks like in my blog, which is powered by WordPress.

Since Google is all about inbound links, comment spammers either load their comments up with links (see last entry above) or enter a seemingly innocuous text comment with a blog/web address that is the link they’re promoting (see Minh’s entry).

The amazing thing about comment spam is the volume.  My blog has had 4,600 spam comments in the past 60 days.  While I believe these are much easier to detect than purchased links (particularly for the blogging platform, if not the search engine), the volume is certainly impressive.  Note that since WordPress bundles Akismet, all of these spam comments were picked off before Google had to deal with them.  But I’m sure that for plenty of blogs that’s not the case.

If you look at the history of search and spam, it’s pretty simple:

Phase 1:  keyword frequency.  Rank pages by the TF/IDF of search keywords.   Spammers then quickly discover how to load pages and/or tags with keywords to inflate their rank.
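To make Phase 1 concrete, here is a minimal Python sketch of textbook TF-IDF scoring (term frequency times log inverse document frequency, summed over the query terms). It is an illustration under that assumed formula, not any engine’s actual ranking function, and the three pages are made up.

```python
import math
from collections import Counter


def tf_idf_scores(query, docs):
    """Score each doc by the summed TF-IDF weight of the query terms.

    Textbook formulation (an assumption, not any engine's real ranking):
    tf = term count / doc length, idf = log(N / document frequency).
    """
    tokenized = {doc_id: text.lower().split() for doc_id, text in docs.items()}
    n_docs = len(tokenized)

    # Document frequency: number of docs containing each term at least once.
    df = Counter()
    for tokens in tokenized.values():
        df.update(set(tokens))

    scores = {}
    for doc_id, tokens in tokenized.items():
        counts = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            if counts[term] == 0:
                continue  # a term absent from this doc contributes nothing
            tf = counts[term] / len(tokens)
            idf = math.log(n_docs / df[term])
            score += tf * idf
        scores[doc_id] = score
    return scores


# Hypothetical pages: one honest, one keyword-stuffed, one off-topic.
docs = {
    "honest": "our shop sells cheap flights and hotel deals with honest reviews",
    "stuffed": "cheap flights cheap flights cheap flights book cheap flights now",
    "other": "a recipe blog about bread and soup",
}
scores = tf_idf_scores("cheap flights", docs)
for doc_id, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{doc_id}: {score:.3f}")
```

The stuffed page scores more than four times higher than the honest one purely by repeating the query terms, which is exactly the weakness spammers exploited.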

Phase 2:  inbound link frequency and authority.  Rank pages by the number and authority of inbound links.  Pages that themselves have lots of inbound links have higher authority than those that don’t.  Spammers slowly discover the aforementioned techniques to eventually beat this as well.
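Phase 2 is, roughly, the idea behind PageRank. Below is a minimal Python sketch of the classic published PageRank power iteration, which I’m using as a stand-in; Google’s production ranking is far more involved and not public. The pages and link structure are hypothetical, and “farm” pages play the role of bought links.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Classic PageRank power iteration.

    links maps each page to the list of pages it links to. Each round, every
    page gets a small base share, and each page passes the rest of its rank,
    split evenly, to the pages it links to.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Dangling page (no outbound links): spread its rank evenly.
                for target in pages:
                    new_rank[target] += damping * rank[page] / n
        rank = new_rank
    return rank


# Hypothetical mini-web: shop_a earns an extra editorial link from a blog.
honest_web = {
    "news": ["shop_a", "shop_b"],
    "blog": ["shop_a"],
    "shop_a": ["news"],
    "shop_b": ["news"],
}
honest = pagerank(honest_web)

# shop_b buys links from ten throwaway "farm" pages.
spammed_web = dict(honest_web)
for i in range(10):
    spammed_web[f"farm{i}"] = ["shop_b"]
spammed = pagerank(spammed_web)

print(f"honest:  shop_a={honest['shop_a']:.3f} shop_b={honest['shop_b']:.3f}")
print(f"spammed: shop_a={spammed['shop_a']:.3f} shop_b={spammed['shop_b']:.3f}")
```

Each farm page is nearly worthless on its own, but ten of them pointing at shop_b are enough to push it past shop_a. Real link schemes are subtler than this, and countering them is the cat-and-mouse game described above.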

I believe the world is strongly in need of a Phase 3 approach and I suspect it will involve curation.  Consider some more of Arrington’s comments:

Yes, search is very hard. But Silicon Valley is really good at doing hard things. The real problem right now is that there’s a perception that Google is untouchable in search. When a venture capitalist sees a pitch from a new search startup all they can think about is the Cuil debacle. And since venture capitalists are just about the most risk averse people in Silicon Valley, the funds just don’t flow.

But all the evidence suggests otherwise. Demand Media is worth $1.6 billion, and their entire business is based on pushing cheap, useless content into Google to get a few stray links. If Google was good at search, Demand Media wouldn’t exist. And Bing wouldn’t be making solid gains in search market share. And JC Penney wouldn’t be able to massively game search results for a few months, during the holiday season, without getting caught until months later.

We need to see a real competitor emerge in search. If only because it will make Google up its game, and make all of us a lot happier.

This is one reason I’m watching Blekko.  While I’m not in love with the way they currently do curation (i.e., slashtags), I do believe that they are focusing on the right core concept.  For more information on Blekko, you can read this TechCrunch article to which, I should probably say, I linked by choice and not for profit.

Quick Take on the Dassault Systèmes Acquisition of Exalead

Today, in what I consider a surprising move, French PLM and CAD vendor Dassault Systèmes announced the acquisition of French enterprise search vendor Exalead for €135M or, according to my calculator, $161M.  Here is my quick take on the deal:

  • While I don’t have precise revenue figures, my guess is that Exalead was aiming at around $25M in 2010 revenues, putting the price/sales multiple at 6.4x current-year sales, which strikes me as pretty good given what I’m guessing is around a 25% growth rate.  (This source says $21M in software revenue, though the year is unclear and it’s not clear if software means software-license or software-related.  This source, which I view as quite reliable, says $22.7M in total revenue in 2009 and implies around 25% growth.  Wikipedia says €15.5M in 2008 revenues, which equals exactly $22.7M at the average exchange rate.  This French site says €12.5M in 2008 revenues.  The Qualis press release — presumably an excellent source — says €14M ($19.5M) in 2009 revenues.  Such is the nature of detective work.)
  • I am surprised that Dassault would be interested in search-based applications, Exalead’s latest focus.  While PLM vendors have always had an interest in content delivery and life-cycle documentation (e.g., a repair person entering feedback on documentation that directly feeds into future product requirements), I’d think they would want to buy an enterprise techpubs/DITA vendor rather than a search vendor to do so, as PTC did with Arbortext in 2005.  Nevertheless, Dassault President and CEO Bernard Charlès said that with Exalead they could build “a new class of search-based applications for collaborative communities.”  There is more information, including a fairly cryptic video which purports to explain the deal, on a Dassault micro-site devoted to the Exalead acquisition, which ends with the phrase:  search-based applications for lifelike experience.  Your guess as to what that means is as good as mine.
  • A French investment firm called SCA Qualis owned 83% of Exalead, having steadily built up its position from 51% in 2005 to 83% in 2008 through successive rounds of €5M, €12M, and €5M in 2005, 2006, and 2008, respectively.  This makes me question CrunchBase’s profile, which says Exalead had raised a total of $15.6M.  (You can count €22M since 2005, and the company was founded in 2000.  I’m guessing there was $40M to $50M invested in total, though some reports make me think it’s twice that.)
  • The prior bullet suggests that Qualis took $133M of the sale price and everybody else split $27M, assuming there were no active liquidation preferences on the Qualis money.
  • Given the European focus, the search focus, and the best-and-brightest angle (Exalead had more than its share of impressive grandes écoles graduates), one wonders why Autonomy didn’t end up owning Exalead, as opposed to a PLM/CAD company.  My guess is Autonomy took a look, but the deal got too pricey for them because they are less interested in paying up for great technology and more interested in buying much larger revenue streams at much lower multiples.  In some sense, Autonomy’s presumed “pass” on this deal is more proof that they are no longer a technology company and instead a CA-like, Oracle-like financial consolidation play.  (By the way, there’s nothing wrong with being a financial play in my view; I just dislike pretending to be one thing when you’re actually another.)
  • One wonders what role, if any, the other French enterprise search vendor, Sinequa, played in this deal.  They, too, have some great talent from France’s famed Ecole Polytechnique, and presumably some nice technology to go along with it.
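For anyone who wants to check my detective work, here is the arithmetic behind the multiple and the split as a small Python sketch.  The $25M revenue figure is my own guess, and the split assumes a straight pro-rata payout with no liquidation preferences; neither is a reported number.

```python
# Inputs from the bullets above. The $25M revenue is my own guess, and the
# split assumes a straight pro-rata payout with no liquidation preferences.
price_eur_m = 135
price_usd_m = 161
est_revenue_usd_m = 25
qualis_stake = 0.83
qualis_rounds_eur_m = {2005: 5, 2006: 12, 2008: 5}

implied_eur_usd = price_usd_m / price_eur_m       # exchange rate behind my conversion
price_to_sales = price_usd_m / est_revenue_usd_m  # multiple on estimated 2010 revenue
qualis_take_usd_m = qualis_stake * price_usd_m    # Qualis's pro-rata share
others_take_usd_m = price_usd_m - qualis_take_usd_m
qualis_invested_eur_m = sum(qualis_rounds_eur_m.values())

print(f"Implied EUR/USD rate:     {implied_eur_usd:.2f}")
print(f"Price/sales multiple:     {price_to_sales:.1f}x")
print(f"Qualis take:              ${qualis_take_usd_m:.1f}M")
print(f"Everyone else:            ${others_take_usd_m:.1f}M")
print(f"Qualis rounds since 2005: EUR {qualis_invested_eur_m}M")
```

The pro-rata split comes out to roughly $133.6M for Qualis versus $27.4M for everyone else, which is where the $133M and $27M figures come from.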

Here are some links to other coverage of the deal:

Why Google Employees Quit

As the bloom comes off the Google rose, you can now see the flip side of many of their once-sacred practices, such as their interviewing process and academic elitism.

I’m not a Google fan (for the record, I always hated “don’t be evil”), but nor am I a detractor. My biggest problem with Google is the seeming lack of self-awareness of many of its employees. I’m not opposed to Google-isms; in fact, I agree violently with some of them (e.g., intelligence matters in software).

I’m sharing this post mostly to balance the mainstream press, which worshiped all Google practices as best practices when times were good and then forgot to update us when the tide went out and, as Warren Buffett once put it, you could see who was swimming naked.

I believe in strong culture and I know that Google has one. The trick, in my opinion, is looking out for the downside of a strong culture, because there always is one. The emails below help paint a picture of that downside.

My take on Google has always been:

  • One-trick pony
  • Which has spent literally billions in experimental R&D — in an organic model that I like
  • But has nothing to show for it

I remember once watching a panel of thirty-something, first-100-in ex-Googlers rather condescendingly lecture about innovation best practices and thinking simply: despite literally billions in investment, you’ve never come up with a successful business innovation since the first two (search keyword ads and contextual ads) and, what’s worse, you don’t even seem to know it. Then again, when the world’s best business model is your first trick, it’s pretty hard to come up with a second one, and if you were in early enough, heck, you don’t need to.

But enough of my rambling. The purpose of this post was to link you over to TechCrunch where you can see, in their own words, why Google employees quit.

A few excerpts:

As I was saying. Google actually celebrates its hiring process, as if its ruthless inefficiency and interminable duration were a sure proof of thoroughness, a badge of honor. Perhaps it is thorough. But I would be willing to wager that Microsoft’s hiring process, which takes a fraction of the time, does not result in a lower-skilled workforce or result in a higher rate of attrition. And let me say this: if Larry Page is still reviewing resumes, shareholders should organize a rebellion. That is a scandalous waste of time for someone at that level, and the fact that it’s “quirky” is no mitigation.

What was strange with me at Google was: while outside, I had all these big ideas I could do if I ever worked there. Once inside, you have 18,000 (at the time, Feb 2008) other googlers thinking the same things.

I wonder if post-Google bitterness is correlated to when you joined and/or how long you were at Google. It seems that it is. Maybe it’s the memories of Google in the first few years I was there that make it seem magical, but I really do treasure the time I spent at Google. I left a few weeks ago, after almost 5 years at the company, because I wanted to pursue a markedly different career path. Sure, I had times when I was frustrated with the way Google was doing things, or when I felt that my particular project, or assignment was lacking, and I definitely had managers that I didn’t enjoy. But all in all — what a freakin’ amazing experience!

Google was my first job out of college. I was an English major at a prestigious college and was hired to work in HR. That is one of the problems I had with Google right there – is it really necessary to hire Ivy League graduates to process paperwork? I went from reading Derrida to processing “Status Change Request Forms” for X employees to go on paid leave. The term “Status Change Request Form” will forever haunt me.

Those of us who failed to thrive at Google are faced with some pretty serious questions about ourselves. Just seeing that other people ran into the same issues is a huge relief. Google is supposed to be some kind of Nirvana, so if you can’t be happy there how will you ever be happy? It’s supposed to be the ultimate font of technical resources, so if you can’t be productive there how will you ever be productive?

Fun Google Parody Video: Complexity is Good

I stumbled into this video while reading Stephen Arnold’s recent post, Google Search Appliance: Showing Some Fangs. In the post, Stephen offers a pretty comprehensive look at the Google search appliance (GSA) prompted, I believe, by a new release that includes features such as personalized search results, alerts, and broader language support.

If you’re interested in the new features, see this video here.

If you want to have some fun, check out this video which portrays Google’s view of a typical enterprise search software sale, complete with the cheesy salesperson.

As I’ve repeatedly maintained (e.g., 1, 2, 3, and 4), I think the GSA is going to consume the “crawl and index the intranet” segment of the search market, pushing classical enterprise search vendors up-market, and eventually into an un-winnable conflict with DBMS vendors.

Thoughts on Category Creation and Information Access Platforms [Revised]

[Revised 8/2/08; still working on cleaning up this consciousness stream.]

Back in the old days, it seemed easy to create a category in software. Look at the database market, for example:

  • IBM invents the relational DBMS (RDBMS) category
  • Oracle, Ingres, and Informix enter in a largely undifferentiated way, though Informix eventually drifts towards the low-end/cheap segment
  • Sybase creates the derivative category of high-performance OLTP RDBMS.
  • Arbor re-christens the failed multi-dimensional DBMS as the OLAP Server
  • Tandem creates the non-stop RDBMS with its superb fault tolerance
  • Illustra launches the universal DBMS and is quickly acquired by Informix
  • Sybase launches the bitmap-indexed DBMS with SybaseIQ
  • Teradata launches the data-warehouse DBMS category

And you can find just as many examples outside database-land.

  • ASK defines the manufacturing resource planning (MRP) category
  • SAP hijacks MRP, redefines it as ERP, and goes on to become the world’s largest applications software company
  • PeopleSoft invents the HRMS category
  • Gartner Group’s Howard Dresner invents the business intelligence (BI) category, re-christening and re-framing what was formerly known as DSS or EIS.
  • Siebel pioneers the sales force automation (SFA) category
  • Scopus pioneers call center automation (CCA)
  • Companies like Rubric pioneer enterprise marketing automation (EMA)
  • Siebel, through acquisition, coalesces SFA, CCA, and EMA into a single category called customer relationship management (CRM)
  • Oracle and SAP work to coalesce CRM back into ERP. Such is the ebb and flow of categories.

(And I could go on and on — BPM, KM, CMS, WCM, ECM, LMS, DRM, SCM, PLM, ETL, DI, EII — but I think I’ll stop here with the initials list.)

People are still creating categories today, and sometimes it looks easy. Uber-categories have been quite popular in the past decade as people have focused on different ways of developing and delivering software:

  • SaaS as an uber-category has worked well, with a variety of offerings in various SaaS sub-categories (e.g., Salesforce, NetSuite)
  • Appliances have done pretty much the same thing — i.e., offering an appliance alternative for a wide variety of existing categories (e.g., a data warehouse appliance a la Netezza)
  • Open source has also done the same thing — again serving as a different flavor/dimension for a wide variety of largely existing software categories.

Only a few genuinely new categories have emerged, virtualization being the most obvious example. (Though you could argue that virtualization is itself an uber-category covering storage virtualization, server virtualization, et cetera.)

Companies are still working to carve new categories, particularly in the database market:

Sometimes vendors and/or the analysts who cover them try to impose either a straight name change (e.g., from MD-DBMS to OLAP) or a strategic shift (e.g., from BI to analytic applications) in category. Sometimes they’re just bored. Sometimes a vendor’s trying to redefine the market in line with its strengths. Sometimes an analyst is trying to make his/her mark on the industry and earn the coveted “father/mother of [category name],” much as Howard Dresner successfully did with BI.

BI got bored with its name several times during my tenure at Business Objects. At one point both the analysts and Informatica were trying to re-dub the category “analytic applications” in an attempt to get a fresh name and raise the abstraction level from tools to applications. Informatica nearly died on that hill.

Later, analysts tried to redefine the category, dubbing it corporate performance management (CPM), and arguing that business intelligence needed to link with financial planning systems. While knowing actuals is good, knowing actuals compared to the plan is better, and using actuals to drive the future plan better still. Cognos nearly tripped over itself repositioning around CPM, ultimately acquiring Adaytum, which in turn led to SRC’s eventual acquisition by Business Objects.

In an art-imitates-life sort of way, one wonders: did the analysts predict a move in the market or provoke it? My chips are on the latter.

This stream-of-consciousness is a long way of winding up to a single question: are enterprise search vendors successfully repositioning themselves as “information access platforms” or not?

Background: the enterprise-search-related vendors (e.g., Fast/Microsoft, Endeca) and search/content analysts who cover them are in the midst of an attempted category repositioning:

  • The term “enterprise search” is now seemingly dead, having been contaminated by the Google Appliance. When a shark gets in the water, all the fish jump out.
  • The word “information” is increasingly being used as a unifying term to describe both data and content (aka, unstructured data)
  • Enterprise search vendors are increasingly calling themselves “information access platforms” (though not generally abbreviated as IAP, I will do so here for brevity).

For example, consider Endeca’s corporate boilerplate:

Endeca’s innovative information access software that helps people explore, analyze, and understand complex information, guiding them to unexpected insights and better decisions. The Endeca Information Access Platform, built around a new class of access-optimized database, powers applications that combine the ease of searching and browsing with the analytical power of business intelligence.

I have a number of concerns about, and related to, this attempted shift:

  • The important thing about categories is that they exist in the mind of the customer. Analysts and vendors can try to put them there — but they have to stick. In my mind, IAP is not sticking. I have never heard a customer say: “I need to go out and get an IAP.”
  • I do, however, believe that “information” might well stick as an overall term, meaning both data and content (aka, structured and unstructured data).
  • It is not clear to me why someone who desires a unified platform for “information” would turn to a search vendor. Search engines were designed as read-only indexes to help people find documents containing tokens; hardly ideal as an application development platform.
  • In my estimation, someone managing “special” data should turn to a database vendor. While databases have classically not handled “special” data well, databases were designed as application platforms, and there is a whole new class of specialized databases emerging for handling various “special” types of data.
  • While I think a unified platform is a dandy vision, I think no one is close to delivering a unified platform that handles all types of data equally well. Bolting Lucene and MySQL together isn’t a platform. Relational databases still do a poor job with both content and many types of data (e.g., sparse, hierarchical, or semi-structured). XML servers (like MarkLogic) handle XML brilliantly, but need work before they can match RDBMSs at classical relational data.
  • I believe that someone who needs a crawl-and-index-the-intranet value proposition should use the Google Appliance; so while I think the search vendors are right to want to flee, I don’t think that “information access platform” is a good refuge.
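To illustrate what I mean by search engines being read-only token indexes, here is a toy inverted index in Python.  It is a hypothetical sketch, not how any particular vendor’s engine works (real engines add ranking, stemming, and compressed postings), and the documents are made up.  All it can answer is “which documents contain these tokens?”

```python
from collections import defaultdict


def build_index(docs):
    """Build an inverted index: token -> set of ids of docs containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in text.lower().split():
            index[token].add(doc_id)
    return index


def search(index, query):
    """Return ids of docs containing every query token (boolean AND)."""
    postings = [index.get(token, set()) for token in query.lower().split()]
    if not postings:
        return set()
    return set.intersection(*postings)


# Hypothetical intranet documents.
docs = {
    "memo": "quarterly sales plan for emea",
    "faq": "how to file an expense report",
    "spec": "sales commission plan rules",
}
index = build_index(docs)
print(sorted(search(index, "sales plan")))
print(sorted(search(index, "budget")))
```

Note what is missing: typed fields, joins, and transactional updates.  That is roughly the gap between an index built for finding documents and a platform built for developing applications.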

Overall, my chips remain on the don’t come line for the attempted category repositioning from “enterprise search” to “information access platform.” You can find my stack on the come line for the emerging “special-purpose database” category and “XML servers” as an instance of them.