
Internet Search: The Reality of Link-Buying and Comment Spam

Google search today has, in my opinion, degenerated roughly to where keyword search was a decade ago. Most searches, particularly those with commercial intent, have been search-engine-optimized, spammed, link-farmed, or content-farmed to the point of uselessness.

As Michael Arrington succinctly put it:  Search Still Sucks.  I’d actually quibble with the “still” — it’s taken a decade of cat-and-mouse to make Google as bad today as AltaVista was in 2000.

One of the many reasons search has degenerated is link-buying.  One of the benefits of running a blog is that you get to see tactics like link-buying and comment spam first-hand.  In this post, I thought I’d share that first-hand look.

Here is an email I received today which is an example of link buying.

That’s it. If you write a post and link to my client, I’ll pay you. It can’t be easy for Google to algorithmically distinguish the links I’ve put in naturally from the ones I’ve been paid to insert. It’s not obvious that doing so perfectly is even possible, though getting close probably is.

For comment spam, here is what the comment dashboard looks like in my blog, which is powered by WordPress.

Since Google is all about inbound links, comment spammers either load their comments up with links (see last entry above) or enter a seemingly innocuous text comment with a blog/web address that is the link they’re promoting (see Minh’s entry).
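A crude rule catches the first tactic but not the second. Below is a minimal Python sketch of a link-count filter. It is loosely modeled on WordPress’s discussion setting that holds a comment for moderation once it contains a given number of links. The URL regex, the threshold of 2, the helper names, and the sample comments are all illustrative assumptions. Real filters such as Akismet look at far more than link counts.

```python
import re

# Crude link detector (an assumption): count http:// or https:// prefixes in the text.
URL_PATTERN = re.compile(r"https?://", re.IGNORECASE)

def count_links(text):
    """Return the number of URLs appearing in a comment body."""
    return len(URL_PATTERN.findall(text))

def hold_for_moderation(body, max_links=2):
    """Hypothetical rule: hold a comment if its body has max_links or more URLs."""
    return count_links(body) >= max_links

# Tactic 1: a comment loaded up with links.
link_stuffed = ("Nice post! http://pills.example.com "
                "http://casino.example.com http://loans.example.com")

# Tactic 2: an innocuous body; the promoted link lives in the author URL field,
# which this body-only rule never inspects.
innocuous = "Thanks, this was a really useful article."
innocuous_author_url = "http://cheap-widgets.example.com"

print(hold_for_moderation(link_stuffed))  # → True (3 links in the body)
print(hold_for_moderation(innocuous))     # → False (the spam link slips through)
```

The second case explains why platform-level filters need signals beyond the comment text itself.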

The amazing thing about comment spam is the volume. My blog has had 4,600 spam comments in the past 60 days, roughly 77 a day. I believe these are much easier to detect than purchased links, particularly for the blogging platform if not for the search engine. Even so, the volume is certainly impressive. Note that since WordPress bundles Akismet, all of these spam comments were picked off before Google had to deal with them. But I’m sure that for plenty of blogs that’s not the case.

If you look at the history of search and spam, it’s pretty simple:

Phase 1:  keyword frequency.  Rank pages by the TF/IDF of search keywords.   Spammers then quickly discover how to load pages and/or tags with keywords to inflate their rank.
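To make Phase 1 concrete, here is a minimal Python sketch of TF/IDF ranking. It shows why keyword stuffing beats this approach. It assumes a toy three-document corpus, whitespace tokenization, and a plain sum of per-term TF × log(N/df) weights. Real engines used many variants of this formula, and the page texts and query here are invented for illustration.

```python
import math
from collections import Counter

def tf_idf_score(query, doc, corpus):
    """Score `doc` for `query` as the sum of the query terms' TF-IDF weights."""
    words = doc.lower().split()
    if not words:
        return 0.0
    counts = Counter(words)
    n_docs = len(corpus)
    score = 0.0
    for term in query.lower().split():
        tf = counts[term] / len(words)  # term frequency: share of the doc's words
        df = sum(1 for d in corpus if term in d.lower().split())  # docs containing term
        idf = math.log(n_docs / df) if df else 0.0  # rarer terms weigh more
        score += tf * idf
    return score

honest = "review of the best budget laptops for students this year"
stuffed = "cheap laptops cheap laptops buy cheap laptops best cheap laptops"
other = "recipes for quick weeknight dinners"
corpus = [honest, stuffed, other]

query = "cheap laptops"
for name, doc in [("honest", honest), ("stuffed", stuffed)]:
    print(name, round(tf_idf_score(query, doc, corpus), 3))
```

Repeating the query terms raises their term frequency directly, so the stuffed page outranks the genuinely useful review. This is the weakness spammers exploited.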

Phase 2:  inbound link frequency and authority. Rank pages by the number and authority of inbound links, where pages that themselves have lots of inbound links have higher authority than those that don’t. Spammers slowly discover the techniques described above (link buying, comment spam, link farms) and eventually beat this as well.
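Phase 2 is essentially the idea behind PageRank. Below is a minimal power-iteration sketch in Python. It uses a damping factor of 0.85, and dangling pages spread their rank evenly. The small web graph and page names are invented. The example shows how a hypothetical link farm of ten throwaway pages lifts an otherwise unlinked "spam-shop" above legitimate pages. Google’s production ranking uses many more signals, so treat this as an illustration of the principle, not of Google’s actual algorithm.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Power-iteration PageRank over a dict of {page: [pages it links to]}."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page gets the "random jump" share of rank.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its authority evenly among the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Dangling page (no outbound links): spread its rank over all pages.
                for target in pages:
                    new_rank[target] += damping * rank[page] / n
        rank = new_rank
    return rank

# A tiny, made-up web: three legitimate pages plus an unlinked spam shop.
web = {
    "news": ["good-shop", "blog"],
    "blog": ["news", "good-shop"],
    "good-shop": ["news"],
    "spam-shop": [],
}
before = pagerank(web)

# Hypothetical link farm: ten throwaway pages that all link to spam-shop.
farmed = dict(web)
for i in range(10):
    farmed[f"farm{i}"] = ["spam-shop"]
after = pagerank(farmed)

print(f"spam-shop before: {before['spam-shop']:.3f}, after: {after['spam-shop']:.3f}")
print(f"good-shop after:  {after['good-shop']:.3f}")
```

The farm pages have no authority of their own, but together they funnel their baseline rank into one target. Buying links on real blogs works the same way, and it is harder to spot.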

I believe the world is strongly in need of a phase 3 approach and I suspect it will involve curation.  Consider some more of Arrington’s comments:

Yes, search is very hard. But Silicon Valley is really good at doing hard things. The real problem right now is that there’s a perception that Google is untouchable in search. When a venture capitalist sees a pitch from a new search startup all they can think about is the Cuil debacle. And since venture capitalists are just about the most risk averse people in Silicon Valley, the funds just don’t flow.

But all the evidence suggests otherwise. Demand Media is worth $1.6 billion, and their entire business is based on pushing cheap, useless content into Google to get a few stray links. If Google was good at search, Demand Media wouldn’t exist. And Bing wouldn’t be making solid gains in search market share. And JC Penney wouldn’t be able to massively game search results for a few months, during the holiday season, without getting caught until months later.

We need to see a real competitor emerge in search. If only because it will make Google up its game, and make all of us a lot happier.

This is one reason I’m watching Blekko.  While I’m not in love with the way they currently do curation (i.e., slashtags), I do believe that they are focusing on the right core concept.  For more information on Blekko, you can read this TechCrunch article to which, I should probably say, I linked by choice and not for profit.

Coming Friday 1/29/10: Kellblog!

As I have discussed a few times in the past, I want to rename the Mark Logic CEO Blog in order to accomplish a few goals:

  • Get a shorter, pithier name that will be easier for people to write and talk about
  • Get a more normal, blog-like name that will hopefully increase citations and inbound links
  • Get a name that better reflects the content of the blog. While the blog certainly contains some pro-Mark-Logic posts, the majority of the content is not typical “corporate blog” fodder

Toward these ends, I am pleased to announce that on Friday, January 29th, 2010, the Mark Logic CEO blog will become Kellblog.

  • Site readers will automatically be redirected to the new domain: www.kellblog.com
  • Feed subscribers using the proper Feedburner feed will need to do nothing, since — for the time being — the feed address will remain http://feeds.feedburner.com/marklogic. (At some future point, we’ll switch the feed, but we have plenty of other work to do first.)
  • On Friday, February 12th, 2010, I intend to cut over to a fresher, crisper, simpler design to give the blog a new, more contemporary look.
  • I also intend to switch work-related tweets to a new account @kellblog, as opposed to my original Twitter account @ramblingman, from which I no longer expect to tweet. So, please follow @kellblog right now!

Highlights from 2Q09 Software Equity Group Report

I’m not sure which better explains my recent decrease in blog post frequency: bit.ly or being out of the office. Either way, I wasn’t kidding a few weeks ago when I said I was changing my sharing pattern. Much as popular business authors take one good idea and inflate it into a book, I now realize (thanks to bit.ly) that I have been taking what could have been one good tweet and inflating it into a blog post. While I’ve not drawn any definitive conclusions, thus far I’d say I’m sharing many more articles with significantly less effort than before.

Going forward, my guess is that steady state will be ~2 posts/week (instead of ~5), but those posts will be supplemented by 5-10 tweets/day (RSS feed here). Because of this, I’ve added the Tweet Blender widget to my home page, made it quite large, and set it up to include not only my direct tweets (@ramblingman) but all tweets that include the word ramblingman, to catch re-tweets and such. This will probably result in the inclusion of odd items from time to time (apologies if anything offensive comes up), and if this becomes a problem I’ll change the setup.

I’ve re-enabled Zemanta, which I had turned off for several quarters because I found it too slow to justify its value. They’ve put out a new release, and since I’m interested in all things vaguely semantic web, I figured I’d give it another try. Finally, I’m still considering renaming the blog to either Kellblog or Kellogic, but doing so is a daunting project (think of all the links that break) that I’m not ready to tackle yet. So, watch this space.

The purpose of this post, however, is to present highlights from the Software Equity Group’s 2Q09 Software Industry Equity Report. Here they are:

  • Consensus IT spending forecasts for 2009 predict an 8% decrease in overall spending
  • Top five CTO spending priorities from the Goldman Sachs 3/09 survey: cost reduction, disaster recovery, server virtualization, server consolidation, data center consolidation
  • The SEG software index had a 23.7% positive return, bouncing back from a decline in 1Q09
  • Median enterprise value (EV) / sales = 1.4x, up from 1.2x the prior quarter
  • Median EV/EBITDA = 9.4x, up from 7.7x the prior quarter
  • Median EBITDA margin = 14.9%
  • Median net income margin = 3.9%
  • Median TTM revenue growth = 5.2%
  • Baidu and SolarWinds topped the EV/sales charts with values of 16.2x and 10.0x revenues, respectively
  • The great software arbitrage continues: companies with >$1B in revenues have a median EV/sales of 2.2x, while those with <$100M have a median of 0.7x. This theoretically means that the median big company can buy a median small one and triple its value overnight.
  • Database companies’ median EV/sales was 1.8x
  • Document/content management companies’ median EV/sales was 2.4x
  • Median SaaS vendor EV/sales was 2.6x, suggesting that $1 of SaaS revenue is worth $1.70 of perpetual revenue. (Though I worry that the overall figure includes SaaS vendors, so this could be understating the premium.)
  • Four software companies went public in 2Q09, raising a median of $182M, with a median EV of $814M, EV/revenue of 3.6x, and first-day return of 17.3%
  • Five companies remain in the IPO pipeline with median revenues of $58.7M, net income of -$2.2M, and growth of 46.4%
  • 285 software M&A deals were done in the quarter, with $3.1B in total value, down from 296 deals worth $7.3B in the prior quarter. (This quarter’s total value was the lowest in the past 13 quarters.)
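The "triple its value" claim in the arbitrage bullet is just the ratio of the two multiples: 2.2 / 0.7 ≈ 3.1. Below is a minimal Python sketch of that arithmetic. It uses a hypothetical $50M-revenue target and makes two simplifying assumptions. First, the acquirer pays exactly the small-company multiple of 0.7x, with no takeover premium. Second, the market then values the acquired revenue at the big-company multiple of 2.2x.

```python
def arbitrage_uplift(small_multiple, big_multiple):
    """Value multiple gained when revenue bought at small_multiple is re-rated at big_multiple."""
    return big_multiple / small_multiple

# EV/sales multiples quoted in the report highlights above.
SMALL_CO_EV_SALES = 0.7   # companies with < $100M revenue
BIG_CO_EV_SALES = 2.2     # companies with > $1B revenue

revenue = 50.0  # hypothetical target revenue, $M
price_paid = revenue * SMALL_CO_EV_SALES          # assumes no takeover premium
value_inside_big_co = revenue * BIG_CO_EV_SALES   # same revenue at the acquirer's multiple

print(f"Pay ${price_paid:.0f}M, own ${value_inside_big_co:.0f}M of value "
      f"({arbitrage_uplift(SMALL_CO_EV_SALES, BIG_CO_EV_SALES):.2f}x)")
```

In practice, takeover premiums and integration costs eat into this gap. That is why the bullet says "theoretically."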