Category Archives: Enterprise Software

The Customer Acquisition Cost (CAC) Ratio: Another Subtle SaaS Metric

The software-as-a-service (SaaS) space is full of seemingly simple metrics that can quickly slip through your fingers when you try to grasp them.  For example, see Measuring SaaS Renewals Rates:  Way More Than Meets the Eye for a two-thousand-word post examining the many possible answers to the seemingly simple question, “what’s your renewal rate?”

In this post, I’ll do a similar examination of the slightly simpler question, “what’s your customer acquisition cost (CAC) ratio?”

I write these posts, by the way, not because I revel in the detail of calculating SaaS / cloud metrics, but rather because I cannot stand when groups of otherwise very intelligent people have long discussions based on ill-defined metrics.  The first rule of metrics is to understand what they are and what they mean before entertaining long discussions and/or making important decisions about them.  Otherwise you’re just counting angels on pinheads.

The intent of the CAC ratio is to determine the cost associated with acquiring a customer in a subscription business.  When trying to calculate it, however, there are six key issues to consider:

  • Months vs. years
  • Customers vs. dollars
  • Revenue on top vs. bottom
  • Revenue vs. gross margin
  • The cost of customer success
  • Time periods of S&M

Months vs. Years

The first question — which relates not only to CAC but also to many other SaaS metrics — is whether your business is inherently monthly or annual.

Since the SaaS movement started out with monthly pricing and monthly payments, many SaaS businesses conceptualized themselves as monthly and thus many of the early SaaS metrics were defined in monthly terms (e.g., monthly recurring revenue, or MRR).

While for some businesses this undoubtedly remains true, for many others – particularly in the enterprise space – the real rhythm of the business is annual.  Salesforce.com, the enterprise SaaS pioneer, figured this out early on as customers actually encouraged the company to move to an annual rhythm, for among other reasons, to avoid the hassle associated with monthly billing.

Hence, many SaaS companies today view themselves as in the business of selling annual subscriptions and talk not about MRR, but ARR (annual recurring revenue).

Customers vs. Dollars

If you ask some cloud companies their CAC ratio, they will respond with a dollar figure – e.g., “it costs us $12,500 to acquire a customer.”  Technically speaking, I’d call this customer acquisition cost, and not a cost ratio.

There is nothing wrong with using customer acquisition cost as a metric and, in fact, the more your business is generally consistent and the more your customers resemble each other, the more logical it is to say things like, “our average customer costs $2,400 to acquire and pays us $400/month, so we recoup our customer acquisition cost in six months.”

However, I believe that in most SaaS businesses:

  • The company is trying to run a “velocity” and an “enterprise” model in parallel.
  • The company may also be trying to run a freemium model (e.g., with a free and/or a low-price individual subscription).

Ergo, your typical SaaS company might be running three business models in parallel, so wherever possible, I’d argue that you want to segment your CAC (and other metric) analysis.

In so doing, I offer a few generic cautions:

  • Remember to avoid the easy mistake of taking “averages of averages,” which is incorrect because it does not weight the various businesses by size.
  • Remember that in a bi-modal business, the average of the two real businesses represents a fictional mathematical middle.

[Table: months to recoup CAC by segment, illustrating the average-of-averages problem]

For example, the “weighted avg” column above is mathematically correct, but it contains relatively little information.  In the same sense that you’ll never find a family with 1.8 children, you won’t find a customer with $12.7K in revenue/month.  The reality is not that the company’s average months to recoup CAC is a seemingly healthy 10.8 – the reality is the company has one very nice business (SMB) where it takes only 6 months to recoup CAC and one very expensive one where it takes 30.  How you address the 30-month CAC recovery is quite different from how you might try to squeeze a month or two out of the 10.8.
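To make the trap concrete, here is a sketch with hypothetical segment numbers (mine, chosen only so they reproduce the 6-month, 30-month, and 10.8-month figures above):

```python
# Hypothetical two-segment SaaS business (illustrative numbers only).
segments = {
    "SMB":        {"customers": 80, "mrr_per_customer": 2_000,  "cac_per_customer": 12_000},
    "Enterprise": {"customers": 20, "mrr_per_customer": 55_500, "cac_per_customer": 1_665_000},
}

# Per-segment months to recoup CAC: the two real businesses.
for name, s in segments.items():
    months = s["cac_per_customer"] / s["mrr_per_customer"]
    print(f"{name}: {months:.0f} months to recoup CAC")

# Customer-weighted blended average -- mathematically correct, but it
# describes a fictional customer that exists in neither segment.
total_customers = sum(s["customers"] for s in segments.values())
avg_months = sum(
    s["customers"] * s["cac_per_customer"] / s["mrr_per_customer"]
    for s in segments.values()
) / total_customers
print(f"Blended average: {avg_months:.1f} months")  # 10.8 -- hides the 30-month problem
```

The blended 10.8 looks healthy; only the segmented view exposes the 30-month enterprise business.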

Because customers come in so many different sizes, I dislike presenting CAC as an average cost to acquire a customer and prefer to define CAC as an average cost to acquire a dollar of annual recurring revenue.

Revenue on Top vs. Bottom

When I first encountered the CAC ratio it was in a Bessemer white paper, and it looked like this:

CAC ratio = (annualized incremental gross margin in the quarter) ÷ (total S&M expense in the prior quarter)

In English, Bessemer defined the 3Q08 CAC as the annualized amount of incremental gross margin in 3Q08 divided by total S&M expense in 2Q08 (the prior quarter).

Let’s put aside (for a while) the choice to use gross margin as opposed to revenue (e.g., ARR) in the numerator.  Instead let’s focus on whether revenue makes more sense in the numerator or the denominator.  Should we think of the CAC ratio as:

  • The amount of S&M we spend to generate $1 of revenue
  • The amount of revenue we get per $1 of S&M cost

To me, Bessemer defined the ratio upside down.  The customer acquisition cost ratio should be the amount of S&M spent to acquire a dollar of (annual recurring) revenue.

Scale Venture Partners evidently agreed and published a metric they called the Magic Number:

Take the change in subscription revenue between two quarters, annualize it (multiply by four), and divide the result by the sales and marketing spend for the earlier of the two quarters.

This changes the Bessemer CAC to use subscription revenue rather than gross margin, and also inverts it.  I think this is very close to how CAC should be calculated.  See below for more.
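The Magic Number arithmetic is simple enough to sketch (the quarterly figures below are made up for illustration):

```python
def magic_number(sub_rev_now, sub_rev_prior, sm_prior_q):
    """ScaleVP Magic Number: annualized quarter-over-quarter subscription
    revenue growth per dollar of prior-quarter S&M spend (roughly 1/CAC)."""
    return 4 * (sub_rev_now - sub_rev_prior) / sm_prior_q

# Hypothetical: subscription revenue grows from $10.0M to $11.5M in a
# quarter, on $5.0M of S&M spend in the earlier quarter.
print(magic_number(11.5, 10.0, 5.0))  # 1.2 -> $1.20 of annualized new revenue per S&M dollar
```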

Bessemer later (kind of) conceded the inversion — while they side-stepped redefining the CAC, per se, they now emphasize a new metric called “CAC payback period” which puts S&M in the numerator.

Revenue vs. Gross Margin

While Bessemer has written some great papers on Cloud Computing (including their Top Ten Laws of Cloud Computing and Thirty Q&A that Every SaaS Revenue Leader Needs to Know) I think they have a tendency to over-think things and try to extract too much from a single metric in defining their CAC.  For example, I think their choice to use gross margin, as opposed to ARR, is a mistake.

One metric should be focused on measuring one specific item. To measure the overall business, you should create a great set of metrics that work together to show the overall state of affairs.

[Image: the SaaS leaky bucket]

I think of a SaaS company as a leaky bucket.  The existing water level is a company’s starting ARR.  During a time period the company adds water to the bucket in the form of sales (new ARR), and water leaks out of the bucket in the form of churn.

  • If you want to know how efficient a company is at adding water to the bucket, look at the CAC ratio.
  • If you want to know what happens to water once in the bucket, look at the renewal rates.
  • If you want to know how efficiently a company runs its SaaS service, look at the subscription gross margins.

There is no need to blend the efficiency of operating the SaaS service with the efficiency of customer acquisition into a single metric.  First, they are driven by different levers.  Second, to do so invariably means that being good at one of them can mask being bad at the other.  You are far better off, in my opinion, looking at these three important efficiencies independently.

The Cost of Customer Success

Most SaaS companies have “customer success” departments that are distinct from their customer support departments (which are accounted for in COGS).  The mission of the customer success team is to maximize the renewals rate – i.e., to prevent water from leaking out of the bucket – and towards this end they typically offer a form of proactive support and adoption monitoring to ferret out problems early, fix them, and keep customers happy so they will renew their subscriptions.

In addition, the customer success team often handles basic upsell and cross-sell, selling customers additional seats or complementary products.  Typically, when a sale to an existing customer crosses some size or difficulty threshold, it will be kicked back to sales.  For this reason, I think of customer success as handling incidental upsell and cross-sell.

The question with respect to the CAC is what to do with the customer success team.  They are “sales” to the extent that they are renewing, upselling, and cross-selling customers.  However, they are primarily about ARR preservation as opposed to new ARR.

My preferred solution is to exclude both the results from and the cost of the customer success team in calculating the CAC.  That is, my definition of the CAC is:

CAC ratio = (S&M expense of prior quarter, excluding customer success) ÷ (new ARR added in the current quarter)

I explicitly exclude the cost of customer success in the numerator and exclude the effects of churn in the denominator by looking only at the new ARR added during the quarter.  This formula works on the assumption that the customer success team is selling a relatively immaterial amount of new ARR (and that their primary mission instead is ARR preservation).  If that is not true, then you will need to exclude both the new ARR from customer success as well as its cost.
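A minimal sketch of this definition (the function name and figures are mine, not a standard API):

```python
def cac_ratio(sm_expense_prior_q, cs_expense_prior_q, new_arr_current_q):
    """S&M spent (excluding customer success) per dollar of new ARR.

    Pass new ARR only -- churn is deliberately excluded, so do NOT pass
    the net change in ARR here.
    """
    return (sm_expense_prior_q - cs_expense_prior_q) / new_arr_current_q

# Hypothetical quarter: $6.0M of S&M, of which $1.0M is customer success,
# followed by $5.0M of new ARR added in the next quarter.
print(cac_ratio(6.0, 1.0, 5.0))  # 1.0 -- each new ARR dollar cost a dollar of S&M
```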

I like this formula because it keeps you focused on what the ratio is called:  customer acquisition cost.  We use revenue instead of gross margin and we exclude the cost of customer success because we are trying to build a ratio to examine one thing:  how efficiently do I add new ARR to the bucket?  My CAC deliberately says nothing about:

  • What happens to the water once S&M pours it in the bucket.  A company might be tremendous at acquiring customers, but terrible at keeping them (e.g., offer a poor quality service).  If you look at net change in ARR across two periods then you are including both the effects of new sales and churn.  That is why I look only at new ARR.
  • The profitability of operating the service.  A company might be great at acquiring customers but unable to operate its service at a profit.  You can see that easily in subscription gross margins and don’t need to embed that in the CAC.

There is a problem, of course.  For public companies you will not be able to calculate my CAC because in all likelihood customer success has been included in S&M expense but not broken out and because you can typically only determine the net change in subscription revenues and not the amounts of new ARR and churn.  Hence, for public companies, the Magic Number is probably your best metric, but I’d just call it 1/CAC.

My definition is pretty close to that used by Pacific Crest in their annual survey, which uses yet another slightly different definition of the CAC:  how much do you spend in S&M for a dollar of annual contract value (ACV) from a new customer?

(Note that many vendors include first-year professional services in their definition of ACV which is why I prefer ARR.  Pacific Crest, however, defines ACV so it is equivalent to ARR.)

I think Pacific Crest’s definition has very much the same spirit as my own.  I am, by comparison, deliberately simpler (and sloppier) in assuming that customer success is not providing a lot of new ARR (which is not to say that a company is not making significant sales to its customer base – but is to say that those opportunities are handed back to the sales function).

Let’s see the distribution of CAC ratios reported in Pacific Crest’s recent, wonderful survey:

[Chart: distribution of CAC ratios from the Pacific Crest survey]

Wow.  It seems like a whole lot of math and analysis to come back and say:  “the answer is 1.”

But that’s what it is.  A healthy CAC ratio is around 1, which means that a company’s S&M investment in acquiring a new customer is repaid in about a year.  Given COGS associated with running the service and a company’s operating expenses, this implies that the company is not making money until at least year 3.  This is why higher CACs are undesirable and why SaaS businesses care so much about renewals.

Technically speaking, there is no absolute “right” answer to the CAC question in my mind.  Ultimately the amount you spend on anything should be related to what it’s worth, which means we need to relate customer acquisition cost to customer lifetime value (LTV).

For example, a company whose typical customer lifetime is 3 years needs to have a CAC well less than 1, whereas a company with a 10 year typical customer lifetime can probably afford a CAC of more than 2.  (The NPV of a 10-year subscription increasing price at 3% with a 90% renewal rate and discount at 8% is nearly $7.)
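Here is one way to model that NPV (my own sketch, not the post’s exact calculation; the result is sensitive to conventions such as payment timing and whether the renewal rate is applied as a per-year survival probability):

```python
def subscription_npv(years=10, price_growth=0.03, renewal_rate=0.90, discount_rate=0.08):
    """Expected NPV of a $1/year subscription: each year's payment grows
    with price, is weighted by cumulative renewal probability, and is
    discounted back to today (payments at the start of each year)."""
    npv = 0.0
    for t in range(years):
        expected_payment = (1 + price_growth) ** t * renewal_rate ** t
        npv += expected_payment / (1 + discount_rate) ** t
    return npv

print(round(subscription_npv(), 2))                   # ~5.53 under these conventions
print(round(subscription_npv(discount_rate=0.0), 2))  # ~7.28 undiscounted -- perhaps
                                                      # the convention behind "nearly $7"
```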

Time Periods of S&M Expense

Let me end by taking a practical position on what could be a huge rat-hole if examined from first principles.  The one part of the CAC we’ve not yet challenged is the use of the prior quarter’s sales and marketing expense.  That basically assumes a 90-day sales cycle – i.e., that total S&M expense from the prior quarter is what creates ARR in the current quarter.  In most enterprise SaaS companies this isn’t true.  Customers may engage with a vendor over a period of a year before signing up.  Rather than creating some overlapped ramp to try and better model how S&M expense turns into ARR, I generally recommend simply using the prior quarter for two reasons:

  • Some blind faith in offsetting errors theory.  (e.g., if 10% of this quarter’s S&M won’t benefit us for a year, then 10% of a year ago’s spend did the same thing, so unless we are growing very quickly this will sort of cancel out).
  • Comparability.  Regardless of its fundamental correctness, you will have nothing to compare to if you create your own “more accurate” ramp.

I hope you’ve enjoyed this journey of CAC discovery.  Please let me know if you have questions or comments.

Thoughts on the Jive Registration Statement (S-1) and Initial Public Offering (IPO)

I finally found some time to read over the approximately 175-page registration statement (S-1) that enterprise social networking software provider Jive Software filed on August 24, 2011 in support of an upcoming initial public offering (IPO) of its stock.

In this post, and subject to my usual disclaimers, I’ll share some of my thoughts on reading the document.

Before jumping into financials, let’s look at their marketing / positioning.

  • Jive positions itself as a “social business software” company.   Nice and clear.
  • Since everyone now needs a Google-esque (“organize the world’s information”) mission statement, Jive has one:  “to change the way work gets done.”  Good, but is change inherently a benefit?  Not in my book.
  • Jive’s tagline is “The New Way To Business.”  Vapid.
  • Since everyone seems to inexplicably love the tiny-slice-of-huge-market argument in an IPO, Jive offers up $10.3B as the size of the collaborative applications market in 2013.  That this implies about 2% market share in 2013 at steady growth doesn’t seem to bother anyone.  Whither focus and market dominance?

Now, let’s move to financials.  Here’s an excerpt with the consolidated income statement:

The astute reader will notice a significant change in 2010 when Jive Founder Dave Hersh stepped down as CEO and was replaced with ex-Mercury CEO Tony Zingale.  Let’s make it easier to see what’s going on by adding some ratios:

Translating some of the highlighted cells to English:

  • Jive does not make money on professional services:  they had a -17% gross margin in 2010 and a -13% gross margin in 1H11.
  • In 2009, a very difficult year, Jive grew total revenue 77% and did so with a -15% return on sales.
  • In 2010, Jive grew revenue 54% with a -60% return on sales, while in 1H11, Jive grew revenue 76% with a -64% return on sales.
  • In 2010, Jive increased R&D, S&M, and G&A expense by 127%, 103%, and 132%, respectively.
  • In 2010, Jive had a $27.6M operating loss, followed by a $30.6M operating loss in 1H11.

To say that Jive is not yet profitable is like saying the Tea Party is not yet pro-taxation.  For every $1.00 in revenue Jive earned in 1H11, they lost $0.90. People quipped that the Web 1.0 business model was “sell dollars for ninety cents.”  Jive seems to be selling them for about fifty-three cents.

But that analysis is unduly harsh if you buy into the bigger picture that:

  • This is the dawn of a large opportunity; a land-grab where someone is going to take the market.
  • You assume that once sold, there are reasonably high switching costs to prevent a customer from defecting to a competitive service.
  • These are subscription revenues.  Buying $1.00 of revenue for $1.90 is foolish on a one-shot deal, but in this case they’re buying a $1.00 annuity per year.  In fact, if you read about renewal rates later on in the prospectus, they’re actually paying $1.90 for a $1.00 annuity that grows at 25% per year.

I’d say this is a clear example of a go-big-or-go-home strategy.  You can see the strategic tack occurring in 2010, concurrent with the management change.  And, judging by the fact that they’re filing an S-1, it appears to be working.

Before moving on, let’s look at some ratios I calculated off the income statement:

You can see the strategy change in the highlighted cells.

  • Before the change, Jive spent $1.16 to get a dollar of revenue.  After, they spent $1.90.
  • Before, they got $2.91 of incremental revenue per incremental operating expense.  After, they got $0.90.  (It looks similar on a billings basis.)
  • Before, they got $6.76 of incremental product revenue per incremental S&M dollar.  After, they got $1.73.
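Ratios like these are simple divisions off the income statement; a sketch with hypothetical (not Jive’s actual) figures:

```python
def cost_per_revenue_dollar(total_spend, revenue):
    """Total spend per dollar of revenue -- "selling dollars for X cents"."""
    return total_spend / revenue

def incremental_ratio(metric_now, metric_prior, spend_now, spend_prior):
    """Incremental revenue gained per incremental dollar of spend."""
    return (metric_now - metric_prior) / (spend_now - spend_prior)

# Hypothetical: $190M of total spend against $100M of revenue...
print(cost_per_revenue_dollar(190, 100))          # 1.9 -- $1.90 spent per revenue dollar
# ...and revenue growing 150 -> 190 while opex grows 85 -> 130.
print(incremental_ratio(190, 150, 130, 85))       # ~0.89 incremental revenue per incremental opex dollar
```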

Clearly, the change was not about efficiency.  You could argue that it was either about growth-at-all-costs or, more strategically, about growth as a landgrab.

But we’re only on page 6 of the prospectus, so we’re going to need to speed up.

Speaking of billings and revenues, let’s hear what Jive has to say:

We consider billings a significant leading indicator of future recognized revenue and cash inflows based on our business model of billing for subscription licenses annually and recognizing revenue ratably over the subscription term. The billings we record in any particular period reflect sales to new customers plus subscription renewals and upsell to existing customers, and represent amounts invoiced for product subscription license fees and professional services. We typically invoice the customer for subscription license fees in annual increments upon initiation of the initial contract or subsequent renewal. In addition, historically we have had some arrangements with customers to purchase subscription licenses for a term greater than 12 months, most typically 36 months, in which case the full amount of the agreement will be recognized as billings if the customer is invoiced for the entire term, rather than for an annual period.

The following table sets forth our reconciliation of total revenues to billings for the periods shown:

This says that billings is equal to revenue plus the change in deferred revenue.  Billings is a popular metric in SaaS companies, though often imputed by financial analysts, because revenue is both damped and seen as a dependent variable.  Billings is seen as the purer (and more volatile) metric and thus seen by many as a superior way to gauge the health of the business.
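In code, the reconciliation is a one-liner (the figures below are hypothetical):

```python
def billings(revenue, deferred_rev_end, deferred_rev_start):
    """Billings = recognized revenue + change in deferred revenue."""
    return revenue + (deferred_rev_end - deferred_rev_start)

# Hypothetical quarter: $20M of recognized revenue while deferred
# revenue grows from $25M to $31M on the balance sheet.
print(billings(20.0, 31.0, 25.0))  # 26.0 -- $26M of billings
```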

For Jive, from a growth perspective, this doesn’t strike me as particularly good news since billings, which were growing 99% in 2010, are growing at 59% in 1H11, compared to revenue which is growing at 76%.

Now we’re on page 8.  Happily, the next 20 pages present a series of valid yet unsurprising risk factors that I won’t review here, though here are a few interesting extracted tidbits:

  • The company had 358 employees as of 6/30/11.
  • They plan to move from third-party hosted data centers to their own data centers.
  • Subscription agreements typically range from 12 to 36 months.
  • They do about 20% of sales internationally.
  • They recently completed three acquisitions (Filtrbox, Proximal, OffiSync).
  • There is a 180 day lockup period following the offering.

Skipping out of page-by-page mode, let me pull some other highlights from the tome.

  • There were 44M shares outstanding on 6/30/11, excluding 15M options, 0.8M in the options pool, 0.9M shares subject to repurchase.  That, by my math, means ~59M fully-diluted shares outstanding after the offering.
  • Despite having $44.6M in cash on 6/30/11, they had a working capital deficit of $15.9M.
  • The Jive Engage Platform was launched in February 2007.  In August 2007, the company raised its first external capital.
  • The Jive Engage Platform had 590 customers as of 12/31/10, up from 468 at 12/31/09.  There were 635 as of 6/30/11.
  • The dollar-based renewal rate, excluding upsell, for 1H11 for transactions > $50K was over 90%.  Including upsell, the renewal rate was 125%.
  • Public cloud deployments represented 59% of product revenues in 1H11.
  • The way they recognize revenue probably hurts the professional services performance because they must ratably take the PSO revenue while taking the cost up-front.

One thing soon-to-be-public companies need to do is gradually align the common stock valuation with the expected IPO price to avoid a huge run-up in the weeks preceding the IPO.  Gone are the days where you can join a startup, get a rock-bottom strike price on your options, and then IPO at ten times that a few weeks later.  Companies now periodically do section 409a valuations in order to establish a third-party value for the common stock.  Here’s a chart of those valuations for Jive, smoothed to a line, over the 18 months prior to the filing.

This little nugget was interesting on two levels:

The core application of the Jive Engage Platform is written in Java and is optimized for usability, performance and overall user experience. It is designed to be deployed in the production environments of our customers, runs on top of the Linux operating system and supports multiple databases, including Microsoft SQL Server, MySQL, Oracle and PostgreSQL. The core application is augmented by externally hosted web-based services such as a recommendation service and an analytics service. We have made investments in consolidating these services on a Hadoop-based platform.

First, it seems to suggest that it’s not written for the cloud / multi-tenancy (which, if true, would be surprising) and second, it suggests that they are investigating Hadoop which is cool (and not surprising).

More tidbits:

  • 105 people in sales as of 6/30/11
  • 122 people in R&D as of 6/30/11
  • Executives Tony Zingale (CEO), Bryan LeBlanc (CFO), John McCracken (Sales), and Robert Brown (Client Services) all worked at Mercury Interactive.  The latter three were brought in after Zingale was made a director (10/07) but well before he was appointed CEO (2/10).
  • Zingale beneficially owns 7.5% of the company pre-offering.  This is high by Silicon Valley standards, but he’s a big-fish CEO in a small-pond company.
  • Sequoia Capital beneficially owns 36% of the company.  Kleiner Perkins owns 14%.
  • I think Sequoia contributed $37M of the $57M total VC raised (though I can only easily see $22M in the S-1).
  • If that’s right, and if Sequoia eventually exits Jive at a $1B market cap, that means they will, on average across funds, get a ~10x return on their investment.  $2B would give them 20x.

What’s left of my brain has officially melted at page F-11.  If I dig back in and find anything interesting, I’ll update the post.  Meantime, if you have questions or comments, please let me know.

As a final strategic comment, I’d say that investors should consider the possibility of an increased level of competition from Salesforce.com, given their massive push around “the social enterprise” at Dreamforce 11.

Quick Take on the Dassault Systèmes Acquisition of Exalead

Today, in what I consider a surprising move, French PLM and CAD vendor Dassault Systèmes announced the acquisition of French enterprise search vendor Exalead for €135M or, according to my calculator, $161M.  Here is my quick take on the deal:

  • While I don’t have precise revenue figures, my guess is that Exalead was aiming at around $25M in 2010 revenues, putting the price/sales multiple at 6.4x current-year sales, which strikes me as pretty good given what I’m guessing is around a 25% growth rate.  (This source says $21M in software revenue, though the year is unclear and it’s not clear if software means software-license or software-related.  This source, which I view as quite reliable, says $22.7M in total revenue in 2009 and implies around 25% growth.  Wikipedia says €15.5M in 2008 revenues, which equals exactly $22.7M at the average exchange rate.  This French site says €12.5M in 2008 revenues.  The Qualis press release — presumably an excellent source — says €14M ($19.5M) in 2009 revenues.  Such is the nature of detective work.)
  • I am surprised that Dassault would be interested in search-based applications, Exalead’s latest focus.  While PLM vendors have always had an interest in content delivery and life-cycle documentation (e.g., a repair person entering feedback on documentation that directly feeds into future product requirements), I’d think they would want to buy more of an enterprise techpubs / DITA vendor than a search vendor to do so, as in the PTC / Arbortext deal of 2005.  Nevertheless, Dassault President and CEO Bernard Charlès said that with Exalead they could build “a new class of search-based applications for collaborative communities.”  There is more information, including a fairly cryptic video which purports to explain the deal, on a Dassault micro-site devoted to the Exalead acquisition, which ends with the phrase:  search-based applications for lifelike experience.  Your guess as to what that means is as good as mine.
  • A French investment firm called SCA Qualis owned 83% of Exalead, steadily building up its position from 51% in 2005 to 83% in 2008 through successive rounds of €5M, €12M, and €5M in 2005, 2006, and 2008, respectively.  This causes me to question CrunchBase’s profile, which says that Exalead had raised a total of $15.6M.  (You can see €22M since 2005 and the company was founded in 2000.  I’m guessing there was $40M to $50M invested in total, though some reports are making me think it’s twice that.)
  • The prior bullet suggests that Qualis took $133M of the sale price and everybody else split $27M, assuming there were no active liquidation preferences on the Qualis money.
  • Given the European focus, the search focus, and the best-and-brightest angle (Exalead had more than its share of impressive grandes écoles graduates), one wonders why Autonomy didn’t end up owning Exalead, as opposed to a PLM/CAD company.  My guess is Autonomy took a look, but the deal got too pricey for them because they are less interested in paying up for great technology and more interested in buying much larger revenue streams at much lower multiples.  In some sense, Autonomy’s presumed “pass” on this deal is more proof that they are no longer a technology company and instead a CA-like, Oracle-like financial consolidation play.  (By the way, there’s nothing wrong with being a financial play in my view; I just dislike pretending to be one thing when you’re actually another.)
  • One wonders what role, if any, the other French enterprise search vendor, Sinequa, played in this deal.  They, too, have some great talent from France’s famed Ecole Polytechnique, and presumably some nice technology to go along with it.

Here are some links to other coverage of the deal

IDC’s Definition of Search-Based Applications

Sue Feldman and the team over at IDC are talking about a new category / trend called search-based applications, and I think they may well be onto something.

Because I believe that IDC puts real thought and rigor into definitions, I pay attention when I see them attempting to define something. From past experience, IDC was about 10 years ahead of the market in predicting the convergence of BI and enterprise applications with — even in the mid 1990s — a single analyst covering both ERP and BI.

Here’s how IDC describes search-based applications.

Search-based applications combine search and/or text analytics with collaborative technologies, workflow, domain knowledge, business intelligence, or relevant Web services. They deliver a purpose-designed user interface tailored to support a particular task or workflow. Examples of such search-based applications include e-Discovery applications, search marketing/advertising dashboards, government intelligence analysts’ workstations, specialized life sciences research software, e-commerce merchandising workbenches, and premium publishing subscriber portals in financial services or healthcare.

There are many investigative or composite, text- and data-centric analysis activities in the enterprise that are candidates for innovative discovery and decision-support applications. Many of these activities are carried out manually today. Search-based applications provide a way to bring automation to a broad range of information worker tasks.

Some vendors are jumping whole hog into the nascent category. For example, French Internet and enterprise search vendor Exalead has jumped in with both feet, making search-based applications a key war cry in its marketing. In addition, Exalead’s chief science officer, Gregory Grefenstette, seems like a match to the “Ggrefen” credited in Wikipedia with the creation of the search-based applications page.

Another vendor jumping in hard is Endeca, with the words “search applications” meriting the largest font on their homepage.

While you could argue that this is yet-another, yet-another focus for Endeca, clearly the folks in marketing — at least — are buying into the category.

At Mark Logic, we are not attempting to redefine ourselves around search-based applications. Our product is an XML server. Our vision is to provide infrastructure software for the next generation of information applications. We believe that search-based applications are one such broad class of information applications. That is, they are yet another class of applications that are well suited for development on MarkLogic Server.

So, if you’re thinking about building something that you consider a search-based application, then be sure to include us on your evaluation list.

XML: YAFF, YADT, or Whole World?

If you have a bunch of XML and are looking for a place to put it, then I think I may have come up with a simple test that might be helpful.

In talking with prospective vendors of XML repositories (definition: software that lets you store, search, analyze and deliver XML), try to establish what I’ll call “XML vision compatibility.” Quite simply, try to figure out if the vendor’s vision of XML is consistent with your own. To help with that exercise, I’ll define what I see as the three common XML vendor visions:

  • YAFF (yet another file format)
  • YADT (yet another data type)
  • Whole world

YAFF Vendors
Vendors with the YAFF vision view XML as yet another file format. ECM vendors clearly fall into this category (“oh yes, XML is one of the 137 file formats you can manage in our system”). So do enterprise search vendors (“oh yes, we have filters for XML formatted files which clear out all those nasty tags and feed our indexing engine the lovely text.”)

For example, let’s look at how EMC Documentum — one of the more XML-aggressive ECM vendors — handles XML on its website.

Hmm. There’s no XML on that page. But lots of information about records management, digital asset management, document capture, collaboration and document management (it’s not there either). Gosh, I wonder where it is? SAP integration? Don’t think so. Hey, let’s try Documentum Platform, whatever that is.

Not there, either. Now that’s surprising because I really have no idea where else it might be. Oh, wait a minute. I didn’t scroll the page down. Let’s try that.

There we go. We finally found it. I knew they were committed to XML. What’s going on here is that EMC has a huge, largely vendor-consolidation-driven (e.g., Documentum, Captiva, Document Sciences, x-Hive, Kazeon) vision of what content management is. And XML is just one tiny piece of that vision. XML is, well, yet another file format among the scores that they manage, archive, capture, and provide workflow, compliance, and process management against. The vision isn’t about XML. It’s about content. That’s nice if you have an ECM problem (and a lot of money to solve it); it’s not so nice if you have an XML problem, or more precisely a problem that can be solved with XML.

YADT Vendors
Vendors with the YADT vision view XML as yet another data type. These are the relational database management system vendors (e.g., Oracle) who have decided that the best way to handle XML is to make it a valid datatype for a column in a table.

The roots of this approach go back to the late 1980s and Ingres 6.3 (see this semi-related blast from the past), which was the first commercial DBMS to provide support for user-defined datatypes. All the primitives for datatyping were isolated from the core server code and made extensible through standard APIs. So, for example, if you wanted to store complex numbers of the form (a, bi), all you had to do was write some primitives so the server would know:

  • What they look like — i.e., (a, bi)
  • Any range constraints (the biggest, the smallest)
  • What operators should be available (e.g., +, -)
  • How to implement those operators — (a, bi) + (c, di) = (a+c, (b+d)i)
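To make those four primitives concrete, here’s a minimal sketch in Python (standing in for the C-level server code a real UDT actually required); the class name and the range bound are my own illustration, not anything from Ingres:

```python
# A sketch of the primitives a user-defined type had to supply:
# representation, range constraints, available operators, and their
# implementations.
class ComplexUDT:
    def __init__(self, a, bi):
        # Range constraint: reject values outside the representable range.
        if abs(a) > 1e308 or abs(bi) > 1e308:
            raise ValueError("value out of range")
        self.a, self.bi = a, bi

    def __repr__(self):
        # What the value looks like on input/output: (a, bi)
        return f"({self.a}, {self.bi}i)"

    def __add__(self, other):
        # (a, bi) + (c, di) = (a+c, (b+d)i)
        return ComplexUDT(self.a + other.a, self.bi + other.bi)

    def __sub__(self, other):
        # (a, bi) - (c, di) = (a-c, (b-d)i)
        return ComplexUDT(self.a - other.a, self.bi - other.bi)
```

The hard part, as described below, wasn’t writing these primitives; it was everything the sketch leaves out: telling the optimizer how to index and query the type efficiently.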

It was — as far as I remember — yet another clever idea from the biggest visionary in database management systems after Codd himself: Michael Stonebraker, then of UC Berkeley and now of MIT. After founding Ingres, Stonebraker went on to found Illustra, which was all about “datablades” — a sexy new name for user-defined types. Datablades, in turn, became sexy bait for Informix to buy the company with an eye towards leveraging the technology to unseat Oracle from its leadership position. It didn’t happen.

User-defined datatypes basically didn’t work. There were two key problems:

  • You had user-written code running in the same address space as the database server. This made it nearly impossible to determine fault when the server crashed: was it a database server bug, or a bug in the customer’s UDT implementation? And while RDBMS customers were well qualified to write applications and SQL, writing server-level code was quite another affair. This was a bad idea.
  • Indexing and query processing performance. It’s fairly simple to say that, for example, a text field looks like a string of words and the + operator means concatenate. It’s basically impossible for an end customer to tell the query optimizer how to process queries involving those text fields, or how to build indexes that maximize query performance. If getting stuff into UDTs was a level-5 challenge, getting stuff back out quickly was a level-100 one.

So while the notion of end users adding types to a DBMS basically failed, when XML came along the database vendors dusted off this approach, saying, in effect: let’s use all those hooks we put in and build support for XML types ourselves. And they did. Hence what I call the “XML column” approach to storing XML in a relational database.

After all, if your only data modeling element’s a table, then every problem looks like a column.

Now this approach isn’t necessarily bad. If, for example, you have a bunch of resumes and want to store attribute data in columns (e.g., name, address, phone, birthdate) and keep an XML copy of the resume alongside, then this might be a reasonable way to do things. That is, if you have a lot of data and a touch of XML, this may be the right way to do things.
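As a sketch of that hybrid layout — using SQLite and a made-up resumes table purely for illustration — the attribute data lives in ordinary relational columns, while the XML rides along in a text column and only gets parsed after retrieval:

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical schema: structured attributes in columns, the full resume
# kept alongside as an XML blob in a text column.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE resumes (name TEXT, phone TEXT, resume_xml TEXT)")
conn.execute(
    "INSERT INTO resumes VALUES (?, ?, ?)",
    ("Ada Lovelace", "555-0100",
     "<resume><skill>analysis</skill><skill>programming</skill></resume>"),
)

# SQL handles the structured lookup; the XML is opaque to the database and
# is parsed only after it comes back out.
(xml_doc,) = conn.execute(
    "SELECT resume_xml FROM resumes WHERE name = ?", ("Ada Lovelace",)
).fetchone()
skills = [s.text for s in ET.fromstring(xml_doc).findall("skill")]
```

Note that the database can answer questions about names and phone numbers, but any question about what’s *inside* the XML has to be answered outside it — which previews the indexing and query limitations listed below.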

So again, it comes down to vision alignment. If XML is just another type of data that you want to store in a column, then this might work for you. Bear in mind you’ll:

  • Probably have to set up separate text and pre-defined XML path indexes (a hassle on regular schemas, an impossibility on irregular ones),
  • Face some limitations in how those indexes can be combined and optimized in processing queries,
  • Need to construct frankenqueries that mix SQL and XQuery, whose mixed-language semantics are sometimes so obscure that I’ve seen experts argue for hours about what the “correct” answer for a given query is,
  • And suffer from potentially crippling performance problems as you scale to large amounts of XML.

But if those aren’t problems, then this approach might work for you.

This is what it looks like when a vendor has a YADT vision. Half the fun of storing XML in an RDBMS is figuring out which query language and which storage options you want to use. See the table that starts on page 9, spans four pages, and considers nearly a dozen criteria to help you decide which of the three primary storage options you should use:

See this post from IBM for more Oracle-poking on the complexity of storage options available. Excerpt:

Oracle has long claimed that the fact that Oracle Database has multiple different ways to store XML data is an advantage. At last count, I think they have something like seven different options:

  • Unstructured
  • XML-Object-Relational, where you store repeating elements in CLOBs
  • XML-Object-Relational, where you store repeating elements in VARRAY as LOBs
  • XML-Object-Relational, where you store repeating elements in VARRAY as nested tables
  • XML-Object-Relational, where you store repeating elements in VARRAY as XMLType pointers to BLOBs
  • XML-Object-Relational, where you store repeating elements in VARRAY as XMLType pointers to nested tables
  • XML-Binary

Their argument is that XML has diverse use cases and you need different storage methods to handle those diverse use cases. I don’t know about you, but I find this list to be a little bewildering. How do you decide among the options? And what happens if you change your mind and want to change storage method?

Such is life in the land of putting XML in tables because your database management system has columns.

Whole World Vendors
Vendors with the whole world vision view XML as, well, their whole world.

And when I say XML, I don’t mean information that’s already in XML. I mean information that is either already in XML (e.g., documents, information in any horizontal or industry-specific XML standard) or that is best modeled in XML (e.g., sparse data, irregular information, semi-structured information, information in no, multiple, and/or time-varying schemas).

“Whole world” vendors don’t view XML as one format, but as a plethora: DocBook, DITA, S1000D, XHTML, TEI, XBRL, the HL7 standards in healthcare, the ACORD standards in insurance, Microsoft’s Office Open XML format, Open Document Format, Adobe’s IDML, Chemical Markup Language, MathML, the DoD’s DDMS metadata standard, semantic web standards like RDF and OWL, and scores of others.

Whole world vendors don’t view XML tags as “something that gets in the way of the text,” and thus they don’t provide filters for XML files. Nor do they require schema adherence, because they know that XML schema compliance, in real life, tends to be more of an aspiration than a reality. So they allow you to load and index XML as-is, avoiding the first-step’s-a-doozy problem and enabling lazy clean-up of XML information.
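A rough illustration of that load-as-is approach, in Python with made-up documents: nothing is validated or shredded into tables up front, and queries run against document structure rather than a declared schema:

```python
import xml.etree.ElementTree as ET

# Two documents with different (and undeclared) schemas, loaded as-is --
# no upfront validation, no flattening into columns. Data is illustrative.
docs = [
    "<article><title>XML Vision</title><body>...</body></article>",
    "<memo date='2009-07-01'><title>XML Vision</title></memo>",
]

store = [ET.fromstring(d) for d in docs]

# Query across heterogeneous documents by structure, not by schema:
# find every <title> element regardless of what wraps it.
titles = [t.text for doc in store for t in doc.iter("title")]
```

A real XML server indexes these documents for search and query at scale, of course; the point of the sketch is only that the document itself, schema or no schema, is the unit you load and query.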

Whole world vendors don’t try to model XML in tables simply because they have a legacy tabular data model. Instead, their native modeling element (NME) is the XML document. That is:

  • In a hierarchical DBMS the NME is the hierarchy
  • In a network DBMS the NME is the graph
  • In a relational DBMS the NME is the table
  • In an object DBMS the NME is the object class hierarchy
  • In an OLAP, or multi-dimensional, DBMS the NME is the hypercube
  • And in an XML server, or native XML, DBMS the NME is the XML document

Whole world vendors don’t bolt a search engine to a DBMS because they know XML is often document-centric, making search an integral function, and requiring a fundamentally hybrid search/database — as opposed to a bolted-together search/database — approach.

Here is what it looks like when you encounter a whole world vendor:


Highlights from 2Q09 Software Equity Group Report

I’m not sure which better explains my recent decrease in blog post frequency: bit.ly or being out of the office. Either way, I wasn’t kidding a few weeks ago when I said I’m changing my sharing pattern. Much as popular business authors take one good idea and inflate it into a book, I now realize (thanks to bit.ly) that I have been taking what could have been one good tweet and inflating it into a blog post. While I’ve not drawn any definitive conclusions, thus far I’d say I’m sharing many more articles with significantly less effort than before.

Going forward, my guess is that steady state will be ~2 posts/week (instead of ~5), but those posts will be supplemented by 5-10 tweets/day (RSS feed here). Because of this, I’ve added the Tweet Blender widget to my home page, made it quite large, and have set it up to include not only my direct tweets (@ramblingman) but all tweets that include the word ramblingman to catch re-tweets and such. This will probably result in the inclusion of odd items from time to time — apologies if anything offensive comes up — and if this becomes a problem I’ll change the setup.

I’ve re-enabled Zemanta after turning it off for several quarters because I found it too slow to justify its value. They’ve put out a new release, and since I’m interested in all things vaguely semantic web, I figured I’d give it another try. Finally, I’m still considering renaming the blog to either Kellblog or Kellogic, but doing so is a daunting project (think of all the links that break) which I’m not ready to tackle at present. So, watch this space.

The purpose of this post, however, is to present highlights from the Software Equity Group’s 2Q09 Software Industry Equity Report. Here they are:

  • Consensus IT spending forecasts for 2009 predict 8% decrease in overall spending
  • Top five CTO spending priorities from the Goldman Sachs 3/09 survey: cost reduction, disaster recovery, server virtualization, server consolidation, data center consolidation
  • The SEG software index had a 23.7% positive return, bouncing back from a decline in 1Q09
  • Median enterprise value (EV) / sales = 1.4x, up from 1.2x the prior quarter
  • Median EV/EBITDA = 9.4x, up from 7.7x the prior quarter
  • Median EBITDA margin = 14.9%
  • Median net income margin = 3.9%
  • Median TTM revenue growth = 5.2%
  • Baidu and SolarWinds topped the EV/sales charts with values of 16.2x and 10.0x revenues, respectively
  • The great software arbitrage continues with companies >$1B in revenues having a median EV/sales of 2.2x while those <$100M have a mean of 0.7x. This theoretically means that the median big company can buy a median small one and triple its value overnight.
  • Database companies median EV/sales was 1.8x
  • Document/content management companies median EV/sales was 2.4x
  • Median SaaS vendor EV/sales was 2.6x, suggesting that $1 of SaaS revenue is worth $1.70 of perpetual revenue. (Though I worry the overall average includes SaaS so this could be understating it.)
  • Four software companies went public in 2Q09 raising, on median, $182M with an EV of $814M, an EV/revenue of 3.6x, and a first-day return of 17.3%
  • Five companies remain in the IPO pipeline with median revenues of $58.7M, net income of -$2.2M, and growth of 46.4%
  • 285 software M&A deals were done on the quarter with $3.1B in total value. This was down from 296 deals in the prior quarter worth $7.3B. (The lowest total value in the past 13 quarters.)
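The arbitrage arithmetic in the bullet above is easy to check: revenue bought at the small-company multiple is revalued at the acquirer’s multiple the moment the deal closes (a back-of-the-envelope sketch, ignoring deal premiums and integration costs):

```python
# Multiple arbitrage from the SEG 2Q09 figures cited above.
big_ev_to_sales = 2.2    # median EV/sales, companies > $1B revenue
small_ev_to_sales = 0.7  # EV/sales, companies < $100M revenue

# Each dollar of acquired revenue is bought at 0.7x and re-rated at 2.2x,
# roughly tripling its contribution to enterprise value.
uplift = big_ev_to_sales / small_ev_to_sales  # ~3.1x
```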

Goldman Sachs Smacks Software Stocks

See this story on SeekingAlpha (which might consider renaming itself SeekingShelter), entitled Goldman Slaps Most Software Stocks.

Excerpt on aggregate spending:

The worst of the IT-spending slowdown likely remains in front of us, as we start the clock on slashed 2009 budgets. We forecast 0 percent revenue growth for our group, below consensus at 5 percent, and 1 percent earnings growth, below Street at 2 percent.

The most interesting point addressed is whether the downturn will drive consumers to open source (i.e., nominally “free”) software:

There has been much discussion in the blogosphere about open source software and how it will see a surge of adoption due to its lower cost. Goldman quite rightly says this will not be the case. I have written that CIOs will hunker down and stick with the tried and true (which is not open source in most large-sized enterprises) and Goldman is in agreement, seeing a consolidation of functionality with big, established vendors and a move away from the concept of seeking best-of-breed point solutions regardless of vendor.

On sectors:

So in terms of non-defense technology companies we are batting two for two: Neither hardware nor software will be spared over the next several quarters as the outlook remains dim for both.

Happily for Mark Logic we have a large defense / intelligence business, which I believe will offer shelter from the storm. And, as I’ve argued before, for non-advertising-driven media companies, I believe that GDP growth (or lack thereof) is a second-order effect relative to seismic changes driven by the Internet and Google to which MarkLogic helps them respond.