Category Archives: Open source

Open Source Business Models, Revisited

I had breakfast the other day with Mike Olson, CEO of Hadoop ecosystem leader, Cloudera.  We met because we run in similar circles in data management land and because Mike had some quibbles with my post, The Open Source Software Paradox.

My premise was that open source presents a fundamental paradox: the larger the community, the better the software, and the less people need to buy support for it.  Thus, open source market opportunities were inherently flawed because you could only sell services for projects that were not terribly successful.  Simply put,

You can have a large community who doesn’t need to buy from you or a small community who does.

I think Mike’s overall take on my post was “1990s thinking” because things have evolved over the past decade and businesses now try to monetize open source opportunities in more sophisticated ways.  This approach doesn’t actually contradict the paradox I observed, but instead looks  for more creative ways around it.

Another key point Mike made was that open source is not a business model.  I agree.  Open source is a way of developing software.  There are many different possible business models for monetizing open source projects.

Rather than attempt to replay the back-and-forth of our discussion, I will simply list my revised take on the 4 basic open source business models.

  • Professional services.  The most basic way to make money around an open source project is to offer related consulting (and training) services.  For example, ThinkBigAnalytics seems to be building a consulting business around Hadoop and NoSQL databases (most of which are also open source).
  • Dual licensing.  A vendor offers (1) a free version under the GPL, which freely enables internal use but contaminates on redistribution, and (2) a paid version under a different license that doesn’t include the GPL’s copyleft provisions.  This model reeks of the vig: you push people to the non-GPL version under threat of having to open source their own system.  In addition, since SaaS or cloud services use but don’t redistribute software, this approach loses its teeth in the SaaS / cloud world.
  • Open core.  A vendor promotes an open source version of a system and makes money by extending it with proprietary additions.  In this model, the vendor “has some IP” and is not totally dependent on support subscriptions which may or may not be renewed.  Cloudera is executing this strategy by offering both (1) the Cloudera Distribution on an Apache license as well as (2) Cloudera Enterprise which is built on the Cloudera Distribution but also includes production support and management applications.

The open core model clearly sidesteps the paradox I’d outlined because open core vendors offer more than support.  Open core is a freemium business model and possesses all the strengths and suffers from all the weaknesses of other freemium models.

  • First, can you build a large community on the free version or service?
  • Second, through what mechanism and at what cost do you convert members of that community to a higher-level, paid service?
  • Third, once they are converted, at what rate can you keep premium members renewing the premium service or move them up to an even higher service level?

LinkedIn has done freemium spectacularly well.  I’ve never paid them a dime (as a free service user) but somebody paid them the ~$250M they made in the first 9 months of the year.  (Turns out it’s about 33% each of premium subscriptions, hiring solutions, and marketing solutions.)

The newspapers still haven’t figured out freemium though FT and The New York Times are making headway.

How will open core play out for open source vendors?  I don’t know.  I do know the freemium code is hard to crack.  I do know that freemium models are constantly evolving.  I do believe that freemium is a better business model than simply offering support or services.  And with the  IPO window opening, I do believe we may get a chance to see the financials of a few open core companies in the coming years.

The Open Source Software Paradox

As a marketer, I’m a fan of open source software.   After all, if you can’t dislodge Microsoft from mid-range server operating systems, Microsoft Office from desktop productivity suites, or Oracle from relational databases — and doing so through traditional means is a virtual impossibility —  then blowing up the whole business model isn’t a bad start.  It’s creative and it cuts right to the core of the problem.

But as a business-person I am not.  When you play the role of market spoiler it’s much easier to be famous than rich.  For example, when MySQL was acquired by Sun in 2008 for $1.2B, MySQL was doing only about $65M in annual revenues.  While the revenue multiple on the exit was spectacular, their capture rate was not:  MySQL disrupted literally billions in “big three” (i.e., Oracle, DB2, SQL Server) database revenues.  But if your value proposition is rooted in “almost free relative to leading commercial alternatives,” then you won’t succeed at 50% of their cost; you’ll need to be more like 2-5%.
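The arithmetic behind those two claims can be sketched in a few lines of Python. The $1.2B price and roughly $65M revenue come from the paragraph above; the $100,000 commercial price is purely a hypothetical figure chosen for illustration, and the function names are invented for this sketch.

```python
def revenue_multiple(price_musd: float, revenue_musd: float) -> float:
    """Exit price divided by annual revenue (both in $M)."""
    return price_musd / revenue_musd


def viable_price_band(commercial_price: float, low_pct: float = 2, high_pct: float = 5):
    """Price range implied by 'more like 2-5%' of the commercial alternative."""
    return (commercial_price * low_pct / 100, commercial_price * high_pct / 100)


# Figures quoted in the post: $1.2B acquisition, about $65M annual revenue.
mysql_multiple = revenue_multiple(1200, 65)

# HYPOTHETICAL commercial price, not from the post.
low, high = viable_price_band(100_000)

if __name__ == "__main__":
    print(f"Revenue multiple on exit: {mysql_multiple:.1f}x")
    print(f"Viable price band: ${low:,.0f} to ${high:,.0f}")
```

The multiple works out to a bit over 18x, while the price band shows how little revenue per customer the "almost free" positioning leaves to capture.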

I refer to open source as both a development model (i.e., a way of building software) and a business model.  While the former is better defined than the latter, the typical way to make money in open source is through selling subscriptions or licenses to certified and more-quickly-patched releases as well as selling technical support and/or consulting services to go with them.

While a spectacular exit multiple may occasionally pay off big time for shareholders (e.g., JBoss, MySQL), my theory is that in general it’s very hard to make money with the open source business model.  Red Hat is the obvious exception, and we’ll talk about them in a minute.

The basic paradox of open source is this:

  • The smaller the community the worse the software quality and the more people need certified releases and support.
  • The bigger the community the higher quality the software and the less people need certified releases and support (i.e., the community version will do).

So you can have a large community who doesn’t need to buy from you or a small community who does.

Two other drivers complete the picture:

  • The nature of the software and to what extent it truly requires an almost-daily stream of patches and updates and …
  • The monetization rate which is a function of the commercial market structure.  For example, the lower-level the software (e.g., operating systems) the more the market tends towards natural monopoly as customers want to minimize entropy at the bottom of the stack.  This should drive high pricing/margins on the commercial side of the market, and a parallel opportunity for someone to establish clear leadership on the open source side.

This is why Red Hat does so well when most others end up stagnating in the tens-of-millions revenue range.  The market is huge.  The software is low-level and thus the market “wants” a clear leader (think:  increasing returns) who can provide a hardware-independent, low-cost, supported product as an alternative to the proprietary Unix-es of days past.

Put differently, the bigger the commercial market and the more monopolistic its structure, the better the open source opportunity.  Conversely, the smaller the commercial market and the more fragmented leadership is within it (e.g., enterprise search, document management, and to some extent BI), the worse the open source opportunity.

Yes, Virginia, MarkLogic is a NoSQL System

The other day I noticed a taxonomy used on one of the NoSQL Database blogs that went like this:

Types of NoSQL systems

  • Core NoSQL Systems
    • Wide column stores
    • Document stores
    • Key-value / tuple stores
    • Eventually consistent key-value stores
    • Graph databases
  • Soft NoSQL Systems (not the original intention …)
    • Object databases
    • Grid database solutions
    • XML databases
    • Other NoSQL-related databases

I, perhaps obviously, take some umbrage at having MarkLogic (acceptably classified as an XML database) being declared “soft NoSQL.”  In this post I’ll explain why.

Who decided that being open source was a requirement to be a real NoSQL system?  More importantly, who gets to decide?  NoSQL – like the Tea Party – is a grass-roots, effectively leaderless movement towards relational database alternatives.  Anyone arguing original intent of the founders is misguided because there is no small group of clearly identified founders to ask.  In reality, all you can correctly argue is what you think was the intent of the initial NoSQL developers and early adopters or, perhaps more customarily, why you were drawn to them yourself, disguised or confused as original founder intent.

As mentioned here, movements often appear homogeneous when they are indeed heterogeneous.  What looks like a long line of demonstrators protesting a single cause is in fact a rugby scrum of different groups pushing in only generally aligned directions.  For example, for each of the following potential motivations, I am certain that I can find some set of NoSQL advocates that are motivated by it:

  • Anger at Oracle’s heavy-handed licensing policies
  • The need to store unstructured or semi-structured data that doesn’t fit well into relations
  • The impedance mismatch with relational databases
  • A need and/or desire to use open source
  • An attempt to reduce total cost
  • A desire to land at a different point in the Brewer CAP Theorem triangle of consistency, availability, and partition tolerance
  • Coolness / wannabe-ism, as in, I want to be like Google or Facebook

(Since this was a source of confusion in prior posts, note that this is not to claim the inverse:  that all NoSQL advocates are motivated by all of the possible motivations.)

I’d like to advocate a simple idea:  that NoSQL means NoSQL.  That a NoSQL system is defined as:

A structured storage system that is not based on relational database technology and does not use SQL as its primary query language

In short, my proposed definition means that NoSQL (broadly) = NoSQL (literally) + NoRelational.  In short:  relational database alternatives.  It does not mean:

  • NoDBMS.  We should not take NoSQL to exclude systems we would traditionally define as DBMSs.  For example, supporting ACID transactions or supporting a non-SQL query language (e.g., XQuery) should not be exclusion criteria for NoSQL.
  • NoCommercialSoftware.  While many of the flagship NoSQL projects (e.g., Hadoop, CouchDB) are open source projects, that should not be a defining criterion.  NoSQL should be a technological, not a delivery- or business-model, classification.  Technology and delivery model are orthogonal dimensions.  We should be able to speak of traditionally licensed, open source licensed, and cloud-hosted NoSQL systems if for no other reason than understanding the nuances of the various business/delivery models is a major task unto itself.  Do you mean open source or open core?  Is it open source or faux-pen source?  Under which open source license?  How should I think of a hosted subscription service that is based on or a derivative of an open source project?

Recently, I’ve heard a piece of backpedaling that I’ve found rather irritating:  that NoSQL was never intended to mean “no SQL,” it was actually intended to mean “not only SQL.”  Frankly, this strikes me as hogwash:  uh oh, I’m afraid that people are seeing us as disruptors and it’s probably easier to penetrate the enterprise as complementary, not competitive, so let’s turn what was a direct assault into a flanking attack.

To me, it’s simple:  NoSQL means NoSQL.  No SQL query language and no relational database management system.  Yes, it’s disruptive and — by some measures — “crazy talk” but no, we shouldn’t hide because there are lots of perfectly valid (and now socially acceptable) reasons to want to differ from the relational status quo.

In effect, my definition of NoSQL is relational database alternative.  Such options include both alternative databases (e.g., MarkLogic) and database alternatives (e.g., key/value stores).  This, of course, then cuts at your definition of database management system where I (for now at least) still require the support of a query language and the option to have ACID transactions.

By the way, I understand the desire to exclude various bandwagon-jumpers from the NoSQL cause.  Like most, I have no interest in including thrice-reborn object databases in the discussion, but if the cost of excluding them is excluding systems like MarkLogic then I think that cost is too high.  Many people contemplating the top-of-mind NoSQL systems (e.g., Hadoop) could be better served using MarkLogic which addresses many typical NoSQL concerns, including:

  • Vast scale
  • High performance
  • Highly parallel shared-nothing clusters
  • Support for unstructured and semi-structured data

All with all the pros (and cons) of being a commercial software package and without requiring reduced consistency:  losing a few Tweets won’t kill Twitter, but losing a few articles, records, or individuals might well kill a patient, bank, or counter-terrorism agency.  BASE is fine for some; many others still need ACID.  Michael Stonebraker has some further points on this idea in this CACM post.

I’d like to suggest that we should combine the ideas in this post with the ideas in my prior one, Classifying Database Management Systems.  That post says the correct way to classify DBMSs is by their native modeling element (e.g., table, class, hypercube).  This post says that NoSQL is semi-orthogonal – i.e., I can imagine a table-oriented database that doesn’t use SQL as its query language, but I doubt that any exist.  Applying my various rules, the combined posts say that:

  • Aster is a SQL database optimized for analytics on big data
  • MarkLogic is an XML [document] database optimized for large quantities of semi-structured information and a NoSQL system
  • CouchDB is a document database and a NoSQL system
  • Redis is a key/value store and a NoSQL system
  • VoltDB is a SQL database optimized to solve one of the two core problems that NoSQL systems are built for (i.e., high-volume simple processing)

Finally, I’d conclude that even with these rules I have trouble classifying MarkLogic because of multiple inheritance:  MarkLogic is both a document database and an XML database, it is difficult to pick one over the other, and there certainly are non-document-oriented XML database systems.  Similar issues exist with classifying the various hybrids of document databases and key/value stores.  So while I may have more work to do on building an overall taxonomy, I am absolutely sure about one thing:  MarkLogic is a NoSQL system.


* The “Yes, Virginia” phrase comes from an 1897 story in the New York Sun.  For more, see here.

Dear CIO: Stop Writing Big Checks for Commodity (Database) Software

Dear CIO,

What’s wrong with this picture?

  • At 50%+, Oracle’s operating margins have never been higher
  • The differentiation of Oracle’s database technology, however, has never been lower and the number of both core and specialized alternatives has never been greater.

So what’s going on? You, kind Sir or Madam, are being milked. What’s worse is that you, in an example of collective behavioral dysfunction, have inadvertently played a role in setting up the milking. What happened?

  • Like all smart CIOs you followed a bit of herd mentality when it came to core technology. Pity the poor fools who, back in the day, bet big on Ingres or Sybase. You played it safe and went with Oracle, IBM, or if your requirements weren’t too heavy, Microsoft.
  • The problem is, of course, that everyone executed the same strategy you did. Hence, the market created a system of increasing returns where the strong vendors got stronger and the weak ones died. The result: the RDBMS market is an (order of magnitude) $10B/year market, structured as an oligopoly with 3 players. Most other software markets worked out the same way.
  • You were focused on standardization. You realized that through a combination of decentralized IT decision making and growth-by-acquisition your organization had become a kitchen sink of enterprise software. You had everything. In order to reduce the administrative, training, and license acquisition costs, you fought tooth and nail with your divisions to standardize the environment. You said, “Heck, it’s all the same stuff in the end, folks, so let’s make Oracle our DBMS standard, Business Objects our BI standard, Documentum our ECM standard, and SAP our ERP standard.”
  • And you won. Mostly. There’s still some Cognos in finance. And marketing didn’t totally give up on Interwoven. But, for the most part, you won. You reduced the entropy of your IT environment and drove cost savings for your organization.

The problem is you’ve won the battle but lost the war. Why? Because if, as you say, the “stuff really is all the same” you shouldn’t standardize on the most expensive product. You should standardize on the cheapest.

  • Do you really need to be paying those big fees to Oracle for enterprise licenses? Wouldn’t MySQL do?
  • Are you really using all the functionality of that $1M/year Documentum ECM system? Wouldn’t SharePoint or Alfresco do?
  • For BI, do you need all the bells and whistles of BusinessObjects? Wouldn’t Pentaho or Qlikview do a fine job, at a fraction of the cost?

But these alternatives are obvious. Heck, even “the establishment” (i.e., Gartner) says it’s safe to tread in the open source water. So the question is, what’s holding you back?

  • Switching costs. It’s hard to move off Oracle or Documentum and you don’t want to pay the nut to do so.
  • Organizational inertia. Your whippersnapper DBAs who were in their 30s in the 1980s are now in their 50s. They’re thinking that change devalues their knowledge and experience; some just want to cruise into retirement. But that’s their personal agenda, not your enterprise one.
  • Accounting. You made it free for your divisions to keep using Documentum, Oracle, or BusinessObjects because you bought an enterprise license. While this appeared to “save” you money on a per-license basis, and it helped support your standardization initiative, it squashed innovation in your divisions, reinforced the organizational inertia, and has a lot of people using the wrong tool for the job, resulting in projects that either take more or more expensive hardware than necessary (Oracle is good at this), that take too long to develop, or that simply fail.

So, what do I recommend doing about all this? I suggest that you adopt these policies which, for full disclosure, are at least partially in the self-interest of this blog’s author:

  • Stop writing big checks for commodity software. Every time a big check comes along, ask yourself: is this software differentiated or commoditized? Be willing to pay a premium for differentiated software, and price shop commodity software. Call a group of your smartest staff together periodically to help you make the commodity versus differentiated call.

  • When you see a big check coming for commodity software, make a migration plan. My hunch is that most of the time, you can create a nice 3-year ROI in the transition from premium to cheaper software. (This reminds me of the time I visited an investment bank’s CIO asking about their Documentum strategy. The answer: “our Documentum strategy is to get off Documentum,” because we’re paying too much and using too little.)

  • Stop doing enterprise agreements that create poor economic incentives within your organization. Don’t pay $XM at the enterprise level, spread that as a “tax” across your divisions, and then make use of certain software “free.” It distorts project reality, creates false incentives, squashes innovation, and generates lots of hidden costs. If you want to negotiate a master agreement and discount rate, that’s fine. Shoot for centralized discounts without central planning.
  • Don’t worry that the prior policies will create mayhem. While I understand that you don’t want arbitrary taste differences increasing the entropy of your enterprise software portfolio, recognize that with the first policy you’ve solved that problem already. If you deem a category (e.g., core RDBMS, enterprise search) commoditized, then you are going to force people to pick on cost. You’ll get standardization on the commodity categories, just on the least expensive alternatives. The only entropy you’ll need to manage will be on the differentiated software which, having dispatched the commodity majority, you’ll have time to explore, study, and exploit.

Why am I taking the time to write this note to you? Back in the 1980s I was a foot soldier in the relational database revolution, and today I’m the CEO of one specialized DBMS company and on the board of another.

  • Mark Logic makes an XML server which can save great amounts of time and money in creating applications against unstructured information, replacing the combination of an RDBMS, an enterprise search engine, and an application server. Not only can Mark Logic manage 100s of TB of XML, the system eliminates the object / relational/ hierarchical impedance mismatch between Java, SQL, and XML that hampers developer productivity. Mark Logic was recently named the fourth fastest-growing IT company in Silicon Valley.
  • Aster Data makes a specialized data warehouse DBMS that runs on low-cost commodity hardware with a shared nothing architecture and leverages in-database MapReduce technology for parallelism and high scalability.

And during the past 25 years or so I’ve watched the market evolve. While I fully understand the policies and market forces that have led us to where we are, I feel like we’ve come full circle. Vendor power is now concentrated in the big three. Vendor margins top 50%. Big vendors don’t innovate; they consolidate. Inertia has set in at customer organizations. And there’s a major platform shift in progress; last time it was mainframe to minicomputer, this time it’s cloud.

Things feel a lot to me the way they did in 1985, just past the dawn of the relational revolution. So in one way I’m writing to point out the oft-overlooked obvious: stop paying premium prices for commodity items. And in another way I’m saying, take the money you save in so doing and invest it in innovative technologies that:

  • Drive competitive advantage (which will matter again as we come out of the Great Recession)
  • Enable the Internet-scale applications you’ll need to face the coming information deluge
  • Reform the application development stack in ways that make sense for the coming generation of information applications, not that made sense for the last generation of data-centric ones.

Thank you for reading my note. If you have any questions or comments, please give me a ping at dave-dot-kellogg-at-marklogic-com or comment on this post.

Sincerely,

Dave Kellogg