Databases - Kellblog

Kellblog Predictions for 2022

Dave Kellogg — Mon, 27 Dec 2021 17:33:57 GMT

Well it's time for my annual predictions post, a series now in its eighth year. Before diving in, let me remind readers that I do these predictions in the spirit of fun, they are not business or investment advice, and that all of my usual disclaimers and terms apply. I'm starting to believe that the value of this series is more about the chosen topics than the predictions themselves because my formula for creating these posts is to select interesting topics that I want to ponder, research them, and figure out a prediction for each topic along the way.

Let's start with a review of my 2021 predictions, keeping in mind one of my favorite quotes, often misattributed (including by me) to Yogi Berra: "predictions are hard, especially about the future."

Kellblog 2021 Predictions Review
On my own admittedly subjective and charitable self-scoring system, 2021 was a pretty good year for Kellblog predictions.

1. Divisiveness decreases but unity remains elusive. Hit. This is totally subjective, but I'd say that divisiveness in the USA has decreased a bit and that unity has most certainly remained elusive.

2. COVID-19 goes to brushfire mode. Hit, until recently. Well, it certainly felt like brushfire mode until December. As I write, it's still early in the omicron wave, so I'm going to remain optimistic that current predictions of omicron being more transmissible but less lethal will hold true.

3. The new normal isn't. Hit. I don't think many people believe that we're returning to pre-Covid norms when, and indeed if, we enter a post-Covid world.

4. We start to value resilience, not just efficiency. Hit. I don't frequently write about supply chain, but I made this prediction because for years I have wondered if, in our quest to wrest inefficiency from the supply chain, we were undervaluing resilience to Black Swan events from wars to infrastructure failures to natural disasters [1]. One person's inefficiency is another person's insurance.

5. Work from home sticks. Hit. At this point perhaps for the wrong reasons (i.e., omicron), but where and how we work has already changed and many of those changes will become permanent. McKinsey is producing some strong content on the future of work as is my friend Dan Turchin on his AI and the Future of Work podcast.

6. Tech flight happens, but with a positive effect. Hit. A lot of Californians have moved to Texas, Arizona, and Nevada -- but a lot have also moved to California (i.e., from the Bay Area to cheaper parts of the state). Florida, despite the hype, nudges out Oregon for fifth place. My point was that this is normal and healthy: you can long Miami and Austin without shorting Palo Alto which, by the way, would have been a bad idea in 2020.

7. Tech bubble relents. Miss, until recently. My world is probably best approximated by the WCLD ETF, which opened the year at $53, recently hit as high as $65, and (as I write) is at $51. Taking the longer view, WCLD has nevertheless more than doubled over the past 5 years, so a lot of this depends on what you mean by "bubble" and "relent."

Towards that end, revenue multiple is a better bubble indicator than share price, so let's take a look at the latest from Jamin Ball at Clouded Judgement.

Are multiples down? Yes, from a median high of nearly 20x to 12x, nearly 40%. So I'd say yes on "relent." On "bubble," well, we're still at 12x compared to what I'd say is a normal (eyeballed) range of 6-10x -- so we're still running hot by historical standards. [2]

8. Net dollar retention becomes the top SaaS metric. Hit, depending on what you mean by "top" [3], but my real point was that NDR would replace churn rates as a method for valuing the installed base and I think it has. See my SaaStr 2020 talk or my Gainsight Pulse 2021 talks for more.

9. Data intelligence happens. Partial hit. I'd say it's "happening" much more than "happened" because we're still early days in a multi-year category transformation. My friends at Alation continue to crush it driving their vision of data intelligence extending from the data catalog [4].

10. Rebirth of EPM. Hit. While the second-generation EPM vendors [5] continue to prosper (i.e., Adaptive within Workday, Anaplan and Planful as independent companies) the industry is nevertheless being reborn underneath with new firms such as Cube, Mosaic, OnPlan, and Pigment blazing the trail [6]. It's exciting to watch.

Kellblog Predictions for 2022
Well, here we go with our predictions for 2022.

1. Covid goes from pandemic to endemic. I'm not sure we ever had a realistic chance to keep the genie in the bottle, as they did in New Zealand, but at least our actions bought us time to create and deploy vaccines. By the way, if you look at this chart, you might argue that New Zealand, in the end, failed to keep the genie in the bottle. [7]

See the big bump? Yes, it does seem that trying to bottle up Covid was destined to fail. Or was it? Look at the scale. Then compare New Zealand to Louisiana, which has a similar population.

The New Zealand peak is 200; the Louisiana peak is 30x higher at 6,000. If nothing else, and since this is something of a BI-focused blog, Covid has taught us a lot about How Charts Lie.
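To make the scale point concrete, here is a quick back-of-the-envelope sketch in Python. It assumes the two peaks as I read them off the charts (200 and 6,000) and approximate populations of 5.1M for New Zealand and 4.6M for Louisiana. The helper name `per_million` is just something I made up for illustration.

```python
def per_million(count: float, population: int) -> float:
    """Normalize a raw count to a rate per one million residents."""
    return count / (population / 1_000_000)


# Peak values as eyeballed from the two charts (assumed, not exact data).
NZ_PEAK, NZ_POPULATION = 200, 5_100_000
LA_PEAK, LA_POPULATION = 6_000, 4_600_000

nz_rate = per_million(NZ_PEAK, NZ_POPULATION)
la_rate = per_million(LA_PEAK, LA_POPULATION)

# How many times higher the Louisiana peak is, raw and per capita.
raw_ratio = LA_PEAK / NZ_PEAK
per_capita_ratio = la_rate / nz_rate
```

On those assumptions the raw gap is 30x and the per-capita gap is a bit over 33x, so putting both charts on a common, population-adjusted axis makes the difference larger, not smaller.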

But back to our prediction. I think 2022 will be the year we stop thinking in pre-Covid and post-Covid terms, and accept that Covid-19 will become endemic. Much as malaria brought us screened windows and cholera brought us clean water supplies, Covid will be with us for a long time and bring with it lasting (and hopefully in some cases, positive) changes to our day-to-day lives.

2. Web3 hype peaks. Is web3 going to change everything because, as Chris Dixon argues, the best entrepreneurs and developers have learned not to build atop centralized platforms? Or, as Stephen Diehl so indelicately puts it, is web3 bullshit whereby, "the only problem to be solved by web3 is how to post-hoc rationalize its own existence?" Or are Moxie Marlinspike's first impressions right -- e.g., the missing element in "crypto" is cryptography and that decentralizing the internals of underlying layers won't prevent centralization at the more nimbly evolving layers above?

Is web3 a ploy to put crypto bros in charge where "the promise of decentralization is just a veneer -- and blockchain is, in fact, the worst kind of vendor lock-in?" Or, did the venerable Grady Booch get web3 right in his retweet below?

Maybe Tim O'Reilly, the person who coined the phrase web 2.0, has the best take [8], arguing simply that it's too early to get excited about web3.

It sure does feel like 2005. There are a bunch of new ideas in circulation. Everyone is talking about them. People are struggling to understand them and building frameworks to organize and explain them. And sometimes it's hard to tell what's foundational to the new concept and what's trying to hitchhike a ride on the back of it. Based on this, I think we're building towards a web3 hype peak that should happen in 2022 [9].

I've always believed that blockchain was invented to support a specific use-case (i.e., bitcoin) and, unsurprisingly, is good for that use-case but has otherwise largely been a technology in search of a business problem -- particularly in the enterprise. Imagine if you went to SIGMOD twenty years ago and predicted the database of the future would be:

A "ledger," not a database
A linked list
Append-only
Not ACID (nor BASE), but SALT [10]
Immutable at the block level, thanks to hashing and proof-of-work
Require crazy amounts of wasted compute because of consensus algorithms

You'd have been laughed out of the room. Despite that, the reality is that this database (i.e., blockchain technology) is quite useful for cryptocurrency applications. The addition of smart contracts was a very powerful extension that came with Ethereum. Changing from proof-of-work to proof-of-stake may eliminate the crazy wasted compute and associated energy consumption [11].
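For readers who think better in code, the "database" described in that list can be sketched in a few lines of Python. This is a toy under stated assumptions, not how Bitcoin or Ethereum actually implement anything: the `Block` and `Ledger` names, the JSON serialization, and the "hash must start with N zeros" proof-of-work rule are all simplifications I chose for illustration.

```python
import hashlib
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Block:
    index: int
    prev_hash: str  # the "link" in the linked list
    payload: str
    nonce: int      # value searched for during proof-of-work

    def digest(self) -> str:
        body = json.dumps([self.index, self.prev_hash, self.payload, self.nonce])
        return hashlib.sha256(body.encode()).hexdigest()


class Ledger:
    """Append-only list of blocks, each linked to its predecessor by hash."""

    def __init__(self, difficulty: int = 2):
        self.prefix = "0" * difficulty  # toy proof-of-work target
        self.blocks = [self._mine(0, "0" * 64, "genesis")]

    def _mine(self, index: int, prev_hash: str, payload: str) -> Block:
        # Brute-force a nonce until the hash meets the target: wasted compute by design.
        nonce = 0
        while True:
            block = Block(index, prev_hash, payload, nonce)
            if block.digest().startswith(self.prefix):
                return block
            nonce += 1

    def append(self, payload: str) -> Block:
        # The only write operation: there is no update and no delete.
        tip = self.blocks[-1]
        block = self._mine(tip.index + 1, tip.digest(), payload)
        self.blocks.append(block)
        return block

    def is_valid(self) -> bool:
        # Every block must point at its predecessor's hash and meet the target.
        for prev, cur in zip(self.blocks, self.blocks[1:]):
            if cur.prev_hash != prev.digest():
                return False
        return all(b.digest().startswith(self.prefix) for b in self.blocks)
```

Append a couple of payloads and `is_valid()` returns `True`. Swap an earlier block for one with a different payload and it returns `False`, because the next block's `prev_hash` no longer matches. That is the immutability-through-hashing idea in miniature. Real systems add distributed consensus on top, which this sketch does not attempt.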

But, as I'd say with any special-purpose database -- from an OLAP server to an XML database to the Hadoop ecosystem: it's great at what it's built for, but why should you use it for something else? The default answer is you shouldn't [12].

When it comes to the decentralization argument, enterprises are inherently centralized in power and rely on centralized systems run by a centralized IT department. Moving enterprises to decentralized internal systems does nothing to change lock-in factors of their products (e.g., network effects that lock you into Facebook). Nor necessarily does empowering distributed networks with decentralized technologies -- see the above-linked proof-of-stake recentralization arguments. And if blockchain means automatic freedom from intermediaries, why is Coinbase worth $50B again?

I think DAOs are an interesting concept (great primer here), but the blockchain linkage seems contrived [13] -- I could make a Dunbar-number-sized group with organic governance rules and run it via in-person meetings, Zoom, Slack, or of course, Discord. (Arguably, Richard Branson did, many times.)

I don't know why anyone would pay $10M for a CryptoPunk or $300K for a Bored Ape, but I do understand collectibles: an ape costs $300K in part for the same reason that a 1943 bronze Lincoln cent costs $1M -- scarcity. I just thought we were going to use the Internet to eliminate scarcity, not artificially create it.

Finally, I think the self-referentiality of this ecosystem is interesting. If you want to buy a non-fungible token (NFT) of a Bored Ape, you're going to need to pay in Ether because that's the currency the price is listed in. Which in turn increases demand for Ether. Note interestingly that while you can use Ether to buy an NFT, you can't use an NFT to buy Ether because NFTs are not fungible, as Alexis Gay says, "in the sense that you couldn't funge them."

when you definitely understand NFTs pic.twitter.com/39I5EZ6Kde
— Alexis Gay (@yayalexisgay) November 2, 2021

3. Disruptors get disrupted. When I graduated from college, Oracle (founded 1977) was a ~$30M brash upstart challenging the entrenched leader, IBM, who no one ever got fired for selecting. I watched Oracle aggressively grow to $1B in revenues, flail several times trying to organically expand into applications, give up on building applications and instead acquire them, inexplicably get into hardware with the acquisition of Sun, and eventually plateau at $40B, effectively having become IBM in the process. As the saying goes, we become our parents.

Salesforce (founded 22 years later) is well into that cycle, going from brash disruptor to organic grower to M&A-driven grower, though they do a better job of preserving the entrepreneurial spirit if not growth (both were growing at ~25% at the $20B mark).

This is an ongoing pattern driven by Clayton Christensen's cycles of disruptive innovation. If you watch this cycle long enough, you can see the disruptors get disrupted -- e.g. Siebel was disrupted by Salesforce who was disrupted by Zendesk who is being disrupted by Freshworks. What drives these disruptive cycles:

Feature creep, which leads to market overshoot over time.
Management changes, as leadership teams drift from a spirit of value creation for customers to value extraction from them.
Specialization, as market leaders build breadth by integrating good-enough products and thereby create an opportunity for great point solutions (which often later expand to challenge the core product).
Technology platform changes, which antiquate previous architectures, allow new solutions to be built more quickly, and enable entirely new classes of applications.

For several reasons, I believe in 2022 we are going to see many disruptors get disrupted. Why?

Change to cloud-native. First-generation cloud solved a deployment problem; second-generation solves a development problem as well. When I build new apps, I can rely not just on my previously developed or open source modules, but on live, running services. Upstarts can stand yet again on the shoulders of giants.
Flood of venture capital (VC). VC is flowing at unprecedented rates driving record funding amounts at both the early company-creation stage (e.g., seed, angel) and the later growth stage as well.
High-growth. The combination of Covid accelerating digital transformation and unprecedented VC financing has accelerated software company growth (aka, the Covid boost). At the second order, I can't help but wonder if accelerating the growth cycle hastens the aforementioned process that creates new disruption opportunities. Software companies become their parents faster.
Product-led growth (PLG). SaaS provided both a market disruption opportunity and a total available market (TAM) expansion in each market segment. While I'll cover PLG more below, I think it will have a similar effect, providing both a disruption opportunity in existing segments while simultaneously expanding their potential.

4. Venture capital continues to flow. 2018 was the first year since "the OB" (the original bubble) that we again reached 2000-era levels of VC financing. 2019 dipped a bit, but 2020 came back strong, and 2021 looks to be a blockbuster [14].

PitchBook data reveals that while total funding and mega-funding (where the round raises $100M+) are up, deal count is slightly down, meaning average deal sizes are up, which is consistent with my view that VC today is a have or have-not market. The haves can raise a ton of money and on good terms. But the have-nots -- those who have yet to demonstrate a strong team, product-market fit, or a scalable growth model -- cannot, and face a frustrating form of hunger in the land of plenty.

The keys to success in this environment are two:

Raise when the raising's good. If you can raise money, you (likely) should. If you can't, figure out why -- dig beyond superficial, "nice" explanations into real reasons, and then go fix them. Fast.
But trigger spending on business signals. You undoubtedly raised your most recent financing on the back of an aggressive operating plan. But don't, don't, don't -- for example -- hire 10 sellers because they're in the plan: hire them because the CRO made the last 10 productive and wants to hire 10 more.

One of these years -- maybe 2022, maybe thereafter -- VC will be in tighter supply. So raise money in large quantity when you can. Fear not dilution -- you'll likely be raising at (what are, by historical standards) stratospheric valuations. Most of all, while you shouldn't follow my miserly great-aunt Jo's expense strategy (whose dying words were "don't spend"), you should spend if, only if, and when it makes good business sense to do so.

5. The metaverse remains meta. If you've not taken the 10 minutes yet, you should probably look at this Facebook/Meta rebranding launch video, a well-produced but at times amazingly awkward metaverse concept video.

The metaverse vision has provoked a range of reactions from dystopian nightmare to dead-on-arrival to heated discussions of "reality privilege" and accusations about the new billionaire utopian boondoggle.

It's also invited a fair bit of parody, my favorite being the Icelandic tourism board's Icelandverse.

Back to the metaverse, I find the vision more Oasis-style (Ready Player One) dystopia than utopia. While I find the idea of reality privilege interesting intellectual banter, no, I don't think the best solution to humankind's problems is to hook everyone into an alternative, virtual reality. Good sci fi? Yes. Good reality? No. Not in the least.

Are virtual worlds fun for immersive gaming? Yes.
Do you need virtual (or crypto) currencies in those worlds? No, they're just an add-on money-making opportunity like a Starbucks card [15]. You can buy an upgraded weapon in a game today via a regular credit card [16].
Do you need virtual museums in which to hang your NFTs? They're cool and I guess collectors do like to show off their collectibles, so maybe [17]. That said, CryptoPunks weigh in at a slim 576 pixels so I don't think you'll need fancy display capabilities for some NFTs at least.
Do you need virtual real-estate within your virtual world? Second Life had a full economy with Linden dollars and real-estate, so the idea's not new, but metaverse real-estate is setting records today. If the key to real estate is location, location, location, that's not really a constraint in the virtual world. That said, a key theme of web3 seems to be manufactured scarcity (which generative NFT collections do well) and which ultimately comes down to a simple matter of trust [18].
Can augmented reality help business applications, like customer service? Yes, I think AR has numerous practical enterprise use-cases and, if nothing else, all the VR technology will benefit more pragmatic use-cases in enterprise.

6. PLG momentum builds. While I generally have a negative reaction to hype, and I don't like the either/or nature of the slogan below, I do think PLG is a good idea.

Let me separate PLG into what I see as two pieces:

PLG as business strategy, where the business is built around a model in which marketing and community relations drive end-users to try a product, hopefully like it, buy the ability to use it (or use it more fully), tell their colleagues (directly or virally, e.g., through a Calendly invite), and repeat the cycle. While Slack, Zoom, and Dropbox are frequently-cited examples, a full list might include over 300 companies. (You can read a great anatomy of them, here.)
PLG as a set of product requirements. I think PLG brings three core, generic product requirements, none of which have frankly been common to previous generations of enterprise software: build a product that (a) is quick to deliver end-user value, (b) is easy and even fun for the end-user to use, and (c) is built with the company's revenue growth strategy in mind, e.g., in-built virality and carefully-selected functional and enterprise-level pay gates.

Many of the concepts behind PLG aren't new. Open source has always been about building a community of users who love the product, though historically composed of developers and not end-users. Market-seeding isn't new, though prior-generation seeders like Crystal Reports did so not through marketing- and community-driven downloads and trials, but channels of distribution [19]. Consumerization of enterprise software isn't a new idea, but I'd argue that it's only become real with the advent of PLG. Velocity sales models aren't new either, but they're also a key part of PLG.

Some PLG ideas are new:

User-experience (UX) as job #1. Only when UX became critical to business/sales strategy did it get serious commitment instead of lip service (in the enterprise at least).
Growth teams, subordinating functional silos to united teams of marketers, engineers, analysts, and designers working together to drive growth.
Digital experience tools that go beyond usability testing labs to track what users actually do in the software with an eye towards making it better -- such as Pendo, Heap, and Amplitude.

While I think it's a serious overstatement to say, "sales- and marketing-led growth is dead; long live product-led growth," I think it's equally dangerous to dismiss PLG along with quarter-zip sweaters as the latest VC fad. PLG brings many good ideas that companies should consider and map to their own business models. Despite the risk of PLG noise drowning out PLG signal, I believe companies will increasingly and intelligently apply PLG principles in 2022 -- and if you're not thinking about how to do that, you should be.

7. Year of the privacy vault. While I'm not an expert in this field, I am learning more, and I see a lot of exciting things happening in information security:

Innovations in digital identity from companies like Ory and Presidio Identity [20].
Innovations in cloud security and governance from companies like Cyral and Privacera [20].
Innovations in enterprise privacy from DataGrail [20].

The emerging and ever-changing nature of information security is a big part of what interests me, because it means that a lot of smart people with interesting ideas are attacking numerous problems from different angles. While this leaves me in a near-perpetual state of confusion, I'll repeat what I've often said about the metadata space: anyone who isn't confused doesn't really understand the situation (Edward R. Murrow). In metadata, I feel like I finally do understand the space. In information security, well, I'm still working at it.

In the past ~25 years, there's a particular feeling I've had only on rare occasion:

When Bernard Liautaud explained the semantic layer during my interviews at Business Objects.
When Satyen Sangani explained the machine-learning data catalog as I was contemplating an angel investment in Alation [21].
When Anshu Sharma explained the privacy vault to me while we were having a drink talking about his latest company.

I've met a lot of great entrepreneurs and worked with a lot of great companies during those years, but only those three times did I have the immediate reaction:

This is obvious. (Well, post facto obvious, once you understood it.)
This is huge; everyone needs this.
I need to be a part of this.

In Anshu's case it admittedly took more than one drink for me to understand the idea, but what I liked about it, what made it seem so post facto obvious was this:

Enterprises, where possible, should get out of the business of handling sensitive information. I know it's not always possible, but if the data is non-core to operations, why not delegate storing it to someone else? While hospitals need to store medical images, does TurboTax really need to store your social security number to file your taxes once per year? It's hard. Let someone else do it.
You can replace sensitive data with tokens. You don't need to store someone's credit score when you can store a token that maps to it and isolate the score to a separate database. It's classic indirection. But it usually means you can't then do anything with the data -- unless you incorporate the ideas in the next two bullets.
You actually need an API more often than you need access. Most of the time you don't need direct access to sensitive data, you just need to do something with it. You don't need to know someone's credit score; you need to know if you can make them a loan and at what interest rate. That is, you can pass a token for credit score to a service that returns approval status and approved rate in a loan approval application.
You can encrypt data without losing the ability to work with it. Polymorphic encryption lets you verify the last four digits of a social security number or return all phone numbers in the same area code without first decrypting the data. This means you can get utility from encrypted data. Not being a security person, this idea was entirely new and fairly mind-blowing to me [22].
Vaults are an existing design pattern. Google, Apple, and Netflix have taken a low-trust, tokenized vault approach to handling sensitive information in their internal systems.
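The tokenization and "API, not access" ideas in the list above can be sketched roughly as follows. To be clear, this is my own toy illustration and not Skyflow's product or API: the `PrivacyVault` class, its method names, and the score thresholds and rates are all hypothetical. It also cheats by keeping plaintext inside the vault, so it shows the indirection pattern only, not polymorphic encryption.

```python
import secrets


class PrivacyVault:
    """Toy vault: holds sensitive values and hands out opaque tokens."""

    def __init__(self):
        self._store = {}  # token -> sensitive value, isolated from the app database

    def tokenize(self, value) -> str:
        # The token is random, so it reveals nothing about the value it maps to.
        token = "tok_" + secrets.token_hex(8)
        self._store[token] = value
        return token

    def approve_loan(self, credit_score_token: str, min_score: int = 680) -> dict:
        """Answer the business question without ever returning the score."""
        score = self._store[credit_score_token]
        if score < min_score:
            return {"approved": False, "rate": None}
        # Hypothetical pricing rule, purely for illustration.
        rate = 0.05 if score >= 740 else 0.07
        return {"approved": True, "rate": rate}

    def last_four_match(self, ssn_token: str, candidate: str) -> bool:
        """Verify the last four digits without exposing the full number."""
        return self._store[ssn_token][-4:] == candidate


vault = PrivacyVault()

# The application database stores only tokens, never the underlying values.
customer = {
    "name": "Pat",
    "credit_score": vault.tokenize(712),
    "ssn": vault.tokenize("123-45-6789"),
}

decision = vault.approve_loan(customer["credit_score"])
verified = vault.last_four_match(customer["ssn"], "6789")
```

The design point is that the calling application gets what it actually needs (a loan decision, a yes/no verification) while a breach of its own database would yield only tokens.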

We will see if my spider sense was correct a third time. While my sense is most developed in data and analytics, I love modularization, normalization, and specialization and this play is about all three. To hear the Skyflow story directly from Anshu himself, watch the video here.

8. MSDS is the new MBA. For decades, and often contrary to prevailing fashion, I've counseled people to consider getting an MBA during their career journey for any of the following reasons:

The knowledge. MBA coursework is generally useful in business, regardless of the caliber of school you attend.
The network. At a top school, you will likely become part of a great network that will benefit you throughout your career.
The career-change opportunity. The MBA offers a unique chance to switch roles or industries (e.g., from engineering into product, from consumer to enterprise).

Given the time and cost of MBAs, it's popular these days to say that MBAs aren't worth the trouble. Autocomplete confirms these doubts.

While I frequently still recommend MBAs to those who seek my advice, I find myself increasingly asking them: have you considered a master of science in data science? Such programs can be done in as little as half the time and at half the cost of an MBA, have numerous online and hybrid options, are offered by many prestigious schools, provide superior analytical training, and offer similar career change opportunity.

While a top-tier MBA will still be de rigueur in investment banking, VC, and management consulting for the foreseeable future, I do believe that mid-career professionals will increasingly evaluate the MBA and the MSDS as alternative means to advance their careers -- and that many will take the MSDS route.

9. Get ready for social impact. Millennials, and for that matter, many of the rest of us, increasingly demand purpose in our work. If we're going to spend 40, 50, or more hours per week working, then we'd like the company to provide both a paycheck and a sense of purpose. In the workplace, according to a recent Gallup report, millennials want leadership to change its approach:

The sense of purpose, however, goes well beyond the workplace and includes the desire to address societal concerns related to sustainability, capitalism, human rights, and social justice. While Boomers and Xers were content to Party Like It's 1999, the next generation wants to focus on the future and solving the world's largest problems. Good.

This drives for whom and how they want to work, the products they buy, the brands they value, the vacations they take, the causes they support, the hobbies they pursue, the lifestyles they lead, and the money they invest. In short, everything.

This era has brought us everything from local organic produce and forks over knives to the 1% Pledge, the B Corp, DEI, impact investing, ESG funds, stakeholder capitalism, carbon offsets, and data rights as human rights.

I think Europe is leading the US on many of these changes so, as per the famous William Gibson quote, I get a glimpse into the future through my work with Balderton Capital which has not only committed itself to a set of sustainable future goals (SFGs) [23], but also recently announced their first annual progress report on them.

ESG momentum will build in 2022.

10. The rise of causal inference. For the past decade I've told people that data science was the new plastics -- in the sense of the famous quote from The Graduate.

While I think that was spot-on, this year I have a new "one word" -- causal inference. Why?

Most of the data science we do today is some sort of classification and regression. We can group like entities, we can predict into which group a new one will fall. We can build a mathematical model of a dependent variable and make predictions about it based on independent variables. It's cool stuff, but in the end, this is about correlation. How things move together.
Yet, we all know that correlation does not imply causation. We know that windmill rotation doesn't make the wind blow [24]. We know that waking up dressed doesn't cause headaches and that ice cream sales don't cause drownings [25]. Yet, most businesspeople today forget that when they're interpreting data. We say that correlation does not imply causation and then we say stuff like, "all of the customers who churned last quarter filed more than five severity-one cases in the past year!" [26]
The first-generation of data science has given us lots of data and some great modeling tools to interpret data. The bad news is that we -- not data scientists, but regular analysts and business people -- are not very good at interpreting it.
Where possible, we need to figure out not just where variables correlate but what actually causes what. To do so normally requires an experiment (i.e., an RCT) but sometimes causal questions can be correctly answered using observational data. The insight about how to do that, by the way, is not trivial -- it won the 2021 Nobel Prize in Economics.
The big guys are doing it. A decade ago the hyperscalers had data science teams and typical companies, even large ones, didn't. Today, the hyperscalers have causal inference teams and typical companies don't. To the extent you believe the big guys are leading indicators of the mainstream, you should believe that determining not just correlation, but causation, is coming soon to a business meeting near you [27]. You can get ready the easy way or the hard way.
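The ice cream and drownings example is easy to simulate. The sketch below uses made-up numbers throughout (the temperature distribution, the coefficients, and the noise levels are all assumptions) and only the Python standard library. It builds a world where warm weather drives both variables and neither affects the other, then compares the raw correlation with the correlation left over once temperature is accounted for. Adjusting for a known confounder is just one basic technique; it is not a substitute for an RCT or for the more careful methods that causal inference teams use.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def residuals(ys, xs):
    """What is left of ys after removing a least-squares line fit on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum(
        (x - mx) ** 2 for x in xs
    )
    return [(y - my) - slope * (x - mx) for x, y in zip(xs, ys)]


def simulate(days=5000, seed=0):
    rng = random.Random(seed)  # seeded so the run is repeatable
    # Temperature is the common cause (the "third cause").
    temp = [rng.gauss(20, 8) for _ in range(days)]
    # Each outcome depends on temperature plus its own noise, never on the other.
    ice_cream = [10 * t + rng.gauss(0, 40) for t in temp]
    drownings = [0.3 * t + rng.gauss(0, 2) for t in temp]  # toy index, not a count

    raw = pearson(ice_cream, drownings)
    # Partial correlation: correlate what remains after removing temperature.
    adjusted = pearson(residuals(ice_cream, temp), residuals(drownings, temp))
    return raw, adjusted


raw_corr, adjusted_corr = simulate()
```

In this setup `raw_corr` comes out strongly positive while `adjusted_corr` sits near zero, which is the whole point: the two series move together only because the weather moves both of them.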

If you made it this far, thank you! Read the links -- there's gold in those hills. Remember that I write this post in the spirit of fun and to force myself to research interesting topics. Have a happy, healthy, and Rule of 40 positive 2022.

Peace out / Dave.

# # #

Notes

[1] I did study seismology (i.e., geophysics) after all. Earthquakes happen.

[2] As mentioned in last year's post, there are plenty of possible reasons for this, including the possibility that the companies are higher quality and/or growing faster.

[3] Some might argue growth is top -- particularly if you define top as most correlated to revenue multiple. Based on data as of this writing, the R^2 between EV/NTM-revenue multiple and NTM-revenue-growth is 0.52 vs. 0.24 for NDR. Play around here for more.

[4] Reminder that I am an angel investor in and sit on the board of Alation.

[5] Who are the first-generation cloud EPM vendors.

[6] I am an investor in Planful and Cube, an advisor to OnPlan, and occasionally chat with Mosaic and Pigment, among others. Hey, I like EPM.

[7] Louisiana actually has about a 10% smaller population (4.6M) than New Zealand (5.1M).

[8] Tim's What is Web 2.0 post is well worth reading both for the history lesson and, more subtly, to beam you back to a time where something was emerging and what it looked like for people to try and understand and describe it.

[9] Gartner has a blockchain hype cycle (that lists numerous web3 technologies) but not a web3 hype cycle. Currently, NFTs are at the approximate peak of that cycle.

[10] Technically, BASE as a database concept didn't exist at the time.

[11] Though not without its own problems.

[12] A repeated pattern in database history -- everyone wants to rule the world because it's a big world to rule. Most of the time, however -- and relational databases are a notable exception -- the new database is not a great general-purpose alternative. The reductio argument here is there should be no general-purpose databases as every purpose is a special one.

[13] See prior comment about hitchhiking.

[14] Sources include Statista, PitchBook, CBInsights, and (in one case) my estimates.

[15] In addition to providing Starbucks with consumer data, they have $1.6B in prepaid value today. Remember a big part of how Warren Buffett got to be Warren Buffett: float.

[16] Yes, I understand that games can force you into their currency by providing rewards in game-units and that you can create a one-way transformation between cash and game-units (i.e., you can buy units with cash, but not cash with units).

[17] Museums provide access as their core function but also offer security, preservation, and education (e.g., docents) surrounding their works.

[18] Trust that the promoters will keep their promise about number and trait distribution of works and avoid the tendency to excessively extract value by minting more and/or derivative works (e.g., mutant apes) that potentially undermine the original collection and devalue traits. Creating scarcity is easy. Preserving it might well be hard.

[19] In many cases because, well, the Internet didn't exist yet. Microsoft helped to put Crystal Reports on the map by distributing it with Visual Studio.

[20] Disclaimers: I'm an advisor to Presidio Identity. Ory is a Balderton portfolio company. I'm an advisor to and investor in Cyral. I have done some consulting with Privacera. I am an investor in DataGrail.

[21] Which quite happily I made.

[22] Read up on fully homomorphic encryption which enables you to perform calculations on data without first decrypting it. While fully homomorphic encryption is prohibitively computationally expensive, another key Skyflow insight was that many "numbers" aren't fully treated as numbers in practice -- e.g., you might verify the last 4 digits of an SSN but you're never going to multiply two of them.

[23] The SFGs linked come from Balderton Capital where I work part-time as an EIR.

[24] Reverse causation.

[25] The third-cause fallacy. Going to bed drunk increases both waking up dressed and having a headache. Warm weather increases both swimming rates (which increase drownings) and ice cream sales.

[26] They also all had, e.g., brown-eyed CIOs, more than $500M in revenues, and parking lots with more than 200 spaces.

[27] Irony alert, I'm making a correlation-based argument here!

Kellblog's 10 Predictions for 2020

Dave Kellogg — Sun, 05 Jan 2020 18:07:24 GMT

As I’ve been doing every year since 2014, I thought I’d take some time to write some predictions for 2020, but not without first doing a review of my predictions for 2019. Lest you take any of these too seriously, I suggest you look at my batting average and disclaimers.

Kellblog 2019 Predictions Review

1. Fred Wilson is right, Trump will not be president at the end of 2019. PARTIAL. He did get impeached after all, but that’s a long way from removed or resigned.

2. The Democratic Party will continue to bungle the playing of its relatively simple hand. HIT. This is obviously subjective and while I think they got some things right (e.g., delaying impeachment), they got others quite wrong (e.g., Mueller Report messaging), and continue to play more left than center which I believe is a mistake.

3. 2019 will be a rough year for the financial markets. MISS. The Dow was up 22% and the NASDAQ was up 35%. Financially, maybe the only thing that didn’t work in 2019 were over-hyped IPOs. Note to self: avoid quantitative predictions if you don’t want to risk ending up very wrong. I am a big believer in regression to the mean, but nailing timing is the critical (and virtually impossible) part. Nevertheless, I do use tables like these to try and eyeball situations where it seems a correction is needed. Take your own crack at it.

4. VC tightens. MISS. Instead of tightening, VC financing hit a new record. The interesting question here is whether mean reversion is relevant. I’d argue it’s not – the markets have changed structurally such that companies are staying private far longer and thus living off venture capital (and/or growth-stage private equity) in ways not previously seen. Mark Suster did a great presentation on this, Is VC Still a Thing, where he explains these and other changes in VC. A must read.

5. Social media companies get regulated. PARTIAL. While “history may tell us the social media regulation is inevitable,” it didn’t happen in 2019. However, the movement continued to gather steam with many Democratic presidential candidates calling for reform and, more notably, none other than Facebook investor Roger McNamee launching his attack on social media via his book Zucked: Waking Up To The Facebook Catastrophe. As McNamee says, “it’s an issue of ‘right vs. wrong,’ not ‘right vs. left.’”

6. Ethics make a comeback. HIT. Ethics have certainly been more discussed than ever and related to the two reasons I cited: the current administration and artificial intelligence. The former forces ethics into the spotlight on a daily basis; the latter provokes a slew of interesting questions, from questions of accidental bias to the trolley car problem. Business schools continue to increase emphasis on ethics. Marc Benioff has led a personal crusade calling for what he calls a new capitalism.

7. Blockchain, as an enterprise technology, fades away. HIT. While I hate to find myself on the other side of Ray Wang, I’m personally not seeing much traction for blockchain in the enterprise. Maybe I’m running with the wrong crowd. I have always felt that blockchain was designed for one purpose (to support cybercurrency), hijacked to another, and ergo became a vendor-led technology in search of a business problem. McKinsey has written a sort of pre-obituary, Blockchain’s Occam Problem, which was McKinsey Quarterly’s second most-read article of the year. The 2019 Blockchain Opportunity Summit’s theme was “Is Blockchain Dead? No. Industry Experts Join Together to Share How We Might Not be Using it Right” which also seems to support my argument.

8. Oracle enters decline phase and is increasingly seen as a legacy vendor. HIT. Again, this is highly subjective and some people probably concluded it years ago. My favorite support point comes from a recent financial analyst note: “we believe Oracle can sustain ~2% constant currency revenue growth, but we are dubious that Oracle can improve revenue growth rates.” That pretty much says it all.

9. ServiceNow and/or Splunk get acquired. MISS. While they’re both great businesses and attractive targets, they are both so expensive only a few could make the move – and no one did. Today, Splunk is worth $24B and ServiceNow a whopping $55B.

10. Workday succeeds with its Adaptive Insights agenda. HIT. Changing general ledgers is a heart transplant while changing planning systems is a knee replacement. By acquiring Adaptive, Workday gave itself another option – and a far easier entry point – to get into corporate finance departments. While most everyone I knew scratched their head at the enterprise-focused Workday acquiring a more SMB-focused Adaptive, Workday has done a good job simultaneously leaving Adaptive alone-enough to not disturb its core business while working to get the technology more enterprise-ready for its customers. Whether that continues I don’t know, but for the first 18 months at least, they haven’t blown it. This remains high visibility to Workday as evidenced by the Adaptive former CEO (and now Workday EVP of Planning) Tom Bogan’s continued attendance on Workday’s quarterly earnings calls.

With the dubious distinction of having charitably self-scored a 6.0 on my 2019 predictions, let’s fearlessly roll out some new predictions for 2020.

Kellblog 2020 Predictions

1. Ongoing social unrest. The increasingly likely trial in the Senate will be highly contentious, only to be followed by an election that will be highly contentious as well. Beyond that, one can’t help but wonder if a defeated Trump would even concede, which could lead to a Constitutional Crisis of the next level. Add to all that the possibility of a war with Iran. Frankly, I am amazed that the Washington, DC continuous distraction machine hasn’t yet materially damaged the economy. Like many in Silicon Valley, I’d like Washington to quietly go do its job and let the rest of us get back to doing ours. The reality TV show in Washington is getting old and, happily, I think many folks are starting to lose interest and want to change the channel.

2. A desire for re-unification. I remain fundamentally optimistic that your average American – Republican, Democrat, or the completely under-discussed 38% who are Independents -- wants to feel part of a unified, not a divided, America. While politicians often try to leverage the most divisive issues to turn people into single-issue voters, the reality is that far more things unite us as Americans than divide us. Per this recent Economist/YouGov wide-ranging poll, your average American looks a lot more balanced and reasonable than our political party leaders. I believe the country is tired of division, wants unification, and will therefore elect someone who will be seen as able to bring people together. We are stronger together.

3. Climate change becomes the new moonshot. NASA’s space missions didn’t just get us to the moon; they produced over 2,000 spin-off technologies that improve our lives every day – from emergency “space” blankets to scratch-resistant lenses to Teflon-coated fabrics. Instead of seeing climate change as a hopeless threat, I believe in 2020 we will start to reframe it as the great opportunity it presents. When we mobilize our best and brightest against a problem, we will not only solve it, but we will create scores to hundreds of spin-off technologies that will benefit our everyday lives in the process. See this article for information on 10 startups fighting climate change, this infographic for an overview of the kinds of technologies that could alleviate it, or this article for a less sanguine view on the commitment required and extent to which we actually can de-carbonize the air. Or check out this startup which makes "trees" that consume the pollution of 275 regular trees.

4. The strategic chief data officer (CDO). I’m not a huge believer in throwing an “O” at every problem that comes along, but the CDO role is steadily becoming mainstream – in 2012 just 12% of F1000 companies reported having a CDO; in 2018 that’s up to 68%. While some of that growth was driven by defensive motivations (e.g., compliance), increasingly I believe that organizations will define the CDO more strategically, more broadly, and holistically as someone who focuses on data, its cleanliness, where to find it, where it came from, its compliance with regulations as to its usage, its value, and how to leverage it for operational and strategic advantage. These issues are thorny, technical, and often detail-oriented and the CIO is simply too busy with broader concerns (e.g., digital transformation, security, disruption). Ergo, we need a new generation of chief data officers who want to play both offense and defense, focused not just tactically on compliance and documentation, but strategically on analytics and the creation of business value for the enterprise. This is not a role for the meek; only half of CDOs succeed and their average tenure is 2.4 years. A recent Gartner CDO study suggests that those who are successful take a more strategic orientation, invest in a more hands-on model of supporting data and analytics, and measure the business value of their work.

5. The ongoing rise of DevOps. Just as agile broke down barriers between product management and development so has DevOps broken down walls between development and operations. The cloud has driven DevOps to become one of the hottest areas of software in recent years with big public company successes (e.g., Atlassian, Splunk), major M&A (e.g., Microsoft acquiring GitHub), and private high-flyers (e.g., HashiCorp, Puppet, CloudBees). A plethora of tools, from configuration management to testing to automation to integration to deployment to multi-cloud to performance monitoring are required to do DevOps well. All this should make for a $24B DevOps TAM by 2023 per a recent Cowen & Company report. Ironically though, each step forward in deployment is often a step backward in developer experience.

6. Database proliferation slows. While 2014 Turing Award winner Mike Stonebraker was right over a decade ago when he argued in favor of database specialization (One Size Fits All: An Idea Whose Time Has Come and Gone), I think we may now have too much of a good thing. DB Engines now lists 350 different database systems of 14 different types (e.g., relational, graph, time series, key-value). Crunchbase lists 274 database (and database-related) startups. I believe the database market is headed for consolidation. One of the first big indicators of a resurgence in database sanity was the failure of the (Hadoop-based) data lake, which happened in 2018-2019 and was the closest thing I’ve seen to déjà vu in my professional career – it was as if we learned nothing from the Field of Dreams enterprise data warehouse of the 1990s (“build it and they will come”). Moreover, after a decade of developer-led database selection, developers are now re-realizing what database people knew all along – that a lot of the early NoSQL movement was akin to throwing out the ACID transaction baby with the tabular schema bathwater.

7. A new, data-layer approach to data loss prevention (DLP). I always thought DLP was a great idea, especially the P for prevention. After all, who wants tools that can help with forensics after a breach if you could prevent one from happening at all -- or at least limit one in progress? But DLP doesn’t seem to work: why is it that data breaches always seem to be measured not in rows, but in millions of rows? For example, Equifax was 143M and Marriott was 500M. DLP has many known limitations. It’s perimeter-oriented in a hybrid cloud world of dissolving perimeters and it’s generally offline, scanning file systems and database logs to find "misplaced data." Wouldn’t a better approach be to have real-time security monitored and enforced at the data layer, just the same way as it works at the network and application layer? Then you could use machine learning to understand normal behavior, detect anomalous behavior, and either report it -- or stop it -- in real time. I think we’ll see such approaches come to market in 2020, especially as cloud services like Snowflake, RDS, and BigQuery become increasingly critical components of the data layer.

8. AI/ML continue to see success in highly focused applications. I remain skeptical of vendors with broad claims around “enterprise AI” and remain highly supportive of vendors applying AI/ML to specific problems (e.g., Moveworks and Astound who both provide AI/ML-based trouble-ticket resolution). In the end, AI and ML are features, not apps, and while both technologies can be used to build smart applications, they are not applications unto themselves. In terms of specificity, the No Free Lunch Theorem reminds us that any two optimization techniques perform equivalently when averaged across all possible problems – meaning that no one modeling technique can solve everything and thus that AI/ML is going to be about lots of companies applying different techniques to different problems. Think of AI/ML more as a toolbox than a platform. There will not be one big winner in enterprise AI as there was in enterprise applications or databases. Instead, there will be lots of winners each tackling specific problems. The more interesting battles will be those between systems of intelligence (e.g., Moveworks) and systems of record (e.g., ServiceNow) with the systems-of-intelligence vendors running Trojan Horse strategies against systems-of-record vendors (first complementing but eventually replacing them) while the system-of-record vendors try to either build or acquire systems of intelligence alongside their current offerings.

9. Series A rounds remain hard. I think many founders are surprised by the difficulty of raising A rounds these days. Here’s the problem in a nutshell:

Seed capital is readily available via pre-seed and seed-stage investments from angel investors, traditional early-stage VCs, and increasingly, seed funds. Simply put, it’s not that hard to raise seed money.
Companies are staying in the seed stage longer (a median of 1.6 years), increasingly extending seed rounds, and ergo raising more money during seed stage (e.g., $2M to $4M).
Such that, companies are now expected to really have achieved something in order to raise a Series A. After all, if you have been working for 2 years and spent $3M you better have an MVP product, a handful of early customers, and some ARR to show for it – not just a slide deck talking about a great opportunity.

Moreover, you should be making progress roughly in line with what you said at the outset and, if you took seed capital from a traditional VC, then they better be prepared to lead your round otherwise you will face signaling risk that could imperil your Series A.

Simply put, Series A is the new chokepoint. Or, as Suster likes to say, the Series A and B funnel hasn’t really changed – we’ve just inserted a new seed funnel atop it that is 3 times larger than it used to be.

10. Autonomy’s former CEO gets extradited. Silicon Valley is generally not a place of long memories, but I saw the unusual news last month that the US government is trying to extradite Autonomy founder and former CEO Mike Lynch from the UK to face charges. You might recall that HP, in the brief era under Leo Apotheker, acquired enterprise search vendor Autonomy in August, 2011 for a whopping $11B only to write off about $8.8B under subsequent CEO Meg Whitman a little more than a year later in November, 2012. Computerworld provides a timeline of the saga here, including a subsequent PR war, US Department of Justice probe, UK Serious Fraud Office investigation (later dropped), shareholder lawsuits, proposed settlements, more lawsuits including Lynch’s suing HP for $150M for reputation damages, and HP’s spinning-off the Autonomy assets. Subsequent to Computerworld’s timeline, this past May Autonomy’s former CFO was sentenced to five years in prison. This past March, the US added criminal charges of securities fraud, wire fraud, and conspiracy against Lynch. Lynch continues to deny all wrongdoing, blames the failed acquisition on HP, and even maintains a website to present his point of view on the issues. I don’t have any special legal knowledge or specific knowledge of this case, but I do believe that if the US government is still fighting this case, still adding charges, and now seeking extradition, that they aren’t going to give up lightly, so my hunch is that Lynch does come to the US and face these charges.

More broadly, regardless of how this particular case works out, in a place so prone to excess, where so much money can be made so quickly, frauds will periodically happen and it's probably the most under-reported class of story in Silicon Valley. Even this potentially huge headline case – the proposed extradition of a British billionaire tech mogul -- never seems to make page one news. Hey, let’s talk about something positive like Loft’s $175M Series C instead.

To finish this up, I’ll add a bonus prediction: Dave doesn’t get a traditional job in 2020. While I continue to look at VC-backed startup and/or PE-backed CEO opportunities, I am quite enjoying my work doing a mix of boards, advisory relationships, and consulting gigs. While I remain interested in looking at great CEO opportunities, I am also interested in adding a few more boards to my roster, working on stimulating consulting projects, and a few more advisory relationships as well.

I wish everyone a happy, healthy, and above-plan 2020.

Joining the Profisee Board of Directors

Dave Kellogg — Tue, 27 Aug 2019 18:30:16 GMT

We’re announcing today that I’m joining the board of directors of Profisee, a leader in master data management (MDM). I’m doing so for several reasons, mostly reflecting my belief that successful technology companies are about three things: the people, the space, and the product.

I like the people at both an investor and management level. I’m old friends with a partner at ParkerGale, the private equity (PE) firm backing Profisee, and I quite like the people at ParkerGale, the culture they’ve created, their approach to working with companies, and of course the lead partner on Profisee, Kristina Heinze.

The management team, led by veteran CEO and SAP alumnus Len Finkle, is stocked with domain experts from larger companies including SAP, Oracle, Hyperion, and Informatica. What’s more, Gartner VP and analyst Bill O’Kane recently joined the company. Bill covered the space at Gartner for over 8 years and has personally led MDM initiatives at companies including MetLife, CA Technologies, Merrill Lynch, and Morgan Stanley. It’s hard to read Bill’s decision to join the team as anything but a big endorsement of the company, its leadership, and its strategy.

These people are the experts. And instead of working at a company where MDM is an element of an element of a suite that no one really cares about anymore, they are working at a focused market leader that worries about MDM -- and only MDM – all day, every day. Such focus is powerful.

I like the MDM space for several reasons:

It’s a little obscure. Many people can’t remember if MDM stands for metadata management or master data management (it’s the latter). It’s under-penetrated; relatively few companies who can benefit from MDM use it. Historically the market has been driven by “reluctant spend” to comply with regulatory requirements. Megavendors don’t seem to care much about MDM anymore, with IBM losing market share and Oracle effectively exiting the market. It’s the perfect place for a focused specialist to build a team of people who are passionate about the space and build a market-leading company.

It’s substantial. It’s a $1B market today growing at 5%. You can build a nice company stealing share if you need to, but I think there’s an even bigger opportunity.

It’s teed up to grow. On the operational side, I think that single source of truth, digital transformation, and compliance initiatives will drive the market. On the analytical side, if there’s one thing 20+ years in and around business intelligence (BI) has taught me, it’s GIGO (garbage in, garbage out). If you think the GIGO rule was important in traditional BI, I’d argue it’s about ten times more important in an artificial intelligence and machine learning (AI/ML) world. Garbage data in, garbage model and garbage predictions out. Data quality is the Achilles’ heel of modern analytics.

I like Profisee's product because:

It’s delivering well for today’s customers.
It has the breadth to cover a wide swath of MDM domains and use-cases.
It provides a scalable platform with a broad range of MDM-related functionality, as opposed to a patchwork solution set built through acquisition.
It’s easy to use and makes solving complex problems simple.
It’s designed for rapid implementation, so it’s less costly to implement and faster to get in production which is great for both committed MDM users and -- particularly important in an under-penetrated market – those wanting to give MDM a try.

I look forward to working with Len, Kristina, and the team to help take Profisee to the next level, and beyond.

Now, before signing off, let me comment on how I see Profisee relative to my existing board seat at Alation. Alation defined the catalog space, has an impressive list of enterprise customers, raised a $50M round earlier this year, and has generally been killing it. If you don't know the data space well you might see these companies as competitive; in reality, they are complementary and I think it's synergistic for me to work with both.

Data catalogs help you locate data and understand the overall data set. For example, with a data catalog you can find all of the systems and data sets where you have customer data across operational applications (e.g., CRM, ERP, FP&A) and analytical systems (e.g., data warehouses, data lakes).

MDM helps you rationalize the data across your operational and analytical systems. At its core, MDM solves the problem of IBM being entered in your company's CRM system as "Intl Business Machines," in your ERP system as "International Business Machines," and in your planning system as "IBM Corp," to give a simple example. Among other approaches, MDM introduces the concept of a golden record which provides a single source of truth of how, in this example, the customer should be named.
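To make the golden record idea concrete, here is a minimal sketch in Python. Everything in it is hypothetical: the alias table, the `normalize` and `golden_record` helpers, and the choice of "International Business Machines" as the golden name are assumptions for illustration, not how Profisee or any other MDM product actually works. Real MDM systems rely on matching rules, survivorship logic, and data stewardship rather than a hand-built dictionary.

```python
import re

# Hypothetical alias table: normalized variant -> golden (canonical) name.
# The golden name chosen here is an assumption for illustration only.
GOLDEN_NAMES = {
    "intl business machines": "International Business Machines",
    "international business machines": "International Business Machines",
    "ibm corp": "International Business Machines",
}


def normalize(name: str) -> str:
    """Lowercase, replace punctuation with spaces, and collapse whitespace."""
    cleaned = re.sub(r"[^a-z0-9 ]", " ", name.lower())
    return " ".join(cleaned.split())


def golden_record(name: str) -> str:
    """Return the golden name for a known variant, else the input unchanged."""
    return GOLDEN_NAMES.get(normalize(name), name)


# The three spellings from the example above, one per source system.
source_systems = {
    "CRM": "Intl Business Machines",
    "ERP": "International Business Machines",
    "Planning": "IBM Corp",
}

resolved = {system: golden_record(name) for system, name in source_systems.items()}
for system, name in resolved.items():
    print(f"{system}: {name}")
```

All three systems resolve to the same name, which is the whole point: downstream reports and models see one customer, not three.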

In short, data catalogs help you find the right data and MDM ensures the data is clean when you find it. You pretty obviously need both.

It Ain’t Easy Making Money in Open Source: Thoughts on the Hortonworks S-1

Dave Kellogg — Tue, 18 Nov 2014 13:26:58 GMT

It took me a week or so to get to it, but in this post I’ll take a dive into the Hortonworks S-1 filing in support of a proposed initial public offering (IPO) of their stock.

While Hadoop and big data are unarguably huge trends driving the industry and while the future of Hadoop looks very bright indeed, on reading the Hortonworks S-1, the reader is drawn to the inexorable conclusion that it’s hard to make money in open source, or more crassly, it’s hard to make money when you give the shit away.

This is a company that, in the past three quarters, lost $54M on $33M of support/services revenue and threw in $26M in non-recoverable (i.e., donated) R&D atop that for good measure.

Let’s take it top to bottom:

They have solid bankers: Goldman Sachs, Credit Suisse, and RBC are leading the underwriting with specialist support from Pacific Crest, Wells Fargo, and Blackstone.

They have an awkward, jargon-y, and arguably imprecise marketing slogan: "Enabling the Data-First Enterprise." I hate to be negative, but if you’re going to lose $10M a month, the least you can do is to invest in a proper agency to make a good slogan.

Their mission is clear: “to establish Hadoop as the foundational technology of the modern enterprise data architecture.”

Here’s their solution description: “our solution is an enterprise-grade data management platform built on a unique distribution of Apache Hadoop and powered by YARN, the next generation computing and resource management framework.”

They were founded in 2011, making them the youngest company I’ve seen file in quite some years. Back in the day (e.g., the 1990s) you might go public at age 3-5, but these days it’s more like age 10.

Their strategic partners include Hewlett-Packard, Microsoft, Rackspace, Red Hat, SAP, Teradata, and Yahoo.

Business model: “consistent with our open source approach, we generally make the Hortonworks Data Platform available free of charge and derive the predominant amount of our revenue from customer fees from support subscription offerings and professional services.” (Note to self: if you’re going to do this, perhaps you shouldn’t have -35% services margins, but we’ll get to that later.)

Huge market opportunity: “According to Allied Market Research, the global Hadoop market spanning hardware, software and services is expected to grow from $2.0 billion in 2013 to $50.2 billion by 2020, representing a compound annual growth rate, or CAGR, of 58%.” This vastness of the market opportunity is unquestioned.

Open source purists: “We are committed to serving the Apache Software Foundation open source ecosystem and to sharing all of our product developments with the open source community.” This one’s big because while it’s certainly strategic and it certainly earns them points within the Hadoop community, it chucks out one of the better ways to make money in open source: proprietary versions / extensions. So, right or wrong, it’s big.

Headcount: The company has increased the number of full-time employees from 171 at December 31, 2012 to 524 at September 30, 2014.

Before diving into the financials, let me give readers a chance to review open source business models (Wikipedia, Kellblog) if they so desire, before making the (generally true but probably slightly inaccurate) assertion: the only open source company that’s ever made money (at scale) is Red Hat.

Sure, there have been a few great exits. Who can forget MySQL selling to Sun for $1B? Or VMware buying SpringSource for $420M? Or Red Hat buying JBoss for $350M+? (Hortonworks CEO Rob Bearden was involved in both of the two latter deals.) Or Citrix buying XenSource for $500M?

But after those deals, I can’t name too many others. And I doubt any of those companies was making money.

In my mind there are two common things that go wrong in open source:

The market is too small. In my estimation open source compresses the market size by 10-20x. So if you want to compress the $30B DBMS market 10x, you can still build several nice companies. However, if you want to compress the $1B enterprise search market by 10x, there’s not much room to build anything. That’s why there is no Red Hat of Lucene or Solr, despite their enormous popularity in search. For open source to work, you need to be in a huge market.

People don’t renew. No matter which specific open source business model you’re using, the general play is to sell a subscription to something that complements your offering. It might be a hardened/certified version of the open source product. It might be additions to it that you keep proprietary forever or, in a hardcover/paperback analogy, roll back into the core open source projects with a 24 month lag. It might be simply technical support. Or, it might be “admission to the club” as one open source CEO friend of mine used to say: you get to use our extensions, our support, our community, etc. But no matter what you’re selling, the key is to get renewals. The risk is that the value of your extensions decreases over time and/or customers become self-sufficient. This was another problem with Lucene. It was so good that folks just didn’t need much help and if they did, it was only for a year or so.
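The market-size arithmetic in the first point can be sketched in a few lines of Python. The `compressed_market` helper is hypothetical, and the 10-20x range is my own estimate from above, not a measured figure.

```python
def compressed_market(market_size_b, factors=(10, 20)):
    """Return (low, high) remaining market size in $B after compression.

    factors is the assumed range of compression multiples (10-20x here).
    """
    low_factor, high_factor = min(factors), max(factors)
    return market_size_b / high_factor, market_size_b / low_factor


dbms_low, dbms_high = compressed_market(30.0)      # $30B DBMS market
search_low, search_high = compressed_market(1.0)   # $1B enterprise search market

print(f"DBMS after compression:   ${dbms_low:.2f}B to ${dbms_high:.2f}B")
print(f"Search after compression: ${search_low:.2f}B to ${search_high:.2f}B")
```

A $1.5B to $3B market leaves room for several nice companies; a $50M to $100M market does not.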

So Why Does Red Hat work?

Red Hat uses a professional open source business model applied to primarily two low-level infrastructure categories: operating systems and later middleware. As general rules:

The lower-level the category the more customers want support on it.

The more you can commoditize the layers below you, the more the market likes it. Red Hat does this for servers.

The lower-level the category the more the market actually “wants” it standardized in order to minimize entropy. This is why low-level infrastructure categories become natural monopolies or oligopolies.

And Red Hat set the right price point and cost structure. In their most recent 10-Q, you can see they have 85% gross margins and about a 10% return on sales. Red Hat nailed it.

But, if you believe this excellent post by Andreessen Horowitz partner Peter Levine, There Will Never Be Another Red Hat. As part of his argument Levine reminds us that while Red Hat may be a giant among open source vendors, among general technology vendors they are relatively small. See the chart below for the market capitalization compared to some megavendors.

Now this might give pause to the Hadoop crowd with so many firms vying to be the Red Hat of Hadoop. But that hasn’t stopped the money from flying in. Per Crunchbase, Cloudera has raised a stunning $1.2B in venture capital, Hortonworks has raised $248M, and MapR has raised $178M. In the related Cassandra market, DataStax has raised $190M. MongoDB (with its own open source DBMS) has raised $231M. That’s about $2B invested in next-generation open source database venture capital.

While I’m all for open source, disruption, and next-generation databases (recall I ran MarkLogic for six years), I do find the raw amount of capital invested pretty crazy. Yes, it’s a huge market today. Yes, it’s exploding, as are data volumes and the incorporation of unstructured data. But we will be compressing it 10-20x as part of open-source-ization. And, given all the capital these guys are raising – and presumably burning (after all, why else would you raise it) – I can assure you that no one’s making money.

Hortonworks certainly isn’t -- which serves as a good segue to dive into the financials. Here’s the P&L, which I’ve cleaned up from the S-1 and color-annotated.

$33M in trailing three quarter (T3Q) revenues ($41.5M in TTM, though not on this chart)
109% growth in T3Q revenues
85% gross margins on support
Horrific -35% gross margins on services which given the large relative size of the services business (43% of revenues) crush overall gross margins down to 34%
More scarily this calls into question the veracity of the 85% subscription gross margins -- I recall reading in the S-1 that they currently lack VSOE for subscription support which means that they've not yet clearly demonstrated what is really support revenue vs. professional services revenue. [See footnote 1]
$26M in T3Q R&D expense. Per their policy all that value is going straight back to the open source project which begs the question will they ever see return on it?
Net loss of $86.7M in T3Q, or nearly $10M per month
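For those who want to check the margin math, here is a quick Python sketch of how a negative-margin services line drags down the blended gross margin. The `blended_gross_margin` helper is hypothetical, and the inputs are the rounded figures quoted above rather than exact S-1 values.

```python
def blended_gross_margin(segments):
    """Weight each segment's gross margin by its share of revenue.

    segments: iterable of (revenue_share, gross_margin) pairs whose
    shares sum to 1.0.
    """
    return sum(share * margin for share, margin in segments)


services_share = 0.43                 # services as a share of revenues
support_share = 1 - services_share    # the remainder is support

overall = blended_gross_margin([
    (support_share, 0.85),            # 85% gross margin on support
    (services_share, -0.35),          # -35% gross margin on services
])

months = 9                            # three quarters
net_loss_per_month = 86.7 / months    # in $M

print(f"blended gross margin: {overall:.1%}")
print(f"net loss per month: ${net_loss_per_month:.1f}M")
```

With these rounded inputs the blend comes out at about 33.4%, in line with the roughly 34% above; the small gap presumably reflects rounding in the inputs. The net loss works out to about $9.6M per month.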

Here are some other interesting tidbits from the S-1:

Of the 524 full-time employees as of 9/30/14, there are 56 who are non-USA-based
CEO makes $250K/year in base salary cash compensation with no bonus in FY13 (maybe they missed plan despite strong growth?)
Prior to the offering CEO owns 6.8% of the stock, a pretty nice percentage, but he was kind of a founder
Benchmark owns 18.7%
Yahoo owns 19.6%
Index owns 9.5%
$54.9M cash burn from operations in T3Q, $6.1M per month
Number of support subscription customers has grown from 54 to 233 over the year from 9/30/13 to 9/30/14
A single customer went from representing 47% of revenues for the T3Q ending 9/30/13 down to 22% for the T3Q ending 9/30/14. That's a lot of revenue concentration in one customer (who is identified as "Customer A," but who I believe is Microsoft based on some text in the risk factors).

Here's a chart I made of the increase in value in the preferred stock. A ten-bagger in 3 years.

One interesting thing about the prospectus is they show "gross billings," which is an interesting derived metric that financial analysts use to try and determine bookings in a subscription company. Here's what they present:

While gross billings is not a bad stab at bookings, the two metrics can diverge -- primarily when the duration of prepaid contracts changes. Deferred revenue can shoot up when sales sells longer prepaid contracts to a given number of customers as opposed to the same-length contract to more of them. Conversely, if happy customers reduce prepaid contract duration to save cash in a downturn, it can actually help the vendor's financial performance (they will get the renewals because the customer is happy and not discount in return for multi-year), but deferred revenue will drop as will gross billings. In some ways, unless prepaid contract duration is held equal, gross billings is more of a dangerous metric than anything else. Nevertheless Hortonworks is showing it as an implied metric of bookings or orders and the growth is quite impressive.
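
To see how the two metrics diverge, here is a small sketch. It assumes the common definition of gross billings as revenue plus the change in deferred revenue, and it assumes every contract is signed and fully invoiced on the first day of the year. The customer counts, prices, and function names are hypothetical, chosen only to illustrate the point.

```python
def gross_billings(revenue, deferred_start, deferred_end):
    """Assumed definition: revenue plus the change in deferred revenue."""
    return revenue + (deferred_end - deferred_start)


def first_year(customers, annual_price, prepaid_years):
    """Revenue and ending deferred revenue for contracts signed and fully
    invoiced on day one of the year (a simplifying assumption)."""
    invoiced = customers * annual_price * prepaid_years
    revenue = customers * annual_price   # one year recognized ratably
    deferred_end = invoiced - revenue    # billed but not yet recognized
    return revenue, deferred_end


# Hypothetical numbers: the same 10 customers at the same $100K annual price.
rev_1y, def_1y = first_year(10, 100_000, prepaid_years=1)
rev_2y, def_2y = first_year(10, 100_000, prepaid_years=2)

billings_1y = gross_billings(rev_1y, 0, def_1y)
billings_2y = gross_billings(rev_2y, 0, def_2y)

print(billings_1y, billings_2y)
```

The annual value sold is identical in both cases, yet gross billings doubles when the same customers prepay two years instead of one. That is the duration effect described above.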

Sales and Marketing Efficiency

Let's now look at sales and marketing efficiency, not using the CAC which is too hard to calculate for public companies but using JMP's sales and marketing efficiency metric = (gross profit [current] - gross profit [prior]) / S&M expense [prior].

On this metric Hortonworks scores a 41% for the T3Q ended 9/30/14 compared to the same period in 2013. JMP considers anything above 50% efficient, so they are coming in low on this metric. However, JMP also makes a nice chart that correlates S&M efficiency to growth and I've roughly hacked Hortonworks onto it here:
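
For readers who want to try the metric on another company, here is a minimal sketch. It takes the change in gross profit first and then divides by prior-period S&M expense. The inputs are hypothetical round numbers in $M, not Hortonworks figures, and the function name is mine.

```python
def sm_efficiency(gross_profit_current, gross_profit_prior, sm_expense_prior):
    """Incremental gross profit per dollar of prior-period S&M expense."""
    return (gross_profit_current - gross_profit_prior) / sm_expense_prior


# Hypothetical figures in $M, for illustration only.
ratio = sm_efficiency(30.0, 20.0, 25.0)

# JMP's rule of thumb from the text: above 50% counts as efficient.
is_efficient = ratio > 0.50

print(f"S&M efficiency: {ratio:.0%}, efficient: {is_efficient}")
```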

I'll conclude the main body of the post by looking at their dollar-based expansion rate. Here's a long quote from the S-1:

Dollar-Based Net Expansion Rate. We believe that our ability to retain our customers and expand their support subscription revenue over time will be an indicator of the stability of our revenue base and the long-term value of our customer relationships. Maintaining customer relationships allows us to sustain and increase revenue to the extent customers maintain or increase the number of nodes, data under management and/or the scope of the support subscription agreements. To date, only a small percentage of our customer agreements has reached the end of their original terms and, as a result, we have not observed a large enough sample of renewals to derive meaningful conclusions. Based on our limited experience, we observed a dollar-based net expansion rate of 125% as of September 30, 2014. We calculate dollar-based net expansion rate as of a given date as the aggregate annualized subscription contract value as of that date from those customers that were also customers as of the date 12 months prior, divided by the aggregate annualized subscription contract value from all customers as of the date 12 months prior. We calculate annualized support subscription contract value for each support subscription customer as the total subscription contract value as of the reporting date divided by the number of years for which the support subscription customer is under contract as of such date.
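
The definition in the quote is easier to follow with a worked example. Here is a sketch of the calculation as I read it: customers who churned drop out of the numerator but stay in the denominator, and new customers are excluded entirely. The customer names and dollar values are hypothetical.

```python
def annualized_value(total_contract_value, years_under_contract):
    """Total subscription contract value divided by years under contract."""
    return total_contract_value / years_under_contract


def net_expansion_rate(prior, current):
    """prior/current map customer -> annualized subscription contract value.

    Numerator: current value from customers who were also customers a year ago.
    Denominator: value from all customers as of a year ago.
    """
    retained = sum(value for name, value in current.items() if name in prior)
    base = sum(prior.values())
    return retained / base


# Hypothetical book of business, 12 months ago versus today.
prior = {"a": 100_000, "b": 100_000, "c": 200_000}
current = {
    "a": annualized_value(300_000, 2),  # expanded to a 2-year, $300K deal
    "c": 350_000,                       # expanded
    "d": 500_000,                       # new customer, excluded from the rate
}                                       # "b" did not renew

rate = net_expansion_rate(prior, current)
print(f"Dollar-based net expansion rate: {rate:.0%}")
```

Note that a rate above 100% can coexist with lost customers, as it does here, which is why the small renewal sample matters.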

This is probably the most critical section of the prospectus. We know Hortonworks can grow. We know they have a huge market. We know that market is huge enough to be compressed 10-20x and still have room to create a great company. What we don't know is: will people renew? As we discussed above, we know it's one of the great risks of open source.

Hortonworks pretty clearly answers the question with "we don't know" in the above quote. There is simply not enough data; not enough contracts have come up for renewal to get a meaningful renewal rate. I view the early 125% calculation as a very good sign. And intuition suggests that -- if their offering is quality -- people will renew because we are talking low-level, critical infrastructure and we know that enterprises are willing to pay to have that supported.

# # #

Appendix

In the appendix below, I'll include a few interesting sections of the S-1 without any editorial comments.

A significant portion of our revenue has been concentrated among a relatively small number of large customers. For example, Microsoft Corporation historically accounted for 55.3% of our total revenue for the year ended April 30, 2013, 37.8% of our total revenue for the eight months ended December 31, 2013 and 22.4% of our total revenue for the nine months ended September 30, 2014. The revenue from our three largest customers as a group accounted for 71.0% of our total revenue for the year ended April 30, 2013, 50.5% of our total revenue for the eight months ended December 31, 2013 and 37.4% of our total revenue for the nine months ended September 30, 2014. While we expect that the revenue from our largest customers will decrease over time as a percentage of our total revenue as we generate more revenue from other customers, we expect that revenue from a relatively small group of customers will continue to account for a significant portion of our revenue, at least in the near term. Our customer agreements generally do not contain long-term commitments from our customers, and our customers may be able to terminate their agreements with us prior to expiration of the term. For example, the current term of our agreement with Microsoft expires in July 2015, and automatically renews thereafter for two successive twelve-month periods unless terminated earlier. The agreement may be terminated by Microsoft prior to the end of its term. Accordingly, the agreement with Microsoft may not continue for any specific period of time.

# # #

We do not currently have vendor-specific objective evidence of fair value for support subscription offerings, and we may offer certain contractual provisions to our customers that result in delayed recognition of revenue under GAAP, which could cause our results of operations to fluctuate significantly from period-to-period in ways that do not correlate with our underlying business performance.

In the course of our selling efforts, we typically enter into sales arrangements pursuant to which we provide support subscription offerings and professional services. We refer to each individual product or service as an “element” of the overall sales arrangement. These arrangements typically require us to deliver particular elements in a future period. We apply software revenue recognition rules under U.S. generally accepted accounting principles, or GAAP. In certain cases, when we enter into more than one contract with a single customer, the group of contracts may be so closely related that they are viewed under GAAP as one multiple-element arrangement for purposes of determining the appropriate amount and timing of revenue recognition. As we discuss further in “Management’s Discussion and Analysis of Financial Condition and Results of Operations—Critical Accounting Policies and Estimates—Revenue Recognition,” because we do not have VSOE for our support subscription offerings, and because we may offer certain contractual provisions to our customers, such as delivery of support subscription offerings and professional services, or specified functionality, or because multiple contracts signed in different periods may be viewed as giving rise to multiple elements of a single arrangement, we may be required under GAAP to defer revenue to future periods. Typically, for arrangements providing for support subscription offerings and professional services, we have recognized as revenue the entire arrangement fee ratably over the subscription period, although the appropriate timing of revenue recognition must be evaluated on an arrangement-by-arrangement basis and may differ from arrangement to arrangement. If we are unexpectedly required to defer revenue to future periods for a significant portion of our sales, our revenue for a particular period could fall below our expectations or those of securities analysts and investors, resulting in a decline in our stock price

# # #

We generate revenue by selling support subscription offerings and professional services. Our support subscription agreements are typically annual arrangements. We price our support subscription offerings based on the number of servers in a cluster, or nodes, data under management and/or the scope of support provided. Accordingly, our support subscription revenue varies depending on the scale of our customers’ deployments and the scope of the support agreement.

Our early growth strategy has been aimed at acquiring customers for our support subscription offerings via a direct sales force and delivering consulting services. As we grow our business, our longer-term strategy will be to expand our partner network and leverage our partners to deliver a larger proportion of professional services to our customers on our behalf. The implementation of this strategy is expected to result in an increase in upfront costs in order to establish and further cultivate such strategic partnerships, but we expect that it will increase gross margins in the long term as the percentage of our revenue derived from professional services, which has a lower gross margin than our support subscriptions, decreases.

# # #

Deferred Revenue and Backlog

Our deferred revenue, which consists of billed but unrecognized revenue, was $47.7 million as of September 30, 2014.

Our total backlog, which we define as including both cancellable and non-cancellable portions of our customer agreements that we have not yet billed, was $17.3 million as of September 30, 2014. The timing of our invoices to our customers is a negotiated term and thus varies among our support subscription agreements. For multiple-year agreements, it is common for us to invoice an initial amount at contract signing followed by subsequent annual invoices. At any point in the contract term, there can be amounts that we have not yet been contractually able to invoice. Until such time as these amounts are invoiced, we do not recognize them as revenue, deferred revenue or elsewhere in our consolidated financial statements. The change in backlog that results from changes in the average non-cancelable term of our support subscription arrangements may not be an indicator of the likelihood of renewal or expected future revenue, and therefore we do not utilize backlog as a key management metric internally and do not believe that it is a meaningful measurement of our future revenue.

# # #

We employ a differentiated approach in that we are committed to serving the Apache Software Foundation open source ecosystem and to sharing all of our product developments with the open source community. We support the community for open source Hadoop, and employ a large number of core committers to the various Enterprise Grade Hadoop projects. We believe that keeping our business model free from architecture design conflicts that could limit the ultimate success of our customers in leveraging the benefits of Hadoop at scale is a significant competitive advantage.

# # #

International Data Corporation, or IDC, estimates that data will grow exponentially in the next decade, from 2.8 zettabytes, or ZB, of data in 2012 to 40 ZBs by 2020. This increase in data volume is forcing enterprises to upgrade their data center architecture and better equip themselves both to store and to extract value from vast amounts of data. According to IDG Enterprise’s Big Data Survey, by late 2014, 31% of enterprises with annual revenues of $1 billion or more expect to manage more than one PB of data. In comparison, as of March 2014 the Library of Congress had collected only 525 TBs of web archive data, equal to approximately half a petabyte and two million times smaller than a zettabyte.

# # #

Footnotes:

[1] Thinking more about this, while I'm not an accountant, I think the lack of VSOE has the following P&L impact: it means that in contracts that mix professional services and support they must recognize all the revenue ratably over the contract. That's fine for the support revenue, but it should have the effect of pushing out services revenue, artificially depressing services gross margins. Say, for example, you did a $240K deal that was $120K of each. The support should be recognized at $30K/quarter. However, if the consulting is delivered in the first six months it should be recognized at $60K/quarter for the first and second quarters and $0 in the third and fourth. Since, normally, accountants will take the services costs up-front, this should have the effect of hurting services margins by taking the costs as delivered but the revenue over a longer period.

[2] See here for generic disclaimers and please note that in the past I have served as an advisor to MongoDB
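
The $240K example in footnote [1] can be sketched quarter by quarter. The deal split and recognition schedules come from the footnote. The services cost figures are hypothetical (the footnote gives none), and attributing half of the ratable revenue to services is my simplifying assumption.

```python
QUARTERS = 4
support_fee = 120_000
services_fee = 120_000

# Hypothetical delivery cost; the footnote gives no cost figures.
services_cost = [50_000, 50_000, 0, 0]

# If services could be separated: recognized as delivered in Q1 and Q2.
services_as_delivered = [60_000, 60_000, 0, 0]

# Without VSOE: the whole fee is spread evenly, so the services half is too.
services_ratable = [services_fee / QUARTERS] * QUARTERS


def gross_margin(revenue, cost):
    """Gross margin as a fraction of revenue, or None when revenue is zero."""
    return None if revenue == 0 else (revenue - cost) / revenue


for q in range(QUARTERS):
    delivered = gross_margin(services_as_delivered[q], services_cost[q])
    ratable = gross_margin(services_ratable[q], services_cost[q])
    print(f"Q{q + 1}: as-delivered={delivered}, ratable={ratable}")
```

Under these assumptions the same work shows a positive services margin when recognized as delivered, and a sharply negative one in the first two quarters when the revenue is spread over the year.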

Thoughts on MongoDB's Humongous $150M Round

Dave Kellogg — Mon, 21 Oct 2013 16:10:56 GMT

Two weeks ago MongoDB, formerly known as 10gen, announced a massive $150M funding round said to be the largest in the history of databases, led by Fidelity, Altimeter, and Salesforce.com with participation from existing investors Intel, NEA, Red Hat, and Sequoia. This brings the total capital raised by MongoDB to $231M, making it the best-funded database / big data technology of all time.

What does this mean?

The two winners of the next-generation NoSQL database wars have been decided: MongoDB and Hadoop. The faster the runners-up figure that out, the faster they can carve off sensible niches on the periphery of the market instead of running like decapitated chickens in the middle. [1]

The first reason I say this is because of the increasing returns (or, network effects) in platform markets. These effects are weak to non-existent in applications markets, but in core platform markets like databases, the rich invariably get richer. Why?

The more people that use a database, the easier it is to find people to staff teams so the more likely you are to use it.
The more people that use a database, the richer the community of people you can leverage to get help.
The more people that build applications atop a database, the less perceived risk there is in building a new application atop it.
The more people that use a database, the more jobs there are around it, which attracts more people to learn how to use it.
The more people that use a database, the cooler it is seen to be which in turn attracts more people to want to learn it.
The more people that use a database, the more likely major universities are to teach how to use it in their computer science departments.

To see just how strong MongoDB has become in this regard, see here. My favorite analysis is the 451 Group's LinkedIn NoSQL skills analysis, below.

This is why betting on horizontal underdogs in core platform markets is rarely a good idea. At some point, best technology or not, a strong leader becomes the universal safe choice. Consider 1990 to about 2005 where the relational model was the chosen technology and the market a comfortable oligopoly ruled by Oracle, IBM, and Microsoft.

It's taken 30+ years (and numerous prior failed attempts) to create a credible threat to the relational stasis, but the combination of three forces is proving to be a perfect storm:

Open source business models which cut costs by a factor of 10.
Increasing amounts of data in unstructured data types which do not map well to the relational model.
A change in hardware topology from fewer/bigger computers to vast numbers of smaller ones.

While all technologies die slowly, the best days of relational databases are now clearly behind them. Kids graduating college today see SQL the way I saw COBOL when I graduated from Berkeley in 1985. Yes, COBOL was everywhere. Yes, you could easily get a job programming it. But it was not cool in any way whatsoever and it certainly was not the future. It was more of a "trade school" language than interesting computer science.

The second reason I say this is because of my experience at Ingres, one of the original relational database providers which -- despite growing from ~$30M to ~$250M during my tenure from 1985 to 1992 -- never realized that it had lost the market and needed a plan B strategy. In Ingres's case (and with full 20/20 hindsight) there was a very viable plan B available: as the leader in query optimization, Ingres could have easily focused exclusively on data warehousing at its dawn and become the leader in that segment as opposed to a loser in the overall market. Yet, executives too often deny market reality, preferring to die in the name of "going big" as opposed to living (and prospering) in what could be seen as "going home." Runner-up vendors should think hard about the lessons of Ingres.

The last reason I say this is because of what I see as a change in venture capital. In the 1980s and 1990s VCs used to fund categories and cage-fights. A new category would be identified, 5-10 companies would get created around it, each might raise $20-$30M in venture capital and then there would be one heck of a cage-fight for market leadership.

Today that seems less true. VCs seem to prefer funding companies to categories. (Does anyone know what category Box is in? Does anyone care about any other vendor in it?) Today, it seems that VCs fund fewer players, create fewer cage-fights, and prefer to invest much more, much later in a company that appears to be a clear winner.

This so-called "momentum investing" itself helps to anoint winners because if Box can raise $309M, then it doesn't really matter how smart the folks at WatchDox are or how clever their technology.

MongoDB is in this enviable position in the next-generation (open source) NoSQL database market. It has built a huge following, and that huge following is attracting a huge-r (sorry) following. That cycle is attracting momentum investors who see MongoDB as the clear leader. Those investors give MongoDB $150M.

By my math, if entirely invested in sales [2], that money could fund hiring some 500 sales teams who could generate maybe $400M a year in incremental revenue. Which would in turn attract more users. Which would make the community bigger. Which would de-risk using the system. Which would attract more users.

And, quoting Vonnegut, so it goes.
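
For the curious, here is the back-of-envelope math behind the 500-teams estimate, worked backwards from the three figures in the text. The per-team numbers are simply what those figures imply, not disclosed costs or quotas.

```python
round_size = 150_000_000           # the funding round, from the text
sales_teams = 500                  # "some 500 sales teams"
incremental_revenue = 400_000_000  # "maybe $400M a year"

# What each team would need to cost and produce for the estimate to hold.
funding_per_team = round_size / sales_teams
revenue_per_team = incremental_revenue / sales_teams

print(f"Funding per team: ${funding_per_team:,.0f}")
print(f"Revenue per team: ${revenue_per_team:,.0f}")
```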

# # #

Disclaimer: I own shares in several of the companies mentioned herein as well as competitors who are not. See my FAQ for more.

[1] Because I try to avoid writing about MarkLogic, I should be clear that while one can argue (and I have) that MarkLogic is a NoSQL system, my thinking has evolved over time and I now put much more weight on the open-source test as described in the "perfect storm" paragraph above. Ergo, for the purposes of this post, I exclude MarkLogic entirely from the analysis because they are not in the open-source NoSQL market (despite the 451's including them in their skills index). Regarding MarkLogic, I have no public opinion and I do not view MongoDB's or Hadoop's success as definitively meaning anything either good or bad for them.

[2] Which, by the way, they have explicitly said they will not do. They have said, "the company will use these funds to further invest in the core MongoDB project as well as in MongoDB Management Service, a suite of tools and services to operate MongoDB at scale. In addition, MongoDB will extend its efforts in supporting its growing user base throughout the world."

The Information Continuum and the Three Types of Subtly Semi-Structured Information

Dave Kellogg — Tue, 11 May 2010 18:24:45 GMT

We generally refer to MarkLogic Server as an XML server, which is a special-purpose database management system (DBMS) for unstructured information. This often sparks debate about the term "unstructured" and the information continuum in general. Surprisingly, while both analysts and vendors frequently discuss the concept, the Wikipedia entry for information continuum is weak, and I couldn't easily find a nice picture of it, so I decided to make my own.

The general idea that information spans a continuum with regard to structure is pretty much undisputed. The placement of any given type of information on that continuum is more problematic. While it seems clear that purchase orders are highly structured and that free text is not, the placement of, for example, email is more interesting. Some might argue that email is unstructured. In fact, only the body of an email is unstructured and there is plenty of metadata (e.g., from, send-to, date, subject) wrapping an email. In addition, an email's body actually does have latent structure -- while it may not be explicit, you typically have a salutation followed by numerous paragraphs of text, a sign-off, a signature, and perhaps a legal footer. Email is unquestionably semi-structured.

In fact, I believe that the vast majority of information is semi-structured. PowerPoint decks have slides, slides have titles and bullets. Contracts are typically Word documents, but have more-or-less standard sections. Proposals are usually Word or PowerPoint documents that tend to have similar structures. Even the humble tweet is semi-structured: while the contents are ostensibly 140 unstructured characters, the anatomy of a tweet reveals lots of metadata (e.g., location) and even the contents contain some structural information (e.g., RT indicating re-tweet or #hashtags serving as topical metadata).

Now let's consider XML content. Some would argue that XML is definitionally structured. But I'd say that an arbitrary set of documents all stored within a generic pair of opening and closing tags is only faux structured; it appears structured because it's XML, but the XML is just used as a container. A corpus of twenty 2,000-page medical textbooks in 6 different schemas is indeed structured, but not well so. To paraphrase an old saw about standards: the nice thing about structures is that there are so many to choose from. I believe that knowing content is marked up in XML reveals nothing about its structure, i.e., that XML-ness and structure are orthogonal. Put differently, XML is simply a means of representing information. The information represented may be highly structured (e.g., 100 purchase orders all in perfect adherence to a given schema) or highly unstructured (e.g., 20 documents only vaguely complying with 20 different schemas).

I have two primary beliefs about the information continuum:

The vast majority of information is semi-structured. There is relatively little highly structured and relatively little completely unstructured information out there. Most information lies somewhere in the fat middle. I overlaid a bell curve on top of the information continuum to reflect volume.

Even information that initially appears structured is often semi-structured. I see three types of this subtly semi-structured information which, hopefully without being too cute, I'll abbreviate as SSSI. The three types are (1) schema as aspiration, (2) time-varying schema, and (3) unknowable schema.

Let's look at each of the three types more closely.

Schema as Aspiration

The first type of subtly semi-structured information (SSSI) is where a schema exists, but only notionally. The schema itself is either poorly defined (actual quote: "it is believed that this element is used for") or well defined but not followed. This is frequently the case with publishing and media companies. Here are two free jokes that work well at any publishing conference:

Raise your hand if you have a standard schema. Keep it up if your content actually adheres to it.
Oxymorons aside, how many of you have 3 or more "standard" schemas, 5 or more, ... do I hear 10?

These jokes are funny because of the state of the content. This state is the result of two primary business trends: (1) consolidation -- most large publishers have been built through M&A thus inheriting numerous different standards, each of which may be only partly implemented -- and (2) licensing -- publishers frequently license content from numerous other sources, each with its own standard format.

Time-Varying Schema

The second case of SSSI is where you have a well defined, enforced schema at any moment in time, but it keeps changing over time. Typically this happens for one of two reasons:

The business reality that you're modeling is changing. For example, in 2009 Federal Sales was part of Eastern Sales but in 2010 it becomes its own division. This makes comparison of Eastern results between 2009 and 2010 potentially difficult. In BI circles, this is known as the slowly changing dimension problem.

Standards keep changing. If you're modeling information in a corporate- or industry-standard schema and that schema is changing, then your information becomes semi-structured because it is contained within multiple different schemas. Sometimes you can avoid this by migrating all prior information to the current schema, but sometimes (e.g., massive data volumes, regulatory desire to not change existing records) you will not.

When viewed with a flash camera this information looks well structured. When you look at the movie, you can clearly see that it's not.

Unknowable Schema

The last case of SSSI is where you have an unknowable schema. Consider terrorist tracking. If you were to make a schema for a terrorist database, here are some of the attributes that spring to mind: name, alias(es), address, former address(es), height, weight, hair color, eye color, member-of, enemy-of, friend-of, tattoos/markings.

Here are some problems with this:

Many of the attributes are multi-valued, such as alias or friend-of. In a de-normalized approach, this means dealing with repeating group problems and creating N columns (e.g., alias, alias1, alias2, and up to the maximum number of aliases for any terrorist). Normalization would take care of the repeating group but at the cost of creating a table for each multi-valued attribute and then having to join back to those tables when you run queries. (One such real system ended up with 500 tables, with the result that no one could find anything.)

It is difficult to create a type for the tattoo attribute. First, it's multi-valued. Second, while tattoos are sometimes images, they often contain text (e.g., Mom) and sometimes in a foreign language (e.g., 愛, the Chinese symbol for love). Since you're trying to secure the nation against threat you don't want to throw away any potentially valuable information, but it's not obvious how to store this.

New attributes are coming all the time. Say you get a shoe print on a suspect as he runs away. You need to add a shoe-size attribute to the database. Say a terrorist runs away and leaves a pair of eyeglasses. Now we need to add eyeglass prescription. My favorite is what's called pocket litter. You find a piece of paper in a person's pocket and it has a number on it. It could be a phone number, a lock combination, or maybe map coordinates. You don't know what it is -- but again, since you don't want to throw away any potentially valuable information -- you have to find a place to store it.

Combining an enormous number of potential attributes with the reality that very few are known for most individuals creates two problems: (1) you end up with a sparse table which is not well handled in most RDBMSs and (2) you end up hitting column limits.
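
To make the relational trade-off concrete, here is a small sketch using Python's built-in sqlite3 module. The table names, column names, and placeholder values are all hypothetical. It shows a wide de-normalized table that is mostly empty for a typical record, next to a normalized alias table that has to be joined back at query time.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- De-normalized: one column per possible value, mostly empty.
    CREATE TABLE person_wide (
        name TEXT, alias1 TEXT, alias2 TEXT, alias3 TEXT,
        shoe_size TEXT, eyeglass_rx TEXT
    );
    -- Normalized: one row per alias, joined back at query time.
    CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE alias (person_id INTEGER REFERENCES person(id), alias TEXT);
""")

# Wide table: only two of six columns are known for this record.
conn.execute(
    "INSERT INTO person_wide (name, alias1) VALUES ('Person A', 'Alias One')"
)
wide_row = conn.execute("SELECT * FROM person_wide").fetchone()
empty_columns = sum(value is None for value in wide_row)

# Normalized tables: any number of aliases, at the cost of a join.
conn.execute("INSERT INTO person (id, name) VALUES (1, 'Person A')")
conn.executemany(
    "INSERT INTO alias (person_id, alias) VALUES (?, ?)",
    [(1, "Alias One"), (1, "Alias Two"), (1, "Alias Three")],
)
joined = conn.execute("""
    SELECT p.name, a.alias
    FROM person p JOIN alias a ON a.person_id = p.id
    ORDER BY a.alias
""").fetchall()

print(empty_columns, joined)
```

Every new multi-valued attribute means either more empty columns on the wide table or another table and another join, which is how a real system ends up with hundreds of tables.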

Another example of unknowable schemas would be in financial services, modeling derivatives. Because derivatives are sometimes long-lived instruments (e.g., 30 years) you may face the time-varying schema problem. In addition, you have the unknowable schema problem because the industry is constantly creating new products. First we had CDOs and CDSs on banks, then single-tranche CDOs, then CDSs on single-tranche CDOs, and then synthetic CDOs. If this makes your head hurt in terms of understanding, then think for a minute about data modeling. How are you going to store these complex products in a database? And what are you going to do with the never-ending stream of new ones -- last I heard they were considering selling derivatives on movies.

(As it turns out XML is a great way to model both these problems as you can easily add new attributes on the fly and only provide values for attributes where you know them.)
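
A short sketch of that idea, using Python's standard xml.etree.ElementTree rather than any MarkLogic API. The element names and values are hypothetical. Each record carries only the attributes that are known, multi-valued attributes simply repeat, and a new attribute can be added to one record without touching the others.

```python
import xml.etree.ElementTree as ET

doc = ET.fromstring("""
<people>
  <person>
    <name>Person A</name>
    <alias>Alias One</alias>
    <alias>Alias Two</alias>
  </person>
  <person>
    <name>Person B</name>
    <shoe-size>11</shoe-size>
    <pocket-litter>555-0100</pocket-litter>
  </person>
</people>
""")

# Add a brand-new attribute to one record on the fly; no schema change needed.
first_person = doc[0]
ET.SubElement(first_person, "eyeglass-prescription").text = "-2.50"

# Multi-valued attributes are just repeated elements.
aliases = [alias.text for alias in first_person.findall("alias")]

# Query only the records that happen to carry a given attribute.
with_shoe_size = [
    person.findtext("name")
    for person in doc.findall("person")
    if person.find("shoe-size") is not None
]

print(aliases, with_shoe_size)
```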

To finish the post, I'll revisit the statement I started with: we generally refer to MarkLogic Server as an XML server, a special-purpose database management system (DBMS) for unstructured information. Going forward, I think I'll keep saying that because it's simpler, but at the MarkLogic 201 level, the more precise statement is: a special-purpose DBMS for semi-structured information.

There's way more semi-structured information out there. Realizing that information is semi-structured is sometimes subtle. And semi-structured information is, in fact, the optimization point for our product. So what's MarkLogic in three concepts? Speed, scale, and semi-structured information.

Dear CIO: Stop Writing Big Checks for Commodity (Database) Software

Dave Kellogg — Wed, 14 Oct 2009 17:11:00 GMT

Dear CIO,

What’s wrong with this picture?

At 50%+, Oracle’s operating margins have never been higher

The differentiation of Oracle’s database technology, however, has never been lower and the number of both core and specialized alternatives has never been greater.

So what’s going on? You, kind Sir or Madam, are being milked. What’s worse is that you, in an example of collective behavioral dysfunction, have inadvertently played a role in setting up the milking. What happened?

Like all smart CIOs you followed a bit of herd mentality when it came to core technology. Pity the poor fools who, back in the day, bet big on Ingres or Sybase. You played it safe and went with Oracle, IBM, or if your requirements weren’t too heavy, Microsoft.

The problem is, of course, that everyone executed the same strategy you did. Hence, the market created a system of increasing returns where the strong vendors got stronger and the weak ones died. The result: the RDBMS market is an (order of magnitude) $10B/year market, structured as an oligopoly with 3 players. Most other software markets worked out the same way.

You were focused on standardization. You realized that through a combination of decentralized IT decision making and growth-by-acquisition your organization had become a kitchen sink of enterprise software. You had everything. In order to reduce the administrative, training, and license acquisition costs, you fought tooth and nail with your divisions to standardize the environment. You said, “Heck, it’s all the same stuff in the end, folks, so let’s make Oracle our DBMS standard, Business Objects our BI standard, Documentum our ECM standard, and SAP our ERP standard.”

And you won. Mostly. There’s still some Cognos in finance. And marketing didn’t totally give up on Interwoven. But, for the most part, you won. You reduced the entropy of your IT environment and drove cost savings for your organization.

The problem is you’ve won the battle but lost the war. Why? Because if, as you say, the “stuff really is all the same” you shouldn’t standardize on the most expensive product. You should standardize on the cheapest.

Do you really need to be paying those big fees to Oracle for enterprise licenses? Wouldn’t MySQL do?

Are you really using all the functionality of that $1M/year Documentum ECM system? Wouldn’t SharePoint or Alfresco do?

For BI, do you need all the bells and whistles of BusinessObjects? Wouldn’t Pentaho or QlikView do a fine job, at a fraction of the cost?

But these alternatives are obvious. Heck, even "the establishment" (i.e., Gartner) says it’s safe to tread in the open source water. So the question is, what’s holding you back?

Switching costs. It’s hard to move off Oracle or Documentum and you don’t want to pay the nut to do so.

Organizational inertia. Your whippersnapper DBAs who were in their 30s in the 1980s are now in their 50s. They’re thinking that change devalues their knowledge and experience; some just want to cruise into retirement. But that’s their personal agenda, not your enterprise one.

Accounting. You made it free for your divisions to keep using Documentum, Oracle, or BusinessObjects because you bought an enterprise license. While this appeared to “save” you money on a per-license basis and helped support your standardization initiative, it squashed innovation in your divisions, reinforced organizational inertia, and left a lot of people using the wrong tool for the job. The result is projects that take more hardware, or more expensive hardware, than necessary (Oracle is good at this), that take too long to develop, or that simply fail.

So, what do I recommend doing about all this? I suggest that you adopt these policies, which, for full disclosure, are at least partially in the self-interest of this blog’s author:

Stop writing big checks for commodity software. Every time a big check comes along, ask yourself: is this software differentiated or commoditized? Be willing to pay a premium for differentiated software, and price shop commodity software. Call a group of your smartest staff together periodically to help you make the commodity versus differentiated call.

When you see a big check coming for commodity software, make a migration plan. My hunch is that most of the time, you can create a nice 3-year ROI in the transition from premium to cheaper software. (This reminds me of the time I visited an investment bank’s CIO asking about their Documentum strategy. The answer: “our Documentum strategy is to get off Documentum,” because they were paying too much and using too little.)

Stop doing enterprise agreements that create poor economic incentives within your organization. Don’t pay $XM at the enterprise level, spread that as a “tax” across your divisions, and then make use of certain software “free.” It distorts project reality, creates false incentives, squashes innovation, and generates lots of hidden costs. If you want to negotiate a master agreement and discount rate, that’s fine. Shoot for centralized discounts without central planning.

Don’t worry that the prior policies will create mayhem. While I understand that you don’t want arbitrary taste differences increasing the entropy of your enterprise software portfolio, recognize that with the first policy you’ve solved that problem already. If you deem a category (e.g., core RDBMS, enterprise search) commoditized, then you are going to force people to pick on cost. You’ll get standardization on the commodity categories, just on the least expensive alternatives. The only entropy you’ll need to manage will be on the differentiated software which, having dispatched the commodity majority, you’ll have time to explore, study, and exploit.
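To make the migration-plan arithmetic concrete, here is a minimal Python sketch of a 3-year ROI check. The $1M/year premium cost echoes the Documentum example earlier in this note; the cheaper alternative's annual cost and the one-time switching cost are hypothetical numbers chosen purely for illustration, and `migration_roi` is an illustrative helper name, not a formula from any vendor or analyst.

```python
def migration_roi(premium_annual, cheaper_annual, switching_cost, years=3):
    """Return (net_savings, roi) for moving from premium to cheaper software.

    net_savings: license savings over `years`, minus the one-time switching cost.
    roi: net_savings divided by the switching cost (1.0 means a 100% return).
    """
    if switching_cost <= 0:
        raise ValueError("switching_cost must be positive to compute an ROI")
    gross_savings = (premium_annual - cheaper_annual) * years
    net_savings = gross_savings - switching_cost
    return net_savings, net_savings / switching_cost


# Hypothetical inputs: $1M/year premium product, $200K/year alternative,
# and a $1.2M one-time cost to migrate (the "nut").
net, roi = migration_roi(1_000_000, 200_000, 1_200_000)
print(f"3-year net savings: ${net:,}")  # 3-year net savings: $1,200,000
print(f"3-year ROI: {roi:.0%}")         # 3-year ROI: 100%
```

If the same calculation comes out negative, that is a reasonable signal that the switching costs really are the thing holding you back, at least over a 3-year horizon.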

Why am I taking the time to write this note to you? Back in the 1980s I was a foot soldier in the relational database revolution, and today I’m the CEO of one specialized DBMS company and on the board of another.

Mark Logic makes an XML server which can save great amounts of time and money in creating applications against unstructured information, replacing the combination of an RDBMS, an enterprise search engine, and an application server. Not only can Mark Logic manage 100s of TB of XML, the system eliminates the object/relational/hierarchical impedance mismatch between Java, SQL, and XML that hampers developer productivity. Mark Logic was recently named the fourth fastest-growing IT company in Silicon Valley.

Aster Data makes a specialized data warehouse DBMS that runs on low-cost commodity hardware with a shared nothing architecture and leverages in-database MapReduce technology for parallelism and high scalability.

And during the past 25 years or so I've watched the market evolve. While I fully understand the policies and market forces that have led us to where we are, I feel like we've come full circle. Vendor power is now concentrated in the big three. Vendor margins top 50%. Big vendors don't innovate; they consolidate. Inertia has set in at customer organizations. And there's a major platform shift in progress; last time it was mainframe to minicomputer, this time it's cloud.

Things feel a lot to me the way they did in 1985, just past the dawn of the relational revolution. So in one way I'm writing to point out the oft-overlooked obvious: stop paying premium prices for commodity items. And in another way I'm saying, take the money you save in so doing and invest it in innovative technologies that:

Drive competitive advantage (which will matter again as we come out of the Great Recession)

Enable the Internet-scale applications you'll need to face the coming information deluge

Reform the application development stack in ways that make sense for the coming generation of information applications, not that made sense for the last generation of data-centric ones.

Thank you for reading my note. If you have any questions or comments, please give me a ping at dave-dot-kellogg-at-marklogic-com or comment on this post.

Sincerely,

Dave Kellogg

Gartner’s Death of the Database

Dave Kellogg — Wed, 07 Dec 2005 00:21:00 GMT

Early in my career, I interviewed at Gartner for a job as an analyst. During the meetings, I met with Mike Braude, then head of research. Mike had made himself famous by declaring the death of IBM’s SAA about two days after it was launched.

I remember when we talked about research quality. “Do you know what great research is, Dave?” he asked.

I can’t remember exactly what I said, but I probably used words like “fair, factual, and customer-centric.”

“Dave, great research is research that sells.”

At the time, I think I saw little horns sprouting from his head as he said it, but today I know what he meant. He assumed research would be fair, factual, and customer-centric. But to be great, it needed more. It needed to have sizzle and controversy. It needed to make you think.

I’m happy to report that the tradition of sizzle and controversy lives on today. While I didn’t attend Gartner’s recent Symposium conference, I have heard a lot about a presentation by Mark Beyer and Donald Feinberg entitled: “The Death of the Database.”

From what I’ve read of the session, here’s the key point: some types of data don’t really need to be in databases, especially in an RFID world.

In fact, this is a question I'd wondered about ever since I first started using databases in the 1980s. The inventory table is a model of a reality. Reality is what’s in the warehouse. And for lots of reasons (e.g., error, theft, or damage) the reality and the model don't always match.

Companies do a lot of work to minimize discrepancies between reality and the model. They take great pains to ensure that all additions and subtractions are correctly reflected in the database. And because they know that even painstaking processes won't guarantee synchronization, they periodically audit the inventory and update the database.

You can even get philosophical in thinking about this problem. At Ingres I always wondered about this epistemological question -- if my record in the employee database were flagged "terminated," which was more likely:

That there was an error in the database. (Reality was right and the model was wrong.)
That I'd been fired and no one had actually gotten around to telling me.

My guess was the world would increasingly find itself in the second case. When, in effect, does the model become reality?

So if the question you’re asking is “what’s in the warehouse right now” then I agree that it’s infinitely better to just go ask the warehouse via RFID. However, if the question is "what was in the warehouse last week" or "how have inventory levels changed over time" then you’re back in the database business. (And the Gartner guys both address and concede this example.)

I believe they’re really arguing that real-time databases about physical objects need not exist in an RFID world. I think it's a great point. But what happens in this world is not the death of databases, but the replacement of databases with data warehouses. (The former typically focus on the present and the latter on the past.)
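The split between "ask the warehouse" and "ask the database" can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Movement` record, the `level_on` function, and the list standing in for an RFID scan are all hypothetical names and data, not anything from the Gartner presentation.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Movement:
    """One recorded inventory change (the 'model' kept in the warehouse DB)."""
    day: date
    sku: str
    delta: int  # positive = received, negative = shipped


def level_on(movements, sku, as_of):
    """Historical question: what did the model say we had on a given day?"""
    return sum(m.delta for m in movements if m.sku == sku and m.day <= as_of)


# Hypothetical movement history for one SKU.
history = [
    Movement(date(2005, 11, 28), "widget", 100),
    Movement(date(2005, 12, 1), "widget", -30),
    Movement(date(2005, 12, 5), "widget", -20),
]

# Stand-in for an RFID scan: one entry per tag physically on the shelf today.
rfid_scan = ["widget"] * 48

# "What's in the warehouse right now?" -> ask reality, not the model.
on_hand_now = Counter(rfid_scan)["widget"]

# "What was in the warehouse last week?" -> only the recorded history can answer.
last_week = level_on(history, "widget", date(2005, 11, 30))

# The gap between model and reality (error, theft, or damage).
discrepancy = level_on(history, "widget", date(2005, 12, 6)) - on_hand_now

print(on_hand_now, last_week, discrepancy)  # 48 100 2
```

The live scan replaces the real-time inventory table, but the movement history is exactly the kind of past-focused data a warehouse keeps, which is why the database business doesn't go away.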

I'm almost giddy that, per this blog, Feinberg and Beyer also apparently said:

Only 20% of the data that's stored will be structured anyway
That XML and XQuery will be useful for accessing the other 80%
That searching unstructured information will be important

Since the blog doesn't double-quote them, I can't be sure they said 'SQL will take a back seat to XQuery,' but one can dream. Either way, this is not business-as-usual for Gartner who, just a few years ago, answered most database inquiries with "take DB2, Oracle, or SQL Server and call me in the morning."

BI was not spared in the presentation, with the analysts arguing that it "wasn't an application anymore" and would be embedded into operational applications. I both agree and disagree. In my nearly 10 years in BI, I came to believe there were three segments:

Contextual BI. The use of BI to produce standard reports that enable everyone to develop a common understanding of "what's normal" and "how things work around here." This is the most popular use of BI and it's basically ignored by the market.

Operational BI. The embedding of intelligence into applications. What's better -- a data mining tool that produces a list of the top 50 leads or a telemarketing application that sorts the leads automatically by sales-value and presents them to the telemarketers in that order? Operational applications will get smarter over time and this will intrude on the traditional BI market.

Analytical BI. This is heavy lifting with stats tools and data mining. This will remain the domain of the "lab coat crowd" and will remain an important segment of the market.

So I'm happy to see that Gartner is producing some thought-provoking database research, mixing it up, and generating some controversy. But, to paraphrase Twain, I do think the rumors of the database's death have been at least somewhat exaggerated.

Author's Notes (1/17/06)

Since the original posting, I have learned the following things via emails from Mark Beyer and Donald Feinberg of Gartner:

The presentation was done by Donald Feinberg and Mark Beyer (not Ted Friedman, as I had originally said). Ted and Daniel Sholler were co-contributors to the materials. Apologies for the mistake.

Mark Beyer says they said that XQuery would challenge and eventually overwhelm SQL.

Donald Feinberg says that XQuery wouldn't replace SQL but the two would work together.

I don't take the last two bullets as inconsistent, but interpret them as meaning that they believe XQuery will gradually increase in uptake over time, be better than SQL for accessing XML data (yes, I'm reaching here), and that the two will need to live together for a long time.