Survivor Bias in Churn Calculations: Say It’s Not So!

I was chatting with a fellow SaaS executive the other day and the conversation turned to churn and renewal rates.  I asked how he calculated them and he said:

Well, we take every customer who was also a customer 12 months ago and then add up their ARR 12 months ago and add up their ARR today, and then divide today’s ARR by year-ago ARR to get an overall retention or expansion rate.

Well, that sounds dandy until you think for a minute about survivor bias, the often inadvertent logical error in analyzing data from only the survivors of a given experiment or situation.  Survivor bias is subtle, but here are some common examples:

  • I first encountered survivor bias in mutual funds when I realized that look-back studies of prior 5- or 10-year performance include only the funds still in existence today.  If you eliminate my bogeys, I’m actually a below-par golfer.
  • My favorite example comes from World War II, when analysts examined the pattern of anti-aircraft fire on returning bombers and argued for strengthening the planes in the places that were most often hit.  This was exactly wrong: the places where returning bombers were hit were already strong enough.  You needed to reinforce them in the places where the downed bombers were hit.

So let’s turn back to churn rates.  If you’re going to calculate an overall expansion or retention rate, which way should you approach it?

  1. Start with a list of customers today, look at their total ARR, and then go compare that to their ARR one year ago, or
  2. Start with a list of customers from one year ago and look at their ARR today.

Number 2 is the obvious answer.  You should include the ARR from customers who chose to stop being customers in calculating an overall churn or expansion rate.  Calculating it the first way can be misleading because you are looking at the ARR expansion only from customers who chose to continue being customers.

Let’s make this real via an example.

[Figure: survivor bias]

The ARR today is contained in the boxed area.  The survivor bias question comes down to whether you include or exclude the orange rows from year-ago ARR.  The difference can be profound.  In this simple example, the survivor-biased expansion rate is a nice 111%.  However, the non-biased rate is only 71%, which will get you a quick “don’t let the door hit your ass on the way out” at most VCs.  And while the example is contrived, the difference is simply one of calculation: both rates come from identical data.
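
For readers who prefer code to spreadsheets, here is a minimal Python sketch of the two calculations.  The customer names and ARR values are invented for illustration (they are not the rows from the figure); I picked them only so the two rates land near 111% and 71%.  The function names are my own, not any standard.

```python
# Hypothetical ARR by customer, in $K. These rows are made up for
# illustration and are NOT the data from the figure in the post.
arr_year_ago = {"A": 50, "B": 40, "C": 30, "D": 20}
arr_today = {"A": 60, "B": 40, "E": 25}  # C and D churned; E is new


def survivor_biased_rate(year_ago, today):
    """Approach 1: start from today's customers and look back.

    Only customers present in BOTH periods are counted, so churned
    customers silently drop out of the denominator.
    """
    survivors = year_ago.keys() & today.keys()
    then = sum(year_ago[c] for c in survivors)
    now = sum(today[c] for c in survivors)
    return now / then


def cohort_rate(year_ago, today):
    """Approach 2: start from the year-ago customers and look forward.

    Churned customers stay in the denominator and contribute zero
    ARR today. Customers who are new since then are excluded.
    """
    then = sum(year_ago.values())
    now = sum(today.get(c, 0) for c in year_ago)
    return now / then


biased = survivor_biased_rate(arr_year_ago, arr_today)
unbiased = cohort_rate(arr_year_ago, arr_today)
print(f"survivor-biased: {biased:.0%}")  # 100 / 90
print(f"cohort-based:    {unbiased:.0%}")  # 100 / 140
```

Note that the numerator is the same 100 in both cases.  The only thing that changes is whether the 50 of year-ago ARR from the churned customers C and D stays in the denominator.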

Do companies use survivor-biased calculations in real life?  Let’s look at my post on the Hortonworks S-1 where I quote how they calculate their net expansion rate:

We calculate dollar-based net expansion rate as of a given date as the aggregate annualized subscription contract value as of that date from those customers that were also customers as of the date 12 months prior, divided by the aggregate annualized subscription contract value from all customers as of the date 12 months prior.

When I did my original post on this, I didn’t even catch it.  But that is where survivor bias rears its subtle head.

# # #

Disclaimers:

  • I have not tracked Hortonworks in the meantime so I don’t know if they still report this metric, at what frequency, how they currently calculate it, etc.
  • To the extent that “everyone calculates it this way” is true, then companies might report it this way for comparability, but people should be aware of the bias.  One approach is to create a present back-looking and a past forward-looking metric and show both.
  • See my FAQ for additional disclaimers, including that I am not a financial analyst and do not make recommendations on stocks.

Responses to “Survivor Bias in Churn Calculations: Say It’s Not So!”

  1. Great point :-) and a standardized version of Churn would be useful, indeed to report on and compare. I believe Bessemer offered a pretty easy and standard way to calculate ARR/Expansion/Churn.

    Small note (as I was checking on others too): I believe Hortonworks is calculating properly. I understand “divided by the aggregate annualized subscription contract value from all customers as of the date 12 months prior” as the sum of ARR of all customers 12 months ago (not just the surviving ones), and “aggregate annualized subscription contract value as of that date from those customers that were also customers as of the date 12 months prior” as the current ARR for customers that were also customers 12 months ago.
    So: churn = ARR from customers where age > 12m / total ARR a year ago.

  2. Pingback: A Fresh Look at How to Measure SaaS Churn Rates | Kellblog

  3. I understand survivorship bias can taint churn calculations in the aggregate, but what about the information SaaS companies gather from churning customers directly? For example, if you survey all churning customers during a given month and 20 percent say that they wanted feature x, and cancelled because the product didn’t have it, is there implicit non-survivorship bias in making the decision to add feature x?

    • The problem is that if you surveyed churning customers and asked if they were mammals, they would say yes as well. What we really want is questions that separate those who churn from those who don’t. That’s theoretical. In practice, yes, I would definitely look at the features churning customers are asking for, but with a critical eye as to whether I believed those features really drove the churn. I may stop using my gym and, if they ask for my input, I’ll say the lobby was always dirty, but a clean lobby wouldn’t have prevented my cancellation.
