Category Archives: cloud computing

Interview by SandHill.com on Big Data, Cloud Computing, and the Future of IT

[This is a re-post of a recent interview with me, authored by Darren Cunningham of Informatica.  The post originally appeared on SandHill.com where Darren writes a column on Cloud Computing.]

----

The Cloud in Action

Big Data, Cloud Computing and Industry Perspectives with Dave Kellogg

By Darren Cunningham

I had the pleasure of working with Dave Kellogg early in my marketing career and continue to learn from him as a regular subscriber to his popular blog, Kellblog. A seasoned Silicon Valley executive, Dave has been a board member (Aster Data), CEO (MarkLogic), CMO (Business Objects) and VP of Marketing (Versant and Ingres). I recently sat down with Dave to discuss industry trends. As always, he didn’t hold back.

Dave, you’ve written a lot about “Big Data” on your blog. Why is it such a hot topic in the world of data management?

First, I think Big Data is a hot topic because it represents the first time in about 30 years that people are rethinking databases. Literally, since about 1980 people haven’t had to think much about databases. If you were an SMB, you went SQL Server; if you were an enterprise, you’d go Oracle or IBM depending on your preferences. But in terms of technology, to paraphrase Henry Ford: any color you want, as long as it’s relational.

Overall, I think Big Data is hot for three reasons:

  • Major new innovation is finally happening with databases for the first time in three decades.
  • Hardware architectures have changed — people want to scale horizontally like Google.
  • We are experiencing a serious explosion in the amount of data people are analyzing and managing. Machine-generated data, the exhaust of the Web, is driving a lot of it.

I think Big Data is challenging on many fronts from the cool (e.g., analytics and query optimization), to the practical (e.g., horizontal scaling), to the mundane (e.g., backup and recovery).

What’s the intersection with Cloud Computing?

I think when people say cloud computing, they mean one of several things:

  • SaaS: The use of software applications or platforms as services.
  • Dynamic scaling: My favorite example of this is Britain’s Got Talent, which uses Cassandra. Most of the time they have nothing to do. Then one night half the country is trying to vote for their favorite contestants.
  • Service orientation: The ability to weave together applications by calling various cloud services — in effect using a series of cloud services as a platform on which to build applications.

I think Big Data intersects with cloud in several ways. First, the people running cloud services are dealing with Big Data problems. They are hosting thousands of customers’ databases and generating log records from hundreds of thousands of users. Second, Big Data analytics workloads are highly dynamic. One minute you need nothing; then suddenly you need to throw 100 servers at a complex problem for several hours.

How do you see these trends changing the role of IT?

I think corporate IT is constantly evolving because smart corporations want their internal resources focused on activities that they can’t buy elsewhere and that generate competitive advantage for the business.

IT used to buy and run computers. Then they used to build and run applications. Then they focused on weaving together packaged applications. Going forward, they will focus on tightly integrating cloud-based services. They will also continue to focus on company-proprietary analytics used to gain competitive advantage.

The other trend driving IT is consumerization. The Web sets expectations for functionality, user interface and quality that corporate IT must meet with internal systems. The bar has gone way up – people won’t tolerate old-school ERP-style interfaces at work when they’re used to Facebook or Yelp.

What does that mean for technology sales and marketing?

If Mr. McGuire in The Graduate were dishing out advice today, instead of saying “plastics,” he’d say “data science.” More and more companies will use data scientists to analyze their business and drive tactical operations. First you need to gather a whole bunch of data about your operations and customers. Then you need to throw world-class data analysts at it to get business value and to be sure you don’t draw false conclusions – e.g., confusing correlation with causation.

Today, most companies have their sales departments on salesforce.com. Leading marketing departments are on Marketo or Eloqua, but most marketers still don’t have much technology backing them. Going forward you will see a whole class of analytics application vendors providing advanced analytics for Salesforce (e.g., Cloud9, GoodData), and the marketing automation vendors will move beyond lead incubation into providing overall marketing suites. I expect Marketo or Eloqua to try to do for the chief marketing officer what SuccessFactors did for the chief people officer – and if they don’t, then there’s a real opportunity for someone else.

Speaking of all things cloud, you often write about Silicon Valley trends. How would you characterize what’s going on in the market right now?

From my perspective, the Silicon Valley innovation engine is running full out. Top VCs are raising new funds. I meet a few new startups every day. Of late, I’ve met fascinating companies in next-generation business intelligence, analytics, Big Data, social media monitoring and exploitation, and Web application development. One of the more interesting things I’ve found is a VC fund dedicated to Big Data: IA Ventures (in New York). When I heard about them, I thought: oh, lots of Big Data infrastructure and platform technologies. Then I spent some time with them and realized that most of their portfolio is about exploiting new Big Data infrastructure technologies via vertical applications. That was really interesting.

People will debate whether we’re in a mini tech bubble or a bubble specific to social networking. Who knows? I just read an article in The Wall Street Journal arguing that a $140B valuation for Facebook is realistic, and it was fairly convincing. So you can debate the bubble issue, but you can’t debate that the IPO market has been closed for a long time. Now it is starting to open, and that’s a huge change in Silicon Valley.

Entrepreneurs have historically dreamed of creating $1B independent companies. I’d say for most of the last decade they’ve dreamed of getting bought for 5-10x revenues. Michael Arrington had a great quote a while back saying that “an entire generation of entrepreneurs [has been lost] building dipshit companies that sell to Google for $25M.” I think those days are over. When the IPO window opens, people dream of building stand-alone companies.

What advice do you have for both entrepreneurs and IT veterans?

Don’t build or run things that you can buy or rent. If you follow that mantra, you will follow market trends and always stay at the right layer of the stack to ensure that you are adding value as opposed to leveraging old skill sets. While you may know how to run a big data center, you can now rent time in one more cost-effectively. So either go work for a company that runs data centers (e.g., Equinix) if that’s your pleasure, or go leverage the people who do. Put differently, don’t be static. If you’re still using skills you learned 10 years ago, make sure that you’re not teeing yourself up to get left behind.

As always, great advice, Dave! Thank you.

Darren Cunningham is VP of Marketing for Informatica Cloud.

[Notes: Minor changes made from the SandHill post. I added emphasis via bolding and I corrected the attribution of the famous line "plastics" from The Graduate. It was not Mr. Robinson, but Mr. McGuire, who said it.]

The Best SaaS / Cloud White Paper: Bessemer’s Top 10 Laws of Cloud Computing and SaaS

After doing a lot of reading in recent days, I thought I’d take a few minutes to share what I think is one of the best resources I’ve discovered:  Bessemer’s Top 10 Laws of Cloud Computing and SaaS (PDF), co-authored by about ten people from Bessemer including Byron Deeter.

Here is a quick summary of their top 10 laws:

1.       Less is more!  Use the cloud where you can in your own business.  I think this is a great idea in the eat-your-own-dogfood (at a model level, at least) department.  While MarkLogic was not a SaaS company, we were nevertheless big SaaS users (e.g., sales automation, marketing automation, finance, time tracking, expense reporting) because I’m a big believer in the model.

2.       Trust the 6 C’s of cloud finance.  Your new key metrics should be (1) committed monthly recurring revenue (CMRR), (2) cash flow, (3) CMRR pipeline, (4) churn, (5) customer acquisition cost (CAC), and (6) customer lifetime value.  This is a different set of metrics from the traditional enterprise software business and one worth taking the time to understand.

3.       Study the sales learning curve (SLC) and only invest behind success.  The SLC is a creation of former Veritas CEO Mark Leslie and is discussed in this HBR article (paid) or this presentation.  A simpler version of the principle is to hire reps in groups of three and only expand when 2 of the 3 in the first group become profitable.  This avoids prematurely scaling up the sales force, which, probably more than any other sinkhole, has wasted untold amounts of venture capital over the past few decades.

4.       Forget everything you learned about software channels.  Cloud products, by their nature, are not services-intensive, which fundamentally changes the role, and reduces the importance, of service providers in the industry equation.  Put more simply:  SaaS businesses are generally direct, leverage the Internet as a direct channel, and are not indirect-channel friendly.

5.       Build employee software.  Employees are now powerful customers, not just their managers.  We’re witnessing “the consumerization of software,” so ease up.  This is a very clear trend; in fact, many SaaS/cloud businesses work their way into the enterprise by starting out with individual consumer managers at small and medium businesses.  In the past, you could sell executive management “a better return on information” and condemn clerks to horrific user interfaces.  Those days are gone.

6.       Savvy online marketing is a core competence (sometimes the only one) of every successful cloud business.  Among other things, this foretells the rise of analytical and quantitative marketing VPs over the more traditional product-strategy and/or communications-creative types.

7. The most important part of software-as-a-service isn’t “software,” it’s “service!” Support!  Support!  Support! Culturally, this runs dead opposite to the traditional enterprise software “drive-by sales” approach whereby, as one search-engine salesrep once told me:  “we sold the customer a Ferrari – but then we dumped the pieces in his driveway.”  This natural incentive alignment (which by the way was also a by-product of the vertical-focus strategy at MarkLogic) is one of my favorite features of the SaaS model.

8.       Leverage and monetize the data asset.  You can do this by leveraging your expertise to identify the metrics and dashboards of most analytic value and then by selling industry benchmark data on them.  This, to me, is one of the more obvious SaaS opportunities, yet, in my experience, it remains to date one of the least exploited.  I expect to see much more progress in this area in the coming few years.

9.       Mind the GAAP.  Cloud accounting is all about matching revenue and costs to consumption … except when it’s not (i.e., professional services).    Taleo’s struggles have been well publicized, Bessemer’s paper provides a great overview of the issues, and for those who want to know more, here is an excellent paper (SaaS is Different, An Accounting Primer for SaaS Companies by Jay Howell of BDO) that discusses SaaS accounting differences which are primarily related to (1) recognizing revenue over the term during which the service is live/delivered and (2) pro-rating professional services over the full duration of the software-service contract and potentially the lifetime of the customer relationship.

10.   Cloudonomics requires that you plan your fuel stops very carefully.  SaaS companies are capital intensive and typically require at least 4 years before they are cash-flow positive.  NetSuite needed $126M before its IPO, DemandTec $66M, Salesforce $61M, and SuccessFactors $45M.
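Laws 2 and 10 both come down to arithmetic that is easy to sketch. The Python below is a minimal illustration under stated assumptions: the formulas are common conventions (CMRR as current MRR plus signed-but-not-yet-live MRR minus expected churn; lifetime value as monthly gross margin divided by monthly churn), not necessarily Bessemer’s exact definitions, and every function name and input number is hypothetical.

```python
"""Hypothetical sketch of SaaS finance arithmetic (Bessemer laws 2 and 10).

Formulas are common conventions, not Bessemer's exact definitions;
all inputs in the usage section are made up for illustration.
"""


def cmrr(current_mrr: float, signed_not_live_mrr: float, expected_churn_mrr: float) -> float:
    """Committed MRR: recurring revenue you can count on going forward."""
    return current_mrr + signed_not_live_mrr - expected_churn_mrr


def monthly_churn_rate(customers_at_start: int, customers_lost: int) -> float:
    """Fraction of the starting customer base lost during the month."""
    return customers_lost / customers_at_start


def customer_acquisition_cost(sales_marketing_spend: float, new_customers: int) -> float:
    """Average sales and marketing spend per newly acquired customer."""
    return sales_marketing_spend / new_customers


def lifetime_value(mrr_per_customer: float, gross_margin: float, churn_rate: float) -> float:
    """Gross-margin contribution over an expected lifetime of 1/churn_rate months."""
    return mrr_per_customer * gross_margin / churn_rate


def months_to_cash_flow_positive(mrr: float, mrr_growth: float, monthly_costs: float,
                                 cost_growth: float, max_months: int = 120):
    """Return (first month with non-negative cash flow, total cash burned before it).

    Returns (None, burned) if break-even is not reached within max_months.
    """
    burned = 0.0
    for month in range(1, max_months + 1):
        cash_flow = mrr - monthly_costs
        if cash_flow >= 0:
            return month, burned
        burned -= cash_flow            # cash_flow is negative here, so burn grows
        mrr *= 1 + mrr_growth          # compound revenue growth
        monthly_costs *= 1 + cost_growth
    return None, burned


if __name__ == "__main__":
    churn = monthly_churn_rate(customers_at_start=500, customers_lost=10)
    ltv = lifetime_value(mrr_per_customer=1_000.0, gross_margin=0.75, churn_rate=churn)
    cac = customer_acquisition_cost(sales_marketing_spend=600_000.0, new_customers=40)
    print(f"churn={churn:.1%}  LTV=${ltv:,.0f}  CAC=${cac:,.0f}  LTV/CAC={ltv / cac:.1f}")

    month, burned = months_to_cash_flow_positive(
        mrr=50_000.0, mrr_growth=0.05, monthly_costs=400_000.0, cost_growth=0.01)
    print(f"cash-flow positive in month {month}, after burning ${burned:,.0f}")
```

With these made-up inputs (MRR growing 5% a month against costs growing 1%), the model first reaches non-negative cash flow in month 55, roughly four and a half years in, which is the kind of long runway law 10 says you must fund.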
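Law 9’s two accounting points (recognizing revenue over the term during which the service is delivered, and pro-rating professional services over the contract) can be illustrated with a small revenue schedule. This is a hedged sketch, not accounting guidance: the contract values are hypothetical, services are assumed to be recognized over the same term as the subscription, and the whole contract is assumed to be billed up front so the gap between cash and revenue appears as deferred revenue.

```python
"""Hypothetical sketch of ratable SaaS revenue recognition (Bessemer law 9).

Illustration only, not accounting guidance; contract terms are made up.
"""
from decimal import Decimal, ROUND_DOWN

CENT = Decimal("0.01")


def ratable_schedule(total: Decimal, months: int) -> list[Decimal]:
    """Spread `total` evenly over `months` in whole cents; the last month absorbs rounding."""
    base = (total / months).quantize(CENT, rounding=ROUND_DOWN)
    schedule = [base] * months
    schedule[-1] = total - base * (months - 1)
    return schedule


def deferred_revenue_balances(billed_upfront: Decimal, schedule: list[Decimal]) -> list[Decimal]:
    """Deferred revenue remaining on the balance sheet at the end of each month."""
    balances = []
    remaining = billed_upfront
    for recognized in schedule:
        remaining -= recognized
        balances.append(remaining)
    return balances


if __name__ == "__main__":
    subscription = Decimal("120000")  # hypothetical 12-month subscription fee
    services = Decimal("30000")       # hypothetical setup/professional services fee
    term_months = 12

    # Assumption: services are pro-rated over the same term as the subscription
    # rather than recognized when the work is performed.
    contract = subscription + services
    schedule = ratable_schedule(contract, term_months)
    balances = deferred_revenue_balances(contract, schedule)

    print(f"Cash collected in month 1: {contract}")
    print(f"Revenue recognized each month: {schedule[0]}")
    print(f"Deferred revenue after month 1: {balances[0]}")
```

The sketch shows the mismatch law 9 warns about: in this made-up contract, the full $150,000 arrives as cash in month 1, but only $12,500 is recognized as revenue that month, and the remainder sits as deferred revenue that unwinds over the term.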

The First Rule of Data Centers: Don't Talk About Data Centers

Who’d have guessed that data centers were like Fight Club?

Trying to chart the cloud’s geography can be daunting, a task that is further complicated by security concerns. “It’s like ‘Fight Club,’ ” says Rich Miller, whose Web site, Data Center Knowledge, tracks the industry. “The first rule of data centers is: don’t talk about data centers.”

The excerpt is from a great article, entitled Data Center Overload, in this past Sunday’s New York Times Magazine. The article provides a layperson’s introduction to the cloud and to the hidden, massive data centers — which collectively now consume more power than Sweden — that underlie it. Excerpt:

Yet as data centers increasingly become the nerve centers of business and society — even the storehouses of our fleeting cultural memory (that dancing cockatoo on YouTube!) — the demand for bigger and better ones increases: there is a growing need to produce the most computing power per square foot at the lowest possible cost in energy and resources. All of which is bringing a new level of attention, and challenges, to a once rather hidden phenomenon. Call it the architecture of search: the tens of thousands of square feet of machinery, humming away 24/7, 365 days a year — often built on, say, a former bean field — that lie behind your Internet queries.

The full story is here.

McKinsey Releases Cloud-Unfriendly Cloud Computing Report

McKinsey has released a “discussion document” on cloud computing that comes to some fairly un-trendy conclusions. Entitled Clearing the Air on Cloud Computing, the unwieldy 34-page PDF slide presentation argues that cloud computing is at the peak of Gartner’s hype cycle, has created a gold-rush atmosphere, has about 22 different definitions, and in many cases is more expensive than what large companies could accomplish on their own.

I don’t know what’s causing it, but I’ve had a huge amount of trouble accessing the PDF, which says (SECURED) in my window bar, mysteriously appears to be only an improbable 1MB in size, and yet just eats CPU when you open it. I’ve lost about an hour and crashed my machine twice trying to make this post, which has become a Sisyphean quest at this point.

Ergo, I’ve uploaded it to Scribd in case you can’t access it either, and in so doing discovered what the problem is. The geniuses at McKinsey have published a cloud computing report in some encrypted PDF format such that it’s quite unusable in the cloud. Bravo. Incroyable.

Somehow, someone got it uploaded to SlideShare, where it is also quite sluggish, but here it is:

[Sorry, it still went too slow when embedded, and when I tried to simply link to it in SlideShare form, the uploader had subsequently marked it as private. The whole thing is an exercise in how not to do viral marketing.]

Amazon Elastic MapReduce: Power to Burn, On Demand

Amazon Web Services today announced Amazon Elastic MapReduce, a new member of the Amazon Web Services family designed to help users process vast amounts of data using the divide-and-conquer parallel processing approach made famous by Google’s MapReduce and implemented in the Apache Hadoop project.

Background on Hadoop (from the project site):

Here’s what makes Hadoop especially useful:
  • Scalable: Hadoop can reliably store and process petabytes.
  • Economical: It distributes the data and processing across clusters of commonly available computers. These clusters can number into the thousands of nodes.
  • Efficient: By distributing the data, Hadoop can process it in parallel on the nodes where the data is located. This makes it extremely rapid.
  • Reliable: Hadoop automatically maintains multiple copies of data and automatically redeploys computing tasks based on failures.

Hadoop implements MapReduce, using the Hadoop Distributed File System (HDFS). MapReduce divides applications into many small blocks of work. HDFS creates multiple replicas of data blocks for reliability, placing them on compute nodes around the cluster. MapReduce can then process the data where it is located. Hadoop has been demonstrated on clusters with 2000 nodes. The current design target is 10,000 node clusters.
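The block-and-replica idea in the passage above can be sketched in a few lines of Python. This is a deliberately simplified model: real HDFS places replicas with a rack-aware policy, whereas this sketch just walks the node list round-robin, and the block size, node names, and function names are illustrative assumptions. The replication factor of 3 matches HDFS’s default.

```python
"""Simplified model of HDFS-style block splitting and replica placement.

Not HDFS's actual placement policy (which is rack-aware); round-robin is used
here only to show that each block lands on several distinct nodes.
"""


def split_into_blocks(file_size: int, block_size: int) -> list[int]:
    """Return the size of each block; only the last block may be short."""
    num_blocks = -(-file_size // block_size)  # ceiling division on integers
    return [min(block_size, file_size - i * block_size) for i in range(num_blocks)]


def place_replicas(num_blocks: int, nodes: list[str], replication: int = 3) -> dict[int, list[str]]:
    """Assign each block to `replication` distinct nodes, rotating the start node."""
    if replication > len(nodes):
        raise ValueError("replication factor exceeds the number of nodes")
    placement = {}
    for block in range(num_blocks):
        start = block % len(nodes)
        placement[block] = [nodes[(start + r) % len(nodes)] for r in range(replication)]
    return placement


if __name__ == "__main__":
    MB = 1024 * 1024
    blocks = split_into_blocks(file_size=200 * MB, block_size=64 * MB)  # hypothetical sizes
    placement = place_replicas(len(blocks), ["node1", "node2", "node3", "node4"])
    for block_id, size in enumerate(blocks):
        # A scheduler could run the map task for this block on any listed node,
        # which is the "process the data where it is located" idea.
        print(f"block {block_id}: {size // MB} MB on {placement[block_id]}")
```

Because each block sits on three distinct nodes, losing any single node in this four-node example still leaves at least two copies of every block, and there are still several nodes where a data-local task for that block could run.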

Here’s some background on MapReduce (from Google Labs):

MapReduce is a programming model and an associated implementation for processing and generating large data sets. Users specify a map function that processes a key/value pair to generate a set of intermediate key/value pairs, and a reduce function that merges all intermediate values associated with the same intermediate key. Many real world tasks are expressible in this model, as shown in the paper.

Programs written in this functional style are automatically parallelized and executed on a large cluster of commodity machines. The run-time system takes care of the details of partitioning the input data, scheduling the program’s execution across a set of machines, handling machine failures, and managing the required inter-machine communication. This allows programmers without any experience with parallel and distributed systems to easily utilize the resources of a large distributed system.

Our implementation of MapReduce runs on a large cluster of commodity machines and is highly scalable: a typical MapReduce computation processes many terabytes of data on thousands of machines. Programmers find the system easy to use: hundreds of MapReduce programs have been implemented and upwards of one thousand MapReduce jobs are executed on Google’s clusters every day.
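The programming model described above (a user-supplied map function emitting intermediate key/value pairs, and a reduce function merging all values that share a key) is usually illustrated with word count. The Python below is a single-process sketch: it runs the map, shuffle (group-by-key), and reduce phases in memory, whereas a real runtime such as Google’s or Hadoop’s distributes the same steps across a cluster and handles partitioning and machine failures. The function names are illustrative and not part of any MapReduce API.

```python
"""Single-process sketch of the MapReduce model, using word count."""
from collections import defaultdict
from typing import Iterable, Iterator


def map_fn(doc_id: str, text: str) -> Iterator[tuple[str, int]]:
    """Map: emit an intermediate (word, 1) pair for every word in one document."""
    for word in text.lower().split():
        yield word, 1


def reduce_fn(word: str, counts: Iterable[int]) -> tuple[str, int]:
    """Reduce: merge all intermediate values that share the same key."""
    return word, sum(counts)


def run_mapreduce(documents: dict[str, str]) -> dict[str, int]:
    # Map phase: each call is independent, which is what lets a cluster run them in parallel.
    intermediate: dict[str, list[int]] = defaultdict(list)
    for doc_id, text in documents.items():
        for key, value in map_fn(doc_id, text):
            # Shuffle: group intermediate values by key.
            intermediate[key].append(value)
    # Reduce phase: one call per distinct intermediate key.
    return dict(reduce_fn(key, values) for key, values in sorted(intermediate.items()))


if __name__ == "__main__":
    docs = {"doc1": "the quick brown fox", "doc2": "The lazy dog and the fox"}
    print(run_mapreduce(docs))
    # → {'and': 1, 'brown': 1, 'dog': 1, 'fox': 2, 'lazy': 1, 'quick': 1, 'the': 3}
```

The user writes only `map_fn` and `reduce_fn`; everything in `run_mapreduce` stands in for the runtime system that, per the passage, takes care of partitioning, scheduling, and failures.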

So Amazon Elastic MapReduce is a cloud-based service that enables you to perform highly parallel operations against large amounts of data, all in an on-demand model. This strikes me as a great offering, particularly for organizations that have an intermittent need for large Hadoop clusters.

From the Amazon press release:

It utilizes a hosted Hadoop framework running on the web-scale infrastructure of Amazon Elastic Compute Cloud (Amazon EC2) and Amazon Simple Storage Service (Amazon S3). Using Amazon Elastic MapReduce, you can instantly provision as much or as little capacity as you like to perform data-intensive tasks for distributed applications such as web indexing, data mining, log file analysis, machine learning, financial analysis, scientific simulation, and bioinformatics research. As with all AWS services, Amazon Elastic MapReduce customers will still only pay for what they use, with no up-front payments or commitments.

Amazon says they made the offering in response to users who were already deploying Hadoop clusters on their lower-level EC2 framework — i.e., that this was an organic evolution:

“Some researchers and developers already run Hadoop on Amazon EC2, and many of them have asked for even simpler tools for large-scale data analysis,” said Adam Selipsky, Vice President of Product Management and Developer Relations for Amazon Web Services. “Amazon Elastic MapReduce makes crunching in the cloud much easier as it dramatically reduces the time, effort, complexity and cost of performing data-intensive tasks.”

I suspect this was a bad day at Cloudera, an Accel-backed startup that wants to be the Red Hat of Hadoop. Perhaps, like SugarCRM in competing against Salesforce, Cloudera will soon offer an on-demand Hadoop as well. But that means supporting two business models at once and buying a lot of hardware to boot; I suspect a lot more hardware than SugarCRM needs to buy to support sales automation as a service.

DBMS in the Cloud: Amazon SimpleDB

Continuing to steadily and patiently execute on their Amazon Web Services vision, Amazon recently announced SimpleDB, a web service for running queries in real time against structured data.

It’s the first instance I’m aware of in which someone offers DBMS-level services in the cloud. Arguably, Google Base is a competitor, but I’ve always viewed that as more aimed at eBay and Craigslist and less about cloud computing.

While most SaaS-type applications are indeed applications (e.g., NetSuite, Salesforce), Amazon has been coming at cloud computing from an infrastructure-up, rather than an application-down, perspective. Previously Amazon launched lower-level services including EC2 (elastic compute cloud) and S3 (simple storage service) in the same “pay as you go to use our infrastructure” manner.

I’m told Amazon got into cloud computing because, due to the spiky nature of retail, they have built a massive infrastructure to handle demand peaks (e.g., Christmas) that goes largely unused most of the time. AWS is their attempt to monetize it.

For more on SimpleDB, see this post on the ProgrammableWeb blog, or check out the developer’s guide here.

Finally, here’s the pricing for SimpleDB:

Machine Utilization - $0.14 per Amazon SimpleDB Machine Hour consumed (normalized to the hourly capacity of a circa 2007 1.7 GHz Xeon processor).

Data Transfer

    $0.10 per GB – all data transfer in
    $0.18 per GB – first 10 TB / month data transfer out
    $0.16 per GB – next 40 TB / month data transfer out
    $0.13 per GB – data transfer out / month over 50 TB

Structured Data Storage - $1.50 per GB-month
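To see how these rates combine into a bill, here is a hedged Python sketch of a monthly SimpleDB cost estimate using the prices listed above. Assumptions: tier boundaries use 1 TB = 1,024 GB, machine hours are whatever AWS reports as normalized SimpleDB Machine Hours, and the function names and usage figures are hypothetical. `Decimal` keeps the cents exact instead of drifting through floating-point rounding.

```python
"""Hypothetical monthly cost estimate for Amazon SimpleDB at the listed prices."""
from decimal import Decimal

MACHINE_HOUR = Decimal("0.14")          # per SimpleDB Machine Hour
TRANSFER_IN_PER_GB = Decimal("0.10")    # all data transfer in
STORAGE_PER_GB_MONTH = Decimal("1.50")  # structured data storage
GB_PER_TB = 1024                        # assumption: binary terabytes for tier boundaries

# (tier size in GB, price per GB); None means "everything beyond the previous tiers".
TRANSFER_OUT_TIERS = [
    (10 * GB_PER_TB, Decimal("0.18")),  # first 10 TB / month
    (40 * GB_PER_TB, Decimal("0.16")),  # next 40 TB / month
    (None, Decimal("0.13")),            # over 50 TB / month
]


def transfer_out_cost(gb: Decimal) -> Decimal:
    """Charge outbound transfer tier by tier, like a marginal tax schedule."""
    cost = Decimal("0")
    remaining = gb
    for tier_size, price in TRANSFER_OUT_TIERS:
        billable = remaining if tier_size is None else min(remaining, Decimal(tier_size))
        cost += billable * price
        remaining -= billable
        if remaining <= 0:
            break
    return cost


def monthly_bill(machine_hours, gb_in, gb_out, gb_stored) -> Decimal:
    """Total monthly charge; pass ints, strings, or Decimals (not floats) for exact cents."""
    return (Decimal(machine_hours) * MACHINE_HOUR
            + Decimal(gb_in) * TRANSFER_IN_PER_GB
            + transfer_out_cost(Decimal(gb_out))
            + Decimal(gb_stored) * STORAGE_PER_GB_MONTH)


if __name__ == "__main__":
    # Hypothetical month: 500 machine hours, 20 GB in, 15 TB out, 40 GB stored.
    bill = monthly_bill(machine_hours=500, gb_in=20, gb_out=15 * GB_PER_TB, gb_stored=40)
    print(f"Estimated monthly bill: ${bill:,.2f}")
```

For workloads that push a lot of data out, the transfer-out tiers dominate: in the hypothetical month above, outbound transfer accounts for most of the total, while storage and machine hours are comparatively small.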