
Thoughts on Category Creation and Information Access Platforms [Revised]

[Revised 8/2/08; still working on cleaning up this consciousness stream.]

Back in the old days, it seemed easy to create a category in software. Look at the database market, for example:

  • IBM invents the relational DBMS (RDBMS) category
  • Oracle, Ingres, and Informix enter in a largely undifferentiated way, though Informix eventually drifts towards the low-end/cheap segment
  • Sybase creates the derivative category of high-performance OLTP RDBMS.
  • Arbor re-christens the failed multi-dimensional DBMS as the OLAP Server
  • Tandem creates the non-stop RDBMS with its superb fault tolerance
  • Illustra launches the universal DBMS and is quickly acquired by Informix
  • Sybase launches the bitmap-indexed DBMS with SybaseIQ
  • Teradata launches the data-warehouse DBMS category

And you can find just as many examples outside database-land.

  • ASK defines the manufacturing resource planning (MRP) category
  • SAP hijacks MRP, redefines it as ERP, and goes on to become the world’s largest applications software company
  • PeopleSoft invents the HRMS category
  • Gartner Group’s Howard Dresner invents the business intelligence (BI) category, re-christening and re-framing what was formerly known as DSS or EIS.
  • Siebel pioneers the sales force automation (SFA) category
  • Scopus pioneers call center automation (CCA)
  • Companies like Rubric pioneer enterprise marketing automation (EMA)
  • Siebel, through acquisition, coalesces SFA, CCA, and EMA into a single category called customer relationship management (CRM)
  • Oracle and SAP work to coalesce CRM back into ERP. Such is the ebb and flow of categories.

(And I could go on and on — BPM, KM, CMS, WCM, ECM, LMS, DRM, SCM, PLM, ETL, DI, EII — but I think I’ll stop here with the initials list.)

People are still creating categories today, and sometimes it looks easy. Uber-categories have been quite popular in the past decade as people have focused on different ways of developing and delivering software:

  • SaaS as an uber-category has worked well, with a variety of offerings in various SaaS sub-categories (e.g., Salesforce, NetSuite)
  • Appliances have done pretty much the same thing — i.e., offering an appliance alternative for a wide variety of existing categories (e.g., a data warehouse appliance a la Netezza)
  • Open source has also done the same thing — again serving as a different flavor/dimension for a wide variety of largely existing software categories.

Only a few genuinely new categories have emerged, virtualization being the most obvious example. (Though you could argue that virtualization is itself an uber-category covering storage virtualization, server virtualization, et cetera.)

Companies are still working to carve out new categories, particularly in the database market.

Sometimes vendors and/or the analysts who cover them try to impose either a straight name change (e.g., from MD-DBMS to OLAP) or a strategic shift (e.g., from BI to analytic applications) in category. Sometimes they’re just bored. Sometimes a vendor’s trying to redefine the market in line with its strengths. Sometimes an analyst is trying to make his/her mark on the industry and earn the coveted “father/mother of [category name],” much as Howard Dresner successfully did with BI.

BI got bored with its name several times during my tenure at Business Objects. At one point both the analysts and Informatica were trying to re-dub the category “analytic applications” in an attempt to get a fresh name and raise the abstraction level from tools to applications. Informatica nearly died on that hill.

Later, analysts tried to redefine the category, dubbing it corporate performance management (CPM), and arguing that business intelligence needed to link with financial planning systems. While knowing actuals is good, knowing actuals compared to the plan is better, and using actuals to drive the future plan is better still. Cognos nearly tripped over itself repositioning around CPM, ultimately acquiring Adaytum, which in turn led to SRC’s eventual acquisition by Business Objects.

In an art-imitates-life sort of way, one wonders whether the analysts predicted a move in the market or provoked it. My chips are on the latter.

This stream-of-consciousness is a long way of winding up to a single question: are enterprise search vendors successfully repositioning themselves as “information access platforms” or not?

Background: the enterprise-search-related vendors (e.g., Fast/Microsoft, Endeca) and search/content analysts who cover them are in the midst of an attempted category repositioning:

  • The term “enterprise search” is now seemingly dead, having been contaminated by the Google Appliance. When a shark gets in the water, all the fish jump out.
  • The word “information” is increasingly being used as a unifying term to describe both data and content (aka, unstructured data)
  • Enterprise search vendors are increasingly calling themselves “information access platforms” (though not generally abbreviated as IAP, I will do so here for brevity).

For example, consider Endeca’s corporate boilerplate:

Endeca’s innovative information access software that helps people explore, analyze, and understand complex information, guiding them to unexpected insights and better decisions. The Endeca Information Access Platform, built around a new class of access-optimized database, powers applications that combine the ease of searching and browsing with the analytical power of business intelligence.

I have a number of concerns about, and related to, this attempted shift:

  • The important thing about categories is that they exist in the mind of the customer. Analysts and vendors can try to put them there — but they have to stick. In my mind, IAP is not sticking. I have never heard a customer say: “I need to go out and get an IAP.”
  • I do, however, believe that “information” might well stick as an overall term, meaning both data and content (aka, structured and unstructured data).
  • It is not clear to me why someone who desires a unified platform for “information” would turn to a search vendor. Search engines were designed as read-only indexes to help people find documents containing tokens; hardly ideal as an application development platform.
  • In my estimation, someone managing “special” data should turn to a database vendor. While databases have classically not handled “special” data well, databases were designed as application platforms, and there is a whole new class of specialized databases emerging for handling various “special” types of data.
  • While I think a unified platform is a dandy vision, I think no one is close to delivering a unified platform that handles all types of data equally well. Bolting Lucene and MySQL together isn’t a platform. Relational databases still do a poor job with both content and many types of data (e.g., sparse, hierarchical, or semi-structured). XML servers (like MarkLogic) handle XML brilliantly, but need work before they can match RDBMSs at classical relational data.
  • I believe that someone who needs a crawl-and-index-the-intranet value proposition should use the Google Appliance. So I think the search vendors are correct in their desire to flee, but I don’t think that “information access platform” is a good refuge.
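To make the “read-only index” point concrete: the core structure behind a search engine is an inverted index, a map from each token to the set of documents containing it. The sketch below is a minimal illustration only, assuming naive lowercase whitespace tokenization and AND-only queries; the function names and sample documents are hypothetical, and real engines (Lucene, for example) add stemming, ranking, and compressed postings lists.

```python
from collections import defaultdict

def build_index(docs):
    """Map each lowercase token to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in text.lower().split():
            index[token].add(doc_id)
    return index

def search(index, query):
    """Return ids of documents containing every query token (AND semantics)."""
    postings = [index.get(tok, set()) for tok in query.lower().split()]
    if not postings:
        return set()
    return set.intersection(*postings)

# Hypothetical sample documents
docs = {
    1: "quarterly sales report",
    2: "sales force automation overview",
    3: "call center report",
}
index = build_index(docs)
print(sorted(search(index, "sales report")))  # [1]
```

Note what this toy structure does not offer: updating a record in place, enforcing constraints, or joining records. Those are the things an application platform is expected to provide.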

Overall, my chips remain on the don’t come line for the attempted category repositioning from “enterprise search” to “information access platform.” You can find my stack on the come line for the emerging “special-purpose database” category and “XML servers” as an instance of them.

The Demise of Closed-Source RDBMSs?

A friend pointed me to this interesting post by Allan Packer of Sun entitled Are Proprietary Databases Doomed? Overall, I think it’s a well done analysis of the DBMS market and well worth reading.

First, a nit. When I was a lad, “proprietary” didn’t mean “closed source”; it meant a proprietary (i.e., vendor-controlled) interface. For example, Ingres originally spoke a query language called Quel. SQL then emerged as the standard, and any DBMS that spoke a language other than ANSI standard SQL was deemed proprietary. While I know that some people in the open source community view the opposite of “open source” as “proprietary,” I think that’s a misnomer. I think the correct antonym is closed source.

Second, I think Allan makes an excellent point about stagnation:

By the turn of the millennium, relational databases had already pretty much met the essential requirements of end users, and proprietary database companies were either pointing their vacuum cleaners toward other interesting money piles, or losing the plot entirely and sailing off the edge of the world. Today, database releases continue to tout new features, but they’re frosting on the cake rather than essentials. No-one issues a tender for a database unless they have unusual requirements. No-one loses their job because they chose the wrong database. And it’s been that way for years.

As a general rule, I am shocked by the lack of innovation returned by the R&D budgets of most technology companies. As I mentioned yesterday, despite billions of dollars of R&D investment, Google has yet to come up with another big business. And what does Microsoft get for the billions it spends each year on R&D? An incompatible version of Office with irritating “ribbons” that took four years to make.

Silicon Valley startups create new categories with tens of millions of dollars in venture capital. Yet once they become “real companies,” they seem to forget how to innovate at all, let alone on a shoestring.

Specifically in the DBMS market, I think the lack of innovation — enabled by the oligopolistic structure of the market — creates a soft underbelly for focused, innovative companies to carve out niches. (And remember that “niches” of a $10B market can be pretty big.)

Allan goes on to do some interesting pricing analysis, and then poses the question:

Why, then, is proprietary database software becoming more expensive while everything else reduces in price? End users normally expect to benefit from the cost savings resulting from improvements in technology. I am writing this blog, for example, on an affordable computer that would easily outperform expensive commercial systems from just 10 years ago.

It seems difficult to resist the conclusion that proprietary database companies have managed to redirect a good chunk of these savings away from end users and into their own coffers. Successful as this strategy has been, though, it could ultimately backfire. The more expensive proprietary databases become, the more attractive lower cost alternatives appear.

I think the short answer to his question is (1) the market is an oligopoly and (2) there is a lot of inertia when it comes to database management systems. So change will happen, but it will happen slowly. And, ironically, the force that drives the market change will be overpricing on the leaders’ part. Were RDBMSs not so expensive, there would be less impetus to move to open source.

Now, the RDBMS vendors probably argue they should “milk” the market until the real threat emerges and then “wave a wand” to reduce price, but that is a risky strategy because they could very easily wave the wand too late, which is what I think they are doing.

The only point I think Allan misses in his analysis is that some powerful vendors like SAP and EMC don’t like the fact that their applications run on top of lower-level DBMS technologies from competitors. For example, SAP has been trying to get itself off Oracle for about a decade, and I’m told they fund developers to work on MySQL towards that end. I know that EMC/Documentum is not comfortable that the vendors who provide the DBMSs they run on are all now challenging them in content management (e.g., Oracle/Stellent, IBM/FileNet, Microsoft SharePoint).

He then speculates on what he thinks will happen going forward:

My vote for the Strategy Most Likely To Succeed is a tie between Revenue Pull-Through and Reduce Prices. Oracle is arguably becoming the most successful proponent of the pull-through strategy. Oracle wants to supply you with a full software stack, including an OS, virtualization software, a broad range of middleware, a database, and end user applications. The largest component of Oracle’s revenue currently still comes from database licenses, but the company is working hard to reduce that dependency. Until that happens, reducing prices across the board will be challenging for Oracle. If Oracle succeeds with a pull-through strategy, it doesn’t mean that OSDBs will fail, of course. It simply means that Oracle is less likely to sustain major damage from their success.

He concludes:

Are proprietary databases doomed, then? Not at all. Even if proprietary database companies pull no surprises, they won’t fade away anytime soon … Make no mistake, though, open source databases are coming. For established companies it’s more likely to be an evolution than a revolution.

I believe there are two major trends in the DBMS market today: (1) open source chipping away at the closed-source oligopoly, and (2) special-purpose DBMSs innovating and carving out niches in the soft underbelly. I actually think trend 1 provides powerful “air cover” for vendors pursuing trend 2, because trend 1 is a direct attack on the incumbents’ existing business.

How The Web Disrupts the RDBMS World

I found an interesting post on The Future of Software minisite run by the GigaOM network, best known for Om Malik and his GigaOM blog. The post is entitled “Data 2.0: How the Web disrupts our relational database world” and is written by Nitin Borwankar.

The post begins with:

The great online shift is creating massive amounts of data – whether it is videos on YouTube or social networking profiles on MySpace. And that data is stored in databases, making them the key component of the new web infrastructure. But managing that information isn’t easy

I think he nails the problem statement. The Web world is changing fast. And relational databases are having trouble keeping up.

The good news is that database management will be vastly different in the future. In fact, change has already begun; it just isn’t (cliché alert!) “evenly distributed” yet.

He then goes on to describe some leading examples of companies or problems that are pushing the relational database envelope.

  1. Yahoo’s creation of its own user management software based on BerkeleyDB
  2. Google’s MapReduce
  3. Amazon’s S3 (simple storage service) and SQS (simple queue service) which externalize operations normally done by a database.
  4. The general use of Lucene, Nutch, and Solr to do indexing of unstructured content, “something an old relational database cannot do well.”
  5. The graph-structured data problem (also known as the parts explosion problem) inherent in social networking and which remains an Achilles’ heel for relational databases
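To see why the parts explosion problem is awkward for relational databases: the depth of the hierarchy is not known in advance, so the query cannot be written as a fixed number of self-joins. SQL:1999 added recursive queries (WITH RECURSIVE) for this case, though product support has historically been uneven. The sketch below, a minimal Python illustration with a hypothetical bill of materials, shows the traversal itself. It assumes the graph is acyclic; a social graph, which can contain cycles, would also need a visited set.

```python
from collections import defaultdict

# Hypothetical bill of materials: assembly -> list of (component, quantity per assembly)
BOM = {
    "bicycle": [("wheel", 2), ("frame", 1)],
    "wheel": [("spoke", 32), ("rim", 1)],
    "frame": [("tube", 3)],
}

def explode(bom, part, qty=1, totals=None):
    """Recursively expand a part into total quantities of its leaf components.

    Assumes the graph is acyclic; the recursion depth equals the hierarchy depth.
    """
    if totals is None:
        totals = defaultdict(int)
    children = bom.get(part)
    if not children:                 # leaf part: no sub-components
        totals[part] += qty
        return totals
    for child, per_unit in children:
        # Quantities multiply down each path of the hierarchy
        explode(bom, child, qty * per_unit, totals)
    return totals

print(dict(explode(BOM, "bicycle")))  # {'spoke': 64, 'rim': 2, 'tube': 3}
```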

So while I generally agree with his thesis, the examples cited are basically all technology companies who are able to write their own system-level software to bypass and/or accommodate the limitations of relational databases.

My question is: what about everybody else? What are they supposed to do?

My short answer is — perhaps not shockingly — MarkLogic. At MarkLogic, we call Data 2.0 “content.”

  • We manage XML natively
  • We manage graph-structured data easily
  • We store, manage, search, and index text and XML natively

Some companies will always be able to write their own stuff to get around problems. But the reason MarkLogic exists is to provide a commercial DBMS that “the rest of us” can use when managing content and building web applications with it.

See this post on top-to-bottom XML for more.

Stonebraker's "One Size Fits All" Papers

As frequent readers know, one of my memes is the rise of special-purpose databases, whether they be data warehouse appliances like Netezza, stream databases like Streambase, or OLAP (aka multi-dimensional) databases like Essbase, recently purchased by Oracle through the Hyperion acquisition.

I believe that MarkLogic is one of a class of special-purpose DBMSs that will be necessary to handle new requirements that were never envisioned when the RDBMS was born. The relational database is now pushing 40 years since its invention (and pushing 30 since the first commercial implementations).

An easy way of seeing the problem is to think about the computers you used even 20 years ago, their disk and memory configuration, their network connection speed, the types of data they managed, and the applications they ran. For me, that would be a 1 MIPS MicroVAX II with 8MB of memory, 256 MB of disk space, 40 users (among other things I was the sysadmin), and we used it to run a technical support call tracking system at Ingres, then known as Relational Technology, Inc.

While RDBMSs have proven remarkably extensible, for certain classes of applications (e.g., ultra-low latency trading) and databases (e.g., managing tens to hundreds of terabytes of XML documents), they are simply not appropriate.

As it turns out, I’m not the only person who sees this problem. Michael Stonebraker, noted computer science professor (formerly of UC Berkeley and now of MIT), serial entrepreneur (a founder of Ingres, Illustra, Cohera, Streambase, and Vertica), and general database visionary, thinks the same thing.

Towards that end, he co-authored two papers:

  • One Size Fits All: An Idea Whose Time Has Come and Gone. This paper makes the argument that the relational database cannot be extended ad infinitum, demonstrates how RDBMSs are inappropriate for several new applications, and argues that the DBMS market will fragment into a series of special-purpose engines, perhaps unified by a common front-end parser.
  • One Size Fits All: Part 2, Benchmarking Results. This paper buttresses the first with benchmark results for relational vs. special-purpose databases in several applications. Interestingly and pragmatically, Stonebraker argues that most people won’t even consider a special-purpose database (largely due to inertia) unless it is at least 10x faster than relational for a given application. He then demonstrates several applications where you can see 10 – 100x gains in performance. (Large text and XML contentbases are one of the cases he discusses, citing Google’s creation of its own file system and software stack to deal with Internet-scale documentbases.)

I have always found Stonebraker’s work very clear; he’s one of the few authors of academic computer science literature whose work I can always read and understand. Take a look at the articles.

If you’re not up for the papers, then here’s an interview in Red Hat Magazine that hits many of the key points. (But bear in mind he’s doing PR for Vertica here, so the examples are a bit biased towards column-orientation, and I’m sure the webinar mentioned at the bottom is a Vertica one.)

Celebrating XML Independence

Today, I’d like to highlight a (4th of July holiday) post on Matt Turner’s Discovering XQuery blog. Matt’s post refers to this article, entitled XQuery: The Server Language, on XML.com, written by Kurt Cagle.

I’d read Kurt’s article when it was posted on June 6 and had meant to blog on it, but didn’t get around to it (or frankly, much blogging at all) during the busy month of June. Nevertheless, here are a few chunky morsels from Kurt’s article:

As an XML developer, one of the problems that I come across almost invariably within these [server-side scripting] languages is the fact that they are shaped by people who view XML as something of an afterthought, a small subset of the overall language that’s intended to satisfy those strange people who think in angle brackets.

He then shows an example (that warmed Matt Turner’s heart) of how often people have to create HTML by composing strings in-line. More morsels:

The original intent of the developers of XQuery was to use it, not surprisingly, as an XML-oriented query language. XQuery is not itself XML based (nor for that matter is XPath), but all of its operations are designed to work with XML documents or XML databases to provide a way of filtering or manipulating that XML to produce some form of output, most typically as XML or HTML.

Intriguingly, as a filter on XML, XQuery has seen only limited success. Part of this has to do with the fact that a significant number of the databases currently in use are SQL based, not XML based, so the benefits to be gained by using an XML query filter are offset by the need to convert relational data into XML in the first place.

While I’d agree with Kurt thus far on the market adoption of XQuery and the hassle introduced by having to map XML to an RDBMS (see this post on Top-to-Bottom XML Apps), we at Mark Logic like to think of ourselves as the exception to the slow-XQuery-adoption rule. XQuery is not a huge wind at our back, but we have grown the company eight-fold since I joined in 3Q04, and that growth has most definitely been helped by the de-risking XQuery brings, both as an industry standard and as an eventual, inexorable replacement for SQL.

(If green is the new black, then XQuery is the new SQL, and SQL the new COBOL.)
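For a sense of the relational-to-XML conversion step Kurt mentions, here is a minimal Python sketch using the standard-library sqlite3 and xml.etree.ElementTree modules. The table, columns, and the rows_to_xml helper are hypothetical, and the mapping (one element per row, one child element per column) is just one simple convention, not how any particular product does it. Building the tree with an XML API rather than composing strings in-line also means special characters such as "&" are escaped automatically.

```python
import sqlite3
import xml.etree.ElementTree as ET

def rows_to_xml(conn, query, root_tag, row_tag):
    """Run a SQL query and wrap each result row as an XML element,
    using the column names as child element tags (hypothetical helper)."""
    cur = conn.execute(query)
    columns = [desc[0] for desc in cur.description]
    root = ET.Element(root_tag)
    for row in cur:
        row_el = ET.SubElement(root, row_tag)
        for name, value in zip(columns, row):
            child = ET.SubElement(row_el, name)
            child.text = "" if value is None else str(value)
    return ET.tostring(root, encoding="unicode")

# Hypothetical in-memory table standing in for an existing relational database
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE article (id INTEGER, title TEXT)")
conn.execute("INSERT INTO article VALUES (1, 'XQuery & SQL')")
print(rows_to_xml(conn, "SELECT id, title FROM article", "articles", "article"))
# <articles><article><id>1</id><title>XQuery &amp; SQL</title></article></articles>
```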

Kurt concludes his article with:

This article serves as a very basic introduction to XQuery as a server language. I will be addressing this topic in more detail in subsequent articles in this series, examining some of the more sophisticated capabilities and the gotchas inherent in working with XQuery and eXist, and showing what explosive power you can release when you combine eXist or other rest based XQuery engines with XForms and Ajax.

My prediction is that REST based XML databases like eXist will seriously challenge the existing raft of server languages, from ASP to Ruby, within the next couple of years. Right now, it’s something of a closed secret among a few developers, but the power, sophistication and ease of use inherent in working with the XML as if it were a natural part of the server landscape can only be understood by trying it.

I couldn’t agree more with the bolded statement and we all look forward to seeing the subsequent articles in the series.

Tacit, Illumio, and David Gilmour

I first met David Gilmour in 1992 when I was leaving Ingres and looking for a new job. At the time, Gilmour was head of marketing and designated CEO successor at Versant Object Technology. He had joined the company after a successful run at Lotus. With three degrees from Harvard, David was one of the most passionate, smartest, and best educated people I’d ever met.

David hired me in June 1992 to be director of product marketing. Matt Miller, later a VP at Remedy, CEO of Moai, and now a VC at Walden, was our director of marketing communications. Carol Garnett ran our marketing programs. (Last I heard she was doing fantastic non-profit marketing work at Cityteam Ministries.) It was a great team.

The fun didn’t last long, however. That fall it became clear that object databases were not going to do unto relational databases as relational had done unto network (and hierarchical) databases. People began to see that a broad, horizontal object database play was simply not in the cards. (Poof; there went about $100M in venture capital invested across 6-8 companies, chasing the chance to be the next Oracle.)

I was promoted to marketing VP, and decided to stick around with the new CEO, Dave Banks (now a management consultant and interim CEO at Zend) to do a chasm-crossing strategy focused on telecom network management applications. Many others, including David Gilmour, left.

David founded a venture backed by Gideon Gartner (of Gartner Group fame) called ExperNet, which was a system designed to track experts and to connect people who needed expertise with those who had it. ExperNet was later folded into Giga Group, which in turn was acquired by Forrester Research.

After the ExperNet and Giga experience, David went on to found Tacit Knowledge Systems. While I was never that close to it, I’d always viewed Tacit’s product as an automated, email-based version of ExperNet; I thought of it as an email sniffer. It would watch the email traffic in an organization, and then decide who was an expert in which subjects based on the emails that they sent.
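To illustrate the general idea of inferring expertise from email (and not Tacit’s actual method, which I don’t know), here is a toy Python sketch that counts the terms each sender uses and ranks senders by how often they mention a topic. The sample messages and function names are hypothetical; a real system would need stop-word removal, phrase detection, and privacy controls at a minimum.

```python
from collections import Counter, defaultdict

def expertise_profiles(emails):
    """Build a per-sender term-count profile from (sender, body) pairs."""
    profiles = defaultdict(Counter)
    for sender, body in emails:
        profiles[sender].update(body.lower().split())
    return profiles

def find_experts(profiles, topic, top_n=3):
    """Rank senders by how often they mention the topic term (ties broken by name)."""
    scored = [(counts[topic], sender) for sender, counts in profiles.items() if counts[topic] > 0]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [sender for _, sender in scored[:top_n]]

# Hypothetical email traffic
emails = [
    ("alice", "xquery indexes and xquery performance"),
    ("bob", "lunch on friday"),
    ("carol", "quick xquery question"),
]
profiles = expertise_profiles(emails)
print(find_experts(profiles, "xquery"))  # ['alice', 'carol']
```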

So this is a long way of saying that I think about two things when I think about David. The first is “really smart.” The second is that he has devoted most of his career, some 15 years, to the general subject of expertise: how to identify it, how to track it, and how to leverage it.

Today, we might call this a form of social networking. But it’s really the knowledge identification side of knowledge management (KM).

To me, there are two basic approaches to KM. The first approach is to capture information in a knowledge base and to let people query that knowledge base. (Mark Logic is an excellent repository for such knowledge bases, and we are in use at the US Army’s Battle Command Knowledge System.) The second approach is to figure out “who knows what” and then direct those with questions to those with answers. It’s the basic approach of ExperNet and Tacit.

I was happy to see this cover story in the business section of today’s San Jose Mercury News featuring David, Tacit, and their new offering, Illumio. It seems to be a new, free version of the same basic concept, but this time leveraging data from Google Desktop Search in addition to email and other sources.

Excerpts:

“In fact, Illumio borrows the desktop search technology that was first released by Google — and subsequently by competitors — to discover what a person cares about. It analyzes e-mail, Web searches and documents stored on a computer’s hard drive and uses a mathematical formula to match that information with requests submitted by other Illumio users.”

“Illumio allows members to manually create profiles that list their areas of expertise, but Gilmour said the analysis of a person’s hard drive has proven to be more useful, because it can capture areas of knowledge a person might overlook.”