<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	xmlns:georss="http://www.georss.org/georss" xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#" xmlns:media="http://search.yahoo.com/mrss/"
	>

<channel>
	<title>Kellblog &#187; Database</title>
	<atom:link href="http://kellblog.com/category/database/feed/" rel="self" type="application/rss+xml" />
	<link>http://kellblog.com</link>
	<description>The official blog of Dave Kellogg</description>
	<lastBuildDate>Tue, 01 May 2012 01:07:36 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.com/</generator>
<cloud domain='kellblog.com' port='80' path='/?rsscloud=notify' registerProcedure='' protocol='http-post' />
<image>
		<url>http://0.gravatar.com/blavatar/8ecfcffdb3cd0948a0c38207c0ca38d6?s=96&#038;d=http%3A%2F%2Fs2.wp.com%2Fi%2Fbuttonw-com.png</url>
		<title>Kellblog &#187; Database</title>
		<link>http://kellblog.com</link>
	</image>
	<atom:link rel="search" type="application/opensearchdescription+xml" href="http://kellblog.com/osd.xml" title="Kellblog" />
	<atom:link rel='hub' href='http://kellblog.com/?pushpress=hub'/>
		<item>
		<title>My Slides from the MarkLogic Government Summit: &#8220;Relationertia&#8221;</title>
		<link>http://kellblog.com/2010/11/27/my-slides-from-the-marklogic-government-summit-relationertia/</link>
		<comments>http://kellblog.com/2010/11/27/my-slides-from-the-marklogic-government-summit-relationertia/#comments</comments>
		<pubDate>Sat, 27 Nov 2010 17:43:37 +0000</pubDate>
		<dc:creator>Dave Kellogg</dc:creator>
				<category><![CDATA[Database]]></category>
		<category><![CDATA[Database management system]]></category>
		<category><![CDATA[Government]]></category>
		<category><![CDATA[RDBMS]]></category>
		<category><![CDATA[relational database]]></category>

		<guid isPermaLink="false">http://kellblog.com/?p=7365</guid>
		<description><![CDATA[Below please find an embedded copy of the slides I presented a few weeks back at the MarkLogic Government Summit at the Ritz-Carlton in Tyson&#8217;s Corner. I had three fun quotes/concepts from this session. First, I created a new word &#8230; <a href="http://kellblog.com/2010/11/27/my-slides-from-the-marklogic-government-summit-relationertia/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>Below please find an embedded copy of the slides I presented a few weeks back at the <a href="http://www.marklogic.com/government-summit/">MarkLogic Government Summit</a> at the Ritz-Carlton in Tyson&#8217;s Corner.</p>
<p>I had three fun quotes/concepts from this session.</p>
<p>First, I created a new word to describe all the reasons organizations use <a href="http://en.wikipedia.org/wiki/RDBMS">relational databases</a> to try and solve problems for which they were never designed and at which they are suboptimal:  <strong>relationertia</strong>.  You know those reasons:</p>
<ul>
<li>It&#8217;s safe</li>
<li>We have it already</li>
<li>It&#8217;s what we know</li>
<li>It&#8217;s free at the project level (if expensive at the agency one)</li>
</ul>
<p>The fact is relational databases are about 40 years old and were never designed to solve some of the problems that government agencies are throwing at them.  To drive home the age point, I made a list of &#8220;other things&#8221; that happened in 1970, the year that <a href="http://portal.acm.org/citation.cfm?id=362685">Codd&#8217;s seminal paper</a> was published.</p>
<ul>
<li>Janis Joplin died</li>
<li>The Beatles broke up, after releasing Let It Be</li>
<li>The first 747 entered service</li>
<li>The first episode of <a href="http://en.wikipedia.org/wiki/All_My_Children">All My Children</a> aired</li>
</ul>
<p>It was a long time ago.  (And that was the second fun thing.)</p>
<p>The third fun thing was to dust off one of my favorite old saws:  if your only tool&#8217;s a hammer, then every problem looks like a nail.  Or, as I more colorfully saw on Twitter today:  if your only tool&#8217;s a chainsaw, then every problem looks like a Zombie.</p>
<p>Applying this idea to relational databases, we come up with:</p>
<blockquote><p>If your only data modeling element&#8217;s a table, then every problem looks like a column.</p>
</blockquote>
<p>The slides are embedded below.</p>
<iframe src='http://www.slideshare.net/slideshow/embed_code/5934155' width='500' height='410'></iframe>
]]></content:encoded>
			<wfw:commentRss>http://kellblog.com/2010/11/27/my-slides-from-the-marklogic-government-summit-relationertia/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="" medium="image">
			<media:title type="html">Dave Kellogg</media:title>
		</media:content>
	</item>
		<item>
		<title>The Information Continuum and the Three Types of Subtly Semi-Structured Information</title>
		<link>http://kellblog.com/2010/05/11/the-information-continuum-and-the-three-types-of-subtly-semi-structured-information/</link>
		<comments>http://kellblog.com/2010/05/11/the-information-continuum-and-the-three-types-of-subtly-semi-structured-information/#comments</comments>
		<pubDate>Tue, 11 May 2010 18:24:45 +0000</pubDate>
		<dc:creator>Dave Kellogg</dc:creator>
				<category><![CDATA[Database]]></category>
		<category><![CDATA[Database management system]]></category>
		<category><![CDATA[semi-structured data]]></category>
		<category><![CDATA[unstructured data]]></category>
		<category><![CDATA[information continuum]]></category>
		<category><![CDATA[SSI]]></category>
		<category><![CDATA[subtly semi-structured information]]></category>

		<guid isPermaLink="false">http://www.kellblog.com/?p=4804</guid>
		<description><![CDATA[We generally refer to MarkLogic Server as an XML server, which is a special-purpose database management system (DBMS) for unstructured information.  This often sparks debate about the term &#8220;unstructured&#8221; and the information continuum in general.  Surprisingly, while both analysts and &#8230; <a href="http://kellblog.com/2010/05/11/the-information-continuum-and-the-three-types-of-subtly-semi-structured-information/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>We generally refer to <a href="http://www.marklogic.com/product/marklogic-server.html">MarkLogic Server</a> as an XML server, which is a special-purpose database management system (DBMS) for unstructured information.  This often sparks debate about the term &#8220;unstructured&#8221; and the information continuum in general.  Surprisingly, while both <a href="http://www.gartner.com/DisplayDocument?id=1197013">analysts</a> and vendors frequently discuss the concept, the <a href="http://en.wikipedia.org/wiki/Information_continuum">Wikipedia entry for information continuum</a> is weak, and I couldn&#8217;t easily find a nice picture of it, so I decided to make my own.</p>
<p style="text-align:center;"><a href="http://davidkellogg.files.wordpress.com/2010/05/ic2.png"><img class="size-full wp-image-4805 aligncenter" title="Information Continuum" src="http://davidkellogg.files.wordpress.com/2010/05/ic2.png?w=500" alt=""   /></a></p>
<p>The general idea that information spans a continuum with regard to structure is pretty much undisputed.  The placement of any given type of information on that continuum is more problematic.  While it seems clear that purchase orders are highly structured and that free text is not, the placement of, for example, email is more interesting.  Some might argue that email is unstructured.  In fact, only the body of an email is unstructured and there is plenty of metadata (e.g., from, send-to, date, subject) wrapping an email.  In addition, an email&#8217;s body actually does have latent structure: while it may not be explicit, you typically have a salutation followed by numerous paragraphs of text, a sign-off, a signature, and perhaps a legal footer.  Email is unquestionably semi-structured.</p>
<p>In fact, I believe that the vast majority of information is semi-structured.  PowerPoint decks have slides, slides have titles and bullets.  Contracts are typically Word documents, but have more-or-less standard sections.  Proposals are usually Word or PowerPoint documents that tend to have similar structures.  Even the humble tweet is semi-structured:  while the contents are ostensibly 140 unstructured characters, the <a href="http://www.scribd.com/doc/30146338/map-of-a-tweet">anatomy of a tweet</a> reveals lots of metadata (e.g., location) and even the contents contain some structural information (e.g., RT indicating re-tweet or #hashtags serving as topical metadata).</p>
<p>Now let&#8217;s consider XML content.  Some would argue that XML is definitionally structured.  But I&#8217;d say that an arbitrary set of documents all stored within &lt;document&gt; and &lt;/document&gt; tags is only <em>faux</em> structured; it appears structured because it&#8217;s XML, but the XML is just used as a container.  A corpus of twenty 2,000-page medical textbooks in 6 different schemas is indeed structured, but not well structured.  To paraphrase an old saw about standards:  the nice thing about structures is that there are so many to choose from.  I believe that knowing content is marked up in XML reveals nothing about its structure, i.e., that XML-ness and structure are orthogonal.  Put differently, XML is simply a means of representing information.  The information represented may be highly structured (e.g., 100 purchase orders all in perfect adherence to a given schema) or highly unstructured (e.g., 20 documents only vaguely complying with 20 different schemas).</p>
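<p>One rough way to make the &#8220;XML-ness and structure are orthogonal&#8221; point concrete is to count how many distinct element layouts appear in a corpus.  The sketch below is only an illustration: the two small corpora and the helper names <code>element_paths</code> and <code>distinct_shapes</code> are invented for this example, and counting path sets is a crude proxy for structure, not a real schema check.</p>

```python
import xml.etree.ElementTree as ET


def element_paths(xml_text):
    """Return the set of element paths (e.g. 'order/item/sku') used in one document."""
    root = ET.fromstring(xml_text)
    paths = set()

    def walk(node, prefix):
        # Build the slash-separated path from the root down to this element.
        path = f"{prefix}/{node.tag}" if prefix else node.tag
        paths.add(path)
        for child in node:
            walk(child, path)

    walk(root, "")
    return paths


def distinct_shapes(corpus):
    """Count how many different path sets appear across a corpus of XML strings."""
    return len({frozenset(element_paths(doc)) for doc in corpus})


# Hypothetical corpora, invented purely for illustration.
purchase_orders = [
    "<order><id>1</id><item><sku>A</sku><qty>2</qty></item></order>",
    "<order><id>2</id><item><sku>B</sku><qty>5</qty></item></order>",
]
wrapped_documents = [
    "<document>Free text with no markup at all.</document>",
    "<document><title>Memo</title><body>Some text.</body></document>",
    "<document><chapter><para>Other text.</para></chapter></document>",
]

print(distinct_shapes(purchase_orders))    # every order shares one layout
print(distinct_shapes(wrapped_documents))  # each wrapped document has its own
```

<p>Both corpora are well-formed XML, yet the purchase orders collapse to a single layout while the wrapped documents share nothing beyond the outer tag.</p>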
<p>I have two primary beliefs about the information continuum:</p>
<ul>
<li><strong>The vast majority of information is semi-structured</strong>. There is relatively little highly structured and relatively little completely unstructured information out there.  Most information lies somewhere in the fat middle.  I overlaid a bell curve on top of the information continuum to reflect volume.</li>
</ul>
<ul>
<li><strong>Even information that initially appears structured is often semi-structured</strong>.  I see three types of this subtly semi-structured information which, hopefully without being too cute, I&#8217;ll abbreviate as SSSI.  The three types are (1) schema as aspiration, (2)  time-varying schema, and (3) unknowable schema.</li>
</ul>
<p>Let&#8217;s look at each of the three types more closely.</p>
<p><strong>Schema as Aspiration</strong></p>
<p>The first type of subtly semi-structured information (SSSI) is where a schema exists, but only notionally.  The schema itself is either poorly defined (actual quote:  &#8220;it is believed that this element is used for&#8221;) or well defined but not followed.  This is frequently the case with publishing and media companies.  Here are two free jokes that work well at any publishing conference:</p>
<ul>
<li>Raise your hand if you have a standard schema.  Keep it up if your content actually adheres to it.</li>
<li>Oxymorons aside, how many of you have 3 or more &#8220;standard&#8221; schemas, 5 or more, &#8230; do  I hear 10?</li>
</ul>
<p>These jokes are funny because of the state of the content.  This state is the result of two primary business trends:  (1) consolidation &#8212; most large publishers have been built through M&amp;A thus inheriting numerous different standards, each of which may be only partly implemented &#8212; and (2) licensing &#8212; publishers frequently license content from numerous other sources, each with its own standard format.</p>
<p><strong>Time-Varying Schema</strong></p>
<p>The second case of SSSI is where you have a well-defined, enforced schema at any moment in time, but it keeps changing over time.  Typically this happens for one of two reasons:</p>
<ul>
<li>The business reality that you&#8217;re modeling is changing.  For example, in 2009 Federal Sales was part of Eastern Sales but in 2010 it becomes its own division.  This makes comparison of Eastern results between 2009 and 2010 potentially difficult.  In BI circles, this is known as the slowly changing dimension problem.</li>
</ul>
<ul>
<li>Standards keep changing.  If you&#8217;re modeling information in a corporate- or industry-standard schema and that schema is changing, then your information becomes semi-structured because it is contained within multiple different schemas.  Sometimes you can avoid this by migrating all prior information to the current schema, but sometimes (e.g., massive data volumes, regulatory desire to not change existing records) you will not.</li>
</ul>
<p>When viewed with a flash camera this information looks well structured.  When you look at the movie, you can clearly see that it&#8217;s not.</p>
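<p>The Federal Sales example can be sketched in a few lines.  Everything here is hypothetical: the team names, the bookings figures, and the <code>rollup</code> helper are invented to show why the same division label means different things in different years, and why a like-for-like comparison needs one year&#8217;s data restated under the other year&#8217;s org chart.</p>

```python
# Hypothetical org charts keyed by year: which division each sales team rolls up to.
ORG_BY_YEAR = {
    2009: {"Boston": "Eastern", "New York": "Eastern", "Federal": "Eastern"},
    2010: {"Boston": "Eastern", "New York": "Eastern", "Federal": "Federal"},
}

# Invented bookings figures (in $K) per team and year.
BOOKINGS = {
    2009: {"Boston": 100, "New York": 150, "Federal": 50},
    2010: {"Boston": 110, "New York": 160, "Federal": 70},
}


def rollup(data_year, org_year):
    """Total one year's bookings by division, using the org chart of org_year."""
    totals = {}
    for team, amount in BOOKINGS[data_year].items():
        division = ORG_BY_YEAR[org_year][team]
        totals[division] = totals.get(division, 0) + amount
    return totals


as_reported_2009 = rollup(2009, 2009)  # Federal is still inside Eastern
as_reported_2010 = rollup(2010, 2010)  # Federal is its own division
restated_2009 = rollup(2009, 2010)     # 2009 data under the 2010 org chart

print(as_reported_2009)
print(as_reported_2010)
print(restated_2009)
```

<p>Compared as reported, Eastern appears to shrink from 2009 to 2010; restated under the 2010 org chart, it grew.  The data did not change, only the schema around it.</p>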
<p><strong>Unknowable Schema</strong></p>
<p>The last case of SSSI is where you have an unknowable schema.  Consider terrorist tracking.  If you were to make a schema for a terrorist database, here are some of the attributes that spring to mind:  name, alias(es), address, former address(es), height, weight, hair color, eye color, member-of, enemy-of, friend-of, tattoos/markings.</p>
<p>Here are some problems with this:</p>
<ul>
<li>Many of the attributes are multi-valued, such as alias or friend-of.  In a de-normalized approach, this means dealing with <a href="http://encyclopedia2.thefreedictionary.com/repeating+group">repeating group</a> problems and creating N columns (e.g., alias, alias1, alias2, and up to the maximum number of aliases for any terrorist).  Normalization would take care of the repeating group but at the cost of creating a table for each multi-valued attribute and then having to join back to those tables when you run queries.  (One such real system ended up with 500 tables, with the result that no one could find anything.)</li>
</ul>
<ul>
<li>It is difficult to create a type for the tattoo attribute.  First, it&#8217;s multi-valued.  Second, while tattoos are sometimes images, they often contain text (e.g., Mom) and sometimes in a foreign language (e.g., 愛, the Chinese symbol for love).  Since you&#8217;re trying to secure the nation against threat you don&#8217;t want to throw away any potentially valuable information, but it&#8217;s not obvious how to store this.</li>
</ul>
<ul>
<li>New attributes are coming all the time.  Say you get a shoe print on a suspect as he runs away.  You need to add a shoe-size attribute to the database.  Say a terrorist runs away and leaves a pair of eyeglasses.  Now we need to add eyeglass prescription.  My favorite is what&#8217;s called pocket litter.  You find a piece of paper in a person&#8217;s pocket and it has a number on it.  It could be a phone number, a lock combination, or maybe map coordinates.  You don&#8217;t know what it is, but again, since you don&#8217;t want to throw away any potentially valuable information, you have to find a place to store it.</li>
</ul>
<ul>
<li>Combining an enormous number of potential attributes with the reality that very few are known for most individuals creates two problems:  (1) you end up with a sparse table which is not well handled in most RDBMSs and (2) you end up hitting <a href="http://ora-01792.ora-code.com/">column limits</a>.</li>
</ul>
<p>Another example of unknowable schemas would be in financial services, modeling derivatives.   Because derivatives are sometimes long-lived instruments (e.g., 30 years) you may face the time-varying schema problem.  In addition, you have the unknowable schema problem because the industry is constantly creating new products.  First we had <a href="http://en.wikipedia.org/wiki/Collateralized_debt_obligation">CDOs</a> and <a href="http://en.wikipedia.org/wiki/Collateralized_debt_obligation">CDSs</a> on banks, then <a href="http://en.wikipedia.org/wiki/Single_tranche">single-tranche CDOs</a>, then CDSs on single-tranche CDOs, and then <a href="http://en.wikipedia.org/wiki/Synthetic_CDO">synthetic CDOs</a>.  If this makes your head hurt in terms of understanding, then think for a minute about data modeling.  How are you going to store these complex products in a database?   And what are you going to do with the never-ending stream of new ones &#8212; last I heard they were considering selling <a href="http://articles.latimes.com/2010/apr/25/opinion/la-ed-derivatives-20100425-25">derivatives on movies</a>.</p>
<p>(As it turns out XML is a great way to model both these problems as you can  easily add new attributes on the fly and only provide values for  attributes where you know them.)</p>
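<p>As a minimal sketch of that idea, the snippet below builds two records with Python&#8217;s standard <code>xml.etree.ElementTree</code> module.  The element names, the sample values, and the helpers <code>make_person</code> and <code>values_of</code> are all hypothetical (and <code>shoe_size</code> uses an underscore only because Python keyword arguments cannot contain hyphens); this is not how any particular product stores such data.</p>

```python
import xml.etree.ElementTree as ET


def make_person(name, **attributes):
    """Build a <person> element; each keyword holds one value or a list of values."""
    person = ET.Element("person")
    ET.SubElement(person, "name").text = name
    for tag, value in attributes.items():
        # A multi-valued attribute simply becomes repeated child elements.
        values = value if isinstance(value, list) else [value]
        for item in values:
            ET.SubElement(person, tag).text = str(item)
    return person


def values_of(person, tag):
    """Return every value recorded for one attribute (empty list if unknown)."""
    return [element.text for element in person.findall(tag)]


# Hypothetical records: each carries only the attributes that are actually known.
first = make_person("Person A", alias=["Alpha", "Al"], tattoo="Mom")
second = make_person("Person B", shoe_size=11)

# A brand-new attribute arrives later; no table needs to be altered.
ET.SubElement(first, "pocket_litter").text = "555-0100"

print(values_of(first, "alias"))
print(values_of(second, "alias"))
print(ET.tostring(second, encoding="unicode"))
```

<p>Repeated elements stand in for the repeating group, absent elements stand in for the sparse columns, and the new <code>pocket_litter</code> element appears on one record without touching the other.</p>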
<p>To finish the post, I&#8217;ll revisit the statement I started with:  we generally refer to <a href="http://www.marklogic.com/product/marklogic-server.html">MarkLogic  Server</a> as an XML server, a special-purpose database management  system (DBMS) for <strong>unstructured </strong>information.  Going forward, I think I&#8217;ll keep saying that because it&#8217;s simpler, but at the MarkLogic 201 level, the more precise statement is:  a special-purpose DBMS for<strong> semi-structured</strong> information.</p>
<p>There&#8217;s way more semi-structured information out there.  Realizing that information is semi-structured is sometimes subtle.  And semi-structured information is, in fact, the optimization point for our product.  So what&#8217;s MarkLogic in three concepts?  Speed, scale, and semi-structured information.</p>
]]></content:encoded>
			<wfw:commentRss>http://kellblog.com/2010/05/11/the-information-continuum-and-the-three-types-of-subtly-semi-structured-information/feed/</wfw:commentRss>
		<slash:comments>6</slash:comments>
	
		<media:content url="" medium="image">
			<media:title type="html">Dave Kellogg</media:title>
		</media:content>

		<media:content url="http://davidkellogg.files.wordpress.com/2010/05/ic2.png" medium="image">
			<media:title type="html">Information Continuum</media:title>
		</media:content>
	</item>
		<item>
		<title>Dear CIO: Stop Writing Big Checks for Commodity (Database) Software</title>
		<link>http://kellblog.com/2009/10/14/dear-cio-stop-writing-big-checks-for-commodity-database-software/</link>
		<comments>http://kellblog.com/2009/10/14/dear-cio-stop-writing-big-checks-for-commodity-database-software/#comments</comments>
		<pubDate>Wed, 14 Oct 2009 17:11:00 +0000</pubDate>
		<dc:creator>Dave Kellogg</dc:creator>
				<category><![CDATA[Database]]></category>
		<category><![CDATA[Database management system]]></category>
		<category><![CDATA[IBM]]></category>
		<category><![CDATA[Microsoft]]></category>
		<category><![CDATA[MySQL]]></category>
		<category><![CDATA[Open source]]></category>
		<category><![CDATA[Oracle]]></category>

		<guid isPermaLink="false">http://test.kellblog.com/2009/10/14/dear-cio-stop-writing-big-checks-for-commodity-database-software/</guid>
		<description><![CDATA[Dear CIO, What’s wrong with this picture? At 50%+, Oracle’s operating margins have never been higher. The differentiation of Oracle’s database technology, however, has never been lower and the number of both core and specialized alternatives has never been greater. So &#8230; <a href="http://kellblog.com/2009/10/14/dear-cio-stop-writing-big-checks-for-commodity-database-software/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>Dear CIO,</p>
<p>What’s wrong with this picture?</p>
<ul>
<li>At 50%+, Oracle’s operating margins have never been higher</li>
</ul>
<ul>
<li>The differentiation of Oracle’s database technology, however, has never been lower and the number of both core and specialized alternatives has never been greater.</li>
</ul>
<p>So what’s going on?  You, kind Sir or Madam, are being milked.  What’s worse is that you, in an example of collective behavioral dysfunction, have inadvertently played a role in setting up the milking.  What happened?</p>
<ul>
<li>Like all smart CIOs you followed a bit of herd mentality when it came to core technology.  Pity the poor fools who, back in the day, bet big on Ingres or Sybase.  <strong>You played it safe</strong> and went with Oracle, IBM, or if your requirements weren’t too heavy, Microsoft. </li>
</ul>
<ul>
<li>The problem is, of course, that <strong>everyone executed the same strategy</strong> you did.  Hence, the market created a system of <a href="http://www.softwaretimes.com/files/increasing%20returns.html" target="_blank">increasing returns</a> where the strong vendors got stronger and the weak ones died.  The result:  the RDBMS market is an (order of magnitude) $10B/year market, structured as an <a href="http://en.wikipedia.org/wiki/Oligopoly" target="_blank">oligopoly</a> with 3 players.  Most other software markets worked out the same way.</li>
</ul>
<ul>
<li><strong>You were focused on standardization</strong>.  You realized that through a combination of decentralized IT decision making and growth-by-acquisition your organization had become a kitchen sink of enterprise software.  You had everything.  In order to reduce the administrative, training, and license acquisition costs, you fought tooth and nail with your divisions to standardize the environment.  You said, “Heck, it’s all the same stuff in the end, folks, so let’s make Oracle our DBMS standard, Business Objects our BI standard, Documentum our ECM standard, and SAP our ERP standard.”</li>
</ul>
<ul>
<li><strong>And you won</strong>.  Mostly.  There’s still some Cognos in finance.  And marketing didn’t totally give up on Interwoven.  But, for the most part, you won.  You reduced the entropy of your IT environment and drove cost savings for your organization.</li>
</ul>
<p>The problem is <strong>you’ve won the battle but lost the war</strong>.  Why?   Because if, as you say, the “stuff really is all the same” you shouldn’t standardize on the most expensive product.  You should standardize on the cheapest.  </p>
<ul>
<li>Do you really need to be paying those big fees to Oracle for enterprise licenses?  Wouldn’t MySQL do?</li>
</ul>
<ul>
<li>Are you really using all the functionality of that $1M/year Documentum ECM system?  Wouldn’t SharePoint or Alfresco do?</li>
</ul>
<ul>
<li>For BI, do you need all the bells and whistles of BusinessObjects?  Wouldn’t Pentaho or Qlikview do a fine job, at a fraction of the cost?</li>
</ul>
<p>But these alternatives are obvious.  Heck, even &#8220;the establishment&#8221; (i.e., Gartner) says <a href="http://www.gartner.com/DisplayDocument?id=1183714" target="_blank">it’s safe to tread in the open source water</a>.  So the question is, what’s holding you back?</p>
<ul>
<li><strong>Switching costs</strong>.  It’s hard to move off Oracle or Documentum and you don’t want to pay the nut to do so.  </li>
</ul>
<ul>
<li><strong>Organizational inertia</strong>.  Your whippersnapper DBAs who were in their 30s in the 1980s are now in their 50s.  They’re thinking that change devalues their knowledge and experience; some just want to cruise into retirement. But that’s their personal agenda, not your enterprise one. </li>
</ul>
<ul>
<li><strong>Accounting:  you made it free for your divisions</strong> to keep using Documentum, Oracle, or BusinessObjects because you bought an enterprise license.  While this appeared to “save” you money on a per-license basis, and it helped support your standardization initiative, it squashed innovation in your divisions, reinforced the organizational inertia, and has a lot of people using the wrong tool for the job, resulting in projects that require more hardware, or more expensive hardware, than necessary (Oracle is good at this), that take too long to develop, or that simply fail.  </li>
</ul>
<p>So, what do I recommend doing about all this?  I suggest that you adopt these policies, which, for full disclosure, are at least partially in the self-interest of this blog’s author:</p>
<ul>
<li><strong>Stop writing big checks for commodity software</strong>.  Every time a big check comes along, ask yourself:  is this software differentiated or commoditized?  Be willing to pay a premium for differentiated software, and price shop commodity software.  Call a group of your smartest staff together periodically to help you make the commodity versus differentiated call.</li>
</ul>
<ul>
<li><strong>When you see a big check coming for commodity software, make a migration plan</strong>.  My hunch is that most of the time, you can create a nice 3-year ROI in the transition from premium to cheaper software.  (This reminds me of the time I visited an investment bank’s CIO asking about their Documentum strategy.  The answer: “our Documentum strategy is to get off Documentum,” because we&#8217;re paying too much and using too little.)</li>
</ul>
<ul>
<li><strong>Stop doing enterprise agreements that create poor economic incentives within your organization</strong>.  Don’t pay $XM at the enterprise level, spread that as a “tax” across your divisions, and then make use of certain software “free.”  It distorts project reality, creates false incentives, squashes innovation, and generates lots of hidden costs.  If you want to negotiate a master agreement and discount rate, that’s fine.  Shoot for centralized discounts without central planning. </li>
</ul>
<ul>
<li><strong>Don’t worry that the prior policies will create mayhem</strong>. While I understand that you don’t want arbitrary taste differences increasing the entropy of your enterprise software portfolio, recognize that with the first policy you’ve solved that problem already.  If you deem a category (e.g., core RDBMS, enterprise search) commoditized, then you are going to force people to pick on cost.  You’ll get standardization on the commodity categories, just on the least expensive alternatives.  The only entropy you’ll need to manage will be on the differentiated software which, having dispatched the commodity majority, you’ll have time to explore, study, and exploit.</li>
</ul>
<p>Why am I taking the time to write this note to you?  Back in the 1980s I was a foot soldier in the relational database revolution, and today I’m the CEO of one specialized DBMS company and on the board of another.</p>
<ul>
<li><a href="http://www.marklogic.com/" target="_blank">Mark Logic</a> makes an XML server which can save great amounts of time and money in creating applications against unstructured information, replacing the combination of an RDBMS, an enterprise search engine, and an application server.  Not only can Mark Logic manage hundreds of TB of XML, the system eliminates the object/relational/hierarchical impedance mismatch between Java, SQL, and XML that hampers developer productivity.  Mark Logic was recently named <a href="http://www.marketwire.com/press-release/Mark-Logic-Corporation-915832.html">the fourth fastest-growing IT company in Silicon Valley</a>.</li>
</ul>
<ul>
<li><a href="http://www.asterdata.com/" target="_blank">Aster Data</a> makes a specialized data warehouse DBMS  that runs on low-cost commodity hardware with a shared nothing architecture and leverages in-database <a href="http://en.wikipedia.org/wiki/MapReduce" target="_blank">MapReduce</a> technology for parallelism and high scalability.  </li>
</ul>
<p>And during the past 25 years or so I&#8217;ve watched the market evolve.  While I fully understand the policies and market forces that have led us to where we are, I feel like we&#8217;ve come full circle.  Vendor power is now concentrated in the big three.  Vendor margins top 50%.  Big vendors don&#8217;t innovate; they consolidate.  Inertia has set in at customer organizations.  And there&#8217;s a major platform shift in progress; last time it was mainframe to minicomputer, this time it&#8217;s cloud.</p>
<p>Things feel a lot to me the way they did in 1985, just past the dawn of the relational revolution.  So in one way I&#8217;m writing to point out the oft-overlooked obvious:  stop paying premium prices for commodity items.  And in another way I&#8217;m saying, take the money you save in so doing and invest it in innovative technologies that:</p>
<ul>
<li>Drive competitive advantage (which will matter again as we come out of the Great Recession)</li>
</ul>
<ul>
<li>Enable the Internet-scale applications you&#8217;ll need to face the coming information deluge</li>
</ul>
<ul>
<li>Reform the application development stack in ways that make sense for the coming generation of information applications, not that made sense for the last generation of data-centric ones.</li>
</ul>
<p>Thank you for reading my note.  If you have any questions or comments, please give me a ping at dave-dot-kellogg-at-marklogic-com or comment on this post.</p>
<p>Sincerely,</p>
<p>Dave Kellogg</p>
]]></content:encoded>
			<wfw:commentRss>http://kellblog.com/2009/10/14/dear-cio-stop-writing-big-checks-for-commodity-database-software/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
	
		<media:content url="" medium="image">
			<media:title type="html">Dave Kellogg</media:title>
		</media:content>
	</item>
	</channel>
</rss>
