Don't Trust Anyone Over 30

My friends joke that someday I should write a business book based on 1960s bumper stickers because I often re-use 1960s maxims (e.g., “question authority”) as basic rules for business. Today’s posting is based on the idea of “the establishment” and under what circumstances it can be trusted to provide unbiased information.

I am often amazed when I talk to members of the data management establishment about XML content. Sometimes they just look perplexed. Other times they accuse me of crazy talk. Most of the time, they seem to be operating on several fixed assumptions:

  • Relational databases are the destination point in database evolution. There will be no more disruptive change in the database industry. Everything will be accomplished through incremental evolution of the relational model. Darwin is dead.
  • SQL is the destination point for query languages. Any new requirement will be handled through extensions to SQL.
  • Any new DBMS will fail because it cannot compete with RDBMSs on industrial strength, transaction performance, and proven support for production applications.

I ask myself three questions when faced with these people.

1. What happened to them?

These are, after all, pretty much the same folks who twenty-five years earlier were espousing the virtues of the relational model in a seemingly hopeless battle against deeply-entrenched, production-proven, high-performance IMS and IDMS systems. They remind me of the sixties mantra “don’t trust anyone over 30” on the theory that, once past 30, a person had so many vested interests in the status quo that he or she wouldn’t be open to change.

Have these folks forgotten that relational was a disruptive change? Did they forget that it took relational about a decade to become as OLTP-capable and as production-worthy as prior-generation DBMSs? Have they not realized that new technologies do not have to be better at everything than those they displace, and that the typical high-tech pattern is to be an order of magnitude better at one thing and catch up on the baseline over time?

  • Could no one beat IMS in database? Oracle did. They had queries.
  • Could no one beat Yahoo in Internet search? Google did. They had PageRank.
  • Moving beyond high-tech, could no one beat McDonald’s in hamburgers? In-N-Out Burger did. They sold hamburgers, and only hamburgers.

Technology matters. Product matters. Innovation happens. So when I hear these “relational end-point” arguments, I feel like the folks espousing them have either become closed-minded or are protecting their own self-interest.

2. Do they have kids?

You need to look no further than kids to get a glimpse of how people will use computers in the future. The people at Outsell spend a lot of time studying the habits of Generation Z “digital natives” because they are the future customers of the information industry.

These digital natives use computers very differently than we (older types) do. For example, Outsell calls them “polychronic” because they are constantly doing several things at once: IMing one friend about homework, emailing another about gossip, and SMSing a third about meeting … all while writing a book report, using Google and Wikipedia to research it, and listening to some iTunes while they work. Most importantly, that’s not some special state. That’s normal.

Digital natives don’t see a data/content divide. We boomers and gen-Xers do, because that divide was very real to us when we learned computers. Data, well that’s for databases. But content … uh, just put it in a file. Once in a while we’d get confused and do things for doing’s sake. Look, I can put “Stairway To Heaven” in Oracle. But what can you then do with it? Uh, well nothing. It was like putting the toaster in the refrigerator. You could do it, but why?

To digital natives, there is no data/content divide. It’s all content: numbers, words, pictures, music, and video. For example, my son’s first computer project in middle school was to build a website with words, pictures, and graphics. To him, it’s all content. This should shiver the establishment to the timbers, but it’s the way digital natives see things. And logically, it’s correct. Data is a special case of content. Data is content that is highly regular in structure. Content is not a special case of data.

3. Have they ever really looked at content?

Finally, I wonder if any of these folks have ever taken off their data-colored lenses to really look at content. There is an enormous tendency amongst the database establishment to see data when looking at content – for example, to summarize content into data (e.g., count the number of service emails by product line by tone). Their behavior is logical at one level. If your tool is a database management system, then you want to feed it data.

But content is fundamentally different:

  • Content cannot be regularized into fixed fields or fixed structure.
  • The authoring process is typically beyond your control for legacy reasons, content sourcing reasons, or both. So you need to take content on an as-is basis.
  • Content is massive in size and is being produced at an exploding rate, all of which makes preprocessing impractical.
  • Markup can be structural in nature or simply pure enrichment. You might be able to map the former to a relational schema if it doesn’t change over time, but the latter must be taken as-is and when-is.
  • Content has a multitude of semantic issues that are absent in data (e.g., name variants as in Dave vs. David, synonyms as in fracture vs. break, and taxonomy as in fruit vs. apple). These issues must be handled intelligently, typically through markup, in content applications.
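
To make the markup points concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the sample document, the element names (`note`, `p`, `person`, `term`), and the `canonical` attribute are hypothetical and not taken from any particular product or schema. The idea is that the text is taken as-is, and enrichment markup layered on top lets a query match “break” when you ask for “fracture.”

```python
import xml.etree.ElementTree as ET

# Hypothetical document. <note> and <p> are structural markup;
# <person> and <term> are enrichment added around the as-is text.
DOC = """
<note>
  <p><person canonical="David">Dave</person> reported a
  <term canonical="fracture">break</term> in the left wrist.</p>
  <p>The <term canonical="fracture">fracture</term> was set the same day.</p>
</note>
"""

def find_mentions(xml_text, tag, canonical):
    """Return the surface text of every <tag> whose canonical form matches."""
    root = ET.fromstring(xml_text.strip())
    return [el.text for el in root.iter(tag) if el.get("canonical") == canonical]

# Both wordings are found because the enrichment maps them to one concept.
fractures = find_mentions(DOC, "term", "fracture")
people = find_mentions(DOC, "person", "David")
print(fractures)
print(people)
```

Note that the original wording survives untouched. Nothing was forced into fixed fields; the query works off the enrichment, which can be added or changed later without reauthoring the content.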

What the data establishment is saying today would be akin to people telling our generation: “you are going to program in COBOL and IMS because I programmed in COBOL and IMS and there is too much inertia to consider moving to anything else, regardless of changes in the type of information you are managing, the computing environment you are working in, and any innovations that may have happened in the past 20 years.”

I firmly believe that “content is the new data” and that our kids will view SQL and RDBMS the way that we view FORTRAN and VSAM. And our kids will view SQL/XML the way we view OO-COBOL. Yeah, I suppose you could do that, but why in the world would you? It’s back to the toasters and refrigerators thing.

Coming soon: Steal This Blog

3 responses to “Don't Trust Anyone Over 30”

  1. You wrote:
    >>>>>>>>>
    These digital natives use computers very differently than we (older types) do. For example, Outsell calls them “polychronic” because they are constantly doing several things at once: IMing one friend about homework, emailing another about gossip, and SMSing a third about meeting … all while writing a book report, using Google and Wikipedia to research it, and listening to some iTunes while they work. Most importantly, that’s not some special state. That’s normal.
    <<<<<<<<<

    You couldn't have said it better, Dave!

    Let me elaborate upon this by narrating a very recent experience:

    We were in the process of launching an English to English dictionary site last week, primarily targeting these digital natives; the "polychronics". With this user group as the target, the site is coded so that there's a search box near the top of the page and as soon as the user starts typing a word in the search box, 4 boxes come up on the page, each displaying a word beginning with the typed letters, and its definition. As the user keeps typing, the 4 boxes keep updating (more or less in real time).

    Once the basic site was up and running on a live server, I e-mailed an old friend to test it, in the devil's advocate mode. He called me as he was checking it to complain loudly; something to the effect that 4 boxes, with 4 (different) words and definitions, all updating simultaneously with each keystroke, was kind of distracting/de-stabilizing.

    Now, he is around 50, and so am I. He has been doing IT for close to 30 years and so have I. In all likelihood, the reason he failed to register the appeal of the UI is that he doesn't have a teenage child. My secret weapon is my 14-year-old daughter (who simply loves the site).

    I had to explain my rationale to him with a great deal of effort. If I had read this blog post before our conversation, I'd have simply read out the paragraph (quoted above) to him.

    Very well said, Dave-

    In case you're interested, the site (still a work in progress) is at
    http://www.RapiDefs.com

    And if you do give it a twirl, I'd *love* to hear your feedback on any aspect of the site.

  2. Excellent posting Dave – I have enjoyed working my way through your blog.

    While I understand and agree with the point that you are making here, I think there is also another, opposite danger against which we must guard: namely, the sentiment “If it is new and cool and Google is doing it, then we should do it too,” and its systems-integrator corollary, “If it looks good in marketing collateral or a proposal, let’s put it in.”

    Both relational and NoSQL databases are tools, each of which is suited to solving a specific set of problems. If you are working with highly structured data, the volume of which is not the same order of magnitude as pages on the WWW, then a relational database is the most reliable and cost-effective tool to get the job done because:

    1. Lots of people know how to use it so you don’t have to recruit for specialized expertise.
    2. Lots of products support it and it is well understood, reliable and available at every possible price point.
    3. It scales well (to a point).
    4. It is really good at managing highly structured data.

    All of the data captured for the last US national census was stored in a relational database that kept up with a form check-in peak of 15 million forms in an 18-hour work day. I know that 140 million households and the person records for everyone in those households don’t begin to approach big data. But our databases handled that load without ever pushing utilization of any resource to a point where we were concerned about capacity – and that was with only a two-node Oracle RAC.

    But at my last conversation about where the Census should go with their Adaptive Survey Design Initiative, people were tossing around the need to use big data techniques. And while the discussion about what technology should be used must be had, my starting point is that, based on everything I know about this particular problem, there is no need to transition to a new, less mature technology, and the CB would be well advised to stick to tried and true relational databases.

    Don’t get me wrong. If there is a need to capture and interrogate unstructured data, then the CB should look at other approaches. And I hope to get involved in big data and unstructured data projects. I think they will be full of fascinating problems to solve.

    My reply is a little bit off topic because, for the Census, we don’t have to manage content. I just want to reinforce the idea that it is important to apply the right tool to address the specific characteristics of the problem at hand.
