My friends joke that someday I should write a business book based on 1960s bumper stickers because I often re-use 1960s maxims (e.g., “question authority”) as basic rules for business. Today’s posting is based on the idea of “the establishment” and under what circumstances it can be trusted to provide unbiased information.
I am often amazed when I talk to members of the data management establishment about XML content. Sometimes they just look perplexed. Other times they accuse me of crazy talk. Almost all the time, most of them seem to be operating from several fixed assumptions:
- Relational databases are the destination point in database evolution. There will be no more disruptive change in the database industry. Everything will be accomplished through incremental evolution of the relational model. Darwin is dead.
- SQL is the destination point for query languages. Any new requirement will be handled through extensions to SQL.
- Any new DBMS will fail because it cannot compete with RDBMSs on industrial strength, transaction performance, and proven support for production applications.
I ask myself three questions when faced with these people.
1. What happened to them?
These are, after all, pretty much the same folks who twenty-five years earlier were espousing the virtues of the relational model in a seemingly hopeless battle against deeply entrenched, production-proven, high-performance IMS and IDMS systems. They remind me of the sixties mantra “don’t trust anyone over 30” on the theory that, once past 30, a person had so many vested interests in the status quo that he or she wouldn’t be open to change.
Have these folks forgotten that relational was a disruptive change? Did they forget that it took relational about a decade to become as OLTP-capable and as production-worthy as prior-generation DBMSs? Have they not realized that new technologies do not have to be better at everything than those they displace, and that the typical high-tech pattern is to be an order of magnitude better at one thing and catch up on the baseline over time?
- Could no one beat IMS in databases? Oracle did. They had queries.
- Could no one beat Yahoo in Internet search? Google did. They had PageRank.
- Moving beyond high-tech, could no one beat McDonald’s in hamburgers? In-N-Out Burger did. They sold hamburgers, and only hamburgers.
Technology matters. Product matters. Innovation happens. So when I hear these “relational end-point” arguments, I feel like the folks espousing them have either become closed-minded or are protecting their own self-interest.
2. Do they have kids?
You need to look no further than kids to get a glimpse of how people will use computers in the future. The people at Outsell spend a lot of time studying the habits of Generation Z, the “digital natives,” because they are the future customers of the information industry.
These digital natives use computers very differently than we (older types) do. For example, Outsell calls them “polychronic” because they are constantly doing several things at once: IMing one friend about homework, emailing another about gossip, and SMSing a third about meeting … all while writing a book report, using Google and Wikipedia to research it, and listening to some iTunes while they work. Most importantly, that’s not some special state. That’s normal.
Digital natives don’t see a data/content divide. We boomers and gen-Xers do, because that divide was very real to us when we learned computers. Data, well that’s for databases. But content … uh, just put it in a file. Once in a while we’d get confused and do things for doing’s sake. Look, I can put “Stairway To Heaven” in Oracle. But what can you then do with it? Uh, well nothing. It was like putting the toaster in the refrigerator. You could do it, but why?
To digital natives, there is no data/content divide. It’s all content: numbers, words, pictures, music, and video. For example, my son’s first computer project in middle school was to build a website with words, pictures, and graphics. To him, it’s all content. This should shiver the establishment to the timbers, but it’s the way digital natives see things. And logically, it’s correct. Data is a special case of content. Data is content that is highly regular in structure. Content is not a special case of data.
3. Have they ever really looked at content?
Finally, I wonder if any of these folks have ever taken off their data-colored lenses to really look at content. There is an enormous tendency amongst the database establishment to see data when looking at content – for example, to summarize content into data (e.g., count the number of service emails by product line by tone). Their behavior is logical at one level. If your tool is a database management system, then you want to feed it data.
But content is fundamentally different:
- Content cannot be regularized into fixed fields or a fixed structure.
- The authoring process is typically beyond your control for legacy reasons, content sourcing reasons, or both. So you need to take content on an as-is basis.
- Content is massive in size and is being produced at an exploding rate, all of which makes preprocessing impractical.
- Markup can be structural in nature or simply pure enrichment. You might be able to map the former to a relational schema if it doesn’t change over time, but the latter must be taken as-is and when-is.
- Content has a multitude of semantic issues that are absent in data (e.g., stemming as in Dave vs. David, synonyms as in fracture vs. break, and taxonomy as in fruit vs. apple). These issues must be intelligently handled, typically through markup, in content applications.
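To make that last bullet concrete, here is a minimal Python sketch of what handling those semantic issues might look like at query time. Everything in it is an assumption for illustration: the three lookup tables are hypothetical and hand-built from the examples above, and the function names are made up. A real content application would more likely drive this from markup or a managed thesaurus.

```python
# Hypothetical, hand-built lookup tables taken from the examples in the text.
# A real application would derive these from markup or a thesaurus.
NAME_VARIANTS = {"dave": {"david"}, "david": {"dave"}}
SYNONYMS = {"fracture": {"break"}, "break": {"fracture"}}
NARROWER_TERMS = {"fruit": {"apple"}}  # taxonomy: broader term -> narrower terms


def expand_term(term: str) -> set[str]:
    """Return the term plus every variant, synonym, and narrower term we know."""
    key = term.lower()
    expanded = {key}
    for table in (NAME_VARIANTS, SYNONYMS, NARROWER_TERMS):
        expanded |= table.get(key, set())
    return expanded


def matches(query: str, text: str) -> bool:
    """True if any expansion of any query word appears as a word in the text."""
    words = {w.strip(".,;:!?\"'").lower() for w in text.split()}
    return any(expand_term(q) & words for q in query.split())


if __name__ == "__main__":
    print(matches("fracture", "Patient reports a break in the left wrist."))
    print(matches("fruit", "She ate an apple."))
    print(matches("apple", "She ate some fruit."))
```

Note that the taxonomy only runs one way in this sketch: a search for fruit finds the apple, but a search for apple does not find fruit. A plain equality test on the literal word would miss the first two matches entirely.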
What the data establishment is saying today would be akin to people telling our generation: “you are going to program in COBOL and IMS because I programmed in COBOL and IMS and there is too much inertia to consider moving to anything else, regardless of changes in the type of information you are managing, the computing environment you are working in, and any innovations that may have happened in the past 20 years.”
I firmly believe that “content is the new data” and that our kids will view SQL and RDBMS the way that we view FORTRAN and VSAM. And our kids will view SQL/XML the way we view OO-COBOL. Yeah, I suppose you could do that, but why in the world would you? It’s back to the toasters and refrigerators thing.
Coming soon: Steal This Blog