Tuesday, January 22, 2008

Beyond Search Engines: The Database is Now

Stephen Arnold writes a thoughtful post on his Beyond Search blog about the inadequacy of traditional databases and search engines for organizing and delivering content when the Web and many private content collections measure in petabytes and exabytes of information. Arnold hints at a "next generation" database management system that could leapfrog these problems, but perhaps the greater question goes unasked in his article: as the problems that people need to solve with content technologies become increasingly complex and increasingly fleeting, do we really need permanent, unified databases to solve them? There is an important need for data normalization, but if normalization can be achieved "on the fly," as leading content federation services can provide, do people need a database, or instead data objects that solve specific problems in the moment?
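To make the idea concrete, here is a minimal sketch of what "on the fly" normalization might look like, assuming two hypothetical sources with their own field names and a small data dictionary that projects both onto a common schema only at the moment a question is asked. None of the names here come from a real product; they exist purely to illustrate the pattern.

```python
# A sketch of on-the-fly normalization: records from two hypothetical
# silos are mapped into a shared schema only when a question arrives,
# then discarded. Source names and fields are illustrative.

# Per-source data dictionaries: source field -> common field
FIELD_MAPS = {
    "news_wire": {"headline": "title", "body": "text", "ts": "published"},
    "crm_notes": {"subject": "title", "note": "text", "created": "published"},
}

def normalize(source, record):
    """Project one source record onto the common schema."""
    mapping = FIELD_MAPS[source]
    return {common: record[field] for field, common in mapping.items()}

def assemble(question_filter, feeds):
    """Build the momentary 'database': a normalized, filtered content set."""
    moment = []
    for source, records in feeds.items():
        for record in records:
            row = normalize(source, record)
            if question_filter(row):
                moment.append(row)
    return moment  # lives only as long as the question does

if __name__ == "__main__":
    feeds = {
        "news_wire": [{"headline": "Exchange merger", "body": "...", "ts": "2008-01-22"}],
        "crm_notes": [{"subject": "Client call", "note": "...", "created": "2008-01-21"}],
    }
    print(assemble(lambda r: "2008" in r["published"], feeds))
```

The point is that the normalized set exists only for the life of the question; nothing obliges us to persist it unless compliance or reuse demands it.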

When data normalization meant creating massive databases used for repeated functions such as payroll management, or for publishing products such as newspapers and directories, permanently structured databases made a lot of sense. But as the market advantages gained through content publishing fall increasingly to those who can mine unstructured content, aggregate content from disparate sources, and enable people normally confined to consuming content to create and organize it, the traditional database is being relegated to one of many silos from which advanced content services can develop on-demand content solutions. Search engines, which rely on databases that can be queried in a standard format to provide standard answers, are beginning to fall into this same role of specialized answer tools. Look at a typical search results page from a major provider today: you're seeing federated content from multiple sources, logically related to a greater whole but residing in separate storage environments, coming together in the moment as the answer to a specific question or need.
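A results page of that kind can be sketched in a few lines: independent silos are queried in parallel and their answers merged for one request only. The source functions below are stand-ins for what would, in practice, be calls to separate indexes or APIs.

```python
# A sketch of federation as results pages practice it: several
# independent stores are queried concurrently and their answers merged
# only for the duration of one request. Sources here are stand-ins.
from concurrent.futures import ThreadPoolExecutor

def search_web(q):    return [{"src": "web",    "title": f"{q} overview", "score": 0.9}]
def search_news(q):   return [{"src": "news",   "title": f"{q} headline", "score": 0.8}]
def search_images(q): return [{"src": "images", "title": f"{q} photo",    "score": 0.7}]

SOURCES = [search_web, search_news, search_images]

def federated_search(query):
    # Fan out to each silo in parallel, then interleave by score:
    # the merged page exists only for this query, not as a stored index.
    with ThreadPoolExecutor(max_workers=len(SOURCES)) as pool:
        result_sets = list(pool.map(lambda s: s(query), SOURCES))
    merged = [hit for hits in result_sets for hit in hits]
    return sorted(merged, key=lambda h: h["score"], reverse=True)

print(federated_search("database federation"))
```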

In short, what we have called a database is no longer a storage and indexing device. Rather, the database is now: the content sets that we assemble in a given moment to solve the moment's problem. Its structure is consistent thanks to XML standards, data dictionaries, and data mining normalization tools; it can be stored as needed for time series analysis or corporate compliance; it can be shared with others to develop collaboration services or new forms of content and analysis. But in the next moment our needs may shift, and sources may change structure, become unavailable, or be replaced by different sources altogether.

Market advantages tend to flow to institutions that can exploit content most effectively, and the financial markets already show how this concept impacts business in a large way. There, profits are shifting from public securities exchanges, whose transactions are built around highly normalized databases and data formats, to private transactions in highly complex financial instruments, whose underlying calculations of financial risk and return may apply to only a single transaction at a time. There is structure in such transactions, yes, and lots of normalized data, but the uniqueness of the content's structure at the moment a deal is executed is far more important than its standard components.

Search engine providers such as Google understand this paradox explicitly and work hard to provide value-added interfaces that enable people to use search engine content as one of many feeds powering "mashup" consumer and enterprise content applications. The Google search engine may be one of the world's largest databases, but if other content in a form more usable in a specific context can complement it in the moment, it becomes rather moot whether the answer lives in Google's index or another index. This federated approach to content value becomes at least as important as the quality of the individual sources. In a "the database is now" world, quality is as quality does - and it may mean something else a moment from now.
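The mashup pattern itself is simple enough to sketch. The feeds and contexts below are hypothetical (Google's actual interfaces work differently), but they show how a search engine's output becomes one feed among several, selected by the context of the moment rather than by which index holds the content.

```python
# A sketch of the mashup idea: search results are just one feed among
# several, blended per context. All feed and context names are made up.

def google_results(q): return [("google", f"{q}: top web hit")]
def internal_wiki(q):  return [("wiki",   f"{q}: internal page")]
def market_feed(q):    return [("market", f"{q}: latest quote")]

CONTEXTS = {
    # Which feeds matter depends on the moment's context, not the index.
    "research": [google_results, internal_wiki],
    "trading":  [market_feed, google_results],
}

def mashup(context, query):
    results = []
    for feed in CONTEXTS[context]:
        results.extend(feed(query))
    return results

print(mashup("trading", "NYSE"))
```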

The implications of this concept for content publishers are enormous. Publishers have long been used to building standardized databases, but the long-promised New Aggregation is on the verge of becoming the value leader for both enterprise and media publishers. Through the on-demand federation of content sources into aggregated content solutions, the uniqueness of insights for small audiences is becoming a far more important way to create value in aggregation than the pervasiveness of standardized insights.

Make no mistake, we'll be using today's search engines and databases for a long time as building blocks for federated content services, but we'll be less fixated on owning databases and more focused on owning the contexts in which they provide solutions. This is likely to change the pricing structure of content aggregation services significantly and to push traditional publishers toward becoming on-the-fly aggregation services, pulling in content agnostically from many sources that may not be under their direct control for more than a few moments. Subscription databases will yield, sometimes gradually and sometimes very rapidly, to subscription contexts: services that can assemble content from anywhere, consistently and reliably, for workflow and lifestyle applications. Yesterday's email inbox is becoming today's content inbox via feeds and social media; tomorrow's federated inboxes will be even richer and more complex, powered by databases that live in the moment.
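As a rough illustration of such a content inbox, the sketch below pulls several feeds into one time-ordered stream each time it is read, using the third-party Python feedparser library (pip install feedparser); the feed URLs are placeholders.

```python
# A sketch of a feed-driven content inbox: multiple RSS/Atom feeds are
# merged into one newest-first stream, re-assembled on every read.
import time
import feedparser  # third-party: pip install feedparser

FEED_URLS = [
    "https://example.com/team-blog.rss",       # hypothetical feed
    "https://example.org/industry-news.atom",  # hypothetical feed
]

def content_inbox(urls, limit=20):
    items = []
    for url in urls:
        parsed = feedparser.parse(url)
        for entry in parsed.entries:
            items.append({
                "source": parsed.feed.get("title", url),
                "title": entry.get("title", "(untitled)"),
                "link": entry.get("link", ""),
                "published": entry.get("published_parsed") or time.gmtime(0),
            })
    # Newest first: the inbox exists only as the answer to this read.
    items.sort(key=lambda i: i["published"], reverse=True)
    return items[:limit]

for item in content_inbox(FEED_URLS):
    print(time.strftime("%Y-%m-%d", item["published"]), item["source"], "-", item["title"])
```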

Social media and enterprise content federation services have already pressed many of these changes forward, but expect 2008 to be the year in which more than one company will begin to recognize the value of databases in the moment. The database is now - and so is the opportunity for publishers and enterprises to move beyond isolated content solutions.