The New York Times, along with a host of others, reports on the new Google News Archive feature, launched to cries of "What took you so long?" from aggregators and publishers eager to expose their "dark" content via Google's search facilities. The feature mixes a range of free and premium content covering archives from suppliers such as The Wall Street Journal, The New York Times, The Washington Post, Time, Guardian Unlimited, Factiva, Lexis-Nexis, HighBeam Research and Thomson Gale - some going back centuries. While the emphasis is on archive search, it's also the debut of features from the long-in-development Google Premium capability. In an email from Google to Google Premium program participants it was described as "an evolution of that initiative," an indication that Google is still trying to determine how best to position premium content in open Web search.
The problems with placing premium content in a valuable perspective via Web search are illustrated well by the News Archive Search feature. With only metadata from publishers and the content itself to provide search cues, the relevance sorting in News Archive Search does not necessarily show off the "best stuff on earth" automatically. Taking a relatively recent event, a search in News Archive Search on "Bill Clinton" "Monica Lewinsky" returns a hodge-podge list of results, mostly from 1998. By contrast, using the same terms on Google's Web search returns a list topped by the Wikipedia entries on Monica Lewinsky and a summary of the events surrounding her relationship with Clinton, followed by a CNN timeline summary of key moments in that story thread and several key weblog entries highlighted in that period. If I want to start research into this historical event, which of these two is more likely to provide a good starting point?
It's great that these archives are available, and it's clearly a stronger showing than Yahoo's aborted attempt at premium search, but it's not clear that they're being put in the best light via this isolated search feature. But then again, perhaps that's the point. As with so many new Google features, its value needs a lot of tuning before it can really take off in a big way. With a relatively isolated set of content, Google can see what content people are looking for and begin to tune the service for enough relevance that some level of integration with other search results would be fruitful.
With all this said, the basic design of Google News Archive Search is pretty good and there's lots of value integrated into the details. Results come back for the most relevant time periods, with faceted search indexing on the sidebar offering quick access to other potentially interesting time frames and highlighting the most relevant time frame overall. In the case of the Clinton/Lewinsky search, the News Archive Search navigation highlights content from 1998 as the most relevant time frame. Slick. It's a much broader set of content than found in Yahoo's premium beta last year, with the Google Premium metadata providing important and powerful cues as to how to access the content right in the search results. Those cues include pricing for purchase via the publisher's own ecommerce facilities and other access options such as Thomson Gale's Access My Library feature, which allows free access to subscription content for local library patrons (see our earlier coverage). There are also links to related Web pages and features from Google News that provide related news coverage for specific search results.
Who wins with Google News Archive Search? Certainly major publishers and aggregators stand to benefit from a new facility that is likely to become a "go-to" spot for starting a search for premium content available on a free or a la carte basis. But it's probably at least as much a boon to an extensive network of small to medium publishers and aggregators who need an effective way to get their premium content exposed to broader audiences and compared to the big services on a more level playing field. HighBeam Research's mix of free and premium content gets good placement in many search results, providing a new way for people in a research mode to appreciate its content in the context of a very broad array of archives - and to appreciate its easy-on-the-pockets pricing for individual researchers. There are also databases such as Ancestry.com and NewsBank that will gain exposure to new audiences looking for point solutions for specific research requests. People have been clamoring for access to the "dark" content for some time, but I doubt that many realized how this kind of broad exposure for this content would open so many new competitive opportunities and challenges.
Overall it's a pretty good debut for Google's archive search, isolated but with a broad enough exposure that further refinements and integrations should follow fairly rapidly. In its current incarnation its features may not be overwhelming (I disagree with Steve Rubel's assessment that Topix offers better archive search - Topix is fine for recent news but fails on anything of any significant age - see Topix results for our sample query) but they provide a solid foundation for researchers looking for a universal starting point for the world of archived content that's available from high-value publishing sources. The key tuning required is on relevance. Our standard General Motors search returns navigation highlighting the 1981-1982 time frame as the most relevant - perhaps appropriate given that this was a moment of great pain and transformation for GM - but the search results are a hodge-podge of old news reports dating from 1925 onwards. While the "advanced archive search" feature can help users whittle this down to more specific types of results, the lack of clear criteria for relevance ranking may frustrate users accustomed to Web search results. That's a common problem for many archive search services, so it's probably not to be faulted too much at this stage.
Over time, as more people access these archives via Google, more online publishers will provide links to this content, which will improve its ability to demonstrate relevance alongside open Web content. When that starts to happen we're more likely to see this content surfacing in open Web search results with true relevance. Premium publishers have great opportunities via Google News Archive Search - if they have already prepared aggressive marketing strategies for open Web marketing. For those that have been more timid, it's not clear that the highly competitive environment provided by Google for premium content is going to favor slow-to-adapt premium publishers. For those that have been brave enough to play Google's game to get in on this first wave of exposing online databases, congratulations, you have some very interesting problems to solve, now...