Wednesday, July 13, 2005

The Internet Archive Sued over Copyright Infringement: Threat to Mining?

The New York Times reports on a lawsuit filed by Healthcare Advocates against the Internet Archive, which is accused of violating the copyright on Healthcare Advocates' public Web site content when it scooped that content into its petabyte-scale archive of Internet content accumulated since 1996. The suit also claims that the Internet Archive violated the Digital Millennium Copyright Act and the Computer Fraud and Abuse Act in the process. The irony is that the archive's "Wayback Machine" retrieval feature was being used in this instance to gather information for a separate trademark action against Health Advocate, a similarly named company. The current suit claims that when lawyers defending Health Advocate began retrieving old Healthcare Advocates Web pages, the Wayback Machine delivered pages that Healthcare Advocates had marked for exclusion from search engine crawlers. The case looks shaky from a number of angles explored in the article, but the worrying prospect is that mining and archiving old Web content may come under broader attack should this suit succeed. If so, content services such as Zoominfo that store and deliver content from defunct Web pages may have cause for concern, as may services that mine content for financial analysts and other publishing channels.

I doubt that the courts will return a broad ruling on copyright or computer abuse that turns on the voluntary use of "robots.txt" files, which are meant to steer search engines away from crawling all or part of a Web site. These files are the electronic equivalent of "beware of the dog" signs: they warn, but carry little if any weight in a legal action. Content made available in a public environment that is not truly secured against intrusion is available for public inspection, plain and simple. Copyright issues may come into play, since the content was copied with the knowledge that the primary purpose of copying was to make it available for reuse by others, albeit for no known commercial purpose. Fair use will probably be cited, and that is likely to be the point most in dispute.
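To illustrate why robots.txt works more like a warning sign than a lock, here is a minimal sketch using Python's standard `urllib.robotparser`. The robots.txt text is hypothetical (it is not the file Healthcare Advocates actually published), and the user-agent name `ia_archiver` for the archive's crawler is used here as an assumption for illustration. The point is that the file only answers the question "has the site asked me to stay out?"; a crawler has to choose to ask.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for illustration only; it is NOT the file
# Healthcare Advocates actually published. The first record asks the
# crawler identifying itself as "ia_archiver" to stay out entirely;
# the "*" record asks every other crawler to skip /private/.
SAMPLE_ROBOTS_TXT = """\
User-agent: ia_archiver
Disallow: /

User-agent: *
Disallow: /private/
"""


def crawler_may_fetch(user_agent: str, url: str,
                      robots_txt: str = SAMPLE_ROBOTS_TXT) -> bool:
    """Return True if robots_txt permits user_agent to fetch url.

    This only reports what the site has asked for. Honoring the answer
    is entirely up to the crawler: nothing in the protocol stops a
    client that never calls a check like this one.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent, url in [
        ("ia_archiver", "http://www.example.com/index.html"),
        ("SomeOtherBot", "http://www.example.com/index.html"),
        ("SomeOtherBot", "http://www.example.com/private/notes.html"),
    ]:
        print(agent, url, crawler_may_fetch(agent, url))
```

A well-behaved crawler would skip any URL for which the check returns False; one that never consults the file simply fetches the page anyway, which is exactly why the exclusion is voluntary.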

Every major browser copies content into a local cache by default to improve performance. Tools like Google Desktop make that content more readily available to people, but do not make the copy themselves: you make the copy simply by viewing pages like this one. So is making a copy in itself a violation of copyright? Or is the violation rather ENABLING THE USE of copyrighted content in an illegal way? Fair use doctrine seems to favor the latter, as does the U.S. Supreme Court's recent Grokster ruling on file-sharing services that induce infringement. The Internet Archive will probably come out a winner in this case, but this action is unlikely to clarify copyright law much further. The onus is on electronic publishers of all stripes to package their content in a way that makes its permitted uses clear within the body of the content itself. The familiar copyright footer seen on many Web pages has been doing the heavy lifting on this front for some time, but an age in which content is used and reused in so many different forms requires more sophisticated packaging standards and better tools that let people comply with copyright law consciously.
