Investor Relations magazine reports on an interesting approach to openness in Web text mining taken by Quoza.com, an online service that focuses on extracting content from the Web sites of more than 7,000 public companies and from news sources covering those companies. Much of this content appears on investor relations sites, many of which are hosted by Thomson Financial's CCBN corporate communications service. When Quoza's crawlers became so aggressive that they started to skew CCBN's Web site report statistics, CCBN was tipped off, and ticked off enough to bar the crawlers. Quoza responded with an email blitz to CCBN clients suggesting that they check with their lawyers as to whether Thomson's actions were putting their companies in jeopardy of violating the real-time issuer disclosure requirements of Section 409 of the U.S. Sarbanes-Oxley Act. Needless to say, this caused quite a stir in corporate communications circles.
It's an interesting play to protect Web crawling, but it may be on shaky legal ground. The CCBN service is already exposing content to the public, while services such as Quoza are simply helping to accelerate the redistribution of this content. Quoza runs an aggressive crawling scheme, hitting sources once each minute around the clock. This puts it in the zone of being potentially subject to the Computer Fraud and Abuse Act (CFAA), which has been used in a number of instances to rein in aggressive access to Web sites and other electronic facilities. The real question hinges on a key phrase in SOX 409, which says that information must be disclosed by corporations to the public "on a rapid and current basis". Is posting something on a Web site really "rapid and current" distribution? If there are distributors willing to do better, shouldn't public records be available to them just as urgently?
While a good legal team could push this in Thomson's favor without too much difficulty, there are reasons enough for Thomson to rethink its approach to this situation. It's a simple enough fight to take on a renegade redistributor of public information, but what would happen if corporations with their own crawlers were excluded? That would be a tougher fight, no doubt, and a greater threat to service performance. Quoza could stand to get some marketing savvy and work with service suppliers such as Thomson to share the wealth from premium services derived from its proactive crawls, so that the suppliers' infrastructure costs could be borne fairly. But at the same time suppliers like Thomson could get smart and recognize that there are great opportunities in distributing public information of all kinds far more aggressively than most IR site services are equipped to support. There are no clear heroes or villains in this tiff, but there is plenty of opportunity to make the most of public content.