Sometimes it’s a mismatch in vocabulary between different types of organizations, not the technology itself, that gets in the way of understanding and using technology correctly.
In this case, the Google Operating System blog gives the example of the World Association of Newspapers developing a new system for selectively granting search crawlers permission to crawl their news articles, when robots.txt is already in place and can do the same job.
"After a Belgian press organization sued Google for copyright infringement and won, World Association of Newspapers decided to create ‘an automated system for granting permission on how to use their content’, reports Reuters. The system will be called Automated Content Access Protocol (ACAP).
"If you’re wondering why such a system would be useful, you’re not the only one. ‘Since search engine operators rely on robotic "spiders" to manage their automated processes, publishers’ Web sites need to start speaking a language which the operators can teach their robots to understand. What is required is a standardized way of describing the permissions which apply to a Web site or Web page so that it can be decoded by a dumb machine without the help of an expensive lawyer.’
"The publishers seem to ignore the fact that there is a system that lets you control what pages you want search engines to crawl: it’s called robots.txt and it’s available to every site owner."
It seems far-fetched that publishers would not think of robots.txt first, before going off and building a new system. Granted, robots.txt is not really geared to the kind of selective filtering a news site needs, but in that case it should be Google and Yahoo offering a fix, not the newspapers.
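For a rough sense of what robots.txt can already express, here is a minimal sketch using Python’s standard `urllib.robotparser` module. The robots.txt rules, the `news.example.com` host, the `/archive/` path, and the `OtherBot` crawler name are all hypothetical, made up for illustration and not taken from any real publisher.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for a news site (illustrative assumption only):
# Googlebot may crawl everything except the archive, all other crawlers
# are kept out entirely.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /archive/

User-agent: *
Disallow: /
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text directly, without fetching it over the network."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# Googlebot is allowed to crawl current stories...
print(parser.can_fetch("Googlebot", "https://news.example.com/today/story.html"))         # True
# ...but not the archive.
print(parser.can_fetch("Googlebot", "https://news.example.com/archive/2006/story.html"))  # False
# Any other crawler falls under the catch-all rule and is blocked.
print(parser.can_fetch("OtherBot", "https://news.example.com/today/story.html"))          # False
```

So selective access per crawler and per path is possible today. What this sketch cannot show is any statement about how crawled content may be used afterwards, because robots.txt rules only say which paths a given crawler may fetch.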