Web Cataloguing Through Cache Exploitation and Steps Toward Consistency Maintenance
This paper presents a new Web cataloguing strategy based on the automatic analysis of documents stored in a proxy server cache. This could be an elegant method of Web cataloguing, as it creates no extra network load and runs completely automatically. Such a mechanism naturally reaches only a subset of Web documents, but at an institute such as the Alfred Wegener Institute, where scientists tend to make quite good search engines, the cache usually contains large numbers of documents related to polar and marine research. Details are given of a database for polar, marine and global change research that is built on a cache scanning mechanism, and it is shown to be an increasingly useful resource. A problem with any collection of information about Web documents is that it quickly becomes out of date. Strategies have been developed to keep the database consistent with changes on the Web while keeping network load to a minimum. These strategies have been found to improve the quality of responses and appear to keep the information in the database current. Such strategies are of interest to anyone attempting to create and maintain a Web document location resource.

