|Department:||Department of Computer Science|
|Supervisor:||Dr. Chan Edward; First Reader: Dr. Poon C K; Second Reader: Dr. Chun Andy H W|
|Abstract:||As the Internet has developed rapidly, electronic information has become an essential asset. It is not easy, however, to trace historical online information on the World Wide Web. A Web archival system serves as an Internet library that keeps track of all online information, so that researchers, historians, scholars and even future generations can access this cyber history. The project aims to design a Web archival system that visits user-selected Web sites periodically, determines whether there has been any change, retrieves the modified version of the Web site and saves a copy on the local machine, and also provides a user-friendly interface for searching and browsing the archived information. Most existing Web archival systems, however, suffer from high storage and network overhead. To mitigate these problems, the project develops a “change detection mechanism” and a “Web object change interval estimation”. The change detection mechanism is an algorithm that identifies any “meaningful” change between the last archived version and the latest downloaded version, and thereby decides whether the latest downloaded version is worth saving. Web object change interval estimation is a scheduler that predicts the future modification time of a Web page. By using the scheduler to organize Web archival tasks, the system can allocate more resources to frequently changing Web sites and reduce unnecessary visits.|
|Appears in Collections:||Computer Science - Undergraduate Final Year Projects|
Items in Digital CityU Collections are protected by copyright, with all rights reserved, unless otherwise indicated.