|Title:||Automation Archive from PDF Files|
|Authors:||Tong, Chi Wai|
|Department:||Department of Electronic Engineering|
|Supervisor:||Supervisor: Prof. Wong, Hei; Assessor: Prof. Chan, Yan Cheong|
|Abstract:||Portable Document Format (PDF) is widely used in various systems and serves as the standard electronic file format for official publishing, printing, and information exchange. To archive PDF documents without the assistance of a program, users or database operators usually need to retype required information that already appears in the PDF files. This process is not only time-consuming but also introduces typos and other mistakes or misoperations. This project aims to develop a document processing system that can archive PDF files automatically. In this system, the required information for predefined database fields is extracted directly from the PDF files. In most cases, users can produce a database record simply by uploading the relevant PDF file. The key tasks of this project include the development of programs/subroutines for: (a) converting PDF files to plain text; (b) extracting file information and performing text mining/information extraction on the converted text based on rules, formats, or keywords; (c) database management; and (d) web page development. The program provides a website where different users can upload PDF files online. A record should then be created automatically on the central server based on the information in the uploaded PDF file. The system can be used as a user-friendly report or paper submission system, as part of an office automation system, or as an intelligent document-sharing website with minimal human intervention.|
|Appears in Collections:||Electronic Engineering - Undergraduate Final Year Projects |
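Tasks (b) and (c) in the abstract, rule-based field extraction followed by record creation, can be illustrated with a minimal Python sketch. It is not the project's actual implementation: the field names, the keyword patterns in `FIELD_RULES`, the `documents` table, and the helper names `extract_fields` and `archive` are all assumptions made for illustration, and the input is assumed to be text that has already been converted from a PDF (task (a) is not shown).

```python
import re
import sqlite3

# Hypothetical keyword rules: each database field maps to a pattern that
# captures the rest of a line starting with the given keyword.
FIELD_RULES = {
    "title": re.compile(r"^Title:[ \t]*(.+)$", re.MULTILINE | re.IGNORECASE),
    "author": re.compile(r"^Authors?:[ \t]*(.+)$", re.MULTILINE | re.IGNORECASE),
    "supervisor": re.compile(r"^Supervisor:[ \t]*(.+)$", re.MULTILINE | re.IGNORECASE),
}


def extract_fields(text):
    """Apply each rule to the converted text; unmatched fields become None."""
    record = {}
    for field, pattern in FIELD_RULES.items():
        match = pattern.search(text)
        record[field] = match.group(1).strip() if match else None
    return record


def archive(conn, record):
    """Insert one extracted record into an assumed `documents` table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS documents ("
        "id INTEGER PRIMARY KEY, title TEXT, author TEXT, supervisor TEXT)"
    )
    cursor = conn.execute(
        "INSERT INTO documents (title, author, supervisor) "
        "VALUES (:title, :author, :supervisor)",
        record,
    )
    conn.commit()
    return cursor.lastrowid


if __name__ == "__main__":
    # Stand-in for text produced by a PDF-to-text conversion step.
    sample = "Title: Automation Archive from PDF Files\nAuthor: Tong, Chi Wai\n"
    connection = sqlite3.connect(":memory:")
    print(archive(connection, extract_fields(sample)))
```

Keeping the rules in a single dictionary means a new database field only needs a new pattern, which matches the abstract's idea of extraction driven by rules, formats, or keywords. A field that no rule matches is stored as NULL rather than guessed.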
Items in Digital CityU Collections are protected by copyright, with all rights reserved, unless otherwise indicated.