Please use this identifier to cite or link to this item:
http://dspace.cityu.edu.hk/handle/2031/9482
Title: | Documents retrieval system |
Authors: | Wong, Tsz Fung |
Department: | Department of Electrical Engineering |
Issue Date: | 2021 |
Supervisor: | Supervisor: Prof. Chow, Tommy W S; Assessor: Dr. Chan, K L |
Abstract: | People are used to storing documents in operating systems and cloud storage, yet they can hardly retrieve a document instantly. The search these systems offer is slow, matches only exact wording without ranking the results, and checks for the presence of a keyword rather than its level of importance. This study investigates a Document Retrieval System (DRS) built around an efficient document retrieval algorithm. DRS is a web application implemented in Python that stores documents and supports multiple users. Specifically, the study applies concepts from information retrieval, text mining and natural language processing to build a search engine. The search engine uses a numerical statistic, Term Frequency-Inverse Document Frequency (TF-IDF), to reflect each word's importance to a document within the entire system. Search results are computed by comparing the similarity between each document and the search keys in a dimensionality-reduced TF-IDF matrix. To remain easy to access and easy to use, DRS provides 31 application programming interfaces (APIs) to perform tasks. To evaluate DRS, standard data collections were imported and tested; DRS was able to work on more than 10,000 documents. In addition, users were invited to evaluate DRS. |
Appears in Collections: | Electrical Engineering - Undergraduate Final Year Projects |
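
The ranking idea described in the abstract (weight each term by TF-IDF, then order documents by their similarity to the search keys) can be illustrated with a minimal sketch. This is not the project's code: the function names (`tokenize`, `build_index`, `tfidf_vector`, `cosine`, `search`), the whitespace tokenizer, the smoothed IDF formula and the choice of cosine similarity are all assumptions made for illustration, and the dimensionality-reduction step mentioned in the abstract is omitted to keep the example dependency-free.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (an assumed, deliberately simple tokenizer)."""
    return text.lower().split()


def tfidf_vector(tokens, idf):
    """Sparse TF-IDF vector as {term: weight}; terms missing from the index are ignored."""
    counts = Counter(t for t in tokens if t in idf)
    return {t: c * idf[t] for t, c in counts.items()}


def build_index(docs):
    """Return (idf, vectors) for a list of document strings."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(docs)
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    # Smoothed IDF (an assumption): terms found in every document keep a positive weight.
    idf = {t: math.log((1 + n_docs) / (1 + c)) + 1.0 for t, c in df.items()}
    vectors = [tfidf_vector(tokens, idf) for tokens in tokenized]
    return idf, vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors; 0.0 if either is empty."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def search(query, docs, idf, vectors):
    """Return (document, score) pairs with non-zero similarity, best match first."""
    q = tfidf_vector(tokenize(query), idf)
    scored = [(cosine(q, v), i) for i, v in enumerate(vectors)]
    scored = [(s, i) for s, i in scored if s > 0.0]
    # Sort by descending score, breaking ties by original document order.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [(docs[i], s) for s, i in scored]


if __name__ == "__main__":
    corpus = [
        "the cat sat on the mat",
        "dogs chase cats in the park",
        "information retrieval ranks documents",
        "retrieval of documents by keyword retrieval",
    ]
    idf, vectors = build_index(corpus)
    for doc, score in search("retrieval documents", corpus, idf, vectors):
        print(f"{score:.3f}  {doc}")
```

Unlike an exact-match keyword search, this returns a graded score per document, so a document that uses the query terms more prominently ranks higher. A fuller version would project the TF-IDF matrix into a lower-dimensional space (for example with a truncated SVD) before comparing vectors, as the abstract describes.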
Files in This Item:
File | Size | Format |
---|---|---|
fulltext.html | 148 B | HTML |
Items in Digital CityU Collections are protected by copyright, with all rights reserved, unless otherwise indicated.