Run Run Shaw Library, City University of Hong Kong

Please use this identifier to cite or link to this item: http://dspace.cityu.edu.hk/handle/2031/6713
Full metadata record
DC Field | Value | Language
dc.contributor.author | Lo, Chi Mun | en_US
dc.date.accessioned | 2012-08-27T00:46:59Z |
dc.date.accessioned | 2017-09-19T09:13:14Z |
dc.date.accessioned | 2019-02-12T07:31:04Z | -
dc.date.available | 2012-08-27T00:46:59Z |
dc.date.available | 2017-09-19T09:13:14Z |
dc.date.available | 2019-02-12T07:31:04Z | -
dc.date.issued | 2012 | en_US
dc.identifier.other | 2012eelcm747 | en_US
dc.identifier.uri | http://144.214.8.231/handle/2031/6713 | -
dc.description.abstract | This software project aims to develop a system that analyzes and automatically categorizes English documents, and to study how different factors affect system performance. The categorization system is divided into three stages: data representation, classifier building, and performance evaluation. Because a document starts as raw text, the system first transforms it into n-grams, which are better suited to later processing. Feature selection is then used to filter out less important features (n-grams) and to reduce computational complexity; this project uses the chi-square test. After the data are pre-processed, a classification approach is used to build a classifier for categorization. The system provides two approaches: k-nearest neighbor (KNN) and Naive Bayes. With the classifier, documents can be categorized automatically. System performance is evaluated using the F1 score as the measure of classification accuracy. Testing shows that the system achieves acceptable accuracy with both the KNN and the Naive Bayes approach. | en_US
dc.rights | This work is protected by copyright. Reproduction or distribution of the work in any format is prohibited without written permission of the copyright owner. | en_US
dc.rights | Access is restricted to CityU users. | en_US
dc.title | English document analysis and visualization | en_US
dc.contributor.department | Department of Electronic Engineering | en_US
dc.description.supervisor | Supervisor: Prof. Chow, Tommy W S; Assessor: Prof. Chen, Guanrong | en_US
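
The three stages named in the abstract (n-gram representation, chi-square feature selection with a classifier, and F1 evaluation) can be illustrated with a minimal Python sketch. This is not the project's code, which is not available in this record. Everything below is an assumption made for illustration: the function names, the use of word-level rather than character-level n-grams, the 2x2 form of the chi-square statistic, and Jaccard overlap as the KNN similarity measure.

```python
from collections import Counter


def word_ngrams(text, n=2):
    """Return the word-level n-grams of `text` as a list of tuples (assumed word-level)."""
    tokens = text.lower().split()
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def chi_square(feature, docs, labels, target):
    """Chi-square statistic of `feature` against class `target`.

    `docs` is a list of sets of features, `labels` the matching class labels.
    Uses the standard 2x2 contingency table (feature present/absent vs. in/out of class).
    """
    a = b = c = d = 0
    for feats, label in zip(docs, labels):
        present = feature in feats
        if present and label == target:
            a += 1          # feature present, document in class
        elif present:
            b += 1          # feature present, document not in class
        elif label == target:
            c += 1          # feature absent, document in class
        else:
            d += 1          # feature absent, document not in class
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return n * (a * d - b * c) ** 2 / denom if denom else 0.0


def knn_predict(query, docs, labels, k=3):
    """Majority vote among the k training documents most similar to `query`.

    Similarity is Jaccard overlap of feature sets (an assumption for this sketch).
    """
    def jaccard(x, y):
        union = x | y
        return len(x & y) / len(union) if union else 0.0

    ranked = sorted(zip(docs, labels), key=lambda p: jaccard(query, p[0]), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


def f1_score(y_true, y_pred, positive):
    """F1 score (harmonic mean of precision and recall) for one class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0
```

In a pipeline of this shape, each document would be converted to a set of n-grams, only the n-grams with the highest chi-square scores would be kept, and the classifier's predictions on held-out documents would be scored with `f1_score`. A Naive Bayes classifier could take the place of `knn_predict` without changing the other stages.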
Appears in Collections: Electrical Engineering - Undergraduate Final Year Projects

Files in This Item:
File | Size | Format
fulltext.html | 146 B | HTML


Items in Digital CityU Collections are protected by copyright, with all rights reserved, unless otherwise indicated.
