Run Run Shaw Library, City University of Hong Kong

Please use this identifier to cite or link to this item: http://dspace.cityu.edu.hk/handle/2031/537
Title: Document clustering in email client system
Authors: Chang, Matthew Chor Ming
Department: Department of Computer Science
Issue Date: 2003
Supervisor: Dr. C.K. Poon. First Reader: Dr. Y.T. Yu. Second Reader: Prof. Horace IP
Abstract: In this project, I study the problem of email clustering. I have implemented an email client system, the CEMail system, for developing and testing an email clustering algorithm that gives special consideration to the characteristics of email. I first identified the characteristics that distinguish email from the other kinds of documents handled in document clustering. These characteristics help in designing an algorithm that classifies emails with a high accuracy rate. I then apply the notion of resemblance to measure the similarity between emails. The emails are then classified using the k-nearest neighbor classification model. I show the efficiency of the approach in terms of its supervised learning rate, a comparison with the Naïve Bayesian classification method, the elimination of heavy pre-processing, and the high accuracy rate achieved by the approach. I have realized the approach by implementing an application called the CEMail system in Java. I have also tested it by clustering a corpus of incoming email received from various companies over a period of time. I found that the accuracy rate of clustering is as high as 95%.
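The abstract's two steps (measure similarity between emails by resemblance, then classify with k-nearest neighbors) can be illustrated with a minimal sketch. It is not the CEMail implementation, which the abstract says was written in Java. The sketch assumes that "resemblance" means the Jaccard overlap of word shingles, which this record does not confirm. The names shingles, resemblance, and knn_classify, the shingle width w, and the value of k are all hypothetical choices made for illustration.

```python
from collections import Counter


def shingles(text, w=3):
    """Return the set of w-word shingles of a text (assumed representation)."""
    words = text.lower().split()
    if not words:
        return set()
    if len(words) < w:
        # Text shorter than one shingle: treat the whole text as a single shingle.
        return {tuple(words)}
    return {tuple(words[i:i + w]) for i in range(len(words) - w + 1)}


def resemblance(a, b, w=3):
    """Jaccard overlap of the two shingle sets, a value in [0, 1].

    Assumption: this is one common definition of resemblance; the thesis
    may define it differently.
    """
    sa, sb = shingles(a, w), shingles(b, w)
    if not sa and not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


def knn_classify(email, labeled, k=3, w=3):
    """Return the majority label among the k labeled emails most similar to `email`.

    `labeled` is a list of (text, label) pairs; it must not be empty.
    """
    ranked = sorted(
        labeled,
        key=lambda item: resemblance(email, item[0], w),
        reverse=True,
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]
```

In this sketch a new email needs no training step beyond storing the labeled examples, and the only pre-processing is lowercasing and splitting on whitespace. Whether the thesis system works the same way cannot be determined from this record.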
Appears in Collections: Computer Science - Undergraduate Final Year Projects

Files in This Item:
File            Size    Format
fulltext.html   164 B   HTML


Items in Digital CityU Collections are protected by copyright, with all rights reserved, unless otherwise indicated.
