City University of Hong Kong

Please use this identifier to cite or link to this item: http://hdl.handle.net/2031/619

Title: Data conversion from numerical to nominal data for classification and clustering
Authors: Chow, Man Chung
Department: Department of Electronic Engineering
Issue Date: 2005
Supervisor: Prof. Chow, Tommy W S. Assessor: Dr. Tang, K S
Abstract: Real-world classification tasks involve different types of attributes, such as categorical, nominal and numerical data. Classifiers can handle categorical and nominal values, but not all of them can handle numerical data. A classifier that does handle numerical data performs a discretization before running any classification task; decision trees are one representative example. Decision trees play an important role in classification tasks and are efficient. In this project, I implemented three heuristic discretization methods that aim to increase the classification accuracy of decision trees, and evaluated them empirically on more than twenty datasets. All experiments were conducted in the same computational environment. “Weka”, a popular and efficient machine learning tool, was used as a benchmark to measure the classification accuracy of different algorithms, including on the non-discretized numerical datasets. The results show that classification accuracy is retained or improved after discretization is applied. In addition, the proposed algorithms not only enhance the efficiency of decision tree classifiers but also increase clustering accuracy. This corroborates my argument that, with the aid of an appropriate discretization method, accuracy can be increased in both supervised and unsupervised classification. To demonstrate the benefits of the proposed algorithms, an “MP3 players” survey, designed to identify and study interesting data such as customer behaviors, was conducted. The classification results for the survey data indicate that accuracy improved after the developed discretization method was applied. Thus, it is believed that the proposed methods are applicable to many real-life problems.
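Illustration: The abstract does not describe the three heuristic methods themselves, so the following is only a minimal sketch of the general idea of converting a numerical attribute into a nominal one. It uses plain equal-width binning, a generic textbook technique and not the project's algorithm. The function names, the sample `ages` values and the labels are all hypothetical.

```python
from bisect import bisect_right


def equal_width_cut_points(values, n_bins):
    """Return the n_bins - 1 interior cut points that split [min, max] evenly."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    return [lo + i * width for i in range(1, n_bins)]


def discretize(values, cut_points, labels):
    """Map each numerical value to the nominal label of its interval.

    A value equal to a cut point is assigned to the interval above it.
    """
    return [labels[bisect_right(cut_points, v)] for v in values]


# Hypothetical numerical attribute (not taken from the project's datasets).
ages = [18, 22, 25, 31, 40, 47, 52, 60]
cuts = equal_width_cut_points(ages, 3)
nominal = discretize(ages, cuts, ["low", "mid", "high"])
```

After this step each value carries a nominal label, so the attribute can be passed to a classifier that accepts only nominal inputs. A supervised heuristic would instead choose the cut points using the class labels.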
Appears in Collections:Electronic Engineering - Undergraduate Final Year Projects

Items in CityU IR are protected by copyright, with all rights reserved, unless otherwise indicated.

 
