City University of Hong Kong

Title: A study on clustering technique in data mining applications
Other Titles: Ju lei ji shu zai shu ju fa jue ying yong yan jiu
Authors: Ma, Wai Man (馬慧敏)
Department: Dept. of Electronic Engineering
Degree: Doctor of Philosophy
Issue Date: 2005
Publisher: City University of Hong Kong
Subjects: Cluster analysis
Data mining
Notes: CityU Call Number: QA76.9.D343 M3 2005
Includes bibliographical references (leaves 178-188)
Thesis (Ph.D.)--City University of Hong Kong, 2005
xv, 198 leaves : ill. ; 30 cm. + 1 CD-ROM (4 3/4 in.)
Type: Thesis
Abstract: This thesis studies three areas of clustering, each supporting a different data type. Over roughly four decades, clustering has developed from a branch of statistical analysis into an independent research discipline. Because it is an unsupervised learning technique, clustering is widely applied in data mining, pattern recognition, and other engineering applications. Diverse clustering algorithms have recently been proposed to handle very different data types, ranging from numerical to nominal data. Although numerical clustering algorithms are numerous, only a handful are free of user intervention. Moreover, user-intervention-free algorithms normally process large datasets, which is computationally demanding. Developing an efficient user-intervention-free clustering algorithm is therefore important. This thesis proposes an efficient, user-intervention-free, grid-based clustering algorithm built on the concept of a shifted grid. Clustering is a data-driven process, and each cluster description depicts the data structure from a different perspective. There may be a number of appropriate descriptions, yet none that is absolutely correct. Cluster validation is therefore required to evaluate the quality of cluster descriptions. The cluster validation indices proposed so far handle numerical or nominal data; none handles transactional data. A transactional cluster validation index is developed to quantify the quality of cluster descriptions and address this problem. The proposed index can evaluate not only transactional cluster descriptions but also nominal ones. Compared with other data mining techniques, clustering can be used both as a stand-alone tool and as a pre-processing tool.
Feature selection is an important pre-processing step that reduces the number of features for further investigation. Most feature selection methods are supervised and require class labels. Most real-world datasets, however, do not come with class labels, so an unsupervised feature selection method is essential. An unsupervised feature selection method for nominal data is therefore proposed in this thesis to fill the gap. In summary, this thesis proposes three methods that tackle different clustering problems. The user-intervention-free shifted grid clustering algorithm handles numerical data efficiently. A transactional validation index that evaluates transactional and nominal cluster descriptions is put forth. Finally, an unsupervised nominal feature selection scheme that ranks the relevance of features is developed.
Appears in Collections: EE - Doctor of Philosophy


Items in CityU IR are protected by copyright, with all rights reserved, unless otherwise indicated.
