City University of Hong Kong


Title: Hybrid machine learning using ensemble-based approaches
Other Titles: Ji yu ji cheng fang fa de hun he ji qi xue xi
Authors: Zhang, Shaohong (張少宏)
Department: Department of Computer Science
Degree: Doctor of Philosophy
Issue Date: 2010
Publisher: City University of Hong Kong
Subjects: Machine learning.
Notes: CityU Call Number: Q325.5 .Z43 2010
xiv, 183 leaves : ill. ; 30 cm.
Thesis (Ph.D.)--City University of Hong Kong, 2010.
Includes bibliographical references (leaves 168-183)
Type: thesis
Abstract: Recent advances in data collection methods and feature extraction techniques have produced massive amounts of data of different types. Compared with traditional data, modern data can be more complex because of significant variations in size, distribution, and structure, and because of the presence of noise and bias. As a result, a single technique is usually not enough to learn from the data at hand. In view of this problem, researchers have begun to investigate hybrid learning techniques based on information ensembles. This class of hybrid learning techniques can be divided into two main categories: (i) hybrid learning based on different views of the data and (ii) hybrid learning based on ensembles of different techniques. In the first category, the data are learnt using an ensemble of information from different perspectives, which might include data features, pairwise relationships between data objects, and side label information. Constrained clustering, which learns from the data in a semi-supervised manner using pairwise constraints, is one of the most representative techniques in this category. In the second category, the data are learnt using an ensemble of techniques that might come from either similar or different families. Cluster ensembles and classifier ensembles are two representative examples in this category. In this thesis, we investigate a number of problems related to hybrid learning. Our first contribution is a study of constrained clustering, which belongs to the first category of hybrid learning. We develop several new constrained clustering algorithms from different perspectives: (i) we introduce the concept of closure into the partial constrained clustering framework; (ii) we propose using multiple representatives for clusters in the active constrained clustering framework; and (iii) we propose integrating different types of constraints into the constrained clustering framework.
Experiments on several benchmark data sets demonstrate the advantages of our algorithms over competing algorithms. Our second contribution is a study of cluster ensembles, which belong to the second category of hybrid learning: ensembles of different techniques from the same family (i.e., unsupervised learning techniques). For this problem, our main contribution is a generalization of the Adjusted Rand Index (ARI), a traditional evaluation measure for comparing two clustering solutions. We generalize ARI into two new measures that apply to the comparison between (i) a clustering and a cluster ensemble and (ii) two ensembles. We also prove the equivalence between ARI and our new measures. The desirable properties of ARI are preserved in our proposed measures, and new attractive properties are introduced. We also provide a number of application examples to illustrate the effectiveness of our new measures. We have also developed a new hybrid associative retrieval (HAR) framework for 3D models as a practical application of hybrid learning with ensembles of different techniques from different families (i.e., both supervised and unsupervised learning techniques). Unlike conventional 3D model similarity retrieval, the query model and the models returned by 3D model hybrid associative retrieval generally belong to different model classes and have different shape characteristics, but they are semantically related and pre-assembled in a certain associative group.
The main contributions of the HAR framework are as follows: (i) to establish the relationships between different 3D model categories that have semantic associations, we propose three approaches based on neural network learning; (ii) to obtain a better representation of the different classes of 3D models, we integrate a number of cluster validity indices to determine the number of subclasses for each class; and (iii) to obtain better retrieval results, we combine multiple retrieval results into a final result. Experiments on different data sets demonstrate the effectiveness of our proposed framework on the new hybrid associative retrieval task.
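The abstract describes ARI as a traditional measure for comparing two clustering solutions. As a point of reference only, the following is a minimal sketch of the standard pair-counting ARI for two flat clusterings. It does not reproduce the generalized ensemble measures proposed in the thesis, and the function name `adjusted_rand_index` and the example labelings are assumptions chosen for illustration.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Standard ARI between two flat clusterings of the same objects.

    Illustrative helper (hypothetical name); not the thesis's generalized measures.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("both labelings must cover the same objects")

    total_pairs = comb(len(labels_a), 2)
    if total_pairs == 0:
        # Fewer than two objects: no pairs to compare, treat as full agreement.
        return 1.0

    # Pairs placed together in both clusterings (contingency-table cells).
    together_both = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    # Pairs placed together in each clustering separately (row and column sums).
    together_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    together_b = sum(comb(c, 2) for c in Counter(labels_b).values())

    expected = together_a * together_b / total_pairs  # agreement expected by chance
    maximum = (together_a + together_b) / 2           # largest attainable agreement
    if maximum == expected:
        # Both clusterings are trivial (one cluster, or all singletons).
        return 1.0
    return (together_both - expected) / (maximum - expected)


# The same partition under different label names scores 1.0.
print(adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0]))
```

The correction for chance is what separates ARI from the plain Rand index: the expected agreement of random partitions with the same cluster sizes is subtracted, so unrelated clusterings score near zero and can score below it.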
Appears in Collections: CS - Doctor of Philosophy


Items in CityU IR are protected by copyright, with all rights reserved, unless otherwise indicated.

