Please use this identifier to cite or link to this item:
http://hdl.handle.net/2031/6221
|
| Title: | Hybrid machine learning using ensemble-based approaches |
| Other Titles: | Ji yu ji cheng fang fa de hun he ji qi xue xi 基於集成方法的混合機器學習 |
| Authors: | Zhang, Shaohong (張少宏) |
| Department: | Department of Computer Science |
| Degree: | Doctor of Philosophy |
| Issue Date: | 2010 |
| Publisher: | City University of Hong Kong |
| Subjects: | Machine learning. |
| Notes: | CityU Call Number: Q325.5 .Z43 2010. xiv, 183 leaves : ill. ; 30 cm. Thesis (Ph.D.)--City University of Hong Kong, 2010. Includes bibliographical references (leaves 168-183). |
| Type: | thesis |
| Abstract: | Recent advances in data collection methods and feature extraction techniques
have created massive amounts of data of different types. Compared with traditional types
of data, modern data can be more complex owing to significant variations
in size, distribution, and structure, and to the presence of noise and bias. As a
result, applying a single technique is usually not enough to learn the data at hand. In
view of this problem, researchers have begun to investigate hybrid learning techniques based
on information ensembles. Specifically, this class of hybrid learning techniques can
be classified into two main categories: (i) hybrid learning based on different views of
the data and (ii) hybrid learning based on ensembles of different techniques. For the
first category, the data are learnt using an information ensemble drawn from different perspectives,
which might include data features, pairwise relationships between the data objects, and
side label information. Constrained clustering is one of the most representative techniques
in this category, which learns the data in a semi-supervised manner with pairwise
constraints. For the second category, the data are learnt using an ensemble of different
techniques which might come from either similar or different families. Cluster ensembles
and classifier ensembles are two representative examples in this category. In this
thesis, we investigate a number of problems related to the topic of hybrid learning.
Our first contribution is the study of constrained clustering, which belongs to the
first category of hybrid learning. We develop several new constrained clustering algorithms
from different perspectives: (i) we introduce the concept of closure into the
partial constrained clustering framework; (ii) we propose to use multiple representatives
for clusters in the active constrained clustering framework; and (iii) we propose to
integrate different types of constraints into the constrained clustering framework.
Experiments on several benchmark data sets demonstrate the advantages of our algorithms
over competing algorithms.
Our second contribution concerns cluster ensembles, which belong
to the second category of hybrid learning: ensembles of different techniques from the
same family (i.e., unsupervised learning techniques). For this problem, our main contribution
is the generalization of the Adjusted Rand Index (ARI). ARI is a traditional
evaluation measure for comparing two clustering solutions. We generalize ARI into two
new measures that apply to comparisons between (i) a clustering
and a cluster ensemble and (ii) two ensembles. We also prove
the equivalence between ARI and our new measures. Desirable properties of ARI are
preserved in our proposed measures, and new attractive properties are introduced. We
also provide a number of application examples to illustrate the effectiveness of our new
measures.
We have also developed a new hybrid associative retrieval (HAR) framework for 3D
models as a practical application of hybrid learning with ensembles of different techniques
from different families (i.e., both supervised and unsupervised learning techniques). Unlike
in conventional 3D model similarity retrieval, the query model and the
models obtained by hybrid associative retrieval generally belong to different
model classes and have different shape characteristics,
but are semantically related and pre-assembled in a certain associative group. The
main contributions are as follows: (i) to establish the relationships between different 3D
model categories that have semantic associations, we propose three approaches based
on neural network learning; (ii) to obtain a better representation of the different classes
of 3D models, we integrate a number of different cluster validity indices to determine
the number of subclasses for each class; and (iii) to obtain better retrieval results, we combine
multiple retrieval results into a final result. Experiments using different data sets
demonstrate the effectiveness of our proposed framework on the new hybrid associative
retrieval task. |
| Online Catalog Link: | http://lib.cityu.edu.hk/record=b3947823 |
| Appears in Collections: | CS - Doctor of Philosophy
|
Items in CityU IR are protected by copyright, with all rights reserved, unless otherwise indicated.
|