|Title: ||Learning algorithms producing sparse approximations|
|Other Titles: ||Chan sheng xi shu bi jin de xue xi suan fa|
|Authors: ||Guo, Xin (郭昕)|
|Department: ||Department of Mathematics|
|Degree: ||Doctor of Philosophy|
|Issue Date: ||2011|
|Publisher: ||City University of Hong Kong|
|Subjects: ||Machine learning.|
|Notes: ||CityU Call Number: Q325.5 .G86 2011|
v, 90 leaves ; 30 cm.
Thesis (Ph.D.)--City University of Hong Kong, 2011.
Includes bibliographical references (leaves 81-89)
|Abstract: ||A class of learning algorithms for regression is studied. They are modified kernel projection machines in a least squares ℓq-regularization scheme with 0 < q ≤ 1, on a data-dependent hypothesis space spanned by empirical features (constructed from a reproducing kernel and the learning data). The algorithms have three advantages. First, they do not involve any high-dimensional optimization process; this reduces the computational complexity and also makes it easy to adjust the regularization parameter by, e.g., cross-validation. Second, they produce sparse representations with respect to empirical features under a mild condition, without assuming sparsity of the regression function in terms of any basis or system. Third, the output function converges to the regression function in the reproducing kernel Hilbert space at a satisfactory rate which is stated explicitly.
We analyze the algorithm with the ℓ1 penalty first. Our analysis shows that, while enjoying sparsity, the output function converges to the underlying regression function. The convergence rate O(m^(ε−1/2)) is obtained in two different cases, where the eigenvalues of the integral operator generated by the Mercer kernel are assumed to decay polynomially and exponentially, respectively.
We then study the algorithm with the general ℓq penalty where 0 < q ≤ 1. Our goal is to understand the influence of the regularization index q on both the learning rate and the sparsity. Our analysis suggests that as q decreases to zero, the sparsity increases while the approximating ability is weakened.
We also study the algorithm in a special probability model where the noise is assumed to be independent of the sampling point. The goal is to obtain better learning rates as well as sparsity. Our analysis shows that in this model, the sparsity is independent of the index q, while the power exponent of the learning rate takes the form 1/2 + O(1/r) when r is large, and the index q appears only in the O(1/r) term.|
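The first two advantages in the abstract — componentwise solvability over empirical features and sparsity via a penalty — can be sketched in a few lines. The sketch below is an illustrative reconstruction, not the thesis's exact algorithm: the Gaussian kernel, the function name `sparse_kpm_l1`, and the choice q = 1 (so that the penalized least squares problem decouples into scalar soft-thresholding steps) are all assumptions made for the example.

```python
import numpy as np

def gaussian_kernel(X, Z, sigma=1.0):
    # An example Mercer kernel (Gaussian), chosen here purely for illustration.
    d2 = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * sigma**2))

def soft_threshold(a, t):
    # Shrink a toward zero by t; the closed-form scalar l1 minimizer.
    return np.sign(a) * np.maximum(np.abs(a) - t, 0.0)

def sparse_kpm_l1(X, y, lam=0.05, sigma=1.0):
    """Sketch of an l1-regularized kernel projection machine:
    project y onto the empirical eigenvectors of the normalized Gram
    matrix and soft-threshold each projection coefficient.  Because the
    empirical features are orthonormal, the penalized least squares
    problem decouples coordinate by coordinate, so no high-dimensional
    optimization is needed -- the first advantage cited in the abstract."""
    m = len(y)
    K = gaussian_kernel(X, X, sigma)
    _, U = np.linalg.eigh(K / m)       # empirical features (orthonormal columns)
    a = U.T @ y                        # projection coefficients of the data
    # Componentwise minimizer of (1/m)||y - U c||^2 + lam * ||c||_1:
    c = soft_threshold(a, lam * m / 2.0)
    y_hat = U @ c                      # fitted values at the sample points
    return c, y_hat
```

With a large enough `lam` every coefficient is thresholded to zero, while `lam = 0` recovers plain projection; intermediate values keep only the few empirical features whose projection coefficients exceed the threshold, which is the sparsity mechanism the abstract describes.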
|Online Catalog Link: ||http://lib.cityu.edu.hk/record=b4086783|
|Appears in Collections:||MA - Doctor of Philosophy |
Items in CityU IR are protected by copyright, with all rights reserved, unless otherwise indicated.