City University of Hong Kong
DSpace

Please use this identifier to cite or link to this item: http://hdl.handle.net/2031/3922

Title: Subspace based multi-model and non-uniform spectral compression for noisy speech recognition
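Other Titles: Ji yu cao za huan jing xia yu yin shi bie de zi kong jian sheng cheng duo mo xing jie gou he fei jun yun pin pu ya suo
基於嘈雜環境下語音識别的子空間生成多模型結構和非均勻頻譜壓縮 (Chinese title; literally "Subspace-generated multi-model structure and non-uniform spectral compression for speech recognition in noisy environments")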
Authors: Yip, Chun Shing (葉俊成)
Department: Dept. of Electronic Engineering
Degree: Master of Philosophy
Issue Date: 2002
Publisher: City University of Hong Kong
Subjects: Automatic speech recognition
Speech processing systems
Notes: CityU Call Number: TK7882.S65 Y57 2002
Includes bibliographical references (leaves 118-122)
Thesis (M.Phil.)--City University of Hong Kong, 2002
ix, 123 leaves : ill. ; 30 cm.
Type: Thesis
Abstract: Speech recognition in a noisy environment remains a difficult task. To cope with this difficulty, recent research in noisy speech recognition has focused mainly on three areas: i) speech enhancement, ii) robust feature extraction and iii) model compensation. In all three, much effort has been concentrated on reducing the mismatch between training and testing patterns. This thesis studies a multi-model method based on a subspace approach to reduce the mismatch in pattern comparison. It also introduces a DFT-based feature extraction method with non-uniform spectral compression, which aims to yield features that are more tolerant to noise contamination and thereby achieve robust speech recognition.

The multi-model method based on the subspace approach (MMSA) is developed to reduce the mismatch between clean and noisy data in pattern comparison. In the subspace approach, the vector space is divided into a signal space and a noise space via singular value decomposition (SVD), from which several reduced-rank representations are extracted. Each representation is used to train a model by the maximum likelihood method, so each word is associated with multiple sub-models (a multi-model). In recognition, the corresponding reduced-rank representations are used to evaluate the sub-models of each multi-model. Experimental results show that MMSA achieves better recognition performance than its single-model counterpart. The performance of the multi-model can be further enhanced by combination weights, which combine the sub-models associated with each word. The minimum classification error (MCE) criterion is used to train the combination weights for different environments. Experimental results show that a recognizer with a few sets of combination weights and a simple SNR determination method can be applied over a wide range of SNRs for robust speech recognition.
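The MMSA pipeline described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the thesis's implementation: it assumes the SVD is applied to a per-utterance matrix of feature vectors (frames by coefficients), that the signal space is spanned by the leading singular vectors, that each sub-model returns a log-likelihood, and that each word has one weight per sub-model. All function names are hypothetical; the MCE step uses the winner-take-all (best-competitor) form of the misclassification measure, and the nearest-SNR rule in `select_weight_set` stands in for the thesis's unspecified SNR determination method.

```python
import math

import numpy as np


def reduced_rank_representations(features, ranks):
    """Return {r: rank-r approximation of `features`} for each r in `ranks`.

    Assumption: `features` is a (num_frames, dim) matrix for one utterance; the
    r leading singular vectors span the signal space, the rest the noise space.
    """
    u, s, vt = np.linalg.svd(features, full_matrices=False)
    # Keep the r largest singular values and drop the noise-space components.
    return {r: (u[:, :r] * s[:r]) @ vt[:r, :] for r in ranks}


def combined_score(sub_scores, weights):
    """Weighted sum of the sub-model log-likelihoods of one word."""
    return float(np.dot(weights, sub_scores))


def recognize(scores, weights):
    """Return the word with the highest weighted multi-model score.

    scores[word]: sub-model log-likelihoods of the test utterance (one per rank)
    weights[word]: combination weights for that word (hypothetical layout)
    """
    return max(scores, key=lambda w: combined_score(scores[w], weights[w]))


def mce_update(scores, weights, correct, lr=0.1, gamma=1.0):
    """One MCE gradient-descent step on the combination weights.

    Misclassification measure d = g_rival - g_correct, where g_rival is the
    best competing word; loss = sigmoid(gamma * d). Updates `weights` in place
    and returns the loss before the update. No non-negativity or
    sum-to-one constraint is enforced here.
    """
    def g(word):
        return combined_score(scores[word], weights[word])

    rival = max((w for w in scores if w != correct), key=g)
    d = g(rival) - g(correct)
    loss = 0.5 * (1.0 + math.tanh(0.5 * gamma * d))  # overflow-free sigmoid
    slope = gamma * loss * (1.0 - loss)              # d(loss)/d(d)
    # d(d)/d(w_correct) = -scores[correct]; d(d)/d(w_rival) = +scores[rival]
    weights[correct] = weights[correct] + lr * slope * np.asarray(scores[correct])
    weights[rival] = weights[rival] - lr * slope * np.asarray(scores[rival])
    return loss


def select_weight_set(snr_db, weight_sets):
    """Pick the weight set trained at the nominal SNR closest to the estimate."""
    return weight_sets[min(weight_sets, key=lambda s: abs(s - snr_db))]


if __name__ == "__main__":
    scores = {"one": np.array([-10.0, -12.0]), "two": np.array([-11.0, -9.0])}
    weights = {"one": np.array([0.5, 0.5]), "two": np.array([0.5, 0.5])}
    print(recognize(scores, weights))  # two
```

Each MCE step raises the correct word's weighted score and lowers the best rival's, which widens the margin that the sigmoid loss penalizes; a practical system would also constrain the weights, which this sketch leaves out.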
A DFT-based feature extraction method with non-uniform spectral compression (NSC) is developed that makes the resulting cepstral features more tolerant to noise contamination. It generalizes the conventional DFT-based feature extraction method by allowing different DFT components of segmented speech to have different compressions. With a proper choice of compressions, the proposed method provides significant improvement in recognition performance over its conventional counterpart. An extended version, frame-dependent non-uniform spectral compression (FD-NSC), is also introduced; it allows each frame to have its own non-uniform spectral compression values in order to cope with the dynamic variation across frames. Working together with a nonlinear spectral subtraction (NSS) front-end, the proposed NSC and FD-NSC methods achieve good recognition rates even at very low SNRs. The proposed techniques are compared with several benchmark algorithms using the isolated-word set of TIDigit, with the NOISEX-92 dataset used to simulate the noisy environment. The simulation results show that the proposed techniques provide significant improvement in recognition accuracy over the benchmark algorithms.
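The idea of giving each DFT component its own compression can be illustrated with the sketch below. It is an assumption-laden approximation, not the thesis's exact front-end: it assumes power-law (root) compression of each DFT magnitude bin followed by a DCT to obtain cepstra, and it omits any filterbank and the NSS front-end. The function names and exponent values are hypothetical. Passing a per-frame matrix of exponents mimics the FD-NSC variant; how the thesis chooses those per-frame values is not stated in this abstract.

```python
import numpy as np


def dct_ii(x):
    """Orthonormal DCT-II along the last axis (written out to avoid SciPy)."""
    n = x.shape[-1]
    k = np.arange(n)[:, None]
    m = np.arange(n)[None, :]
    basis = np.cos(np.pi * k * (2 * m + 1) / (2 * n))  # row k: k-th cosine
    scale = np.full(n, np.sqrt(2.0 / n))
    scale[0] = np.sqrt(1.0 / n)
    return (x @ basis.T) * scale


def nsc_cepstra(frames, exponents, num_ceps=13):
    """Cepstral features with per-bin (non-uniform) power-law compression.

    frames:    (num_frames, frame_len) windowed speech frames
    exponents: (num_bins,) shared by all frames (NSC-style), or
               (num_frames, num_bins) with one row per frame (FD-NSC-style);
               num_bins = frame_len // 2 + 1
    """
    mag = np.abs(np.fft.rfft(frames, axis=-1))  # DFT magnitude per frame
    compressed = mag ** np.asarray(exponents)   # |X_t(k)| ** gamma_k (or gamma_{t,k})
    return dct_ii(compressed)[:, :num_ceps]     # decorrelate, keep low quefrencies


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    frames = rng.standard_normal((4, 256)) * np.hamming(256)
    n_bins = 256 // 2 + 1
    gammas = np.linspace(0.1, 0.3, n_bins)  # illustrative values only
    print(nsc_cepstra(frames, gammas).shape)  # (4, 13)
```

With one exponent shared by every bin, the sketch reduces to uniform root compression; as that exponent tends to zero, (x^γ − 1)/γ approaches ln x, which links this family to the conventional log-spectrum cepstrum.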
Online Catalog Link: http://lib.cityu.edu.hk/record=b1761200
Appears in Collections:EE - Master of Philosophy

Files in This Item:

File           Description   Size    Format
fulltext.html                159 B   HTML
abstract.html                159 B   HTML

Items in CityU IR are protected by copyright, with all rights reserved, unless otherwise indicated.
