City University of Hong Kong



Title: Feature extraction based on perceptual non-uniform spectral compression for noisy speech recognition
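Other Titles: Ji yu ting jue gan zhi fei jun yun pin pu ya suo de kang zao yu yin shi bie de te zheng chou qu fang fa (romanized Chinese title; in English, roughly: "A feature extraction method for noise-robust speech recognition based on auditory-perceptual non-uniform spectral compression")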
Authors: Chu, Kam Keung (朱錦強)
Department: Dept. of Electronic Engineering
Degree: Master of Philosophy
Issue Date: 2005
Publisher: City University of Hong Kong
Subjects: Automatic speech recognition
Speech processing systems
Notes: 148 leaves : ill. ; 30 cm.
CityU Call Number: TK7882.S65 C47 2005
Includes bibliographical references (leaves 143-147)
Thesis (M.Phil.)--City University of Hong Kong, 2005
Type: Thesis
Abstract: This dissertation details the development of a robust speech feature extraction technique called spectral compression and its relation to human loudness perception. Spectral compression aims to reduce the mismatch between reference models trained on clean speech and test patterns derived from noisy speech. Put simply, it compresses each component of the speech power spectrum by raising it to a root (i.e. a positive exponent smaller than one). Frame-based DFT speech analysis is adopted in this study to investigate the robustness of the technique. In the literature, constant (uniform) spectral compression has already been used in Root Cepstral Analysis (RCA) and in the Perceptual Linear Prediction (PLP) speech analysis method. In the first part of the thesis, a compression function is proposed in which each spectral component has its own root, instead of a single constant root for the whole spectrum. We call this approach Non-uniform Spectral Compression (NSC); its compression function is a decaying exponential curve. Experimental results show that commonly used feature extraction methods, such as MFCC, PLP and LPC, improve significantly under white, pink and factory noise when combined with the NSC, compared with the conventional uncompressed and constant-root approaches. Interestingly, the human auditory system applies a similar intensity compression, approximated in psychophysics by the “power law of hearing”, to convert physical sound intensity into perceived loudness. Psychoacoustic studies have shown that the compression exponent in the power law of hearing is smaller for a broadband signal than for a narrowband 1 kHz tone. This observation motivates incorporating human perception into the NSC scheme and leads to the development of Perceptual Non-uniform Spectral Compression (PNSC).
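A minimal NumPy sketch of the compression step described above may help make it concrete. It computes a frame-based DFT power spectrum and then applies either a constant root (the cube root, 1/3, is the value used in PLP) or a separate root for each spectral component. The abstract states only that NSC uses a decaying exponential compression function; the form used here, in which each bin's root decays exponentially with its normalised power between `alpha_min` and `alpha_max`, is an assumption for illustration, as are all function and parameter names.

```python
import numpy as np


def power_spectrum(frame, n_fft=512):
    """Frame-based DFT analysis: Hamming-windowed power spectrum of one frame."""
    windowed = frame * np.hamming(len(frame))
    spectrum = np.fft.rfft(windowed, n=n_fft)  # n_fft // 2 + 1 bins
    return np.abs(spectrum) ** 2


def uniform_compression(power, alpha=1.0 / 3.0):
    """Constant-root compression: every bin uses the same exponent alpha."""
    return power ** alpha


def nsc_exponents(power, alpha_min=0.1, alpha_max=1.0, beta=5.0):
    """Hypothetical NSC roots: one exponent per bin, decaying exponentially
    with the bin's power normalised to [0, 1] (assumed form, not the thesis's)."""
    x = power / (power.max() + 1e-12)
    return alpha_min + (alpha_max - alpha_min) * np.exp(-beta * x)


def nsc_compression(power, **kwargs):
    """Non-uniform compression: raise each bin to its own root."""
    return power ** nsc_exponents(power, **kwargs)


# Usage: compress the spectrum of one 400-sample frame of synthetic noise.
rng = np.random.default_rng(0)
frame = rng.standard_normal(400)
p = power_spectrum(frame)
compressed_uniform = uniform_compression(p)
compressed_nsc = nsc_compression(p)
```

In a full front end the compressed spectrum would then feed the usual filterbank and cepstral stages of MFCC, PLP or LPC analysis, which are not shown here.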
In the PNSC scheme, sound segments and speech components with a large bandwidth are given a small exponent, i.e. a large degree of compression, whereas narrowband signals are compressed only slightly and are therefore largely retained for recognition. The PNSC yields substantial improvement over the NSC scheme under white, pink and factory noise. Under babble and Volvo (car) noise, however, the improvement of both the NSC and the PNSC over conventional features is less significant. Because both schemes use a compression function based on a decaying exponential, they may be ineffective when the noise is speech-like or coloured. To address this problem, a radically different scheme called SNR-dependent Non-uniform Spectral Compression (SNSC) is proposed. In the SNSC scheme, speech components corrupted by noise are de-emphasized according to their estimated SNR. Since different noise types have their own spectral characteristics and corrupt different regions of the speech spectrum, an SNR-dependent compression scheme is well suited to handling a variety of noise types. Moreover, the principle of SNSC is supported by psychoacoustic evidence that background noise partially masks a sound, with the SNR relative to the background determining its perceived loudness. Simulation results show that the SNSC further improves recognition accuracy over the PNSC and copes with different noise types.
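The SNR-dependent idea above can be sketched in the same style. This version assumes the first few frames of an utterance are noise-only and averages them to estimate the noise power in each bin, estimates each component's SNR by power subtraction, and maps low-SNR components to a smaller root (heavier compression) through a logistic curve. The noise estimator, the logistic mapping, and every function name and parameter value are hypothetical choices for illustration, not the method described in the thesis.

```python
import numpy as np


def estimate_noise(power_frames, n_noise_frames=10):
    """Assumption: the first n_noise_frames frames contain noise only."""
    return power_frames[:n_noise_frames].mean(axis=0)


def snsc_exponents(power_frames, noise_power, alpha_min=0.1, alpha_max=0.5,
                   snr_mid_db=5.0, slope=0.5):
    """Hypothetical SNSC roots: a per-bin, per-frame exponent driven by SNR."""
    eps = 1e-12
    # Power subtraction gives a rough clean-speech estimate for each bin.
    speech = np.maximum(power_frames - noise_power, eps)
    snr_db = 10.0 * np.log10(speech / (noise_power + eps))
    # Logistic map: low SNR -> alpha_min (strong compression),
    # high SNR -> alpha_max (mild compression).
    weight = 1.0 / (1.0 + np.exp(-slope * (snr_db - snr_mid_db)))
    return alpha_min + (alpha_max - alpha_min) * weight


def snsc_compression(power_frames, noise_power, **kwargs):
    """For powers above 1, a smaller root shrinks the value more, which is
    one way to de-emphasize unreliable (low-SNR) components."""
    return power_frames ** snsc_exponents(power_frames, noise_power, **kwargs)


# Usage: 50 frames x 257 bins of synthetic noise, with strong "speech"
# energy added to bins 40-59 from frame 30 onward.
rng = np.random.default_rng(1)
frames = rng.exponential(1.0, size=(50, 257))
frames[30:, 40:60] += 1000.0
noise = estimate_noise(frames)
roots = snsc_exponents(frames, noise)
compressed = snsc_compression(frames, noise)
```

Because the root varies with the estimated SNR rather than with a fixed curve, the same code applies whichever spectral regions a particular noise type happens to corrupt, which is the motivation the abstract gives for SNSC.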
Appears in Collections: EE - Master of Philosophy

Items in CityU IR are protected by copyright, with all rights reserved, unless otherwise indicated.

