City University of Hong Kong
DSpace
 

Please use this identifier to cite or link to this item: http://hdl.handle.net/2031/4729

Title: The development of an automatic real-time lipreading system
Other Titles: Zi dong hua shi shi chun yu shi bie xi tong de kai fa
自動化實時唇語識別系統的開發
Authors: Wang, Shilin (王士林)
Department: Department of Computer Engineering and Information Technology
Degree: Doctor of Philosophy
Issue Date: 2004
Subjects: Automatic speech recognition
Lipreading -- Computer simulation
Notes: xi, 134 leaves : ill. (some col.) ; 30 cm.
Thesis (Ph.D.)--City University of Hong Kong, 2004
Includes bibliographical references (leaves 120-127)
CityU Call Number: TK7895.S65 W36 2004
Type: Thesis
Abstract: Using visual information in automatic speech recognition has attracted considerable interest in recent years because visual cues help enhance the robustness of the system. This thesis presents a study on developing a real-time system for automatic speech recognition based solely on the visual cues from lip shape and movement. Lip segmentation, lip modeling, visual feature extraction and recognition are the major components of the system, and new algorithms for each of them are presented in this thesis.

For lip segmentation, most widely used methods rely on color or intensity information. However, for lip images with low contrast, these methods do not achieve satisfactory results. In our study, both the color information and the spatial location of each pixel are integrated into a fuzzy clustering framework. The lip image is represented in the CIELAB and CIELUV color spaces, where luminance and chromaticity are separated and the distance between any two points in the color space is proportional to their perceptual color difference. By integrating the spatial information, the proposed algorithm can differentiate pixels that have similar color but lie in different regions. Experimental results show that the proposed algorithm outperforms other lip segmentation techniques, especially for images with low color contrast. An extension of the algorithm has also been developed to handle lip segmentation in the presence of beards, which is regarded as a difficult problem for lip region segmentation.

For lip contour modeling and extraction, accuracy, robustness and efficiency are the primary concerns. A 16-point model is employed to describe the lip contour, and geometric constraints are applied to ensure that the extracted contour is physically meaningful. Based on the membership distribution derived from the lip segmentation procedure, a region-based cost function is defined, which is much more robust than edge-based and intensity-based cost functions. A point-driven optimization procedure with fast implementation techniques is used for model fitting, so the lip contour can be obtained efficiently.

A visual feature set containing geometric parameters, lip shape descriptors and inner-mouth information is derived from the lip model for visual speech recognition. A spline representation is employed to translate the discretely sampled visual features into the continuous domain. The spline coefficients within the same word class are constrained to share the same mean and covariance matrix, which are estimated from the training data by the expectation-maximization (EM) algorithm. For the speaker-independent recognition task, a multi-model approach is proposed to cope with the large variation across speakers. Compared with the hidden Markov model (HMM), the proposed method gives better results, especially when only limited training data are available.

An automatic lipreading system has been implemented and runs on a 1.9 GHz PC. Accuracies of 96% for speaker-dependent recognition and 88% for speaker-independent recognition have been achieved. With the efficient implementation of all the algorithms, the system can process images at a rate higher than 25 frames/sec, leaving room for additional tasks in real-time applications.
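The abstract outlines two computational steps, color-plus-spatial fuzzy clustering for lip segmentation and a spline representation of the visual feature trajectories, without giving formulations. The two Python sketches below show how such steps are commonly realized; they are not the thesis' own algorithms, and every function name, parameter value and library choice (NumPy, SciPy, scikit-image) is an assumption made for illustration.

A minimal sketch of fuzzy c-means clustering on CIELAB pixel features with a simple neighborhood-averaging spatial term (the thesis defines its own way of combining the color and spatial information):

# Sketch only: spatially constrained fuzzy c-means on CIELAB pixel
# features.  Parameter values and the neighborhood-averaging spatial
# term are illustrative assumptions, not the thesis algorithm.
import numpy as np
from scipy.ndimage import uniform_filter
from skimage.color import rgb2lab

def spatial_fcm(rgb, n_clusters=2, m=2.0, p=1.0, q=1.0, n_iter=30, eps=1e-8):
    """rgb: float image in [0, 1], shape (H, W, 3).
    Returns fuzzy membership maps of shape (n_clusters, H, W)."""
    height, width, _ = rgb.shape
    x = rgb2lab(rgb).reshape(-1, 3)                 # perceptually uniform color features
    u = np.random.dirichlet(np.ones(n_clusters), size=x.shape[0]).T
    for _ in range(n_iter):
        um = u ** m
        centers = um @ x / (um.sum(axis=1, keepdims=True) + eps)
        d2 = ((x[None, :, :] - centers[:, None, :]) ** 2).sum(-1) + eps
        u = 1.0 / d2 ** (1.0 / (m - 1.0))           # classical FCM membership update
        u /= u.sum(axis=0, keepdims=True)
        # spatial term: average membership over each pixel's 5x5 neighborhood
        spatial = np.stack([uniform_filter(ui.reshape(height, width), size=5).ravel()
                            for ui in u])
        u = u ** p * spatial ** q
        u /= u.sum(axis=0, keepdims=True)
    return u.reshape(n_clusters, height, width)

# Usage: pick the cluster whose membership map covers the mouth region
# and threshold it (e.g. at 0.5) to obtain a binary lip mask.

A minimal sketch of converting a variable-length visual-feature sequence into fixed-length B-spline coefficients and classifying words with one Gaussian model per word class. The thesis estimates its class models with the EM algorithm; a direct sample mean and covariance are used here for brevity:

# Sketch only: fixed-length B-spline coefficient descriptors for the
# per-frame visual features, with one Gaussian model per word class.
import numpy as np
from scipy.interpolate import make_lsq_spline
from scipy.stats import multivariate_normal

def spline_descriptor(features, n_coeff=8, k=3):
    """features: (T, d) array of visual features for one utterance,
    assuming T >= n_coeff frames."""
    T = features.shape[0]
    x = np.linspace(0.0, 1.0, T)                    # normalized time axis
    interior = np.linspace(0.0, 1.0, n_coeff - k + 1)[1:-1]
    knots = np.r_[[0.0] * (k + 1), interior, [1.0] * (k + 1)]
    spline = make_lsq_spline(x, features, knots, k=k)
    return spline.c.ravel()                         # (n_coeff * d,) descriptor

def train(examples):
    """examples: dict mapping word -> list of (T, d) feature sequences."""
    models = {}
    for word, seqs in examples.items():
        descs = np.stack([spline_descriptor(s) for s in seqs])
        cov = np.cov(descs, rowvar=False) + 1e-3 * np.eye(descs.shape[1])
        models[word] = (descs.mean(axis=0), cov)
    return models

def recognise(models, features):
    """Return the word whose Gaussian model gives the highest likelihood."""
    desc = spline_descriptor(features)
    return max(models, key=lambda w: multivariate_normal.logpdf(desc, *models[w]))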
Online Catalog Link: http://lib.cityu.edu.hk/record=b1871339
Appears in Collections:IT - Doctor of Philosophy

Files in This Item:

File            Size    Format
abstract.html   162 B   HTML
fulltext.html   162 B   HTML

Items in CityU IR are protected by copyright, with all rights reserved, unless otherwise indicated.

 
