2001: P. Wiggers

title:	Hidden Markov models for automatic speech recognition and their multimodal applications
author:	Pascal Wiggers
published in:	August 2001
appeared as:	Master of Science thesis Delft University of Technology
pages:	103
	PDF (726 KB)

Abstract

State-of-the-art automatic speech recognition systems reach acceptable levels of performance when used under laboratory conditions. In more realistic, noisy environments, however, their performance degrades rapidly. A possible solution to this problem lies in the use of multiple modalities for speech recognition: the audio signal is augmented by, for example, lipreading signals or information on facial expressions. An open question is how, and at what point during the recognition process, to integrate multiple modalities in a speech recognizer. This report describes the development of a large-vocabulary, speaker-independent continuous speech recognizer for the Dutch language using the Hidden Markov Model Toolkit (HTK) and the Polyphone database of recorded Dutch speech. This recognizer can be used as a starting point for a multimodal recognizer based on multimodal Hidden Markov models. Furthermore, a number of models for multimodal recognition are presented, and a number of experiments on incorporating other modalities into the speech recognizer are described.