Title: Hidden Markov models for automatic speech recognition and their multimodal applications
Published in: August 2001
Master of Science thesis
Delft University of Technology
State-of-the-art automatic speech recognition systems reach acceptable levels of performance when used under laboratory conditions. In more realistic, noisy environments, however, their performance degrades rapidly. A possible solution to this problem lies in the use of multiple modalities for speech recognition: the audio signal is augmented by, for example, lipreading signals or information on facial expressions. An open question is how, and at what point during the recognition process, to integrate multiple modalities in a speech recognizer. This report describes the development of a large-vocabulary, speaker-independent, continuous speech recognizer for the Dutch language using the Hidden Markov Model Toolkit (HTK) and the Polyphone database of recorded Dutch speech. This recognizer can be used as a starting point for a multimodal recognizer based on multimodal hidden Markov models. Furthermore, a number of models for multimodal recognition are presented, and a number of experiments on the incorporation of other modalities into the speech recognizer are described.