1996: Speelman

title:	A Survey of the PE500 Speech Recognition System and the Development of a Benchmark Test
author:	J. Speelman
published in:	1996
appeared as:	Master of Science thesis Delft University of Technology
also as:	Alparon report nr. 96-07, Section of Knowledge Based Systems, Faculty of Technical Mathematics and Informatics, Delft University of Technology

Abstract

At the department of Technical Informatics at the Delft University of Technology, research on automatic speech recognition is a rather new topic. To gain good knowledge of speech recognition, several research projects have been started. The project described in this thesis is concerned with research on the Phonetic Engine 500 (PE500) automatic speech recognition system and the development of a general testing environment (benchmark test) for Automatic Speech Recognition systems (ASR systems).

The PE500 system is an American English phoneme-based continuous speech recognition system that converts analogue speech input into so-called phonetic codes, after which the phonetic codes are decoded into text. In this thesis, several experiments have been performed to find out how the phonetic codes should be interpreted and what kind of phonemes the system uses. The experiments also showed that the system is based on phoneme segmentation, which corresponds to a recognition method called Acoustic-Phonetic Recognition. It is also indicated what changes should be made to use the PE500 system for the recognition of speech in a language other than American English. It turned out that these changes could only be made by the manufacturer of the PE500 system.

In the near future, researchers at our department will develop their own continuous ASR systems. This requires a good testing environment, because at certain stages in the development process of an ASR system it is necessary to evaluate the performance of the system. Such an evaluation gives a good understanding of the system, which makes it possible to adapt the system in order to improve its performance.

In this thesis it has been shown that it is only meaningful to test an ASR system if its performance can be compared to the performance of another ASR system tested under the same test conditions. In order to determine how ASR systems can be tested under the same test conditions, all factors that influence the recognition performance were investigated. From this investigation it was concluded that a benchmark test should contain five basic components: a speech database, a dictionary, a vocabulary, a grammar and a scoring algorithm.

The developed benchmark test was based on the TIMIT (American English) speech database. A few command-line programs were implemented to create the dictionary, grammar and vocabulary from the data contained in the TIMIT database. The implemented scoring algorithm was based on an efficient dynamic programming algorithm. The benchmark test was performed on the PE500 system and on the RECNET system (based on a recurrent error propagation neural network); the RECNET system achieved a much better recognition performance than the PE500 system.
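
The abstract does not specify the scoring algorithm beyond its use of dynamic programming. As an illustration only, the sketch below shows one common way such scoring is done: a minimum edit distance alignment between a reference transcription and a recognizer output, counting substitutions, deletions and insertions. The function names `align_counts` and `word_error_rate` and the example word strings are assumptions made for this sketch, not names or data from the thesis.

```python
def align_counts(reference, hypothesis):
    """Return (substitutions, deletions, insertions) of a minimum-error
    alignment between two whitespace-separated word strings."""
    ref = reference.split()
    hyp = hypothesis.split()
    # table[i][j] = (errors, subs, dels, ins) for ref[:i] against hyp[:j]
    table = [[None] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    table[0][0] = (0, 0, 0, 0)
    for i in range(1, len(ref) + 1):
        table[i][0] = (i, 0, i, 0)  # all reference words deleted
    for j in range(1, len(hyp) + 1):
        table[0][j] = (j, 0, 0, j)  # all hypothesis words inserted
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            e, s, d, n = table[i - 1][j - 1]
            if ref[i - 1] == hyp[j - 1]:
                match_or_sub = (e, s, d, n)          # correct word
            else:
                match_or_sub = (e + 1, s + 1, d, n)  # substitution
            e, s, d, n = table[i - 1][j]
            deletion = (e + 1, s, d + 1, n)
            e, s, d, n = table[i][j - 1]
            insertion = (e + 1, s, d, n + 1)
            # Keep the candidate with the fewest total errors.
            table[i][j] = min(match_or_sub, deletion, insertion,
                              key=lambda t: t[0])
    _, subs, dels, ins = table[-1][-1]
    return subs, dels, ins


def word_error_rate(reference, hypothesis):
    """Errors divided by the number of reference words (assumed non-empty)."""
    subs, dels, ins = align_counts(reference, hypothesis)
    return (subs + dels + ins) / len(reference.split())


# Hypothetical usage: compare a recognizer output against its reference.
print(align_counts("open the left door", "open left door"))
print(word_error_rate("open the left door", "open left door"))
```

Applying the same function to the outputs of two recognizers on the same reference transcriptions gives comparable scores, which is the point of testing under the same test conditions.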