Abstracts of published papers

Generating Fractals with Hopfield Neural Network

    This paper presents a simple method for generating fractals using a neural computational approach. The artificial neural network chosen for this approach is the Hopfield autoassociator.
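
Since the abstract does not detail the fractal-generation scheme itself, the following is only a minimal sketch of the Hopfield autoassociator dynamics it builds on (Hebbian storage, asynchronous bipolar updates); the pattern sizes and names are illustrative assumptions:

```python
import numpy as np

def train_hopfield(patterns):
    """Hebbian learning: W is the sum of outer products of bipolar patterns."""
    n = patterns.shape[1]
    W = np.zeros((n, n))
    for p in patterns:
        W += np.outer(p, p)
    np.fill_diagonal(W, 0.0)          # no self-connections
    return W / patterns.shape[0]

def recall(W, state, steps=100, rng=None):
    """Asynchronous updates: one randomly chosen unit is updated at a time."""
    rng = rng or np.random.default_rng(0)
    state = state.copy()
    for _ in range(steps):
        i = rng.integers(len(state))
        state[i] = 1 if W[i] @ state >= 0 else -1
    return state

patterns = np.array([[1, -1, 1, -1], [1, 1, -1, -1]])
W = train_hopfield(patterns)
print(recall(W, np.array([1, -1, 1, 1])))   # converges toward a stored pattern
```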

Gradient Based Approach to Mouth Shape Recognition

    At this moment there is an ongoing project at Delft University of Technology aimed at the development of the Integrated System for Facial Expression Recognition (ISFER). The ISFER project is being developed in the Knowledge Based Systems group at the Faculty of Technical Mathematics and Informatics, TU Delft. In this project, the work on image preprocessing, feature extraction and interpretation of the results is bound into a single system. One of the most important parts of facial expression recognition is recognizing the mouth expression. This paper describes a new approach to this task. In contrast to the work already done in this field, it is knowledge-based rather than graphically based. We use fuzzy-system and artificial neural network techniques to obtain the desired results.
(full paper)
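
The abstract does not specify how the fuzzy system is built, so the snippet below only illustrates the kind of fuzzy membership grading that could precede rule-based interpretation; the variable ("mouth width"), the labels and the breakpoints are all hypothetical:

```python
import numpy as np

def triangular(x, a, b, c):
    """Triangular membership function rising from a, peaking at b, falling to c."""
    return np.clip(np.minimum((x - a) / (b - a), (c - x) / (c - b)), 0.0, 1.0)

mouth_width = 0.62                    # normalised measurement from an image
memberships = {
    "narrow":  triangular(mouth_width, 0.0, 0.2, 0.5),
    "neutral": triangular(mouth_width, 0.3, 0.5, 0.7),
    "wide":    triangular(mouth_width, 0.5, 0.8, 1.0),
}
print(memberships)                    # fuzzy grades feeding the rule base
```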

Mixed Fuzzy-System and Artificial Neural Network Approach to the Automated Recognition of Mouth Expressions

    At this moment there is an ongoing project in the Knowledge Based Systems group at the Delft University of Technology, which is aimed at the development of the Integrated System for Facial Expression Recognition (ISFER). In this project, the work on image processing, feature extraction and interpretation of the results is bound into a single system. One of the most important parts of facial expression recognition is recognizing the mouth expression. This paper describes a new approach to this task. In contrast to the work already done in this field, it is knowledge-based rather than graphically based. We use a fuzzy system in combination with an artificial neural network to recognize the mouth shape.
(full paper)

Dual-view Recognition of Emotional Facial Expressions

    Non-verbal communication plays an important role in human communication. At the Delft University of Technology there is a project running on the automatic recognition of facial expressions. The developed system, ISFER (Integrated System for Facial Expression Recognition), is based on the analysis of a frontal view of the face. In this paper we investigate the possible advantages of using an additional lateral view of the face. The experiments performed with neural networks show the expected improvements when using the dual-view approach. We describe the setup of the experiments and report the results.
(full paper)

Analysis of Facial Expressions Based on Silhouettes

    Non-verbal communication plays an important role in human communication. At the Delft University of Technology there is a project running on the automatic recognition of facial expressions. The developed system, ISFER (Integrated System for Facial Expression Recognition), consists of modules suited for the analysis of a frontal view of the face. As the obtained results are still far from perfect, we present in this paper a new, complementary approach based on a lateral view of the face. The underlying model is based on the silhouette of the facial profile, with 10 characteristic points at specific places. The points are chosen so that, using a Matlab tool, we can locate them as extreme points on the silhouette. From this information, emotional facial expressions can be classified to some degree using expert systems or neural networks. The new model and the results of testing are reported in this paper.
(full paper)
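
The paper's 10-point profile model is not specified in the abstract; as a hedged sketch, characteristic points such as the nose tip, lips and chin show up as local extrema of the profile curve, which can be located from sign changes in its first difference. The toy profile below stands in for a real extracted silhouette:

```python
import numpy as np

def local_extrema(profile):
    """Indices where the silhouette curve has a local minimum or maximum."""
    d = np.sign(np.diff(profile))
    # a sign change in the first difference marks an extreme point
    turns = np.where(np.diff(d) != 0)[0] + 1
    return turns

# toy profile: horizontal silhouette position x(y) for each image row,
# with a few bumps standing in for nose, lips and chin
y = np.linspace(0, np.pi * 3, 200)
profile = np.sin(y) + 0.3 * np.sin(3 * y)
print(local_extrema(profile))         # candidate characteristic points
```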

Visually Based Speech Onset/Offset Detection

    In this paper we present a new model for video sequence processing aimed at extracting data that describes the mouth shape in the image. The proposed technique can be used in any lip-reading related application. One such application, chosen for our experiments, is speech onset and offset detection. The experiments described here are aimed at detecting the boundaries of speech on the basis of the visual modality only. This paper describes the data processing technique and the results of the experiments.
(full paper)
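
The paper's actual detector is not described in the abstract; purely as an illustration of the task, the sketch below thresholds the smoothed frame-to-frame variation of a per-frame mouth-shape descriptor to find onset and offset frames. The window size, threshold and feature dimensions are assumptions:

```python
import numpy as np

def speech_boundaries(features, win=5, thresh=0.5):
    """Onset/offset frames from thresholded, smoothed mouth motion."""
    motion = np.linalg.norm(np.diff(features, axis=0), axis=1)
    smooth = np.convolve(motion, np.ones(win) / win, mode="same")
    active = smooth > thresh
    onsets = np.where(np.diff(active.astype(int)) == 1)[0] + 1
    offsets = np.where(np.diff(active.astype(int)) == -1)[0] + 1
    return onsets, offsets

# stand-in for extracted mouth-shape descriptors with rising "activity"
frames = np.random.default_rng(2).normal(size=(200, 8)) * \
         np.linspace(0, 1, 200)[:, None]
print(speech_boundaries(frames))
```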

Silence Detection and Vowel/Consonant Discrimination in Video Sequences

    In this paper we present a set of experiments aimed at investigating the feasibility of using artificial neural networks (ANNs) in a lip-reading task. We present the data extraction method, which is applied to video sequences containing the lower half of a speaking subject's face. The extracted data is then used to evaluate the performance of ANNs in the task of classifying frames in the video stream into three possible classes: vowel, consonant or silence.
(full paper) (demo)
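
A minimal sketch of the per-frame classification step: a small feed-forward network maps a mouth-shape feature vector for one video frame to one of the three classes. The feature dimensionality and layer sizes are assumptions, and in practice the weights would come from training on labelled sequences rather than random initialization:

```python
import numpy as np

CLASSES = ["vowel", "consonant", "silence"]
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(16, 8)), np.zeros(8)   # 16 input features, 8 hidden
W2, b2 = rng.normal(size=(8, 3)), np.zeros(3)    # 3 output classes

def classify_frame(features):
    h = np.tanh(features @ W1 + b1)              # hidden layer
    scores = h @ W2 + b2                         # class scores
    return CLASSES[int(np.argmax(scores))]

frame_features = rng.normal(size=16)             # stand-in for extracted data
print(classify_frame(frame_features))
```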

Robust Video Processing for Lipreading Applications

    In this paper we present a robust model for video sequence processing aimed at extracting data that describes the mouth shape in the image. The method allows computationally light data extraction from the video sequence. The extracted data can then be used in lip-reading applications. The paper describes the results of two experiments with artificial neural network recognizers trained on the recognition of speech boundaries and on vowel-consonant discrimination.
(full paper)

Using Artificial Neural Networks in Lip-reading

    In this paper we present a model for video sequence processing aimed at extracting data that describes the mouth shape in the image. We show how artificial neural networks (ANNs) can be used in both the feature extraction and the classification parts of the recognition system. Appropriate ANN architectures are proposed depending on the task being performed.
(full paper)

Using Aerial and Geometric Features in Automatic Lip-reading

    In this paper we present lip-reading experiments with different sets of features extracted from the video sequence. In our experiments we use simple color-based filtering techniques to extract the feature vectors from the incoming video signal. Some of these features are directly related to the geometrical properties of the lips (their position and visible thickness). Other features represent information related to the visibility of other components of the speech production system. The visibility of the teeth and the vocal tract, for example, is described by means of the area they occupy in the image; we therefore call these the aerial features.
(full paper)
NOTE: It's a bit embarrassing, but I made a mistake in choosing the name for the features that complement the geometry description. Obviously, the title should be: Using Area-related and Geometric Features in Automatic Lip-reading. It went through the review process without anyone noticing it, either in the title or in the text of the paper. I present the paper here exactly as it was published in the proceedings.
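
A hedged sketch of what an area-related ("aerial") feature can look like: simple colour thresholds (the exact thresholds and colour space are assumptions, not the paper's) mark tooth-like bright pixels and the dark oral cavity, and each feature is just the area, in pixels, that the region occupies in the mouth image:

```python
import numpy as np

def area_features(rgb):
    """rgb: HxWx3 uint8 mouth-region image -> (teeth_area, cavity_area)."""
    r, g, b = (rgb[..., i].astype(int) for i in range(3))
    brightness = (r + g + b) / 3
    teeth = (brightness > 190) & (abs(r - g) < 30)   # bright, low saturation
    cavity = brightness < 50                          # dark oral cavity
    return int(teeth.sum()), int(cavity.sum())

# random stand-in for a cropped mouth-region frame
frame = np.random.default_rng(1).integers(0, 256, size=(64, 96, 3),
                                          dtype=np.uint8)
print(area_features(frame))
```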

Obtaining Person-independent Feature Space for Lip-reading

    A person-independent representation of lip movements is crucial in developing a multimodal speech recognizer. The geometric models used in most lip-tracking techniques can remove some features such as skin texture or color, and appropriate normalization of the data and its projection onto the principal components space can reduce the number of person-specific features even further. Although Principal Component Analysis (PCA) of a multi-person dataset reveals some interesting features, the inter-person variation is too large to allow for robust speech recognition. There are, however, substantial similarities in the lip-shape variations when analyzing single-person data sets only. We propose to use an adaptive PCA that updates the projection coefficients with respect to the data available for the specific person. (published only as an abstract, see extended version)
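
A rough sketch of the idea: start from a PCA basis estimated on pooled multi-person lip-shape data, then adapt it as data from one specific speaker arrives. The adaptation rule below (re-estimating the basis with the person-specific samples up-weighted) is an illustrative assumption, not the paper's exact algorithm, and all sizes are made up:

```python
import numpy as np

def pca_basis(X, k):
    """Return the mean and top-k principal directions of row-wise samples X."""
    mu = X.mean(axis=0)
    _, _, Vt = np.linalg.svd(X - mu, full_matrices=False)
    return mu, Vt[:k]

rng = np.random.default_rng(0)
multi_person = rng.normal(size=(500, 20))        # pooled lip-shape vectors
mu, basis = pca_basis(multi_person, k=5)

def adapt(person_data, weight=4, k=5):
    """Fold (up-weighted) person-specific samples into the PCA estimate."""
    stacked = np.vstack([multi_person, np.repeat(person_data, weight, axis=0)])
    return pca_basis(stacked, k)

person = rng.normal(loc=0.5, size=(40, 20))      # one speaker's samples
mu, basis = adapt(person)
coeffs = (person - mu) @ basis.T                 # person-adapted projection
print(coeffs.shape)
```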

Development of a speech recognizer for the Dutch language

    This paper describes the development of a large vocabulary, speaker-independent speech recognizer for the Dutch language. The recognizer was built using the Hidden Markov Model Toolkit (HTK) and the Polyphone database of recorded Dutch speech. A number of systems have been built, ranging from a simple monophone recognizer to a sophisticated system that uses backed-off triphones. The systems were tested using audio from different acoustic environments to assess their robustness. The design and the test results are presented.
(full paper)

An Audio-Visual Corpus for Multimodal Speech Recognition in Dutch Language

    This paper describes the gathering and availability of an audio-visual speech corpus for the Dutch language. The corpus was prepared with multimodal speech recognition in mind and is currently used in our research on lip-reading and bimodal speech recognition. It contains the prompts also used in the well-established POLYPHONE corpus and therefore captures the characteristics of the Dutch language with reasonable accuracy.
(full paper)

Medium Vocabulary Continuous Audio-Visual Speech Recognition

    This paper presents our experiments on continuous audio-visual speech recognition. A number of bimodal systems using feature fusion or fusion within Hidden Markov Models have been implemented. Experiments with different fusion techniques and their results are presented. Furthermore, the performance of the bimodal system is compared with that of a unimodal speech recognizer under noisy conditions.
(full paper)
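
A minimal sketch of the feature-fusion variant only: audio features (e.g. MFCC-like vectors at 100 frames per second) and visual mouth-shape features (at 25 frames per second) are aligned by repeating each video frame, then concatenated into one observation vector per frame for a single recognizer. The frame rates and dimensions are assumptions, and fusion within the HMMs is not shown:

```python
import numpy as np

audio = np.random.default_rng(0).normal(size=(400, 13))   # 4 s at 100 fps
video = np.random.default_rng(1).normal(size=(100, 6))    # 4 s at 25 fps

# align the slower visual stream to the audio frame rate, then concatenate
video_up = np.repeat(video, audio.shape[0] // video.shape[0], axis=0)
fused = np.hstack([audio, video_up])             # early (feature) fusion
print(fused.shape)                               # (400, 19) observation vectors
```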