Abstracts of published papers
Generating Fractals with Hopfield Neural Network
This paper presents a simple method for
generating fractals using a neural computational approach. The
artificial neural network chosen for this approach is the
Hopfield autoassociator.
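The abstract does not detail the fractal-generation procedure itself; as background, here is a minimal numpy sketch of the Hopfield autoassociator dynamics (Hebbian storage, thresholded updates) that such a method builds on. The pattern sizes and update count are illustrative assumptions.

```python
import numpy as np

def train_hopfield(patterns):
    # Hebbian learning: sum of outer products of the stored +/-1
    # patterns, with the self-connections (diagonal) zeroed out.
    n = patterns.shape[1]
    W = np.zeros((n, n))
    for p in patterns:
        W += np.outer(p, p)
    np.fill_diagonal(W, 0)
    return W / patterns.shape[0]

def recall(W, state, steps=20):
    # Synchronous updates with a sign threshold; the state settles
    # into the nearest stored attractor.
    for _ in range(steps):
        state = np.where(W @ state >= 0, 1, -1)
    return state
```

For a single stored pattern, flipping one bit of the input still recalls the original pattern, which is the autoassociative behavior the paper relies on.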
Gradient Based Approach to Mouth Shape Recognition
There is currently an ongoing
project at Delft University of Technology aimed at the
development of the Integrated System for Facial Expression
Recognition (ISFER). The ISFER project is being developed in
the Knowledge Based Systems group at the Faculty of
Technical Mathematics and Informatics, TU Delft. In this
project, work on image preprocessing, feature
extraction, and interpretation of the results is combined
into a single system. One of the most important parts of
facial expression recognition is recognizing the mouth
expression. This paper describes a new approach to this
task. In contrast to previous work in this field, it
is knowledge-based rather than graphically based. We use
fuzzy-system and artificial neural network techniques to
obtain the desired results.
(full paper)
Mixed Fuzzy-System and Artificial Neural Network
Approach to the Automated Recognition of Mouth
Expressions
There is currently an ongoing
project in the Knowledge Based Systems group at the Delft
University of Technology aimed at the development
of the Integrated System for Facial Expression
Recognition (ISFER). In this project, work on image
processing, feature extraction, and interpretation of the
results is combined into a single system. One of the most
important parts of facial expression recognition is
recognizing the mouth expression. This paper describes a new
approach to this task. In contrast to previous work in
this field, it is knowledge-based rather than
graphically based. We use a fuzzy system in
combination with an artificial neural network to
recognize the mouth shape.
(full paper)
Dual-view Recognition of Emotional Facial Expressions
Non-verbal communication plays an
important role in human communication. At the Delft
University of Technology there is a project running on the
automatic recognition of facial expressions. The developed
system, ISFER (Integrated System for Facial Expression
Recognition), is based on the analysis of a frontal view of
the face. In this paper we investigate the possible
advantages of using an additional lateral view of the
face. Experiments with neural networks show
the expected improvement when using the dual-view
approach. We describe the setup of the
experiments and report the results.
(full paper)
Analysis of Facial Expressions Based on Silhouettes
Non-verbal communication plays an
important role in human communication. At the Delft
University of Technology there is a project running on the
automatic recognition of facial expressions. The developed
system, ISFER (Integrated System for Facial Expression
Recognition), consists of modules suited for the analysis of
a frontal view of the face. As the obtained results are
still far from perfect, we present in this paper a new,
complementary approach based on a lateral view of the
face. The underlying model is based on the silhouette of the
facial profile with 10 characteristic points at specific
locations. The points are chosen so that, using a Matlab
tool, we can locate them as extreme points on the
silhouette. From this information, emotional facial
expressions can be classified to some degree using expert
systems or neural networks. The new model and the test
results are reported in this paper.
(full paper)
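The idea of locating characteristic points as extrema of the profile silhouette can be sketched as follows. This is not the paper's Matlab implementation; the 1-D depth signal and the slope-sign detection rule are illustrative assumptions.

```python
import numpy as np

def profile_extrema(depth):
    # depth[i]: horizontal distance of the facial silhouette from a
    # vertical reference line at row i. Characteristic points such as
    # the nose tip, lips and chin appear as local extrema of this
    # 1-D signal.
    d = np.diff(depth)
    sign = np.sign(d)
    # Indices where the slope changes sign -> local maxima/minima.
    turns = np.where(np.diff(sign) != 0)[0] + 1
    return turns
```

On a toy profile such as `[0, 1, 2, 3, 2, 1, 2, 3, 4, 3]` this returns the positions of the two peaks and the valley between them, which is the kind of point set the classifier would then consume.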
Visually Based Speech Onset/Offset Detection
In this paper we present a new model for
video sequence processing aimed at extracting the
data describing the mouth shape in the image. The proposed
technique can be used in any lip-reading related
application. One such application, chosen for our
experiments, is speech onset and offset detection. The
experiments described here are aimed at detecting
speech boundaries on the basis of the visual modality
only. This paper describes the data processing technique and
the results of the experiments.
(full paper)
Silence Detection and Vowel/Consonant Discrimination in
Video Sequences
In this paper we present a set of
experiments aimed at investigating the feasibility
of using artificial neural networks (ANNs) in a lip-reading
task. We present a method for data extraction that is
applied to video sequences containing the lower half of a
speaking subject's face. The data is then used to evaluate
the performance of ANNs in the task of classifying the frames
in the video stream into three possible classes: vowel,
consonant, or silence.
(full paper)
(demo)
Robust Video Processing for Lipreading Applications
In this paper we present a robust model
for video sequence processing aimed at extracting the
data describing the mouth shape in the image. The method
allows computationally light data extraction from the video
sequence. The extracted data can then be used in lip-reading
applications. The paper describes the results of two different
experiments with artificial neural network recognizers
trained on the recognition of speech boundaries and on
vowel-consonant discrimination.
(full paper)
Using Artificial Neural Networks in Lip-reading
In this paper we present a model for
video sequence processing aimed at extracting the
data describing the mouth shape in the image. We show how
artificial neural networks (ANNs) can be used in both the
feature extraction and the classification parts of the
recognition system. Appropriate ANN architectures are
proposed depending on the task being performed.
(full paper)
Using Aerial and Geometric Features in Automatic Lip-reading
In this paper we present lip-reading
experiments with different sets of features extracted from the
video sequence. In our experiments we use simple color-based
filtering techniques to extract the feature vectors from the
incoming video signal. Some of these features are directly related
to the geometrical properties of the lips (their position and
visible thickness). Other features represent information that
relates to the visibility of the other components of the speech
production system. The visibility of the teeth and the vocal tract,
for example, is described by means of the area they occupy in the
image; we therefore call them the aerial features.
(full paper)
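As an illustration of what such area-related ("aerial") features might look like, here is a hedged sketch: the color thresholds below are invented for the example and are not the paper's actual filters.

```python
import numpy as np

def area_features(rgb):
    # Illustrative thresholds only -- the paper's actual color-based
    # filters are not specified in the abstract.
    r = rgb[..., 0].astype(int)
    g = rgb[..., 1].astype(int)
    b = rgb[..., 2].astype(int)
    teeth = (r > 180) & (g > 180) & (b > 180)   # bright, low-saturation pixels
    cavity = (r < 80) & (g < 60) & (b < 60)     # dark mouth-opening pixels
    n = rgb.shape[0] * rgb.shape[1]
    # Normalize the pixel counts by image size -> area-related features.
    return teeth.sum() / n, cavity.sum() / n
```

The resulting fractions can be appended to the geometric lip measurements to form the combined feature vector.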
NOTE: It is a bit embarrassing, but I made a mistake in
choosing the name for the features that complement the geometry
description. The title should obviously be: Using Area-related
and Geometric Features in Automatic Lip-reading. It went
through the review process without anyone noticing the mistake,
either in the title or in the text of the paper. I present the
paper here exactly as it was published in the proceedings.
Obtaining Person-independent Feature Space for Lip-reading
A person-independent representation of
lip movements is crucial in developing a multimodal speech
recognizer. The geometric models used in most lip-tracking
techniques can remove some of the features such as skin texture or
color, and appropriate normalization of the data and its
projection into the principal-components space can reduce the amount
of person-specific features even further. Although Principal
Component Analysis (PCA) of a multi-person dataset reveals some
interesting features, the inter-person variation is too large to
allow for robust speech recognition. There are, however,
substantial similarities in the lip-shape variations when
analyzing only single-person data sets. We propose to use an
adaptive PCA that updates the projection coefficients with respect
to the data available for the specific person. (published only
as abstract, see
extended version)
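The adaptive-PCA idea, re-estimating the projection as person-specific samples arrive, might be sketched as follows. The exponentially weighted mean/covariance update is an assumption for illustration; the abstract does not specify the update scheme.

```python
import numpy as np

class AdaptivePCA:
    # Sketch only: start from a multi-person dataset and let the
    # principal-component projection drift toward the current speaker
    # as their samples arrive.
    def __init__(self, data, k, alpha=0.05):
        self.k, self.alpha = k, alpha
        self.mean = data.mean(axis=0)
        self.cov = np.cov(data, rowvar=False)
        self._refit()

    def _refit(self):
        # Keep the k eigenvectors with the largest eigenvalues.
        vals, vecs = np.linalg.eigh(self.cov)
        self.components = vecs[:, np.argsort(vals)[::-1][:self.k]]

    def update(self, x):
        # Exponentially weighted update of mean and covariance,
        # followed by re-extraction of the principal components.
        self.mean = (1 - self.alpha) * self.mean + self.alpha * x
        d = (x - self.mean)[:, None]
        self.cov = (1 - self.alpha) * self.cov + self.alpha * (d @ d.T)
        self._refit()

    def project(self, x):
        return (x - self.mean) @ self.components
```

Projecting each incoming lip-shape vector through the adapted basis yields the person-normalized features the abstract argues for.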
Development of a speech recognizer for the Dutch language
This paper describes the development of a
large-vocabulary, speaker-independent speech recognizer for
the Dutch language. The recognizer was built using the Hidden
Markov Toolkit (HTK) and the Polyphone database of recorded Dutch
speech. A number of systems have been built, ranging from a
simple monophone recognizer to a sophisticated system that
uses backed-off triphones. The system has been tested using
audio from different acoustic environments to test its
robustness. The design and the test results are
presented.
(full paper)
An Audio-Visual Corpus for Multimodal Speech Recognition in
Dutch Language
This paper describes the gathering and
availability of an audio-visual speech corpus for Dutch
language. The corpus was prepared with the multi-modal
speech recognition in mind and it is currently used in our
research on lip-reading and bimodal speech recognition. It
contains the prompts used also in the well established
POLYPHONE corpus and therefore captures the Dutch language
characteristics with a reasonable accuracy.
(full paper)
Medium Vocabulary Continuous Audio-Visual Speech Recognition
This paper presents our experiments on
continuous audio-visual speech recognition. A number of
bimodal systems using feature fusion or fusion within Hidden
Markov Models are implemented. Experiments with different
fusion techniques and their results are presented. Furthermore,
the performance levels of the bimodal system and of a unimodal
speech recognizer under noisy conditions are compared.
(full paper)
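Feature-level fusion for bimodal recognition is commonly implemented by aligning and concatenating the per-frame audio and visual vectors; a hedged sketch follows. The frame rates and the repetition-based upsampling are illustrative assumptions, not the paper's setup.

```python
import numpy as np

def fuse_features(audio, video, audio_rate=100, video_rate=25):
    # Feature-level fusion: bring the (slower) video features up to
    # the audio frame rate by repetition, then concatenate the two
    # streams frame by frame.
    rep = audio_rate // video_rate
    video_up = np.repeat(video, rep, axis=0)[: len(audio)]
    return np.hstack([audio[: len(video_up)], video_up])
```

The fused frames can then be fed to a single recognizer, in contrast to model-level fusion where each modality keeps its own stream inside the Hidden Markov Model.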
Created: 26.03.1998