
title: Multimodal McDrive System
author: Yun Pang
published in: August 2002
appeared as: Master of Science thesis, Knowledge Based Systems group, Delft University of Technology
pages: 132
PDF (2.640 KB)

Abstract
This project aims to build a multimodal intelligent system to replace the human operator of
McDrive. McDonald's is the largest and best-known global foodservice retailer. It
has more than 30,000 restaurants in 121 countries, and 1.5 million people work at
McDonald's. McDrive is one of the branches of the McDonald's fast food chain. Human
labor is expensive. To reduce labor costs, we want to build a system that
replaces the McDrive operators who take orders from customers. This automated system uses
speech recognition technology to communicate with the customers. This issue has been discussed in
the works of Farhaad Mohamed-Hoesein and Ramya Ramaswamy. To make this system work
better, we want to make it multimodal. The system can talk to the customer, understand what the
customer says and give the right response. To improve the customer's understanding, the system gives
not only audio feedback but also visual feedback such as text and graphics. The graphics may be a
picture or a Flash movie. Our goal is to reduce the manpower cost while
maintaining the service level. This means the system needs to think and behave like a human being. We
human beings communicate with each other in different ways. Often no talk is needed: a
gesture or a facial expression is enough for us to understand each other perfectly. We want this
system to have such emotional expressions as well. We build a human wizard to express these feelings.
The final system will be automatic, but first we need to build some prototypes for testing. There will be
three prototypes with different control modes: manual, semi-automatic and automatic. At this
moment a manual prototype has been built. In this prototype the customer can only use text as input because
of time and budget limitations. The operator produces the response manually through a
special keyboard. This keyboard has three parts: menu, commands and expressions. A keyboard is
also needed in the semi-automatic prototype, but there it is more intelligent. It is minimal:
only the necessary buttons are generated dynamically, depending on the current condition and
environment. The automatic prototype can generate the response automatically, but the operator can
take control at any time. The final goal is an automatic system that can replace the human
operator completely.
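
The three control modes and the three-part operator keyboard described above can be illustrated with a minimal sketch. It is not the thesis implementation: the button names, the `DialogueState` fields and the rule for choosing which buttons are "necessary" are all hypothetical, chosen only to show how a reduced keyboard might be generated from the current dialogue state. Treating the automatic mode's operator override like the semi-automatic keyboard is also an assumption.

```python
from dataclasses import dataclass
from enum import Enum


class ControlMode(Enum):
    MANUAL = "manual"
    SEMI_AUTOMATIC = "semi-automatic"
    AUTOMATIC = "automatic"


# The three keyboard parts are named in the abstract; the buttons are invented.
KEYBOARD = {
    "menu": ["hamburger", "fries", "cola"],
    "commands": ["greet", "confirm order", "repeat", "say total"],
    "expressions": ["smile", "nod", "puzzled"],
}


@dataclass
class DialogueState:
    """Hypothetical summary of the conversation so far."""
    order: list            # items ordered so far
    greeted: bool = False  # whether the customer has been greeted


def visible_buttons(mode, state):
    """Return the buttons shown to the operator for a mode and dialogue state."""
    if mode is ControlMode.MANUAL:
        # Manual prototype: the full keyboard is always available.
        return {part: list(buttons) for part, buttons in KEYBOARD.items()}

    # Semi-automatic keyboard (assumed to be reused when the operator takes
    # control in automatic mode): show only what the current state requires.
    commands = ["greet"] if not state.greeted else ["repeat"]
    if state.greeted and state.order:
        commands += ["confirm order", "say total"]
    return {
        "menu": list(KEYBOARD["menu"]) if state.greeted else [],
        "commands": commands,
        "expressions": list(KEYBOARD["expressions"]),
    }
```

For example, before the customer has been greeted, `visible_buttons(ControlMode.SEMI_AUTOMATIC, DialogueState(order=[]))` offers only the `greet` command and the expression buttons, while the manual mode always returns the complete keyboard.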