Y. Pang - Multimodal McDrive System

title:	Multimodal McDrive System
author:	Yun Pang
published in:	August 2002
appeared as:	Master of Science thesis, Knowledge Based Systems group, Delft University of Technology
pages:	132
	PDF (2.640 KB)

Abstract

This project aims to build a multimodal intelligent system that replaces the human operator of McDrive. McDonald's is the largest and best-known foodservice retailer in the world. It has more than 30,000 restaurants in 121 countries, and 1.5 million people work there. McDrive is one of the branches of the McDonald's fast-food chain. Human labor is expensive, so to reduce personnel costs we want to build a system that replaces the McDrive operators who take orders from customers. This automated system uses speech recognition technology to communicate with customers, an approach discussed earlier in the work of Farhaad Mohamed-Hoesein and Ramya Ramaswamy. To make the system work better we want to make it multimodal. The system can talk to the customer, understand what the customer says, and give the right response. To help the customer understand, it gives not only audio feedback but also visual feedback in the form of text and graphics; the graphics may be a picture or a Flash movie. Our goal is to reduce manpower costs while maintaining the service level, which means the system needs to think and behave like a human being. Human beings communicate with each other in different ways: often no talk is needed, and a gesture or a facial expression is enough for us to understand each other perfectly. We want the system to have such emotional expressions as well, and we build a human wizard to express these feelings.

The final system will be automatic, but first we need to build prototypes for testing. There will be three prototypes with different control modes: manual, semi-automatic, and automatic. At this moment the manual prototype has been built. In this prototype the customer can use only text as input, because of time and budget constraints. The operator produces the response manually through a special keyboard, which has three parts: menu, commands, and expressions. The semi-automatic prototype also needs this keyboard, but a more intelligent version of it: the keyboard is minimal, and only the necessary buttons are generated dynamically, depending on the current conditions and environment. The automatic prototype generates the response automatically, but the operator can take control at any time. The final goal is an automatic system that can replace the human operator completely.
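
The difference between the full manual keyboard and the minimal, dynamically generated one can be illustrated with a small sketch. It is not taken from the thesis: the button labels, the dialogue states, the rule table, and the function name `visible_buttons` are all hypothetical, and the sketch assumes that the automatic prototype offers the operator the same reduced keyboard for taking over control.

```python
from enum import Enum


class ControlMode(Enum):
    MANUAL = "manual"
    SEMI_AUTOMATIC = "semi-automatic"
    AUTOMATIC = "automatic"


# The three keyboard parts come from the abstract; the buttons are invented.
KEYBOARD = {
    "menu": ["hamburger", "fries", "cola", "milkshake"],
    "commands": ["greet", "ask to repeat", "confirm order", "state total"],
    "expressions": ["smile", "nod", "puzzled"],
}

# Hypothetical rule table: which buttons matter in which dialogue state.
RELEVANT = {
    "greeting": {"commands": ["greet"], "expressions": ["smile"]},
    "ordering": {
        "menu": KEYBOARD["menu"],
        "commands": ["ask to repeat"],
        "expressions": ["nod", "puzzled"],
    },
    "confirming": {
        "commands": ["confirm order", "state total"],
        "expressions": ["smile", "nod"],
    },
}


def visible_buttons(mode, state):
    """Return the buttons to show the operator, grouped by keyboard part."""
    if mode is ControlMode.MANUAL:
        # The manual prototype always shows the complete keyboard.
        return {part: list(buttons) for part, buttons in KEYBOARD.items()}
    # Assumption: semi-automatic and automatic modes both show only the
    # buttons the rule table marks as relevant for the current state.
    relevant = RELEVANT.get(state, {})
    return {part: list(relevant.get(part, [])) for part in KEYBOARD}
```

In this sketch, calling `visible_buttons(ControlMode.SEMI_AUTOMATIC, "greeting")` leaves the menu part empty and shows a single command and a single expression, whereas the manual mode returns every button regardless of the dialogue state.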