The “text-to-speech” mechanism

“Text-to-speech” is a technique for synthesizing spoken words from text as realistically as possible. The ICE project integrates a module built on an existing library of “text-to-speech” routines. Within the project, the mechanism is used by the Flight Gear Client program. Each new situation, and each new most probable next flight procedure, is announced by a spoken message. The same applies to common commands, such as loading a new context or exiting.

The implementation relies on a separate program, the Speech Server. It acts as a server that waits for requests to speak. Communication is one-way only: the server never sends anything back. This communication also takes place over the network, but it uses a different protocol, UDP (User Datagram Protocol). UDP is connectionless and less reliable: there is no data stream between the server and its clients. Instead, data passes between the modules as datagrams, small self-contained packets of information. The advantage of this design is that it is simpler to implement, and for this purpose the result is the same.

The main disadvantage of UDP-based network communication is that packets may travel along different routes between client and server. If one route is carrying heavy traffic at that moment, a packet can reach its destination after packets that were sent later. For the ICE project this is not a problem, because the Speech Server must be started on the same machine on which the Flight Gear Client is running. Since both programs share one machine, the packets never cross an external network, so in practice they arrive in the order in which they were sent. The Flight Gear Client can be started even if the Speech Server is not running, but in that case there is no “speech” during the analysis on the Client.
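
The one-way exchange described above can be sketched in Python as follows. This is only an illustration under stated assumptions, not the ICE project's code: the function name `send_speech_request`, the address 127.0.0.1, the port 5005 and the UTF-8 encoding are all hypothetical, since the text does not specify them.

```python
import socket

# Assumed values: the source does not state the server's address or port.
SPEECH_SERVER_ADDR = ("127.0.0.1", 5005)


def send_speech_request(text, addr=SPEECH_SERVER_ADDR):
    """Send one message as a single UDP datagram; no reply is expected."""
    payload = text.encode("utf-8")  # assumed encoding
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        # sendto returns the number of bytes handed to the network stack.
        # No connection is set up and no acknowledgement is awaited.
        return sock.sendto(payload, addr)
```

Because the sender never waits for an answer, a call such as `send_speech_request("load a new context")` normally returns whether or not a server is listening, which matches the behaviour of a client that simply stays silent when the Speech Server is absent.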

The Speech Server therefore waits for data packets. Each packet contains a string of characters: the text to be spoken. When a packet is received, the text is extracted and passed to the “text-to-speech” routines.
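
A receive loop of this kind might look like the following Python sketch. It is an assumption-laden illustration rather than the actual Speech Server: the name `serve_speech`, the 4096-byte buffer, the UTF-8 decoding and the `max_messages` parameter are hypothetical, and `speak` stands in for whatever “text-to-speech” routine the real library provides.

```python
import socket


def serve_speech(sock, speak, max_messages=None):
    """Receive text datagrams on a bound UDP socket and pass each to speak().

    sock         -- a UDP socket already bound to the listening address
    speak        -- callable taking one string (placeholder for the TTS routine)
    max_messages -- stop after this many datagrams; None means run forever
    """
    handled = 0
    while max_messages is None or handled < max_messages:
        # Blocks until one datagram arrives; the sender address is ignored
        # because the server never replies.
        data, _sender = sock.recvfrom(4096)
        text = data.decode("utf-8", errors="replace")  # assumed encoding
        speak(text)
        handled += 1
    return handled
```

To try it, one could bind a `socket.SOCK_DGRAM` socket to an address of one's choosing and call `serve_speech(sock, print)`, which prints each received text instead of speaking it.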
