
SYAH

Syah E-Bulletin

Science & Technology

 



Speech Synthesis [April 2, 2002]

Speech synthesis usually refers to electronically generated speech. Nowadays, speech synthesis technology is widely used in ATMs, telephone services, lifts, computer programs, and so on.

Basically, speech synthesis technology can be divided into two types: systems with limited utterances and systems that accept unrestricted text input. The first type only 'speaks' pre-recorded sentences, while the second can 'read' arbitrary text!

Speech synthesis methods can also be classified by the algorithm used to produce speech. The first and simplest method is to record words or sentences and then combine them into understandable utterances. In an extreme case, there are actually computer programs distributed on the Net that read text by simply combining pre-recorded vowels and consonants!
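
As a rough illustration of the first method, the Python sketch below joins pre-recorded word clips end to end, blending a few samples at each boundary so the joins do not click. The word list, the RECORDINGS dictionary, the helper names, the sample rate, and the use of sine tones as stand-ins for real recordings are all hypothetical assumptions for the demo; a real system would load recorded audio instead. The sketch also shows the limit of a limited-utterance system: a word that was never recorded simply cannot be spoken.

```python
import math

SAMPLE_RATE = 8000  # assumed sample rate in Hz


def tone(freq_hz, duration_s, sample_rate=SAMPLE_RATE):
    """Hypothetical stand-in for a recorded word: a plain sine tone."""
    n = round(duration_s * sample_rate)
    return [math.sin(2 * math.pi * freq_hz * i / sample_rate) for i in range(n)]


# Hypothetical word "database"; a real system would store actual recordings.
RECORDINGS = {
    "your": tone(200, 0.25),
    "balance": tone(300, 0.40),
    "is": tone(250, 0.15),
}


def crossfade_concat(a, b, overlap):
    """Append clip b to clip a, fading a out and b in over `overlap` samples."""
    overlap = min(overlap, len(a), len(b))
    mixed = []
    for i in range(overlap):
        w = (i + 1) / (overlap + 1)  # weight of b rises from near 0 to near 1
        mixed.append((1 - w) * a[len(a) - overlap + i] + w * b[i])
    return a[:len(a) - overlap] + mixed + b[overlap:]


def speak(words, overlap=40):
    """Join the recordings for each word; an unrecorded word raises KeyError."""
    out = []
    for word in words:
        out = crossfade_concat(out, RECORDINGS[word], overlap)
    return out


samples = speak(["your", "balance", "is"])
```

Each join shortens the result by the overlap length, and asking for any word outside RECORDINGS fails, which is exactly why such systems are restricted to a fixed set of sentences.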

The second, more sophisticated algorithm uses pre-recorded diphones. Diphones are recorded transitions between two adjacent speech sounds (phonemes). Excellent-quality speech synthesis systems use this method, but the drawback is that it needs a large diphone database, which is hard to build.
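
A minimal sketch of the bookkeeping behind the diphone method: a phoneme sequence is turned into the list of adjacent-pair units that must be looked up in the database. The phoneme symbols, the "_" silence marker, the "a-b" unit naming, and the function names are illustrative assumptions, not a standard inventory. The last function hints at why the database grows large: the number of possible pairs grows with the square of the number of phonemes.

```python
def to_diphones(phonemes, silence="_"):
    """Split a phoneme sequence into diphone units, one per adjacent pair.

    Silence is padded at both ends so the transitions into and out of the
    utterance are covered too. The "a-b" naming is a hypothetical convention.
    """
    padded = [silence] + list(phonemes) + [silence]
    return [f"{a}-{b}" for a, b in zip(padded, padded[1:])]


def missing_units(phonemes, database):
    """List the diphones a (hypothetical) recorded database does not contain."""
    return [unit for unit in to_diphones(phonemes) if unit not in database]


def max_diphone_count(num_phonemes):
    """Upper bound on distinct diphones, treating silence as one extra unit
    and excluding the silence-to-silence pair."""
    n = num_phonemes + 1
    return n * n - 1


print(to_diphones(["h", "e", "l", "o"]))  # ['_-h', 'h-e', 'e-l', 'l-o', 'o-_']
```

For a language with, say, 40 phonemes, this bound is already 1,680 units, each of which has to be recorded and cut out cleanly.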

The third way to build a speech synthesis system is to use a source-filter algorithm. This is a relatively simple mathematical model of the human speech system, in which a sound source (the voice) is shaped by a filter (the vocal tract). So no recording is required for this system; it only needs speech parameters that have already been obtained from past experiments. Among the most important speech parameters are the formant frequencies, the resonant frequencies of the vocal tract. Many researchers have published their experimental results on these, and they can be found throughout the technical literature. One famous system that uses this method is the Klatt synthesizer.
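
To make the source-filter idea concrete, here is a small Python sketch. An impulse train at the pitch frequency stands in for the voice source, and a cascade of second-order digital resonators, one per formant, plays the role of the vocal-tract filter. The resonator follows the difference-equation form used in the Klatt synthesizer, y[n] = A*x[n] + B*y[n-1] + C*y[n-2]; the pitch, sample rate, formant frequencies and bandwidths, and the function names are assumed illustrative values, not measured data.

```python
import math


def klatt_resonator(signal, freq_hz, bw_hz, sample_rate):
    """Second-order resonator: y[n] = A*x[n] + B*y[n-1] + C*y[n-2]."""
    T = 1.0 / sample_rate
    C = -math.exp(-2 * math.pi * bw_hz * T)
    B = 2 * math.exp(-math.pi * bw_hz * T) * math.cos(2 * math.pi * freq_hz * T)
    A = 1 - B - C  # normalises the gain at 0 Hz to 1
    y1 = y2 = 0.0
    out = []
    for x in signal:
        y = A * x + B * y1 + C * y2
        out.append(y)
        y2, y1 = y1, y
    return out


def impulse_train(f0_hz, duration_s, sample_rate):
    """Crude voiced source: one unit impulse per pitch period."""
    n = round(duration_s * sample_rate)
    period = round(sample_rate / f0_hz)
    return [1.0 if i % period == 0 else 0.0 for i in range(n)]


def synthesize_vowel(formants, f0_hz=120, duration_s=0.3, sample_rate=16000):
    """Pass the source through the formant resonators connected in cascade."""
    signal = impulse_train(f0_hz, duration_s, sample_rate)
    for freq, bw in formants:
        signal = klatt_resonator(signal, freq, bw, sample_rate)
    return signal


# Illustrative (frequency, bandwidth) pairs in Hz for an /a/-like vowel;
# these numbers are assumptions for the demo, not measured data.
VOWEL_A = [(730, 90), (1090, 110), (2440, 160)]
samples = synthesize_vowel(VOWEL_A)
```

Changing only the formant list changes the vowel, which is why this approach needs parameters rather than recordings.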

Another way for a computer to produce human-like speech is to build an articulatory model of the human speech system. Unlike the previous method, this algorithm takes into account all the articulatory parts of our speech system, e.g. tongue position, mouth volume, the nose, etc. This method has the highest cost in terms of computation time, but it may become the most flexible and reliable method in the future.
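
A complete articulatory model is far beyond a short example, but its basic building block can be sketched. The vocal tract is often approximated as an acoustic tube closed at the glottis and open at the lips. For a tube of uniform cross-section, its resonances (the formants) are F_n = (2n - 1)c / 4L, and more detailed models chain many short tube sections whose cross-sectional areas follow the tongue and lip positions. The Python sketch below computes both quantities; the speed of sound (350 m/s), the 17.5 cm tract length, and the function names are assumptions, and the reflection-coefficient sign convention is only one of several in use.

```python
SPEED_OF_SOUND = 350.0  # m/s, approximate value for warm, humid air (assumption)


def uniform_tube_formants(length_m, count=3, c=SPEED_OF_SOUND):
    """Resonances of a uniform tube closed at the glottis, open at the lips:
    F_n = (2n - 1) * c / (4 * L), i.e. a quarter-wavelength resonator."""
    return [(2 * n - 1) * c / (4 * length_m) for n in range(1, count + 1)]


def reflection_coefficients(areas):
    """Reflection coefficient at each junction of a multi-section tube model,
    r_k = (A_k - A_{k+1}) / (A_k + A_{k+1}); sign conventions differ."""
    return [(a - b) / (a + b) for a, b in zip(areas, areas[1:])]


# Assumed adult tract length of 17.5 cm: roughly 500, 1500 and 2500 Hz.
print(uniform_tube_formants(0.175))
```

Shortening the tube, for example to model a smaller speaker, raises every formant proportionally, which illustrates how the geometry of the speech organs alone determines the sound.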


Copyright 2002 Irfansyah