| 13 | |
| 14 | '''How difficult is it to put together the database needed in order to synthesize Hebrew/Italian/Spanish/Hindi/...? Is Mary modular in that sense?''' |
| 15 | |
| 16 | Mary is very modular, and a number of modules exist in a language-independent, configurable implementation, but adding a new language still takes a fair amount of work. |
| 17 | |
| 18 | For many languages, you could start with the existing MBROLA diphone voices: |
| 19 | http://tcts.fpms.ac.be/synthesis/mbrola/mbrcopybin.html |
| 20 | |
| 21 | You would then need at least the following MARY TTS modules: |
| 22 | |
| 23 | * needed: a Tokeniser, cutting the input into sentences and tokens (it may be possible to re-use source:trunk/java/de/dfki/lt/mary/modules/JTokeniser.java for a number of languages) |
| 24 | |
| 25 | * optional: a text normalisation module, which expands numbers, abbreviations etc. into a pronounceable form (this can be left out at the beginning) |
| 26 | |
| 27 | * optional: a part-of-speech tagger, distinguishing at least between content words and function words |
| 28 | |
| 29 | * crucially needed: a phonemiser, converting the input text into sound symbols, e.g. in SAMPA. This can be based on rules for some languages (probably Spanish), but a pronunciation lexicon is required for others, where the link between spelling and pronunciation is less regular. In that case, the lexicon must also be complemented with "letter-to-sound" rules for unknown words. |
| 30 | |
| 31 | * optional: a prosody assignment module, predicting e.g. ToBI labels based on part-of-speech and other information. |
| 32 | source:java/de/dfki/lt/mary/modules/ProsodyGeneric.java, written by my student Stephanie Becker, may be a good place to start. |
| 33 | |
| 34 | * needed: a duration assignment module, predicting phone durations. As a first step, the Klatt rules currently used in the Tibetan language component: source:java/de/dfki/lt/mary/modules/tib/KlattDurationModeller |
| 35 | could be used, adapted, of course, to the language-specific phoneme set. |
| 36 | |
| 37 | * optional: an intonation contour realisation module. For example, there is a generic source:java/de/dfki/lt/mary/modules/TobiContourGenerator that can be used for different languages by writing appropriate config files. |
| 38 | |
| 39 | * needed: synthesis, e.g. using MBROLA voices. |
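
The phonemiser strategy described in the list above (look the word up in a pronunciation lexicon, fall back to letter-to-sound rules for unknown words) can be sketched roughly as follows. This is only an illustration: the class `PhonemiserSketch`, its methods, and the toy lexicon and rule entries are hypothetical and are not taken from the MARY TTS code base. The SAMPA-like symbols are example data, not a complete phoneme set for any language.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch, NOT part of MARY TTS: lexicon lookup with a
// letter-to-sound fallback for words the lexicon does not contain.
final class PhonemiserSketch {
    // Toy pronunciation lexicon (word -> SAMPA-like transcription). Illustrative data only.
    private static final Map<String, String> LEXICON = new HashMap<>();
    // Toy letter-to-sound rules (grapheme of one or two letters -> phone). Illustrative data only.
    private static final Map<String, String> LETTER_TO_SOUND = new HashMap<>();

    static {
        LEXICON.put("mexico", "m e x i k o"); // irregular spelling, so it is listed explicitly
        LETTER_TO_SOUND.put("ch", "tS");
        for (String s : new String[] {"a", "e", "i", "o", "u", "m", "n", "l", "s"}) {
            LETTER_TO_SOUND.put(s, s);
        }
        LETTER_TO_SOUND.put("c", "k");
    }

    // Returns a space-separated phone string for a single word.
    static String phonemise(String word) {
        String key = word.toLowerCase(Locale.ROOT);
        String known = LEXICON.get(key);
        return known != null ? known : applyLetterToSound(key);
    }

    // Greedy left-to-right rule matching, trying two-letter graphemes before single letters.
    private static String applyLetterToSound(String word) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < word.length()) {
            boolean matched = false;
            for (int len = Math.min(2, word.length() - i); len >= 1 && !matched; len--) {
                String phone = LETTER_TO_SOUND.get(word.substring(i, i + len));
                if (phone != null) {
                    if (out.length() > 0) {
                        out.append(' ');
                    }
                    out.append(phone);
                    i += len;
                    matched = true;
                }
            }
            if (!matched) {
                i++; // no rule covers this letter; this sketch simply skips it
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(phonemise("Mexico")); // found in the lexicon
        System.out.println(phonemise("chico"));  // built from letter-to-sound rules
    }
}
```

A real phonemiser would also need syllabification and stress assignment, and for most languages context-sensitive rules rather than a flat grapheme table.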
| 40 | |
| 41 | So, in summary, to add a new language you most crucially need a |
| 42 | phonemiser, and you need to get at least a tokeniser and a duration |
| 43 | assigner to work, assuming that there is already an acceptable MBROLA |
| 44 | voice for your language. |
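
For the duration assigner, the core of a Klatt-style rule system is a single formula: each phone has an inherent and a minimum duration, and context rules scale only the part above the minimum. The sketch below shows that formula in its commonly cited form. It is an assumption that this matches what the MARY KlattDurationModeller does internally; the class name `KlattDurationSketch` and the example values are hypothetical.

```java
// Hypothetical sketch, NOT taken from the MARY source: the basic
// Klatt-style duration formula
//   duration = minimum + (inherent - minimum) * percent / 100
final class KlattDurationSketch {

    // inherentMs: typical duration of the phone in milliseconds
    // minimumMs:  shortest duration the phone may be compressed to
    // percent:    combined effect of all context rules (100 = unchanged)
    static double durationMs(double inherentMs, double minimumMs, double percent) {
        return minimumMs + (inherentMs - minimumMs) * percent / 100.0;
    }

    public static void main(String[] args) {
        // Example values are illustrative only, not measured data.
        System.out.println(durationMs(120.0, 60.0, 100.0)); // no rule applied
        System.out.println(durationMs(120.0, 60.0, 50.0));  // shortened by context rules
    }
}
```

Adapting such a model to a new language then mostly means supplying inherent and minimum durations for each phoneme in the language-specific phoneme set.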
| 45 | |
| 46 | On the bright side, as data representation is based on Unicode, there |
| 47 | should be no problem with non-European scripts. |