Version 9 (modified by schroed, 19 years ago) (diff)
added license info

Frequently Asked Questions

Can I install additional voices/components after the main installation, or do I have to reinstall everything "from scratch"?

You can run the installer again and select only the new packages that you wish to install. Make sure you specify the same installation directory as before. This should work, with the possible exception of the steps that create links and the uninstaller.

It would of course be nicer to have an integrated update manager -- see ticket:8. Help is welcome! :-)

What exactly is the license for the software?

The core OpenMary system, as released on this development page, is distributed under a very liberal BSD-style license, which basically allows you to do anything you want with the code provided that you acknowledge where you got it: http://mary.dfki.de/download/MARY%20software%20user%20agreement.html

The German language modules and the English part-of-speech tagger are released in binary form, under a research license: http://mary.dfki.de/download/DFKI%20MARY%20software%20user%20agreement.html You must not use this code in a commercial setup unless you obtain a separate license from DFKI, and there are other restrictions. Do read the license agreement carefully before you use these components.

The MBROLA binaries and voices, finally, are distributed with MARY because that is allowed by the MBROLA license: http://mary.dfki.de/download/Mbrola%20software%20user%20agreement.html These can only be used in a non-commercial, non-military setting.

How difficult is it to add support for Hebrew/Italian/Spanish/Hindi/...? Is Mary modular in that sense?

Mary is very modular, and a number of modules exist in a language-independent, configurable implementation, but adding a new language still takes a fair amount of work.

For many languages, you could start with the existing MBROLA diphone voices: http://tcts.fpms.ac.be/synthesis/mbrola/mbrcopybin.html

You would then need at least the following MARY TTS modules:

needed: a Tokeniser, cutting the input into sentences and tokens (it may be possible to re-use source:trunk/java/de/dfki/lt/mary/modules/JTokeniser.java for a number of languages)

optional: a text normalisation module, which expands numbers, abbreviations etc. into a pronounceable form (this can be left out at the beginning)

optional: a part-of-speech tagger, distinguishing at least between content words and function words

crucially needed: a phonemiser, converting the input text into sound symbols, e.g. in SAMPA. For some languages (probably Spanish) this can be based on rules, but others, where the link between spelling and pronunciation is less regular, require a pronunciation lexicon. In that case, the lexicon must also be complemented with "letter-to-sound" rules for unknown words.

optional: a prosody assignment module, predicting e.g. ToBI labels based on part-of-speech and other information. source:trunk/java/de/dfki/lt/mary/modules/ProsodyGeneric.java, written by my student Stephanie Becker, may be a good place to start.

needed: a duration assignment module, predicting phone durations. As a very first start, the Klatt rules currently used in the Tibetan language component (source:trunk/java/de/dfki/lt/mary/modules/tib/KlattDurationModeller.java) could be used, adapted of course to the language-specific phoneme set.

optional: an intonation contour realisation module. For example, there is a generic source:trunk/java/de/dfki/lt/mary/modules/TobiContourGenerator.java that can be used for different languages by writing appropriate config files.

needed: synthesis, e.g. using MBROLA voices.
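
To make the phonemiser item above more concrete, here is a minimal sketch of "lexicon first, letter-to-sound rules for unknown words". It is not MARY code: the class name PhonemiserSketch, its methods, and the tiny rule table are all hypothetical and invented for illustration, and a real letter-to-sound component would need context-sensitive rules rather than a plain grapheme table.

```java
import java.util.Locale;
import java.util.Map;

// Hypothetical illustration only; none of these names exist in MARY.
final class PhonemiserSketch {
    private final Map<String, String> lexicon;        // lowercase spelling -> SAMPA string
    private final Map<String, String> letterToSound;  // grapheme -> SAMPA symbol
    private final int maxGraphemeLength;

    PhonemiserSketch(Map<String, String> lexicon, Map<String, String> letterToSound) {
        this.lexicon = lexicon;
        this.letterToSound = letterToSound;
        int max = 0;
        for (String grapheme : letterToSound.keySet()) {
            max = Math.max(max, grapheme.length());
        }
        this.maxGraphemeLength = max;
    }

    /** Returns space-separated SAMPA symbols for one word. */
    String phonemise(String word) {
        String key = word.toLowerCase(Locale.ROOT);
        String known = lexicon.get(key);
        // Words in the lexicon win; everything else goes through the rules.
        return known != null ? known : applyLetterToSound(key);
    }

    private String applyLetterToSound(String word) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < word.length()) {
            boolean matched = false;
            // Try the longest grapheme first, so that "ch" beats "c".
            for (int len = Math.min(maxGraphemeLength, word.length() - i); len >= 1; len--) {
                String symbol = letterToSound.get(word.substring(i, i + len));
                if (symbol != null) {
                    if (out.length() > 0) {
                        out.append(' ');
                    }
                    out.append(symbol);
                    i += len;
                    matched = true;
                    break;
                }
            }
            if (!matched) {
                i++; // no rule for this letter: skip it in this toy version
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Toy data, assumed for the example; not a real rule set for any language.
        Map<String, String> lexicon = Map.of("mary", "m E r i");
        Map<String, String> rules = Map.of("ch", "tS", "c", "k", "a", "a", "s", "s", "o", "o");
        PhonemiserSketch phonemiser = new PhonemiserSketch(lexicon, rules);
        System.out.println(phonemiser.phonemise("Mary"));  // found in the lexicon
        System.out.println(phonemiser.phonemise("casa"));  // falls back to the rules
    }
}
```

For a language with regular spelling, the lexicon could start out empty and only hold exceptions; for a language with irregular spelling, the balance shifts the other way.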

So, in summary, to add a new language you most crucially need a phonemiser, and you need to get at least a tokeniser and a duration assigner to work. This assumes that there is already an acceptable MBROLA voice for your language.
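
The minimal chain summarised above (tokeniser, phonemiser, duration assigner, then an MBROLA voice) can be sketched roughly as follows. This is an illustration under stated assumptions, not the MARY module API: the class MinimalPipelineSketch and its methods are hypothetical, the duration values are made up, and the phonemiser is passed in as a plain function. The duration rule follows the general form of Klatt's rule (minimum duration plus a percentage of the difference between inherent and minimum duration), and the output is a list of "phoneme duration-in-ms" lines of the kind MBROLA reads from a .pho file, with "_" assumed to be the voice's silence symbol.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical illustration only; none of these names exist in MARY.
final class MinimalPipelineSketch {
    // Fallback {inherent, minimum} duration in ms; invented values.
    private static final int[] DEFAULT_DURATION = {80, 40};

    /** Klatt-style rule: DUR = MINDUR + (INHDUR - MINDUR) * PRCNT / 100. */
    static int klattDuration(int inherentMs, int minimumMs, int percent) {
        return minimumMs + (inherentMs - minimumMs) * percent / 100;
    }

    /** Tokeniser stand-in: split on whitespace and drop ASCII punctuation. */
    static List<String> tokenise(String text) {
        List<String> tokens = new ArrayList<>();
        for (String raw : text.trim().split("\\s+")) {
            String cleaned = raw.replaceAll("\\p{Punct}", "");
            if (!cleaned.isEmpty()) {
                tokens.add(cleaned);
            }
        }
        return tokens;
    }

    /**
     * Runs tokeniser -> phonemiser -> duration rule and returns .pho-style lines.
     * The phonemiser must return space-separated phone symbols for one token.
     */
    static List<String> toPho(String text,
                              Function<String, String> phonemiser,
                              Map<String, int[]> durations,
                              int percent) {
        List<String> lines = new ArrayList<>();
        lines.add("_ 100"); // leading silence
        for (String token : tokenise(text)) {
            for (String phone : phonemiser.apply(token).split(" ")) {
                if (phone.isEmpty()) {
                    continue;
                }
                int[] d = durations.getOrDefault(phone, DEFAULT_DURATION);
                lines.add(phone + " " + klattDuration(d[0], d[1], percent));
            }
        }
        lines.add("_ 100"); // trailing silence
        return lines;
    }

    public static void main(String[] args) {
        // Toy phonemiser: one symbol per letter. A real one needs rules or a lexicon.
        Function<String, String> letterByLetter =
                word -> String.join(" ", word.toLowerCase().split(""));
        Map<String, int[]> durations = Map.of("a", new int[] {120, 60});
        for (String line : toPho("la, la", letterByLetter, durations, 100)) {
            System.out.println(line);
        }
    }
}
```

The optional modules from the list (text normalisation, part-of-speech tagging, prosody, intonation contour) would slot in between these stages; a contour module would add pitch targets to the lines, which this sketch leaves out.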

On the bright side, as data representation is based on Unicode, there should be no problem with non-European scripts.
