
Changes between Version 10 and Version 11 of NewLanguageSupport

Timestamp:: 06/30/09 14:09:27 (16 years ago)
Author:: masc01
Comment:: added short description for new language nlp components


NewLanguageSupport

-                      v10
+                      v11
 == 4. Minimal NLP components for the new language ==
+With the files generated by the Transcription tool, we can now set up a first, minimal version of the NLP components for our language in the TTS system.
+We add support for our language to MARY TTS by creating a new config file in the folder MARY TTS\conf, i.e. the conf folder of the MARY installation (MARY_BASE/conf). By convention, the file is called <locale>.config. It tells the MARY server which TTS modules to load and which data files to use.
+The following is an example for Turkish (locale "tr").
+{{{
+##########################################################################
+# MARY TTS configuration file tr.config
+##########################################################################
+name = tr
+tr.version = 4.0.0
+provides = a-language
+requires = \
+    marybase
+###########################################################################
+############################## The Modules  ###############################
+###########################################################################
+modules.classes.list = \
+        marytts.modules.JPhonemiser(tr.)  \
+        marytts.modules.MinimalisticPosTagger(tr,tr.)
+####################################################################
+####################### Module settings  ###########################
+####################################################################
+# Phonemiser settings
+tr.allophoneset = MARY_BASE/lib/modules/tr/lexicon/allophones.tr.xml
+tr.lexicon = MARY_BASE/lib/modules/tr/lexicon/tr_lexicon.fst
+tr.lettertosound = MARY_BASE/lib/modules/tr/lexicon/tr.lts
+#tr.userdict = MARY_BASE/lib/modules/tr/lexicon/userdict.txt
+# POS tagger settings
+tr.partsofspeech.fst = MARY_BASE/lib/modules/tr/tagger/tr_pos.fst
+tr.partsofspeech.punctuation = ,.?!;
+}}}
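To make the syntax of such a config file explicit, here is a minimal sketch of a reader for it. It assumes Java-properties-style rules as seen in tr.config: "key = value" pairs, lines starting with # are comments, and a trailing backslash continues a value onto the next line, where (as in java.util.Properties) a continued line is never treated as a comment. It ignores ':' separators and escape sequences, and MARY's own config reader may differ in details. The helper names parse_mary_config and referenced_files are hypothetical.

{{{
#!python
# Hypothetical sketch of a reader for MARY-style .config files.
# Assumed rules: "key = value", '#' comment lines, and a trailing
# backslash that continues the value onto the next line.

def parse_mary_config(text):
    """Return a dict mapping each key to its value, continuations joined by spaces."""
    props = {}
    key, parts = None, []
    for raw in text.splitlines():
        line = raw.strip()
        if key is None:
            # Start of a logical line: skip blank and comment lines.
            if not line or line.startswith("#"):
                continue
            key, _, value = line.partition("=")
            key, value = key.strip(), value.strip()
        else:
            # Continuation line: everything belongs to the value, even a leading '#'.
            value = line
        if value.endswith("\\"):
            parts.append(value[:-1].strip())
            continue
        parts.append(value)
        props[key] = " ".join(p for p in parts if p)
        key, parts = None, []
    if key is not None:  # file ended inside a continuation
        props[key] = " ".join(p for p in parts if p)
    return props

def referenced_files(props):
    """List the values that point at data files below MARY_BASE."""
    return [v for v in props.values() if v.startswith("MARY_BASE/")]
}}}

Applied to tr.config, referenced_files should return the four data file paths, while the commented-out tr.userdict line is skipped. The continuation rule is also why the last entry of a backslash-continued list must not end with a backslash when a comment line follows it: the comment would otherwise become part of the value.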
+The tr.config file refers to the following data files:
+{{{
+MARY_BASE/lib/modules/tr/lexicon/allophones.tr.xml
+MARY_BASE/lib/modules/tr/lexicon/tr_lexicon.fst
+MARY_BASE/lib/modules/tr/lexicon/tr.lts
+MARY_BASE/lib/modules/tr/tagger/tr_pos.fst
+}}}
+These files must be copied from the TranscriptionGUI folder to the locations given in tr.config.
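The copy step can be scripted. This is a minimal sketch that assumes all four files lie directly in one source folder (the exact layout of the TranscriptionGUI output folder is an assumption) and that the MARY_BASE/ prefix in tr.config stands for the MARY installation folder. The function install_language_files is hypothetical.

{{{
#!python
# Hypothetical helper: copy the Transcription tool output into the places
# that tr.config expects. Assumes a flat source folder holding all files.
import shutil
from pathlib import Path

# Paths from tr.config, relative to MARY_BASE.
REQUIRED = [
    "lib/modules/tr/lexicon/allophones.tr.xml",
    "lib/modules/tr/lexicon/tr_lexicon.fst",
    "lib/modules/tr/lexicon/tr.lts",
    "lib/modules/tr/tagger/tr_pos.fst",
]

def install_language_files(source_dir, mary_base, relpaths=REQUIRED):
    """Copy each file, looked up by file name in source_dir, to mary_base/relpath."""
    source_dir, mary_base = Path(source_dir), Path(mary_base)
    # Fail before copying anything if a file is missing.
    missing = [r for r in relpaths if not (source_dir / Path(r).name).is_file()]
    if missing:
        raise FileNotFoundError(f"not found in {source_dir}: {missing}")
    installed = []
    for rel in relpaths:
        target = mary_base / rel
        target.parent.mkdir(parents=True, exist_ok=True)  # e.g. tr/lexicon, tr/tagger
        shutil.copy2(source_dir / Path(rel).name, target)
        installed.append(target)
    return installed
}}}

Checking for missing files first means a partial installation is never left behind when the Transcription tool did not produce one of the files.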
+Now it should be possible to start the MARY server and send a query via its HTTP interface with input type TEXT, locale tr, and any output type up to TARGETFEATURES. A test request can be submitted from http://localhost:59125/documentation.html. It is a good idea to check whether the output for TOKENS, PARTSOFSPEECH, PHONEMES, INTONATION and ALLOPHONES looks roughly as expected.
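The same checks can be run from a script instead of the browser. This sketch assumes that the server accepts GET requests at /process with the parameters INPUT_TEXT, INPUT_TYPE, OUTPUT_TYPE and LOCALE; that request format is an assumption, so verify it against documentation.html for your MARY version. The helpers build_query and fetch_all are hypothetical.

{{{
#!python
# Sketch: request each intermediate output type for one Turkish sentence.
# ASSUMPTION: the /process endpoint and parameter names shown below.
from urllib.parse import urlencode
from urllib.request import urlopen

OUTPUT_TYPES = ["TOKENS", "PARTSOFSPEECH", "PHONEMES", "INTONATION",
                "ALLOPHONES", "TARGETFEATURES"]

def build_query(text, output_type, locale="tr", host="localhost", port=59125):
    """Build a GET URL for one TEXT -> output_type request."""
    params = {"INPUT_TEXT": text, "INPUT_TYPE": "TEXT",
              "OUTPUT_TYPE": output_type, "LOCALE": locale}
    return f"http://{host}:{port}/process?{urlencode(params)}"

def fetch_all(text, locale="tr"):
    """Return {output_type: response body} for every type in OUTPUT_TYPES."""
    results = {}
    for out in OUTPUT_TYPES:
        with urlopen(build_query(text, out, locale), timeout=30) as resp:
            results[out] = resp.read().decode("utf-8")
    return results
}}}

With the server running, fetch_all("Merhaba dünya.") returns one document per output type, which can then be inspected by eye. urlencode takes care of percent-encoding non-ASCII characters such as ü.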
+To continue with the next step, you need a MARY server running with this config file, so that the FeatureMaker can compute the feature vectors from which diphone coverage is computed.
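Before starting the FeatureMaker, a quick check that the server is reachable can save a failed run. This minimal sketch only tests whether something accepts TCP connections on the port (59125 as in the URL above); it does not confirm that the tr locale was loaded. The function name server_is_up is hypothetical.

{{{
#!python
# Sketch: is anything listening on the MARY server port?
import socket

def server_is_up(host="localhost", port=59125, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, or timed out
        return False
}}}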
 == 5. Run feature maker with the minimal nlp components ==