Changes between Version 15 and Version 16 of NewLanguageSupport


Timestamp: 11/16/09 17:43:36
Author: masc01
Comment: added description of AudioConverterGUI

= Adding support for a new language to MARY TTS =

This page outlines the steps necessary to add support for a new language to MARY TTS.

The following sections describe the various steps involved.
== 1. Download the xml dump of Wikipedia in your language ==
Information about where and how to download Wikipedia in various languages can be found at: http://en.wikipedia.org/wiki/Wikipedia_database

For example:

 1. The English xml dump of Wikipedia is available at http://download.wikimedia.org/enwiki/latest/ (example file: enwiki-latest-pages-articles.xml.bz2, 4.1 GB)
 1. The Telugu xml dump of Wikipedia is available at http://download.wikimedia.org/tewiki/latest/

{{{
wget -b http://download.wikimedia.org/enwiki/latest/enwiki-latest-pages-articles.xml.bz2
}}}

== 2. Extract clean text and most frequent words ==

'''2.1. Split the xml dump'''

Once downloaded, the best way to handle the xml dump is to split it into small chunks. You can skip this step if your wiki dump is not bigger than 500 MB and you do not have memory problems. [[BR]]

For example, after unzipping, the English Wikipedia dump will be approx. 16 GB, so for further processing it can be split using the '''WikipediaDumpSplitter''' program. [[BR]]

The following script explains its usage and possible parameters for enwiki:
{{{
#!/bin/bash
}}}
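A minimal sketch of such a call, assuming the tool lives in the marytts.tools.dbselection package like the other tools on this page; the option names (-xmlDump, -outDir, -maxPages) and the classpath are assumptions, so check the tool's usage output for the exact names:

{{{
#!/bin/bash
# Sketch only: option names and classpath are assumptions, not verbatim from this page.
MARY_BASE="/path/to/mary"   # placeholder

java -cp "$MARY_BASE/java/mary-common.jar" marytts.tools.dbselection.WikipediaDumpSplitter \
    -xmlDump "./enwiki-latest-pages-articles.xml" \
    -outDir  "./xml_splits/" \
    -maxPages 25000
}}}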
'''2.2. Wikipedia markup cleaning and mysql database creation'''

The next step is to extract clean text (without Wikipedia markup) from the split xml files and save this text and a list of words in a mysql database.[[BR]]

{{{
mysql> flush privileges;
}}}
In this case the ''wiki'' database is created, all privileges are granted to user ''mary'' on localhost, and the password is, for example, ''wiki123''. These values will be used in the scripts below. [[BR]]
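
For reference, the setup described above can also be done non-interactively from the shell; this is a sketch assuming MySQL 5.x syntax and a root account, using the example user and password from this page:

{{{
#!/bin/bash
# Sketch (MySQL 5.x syntax assumed): create the "wiki" database and grant user
# "mary" full access with the example password "wiki123".
mysql -u root -p <<'EOF'
CREATE DATABASE wiki;
GRANT ALL PRIVILEGES ON wiki.* TO 'mary'@'localhost' IDENTIFIED BY 'wiki123';
FLUSH PRIVILEGES;
EOF
}}}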
If you do not have the rights to create a mysql database, please ask your system administrator to create one for you.[[BR]]

Once you have a mysql database, you can start extracting clean text and words from the Wikipedia split files using the '''WikipediaProcessor''' program. The following script explains its usage and possible parameters (the script examples presented in this tutorial use the enwiki dump, that is, locale en_US):[[BR]]
{{{
}}}
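As a rough orientation, a WikipediaProcessor call combining the database settings from above with the list of split files might look as follows; the option names (-locale, -mysqlHost, -mysqlUser, -mysqlPasswd, -mysqlDB, -listFile) and the classpath are assumptions, so check the tool's usage output:

{{{
#!/bin/bash
# Sketch only: option names and classpath are assumptions.
MARY_BASE="/path/to/mary"   # placeholder

java -cp "$MARY_BASE/java/mary-common.jar" marytts.tools.dbselection.WikipediaProcessor \
    -locale "en_US" \
    -mysqlHost "localhost" -mysqlUser "mary" -mysqlPasswd "wiki123" -mysqlDB "wiki" \
    -listFile "./wikilist.txt"
}}}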
The wikilist.txt file should contain something like:[[BR]]
/current-dir/xml_splits/page1.xml[[BR]]
/current-dir/xml_splits/page2.xml[[BR]]
/current-dir/xml_splits/page3.xml[[BR]]
...[[BR]]

'''NOTE:''' If you experience memory problems, you can try to split the big xml dump into smaller chunks.

'''Output:'''

- It creates a file "./done.txt" which contains the files already processed; if the program stops, it can be restarted and will continue processing the files in the input list that are not yet "done".[[BR]]

- A text file "./wordlist-freq.txt" containing the list of words and their frequencies; this file is written after each xml file is processed. [[BR]]

- It creates two tables in the database; the table names depend on the locale. For example, if the locale is "en_US" it will create the tables en_US_cleanText and en_US_wordList, whose description is:[[BR]]
{{{
+-----------+------------------+------+-----+---------+----------------+
}}}
== 3. Transcribe most frequent words ==

Transcribe the most frequent words using the MARY Transcription Tool. The Transcription Tool is a graphical user interface which supports a semi-automatic procedure for transcribing a new-language text corpus and automatic training of letter-to-sound (LTS) rules for that language. It also stores all functional words in that language in order to build a primitive POS tagger.

Create the pronunciation dictionary, train letter-to-sound rules, and prepare the list of functional words for the primitive POS tagger using the MARY Transcription Tool.

More details are available at http://mary.opendfki.de/wiki/TranscriptionTool

== 4. Minimal NLP components for the new language ==

With the files generated by the Transcription Tool, we can now create a first instance of the NLP components in the TTS system for our language.

The config file for the new locale (here, tr.config) refers to the following files:
{{{
MARY_BASE/lib/modules/tr/tagger/tr_pos.fst
}}}
They must be copied from the TranscriptionGUI folder to the expected place on the file system.
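
For example, for the POS tagger FST shown above, the copy step could look like this (the source path is a placeholder for wherever the Transcription GUI wrote its output):

{{{
#!/bin/bash
# Sketch: adjust MARY_BASE and the source directory to your installation.
MARY_BASE="/path/to/mary"
mkdir -p "$MARY_BASE/lib/modules/tr/tagger"
cp /path/to/TranscriptionGUI/tr_pos.fst "$MARY_BASE/lib/modules/tr/tagger/"
}}}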
Now it should be possible to start the MARY server and place a query via the HTTP interface, with input format TEXT, locale tr, and output formats up to TARGETFEATURES. A suitable test request can be placed from http://localhost:59125/documentation.html. It is a good idea to check whether the output for TOKENS, PARTSOFSPEECH, PHONEMES, INTONATION and ALLOPHONES looks roughly as expected.
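
Such a request can also be placed from the command line. The sketch below assumes the /process endpoint and parameter names of the MARY 4.x HTTP interface; the input text is only a placeholder:

{{{
#!/bin/bash
# Query the running MARY server: plain text in, ALLOPHONES out, locale "tr".
curl "http://localhost:59125/process" \
    --data-urlencode "INPUT_TEXT=merhaba" \
    --data "INPUT_TYPE=TEXT" \
    --data "OUTPUT_TYPE=ALLOPHONES" \
    --data "LOCALE=tr"
}}}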
In order to continue with the next step, you will need to have a MARY server with this config file running, so that the FeatureMaker can compute the feature vectors needed for diphone coverage.

== 5. Run the FeatureMaker with the minimal NLP components ==

The '''FeatureMaker''' program splits the clean text obtained in step 2 into sentences, classifies them as reliable or non-reliable (sentences with unknownWords or strangeSymbols), and extracts context features from the reliable sentences. All this extracted data will be kept in the DB.[[BR]]

The following script explains its usage and possible parameters:[[BR]]

{{{
}}}
There is a variant of the program, '''FeatureMakerMaryServer''', which calls an external MARY server instead of starting the MARY components internally. It takes the additional command line arguments ''-maryHost localhost -maryPort 59125''.
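
A hedged sketch of such a call, combining the -maryHost/-maryPort arguments mentioned above with the mysql settings from step 2; the remaining option names and the classpath are assumptions, so check the tool's usage output:

{{{
#!/bin/bash
# Sketch only: mysql-related option names and classpath are assumptions.
MARY_BASE="/path/to/mary"   # placeholder

java -cp "$MARY_BASE/java/mary-common.jar" marytts.tools.dbselection.FeatureMakerMaryServer \
    -locale "en_US" \
    -maryHost localhost -maryPort 59125 \
    -mysqlHost "localhost" -mysqlUser "mary" -mysqlPasswd "wiki123" -mysqlDB "wiki"
}}}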
'''Output:'''

- A file containing the feature definition of the features used for selection; the name of this file depends on the locale, for example for "en_US" it will be "/current-dir/en_US_featureDefinition.txt". This file will be used in the database selection step.[[BR]]

- It creates one table in the database; the name of the table depends on the locale, for example if the locale is "en_US" it will create the table en_US_dbselection, whose description is: [[BR]]
{{{
+----------------+------------------+------+-----+---------+----------------+
}}}

== 6. Database selection ==

The '''DatabaseSelector''' program selects a phonetically/prosodically balanced recording script.

The following script explains its usage and possible parameters:[[BR]]
{{{
#!/bin/bash
}}}
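As a rough sketch, a DatabaseSelector call reusing the mysql settings from step 2 and the feature definition file from step 5 might look as follows; the option names, the coverage-config option and the classpath are assumptions, so check the tool's usage output:

{{{
#!/bin/bash
# Sketch only: option names and classpath are assumptions.
MARY_BASE="/path/to/mary"   # placeholder

java -cp "$MARY_BASE/java/mary-common.jar" marytts.tools.dbselection.DatabaseSelector \
    -locale "en_US" \
    -mysqlHost "localhost" -mysqlUser "mary" -mysqlPasswd "wiki123" -mysqlDB "wiki" \
    -featDef "./en_US_featureDefinition.txt" \
    -covConfig "./covDef.config"
}}}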
The following is an example of a covDef.config file:[[BR]]

{{{
#
#missingPhones
}}}
'''Output:'''[[BR]]
- Log files with information about the selection in the "/current-dir/selection/" directory

- A file containing the selected sentences in "/current-dir/selected.log"

{{{
+----------------+------------------+------+-----+---------+----------------+
}}}
Also, a description of this table will be stored in the tablesDescription table.

The tablesDescription table has information about: [[BR]]
{{{
mysql> desc tablesDescription;
+----------------------------+------------+------+-----+---------+----------------+
}}}

== 7. Manually check/correct the transcription of all words in the recording script [Optional] ==

The '''SynthesisScriptGUI''' program allows you to check the sentences selected in the previous step, discard some (or all) of them, and select and add more sentences.

The following script can be used to start the GUI:[[BR]]
{{{
#!/bin/bash
}}}
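A hedged sketch of what such a start script might contain; the class name follows the marytts.tools.dbselection package used by the other tools on this page and is, like the classpath, an assumption:

{{{
#!/bin/bash
# Sketch only: class name and classpath are assumptions.
MARY_BASE="/path/to/mary"   # placeholder

java -cp "$MARY_BASE/java/mary-common.jar" marytts.tools.dbselection.SynthesisScriptGUI
}}}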
Synthesis script menu options:

1. '''Run DatabaseSelector''': Creates a new selection table or adds sentences to an already existing one.

 * After running the DatabaseSelector, the selected sentences are loaded.[[BR]]

2. '''Load selected sentences table''': reads the mysql parameters and loads a selected sentences table.

 * Once the sentences are loaded, use the checkboxes to mark sentences as unwanted/wanted.[[BR]]
 * Sentences marked as unwanted can be unselected and set as wanted again. [[BR]]
 * The DB is updated every time a checkbox is selected. [[BR]]
 * There is no need to save changes; changes are applied before the window is updated or the program exits.[[BR]]

3. '''Save synthesis script as''': saves the selected sentences, excluding the unwanted ones, to a file.[[BR]]

7. '''Exit''': terminates the program.[[BR]]

== 8. Record the script with a native speaker using our recording tool "Redstart" ==

In the recording tool Redstart, there is an import function for the text files generated from the synthesis script selection GUI. From the Redstart menu, select "File"->"Import text file..." and follow the on-screen instructions.

== 9. Convert recorded audio ==

Usually it makes sense to convert the audio recorded from the speaker before building a synthetic voice from it. MARY provides a GUI that offers a range of processing options:

[[Image(AudioConverterGUI.png)]]

The following options are provided:

 * Process only the best take of each sentence: Redstart saves the various takes of the same sentence under names such as w0001.wav, w0001a.wav, w0001b.wav etc. If this option is selected, only the last recorded version, w0001.wav, will be processed.
 * Global amplitude scaling allows you to control the maximum amplitude of the converted files, independently of the recording amplitude. Power normalisation across recording sessions attempts to identify recording sessions by the time stamps of the files: a pause longer than 10 minutes indicates a session break. For each session separately, a mean energy is computed, and conversion factors for each file are computed such that after the conversion, the average energy of all sessions is the same. The aim of this processing is to compensate for the case that, from one session to another, there may have been slightly different recording gains or minor differences in the speaker's distance to the microphone. Attention: this method can only work if the audio files have the original time stamps of the recordings, so take extra care when copying files if you intend to use this normalisation.
 * Stereo to mono conversion: If you recorded in stereo, you must convert to mono before building a voice. Choose either the left channel only, the right channel only, or a mix of both channels.
 * Remove low-frequency noise below 50 Hz: this applies a high-pass FIR filter with a cutoff frequency of 50 Hz and a transition bandwidth of 40 Hz. Since the FIR filter has a symmetric kernel, it has a linear phase response.
 * Trim initial and final silences: this applies k-means clustering to identify silence vs. speech portions of the audio file, leaving 0.5 seconds of initial and final silence. This is useful to avoid training absurdly long pause duration models.
 * If a sox binary is available, it is also possible to convert the sampling rate; see the sketch after this list. A usual target rate is 16000 Hz, but other rates are also possible.
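
For the last point, a sampling rate conversion with sox could look like this (file names are placeholders, and this assumes a sox version that resamples automatically when the output rate differs):

{{{
#!/bin/bash
# Convert one mono recording to a 16 kHz sampling rate (sketch; paths are placeholders).
mkdir -p wav_16kHz
sox wav/w0001.wav -r 16000 wav_16kHz/w0001.wav
}}}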

== 10. Build a unit selection and/or hmm-based voice with the Voice import tool ==