Changes between Initial Version and Version 1 of HMMVoiceCreationMary4.0


Timestamp:
03/23/09 18:58:20 (16 years ago)
Author:
marcela_charfuelan

= '''Voice Import Tools Tutorial : How to build an HMM-based voice for the MARY 4.0 platform (DRAFT)''' =

For creating HMM-based voices we use a version of the speaker-dependent training scripts provided by [http://hts.sp.nitech.ac.jp/ HTS] that was adapted to the MARY 4.0
platform. The steps for building an HMM voice for the MARY platform can be summarised as follows:[[BR]]

I)   Checking the necessary programs and files[[BR]]
II)  Data preparation[[BR]]
III) Training of HMM models[[BR]]
IV)  Adding a new HMM voice to the MARY system[[BR]]
V)   Creating another voice in German or English (__if you want to train HMMs with another speech database__)[[BR]]
VI)  Creating another voice in a language other than German or English (US)

These steps are explained below by creating an HMM voice with the HTS '''speaker dependent training demo'''.[[BR]]

The training scripts used here are the latest versions, so HTS-2.1 and SPTK-3.2 are required. Some scripts have been added or modified in order to:[[BR]]
- Use MARY instead of Festival as text analyser.[[BR]]
- Train bandpass voicing strengths for mixed excitation.[[BR]]
- Process language-specific settings as parameters.[[BR]]


'''
=== I) Checking the necessary programs and files: ===
'''

MARY requirements:[[BR]]
- Operating system: Linux[[BR]]
- MARY TTS, recent version. Download link: http://mary.dfki.de/Download [[BR]]
- Openmary, from SVN at http://mary.opendfki.de [[BR]]
- MARY patch for the HTS demo: HTS-2.1-demo_CMU-ARCTIC-SLT_for_Mary-4.0.patch [[BR]]


HTS requirements (please download them and follow their installation instructions):[[BR]]
- HTS-2.1_for_HTK-3.4.patch [[BR]]
- HTK-3.4 ( [wiki:CompilingHTK Important note on compiling HTK] ) [[BR]]
- SPTK-3.2 [[BR]]
- HTS-demo_CMU-ARCTIC-SLT (for HTS-2.1) [[BR]]

Other requirements (please download them and follow their installation instructions):[[BR]]
- EHMM for automatic labeling, available with festvox-2.1 (recent version): http://festvox.org/download.html [[BR]]
- sox, normally available on Linux. [[BR]]
- tcl-tk with snack support, for example ActiveTcl. Download link: http://www.activestate.com/Products/ActiveTcl/ [[BR]]
- perl, normally available on Linux. [[BR]]
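
The command-line tools among these requirements can be checked before starting. The following is a minimal sketch: the function name missing_tools and the list of commands passed to it (java, perl, sox, patch, tclsh) are assumptions, and HTK, SPTK and EHMM are not covered because their install locations depend on your system.

```shell
#!/bin/sh
# Print every command from the argument list that is not found on PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Assumed list of commands; adjust it to your installation.
missing_tools java perl sox patch tclsh
```

If the script prints nothing, all listed commands were found.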

0.1) Download and unpack (un-zip, un-tar) the latest speaker-dependent training demo for English (for HTS-2.1):

http://hts.sp.nitech.ac.jp/archives/2.1/HTS-demo_CMU-ARCTIC-SLT.tar.bz2

0.2) Download and unzip the patch file for using MARY instead of Festival as text analyser:

https://mary.opendfki.de/repos/trunk/lib/hts/HTS-2.1-demo_CMU-ARCTIC-SLT_for_Mary-4.0.patch [[BR]]

Apply the patch to the HTS-demo_CMU-ARCTIC-SLT directory: [[BR]]
{{{
   patch -p1 -d . < HTS-2.1-demo_CMU-ARCTIC-SLT_for_Mary-4.0.patch
}}}

0.3) Create a wav directory.
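
Steps 0.1 to 0.3 could be scripted roughly as follows. This is only a sketch under several assumptions: wget is available, the archive unpacks into a directory named after the archive, the patch file has already been downloaded next to the archive, and the wav directory belongs inside the demo directory. The function names demo_dir and prepare_demo are hypothetical.

```shell
#!/bin/sh
DEMO_URL="http://hts.sp.nitech.ac.jp/archives/2.1/HTS-demo_CMU-ARCTIC-SLT.tar.bz2"
PATCH_FILE="HTS-2.1-demo_CMU-ARCTIC-SLT_for_Mary-4.0.patch"

# Directory the archive is assumed to unpack into: the archive name without .tar.bz2.
demo_dir() {
    basename "$1" .tar.bz2
}

# Download, unpack, patch and create wav/. Nothing happens until this is called.
prepare_demo() {
    wget "$DEMO_URL" &&
    tar -xjf "$(basename "$DEMO_URL")" &&
    cd "$(demo_dir "$DEMO_URL")" &&
    patch -p1 -d . < "../$PATCH_FILE" &&
    mkdir -p wav
}
```

Calling prepare_demo in the directory that holds the patch file runs the whole sequence and stops at the first failing command.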

0.4) Run the VoiceImport program.

First set your MARY_BASE directory, then run the program: [[BR]]
{{{
   export MARY_BASE="/dir/to/openmary"
   java -jar -Xmx1024m  $MARY_BASE/java/voiceimport.jar
}}}

If you are not familiar with the VoiceImport program or have problems with it, please read and follow the instructions in the Voice Import Tools
Tutorial: http://mary.opendfki.de/wiki/VoiceImportToolsTutorial

If you want to create another voice in German or English, please see section V below.

Whenever you are in doubt about the settings of a particular component, you can check its help for a description of the meaning
(and possible values) of each variable.

'''
=== II) Data preparation: ===
'''

1- Run the HMMVoiceDataPreparation component of the HMM Voice Trainer group to check that the text, wav and data/raw files are available and in the correct paths.
If only data/raw is provided, the program will do the conversion. If no text files are available but data/utts contains utterances in Festival format, the program will do that conversion as well.
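
Before running this component it can help to see which recordings have no transcription yet. A minimal sketch, assuming one wav/NAME.wav per utterance and one text/NAME.txt per transcription (the .txt extension and the function name wavs_without_text are assumptions):

```shell
#!/bin/sh
# Print the base name of every wav/NAME.wav under directory $1
# that has no matching text/NAME.txt.
wavs_without_text() {
    for wav in "$1"/wav/*.wav; do
        [ -e "$wav" ] || continue              # the pattern matched no file
        name=$(basename "$wav" .wav)
        [ -f "$1/text/$name.txt" ] || printf '%s\n' "$name"
    done
}

# Check the voice directory we are currently in.
wavs_without_text .
```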

2- Run the PhoneUnitFeatureComputer component of the Feature Extraction group to extract context feature vectors from the text data. This procedure will create a "phonefeatures" directory. The MARY server must be running while this component runs.

3- Run the EHMMLabeler component of the Automatic Labeling group to label the wav files automatically using the corresponding transcriptions. This procedure might
take several hours. Before running EHMMLabeler, please use the settings editor of this component to set the following variable according to your festvox installation:
{{{
   EHMMLabeler.ehmm  = ../festvox/src/ehmm/bin/
}}}

4- Run the LabelPauseDeleter component of the Automatic Labeling group. Please use the settings editor of this component to set the variable:
{{{
   LabelPauseDeleter.threshold  =  10
}}}

5- Run the PhoneUnitLabelComputer component of the Labels and Pause Correction group. This procedure will create a "phonelab" directory.

6- Run the PhonelabelFeatureAligner component of the Labels and Pause Correction group. This procedure will verify the alignment between "phonefeatures" and "phonelab".


'''
=== III) HMM models training: ===
'''

7- Run the HMMVoiceConfigure component of the HMM Voice Trainer group. The default settings of this component are already fixed for the HTS-demo_CMU-ARCTIC-SLT voice, but some settings depend on your installation. Please provide paths for:
{{{
  HMMVoiceConfigure.htsPath       = /yourpath/htk-hts2.1/bin
  HMMVoiceConfigure.htsEnginePath = /yourpath/hts_engine_API-1.01/bin
  HMMVoiceConfigure.sptkPath      = /yourpath/SPTK-3.2/bin
  HMMVoiceConfigure.tclPath       = /yourpath/ActiveTcl-8.6/bin
  HMMVoiceConfigure.soxPath       = /yourpath/usr/bin
}}}

If you are running configure for another voice, for example a male German voice, please use the settings editor of this component to set the variables:
{{{
  HMMVoiceConfigure.dataSet      =  german_set_name
  HMMVoiceConfigure.featureList  =  feature_list_de.pl  (the set of context features used for this voice can be changed in this file).
  HMMVoiceConfigure.speaker      =  speaker_name
  HMMVoiceConfigure.lowerF0      =  40  (male=40,  female=80)
  HMMVoiceConfigure.upperF0      =  280 (male=280, female=350)
  HMMVoiceConfigure.voiceLang    =  de
}}}

Using the settings editor of this component you can also change other variables, such as using LSP instead of MGC, the sampling frequency, etc., the same as you would do when running "make configure + parameters" with the original HTS scripts.

8- Run the HMMVoiceMakeData component of the HMM Voice Trainer group to run the HTS procedure "make data". This procedure is the same as in the original scripts, with additional sections for calculating strengths and Fourier magnitudes (for mixed excitation), global variance, and handling of MARY context features.

Individual procedures can be repeated in isolation by adjusting the settings of this component. For example, if the procedure that creates strengths (in the str directory) has to be repeated with a different set of filters (data/filters/), please remove the old str directory and set:
{{{
  HMMVoiceMakeData.makeSTR       =  1
  HMMVoiceMakeData.makeCMPMARY   =  1
}}}
Set all the other variables to 0 and run the component again. (In this case you need to run makeCMPMARY as well, because the mgc+lf0+str+mag vectors must be composed again.)

The procedures can also be repeated manually by going to the data directory and running "make data" or "make str", as is normally done with the original HTS scripts.

NOTE: the Makefile in data/ includes a gv: section to calculate global variance files. In MARY, these files are generated in little-endian format and contain a header of one short that indicates the size of the vectors they contain.
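
The header described in this note can be inspected from the shell. The sketch below reads only the first two bytes of a file and combines them as a little-endian value; it treats the short as unsigned and makes no assumption about the layout of the data that follows the header. The function name gv_vector_size is hypothetical.

```shell
#!/bin/sh
# Print the vector size stored in the first two bytes of file $1,
# interpreted as a little-endian (low byte first) unsigned short.
gv_vector_size() {
    # od prints the two bytes as unsigned decimals, e.g. "  25   0".
    set -- $(od -An -tu1 -N2 "$1")
    echo $(( $1 + 256 * $2 ))
}
```

Reading the bytes one by one keeps the result independent of the byte order of the machine that runs the check. It could be called on one of the data/gv/gv-*-littend.pdf files.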

9- Run the HMMVoiceMakeVoice component of the HMM Voice Trainer group. Here again, particular training steps can be repeated by selecting them in the settings of this component (set them to 1 and all the others to 0). This is equivalent to running again:
{{{
   perl scripts/Training.pl scripts/Config.pm > logfile &
}}}
after modifying the Config.pm file, as is normally done with the original HTS scripts.

This component will generate general information about the execution of the training steps. Detailed information about the training status can be found in the logfile in the current directory.

The training procedure can take several hours; please check the log file from time to time to follow its progress.


'''
=== IV) Adding a new voice to the MARY platform: ===
'''

10- Run the HMMVoiceInstaller component of the Install Voice group. The default settings of this component are already fixed for the HTS-demo_CMU-ARCTIC-SLT voice. If you are training another voice, please use the settings editor of this component to set:
{{{

  HMMVoiceInstaller.FeaFile     =  phonefeatures/xx.pfeats
                                   this is an example CONTEXTFEATURES file, used for synthesis during start-up.
  HMMVoiceInstaller.useMixExc   =  true
                                   set this variable to true if using mixed excitation
  HMMVoiceInstaller.useGV       =  true
                                   set this variable to true if using global variance in parameter generation.
}}}

The VoiceInstaller will: [[BR]]
- Create a new MARY config file in: $MARY_BASE/conf/german-hsmm-voice.config [[BR]]
- Add the files corresponding to this voice in: $MARY_BASE/lib/voices/hsmm-voice/ [[BR]]
- Copy the features list: data/feature_list_en.pl to $MARY_BASE/lib/voices/hsmm-voice [[BR]]
- Copy one example of phonefeatures for testing the synthesiser: data/phonefeatures/cmu_us_arctic_slt_xxxx.pfeats to $MARY_BASE/lib/voices/hsmm-voice [[BR]]
- Copy the HTS trees: voices/qst001/ver1/*.inf to $MARY_BASE/lib/voices/hmm-voice [[BR]]
- Copy the HTS PDF models: voices/qst001/ver1/*.pdf to $MARY_BASE/lib/voices/hmm-voice [[BR]]
- Copy the global variance models (if useGV is set to true): data/gv/gv-*-littend.pdf to $MARY_BASE/lib/voices/hmm-voice [[BR]]
- Copy the filter taps for mixed excitation: data/filters/mix_excitation_filters.txt to $MARY_BASE/lib/voices/hmm-voice [[BR]]

After a new voice has been installed successfully, it can be used with the mary_server and the mary_client.


'''
=== V) Creating another voice in German or English. ===
'''

If using German:

To create a new German voice you need: [[BR]]
  * a wav or raw directory with the speech files you will use for training the German voice. [[BR]]
  * transcriptions of the files, one text file per speech file, or transcriptions in Festival format if available. [[BR]]

Then we use the original HTS-demo_CMU-ARCTIC-SLT directory as a base:

- Download and unpack (un-zip, un-tar) the HTS-demo_CMU-ARCTIC-SLT for HTS-2.1.

- Rename this directory to your new voice name, for example german_voice, and delete the directories data/raw and data/utts.

- Apply the MARY patch to the german_voice directory. [[BR]]
  patch -p1 -d . < HTS-2.1-demo_CMU-ARCTIC-SLT_for_Mary-4.0.patch

- Move your speech files to this directory. If you have a wav directory, copy it into the current directory (german_voice/wav). If you have a raw directory, copy it into the data directory (german_voice/data/raw).

- Move your transcription files to this directory. If you have a text directory containing the transcription of each file in a separate file, copy it into the current directory (german_voice/text). If you have transcriptions in Festival format, copy them into the data/utts directory (german_voice/data/utts/).

- Now run the VoiceImport program and follow the instructions as normal. Provide general settings for:
{{{
   db.gender    =  male  (or female)
   db.locale    =  de
   db.marybase  =  /path/to/mary/base/
   db.voicename =  german_voice
}}}
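
The renaming and clean-up step at the start of this list can be sketched as a small shell function. The function name new_voice_dir is hypothetical, and data/utts is assumed to be the directory that holds the demo's Festival utterances (the name used for that directory elsewhere in this tutorial).

```shell
#!/bin/sh
# Rename the unpacked demo directory ($1) to the new voice name ($2)
# and remove the demo's own recordings and utterances.
new_voice_dir() {
    mv "$1" "$2" &&
    rm -rf "$2/data/raw" "$2/data/utts"
}
```

For example, `new_voice_dir HTS-demo_CMU-ARCTIC-SLT german_voice` leaves a german_voice directory ready for the patch and for your own wav/raw and text/utts data.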

'''
=== VI) Creating another voice in a language other than German or English (US). ===
'''

If you are working with another language you will need to specify or change the following:

- '''Phoneme set''': if some phone names are not compatible with HTK or cannot be handled by it, please create aliases for these phone names. You need to make some changes in the code to include these aliases. For example, these are the changes needed to create a British English voice: [[BR]]

First of all, check the phone set in allophones.en_GB.xml [[BR]]
{{{
<allophones name="sampa" xml:lang="en-GB" features="vlng vheight vfront vrnd ctype cplace cvox">
<silence ph="_"/>

<vowel ph="@U" vlng="d" vheight="2" vfront="3" vrnd="+"/>
<vowel ph="A:" vlng="l" vheight="3" vfront="3" vrnd="-"/>
<vowel ph="3:" vlng="l" vheight="2" vfront="2" vrnd="+"/>
<vowel ph="u:" vlng="l" vheight="1" vfront="3" vrnd="+"/>
<vowel ph="i:" vlng="l" vheight="1" vfront="1" vrnd="-"/>
<vowel ph="EI" vlng="d" vheight="2" vfront="1" vrnd="-"/>
<vowel ph="E" vlng="s" vheight="2" vfront="1" vrnd="-"/>
<vowel ph="@" vlng="a" vheight="2" vfront="2" vrnd="-"/>
<vowel ph="O:" vlng="l" vheight="3" vfront="3" vrnd="+"/>
<vowel ph="A" vlng="l" vheight="3" vfront="3" vrnd="-"/>
<vowel ph="O" vlng="l" vheight="3" vfront="3" vrnd="+"/>
<vowel ph="I" vlng="s" vheight="1" vfront="1" vrnd="-"/>
<vowel ph="aU" vlng="d" vheight="3" vfront="2" vrnd="-"/>
<vowel ph="aI" vlng="d" vheight="3" vfront="2" vrnd="-"/>
<vowel ph="U" vlng="s" vheight="1" vfront="3" vrnd="+"/>
<vowel ph="V" vlng="s" vheight="2" vfront="2" vrnd="-"/>
<vowel ph="OI" vlng="d" vheight="2" vfront="3" vrnd="+"/>
<vowel ph="Q" vlng="l" vheight="3" vfront="3" vrnd="+"/>
<vowel ph="i" vlng="l" vheight="1" vfront="1" vrnd="-"/>
<vowel ph="u" vlng="l" vheight="1" vfront="3" vrnd="+"/>
<vowel ph="r=" vlng="a" vheight="2" vfront="2" vrnd="-"/>
<vowel ph="{" vlng="s" vheight="3" vfront="1" vrnd="-"/>

<consonant ph="l=" ctype="l" cplace="a" cvox="+"/>
<consonant ph="tS" ctype="a" cplace="p" cvox="-"/>
<consonant ph="N=" ctype="n" cplace="v" cvox="+"/>
<consonant ph="dZ" ctype="a" cplace="p" cvox="+"/>
<consonant ph="5" ctype="l" cplace="a" cvox="+"/>
<consonant ph="m=" ctype="n" cplace="l" cvox="+"/>
<consonant ph="D" ctype="f" cplace="d" cvox="+"/>
<consonant ph="N" ctype="n" cplace="v" cvox="+"/>
<consonant ph="n=" ctype="n" cplace="a" cvox="+"/>
<consonant ph="T" ctype="f" cplace="d" cvox="-"/>
<consonant ph="S" ctype="f" cplace="p" cvox="-"/>
<consonant ph="Z" ctype="f" cplace="p" cvox="+"/>
<consonant ph="f" ctype="f" cplace="b" cvox="-"/>
<consonant ph="g" ctype="s" cplace="v" cvox="+"/>
<consonant ph="d" ctype="s" cplace="a" cvox="+"/>
<consonant ph="b" ctype="s" cplace="l" cvox="+"/>
<consonant ph="n" ctype="n" cplace="a" cvox="+"/>
<consonant ph="l" ctype="l" cplace="a" cvox="+"/>
<consonant ph="m" ctype="n" cplace="l" cvox="+"/>
<consonant ph="j" ctype="r" cplace="p" cvox="+"/>
<consonant ph="k" ctype="s" cplace="v" cvox="-"/>
<consonant ph="h" ctype="f" cplace="g" cvox="-"/>
<consonant ph="w" ctype="r" cplace="l" cvox="+"/>
<consonant ph="v" ctype="f" cplace="b" cvox="+"/>
<consonant ph="t" ctype="s" cplace="a" cvox="-"/>
<consonant ph="s" ctype="f" cplace="a" cvox="-"/>
<consonant ph="r" ctype="r" cplace="a" cvox="+"/>
<consonant ph="p" ctype="s" cplace="l" cvox="-"/>
<consonant ph="z" ctype="f" cplace="a" cvox="+"/>

</allophones>

}}}
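
Phone names that may need an alias (names containing ":", "=", "~" or "?", and names that are a number) can be listed directly from such a file. The following is a rough sketch: it assumes one element per line with a ph="..." attribute, as in the listing above, and the function name tricky_phones is hypothetical. Its output is a list of candidates to review, not a definitive list of what HTK rejects.

```shell
#!/bin/sh
# Print the phone names found in the allophones file $1 that contain
# ":", "=", "~" or "?", or that consist only of digits.
tricky_phones() {
    sed -n 's/.* ph="\([^"]*\)".*/\1/p' "$1" |
        grep -E '[:=~?]|^[0-9]+$'
}
```

For the en-GB file above, `tricky_phones allophones.en_GB.xml` would be the call to try.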

We have experienced problems in HTK with phone names that include ":", "=" or "~", and with phone names that are a number or a symbol like "?". For "en-GB" we have created the following aliases, which must be included in the following files:

In data/scripts/common_routines.pl add/change the hash tricky_phones:
{{{
%tricky_phones = ("A:", "Ac",
                  "3:", "3c",
                  "u:", "uc",
                  "i:", "ic",
                  "O:", "Oc",
                  "r=", "re",
                  "l=", "le",
                  "N=", "Ne",
                  "m=", "me",
                  "n=", "ne"
                  );
}}}
You can define your own aliases; here we have used c to replace the colon and e to replace the equals sign. [[BR]]

[[BR]]
Add something similar, but in Java, in the function marytts.htsengine.PhoneTranslator.replaceTrickyPhones():
{{{
// Map a phone name that HTK cannot handle to its alias; other names are returned unchanged.
public static String replaceTrickyPhones(String lab){
      String s = lab;
      if(lab.contentEquals("A:") )
          s = "Ac";
      else if (lab.contentEquals("3:") )
          s = "3c";
      else if (lab.contentEquals("u:") )
          s = "uc";
      else if (lab.contentEquals("i:") )
          s = "ic";
      else if (lab.contentEquals("O:") )
          s = "Oc";
      else if (lab.contentEquals("r=") )
          s = "re";
      else if (lab.contentEquals("l=") )
          s = "le";
      else if (lab.contentEquals("N=") )
          s = "Ne";
      else if (lab.contentEquals("m=") )
          s = "me";
      else if (lab.contentEquals("n=") )
          s = "ne";
      return s;
    }
}}}

[[BR]]
[[BR]]
And the opposite in the function marytts.htsengine.PhoneTranslator.replaceBackTrickyPhones():
{{{
// Map an alias back to the original phone name; other names are returned unchanged.
public static String replaceBackTrickyPhones(String lab){
      String s = lab;
      if(lab.contentEquals("Ac") )
          s = "A:";
      else if (lab.contentEquals("3c") )
          s = "3:";
      else if (lab.contentEquals("uc") )
          s = "u:";
      else if (lab.contentEquals("ic") )
          s = "i:";
      else if (lab.contentEquals("Oc") )
          s = "O:";
      else if (lab.contentEquals("re") )
          s = "r=";
      else if (lab.contentEquals("le") )
          s = "l=";
      else if (lab.contentEquals("Ne") )
          s = "N=";
      else if (lab.contentEquals("me") )
          s = "m=";
      else if (lab.contentEquals("ne") )
          s = "n=";
      return s;
    }
}}}
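
Because replaceBackTrickyPhones() has to undo the replacement, every alias must belong to exactly one phone. A quick way to check a table for clashes is sketched below. It assumes the pairs have been copied by hand into a plain-text file with one "phone alias" pair per line; that file and the function name duplicate_aliases are hypothetical and not part of MARY or HTS.

```shell
#!/bin/sh
# Print every alias (second column of file $1) that is assigned to more than one phone.
duplicate_aliases() {
    awk '{ print $2 }' "$1" | LC_ALL=C sort | uniq -d
}
```

An empty result means the mapping can be reversed without ambiguity. A table in which both "N=" and "m=" map to "Ne" would be reported with the line "Ne".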

After making the changes in Java you need to recompile openmary. You can do this by calling ant in the openmary/ directory:
{{{
cd openmary/
ant
}}}


- '''Context features''': create a data/feature_list_xx.pl where xx is the db.locale for the new language. This file has the format:
{{{
%requested_fea_keys = (
                        #"stressed",
                        "pos_in_syl",
                        "syl_break",
                        "prev_syl_break",
                        "position_type",
                        #"next_is_pause",
                        #"prev_is_pause",

                        ...

                );

1;
}}}
Please uncomment the context features you will use to train your voice. The list of context features can be taken from your mary/features.txt file.
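
To see which context features are currently enabled in such a file, the uncommented entries can be listed. A minimal sketch, assuming every feature sits on its own line as in the format above (the function name enabled_features is hypothetical):

```shell
#!/bin/sh
# Print the quoted feature names in file $1 that are not commented out with "#".
enabled_features() {
    sed -n 's/^[[:space:]]*"\([^"]*\)".*/\1/p' "$1"
}
```

For example, `enabled_features data/feature_list_de.pl` prints one enabled feature name per line, which can then be compared by eye with mary/features.txt.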



[[BR]]
[[BR]]

Marcela Charfuelan[[BR]]
Mon Mar 23 18:08:38 CET 2009