Voice Import Tools Tutorial: How to build an HMM-based voice for the MARY 4.0 platform (DRAFT)

To create HMM-based voices we use a version of the speaker-dependent training scripts provided by HTS, adapted to the MARY 4.0 platform. Building an HMM voice for the MARY platform involves the following steps:

I) Checking the necessary programs and files
II) Data preparation
III) Training of HMM models
IV) Adding a new HMM voice to the MARY system
V) Creating another voice in German or English (to train an HMM voice with another speech database)
VI) (NEW) Creating another voice in a language other than German or English (US)

These steps are explained below by building an HMM voice with the HTS speaker-dependent training demo.

The training scripts used here require HTS-2.1 and SPTK-3.2, the latest versions at the time of writing. Some scripts have been added or modified to:

  • Use MARY instead of festival as text analyzer.
  • Train bandpass voicing strengths for mixed excitation.
  • Process language specific settings as parameters.

I) Checking the necessary programs and files:

MARY requirements:

HTS requirements, please download and follow the instructions for installing:

  • HTS-2.1_for_HTK-3.4.patch
  • HTK-3.4
  • SPTK-3.2
  • HTS-demo_CMU-ARCTIC-SLT (for HTS-2.1)

Other requirements, please download and follow the instructions for installing:

0.1) Download and un-zip, un-tar the latest speaker dependent training demo for English.

http://hts.sp.nitech.ac.jp/archives/2.1/HTS-demo_CMU-ARCTIC-SLT.tar.bz2 for HTS-2.1

0.2) Download and unzip the patch file for using MARY instead of Festival as text analyser.

https://mary.opendfki.de/repos/trunk/lib/hts/HTS-2.1-demo_CMU-ARCTIC-SLT_for_Mary-4.0.patch

Apply the patch from inside the HTS-demo_CMU-ARCTIC-SLT directory:

   patch -p1 -d . < HTS-2.1-demo_CMU-ARCTIC-SLT_for_Mary-4.0.patch

0.3) Check whether your Tcl installation supports snack. In your HTS-demo_CMU-ARCTIC-SLT directory, type the following:

  /your-Tcl-path/ActiveTcl-x.x/bin/tclshx.x data/scripts/getf0.tcl

The numbers x.x are the version of your installation; in my case, for example, the executable is tclsh8.6.

If you get:

  /your-Tcl-path/ActiveTcl-x.x/bin/tclshx.x data/scripts/getf0.tcl
  can't find package snack
    while executing
  "package require snack"

Then you need to install snack manually. Download it from: http://www.speech.kth.se/snack/download.html
Unpack (un-zip, un-tar) the snack package in /your-Tcl-path/ActiveTcl-x.x/lib/, so that afterwards your Tcl directory contains:
/your-Tcl-path/ActiveTcl-x.x/lib/snack2.2/

Now test again with the getf0.tcl file. If the snack library is installed correctly, the output is the following:

 /your-Tcl-path/ActiveTcl-8.6/bin/tclsh8.6 data/scripts/getf0.tcl
 pitch extract tool using snack library (= ESPS get_f0)
 Usage data/scripts/getf0.tcl ...

One final detail concerns the name of your tclsh executable: the HTS configure script has problems if the executable is not named tclsh. To avoid this, create a soft link to your tclshx.x executable. In my case the soft link is created with:

  cd /your-Tcl-path/ActiveTcl-8.6/bin/
  ln -s tclsh8.6 tclsh

0.5) Create a wav directory.

0.6) Run the VoiceImport program

First of all you need to set your MARY_BASE directory and then run the program:

   export MARY_BASE="/dir/to/openmary"
   java -jar -Xmx1024m  $MARY_BASE/java/voiceimport.jar

If you are not familiar with the VoiceImport program or have problems with it, please read and follow the instructions in the Voice Import Tools Tutorial: http://mary.opendfki.de/wiki/VoiceImportToolsTutorial

If you want to create another voice in German or English, see section V below.

Please remember that whenever you are in doubt about the settings of a particular component you can check its corresponding help for a description of the meaning (and possible values) of each variable.

II) Data preparation:

1- Run the HMMVoiceDataPreparation component of the HMM Voice Trainer group to check that the text, wav and data/raw files are available and in the correct paths. If only data/raw is provided, the program will do the conversion to wav. If no text files are available but data/utts is present in festival format, the program will do that conversion as well.

2- Run the PhoneUnitFeatureComputer component of the Feature Extraction group to extract context feature vectors from the text data. This procedure will create a "phonefeatures" directory. The MARY server must be running for this component to work.

3- Run the EHMMLabeler component of the Automatic Labeling group to automatically label the wav files using the corresponding transcriptions. This procedure might take several hours. Before running EHMMLabeler, use the settings editor of this component to set the following variable according to your festvox installation:

   EHMMLabeler.ehmm  = ../festvox/src/ehmm/bin/

4- Run the LabelPauseDeleter component of the Automatic Labeling group. Please use the settings editor of this component to set the variable:

   LabelPauseDeleter.threshold  =  10

5- Run the PhoneUnitLabelComputer component of the Labels and Pause Correction group. This procedure will create a "phonelab" directory.

6- Run the PhonelabelFeatureAligner component of the Labels and Pause Correction group. This procedure will verify the alignment between "phonefeatures" and "phonelab".
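
Before running the aligner, it can help to confirm that every feature file has a label file with the same base name. The following is only a rough sketch of such a pre-check, not what the aligner component itself does. The class name FeatureLabelCheck is made up for illustration, and the .lab extension for files in "phonelab" is an assumption (the .pfeats extension is the one used elsewhere in this tutorial).

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper, not part of MARY: compares base names in two directories.
class FeatureLabelCheck {

    /** Collects the base names of all files in dir that end with ext. */
    static Set<String> baseNames(Path dir, String ext) throws IOException {
        Set<String> names = new TreeSet<>();
        try (DirectoryStream<Path> files = Files.newDirectoryStream(dir, "*" + ext)) {
            for (Path f : files) {
                String n = f.getFileName().toString();
                names.add(n.substring(0, n.length() - ext.length()));
            }
        }
        return names;
    }

    /** Returns the base names that have a .pfeats file but no .lab file (assumed extension). */
    static Set<String> missingLabels(Path featDir, Path labDir) throws IOException {
        Set<String> missing = baseNames(featDir, ".pfeats");
        missing.removeAll(baseNames(labDir, ".lab"));
        return missing;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: FeatureLabelCheck <phonefeatures-dir> <phonelab-dir>");
            return;
        }
        Set<String> missing = missingLabels(Paths.get(args[0]), Paths.get(args[1]));
        System.out.println(missing.isEmpty() ? "every feature file has a label file"
                                             : "feature files without labels: " + missing);
    }
}
```

Called with the "phonefeatures" and "phonelab" directories of the voice, it lists the utterances that would have to be relabelled or removed. It does not compare the phone sequences inside the files.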

III) HMM models training:

7- Run the HMMVoiceConfigure component of the HMM Voice Trainer group. The default settings of this component are already set for the HTS-demo_CMU-ARCTIC-SLT voice, but some settings depend on your installation. Please provide paths for:

  HMMVoiceConfigure.htsPath       = /yourpath/htk-hts2.1/bin
  HMMVoiceConfigure.htsEnginePath = /yourpath/hts_engine_API-1.01/bin
  HMMVoiceConfigure.sptkPath      = /yourpath/SPTK-3.2/bin
  HMMVoiceConfigure.tclPath       = /yourpath/ActiveTcl-8.6/bin
  HMMVoiceConfigure.soxPath       = /yourpath/usr/bin

If you run configure for another voice, for example a male German voice, use the settings editor of this component to set the variables:

  HMMVoiceConfigure.dataSet      =  german_set_name
  HMMVoiceConfigure.featureList  =  feature_list_de.pl  (the set of context features used for this voice can be changed in this file).
  HMMVoiceConfigure.speaker      =  speaker_name 
  HMMVoiceConfigure.lowerF0      =  40  (male=40,  female=80)  
  HMMVoiceConfigure.upperF0      =  280 (male=280, female=350)
  HMMVoiceConfigure.voiceLang    =  de

Using the settings editor of this component you can also change other variables, such as using LSP instead of MGC or the sampling frequency, just as you would when running "make configure + parameters" with the original HTS scripts.

8- Run the HMMVoiceMakeData component of the HMM Voice Trainer group to run the HTS procedure "make data". This procedure is the same as in the original scripts, with additional sections for calculating strengths and Fourier magnitudes (for mixed excitation), global variance, and handling of MARY context features.

Individual procedures can be repeated in isolation by adjusting the settings of this component. For example, if the procedure that creates strengths (in the str directory) has to be repeated with a different set of filters (data/filters/), remove the old str directory and set:

  HMMVoiceMakeData.makeSTR       =  1
  HMMVoiceMakeData.makeCMPMARY   =  1

Set all the other variables to 0 and run the component again. (In this case makeCMPMARY must be run as well, because the mgc+lf0+str+mag vectors have to be composed again.)

The procedures can be repeated manually as well, going to the data directory and running "make data" or "make str", as is normally done with the original HTS scripts.

NOTE: the Makefile in data/ includes a gv: section to calculate global variance files. In MARY, these files are written in little-endian byte order and start with a header of one short that gives the size of the vectors they contain.
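
As an illustration of the header described in the note, the sketch below reads only the leading little-endian short of such a file. The class name GvHeader is hypothetical and not part of MARY. The sketch deliberately does not interpret the bytes after the header, since their layout is not specified here.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper, not part of MARY: inspects the header of a global variance file.
class GvHeader {

    /** Returns the vector size stored in the first two bytes (one little-endian short). */
    static int readVectorSize(byte[] fileBytes) {
        if (fileBytes.length < 2) {
            throw new IllegalArgumentException("file too short for a 2-byte header");
        }
        return ByteBuffer.wrap(fileBytes).order(ByteOrder.LITTLE_ENDIAN).getShort();
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: GvHeader <gv-file>");
            return;
        }
        byte[] bytes = Files.readAllBytes(Paths.get(args[0]));
        System.out.println("vector size: " + readVectorSize(bytes));
        // The remaining bytes are only counted, not decoded.
        System.out.println("bytes after header: " + (bytes.length - 2));
    }
}
```

Run against one of the data/gv/gv-*-littend.pdf files, it gives a quick way to confirm that the file was generated with the expected byte order: a wrong byte order typically shows up as an implausibly large vector size.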

9- Run the HMMVoiceMakeVoice component of the HMM Voice Trainer group. Here again, particular training steps can be repeated by selecting them in the settings of this component (set them to 1 and all the others to 0). This is equivalent to running again:

   perl scripts/Training.pl scripts/Config.pm > logfile &

after modifying the Config.pm file, as is normally done with the original HTS scripts. This component will generate general information about the execution of the training steps. Detailed information about the training status can be found in the logfile in the current directory.

The training procedure can take several hours; please check the log file from time to time to follow its progress.

IV) Adding a new voice in the MARY platform:

10- Run the HMMVoiceInstaller component of the Install Voice group. The default settings of this component are already set for the HTS-demo_CMU-ARCTIC-SLT voice. If you are training another voice, use the settings editor of this component to set:

  
  HMMVoiceInstaller.FeaFile     =  phonefeatures/xx.pfeats
                                   an example CONTEXTFEATURES file, used for synthesis during start-up
  HMMVoiceInstaller.useMixExc   =  true
                                   set this variable to true if using mixed excitation
  HMMVoiceInstaller.useGV       =  true
                                   set this variable to true if using global variance in parameter generation

The VoiceInstaller will:

  • Create a new MARY config file in: $MARY_BASE/conf/german-hsmm-voice.config
  • Add the files corresponding to this voice in: $MARY_BASE/lib/voices/hsmm-voice/
  • Copy the feature list: data/feature_list_en.pl to $MARY_BASE/lib/voices/hsmm-voice
  • Copy one example of phonefeatures for testing the synthesiser: data/phonefeatures/cmu_us_arctic_slt_xxxx.pfeats to $MARY_BASE/lib/voices/hsmm-voice
  • Copy the HTS trees: voices/qst001/ver1/*.inf to $MARY_BASE/lib/voices/hmm-voice
  • Copy the HTS PDF models: voices/qst001/ver1/*.pdf to $MARY_BASE/lib/voices/hmm-voice
  • Copy the global variance models (if useGV is set to true): data/gv/gv-*-littend.pdf to $MARY_BASE/lib/voices/hmm-voice
  • Copy the filter taps for mixed excitation: data/filters/mix_excitation_filters.txt to $MARY_BASE/lib/voices/hmm-voice

After successfully installing a new voice, it can be used with the mary_server and the mary_client.

V) Creating another voice in German or English.

If using German:

Creating a new German voice requires:

  • a wav or raw directory with the speech files you will use for training the German voice.
  • transcriptions of the files, one text file per speech file, or transcriptions in festival format if available.

Then we use the original HTS-demo_CMU-ARCTIC-SLT directory as a base:

  • Download and un-zip, un-tar the HTS-demo_CMU-ARCTIC-SLT for HTS-2.1
  • Rename this directory to your new voice name, for example german_voice, and delete the directories data/raw and data/utts.
  • Apply the MARY patch to the german_voice directory.
    patch -p1 -d . < HTS-2.1-demo_CMU-ARCTIC-SLT_for_Mary-4.0.patch
  • Move your speech files to this directory. If you have a wav directory, copy it into the current directory (german_voice/wav). If you have a raw directory, copy it into the data directory (german_voice/data/raw).
  • Move your transcription files to this directory. If you have a text directory containing the transcription of each file in a separate file, copy it into the current directory (german_voice/text). If you have transcriptions in festival format, copy them into the data/utts directory (german_voice/data/utts/).
  • Now run the VoiceImport program and follow the instructions as usual. Provide general settings for:
       db.gender    =  male  (or female)
       db.locale    =  de
       db.marybase  =  /path/to/mary/base/
       db.voicename =  german_voice
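
Before starting VoiceImport, the directory layout described in the list above can be checked with a few lines of Java. This is only a sketch: the class name VoiceDirCheck is made up for illustration and is not part of MARY, and it checks only that the directories exist, not what they contain.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of MARY: checks the layout of a new voice directory.
class VoiceDirCheck {

    /** Returns a list of problems; an empty list means the expected directories exist. */
    static List<String> problems(Path voiceDir) {
        List<String> problems = new ArrayList<>();
        // Speech data: either wav/ or data/raw/ must be present.
        boolean hasAudio = Files.isDirectory(voiceDir.resolve("wav"))
                || Files.isDirectory(voiceDir.resolve("data/raw"));
        // Transcriptions: either text/ or data/utts/ must be present.
        boolean hasText = Files.isDirectory(voiceDir.resolve("text"))
                || Files.isDirectory(voiceDir.resolve("data/utts"));
        if (!hasAudio) problems.add("no wav/ or data/raw/ directory");
        if (!hasText) problems.add("no text/ or data/utts/ directory");
        return problems;
    }

    public static void main(String[] args) {
        if (args.length != 1) {
            System.err.println("usage: VoiceDirCheck <voice-dir>");
            return;
        }
        List<String> found = problems(Paths.get(args[0]));
        System.out.println(found.isEmpty() ? "layout looks complete" : "problems: " + found);
    }
}
```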
    

VI) Creating another voice in a language other than German or English (US).

If you are working with another language, you will need to specify or change the following:

  • Phoneme set: if some phone names are not compatible with HTK or cannot be handled by it, create aliases for these phone names. This requires some changes in the code to include the aliases; for example, these are the changes needed to create a British English voice:

First of all, check the phone set in allophones.en_GB.xml:

<allophones name="sampa" xml:lang="en-GB" features="vlng vheight vfront vrnd ctype cplace cvox">
<silence ph="_"/>

<vowel ph="@U" vlng="d" vheight="2" vfront="3" vrnd="+"/>
<vowel ph="A:" vlng="l" vheight="3" vfront="3" vrnd="-"/>
<vowel ph="3:" vlng="l" vheight="2" vfront="2" vrnd="+"/>
<vowel ph="u:" vlng="l" vheight="1" vfront="3" vrnd="+"/>
<vowel ph="i:" vlng="l" vheight="1" vfront="1" vrnd="-"/>
<vowel ph="EI" vlng="d" vheight="2" vfront="1" vrnd="-"/>
<vowel ph="E" vlng="s" vheight="2" vfront="1" vrnd="-"/>
<vowel ph="@" vlng="a" vheight="2" vfront="2" vrnd="-"/>
<vowel ph="O:" vlng="l" vheight="3" vfront="3" vrnd="+"/>
<vowel ph="A" vlng="l" vheight="3" vfront="3" vrnd="-"/>
<vowel ph="O" vlng="l" vheight="3" vfront="3" vrnd="+"/>
<vowel ph="I" vlng="s" vheight="1" vfront="1" vrnd="-"/>
<vowel ph="aU" vlng="d" vheight="3" vfront="2" vrnd="-"/>
<vowel ph="aI" vlng="d" vheight="3" vfront="2" vrnd="-"/>
<vowel ph="U" vlng="s" vheight="1" vfront="3" vrnd="+"/>
<vowel ph="V" vlng="s" vheight="2" vfront="2" vrnd="-"/>
<vowel ph="OI" vlng="d" vheight="2" vfront="3" vrnd="+"/>
<vowel ph="Q" vlng="l" vheight="3" vfront="3" vrnd="+"/>
<vowel ph="i" vlng="l" vheight="1" vfront="1" vrnd="-"/>
<vowel ph="u" vlng="l" vheight="1" vfront="3" vrnd="+"/>
<vowel ph="r=" vlng="a" vheight="2" vfront="2" vrnd="-"/>
<vowel ph="{" vlng="s" vheight="3" vfront="1" vrnd="-"/>

<consonant ph="l=" ctype="l" cplace="a" cvox="+"/>
<consonant ph="tS" ctype="a" cplace="p" cvox="-"/>
<consonant ph="N=" ctype="n" cplace="v" cvox="+"/>
<consonant ph="dZ" ctype="a" cplace="p" cvox="+"/>
<consonant ph="5" ctype="l" cplace="a" cvox="+"/>
<consonant ph="m=" ctype="n" cplace="l" cvox="+"/>
<consonant ph="D" ctype="f" cplace="d" cvox="+"/>
<consonant ph="N" ctype="n" cplace="v" cvox="+"/>
<consonant ph="n=" ctype="n" cplace="a" cvox="+"/>
<consonant ph="T" ctype="f" cplace="d" cvox="-"/>
<consonant ph="S" ctype="f" cplace="p" cvox="-"/>
<consonant ph="Z" ctype="f" cplace="p" cvox="+"/>
<consonant ph="f" ctype="f" cplace="b" cvox="-"/>
<consonant ph="g" ctype="s" cplace="v" cvox="+"/>
<consonant ph="d" ctype="s" cplace="a" cvox="+"/>
<consonant ph="b" ctype="s" cplace="l" cvox="+"/>
<consonant ph="n" ctype="n" cplace="a" cvox="+"/>
<consonant ph="l" ctype="l" cplace="a" cvox="+"/>
<consonant ph="m" ctype="n" cplace="l" cvox="+"/>
<consonant ph="j" ctype="r" cplace="p" cvox="+"/>
<consonant ph="k" ctype="s" cplace="v" cvox="-"/>
<consonant ph="h" ctype="f" cplace="g" cvox="-"/>
<consonant ph="w" ctype="r" cplace="l" cvox="+"/>
<consonant ph="v" ctype="f" cplace="b" cvox="+"/>
<consonant ph="t" ctype="s" cplace="a" cvox="-"/>
<consonant ph="s" ctype="f" cplace="a" cvox="-"/>
<consonant ph="r" ctype="r" cplace="a" cvox="+"/>
<consonant ph="p" ctype="s" cplace="l" cvox="-"/>
<consonant ph="z" ctype="f" cplace="a" cvox="+"/>

</allophones>

We have experienced problems in HTK with phone names that include ":", "=" or "~", or that are a number or a symbol like "?". For "en-GB" we have created the following aliases, which should be included in these files:

In data/scripts/common_routines.pl, add to or change the hash tricky_phones:

%tricky_phones = ("A:", "Ac",
                  "3:", "3c",
                  "u:", "uc",
                  "i:", "ic",
                  "O:", "Oc",
                  "r=", "re",
                  "l=", "le",
                  "N=", "Ne",
                  "m=", "me",
                  "n=", "ne"
		  );

You can define your own aliases; here we have used c to replace the colon and e to replace the equals sign.
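
When choosing aliases for a new phone set, the convention just described (c for the colon, e for the equals sign) can be applied mechanically, and it is worth checking that no alias collides with a phone name that already exists. The sketch below illustrates this. The class PhoneAliasRules and its methods are hypothetical and not part of MARY or HTS.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of MARY: derives aliases by rule and checks for clashes.
class PhoneAliasRules {

    /** Applies the convention used in this tutorial: ':' becomes 'c', '=' becomes 'e'. */
    static String alias(String phone) {
        return phone.replace(":", "c").replace("=", "e");
    }

    /** Builds phone -> alias for every phone whose name changes, rejecting clashes. */
    static Map<String, String> buildAliases(List<String> phoneSet) {
        Map<String, String> aliases = new LinkedHashMap<>();
        for (String phone : phoneSet) {
            String a = alias(phone);
            if (a.equals(phone)) {
                continue; // name is unchanged by the rule, no alias needed
            }
            // An alias must not equal an existing phone or a previously assigned alias.
            if (phoneSet.contains(a) || aliases.containsValue(a)) {
                throw new IllegalStateException("alias " + a + " for " + phone + " clashes");
            }
            aliases.put(phone, a);
        }
        return aliases;
    }

    public static void main(String[] args) {
        System.out.println(buildAliases(Arrays.asList("A:", "i:", "n=", "n", "t")));
    }
}
```

A clash means the rule has to be adjusted for that phone by hand; the resulting pairs are what would go into tricky_phones.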

Add something similar, but in Java, in the function marytts.htsengine.PhoneTranslator.replaceTrickyPhones():

public static String replaceTrickyPhones(String lab){
      // Map phone names that HTK cannot handle to their aliases;
      // any other label is returned unchanged.
      String s = lab;
      if(lab.contentEquals("A:") )
          s = "Ac";
      else if (lab.contentEquals("3:") )
          s = "3c";
      else if (lab.contentEquals("u:") )
          s = "uc";
      else if (lab.contentEquals("i:") )
          s = "ic";
      else if (lab.contentEquals("O:") )
          s = "Oc";
      else if (lab.contentEquals("r=") )
          s = "re";
      else if (lab.contentEquals("l=") )
          s = "le";
      else if (lab.contentEquals("N=") )
          s = "Ne";
      else if (lab.contentEquals("m=") )
          s = "me";
      else if (lab.contentEquals("n=") )
          s = "ne";
      return s;
}



And the opposite in the function marytts.htsengine.PhoneTranslator.replaceBackTrickyPhones():

public static String replaceBackTrickyPhones(String lab){
      // Map aliases back to the original phone names;
      // any other label is returned unchanged.
      String s = lab;
      if(lab.contentEquals("Ac") )
          s = "A:";
      else if (lab.contentEquals("3c") )
          s = "3:";
      else if (lab.contentEquals("uc") )
          s = "u:";
      else if (lab.contentEquals("ic") )
          s = "i:";
      else if (lab.contentEquals("Oc") )
          s = "O:";
      else if (lab.contentEquals("re") )
          s = "r=";
      else if (lab.contentEquals("le") )
          s = "l=";
      else if (lab.contentEquals("Ne") )
          s = "N=";
      else if (lab.contentEquals("me") )
          s = "m=";
      else if (lab.contentEquals("ne") )
          s = "n=";
      return s;
}
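
Keeping two hand-written if-chains in sync is error-prone, because every alias has to be typed twice and in opposite directions. As a possible alternative, the sketch below keeps the pairs from the tricky_phones hash in a single table and derives the reverse mapping from it. The class name TrickyPhoneTable is hypothetical. This is not how PhoneTranslator is implemented in MARY, only an illustration of the idea.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical alternative, not the MARY implementation: one table, two directions.
class TrickyPhoneTable {
    private static final Map<String, String> TO_ALIAS = new HashMap<>();
    private static final Map<String, String> FROM_ALIAS = new HashMap<>();

    static {
        // Same pairs as in the Perl hash tricky_phones.
        String[][] pairs = {
            {"A:", "Ac"}, {"3:", "3c"}, {"u:", "uc"}, {"i:", "ic"}, {"O:", "Oc"},
            {"r=", "re"}, {"l=", "le"}, {"N=", "Ne"}, {"m=", "me"}, {"n=", "ne"}
        };
        for (String[] p : pairs) {
            TO_ALIAS.put(p[0], p[1]);
            // put returns the previous value; non-null means two phones share an alias.
            if (FROM_ALIAS.put(p[1], p[0]) != null) {
                throw new IllegalStateException("duplicate alias " + p[1]);
            }
        }
    }

    /** Phone name to HTK-safe alias; labels without an alias are returned unchanged. */
    static String replaceTrickyPhones(String lab) {
        return TO_ALIAS.getOrDefault(lab, lab);
    }

    /** HTK-safe alias back to the phone name; other labels are returned unchanged. */
    static String replaceBackTrickyPhones(String lab) {
        return FROM_ALIAS.getOrDefault(lab, lab);
    }

    public static void main(String[] args) {
        String alias = replaceTrickyPhones("m=");
        System.out.println("m= -> " + alias + " -> " + replaceBackTrickyPhones(alias));
    }
}
```

With this layout a mistyped pair can only be wrong in one place, and an alias that is used for two different phones is rejected when the class is loaded.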

After making the changes in Java you need to recompile openmary. You can do this by calling ant in the openmary/ directory:

   cd openmary/
   ant

  • Context features: create a data/feature_list_xx.pl where xx is the db.locale for the new language. This file has the format:
    %requested_fea_keys = (
                            #"stressed",
                            "pos_in_syl",
                            "syl_break",
                            "prev_syl_break",
                            "position_type",
                            #"next_is_pause",
                            #"prev_is_pause",

                            ...

                          );

    1;
    

Please uncomment the context features you will use to train your voice. The list of context features can be taken from your mary/features.txt file.



Marcela Charfuelan
Mon Mar 23 18:08:38 CET 2009