Version 12 (modified by marcela_charfuelan, 16 years ago)

Voice Import Tools Tutorial : How to build a HMM-based voice for the MARY 4.0 (beta) platform

For creating HMM-based voices we use a version of the speaker dependent training scripts provided by HTS, adapted to the MARY 4.0 beta platform. The steps for building an HMM voice for the MARY platform can be summarised as follows:

I) Checking the necessary programs and files
II) Data preparation
III) Training of HMM models
IV) Adding a new HMM voice to the MARY system.
V) Creating another voice in German (to train an HMM voice with another speech database).
VI) (NEW) Creating another voice in a language other than German or English (US).

The steps above are explained below by creating an HMM voice with the HTS speaker dependent training demo adapted to the MARY 4.0 beta platform.

The training scripts used here are the latest versions; that is, HTS-2.1 and SPTK-3.2 are required. Some scripts have been added or modified to:

Use MARY instead of festival as text analyzer.
Train bandpass voicing strengths and Fourier magnitudes for mixed excitation.

I) Checking the necessary programs and files:

MARY requirements:

Operating System - Linux (tested on Ubuntu 9.04)
MARY TTS 4.0 (beta) including Voice import tools during installation - link: MARY TTS 4.0 beta
HTS speaker dependent training demo adapted to the MARY 4.0 beta platform:
- without CMU-ARCTIC-SLT data (112K): included in your MARY TTS 4.0 beta installation: $MARY_BASE/lib/hts/HTS-demo_for_MARY-4.0-beta.tar.gz
- with CMU-ARCTIC-SLT data (92MB) - link: HTS-demo_CMU-ARCTIC-SLT_for_MARY-4.0-beta

HTS requirements: please download and follow the instructions for installing:

HTS-2.1_for_HTK-3.4.patch
HTK-3.4 and HDecode patched with HTS-2.1_for_HTK-3.4.patch links:
- HTK-3.4 (you will need to register first)
- HDecode (you will need to register first)
SPTK-3.2
hts_engine_API-1.01

Other requirements: the following programs are also required:

awk, normally available in Linux
perl, normally available in Linux
bc, normally available in Linux
sox (SoX), v13.0 or greater, normally available in Linux
tcl supporting snack, for example ActiveTcl.
snack library for tcl.
EHMM for automatic labeling, available with festvox-2.1

The HTS demo for MARY 4.0 beta includes a shell script "check_programs.sh" that helps you check whether all the programs above are installed.

II) Data preparation:

Where to start? There are three options a, b and c:

a- If you would like to try the HTS-demo_CMU-ARCTIC-SLT for MARY 4.0 beta from scratch:
Download the HTS-demo_CMU-ARCTIC-SLT_for_MARY-4.0-beta (92MB), unpack the file and go to that directory:

   tar -zxvf HTS-demo_CMU-ARCTIC-SLT_for_MARY-4.0-beta.tar.gz
   cd HTS-demo_CMU-ARCTIC-SLT_for_MARY-4.0-beta

b- If you have already created a unit selection voice for MARY with the CMU-ARCTIC-SLT data and want to build an HMM-based voice for it, copy the $MARY_BASE/lib/hts/HTS-demo_for_MARY-4.0-beta.tar.gz (112K) into your unit selection voice creation directory and unpack the file:

   tar -zxvf HTS-demo_for_MARY-4.0-beta.tar.gz

If you have already created a unit selection voice for this data, most probably you have already created phonefeatures, phonelab and a mary/features.txt file for that, so you can run steps 1-3, skip steps 4-11 and continue with section III HMM models training.

c- If you want to create an HMM voice in another language, please see section V or VI below.

Once you have unpacked the HTS demo for MARY 4.0 beta, follow the steps:

1- Check that all the required programs (and versions) are available on your system by running the shell script:

   ./check_programs.sh

This is a simple shell script that checks which programs are available in the PATH and reports what is missing. If a required program is not found in the PATH, you can provide the path where it is installed. The script also checks minimal version requirements and suggests how to install missing programs. If all the necessary programs are installed correctly you can continue with step 2.
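
The kind of PATH lookup described above can be sketched in a few lines of shell. This is only an illustration, not the contents of check_programs.sh: the function name report_missing_programs is hypothetical, and version checks are left out.

```shell
# Hypothetical helper (not part of the HTS demo): print every program from
# the argument list that cannot be found in the PATH.
report_missing_programs() {
    missing=0
    for prog in "$@"; do
        if ! command -v "$prog" >/dev/null 2>&1; then
            echo "missing: $prog"
            missing=1
        fi
    done
    # Non-zero status when at least one program was not found.
    return "$missing"
}
```

For example, `report_missing_programs awk perl bc sox` prints one line per program that is not found and returns a non-zero status if any of them is missing.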

2- Run the Voice Import Tools program

The Voice Import Tools program can be started from: Applications -> OpenMary -> Voice import tools

When starting the voice import tools, go to your working directory (the directory where you have unpacked the HTS demo for MARY 4.0 beta) and provide information for:

  db.gender    = female
  db.locale    = en_US
  db.marybase  = /path/to/$MARY_BASE/
  db.voicename = slt-hsmm

If you are not familiar or have problems with the Voice Import Tools program, please read the instructions in the Tutorial: VoiceImportToolsTutorial

Please remember that whenever you are in doubt about the settings of a particular component you can check its corresponding help for a description of the meaning (and possible values) of each variable.

After starting the Voice Import Tools, check the global settings of the voice and make sure that the allophones file is provided and exists:

db.alophonesSet = $MARY_BASE/lib/modules/xx/lexicon/allophones.xx.xml  (where xx is the corresponding language)

3- Run the HMMVoiceDataPreparation component of the HMM Voice Trainer group to check that text and wav or raw files are available and in the correct paths. If only raw files are provided, the program will convert them. If no text files are available but there are utts in festival format, the program will convert those as well.
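
A quick way to see whether every wav file has a transcription is sketched below. It is not what HMMVoiceDataPreparation itself runs. It assumes the layout wav/NAME.wav and text/NAME.txt; the .txt extension and the function name are assumptions made for illustration.

```shell
# Hypothetical check: list wav files that have no transcription with the
# same base name. Assumes $1/wav/NAME.wav pairs with $1/text/NAME.txt.
list_untranscribed_wavs() {
    base_dir=$1
    for wav in "$base_dir"/wav/*.wav; do
        [ -e "$wav" ] || continue          # the pattern matched nothing
        name=$(basename "$wav" .wav)
        [ -f "$base_dir/text/$name.txt" ] || echo "$name"
    done
}
```

Calling `list_untranscribed_wavs .` in the working directory prints nothing when every wav file has a matching text file.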

4- Run the AllophonesExtractor of the Automatic Labeling group to create the prompt_allophones directory required in the next step. This component requires the MARY server.

5- Run the EHMMLabeler component of the Automatic Labeling group to automatically label the wav files using the corresponding transcriptions. This procedure might take several hours. Before running EHMMLabeler, use the settings editor of this component to set the following variable according to your festvox installation:

   EHMMLabeler.ehmm  = ../festvox/src/ehmm/bin/

The result of this step is a lab directory.

6- Run the LabelPauseDeleter component of the Automatic Labeling group. Please use the settings editor of this component to set the variable:

   LabelPauseDeleter.threshold  =  10

7- Run the TranscriptionAligner component of the Label-Transcript Alignment group. This program will create the allophones directory.

8- Run the PhoneUnitLabelComputer component of the Label-Transcript Alignment group. This procedure has as input the lab directory created with the EHMMLabeler and will create as an output the phonelab directory.

9- Run the FeatureSelection component of the Feature Extraction group. This program creates a mary/features.txt file and requires the MARY server to be running. Select all the features here and save the file.

10- Run the PhoneUnitFeatureComputer component of the Feature Extraction group to extract context feature vectors from the text data. This procedure will create a phonefeatures directory. For running this component the MARY server should be running as well.

11- Run the PhonelabelFeatureAligner component of the Verify Alignment group. This procedure will verify alignment between "phonefeatures" and "phonelabels".

As a result of steps 1-11 we should have:

phonefeatures directory
phonelab directory
mary/features.txt file
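
Before moving on to training, the presence of these three items can be checked with a short sketch like the one below. The function name is hypothetical and the check only looks for existence, not for the contents of the directories.

```shell
# Hypothetical check: report which expected outputs of steps 1-11 are absent
# from the voice building directory $1 (defaults to the current directory).
check_data_preparation() {
    dir=${1:-.}
    status=0
    [ -d "$dir/phonefeatures" ]     || { echo "missing directory: phonefeatures"; status=1; }
    [ -d "$dir/phonelab" ]          || { echo "missing directory: phonelab"; status=1; }
    [ -f "$dir/mary/features.txt" ] || { echo "missing file: mary/features.txt"; status=1; }
    return "$status"
}
```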

III) HMM models training:

12- Run the HMMVoiceConfigure component of the HMM Voice trainer group. The default setting values of this component are already fixed for the HTS-demo_CMU-ARCTIC-SLT voice, but some settings depend on your installation. Please provide paths for:

  HMMVoiceConfigure.htsPath       = /yourpath/htk-hts2.1/bin
  HMMVoiceConfigure.htsEnginePath = /yourpath/hts_engine_API-1.01/bin
  HMMVoiceConfigure.sptkPath      = /yourpath/SPTK-3.2/bin
  HMMVoiceConfigure.tclPath       = /yourpath/ActiveTcl-8.6/bin
  HMMVoiceConfigure.soxPath       = /yourpath/usr/bin

If running configure for another voice, for example a male German voice, please use the settings editor of this component to set the variables:

  HMMVoiceConfigure.dataSet      =  german_set_name
  HMMVoiceConfigure.speaker      =  speaker_name 
  HMMVoiceConfigure.lowerF0      =  40  (male=40,  female=80)  
  HMMVoiceConfigure.upperF0      =  280 (male=280, female=350)
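
As a small illustration, the F0 limits quoted above can be selected by gender as follows. The helper name f0_settings is hypothetical; the values are the ones given in the settings above, and the output is only meant to be copied into the settings editor.

```shell
# Hypothetical helper: print the two F0 settings for a given gender, using
# the values quoted above (male 40 and 280, female 80 and 350).
f0_settings() {
    case $1 in
        male)   lower=40; upper=280 ;;
        female) lower=80; upper=350 ;;
        *)      echo "unknown gender: $1" >&2; return 1 ;;
    esac
    echo "HMMVoiceConfigure.lowerF0 = $lower"
    echo "HMMVoiceConfigure.upperF0 = $upper"
}
```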

Using the settings editor of this component you can also change other variables, such as using LSP instead of MGC, the sampling frequency, etc., the same as you would do when running "make configure + parameters" with the original HTS scripts.

13- Run the HMMVoiceFeatureSelection component of the HMM Voice trainer group. This program reads the mary/features.txt file (created in step 9) and generates the file mary/hmmFeatures.txt. This file contains extra features, apart from phone and phonological features, that will be used to train HMMs. When running this program, a small set of features is presented at the top, separated from the rest by an empty line:

   pos_in_syl
   syl_break
   prev_syl_break
   position_type
   
   accented
   accented_syls_from_phrase_end
   accented_syls_from_phrase_start
   breakindex
   edge
   ...

If you are not sure about using other features, use the first four, delete the others and save the file.
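
If you prefer to trim the list outside the editor, the following sketch keeps only the first group of features. It assumes that the saved file has the layout shown above, with the first group ending at the first empty line; the function name is hypothetical.

```shell
# Hypothetical helper: print the lines of a feature list up to (and not
# including) the first empty or whitespace-only line.
first_feature_group() {
    awk 'NF == 0 { exit } { print }' "$1"
}
```

For example, `first_feature_group mary/hmmFeatures.txt > mary/hmmFeatures.tmp` writes the reduced list to a temporary file, which can be inspected before it replaces the original.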

14- Run the HMMVoiceMakeData component of the HMM Voice trainer group to run the HTS procedure "make data". This procedure requires the following files:

   HMMVoiceMakeData.allophonesFile   = allophones.en_US.xml  # allophones set (language dependent)
   HMMVoiceMakeData.featureListFile  = mary/hmmFeatures.txt  # extra context features used for training HMMs.

The allophones set file is language dependent; it can be found in $MARY_BASE/lib/modules/en/us/lexicon/allophones.en_US.xml
The hmmFeatures.txt file is the one created in step 13; it contains additional context features, apart from phone and phonological features, used for training HMMs.

The HMMVoiceMakeData procedure is similar to the original HTS scripts with additional sections for calculating strengths, Fourier magnitudes (for mixed excitation), global variance and composing training data files from mgc, lf0, str and mag files. This component will execute in the data/ directory:

  make mgc lf0 str mag cmp-mary gv-mary gv list scp

The label directory and the mlf files in MARY are done with the Voice Import Tools: HMMVoiceMakeData.makeLabels()
The questions file in MARY is done with the Voice Import Tools: HMMVoiceMakeData.makeQuestions()

Individual procedures can be repeated in isolation by adjusting the settings of this component. For example, if the procedure that creates strengths (in the str directory) has to be repeated with a different set of filters (data/filters/), please remove the old str directory and set:

  HMMVoiceMakeData.makeSTR       =  1
  HMMVoiceMakeData.makeCMPMARY   =  1

with all the other variables set to 0, and run the component again. (In this case you also need to run make CMPMARY, because the mgc+lf0+str+mag vectors have to be composed again.)

The procedures can be repeated manually as well, going to the data directory and running "make data" or "make str", as it is normally done with the original HTS scripts.
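
The manual route for the strengths example could look like the following sketch. The make targets str and cmp-mary are taken from the list of targets given above; the function name and the DRY_RUN switch are assumptions added for illustration.

```shell
# Hypothetical helper: rebuild the strengths and recompose the training
# vectors by hand. $1 is the directory that holds the Makefile (data/ in the
# demo). With DRY_RUN=1 the commands are only printed, not executed.
remake_strengths() {
    data_dir=${1:-data}
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "rm -rf $data_dir/str"
        echo "cd $data_dir && make str cmp-mary"
    else
        # Remove the old str directory, then run make inside the data directory.
        rm -rf "$data_dir/str" && (cd "$data_dir" && make str cmp-mary)
    fi
}
```

Running `DRY_RUN=1 remake_strengths data` first shows what would be executed without touching any files.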

NOTE: the Makefile in data/ includes a gv: section to calculate global variance files. In MARY, these files are generated little endian and contain a header of size one short to indicate the size of the vectors it contains.
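
To inspect such a file, the header can be read with standard tools. The sketch below only decodes the leading short as described in the note; the function name is hypothetical, the value is treated as unsigned, and the data after the header is not interpreted.

```shell
# Hypothetical helper: print the vector size stored in the header of a global
# variance file, read as a 2-byte little-endian integer (low byte first).
gv_vector_size() {
    # od prints the first two bytes of the file as unsigned decimal numbers;
    # after "set --" they are available as $1 (low byte) and $2 (high byte).
    set -- $(od -An -tu1 -N2 "$1")
    [ $# -eq 2 ] || { echo "could not read a 2-byte header" >&2; return 1; }
    echo $(( $1 + 256 * $2 ))
}
```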

15- Run the HMMVoiceMakeVoice component of the HMM Voice trainer group. Here again, particular training steps can be repeated by selecting them in the settings of this component (set them to 1 and all the others to 0). This is equivalent to running again:

   perl scripts/Training.pl scripts/Config.pm > logfile &

after modifying the Config.pm file, as is normally done with the original HTS scripts. This component will generate general information about the execution of the training steps. Detailed information about the training status can be found in the logfile in the current directory.

The training procedure can take several hours; please check the log file from time to time to follow its progress.
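
One simple way to look at the log while training runs is sketched below. The function name is hypothetical; the default file name logfile matches the redirection used in the command above, and nothing is assumed about the format of the log lines.

```shell
# Hypothetical helper: show how many lines the training log has so far and
# print its last lines. $1 is the log file, $2 the number of lines to show.
show_training_progress() {
    log=${1:-logfile}
    [ -f "$log" ] || { echo "no log file: $log" >&2; return 1; }
    echo "lines so far: $(wc -l < "$log" | tr -d ' ')"
    tail -n "${2:-5}" "$log"
}
```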

IV) Adding a new voice to the MARY platform:

16- Run the HMMVoiceInstaller component of the Install Voice group. The default setting values of this component are already fixed for the HTS-demo_CMU-ARCTIC-SLT voice. Some settings of the voice can be changed here, for example:

  HMMVoiceInstaller.useMixExc   =  true
                                   set this variable to true if using mixed excitation
  HMMVoiceInstaller.useGV       =  true 
                                   set this variable to true if using global variance in parameter generation.

The VoiceInstaller will:

Create a new mary config file in: $MARY_BASE/conf/german-hsmm-voice.config
Add the files corresponding to this voice in: $MARY_BASE/lib/voices/hsmm-voice/
Copy one example of phonefeatures for testing the synthesiser: data/phonefeatures/cmu_us_arctic_slt_xxxx.pfeats to $MARY_BASE/lib/voices/hsmm-voice
Copy the HTS trees: voices/qst001/ver1/*.inf to $MARY_BASE/lib/voices/hsmm-voice
Copy the HTS PDF models: voices/qst001/ver1/*.pdf to $MARY_BASE/lib/voices/hsmm-voice
Copy global variance models (if useGV is set to true): data/gv/gv-*-littend.pdf to $MARY_BASE/lib/voices/hsmm-voice
Copy filter taps for mixed excitation: data/filters/mix_excitation_filters.txt to $MARY_BASE/lib/voices/hsmm-voice
Copy the trickyPhones.txt file, if one was created during training, to $MARY_BASE/lib/voices/hsmm-voice
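
After installation, a rough check of the voice directory can be sketched as below. The file patterns are the ones named in the list above; the function name is hypothetical, and the voice directory must be passed explicitly since its name depends on your voice.

```shell
# Hypothetical check: count the files that the installer is described as
# copying into the voice directory given as $1. Note that the *.pdf count
# also includes the gv-*-littend.pdf files.
count_installed_files() {
    voice_dir=$1
    for pattern in '*.inf' '*.pdf' 'gv-*-littend.pdf' 'mix_excitation_filters.txt'; do
        count=0
        # $pattern is left unquoted on purpose so that the shell expands it.
        for f in "$voice_dir"/$pattern; do
            [ -f "$f" ] && count=$((count + 1))
        done
        echo "$pattern: $count"
    done
}
```

A count of 0 for a pattern suggests that the corresponding copy step did not happen, or that the option which produces those files (for example useGV) was not enabled.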

After successfully installing a new voice, it can be used with the mary_server and the mary_client.

V) Creating another voice in German

If you are creating the HMM-based voice for German from scratch, you will need:
- NLP components for German, which should be available with MARY 4.0
- a wav or raw directory with the speech files you will use for training the German voice.
- transcriptions of the files, one text file per speech file, or transcriptions in festival format if available.

Then copy the $MARY_BASE/lib/hts/HTS-demo_for_MARY-4.0-beta.tar.gz file into the directory where you have your wav and transcription data and unpack the file:

   tar -zxvf HTS-demo_for_MARY-4.0-beta.tar.gz

Once you have unpacked the HTS demo for MARY 4.0 beta, follow the instructions as normal from step 1. Provide general settings for:

   db.gender    =  male  (or female)
   db.locale    =  de
   db.marybase  =  /path/to/mary/base/
   db.voicename =  german_voice

If you have already created a German unit selection voice for MARY and want to build an HMM-based voice for it, copy the $MARY_BASE/lib/hts/HTS-demo_for_MARY-4.0-beta.tar.gz (112K) into your unit selection voice creation directory and unpack the file:

   tar -zxvf HTS-demo_for_MARY-4.0-beta.tar.gz

If you have already created a unit selection voice for German, most probably you have already created phonefeatures, phonelab and a mary/features.txt file for that, so you can run steps 1-3, skip steps 4-11 and continue with section III HMM models training.

VI) Creating another voice in a language other than German or English (US).

If you are creating a voice in another language you will need to specify:

Minimal NLP components: if you are creating a new voice from scratch, for example following the steps in NewLanguageSupport, you will need to create Minimal NLP components for the new language. These minimal components are necessary to run the MARY server in the new language and extract context features (phonefeatures directory).

Phoneme set: contained in $MARY_BASE/lib/modules/xx/lexicon/allophones.xx.xml , where xx corresponds to the new language.

After creating the minimal components, you will need wav files (in a wav directory) and the corresponding transcriptions (one file per wav file in a text directory).

Then copy the $MARY_BASE/lib/hts/HTS-demo_for_MARY-4.0-beta.tar.gz file in the directory where you have your wav and transcription data and unpack the file:

   tar -zxvf HTS-demo_for_MARY-4.0-beta.tar.gz

Once you have unpacked the HTS demo for MARY 4.0 beta, follow the instructions as normal from step 1. Provide general settings for:

   db.gender    =  male  (or female)
   db.locale    =  new_language locale (according to your minimal NLP components, e.g. tr for Turkish, te for Telugu, etc.)
   db.marybase  =  /path/to/mary/base/
   db.voicename =  new_language_voice_name

Marcela Charfuelan
Thu Sep 24 15:19:25 CEST 2009
