= '''Voice Import Tools Tutorial: How to build an HMM-based voice for the MARY 5.0 TTS system''' =

To create HMM-based voices we use a version of the speaker-dependent training scripts provided by [http://hts.sp.nitech.ac.jp/ HTS] that was adapted to the MARY 4.1.0 platform. Building an HMM voice for the MARY platform involves the following steps:[[BR]]
I) [#point1 Download MARY TTS including Voice import tools] [[BR]]
II) [#point2 Check necessary programs and files] [[BR]]
III) [#point3 Check data: audio and text files] [[BR]]
IV) [#point4 Run the Voice import tools] [[BR]]
V) [#point5 Creating a voice in a language other than German or English (US)] [[BR]]
VI) [#point6 Adaptive scripts] [[BR]]

The training scripts used here are the latest versions; they require HTS-2.2 and HTK-3.4.1. Some scripts have been added or modified to:[[BR]]
- Use MARY instead of Festival as text analyser.[[BR]]
- Train bandpass voicing strengths for mixed excitation.[[BR]]

'''MARY requirements:'''[[BR]]
- Operating system: Linux (tested on Ubuntu 9.04) [[BR]]
- MARY TTS 5.0, with the Voice import tools selected during installation [[BR]]

== '''I) [=#point1] Download MARY TTS including Voice import tools''' ==

If using the trunk version:
{{{
svn checkout https://mary.opendfki.de/repos/trunk openmary
}}}
{{{#!comment
Click on the latest MARY release [http://mary.dfki.de/download/4.0/openmary-standalone-install-4.1.0.jar MARY download] or download the file and run it with:
{{{
java -jar openmary-standalone-install-4.1.0.jar
}}}
}}}

== '''II) [=#point2] Check the necessary programs and files''' ==

We provide a script that checks for and installs the necessary external programs. Once MARY TTS is installed, open a command line shell in your voice building directory and run the shell script:
{{{
$MARY_BASE/lib/external/check_install_external_programs.sh
}}}
With the option '''-check''', this script checks whether the necessary programs and versions are installed (that is, whether the programs can be found in the PATH or in the paths provided by the user).[[BR]]
With the option '''-install''', this script tries to download and install the necessary programs in $MARY_BASE/lib/external/bin (if problems occur, it suggests how to install the programs manually).

If you have already installed some of the required programs, '''please include their paths in the PATH variable or provide the paths''', for example:
{{{
$MARY_BASE/lib/external/check_install_external_programs.sh -check /your/path/to/htk/bin /your/path/to/Festival/festvox/src/ehmm/bin
}}}
This script generates a '''$MARY_BASE/lib/external/externalBinaries.config''' file that the Voice import tools use to locate the necessary external programs. The script checks for the following programs:[[BR]]

'''HTS requirements:'''[[BR]]
- [http://hts.sp.nitech.ac.jp/archives/2.2/HTS-2.2_for_HTK-3.4.1.tar.bz2 HTS-2.2_for_HTK-3.4.1.patch] [[BR]]
- HTK-3.4.1 and HDecode patched with HTS-2.2_for_HTK-3.4.1.patch. Links:
 * [http://htk.eng.cam.ac.uk/ftp/software/HTK-3.4.1.tar.gz HTK-3.4.1] (you will need to register first) [[BR]]
 * [http://htk.eng.cam.ac.uk/prot-docs/hdecode.shtml HDecode] (you will need to register first) [[BR]]
- [http://downloads.sourceforge.net/sp-tk/SPTK-3.4.1.tar.gz SPTK-3.4.1] [[BR]]
- [http://downloads.sourceforge.net/hts-engine/hts_engine_API-1.05.tar.gz hts_engine_API-1.05] [[BR]]

'''Other requirements:'''[[BR]]
- awk, normally available in Linux [[BR]]
- perl, normally available in Linux [[BR]]
- bc, normally available in Linux [[BR]]
- sox v13.0 or greater ([http://sox.sourceforge.net/ SoX]), normally available in Linux [[BR]]
- tcl supporting snack, for example [http://www.activestate.com/Products/ActiveTcl/ ActiveTcl]. Note that only ActiveTcl 8.4 includes snack; 8.5+ requires manual installation. [[BR]]
- [http://www.speech.kth.se/snack/download.html snack] library for tcl.
[[BR]]
- EHMM for automatic labeling, available with [http://festvox.org/download.html festvox-2.1] [[BR]]

== '''III) [=#point3] Check data: audio and text files''' ==

In your voice building directory, follow the step-by-step procedure in [http://mary.opendfki.de/wiki/VoiceImportToolsTutorial VoiceImportToolsTutorial] to make sure that the sound (wav) and text files are in the correct place and format.[[BR]]
As a result of this step, your voice building directory should contain a '''wav''' and a '''text''' directory.

== '''IV) [=#point4] Run the Voice Import tools''' ==

In your voice building directory run the voice import tools (trunk version):
{{{
export MARY_BASE="/your/directory/openmary/"
java -cp $MARY_BASE/marytts-lang-en/target/marytts-lang-en-5.0-SNAPSHOT.jar:$MARY_BASE/marytts-builder/target/marytts-builder-5.0-SNAPSHOT-jar-with-dependencies.jar marytts.tools.voiceimport.DatabaseImportMain
}}}
{{{#!comment
export MARY_BASE="/your/path/to/MARY TTS/"
java -Xmx1024m -jar $MARY_BASE/java/voiceimport.jar
}}}
After starting the Voice Import Tools, check the global settings of the voice and make sure that the allophones file is provided and exists:
{{{
db.alophonesSet = $MARY_BASE/lib/modules/xx/lexicon/allophones.xx.xml (where xx is the corresponding language)
}}}
Then run the following components:

'''1-''' Run the AllophonesExtractor of the Automatic Labeling group to create the '''prompt_allophones''' directory required in the next step. This component requires the MARY server. [[BR]]

'''2-''' Run the EHMMLabeler component of the Automatic Labeling group to label the wav files automatically using the corresponding transcriptions. If the pauses at the beginning and end of your recordings are longer than 0.2 seconds, consider trimming the initial and final silences with the tool Convert recorded audio (as explained in [wiki:NewLanguageSupport NewLanguageSupport] No. 9).[[BR]]
The EHMMLabeler procedure might take several hours.
Before running EHMMLabeler, use the settings editor of this component to set the following variable according to your festvox installation:
{{{
EHMMLabeler.ehmm = ../festvox/src/ehmm/bin/
}}}
The result of this step is an '''ehmm/lab''' directory.

'''3-''' Run the LabelPauseDeleter component of the Automatic Labeling group. Use the settings editor of this component to set the variable:
{{{
LabelPauseDeleter.threshold = 10
}}}
The result of this step is a '''lab''' directory.

'''4-''' Run the PhoneUnitLabelComputer component of the Label-Transcript Alignment group. This procedure takes the '''lab''' directory as input and creates the '''phonelab''' directory as output.

'''5-''' Run the TranscriptionAligner component of the Label-Transcript Alignment group. This program creates the '''allophones''' directory.

'''6-''' Run the FeatureSelection component of the Feature Extraction group. This program creates a '''mary/features.txt''' file and requires the MARY server to be running. Select all the features here and save the file.

'''7-''' Run the PhoneUnitFeatureComputer component of the Feature Extraction group to extract context feature vectors from the text data. This procedure creates a '''phonefeatures''' directory. The MARY server must be running for this component as well.

'''8-''' Run the PhonelabelFeatureAligner component of the Verify Alignment group. This procedure verifies the alignment between the '''phonefeatures''' and '''phonelab''' directories.[[BR]]

As a result of the previous steps we should have:[[BR]]
- phonefeatures directory [[BR]]
- phonelab directory [[BR]]
- mary/features.txt file [[BR]]
- $MARY_BASE/lib/external/externalBinaries.config

=== '''HMM models training:''' ===

'''9-''' Run the HMMVoiceDataPreparation component of the HMM Voice Trainer group to set up the environment for creating an HMM voice and to check that the required external programs and the text and wav files are available and in the correct paths.
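
The outputs listed above (phonefeatures, phonelab, mary/features.txt and externalBinaries.config) can also be checked from a shell before continuing with the training steps. The following is only a minimal sketch and not part of the MARY tools: the function name check_hmm_prereqs and its two arguments (voice building directory and MARY base directory) are assumptions made for illustration.

```shell
# Sketch: verify that the outputs of steps 1-8 exist before HMM training.
# check_hmm_prereqs is a hypothetical helper, not shipped with MARY TTS.
# $1 = voice building directory, $2 = MARY base directory.
check_hmm_prereqs() {
    voice_dir=$1
    mary_base=$2
    missing=0
    # Directories created by steps 4 (phonelab) and 7 (phonefeatures).
    for d in phonefeatures phonelab; do
        if [ ! -d "$voice_dir/$d" ]; then
            echo "missing directory: $voice_dir/$d"
            missing=1
        fi
    done
    # Files created by step 6 and by check_install_external_programs.sh.
    for f in "$voice_dir/mary/features.txt" \
             "$mary_base/lib/external/externalBinaries.config"; do
        if [ ! -f "$f" ]; then
            echo "missing file: $f"
            missing=1
        fi
    done
    # Exit status 0 only if everything was found.
    return $missing
}
```

Run from the voice building directory, a call such as `check_hmm_prereqs "$PWD" "$MARY_BASE"` prints one line per missing item and returns a non-zero status if anything is absent.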
'''10-''' Run the HMMVoiceConfigure component of the HMM Voice trainer group. The default settings are preset for the arctic slt voice; some path settings depend on your installation and are taken from $MARY_BASE/lib/external/externalBinaries.config. If you are configuring another voice, for example a male German voice, use the settings editor of this component to set the variables:
{{{
HMMVoiceConfigure.dataSet = german_set_name
HMMVoiceConfigure.speaker = speaker_name
HMMVoiceConfigure.lowerF0 = 40 (male=40, female=80)
HMMVoiceConfigure.upperF0 = 280 (male=280, female=350)
}}}
Using the settings editor of this component you can also change other variables, such as using LSP instead of MGC or the sampling frequency, just as you would when running "make configure + parameters" with the original HTS scripts.

'''11-''' Run the HMMVoiceFeatureSelection component of the HMM Voice trainer group. This program reads the '''mary/features.txt''' file (created in step 6) and generates the file '''mary/hmmFeatures.txt'''. The hmmFeatures.txt file contains extra features, apart from phone and phonological features, that will be used to train HMMs. In the window you can select or remove extra context features (all of them can be used).

'''12-''' Run the HMMVoiceMakeData component of the HMM Voice trainer group to run the HTS procedure "make data". This procedure requires the following files:
{{{
HMMVoiceMakeData.allophonesFile = allophones.en_US.xml # allophones set (language dependent)
HMMVoiceMakeData.featureListFile = mary/hmmFeatures.txt # extra context features used for training HMMs.
}}}
The allophones set file is language dependent; it can be found in $MARY_BASE/lib/modules/en/us/lexicon/allophones.en_US.xml[[BR]]
The hmmFeatures.txt file is the one created in step 11 and contains additional context features, apart from phone and phonological features, used for training HMMs.[[BR]]

The HMMVoiceMakeData procedure is similar to the original HTS scripts, with additional sections for calculating strengths, Fourier magnitudes (for mixed excitation) and global variance, and for composing training data files from mgc, lf0, str and mag files. This component executes the following in the hts/data/ directory:
{{{
make all-mary
or
make mgc lf0 str-mary mag-mary cmp-mary gv-mary gv list scp
}}}
The '''label''' directory and the '''mlf''' files in MARY are created by the Voice Import Tools: HMMVoiceMakeData.makeLabels()[[BR]]
The '''questions''' file in MARY is created by the Voice Import Tools: HMMVoiceMakeData.makeQuestions()

Individual procedures can be repeated in isolation by adjusting the settings of this component. For example, if the procedure that creates strengths (in the str directory) has to be repeated with a different set of filters (data/filters/), set:
{{{
HMMVoiceMakeData.makeSTR = 1
HMMVoiceMakeData.makeCMPMARY = 1
}}}
Set all the other variables to 0 and run the component again. (In this case makeCMPMARY must be run again because the mgc+lf0+str+mag vectors have to be composed again.) The procedures can also be repeated manually by going to the hts/data directory and running "make str-mary" and "make cmp-mary".

'''13-''' Run the HMMVoiceMakeVoice component of the HMM Voice trainer group. Here again, individual training steps can be repeated by selecting them in the settings of this component (set them to 1 and all the others to 0). This is equivalent to running again:
{{{
perl scripts/Training.pl scripts/Config.pm > logfile &
}}}
after modifying the Config.pm file, as is normally done with the original HTS scripts.
This component generates general information about the execution of the training steps. Detailed information about the training status can be found in the logfile in the current directory. The training procedure can take several hours; check the log file from time to time to follow the progress.

=== '''Adding a new voice in the MARY platform:''' ===

'''14-''' Run the HMMVoiceCompiler component of the Install Voice group. The default setting values of this component are already fixed.
{{{#!comment
Some settings of the voice can be changed here, for example:
HMMVoicePackager.useMixExc = true set this variable to true if using mixed excitation
HMMVoicePackager.useGV = true set this variable to true if using global variance in parameter generation.
HMMVoicePackager.useAcousticModels = true set this variable to true to allow prosody modification specified in MARYXML
}}}
The HMMVoiceCompiler packs the following files into a zip file located at /voicebuildingdir/mary/voice-yourvoice-hsmm/target/voice-yourvoice-hsmm-5.0-SNAPSHOT.zip: [[BR]]
- A MARY config file: voice.config [[BR]]
- HMM files corresponding to this voice:
- one example of phonefeatures for testing the synthesiser: data/phonefeatures/features_example.pfeats [[BR]]
- the HTS trees: voices/qst001/ver1/*.inf [[BR]]
- the HTS PDF models: voices/qst001/ver1/*.pdf [[BR]]
- global variance models (if useGV is set to true): voices/qst001/ver1/gv-*.pdf [[BR]]
- filter taps for mixed excitation: data/filters/mix_excitation_filters.txt [[BR]]
- trickyPhones.txt file, if one was created during training [[BR]]

After successfully packing a new voice, you must run the MARY Component Installer to install the voice!

NOTE: workaround until the component installer is updated:
{{{
cp /voicebuildingdir/mary/voice-yourvoice-hsmm/target/voice-yourvoice-hsmm-5.0-SNAPSHOT.jar $MARY_BASE/target/marytts-5.0-SNAPSHOT/lib/
}}}
and then restart the server.
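
The manual workaround above can be wrapped in a small shell function that checks that the voice jar and the MARY lib directory exist before copying. This is a sketch under assumptions, not part of the MARY tools: the function name install_voice_jar and its three arguments (voice name, voice building directory, MARY base directory) are hypothetical; the paths follow the pattern of the cp command shown above.

```shell
# Sketch of the manual install workaround.
# install_voice_jar is a hypothetical helper, not shipped with MARY TTS.
# $1 = voice name, $2 = voice building directory, $3 = MARY base directory.
install_voice_jar() {
    name=$1
    jar="$2/mary/voice-$name-hsmm/target/voice-$name-hsmm-5.0-SNAPSHOT.jar"
    lib_dir="$3/target/marytts-5.0-SNAPSHOT/lib"
    # Refuse to continue if the compiled voice is not where we expect it.
    if [ ! -f "$jar" ]; then
        echo "voice jar not found: $jar" >&2
        return 1
    fi
    # Refuse to continue if the MARY installation has no lib directory.
    if [ ! -d "$lib_dir" ]; then
        echo "MARY lib directory not found: $lib_dir" >&2
        return 1
    fi
    cp "$jar" "$lib_dir/" &&
        echo "installed $(basename "$jar"); restart the MARY server"
}
```

For example, `install_voice_jar yourvoice /voicebuildingdir "$MARY_BASE"` performs the same copy as the cp command, but fails with a message instead of silently copying nothing when a path is wrong.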
== '''V) [=#point5] Creating a voice in a language other than German or English (US)''' ==

If you are creating a voice in another language, you will need to specify the following (NOTE: THIS NEEDS TO BE UPDATED):
- '''Minimal NLP components''': if you are creating a new voice from scratch, for example following the steps in [http://mary.opendfki.de/wiki/NewLanguageSupport NewLanguageSupport], you will need to create minimal NLP components for the new language. These minimal components are necessary to run the MARY server in the new language and to extract context features ('''phonefeatures''' directory).
- '''Phoneme set''': contained in $MARY_BASE/lib/modules/xx/lexicon/allophones.xx.xml, where xx corresponds to the new language.
- After creating the minimal components, you will need wav files (in a wav directory) and the corresponding transcriptions (one file per wav file, in a text directory). [[BR]]

Afterwards, follow the instructions as normal from step 1. Provide general settings for:
{{{
db.gender = male (or female)
db.locale = new_language locale (according to your minimal NLP components, e.g. tr for Turkish, te for Telugu, etc.)
db.marybase = /path/to/mary/base/
db.voicename = new_language_voice_name
}}}

== '''VI) [=#point6] Adaptive scripts''' ==

'''1.''' To run the HTS speaker adaptation/adaptive training demo, the following directories are needed in your voice building directory:[[BR]]
text:[[BR]]
bdl clb slt jmk rms[[BR]]
wav:[[BR]]
bdl clb slt jmk rms[[BR]]

'''2.''' With the voice building tools, create phonelabels and phonefeatures directories for each set of data. To do this, process each set with the voice building tools, using the general settings to define where your wav, text, etc. directories are. Then, for each data set, run steps 1-8 of the speaker-dependent tutorial.
As a result we should have the following directories:[[BR]]
phonelabels:[[BR]]
bdl clb slt jmk rms[[BR]]
phonefeatures:[[BR]]
bdl clb slt jmk rms[[BR]]

'''3.''' Create raw data from your wav files; this can be done using the script $MARY_BASE/lib/external/hts/data/scripts/wav2raw. As a result we should have the following directories:[[BR]]
hts/data/raw:[[BR]]
bdl clb slt jmk rms[[BR]]

'''4.''' With the previous directories in place, run the voice import tools and execute the following steps:[[BR]]
'''4.1''' HMMVoiceDataPreparation, setting the adaptScripts variable to true[[BR]]
'''4.2''' HMMVoiceConfigure, setting the adaptScripts variable to true[[BR]]
If you are adapting other sets, be aware of the file name format expected by the adaptive scripts. Since a mask is applied to the file names, it is better if your file names follow a particular format. For example, we have experimented with adapting a neutral voice to different styles using the male German PAVOQUE database. For this database the file names have the format:
{{{
neutr --> pavoque_neutr_*.* training data, big corpus, male voice with neutral style.
obadi --> pavoque_obadi_*.* data for adaptation, small corpus, the same male voice but with depressed style.
poppy --> pavoque_poppy_*.* data for adaptation, small corpus, the same male voice but with happy style.
spike --> pavoque_spike_*.* data for adaptation, small corpus, the same male voice but with angry style.
}}}
With this distribution of files, our settings for configureAdapt looked like this:[[BR]]
{{{
HMMVoiceConfigure.dataSet = pavoque
HMMVoiceConfigure.adaptTrainSpkr = neutr
HMMVoiceConfigure.adaptSpkr = 'obadi poppy spike'
HMMVoiceConfigure.adaptSpkrMask = */pavoque_%%%%%_* (here the voice names are exactly 5 letters long; a voice name cannot have more than 5 letters!)
HMMVoiceConfigure.adaptF0Ranges = 'neutr 40 280 obadi 40 280 poppy 40 280 spike 40 280'
}}}
'''4.3''' HMMVoiceFeatureSelection[[BR]]
'''4.4''' HMMVoiceMakeData, setting the adaptScripts variable to true[[BR]]
'''4.5''' HMMVoiceMakeVoice[[BR]]
'''4.6''' HMMVoiceCompiler[[BR]]
[[BR]]
[[BR]]
Marcela Charfuelan[[BR]]
Mon Nov 7 11:39:13 CET 2011