Changes between Initial Version and Version 1 of HMMVoiceCreationAdapt


Ignore:
Timestamp:
05/08/08 17:38:12 (17 years ago)
Author:
marcela_charfuelan
Comment:

--

  • HMMVoiceCreationAdapt

    v1 v1  
     1 
     2 
     3 
     4= '''(Draft) Voice Import Tools Tutorial : How to build an adapted HMM-based voice for the MARY platform''' = 
     5 
     6An adapted HMM-based voice is created by adapting models that have been trained with a bigger corpus, or with various voices, to a particular voice  
     7for which (generally) only a small corpus is available. For the example described here, the CMU_US_ARCTIC voices used for training are awb (male), bdl (male),  
     8clb (female), jmk (male) and rms (male). The voice to adapt is slt (female).  
     9 
     10The steps for creating an adapted voice are the same as for a speaker-dependent voice, with small differences that are explained below.  
     11The steps followed in this tutorial are: 
     12 
     13I)   Checking the necessary programs and files[[BR]] 
     14II)  Data preparation[[BR]] 
     15III) Training of HMM models[[BR]] 
     16IV)  Adding a new HMM voice to the MARY system.[[BR]] 
     17V)   Creating another voice in German or English (__if you want to train HMMs with another speech database__).[[BR]] 
     18 
     19 
     20''' 
     21=== I) Checking the necessary programs and files: === 
     22''' 
     23 
     24MARY requirements: (the same as for the speaker dependent demo) 
     25 
     26HTS requirements: (the same as for the speaker dependent demo) 
     27 
     28Other requirements: (the same as for the speaker dependent demo) 
     29 
     300.1) Download and unpack (un-zip, un-tar) the latest speaker adaptation/adaptive training demo for English. 
     31 
     32This tutorial uses: http://hts.sp.nitech.ac.jp/archives/2.0.1/HTS-demo_CMU-ARCTIC-ADAPT.tar.bz2 for HTS-2.0.1 
     33 
     340.2) download and unzip the adaptation/adaptive patch file for using MARY instead of Festival as text analyser. 
     35 
     36https://mary.opendfki.de/repos/trunk/lib/hts/HTS-2.0.1-demo_CMU-ARCTIC-ADAPT_for_Mary-3.6.patch.zip  [[BR]] 
     37 
     38apply the patch to the HTS-demo_CMU-ARCTIC-ADAPT directory:  [[BR]]  
     39   patch -p1 -d . < HTS-2.0.1-demo_CMU-ARCTIC-ADAPT_for_Mary-3.6.patch 
     40 
     410.3) create a wav directory. 
     42 
     430.4) Run the VoiceImport program 
     44 
     45First of all you need to set your MARY_BASE directory:  [[BR]] 
     46   export MARY_BASE="/dir/to/openmary" 
     47 
     48then you can run:  [[BR]] 
     49   java -jar -Xmx1024m  $MARY_BASE/java/voiceimport.jar 
     50 
     51If you are not familiar with the VoiceImport program, or have problems with it, please read and follow the instructions in the Voice Import Tools 
     52Tutorial: http://mary.opendfki.de/wiki/VoiceImportToolsTutorial 
     53 
     54If you want to create another adapted voice in German or English, please see section V below. 
     55 
     56''' 
     57=== II) Data preparation: === 
     58''' 
     59 
     601- Run the HMMVoiceDataPreparation component of the HMM Voice Trainer group, to check if text, wav and data/raw files are available and in the correct paths. 
     61First use the settings editor of this component to set the variable: 
     62{{{ 
     63HMMVoiceDataPreparation.adaptScripts = true 
     64}}} 
     65If only data/raw is provided, the program will convert the raw files to wav. If no text files are available but data/utts contains utterances in festival format, the program will convert these to text files as well. 
     66Since we are using several voices, the files should be distributed in one subdirectory per voice, as follows. 
     67 
     68The speech files (wav or raw) should be in: 
     69{{{ 
     70../wav/awb/     ../data/raw/awb/ 
     71../wav/bdl/     ../data/raw/bdl/ 
     72../wav/clb/     ../data/raw/clb/ 
     73../wav/jmk/     ../data/raw/jmk/ 
     74../wav/rms/     ../data/raw/rms/ 
     75../wav/slt/     ../data/raw/slt/ 
     76}}} 
     77The transcriptions corresponding to each voice should be located in: 
     78{{{ 
     79../text/awb/ 
     80../text/bdl/ 
     81../text/clb/ 
     82../text/jmk/ 
     83../text/rms/ 
     84../text/slt/ 
     85}}} 
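
As a rough sketch, the per-voice layout above could be created with a small shell function before the data is copied in. The function name `make_voice_dirs` and its `root` argument (the directory that holds `wav/`, `data/` and `text/`) are illustrative only and are not part of the MARY or HTS tools.

```shell
#!/bin/sh
# Sketch: create the per-speaker directories listed above.
# make_voice_dirs and its arguments are illustrative names (assumptions).
make_voice_dirs() {
    root=$1
    shift
    for spkr in "$@"; do
        mkdir -p "$root/wav/$spkr" "$root/data/raw/$spkr" "$root/text/$spkr" || return 1
    done
}

# Example: lay out the six CMU ARCTIC speakers in a scratch directory.
layout_root=$(mktemp -d)
make_voice_dirs "$layout_root" awb bdl clb jmk rms slt
ls "$layout_root/wav"
```

In a real setup the scratch directory would be replaced by the actual parent directory of `wav/`, `data/` and `text/`.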
     86 
     872- Run the PhoneUnitFeatureComputer component of the Feature Extraction group to extract context feature vectors from the text data. For running this component the MARY server should be running as well.  
     88 
     89Since we are using several voices for training the system plus another one for adapting, we need to run the PhoneUnitFeatureComputer component once for each voice.  
     90To do so, use the settings editor of this component to set the output directory that corresponds to each voice, for example: 
     91{{{ 
     92PhoneUnitFeatureComputer.featureDir = ../phonefeatures/awb/ 
     93}}} 
     94 
     95Please remember to run this step for each voice. 
     96 
     973- Run the EHMMLabeler component of the Automatic Labeling group to automatically label the wav files using the corresponding transcriptions. 
     98For running the EHMMLabeler with each voice, please set: 
     99{{{ 
     100   EHMMLabeler.ehmm          = ../festvox/src/ehmm/bin/ 
     101   EHMMLabeler.featureDir    = ../phonefeatures/awb/ 
     102   EHMMLabeler.outputLabDir  = ../lab/awb/ 
     103}}} 
     104 
     1054- Run the LabelPauseDeleter component of the Automatic Labeling group. Please set the corresponding lab voice directory, for example:  
     106{{{ 
     107   LabelPauseDeleter.outputLabDir = ../lab/awb/ 
     108   LabelPauseDeleter.threshold = 10 
     109}}} 
     110 
     1115- Run the PhoneUnitLabelComputer component of the Labels and Pause Correction group. Please set the corresponding phonelab voice directory, for example: 
     112{{{ 
     113   PhoneUnitLabelComputer.labelDir  = ../phonelab/awb/ 
     114}}} 
     115 
     1166- Run the PhoneLabelFeatureAligner component of the Labels and Pause Correction group. Please set the corresponding phonefeatures and phonelab voice directories, for example: 
     117{{{ 
     118   PhoneLabelFeatureAligner.featureDir = ../phonefeatures/awb/ 
     119   PhoneLabelFeatureAligner.labelDir   = ../phonelab/awb/ 
     120}}} 
     121 
     122Please remember to follow these steps for each voice. 
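
Because these steps are repeated per voice, it is easy to miss one. The following is a minimal sketch of a consistency check that prints the utterances of a speaker that have a wav file but no label file. The directory names follow the examples above; the `.wav` and `.lab` extensions and the function name `missing_labels` are assumptions made for illustration.

```shell
#!/bin/sh
# Sketch: print utterances that have a wav file but no label file.
# Directory names follow the examples above; the .wav and .lab
# extensions and the function name are assumptions.
missing_labels() {
    root=$1
    spkr=$2
    for wav in "$root/wav/$spkr"/*.wav; do
        [ -e "$wav" ] || continue          # no wav files at all
        base=$(basename "$wav" .wav)
        [ -e "$root/phonelab/$spkr/$base.lab" ] || echo "$spkr/$base"
    done
}

# Example with a scratch directory and two placeholder utterances,
# only the first of which has a label file.
check_root=$(mktemp -d)
mkdir -p "$check_root/wav/awb" "$check_root/phonelab/awb"
touch "$check_root/wav/awb/cmu_us_arctic_awb_a0001.wav" \
      "$check_root/wav/awb/cmu_us_arctic_awb_a0002.wav" \
      "$check_root/phonelab/awb/cmu_us_arctic_awb_a0001.lab"
missing_labels "$check_root" awb
```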
     123 
     124''' 
     125=== III) HMM models training: === 
     126''' 
     127 
     1287- Run the HMMVoiceConfigureAdapt component of the HMM Voice trainer group. The default setting values of this component are already fixed for the 
     129HTS-demo_CMU-ARCTIC-ADAPT voice. 
     130 
     131IMPORTANT: the names of the files should contain a label that identifies the data set and a label that identifies the voice.  
     132This is important because the training scripts require a mask to differentiate the data of one speaker from that of another.  
     133The spkrMask setting should reflect the data set and the voice name labels. For example, for the CMU_US_ARCTIC data the settings are: 
     134{{{ 
     135HMMVoiceConfigureAdapt.dataSet     = cmu_us_arctic 
     136HMMVoiceConfigureAdapt.trainSpkr   = awb bdl clb jmk rms 
     137HMMVoiceConfigureAdapt.adaptSpkr   = slt 
     138HMMVoiceConfigureAdapt.spkrMask    = */cmu_us_arctic_%%%_* 
     139                                    (here the voice name is exactly 3 letters, so all the voice names should be 3 letters long) 
     140}}} 
     141 
     142The file names for CMU_US_ARCTIC have the format: 
     143{{{ 
     144awb --> cmu_us_arctic_awb_*.* 
     145bdl --> cmu_us_arctic_bdl_*.*   
     146clb --> cmu_us_arctic_clb_*.*  
     147jmk --> cmu_us_arctic_jmk_*.*  
     148rms --> cmu_us_arctic_rms_*.*  
     149slt --> cmu_us_arctic_slt_*.* 
     150}}} 
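
To illustrate the naming convention, the sketch below extracts the speaker label from a file name, assuming that each `%` in the mask stands for exactly one character of the voice name and that the name is followed by an underscore. The function `speaker_from_name` and its arguments are hypothetical helpers written for this example, not part of the HTS scripts.

```shell
#!/bin/sh
# Sketch: extract the speaker label that a mask such as
# */cmu_us_arctic_%%%_* would pick out of a file name.
# speaker_from_name and its arguments are illustrative names.
speaker_from_name() {
    dataset=$1      # e.g. cmu_us_arctic
    width=$2        # number of % characters in the mask
    name=$(basename "$3")
    rest=${name#"${dataset}_"}
    [ "$rest" != "$name" ] || return 1      # data set prefix missing
    spkr=$(printf '%s' "$rest" | cut -c "1-$width")
    # The speaker label must have the full width and be followed by "_".
    case $rest in
        "${spkr}_"*) [ "${#spkr}" -eq "$width" ] && printf '%s\n' "$spkr" ;;
        *) return 1 ;;
    esac
}

speaker_from_name cmu_us_arctic 3 wav/awb/cmu_us_arctic_awb_a0001.wav
```

A file whose voice label is longer than the mask allows (for example a four-letter name with a three-character mask) makes the function return a non-zero status, which is the situation the note on name length warns about.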
     151Using the settings of this component you can also change other variables, such as using LSP instead of MGC, the sampling frequency, etc., 
     152the same as you would do when running "make configure" with the original HTS scripts. 
     153 
     154 
     1558- Run the HMMVoiceMakeData component of the HMM Voice trainer group to run the HTS procedure "make data". This procedure is the same as in the original scripts, with additional sections for calculating strengths (for mixed excitation), global variance, and handling of MARY context features. 
     156 
     157Particular procedures can be repeated in isolation by adjusting the settings of this component. For example, if the procedure that creates strengths (str directory) has to be repeated with a different set of filters (data/filters/), please set: 
     158{{{ 
     159  HMMVoiceMakeData.makeSTR       1 
     160  HMMVoiceMakeData.makeCMPMARY   1 
     161}}} 
     162Set all the other variables to 0 and run the component again. (In this case you also need to run makeCMPMARY because the mgc+lf0+str vectors have to be composed again.) 
     163 
     164The procedures can be repeated manually as well, by going to the data directory and running "make data" or "make str", as is normally done with the original HTS scripts. 
     165 
     166Note: the Makefile in data/ includes a gv: section copied from the HTS-2.1alpha version to calculate global variance files. In MARY, these files are generated in little-endian format and contain a header of size one short that indicates the size of the vectors they contain. In the case of adapted voices, the gv variance is calculated from the adapted corpus. 
     167 
     1689- Run the HMMVoiceMakeVoiceAdapt component of the HMM Voice trainer group. Here again, particular training steps can be repeated by selecting them in the settings of this component (set them to 1 and all the others to 0). This is equivalent to running again: 
     169{{{ 
     170   perl scripts/Training.pl scripts/Config.pm  
     171}}} 
     172after modifying the Config.pm file, as is normally done with the original HTS scripts. 
     173  
     174This component will generate general information about the execution of the training steps. Detailed information about the training status can be found in the logfile in the current directory. 
     175 
     176The training procedure can take several hours (or days); check the log file from time to time to follow the progress. 
     177 
     178 
     179''' 
     180=== IV) Adding a new voice in the MARY platform: === 
     181''' 
     182 
     18310- Run the HMMVoiceInstaller component of the Install Voice group. This step is similar to the speaker dependent demo, but please set the appropriate directories 
     184for the gv and voices data, which should be in the directory of the adapted voice. For example, for the adapted slt voice: 
     185{{{ 
     186HMMVoiceInstaller.Fgva data/gv/slt/gv-mag-littend.pdf 
     187HMMVoiceInstaller.Fgvf data/gv/slt/gv-lf0-littend.pdf 
     188HMMVoiceInstaller.Fgvm data/gv/slt/gv-mgc-littend.pdf 
     189HMMVoiceInstaller.Fgvs data/gv/slt/gv-str-littend.pdf 
     190 
     191and 
     192 
     193HMMVoiceInstaller.Fma voices/qst001/ver1/slt/mag.pdf 
     194HMMVoiceInstaller.Fmd voices/qst001/ver1/slt/dur.pdf 
     195HMMVoiceInstaller.Fmf voices/qst001/ver1/slt/lf0.pdf 
     196HMMVoiceInstaller.Fmm voices/qst001/ver1/slt/mgc.pdf 
     197HMMVoiceInstaller.Fms voices/qst001/ver1/slt/str.pdf 
     198HMMVoiceInstaller.Fta voices/qst001/ver1/slt/tree-mag.inf 
     199HMMVoiceInstaller.Ftd voices/qst001/ver1/slt/tree-dur.inf 
     200HMMVoiceInstaller.Ftf voices/qst001/ver1/slt/tree-lf0.inf 
     201HMMVoiceInstaller.Ftm voices/qst001/ver1/slt/tree-mgc.inf 
     202HMMVoiceInstaller.Fts voices/qst001/ver1/slt/tree-str.inf 
     203}}} 
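
Before running the installer it can help to confirm that training actually produced all of these files. The following sketch reports any that are missing. It uses the file names from the settings above; the function name `check_adapted_files` and the `root` argument (the voice-building directory) are assumptions for illustration.

```shell
#!/bin/sh
# Sketch: report which of the files named in the settings above are missing.
# check_adapted_files and its arguments are illustrative names.
check_adapted_files() {
    root=$1
    spkr=$2
    missing=0
    # Global variance files of the adapted voice.
    for s in mag lf0 mgc str; do
        f="data/gv/$spkr/gv-$s-littend.pdf"
        [ -f "$root/$f" ] || { echo "missing: $f"; missing=1; }
    done
    # Model and tree files of the adapted voice.
    for s in mag dur lf0 mgc str; do
        for f in "voices/qst001/ver1/$spkr/$s.pdf" "voices/qst001/ver1/$spkr/tree-$s.inf"; do
            [ -f "$root/$f" ] || { echo "missing: $f"; missing=1; }
        done
    done
    return "$missing"
}

# Example: an empty scratch directory, so every file is reported.
inst_root=$(mktemp -d)
check_adapted_files "$inst_root" slt || echo "adapted voice files are incomplete"
```

In practice the first argument would be the voice-building directory itself, for example `check_adapted_files . slt`.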
     204 
     205 
     206''' 
     207=== V) Creating another voice in German or English. === 
     208''' 
     209 
     210If using German: 
     211 
     212To create a new German voice you need: [[BR]] 
     213  * a wav or raw directory with the speech files you will use for training the German voice. [[BR]] 
     214  * transcriptions of the files, one text file per speech file, or transcriptions in festival format if available. [[BR]] 
     215 
     216Then we use as a base the original HTS-demo_CMU-ARCTIC-SLT directory: 
     217 
     218- Download and un-zip, un-tar the HTS-demo_CMU-ARCTIC-SLT for HTS-2.0.1 
     219 
     220- Rename this directory to your new voice name, for example german_voice, and delete the directories data/raw and data/utts. 
     221 
     222- Apply the MARY patch to the german_voice directory. [[BR]] 
     223  patch -p1 -d . < HTS-2.0.1-demo_CMU-ARCTIC-ADAPT_for_Mary-3.5.0.patch 
     224 
     225- Move your speech files to this directory. If you have a wav directory, copy it into the current directory (german_voice/wav). If you have a raw directory, copy it into the data directory (german_voice/data/raw). 
     226 
     227- Move your transcription files to this directory. If you have a text directory containing the transcription of each file in a separate file, copy it into the current directory (german_voice/text). If you have transcriptions in festival format, please copy them into the data/utts directory (german_voice/data/utts/). 
     228 
     229 
     230- Please be aware of the file name format required by the adaptive scripts. Since a mask is used on the names, it is better if the names of your files follow a particular format.  
     231For example, for our PAVOQUE database the file names have the format: 
     232{{{ 
     233neutr --> pavoque_neutr_*.*    training data, big corpus, male voice and sound effect neutral. 
     234obadi --> pavoque_obadi_*.*    adapted voice, small corpus, the same male voice but sound effect depressed.  
     235poppy --> pavoque_poppy_*.*    adapted voice, small corpus, the same male voice but sound effect happy.  
     236spike --> pavoque_spike_*.*    adapted voice, small corpus, the same male voice but sound effect angry.  
     237}}} 
     238 
     239With this distribution of files, our settings for configure look like: 
     240{{{ 
     241HMMVoiceConfigureAdapt.dataSet     = pavoque 
     242HMMVoiceConfigureAdapt.trainSpkr   = neutr 
     243HMMVoiceConfigureAdapt.adaptSpkr   = obadi poppy spike 
     244HMMVoiceConfigureAdapt.spkrMask    = */pavoque_%%%%%_* 
     245                                    (here the voice names are exactly 5 letters long; a voice name cannot have more than 5 letters!) 
     246}}} 
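
The length constraint can be checked before configuring. The sketch below counts the `%` characters in a speaker mask and compares that number with the length of every voice name. It assumes that the `%` characters are the only part of the mask that stands for the voice name; the function name `check_spkr_mask` is illustrative.

```shell
#!/bin/sh
# Sketch: check that every speaker name is as long as the run of %
# characters in the speaker mask. check_spkr_mask is an illustrative name.
check_spkr_mask() {
    mask=$1
    shift
    # Keep only the % characters of the mask and count them.
    pct=$(printf '%s' "$mask" | tr -cd '%')
    width=${#pct}
    status=0
    for spkr in "$@"; do
        if [ "${#spkr}" -ne "$width" ]; then
            echo "$spkr: expected $width characters, found ${#spkr}"
            status=1
        fi
    done
    return "$status"
}

check_spkr_mask '*/pavoque_%%%%%_*' neutr obadi poppy spike && echo "mask fits all speakers"
```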
     247 
     248- Now run the VoiceImport program and follow the instructions as normal. The locale setting must be "de"; also provide the path to MARY_BASE and the name of the voice. 
     249 
     250[[BR]] 
     251[[BR]] 
     252 
     253Marcela Charfuelan DFKI 
     254Thu May  8 17:26:44 CEST 2008 
     255 
     256