Changes between Version 18 and Version 19 of VoiceImportToolsTutorial


Timestamp: 01/07/08 12:45:13 (17 years ago)
Author: sach01
Comment: --

 - !HalfPhoneUnitLabelComputer
 - !HalfPhoneLabelFeatureAligner
 - !QualityControl
 - !HalfPhoneUnitfileWriter
 - !HalfPhoneFeatureFileWriter
     

== Step-by-Step Procedure ==

1. First, you need the following two basic requirements for voice building:

    a. Wave files
    b. Corresponding transcriptions (in MARY or Festvox format)

MARY format: Each transcription is stored in its own file, and all of these files are placed in a single directory. By default, this is the 'text' directory of the voice-building directory.

Festvox (Festival) format: A single file contains all transcriptions. See the example below.

{{{
     
 b. If your transcriptions are in Festvox format, you must select the ''Festvox2MaryTranscripts'' component, which converts Festvox-format transcriptions to MARY-format transcriptions. The Voice Import Tools use MARY-format transcriptions for building a voice. You do not need to select the ''Mary2FestvoxTranscripts'' component when building a new voice. It is provided only so that you can convert between the two formats as required.
 c. ''!PhoneUnitFeatureComputer'' and ''!HalfPhoneUnitFeatureComputer'' need a running MARY server. This is very important: make sure a MARY server is running while these two components are executed. In addition, the MARY server must contain at least one voice for the language (German or English) in which you want to build the new voice.
 d. ''!LabelledFilesInspector'' provides a GUI for checking the quality of the automatic labeling. It also lets you listen to phone segments according to the timestamps produced by the automatic labeling. If you do not want to inspect the labeling, do not select this component, because it pauses the voice-building process.

7. While each component is executing, a progress bar shows the percentage of work completed for that component. A component turns GREEN if it executes successfully. It turns RED and throws an exception if it fails. If a component fails, check its configuration settings again.

== Explanation on Individual Voice Import Components ==


== 1. Feature Extraction from Acoustic Data ==

'''!PraatPitchmarker'''[[BR]]

It computes pitch marks with the help of Praat, so Praat must be compiled or installed on your machine.[[BR]]

It also corrects the pitch marks by moving them to nearby zero crossings.

Configuration Settings:
  * command   - Absolute path of the Praat executable
  * pmDir     - Output directory for the Praat pitch marks
  * corrPmDir - Output directory for the corrected pitch marks (pitch marks moved to zero crossings)
  * maxPitch, minPitch - Pitch range (e.g. male: 50-200 Hz, female: 150-300 Hz)

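The zero-crossing correction can be illustrated with a small Python sketch. This is not the code used by !PraatPitchmarker. It makes the following assumptions:

 * pitch marks are given in seconds;
 * the signal is mono and given as a list of floats;
 * a "zero crossing" means an upward (negative to non-negative) transition.

The search window of half the shortest expected period, derived from maxPitch, is our own choice. The helper `synthetic_tone` is hypothetical and exists only for trying the function out.

```python
import math


def synthetic_tone(freq_hz, sample_rate, n_samples):
    """Sine tone offset by half a sample, so that no sample is exactly zero (handy for testing)."""
    return [math.sin(2 * math.pi * freq_hz * (i + 0.5) / sample_rate) for i in range(n_samples)]


def correct_pitchmarks(samples, sample_rate, marks_sec, max_pitch=300.0):
    """Move each pitch mark (in seconds) to the nearest upward zero crossing.

    Only crossings within half of the shortest expected pitch period
    (sample_rate / max_pitch samples) are considered; a mark with no
    crossing inside that window is left where it is.
    """
    max_shift = int(sample_rate / max_pitch / 2)
    corrected = []
    for t in marks_sec:
        idx = int(round(t * sample_rate))
        best, best_dist = idx, None
        lo = max(1, idx - max_shift)
        hi = min(len(samples) - 1, idx + max_shift)
        for i in range(lo, hi + 1):
            # Upward zero crossing between samples i-1 and i.
            if samples[i - 1] < 0 <= samples[i]:
                dist = abs(i - idx)
                if best_dist is None or dist < best_dist:
                    best, best_dist = i, dist
        corrected.append(best / sample_rate)
    return corrected
```

For example, with `tone = synthetic_tone(100, 16000, 1600)`, the call `correct_pitchmarks(tone, 16000, [0.0105])` moves the mark to 0.01 s, which is the upward zero crossing at sample 160.
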
'''MCEPMaker'''[[BR]]

It calculates MFCCs from the speech wave files using the Edinburgh Speech Tools.

Configuration Settings:

 * estDir    - Edinburgh Speech Tools compiled directory
 * pmDir     - Praat pitch marks directory
 * corrPmDir - Corrected pitch marks directory
 * mcepDir   - Output directory for MFCCs

== 2. Support for Transcription Conversion ==


'''Festvox2MaryTranscripts''' [[BR]]

This component converts transcriptions in Festvox format (e.g. txt.done.data) to the format supported by MARY, which uses an individual text file for each wave file.
All Voice Import Components read transcriptions in MARY format, so this component is very useful if your transcriptions are in Festvox format.

Configuration Settings:

 * transcriptFile   - Festvox format transcription file (absolute path)

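As a rough illustration of what this conversion involves, the following Python sketch parses a Festvox transcription file and writes one text file per utterance. It is not the component's actual implementation, and the function names are hypothetical. It assumes the following:

 * each line has the form `( basename "text" )`;
 * the file is UTF-8;
 * the text contains no escaped quotes;
 * the MARY text files are named `<basename>.txt`.

```python
import re
from pathlib import Path

# One utterance per line, e.g.:  ( arctic_a0001 "Author of the danger trail." )
FESTVOX_LINE = re.compile(r'^\(\s*(\S+)\s+"(.*)"\s*\)\s*$')


def parse_festvox(lines):
    """Return {basename: text} from the lines of a Festvox txt.done.data file."""
    transcripts = {}
    for lineno, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        match = FESTVOX_LINE.match(line)
        if match is None:
            raise ValueError(f"line {lineno} is not in Festvox format: {line!r}")
        transcripts[match.group(1)] = match.group(2)
    return transcripts


def festvox_to_mary(transcript_file, text_dir):
    """Write one <basename>.txt file per utterance into text_dir; return the basenames."""
    text_dir = Path(text_dir)
    text_dir.mkdir(parents=True, exist_ok=True)
    with open(transcript_file, encoding="utf-8") as f:
        transcripts = parse_festvox(f)
    for basename, text in transcripts.items():
        (text_dir / f"{basename}.txt").write_text(text + "\n", encoding="utf-8")
    return sorted(transcripts)
```
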

'''Mary2FestvoxTranscripts'''[[BR]]

It converts transcriptions in MARY format into a single Festvox-format transcription file. This is the reverse of the component above.

Configuration Settings:

 * transcriptFile   - Output Festvox format transcription file (absolute path)

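The reverse direction can be sketched the same way. Again, this is a hypothetical illustration, not the component's code. It assumes one `<basename>.txt` file per utterance in the text directory and transcripts without embedded double quotes.

```python
from pathlib import Path


def format_festvox_line(basename, text):
    """Format one utterance as a Festvox line; internal whitespace and newlines are collapsed."""
    text = " ".join(text.split())
    return f'( {basename} "{text}" )'


def mary_to_festvox(text_dir, transcript_file):
    """Collect every <basename>.txt in text_dir into one Festvox file; return the line count."""
    lines = [
        format_festvox_line(path.stem, path.read_text(encoding="utf-8"))
        for path in sorted(Path(text_dir).glob("*.txt"))
    ]
    Path(transcript_file).write_text("\n".join(lines) + "\n", encoding="utf-8")
    return len(lines)
```
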


== 3. Feature Vector Extraction from Text Data ==


'''!PhoneUnitFeatureComputer'''[[BR]]

!PhoneUnitFeatureComputer computes phone feature vectors for the unit selection voice-building process. [[BR]]
 * Note: This module requires a running MARY server from a MARY installation. [[BR]]
You can connect to a different server by changing the settings; see the settings help for more information. The features that are computed depend on the configuration file "targetfeatures.config". This file is located in the Marybase/conf/ directory and tells the server which feature vectors to compute.


Configuration Settings:

 * featureDir     -  Output directory for the computed phone feature vectors (absolute path)
 * maryServerHost -  Server name
 * maryServerPort -  Socket port number (default 59125)

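Several components fail without a reachable server, so it can help to check the configured host and port before starting. The Python sketch below only tests whether something accepts TCP connections on maryServerHost:maryServerPort (default 59125). It does not check that the listener is really a MARY server, or which voices it has loaded. The function name is hypothetical.

```python
import socket


def mary_server_reachable(host="localhost", port=59125, timeout=2.0):
    """Return True if a TCP connection to host:port can be opened within `timeout` seconds.

    This only proves that the port is open; it says nothing about which
    program is listening or which voices it provides.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, unknown host or timed out
        return False
```
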

'''!HalfPhoneUnitFeatureComputer'''

This component works like the one above, but it computes half-phone level feature vectors. Here, the file "halfphone-targetfeatures.config" in the Marybase/conf/ directory tells the server which half-phone level feature vectors to compute.

Configuration Settings:

 * featureDir     -  Output directory for the computed half-phone feature vectors (absolute path)
 * maryServerHost -  Server name
 * maryServerPort -  Socket port number (default 59125)


== 4. Automatic Labeling ==


'''EHMMLabeler'''[[BR]]

EHMM Labeler is a labeling tool that generates label files from the wave files and their corresponding transcriptions. The basic EHMM tool is available with recent versions of Festvox. To run EHMM Labeler in the MARY environment, you need to compile the EHMM tool on your machine. Labeling may take a long time, depending on the size of the data and the system configuration.

EHMMLabeler supports:
        1. Database labeling with forced alignment, training from a flat-start initialization
        2. Database labeling with forced alignment, training from initialized models (re-training)
        3. Database labeling with forced alignment using already existing models (decoding only)


Configuration Settings:

 * ehmmDir           - EHMM basic package compilation directory
 * eDir              - Directory name (absolute path) into which the transcriptions (in EHMM-supported format) are copied and where the EHMM model is stored
 * featureDir        - Directory where the phone feature vectors were computed (used to get the phone sequence)
 * startEhmmModelDir - Directory of an already existing EHMM model, used to initialize the EHMM models (for re-training or decoding)
 * reTrainFlag       - (true | false) true: re-train, initializing with the given models; false: decoding only
 * outputLabDir      - Directory in which to store the generated labels


'''Automatic Labeling using Sphinx Tools:'''


The !SphinxLabelingPreparator, !SphinxTrainer and !SphinxLabeler components perform automatic labeling with the Sphinx tools. These three components need !SphinxTrain, the Sphinx decoder and the Edinburgh Speech Tools for training models and for forced alignment.


'''!SphinxLabelingPreparator'''[[BR]]

This component prepares the setup that !SphinxTrain needs to train models. [[BR]]

Configuration Settings:

 * estDir         - Edinburgh Speech Tools compiled directory
 * maryServerHost - Server name
 * maryServerPort - Socket port number (default 59125)
 * sphinxTrainDir - !SphinxTrain installation directory
 * stDir          - Directory name (absolute path) into which dictionaries and temporary files (in Sphinx-supported format) are copied
 * transcriptFile - Festvox format transcription file (absolute path)

'''!SphinxTrainer'''[[BR]]

It trains the models required for labeling using !SphinxTrain. This may take a long time, depending on the size of the data and the system configuration.

Configuration Settings:

 * stDir  - Absolute path of the directory where !SphinxLabelingPreparator stored all dictionaries and temporary files


'''!SphinxLabeler'''[[BR]]

It produces labels with the help of the models built by !SphinxTrainer, using the Sphinx-2 decoder for forced alignment.

Configuration Settings:

 * sphinx2Dir - Absolute path of the Sphinx-2 installation directory
 * stDir      - Absolute path of the directory where !SphinxLabelingPreparator and !SphinxTrainer stored all dictionaries, temporary files and models

'''MRPALabelConverter'''[[BR]]

If you have labeled data in Festvox format that uses the MRPA phoneset, use this module to convert the phones into the phoneset used by MARY.

Configuration Settings:

 * mrpaLabDir  - MRPA label file directory

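The conversion itself amounts to renaming the phone field of each label line. The sketch below makes the following assumptions:

 * label files are xwaves-style, with header lines up to a `#` line, followed by `end_time color phone` lines;
 * the MRPA-to-MARY table is a made-up, incomplete example used purely for illustration, not MARY's actual mapping.

Unmapped phones are reported rather than silently ignored.

```python
# Illustrative, incomplete mapping: these entries are assumptions for the example,
# not MARY's actual MRPA conversion table.
EXAMPLE_MRPA_TO_MARY = {"#": "_", "ii": "i:", "uu": "u:", "@@": "3:"}


def convert_label_phones(label_lines, mapping):
    """Rename the phone (last field) of each xwaves-style label line.

    Header lines up to and including the '#' line are copied unchanged.
    Returns (converted_lines, unmapped_phones) so that gaps in the table are visible.
    """
    converted, unmapped = [], set()
    in_body = False
    for line in label_lines:
        fields = line.split()
        if not in_body or len(fields) < 3:
            converted.append(line)
            in_body = in_body or line.strip() == "#"
            continue
        phone = fields[-1]
        if phone not in mapping:
            unmapped.add(phone)  # keep it unchanged, but report it
        fields[-1] = mapping.get(phone, phone)
        converted.append(" ".join(fields))
    return converted, unmapped
```
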


== 5. Label or Pause Correction and Label-Feature Alignment ==


'''!LabelledFilesInspector'''[[BR]]

It lets you browse through the aligned labels and listen to the corresponding wave files. This is useful for manual perceptual verification of the alignment.

Configuration Settings:

 * corrPmDir - Directory path for the corrected pitch marks

'''!PhoneUnitLabelComputer'''  and  '''!HalfPhoneUnitLabelComputer'''[[BR]]

These components convert the label files into the label files used by MARY. !PhoneUnitLabelComputer produces phone labels, and !HalfPhoneUnitLabelComputer produces halfphone labels. You need both to build the voice.

Configuration Settings:

 * labelDir - Output phone label directory for !PhoneUnitLabelComputer;
              output halfphone label directory for !HalfPhoneUnitLabelComputer

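To illustrate the relation between phone and halfphone labels, here is a minimal Python sketch. It reads xwaves-style label lines into (phone, start, end) segments and splits every phone at its temporal midpoint. The midpoint split and the `_L`/`_R` suffixes are assumptions for this example only. The actual components may place the halfphone boundary and name the units differently.

```python
def read_xwaves_labels(lines):
    """Return [(phone, start, end)] from xwaves-style label lines.

    Body lines look like '<end_time> <color> <phone>'; everything up to and
    including the '#' line is header. The first segment is assumed to start at 0.0.
    """
    segments = []
    start = 0.0
    in_body = False
    for line in lines:
        if not in_body:
            in_body = line.strip() == "#"
            continue
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        end = float(fields[0])
        segments.append((fields[-1], start, end))
        start = end
    return segments


def split_halfphones(segments):
    """Split each phone at its temporal midpoint into a left and a right half."""
    halves = []
    for phone, start, end in segments:
        mid = (start + end) / 2
        halves.append((phone + "_L", start, mid))
        halves.append((phone + "_R", mid, end))
    return halves
```
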
'''!PhoneLabelFeatureAligner'''[[BR]]

It tries to align the labels with the feature vectors. If the alignment fails, you can start the automatic pause correction.[[BR]]

This works as follows:

 - Pauses that are in the label file but not in the feature file are deleted from the label file, and the durations of the previous and next labels are stretched.
 - Pauses that are in the feature file but not in the label file are inserted into the label file with length zero.


If errors remain after the pause correction, you are prompted for each error. You can either skip the error or remove the corresponding file from the basename list (the list of files used for your voice). "skip all" and "remove all" do this for all problematic files. "Edit unit labels" lets you edit the label file, and "Edit RAWMARYXML" lets you edit the MaryXML that is the input for computing the features. A MARY server must be running in order to recompute the features from the MaryXML. You can change the server's host and port by altering the settings of the !UnitFeatureComputer.

Configuration Settings:

 * featureDir - Phone feature vectors directory
 * labDir     - Phone labels directory

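The two pause-correction rules above can be expressed as a short Python sketch. It assumes the following:

 * labels are given as (phone, duration) pairs;
 * `_` is the pause symbol;
 * a deleted pause's duration is split equally between its neighbors (the text above only says that both are stretched).

Any other mismatch raises an error, which is roughly the point at which the real tool would prompt you. This is an illustration, not the component's implementation.

```python
PAUSE = "_"  # assumed pause symbol


def correct_pauses(labels, feature_phones):
    """Reconcile pauses between a label file and a feature file (sketch).

    labels: list of (phone, duration) from the label file.
    feature_phones: list of phones, in order, from the feature file.
    - A pause in the labels but not in the features is deleted, and its
      duration is shared between the previous and next labels.
    - A pause in the features but not in the labels is inserted with length 0.
    Any other mismatch raises ValueError.
    """
    out = []     # corrected labels as [phone, duration] pairs
    carry = 0.0  # duration still owed to the next label that is kept
    i = j = 0
    while i < len(labels) or j < len(feature_phones):
        lab = labels[i][0] if i < len(labels) else None
        feat = feature_phones[j] if j < len(feature_phones) else None
        if lab is not None and lab == feat:
            out.append([lab, labels[i][1] + carry])
            carry = 0.0
            i += 1
            j += 1
        elif lab == PAUSE:
            # Pause only in the labels: delete it and stretch its neighbors.
            dur = labels[i][1]
            has_next = i + 1 < len(labels)
            if out and has_next:
                out[-1][1] += dur / 2
                carry += dur / 2
            elif out:
                out[-1][1] += dur
            else:
                carry += dur
            i += 1
        elif feat == PAUSE:
            # Pause only in the features: insert a zero-length pause label.
            out.append([PAUSE, 0.0])
            j += 1
        else:
            raise ValueError(f"unresolvable mismatch: label {lab!r} vs feature {feat!r}")
    if carry and out:
        out[-1][1] += carry  # no later label was kept; give the rest to the last one
    return [tuple(pair) for pair in out]
```
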
'''!HalfPhoneLabelFeatureAligner'''[[BR]]

It works the same way as !PhoneLabelFeatureAligner, but for halfphone units.

Configuration Settings:

 * featureDir - Halfphone feature vectors directory
 * labDir     - Halfphone labels directory


== 6. Basic Data Files ==


The following components create basic binary files that contain the whole voice database, so that the database can be accessed more easily and quickly. These files are needed for various voice-building steps and for synthesis.

'''!WaveTimelineMaker'''[[BR]]

The !WaveTimelineMaker splits the waveforms into datagrams that are stored in a timeline in MARY format. It produces a binary file that contains all wave files.

Configuration Settings:

 * corrPmDir    - Directory path for the corrected pitch marks
 * !WaveTimeline - File containing all wave files; will be created by this module

'''!BasenameTimelineMaker'''

The !BasenameTimelineMaker takes a database root directory and a list of basenames, and associates the basenames with absolute times in a timeline in MARY format.

Configuration Settings:

 * pmDir        - Directory containing the pitch marks
 * timelineFile - File containing the list of files and their times; will be created by this module

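Conceptually, a basename timeline places all files one after another on a single time axis. The sketch below assumes that lengths are given as integer sample counts at one common sample rate, and it keeps everything in memory. It does not reproduce MARY's binary timeline format, and the function names are hypothetical.

```python
import bisect


def build_basename_timeline(durations):
    """durations: [(basename, length_in_samples)] in corpus order.

    Returns [(basename, start, end)] on one global time axis (end exclusive).
    """
    timeline, t = [], 0
    for basename, length in durations:
        timeline.append((basename, t, t + length))
        t += length
    return timeline


def basename_at(timeline, time):
    """Return the basename whose interval [start, end) contains `time`, or None."""
    starts = [start for _, start, _ in timeline]
    k = bisect.bisect_right(starts, time) - 1  # last interval starting at or before `time`
    if k < 0 or time >= timeline[k][2]:
        return None
    return timeline[k][0]
```
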
'''MCepTimelineMaker'''

The MCepTimelineMaker takes a database root directory and a list of basenames, and converts the related wav files into an mcep timeline in MARY format.

Configuration Settings:

 * mcepDir       - Directory containing the mcep files
 * mcepTimeline  - File containing all mcep files; will be created by this module


== 7. Building Acoustic Models ==


'''!PhoneUnitfileWriter'''[[BR]]

It produces a file containing all phone-sized units.

Configuration Settings:

 * corrPmDir  - Directory containing the corrected pitch marks
 * labelDir   - Directory containing the phone labels
 * unitFile   - File containing all phone units; will be created by this module


'''!PhoneFeatureFileWriter'''[[BR]]

It produces a file containing all the target cost features for the phone-sized units. The module needs a file defining which features are to be used and what weights are given to them. These must be the same features as the ones used by the !PhoneUnitFeatureComputer. If you do not have a feature definition, the module tries to create one.

For more information, see the example file: ''Marybase/lib/modules/import/examples/PhoneUnitFeatureDefinition.txt''

Configuration Settings:

 * featureDir  - Directory containing the phone features
 * featureFile - File containing all phone units and their target cost features; will be created by this module
 * unitFile    - File containing all phone units
 * weightsFile - File containing the list of phone target cost features, their values and weights

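How the feature weights might be used can be sketched with a simple weighted-mismatch target cost. This is a generic formulation, not necessarily how MARY computes its target cost. The feature names in the example are hypothetical.

```python
def target_cost(target, candidate, weights):
    """Sum the weights of all features on which target and candidate disagree.

    target, candidate: dict mapping feature name -> value
    weights: dict mapping feature name -> weight; only these features are compared
    """
    return sum(w for name, w in weights.items() if target.get(name) != candidate.get(name))


def best_candidate(target, candidates, weights):
    """Return the candidate unit with the lowest target cost."""
    return min(candidates, key=lambda c: target_cost(target, c, weights))
```

For example, with weights `{"phone": 2.0, "stressed": 1.0, "pos_in_word": 0.5}`, a candidate that matches the phone but differs in stress and position costs 1.5.
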
'''DurationCARTTrainer'''[[BR]]

It builds an acoustic model of the durations in the database using the program "wagon" from the Edinburgh Speech Tools.

Configuration Settings:

 * durTree          - File containing the duration CART; will be created by this module
 * estDir           - Directory containing the local installation of the Edinburgh Speech Tools
 * featureDir       - Directory containing the phone features
 * featureFile      - File containing all phone units and their target cost features
 * labelDir         - Directory containing the phone labels
 * stepwiseTraining - "false" or "true"
 * unitFile         - File containing all phone units
 * waveTimeline     - File containing all wave files

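wagon reads two inputs:

 * a data file with one whitespace-separated example per line;
 * a description file that declares each field.

As far as we know, wagon predicts the first field by default. The Python sketch below prepares both inputs for a duration tree. It illustrates the input format only; it is not how DurationCARTTrainer writes its files. The feature names are hypothetical, and values must not contain spaces or parentheses.

```python
def wagon_data_lines(units, feature_names, predictee="duration"):
    """One line per unit: the predicted value first, then the features in order."""
    lines = []
    for unit in units:
        values = [f"{unit[predictee]:.4f}"] + [str(unit[name]) for name in feature_names]
        lines.append(" ".join(values))
    return lines


def wagon_desc(units, feature_names, predictee="duration"):
    """Build a wagon description: the predictee as float, each feature as a list of its values."""
    fields = [f"({predictee} float)"]
    for name in feature_names:
        values = sorted({str(unit[name]) for unit in units})
        fields.append(f"({name} {' '.join(values)})")
    return "(\n" + "\n".join(fields) + "\n)\n"
```

Once written to disk, the two files would be passed to wagon through its `-desc` and `-data` options, with `-output` naming the tree file.
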
'''F0CARTTrainer'''[[BR]]

Like DurationCARTTrainer, it builds acoustic models, in this case of F0. It uses "wagon" and the files produced by !PhoneUnitfileWriter and !PhoneFeatureFileWriter.

Configuration Settings:

 * estDir           - Directory containing the local installation of the Edinburgh Speech Tools
 * f0LeftTreeFile   - File containing the left F0 CART; will be created by this module
 * f0MidTreeFile    - File containing the middle F0 CART; will be created by this module
 * f0RightTreeFile  - File containing the right F0 CART; will be created by this module
 * featureDir       - Directory containing the phone features
 * featureFile      - File containing all phone units and their target cost features
 * labelDir         - Directory containing the phone label files
 * stepwiseTraining - "false" or "true"
 * unitFile         - File containing all phone units
 * waveTimeline     - File containing all wave files


== 8. Unit Selection ==


'''!HalfPhoneUnitfileWriter'''[[BR]]

It produces a file containing all halfphone-sized units.

Configuration Settings:

 * corrPmDir - Directory containing the corrected pitch marks
 * labelDir  - Directory containing the halfphone labels
 * unitFile  - File containing all halfphone units; will be created by this module


'''!HalfPhoneFeatureFileWriter'''[[BR]]

It produces a file containing all the target cost features for the halfphone-sized units. The module needs a file defining which features are to be used and what weights are given to them. These must be the same features as the ones used by the !HalfPhoneUnitFeatureComputer. If you do not have a feature definition, the module tries to create one.

For more information, see the example file: ''Marybase/lib/modules/import/examples/HalfPhoneUnitFeatureDefinition.txt''

Configuration Settings:

 * featureDir  - Directory containing the halfphone features
 * featureFile - File containing all halfphone units and their target cost features; will be created by this module
 * unitFile    - File containing all halfphone units
 * weightsFile - File containing the list of halfphone target cost features, their values and weights


'''!JoinCostFileMaker'''[[BR]]

It produces a file containing all the join cost features for the halfphone-sized units.

Configuration Settings:

 * joinCostFile - File containing all halfphone units and their join cost features; will be created by this module
 * mcepDir      - Directory containing the mcep files
 * mcepTimeline - File containing all mcep files
 * unitFile     - File containing all halfphone units
 * weightsFile  - File containing the list of join cost features and their weights

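A common way to define a join cost is as a weighted distance between the acoustic frames on either side of a concatenation point. The sketch below computes a weighted Euclidean distance between the last mcep frame of the left unit and the first mcep frame of the right unit. It is a generic illustration, not MARY's actual join cost function, and it assumes that the frames are plain lists of floats.

```python
import math


def join_cost(left_last_frame, right_first_frame, weights):
    """Weighted Euclidean distance between two mcep frames at a concatenation point."""
    if not len(left_last_frame) == len(right_first_frame) == len(weights):
        raise ValueError("frames and weights must have the same length")
    return math.sqrt(sum(w * (a - b) ** 2
                         for a, b, w in zip(left_last_frame, right_first_frame, weights)))
```
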

'''!AcousticFeatureFileWriter'''[[BR]]

It produces a file containing all the target cost features plus two acoustic target cost features for the halfphone-sized units. It also produces a feature definition containing those features.

Configuration Settings:

 * acFeatDef     - File containing the list of phone target cost features, their values and weights
 * acFeatureFile - File containing all halfphone units and their target cost features plus the acoustic target cost features; will be created by this module
 * featureFile   - File containing all halfphone units and their target cost features
 * unitFile      - File containing all halfphone units
 * waveTimeLine  - File containing all wave files


'''CARTBuilder'''[[BR]]

It builds a preselection tree for the target cost features using "wagon" (CART) from the Edinburgh Speech Tools.

In addition, you need to specify either a feature sequence or a top-level tree. These are used to build a basic tree, which wagon then extends. This way, wagon runs several times on smaller subsets of units rather than once on the whole set. Running this module may still take some time.

 - Feature sequence: a file containing a list of features for which to build the tree.
 - Top-level tree: a file containing the basic tree.

For more information on these two ways of specifying the basic tree, see the example files in ''Marybase/lib/modules/import/examples/''

If you give the CARTBuilder neither a feature sequence nor a top-level tree file, a default feature sequence is created that contains only the feature "mary_phoneme". If the basic tree contains leaves with more units than the maximum number allowed, those leaves are pruned and a warning message is printed. It is recommended that you make sure that no leaves are too big.


Configuration Settings:

 * acFeatureFile - File containing all halfphone units and their target cost features plus the acoustic target cost features
 * cartFile - File containing the preselection CART; will be created by this module
 * estDir - Directory containing the local installation of the Edinburgh Speech Tools
 * featureSeqFile - File containing the feature sequence for the basic tree
 * maxLeafSize - The maximum number of units in a leaf of the basic tree
 * mcepTimeline - File containing the mcep files
 * readFeatureSequence - If "true", the basic tree is read from the feature sequence file; if "false", it is read from the top-level tree file
 * topLevelTreeFile - File containing the basic tree
 * unitFile - File containing all halfphone units

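The roles of the feature sequence and of maxLeafSize can be illustrated by grouping units by their feature values, where each distinct combination of values forms one leaf of the basic tree. The sketch below assumes that units are dicts of feature values. It only reports oversized leaves; it does not call wagon or prune anything.

```python
from collections import defaultdict


def basic_tree_leaves(units, feature_sequence=("mary_phoneme",)):
    """Group units by their values for the features in feature_sequence.

    Each distinct combination of values corresponds to one leaf of the basic tree.
    """
    leaves = defaultdict(list)
    for unit in units:
        key = tuple(unit[name] for name in feature_sequence)
        leaves[key].append(unit)
    return dict(leaves)


def oversized_leaves(leaves, max_leaf_size):
    """Return {leaf_key: size} for every leaf holding more than max_leaf_size units."""
    return {key: len(members) for key, members in leaves.items() if len(members) > max_leaf_size}
```

With the default sequence `("mary_phoneme",)` there is one leaf per phoneme, so a frequent phoneme can easily exceed maxLeafSize. Adding further features to the sequence splits such leaves.
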

'''CARTPruner'''[[BR]]

It prunes the preselection tree and also removes outliers from it.

Configuration Settings:

 * cartFile - File containing the preselection CART
 * prunedCartFile - File containing the pruned preselection CART; will be created by this module
 * unitFeatureFile - File containing all halfphone units and their target cost features
 * unitFile - File containing all halfphone units
 * waveFile - File containing all wave files


== 9. Installation of the New Voice into MARY ==


'''!VoiceInstaller'''[[BR]]

It installs the built voice into MARY automatically. It copies all the necessary files to a new subdirectory in the ''lib/voices/'' directory of your MARY installation. It also creates a file that specifies the properties of the voice and stores it in the ''conf/'' directory of your MARY installation. The next time you start the MARY server, the voice is loaded.


Configuration Settings:

 * cartFile - File containing the preselection CART
 * durTree - File containing the duration CART
 * exampleText - File containing example text (for limited domain voices only)
 * f0LeftTree - File containing the left F0 CART
 * f0MidTree - File containing the middle F0 CART
 * f0RightTree - File containing the right F0 CART
 * halfPhoneFeatDefAc - File containing the list of halfphone target cost features, their values and weights
 * halfPhoneFeatsAc - File containing all halfphone units and their target cost features plus the acoustic target cost features
 * halfPhoneUnits - File containing all halfphone units
 * joinCostFeatDef - File containing the list of join cost features and their weights
 * joinCostFeats - File containing all halfphone units and their join cost features
 * phoneFeatDef - File containing the list of phone target cost features, their values and weights
 * waveTimeline - File containing all wave files

We hope this tutorial helps you build a new voice using the Voice Import Tools in the MARY environment. The individual Voice Import Components are explained [wiki:VoiceImportComponents here].
[[BR]]

 * [wiki:VoiceImportComponents Explanation on Individual Voice Import Components]

[[BR]]
[[BR]]
[[BR]]
[[BR]]

- Sathish Chandra Pammi (Sathish.Chandra@dfki.de)