Changes between Initial Version and Version 1 of VoiceImportComponents


Timestamp:
01/07/08 11:53:03 (17 years ago)
Author:
sach01
Comment:

--

  • VoiceImportComponents

    v1 v1  
     1= Voice Import Tools: Explanation of Individual Voice Import Components = 
     2 
     3 
     4 
     5== 1. Feature Extraction from Acoustic Data == 
     6 
     7 
     8'''!PraatPitchmarker'''[[BR]] 
     9 
     10It computes pitch marks with the help of Praat. You need to compile or install Praat on your machine.[[BR]] 
     11 
     12It also corrects the pitch marks by moving each one to a nearby zero crossing. 
     13    
     14  
     15Configuration Settings: 
     16  * command   - Absolute path of the Praat executable 
     17  * pmDir     - Output directory for the Praat pitch marks  
     18  * corrPmDir - Output directory for the corrected pitch marks (pitch marks moved towards zero crossings)  
     19  * maxPitch, minPitch - Pitch range in Hz (e.g. male: 50-200 | female: 150-300)  
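
The zero-crossing correction can be illustrated with a minimal Python sketch. It is not the component's actual code: the function names, the representation of pitch marks as times in seconds, and the "nearest crossing" rule are all assumptions made for illustration.

```python
from bisect import bisect_left


def zero_crossings(samples):
    """Return every index i where the sign changes between samples i-1 and i."""
    return [i for i in range(1, len(samples))
            if (samples[i - 1] < 0) != (samples[i] < 0)]


def snap_to_zero_crossings(pitch_marks, samples, sample_rate):
    """Move each pitch mark (in seconds) to the nearest zero crossing.

    Hypothetical helper: the real correction rule may differ.
    """
    crossings = zero_crossings(samples)
    if not crossings:
        return list(pitch_marks)  # nothing to align to
    corrected = []
    for t in pitch_marks:
        idx = t * sample_rate
        pos = bisect_left(crossings, idx)
        # Only the crossings directly before and after the mark can be nearest.
        candidates = crossings[max(pos - 1, 0):pos + 1]
        nearest = min(candidates, key=lambda c: abs(c - idx))
        corrected.append(nearest / sample_rate)
    return corrected
```

In this sketch a mark that already sits on a zero crossing is left unchanged, and a signal without any sign change returns the marks as they were.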
     20   
     21'''MCEPMaker'''[[BR]] 
     22 
     23It calculates MFCCs from the speech wave files, using the Edinburgh Speech Tools.  
     24 
     25Configuration Settings: 
     26 
     27 * estDir    - Directory of the compiled Edinburgh Speech Tools 
     28 * pmDir     - Praat pitch marks directory 
     29 * corrPmDir - Corrected pitch marks directory 
     30 * mcepDir   - Output directory for the MFCCs  
     31   
     32 
     33== 2. Support for Transcription Conversion == 
     34 
     35 
     36 
     37'''Festvox2MaryTranscripts''' [[BR]] 
     38 
     39This component converts transcriptions in Festvox format (e.g. txt.done.data) into the format supported by MARY. MARY uses an individual text file for each wave file.  
     40All Voice Import Components use transcriptions in MARY format, so this component is very useful if your transcriptions are in Festvox format. 
     41 
     42Configuration Settings: 
     43 
     44 * transcriptFile   - Festvox format transcription file (Absolute path) 
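
As a rough illustration of this conversion, the Python sketch below splits a txt.done.data-style file, whose lines look like ( basename "sentence" ), into one text file per utterance. The function names, the ".txt" extension and the output directory layout are assumptions, not the component's actual behaviour.

```python
import re
from pathlib import Path

# One Festvox entry per line: ( basename "sentence text" )
ENTRY = re.compile(r'^\(\s*(\S+)\s+"(.*)"\s*\)\s*$')


def parse_festvox(text):
    """Yield (basename, sentence) pairs from txt.done.data-style text."""
    for line in text.splitlines():
        match = ENTRY.match(line.strip())
        if match:
            # Festvox escapes double quotes inside the sentence as \"
            yield match.group(1), match.group(2).replace('\\"', '"')


def write_mary_transcripts(text, out_dir):
    """Write one <basename>.txt file per utterance (hypothetical layout)."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for basename, sentence in parse_festvox(text):
        (out / f"{basename}.txt").write_text(sentence + "\n", encoding="utf-8")
```

Lines that do not match the expected pattern are silently skipped here; a real converter would probably want to report them.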
     45 
     46    
     47'''Mary2FestvoxTranscripts'''[[BR]] 
     48 
     49It converts transcriptions in the format supported by MARY into a Festvox format transcription file. It is the reverse of the component above. 
     50 
     51Configuration Settings: 
     52 
     53 * transcriptFile   -  Output Festvox format transcription file (Absolute path) 
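
The reverse direction can be sketched in a few lines of Python. The function name and the in-memory mapping from basename to sentence are assumptions chosen to keep the example self-contained; the real component reads the individual text files from disk.

```python
def to_festvox(transcripts):
    """Build txt.done.data-style text from a {basename: sentence} mapping."""
    lines = []
    for basename in sorted(transcripts):
        # Escape double quotes so each entry stays on one well-formed line.
        sentence = transcripts[basename].strip().replace('"', '\\"')
        lines.append(f'( {basename} "{sentence}" )')
    return "\n".join(lines) + "\n"
```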
     54 
     55 
     56 
     57== 3. Feature Vector Extraction from Text Data == 
     58 
     59 
     60'''!PhoneUnitFeatureComputer'''[[BR]] 
     61 
     62!PhoneUnitFeatureComputer computes phone feature vectors for the unit selection voice building process. [[BR]] 
     63 * Note: This module requires a running Mary server from a MARY installation. [[BR]] 
     64You can connect to a different server by altering the settings. See the settings help for more information on this. Which features are computed depends on the configuration file "targetfeatures.config". This file is in the Marybase/conf/ directory and directs the server to compute the feature vectors.  
     65 
     66 
     67Configuration Settings: 
     68 
     69 * featureDir     -  Output Directory to place computed Phone feature vectors (Absolute path) 
     70 * maryServerHost -  Server Name  
     71 * maryServerPort -  Socket Port number (Default 59125)  
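
Because this module depends on a running server, it can help to check that the configured host and port accept connections before starting a long feature computation. The Python sketch below only opens and closes a TCP connection; the function name and the "localhost" default are assumptions, and it does not speak the Mary protocol.

```python
import socket


def mary_server_reachable(host="localhost", port=59125, timeout=2.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unresolved host names.
        return False
```

A successful check only shows that something is listening on the port, not that it is a Mary server.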
     72 
     73 
     74 
     75'''!HalfPhoneUnitFeatureComputer''' 
     76 
     77This component works like the one above, but it computes half-phone level feature vectors. Here the file "halfphone-targetfeatures.config", which is in the Marybase/conf/ directory, directs the server to compute half-phone level feature vectors. 
     78 
     79Configuration Settings: 
     80 
     81 * featureDir     -  Output Directory to place computed Half-Phone feature vectors (Absolute path) 
     82 * maryServerHost -  Server Name  
     83 * maryServerPort -  Socket Port number (Default 59125)  
     84 
     85 
     86 
     87== 4. Automatic Labeling == 
     88 
     89 
     90 
     91'''EHMMLabeler'''[[BR]] 
     92 
     93EHMM Labeler is a labeling tool that generates label files from wave files and their corresponding transcriptions. The basic EHMM tool is available in recent versions of Festvox. To run EHMM Labeler in the MARY environment, you need to compile the EHMM tool on your machine. Labeling may take a long time, depending on the size of the data and the system configuration. 
     94 
     95EHMMLabeler supports:  
     96        1. Database labeling with forced alignment after training with flat-start initialization  
     97        2. Database labeling with forced alignment after training with initialized models (re-training) 
     98        3. Database labeling with forced alignment using already existing models (decoding only) 
     99 
     100 
     101Configuration Settings: 
     102 
     103 * ehmmDir           - Directory of the compiled EHMM basic package. 
     104 * eDir              - Directory (absolute path) into which the transcriptions are copied (in the format supported by EHMM) and in which the EHMM models are stored.      
     105 * featureDir        - Directory where the phone feature vectors were computed (used to get the phone sequence) 
     106 * startEhmmModelDir - Directory of existing EHMM models used to initialize the EHMM models (for re-training or decoding)     
     107 * reTrainFlag       - (true | false) true: re-train, initializing with the given models; false: decode only  
     108 * outputLabDir      - Directory in which the generated labels are stored 
     109 
     110    
     111'''Automatic Labeling using Sphinx Tools:''' 
     112 
     113 
     114The components !SphinxLabelingPreparator, !SphinxTrainer and !SphinxLabeler perform automatic labeling with the Sphinx tools. These three components need !SphinxTrain, the Sphinx decoder and the Edinburgh Speech Tools for training models and for forced alignment.  
     115 
     116 
     117 
     118'''!SphinxLabelingPreparator'''[[BR]] 
     119 
     120This component prepares the setup that !SphinxTrain needs to train models. [[BR]] 
     121 
     122Configuration Settings: 
     123 
     124 * estDir         - Edinburgh Speech Tools Compiled Directory 
     125 * maryServerHost - Server Name  
     126 * maryServerPort - Socket Port number (Default 59125)  
     127 * sphinxTrainDir - !SphinxTrain installation Directory 
     128 * stDir          - Directory (absolute path) into which dictionaries and temporary files are copied (in the format supported by Sphinx). 
     129 * transcriptFile - Festvox format transcription file (Absolute path) 
     130 
     131'''!SphinxTrainer'''[[BR]] 
     132 
     133It trains the models required for labeling using !SphinxTrain. Training may take a long time, depending on the size of the data and the system configuration. 
     134 
     135Configuration Settings: 
     136 
     137 * stDir  - Absolute path of the directory where !SphinxLabelingPreparator stored all dictionaries and temporary files. 
     138 
     139 
     140 
     141'''!SphinxLabeler'''[[BR]] 
     142 
     143It produces labels with the help of the models built by the !SphinxTrainer. It uses the Sphinx-2 decoder for forced alignment.  
     144 
     145Configuration Settings: 
     146 
     147 * sphinx2Dir - Absolute path of the Sphinx-2 installation directory. 
     148 * stDir      - Absolute path of the directory where !SphinxLabelingPreparator and !SphinxTrainer stored all dictionaries, temporary files and models. 
     149 
     150 
     151'''MRPALabelConverter'''[[BR]] 
     152 
     153If you have labeled data in the Festvox format that uses the MRPA phoneset, use this module to convert the phones into the phoneset used by Mary. 
     154 
     155Configuration Settings: 
     156 
     157 * mrpaLabDir  - MRPA Label file directory  
     158 
     159 
     160 
     161== 5. Label or Pause Correction and Label-Feature Alignment == 
     162  
     163 
     164 
     165'''!LabelledFilesInspector'''[[BR]] 
     166 
     167It allows the user to browse through the aligned labels and listen to the corresponding wave files. This is useful for verifying the alignment manually by ear. 
     168 
     169Configuration Settings: 
     170 
     171 * corrPmDir - Directory Path for corrected pitch marks.  
     172 
     173'''!PhoneUnitLabelComputer'''  and  '''!HalfPhoneUnitLabelComputer'''[[BR]] 
     174 
     175These components convert the label files into the label files used by Mary. !PhoneUnitLabelComputer produces phone labels, !HalfPhoneUnitLabelComputer produces halfphone labels. You need both to build the voice. 
     176 
     177Configuration Settings: 
     178 
     179 * labelDir - Output directory for the phone labels (for !PhoneUnitLabelComputer) or  
     180              for the halfphone labels (for !HalfPhoneUnitLabelComputer).   
     181 
     182'''!PhoneLabelFeatureAligner'''[[BR]] 
     183 
     184It tries to align the labels and the feature vectors. If alignment fails, you can start the automatic pause correction.[[BR]] 
     185 
     186This works as follows: 
     187 
     188 - Pauses that are in the label file but not in the feature file are deleted from the label file, and the durations of the previous and next labels are stretched. 
     189 - Pauses that are in the feature file but not in the label file are inserted into the label file with length zero. 
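
The two rules can be sketched in Python as follows. This is an illustrative reconstruction, not the component's code: labels are modelled as (phone, duration) pairs, the pause symbol "_" is an assumed default, and splitting a deleted pause equally between its neighbours is an assumption about how the stretching is done.

```python
def correct_pauses(labels, feature_phones, pause="_"):
    """Align (phone, duration) labels with the phone sequence of the features.

    Raises ValueError when the mismatch is not caused by a pause.
    """
    out = []
    pending = 0.0  # share of a deleted pause still owed to the next label
    i = j = 0
    while i < len(labels) or j < len(feature_phones):
        lab = labels[i] if i < len(labels) else None
        feat = feature_phones[j] if j < len(feature_phones) else None
        if lab is not None and feat is not None and lab[0] == feat:
            out.append([lab[0], lab[1] + pending])
            pending = 0.0
            i += 1
            j += 1
        elif lab is not None and lab[0] == pause:
            # Pause only in the labels: delete it and stretch both neighbours.
            if out:
                out[-1][1] += lab[1] / 2
                pending += lab[1] / 2
            else:
                pending += lab[1]
            i += 1
        elif feat == pause:
            # Pause only in the features: insert it with length zero.
            out.append([pause, 0.0])
            j += 1
        else:
            raise ValueError(f"cannot align label {lab} with feature {feat}")
    if pending and out:
        out[-1][1] += pending  # deleted pause at the very end
    return [tuple(item) for item in out]
```

The total duration of the utterance is preserved, which keeps the labels consistent with the wave file. A ValueError corresponds to the case where the error remains after pause correction and the user has to decide what to do.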
     190 
     191 
     192If there are still errors after the pause correction, you are prompted for each error. You can skip the error or remove the corresponding file from the basename list (the list of files that are used for your voice). "skip all" and "remove all" do this for all problematic files. "Edit unit labels" allows you to edit the label file. "Edit RAWMARYXML" lets you edit the maryxml that is the input for computing the features. You need a running Mary server in order to recompute the features from the maryxml. You can change the host and port of the server in the settings for the !UnitFeatureComputer.  
     193 
     194Configuration Settings: 
     195 
     196 * featureDir - Phone feature vectors directory  
     197 * labDir     - Phone Labels directory 
     198 
     199'''!HalfPhoneLabelFeatureAligner'''[[BR]] 
     200 
     201It works the same way as !PhoneLabelFeatureAligner, but for halfphone units.  
     202 
     203Configuration Settings: 
     204 
     205 * featureDir - Half Phone feature vectors directory  
     206 * labDir     - Half Phone Labels directory 
     207 
     208 
     209== 6. Basic Data Files == 
     210 
     211 
     212The following components create the basic binary files that contain the whole voice database, which makes access to the database easier and faster. These files are needed for various voice building steps and for synthesis.  
     213 
     214'''!WaveTimelineMaker'''[[BR]] 
     215 
     216The !WaveTimelineMaker splits the waveforms into datagrams to be stored in a timeline in Mary format. It produces a binary file that contains all wave files.   
     217 
     218Configuration Settings: 
     219 
     220 * corrPmDir    - Directory Path for corrected pitch marks.  
     221 * !WaveTimeline - file containing all wave files. Will be created by this module 
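
To give an idea of what splitting a waveform into datagrams means, the Python sketch below cuts a list of samples at each pitch mark. It assumes that datagram boundaries follow the corrected pitch marks (suggested by the corrPmDir setting but not stated explicitly), and it does not model the binary timeline format. The function name is hypothetical.

```python
def split_into_datagrams(samples, pitch_marks, sample_rate):
    """Cut samples into consecutive chunks, each ending at a pitch mark.

    pitch_marks are times in seconds, in increasing order.
    """
    datagrams = []
    start = 0
    for t in pitch_marks:
        end = min(int(round(t * sample_rate)), len(samples))
        if end > start:  # skip empty or out-of-order chunks
            datagrams.append(samples[start:end])
            start = end
    if start < len(samples):
        datagrams.append(samples[start:])  # remainder after the last mark
    return datagrams
```

Concatenating the datagrams in order gives back the original samples, so no audio is lost in the split.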
     222 
     223'''!BasenameTimelineMaker''' 
     224 
     225The !BasenameTimelineMaker takes a database root directory and a list of basenames, and associates the basenames with absolute times in a timeline in Mary format. 
     226 
     227Configuration Settings: 
     228 
     229 * pmDir        - Directory containing the pitchmarks 
     230 * timelineFile - file containing the list of files and their times, which will be created by this module.  
     231 
     232'''MCepTimelineMaker''' 
     233 
     234The MCepTimelineMaker takes a database root directory and a list of basenames, and converts the mcep files of the related wav files into an mcep timeline in Mary format. 
     235 
     236Configuration Settings: 
     237 
     238 * mcepDir       - directory containing the mcep files   
     239 * mcepTimeline  - file containing all mcep files. Will be created by this module 
     240 
     241 
     242 
     243== 7. Building Acoustic Models == 
     244 
     245 
     246 
     247'''!PhoneUnitfileWriter'''[[BR]] 
     248 
     249It produces a file containing all phone sized units.  
     250 
     251Configuration Settings: 
     252 
     253 * corrPmDir  - Directory containing the corrected pitchmarks 
     254 * labelDir   - Directory containing the phone labels 
     255 * unitFile   - File containing all phone units. Will be created by this module 
     256 
     257 
     258'''!PhoneFeatureFileWriter'''[[BR]] 
     259 
     260It produces a file containing all the target cost features for the phone sized units. The module needs a file defining which features are to be used and what weights are given to them. They must be the same features as the ones that the !PhoneUnitFeatureComputer used. If you do not have a feature definition, the module tries to create one. 
     261 
     262 
     263For more information, see the example file: ''Marybase/lib/modules/import/examples/PhoneUnitFeatureDefinition.txt'' 
     264 
     265Configuration Settings: 
     266 
     267 * featureDir  - directory containing the phone features 
     268 * featureFile - file containing all phone units and their target cost features. Will be created by this module 
     269 * unitFile    - file containing all phone units 
     270 * weightsFile - file containing the list of phone target cost features, their values and weights 
     271 
     272'''DurationCARTTrainer'''[[BR]] 
     273 
     274It builds an acoustic model of durations in the database using the program "wagon" from the Edinburgh Speech tools. 
     275 
     276Configuration Settings: 
     277 
     278 * durTree          - file containing the duration CART. Will be created by this module 
     279 * estDir           - directory containing the local installation of the Edinburgh Speech Tools 
     280 * featureDir       - directory containing the phonefeatures 
     281 * featureFile      - file containing all phone units and their target cost features 
     282 * labelDir         - directory containing the phone labels 
     283 * stepwiseTraining - "false" or "true" 
     284 * unitFile         - file containing all phone units 
     285 * waveTimeline     - file containing all wave files 
     286 
     287'''F0CARTTrainer'''[[BR]] 
     288 
     289It builds acoustic models of F0 like DurationCARTTrainer. It uses "wagon" and the files produced by !PhoneUnitfileWriter and !PhoneFeatureFileWriter. 
     290 
     291Configuration Settings: 
     292 
     293 * estDir           - directory containing the local installation of the Edinburgh Speech Tools 
     294 * f0LeftTreeFile   - file containing the left f0 CART. Will be created by this module 
     295 * f0MidTreeFile    - file containing the middle f0 CART. Will be created by this module 
     296 * f0RightTreeFile  - file containing the right f0 CART. Will be created by this module 
     297 * featureDir       - directory containing the phonefeatures 
     298 * featureFile      - file containing all phone units and their target cost features 
     299 * labelDir         - directory containing the phone label files 
     300 * stepwiseTraining - "false" or "true"  
     301 * unitFile         - file containing all phone units 
     302 * waveTimeline     - file containing all wave files 
     303 
     304 
     305 
     306 
     307== 8. Unit Selection == 
     308 
     309 
     310'''!HalfPhoneUnitfileWriter'''[[BR]] 
     311 
     312It produces a file containing all halfphone sized units. 
     313 
     314 
     315Configuration Settings: 
     316 
     317 * corrPmDir - directory containing the corrected pitchmarks 
     318 * labelDir - directory containing the halfphone labels 
     319 * unitFile - file containing all halfphone units. Will be created by this module 
     320 
     321 
     322'''!HalfPhoneFeatureFileWriter'''[[BR]] 
     323 
     324It produces a file containing all the target cost features for the halfphone sized units. The module needs a file defining which features are to be used and what weights are given to them. They must be the same features as the ones that the !HalfPhoneUnitFeatureComputer used. If you do not have a feature definition, the module tries to create one. 
     325 
     326For more information, see the example file: ''Marybase/lib/modules/import/examples/HalfPhoneUnitFeatureDefinition.txt'' 
     327 
     328Configuration Settings: 
     329 
     330 * featureDir - directory containing the halfphone features 
     331 * featureFile - file containing all halfphone units and their target cost features. Will be created by this module 
     332 * unitFile - file containing all halfphone units 
     333 * weightsFile - file containing the list of halfphone target cost features, their values and weights 
     334 
     335 
     336'''!JoinCostFileMaker'''[[BR]] 
     337 
     338It produces a file containing all the join cost features for the halfphone sized units. 
     339 
     340Configuration Settings: 
     341 
     342 * joinCostFile - file containing all halfphone units and their join cost features. Will be created by this module 
     343 * mcepDir - directory containing the mcep files 
     344 * mcepTimeline - file containing all mcep files 
     345 * unitFile - file containing all halfphone units 
     346 * weightsFile - file containing the list of join cost features and their weights 
     347 
     348 
     349'''!AcousticFeatureFileWriter'''[[BR]] 
     350 
     351It produces a file containing all the target cost features plus two acoustic target cost features for the halfphone sized units. It also produces a feature definition containing those features. 
     352 
     353 * acFeatDef - file containing the list of phone target cost features, their values and weights 
     354 * acFeatureFile - file containing all halfphone units and their target cost features plus the acoustic target cost features. Will be created by this module. 
     355 * featureFile - file containing all halfphone units and their target cost features 
     356 * unitFile - file containing all halfphone units 
     357 * waveTimeLine - file containing all wave files 
     358 
     359 
     360'''CARTBuilder'''[[BR]] 
     361 
     362It builds a preselection tree for the target cost features using "wagon" (CART) from the Edinburgh Speech tools. 
     363 
     364Additionally, you need to specify either a feature sequence or a top level tree. They are used to build a basic tree that is extended by wagon. This way, wagon runs several times on smaller subsets of units rather than on the whole set. It might still take some time to run this module.  
     365 
     366 - Feature sequence: A file containing a list of features for which to build the tree. 
     367 - Top level tree: A file containing the basic tree. 
     368 
     369For more information on these two possibilities of specifying the basic tree, see the example files in ''Marybase/lib/modules/import/examples/'' 
     370 
     371If you give the CARTBuilder neither a feature sequence nor a top level tree file, a default feature sequence is created which contains only "mary_phoneme" as a feature. If the basic tree contains leaves that contain more units than the maximum number of units allowed, the leaves are pruned and a warning message is printed. It is recommended that you make sure that there are no leaves that are too big. 
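
The idea of a basic tree built from a feature sequence, and of checking its leaves against the maximum leaf size, can be sketched in Python. This is a simplified illustration: the function names and the dictionary representation of units are assumptions, the leaves are stored flat rather than as a real tree, and "stress" in the usage example is a hypothetical feature name.

```python
from collections import defaultdict


def build_basic_tree(units, feature_sequence):
    """Group unit ids into leaves keyed by their values for each feature.

    units maps a unit id to a {feature name: value} dictionary.
    """
    leaves = defaultdict(list)
    for unit_id, features in units.items():
        key = tuple(features[name] for name in feature_sequence)
        leaves[key].append(unit_id)
    return dict(leaves)


def oversized_leaves(leaves, max_leaf_size):
    """Return {leaf key: unit count} for every leaf above max_leaf_size."""
    return {key: len(ids) for key, ids in leaves.items()
            if len(ids) > max_leaf_size}
```

For example, with the default sequence ["mary_phoneme"] all units of the same phoneme end up in one leaf, which is likely to exceed the maximum leaf size for frequent phonemes. Adding a second feature such as "stress" to the sequence splits those leaves further.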
     372 
     373 
     374Configuration Settings: 
     375 
     376 * acFeatureFile - file containing all halfphone units and their target cost features plus the acoustic target cost features 
     377 * cartFile - file containing the preselection CART. Will be created by this module 
     378 * estDir - directory containing the local installation of the Edinburgh Speech Tools 
     379 * featureSeqFile - file containing the feature sequence for the basic tree 
     380 * maxLeafSize - the maximum number of units in a leaf of the basic tree 
     381 * mcepTimeline - file containing the mcep files 
     382 * readFeatureSequence - if "true", basic tree is read from feature sequence file; if "false", basic tree is read from top level tree file. 
     383 * topLevelTreeFile - file containing the basic tree 
     384 * unitFile - file containing all halfphone units 
     385 
     386 
     387'''CARTPruner'''[[BR]] 
     388 
     389It prunes the preselection tree and also removes outliers from it. 
     390 
     391Configuration Settings: 
     392 
     393 * cartFile - file containing the preselection CART 
     394 * prunedCartFile - file containing the pruned preselection CART. Will be created by this module 
     395 * unitFeatureFile - file containing all halfphone units and their target cost features 
     396 * unitFile - file containing all halfphone units 
     397 * waveFile - file containing all wave files 
     398 
     399 
     400 
     401== 9. Installation of the New Voice into MARY == 
     402 
     403 
     404'''!VoiceInstaller'''[[BR]] 
     405 
     406It installs the built voice into MARY automatically. It copies all the necessary files to a new subdirectory in the ''lib/voices/'' directory of your Mary installation. Furthermore, a file that specifies the properties of the voice is created and stored in the ''conf/'' directory of your Mary installation. The next time you start the Mary server, the voice is loaded.  
     407 
     408 
     409Configuration Settings: 
     410 
     411 * cartFile - file containing the preselection CART 
     412 * durTree - file containing the duration CART 
     413 * exampleText - file containing example text (for limited domain voices only) 
     414 * f0LeftTree - file containing the left f0 CART 
     415 * f0MidTree - file containing the mid f0 CART 
     416 * f0RightTree - file containing the right f0 CART 
     417 * halfPhoneFeatDefAc - file containing the list of halfphone target cost features, their values and weights 
     418 * halfPhoneFeatsAc - file containing all halfphone units and their target cost features plus the acoustic target cost features 
     419 * halfPhoneUnits - file containing all halfphone units 
     420 * joinCostFeatDef - file containing the list of join cost features and their weights 
     421 * joinCostFeats - file containing all halfphone units and their join cost features 
     422 * phoneFeatDef - file containing the list of phone target cost features, their values and weights 
     423 * waveTimeline - file containing all wave files  
     424 