Version 8 (modified by ingmar.steiner, 16 years ago) (diff)
correction to property name (which should not be confused with the corresponding property in the voice config)

Voice Import Tools: Explanation of Individual Voice Import Components

1. Feature Extraction from Acoustic Data

PraatPitchmarker

It computes pitch marks with the help of Praat. You need to compile or install Praat on your machine.

It also corrects the pitch marks so that they align with nearby zero crossings.

Configuration Settings:

command - Absolute path of the Praat executable (Note for Mac OS users: this should be /Applications/Praat.app/Contents/MacOS/Praat)
pmDir - Output directory for the Praat pitch marks
corrPmDir - Output directory for the corrected pitch marks (pitch marks moved towards zero crossings)
maxPitch, minPitch - Pitch range to use (e.g. male: 50-200 | female: 150-300)
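
The zero crossing correction can be pictured with a small sketch. This is only an illustration of the idea, not the code used by PraatPitchmarker: the function name, the definition of a zero crossing, and the max_shift limit are all assumptions made for this example.

```python
def snap_to_zero_crossings(samples, pitch_marks, max_shift):
    """Move each pitch mark (a sample index) to the nearest zero crossing.

    Assumption for this sketch: a zero crossing is an index i where
    samples[i - 1] and samples[i] differ in sign, or samples[i] is zero.
    Marks with no crossing within max_shift samples are left unchanged.
    """
    crossings = [
        i for i in range(1, len(samples))
        if samples[i] == 0 or (samples[i - 1] < 0) != (samples[i] < 0)
    ]
    corrected = []
    for mark in pitch_marks:
        nearby = [c for c in crossings if abs(c - mark) <= max_shift]
        if nearby:
            # Pick the closest crossing; ties go to the earlier one.
            corrected.append(min(nearby, key=lambda c: abs(c - mark)))
        else:
            corrected.append(mark)
    return corrected
```

In this picture, the uncorrected marks would correspond to the files in pmDir and the returned marks to the files in corrPmDir.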

SnackPitchmarker

It computes pitch marks with the help of Snack. To run this component, you need to install Snack on your machine.

It also corrects the pitch marks so that they align with nearby zero crossings.

Configuration Settings:

command - Absolute path of the Tcl program (tclsh or wish) (Note for ActiveTcl under Mac OS: take care to enter /usr/local/bin/tclsh, not /usr/bin/tclsh. Cf. this thread for details on this pitfall)
pmDir - Output directory for the Snack pitch marks
maxPitch, minPitch - Pitch range to use (e.g. male: 50-200 | female: 150-300)

MCEPMaker

It calculates MFCCs from the speech wave files, using the Edinburgh Speech Tools.

Configuration Settings:

estDir - Edinburgh Speech Tools Compiled Directory
pmDir - Praat Pitch marks Directory
corrPmDir - Corrected Pitch marks Directory
mcepDir - Output Dir for MFCCs

2. Support for Transcription Conversion

Festvox2MaryTranscripts

This component converts transcriptions in the Festvox format (e.g. txt.done.data) to the format supported by MARY, which uses an individual text file for each wave file. All voice import components read transcriptions in the MARY format, so this component is very useful if your transcriptions are in the Festvox format.

Configuration Settings:

transcriptFile - Festvox format transcription file (Absolute path)
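
As a rough sketch of what such a conversion involves, the following Python splits a Festvox prompt file into one text file per utterance. It is not the component's own code. It assumes the usual txt.done.data layout of one `( basename "sentence" )` entry per line; the function names and the `.txt` extension of the output files are assumptions made for this example.

```python
import re
from pathlib import Path

# One entry per line: ( basename "sentence text" )
ENTRY = re.compile(r'^\(\s*(\S+)\s+"(.*)"\s*\)\s*$')

def parse_festvox(text):
    """Return a {basename: sentence} dict parsed from txt.done.data content."""
    entries = {}
    for line in text.splitlines():
        match = ENTRY.match(line.strip())
        if match:
            basename, sentence = match.groups()
            # Undo backslash-escaped quotes inside the sentence.
            entries[basename] = sentence.replace('\\"', '"')
    return entries

def write_text_files(entries, out_dir):
    """Write one <basename>.txt file per utterance into out_dir."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for basename, sentence in entries.items():
        (out / f"{basename}.txt").write_text(sentence + "\n", encoding="utf-8")
```

Lines that do not match the expected pattern are skipped silently here; a real conversion would probably want to report them.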

Mary2FestvoxTranscripts

It converts transcriptions in the MARY format to a Festvox format transcription file. It is the inverse of the component above.

Configuration Settings:

transcriptFile - Output Festvox format transcription file (Absolute path)

3. Feature Vector Extraction from Text Data

PhoneUnitFeatureComputer

PhoneUnitFeatureComputer computes phone feature vectors for the unit selection voice building process.

Note: This module requires a running Mary server from your MARY installation.

You can connect to a different server by altering the settings. See the settings help for more information on this. Which features are computed depends on a configuration file called "targetfeatures.config". This configuration file is in the Marybase/conf/ directory and tells the server which feature vectors to compute.

Configuration Settings:

featureDir - Output Directory to place computed Phone feature vectors (Absolute path)
maryServerHost - Server Name
maryServerPort - Socket Port number (Default 59125)
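
Since this component depends on a running server, it can be handy to check that the host and port you entered actually accept connections before starting a long run. The helper below is a hypothetical convenience written for this page, not part of the voice import tools; it only tests that a TCP connection can be opened and says nothing about whether the process listening there is a Mary server.

```python
import socket

def server_is_reachable(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unresolvable host names.
        return False
```

With the default settings listed above, the call would be `server_is_reachable("localhost", 59125)`.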

HalfPhoneUnitFeatureComputer

This component works in the same way as the component above, but it computes halfphone level feature vectors. Here the file "halfphone-targetfeatures.config", which is in the Marybase/conf/ directory, tells the server which halfphone level feature vectors to compute.

Configuration Settings:

featureDir - Output Directory to place computed Half-Phone feature vectors (Absolute path)
maryServerHost - Server Name
maryServerPort - Socket Port number (Default 59125)

4. Automatic Labeling

EHMMLabeler

EHMMLabeler is a labeling tool which generates label files from the wave files and the corresponding transcriptions. The basic EHMM tool is available with recent versions of Festvox. To run EHMMLabeler in the MARY environment, you need to compile the EHMM tool on your machine. Labeling may take a long time, depending on the size of the data and the system configuration.

EHMMLabeler Supports:

Database labeling with forced alignment, training with flat-start initialization
Database labeling with forced alignment, training with initialized models (re-training)
Database labeling with forced alignment, using already existing models (decoding only)

Configuration Settings:

ehmmDir - EHMM basic package compilation Directory.
eDir - Directory name (Absolute path) to copy Transcription (in ehmm Supported format) and to store ehmm model.
featureDir - Feature vectors Directory path, where phone features vectors were computed. (To get phone sequence)
startEhmmModelDir - Already existing EHMM model Directory path to Initialize EHMM models (for Re-training or Decoding)
reTrainFlag - (true | false) If true, do re-training, initializing with the given models. If false, do decoding only.
outputLabDir - Dir. Path to store generated Labels

Note: It may be necessary to run the LabelPauseDeleter after the label files have been created by EHMM, to avoid problems with subsequent voice building components.

Automatic Labeling using Sphinx Tools:

The SphinxLabelingPreparator, SphinxTrainer and SphinxLabeler components are used to do automatic labeling with the Sphinx tools. These three components need SphinxTrain, the Sphinx decoder and the Edinburgh Speech Tools for training models and for forced alignment.

SphinxLabelingPreparator

This component prepares the setup that SphinxTrain needs in order to train models.

Configuration Settings:

estDir - Edinburgh Speech Tools Compiled Directory
maryServerHost - Server Name
maryServerPort - Socket Port number (Default 59125)
sphinxTrainDir - SphinxTrain installation Directory
stDir - Directory name (Absolute path) to copy Dictionaries and Temp. files (in Sphinx Supported format).
transcriptFile - Festvox format transcription file (Absolute path)

SphinxTrainer

It trains the models required for labeling, using SphinxTrain. This may take a long time, depending on the size of the data and the system configuration.

Configuration Settings:

stDir - Absolute path of directory where all Dictionaries and Temp. files stored by SphinxLabelingPreparator.

SphinxLabeler

It produces labels with the help of the models built by the SphinxTrainer. It uses the Sphinx-2 decoder for forced alignment.

Configuration Settings:

sphinx2Dir - Sphinx-2 Installation directory absolute path.
stDir - Absolute path of directory where all Dictionaries, Temp. files and models stored by SphinxLabelingPreparator and SphinxTrainer.

MRPALabelConverter

If you have labeled data in the Festvox format that uses the MRPA phoneset, use this module to convert the phones into the phoneset used by Mary.

Configuration Settings:

mrpaLabDir - MRPA Label file directory

5. Label or Pause Correction and Label-Feature Alignment

LabelledFilesInspector

It allows the user to browse through the aligned labels and listen to the corresponding wave files. It is useful for manual, perceptual verification of the alignment.

Configuration Settings:

corrPmDir - Directory Path for corrected pitch marks.

PhoneUnitLabelComputer and HalfPhoneUnitLabelComputer

These components convert the label files into the label files used by Mary. PhoneUnitLabelComputer produces phone labels, HalfPhoneUnitLabelComputer produces halfphone labels. You need both to build the voice.

Configuration Settings:

labelDir - Output directory for the phone labels (PhoneUnitLabelComputer) or for the halfphone labels (HalfPhoneUnitLabelComputer).

PhoneLabelFeatureAligner

It tries to align the labels and the feature vectors. If alignment fails, you can start the automatic pause correction.

This works as follows:

Pauses that are in the label file but not in the feature file are deleted from the label file, and the durations of the previous and next labels are stretched.
Pauses that are in the feature file but not in the label file are inserted into the label file with length zero.

If there are still errors after the pause correction, you are prompted for each error. You can skip the error or remove the corresponding file from the basename list (the list of files that are used for your voice). "skip all" and "remove all" do this for all problematic files. "Edit unit labels" allows you to edit the label file. "Edit RAWMARYXML" lets you edit the maryxml that is the input for computing the features. A Mary server must be running in order to recompute the features from the maryxml. You can change the host and port of the server in the settings of the UnitFeatureComputer.
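
The two pause correction rules can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the aligner's actual code: the pause symbol "_", the equal split of a deleted pause between its neighbours, and the function name are all choices made for this example.

```python
PAUSE = "_"  # assumed pause symbol; check the phone set of your voice

def correct_pauses(labels, feature_phones):
    """Align (phone, duration) labels with the phone sequence of the features.

    Returns the corrected label list. Raises ValueError when the two
    sequences differ in something other than a pause.
    """
    result = []
    carry = 0.0  # duration of a deleted pause handed to the next label
    i = j = 0
    while i < len(labels) or j < len(feature_phones):
        label = labels[i] if i < len(labels) else None
        feat = feature_phones[j] if j < len(feature_phones) else None
        if label is not None and label[0] == feat:
            result.append((label[0], label[1] + carry))
            carry = 0.0
            i += 1
            j += 1
        elif label is not None and label[0] == PAUSE:
            # Pause only in the labels: delete it, stretch the neighbours.
            share = label[1] / 2 if result else 0.0
            if result:
                phone, duration = result[-1]
                result[-1] = (phone, duration + share)
            carry += label[1] - share
            i += 1
        elif feat == PAUSE:
            # Pause only in the features: insert it with length zero.
            result.append((PAUSE, 0.0))
            j += 1
        else:
            raise ValueError(f"labels and features differ at label {i}, feature {j}")
    if carry and result:
        # A deleted pause at the very end stretches the last label.
        phone, duration = result[-1]
        result[-1] = (phone, duration + carry)
    return result
```

The ValueError case corresponds to the situation in which the tool prompts you to skip, remove or edit the file.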

Configuration Settings:

featureDir - Phone feature vectors directory
labDir - Phone Labels directory

HalfPhoneLabelFeatureAligner

It works in the same way as PhoneLabelFeatureAligner, but for halfphone units.

Configuration Settings:

featureDir - Half Phone feature vectors directory
labDir - Half Phone Labels directory

QualityControl

Quality control component for the Voice Import Tool, which performs a 'sensibility check' on the data. It identifies suspicious labels in the label files generated by automatic labeling. It also assigns a cost to each wave file. Wave files with a high cost should be given high priority for manual labeling.

Configuration Settings:

featureDir - directory containing the phone features.
labelDir - directory containing the phone labels
markFricativeHighFreqEnergy - if true, mark fricatives whose high-frequency energy is very low
markHighSILEnergy - if true, mark silences with high energy
markUnusuallyLongPhone - if true, mark unusually long phones
markUnvoicedVowel - if true, mark unvoiced vowels
outPriorityFile - Output file which shows sorted suspicious aligned basenames according to a priority
outputFile - Output file which shows suspicious alignments

6. Basic Data Files

The following components create the basic binary files which contain the whole voice database, so that access to the database is easier and faster. These files are needed for various voice building steps and for synthesis.

WaveTimelineMaker

The WaveTimelineMaker splits the waveforms into datagrams to be stored in a timeline in Mary format. It produces a binary file which contains all wave files.

Configuration Settings:

corrPmDir - Directory Path for corrected pitch marks.
WaveTimeline - file containing all wave files. Will be created by this module

BasenameTimelineMaker

The BasenameTimelineMaker takes a database root directory and a list of basenames, and associates the basenames with absolute times in a timeline in Mary format.

Configuration Settings:

pmDir - Directory containing the pitchmarks
timelineFile - file containing the list of files and their times, which will be created by this module.
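
To make "associating basenames with absolute times" concrete, here is a minimal sketch. It assumes that the duration of each file is already known and that the files are laid end to end in basename list order; the function names and the in-memory representation are hypothetical and say nothing about the actual timeline file format.

```python
from bisect import bisect_right

def build_index(durations):
    """Return (names, starts, total) for ordered (basename, seconds) pairs."""
    names, starts, clock = [], [], 0.0
    for basename, duration in durations:
        names.append(basename)
        starts.append(clock)  # absolute time at which this file begins
        clock += duration
    return names, starts, clock

def basename_at(names, starts, total, time):
    """Return the basename whose span contains the given absolute time."""
    if not 0 <= time < total:
        raise ValueError("time lies outside the timeline")
    return names[bisect_right(starts, time) - 1]
```

Such an index lets a later step map any position in the concatenated database back to the file it came from.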

MCepTimelineMaker

The MCepTimelineMaker takes a database root directory and a list of basenames, and converts the related wav files into a mcep timeline in Mary format.

Configuration Settings:

mcepDir - directory containing the mcep files
mcepTimeline - file containing all mcep files. Will be created by this module

7. Building Acoustic Models

PhoneUnitfileWriter

It produces a file containing all phone sized units.

Configuration Settings:

corrPmDir - Directory containing the corrected pitchmarks
labelDir - Directory containing the phone labels
unitFile - File containing all phone units. Will be created by this module

PhoneFeatureFileWriter

It produces a file containing all the target cost features for the phone sized units. The module needs a file defining which features are to be used and what weights are given to them. They must be the same features as the ones that the PhoneUnitFeatureComputer used. If you do not have a feature definition, the module tries to create one.

For more information, see the example file: Marybase/lib/modules/import/examples/PhoneUnitFeatureDefinition.txt

Configuration Settings:

featureDir - directory containing the phone features
featureFile - file containing all phone units and their target cost features. Will be created by this module
unitFile - file containing all phone units
weightsFile - file containing the list of phone target cost features, their values and weights

DurationCARTTrainer

It builds an acoustic model of the durations in the database, using the program "wagon" from the Edinburgh Speech Tools.

Configuration Settings:

durTree - file containing the duration CART. Will be created by this module
estDir - directory containing the local installation of the Edinburgh Speech Tools
featureDir - directory containing the phone features
featureFile - file containing all phone units and their target cost features
labelDir - directory containing the phone labels
stepwiseTraining - "false" or "true"
unitFile - file containing all phone units
waveTimeline - file containing all wave files

F0CARTTrainer

It builds acoustic models of F0 like DurationCARTTrainer. It uses "wagon" and the files produced by PhoneUnitfileWriter and PhoneFeatureFileWriter.

Configuration Settings:

estDir - directory containing the local installation of the Edinburgh Speech Tools
f0LeftTreeFile - file containing the left f0 CART. Will be created by this module
f0MidTreeFile - file containing the middle f0 CART. Will be created by this module
f0RightTreeFile - file containing the right f0 CART. Will be created by this module
featureDir - directory containing the phone features
featureFile - file containing all phone units and their target cost features
labelDir - directory containing the phone label files
stepwiseTraining - "false" or "true"
unitFile - file containing all phone units
waveTimeline - file containing all wave files

8. Unit Selection

HalfPhoneUnitfileWriter

It produces a file containing all halfphone sized units.

Configuration Settings:

corrPmDir - directory containing the corrected pitchmarks
labelDir - directory containing the halfphone labels
unitFile - file containing all halfphone units. Will be created by this module

HalfPhoneFeatureFileWriter

It produces a file containing all the target cost features for the halfphone sized units. The module needs a file defining which features are to be used and what weights are given to them. They must be the same features as the ones that the HalfPhoneUnitFeatureComputer used. If you do not have a feature definition, the module tries to create one.

For more information, see the example file: Marybase/lib/modules/import/examples/HalfPhoneUnitFeatureDefinition.txt

Configuration Settings:

featureDir - directory containing the halfphone features
featureFile - file containing all halfphone units and their target cost features. Will be created by this module
unitFile - file containing all halfphone units
weightsFile - file containing the list of halfphone target cost features, their values and weights

JoinCostFileMaker

It produces a file containing all the join cost features for the halfphone sized units.

Configuration Settings:

joinCostFile - file containing all halfphone units and their join cost features. Will be created by this module
mcepDir - directory containing the mcep files
mcepTimeline - file containing all mcep files
unitFile - file containing all halfphone units
weightsFile - file containing the list of join cost features and their weights

AcousticFeatureFileWriter

It produces a file containing all the target cost features plus two acoustic target cost features for the halfphone sized units. It also produces a feature definition containing those features.

Configuration Settings:

acFeatDef - file containing the list of phone target cost features, their values and weights
acFeatureFile - file containing all halfphone units and their target cost features plus the acoustic target cost features. Will be created by this module.
featureFile - file containing all halfphone units and their target cost features
unitFile - file containing all halfphone units
waveTimeLine - file containing all wave files

CARTBuilder

It builds a preselection tree for the target cost features, using "wagon" (CART) from the Edinburgh Speech Tools.

Additionally, you need to specify either a feature sequence or a top level tree. They are used to build a basic tree that is extended by wagon. This way, wagon runs several times on smaller subsets of units rather than on the whole set. It might still take some time to run this module.

Feature sequence: A file containing a list of features for which to build the tree.
Top level tree: A file containing the basic tree.

For more information on these two possibilities of specifying the basic tree, see the example files in Marybase/lib/modules/import/examples/

If you give the CARTBuilder neither a feature sequence nor a top level tree file, a default feature sequence is created which contains only "mary_phoneme" as a feature. If the basic tree contains leaves that hold more units than the maximum number of units allowed, the leaves are pruned and a warning message is printed. It is recommended that you make sure that there are no leaves that are too big.

Configuration Settings:

acFeatureFile - file containing all halfphone units and their target cost features plus the acoustic target cost features
cartFile - file containing the preselection CART. Will be created by this module
estDir - directory containing the local installation of the Edinburgh Speech Tools
featureSeqFile - file containing the feature sequence for the basic tree
maxLeafSize - the maximum number of units in a leaf of the basic tree
mcepTimeline - file containing the mcep files
readFeatureSequence - if "true", basic tree is read from feature sequence file; if "false", basic tree is read from top level tree file.
topLevelTreeFile - file containing the basic tree
unitFile - file containing all halfphone units
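
One way to picture the basic tree built from a feature sequence is as a partition of the units by their feature values, where each distinct combination of values becomes a leaf. The sketch below follows that picture to show how a feature sequence affects leaf sizes. It is a simplified model written for this page, not the CARTBuilder implementation; the function names and the feature "mary_stressed" used in the usage note are hypothetical.

```python
from collections import defaultdict

def build_basic_tree(units, feature_sequence):
    """Group unit indices by their values for the features in the sequence.

    units is a list of dicts mapping feature names to values. Each distinct
    tuple of values stands for one leaf of the basic tree.
    """
    leaves = defaultdict(list)
    for index, unit in enumerate(units):
        key = tuple(unit[name] for name in feature_sequence)
        leaves[key].append(index)
    return dict(leaves)

def oversized_leaves(leaves, max_leaf_size):
    """Return {leaf key: unit count} for leaves above max_leaf_size."""
    return {
        key: len(indices)
        for key, indices in leaves.items()
        if len(indices) > max_leaf_size
    }
```

For example, splitting on ["mary_phoneme"] alone tends to give a few large leaves, while a longer sequence such as ["mary_phoneme", "mary_stressed"] gives more and smaller ones. Checking oversized_leaves against your maxLeafSize before running the module shows whether the warning about pruned leaves is to be expected.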

CARTPruner

It prunes the preselection tree and also removes outliers from it.

Configuration Settings:

cartFile - file containing the preselection CART
prunedCartFile - file containing the pruned preselection CART. Will be created by this module
unitFeatureFile - file containing all halfphone units and their target cost features
unitFile - file containing all halfphone units
waveFile - file containing all wave files

9. Installation of New Voice into MARY

VoiceInstaller

It installs the built voice into MARY automatically. It copies all the necessary files to a new subdirectory in the lib/voices/ directory of your Mary installation. Furthermore, a file that specifies the properties of the voice is created and stored in the conf/ directory of your Mary installation. The next time you start the Mary server, the voice is loaded.

Configuration Settings:

cartFile - file containing the preselection CART
durTree - file containing the duration CART
exampleText - file containing example text (for limited domain voices only)
f0LeftTree - file containing the left f0 CART
f0MidTree - file containing the mid f0 CART
f0RightTree - file containing the right f0 CART
halfPhoneFeatDefAc - file containing the list of halfphone target cost features, their values and weights
halfPhoneFeatsAc - file containing all halfphone units and their target cost features plus the acoustic target cost features
halfPhoneUnits - file containing all halfphone units
joinCostFeatDef - file containing the list of join cost features and their weights
joinCostFeats - file containing all halfphone units and their join cost features
phoneFeatDef - file containing the list of phone target cost features, their values and weights
waveTimeline - file containing all wave files

There is experimental support for FD-PSOLA based synthesis; this can be enabled by setting the concatenatorClass property to FdpsolaUnitConcatenator.

There is also experimental support for HNM based synthesis. To build an HNM voice, make sure the line containing

marytts.tools.voiceimport.HnmTimelineMaker

is enabled (not commented out with #) in the importMain.config file, then run the voice import tools and run the HnmTimelineMaker in addition to the other components. Install the HNM voice by changing the waveTimeline property to point to the HNM timeline file (timeline_hnm.mry by default), and by setting the properties concatenatorClass and timelineReaderClass to HnmUnitConcatenator and HnmTimelineReader, respectively.
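
If you are unsure whether a component is enabled, a quick check like the following may help. It is a hypothetical helper, not part of the voice import tools, and it assumes that importMain.config lists class names as whitespace-separated tokens on lines that are disabled by a leading #.

```python
def is_enabled(config_text, class_name):
    """Return True if class_name appears on a line that is not commented out."""
    for raw_line in config_text.splitlines():
        line = raw_line.strip()
        if line.startswith("#"):
            continue  # commented out, so the component is disabled
        if class_name in line.split():
            return True
    return False
```

You would read the contents of importMain.config into a string and call `is_enabled(text, "marytts.tools.voiceimport.HnmTimelineMaker")`.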
