Version 12 (modified by sach01, 17 years ago)
Voice Import Tools Tutorial: How to Build a New Voice with Voice Import Tools
This tutorial explains how to build a new voice with the Voice Import Tools (VIT) in the MARY environment.
The Voice Import Tool is a graphical user interface (GUI) that contains a set of Voice Import Components and helps the user build new voices in the MARY (Modular Architecture for Research in speech sYnthesis) environment. The GUI is primarily designed so that any user can build new voices easily, without knowing many technical details of speech synthesis.
Currently, the Voice Import Tool mainly supports the following categories:
- Feature Extraction from Acoustic Data
- Feature Vector Extraction from Text Data
- Automatic Labeling
- Unit Selection
- Voice Installation to MARY
Requirements:
- Operating system - Linux (recommended)
- A recent version of MARY TTS - download link: http://mary.dfki.de/Download
- Openmary - SVN version from http://mary.opendfki.de
(Windows can also be used, provided the dependent tools below can be compiled properly.)
Dependent Tools:
- Praat pitch marker or Snack - for pitch marks
Download link for Praat: http://www.fon.hum.uva.nl/praat
- Edinburgh Speech Tools Library - for MFCCs and Wagon (CART)
Download link for the Speech Tools: http://www.cstr.ed.ac.uk/projects/speech_tools/
- EHMM or Sphinx - for automatic labeling
EHMM is available with festvox-2.1 (recent version): http://festvox.org/download.html
Sphinx: http://cmusphinx.sourceforge.net/webpage/html/download.php
Voice Import Components:
The following components are available:
- PraatPitchmarker
- SnackPitchmarker
- MCEPMaker
- Festvox2MaryTranscripts
- Mary2FestvoxTranscripts
- PhoneUnitFeatureComputer
- HalfPhoneUnitFeatureComputer
- EHMMLabeler
- SphinxLabelingPreparator
- SphinxTrainer
- SphinxLabeler
- MRPALabelConverter
- HalfPhoneUnitfileWriter
- HalfPhoneFeatureFileWriter
- JoinCostFileMaker
- AcousticFeatureFileWriter
- CARTBuilder
- CARTPruner
- VoiceInstaller
How to run?
- First, you need the following two basic inputs for voice building:
- Wave files
- Corresponding transcriptions (in MARY or Festival format)
- Create a new voice building directory.
- Put all wave files in the "wav" directory.
- Run the commands below as a shell script from the voice building directory:
export MARY_BASE="/path/to/mary"
CP=$MARY_BASE/java:$MARY_BASE/java/mary-common.jar:$MARY_BASE/java/signalproc.jar
CP=$CP:$MARY_BASE/java/freetts.jar:$MARY_BASE/java/jsresources.jar:$MARY_BASE/java/log4j-1.2.8.jar
java -Xmx1024m -classpath $CP -Djava.endorsed.dirs=$MARY_BASE/lib/endorsed \
  de.dfki.lt.mary.unitselection.voiceimport.DatabaseImportMain
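If you prefer to launch the tool from a script, the same command can be assembled in Python. This is a minimal sketch assuming a Linux system (":" as the classpath separator) and the jar names used in the shell command above; `build_import_command` is a hypothetical helper, not part of MARY.

```python
# Jar files from the shell command above, assumed to live in $MARY_BASE/java
JARS = ["mary-common.jar", "signalproc.jar", "freetts.jar",
        "jsresources.jar", "log4j-1.2.8.jar"]
MAIN_CLASS = "de.dfki.lt.mary.unitselection.voiceimport.DatabaseImportMain"


def build_import_command(mary_base):
    """Return the java command (as an argument list) that starts the Voice Import Tools GUI."""
    java_dir = f"{mary_base}/java"
    # Classpath: the java directory itself, then each jar (':' separator, as on Linux)
    classpath = ":".join([java_dir] + [f"{java_dir}/{jar}" for jar in JARS])
    return [
        "java", "-Xmx1024m",
        "-classpath", classpath,
        f"-Djava.endorsed.dirs={mary_base}/lib/endorsed",
        MAIN_CLASS,
    ]


if __name__ == "__main__":
    # "/path/to/mary" is a placeholder for your MARY installation directory
    print(" ".join(build_import_command("/path/to/mary")))
    # To launch the GUI instead: import subprocess; subprocess.run(build_import_command(...), check=True)
```

Passing an argument list to `subprocess.run`, rather than one long string, avoids the line-continuation pitfalls of a multi-line shell classpath.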
The voice building GUI is shown in the attached screenshots.
When you run the above shell script for the first time, it asks for some basic configuration settings in the Global Configuration Settings window:
Global Configuration Settings:
- Domain - general or limited
- Gender - male or female
- Locale - language of the voice (de - German, en - English)
(Currently, MARY supports only two languages: German and English.)
- Marybase - MARY installation directory (absolute path)
- Rootdir - voice building directory (absolute path)
- Wavdir - directory containing the wave files
- Textdir - directory containing the corresponding transcriptions
Each component also has its own configuration settings, which are passed to it as the arguments for its task. We recommend using absolute paths in all configuration settings.
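Since a mistyped setting may only surface late in a long build, it can help to check the global settings up front. The sketch below assumes the settings are held in a plain Python dict whose (lowercase) keys mirror the setting names in this tutorial; the tool itself stores them in its own configuration file, which is not reproduced here, and `check_global_settings` is a hypothetical helper.

```python
import os

# Allowed values as described in the Global Configuration Settings above
ALLOWED = {
    "domain": {"general", "limited"},
    "gender": {"male", "female"},
    "locale": {"de", "en"},  # the two languages MARY currently supports
}
PATH_KEYS = ("marybase", "rootdir", "wavdir", "textdir")


def check_global_settings(settings):
    """Return a list of problems found in a dict of global settings (empty list = OK)."""
    problems = []
    for key, allowed in ALLOWED.items():
        if settings.get(key) not in allowed:
            problems.append(f"{key} must be one of {sorted(allowed)}")
    for key in PATH_KEYS:
        path = settings.get(key, "")
        # Absolute paths are recommended for all configuration settings
        if not os.path.isabs(path):
            problems.append(f"{key} should be an absolute path, got {path!r}")
    return problems


example = {
    "domain": "general", "gender": "female", "locale": "en",
    "marybase": "/opt/mary", "rootdir": "/data/myvoice",
    "wavdir": "/data/myvoice/wav", "textdir": "/data/myvoice/text",
}
print(check_global_settings(example))  # prints []
```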
The simplest way to use the Voice Import Components:
- Fill in the configuration settings for every component.
- Tick all components.
- Click the RUN button.
The tool then performs all tasks sequentially.
However, you do not need to use all components to build a new voice.
For example, for automatic labeling you can choose either EHMM or Sphinx.
Explanation of the Individual Voice Import Components
Feature Extraction from Acoustic Data
PraatPitchmarker
It computes pitch marks with the help of Praat, so you need to compile or install Praat on your machine.
It also corrects the pitch marks by moving them to the nearest zero crossing.
Configuration Settings:
- command - absolute path of the Praat executable
- pmDir - output directory for the Praat pitch marks
- corrPmDir - output directory for the corrected pitch marks (pitch marks moved to the nearest zero crossing)
- maxPitch, minPitch - pitch range in Hz (e.g. male: 50-200, female: 150-300)
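The zero-crossing correction can be pictured as follows. This is a sketch of the idea only, not the PraatPitchmarker code: it assumes the pitch marks are given as sample indices and moves each one to the nearest sample where the signal changes sign, within a search radius `max_shift` (a hypothetical parameter).

```python
def nearest_zero_crossing(samples, index, max_shift):
    """Return the index of the zero crossing closest to `index`.

    Position i counts as a zero crossing if samples[i] is exactly zero or
    samples[i-1] and samples[i] lie on different sides of zero. If none is
    found within max_shift samples, the original index is returned.
    """
    def is_crossing(i):
        if i <= 0 or i >= len(samples):
            return False
        return samples[i] == 0 or (samples[i - 1] < 0) != (samples[i] < 0)

    for shift in range(max_shift + 1):
        # Look in both directions, preferring the earlier sample on ties
        for candidate in (index - shift, index + shift):
            if is_crossing(candidate):
                return candidate
    return index


def correct_pitchmarks(samples, pitchmarks, max_shift=40):
    """Move every pitch mark to its nearest zero crossing."""
    return [nearest_zero_crossing(samples, pm, max_shift) for pm in pitchmarks]


samples = [3, 2, 1, -1, -2, -1, 1, 2]
print(correct_pitchmarks(samples, [1, 5], max_shift=3))  # prints [3, 6]
```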
MCEPMaker
It calculates MFCCs from the speech wave files using the Edinburgh Speech Tools.
Configuration Settings:
- estDir - directory of the compiled Edinburgh Speech Tools
- pmDir - Praat pitch marks directory
- corrPmDir - corrected pitch marks directory
- mcepDir - output directory for the MFCCs
Support for Transcription Conversion
Festvox2MaryTranscripts
This component converts a transcription in Festvox format (e.g. txt.done.data) into the format supported by MARY, which uses an individual text file for each wave file. All Voice Import Components read transcriptions in MARY format, so this component is useful if your transcription is in Festvox format.
Configuration Settings:
- transcriptFile - Festvox format transcription file (Absolute path)
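The conversion amounts to splitting one file into many. The sketch below assumes the usual txt.done.data line shape `( basename "text" )` and writes one `basename.txt` file per utterance; the `.txt` naming and the helper names are assumptions, not taken from Festvox2MaryTranscripts.

```python
import os
import re

# One utterance per line, e.g.:  ( arctic_a0001 "Author of the danger trail." )
LINE_RE = re.compile(r'^\(\s*(\S+)\s+"(.*)"\s*\)\s*$')


def parse_txt_done_data(lines):
    """Return a dict mapping basename -> transcription text."""
    transcripts = {}
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match:  # blank or malformed lines are skipped
            transcripts[match.group(1)] = match.group(2)
    return transcripts


def write_mary_transcripts(transcripts, text_dir):
    """Write one text file per wave file (basename.txt is an assumed naming scheme)."""
    os.makedirs(text_dir, exist_ok=True)
    for basename, text in transcripts.items():
        with open(os.path.join(text_dir, basename + ".txt"), "w", encoding="utf-8") as f:
            f.write(text + "\n")


data = ['( arctic_a0001 "Author of the danger trail, Philip Steels, etc." )', '']
print(parse_txt_done_data(data))
```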
Mary2FestvoxTranscripts
It converts transcriptions from MARY format into a Festvox-format transcription file, i.e. the reverse of the component above.
Configuration Settings:
- transcriptFile - Output Festvox format transcription file (Absolute path)
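The reverse direction collects the per-file texts into one file. Again a sketch under assumptions: per-file transcriptions are taken to be `*.txt` files, and embedded double quotes are escaped with a backslash, as in Scheme strings; the helper names are hypothetical.

```python
import os


def read_mary_transcripts(text_dir):
    """Collect basename -> text from per-file transcriptions (assumed *.txt files)."""
    transcripts = {}
    for name in sorted(os.listdir(text_dir)):
        if name.endswith(".txt"):
            with open(os.path.join(text_dir, name), encoding="utf-8") as f:
                transcripts[name[:-4]] = f.read().strip()
    return transcripts


def format_txt_done_data(transcripts):
    """Render the dict as Festvox txt.done.data lines, sorted by basename."""
    lines = []
    for basename in sorted(transcripts):
        # Escape embedded double quotes so each line stays parseable
        text = transcripts[basename].replace('"', '\\"')
        lines.append(f'( {basename} "{text}" )')
    return "\n".join(lines) + "\n"


print(format_txt_done_data({"b": "Two", "a": "One"}), end="")
```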
Feature Vector Extraction from Text Data
PhoneUnitFeatureComputer
PhoneUnitFeatureComputer computes phone feature vectors for building a unit selection voice.
- Note: This module requires a running MARY server from your MARY installation.
You can connect to a different server by changing the settings; see the settings help for more information. Which features are computed depends on the configuration file "targetfeatures.config" in the Marybase/conf/ directory, which tells the server which feature vectors to compute.
Configuration Settings:
- featureDir - output directory for the computed phone feature vectors (absolute path)
- maryServerHost - server host name
- maryServerPort - socket port number (default: 59125)
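Because this component fails without a server, it can be worth checking the connection before starting. This minimal sketch only verifies that something accepts TCP connections on maryServerHost:maryServerPort; it does not speak the MARY protocol, and `mary_server_reachable` is a hypothetical helper.

```python
import socket


def mary_server_reachable(host="localhost", port=59125, timeout=2.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable or timed out
        return False


if __name__ == "__main__":
    print("MARY server reachable:", mary_server_reachable())
```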
HalfPhoneUnitFeatureComputer
This component works like the one above, but computes halfphone-level feature vectors. Here, the file "halfphone-targetfeatures.config" in the Marybase/conf/ directory tells the server which halfphone-level feature vectors to compute.
Configuration Settings:
- featureDir - Output Directory to place computed Half-Phone feature vectors (Absolute path)
- maryServerHost - Server Name
- maryServerPort - Socket Port number (Default 59125)
Automatic Labeling
EHMMLabeler
EHMMLabeler is a labeling tool that generates label files from the wave files and their transcriptions. The basic EHMM tool is included in recent versions of Festvox. To run EHMMLabeler in the MARY environment, you need to compile the EHMM tool on your machine. Labeling may take a long time, depending on the size of the data and the system configuration.
EHMMLabeler supports:
- Database labeling with forced alignment after training with flat-start initialization
- Database labeling with forced alignment after training from initialized models (re-training)
- Database labeling with forced alignment using existing models (decoding only)
Configuration Settings:
- ehmmDir - directory where the basic EHMM package was compiled
- eDir - directory (absolute path) where the transcription is copied (in EHMM format) and the EHMM models are stored
- featureDir - directory containing the computed phone feature vectors (used to get the phone sequence)
- startEhmmModelDir - directory of existing EHMM models used for initialization (for re-training or decoding)
- reTrainFlag - (true | false) true: re-train starting from the given models; false: decode only
- outputLabDir - directory where the generated labels are stored
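How startEhmmModelDir and reTrainFlag select among the three modes listed above can be summarized as a small decision function. This reflects our reading of the settings description, not code from EHMMLabeler; in particular, treating an empty startEhmmModelDir as "flat start" is an assumption.

```python
from enum import Enum


class EhmmMode(Enum):
    FLAT_START = "train from flat-start initialization, then force-align"
    RETRAIN = "re-train from the given models, then force-align"
    DECODE_ONLY = "force-align with the existing models only"


def ehmm_mode(start_ehmm_model_dir, re_train_flag):
    """Map the EHMMLabeler settings to one of the three labeling modes."""
    if not start_ehmm_model_dir:
        # No initial models given: assume training starts from scratch
        return EhmmMode.FLAT_START
    return EhmmMode.RETRAIN if re_train_flag else EhmmMode.DECODE_ONLY


print(ehmm_mode("/data/myvoice/ehmm/mod", False).value)
```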
Automatic Labeling using Sphinx Tools:
The SphinxLabelingPreparator, SphinxTrainer and SphinxLabeler components perform automatic labeling with the Sphinx tools. These three components need SphinxTrain, the Sphinx decoder and the Edinburgh Speech Tools for training models and forced alignment.
SphinxLabelingPreparator
This component prepares the setup that SphinxTrain needs to train models.
Configuration Settings:
- estDir - directory of the compiled Edinburgh Speech Tools
- maryServerHost - server host name
- maryServerPort - socket port number (default: 59125)
- sphinxTrainDir - SphinxTrain installation directory
- stDir - directory (absolute path) where the dictionaries and temporary files (in Sphinx format) are stored
- transcriptFile - Festvox format transcription file (absolute path)
SphinxTrainer
It trains the models required for labeling using SphinxTrain. This may take a long time, depending on the size of the data and the system configuration.
Configuration Settings:
- stDir - absolute path of the directory where SphinxLabelingPreparator stored the dictionaries and temporary files
SphinxLabeler
It produces labels with the models built by SphinxTrainer, using the Sphinx-2 decoder for forced alignment.
Configuration Settings:
- sphinx2Dir - absolute path of the Sphinx-2 installation directory
- stDir - absolute path of the directory where SphinxLabelingPreparator and SphinxTrainer stored the dictionaries, temporary files and models
MRPALabelConverter
If you have labeled data in Festvox format that uses the MRPA phoneset, use this module to convert the phones into the phoneset used by MARY.
Configuration Settings:
- mrpaLabDir - MRPA Label file directory
LabelledFilesInspector
It lets the user browse through the aligned labels and listen to the corresponding wave files, which is useful for verifying the alignment manually by ear.
Configuration Settings:
- corrPmDir - Directory Path for corrected pitch marks.
PhoneUnitLabelComputer and HalfPhoneUnitLabelComputer
These components convert the label files into the label format used by MARY. PhoneUnitLabelComputer produces phone labels and HalfPhoneUnitLabelComputer produces halfphone labels; you need both to build the voice.
Configuration Settings:
- labelDir - output directory for the phone labels (PhoneUnitLabelComputer) or for the halfphone labels (HalfPhoneUnitLabelComputer)
PhoneLabelFeatureAligner
It tries to align the labels and the feature vectors. If alignment fails, you can start the automatic pause correction.
This works as follows:
- Pauses that are in the label file but not in the feature file are deleted from the label file, and the durations of the previous and next labels are stretched.
- Pauses that are in the feature file but not in the label file are inserted into the label file with zero length.
If errors remain after the pause correction, you are prompted for each one. You can skip the error or remove the corresponding file from the basename list (the list of files used for your voice); "skip all" and "remove all" do this for all problematic files. "Edit unit labels" lets you edit the label file, and "Edit RAWMARYXML" lets you edit the MaryXML that is the input for computing the features. To recompute the features from the MaryXML, a MARY server must be running; you can change its host and port in the settings of the UnitFeatureComputer.
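The two pause-correction rules can be sketched as a walk over both phone sequences. This is an illustration, not the aligner's code. Several details are assumptions: labels are modelled as (phone, duration) pairs (real label files store end times), "_" is taken as the pause symbol, and a deleted pause's duration is split equally between its neighbours. Mismatches on other phones are left alone for the interactive step.

```python
PAUSE = "_"  # assumed pause symbol


def correct_pauses(labels, feature_phones):
    """Apply the two pause rules; labels are (phone, duration) pairs."""
    labels = list(labels)  # work on a copy, the caller's list is not modified
    result = []
    i = j = 0
    while i < len(labels):
        phone, dur = labels[i]
        expected = feature_phones[j] if j < len(feature_phones) else None
        if phone == PAUSE and expected != PAUSE:
            # Pause only in the label file: drop it and stretch the neighbours
            has_prev, has_next = bool(result), i + 1 < len(labels)
            share = dur / 2 if has_prev and has_next else dur
            if has_prev:
                p, d = result[-1]
                result[-1] = (p, d + share)
            if has_next:
                n, d = labels[i + 1]
                labels[i + 1] = (n, d + share)
            i += 1  # stay on the same feature phone
        elif expected == PAUSE and phone != PAUSE:
            # Pause only in the feature file: insert a zero-length pause
            result.append((PAUSE, 0))
            j += 1  # compare the same label with the next feature phone
        else:
            # Match (or a non-pause mismatch left for manual correction)
            result.append((phone, dur))
            i += 1
            j += 1
    # Pauses remaining at the end of the feature file
    while j < len(feature_phones) and feature_phones[j] == PAUSE:
        result.append((PAUSE, 0))
        j += 1
    return result


labels = [("_", 200), ("h", 100), ("_", 40), ("a", 200), ("l", 100)]
print(correct_pauses(labels, ["_", "h", "a", "l", "_"]))
```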
Configuration Settings:
- featureDir - Phone feature vectors directory
- labDir - Phone Labels directory
HalfPhoneLabelFeatureAligner
It works the same way as PhoneLabelFeatureAligner, but for halfphone units.
Configuration Settings:
- featureDir - Half Phone feature vectors directory
- labDir - Half Phone Labels directory
Basic Data Files
The following components create basic binary files that contain the whole voice database, so that the database can be accessed more easily and quickly. These files are needed for various voice building steps and for synthesis.
WaveTimelineMaker
The WaveTimelineMaker splits the waveforms into datagrams to be stored in a timeline in MARY format. It produces a single binary file containing all wave files.
Configuration Settings:
- corrPmDir - Directory Path for corrected pitch marks.
- WaveTimeline - file containing all wave files, created by this module
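To illustrate what "splitting the waveforms into datagrams" can mean, the sketch below cuts a signal into one datagram per pitch period, using consecutive corrected pitch marks (as sample indices) as boundaries. It shows the idea only; the binary timeline layout written by WaveTimelineMaker is not reproduced, and `Datagram` is a hypothetical type.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Datagram:
    duration: int        # length in samples
    samples: List[int]   # the waveform segment itself


def make_datagrams(samples, pitchmarks):
    """Cut `samples` into datagrams between consecutive pitch marks."""
    bounds = [0] + sorted(pitchmarks) + [len(samples)]
    datagrams = []
    for start, end in zip(bounds, bounds[1:]):
        if end > start:  # skip empty segments, e.g. a mark at position 0
            datagrams.append(Datagram(end - start, samples[start:end]))
    return datagrams


print([d.duration for d in make_datagrams(list(range(10)), [3, 7])])  # prints [3, 4, 3]
```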
BasenameTimelineMaker
The BasenameTimelineMaker takes a database root directory and a list of basenames, and associates the basenames with absolute times in a timeline in Mary format.
Configuration Settings:
- pmDir - Directory containing the pitchmarks
- timelineFile - file containing the list of files and their times, which will be created by this module.
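Associating basenames with absolute times boils down to laying the files end to end on one timeline. A minimal sketch, assuming each file's duration (e.g. in samples) is already known and files are concatenated in basename-list order; `basename_times` is a hypothetical helper and the on-disk format is not shown.

```python
def basename_times(files):
    """Assign each basename its start and end time on one continuous timeline.

    files: list of (basename, duration) in database order
    Returns a list of (basename, start_time, end_time).
    """
    timeline = []
    t = 0
    for basename, duration in files:
        timeline.append((basename, t, t + duration))
        t += duration  # the next file starts where this one ends
    return timeline


print(basename_times([("a0001", 100), ("a0002", 50)]))
# prints [('a0001', 0, 100), ('a0002', 100, 150)]
```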
MCepTimelineMaker
The MCepTimelineMaker takes a database root directory and a list of basenames, and converts the related wav files into a mcep timeline in Mary format.
Configuration Settings:
- mcepDir - directory containing the mcep files
- mcepTimeline - file containing all mcep files. Will be created by this module
(Under construction - to be continued)
Attachments (2)
- VIC2.jpg (74.6 KB) - added by masc01 15 years ago.
- VIC1.jpg (48.3 KB) - added by masc01 15 years ago.