Version 7 (modified by sach01, 17 years ago)
Voice Import Tools Tutorial: How to Build a New Voice with Voice Import Tools
This tutorial explains how to build a new voice with the Voice Import Tools (VIT) in the MARY environment.
The Voice Import Tool is a graphical user interface (GUI) that contains a set of Voice Import Components and helps the user build new voices for the MARY (Modular Architecture for Research in speech sYnthesis) environment. The tool is designed primarily to let any user build new voices easily, without needing to know many technical details of speech synthesis.
Currently, the Voice Import Tool mainly supports the following categories:
- Feature Extraction from Acoustic Data
- Feature Vector Extraction from Text Data
- Automatic Labeling
- Unit Selection
- Voice Installation to MARY
Requirements:
- Operating System - Linux (recommended)
- MARY TTS, recent version - Download Link: http://mary.dfki.de/Download
- Openmary - SVN from http://mary.opendfki.de
(Windows can also be used, provided the dependent tools listed below can be compiled properly on it.)
Dependent Tools:
- Praat Pitch Marker or Snack - For pitch marks
Download Link for Praat: http://www.fon.hum.uva.nl/praat
- Edinburgh Speech Tools Library – For MFCCs and Wagon (CART)
Download Link for Speech Tools: http://www.cstr.ed.ac.uk/projects/speech_tools/
- EHMM or Sphinx – For Automatic Labeling
EHMM is available with festvox-2.1 (Recent Version) - http://festvox.org/download.html
Sphinx - http://cmusphinx.sourceforge.net/webpage/html/download.php
Voice Import Components:
The following components are available in the Voice Import Tool:
- PraatPitchmarker
- SnackPitchmarker
- MCEPMaker
- Mary2FestvoxTranscripts
- Festvox2MaryTranscripts
- PhoneUnitFeatureComputer
- HalfPhoneUnitFeatureComputer
- EHMMLabeler
- SphinxLabelingPreparator
- SphinxTrainer
- SphinxLabeler
- MRPALabelConverter
- HalfPhoneUnitfileWriter
- HalfPhoneFeatureFileWriter
- JoinCostFileMaker
- AcousticFeatureFileWriter
- CARTBuilder
- CARTPruner
- VoiceInstaller
How to run?
- First, you need the following two basic requirements for voice building:
- Wave files
- Corresponding transcriptions (in MARY or Festival format)
- Create a new voice building directory.
- Put all wave files in the "wav" directory.
- Run the commands below, via a shell script, from the voice building directory.
export MARY_BASE="/path/to/mary"
java -Xmx1024m \
-classpath "$MARY_BASE/java:$MARY_BASE/java/mary-common.jar:$MARY_BASE/java/signalproc.jar:$MARY_BASE/java/freetts.jar:$MARY_BASE/java/jsresources.jar:$MARY_BASE/java/log4j-1.2.8.jar" \
-Djava.endorsed.dirs="$MARY_BASE/lib/endorsed" \
de.dfki.lt.mary.unitselection.voiceimport.DatabaseImportMain
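The preparation steps above (a voice building directory with the wave files in a "wav" subdirectory) can also be scripted. The following is a minimal sketch and not part of the Voice Import Tools; the function name prepare_voice_dir and its two arguments are hypothetical.

```shell
# Hypothetical helper (not part of MARY): create a voice building
# directory with a "wav" subdirectory and copy the recordings into it.
#   $1 = voice building directory (created if missing)
#   $2 = directory that holds the source .wav files
prepare_voice_dir() {
    voice_dir=$1
    source_dir=$2
    mkdir -p "$voice_dir/wav" || return 1
    for f in "$source_dir"/*.wav; do
        # Skip the unexpanded pattern when no .wav files exist
        [ -e "$f" ] || continue
        cp "$f" "$voice_dir/wav/" || return 1
    done
}
```

For example, calling prepare_voice_dir "$HOME/myvoice" /path/to/recordings would copy the recordings to $HOME/myvoice/wav, after which the java command can be run from $HOME/myvoice.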
The GUI, which supports voice building, looks like this:
When you run the above shell script for the first time, it asks for some basic configuration settings. The Global Configuration Settings window looks like this:
Global Configuration Settings:
Domain - general or limited
Gender - male or female
Locale - the language of the domain (de - German or en - English)
(Currently, MARY supports only two languages: German and English.)
Marybase - MARY installation directory (absolute path)
Rootdir - voice building directory (absolute path)
Wavdir - directory where the wave files are stored
Textdir - directory where the corresponding transcriptions are stored
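Before typing these values into the window, it can help to sanity-check them. The sketch below is only an illustration: the function check_global_settings is hypothetical, the allowed values are taken from the list above, and it assumes that Marybase and Rootdir should be absolute paths.

```shell
# Hypothetical pre-flight check (not part of MARY) for the values
# listed above. Prints a message to stderr and returns non-zero on
# the first problem found.
#   Arguments: domain gender locale marybase rootdir
check_global_settings() {
    domain=$1 gender=$2 locale=$3 marybase=$4 rootdir=$5
    case $domain in
        general|limited) ;;
        *) echo "domain must be general or limited" >&2; return 1 ;;
    esac
    case $gender in
        male|female) ;;
        *) echo "gender must be male or female" >&2; return 1 ;;
    esac
    case $locale in
        de|en) ;;
        *) echo "locale must be de or en" >&2; return 1 ;;
    esac
    # Assumption: both directories are given as absolute paths
    for p in "$marybase" "$rootdir"; do
        case $p in
            /*) ;;
            *) echo "not an absolute path: $p" >&2; return 1 ;;
        esac
    done
    return 0
}
```

A call such as check_global_settings general male en /opt/mary /home/user/myvoice succeeds, while a relative Marybase or an unsupported locale makes it fail with a message.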
Every component also has its own configuration settings. We recommend giving absolute paths in these settings. The settings are passed as arguments to the components, which use them to perform their tasks.
The simplest way of using the Voice Import Components:
- Provide the configuration settings for every component.
- Tick all components.
- Click the RUN button.
The tool then runs all tasks sequentially.
However, you do not need to use all components to build a new voice.
For example, for automatic labeling you can choose either EHMM or Sphinx.
Explanation on Individual Voice Import Components
PraatPitchmarker
It computes pitch marks with the help of Praat. You need to compile or install Praat on your machine.
It also corrects the pitch marks by aligning them to nearby zero crossings.
Configuration Settings:
- command - absolute path of the Praat executable
- pmDir - output directory for the Praat pitch marks
- corrPmDir - output directory for the corrected pitch marks (pitch marks moved towards zero crossings)
- maxPitch, minPitch - pitch range (e.g., male: 50-200 | female: 150-300)
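The example pitch ranges above can be captured in a small helper when preparing settings for several speakers. This is a sketch only: the function pitch_range is hypothetical, and the numbers are simply the example values from the list above, not tuned recommendations.

```shell
# Hypothetical helper (not part of MARY): print "minPitch maxPitch"
# for a speaker gender, using the example ranges given above.
pitch_range() {
    case $1 in
        male)   echo "50 200" ;;
        female) echo "150 300" ;;
        *)      echo "unknown gender: $1" >&2; return 1 ;;
    esac
}
```

The two printed numbers would then be entered as minPitch and maxPitch in the PraatPitchmarker settings; a speaker with an unusually low or high voice may need a different range.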
MCEPMaker
It calculates MFCCs from the speech wave files, using the Edinburgh Speech Tools.
Configuration Settings:
- estDir - directory of the compiled Edinburgh Speech Tools
- pmDir - Praat pitch marks directory
- corrPmDir - corrected pitch marks directory
- mcepDir - output directory for the MFCCs
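Since MCEPMaker takes the pitch mark directories as input, it may be worth checking that every wave file has a pitch mark file before running it. The sketch below assumes that pitch mark files share the wave file's base name and use a ".pm" extension; both this naming and the function missing_pitchmarks are assumptions made for illustration.

```shell
# Hypothetical check (not part of MARY): print the base name of every
# .wav file in $1 that has no matching .pm file in $2.
# Assumption: pitch mark files are named <basename>.pm
missing_pitchmarks() {
    wav_dir=$1
    pm_dir=$2
    for f in "$wav_dir"/*.wav; do
        # Skip the unexpanded pattern when no .wav files exist
        [ -e "$f" ] || continue
        base=$(basename "$f" .wav)
        [ -e "$pm_dir/$base.pm" ] || echo "$base"
    done
}
```

An empty output from missing_pitchmarks /path/to/voice/wav /path/to/voice/pm would suggest that the pitch marking step covered all recordings.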
(Under construction - to be continued)
Attachments (2)
- VIC2.jpg (74.6 KB) - added by masc01 15 years ago.
- VIC1.jpg (48.3 KB) - added by masc01 15 years ago.