= [[span(style=color: #FF0000, NOTE: As of 2012, MaryTTS is maintained at http://github.com/marytts/marytts. These pages are only kept online for historical interest.)]] =

= Voice Import Tools Tutorial: How to build a new Voice with Voice Import Tools =

This tutorial explains how to build a new voice with the Voice Import Tools (VIT) under the MARY environment.

[[Image(VIC1.jpg)]]

The Voice Import Tools are a graphical user interface (GUI) containing a set of Voice Import Components that help the user build new voices under the MARY (Modular Architecture for Research in speech sYnthesis) environment. The GUI aims to simplify the task of building new synthesis voices so that users without detailed technical knowledge of speech synthesis can build their own voices.

In a nutshell, the Voice Import Tools cover the following steps in voice building:
 1. Feature extraction from acoustic data
 2. Feature vector extraction from text data
 3. Automatic labeling
 4. Unit selection voice building
 5. HMM-based voice building
 6. Voice installation to MARY

== Requirements ==
 * Operating system: Linux (recommended)
 * A recent version of MARY TTS. Download link: http://mary.dfki.de/Download
 * Openmary, via SVN from http://mary.opendfki.de

(You may be able to use other operating systems such as Mac OS X or Windows if you can get the dependencies described below to work.)

== Dependent Tools: ==
 * '''Praat''' ''or'' '''Snack''' - For pitch marks

   Praat: On Ubuntu-like systems, install the '''praat''' package. Otherwise, download Praat from http://www.fon.hum.uva.nl/praat

   Snack: On Ubuntu-like systems, install the '''libsnack2''' package. Otherwise, install Tcl and Snack manually. If using ActiveTcl (http://www.activestate.com/activetcl/), be aware that only version 8.4 comes with Snack; however, on 8.5 and newer, Snack can be installed using the command `teacup install snack`.
   Installation instructions for older systems are available at http://www.speech.kth.se/snack/
 * '''Edinburgh Speech Tools''' – For MFCCs and Wagon (CART)

   On Ubuntu-like systems, install the '''speech-tools''' package. Otherwise, install Speech Tools manually: http://www.cstr.ed.ac.uk/projects/speech_tools/
 * '''EHMM''' – For automatic labeling

   EHMM is available in `$MARYBASE/lib/external/ehmm.tar.gz`, or with Festvox version 2.1 or newer: http://festvox.org/download.html

   Once the EHMM binaries have been built, the path to the EHMM `bin` directory must be provided to MARY. Currently, the easiest way to do this is to run the `check_install_external_programs.sh` script in `$MARYBASE/lib/external` with the `-check` argument, which generates a file called `externalBinaries.config`. This file must provide the path to the EHMM `bin` directory. Alternatively, create this file by hand and insert the following line, replacing `/path/to/ehmm/bin` with the correct path to the EHMM `bin` directory:
{{{
external.ehmmPath /path/to/ehmm/bin
}}}

== Voice Import Components: ==
The following components are available in the Voice Import Tools:
 * !PraatPitchmarker
 * !SnackPitchmarker
 * MCEPMaker
 * Festvox2MaryTranscripts
 * Mary2FestvoxTranscripts
 * !PhoneUnitFeatureComputer
 * !HalfPhoneUnitFeatureComputer
 * EHMMLabeler
 * !LabelledFilesInspector
 * !PhoneUnitLabelComputer
 * !PhoneLabelFeatureAligner
 * !HalfPhoneUnitLabelComputer
 * !HalfPhoneLabelFeatureAligner
 * !QualityControl
 * !HalfPhoneUnitfileWriter
 * !HalfPhoneFeatureFileWriter
 * !JoinCostFileMaker
 * !AcousticFeatureFileWriter
 * CARTBuilder
 * CARTPruner
 * !VoiceInstaller

== Step-by-Step Procedure: ==
First, you need the following two basic requirements for voice building:
 a. Wave files
 a. Corresponding transcriptions (in MARY or Festvox format)

MARY format: Each transcription is stored in its own file, and all of these files are placed in a single directory. By default, this is the `text` directory of the voice building directory.
Festvox (Festival) format: A single file contains all transcriptions, as in the following example:
{{{
( arctic_a0001 "AUTHOR OF THE DANGER TRAIL, PHILIP STEELS, ETC" )
( arctic_a0002 "Not at this particular case, Tom, apologized Whittemore." )
( arctic_a0003 "For the twentieth time that evening the two men shook hands." )
( arctic_a0004 "Lord, but I'm glad to see you again, Phil." )
( arctic_a0005 "Will we ever forget it." )
}}}

=== 1. Create a new Voice Building Directory ===
Create a voice building directory somewhere on your file system, say `/home/me/myvoice`.
 * Put all wave files into the `wav` sub-directory of the voice building directory, i.e. `/home/me/myvoice/wav`. You may want to use the Audio converter GUI to make sure that this data is mono, at the right sampling rate, doesn't include overly long initial and final pauses, etc.
 * Either put the individual text files into the `myvoice/text` subdirectory (if using text files in MARY format), or put the single text file in Festvox format into `myvoice/txt.done.data`.

If you want to test this but haven't recorded your own voice files yet, one way of getting test data is to use the ARCTIC data from CMU. Download and unpack http://www.speech.cs.cmu.edu/cmu_arctic/packed/cmu_us_slt_arctic-0.95-release.tar.bz2, and then:
 * copy or move `cmu_us_slt_arctic/wav` to `myvoice/wav`, including all wav files;
 * copy or move `cmu_us_slt_arctic/etc/txt.done.data` to `myvoice/txt.done.data`.

=== 2. Run the voice building tools ===
Create a simple shell script, `myvoice/import.sh`, with the following content:
{{{
export MARY_BASE="/path/to/mary"
java -Xmx1024m -jar $MARY_BASE/java/voiceimport.jar
}}}
Adapt `/path/to/mary` to match the location of your MARY TTS installation. Run the shell script from the command line (`sh import.sh`) to start the voice building process.
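The directory layout from step 1 and the script from step 2 can be created in one go. The following is only a minimal sketch, assuming an already unpacked ARCTIC release; `setup_voice_dir` is a hypothetical helper name and is not part of MARY.

```shell
# Hypothetical helper (not part of MARY): lay out a voice building
# directory from an unpacked ARCTIC release and write import.sh.
#   $1 - unpacked ARCTIC directory, e.g. ./cmu_us_slt_arctic
#   $2 - voice building directory,  e.g. /home/me/myvoice
#   $3 - MARY TTS installation directory
setup_voice_dir() {
    arctic_dir="$1"
    voice_dir="$2"
    mary_base="$3"

    # Step 1: wave files go to wav/, the Festvox transcriptions to txt.done.data
    mkdir -p "$voice_dir/wav" || return 1
    cp "$arctic_dir"/wav/*.wav "$voice_dir/wav/" || return 1
    cp "$arctic_dir/etc/txt.done.data" "$voice_dir/txt.done.data" || return 1

    # Step 2: write import.sh with the two lines given in this tutorial.
    # \$MARY_BASE is escaped so that it is expanded when import.sh runs,
    # not while the file is being written.
    cat > "$voice_dir/import.sh" <<EOF
export MARY_BASE="$mary_base"
java -Xmx1024m -jar \$MARY_BASE/java/voiceimport.jar
EOF
}
```

For example, `setup_voice_dir ./cmu_us_slt_arctic /home/me/myvoice /path/to/mary` followed by `cd /home/me/myvoice && sh import.sh` would start the tools from the new directory.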
When you run the voice building tools for the first time, they ask you for some basic configuration settings by presenting a GUI window. Almost all other settings are derived from these first settings and set automatically. The global configuration settings window looks roughly like this:

[[Image(VIC2.jpg)]]

'''Global Configuration Settings'''

Domain - general or limited[[BR]]
Gender - male or female[[BR]]
Locale - the language of the domain (de - Deutsch or en - English)[[BR]]
(Currently, MARY supports only two languages: 1. Deutsch 2. English)[[BR]]
Marybase - MARY installation directory (Global Path)[[BR]]
Rootdir - voice building directory (Global Path)[[BR]]
Wavdir - directory where the wave files are stored[[BR]]
Textdir - directory where the corresponding transcriptions are stored[[BR]]

After clicking the '''Save''' button, you will get to the main window of the Voice Import Tools, as shown in the screenshot. There you can see a list of modules. A component is executed by ticking the associated checkbox and clicking on '''Run'''.

'''Component Configuration Settings'''

You can verify and change the settings for each individual component by clicking on the '''wrench symbol''' next to the component. Clicking on "Settings" takes you to the window where you can change the basic settings. In a settings window, you can switch to the settings of another module or to the basic settings via the drop-down menu. Basically, all modules need to be run to import the voice into MARY. For more detailed information, check the general help file by clicking on "Help" in the main window. Clicking on "Help" in the settings window opens a help window with details about the displayed settings.

It is recommended to use absolute paths in the individual configuration settings. These settings are passed as arguments to the components that perform the corresponding tasks.
The import tool creates two files in the directory where you started it: `database.config` and `importMain.config`. `database.config` contains the values of the settings. You can also change the settings in this file, but be aware that this may cause problems.

'''How to run the Voice Import Components'''

In an ideal world, the process of building a voice would look like this:
 * Provide config settings for each and every component.
 * Tick all components.
 * Click the '''Run''' button. [[BR]] All tasks are then completed sequentially. [[BR]]

In the real world, however, the user needs to make a few decisions here and there, so the real-world process is usually a bit more complex than that. For example, pitch marking can be done with either Praat or Snack. When using a pitch marker, you may want to verify that the frequency range is appropriate for your recordings, and adapt the component's config settings before running it again.
 * If your transcriptions are in Festvox format, you need to select the "''Festvox2MaryTranscripts''" component. It converts the transcriptions from Festvox format (`txt.done.data`) to MARY format (`text/*.txt`). The Voice Import Tools use MARY format transcriptions for building a voice. If you have recorded your voice using Redstart, there is no need to run the "''Mary2FestvoxTranscripts''" component.
 * ''!PhoneUnitFeatureComputer'' and ''!HalfPhoneUnitFeatureComputer'' need a MARY server that runs the NLP components for the target locale. This is a very important point, since the server is needed to convert the text of an utterance into a phone sequence that can be aligned with the audio data. Make sure a MARY server is running while executing these two components.
 * ''!LabelledFilesInspector'' provides a GUI for checking the results of automatic labeling. It lets you listen to phone segments according to the timestamps from automatic labeling. If you don't want to inspect the labeling, there is no need to select this component.
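To picture the conversion from Festvox format to MARY format mentioned above, the following sketch splits a `txt.done.data` file into one text file per utterance. It is an illustration only, not the implementation of the ''Festvox2MaryTranscripts'' component; `festvox_to_text_files` is a hypothetical name, and the sketch assumes one `( id "text" )` entry per line with no escaped quotes inside the text.

```shell
# Hypothetical helper (not the Festvox2MaryTranscripts implementation):
# write each utterance of a Festvox transcription file to <text_dir>/<id>.txt.
#   $1 - Festvox file, e.g. txt.done.data
#   $2 - output directory, e.g. text
festvox_to_text_files() {
    festvox_file="$1"
    text_dir="$2"
    mkdir -p "$text_dir" || return 1

    # Each entry is assumed to look like: ( utt_id "transcription" )
    # The "|| [ -n ... ]" part also handles a last line without a newline.
    while IFS= read -r line || [ -n "$line" ]; do
        # Extract the utterance id; lines that do not match are skipped
        utt_id=$(printf '%s\n' "$line" |
            sed -n 's/^ *( *\([^ ]*\) *".*" *) *$/\1/p')
        [ -n "$utt_id" ] || continue

        # Keep everything between the first and the last double quote
        printf '%s\n' "$line" |
            sed 's/^[^"]*"\(.*\)"[^"]*$/\1/' > "$text_dir/$utt_id.txt"
    done < "$festvox_file"
}
```

Called as `festvox_to_text_files txt.done.data text` in the voice building directory, this would produce files such as `text/arctic_a0001.txt`. For real voice building, use the component itself.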
While a component is executing, a progress bar shows the percentage of work completed for that component. A component turns GREEN if it was executed successfully. It turns RED and throws an exception if it encounters an error. If that happens, you will need to understand what went wrong and how to fix it. There is no simple recipe for that case.

We hope this tutorial helps you build a new '''unit selection voice''' using the Voice Import Tools under the MARY platform. The individual Voice Import Components are explained [wiki:VoiceImportComponents here]. [[BR]]
 * [wiki:VoiceImportComponents Explanation on Individual Voice Import Components]

An explanation of how to create a new '''HMM-based voice''' using the HMM Voice Import Tools under the MARY platform can be found [wiki:HMMVoiceCreation here]:
 * [wiki:HMMVoiceCreation Explanation on how to create HMM-based voices for MARY]

- Sathish Chandra Pammi (Sathish.Chandra@dfki.de) and Marc Schröder