= Transcription Tool =

The MARY Transcription Tool is a graphical user interface that supports a semi-automatic procedure for transcribing a text corpus in a new language and for automatically training letter-to-sound (LTS) rules for that language. It also stores all function words of that language, which are used to build a primitive POS tagger.

== Requirements: ==

 1. Prepare a phoneset for your language.

 '''Example for locale en-US:''' [http://mary.opendfki.de/wiki/TranscriptionTool/allophones_en-US.xml]

 2. Provide the words to transcribe in one of the accepted input formats.

Example 1: List of words
{{{
Live
Item
Top
Eintracht
Spieltags
Hannover
sechsundneunzig
Borussia
Arminia
}}}

Example 2: List of words, with transcriptions for a few of them
{{{
Live 'laIf
Item
Top 'tOp
Eintracht '?aIn-tRaxt
Spieltags 'Spi:l-ta:ks
Hannover
sechsundneunzig
Borussia bo:-'RU_si:-a:
Arminia
}}}

Example 3: Load from a MySQL table (3 columns: ID, word, and frequency, where frequency is the word's frequency in the text corpus)
{{{
mysql> desc en_US_wordList;
+-----------+------------------+------+-----+---------+----------------+
| Field     | Type             | Null | Key | Default | Extra          |
+-----------+------------------+------+-----+---------+----------------+
| id        | int(11)          | NO   | PRI | NULL    | auto_increment |
| word      | varchar(255)     | NO   |     |         |                |
| frequency | int(10) unsigned | NO   |     |         |                |
+-----------+------------------+------+-----+---------+----------------+
}}}

== How to run? ==

Run the following commands from a shell script:
{{{
export MARY_BASE="[PATH TO MARYBASE]"
java -Xmx1024m -classpath $MARY_BASE/java:$MARY_BASE/java/mary-common.jar:\
$MARY_BASE/java/log4j-1.2.8.jar:$MARY_BASE/java/weka.jar:\
$MARY_BASE/java/mysql-connector-java-5.1.7-bin.jar \
 -Djava.endorsed.dirs=$MARY_BASE/lib/endorsed \
 marytts.tools.transcription.TranscriptionGUI
}}}

{{{
#!html

Instructions for Transcription Tool

  1. Specify the phoneset file (allophones.xml) using the menu item 'specify phoneset'.
  2. Load your data for transcription
    (two options: 1. the 'Open' menu item; 2. load from a MySQL database, with the list of words stored as a table).
  3. Save your data to a location of your choice (the 'TrainPredict' button is then enabled).
  4. Once you have transcribed some of the data, click the 'TrainPredict' button to predict transcriptions for the remaining words. You have to correct the predicted transcriptions manually.
  5. Description of colors:
}}}
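
Before clicking 'TrainPredict', it can help to know how much of a word list is already transcribed. The following shell sketch counts the entries in a file and how many of them carry a transcription. It assumes the layout of Example 2 (one word per line, optionally followed by whitespace and a transcription). The function name `wordlist_summary` and the sample file are hypothetical and are not part of MARY.

```shell
#!/bin/sh
# Hypothetical helper, not part of MARY: summarise a word list file.
# Assumed format: "word" or "word transcription", one entry per line.
# Prints "<total entries> <entries with a transcription>".
wordlist_summary() {
    awk 'NF >= 1 { total++ }
         NF >= 2 { transcribed++ }
         END { printf "%d %d\n", total, transcribed }' "$1"
}

# Demo on the word list from Example 2, written to a temporary file.
sample=$(mktemp)
cat > "$sample" <<'EOF'
Live 'laIf
Item
Top 'tOp
Eintracht '?aIn-tRaxt
Spieltags 'Spi:l-ta:ks
Hannover
sechsundneunzig
Borussia bo:-'RU_si:-a:
Arminia
EOF

wordlist_summary "$sample"   # prints "9 5"
rm -f "$sample"
```

A field count is used instead of a check on the transcription's contents, because transcriptions need not start with a stress mark (for example `bo:-'RU_si:-a:`). Blank lines are ignored.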