= Transcription Tool =

The MARY Transcription Tool is a graphical user interface that supports a semi-automatic procedure for transcribing the text corpus of a new language and automatically trains letter-to-sound (LTS) rules for that language. It also stores the function words of the language, which are used to build a primitive part-of-speech (POS) tagger.


                                                                [[Image(Transcription-Tool.png)]]


== Requirements: ==
 

 1. Prepare a phoneset (allophones file) for your language.

 '''Example for locale en-US:''' [http://mary.opendfki.de/wiki/TranscriptionTool/allophones_en-US.xml]

 2. Provide the input as a text file in one of the following formats.
 
 Example 1: A list of words, one per line


{{{
Live
Item
Top
Eintracht
Spieltags
Hannover
sechsundneunzig
Borussia
Arminia
}}}

 Example 2: A list of words in which only some of the words have a transcription

{{{
Live 'laIf
Item
Top 'tOp
Eintracht '?aIn-tRaxt
Spieltags 'Spi:l-ta:ks
Hannover 
sechsundneunzig 
Borussia bo:-'RU_si:-a:
Arminia
}}}
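
The file format above can be read with a few lines of code, for instance to check how many words still lack a transcription before loading the file. The following is a minimal sketch and not part of the Transcription Tool: it assumes that the first whitespace-separated token on a line is the word and that anything after it is the transcription. The function names `parse_word_list` and `untranscribed` are hypothetical.

{{{
#!python
def parse_word_list(lines):
    """Map each word to its transcription, or None if it has none yet.

    Assumption: the first whitespace-separated token is the word and the
    rest of the line, if any, is its transcription.
    """
    entries = {}
    for line in lines:
        parts = line.split(None, 1)
        if not parts:
            continue  # skip blank lines
        word = parts[0]
        transcription = parts[1].strip() if len(parts) > 1 else None
        entries[word] = transcription or None
    return entries


def untranscribed(entries):
    """Return the words that still need a transcription, in file order."""
    return [word for word, transcription in entries.items() if transcription is None]


# A few lines taken from Example 2 above.
sample = [
    "Live 'laIf",
    "Item",
    "Top 'tOp",
    "Hannover ",
    "Borussia bo:-'RU_si:-a:",
]
entries = parse_word_list(sample)
print(untranscribed(entries))
}}}

To run it on a real file, pass the open file object (for example `open("wordlist.txt", encoding="utf-8")`, where the file name and encoding are placeholders) instead of the `sample` list.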

 3. Alternatively, load the word list from a MySQL table with three columns: ID, word, and frequency (the frequency of the word in the text corpus).
{{{
mysql> desc en_US_wordList;
+-----------+------------------+------+-----+---------+----------------+
| Field     | Type             | Null | Key | Default | Extra          |
+-----------+------------------+------+-----+---------+----------------+
| id        | int(11)          | NO   | PRI | NULL    | auto_increment | 
| word      | varchar(255)     | NO   |     |         |                | 
| frequency | int(10) unsigned | NO   |     |         |                | 
+-----------+------------------+------+-----+---------+----------------+

mysql> SELECT * from en_US_wordList WHERE id <= 5; 
+----+-------------+-----------+
| id | word        | frequency |
+----+-------------+-----------+
|  1 | treason     |        15 | 
|  2 | indignation |         2 | 
|  3 | Oilinvest   |         1 | 
|  4 | helgu       |         1 | 
|  5 | perentie    |         1 | 
+----+-------------+-----------+
}}}
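
A table with this layout could be filled from a text corpus by counting how often each word occurs. The sketch below is an illustration under assumptions, not the procedure used by the MARY tools: the `CREATE TABLE` statement is reconstructed from the `desc` output above, the tokenizer simply takes runs of letters, and `word_frequencies` is a hypothetical helper. Executing the statements is left to whichever MySQL client or driver you use.

{{{
#!python
import re
from collections import Counter

# Reconstructed from the `desc en_US_wordList` output above (an assumption,
# not the original DDL).
CREATE_TABLE_SQL = """
CREATE TABLE en_US_wordList (
    id        INT(11)          NOT NULL AUTO_INCREMENT,
    word      VARCHAR(255)     NOT NULL,
    frequency INT(10) UNSIGNED NOT NULL,
    PRIMARY KEY (id)
)
"""

# Parameterized statement in the %s placeholder style used by common
# Python MySQL drivers; id is filled in by AUTO_INCREMENT.
INSERT_SQL = "INSERT INTO en_US_wordList (word, frequency) VALUES (%s, %s)"


def word_frequencies(text):
    """Return (word, frequency) rows, most frequent first, ties sorted by word.

    Assumption: a word is a maximal run of letters; case is preserved.
    """
    words = re.findall(r"[^\W\d_]+", text)
    counts = Counter(words)
    return sorted(counts.items(), key=lambda row: (-row[1], row[0]))


rows = word_frequencies("treason and indignation, treason and more treason")
for word, frequency in rows:
    print(word, frequency)
}}}

Each `(word, frequency)` row can then be passed together with `INSERT_SQL` to the driver's execute call. A real corpus will likely need a more careful tokenizer than the letter-run pattern used here.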


== How to run? ==

 Run the following commands from a shell script:
{{{

export MARY_BASE="[PATH TO MARYBASE]"

java -Xmx1024m -classpath $MARY_BASE/java/:\
$MARY_BASE/java/swing-layout-1.0.jar:\
$MARY_BASE/java/mary-common.jar:\
$MARY_BASE/java/commons-lang-2.4.jar:\
$MARY_BASE/java/mwdumper-2008-04-13.jar:\
$MARY_BASE/java/log4j-1.2.15.jar:$MARY_BASE/java/weka.jar:\
$MARY_BASE/java/mysql-connector-java-5.1.7-bin.jar\
 -Djava.endorsed.dirs=$MARY_BASE/lib/endorsed\
 marytts.tools.transcription.TranscriptionGUI

}}}

 or 
{{{

export MARY_BASE="[PATH TO MARYBASE]"

java -Xmx1024m -jar $MARY_BASE/java/transcription-tool.jar

}}}
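
Long classpaths are easy to mistype, so the command can also be assembled programmatically. The following sketch builds the same kind of `java` invocation as the first variant above from a list of jar names. `build_command` is a hypothetical helper, the jar names and main class are copied from the command above, and the path `/opt/mary` is only a placeholder for your MARY base directory.

{{{
#!python
import os

# Jar names as listed in the classpath of the command above.
JARS = [
    "swing-layout-1.0.jar",
    "mary-common.jar",
    "commons-lang-2.4.jar",
    "mwdumper-2008-04-13.jar",
    "log4j-1.2.15.jar",
    "weka.jar",
    "mysql-connector-java-5.1.7-bin.jar",
]
MAIN_CLASS = "marytts.tools.transcription.TranscriptionGUI"


def build_command(mary_base):
    """Return the java command line as an argument list."""
    java_dir = os.path.join(mary_base, "java")
    # The command above puts the java/ directory itself first on the classpath.
    entries = [java_dir + os.sep] + [os.path.join(java_dir, jar) for jar in JARS]
    return [
        "java",
        "-Xmx1024m",
        "-classpath", os.pathsep.join(entries),
        "-Djava.endorsed.dirs=" + os.path.join(mary_base, "lib", "endorsed"),
        MAIN_CLASS,
    ]


command = build_command("/opt/mary")  # placeholder path
print(" ".join(command))
# To launch the tool: subprocess.run(command, check=True)
}}}

Building the argument list from a single `mary_base` value means the base directory is spelled in only one place.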

== How to load data? ==

Once you have specified the phoneset file (allophones.xml) using the menu item 'specify phoneset', you can load data.

 1. To load data from a file (the file must be in one of the formats described above):
{{{
File -> Open
}}}
  
 2. To load data from MySQL table:  
{{{
File -> Load from MySQL Database
}}}

   This opens a panel (shown in the figure below) where you enter the MySQL database and table details:
   [[Image(MySqlDetails.png)]]
     
   * Provide the host name, database name, table name, and database credentials (MySQL username and password).
   * The 'WordList' table must have the format described above.


== How to predict pronunciation for new words? ==

Once you have transcribed some of the words, click the 'TrainPredict' button to predict transcriptions for the remaining words. You then have to correct the predicted transcriptions manually.

'''Note:''' The quality of the predictions from 'TrainPredict' depends on the amount of manually corrected data: the more manually corrected data you have, the better the predictions.

== Description of colors ==

  * Black - Manually written or manually corrected
  * Red - Transcription syntax is not permitted (e.g. the transcription written by the user is incorrect)
  * LightGray - Predicted by 'TrainPredict' (prediction based on the manual transcriptions)