= Transcription Tool =

The MARY Transcription Tool is a graphical user interface that supports a semi-automatic procedure for transcribing the text corpus of a new language and for automatically training letter-to-sound (LTS) rules for that language. It also stores all function words of that language in order to build a primitive POS tagger.

[[Image(Transcription-Tool.png)]]

== Requirements ==

 1. Prepare a phoneset for your language. '''Example for locale en-US:''' [http://mary.opendfki.de/wiki/TranscriptionTool/allophones_en-US.xml]

 2. Provide the input as a file in one of the following formats.

 Example 1: a list of words, one per line
{{{
Live
Item
Top
Eintracht
Spieltags
Hannover
sechsundneunzig
Borussia
Arminia
}}}

 Example 2: a list of words in which some words already have transcriptions
{{{
Live 'laIf
Item
Top 'tOp
Eintracht '?aIn-tRaxt
Spieltags 'Spi:l-ta:ks
Hannover
sechsundneunzig
Borussia bo:-'RU_si:-a:
Arminia
}}}

 3. Alternatively, load the 'WordList' from a MySQL table with 3 columns: id, word, and frequency (the word's frequency in the text corpus).
{{{
mysql> desc en_US_wordList;
+-----------+------------------+------+-----+---------+----------------+
| Field     | Type             | Null | Key | Default | Extra          |
+-----------+------------------+------+-----+---------+----------------+
| id        | int(11)          | NO   | PRI | NULL    | auto_increment |
| word      | varchar(255)     | NO   |     |         |                |
| frequency | int(10) unsigned | NO   |     |         |                |
+-----------+------------------+------+-----+---------+----------------+

mysql> SELECT * from en_US_wordList WHERE id <= 5;
+----+-------------+-----------+
| id | word        | frequency |
+----+-------------+-----------+
|  1 | treason     |        15 |
|  2 | indignation |         2 |
|  3 | Oilinvest   |         1 |
|  4 | helgu       |         1 |
|  5 | perentie    |         1 |
+----+-------------+-----------+
}}}

== How to run? ==

Run the following commands from a shell script:
{{{
export MARY_BASE="[PATH TO MARYBASE]"
java -Xmx1024m -classpath $MARY_BASE/java/:\
$MARY_BASE/java/swing-layout-1.0.jar:\
$MARY_BASE/java/mary-common.jar:\
$MARY_BASE/java/commons-lang-2.4.jar:\
$MARY_BASE/java/mwdumper-2008-04-13.jar:\
$MARY_BASE/java/log4j-1.2.15.jar:$MARY_BASE/java/weka.jar:\
$MARY_BASE/java/mysql-connector-java-5.1.7-bin.jar \
-Djava.endorsed.dirs=$MARY_BASE/lib/endorsed \
marytts.tools.transcription.TranscriptionGUI
}}}

or

{{{
export MARY_BASE="[PATH TO MARYBASE]"
java -Xmx1024m -jar $MARY_BASE/java/transcription-tool.jar
}}}

== How to load data? ==

You can load data once you have specified the phoneset (allophones.xml) file using the menu item 'specify phoneset'.

 1. To load data from a file (the file must be in one of the formats described above):
{{{
File -> Open
}}}

 2. To load data from a MySQL table:
{{{
File -> Load from MySQL Database
}}}

 This opens a panel (shown in the figure below) that asks for the MySQL database and table details:

[[Image(MySqlDetails.png)]]

 * You have to provide the host name, database name, and table name, as well as the database credentials (MySQL username and password).
 * The 'WordList' table must be in the format described above.

== How to predict pronunciation for new words? ==

Once you have transcribed some of the data, click the 'TrainPredict' button to predict transcriptions for the remaining words. You then have to correct the predicted transcriptions manually.

'''Note:''' The quality of the predictions from 'TrainPredict' depends on the amount of manually corrected data. The more manually corrected data you have, the better the predictions will be.

== Description of colors ==

 * Black - Manually written or manually corrected
 * Red - Transcription syntax is not permitted (e.g. the transcription written by the user is incorrect)
 * LightGray - Predicted by 'TrainPredict' (predicted based on the manual transcriptions)
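
The color scheme above can be read as a small decision rule. The following Java sketch is only an illustration and is not taken from the tool's source: the class `TranscriptionColors`, the enum `Origin`, and the method `colorFor` are hypothetical names, and the assumption that red takes precedence over the other two colors is ours. Whether a transcription is syntactically valid would in practice be decided against the phoneset; here it is simply passed in as a flag.

```java
import java.awt.Color;

public class TranscriptionColors {

    /** Hypothetical labels for where a transcription came from. */
    public enum Origin { MANUAL, PREDICTED }

    /**
     * Picks the display color for one transcription.
     * Assumption: an invalid transcription is shown in red
     * regardless of whether it was typed or predicted.
     */
    public static Color colorFor(Origin origin, boolean syntaxValid) {
        if (!syntaxValid) {
            return Color.RED;            // syntax not permitted
        }
        if (origin == Origin.MANUAL) {
            return Color.BLACK;          // manually written or corrected
        }
        return Color.LIGHT_GRAY;         // predicted by 'TrainPredict'
    }

    public static void main(String[] args) {
        // Print the color chosen for each combination of origin and validity.
        for (Origin origin : Origin.values()) {
            for (boolean valid : new boolean[] {true, false}) {
                System.out.println(origin + ", valid=" + valid
                        + " -> " + colorFor(origin, valid));
            }
        }
    }
}
```

In this reading, a predicted entry turns black as soon as the user corrects it, which matches the idea that only black entries count as manually verified training data.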