Version 13 (modified by sach01, 15 years ago)
Transcription Tool
The MARY Transcription Tool is a graphical user interface that supports a semi-automatic procedure for transcribing the text corpus of a new language and for automatically training letter-to-sound (LTS) rules for that language. It also stores all function words of that language so that a primitive POS tagger can be built.
Requirements:
- Prepare a phoneset for your language
Example for locale en-US: http://mary.opendfki.de/wiki/TranscriptionTool/allophones_en-US.xml
- Acceptable input formats (input from file)
Example 1: List of words
Live
Item
Top
Eintracht
Spieltags
Hannover
sechsundneunzig
Borussia
Arminia
Example 2: List of words, with transcriptions for a few of them
Live 'laIf
Item
Top 'tOp
Eintracht '?aIn-tRaxt
Spieltags 'Spi:l-ta:ks
Hannover
sechsundneunzig
Borussia bo:-'RU_si:-a:
Arminia
- Another acceptable input format:
The 'WordList' can also be loaded from a MySQL table with three columns: ID, word, and frequency (the frequency of the word in the text corpus).
mysql> desc en_US_wordList;
+-----------+------------------+------+-----+---------+----------------+
| Field     | Type             | Null | Key | Default | Extra          |
+-----------+------------------+------+-----+---------+----------------+
| id        | int(11)          | NO   | PRI | NULL    | auto_increment |
| word      | varchar(255)     | NO   |     |         |                |
| frequency | int(10) unsigned | NO   |     |         |                |
+-----------+------------------+------+-----+---------+----------------+

mysql> SELECT * from en_US_wordList WHERE id <= 5;
+----+-------------+-----------+
| id | word        | frequency |
+----+-------------+-----------+
|  1 | treason     |        15 |
|  2 | indignation |         2 |
|  3 | Oilinvest   |         1 |
|  4 | helgu       |         1 |
|  5 | perentie    |         1 |
+----+-------------+-----------+
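To make the file-based formats concrete, here is a minimal Python sketch of how such a word list could be read. It assumes one entry per line, with the word as the first whitespace-separated token and an optional transcription as the rest of the line. `parse_word_list` is a hypothetical helper for illustration and is not part of MARY.

```python
from typing import Iterable, List, Optional, Tuple

Entry = Tuple[str, Optional[str]]


def parse_word_list(lines: Iterable[str]) -> List[Entry]:
    """Parse lines of the form 'word' or 'word transcription'.

    Assumption (not confirmed by the tool itself): one entry per line,
    fields separated by whitespace, blank lines ignored.
    """
    entries: List[Entry] = []
    for line in lines:
        fields = line.split(maxsplit=1)
        if not fields:
            continue  # skip blank lines
        word = fields[0]
        # Words without a transcription are left for manual entry or prediction.
        transcription = fields[1].strip() if len(fields) > 1 else None
        entries.append((word, transcription))
    return entries


sample = [
    "Live 'laIf",
    "Item",
    "Top 'tOp",
    "Eintracht '?aIn-tRaxt",
    "Hannover",
]
entries = parse_word_list(sample)
untranscribed = [word for word, transcription in entries if transcription is None]
print(untranscribed)
```

Separating the untranscribed words this way mirrors the split between entries that already carry a manual transcription and those that still need one.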
How to run?
Run the commands below from a shell script:
export MARY_BASE="[PATH TO MARYBASE]"
java -Xmx1024m -classpath $MARY_BASE/java:$MARY_BASE/java/mary-common.jar:\
$MARY_BASE/java/log4j-1.2.8.jar:$MARY_BASE/java/weka.jar:\
$MARY_BASE/java/mysql-connector-java-5.1.7-bin.jar \
-Djava.endorsed.dirs=$MARY_BASE/lib/endorsed \
marytts.tools.transcription.TranscriptionGUI
or
export MARY_BASE="[PATH TO MARYBASE]"
java -Xmx1024m -jar java/transcription-tool.jar
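When the explicit classpath is used, a missing JAR only shows up as a class-loading error at startup. The following Python sketch checks for the JAR files named in that command before launching. The helper names `missing_jars` and `build_classpath` are hypothetical, and the assumption that all JARs live directly under `$MARY_BASE/java` is taken from the command itself.

```python
import os
import tempfile
from typing import List

# JAR names copied from the explicit-classpath command in this section.
REQUIRED_JARS = [
    "mary-common.jar",
    "log4j-1.2.8.jar",
    "weka.jar",
    "mysql-connector-java-5.1.7-bin.jar",
]


def missing_jars(mary_base: str) -> List[str]:
    """Return the required JARs that are absent from <mary_base>/java."""
    java_dir = os.path.join(mary_base, "java")
    return [jar for jar in REQUIRED_JARS
            if not os.path.isfile(os.path.join(java_dir, jar))]


def build_classpath(mary_base: str) -> str:
    """Assemble the classpath string: the java directory plus each JAR."""
    java_dir = os.path.join(mary_base, "java")
    parts = [java_dir] + [os.path.join(java_dir, jar) for jar in REQUIRED_JARS]
    return os.pathsep.join(parts)


# Demonstration against a throwaway directory that contains only one JAR.
with tempfile.TemporaryDirectory() as base:
    os.makedirs(os.path.join(base, "java"))
    open(os.path.join(base, "java", "weka.jar"), "w").close()
    demo_missing = missing_jars(base)
    demo_classpath = build_classpath(base)

print(demo_missing)
```

In real use the function would be called with the value of the `MARY_BASE` environment variable instead of a temporary directory.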
How to load data?
Once you have specified the phoneset file (allophones.xml) using the menu item 'specify phoneset', you can load data.
- To load data from a file (the file must be in one of the formats specified above):
File -> Open
- To load data from MySQL table:
File -> Load from MySQL Database
This opens a panel (shown in the figure below) that asks for the MySQL database and table details:
- Provide the host name, database name, and table name, as well as the database credentials (MySQL username and password).
- The 'WordList' table must be in the format specified above.
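If no such table exists yet, its rows could be derived from a text corpus along the following lines. This is only a sketch: the tokenization rule (any run of letters is a word) and the helper name `word_frequencies` are assumptions, and the `CREATE TABLE` statement is inferred from the `desc` output shown under Requirements rather than taken from the tool.

```python
import re
from collections import Counter
from typing import List, Tuple

# Schema inferred from the `desc en_US_wordList` output (an assumption,
# not the tool's own DDL).
CREATE_TABLE_SQL = """
CREATE TABLE en_US_wordList (
    id INT(11) NOT NULL AUTO_INCREMENT,
    word VARCHAR(255) NOT NULL,
    frequency INT(10) UNSIGNED NOT NULL,
    PRIMARY KEY (id)
)
"""

# Parameterized statement in the DB-API 'format' style commonly used by
# Python MySQL drivers; the id column is filled by AUTO_INCREMENT.
INSERT_SQL = "INSERT INTO en_US_wordList (word, frequency) VALUES (%s, %s)"


def word_frequencies(text: str) -> List[Tuple[str, int]]:
    """Count words in `text`, most frequent first.

    Tokenization is deliberately naive: any run of Unicode letters is a word.
    """
    words = re.findall(r"[^\W\d_]+", text)
    return Counter(words).most_common()


rows = word_frequencies("treason and indignation and treason")
print(rows)
```

With a DB-API compatible MySQL driver, the resulting pairs could be passed to `cursor.executemany(INSERT_SQL, rows)` after the table has been created.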
How to predict pronunciation for new words?
Once you have transcribed some of the data, click the 'TrainPredict' button to predict transcriptions for the remaining words. You then have to correct the predicted transcriptions manually.
Note: The quality of the 'TrainPredict' predictions depends on the amount of manually corrected data. The more manually corrected data you have, the better the predictions are likely to be.
Description of colors
- Black - Manually written or manually corrected
- Red - The transcription syntax is not permitted (e.g. the user-written transcription is incorrect)
- LightGray - Predicted by 'TrainPredict' (prediction based on the manual transcriptions)
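The red state can be thought of as a transcription that cannot be read as a sequence of symbols from the phoneset. The check below is a rough Python sketch of that idea, not the tool's actual validation: the phone inventory is a small hypothetical subset, and treating `'` and `-` as stress and syllable-boundary markers is an assumption based on the example transcriptions under Requirements.

```python
from functools import lru_cache
from typing import FrozenSet

# Hypothetical subset of a phoneset; a real one would come from allophones.xml.
PHONES: FrozenSet[str] = frozenset({"l", "aI", "a", "I", "f", "t", "O", "p"})
MARKERS = "'-"  # assumed stress and syllable-boundary markers


def is_valid_transcription(transcription: str,
                           phones: FrozenSet[str] = PHONES) -> bool:
    """Return True if the transcription splits completely into known phones."""
    symbols = "".join(ch for ch in transcription if ch not in MARKERS)
    if not symbols:
        return False
    longest = max(map(len, phones), default=0)

    @lru_cache(maxsize=None)
    def splits_from(start: int) -> bool:
        if start == len(symbols):
            return True
        # Try every phone that matches at this position, longest first,
        # and backtrack if the remainder cannot be split.
        for size in range(min(longest, len(symbols) - start), 0, -1):
            if symbols[start:start + size] in phones and splits_from(start + size):
                return True
        return False

    return splits_from(0)


print(is_valid_transcription("'laIf"))
print(is_valid_transcription("'lXf"))
```

Backtracking is used instead of a purely greedy longest match so that a phoneset containing both a diphthong and its component vowels does not cause a valid transcription to be rejected.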
Attachments (2)
- MySqlDetails.png (12.8 KB) - added by sach01 15 years ago.
- Transcription-Tool.png (73.7 KB) - added by sach01 15 years ago.