Version 9 (modified by sach01, 15 years ago)
Transcription Tool
The MARY Transcription Tool is a graphical user interface that supports a semi-automatic procedure for transcribing a text corpus in a new language and for automatically training letter-to-sound (LTS) rules for that language. It also stores the function words of that language, which are used to build a primitive part-of-speech (POS) tagger.
Requirements:
- Prepare a phone set for your language
Example for locale en-US: http://mary.opendfki.de/wiki/TranscriptionTool/allophones_en-US.xml
- Acceptable input formats (input from file)
Example 1: List of words
    Live
    Item
    Top
    Eintracht
    Spieltags
    Hannover
    sechsundneunzig
    Borussia
    Arminia
Example 2: List of words, with transcriptions for a few of them
    Live 'laIf
    Item
    Top 'tOp
    Eintracht '?aIn-tRaxt
    Spieltags 'Spi:l-ta:ks
    Hannover
    sechsundneunzig
    Borussia bo:-'RU_si:-a:
    Arminia
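The file format above can be illustrated with a small parsing sketch. It assumes one entry per line, with the word first and an optional transcription separated from it by whitespace; the function name `parse_word_list` is hypothetical and not part of MARY.

```python
from typing import List, Optional, Tuple


def parse_word_list(text: str) -> List[Tuple[str, Optional[str]]]:
    """Parse entries of the form 'word' or 'word transcription', one per line.

    Assumption: fields are separated by whitespace and a transcription
    contains no spaces.
    """
    entries = []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        word = fields[0]
        transcription = fields[1] if len(fields) > 1 else None
        entries.append((word, transcription))
    return entries


sample = "Live 'laIf\nItem\nTop 'tOp\nHannover\n"
entries = parse_word_list(sample)
# Words that still need a transcription
untranscribed = [word for word, transcription in entries if transcription is None]
print(untranscribed)
```

Words without a transcription come back with `None`, which mirrors the split between manually transcribed entries and entries left for prediction.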
- Another acceptable input format (input from database)
The word list can be loaded from a MySQL table with three columns: ID, word, and frequency (the frequency of the word in the text corpus).
    mysql> desc en_US_wordList;
    +-----------+------------------+------+-----+---------+----------------+
    | Field     | Type             | Null | Key | Default | Extra          |
    +-----------+------------------+------+-----+---------+----------------+
    | id        | int(11)          | NO   | PRI | NULL    | auto_increment |
    | word      | varchar(255)     | NO   |     |         |                |
    | frequency | int(10) unsigned | NO   |     |         |                |
    +-----------+------------------+------+-----+---------+----------------+

    mysql> SELECT * from en_US_wordList WHERE id <= 5;
    +----+-------------+-----------+
    | id | word        | frequency |
    +----+-------------+-----------+
    |  1 | treason     |        15 |
    |  2 | indignation |         2 |
    |  3 | Oilinvest   |         1 |
    |  4 | helgu       |         1 |
    |  5 | perentie    |         1 |
    +----+-------------+-----------+
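As a rough illustration of how rows for such a table might be derived from a text corpus, the sketch below counts word frequencies and emits `(id, word, frequency)` tuples. The tokenization (runs of letters, case preserved), the ordering, and the function name `word_frequency_rows` are assumptions made for this example; the source does not say how the table was actually populated.

```python
import re
from collections import Counter
from typing import List, Tuple


def word_frequency_rows(corpus: str) -> List[Tuple[int, str, int]]:
    """Return (id, word, frequency) rows for a corpus.

    Assumptions: a word is a run of letters, case is preserved, and rows
    are ordered by descending frequency, then alphabetically.
    """
    words = re.findall(r"[^\W\d_]+", corpus)  # runs of Unicode letters
    counts = Counter(words)
    ordered = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [(i, word, freq) for i, (word, freq) in enumerate(ordered, start=1)]


rows = word_frequency_rows("treason is treason, and indignation follows treason")
for row in rows:
    print(row)
```

The resulting tuples match the three-column layout of the table and could be inserted with any MySQL client.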
How to run?
Run the following commands from a shell script:
    export MARY_BASE="[PATH TO MARYBASE]"
    java -Xmx1024m -classpath $MARY_BASE/java:$MARY_BASE/java/mary-common.jar:\
    $MARY_BASE/java/log4j-1.2.8.jar:$MARY_BASE/java/weka.jar:\
    $MARY_BASE/java/mysql-connector-java-5.1.7-bin.jar \
    -Djava.endorsed.dirs=$MARY_BASE/lib/endorsed \
    marytts.tools.transcription.TranscriptionGUI
or
    export MARY_BASE="[PATH TO MARYBASE]"
    java -Xmx1024m -jar java/transcription-tool.jar
Instructions for Transcription Tool
- Specify the phone set file (allophones.xml) using the menu item 'specify phoneset'.
- Load your data for transcription. There are two options:
  1. Open (menu item)
  2. Load from a MySQL database (a list of words stored as a table)
- Save your data to a specified location. The 'TrainPredict' button is then enabled.
- Once you have transcribed some of the data, click the 'TrainPredict' button to predict transcriptions for the remaining words. You then have to correct the predicted transcriptions manually.
- Description of colors:
- Black - Manually written or manually corrected
- Red - Transcription syntax is not permitted (e.g., a user-written transcription is incorrect)
- LightGray - Predicted by 'TrainPredict' (prediction based on the manual transcriptions)
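The color scheme above can be sketched as a small classification function. This is only an illustration under stated assumptions, not the tool's actual logic: a transcription counts as syntactically valid if, after removing the stress markers (`'`, `,`) and syllable boundaries (`-`), it can be segmented completely into symbols of the phone set. The phone set `PHONES` is a tiny hypothetical subset, and the names `is_valid_transcription` and `display_color` are invented for this example.

```python
from typing import Set

# Hypothetical subset of a phone set; a real one comes from allophones.xml.
PHONES = {"?", "aI", "a", "I", "n", "t", "R", "x", "l", "f", "O", "p"}


def is_valid_transcription(transcription: str, phones: Set[str]) -> bool:
    """Check whether the transcription splits completely into known phones."""
    symbols = transcription.replace("'", "").replace(",", "").replace("-", "")
    if not symbols:
        return False
    # reachable[i] is True if symbols[:i] can be segmented into phones
    reachable = [True] + [False] * len(symbols)
    for end in range(1, len(symbols) + 1):
        for phone in phones:
            start = end - len(phone)
            if start >= 0 and reachable[start] and symbols[start:end] == phone:
                reachable[end] = True
                break
    return reachable[-1]


def display_color(transcription: str, predicted: bool, phones: Set[str]) -> str:
    """Map an entry to a display color (assumed priority: Red first)."""
    if not is_valid_transcription(transcription, phones):
        return "Red"
    return "LightGray" if predicted else "Black"


print(display_color("'laIf", False, PHONES))
print(display_color("'laIq", False, PHONES))
print(display_color("'?aIn-tRaxt", True, PHONES))
```

Segmentation uses dynamic programming rather than greedy longest match, so multi-character phones such as `aI` and their single-character prefixes can coexist in the phone set without causing false rejections.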