Version 9 (modified by sach01, 14 years ago)

Transcription Tool

The MARY Transcription Tool is a graphical user interface that supports a semi-automatic procedure for transcribing the text corpus of a new language and for automatically training letter-to-sound (LTS) rules for that language. It also stores all function words of the language in order to build a primitive part-of-speech (POS) tagger.

Requirements:

  1. Prepare a phoneset for your language

Example for locale en-US: http://mary.opendfki.de/wiki/TranscriptionTool/allophones_en-US.xml

  2. Acceptable input formats (input from file)

Example 1: List of words

Live
Item
Top
Eintracht
Spieltags
Hannover
sechsundneunzig
Borussia
Arminia

Example 2: List of words with transcriptions for a few of the words

Live 'laIf
Item
Top 'tOp
Eintracht '?aIn-tRaxt
Spieltags 'Spi:l-ta:ks
Hannover 
sechsundneunzig 
Borussia bo:-'RU_si:-a:
Arminia
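
When a file mixes transcribed and untranscribed words as in Example 2, it can help to check how much manual work is already in place before loading it. The following is a minimal sketch, not part of the tool: count_transcribed is a hypothetical helper, and it assumes that a word and its transcription are separated by whitespace and that the transcription itself contains no spaces.

```shell
# Hypothetical helper (not part of MARY): summarise a word list file.
# Assumes one entry per line: a word, optionally followed by whitespace
# and a transcription that contains no spaces.
count_transcribed() {
  awk '
    NF >= 2 { with++ }       # word plus transcription
    NF == 1 { without++ }    # word only (trailing spaces are ignored)
    END { printf "%d transcribed, %d untranscribed\n", with, without }
  ' "$1"
}
```

For the list in Example 2, saved under a hypothetical name such as wordlist.txt, calling count_transcribed wordlist.txt reports 5 transcribed and 4 untranscribed entries.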

  3. Another acceptable input format (input from a MySQL table)

The word list can also be loaded from a MySQL table with three columns: id, word, and frequency (the frequency of the word in the text corpus).

mysql> desc en_US_wordList;
+-----------+------------------+------+-----+---------+----------------+
| Field     | Type             | Null | Key | Default | Extra          |
+-----------+------------------+------+-----+---------+----------------+
| id        | int(11)          | NO   | PRI | NULL    | auto_increment | 
| word      | varchar(255)     | NO   |     |         |                | 
| frequency | int(10) unsigned | NO   |     |         |                | 
+-----------+------------------+------+-----+---------+----------------+

mysql> SELECT * from en_US_wordList WHERE id <= 5; 
+----+-------------+-----------+
| id | word        | frequency |
+----+-------------+-----------+
|  1 | treason     |        15 | 
|  2 | indignation |         2 | 
|  3 | Oilinvest   |         1 | 
|  4 | helgu       |         1 | 
|  5 | perentie    |         1 | 
+----+-------------+-----------+
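
The frequency column holds how often each word occurs in the text corpus. A minimal sketch of how such counts could be derived from a plain-text corpus with standard Unix tools follows. word_frequencies is a hypothetical helper, splitting on whitespace is an assumption (punctuation stays attached to words), and importing the result into MySQL is left out.

```shell
# Hypothetical helper (not part of MARY): print "word<TAB>frequency" for every
# distinct whitespace-separated token in a plain-text corpus file.
word_frequencies() {
  tr -s '[:space:]' '\n' < "$1" |   # one token per line
    grep -v '^$' |                  # drop an empty first line, if any
    sort | uniq -c |                # count identical tokens
    awk '{ print $2 "\t" $1 }'      # reorder to word, then count
}
```

The output is sorted by word, not by frequency; piping it through sort -k2,2nr would list the most frequent words first.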

How to run?

Run the commands below from a shell script:

export MARY_BASE="[PATH TO MARYBASE]"

java -Xmx1024m -classpath $MARY_BASE/java:$MARY_BASE/java/mary-common.jar:\
$MARY_BASE/java/log4j-1.2.8.jar:$MARY_BASE/java/weka.jar:\
$MARY_BASE/java/mysql-connector-java-5.1.7-bin.jar\
 -Djava.endorsed.dirs=$MARY_BASE/lib/endorsed\
 marytts.tools.transcription.TranscriptionGUI

or

export MARY_BASE="[PATH TO MARYBASE]"

java -Xmx1024m -jar $MARY_BASE/java/transcription-tool.jar
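
The first command depends on several jars under $MARY_BASE/java. As a sketch only, the check below lists which of the jars named in that classpath are absent. missing_jars is a hypothetical helper and not part of MARY; the jar names are simply those from the command above.

```shell
# Hypothetical pre-flight check (not part of MARY): print the name of each jar
# from the classpath that is not found under <MARY base>/java.
missing_jars() {
  for jar in mary-common.jar log4j-1.2.8.jar weka.jar \
             mysql-connector-java-5.1.7-bin.jar; do
    [ -f "$1/java/$jar" ] || echo "$jar"
  done
}
```

Calling missing_jars "$MARY_BASE" before starting the tool prints nothing when all four jars are in place.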

Instructions for Transcription Tool

  1. Specify the phoneset file (allophones.xml) using the menu item 'specify phoneset'.
  2. Load your data for transcription. There are two options: the 'Open' menu item, or loading from a MySQL database (a list of words stored as a table).
  3. Save your data to a location of your choice. The 'TrainPredict' button is then enabled.
  4. Once you have transcribed some of the data, click the 'TrainPredict' button to predict transcriptions for the remaining words. The predicted transcriptions have to be corrected manually.
  5. Description of colors:
    • Black - Manually written or manually corrected
    • Red - Transcription syntax is not permitted (e.g. a transcription written by the user is incorrect)
    • LightGray - Predicted by 'TrainPredict' (based on the manual transcriptions)
