Changes between Version 11 and Version 12 of NewLanguageSupport


Timestamp:
07/02/09 19:21:43
Author:
masc01
Comment:

updated feature maker section slightly.

  • NewLanguageSupport

    v11 v12  
    247247== 5. Run feature maker with the minimal nlp components == 
    248248 
    249 The '''FeatureMakerServer''' program splits the clean text obtained in step 2 into sentences, classify them as reliable, or non-reliable (sentences with unknownWords or strangeSymbols) and extracts context features from the reliable sentences. All this extracted data will be  
     249The '''FeatureMaker''' program splits the clean text obtained in step 2 into sentences, classify them as reliable, or non-reliable (sentences with unknownWords or strangeSymbols) and extracts context features from the reliable sentences. All this extracted data will be  
    250250kept in the DB.[[BR]] 
    251251 
     
    260260# just the not processed records. 
    261261 
    262 #Usage: java FeatureMakerMaryServer -locale language -mysqlHost host -mysqlUser user 
     262#Usage: java FeatureMaker -locale language -mysqlHost host -mysqlUser user 
    263263#                 -mysqlPasswd passwd -mysqlDB wikiDB 
    264 #                 [-maryHost localhost -maryPort 59125 -strictCredibility strict] 
     264#                 [-reliability strict] 
    265265#                 [-featuresForSelection phoneme,next_phoneme,selection_prosody] 
    266266# 
    267267#  required: This program requires a MARY server running and an already created cleanText table in the DB.  
    268268#            The cleanText table can be created with the WikipediaProcess program.  
    269 #  default/optional: [-maryHost localhost -maryPort 59125] 
    270 #  default/optional: [-featuresForSelection phoneme,next_phoneme,selection_prosody] (features separated by ,)  
    271 #  optional: [-strictCredibility [strict|lax]] 
    272 # 
    273 #  -strictCredibility: setting that determines what kind of sentences  
    274 #  are regarded as credible. There are two settings: strict and lax. With  
    275 #  setting strict (default), only those sentences that contain words in the lexicon  
    276 #  or words that were transcribed by the preprocessor are regarded as credible;  
    277 #  the other sentences as unreliable. With setting lax, also those words that  
    278 #  are transcribed with the Denglish and the compound module are regarded as credible.  
     269#  default/optional: [-featuresForSelection phone,next_phone,selection_prosody] (features separated by ,)  
     270#  optional: [-reliability [strict|lax]] 
     271# 
     272#  -reliability: setting that determines what kind of sentences  
     273#  are regarded as reliable. There are two settings: strict and lax. With  
     274#  setting strict, only those sentences that contain words in the lexicon 
     275#  or words that were transcribed by the preprocessor can be selected for the synthesis script; 
     276#  the other sentences as unreliable. With setting lax (default), also those words that 
     277#  are transcribed with the letter to sound component can be selected. 
    279278 
    280279 
     
    283282 
    284283java -Xmx1000m -classpath $CLASSPATH -Djava.endorsed.dirs=$MARY_BASE/lib/endorsed \ 
    285 -Dmary.base=$MARY_BASE marytts.tools.dbselection.FeatureMakerMaryServer \ 
     284-Dmary.base=$MARY_BASE marytts.tools.dbselection.FeatureMaker \ 
    286285-locale "en_US" \ 
    287286-mysqlHost "localhost" \ 
     
    289288-mysqlPasswd "wiki123" \ 
    290289-mysqlDB "wiki" \ 
    291 -featuresForSelection "phoneme,next_phoneme,selection_prosody"  
    292  
    293 }}} 
    294  
     290-featuresForSelection "phone,next_phone,selection_prosody"  
     291 
     292}}} 
     293 
     294There is a variant of the program, '''FeatureMakerMaryServer''', which calls an external Mary server instead of starting the Mary components internally. It takes the additional command line arguments ''-maryHost localhost -maryPort 59125''. 
    295295 
    296296Output: