Synthesis of vocalizations using MARY TTS
Listener vocalizations play an important role in communicating the listener's intentions while the interlocutor is talking. They include non-linguistic vocalizations such as uh-huh, mhm, (laughter), and (sigh), as well as verbal response tokens such as yes, right, really, and absolutely. To communicate different intentions, a synthesizer should be able to generate a broad range of vocalizations with different acoustic properties. In multimodal human-computer interaction, the ability to generate vocal listener behavior is an important requirement for affective interaction. This page provides examples of synthesizing vocalizations with the MARY speech synthesis framework.
The MARY framework supports synthesizing vocalizations through the following MaryXML (WORDS input type) requests:
1. Synthesis using a 'variant'
Example:
<maryxml version="0.5" xmlns="http://mary.dfki.de/2002/MaryXML" xml:lang="en-GB">
  <voice name="dfki-poppy">
    <p>
      <vocalization variant="14"/>
    </p>
  </voice>
</maryxml>
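A variant request like this one can also be assembled programmatically. The following is a minimal sketch in Python, not part of MARY TTS: the helper name build_variant_request is hypothetical, and the default voice and language simply mirror the example.

```python
from xml.sax.saxutils import quoteattr

MARYXML_NS = "http://mary.dfki.de/2002/MaryXML"


def build_variant_request(variant, voice="dfki-poppy", lang="en-GB"):
    """Return a MaryXML string that requests one vocalization by variant number.

    Hypothetical helper for illustration; it only builds the request text.
    """
    # quoteattr() escapes the value and adds the surrounding quotes.
    return (
        f'<maryxml version="0.5" xmlns="{MARYXML_NS}" xml:lang={quoteattr(lang)}>'
        f"<voice name={quoteattr(voice)}>"
        f"<p><vocalization variant={quoteattr(str(variant))}/></p>"
        "</voice>"
        "</maryxml>"
    )


if __name__ == "__main__":
    print(build_variant_request(14))
```

Building the string with quoteattr keeps attribute values escaped, so an unusual voice name cannot break the XML.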
2. Synthesis of the vocalization that best fits a given target
Example 2.1:
<maryxml version="0.5" xmlns="http://mary.dfki.de/2002/MaryXML" xml:lang="en-GB">
  <voice name="dfki-poppy">
    <p>
      <vocalization name="yeah" meaning="uncertain" intonation="falling" voicequality="modal"/>
    </p>
  </voice>
</maryxml>
Example 2.2:
<maryxml version="0.5" xmlns="http://mary.dfki.de/2002/MaryXML" xml:lang="en-GB">
  <voice name="dfki-poppy">
    <p>
      <vocalization name="yeah" meaning="agreeing" intonation="mid" voicequality="modal"/>
    </p>
  </voice>
</maryxml>
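Examples 2.1 and 2.2 differ only in their target attributes, so both can be generated from one template. The sketch below is a hedged illustration in Python; build_target_request is a hypothetical helper, not a MARY TTS function. Whether the server accepts a subset of the four attributes is not stated here, so the sketch always sends all of them.

```python
from xml.sax.saxutils import quoteattr

MARYXML_NS = "http://mary.dfki.de/2002/MaryXML"


def build_target_request(name, meaning, intonation, voicequality,
                         voice="dfki-poppy", lang="en-GB"):
    """Return a MaryXML string describing a target vocalization.

    Hypothetical helper for illustration; it only builds the request text.
    """
    target = {
        "name": name,
        "meaning": meaning,
        "intonation": intonation,
        "voicequality": voicequality,
    }
    # Render the target as escaped XML attributes, in the order given above.
    attrs = " ".join(f"{key}={quoteattr(value)}" for key, value in target.items())
    return (
        f'<maryxml version="0.5" xmlns="{MARYXML_NS}" xml:lang={quoteattr(lang)}>'
        f"<voice name={quoteattr(voice)}>"
        f"<p><vocalization {attrs}/></p>"
        "</voice>"
        "</maryxml>"
    )


if __name__ == "__main__":
    # The targets of Example 2.1 and Example 2.2.
    print(build_target_request("yeah", "uncertain", "falling", "modal"))
    print(build_target_request("yeah", "agreeing", "mid", "modal"))
```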
Possible values currently supported for each of the attributes of the <vocalization> element in MaryXML:
Attribute      Possible values
meaning        anger, sadness, amusement, happiness, contempt, certain, uncertain, agreeing, disagreeing, interested, uninterested, low-anticipation, high-anticipation, low-solidarity, high-solidarity, low-antagonism, high-antagonism
intonation     rising, falling, high, mid, low
voicequality   modal, creaky, whispery, breathy, tense, lax
name           yeah, yes, mhmh, mhm, right, tsright, tsyeah, aha, (snort), (sigh), (laughter), definitely, really, gosh, ah_I_see, oh_god_(gasp), yeah_absolutely
The values of the name attribute are voice-specific; see the interactive documentation.
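The attribute values listed above can be checked on the client side before a request is sent. The following Python sketch is an illustration only: the SUPPORTED table is copied from the list on this page, find_unsupported is a hypothetical helper, and name is deliberately left unchecked because its values depend on the voice.

```python
# Values copied from the attribute list on this page; "name" is omitted
# because its values are voice-specific.
SUPPORTED = {
    "meaning": {
        "anger", "sadness", "amusement", "happiness", "contempt",
        "certain", "uncertain", "agreeing", "disagreeing",
        "interested", "uninterested",
        "low-anticipation", "high-anticipation",
        "low-solidarity", "high-solidarity",
        "low-antagonism", "high-antagonism",
    },
    "intonation": {"rising", "falling", "high", "mid", "low"},
    "voicequality": {"modal", "creaky", "whispery", "breathy", "tense", "lax"},
}


def find_unsupported(attrs):
    """Return the attributes whose values are not in the list above.

    Hypothetical helper; attributes not covered by SUPPORTED are ignored.
    """
    return {
        key: value
        for key, value in attrs.items()
        if key in SUPPORTED and value not in SUPPORTED[key]
    }


if __name__ == "__main__":
    target = {"name": "yeah", "meaning": "uncertain",
              "intonation": "falling", "voicequality": "modal"}
    print(find_unsupported(target))
```

An empty result means every checked attribute uses a listed value; it does not guarantee that the chosen voice provides a matching vocalization.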
See also the interactive documentation at http://mary.dfki.de:59125/documentation.html#vocalizations
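The requests on this page can also be sent to a MARY server from a script. The Python sketch below rests on assumptions that should be checked against the server's own documentation: that the server exposes an HTTP /process endpoint accepting the INPUT_TEXT, INPUT_TYPE, OUTPUT_TYPE, AUDIO, and LOCALE parameters, and that a local server listens on port 59125. The function names build_process_url and synthesize_to_file are hypothetical.

```python
from urllib.parse import urlencode
from urllib.request import urlopen


def build_process_url(maryxml, base_url="http://localhost:59125", locale="en_GB"):
    """Build a synthesis URL for a MaryXML (WORDS) request.

    Assumption: the endpoint path and parameter names match the server's
    HTTP interface; verify them before relying on this.
    """
    params = {
        "INPUT_TEXT": maryxml,
        "INPUT_TYPE": "WORDS",      # input type named on this page
        "OUTPUT_TYPE": "AUDIO",
        "AUDIO": "WAVE_FILE",
        "LOCALE": locale,
    }
    return f"{base_url}/process?{urlencode(params)}"


def synthesize_to_file(maryxml, path, **kwargs):
    """Fetch the synthesized audio and write it to path (needs a running server)."""
    with urlopen(build_process_url(maryxml, **kwargs)) as response:
        audio = response.read()
    with open(path, "wb") as out:
        out.write(audio)
```

Calling synthesize_to_file with one of the MaryXML documents above and an output path such as vocalization.wav would store the audio locally, provided a server with the requested voice is running.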