VocalizationSynthesis – MARY

Version 9 (modified by sach01, 15 years ago)

Synthesis of vocalizations using MARY TTS

Listener vocalizations play an important role in communicating the listener's intentions while the interlocutor is talking. They include non-linguistic vocalizations such as uh-huh, mhm, (laughter), and (sigh), as well as verbal response tokens such as yes, right, really, and absolutely. To communicate different intentions, a synthesizer should be able to generate a broad range of vocalizations with different acoustic properties. In multimodal human-computer interaction, the ability of a system to generate vocal listener behavior is an important requirement for affective interaction. This page provides examples of synthesizing vocalizations with the MARY speech synthesis framework.

The MARY framework can synthesize vocalizations from the following MaryXML requests (input type WORDS).

1. Synthesis using a 'variant'

Example:

<maryxml version="0.5" xmlns="http://mary.dfki.de/2002/MaryXML" xml:lang="en-GB">
<voice name="dfki-poppy">
<p>
<vocalization variant="14"/>
</p>
</voice>
</maryxml>

2. Synthesis of the vocalization that best fits a given target

Example 2.1:

<maryxml version="0.5" xmlns="http://mary.dfki.de/2002/MaryXML" xml:lang="en-GB">
<voice name="dfki-poppy">
<p>
<vocalization name="yeah" meaning="uncertain" intonation="falling" voicequality="modal"/>
</p>
</voice>
</maryxml>

Example 2.2:

<maryxml version="0.5" xmlns="http://mary.dfki.de/2002/MaryXML" xml:lang="en-GB">
<voice name="dfki-poppy">
<p>
<vocalization name="yeah" meaning="agreeing" intonation="mid" voicequality="modal"/>
</p>
</voice>
</maryxml>
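
Requests like the examples above can also be generated programmatically. The following is a minimal sketch using only the Python standard library; the function name build_vocalization_request and its parameters are hypothetical and not part of MARY. The element names, attribute names, and namespace are copied from the examples on this page.

```python
import xml.etree.ElementTree as ET

MARYXML_NS = "http://mary.dfki.de/2002/MaryXML"
XML_NS = "http://www.w3.org/XML/1998/namespace"

# Serialize MaryXML as the default namespace (no prefix on element names).
# Note: this changes ElementTree's global prefix table for this process.
ET.register_namespace("", MARYXML_NS)


def _tag(name):
    """Qualify an element name with the MaryXML namespace."""
    return f"{{{MARYXML_NS}}}{name}"


def build_vocalization_request(voice, lang="en-GB", **attributes):
    """Return a MaryXML string containing a single <vocalization> element.

    The keyword arguments are copied onto <vocalization> unchanged,
    e.g. variant="14", or name="yeah", meaning="agreeing".
    (Hypothetical helper; it does not check the attribute values.)
    """
    root = ET.Element(_tag("maryxml"), {"version": "0.5", f"{{{XML_NS}}}lang": lang})
    voice_el = ET.SubElement(root, _tag("voice"), {"name": voice})
    paragraph = ET.SubElement(voice_el, _tag("p"))
    ET.SubElement(paragraph, _tag("vocalization"),
                  {key: str(value) for key, value in attributes.items()})
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Same content as Example 2.2, written on a single line.
    print(build_vocalization_request(
        "dfki-poppy", name="yeah", meaning="agreeing",
        intonation="mid", voicequality="modal"))
```

Calling build_vocalization_request("dfki-poppy", variant=14) would produce the variant-based request from section 1 in the same way.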

Possible values currently supported for each of the attributes of the <vocalization> element in MaryXML:

Attribute     Possible values
meaning       anger, sadness, amusement, happiness, contempt, certain, uncertain, agreeing, disagreeing, interested, uninterested, low-anticipation, high-anticipation, low-solidarity, high-solidarity, low-antagonism, high-antagonism
intonation    rising, falling, high, mid, low
voicequality  modal, creaky, whispery, breathy, tense, lax
name          yeah, yes, mhmh, mhm, right, tsright, tsyeah, aha, (snort), (sigh), (laughter), definitely, really, gosh, ah_I_see, oh_god_(gasp), yeah_absolutely

The values of the name attribute are voice-specific; see the interactive documentation.
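
Before sending a request, it may help to check the target attributes against the values listed above. The following is a minimal sketch; the names ALLOWED_VALUES and invalid_attributes are hypothetical. The value sets are copied from the table on this page, and name is deliberately left unchecked because its values depend on the voice.

```python
# Values copied from the attribute table on this page.
# "name" is omitted because its values are voice-specific.
ALLOWED_VALUES = {
    "meaning": {
        "anger", "sadness", "amusement", "happiness", "contempt",
        "certain", "uncertain", "agreeing", "disagreeing",
        "interested", "uninterested",
        "low-anticipation", "high-anticipation",
        "low-solidarity", "high-solidarity",
        "low-antagonism", "high-antagonism",
    },
    "intonation": {"rising", "falling", "high", "mid", "low"},
    "voicequality": {"modal", "creaky", "whispery", "breathy", "tense", "lax"},
}


def invalid_attributes(attributes):
    """Return the subset of `attributes` whose value is not in the table.

    Attributes that the table does not cover (such as "name") are ignored.
    """
    return {
        key: value
        for key, value in attributes.items()
        if key in ALLOWED_VALUES and value not in ALLOWED_VALUES[key]
    }


if __name__ == "__main__":
    target = {"name": "yeah", "meaning": "uncertain",
              "intonation": "falling", "voicequality": "modal"}
    problems = invalid_attributes(target)
    print("ok" if not problems else f"unsupported values: {problems}")
```

An empty result only means that the values appear in the table; whether a given voice can realize the requested combination is a separate question that this check does not answer.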

See also the interactive documentation at http://mary.dfki.de:59125/documentation.html#vocalizations
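
A request can be sent to a running MARY server over HTTP. The sketch below rests on assumptions that should be checked against the installed MARY version: that the server listens on port 59125 (as in the documentation URL above), that it exposes a /process endpoint accepting form-encoded POST data, and that the parameters are named INPUT_TEXT, INPUT_TYPE, OUTPUT_TYPE, AUDIO, and LOCALE. The function names, the localhost base URL, and the locale string are hypothetical.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def make_process_request(maryxml, base_url="http://localhost:59125", locale="en_GB"):
    """Build (but do not send) a POST request for the assumed /process endpoint.

    The parameter names below are assumptions about the MARY HTTP interface.
    """
    form = {
        "INPUT_TEXT": maryxml,
        "INPUT_TYPE": "WORDS",    # the input type named on this page
        "OUTPUT_TYPE": "AUDIO",
        "AUDIO": "WAVE_FILE",
        "LOCALE": locale,         # assumed locale format
    }
    data = urlencode(form).encode("utf-8")
    return Request(f"{base_url}/process", data=data, method="POST")


def synthesize(maryxml, out_path, **kwargs):
    """Send the request and write the returned audio bytes to out_path."""
    with urlopen(make_process_request(maryxml, **kwargs)) as response:
        audio = response.read()
    with open(out_path, "wb") as out:
        out.write(audio)


if __name__ == "__main__":
    # Example 1 from this page, written on a single line.
    EXAMPLE = (
        '<maryxml version="0.5" xmlns="http://mary.dfki.de/2002/MaryXML" xml:lang="en-GB">'
        '<voice name="dfki-poppy"><p><vocalization variant="14"/></p></voice></maryxml>'
    )
    # Requires a running MARY server with the dfki-poppy voice installed.
    synthesize(EXAMPLE, "vocalization.wav")
```

Building the request separately from sending it makes it possible to inspect the URL and form data without a server.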
