
Synthesis of vocalizations using MARY TTS

Listener vocalizations play an important role in communicating listener intentions while the interlocutor is talking. They include non-linguistic vocalizations such as uh-huh, mhm, (laughter), and (sigh), as well as verbal response tokens such as yes, right, really, and absolutely. To communicate different intentions, a synthesizer should be capable of generating a broad range of vocalizations with different acoustic properties. In multimodal human-computer interaction, the ability of a system to generate vocal listener behavior is an important requirement for affective interaction. This page provides examples of how to synthesize vocalizations with the MARY speech synthesis framework.

MARY supports the synthesis of vocalizations through MARYXML requests of input type WORDS, as in the following examples.

1. Synthesis of a specific vocalization using the 'variant' attribute

Example:

<maryxml version="0.5" xmlns="http://mary.dfki.de/2002/MaryXML" xml:lang="en-GB">
  <voice name="dfki-poppy">
    <p>
      <vocalization variant="14"/>
    </p>
  </voice>
</maryxml>
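
The MARYXML above can be sent to a running MARY server over its HTTP interface, and the server replies with the synthesized audio. The following Java sketch illustrates one possible way to do this; it assumes a MARY server listening on localhost:59125 whose /process resource accepts the form parameters INPUT_TEXT, INPUT_TYPE, OUTPUT_TYPE, AUDIO, LOCALE and VOICE. Check the server documentation linked at the bottom of this page for the exact request format.

import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;

public class VocalizationRequest {

    // MARYXML document from example 1: request vocalization variant 14 of dfki-poppy.
    static final String MARYXML =
        "<maryxml version=\"0.5\" xmlns=\"http://mary.dfki.de/2002/MaryXML\" xml:lang=\"en-GB\">"
      + "<voice name=\"dfki-poppy\"><p><vocalization variant=\"14\"/></p></voice></maryxml>";

    public static void main(String[] args) throws Exception {
        // Form-encoded request body; parameter names assume the standard MARY HTTP interface.
        String body = "INPUT_TYPE=WORDS"
            + "&OUTPUT_TYPE=AUDIO"
            + "&AUDIO=WAVE_FILE"
            + "&LOCALE=en_GB"
            + "&VOICE=dfki-poppy"
            + "&INPUT_TEXT=" + URLEncoder.encode(MARYXML, StandardCharsets.UTF_8);

        HttpRequest request = HttpRequest.newBuilder()
            .uri(URI.create("http://localhost:59125/process"))   // assumed local MARY server
            .header("Content-Type", "application/x-www-form-urlencoded")
            .POST(HttpRequest.BodyPublishers.ofString(body))
            .build();

        // Write the returned WAVE audio to a file.
        HttpClient.newHttpClient()
            .send(request, HttpResponse.BodyHandlers.ofFile(Path.of("vocalization.wav")));
    }
}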

2. Synthesis of the vocalization that best fits a given target

Example 2.1:

<maryxml version="0.5" xmlns="http://mary.dfki.de/2002/MaryXML" xml:lang="en-GB">
  <voice name="dfki-poppy">
    <p>
      <vocalization name="yeah" meaning="uncertain" intonation="falling" voicequality="modal"/>
    </p>
  </voice>
</maryxml>

Example 2.2:

<maryxml version="0.5" xmlns="http://mary.dfki.de/2002/MaryXML" xml:lang="en-GB">
  <voice name="dfki-poppy">
    <p>
      <vocalization name="yeah" meaning="agreeing" intonation="mid" voicequality="modal"/>
    </p>
  </voice>
</maryxml>
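
If the target attributes come from a dialogue or listener model at run time rather than being hard-coded, the MARYXML can be assembled from the attribute values and then submitted to the server as in the sketch above. The following helper is only an illustration; the attribute values it uses are the ones from examples 2.1 and 2.2, and any further values supported by a particular voice are not listed here.

public class VocalizationTarget {

    /** Assembles a MARYXML request for the vocalization that best fits the given target. */
    static String vocalizationRequest(String name, String meaning,
                                      String intonation, String voiceQuality) {
        return "<maryxml version=\"0.5\" xmlns=\"http://mary.dfki.de/2002/MaryXML\" xml:lang=\"en-GB\">"
             + "<voice name=\"dfki-poppy\"><p>"
             + "<vocalization name=\"" + name + "\""
             + " meaning=\"" + meaning + "\""
             + " intonation=\"" + intonation + "\""
             + " voicequality=\"" + voiceQuality + "\"/>"
             + "</p></voice></maryxml>";
    }

    public static void main(String[] args) {
        // Example 2.1: an uncertain "yeah" with falling intonation.
        System.out.println(vocalizationRequest("yeah", "uncertain", "falling", "modal"));
        // Example 2.2: an agreeing "yeah" with mid intonation.
        System.out.println(vocalizationRequest("yeah", "agreeing", "mid", "modal"));
    }
}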

See also the interactive documentation at http://mary.dfki.de:59125/documentation.html#vocalizations