SSML (Speech Synthesis Markup Language)
SSML tags customize how a text-to-speech engine generates audio. Use them to add pauses, change emphasis, and control pronunciation, such as pronouncing numbers in cardinal format (example: 123 is spoken as "one hundred and twenty-three").
In some cases, you may want additional control over the speech generated from the text in the response. For example, you may want a string of digits read back as a standard telephone number rather than in cardinal form. Reverie TTS provides this control through Speech Synthesis Markup Language (SSML) support.
<speak> and <voice>
<speak> is the root element of an SSML document. When using SSML with Reverie TTS, surround the text to be spoken with this tag.
Note: The <speak> tag is mandatory.
<voice> is the base tag that encapsulates text and other tags and specifies which voice to use via the "name" attribute. This allows multiple voices to be used in a single request.
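As a minimal sketch of the structure described above, the following Python snippet builds a request body with <speak> as the root and two <voice> elements, then parses it to confirm the markup is well-formed. The voice names are hypothetical placeholders, not identifiers confirmed by this document; substitute the names available to your account.

```python
import xml.etree.ElementTree as ET

# Hypothetical voice names, used only for illustration.
VOICE_A = "voice_a"
VOICE_B = "voice_b"

ssml = (
    "<speak>"
    f'<voice name="{VOICE_A}">Welcome to the help desk.</voice>'
    f'<voice name="{VOICE_B}">How can I assist you today?</voice>'
    "</speak>"
)

# Parsing confirms the markup is well-formed XML with <speak> as the root.
root = ET.fromstring(ssml)
voices = [v.get("name") for v in root.findall("voice")]
print(root.tag, voices)
```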
<p> and <s>
<p> represents a paragraph. Use the tag to add a pause between paragraphs in the text.
Note: By default, the API separates paragraphs with an 800 ms break.
<s> represents a sentence. Use the tag to add a pause between lines or sentences in the text.
Note: By default, the API separates sentences within a paragraph with a 400 ms break.
Note: We recommend using the <p> and <s> tags to structure your requests, especially when using <break>.
Sample Code:
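The sketch below illustrates one way to structure a request with <p> and <s>. The helper name to_ssml is hypothetical and not part of the Reverie API; it simply wraps each sentence in <s> and each paragraph in <p>.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

def to_ssml(paragraphs):
    """Wrap each sentence in <s> and each paragraph in <p>, inside <speak>."""
    parts = []
    for sentences in paragraphs:
        body = "".join(f"<s>{escape(s)}</s>" for s in sentences)
        parts.append(f"<p>{body}</p>")
    return "<speak>" + "".join(parts) + "</speak>"

ssml = to_ssml([
    ["Your order has shipped.", "It should arrive on Friday."],
    ["Thank you for shopping with us."],
])

# Parse the result to check that it is well-formed.
root = ET.fromstring(ssml)
print(ssml)
```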
<break>
<break> represents a pause in the speech. Set the length of the pause with the time attribute, in either seconds or milliseconds.
Note: <break> is currently not supported within a sentence. It can appear only before or between sentences or paragraphs.
Sample Code:
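A minimal sketch of placing <break> between sentences and between paragraphs follows. The helper break_tag is hypothetical, and the value syntax ("500ms", "2s") follows the W3C SSML convention, which is assumed to apply here.

```python
import xml.etree.ElementTree as ET

def break_tag(milliseconds: int) -> str:
    """Render a <break> tag, using whole seconds when the value allows it."""
    if milliseconds % 1000 == 0:
        return f'<break time="{milliseconds // 1000}s"/>'
    return f'<break time="{milliseconds}ms"/>'

# Breaks sit between sentences or paragraphs, never inside a sentence.
ssml = "".join([
    "<speak>",
    "<p><s>Please stay on the line.</s>",
    break_tag(500),
    "<s>We are connecting you now.</s></p>",
    break_tag(2000),
    "<p><s>Thank you for waiting.</s></p>",
    "</speak>",
])

root = ET.fromstring(ssml)
break_times = [b.get("time") for b in root.iter("break")]
print(break_times)
```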
<sub>
Pronounces the enclosed word or phrase as a different word or phrase. Specify the substitute pronunciation with the alias attribute.
Sample Code:
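The following sketch shows how a <sub> element with an alias attribute might be generated. The helper name sub is hypothetical; quoteattr and escape come from the Python standard library and guard against characters that would otherwise break the markup.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr

def sub(text: str, alias: str) -> str:
    """Wrap `text` in <sub> so that `alias` is spoken in its place."""
    # quoteattr adds the surrounding quotes and escapes the attribute value.
    return f"<sub alias={quoteattr(alias)}>{escape(text)}</sub>"

ssml = f"<speak>The standard is published by the {sub('W3C', 'World Wide Web Consortium')}.</speak>"

root = ET.fromstring(ssml)
element = root.find("sub")
print(ssml)
```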
<say-as>
The <say-as> tag lets you provide instructions for how a piece of text should be spoken. The TTS engine automatically detects many of these cases, but the <say-as> tag lets you mark them explicitly.
Use the <say-as> tag with the interpret-as attribute to tell the TTS how to speak certain characters, words, and numbers.
Cardinal
Interprets the numerical text as a cardinal number, as in 1,234.
Note: By default, the API interprets and pronounces numbers in cardinal format.
Example: The API will interpret 1234 as "one thousand two hundred and thirty-four".
Sample Code:
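A minimal sketch of marking a number as cardinal, assuming the interpret-as value is written in lowercase as "cardinal" (this document gives the value only as a heading, so check the exact spelling against the API reference):

```python
import xml.etree.ElementTree as ET

# Assumption: the attribute value is the lowercase word "cardinal".
ssml = (
    "<speak>"
    'Your balance is <say-as interpret-as="cardinal">1234</say-as> points.'
    "</speak>"
)

root = ET.fromstring(ssml)
say_as = root.find("say-as")
print(say_as.get("interpret-as"), say_as.text)
```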
Ordinal
Interprets the number enclosed in the tag as an ordinal and speaks it accordingly.
Sample Code:
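As an illustration, the sketch below wraps a number so that it is read as an ordinal. The helper name ordinal is hypothetical, and the lowercase value "ordinal" is an assumption based on the heading above.

```python
import xml.etree.ElementTree as ET

def ordinal(n: int) -> str:
    # Assumption: the attribute value is the lowercase word "ordinal".
    return f'<say-as interpret-as="ordinal">{n}</say-as>'

ssml = f"<speak>You are {ordinal(3)} in the queue.</speak>"

element = ET.fromstring(ssml).find("say-as")
print(ssml)
```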
Character
Speaks the text enclosed in the tag character by character.
Sample Code:
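The sketch below spells out a short code letter by letter. The helper name spell_out is hypothetical, and the value "character" is assumed from the heading above; other SSML implementations use different spellings, so verify it against the API reference.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

def spell_out(text: str) -> str:
    # Assumption: the attribute value is "character", matching the heading.
    return f'<say-as interpret-as="character">{escape(text)}</say-as>'

ssml = f"<speak>Your booking code is {spell_out('QX7')}.</speak>"

element = ET.fromstring(ssml).find("say-as")
print(ssml)
```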
Digits
Speaks each digit of the number enclosed in the tag individually.
Sample Code:
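A telephone number is the typical case for this value, since the default cardinal reading is rarely what is wanted. The sketch below uses a hypothetical helper named digits and assumes the lowercase value "digits". Stripping spaces and hyphens before wrapping is a design choice of this example, not a documented requirement.

```python
import xml.etree.ElementTree as ET

def digits(number: str) -> str:
    """Keep only the digit characters and mark them to be read one by one."""
    only_digits = "".join(ch for ch in number if ch.isdigit())
    # Assumption: the attribute value is the lowercase word "digits".
    return f'<say-as interpret-as="digits">{only_digits}</say-as>'

ssml = f"<speak>Please call us on {digits('98450 12345')}.</speak>"

element = ET.fromstring(ssml).find("say-as")
print(ssml)
```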
Unit
Interprets a value as a measurement. The value should be a number followed by a unit, or just a unit. The units supported by the TTS API are:
Sample Code:
Date
Speaks the date enclosed in the tag, using the format given in the accompanying format attribute. The format attribute is mandatory when interpret-as is set to date.
Supported Date Formats:
Currency
Use the currency value to control the synthesis of monetary quantities. The currency formats supported and spoken by the API are:
URL
Controls the synthesis of website addresses.
Note: By default, the API recognizes website addresses and speaks them in the appropriate format.
Note: The API pronounces the symbols present in a web address.
Example: reverieinc.com is pronounced as reverieinc dot com
Sample Code:
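The sketch below marks a web address explicitly. The helper name url is hypothetical, and the lowercase value "url" is an assumption based on the heading above. Escaping matters here because addresses often contain "&", which is not allowed unescaped in XML text.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

def url(address: str) -> str:
    # escape() protects characters such as "&" that are common in URLs.
    # Assumption: the attribute value is the lowercase word "url".
    return f'<say-as interpret-as="url">{escape(address)}</say-as>'

ssml = f"<speak>Visit {url('reverieinc.com')} for details.</speak>"

element = ET.fromstring(ssml).find("say-as")
print(ssml)
```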
<prosody>
The <prosody> element adjusts the attributes listed below for the text-to-speech output.
Syntax