SSML Tags for Text-to-Speech Customization

SSML (Speech Synthesis Markup Language) tags allow you to customize how a text-to-speech engine generates audio.

Why Use SSML?

SSML gives you precise control over speech characteristics like pauses, pitch, speed, and pronunciation, enabling natural and context-appropriate audio output.

Reverie TTS Support

Reverie’s TTS API supports SSML to structure text, switch voices, and customize pronunciation for various formats like dates, currencies, and units.

Core SSML Tags

<speak> and <voice>

<speak> is the root element of any SSML document. Surround all text to be spoken with this tag.
Note: The <speak> tag is mandatory.

<voice> encapsulates text and other tags and specifies the voice to use via the name attribute. A single request can use multiple voices.

<p> represents a paragraph. Use this tag to add a pause between paragraphs in the text. Note: By default, the API separates paragraphs with an 800 ms break.

<s> represents a sentence. Use this tag to add a pause between lines or sentences in the text. Note: By default, the API separates sentences within a paragraph with a 400 ms break.

We recommend using the <p> and <s> tags to structure your requests, especially when using <break>.

For <speak> Tag:

<speak>
  <p><s>This is a sentence</s></p>
</speak>
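As a rough illustration of the structure described above, the following Python sketch wraps plain sentences in <p> and <s> tags and estimates the total default pause time. The helper names (to_ssml, default_pause_ms) are hypothetical and not part of the Reverie API, and the estimate assumes the default breaks apply only between sentences and between paragraphs, not after the last one.

```python
from xml.sax.saxutils import escape

PARAGRAPH_BREAK_MS = 800  # documented default pause between paragraphs
SENTENCE_BREAK_MS = 400   # documented default pause between sentences


def to_ssml(paragraphs):
    """Wrap a list of paragraphs (each a list of sentences) in SSML tags."""
    body = "".join(
        "<p>" + "".join(f"<s>{escape(s)}</s>" for s in sentences) + "</p>"
        for sentences in paragraphs
    )
    return f"<speak>{body}</speak>"


def default_pause_ms(paragraphs):
    """Estimate total default pause time (assumption: pauses only fall between units)."""
    between_sentences = sum(max(len(p) - 1, 0) for p in paragraphs)
    between_paragraphs = max(len(paragraphs) - 1, 0)
    return (between_sentences * SENTENCE_BREAK_MS
            + between_paragraphs * PARAGRAPH_BREAK_MS)


print(to_ssml([["This is a sentence"]]))
# <speak><p><s>This is a sentence</s></p></speak>
print(default_pause_ms([["One.", "Two."], ["Three.", "Four.", "Five."]]))
# 2000  (three sentence gaps at 400 ms plus one paragraph gap at 800 ms)
```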

For <voice> Tag:

<speak>
    <voice name="en_female"> This is an example. </voice>
    <voice name="hi_female"> यह एक उदाहरण है </voice>
</speak>
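If requests are assembled in code, a multi-voice document like the one above could be generated with Python's standard xml.etree.ElementTree module, which also takes care of escaping special characters. This is only a sketch: build_ssml is a hypothetical helper, and the voice names are the ones used in the example above.

```python
import xml.etree.ElementTree as ET


def build_ssml(segments):
    """Build a <speak> document from (voice_name, text) pairs."""
    speak = ET.Element("speak")
    for voice_name, text in segments:
        voice = ET.SubElement(speak, "voice", {"name": voice_name})
        voice.text = text
    # encoding="unicode" returns a str (not bytes) and keeps non-ASCII text as-is
    return ET.tostring(speak, encoding="unicode")


ssml = build_ssml([
    ("en_female", "This is an example."),
    ("hi_female", "यह एक उदाहरण है"),
])
print(ssml)
```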

<break>

Represents a pause in the speech. Set the length of the pause with the time attribute, in either seconds or milliseconds.

<break> is currently not supported within a sentence. It can only exist before or between sentences or paragraphs.

Attribute | Description
strength | (Optional) Strength of the pause inserted. Values: none, x-weak, weak, medium, strong, x-strong.
time | (Optional) Duration of the pause, e.g., "1s", "750ms". Unit must be specified. Default: "400ms".

<s>Fetching results</s>
<break time="3s"/>
<s>Found 3 items</s>
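Because the time value must carry a unit and <break> may only sit between sentences, it can help to generate the markup programmatically. The Python sketch below is one way to do that; break_tag and join_sentences are hypothetical helpers, and the accepted pattern (a whole or decimal number followed by s or ms) is an assumption based on the examples above.

```python
import re
from xml.sax.saxutils import escape

# Assumed pattern: a number (optionally decimal) followed by a mandatory unit
_BREAK_TIME = re.compile(r"\d+(\.\d+)?(ms|s)")


def break_tag(time="400ms"):
    """Return a <break> tag, rejecting time values without a unit."""
    if not _BREAK_TIME.fullmatch(time):
        raise ValueError(f"break time needs a unit (s or ms): {time!r}")
    return f'<break time="{time}"/>'


def join_sentences(sentences, pause):
    """Wrap each sentence in <s> and place a <break> between sentences, never inside one."""
    wrapped = [f"<s>{escape(s)}</s>" for s in sentences]
    return break_tag(pause).join(wrapped)


print(join_sentences(["Fetching results", "Found 3 items"], "3s"))
# <s>Fetching results</s><break time="3s"/><s>Found 3 items</s>
```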

<sub>

Pronounce the specified word or phrase as a different word or phrase. Specify the pronunciation to substitute with the alias attribute.

Attribute | Description
alias | The word or phrase to speak in place of the tagged text

<sub alias='World Wide Web'>www</sub>
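When aliases come from user data, the attribute value and the tagged text both need XML escaping. A minimal Python sketch, assuming a hypothetical helper named sub and using only the standard xml.sax.saxutils module:

```python
from xml.sax.saxutils import escape, quoteattr


def sub(text, alias):
    """Return a <sub> tag that speaks `alias` in place of `text`."""
    # quoteattr adds the surrounding quotes and escapes the attribute value
    return f"<sub alias={quoteattr(alias)}>{escape(text)}</sub>"


print(sub("www", "World Wide Web"))
# <sub alias="World Wide Web">www</sub>
print(sub("AT&T", "A T and T"))
# <sub alias="A T and T">AT&amp;T</sub>
```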

<say-as>

The <say-as> tag lets you specify how a piece of text should be spoken. The TTS engine detects many of these formats automatically, but <say-as> lets you mark them explicitly. Use it with the interpret-as attribute to tell the TTS engine how to speak certain characters, words, and numbers.

Attribute | Description
interpret-as | How the enclosed text should be interpreted and spoken. The supported values are described in the sections below.

Cardinal

Interprets the numerical text as a cardinal number, as in 1, 2, 3, 4.

By default, the API reads numbers as cardinals. Example: The API interprets 1234 as One thousand two hundred and thirty-four.

<say-as interpret-as='cardinal'>1234</say-as>

Ordinal

Speaks the enclosed number as an ordinal.

Value | Interpreted as
44 | Forty-Fourth
2 | Second

<say-as interpret-as='ordinal'>2</say-as>

Character

Speaks each character of the enclosed text separately.

Value | Interpreted as
HELLO | Each character is pronounced separately
50WS | Five Zero W S

<say-as interpret-as='characters'>50WS</say-as>

Digits

Speaks each digit of the enclosed number individually.

Value | Interpreted as
44 | Four Four
1234 | One Two Three Four

<say-as interpret-as='digits'>1234</say-as>

Units

Interprets a value as a measurement. The value should be a number followed by a unit, or just a unit. The TTS API supports the following units:

Units | Interpreted as
kgs, kg | kilograms
gms, gm, g | grams
mgs, mg | milligrams
kms, km | kilometers
cms, cm | centimeters
mtr, m | meters
mm | millimeters
ltrs, ltr, lt, l, ℓ | liters
ft | feet
in | inches
lbs, lb | pounds
hrs, hr | hours
min | minutes
sec, s | seconds

<say-as interpret-as='unit'>5l</say-as> of oil is added to your cart.
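Before tagging a value as a unit, a client could check it against the supported abbreviations listed above. The Python sketch below is one hedged way to do so; is_supported_unit_value is a hypothetical helper, and accepting decimal numbers and an optional space before the unit are assumptions, not documented behaviour.

```python
import re

# Abbreviations copied from the units table above
SUPPORTED_UNITS = {
    "kgs", "kg", "gms", "gm", "g", "mgs", "mg", "kms", "km", "cms", "cm",
    "mtr", "m", "mm", "ltrs", "ltr", "lt", "l", "ℓ", "ft", "in",
    "lbs", "lb", "hrs", "hr", "min", "sec", "s",
}

# Optional number, optional whitespace, then the unit text
_VALUE = re.compile(r"(\d+(?:\.\d+)?)?\s*(\D+)")


def is_supported_unit_value(value):
    """True if value is '<number><unit>' or just '<unit>' with a listed unit."""
    match = _VALUE.fullmatch(value.strip())
    return bool(match) and match.group(2) in SUPPORTED_UNITS


print(is_supported_unit_value("5l"))       # True
print(is_supported_unit_value("kg"))       # True
print(is_supported_unit_value("5 miles"))  # False
```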

Date

Speaks the enclosed text as a date, using the format given in the format attribute. The format attribute is mandatory when interpret-as is date.

Component | Denoted as | Digit length
Day | d | 1 or 2 digits
Month | m | 1 or 2 digits
Year | y | 4 digits

Supported Date Formats:

dmy (day-month-year)
Example: 10-02-1990 is read as Tenth February, Nineteen Ninety
Sample code:
<say-as interpret-as='date' format='dmy'>10-2-1990</say-as>

mdy (month-day-year)
Example: 02-10-1990 is read as Tenth February, Nineteen Ninety
Sample code:
<say-as interpret-as='date' format='mdy'>2-10-1990</say-as>

dm (day-month)
Example: 10-2 is read as Tenth February
Sample code:
<say-as interpret-as='date' format='dm'>10-2</say-as>

md (month-day)
Example: 02-10 is read as Tenth February
Sample code:
<say-as interpret-as='date' format='md'>2-10</say-as>
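A date string has to contain the same components as its format attribute, with the digit lengths given above. The following Python sketch checks this before a request is sent. date_matches is a hypothetical helper, and the hyphen separator is an assumption taken from the examples; the check covers digit counts only, not calendar validity.

```python
import re

# Digit lengths per component, as listed in the table above
_PART = {"d": r"\d{1,2}", "m": r"\d{1,2}", "y": r"\d{4}"}
DATE_FORMATS = ("dmy", "mdy", "dm", "md")


def date_matches(value, fmt):
    """True if value has the components and digit lengths that fmt requires."""
    if fmt not in DATE_FORMATS:
        raise ValueError(f"unsupported date format: {fmt!r}")
    pattern = "-".join(_PART[component] for component in fmt)
    return re.fullmatch(pattern, value) is not None


print(date_matches("10-2-1990", "dmy"))  # True
print(date_matches("2-10", "mdy"))       # False, the year is missing
print(date_matches("10-2-90", "dmy"))    # False, the year needs 4 digits
```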

Currency

Use the currency value to control how monetary amounts are spoken. The API supports the following currency formats:

Rupee & Paisa (symbols: ₹, Rs., र)
Example: ₹10.50 is read as Ten rupees fifty paise.
Sample code: <say-as interpret-as='currency'>₹10.50</say-as>

Dollars & Cents (symbol: $)
Example: $10.50 is read as Ten dollars fifty cents.
Sample code: <say-as interpret-as='currency'>$10.50</say-as>

Pounds & Pence (symbol: £)
Example: £10.50 is read as Ten pounds fifty pence.
Sample code: <say-as interpret-as='currency'>£10.50</say-as>

Euros & Cents (symbol: €)
Example: €10.50 is read as Ten euros fifty cents.
Sample code: <say-as interpret-as='currency'>€10.50</say-as>

URL

Controls how a website address is spoken.

By default, the API recognizes website addresses and pronounces the symbols they contain. Example: reverieinc.com is pronounced as reverieinc dot com

<say-as interpret-as='url'>reverieinc.com</say-as>
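The interpret-as values covered in this section can be gathered into a small builder that rejects unknown values and enforces the format attribute for dates. This Python sketch is illustrative only: say_as is a hypothetical helper, and the set of values is copied from the sample code in this section.

```python
from xml.sax.saxutils import escape

# interpret-as values used in the sample code of this section
INTERPRET_AS = {"cardinal", "ordinal", "characters", "digits",
                "unit", "date", "currency", "url"}
DATE_FORMATS = {"dmy", "mdy", "dm", "md"}


def say_as(text, interpret_as, fmt=None):
    """Return a <say-as> tag; the format attribute is required for dates."""
    if interpret_as not in INTERPRET_AS:
        raise ValueError(f"unsupported interpret-as value: {interpret_as!r}")
    attrs = f"interpret-as='{interpret_as}'"
    if interpret_as == "date":
        if fmt not in DATE_FORMATS:
            raise ValueError("date needs a format of dmy, mdy, dm, or md")
        attrs += f" format='{fmt}'"
    return f"<say-as {attrs}>{escape(text)}</say-as>"


print(say_as("1234", "digits"))
# <say-as interpret-as='digits'>1234</say-as>
print(say_as("10-2-1990", "date", fmt="dmy"))
# <say-as interpret-as='date' format='dmy'>10-2-1990</say-as>
```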

<prosody>

The <prosody> element adjusts the attributes listed below for the text-to-speech output.

Attribute | Description
pitch | (Optional) Changes the pitch (frequency) of the enclosed text overall.
    Values:
    Absolute or relative values in Hertz (Hz), e.g. 350Hz, +20Hz, -10Hz.
    Relative values in semitones (st), e.g. +1st, -0.5st.
    Absolute or relative values in percentage, e.g. 90%, +10%, -5%.
    Labels: default, x-low, low, medium, high, x-high.
rate | (Optional) Changes the rate of speaking of the enclosed text overall.
    Values:
    Number as a factor, 1.0 being default, e.g. 0.75, 1.2.
    Absolute or relative values in percentage, e.g. 90%, +20%, -10%.
    Labels: default, x-slow, slow, medium, fast, x-fast.
duration | (Optional) Elapsed time of the enclosed text overall. Overrides rate.
    Values:
    Number with unit (seconds or milliseconds), e.g. 2.5s, 3500ms.
volume | (Optional) Changes the loudness of the enclosed text overall.
    Values:
    Absolute or relative number, 100 being default, e.g. 80, 120, +5, -10.
    Absolute or relative values in percentage, e.g. 90%, +20%, -10%.
    Relative values in decibels (dB), e.g. +6dB, -6dB.
    Labels: default, silent, x-soft, soft, medium, loud, x-loud.
contour | (Optional) Changes the pitch at various points as the utterance progresses.
    Expressed as a space-separated array of parameter pairs. The first value denotes the location in the text as a percentage. The second value denotes the desired pitch. See the pitch attribute for possible value expressions.
    e.g. <prosody contour="(0%,+20Hz)(10%,-2st)(40%,+10Hz)">
    Note: This option is available for English and Hindi only. A request that uses it for another language will not fail, but the option will have no effect.
range | (Optional) Changes the range of pitch variation of the enclosed text. Range refers to the difference between the minimum and maximum pitch. See the pitch attribute for possible value expressions.
    Note: This option is available for English and Hindi only. A request that uses it for another language will not fail, but the option will have no effect.

<prosody pitch="value" rate="value" duration="value" volume="value"></prosody>
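To make the template above concrete, the Python sketch below builds a <prosody> tag from keyword arguments and rejects attribute names that are not in the table. The prosody function is a hypothetical helper, and it does not validate the attribute values themselves.

```python
from xml.sax.saxutils import escape, quoteattr

# Attribute names from the table above, in the order they are emitted
PROSODY_ATTRS = ("pitch", "rate", "duration", "volume", "contour", "range")


def prosody(text, **attrs):
    """Wrap text in <prosody>, allowing only the documented attribute names."""
    unknown = set(attrs) - set(PROSODY_ATTRS)
    if unknown:
        raise ValueError(f"unsupported prosody attributes: {sorted(unknown)}")
    rendered = "".join(
        f" {name}={quoteattr(attrs[name])}"
        for name in PROSODY_ATTRS
        if name in attrs
    )
    return f"<prosody{rendered}>{escape(text)}</prosody>"


print(prosody("Please listen carefully.", rate="slow", pitch="+2st"))
# <prosody pitch="+2st" rate="slow">Please listen carefully.</prosody>
```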