SSML (Speech Synthesis Markup Language)

SSML tags customize how a text-to-speech engine generates audio. Use them to add pauses, change emphasis, and change pronunciation. By default, numbers are pronounced in cardinal format (for example, 123 is spoken as One Hundred and Twenty-Three).

In some cases you may want more control over the speech generated from the text in the response. For example, you may want a string of digits read back as a telephone number rather than as a cardinal number. The Reverie TTS provides this control through Speech Synthesis Markup Language (SSML) support.

<speak> and <voice>

<speak> is the root element of an SSML document. When using SSML with the Reverie TTS, surround the text to be spoken with this tag.

Note: The <speak> tag is mandatory.

<speak>
<p><s>This is a sentence</s></p>
</speak>

<voice> is a base tag that encapsulates text and other tags and specifies which voice to use via the name attribute. This allows multiple voices to be used in a single request.

<speak>
<voice name="en_female"> This is an example. </voice>
<voice name="hi_female"> यह एक उदाहरण है </voice>
</speak>
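The multi-voice request above can also be assembled programmatically. The following is a minimal sketch, not part of the Reverie API: the helper name build_multi_voice_ssml is hypothetical, and it uses only Python's standard xml.etree.ElementTree module so that text is escaped and tags are always closed.

```python
import xml.etree.ElementTree as ET


def build_multi_voice_ssml(segments):
    """Return an SSML string with one <voice> element per (voice_name, text) pair.

    Hypothetical helper for illustration; it only builds the markup and does
    not call the TTS API.
    """
    speak = ET.Element("speak")  # <speak> is the mandatory root element
    for voice_name, text in segments:
        voice = ET.SubElement(speak, "voice", {"name": voice_name})
        voice.text = text  # ElementTree escapes &, < and > on serialization
    # encoding="unicode" returns a str (not bytes) and keeps non-ASCII text as-is
    return ET.tostring(speak, encoding="unicode")


ssml = build_multi_voice_ssml([
    ("en_female", "This is an example."),
    ("hi_female", "यह एक उदाहरण है"),
])
print(ssml)
```

Building the document as a tree rather than by string concatenation avoids malformed markup when the spoken text itself contains characters such as & or <.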

<p> and <s>

<p> represents a paragraph. Use this tag to add a pause between paragraphs in the text.

Note: By default, the API separates paragraphs with an 800 ms break.

<s> represents a sentence. Use this tag to add a pause between lines or sentences in the text.

Note: By default, the API separates sentences within a paragraph with a 400 ms break.

Note: We recommend using the <p> and <s> tags to structure your requests, especially when using <break>.

Sample Code:

<speak>
<p><s>This is a sentence</s></p>
</speak>
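As an illustration of the structure and default pauses described above, here is a small sketch. The function names are hypothetical, and the pause estimate assumes that the default breaks fall only between units (800 ms between paragraphs, 400 ms between sentences in the same paragraph) and do not stack. The source does not spell out that last detail.

```python
from xml.sax.saxutils import escape


def structure_ssml(paragraphs):
    """Wrap a list of paragraphs (each a list of sentence strings) in <p>/<s> tags."""
    parts = ["<speak>"]
    for sentences in paragraphs:
        body = "".join(f"<s>{escape(sentence)}</s>" for sentence in sentences)
        parts.append(f"<p>{body}</p>")
    parts.append("</speak>")
    return "\n".join(parts)


def default_pause_ms(paragraphs, paragraph_break_ms=800, sentence_break_ms=400):
    """Estimate the total default pause, assuming breaks only fall between units."""
    paragraph_gaps = max(len(paragraphs) - 1, 0)
    sentence_gaps = sum(max(len(sentences) - 1, 0) for sentences in paragraphs)
    return paragraph_gaps * paragraph_break_ms + sentence_gaps * sentence_break_ms


text = [["Fetching results.", "Please wait."], ["Found 3 items."]]
print(structure_ssml(text))
print(default_pause_ms(text))  # one sentence gap (400) + one paragraph gap (800)
```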

<break>

<break> represents a pause in the speech. Set the length of the pause with the time attribute, specified in either seconds or milliseconds.

Note: <break> is currently not supported within a sentence. It can only exist before or between sentences or paragraphs.

| Attribute | Description |
| --- | --- |
| strength | (optional) Strength of the pause inserted. Values: none, x-weak, weak, medium, strong, x-strong. |
| time | (optional) Duration of the pause, e.g. "1s" or "750ms". The unit must be specified. Default: "400ms". |

Sample Code:

<s>Fetching results</s>
<break time="3s"/>
<s>Found 3 items</s>
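Because the time value must carry a unit, it can help to validate it before sending a request. The sketch below is an assumption-laden illustration: break_tag is a hypothetical helper, and accepting decimal values such as "1.5s" is a guess, since the source only shows whole numbers for <break>.

```python
import re

# A number followed by a mandatory unit: "ms" or "s".
# Allowing decimals is an assumption; the documented examples are "1s" and "750ms".
_TIME_PATTERN = re.compile(r"\d+(\.\d+)?(ms|s)")
_STRENGTHS = {"none", "x-weak", "weak", "medium", "strong", "x-strong"}


def break_tag(time=None, strength=None):
    """Return a <break/> tag, rejecting values the documentation does not list."""
    attrs = []
    if strength is not None:
        if strength not in _STRENGTHS:
            raise ValueError(f"unsupported strength: {strength!r}")
        attrs.append(f'strength="{strength}"')
    if time is not None:
        if _TIME_PATTERN.fullmatch(time) is None:
            raise ValueError(f"time needs a number and a unit (s or ms): {time!r}")
        attrs.append(f'time="{time}"')
    return f"<break {' '.join(attrs)}/>" if attrs else "<break/>"


# Place the break between sentences, never inside one.
lines = ["<s>Fetching results</s>", break_tag(time="3s"), "<s>Found 3 items</s>"]
print("\n".join(lines))
```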

<sub>

<sub> pronounces the tagged word or phrase as a different word or phrase. Specify the substitute pronunciation with the alias attribute.

| Attribute | Description |
| --- | --- |
| alias | The word or phrase to speak in place of the tagged text. |

Sample Code:

<sub alias='World Wide Web'>www</sub>
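When the same abbreviations recur across many responses, the substitution can be applied from a lookup table. This is a sketch only: apply_aliases and the alias dictionary are hypothetical, and matching is case-sensitive and limited to whole words.

```python
import re
from xml.sax.saxutils import escape, quoteattr


def apply_aliases(text, aliases):
    """Wrap each whole-word occurrence of an alias key in a <sub> tag."""
    if not aliases:
        return escape(text)
    # Try longer keys first so a short key never shadows a longer one.
    keys = sorted(aliases, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(re.escape(k) for k in keys) + r")\b")
    out, pos = [], 0
    for match in pattern.finditer(text):
        word = match.group(0)
        out.append(escape(text[pos:match.start()]))  # untouched text before the match
        out.append(f"<sub alias={quoteattr(aliases[word])}>{escape(word)}</sub>")
        pos = match.end()
    out.append(escape(text[pos:]))
    return "".join(out)


print(apply_aliases("Visit www today", {"www": "World Wide Web"}))
```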

<say-as>

The <say-as> tag lets you specify how a piece of text should be spoken. The TTS engine detects many of these cases automatically, but the <say-as> tag lets you mark them explicitly.

Use the <say-as> tag with the interpret-as attribute to tell the TTS how to speak certain characters, words, and numbers.

| Attribute | Description |
| --- | --- |
| interpret-as | Selects how the enclosed text is spoken. The available values are described in the sections below. |

Cardinal

Interprets the numerical text as a cardinal number, as in 1,234.

Note: By default, the API pronounces numbers in cardinal format.

Example: The API will interpret 1234 as One thousand two hundred and thirty-four.

Sample Code:

<say-as interpret-as='cardinal'>1234</say-as>

Ordinal

Speaks the enclosed number as an ordinal.

| Value | Spoken as |
| --- | --- |
| 44 | Forty-Fourth |
| 2 | Second |

Sample Code:

<say-as interpret-as='ordinal'>2</say-as>

Characters

Speaks each character of the enclosed text separately.

| Value | Spoken as |
| --- | --- |
| HELLO | H E L L O (each character is pronounced separately) |
| 50WS | Five Zero W S |

Sample Code:

<say-as interpret-as='characters'>50WS</say-as>

Digits

Spells out the digits of the enclosed number one by one.

| Value | Spoken as |
| --- | --- |
| 44 | Four Four |
| 1234 | One Two Three Four |

Sample Code:

<say-as interpret-as='digits'>1234</say-as>

Unit

Interprets a value as a measurement. The value should be a number followed by a unit, or just a unit. The units supported by the TTS API are:

| Units | Interpreted as |
| --- | --- |
| kgs, kg | kilograms |
| gms, gm, g | grams |
| mgs, mg | milligrams |
| kms, km | kilometers |
| cms, cm | centimeters |
| mtr, m | meters |
| mm | millimeters |
| ltrs, ltr, lt, l, ℓ | liters |
| ft | feet |
| in | inches |
| lbs, lb | pounds |
| hrs, hr | hours |
| min | minutes |
| sec, s | seconds |

Sample Code:

<say-as interpret-as='unit'>5l</say-as> of oil is added to your cart.
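A client can check a measurement against the supported units before tagging it. The sketch below is a hypothetical helper, not part of the API. It treats unit abbreviations as case-sensitive and allows decimals and a space between number and unit; the source confirms none of these details.

```python
import re
from xml.sax.saxutils import escape

# Abbreviations and expansions copied from the units table above.
SUPPORTED_UNITS = {
    "kgs": "kilograms", "kg": "kilograms",
    "gms": "grams", "gm": "grams", "g": "grams",
    "mgs": "milligrams", "mg": "milligrams",
    "kms": "kilometers", "km": "kilometers",
    "cms": "centimeters", "cm": "centimeters",
    "mtr": "meters", "m": "meters",
    "mm": "millimeters",
    "ltrs": "liters", "ltr": "liters", "lt": "liters", "l": "liters", "ℓ": "liters",
    "ft": "feet", "in": "inches",
    "lbs": "pounds", "lb": "pounds",
    "hrs": "hours", "hr": "hours",
    "min": "minutes",
    "sec": "seconds", "s": "seconds",
}

# Optional number, optional whitespace, then a unit made of non-digit characters.
_MEASUREMENT = re.compile(r"(\d+(?:\.\d+)?)?\s*([^\d\s]+)")


def unit_say_as(value):
    """Return a <say-as interpret-as='unit'> tag if the unit is in the table."""
    value = value.strip()
    match = _MEASUREMENT.fullmatch(value)
    if match is None or match.group(2) not in SUPPORTED_UNITS:
        raise ValueError(f"not a supported measurement: {value!r}")
    return f"<say-as interpret-as='unit'>{escape(value)}</say-as>"


print(unit_say_as("5l") + " of oil is added to your cart.")
```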

Date

Speaks the enclosed text as a date, using the format given in the format attribute. The format attribute is mandatory when interpret-as is set to date.

| Component | Denoted as | Digit length |
| --- | --- | --- |
| Day | d | 1 or 2 digits |
| Month | m | 1 or 2 digits |
| Year | y | 4 digits |

Supported Date Formats:

dmy (day-month-year)

Example: 10-02-1990 is read as Tenth February, Nineteen Ninety

Sample Code:

<say-as interpret-as='date' format='dmy'>10-2-1990</say-as>

mdy (month-day-year)

Example: 02-10-1990 is read as Tenth February, Nineteen Ninety

Sample Code:

<say-as interpret-as='date' format='mdy'>2-10-1990</say-as>

dm (day-month)

Example: 10-2 is read as Tenth February

Sample Code:

<say-as interpret-as='date' format='dm'>10-2</say-as>

md (month-day)

Example: 02-10 is read as Tenth February

Sample Code:

<say-as interpret-as='date' format='md'>2-10</say-as>
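Since the date text must match its declared format, a quick client-side check can catch mismatches such as a missing year. The following sketch uses a hypothetical date_say_as helper. It assumes a hyphen is the only separator (the only one shown in the examples) and checks digit counts only, not whether the day or month is in range.

```python
import re

# Digit lengths from the component table: day and month 1 or 2 digits, year 4 digits.
_COMPONENT = {"d": r"\d{1,2}", "m": r"\d{1,2}", "y": r"\d{4}"}
SUPPORTED_FORMATS = ("dmy", "mdy", "dm", "md")


def date_say_as(value, fmt):
    """Return a date <say-as> tag after checking the value against the format."""
    if fmt not in SUPPORTED_FORMATS:
        raise ValueError(f"unsupported date format: {fmt!r}")
    # Build e.g. \d{1,2}-\d{1,2}-\d{4} for "dmy"; the hyphen separator is assumed.
    pattern = "-".join(_COMPONENT[letter] for letter in fmt)
    if re.fullmatch(pattern, value) is None:
        raise ValueError(f"{value!r} does not match format {fmt!r}")
    return f"<say-as interpret-as='date' format='{fmt}'>{value}</say-as>"


print(date_say_as("10-2-1990", "dmy"))
print(date_say_as("2-10", "md"))
```

With this check, passing "2-10" with the mdy format raises an error instead of producing a tag whose year is missing.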

Currency

Use the currency value to control the synthesis of monetary quantities. The currency formats supported and spoken by the API are:

Rupee & Paisa (symbols: ₹, Rs., र)

Example: ₹10.50 is read as Ten rupees fifty paise

Sample Code:

<say-as interpret-as='currency'>₹10.50</say-as>

Dollars & Cents (symbol: $)

Example: $10.50 is read as Ten dollars fifty cents

Sample Code:

<say-as interpret-as='currency'>$10.50</say-as>

Pounds & Pence (symbol: £)

Example: £10.50 is read as Ten pounds fifty pence

Sample Code:

<say-as interpret-as='currency'>£10.50</say-as>

Euros & Cents (symbol: €)

Example: €10.50 is read as Ten euros fifty cents

Sample Code:

<say-as interpret-as='currency'>€10.50</say-as>

URL

This value controls the synthesis of a website address.

Note: By default, the API recognizes a website address and speaks it in the right format.

Note: The API pronounces the symbols present in a web address.

Example: reverieinc.com is pronounced as reverieinc dot com

Sample Code:

<say-as interpret-as='url'>reverieinc.com</say-as>
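The interpret-as values covered in this section can be wrapped in one small helper, for example to have a telephone number read digit by digit rather than as a cardinal number. This is a sketch under assumptions: say_as is a hypothetical function, the phone number is made up, and the value list simply mirrors the headings above.

```python
from xml.sax.saxutils import escape

# interpret-as values described in this section.
INTERPRET_AS_VALUES = {
    "cardinal", "ordinal", "characters", "digits",
    "unit", "date", "currency", "url",
}


def say_as(text, interpret_as, fmt=None):
    """Return a <say-as> tag; the format attribute is required only for dates."""
    if interpret_as not in INTERPRET_AS_VALUES:
        raise ValueError(f"unsupported interpret-as value: {interpret_as!r}")
    if interpret_as == "date" and fmt is None:
        raise ValueError("format is mandatory when interpret-as is 'date'")
    if interpret_as != "date" and fmt is not None:
        raise ValueError("format is only documented for interpret-as='date'")
    attrs = f"interpret-as='{interpret_as}'"
    if fmt is not None:
        attrs += f" format='{fmt}'"
    return f"<say-as {attrs}>{escape(text)}</say-as>"


# A made-up phone number, read as individual digits inside a sentence.
sentence = "<s>Call us at " + say_as("9876543210", "digits") + "</s>"
print(f"<speak><p>{sentence}</p></speak>")
```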

<prosody>

The <prosody> element adjusts the attributes listed below for the text-to-speech output.

Attributes:

pitch

(optional) Changes the pitch (frequency) of the enclosed text overall.

Values:

Absolute or relative values in Hertz (Hz). e.g. 350Hz, +20Hz, -10Hz.

Relative values in semitones (st). e.g. +1st, -0.5st.

Absolute or relative values in percentage. e.g. 90%, +10%, -5%.

Labels: default, x-low, low, medium, high, x-high.

rate

(optional) Changes the rate of speaking of the enclosed text overall.

Values:

Number as a factor. 1.0 being default. e.g. 0.75, 1.2.

Absolute or relative values in percentage. e.g. 90%, +20%, -10%.

Labels: default, x-slow, slow, medium, fast, x-fast.

duration

(optional): Total elapsed time of the enclosed text. Overrides rate.

Values:

Number with unit (seconds or milliseconds). e.g. 2.5s, 3500ms

volume

(optional): Changes the loudness of the enclosed text overall.

Values:

Absolute or relative number, 100 being default. e.g. 80, 120, +5, -10.

Absolute or relative values in percentage. e.g. 90%, +20%, -10%.

Relative values in decibels (dB). e.g. +6dB, -6dB.

Labels: default, silent, x-soft, soft, medium, loud, x-loud

contour

(optional): Changes the pitch at various points as the utterance progresses.

Expressed as a space-separated list of (position, pitch) pairs. The first value of each pair is the location in the text as a percentage. The second value is the desired pitch at that location. See the pitch attribute for the possible value expressions.

e.g. <prosody contour="(0%,+20Hz) (10%,-2st) (40%,+10Hz)">

Note: This option is available for English and Hindi only. The request will not fail if it is used for other languages, but it will not have the intended effect.

range

(optional): Changes the range of pitch variation of the enclosed text. Range refers to the difference between the minimum and maximum pitch. See the pitch attribute for the possible value expressions.

Note: This option is available for English and Hindi only. The request will not fail if it is used for other languages, but it will not have the intended effect.

Syntax

<prosody pitch="value" rate="value" duration="value" volume="value"></prosody>
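The syntax above can be generated from keyword arguments. The sketch below is illustrative only: prosody is a hypothetical helper that checks attribute names against the list in this section and does not validate the attribute values themselves. Because duration overrides rate, it warns when both are supplied.

```python
import warnings
from xml.sax.saxutils import escape, quoteattr

# Attribute names described in this section, in the order they will be emitted.
_PROSODY_ATTRS = ("pitch", "rate", "duration", "volume", "contour", "range")


def prosody(text, **attrs):
    """Wrap text in a <prosody> tag using only the documented attribute names."""
    unknown = set(attrs) - set(_PROSODY_ATTRS)
    if unknown:
        raise ValueError(f"unsupported prosody attributes: {sorted(unknown)}")
    if "duration" in attrs and "rate" in attrs:
        # Per the attribute description, duration takes precedence over rate.
        warnings.warn("duration overrides rate; the rate value will have no effect")
    rendered = " ".join(
        f"{name}={quoteattr(str(attrs[name]))}"
        for name in _PROSODY_ATTRS
        if name in attrs
    )
    open_tag = f"<prosody {rendered}>" if rendered else "<prosody>"
    return f"{open_tag}{escape(text)}</prosody>"


print(prosody("Slow and low", pitch="-2st", rate="slow"))
```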
