SSML (Speech Synthesis Markup Language)
SSML tags customize the way a text-to-speech engine creates audio. You can use them to add a pause, change emphasis, and change pronunciation. By default, numbers are pronounced in cardinal format (for example, 123 is spoken as One Hundred and Twenty-Three).
In some cases you may want more control over the speech generated from your text. For example, you may want a string of digits read back as a standard telephone number rather than in cardinal form. The Reverie TTS provides this control through its Speech Synthesis Markup Language (SSML) support.
<speak> and <voice>
The <speak> tag is the root element of an SSML document. When using SSML with the Reverie TTS, enclose the text to be spoken in this tag.
Note: The <speak> tag is mandatory.
<voice> encloses text and other tags, and its name attribute specifies which voice to use. This lets you use multiple voices in a single request.
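A minimal sketch of a request that uses two voices. The voice names VOICE_A and VOICE_B are hypothetical placeholders, not real Reverie voice identifiers; replace them with the names available for your language.

```xml
<speak>
  <!-- VOICE_A and VOICE_B are hypothetical placeholder voice names -->
  <voice name="VOICE_A">Welcome to our customer support line.</voice>
  <voice name="VOICE_B">Please tell us how we can help you today.</voice>
</speak>
```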
<p> and <s>
<p> represents a paragraph. Use this tag to add a pause between paragraphs in the text.
Note: By default, the API separates paragraphs with an 800ms break.
<s> represents a sentence. Use this tag to add a pause between lines or sentences in the text.
Note: By default, the API separates sentences within a paragraph with a 400ms break.
Note: We recommend using the <p> and <s> tags to structure your requests, especially when using <break>.
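A sketch of how <p> and <s> can structure a request; the text is illustrative, and the comments restate the default breaks described above.

```xml
<speak>
  <p>
    <!-- Sentences in the same paragraph are separated by a 400ms break by default -->
    <s>Your order has been placed.</s>
    <s>It will be delivered within three days.</s>
  </p>
  <!-- Paragraphs are separated by an 800ms break by default -->
  <p>
    <s>Thank you for shopping with us.</s>
  </p>
</speak>
```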
<break>
<break> inserts a pause in the speech. Set the length of the pause with the time attribute, in seconds or milliseconds.
Note: <break> is currently not supported within a sentence. It can only exist before or between sentences or paragraphs.
strength
(optional) Strength of the pause. Values: none, x-weak, weak, medium, strong, x-strong.
time
(optional) Duration of the pause, e.g. "1s" or "750ms". The unit must be specified. Default: "400ms".
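A sketch of <break> placed between sentences and between paragraphs, since it is not supported inside a sentence. The pause lengths and text are illustrative.

```xml
<speak>
  <p>
    <s>Please hold while we check your account.</s>
    <!-- Explicit pause between two sentences; the unit is required -->
    <break time="1500ms"/>
    <s>Thank you for waiting.</s>
  </p>
  <!-- Relative pause between paragraphs, set with strength instead of time -->
  <break strength="strong"/>
  <p>
    <s>Your balance has been updated.</s>
  </p>
</speak>
```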
<sub>
<sub> makes the engine pronounce the enclosed word or phrase as a different word or phrase. Specify the substitute pronunciation with the alias attribute.
alias
The word or phrase to speak in place of the tagged text.
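A minimal sketch in which an abbreviation is spoken as its expansion; the text and alias are illustrative.

```xml
<speak>
  <!-- "MG" is spoken as "Mahatma Gandhi" -->
  <s>Our office is on <sub alias="Mahatma Gandhi">MG</sub> Road.</s>
</speak>
```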
<say-as>
The <say-as> tag lets you specify how the enclosed text should be spoken. The TTS engine detects many of these cases automatically, but <say-as> lets you mark them explicitly.
Use the <say-as> tag with the interpret-as attribute to tell the TTS how to speak certain characters, words, and numbers.
interpret-as
Specifies how the enclosed text is spoken. The supported values are described below.
Cardinal
Interprets the numerical text as a cardinal number, as in 1,234.
Note: By default, the API understands and pronounces numbers in cardinal format.
Example: The API interprets 1234 as One thousand two hundred and thirty-four.
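A minimal sketch, assuming the attribute value is cardinal (the lowercase form of the heading above); check the API reference for the exact value. Because cardinal is the default, the tag mainly makes the intent explicit.

```xml
<speak>
  <!-- Assumed value "cardinal"; 1234 should be read as one thousand two hundred and thirty-four -->
  <s>The warehouse holds <say-as interpret-as="cardinal">1234</say-as> boxes.</s>
</speak>
```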
Ordinal
Speaks the enclosed number as an ordinal.
Example: 44 is read as Forty-fourth, and 2 is read as Second.
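A minimal sketch, assuming the attribute value is ordinal (the lowercase form of the heading above); the exact value may differ, so confirm it before use.

```xml
<speak>
  <!-- Assumed value "ordinal"; 2 should be read as second -->
  <s>You finished in <say-as interpret-as="ordinal">2</say-as> place.</s>
</speak>
```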
Character
Speaks the enclosed text one character at a time.
Example: HELLO is read letter by letter as H E L L O, and 50WS is read as Five Zero W S.
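A minimal sketch, assuming the attribute value is character, taken from the heading above. The value the API actually accepts may differ (standard SSML uses characters), so confirm it before use.

```xml
<speak>
  <!-- Assumed value "character"; 50WS should be read as Five Zero W S -->
  <s>Your booking code is <say-as interpret-as="character">50WS</say-as>.</s>
</speak>
```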
Digits
Reads out each digit of the enclosed number separately.
Example: 44 is read as Four Four, and 1234 is read as One Two Three Four.
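A minimal sketch, assuming the attribute value is digits (the lowercase form of the heading above); the exact value may differ, so confirm it before use.

```xml
<speak>
  <!-- Assumed value "digits"; 1234 should be read as One Two Three Four -->
  <s>Your PIN ends in <say-as interpret-as="digits">1234</say-as>.</s>
</speak>
```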
Unit
Interprets the enclosed text as a measurement. The text should be a number followed by a unit, or a unit on its own. The TTS API supports the following units (accepted abbreviations, followed by the spoken form):
kgs, kg: kilograms
gms, gm, g: grams
mgs, mg: milligrams
kms, km: kilometers
cms, cm: centimeters
mtr, m: meters
mm: millimeters
ltrs, ltr, lt, l, ℓ: liters
ft: feet
in: inches
lbs, lb: pounds
hrs, hr: hours
min: minutes
sec, s: seconds
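A minimal sketch, assuming the attribute value is unit and that a space between the number and the unit is accepted; the description above does not say whether the space is required.

```xml
<speak>
  <!-- Assumed value "unit"; km should be expanded to kilometers and kg to kilograms -->
  <s>The route is <say-as interpret-as="unit">12 km</say-as> long.</s>
  <s>The parcel weighs <say-as interpret-as="unit">5 kg</say-as>.</s>
</speak>
```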
Date
Speaks the enclosed text as a date, using the format given in the format attribute. The format attribute is mandatory when interpret-as is set to date.
The format attribute is built from the following symbols:
d: day (1 or 2 digits)
m: month (1 or 2 digits)
y: year (4 digits)
Supported Date Formats:
dmy
day-month-year
Example: 10-02-1990 is read as Tenth February, Nineteen Ninety
Sample Code:
<say-as interpret-as="date" format="dmy">10-2-1990</say-as>
mdy
month-day-year
Example: 02-10-1990 is read as Tenth February, Nineteen Ninety
Sample Code:
<say-as interpret-as="date" format="mdy">2-10-1990</say-as>
dm
day-month
Example: 10-2 is read as Tenth February
Sample Code:
<say-as interpret-as="date" format="dm">10-2</say-as>
md
month-day
Example: 02-10 is read as Tenth February
Sample Code:
<say-as interpret-as="date" format="md">2-10</say-as>
Currency
Use the currency value to control how monetary amounts are spoken. The API supports the following currencies:
Rupee & Paisa
₹, Rs., र
Example: ₹10.50 is read as Ten rupees fifty paise.
Sample Code:
<say-as interpret-as="currency">₹10.50</say-as>
Dollars & Cents
$
Example: $10.50 is read as Ten dollars fifty cents.
Sample Code:
<say-as interpret-as="currency">$10.50</say-as>
Pounds & Pence
£
Example: £10.50 is read as Ten pounds fifty pence.
Sample Code:
<say-as interpret-as="currency">£10.50</say-as>
Euros & Cents
€
Example: €10.50 is read as Ten euros fifty cents.
Sample Code:
<say-as interpret-as="currency">€10.50</say-as>
URL
Controls how a website address is spoken.
Note: By default, the API detects website addresses and speaks them in the correct format.
Note: The API pronounces the symbols that appear in a web address.
Example: reverieinc.com is pronounced as reverieinc dot com.
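A minimal sketch, assuming the attribute value is url (the lowercase form of the heading above). Since website addresses are detected by default, the tag mainly marks an address explicitly.

```xml
<speak>
  <!-- Assumed value "url"; should be read as reverieinc dot com -->
  <s>Visit <say-as interpret-as="url">reverieinc.com</say-as> for details.</s>
</speak>
```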
<prosody>
Use the <prosody> tag to adjust the following attributes of the text-to-speech output.
pitch
(optional) Changes the pitch (frequency) of the enclosed text overall.
Values:
Absolute or relative values in Hertz (Hz). e.g. 350Hz, +20Hz, -10Hz.
Relative values in semitones (st). e.g. +1st, -0.5st.
Absolute or relative values in percentage. e.g. 90%, +10%, -5%.
Labels: default, x-low, low, medium, high, x-high.
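A sketch using a relative percentage and a label, two of the pitch value forms listed above; the text is illustrative.

```xml
<speak>
  <!-- Relative change: 10% higher than the default pitch -->
  <prosody pitch="+10%">This sentence is spoken at a slightly higher pitch.</prosody>
  <!-- Label value -->
  <prosody pitch="low">This sentence is spoken at a low pitch.</prosody>
</speak>
```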
rate
(optional) Changes the rate of speaking of the enclosed text overall.
Values:
A number used as a multiplier, where 1.0 is the default, e.g. 0.75, 1.2.
Absolute or relative values in percentage. e.g. 90%, +20%, -10%.
Labels: default, x-slow, slow, medium, fast, x-fast.
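A sketch using a numeric factor and a label, two of the rate value forms listed above; the text is illustrative.

```xml
<speak>
  <!-- Factor: 0.75 times the default rate of 1.0 -->
  <prosody rate="0.75">This sentence is spoken more slowly.</prosody>
  <!-- Label value -->
  <prosody rate="fast">This sentence is spoken quickly.</prosody>
</speak>
```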
duration
(optional) Total elapsed time of the enclosed text. Overrides rate.
Values:
Number with unit (seconds or milliseconds). e.g. 2.5s, 3500ms
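A sketch that sets both rate and duration. Because duration overrides rate, the sentence should take about 2.5 seconds regardless of the rate label.

```xml
<speak>
  <!-- duration overrides rate, so the fast rate here should have no effect -->
  <prosody rate="fast" duration="2.5s">This sentence should take about two and a half seconds.</prosody>
</speak>
```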
volume
(optional) Changes the loudness of the enclosed text overall.
Values:
Absolute or relative number, where 100 is the default, e.g. 80, 120, +5, -10.
Absolute or relative values in percentage. e.g. 90%, +20%, -10%.
Relative values in decibels (dB). e.g. +6dB, -6dB.
Labels: default, silent, x-soft, soft, medium, loud, x-loud
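A sketch using a relative decibel change and a label, two of the volume value forms listed above; the text is illustrative.

```xml
<speak>
  <!-- Relative change in decibels -->
  <prosody volume="+6dB">This sentence is louder.</prosody>
  <!-- Label value -->
  <prosody volume="soft">This sentence is quieter.</prosody>
</speak>
```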
contour
(optional) Changes the pitch at various points as the utterance progresses.
Expressed as a space-separated list of (position, pitch) pairs. The first value in each pair is the position in the text as a percentage; the second is the target pitch. See the pitch attribute for the accepted value formats.
e.g. <prosody contour="(0%,+20Hz) (10%,-2st) (40%,+10Hz)">
Note: This option is available for English and Hindi only. For other languages the request will not fail, but the option will have no effect.
range
(optional) Changes the range of pitch variation in the enclosed text, where range is the difference between the minimum and maximum pitch. See the pitch attribute for the accepted value formats.
Note: This option is available for English and Hindi only. For other languages the request will not fail, but the option will have no effect.
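A sketch of contour and range, which take effect only for English and Hindi. The contour pairs are written space-separated, and the range value reuses a relative percentage from the pitch value forms; the text is illustrative.

```xml
<speak>
  <!-- Target pitch +20Hz at the start, -2st at 10% of the text, +10Hz at 40% -->
  <prosody contour="(0%,+20Hz) (10%,-2st) (40%,+10Hz)">Is this the right address?</prosody>
  <!-- Widens the gap between the lowest and highest pitch -->
  <prosody range="+20%">What a wonderful surprise!</prosody>
</speak>
```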