UpStage

Text2Speech Voices

UpStage's speech is generated by the Festival Speech Synthesiser, developed at the Centre for Speech Technology Research at Edinburgh University (http://www.cstr.ed.ac.uk/projects/festival/).


An avatar's voice is selected from a dropdown menu when uploading or editing an avatar. There are currently about 100 voices on the Open UpStage server; if you are setting up your own UpStage server, please see the technical documentation regarding installing voices.


Note that at the moment, it is only possible to use characters from the Roman alphabet (including letters with accents such as é, ñ, ø, ü) in the text chat, but not other alphabets such as Cyrillic, Greek, etc.  


Voices currently available on UpStage


The names of the voices currently available with UpStage follow a pattern that gives a clue as to what kind of voice each one is. Some of the voices speak English with a foreign accent, some speak English with different English accents, and some are designed to speak other languages more or less accurately. We have endeavoured to include a good variety of accents as well as male and female voices.


The format is: ["e" or "emb"] _ [native language] - [en] - [modifications]


For example:


e_de – speaks and reads German

e_en – speaks and reads English

e_en-fast-f1 – speaks English quickly, in a female voice

e_en-wm – speaks English in a West Midlands accent

Other accents in the e_en series are "n" for north, "sc" for Scots, "rp" for RP, "r" for rhotic (which means it pronounces the r in words like church).


emb_af1 – speaks and reads Afrikaans

emb_af1-en – speaks English in an Afrikaans accent

emb_de4-en-low-slow – speaks English in a low, slow voice with a German accent
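
The naming pattern above can be sketched as a small parser. This is only an illustration: the function `parse_voice_name` and the dictionary keys it returns are hypothetical and not part of UpStage. It also assumes that any digits at the end of the language part (as in `de4`) are a variant number, and it reports names that do not follow the pattern (for example `roger` or `awb_cmu`) as `None`.

```python
def parse_voice_name(name):
    """Split a voice name of the form prefix_language[-en][-modifications].

    Returns a dict describing the name, or None if the name does not start
    with the "e_" or "emb_" prefix used by this naming pattern.
    """
    prefix, sep, rest = name.partition("_")
    if not sep or prefix not in ("e", "emb") or not rest:
        return None

    parts = rest.split("-")
    native = parts[0]        # e.g. "de4", "af1", "en"
    modifiers = parts[1:]    # e.g. ["en", "low", "slow"]

    # Assumption: trailing digits are a variant number, not part of the language.
    language = native.rstrip("0123456789")

    if language == "en":
        speaks_english = True
    elif modifiers and modifiers[0] == "en":
        # A non-English voice followed by "-en" speaks English with an accent.
        speaks_english = True
        modifiers = modifiers[1:]
    else:
        speaks_english = False

    return {
        "prefix": prefix,
        "native": native,
        "speaks_english": speaks_english,
        "modifiers": modifiers,
    }


if __name__ == "__main__":
    for voice in ("e_de", "e_en-fast-f1", "emb_af1-en", "emb_de4-en-low-slow", "roger"):
        print(voice, parse_voice_name(voice))
```

A helper like this could be used, for instance, to group the entries of the dropdown menu by language or to list only the voices that speak English.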

You can test the voices on the avatar upload and edit screens by selecting different voices from the dropdown menu and entering the text you want to test.


We are in the process of compiling descriptions for all the voices; following is the information so far: 


| Voice | Male | Female | Accent | Non-Eng | Description |
|---|---|---|---|---|---|
| awb_cmu | X | | Scottish | | Soft, slightly muffled |
| awb_nitech | X | | Scottish | | Clear, not very deep |
| bdl_cmu | X | | English? | | A little bit quavery |
| bdl_nitech | X | | English? | | Firmer than bdl_cmu, a bit higher, but clearer |
| bud | X | | NZ? | | Deep, calm |
| clb_nitech | | X | NZ? | | Robotic, soft |
| crunchy | | | | | Crunchy - good for witches & effects |
| default | | X | NZ? | | Smooth, young |
| e_en-fast-f1 | X | | NZ? | | fast, boyish |
| e_en-r-f3 | X | | NZ? | | fast, boyish |
| e_en-wm-slow | X | | Australian?? | | nasal drawl |
| e_en-wm-slow-f3 | X | | Australian?? | | boyish nasal drawl, computerish & like a learner-reader reading |
| e_eo | X | | | foreign | |
| emb_de4 | X | | German | German | neutral German male |
| emb_de4-en | X | | German | | mid-range, clean, English w/German accent |
| emb_de4-en-low-slow | X | | German | | pimp's voice: low & lecherous (English w/German accent) |
| emb_de5 | X | | German | German | slow low somewhat distorted voice |
| emb_de5-en | X | | German | | slow high somewhat distorted voice, English w/German accent |
| emb_de5-en-high-slow | X | | German | | mid-high slightly strangulated male, English w/German accent |
| emb_de7 | X | | German | German | middle somewhat slow and drawn out male, German |
| emb_en1-high | X | | English | | soft mid-range male voice |
| emb_fr1-en-low | X | | European | | low & lecherous |
| emb_fr4-en-high-slow | X | | European | | mid-high male voice, sounds like he has trouble speaking |
| emb_hu1-en-slow | | X | European? | | low soft female voice with slight European accent |
| emb_nl2 | X | | | Dutch | mid-low male |
| emb_nl2-en | X | | European? | | mid-low male voice with slight European accent |
| emb_pl1 | | X | | Polish? | mid-low calm female |
| emb_pl1-en | | X | Polish? | | mid-low calm female with European accent |
| emb_ro1-en | X | | | | |
| emb_sw1-en-fast | X | | Swedish? | | mid-low male speaking quickly |
| emb_sw2-en-high-slow | | X | Swedish? | | mid-high female with European accent |
| high | X | X | Computer | | boyish computer monotone |
| rms-faster | X | | American | | |
| rms-nitech | X | | American | | Deeper than roger, clear, little bit emphatic |
| roger | X | | English | | Thin, proper-sounding, not deep |
| slow | X | | computer | | gets slower & lower, very good for effects |
| slt-cmu | | X | American | | Flat, slightly muffly |
| slt-nitech | | X | American | | Flat, a bit clearer & stronger than slt-cmu |



Adding more voices


You can install additional speech plug-ins on your own server to extend the range of voices available to the avatars. As long as you don't mind messing around with the source code a little bit it's not difficult – Patricia Jung explains how she did it (for Linux, using UpStage V1 - note that this is now several years old):


Just add another entry in the VOICES section in Upstage/upstage/voices.py like:

 

    #txt2pho/mbrola:
    'de1': ("| /usr/local/mbrola/pipefilt | /usr/local/mbrola/preproc /usr/local/mbrola/Hadifix.abk /usr/local/mbrola/Rules.lst | /usr/local/mbrola/txt2pho -p /usr/local/mbrola/data/ | /usr/local/mbrola/mbrola /usr/local/mbrola/de1/de1 - -",
            _fest),

 

I know it looks awful, but that is only because the command is a chain of four commands, each with a couple of options and its full path:

           "| pipefilt ...| preproc ... | txt2pho ... | mbrola ..."

 

It does some preprocessing (like replacing every occurrence of "z.B." with "zum Beispiel"), then hands the resulting text over to txt2pho and then to mbrola.

 

As long as your command or command chain takes text from the standard input and writes the result as sound in raw format to the standard output (Unix stuff, ask me if you haven't heard about it), you can put whatever you like between the opening "| and the closing ".
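
Before adding a new entry, it can help to check that a candidate chain really follows this contract. The sketch below is not how UpStage itself runs the command (that code is not shown here); it is a standalone check, and the function name `run_chain` is hypothetical. It feeds text to a shell pipeline on standard input and returns whatever bytes the pipeline writes to standard output. The usage example uses `tr` and `cat` as stand-ins so that it runs without mbrola installed.

```python
import subprocess


def run_chain(chain, text):
    """Feed text to a shell command chain on stdin and return its stdout bytes.

    The chain may start with "|", as the voice entries do; that leading pipe
    is dropped here because the text is supplied directly on standard input.
    """
    command = chain.strip()
    if command.startswith("|"):
        command = command[1:].strip()

    result = subprocess.run(
        command,
        shell=True,                     # the chain is a shell pipeline
        input=text.encode("utf-8"),     # text goes in on standard input
        stdout=subprocess.PIPE,         # raw output comes back on standard output
        stderr=subprocess.PIPE,
        check=True,                     # raise if the last command in the chain fails
    )
    return result.stdout


if __name__ == "__main__":
    # Stand-in chain: any real voice chain would produce raw audio bytes instead.
    output = run_chain("| tr a-z A-Z | cat", "zum Beispiel")
    print(len(output), "bytes written to standard output")
```

If a real chain returns an empty result or raises an error here, it is unlikely to work as a voice entry either.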

 

The command chain above works once the txt2pho front end is installed; it uses the de1 female mbrola voice, which you can then select in the web interface under the name de1.

 

The only problem with this kind of reconfiguration is that config.py isn't a simple configuration file but a Python script, so you need to know at least that Python is very picky about indentation: it's extremely important that your new voice entries have the same amount of whitespace at the beginning of the line as the other voice entries.
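
One way to catch such a mistake early is to ask Python to compile the edited file without running it. The following is a minimal sketch using Python 3's built-in `compile()`; the function name `find_syntax_error` is hypothetical, and the path passed to it is whichever file you edited.

```python
def find_syntax_error(source, filename="<voices>"):
    """Return a short message if the Python source does not compile, else None.

    IndentationError is a subclass of SyntaxError, so bad leading whitespace
    is reported here as well.
    """
    try:
        compile(source, filename, "exec")
    except SyntaxError as err:
        return "%s line %s: %s" % (filename, err.lineno, err.msg)
    return None


if __name__ == "__main__":
    import sys

    # Usage: python check_syntax.py path/to/edited_file.py
    path = sys.argv[1]
    with open(path, encoding="utf-8") as handle:
        problem = find_syntax_error(handle.read(), path)
    print(problem or "no syntax errors found")
```

This only checks that the file is valid Python; it does not confirm that the new voice entry itself works.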

 

The reason it took me so long was TTS: I failed completely and utterly to get the German Festival extensions for use with mbrola voices (http://www.ims.uni-stuttgart.de/phonetik/synthesis/festival_opensource.html) to work. Then I tried txt2pho with mbrola (http://www.ikp.uni-bonn.de/dt/forsch/phonetik/hadifix/HADIFIXforMBROLA.html), ignoring Festival, and this worked at once. http://bogmog.sourceforge.net/document_show.php3?doc_id=34 has a nice installation description.
