Frequently Asked Questions

What is Acapela-Box?
Quick start: how does acapela-box work?
Thoughtful start
What can I do with my sound files?
How do I purchase my sound files?
Will I get a receipt valid for tax purposes?
How much does it cost?
Can I generate multiple messages with one purchase?
I need to generate a message that is 20 seconds long, how do I choose my purchase option?
Is there any limitation on the text size?
What is the difference between credits and characters?
What are premium voices?
Why is the cost in credits higher than the number of characters?
What is a genuine children’s voice?
I have lots of text files to be converted into speech, is acapela-box the proper solution?
I like your text-to-speech and I would like to integrate it in my application, is it possible to access your text-to-speech via an API?
Can I listen to my files before purchasing them?
Can I modify the pronunciation of a word?
Can I use phonetic transcriptions in the pronunciation editor?
Why is part of my pronunciation displayed in red?
Can I add a pause within the text?
Can I switch voice in the middle of the text?
Can I change the settings in the middle of the text?
Can I get an alternative rendering of the same text?
What is the difference between alternative rendering of a word and pronunciation editor?
How can I specify the part of speech for a word?
Are tags counted as part of the text?
Do you store my text files and my purchased sound files?
What is “Automatic File Name”?
Is it possible to automatically generate a list of sound files?
Can I use a different voice to regenerate a sound file with an existing text file?
Is my account closed when I have used my allocated number of credits?
Why should I go for a medium or a large pack?
I would like to purchase but I cannot use PayPal or credit card, what can I do?
What audio formats are available?
Do I need to provide a VAT registration number?
What is the format for VAT number?
Why is it necessary to accept the terms of use to listen to the speech?

Here you’ll find answers to some common questions about acapela-box.

What is Acapela-Box?

Acapela-Box is an online service that converts your text messages into sound files using our high-quality text-to-speech, and lets you download them.

Quick start: how does acapela-box work?

1. Add credits to your account
2. Listen to your text
3. Tweak it by rewording the text and using the advanced settings
4. Download your sound file and off you go!

Thoughtful start

To get started, we recommend adding credits to your account; this will allow you to download and use the sound files you create. Simply go to the “buy” page after logging in and follow the instructions. Once this is done, getting your files is very simple: go to the “box” page and type or paste your text into the blue window.

Listen: hear your text
Pause: this button stops the reading immediately
Continue: resume where you had paused
Stop: stop to start something else
Download: this button will a) download your file and b) debit your acapela-box account by the number of credits required.
Advanced Settings:

Speech Rate: will change the rate from low (left) to high (right)
Voice Shaping: voice shaping changes the tone of the voice
Pronunciation Editor: allows you to change the pronunciation of a word

My account: on this page you can edit your details and see your purchase history and transaction history.

Re-download: in your transaction history you can re-download your original sound file up to 5 times by clicking on “Redownload audio file”. After five times you won’t be able to get the original file back.

Regeneration: you will always be able to regenerate your sound file from your transaction history. The regeneration function takes your original text and the settings that you used and creates a new sound file. Be aware that if you regenerate a sound file after a while, it may differ from the original sound file. The reason is that we are always improving our text-to-speech algorithm and release major updates once or twice a year. In most cases the change is an enhancement, but sometimes the result is simply different. Consequently, we strongly recommend that you back up your sound files and do not rely on the regeneration feature for backups.

What can I do with my sound files?

The sound files that you download are free of rights.
You can use them for any personal or commercial project, except for broadcasting them as part of messages used in advertisements, movies, videos, or any other medium that generates revenue.

For the avoidance of doubt, broadcasting is defined as the distribution of audio and/or video content or other messages to a dispersed audience via any electronic mass communications medium.

How do I purchase my sound files?

To purchase, log in, go to the “buy” page, choose the package that fits your needs, and then follow the purchasing process. Once you have credits on your account you can start generating and downloading sound files.

Will I get a receipt valid for tax purposes?

Yes, a receipt will be available for download as PDF under myAccount after each purchase. The receipt is valid for all bookkeeping and tax purposes.

How much does it cost?

For information about pricing, see this page.

Can I generate multiple messages with one purchase?

Yes, you can generate as many messages as you want within the number of credits that you have purchased.

For information about pricing and credits, see this page.

I need to generate a message that is 20 seconds long, how do I choose my purchase option?

On average, one second of generated speech corresponds to about 15 characters, so 20 seconds is about 300 characters. The b-5 option is the one to choose.
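The estimate above can be written as a small calculation. This is only a sketch: the figure of 15 characters per second is the average quoted in this answer, and the function name is ours for illustration, not something acapela-box provides.

```python
import math

# Average quoted in this FAQ; the real rate depends on the voice, the text
# and the speech rate setting.
CHARS_PER_SECOND = 15

def estimate_characters(seconds: float) -> int:
    """Return the approximate number of characters for `seconds` of speech."""
    return math.ceil(seconds * CHARS_PER_SECOND)

print(estimate_characters(20))  # 300
```

Treat the result as a rough guide and check the credit figure shown under the text box before downloading.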

For information about pricing, see this page.

Is there any limitation on the text size?

Yes, the text cannot exceed 10 000 characters. If you have a longer text, you will need to split it into shorter texts.
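If you prepare long texts outside acapela-box, a small script can do the splitting for you. The sketch below is one possible approach, not a feature of the service: it cuts after the last full stop inside each 10 000 character window, falls back to the last space, and cuts hard only when neither exists. The function name and the choice of boundaries are our assumptions.

```python
MAX_CHARS = 10_000  # limit stated in this FAQ

def split_text(text: str, limit: int = MAX_CHARS) -> list[str]:
    """Split `text` into pieces of at most `limit` characters."""
    chunks = []
    rest = text.strip()
    while len(rest) > limit:
        window = rest[:limit]
        cut = window.rfind(". ")
        if cut != -1:
            cut += 1  # keep the full stop with the first piece
        else:
            cut = window.rfind(" ")
        if cut <= 0:
            cut = limit  # no usable boundary: hard cut
        chunks.append(rest[:cut].strip())
        rest = rest[cut:].strip()
    if rest:
        chunks.append(rest)
    return chunks

parts = split_text("Hello world. " * 2000)
print([len(p) for p in parts])
```

Each piece can then be pasted into the text box and downloaded as a separate sound file.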

What is the difference between credits and characters?

For normal voices the number of credits and the number of characters are approximately the same. The only differences are:

Tags are filtered from the text, with the exception of the PRN tag (used to specify a phonetic pronunciation). For PRN tags we count the number of phonemes specified within the tag as if they were characters.
When counting credits we also take into account the substitutions made by the pronunciation editor.

For downloads including some of the premium voices the principle is the same, but the number of credits needed for creating a sound file with a premium voice is higher.

When you type a text you will see a few numbers under the box, including one stating the number of characters in the text and one stating the number of credits needed to convert the text into speech.

What are premium voices?

Some of the voices in acapela-box are considered premium voices and therefore require a larger amount of credits to convert the same amount of text.

Premium voices are marked in the voice list with “Premium”.

Premium voices currently include all children’s voices, which cost double the price of normal voices (2 credits per character), and voices like Sharon (US English) that use a new generation of text-to-speech technology.

Always check the cost in credits of a download by looking at the figures below the text box.

Why is the cost in credits higher than the number of characters?

The cost in credits may be higher than the number of characters if you are using the Pronunciation Editor to expand short words into longer ones (as is the case with acronyms), or if you are using Premium voices.

What is a genuine children’s voice?

Genuine children’s voices are created from recordings made by children, as opposed to children’s voices created by manipulating recordings made by adults or impersonators. At the time of writing, this is a unique feature of Acapela Group.

I have lots of text files to be converted into speech, is acapela-box the proper solution?

acapela-box can be used to generate a large number of files or large texts, but it requires manual work for each file to be converted. If you are looking for a solution where you can generate multiple files very quickly with just one click, you might want to have a look at the Virtual Speaker solution. Virtual Speaker is a desktop application for converting text files into speech, suited for companies with larger or recurring projects.
Do not hesitate to contact us if you want more information about Virtual Speaker.

I like your text-to-speech and I would like to integrate it in my application, is it possible to access your text-to-speech via an API?

acapela-box is designed to provide sound files only. However, Acapela Group has many other products that allow integration of text-to-speech in an application via an API. Please have a look at www.acapela-group.com for more information on the complete product range of SDKs and developer tools from Acapela Group.

For cloud service or web API, please visit www.acapela-vaas.com

Can I listen to my files before purchasing them?

Yes, you can listen to your files as many times as you want before generating the sound file and downloading it.

Can I modify the pronunciation of a word?

Yes, you can modify the pronunciation of a word in two ways:

You can spell the word differently in your text until you are satisfied with the pronunciation.
You can use the pronunciation editor to store your pronunciation; thereafter, all of the texts that you submit will take the modified pronunciation of this word into account.

In the pronunciation editor you can choose whether to store a different spelling of a word, to get the proper pronunciation, or to store a phonetic transcription for the word. More information on phonetic transcriptions is given in the next question.

Can I use phonetic transcriptions in the pronunciation editor?

Yes, the Pronunciation Editor offers the possibility to use phonetic transcriptions.

Phonetic transcriptions require that you learn to use the phonetic alphabet for the specific language you are using. The phonetic alphabet is made of phonetic symbols, each corresponding to a particular sound of that language.

When you open the Pronunciation Editor you will see a button called “SHOW PHONETIC SYMBOLS”. Click on it and a table of phonetic symbols will appear, complete with an example for each sound. At the bottom of the table you will see further instructions on how to use the phonetic symbols, for instance how to specify lexical stress, glottal stops and so on.

To write the pronunciation of a word with phonetic symbols, you need to select the “Use phonetic symbols” option in the Pronunciation Editor, and then type the phonetic symbols in the “Pronunciation” field.

You can listen to the current pronunciation at any time by clicking on the play button (a triangle) next to the “Pronunciation” field.

Why is part of my pronunciation displayed in red?

When you use phonetic symbols, the application checks that what you are entering is syntactically correct. If what you type is not correct (for instance you type an unknown symbol or you forget the space between phonetic symbols), the program signals this by displaying the wrong characters in red and by disabling the “Add this word to the list” button.

Please use the “SHOW PHONETIC SYMBOLS” function of the Pronunciation Editor to check the list of valid phonetic symbols for the language being used.

Can I add a pause within the text?

Yes, you can add a pause by adding a \pau\ text tag, as in the following example:

“hello \pau=3000\ how are you?”.

This command will insert a pause of 3 seconds (3000 milliseconds) between “hello” and “how are you?”. You can choose any length of pause that you like, just change the number to suit your needs.
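If you assemble your texts with a script before pasting them into acapela-box, the pause tag can be generated rather than typed by hand. The helper below is a hypothetical convenience of ours; only the \pau=milliseconds\ syntax comes from this FAQ.

```python
def pause_tag(milliseconds: int) -> str:
    """Return a pause tag for the given duration in milliseconds."""
    if milliseconds < 0:
        raise ValueError("pause length cannot be negative")
    return f"\\pau={milliseconds}\\"

text = f"hello {pause_tag(3000)} how are you?"
print(text)  # hello \pau=3000\ how are you?
```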

Can I switch voice in the middle of the text?

Yes, you can switch voice by adding a \vce=speaker\ text tag, as in the following example:

Good morning, ladies and gentlemen, \vce=speaker=Julie\ Bonjour mesdames et messieurs.

Just pick the name of the voice that you want to use and the text-to-speech will immediately switch to the new voice right after the tag.

For special voices, like voices with emotions or variants of a voice, you need to type the name without any space, parenthesis or underscore. For instance:

Will (LittleCreature): \vce=speaker=willlittlecreature\
Peter (Sad): \vce=speaker=petersad\
Antoine (UpClose): \vce=speaker=antoineupclose\
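The naming rule above can be sketched as a small helper that turns a voice name as displayed in the voice list into a tag. The function and its lowercase option are our assumptions: the variant examples above are written in lower case while the Julie example keeps its capital, and this FAQ does not say whether case matters.

```python
import re

def voice_tag(display_name: str, lowercase: bool = True) -> str:
    """Build a voice-switch tag from a voice name shown in the voice list."""
    # Drop spaces, parentheses and underscores, as described above.
    name = re.sub(r"[\s()_]", "", display_name)
    if lowercase:
        name = name.lower()  # mirrors the variant examples; an assumption
    return f"\\vce=speaker={name}\\"

print(voice_tag("Will (LittleCreature)"))  # \vce=speaker=willlittlecreature\
print(voice_tag("Julie", lowercase=False))  # \vce=speaker=Julie\
```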

Can I change the settings in the middle of the text?

Yes, you can change settings in the middle of the text: use the \spd\ tag to change the speech rate, \vct\ to change the voice shaping, and \vol\ to change the volume (volume is linear from 0 to 65535). Take the following examples:

Hello, this is the normal voice, \spd=300\ oh my goodness, this is really fast, \spd=180\ now I am back to normal speed.

Hello again, \vct=70\ now the voice sounds very dark, \vct=100\ now I am back to normal.

I always speak with max volume \vol=10000\ but I can speak softer \vol=65535\ and back to max volume again.

Please note that 180 is the normal value for speech rate and 100 is the normal value for voice shaping.
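For texts built by a script, the three setting tags can be produced by one helper. This is a sketch under stated assumptions: the helper name is ours, and only the volume range (0 to 65535) is checked because it is the only range given in this FAQ; valid ranges for speech rate and voice shaping are not stated here.

```python
NORMAL_SPEED = 180    # normal speech rate, per this FAQ
NORMAL_SHAPING = 100  # normal voice shaping, per this FAQ
MAX_VOLUME = 65535    # upper end of the linear volume scale

def setting_tag(name: str, value: int) -> str:
    """Return a tag for speech rate (spd), voice shaping (vct) or volume (vol)."""
    if name not in {"spd", "vct", "vol"}:
        raise ValueError(f"unknown setting tag: {name}")
    if name == "vol" and not 0 <= value <= MAX_VOLUME:
        raise ValueError("volume must be between 0 and 65535")
    return f"\\{name}={value}\\"

text = (
    f"Hello, this is the normal voice, {setting_tag('spd', 300)} "
    f"this is really fast, {setting_tag('spd', NORMAL_SPEED)} "
    "now I am back to normal speed."
)
print(text)
```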

Can I get an alternative rendering of the same text?

Sometimes you may want a word in a text to be read differently, for instance to add or remove emphasis or to get rid of an acoustic issue (like wobbling). To do this you can ask the TTS for a different acoustic rendering of a word by typing \sel=alt1\ in front of that word. You can also ask for \sel=alt2\ (and so on, up to “alt9”) to get more variations.

Note that the difference may be quite subtle and in some cases barely audible. Asking for a different rendering of a word will also affect the nearby words and, more generally, give a different nuance to the whole sentence.

As an example, select the voice Rod and type the following text:

Let’s convert text into speech.
Let’s convert \sel=alt1\ text into speech.
Let’s convert \sel=alt2\ text into speech.

Now click on LISTEN to hear the result. You will notice a subtle nuance in the way the word “text” (and nearby words) is rendered.

You can also tag several words in one sentence, for instance if you type the following sentences with the voice Rod:

Let’s convert text into speech.
Let’s convert \sel=alt1\ text into speech.
Let’s \sel=alt1\ convert \sel=alt1\ text into speech.

You will hear one more variation in the acoustic rendering of the sentence.

The effect of the \sel=alt1\ tag is difficult to predict, so you need to work empirically using a trial-and-error method. It is also voice specific, so it cannot be copied from one voice to the other, even if in the same language.
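Because the process is trial and error, it can help to generate all the candidate sentences at once and listen to them one after another. The sketch below is a hypothetical helper of ours: it places \sel=altN\ in front of the first occurrence of a word and returns the untagged sentence followed by each variant.

```python
def alternative_renderings(sentence: str, word: str, max_alt: int = 9) -> list[str]:
    """Return the sentence plus one variant per alternative rendering of `word`."""
    if not 1 <= max_alt <= 9:
        raise ValueError("this FAQ describes alt1 up to alt9 only")
    index = sentence.find(word)
    if index == -1:
        raise ValueError(f"{word!r} does not occur in the sentence")
    before, after = sentence[:index], sentence[index:]
    variants = [sentence]
    for n in range(1, max_alt + 1):
        variants.append(f"{before}\\sel=alt{n}\\ {after}")
    return variants

for line in alternative_renderings("Let's convert text into speech.", "text", 2):
    print(line)
```

Paste the printed lines into the text box, click LISTEN, and keep the variant that sounds best for the voice you are using.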

What is the difference between alternative rendering of a word and pronunciation editor?

The Pronunciation Editor is used to provide a pronunciation that does not follow standard language rules (for instance for foreign words, geographical names, business names and other exceptions) or to expand abbreviations and acronyms, for instance to have “UN” pronounced as “United Nations”.
The alternative rendering is used to get a different acoustic rendering of a word, typically to get a different nuance in the reading of a word, to add or remove prominence of the word in the sentence, or to get rid of acoustic phenomena like wobbling.

How can I specify the part of speech for a word?

In some cases the same word might be pronounced differently depending on its function. For instance the word “read” might be a verb or a noun. The TTS does its best to guess the function of a word, but in some cases it may fail, particularly if the word is isolated and without enough context.
To specify the part of speech you can use the “prx” tag, which has a peculiar syntax (please note the “%1” and “%” characters used as separators):

\prx=%1nature%word\

This tag allows you to fix the nature of a word in a sentence. This can be useful to remove a potential ambiguity between identically spelled words that are pronounced differently.
Nature can be chosen among the following: NOUN, ADJ, VERB, ADV, PARTPASSE, PARTPRES, CHIF, INFINIT.

Example:
“The queen and Alice \prx=%1VERB%read\ a book.”
Here the prx tag makes sure that “read” will not be pronounced as the past participle form.

Here is another example showing how the word “suspect” is used in two different ways in the same sentence:
I \prx=%1VERB%suspect\ that you have a \prx=%1NOUN%suspect\.
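Since the prx syntax is easy to mistype, a script can build the tag and reject natures that are not in the list above. The helper is a hypothetical sketch of ours; only the tag syntax and the list of natures come from this FAQ.

```python
NATURES = {"NOUN", "ADJ", "VERB", "ADV", "PARTPASSE", "PARTPRES", "CHIF", "INFINIT"}

def prx_tag(nature: str, word: str) -> str:
    """Return a prx tag that fixes the part of speech of `word`."""
    nature = nature.upper()
    if nature not in NATURES:
        raise ValueError(f"unknown nature: {nature}")
    return f"\\prx=%1{nature}%{word}\\"

sentence = f"I {prx_tag('VERB', 'suspect')} that you have a {prx_tag('NOUN', 'suspect')}."
print(sentence)  # I \prx=%1VERB%suspect\ that you have a \prx=%1NOUN%suspect\.
```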

Are tags counted as part of the text?

When counting the length of a text to decide how many credits it will cost to convert it to a sound file, tags such as \vce\, \spd\, \vct\ and \pau\ are not counted.

The only exception to this rule is the \prn\ tag that can be used to include phonetic pronunciation in a text and is thus counted in.
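To get a rough idea of the countable length of a tagged text before pasting it, you can strip the four uncounted tags named above and count what remains. This is only an approximation of ours, not the service's actual counting: it counts \prn\ tags by raw characters rather than by phonemes, it does not model Pronunciation Editor substitutions or premium voices, and whether the spaces left around a removed tag are counted is an assumption. The figure shown under the text box remains the reference.

```python
import re

# Tags this FAQ says are not counted. The prn tag is deliberately not matched.
UNCOUNTED_TAG = re.compile(r"\\(?:vce|spd|vct|pau)(?:=[^\\]*)?\\")

def countable_length(text: str) -> int:
    """Return the number of characters left after removing uncounted tags."""
    return len(UNCOUNTED_TAG.sub("", text))

print(countable_length("hello \\pau=3000\\ how are you?"))  # 19
```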

Do you store my text files and my purchased sound files?

Yes we do. The text files and audio files are stored for as long as your account remains open.

What is “Automatic File Name”?

The Automatic file name feature automatically inserts the first three words of the text in the file name of the downloaded audio file. This is particularly helpful when creating several audio files at once.

As an example, if my text is “Butterflies are a chiefly diurnal group of the order Lepidoptera (which also includes moths).”, when I save the file without the option “Automatic file name” activated the file name would be:

acapelabox_551667.mp3

If I activate “Automatic file name” the file name would instead be:

acapelabox_552181_Butterflies_are_a.mp3

Filenames can be edited after download.
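The naming pattern shown in the example can be reproduced with a few lines, which may be handy if you rename or index downloaded files yourself. This is a sketch inferred from the single example above: the function name is ours, and how acapela-box treats punctuation or special characters in the first three words is not stated here.

```python
def automatic_file_name(file_id: int, text: str, extension: str = "mp3") -> str:
    """Build a file name from a numeric id and the first three words of the text."""
    words = text.split()[:3]
    return "_".join([f"acapelabox_{file_id}", *words]) + f".{extension}"

name = automatic_file_name(552181, "Butterflies are a chiefly diurnal group")
print(name)  # acapelabox_552181_Butterflies_are_a.mp3
```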

Is it possible to automatically generate a list of sound files?

If you have a set of sentences that you wish to convert into sound files with one sentence per sound file, Acapela Box offers you the new EXPORT LINE BY LINE feature. By selecting this option (in the box just beside the main edit box), your text will be synthesized line by line. For each line, you will get a sound file. All the sound files will be merged into a downloadable zip file. This allows you to speed up your production process when you have several prompts to generate. You can also add the first words of each sentence to the names of the sound files by selecting AUTOMATIC FILE NAME.

When selected, this option allows you to download a set of sound files with one sentence in each instead of a large sound file including all the sentences.

The number of lines is limited to 50.
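If you have more than 50 prompts, you can prepare them in batches that each fit the limit. The sketch below is a hypothetical helper of ours; it skips blank lines, which is an assumption about how you would want to prepare the text rather than a statement about how acapela-box treats them.

```python
MAX_LINES = 50  # limit stated in this FAQ

def line_batches(text: str, limit: int = MAX_LINES) -> list[list[str]]:
    """Group the non-empty lines of `text` into batches of at most `limit` lines."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    return [lines[i:i + limit] for i in range(0, len(lines), limit)]

prompts = "\n".join(f"Prompt number {i}" for i in range(120))
print([len(batch) for batch in line_batches(prompts)])  # [50, 50, 20]
```

Each batch can then be pasted into the text box and exported line by line as its own zip file.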

Can I use a different voice to regenerate a sound file with an existing text file?

No, the regeneration feature uses the same voice that you originally selected. If you want to use a different voice you have to purchase a new sound file.

Is my account closed when I have used my allocated number of credits?

No, your account is valid as long as the service runs.

Why should I go for a medium or a large pack?

Depending on the volume of text you need to vocalize, you may go directly for a medium or large pack and benefit from attractive prices. Check the table of character counts and estimated audio time to make your choice.

I would like to purchase but I cannot use PayPal or credit card, what can I do?

We can accept payment via bank transfer only for packages from 500€ upwards. Contact us via the “contact us” page if you want to know more about it.

For more information about payments and pricing, see this page or select ‘prices’ in the top menu.

What audio formats are available?

acapela-box lets you choose among five audio file formats. Here are the details for each format:

MP3 48 kbps: MP3, mono, 16 bits, 48 kbps
WAV 22kHz: WAV – PCM 22050Hz, mono, 16 bits
WAV 8kHz: WAV – PCM 8000Hz, mono, 16 bits
WAV A-law: WAV – ALAW 8000Hz, mono, 8 bits
WAV mu-law: WAV – ULAW 8000Hz, mono, 8 bits

If you want to test whether the produced files are compatible with your working environment, you can click here to download a ZIP file containing samples of the acapela-box file formats.
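The formats listed above imply quite different file sizes. The sketch below estimates the audio payload from the bitrate or from sample rate times sample width; it is plain arithmetic on the figures in the list, ignores file headers, and assumes the MP3 is encoded at a constant 48 000 bits per second. The names are ours.

```python
# Bytes of audio data per second, derived from the format list above.
BYTES_PER_SECOND = {
    "MP3 48 kbps": 48_000 // 8,  # 48 000 bits per second
    "WAV 22kHz": 22_050 * 2,     # 16-bit mono PCM
    "WAV 8kHz": 8_000 * 2,       # 16-bit mono PCM
    "WAV A-law": 8_000 * 1,      # 8-bit mono
    "WAV mu-law": 8_000 * 1,     # 8-bit mono
}

def estimated_size_bytes(format_name: str, seconds: float) -> int:
    """Return the approximate audio payload size, headers not included."""
    return round(BYTES_PER_SECOND[format_name] * seconds)

for fmt in BYTES_PER_SECOND:
    print(fmt, estimated_size_bytes(fmt, 20))
```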

Do I need to provide a VAT registration number?

Users outside the European Union do not need to specify a VAT number.

Users within the EU need to specify a VAT number to apply for VAT exemption, according to EU regulations.

What is the format for VAT number?

On this page you can see the correct format for the VAT registration number in your country:

http://ec.europa.eu/taxation_customs/vies/faq.html#item_11

Why is it necessary to accept the terms of use to listen to the speech?

We had to introduce this restriction to make sure that users are aware of the Terms of Use and use our voices in a responsible manner. When creating an account users agree to our Terms of Use and so they do not need to agree on terms of use for each listen.