Here you’ll find answers to some common questions about acapela-box.
Acapela-Box is an online service that allows you to convert your text messages into sound files and download them, using our high-quality text-to-speech.
1. Fill your account
2. Listen to your text
3. Tweak it by writing it differently and using the advanced settings
4. Download your sound file and off you go!
To get started, we recommend that you fill your account first; this will allow you to download and use the sound files you create. Simply go to the “buy” page after logging in and follow the instructions. Once this is done, getting your files is very simple: go to the “box” page and type or paste your text in the blue window.
Listen: hear your text
Pause: this button stops the reading immediately
Continue: resume from where you paused
Stop: stop the reading so you can start something else
Download: this button will a) download your file and b) debit the corresponding number of credits from your acapela-box account.
Advanced Settings:
My account: on this page you can edit your details and see your purchase history and your transaction history.
Re-download: in your transaction history you can re-download your original sound file up to five times by clicking the re-download icon. After five times you won’t be able to get the original file back.
Regeneration: you will always be able to regenerate your sound file by clicking the regeneration icon in your transaction history. The regeneration function takes your original text and the settings you used and creates a new sound file. Be aware: if you regenerate a sound file after a while, it may differ from the original. We are always improving our text-to-speech algorithm and release major updates once or twice a year. In most cases the changes are an enhancement, but sometimes the result is just different. Consequently, we strongly recommend that you back up your sound files and do not rely on the regeneration feature for backups.
The sound files that you download are free of rights.
You can use them for any personal or commercial project, except for broadcasting the downloaded sound files as messages in advertisements, movies, videos, or any other media that will generate revenue.
For the avoidance of doubt, broadcasting is defined as the distribution of audio and/or video content or other messages to a dispersed audience via any electronic mass communications medium.
To purchase, log in, go to the “buy” page, choose the package that fits your needs and follow the purchasing process. Once you have credits on your account you can start generating and downloading sound files.
Yes, a receipt will be available for download as PDF under myAccount after each purchase. The receipt is valid for all bookkeeping and tax purposes.
For information about pricing, see this page.
Yes, you can generate as many messages as you want within the number of credits that you have purchased.
For information about pricing and credits, see this page.
On average, one second of generated speech corresponds to about 15 characters. So 20 seconds is about 300 characters. The b-5 option is the one to choose.
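The rule of thumb above can be turned into a quick estimate. This is only a sketch based on the 15 characters per second average quoted here; the function names are illustrative, and the real duration depends on the voice and the speech rate setting.

```python
CHARS_PER_SECOND = 15  # average quoted in this FAQ, not an exact figure


def estimate_characters(seconds: float) -> int:
    """Approximate number of characters needed for a given duration."""
    return round(seconds * CHARS_PER_SECOND)


def estimate_seconds(text: str) -> float:
    """Approximate duration in seconds of the speech generated from `text`."""
    return len(text) / CHARS_PER_SECOND


print(estimate_characters(20))  # 300
```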
For information about pricing, see this page.
Yes, the text cannot exceed 10 000 characters. If you have a longer text, you will need to split it into shorter texts.
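If you prepare long texts offline, a small script can do the splitting for you before you paste each part into the box. The sketch below is not part of acapela-box; it assumes that plain character count is what matters and that breaking at sentence ends is acceptable for your text.

```python
import re

MAX_CHARS = 10_000  # limit stated in this FAQ


def split_text(text: str, limit: int = MAX_CHARS) -> list[str]:
    """Split text into chunks of at most `limit` characters, preferring sentence ends."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        # A single sentence longer than the limit is cut hard.
        while len(sentence) > limit:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:limit])
            sentence = sentence[limit:]
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= limit:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

Each returned chunk can then be converted as a separate sound file.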
For normal voices the amount of credits and amount of characters are approximately the same. The only differences being:
For downloads including some of the premium voices the principle is the same, but the number of credits needed for creating a sound file with a premium voice is higher.
When you type a text you will see a few numbers under the box, including one stating the number of characters in the text, and one stating the number of credits needed to convert the text into speech.
Some of the voices in acapela-box are considered premium voices and therefore require a larger amount of credits to convert the same amount of text.
Premium voices are marked in the voice list with “Premium”.
Premium voices currently include all children’s voices, which cost twice as much as normal voices (2 credits per character), and voices like Sharon (US English) that use a new generation of text-to-speech technology.
Always check the cost in credits of a download by looking at the figures below the text box.
The cost in credits may be higher than the number of characters if you are using the Pronunciation Editor to expand short words into longer ones (as is the case with acronyms), or if you are using Premium voices.
Genuine children’s voices are voices created from recordings made by children, as opposed to children’s voices created by manipulating recordings made by adults or impersonators. At the time of writing, this is a feature unique to Acapela Group.
acapela-box can be used to generate a large number of files or large texts, but it requires manual work for each file to be converted. If you are looking for a solution where you can generate multiple files very quickly with just one click, you might want to have a look at the Virtual Speaker solution. Virtual Speaker is a desktop application for converting text files into speech, suited for companies with larger or recurring projects.
Do not hesitate to contact us if you want more information about Virtual Speaker.
acapela-box is designed to provide sound files only. However, Acapela Group has many other products that allow text-to-speech to be integrated into an application via an API. Please have a look at www.acapela-group.com for more information on the complete range of SDKs and developer tools from Acapela Group.
For cloud service or web API, please visit www.acapela-vaas.com
Yes, you can listen to your files as many times as you want before generating the sound file and downloading it.
Yes, you can modify the pronunciation of a word in two ways:
In the Pronunciation Editor you can choose whether to store a different spelling of a word, to get the proper pronunciation, or to store a phonetic transcription for the word. More information on phonetic transcriptions is given below.
Yes, the Pronunciation Editor offers the possibility to use phonetic transcriptions.
Phonetic transcription requires that you learn the phonetic alphabet for the specific language you are using. The phonetic alphabet is made of phonetic symbols, each corresponding to a particular sound of that language.
When you open the Pronunciation Editor you will see a button called “SHOW PHONETIC SYMBOLS”. Click on it and a table of phonetic symbols will appear, complete with an example for each sound. At the bottom of the table you will see further instructions on how to use the phonetic symbols, for instance how to specify lexical stress, glottal stops and so on.
To write the pronunciation of a word with phonetic symbols, you need to select the “Use phonetic symbols” option in the Pronunciation Editor, and then type the phonetic symbols in the “Pronunciation” field.
You can listen to the current pronunciation at any time by clicking on the play button (a triangle) next to the “Pronunciation” field.
When you use phonetic symbols, the application checks that what you are entering is syntactically correct. If what you type is not correct (for instance you type an unknown symbol or you forget the space between phonetic symbols), the program signals this by showing the wrong characters in red and by disabling the “Add this word to the list” button.
Please use the “SHOW PHONETIC SYMBOLS” function of the Pronunciation Editor to check the list of valid phonetic symbols for the language being used.
Yes, you can add a pause by adding a \pau\ text tag, as in the following example:
“hello \pau=3000\ how are you?”.
This command will insert a pause of 3 seconds (3000 milliseconds) between “hello” and “how are you?”. You can choose any length of pause that you like, just change the number to suit your needs.
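If you build your texts with a script, the tag can be generated from a duration in seconds. This is a minimal sketch; `pause_tag` is a hypothetical helper name, and only the \pau=milliseconds\ syntax shown above comes from this FAQ.

```python
def pause_tag(seconds: float) -> str:
    r"""Return a \pau=N\ tag, where N is the pause length in milliseconds."""
    if seconds < 0:
        raise ValueError("pause length cannot be negative")
    milliseconds = round(seconds * 1000)
    return f"\\pau={milliseconds}\\"


print(f"hello {pause_tag(3)} how are you?")  # hello \pau=3000\ how are you?
```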
Yes, you can switch voice by adding a \vce=speaker\ text tag, as in the following example:
Good morning, ladies and gentlemen, \vce=speaker=Julie\ Bonjour mesdames et messieurs.
Just pick the name of the voice that you want to use and the text-to-speech will immediately switch to the new voice right after the tag.
For special voices, like a voice with emotions or a variant of a voice, you need to type the name without any spaces, parentheses or underscores. For instance:
Yes, you can change settings by using the \spd\ tag to change the speech rate, the \vct\ tag to change the voice shaping and the \vol\ tag to change the volume (volume is linear from 0 to 65535). Take the following examples:
Hello, this is the normal voice, \spd=300\ oh my goodness, this is really fast, \spd=180\ now I am back to normal speed.
Hello again, \vct=70\ now the voice sounds very dark, \vct=100\ now I am back to normal.
I always speak with max volume \vol=10000\ but I can speak softer \vol=65535\ and back to max volume again.
Please note that 180 is the normal value for speech rate and 100 is the normal value for voice shaping.
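For scripted texts, the three setting tags can be wrapped in small helpers that default to the normal values. This is a sketch only: the helper names are hypothetical, the defaults are the values given in this FAQ (180 for speech rate, 100 for voice shaping, 65535 as maximum volume), and only the volume range is validated because it is the only range stated here.

```python
NORMAL_SPEED = 180    # normal speech rate, per this FAQ
NORMAL_SHAPING = 100  # normal voice shaping, per this FAQ
MAX_VOLUME = 65535    # volume is linear from 0 to 65535


def spd(rate: int = NORMAL_SPEED) -> str:
    """Speech rate tag; call without arguments to return to normal speed."""
    return f"\\spd={rate}\\"


def vct(shaping: int = NORMAL_SHAPING) -> str:
    """Voice shaping tag; call without arguments to return to the normal voice."""
    return f"\\vct={shaping}\\"


def vol(level: int = MAX_VOLUME) -> str:
    """Volume tag; call without arguments to return to maximum volume."""
    if not 0 <= level <= MAX_VOLUME:
        raise ValueError(f"volume must be between 0 and {MAX_VOLUME}")
    return f"\\vol={level}\\"


text = f"Hello, {spd(300)} this is really fast, {spd()} and this is normal again."
print(text)
```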
Sometimes you may want a word in a text to be read differently, for instance to add or remove emphasis or to get rid of some acoustic issue (like wobbling). To do this you can ask the TTS for a different acoustic rendering of a word by typing \sel=alt1\ in front of that word. You can also ask for \sel=alt2\ (and so on, up to “alt9”) to get more variations.
Note that the difference may be quite subtle and in some cases barely audible. Asking for a different rendering of a word will also affect the nearby words and more generally give a different nuance to the whole sentence.
As an example, select the voice Rod and type the following text:
Let’s convert text into speech.
Let’s convert \sel=alt1\ text into speech.
Let’s convert \sel=alt2\ text into speech.
Now click on LISTEN to hear the result; you will notice a subtle nuance in the way the word “text” (and nearby words) is rendered.
You can also tag several words in one sentence, for instance if you type the following sentences with the voice Rod:
Let’s convert text into speech.
Let’s convert \sel=alt1\ text into speech.
Let’s \sel=alt1\ convert \sel=alt1\ text into speech.
You will hear one more variation in the acoustic rendering of the sentence.
The effect of the \sel=alt1\ tag is difficult to predict, so you need to work empirically using a trial-and-error method. It is also voice specific, so it cannot be copied from one voice to the other, even if in the same language.
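Since this is a trial-and-error process, it can help to generate all the candidate sentences at once, paste them into the box and listen to them one after the other. The sketch below is a hypothetical helper, not an acapela-box feature. It tags the first occurrence of the word using a plain substring match, so a short word may also match inside a longer one.

```python
def alt_variants(sentence: str, word: str, max_alt: int = 9) -> list[str]:
    """Return the untagged sentence followed by one variant per alt1..alt<max_alt> tag."""
    if not 1 <= max_alt <= 9:
        raise ValueError("alternatives range from alt1 to alt9")
    if word not in sentence:
        raise ValueError(f"{word!r} does not occur in the sentence")
    variants = [sentence]
    for n in range(1, max_alt + 1):
        tag = f"\\sel=alt{n}\\"
        # Place the tag in front of the first occurrence of the word.
        variants.append(sentence.replace(word, f"{tag} {word}", 1))
    return variants


for line in alt_variants("Let's convert text into speech.", "text", max_alt=2):
    print(line)
```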
The Pronunciation Editor is used to provide a pronunciation that does not follow standard language rules (for instance for foreign words, geographical names, business names and other exceptions) or to expand abbreviations and acronyms, for instance to have “UN” pronounced as “United Nations”.
The alternative rendering is used to get a different acoustic rendering of a word, typically to get a different nuance in the reading of a word, to add or remove prominence of the word in the sentence, or to get rid of acoustic phenomena like wobbling.
In some cases the same word might be pronounced differently depending on its function. For instance the word “read” might be a verb or a noun. The TTS does its best to guess the function of a word, but in some cases it may fail, particularly if the word is isolated and without enough context.
To specify the part of speech you can use the “prx” tag, which has a peculiar syntax (please note the “%1” and “%” characters used as separators):
\prx=%1nature%word\
This tag allows us to fix the nature of a word in a sentence. This can be relevant to remove a potential ambiguity between identical words pronounced differently.
Nature can be chosen among the following: NOUN, ADJ, VERB, ADV, PARTPASSE, PARTPRES, CHIF, INFINIT.
Example:
“The queen and Alice \prx=%1VERB%read\ a book.”
Here the prx tag makes sure that “read” will not be pronounced as the past participle form.
Here is another example showing how the word “suspect” is used in two different ways in the same sentence:
I \prx=%1VERB%suspect\ that you have a \prx=%1NOUN%suspect\.
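Because the syntax is easy to mistype, a small helper can build the tag and reject unknown natures. This is a sketch: `prx` is a hypothetical function name, and the list of natures is the one given above.

```python
NATURES = {"NOUN", "ADJ", "VERB", "ADV", "PARTPASSE", "PARTPRES", "CHIF", "INFINIT"}


def prx(nature: str, word: str) -> str:
    """Build a prx tag that fixes the nature (part of speech) of `word`."""
    nature = nature.upper()
    if nature not in NATURES:
        raise ValueError(f"unknown nature: {nature}")
    return f"\\prx=%1{nature}%{word}\\"


print(f"I {prx('VERB', 'suspect')} that you have a {prx('NOUN', 'suspect')}.")
# I \prx=%1VERB%suspect\ that you have a \prx=%1NOUN%suspect\.
```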
When the length of a text is counted to determine how many credits it will cost to convert it into a sound file, tags such as \vce\, \spd\, \vct\ and \pau\ are not counted.
The only exception to this rule is the \prn\ tag, which can be used to include a phonetic pronunciation in a text and is therefore counted.
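To get a rough idea of the counted length before pasting a text, the four tags named above can be stripped with a regular expression. This is an approximation under stated assumptions: the pattern only covers \vce\, \spd\, \vct\ and \pau\, it leaves \prn\ and any other tag in place, and it does not model premium voices or Pronunciation Editor expansions. The figures shown under the text box remain the reference.

```python
import re

# Matches only the four tags this FAQ names as not counted, with or without a value.
UNCOUNTED_TAG = re.compile(r"\\(?:vce|spd|vct|pau)(?:=[^\\]*)?\\")


def countable_length(text: str) -> int:
    """Length of `text` once the uncounted tags have been removed."""
    return len(UNCOUNTED_TAG.sub("", text))


sample = "hello \\pau=3000\\ how are you?"
print(len(sample), countable_length(sample))
```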
Yes we do. The text files and audio files are stored for as long as your account remains open.
The Automatic file name feature automatically inserts the first three words of the text into the filename of the downloaded audio file. This is particularly helpful when creating several audio files at once.
As an example, if my text is “Butterflies are a chiefly diurnal group of the order Lepidoptera (which also includes moths).”, when I save the file without the option “Automatic file name” activated the file name would be:
If I activate “Automatic file name” the file name would instead be:
Filenames can be edited after download.
If you have a set of sentences that you wish to convert into sound files with one sentence per sound file, Acapela Box offers you the new EXPORT LINE BY LINE feature. When you select this option (in the box next to the main edit box), your text is synthesized line by line and you get one sound file per line. All the sound files are packaged into a downloadable zip file. This allows you to speed up your production process when you have several prompts to generate. You can also add the first words of each sentence to the names of the sound files by selecting AUTOMATIC FILE NAME.
When selected, this option allows you to download a set of sound files with one sentence in each instead of a large sound file including all the sentences.
The number of lines is limited to 50.
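If you have more than 50 prompts, you can prepare them in batches before pasting each batch into the box. The sketch below is a hypothetical helper; it assumes one prompt per line and drops blank lines, since how the service treats blank lines is not described here.

```python
MAX_LINES = 50  # line limit stated in this FAQ


def batch_lines(text: str, size: int = MAX_LINES) -> list[list[str]]:
    """Group the non-empty lines of `text` into batches of at most `size` lines."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    return [lines[i:i + size] for i in range(0, len(lines), size)]


prompts = "\n".join(f"Prompt number {n}." for n in range(1, 121))
print([len(batch) for batch in batch_lines(prompts)])  # [50, 50, 20]
```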
No, the regeneration feature uses the same voice that you originally selected. If you want to use a different voice you have to purchase a new sound file.
No, your account is valid as long as the service runs.
Depending on the volume of text you need to vocalize, you may go directly for a medium or large pack and benefit from attractive prices. Check out the table of character counts and estimated audio time to make your choice.
We can accept payment via bank transfer only for packages from 500€ upwards. Contact us via the “contact us” page if you want to know more about it.
For more information about payments and pricing, see this page or select ‘prices’ in the top menu.
acapela-box lets you choose among four different audio file formats. Here you can see the details for each file format:
If you want to test whether the produced files are compatible with your working environment, you can click here to download a ZIP file containing samples of the acapela-box file formats.
Users outside the European Union do not need to specify a VAT number.
Users within the EU need to specify a VAT number to apply for VAT exemption, according to EU regulations.
On this page you can see the correct format of the VAT registration number for your country:
http://ec.europa.eu/taxation_customs/vies/faq.html#item_11
We had to introduce this restriction to make sure that users are aware of the Terms of Use and use our voices in a responsible manner. Users agree to our Terms of Use when creating an account, so they do not need to agree to them again for each listen.