Monday, April 11, 2011

Text-to-Speech Functionality in Captivate

By Dean Hawkinson

There are plenty of arguments about using audio in eLearning, some in favor and some against.

Audio narration can be very time consuming, and in many cases requires hiring talent for a professional sound. Many eLearning development tools allow you to easily record narration with your content, including Adobe Presenter and Adobe Captivate.

However, what if you don’t have the budget to hire the talent to make it sound professional? What if you, as the Designer, do not feel that you have the voice for the recordings? What if you simply do not have the time to devote to recording audio in the first place?

Adobe Captivate includes a text-to-speech function that allows seamless narration without having to do any recording.

About the Tool

Captivate’s text-to-speech tool is very simple to use and allows you to type the text that you want narrated on each slide. You can select a female voice (Kate) or a male voice (Paul) for each individual slide, which allows you the freedom to mix it up a bit and use both narrators in one course. It also adds the feeling of having a trainer guide you through the process.

Captivate takes what you type and converts it to a voice-over for each slide. You can then use the timeline to position all of your effects to match the narration of the slide. If you play the slide and find that you made an error, fixing it is a simple matter of just re-typing the text.

Text-to-Speech Challenges

As with any software application, there are a few challenges to using a text-to-speech tool:

  • The voices can sound a bit robotic – Although not as bad as a monotone computer, you can tell that there is a robotic edge to the narration. However, in my experience, it is not as bad as you might think (anyone ever have a “Speak and Spell” toy as a kid?). The voices actually do sound professional, but you can always tell that it is not a real human doing the narration.
  • There are some issues with pronunciation and voice inflections – I have noticed that the narration has some interesting pronunciations of certain words. For example, the word “detail” comes across as “dtail” (doesn’t pronounce the “e”) and “status” is pronounced “stay-tus.”
  • They shouldn’t be used for long narratives – The primary use for text-to-speech should be to walk learners through a system simulation by guiding them through the clicks, which calls for short narrations. If you need to explain something in detail, it may be better to put the text on screen and have the narration refer learners to it. The robotic sound is not well suited to longer narratives.

Ways to “Trick” the System

The narration text you type is never shown to learners. So, you can purposely misspell some words to get the system to pronounce them the correct way.

For example, I really don’t like the way the system pronounces the word status. So, to get around it, I typed “staatus” and the narration pronounced it with the short A sound. The same thing can be done with the word detail. I typed “deetail” and it pronounced it the way I wanted it to. You may find some other ways to “trick” the narration to correctly pronounce certain words.
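If you build up a list of problem words, you can even apply the respelling trick automatically before pasting your script into the tool. Here is a minimal sketch in Python; the `RESPELLINGS` table and the `respell` helper are illustrative examples, not part of Captivate:

```python
# Map words the TTS engine mispronounces to phonetic respellings.
# These two entries match the workarounds described above.
RESPELLINGS = {
    "status": "staatus",   # forces the short "a" sound
    "detail": "deetail",   # forces the long "e" sound
}

def respell(text):
    """Replace known problem words with phonetic respellings
    before pasting the script into the text-to-speech tool."""
    words = []
    for word in text.split():
        # Strip trailing punctuation so "status." still matches.
        stripped = word.rstrip(".,;:!?")
        trailing = word[len(stripped):]
        replacement = RESPELLINGS.get(stripped.lower(), stripped)
        # Preserve a leading capital letter.
        if stripped and stripped[0].isupper():
            replacement = replacement.capitalize()
        words.append(replacement + trailing)
    return " ".join(words)

print(respell("Check the status field for more detail."))
# -> Check the staatus field for more deetail.
```

You would still proofread the on-screen text separately, since these respellings should only ever reach the narration engine, never the learner.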

Is Text-to-Speech for Everyone?

As always, before selecting media and technology for eLearning, you need to consider your audience. You need to match the technology to your instructional goals, not the other way around. Ask yourself:

  • Is audio necessary?
  • What are your budget and time constraints?
  • Will your audience look past the occasional robotic pronunciation?

It might be better to use professional narration if you have the resources. However, text-to-speech is a great alternative when you have a short time frame for your project and do not have the budget to hire voice talent.

What is your experience with using this type of technology? Do you have any additional suggestions for using text-to-speech in your eLearning courses?


  1. I personally do not like listening to electronic voices; they are distracting to me and I get less out of the presentation/training. This prejudices me when I produce a module that requires voice narration, so I do end up doing the narration myself. It is cheaper and maybe easier than hiring a pro, and some people say I have a soothing voice...

  2. Thanks for the insight. I use Captivate, but have never used this feature because I didn't think I would like it. Maybe I will try this out. I don't think I have much of an accent, but when a stranger says "What part of the South are you from?" I realize: Yep, I do.

    When I was at Coca-Cola Enterprises, we used co-workers' voices to record a story-line. The CFO introduced the course and the course owner did the narration. The co-workers' voices probably sounded a little hokey, but the topic was very boring and the dialogue did increase interest. I got positive responses from many participants. The course owner who did the narration had a great voice, so that was not a problem. However, it was a major problem when he left the company and our visuals identified him as the speaker. Oh well, live and learn... We changed the visuals, removed any reference in the audio to who he was, and left the narration intact.

  3. I use text-to-speech tools to get timings for courses and videos. Generate the audio files, use them in the alpha and beta versions, then replace them with professional narration at the end. When the clients hear the text read, they are much better at making good edits to make the audio and text cleaner.

  4. Software-based transcription cannot be accurate for non-American accents. Use manual transcription for accuracy.


Thank you for your comments.