ePub audio capabilities



I am studying the ePub format for accessibility applications. I am familiar with DAISY. DAISY allows the audio descriptions of texts to be bundled together in a digital book distribution, either from human recordings or text-to-speech.

Such audio is synchronized with the text, so it allows text highlighting during playback.

Is such a feature available in ePub? I have read that it is, but I cannot find much information. Also, is there any ePub publishing tool that allows such audio production or integration with the book?

Thank you,


The Media Overlays feature in EPUB 3 supports this (http://idpf.org/epub/30/spec/epub30-mediaoverlays.html). Reading Systems are in a transition from EPUB 2 to EPUB 3, so at the moment support is only in Apple iBooks, but wider support should be available soon.
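
To give a feel for what a Media Overlay document contains, here is a minimal Python sketch that builds one. It assumes the basic structure from the specification linked above: a `smil` root with a `body` of `par` elements, each pairing a `text` reference with an `audio` clip. The file names, fragment ids, and clip times are made up for illustration, and `build_overlay` is a hypothetical helper, not part of any existing tool.

```python
import xml.etree.ElementTree as ET

SMIL_NS = "http://www.w3.org/ns/SMIL"


def build_overlay(text_doc, audio_file, clips):
    """Return a Media Overlay document as a string.

    clips is a list of (fragment_id, clip_begin, clip_end) tuples, one per
    synchronized text fragment (hypothetical ids and times).
    """
    # Serialize the SMIL namespace as the default namespace.
    ET.register_namespace("", SMIL_NS)
    smil = ET.Element(f"{{{SMIL_NS}}}smil", {"version": "3.0"})
    body = ET.SubElement(smil, f"{{{SMIL_NS}}}body")
    for i, (fragment, begin, end) in enumerate(clips, start=1):
        # Each par ties one text fragment to one audio clip.
        par = ET.SubElement(body, f"{{{SMIL_NS}}}par", {"id": f"par{i}"})
        ET.SubElement(par, f"{{{SMIL_NS}}}text",
                      {"src": f"{text_doc}#{fragment}"})
        ET.SubElement(par, f"{{{SMIL_NS}}}audio",
                      {"src": audio_file, "clipBegin": begin, "clipEnd": end})
    return ET.tostring(smil, encoding="unicode")


overlay = build_overlay(
    "chapter1.xhtml",
    "audio/chapter1.mp3",
    [("s1", "0:00:00.000", "0:00:04.250"),
     ("s2", "0:00:04.250", "0:00:09.100")],
)
print(overlay)
```

A reading system that supports Media Overlays would play each audio clip while highlighting the text fragment it is paired with. A real publication also has to declare the overlay in the package file, which this sketch leaves out.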

Tool support for Media Overlays is not available yet, but DAISY has an audio production tool (tobi) that is anticipated to be upgraded to support authoring of EPUB 3 content with Media Overlays, and other commercial tools should become available as this feature is not just for accessibility.

About DAISY tools, note also that the DAISY Pipeline - http://www.daisy.org/pipeline2 - already allows conversion of full-text full-audio DAISY 2.02 books to EPUB 3 with Media Overlays.
DAISY 3 to EPUB 3 as well as DAISY XML to EPUB 3 with Media Overlays (using TTS) is in the works.

I would only add to Bill's post that the Azardi reader has alpha support for Overlays in their online reader (but they note a problem they need to fix with the Moby Dick sample, which uses a single audio file for multiple chapters, and I couldn't get it to play in the desktop version).

And to expand in a different direction, EPUB 3 is the successor format to DAISY 3. There's nothing in DAISY 3 that can't also be done in EPUB 3 that I'm aware of (and often in more feature-rich ways).

Again, it is early in terms of reading system support, but EPUB 3 also includes enhanced TTS functionality, such as the ability to include PLS lexicons, embed SSML markup and take advantage of the CSS3 Speech module, so hopefully there will be less need to pre-record synthetic speech moving forward (which I know is common in the DAISY community, especially for back matter like indexes).
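
As a rough illustration of the PLS part, the Python sketch below writes a small pronunciation lexicon. It assumes the W3C PLS 1.0 structure (a `lexicon` root containing `lexeme` entries, each with a `grapheme` and a `phoneme`). The word and its IPA transcription are only illustrative, and `build_lexicon` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

PLS_NS = "http://www.w3.org/2005/01/pronunciation-lexicon"
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def build_lexicon(entries, lang="en", alphabet="ipa"):
    """Return a PLS lexicon as a string.

    entries is a list of (grapheme, phoneme) pairs; the phoneme strings are
    expected to be written in the given alphabet.
    """
    # Serialize the PLS namespace as the default namespace.
    ET.register_namespace("", PLS_NS)
    lexicon = ET.Element(
        f"{{{PLS_NS}}}lexicon",
        {"version": "1.0", "alphabet": alphabet, XML_LANG: lang},
    )
    for grapheme, phoneme in entries:
        lexeme = ET.SubElement(lexicon, f"{{{PLS_NS}}}lexeme")
        ET.SubElement(lexeme, f"{{{PLS_NS}}}grapheme").text = grapheme
        ET.SubElement(lexeme, f"{{{PLS_NS}}}phoneme").text = phoneme
    return ET.tostring(lexicon, encoding="unicode")


# Illustrative entry only; check transcriptions against your TTS engine.
print(build_lexicon([("EPUB", "ˈiːpʌb")]))
```

Whether a given reading system actually applies the lexicon is a separate matter, as noted above. This only shows what the file you would bundle looks like.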

Even if a reading system only supports basic text-to-speech playback, it will typically provide word-level text-audio synchronization and highlighting. Obviously that's not quite what you're asking for, but it is one way to provide quality synthetic playback without having to pre-record it, and support for it will hopefully grow.

The ASTRI-Bee player doesn't provide enhanced TTS support, but it does include TTS playback of EPUB 3 publications with automatic word-level highlighting if you want to get a feel for what's possible. It's a free download from the Android store, and there was also an announcement in the public EPUB working group list with links to download.

Thank you all, these were very helpful.

Please allow me two more questions:

1. Is there a way to convert ePUB 3 to DAISY 3?
2. Since ePUB 3 is the successor of DAISY 3, is it possible to open a DAISY 3 document from an ePUB reader?

Thank you,

I don't know that anyone has implemented or considered a down-conversion back to DAISY 3/2.02 yet. Romain could answer whether that's on his radar or not for the pipeline tool. If the EPUB 3 content that publishers produce meets accessibility guidelines, there could be legal reasons why you shouldn't be doing this. We're typically working under exceptions that allow us to make inaccessible content accessible, and the lack of a reading system may not be justification (I'm no lawyer, though, and the laws and exemptions vary from region to region).

Also, if we're encouraging publishers to adopt EPUB 3 to get accessible content at the same time as everyone else gets access, and to increase their sales by moving away from the library model, from a simple fair-play perspective we shouldn't be repurposing the content to make one accessible electronic format out of another.

But I don't really know exactly what the new production and client models will ultimately look like, and this discussion could take on a life of its own. You may be able to create/distribute EPUB 3s for print books only so long as the publisher has not provided their own moving forward (this is often the case now for audio books). Clients may have to go through the regular public library system or accessible libraries may have to pay equivalent costs for loaning. Or maybe nothing changes. I'm waiting to see how these issues evolve as DAISY members begin to tackle them. My thoughts and musings here are just that.

For your second question, a dedicated EPUB 3 reading system is not going to open a DAISY 3 book, at least not without some fiddling and ugly results. EPUB 3 is a successor to two formats, and the more logical forward/backward compatibility path is with EPUB 2, and even that's not going to be perfect.

Although DAISY 3 includes a package file and NCX, for example, the DTBook format is no longer supported in EPUB 3. DTBs also don't use the OCF container format, so the mimetype file and the META-INF directory with its container.xml, which are needed for discovery of the package file and publication, will be missing. If you added these bits, you might be able to convince an EPUB 3 reading system to attempt to open the container, but the results would not be what you want. It's only a guess what would happen if DTBook content is rendered as EPUB 3, but an educated guess would be semi-styled text display (where the grammar matches HTML). The SMIL synchronization has also been replaced with media overlays, so any embedded audio wouldn't run. There'd be no table of contents, either, since NCX is optional and only for compatibility with EPUB 2 systems.
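
For anyone curious what those missing container bits look like, here is a minimal Python sketch. It assumes the usual OCF layout: a `mimetype` file stored first and uncompressed, plus `META-INF/container.xml` pointing at the package file. The package path `EPUB/package.opf` and both function names are hypothetical. It only deals with the container; it does nothing to make DTBook content renderable.

```python
import io
import zipfile

CONTAINER_XML = """<?xml version="1.0" encoding="UTF-8"?>
<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
  <rootfiles>
    <rootfile full-path="{opf}" media-type="application/oebps-package+xml"/>
  </rootfiles>
</container>
"""


def write_ocf_skeleton(fileobj, opf_path="EPUB/package.opf"):
    """Write only the OCF discovery files into a new zip archive."""
    with zipfile.ZipFile(fileobj, "w") as zf:
        # The mimetype entry goes first and is stored without compression.
        zf.writestr(zipfile.ZipInfo("mimetype"), "application/epub+zip",
                    compress_type=zipfile.ZIP_STORED)
        zf.writestr("META-INF/container.xml",
                    CONTAINER_XML.format(opf=opf_path))


def missing_ocf_parts(fileobj):
    """List the OCF discovery files that a zip archive lacks."""
    missing = []
    with zipfile.ZipFile(fileobj) as zf:
        names = zf.namelist()
        # mimetype must be the first entry and carry the EPUB media type.
        if (not names or names[0] != "mimetype"
                or zf.read("mimetype") != b"application/epub+zip"):
            missing.append("mimetype")
        if "META-INF/container.xml" not in names:
            missing.append("META-INF/container.xml")
    return missing


skeleton = io.BytesIO()
write_ocf_skeleton(skeleton)
skeleton.seek(0)
print(missing_ocf_parts(skeleton))
```

Running `missing_ocf_parts` on a zipped DTB would report both parts as missing, which is the discovery problem described above.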

In other words, you're better off providing both formats if you expect an overlap period where readers will be requesting one and/or the other.

Hello Apostolos,
we are actively working on the upcoming release of Tobi (free, open-source), which will support importing EPUB 3 publications (text-only, or with existing synchronised text-audio narration) and export to the EPUB 3 Media Overlay format. Tobi will continue to support the DAISY talking books format, but as Romain said, the Pipeline software is the tool of choice to convert from one format to another in an automated manner. Tobi is an interactive authoring tool geared towards live-recording of human voice, and/or synchronisation between pre-recorded audio and text markup (post-production stage).

More information here:


Regards, Daniel
