Media Overlays: how to highlight multiple <text> for the same <audio> clip


Submitted by alberto.pettarin on March 25, 2013 - 9:00am

Hi all,

I am trying to create a SMIL file for an EPUB 3 ebook, and I want to do the following: highlight *two* text fragments when a certain clip is played.

Example: suppose the XHTML page has (using [ ] instead of < and >):
[body]
[p id="p1"]...[/p]
[p id="p2"]...[/p]
[p id="p3"]...[/p]
[p id="p4"]...[/p]
[/body]

and I want to highlight p1 and p4 between 0:00:00 and 0:00:05 of the associated audio file, while I want to highlight p2 between 0:00:05 and 0:00:10, and p3 between 0:00:10 and 0:00:20.

If I understand the specs (http://www.idpf.org/epub/30/spec/epub30-mediaoverlays.html) correctly, this seems impossible to achieve, because <par> must have exactly one <text> child (and at most one <audio>). Am I missing something?

Thanks in advance,

Alberto

Submitted by matt.garrish on March 25, 2013 - 9:45am

No, that sort of mixed highlighting can't be done. Media overlays only facilitate synchronization of one text fragment with its corresponding audio (whether a clip referenced in the audio tag or TTS playback when absent), which allows for a linear progression through a work.

The kind of model you're suggesting has an implicit assumption that both (or more) text fragments are visible in the viewport at the same time, which couldn't be relied on even in a fixed layout presentation. Media overlays are only a general synchronization mechanism, so it's not like the reading system will know when the audio is transitioning to the next text reference to try and facilitate a model like this for readers who can't keep up with the moving targets. That's why you have to include par elements to enable the transitioning.

You can highlight multiple consecutive paragraphs by wrapping them in a div and synchronizing with it, but not random text locations.
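As a rough illustration of the wrapping approach, here is a minimal sketch. The file names page.xhtml, page.smil and audio.mp3 and the id "d23" are hypothetical, and the clip range simply spans the p2 and p3 times from the original question.

```xml
<!-- XHTML content (hypothetical file page.xhtml): p2 and p3 share one wrapper -->
<div id="d23">
  <p id="p2">...</p>
  <p id="p3">...</p>
</div>

<!-- Media overlay (hypothetical file page.smil): a single par targets the wrapper -->
<smil xmlns="http://www.w3.org/ns/SMIL" version="3.0">
  <body>
    <par id="par1">
      <!-- exactly one text child, pointing at the div rather than at a paragraph -->
      <text src="page.xhtml#d23"/>
      <audio src="audio.mp3" clipBegin="0:00:05.000" clipEnd="0:00:20.000"/>
    </par>
  </body>
</smil>
```

Both paragraphs would then be highlighted together for the whole clip, which only works because they are adjacent in the markup.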

Not the answer you wanted, I expect, but that's the way things are right now.

Submitted by alberto.pettarin on March 25, 2013 - 10:08am

Hi Matt,

thank you for confirming my doubts.

I understand the possible problems with having multiple <text> elements as children of a <par>, including the fact that they might overflow the actual viewport, but this cannot be prevented even with just one <text> child of <par> anyway.

Unfortunately, for typographical and ergonomic reasons, making "p1" and "p4" adjacent siblings (and then wrapping them in a [div]) is not desirable, so I guess I will rethink the structure of this (was-so-awesome) project. :)

Submitted by matt.garrish on March 25, 2013 - 10:37am

> but this cannot be prevented even with just one <text> child of <par> anyway

Right, no disagreement here. This is a problem with media overlays generally in a paginated view, but at least the person reading along is aware when the text transitions out of view. With random jumps, the accessibility factor that overlays provide goes down.

Why can't you just separately synchronize the first and fourth paragraphs as the text they contain is narrated? That's the simplest solution if you do indeed need non-linear reading. It's not two-at-once, but ultimately achieves the same purpose, no?
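A minimal sketch of what separate synchronization could look like, assuming (as this suggestion does) that the first five seconds of audio voice both p1 and p4. The split point at 2.5 seconds and the file names page.xhtml and audio.mp3 are made up for illustration.

```xml
<smil xmlns="http://www.w3.org/ns/SMIL" version="3.0">
  <body>
    <!-- p1 is highlighted while its own part of the clip plays -->
    <par id="par1">
      <text src="page.xhtml#p1"/>
      <audio src="audio.mp3" clipBegin="0:00:00.000" clipEnd="0:00:02.500"/>
    </par>
    <!-- p4 follows in playback order, even though it comes later in the markup -->
    <par id="par2">
      <text src="page.xhtml#p4"/>
      <audio src="audio.mp3" clipBegin="0:00:02.500" clipEnd="0:00:05.000"/>
    </par>
    <par id="par3">
      <text src="page.xhtml#p2"/>
      <audio src="audio.mp3" clipBegin="0:00:05.000" clipEnd="0:00:10.000"/>
    </par>
    <par id="par4">
      <text src="page.xhtml#p3"/>
      <audio src="audio.mp3" clipBegin="0:00:10.000" clipEnd="0:00:20.000"/>
    </par>
  </body>
</smil>
```

Each par still has exactly one text child, so the highlight moves from p1 to p4 rather than covering both at once.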

Submitted by alberto.pettarin on March 25, 2013 - 2:15pm

Not sure about your last suggestion.

What I was trying to achieve is the following: synchronizing a given audio track (say, in English) with both the text narrated (hence, in English) and a translation (say, in Italian).
From a typographical point of view, I wanted to have all the English text before the Italian text, which would make the <div> solution impossible.

However, as Daniel Weck also suggested on Twitter, I am playing with <div>'s and <span>'s and CSS to see if I can achieve a sufficiently satisfying appearance of the text with a structure like this:
<body>
<p id="f001">
<span class="orig" lang="en">bla bla bla</span>
<span class="tran" lang="it">ble ble ble</span>
</p>
<p id="f002">
<span class="orig" lang="en">bla bla bla</span>
<span class="tran" lang="it">ble ble ble</span>
</p>
...
</body>
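
For a structure like this, the overlay might look like the following sketch. The file names page.xhtml and audio.mp3 and the clip times are hypothetical. Each par targets the whole paragraph, so the English span and its Italian translation are highlighted together while the English audio plays.

```xml
<smil xmlns="http://www.w3.org/ns/SMIL" version="3.0">
  <body>
    <!-- f001 contains both the "orig" and the "tran" span -->
    <par id="par001">
      <text src="page.xhtml#f001"/>
      <audio src="audio.mp3" clipBegin="0:00:00.000" clipEnd="0:00:05.000"/>
    </par>
    <par id="par002">
      <text src="page.xhtml#f002"/>
      <audio src="audio.mp3" clipBegin="0:00:05.000" clipEnd="0:00:10.000"/>
    </par>
  </body>
</smil>
```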

Thanks for the feedback.

Submitted by matt.garrish on March 25, 2013 - 2:47pm

Ah, okay, that makes more sense now. I was struggling to understand why there were paragraphs between the content you were highlighting, and running on the assumption that the audio track contained voicing for both paragraphs (e.g., perhaps to prevent flickering movement because of the short duration).

Best of luck with it.

Submitted by alberto.pettarin on April 1, 2013 - 11:50am

Yes, my bad, I should have stated the context of my question. I think I found a decent tradeoff by using <div>'s in display: table mode. It does not achieve the desired "first EN text, then IT text" style, but it makes MO work in iBooks, Readium, and Kobo.

Submitted by KangMaiKe on April 13, 2013 - 11:20pm

I don't understand the solution explained above.
I work for a foreign-language learning publisher, and we want to convert all our books to ebooks with embedded sound files instead of paper+MP3 disk. Every example sentence in our books is in two columns such as:
Chinese - English (4 sound files for American/British)
2 American files for phonemic (dictionary pronunciation) and phonetic (fluent speech) rendering, and 2 for British (same)
or
English - Chinese (4 sound files for Beijing / Taiwan both for phonemic and phonetic)
or Chinese - Japanese (etc)
One page may contain as many as 20 sample sentences (short files of 3-5 seconds), depending on the notes of presenting sentence types.
We are also proposing to management to add sound files for the source language, if it doesn't significantly increase production time, which would further increase the number of sound files per page. Should we combine all files together and apply them to one block of text? I don't think our users want to listen to both American and British examples at the same time, but we want to provide flexibility of use for our users: they can choose whatever they want to learn.
Our Italian friend above, Alberto Pettarin, seems to be trying to accomplish a similar layout. Please, is this solution feasible, or are the sound files limited to running paragraph text only? Why are solutions limited to only one kind of book publication, and when can we expect a solution?

Submitted by matt.garrish on April 14, 2013 - 8:20am

There is no limitation on what you can synchronize to. You could synchronize to every word, groups of words, sentences, paragraphs, list items, table cells, or whatever else makes sense so long as they're tagged and can be referenced (at least until EPUB CFI support materializes).

The restriction Alberto ran into was wanting to highlight two separate content elements at the same time with a single audio clip (i.e., highlighting both but only reading one).
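
To illustrate synchronizing to something other than running paragraphs, here is a sketch that targets table cells in a two-column sentence layout. All ids, file names and clip times (lesson.xhtml, s01-zh.mp3, s01-en.mp3, and so on) are hypothetical, and it only shows one clip per cell played in sequence.

```xml
<!-- XHTML content (hypothetical file lesson.xhtml): one row per sample sentence -->
<table>
  <tr>
    <td id="s01-zh" lang="zh">...</td>
    <td id="s01-en" lang="en">...</td>
  </tr>
</table>

<!-- Media overlay: each cell gets its own par and its own short audio file -->
<smil xmlns="http://www.w3.org/ns/SMIL" version="3.0">
  <body>
    <par id="par-s01-zh">
      <text src="lesson.xhtml#s01-zh"/>
      <audio src="s01-zh.mp3" clipBegin="0:00:00.000" clipEnd="0:00:04.000"/>
    </par>
    <par id="par-s01-en">
      <text src="lesson.xhtml#s01-en"/>
      <audio src="s01-en.mp3" clipBegin="0:00:00.000" clipEnd="0:00:04.000"/>
    </par>
  </body>
</smil>
```

Any element works as a target as long as it carries an id that the text element can reference.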

Submitted by alberto.pettarin on April 17, 2013 - 5:37am

@KangMaiKe: Matt gave you a concise and to-the-point answer. I just want to remark that the limitation I ran into was due to the specific typographical structure I (my client) wanted. If you need further information about the projects my startup is working on, feel free to write me an email.
