Working Group Draft 15 February 2011
Copyright © 2010, 2011 International Digital Publishing Forum™
All rights reserved. This work is protected under Title 17 of the United States Code. Reproduction and dissemination of this work with changes is prohibited except with the written permission of the International Digital Publishing Forum (IDPF).
EPUB is a registered trademark of the International Digital Publishing Forum.
This specification, EPUB Media Overlays 3.0, defines a usage of [SMIL] (Synchronized Multimedia Integration Language), the Package Document, the EPUB® Style Sheet, and the EPUB Content Document for representation of synchronized text and audio publications.
Although designed to be usable in a variety of contexts, this specification does declare certain normative requirements specific to EPUB Publications. These requirements are marked as EPUB-specific where they are defined and may be disregarded by applications that only conform to the content and EPUB Reading System requirements of this specification. Illustrative examples, however, make no such distinction of use and may not be applicable in contexts not defined herein.
This specification is one of a family of related specifications that compose EPUB 3, the third major revision of an interchange and delivery format for digital publications based on XML and Web Standards. It is meant to be read and understood in concert with the other specifications that make up EPUB 3:
The EPUB 3 Overview [EPUB3Overview], which should be read first, provides an informative overview of EPUB and a roadmap to the rest of the EPUB 3 documents.
EPUB Publications 3.0 [Publications30], which defines publication-level semantics and overarching conformance requirements for EPUB Publications.
EPUB Content Documents 3.0 [ContentDocs30], which defines profiles of XHTML, SVG and CSS for use in the context of EPUB Publications.
EPUB Open Container Format (OCF) 3.0 [OCF3], which defines a file format and processing model for encapsulating a set of related resources into a single-file (ZIP) EPUB Container.
This specification relies on a subset of [SMIL], from which the EPUB Media Overlays elements and attributes defined in Summary of Media Overlay Elements and Attributes are derived.
EPUB Publication: A logical document entity consisting of a set of interrelated resources and packaged in an EPUB Container, as defined by this specification and its sibling specifications.
Publication Resource: A resource that contains content or instructions that contribute, directly or indirectly, to the logic and rendering of the EPUB Publication (e.g., the Package Document, EPUB Content Documents, EPUB Style Sheets, audio, video, images, embedded fonts, scripts). In the absence of this resource, the Publication cannot be rendered as intended by the Author.
Publication Resources are listed in the manifest [Publications30].
EPUB Content Document: A Publication Resource that conforms to one of the EPUB Content Document definitions (XHTML or SVG).
An EPUB Content Document is a Core Media Type, and may therefore be included in the EPUB Publication without the provision of fallbacks [Publications30].
XHTML Content Document: An EPUB Content Document conforming to the profile of [HTML5] defined in XHTML Content Documents [ContentDocs30].
XHTML Content Documents use the XHTML syntax of [HTML5].
SVG Content Document: An EPUB Content Document conforming to the constraints expressed in SVG Content Documents [ContentDocs30].
EPUB Navigation Document: A specialization of the XHTML Content Document, containing human- and machine-readable global navigation information, conforming to the constraints expressed in EPUB Navigation Documents [ContentDocs30].
Core Media Types: A set of Publication Resource types for which no fallback is required. Refer to Core Media Types [Publications30] for more information.
Package Document: A Publication Resource carrying bibliographical and structural metadata about the EPUB Publication, as defined in Package Documents [Publications30].
Manifest: A list of all Publication Resources that constitute the EPUB Publication.
Refer to manifest [Publications30] for more information.
Spine: An ordered list of Publication Resources, typically EPUB Content Documents, representing the default reading order of the publication.
Refer to spine [Publications30] for more information.
Media Overlay Document: An XML document that associates the text content of an XHTML Content Document with pre-recorded audio narration in order to provide a synchronized playback experience, as defined in this specification.
EPUB Style Sheet: A CSS Style Sheet conforming to the CSS profile defined in EPUB Style Sheets [ContentDocs30].
Viewport: The region of an EPUB Reading System in which the content of an EPUB Publication is rendered visually to a User.
CSS Viewport: A Viewport capable of displaying CSS-styled content.
EPUB Container: A ZIP-based packaging and distribution format for an EPUB Publication, as defined in [OCF3].
Author: The person(s) or organization responsible for the creation of an EPUB Publication, which may or may not be the creator of the content and resources it contains.
User: An individual that consumes an EPUB Publication using an EPUB Reading System.
EPUB Reading System: A system that processes EPUB Publications for presentation to Users in a manner conformant with this specification and its sibling specifications.
The keywords "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in [RFC2119].
All sections of this specification are normative except where identified by the informative status label "This section is informative". The application of informative status to sections and appendices applies to all child content and subsections they may contain.
All examples in this specification are informative.
This section is informative
Books featuring audio narration synchronized with the text are found in mainstream e-books, educational tools, and e-books formatted for persons with print disabilities. In EPUB 3, these types of books are created by using Media Overlay Documents to describe the timing for the pre-recorded audio narration and how it relates to the text markup. The file format for Media Overlays is defined as a subset of SMIL, a W3C standard for representing synchronized multimedia information in XML.
The Media Overlays feature is designed to be transparent to EPUB Reading Systems that do not support the feature, so the inclusion of Media Overlay Documents has no impact on the interoperability of EPUB Publications.
Although future versions of this specification may incorporate support for video media (e.g., synchronized text/sign-language books), this version supports only text and audio media.
A pre-recorded narration of a publication can be represented as a series of audio clips, each corresponding to part of the text. A single audio clip, for example, typically represents a single phrase or paragraph, but implies no order relative to the other clips or to the text of a document. Media Overlays solve this synchronization problem by tying the structured audio narration to its corresponding text in the EPUB Content Document using SMIL markup. Media Overlays are, in fact, a simplified subset of SMIL 3.0 that allows the playback sequence of these clips to be defined.
The SMIL elements primarily used for structuring Media Overlays are body (used for the main sequence), seq (sequence) and par (parallel). (Refer to Summary of Media Overlay Elements and Attributes for more information on these and other SMIL elements.)
The par element is the basic building block of an overlay and corresponds to a phrase in the EPUB Content Document. The element provides two key pieces of information for synchronizing content: 1) the audio clip containing the narration for the phrase; and 2) a pointer to the associated text. The par element uses two media element children to represent this information: a text element and an audio element. Since par elements render their children in parallel, the audio and text media are played back at the same time, resulting in a synchronized presentation.
The src attribute of the text element references the associated phrase, sentence, or other segment of the EPUB Content Document by its IRI. The src attribute of the audio element similarly references the location of the corresponding audio clip, and the optional clipBegin and clipEnd attributes indicate the start and end offsets of the narrated segment within that clip.
The following example shows the markup for a single phrase or sentence.
<par>
  <text src="chapter1.xhtml#sentence1"/>
  <audio src="chapter1_audio.mp3" clipBegin="23s" clipEnd="45s"/>
</par>
par elements are placed together sequentially to form a series of phrases or sentences. The ordering of the elements must match the default reading order of the EPUB Content Document.
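As an illustration of this ordering rule only, the following Python sketch checks that the elements targeted by a run of par elements appear in strictly increasing document order. It assumes the caller has already extracted the fragment identifier from each text element's src attribute and has listed the Content Document's element IDs in document order. The function name pars_follow_reading_order is hypothetical.

```python
def pars_follow_reading_order(par_fragments, document_order):
    """Return True if par targets follow the Content Document's reading order.

    par_fragments: fragment IDs taken from each par's text src, in overlay order.
    document_order: element IDs as they occur in the EPUB Content Document.
    A fragment that does not occur in document_order raises KeyError.
    """
    position = {element_id: index for index, element_id in enumerate(document_order)}
    indices = [position[fragment] for fragment in par_fragments]
    # Each par must point further into the document than the one before it.
    return all(earlier < later for earlier, later in zip(indices, indices[1:]))


doc_ids = ["sentence1", "sentence2", "sentence3"]
print(pars_follow_reading_order(["sentence1", "sentence2", "sentence3"], doc_ids))  # True
print(pars_follow_reading_order(["sentence2", "sentence1"], doc_ids))               # False
```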
The following example shows a basic Media Overlay Document containing a sequence of phrases. The body element acts as a main sequence for the whole document.
<smil xmlns="http://www.w3.org/ns/SMIL" version="3.0" profile="http://www.idpf.org/epub/30/profile/content/">
  <body>
    <par id="par1">
      <text src="chapter1.xhtml#sentence1"/>
      <audio src="chapter1_audio.mp3" clipBegin="0s" clipEnd="10s"/>
    </par>
    <par id="par2">
      <text src="chapter1.xhtml#sentence2"/>
      <audio src="chapter1_audio.mp3" clipBegin="10s" clipEnd="20s"/>
    </par>
    <par id="par3">
      <text src="chapter1.xhtml#sentence3"/>
      <audio src="chapter1_audio.mp3" clipBegin="20s" clipEnd="30s"/>
    </par>
  </body>
</smil>
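A flat overlay of this shape can also be generated programmatically. The following Python sketch uses only the standard library's xml.etree.ElementTree module to build one from a list of (fragment ID, clipBegin, clipEnd) tuples given in reading order. The function name build_overlay and its parameters are hypothetical, and ET.register_namespace changes serialization state for the whole module.

```python
import xml.etree.ElementTree as ET

SMIL_NS = "http://www.w3.org/ns/SMIL"
CONTENT_PROFILE = "http://www.idpf.org/epub/30/profile/content/"


def build_overlay(content_doc, audio_file, segments):
    """Return a flat Media Overlay Document as a string.

    segments: iterable of (fragment_id, clip_begin, clip_end) tuples, in the
    default reading order of the EPUB Content Document.
    """
    # Serialize SMIL elements with a default xmlns instead of an ns0: prefix.
    ET.register_namespace("", SMIL_NS)
    smil = ET.Element(f"{{{SMIL_NS}}}smil", {"version": "3.0", "profile": CONTENT_PROFILE})
    body = ET.SubElement(smil, f"{{{SMIL_NS}}}body")
    for number, (fragment, begin, end) in enumerate(segments, start=1):
        par = ET.SubElement(body, f"{{{SMIL_NS}}}par", {"id": f"par{number}"})
        ET.SubElement(par, f"{{{SMIL_NS}}}text", {"src": f"{content_doc}#{fragment}"})
        ET.SubElement(par, f"{{{SMIL_NS}}}audio",
                      {"src": audio_file, "clipBegin": begin, "clipEnd": end})
    return ET.tostring(smil, encoding="unicode")


print(build_overlay("chapter1.xhtml", "chapter1_audio.mp3",
                    [("sentence1", "0s", "10s"), ("sentence2", "10s", "20s")]))
```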
par elements can also be added to seq elements to define more complex structures such as parts and chapters (see Structure).
The seq element (sequence) is used to represent nested text containers such as sections, asides, headers, and footnotes in the Media Overlay Document. Its children must be other seq elements or par elements. Each seq element must carry an epub:textref attribute, which references the corresponding EPUB Content Document element by IRI.
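These nesting rules lend themselves to a mechanical check. The following Python sketch walks a parsed overlay, starting from its body element, and reports any seq that lacks an epub:textref attribute or has no children, along with any child that is neither seq nor par. It is an informal helper, not a substitute for validation against the Media Overlays schema. The function name structure_problems is hypothetical. The epub namespace URI is taken from the examples in this document.

```python
import xml.etree.ElementTree as ET

SMIL = "{http://www.w3.org/ns/SMIL}"
EPUB_TEXTREF = "{http://www.idpf.org/2011/epub}textref"


def structure_problems(container, problems=None):
    """Collect seq/par nesting violations below `container` (normally body)."""
    if problems is None:
        problems = []
    for child in container:
        if child.tag == SMIL + "seq":
            if child.get(EPUB_TEXTREF) is None:
                problems.append(f"seq {child.get('id')} has no epub:textref")
            if len(child) == 0:
                problems.append(f"seq {child.get('id')} has no children")
            structure_problems(child, problems)  # seq may nest further seq/par
        elif child.tag != SMIL + "par":
            problems.append(f"unexpected {child.tag} inside {container.tag}")
    return problems


doc = ET.fromstring(
    '<smil xmlns="http://www.w3.org/ns/SMIL" xmlns:epub="http://www.idpf.org/2011/epub" version="3.0">'
    '<body><seq id="s1" epub:textref="c.xhtml#a"><par id="p1"><text src="c.xhtml#b"/></par></seq>'
    '<seq id="s2"><par id="p2"><text src="c.xhtml#c"/></par></seq></body></smil>')
print(structure_problems(doc.find(SMIL + "body")))  # ['seq s2 has no epub:textref']
```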
The following example shows a Media Overlay Document with nested seq elements, representing a chapter with both a section header and a sidebar, which itself has a nested image group.
<smil xmlns="http://www.w3.org/ns/SMIL" xmlns:epub="http://www.idpf.org/2011/epub" version="3.0" profile="http://www.idpf.org/epub/30/profile/content/">
  <body>
    <!-- a chapter -->
    <seq id="id1" epub:textref="chapter1.xhtml#sectionstart" epub:type="chapter">
      <!-- the section title -->
      <par id="id2">
        <text src="chapter1.xhtml#section1_title"/>
        <audio src="chapter1_audio.mp3" clipBegin="0:23:23.84" clipEnd="0:23:34.221"/>
      </par>
      <!-- some sentences in the chapter -->
      <par id="id3">
        <text src="chapter1.xhtml#text1"/>
        <audio src="chapter1_audio.mp3" clipBegin="0:23:34.221" clipEnd="0:23:59.003"/>
      </par>
      <par id="id4">
        <text src="chapter1.xhtml#text2"/>
        <audio src="chapter1_audio.mp3" clipBegin="0:23:59.003" clipEnd="0:24:15.000"/>
      </par>
      <!-- an informational sidebar -->
      <seq id="id5" epub:textref="chapter1.xhtml#sidebar" epub:type="sidebar">
        <par id="id6">
          <text src="chapter1.xhtml#sidebartitle"/>
          <audio src="chapter1_audio.mp3" clipBegin="0:24:15.000" clipEnd="0:24:18.123"/>
        </par>
        <!-- an image group within the sidebar -->
        <seq id="id7" epub:textref="chapter1.xhtml#imagegroup">
          <par id="id8">
            <text src="chapter1.xhtml#photo"/>
            <audio src="chapter1_audio.mp3" clipBegin="0:24:18.123" clipEnd="0:24:28.764"/>
          </par>
          <par id="id9">
            <text src="chapter1.xhtml#photocaption"/>
            <audio src="chapter1_audio.mp3" clipBegin="0:24:28.764" clipEnd="0:24:50.010"/>
          </par>
        </seq>
        <!-- some sentences in the sidebar -->
        <par id="id10">
          <text src="chapter1.xhtml#sidebartext3"/>
          <audio src="chapter1_audio.mp3" clipBegin="0:24:50.010" clipEnd="0:25:28.530"/>
        </par>
        <par id="id11">
          <text src="chapter1.xhtml#sidebartext4"/>
          <audio src="chapter1_audio.mp3" clipBegin="0:25:28.530" clipEnd="0:25:45.515"/>
        </par>
      </seq>
      <!-- more sentences in the chapter (outside the sidebar) -->
      <par id="id12">
        <text src="chapter1.xhtml#text3"/>
        <audio src="chapter1_audio.mp3" clipBegin="0:25:45.515" clipEnd="0:26:30.203"/>
      </par>
      <par id="id13">
        <text src="chapter1.xhtml#text4"/>
        <audio src="chapter1_audio.mp3" clipBegin="0:26:30.203" clipEnd="0:27:15.000"/>
      </par>
    </seq>
  </body>
</smil>
Structures such as sidebars, section headers, image groups, tables, and footnotes are grouped in a seq element so that their start and end positions can be identified during playback. Reading Systems can then offer playback options tailored to the layout of the publication, such as jumping past a long sidebar, turning off rendering of page break announcements (see Skippability and Escapability), or customizing the reading mode to suit structures such as tables.
The following example shows the EPUB Content Document that corresponds to the previous Media Overlay example.
<html xmlns="http://www.w3.org/1999/xhtml" xmlns:epub="http://www.idpf.org/2011/epub" profile="http://www.idpf.org/epub/30/profile/content/" xml:lang="en" lang="en">
  <head>
    <title>Media Overlays Example of EPUB Content Document</title>
  </head>
  <body id="sec1">
    <section id="sectionstart" epub:type="chapter">
      <h1 id="section1_title">The Section Title</h1>
      <p id="text1">The first phrase of the main text body.</p>
      <p id="text2">The second phrase of the main text body.</p>
      <aside id="sidebar" epub:type="sidebar">
        <h1 id="sidebartitle">The Sidebar Title</h1>
        <figure id="imagegroup">
          <img id="photo" src="photo.png" alt="a photo for which there is a caption" />
          <figcaption id="photocaption">The photo caption</figcaption>
        </figure>
        <p id="sidebartext3">A phrase in the sidebar.</p>
        <p id="sidebartext4">Another phrase in the sidebar.</p>
      </aside>
      <p id="text3">The third phrase of the main text body.</p>
      <p id="text4">The fourth phrase of the main text body.</p>
    </section>
  </body>
</html>
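The skipping behavior described above (for example, jumping past a sidebar) can be sketched in Python as a traversal that yields par IDs in playback order and omits any seq or par whose epub:type includes a type the user has chosen to skip. This is one possible approach under stated assumptions, not a required Reading System algorithm. The function name playable_pars and the skip_types parameter are hypothetical.

```python
import xml.etree.ElementTree as ET

SMIL = "{http://www.w3.org/ns/SMIL}"
EPUB_TYPE = "{http://www.idpf.org/2011/epub}type"


def playable_pars(container, skip_types=frozenset()):
    """Yield par ids in playback order, skipping structures whose epub:type
    includes any value in skip_types (e.g. {"sidebar"})."""
    for child in container:
        types = set(child.get(EPUB_TYPE, "").split())  # whitespace-separated list
        if types & skip_types:
            continue  # skip the whole structure, including nested seq/par
        if child.tag == SMIL + "seq":
            yield from playable_pars(child, skip_types)
        elif child.tag == SMIL + "par":
            yield child.get("id")


body = ET.fromstring(
    '<body xmlns="http://www.w3.org/ns/SMIL" xmlns:epub="http://www.idpf.org/2011/epub">'
    '<par id="p1"/><seq id="s1" epub:textref="c.xhtml#sb" epub:type="sidebar">'
    '<par id="p2"/><par id="p3"/></seq><par id="p4"/></body>')
print(list(playable_pars(body)))                         # ['p1', 'p2', 'p3', 'p4']
print(list(playable_pars(body, frozenset({"sidebar"})))) # ['p1', 'p4']
```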
The src attribute of a Media Overlay text element refers to an EPUB Content Document element by its ID. The granularity of the Media Overlay therefore depends on how the EPUB Content Document is marked up. If the finest level of markup is the paragraph, then the paragraph is also the finest level at which Media Overlay synchronization can be authored. Likewise, if sub-paragraph markup is available, such as span elements representing phrases or sentences, then finer granularity is possible in the Media Overlay. Finer granularity gives users more precise results for synchronized playback when navigating by word or phrase and when searching the text, but increases the file size of the Media Overlay Documents.
Any EPUB Content Document associated with a Media Overlay may contain embedded media objects such as video and audio. In such cases, the Media Overlay text element may reference the embedded media by its ID value.
When a text element references embedded audio media, or video media that includes audio, no audio sibling element is required, though one is allowed.
To express semantic inflection, the epub:type attribute defined in EPUB Content Documents 3.0 may be attached to Media Overlay par, seq, and body elements.
Values for the Media Overlay epub:type attribute are constrained identically to the epub:type attribute in EPUB Content Documents. Refer to Semantic Inflection [ContentDocs30] for details.
The epub:type attribute facilitates Reading System behavior appropriate for the semantic type(s) indicated. Examples of these behaviors are skippability, escapability (see Skippability and Escapability), and table reading mode, which gives the user fine-grained control over navigation by table cell, row, and column.
The following example shows the semantic markup for a Media Overlay containing a sidebar.
<smil xmlns="http://www.w3.org/ns/SMIL" xmlns:epub="http://www.idpf.org/2011/epub" version="3.0" profile="http://www.idpf.org/epub/30/profile/content/">
  <body>
    <seq id="id1" epub:textref="chapter1.xhtml#sidebar" epub:type="sidebar">
      <par id="id2">
        <text src="chapter1.xhtml#sidebartitle"/>
        <audio src="chapter1_audio.mp3" clipBegin="0:24:15.000" clipEnd="0:24:18.123"/>
      </par>
      <par id="id3">
        <text src="chapter1.xhtml#sidebartext3"/>
        <audio src="chapter1_audio.mp3" clipBegin="0:24:50.010" clipEnd="0:25:28.530"/>
      </par>
      <par id="id4">
        <text src="chapter1.xhtml#sidebartext4"/>
        <audio src="chapter1_audio.mp3" clipBegin="0:25:28.530" clipEnd="0:25:45.515"/>
      </par>
    </seq>
  </body>
</smil>
A Media Overlay Document must meet all of the following criteria:
› It must meet the conformance constraints for XML documents defined in XML Document Content Conformance [Publications30].
› It must be valid against the Media Overlays schema as defined in Appendix A, Media Overlays Schema and conform to all content conformance constraints expressed in Summary of Media Overlay Elements and Attributes.
› The Media Overlay Document filename should use the file extension .smil.
All elements defined in this section are in the http://www.w3.org/ns/SMIL namespace unless otherwise specified.
smil Element
The smil element must be the root element of all Media Overlay Documents. Unlike the [SMIL] element from which it is derived, the version used in Media Overlays does not require a child head element.
smil
The smil element is the root element of the Media Overlay Document.
version
[required]
Specifies the version number of the [SMIL] specification to which the Media Overlay adheres.
This attribute must have the value 3.0.
id
[optional]
The ID [XML] of this element, which must be unique within the document scope.
profile
[conditionally required]
Specifies the URI of the metadata profile [RDFa11 Core] for the Media Overlay Document.
This attribute is optional except when the epub:type attribute or the meta element is used in the Overlay Document, in which case it is required.
This attribute must have the value http://www.idpf.org/epub/30/profile/content/
prefix
[optional]
Declares additional metadata vocabulary prefixes not defined in the metadata profile [RDFa11 Core].
For more information on the usage of the profile and prefix attributes, refer to Semantic Inflection.
head Element
The head element is the container for metadata in the Media Overlay Document, and consists of zero or more child meta elements. As this specification defines no metadata properties occurring in the Media Overlay Document, the head element is optional.
meta Element
The meta element represents metadata for the Media Overlay Document. The attributes and content model for meta in EPUB Media Overlays differ from the [SMIL] specification. SMIL meta elements have name and content attributes and an empty content model, whereas Media Overlay meta elements have property and about attributes and, optionally, text content. This specification defines no metadata properties occurring in the Media Overlay Document.
meta
As a child of the head element.
property
[required]
A CURIE [RDFa11 Core] that resolves to a term in one of the vocabularies defined in the metadata profile or declared in the prefix attribute of the smil element.
about
[context dependent]
Identifies the subject of the property being expressed. The value of the attribute must be a relative IRI [RFC3987] pointing to the resource or element it describes.
The about attribute is optional depending on the type of metadata being expressed. When omitted, the property relates to the Overlay as a whole.
xml:lang
[required]
Specifies the language used in the contents and attribute values of the carrying element and its descendants, as defined in section 2.12 Language Identification of [XML].
dir
[optional]
Specifies the base text direction of the content and attribute values of the carrying element and its descendants.
Inherent directionality specified using [Unicode] takes precedence over this attribute.
Allowed values are ltr (left-to-right) or rtl (right-to-left).
id
[optional]
The ID [XML] of this element, which must be unique within the document scope.
Text.
body Element
The body element is the starting point for the presentation contained in the Media Overlay Document. It contains the main sequence of par and seq elements.
body
The body element is a required child of the smil element, following the head element when one is present.
epub:type
[optional]
An expression of the structural semantics of the corresponding element in the EPUB Content Document.
The value is a whitespace-separated list of CURIEs [RDFa11 Core]. Refer to Semantic Inflection for more information.
id
[optional]
The ID [XML] of this element, which must be unique within the document scope.
epub:textref
[optional]
The IRI [RFC3987] of the corresponding Content Document, including a fragment identifier that references the specific element.
One or more, in any order, of: seq [optional] or par [optional], where at least one is required.
seq Element
The seq element contains media objects that are to be rendered sequentially.
seq
One or more seq elements may occur as children of the body element and of the seq element.
epub:type
[optional]
An expression of the structural semantics of the corresponding element in the EPUB Content Document.
The value is a whitespace-separated list of CURIEs [RDFa11 Core]. Refer to Semantic Inflection for more information.
id
[optional]
The ID [XML] of this element, which must be unique within the document scope.
epub:textref
[required]
The IRI [RFC3987] of the corresponding Content Document, including a fragment identifier that references the specific element.
One or more, in any order, of: seq [optional] or par [optional], where at least one is required.
par Element
The par element contains media objects that are to be rendered in parallel.
par
One or more par elements may occur as children of the body and seq elements.
epub:type
[optional]
An expression of the structural semantics of the corresponding element in the EPUB Content Document.
The value is a whitespace-separated list of CURIEs [RDFa11 Core]. Refer to Semantic Inflection for more information.
id
[optional]
The ID [XML] of this element, which must be unique within the document scope.
In any order: text [required] and audio [optional].
The audio element is optional only if its sibling text element refers to audio or video media (see Embedded Audio and Video).
text Element
The text element references the element in the EPUB Content Document that represents the text media of a synchronized component.
text elements typically refer to textual elements, but can also refer to audio or video elements (see Embedded Audio and Video).
text
As a required child of the par element.
Empty.
audio Element
The audio element represents a clip of audio media.
audio
A required child of the par element unless its sibling text element refers to audio or video media (see Embedded Audio and Video), in which case it is optional.
id
[optional]
The ID [XML] of this element, which must be unique within the document scope.
src
[required]
The IRI [RFC3987] of an audio file. The audio file must be one of the audio formats listed in the Core Media Types [Publications30] table.
clipBegin
[optional]
A clock value that specifies the offset into the physical media corresponding to the start point of an audio clip.
Clock values are a subset of SMIL Wallclock-sync values, defined in [SMIL]. The Media Overlays schema (see Appendix A, Media Overlays Schema) defines the syntax as hh:mm:ss.s, or as a single unit of hours (h), minutes (min), seconds (s), or milliseconds (ms). See Appendix B, Examples of Wallclock Values.
clipEnd
[optional]
A clock value that specifies the offset into the physical media corresponding to the end point of an audio clip.
Clock values are a subset of SMIL Wallclock-sync values, defined in [SMIL]. The Media Overlays schema (see Appendix A, Media Overlays Schema) defines the syntax as hh:mm:ss.s, or as a single unit of hours (h), minutes (min), seconds (s), or milliseconds (ms). See Appendix B, Examples of Wallclock Values.
The chronological offset of the terminating position must be after the starting offset specified in the clipBegin attribute.
Empty.
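To illustrate the clock-value syntax and the clipEnd constraint, the following Python sketch converts a clipBegin or clipEnd value into an exact number of seconds and checks that clipEnd falls after clipBegin. It accepts only the two forms described for these attributes: the full hh:mm:ss form, with hours of any length and an optional fraction, and a number with a single h, min, s, or ms unit. It is not a complete implementation of [SMIL] clock values. The function names parse_clock_value and check_clip are hypothetical.

```python
import re
from fractions import Fraction

# Full form, e.g. "0:23:23.84" or "1:36:20": hours, minutes, seconds, optional fraction.
_FULL_CLOCK = re.compile(r"(\d+):([0-5]\d):([0-5]\d)(\.\d+)?")
# Single-unit form, e.g. "23s", "1.5min", "500ms".
_TIMECOUNT = re.compile(r"(\d+(?:\.\d+)?)(h|min|s|ms)")
_SECONDS_PER_UNIT = {"h": 3600, "min": 60, "s": 1, "ms": Fraction(1, 1000)}


def parse_clock_value(value):
    """Return the offset described by a clock value as an exact Fraction of seconds."""
    text = value.strip()
    match = _FULL_CLOCK.fullmatch(text)
    if match:
        hours, minutes, seconds, fraction = match.groups()
        whole = int(hours) * 3600 + int(minutes) * 60 + int(seconds)
        # Prefix "0" so that ".84" becomes the decimal string "0.84".
        return Fraction(whole) + (Fraction("0" + fraction) if fraction else 0)
    match = _TIMECOUNT.fullmatch(text)
    if match:
        number, unit = match.groups()
        return Fraction(number) * _SECONDS_PER_UNIT[unit]
    raise ValueError(f"unsupported clock value: {value!r}")


def check_clip(clip_begin, clip_end):
    """Raise ValueError unless clipEnd is chronologically after clipBegin."""
    if parse_clock_value(clip_end) <= parse_clock_value(clip_begin):
        raise ValueError(f"clipEnd {clip_end!r} is not after clipBegin {clip_begin!r}")


print(float(parse_clock_value("0:23:23.84")))  # 1403.84
check_clip("0:23:23.84", "0:23:34.221")        # passes silently
```

Exact Fraction arithmetic is used so that adjacent clips such as clipEnd="0:23:34.221" and the next clipBegin="0:23:34.221" compare as exactly equal, with no floating-point rounding.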
Visual rendering information for the currently playing EPUB Content Document element may be expressed in the EPUB Style Sheet using the pseudo-class media-overlay-active. This pseudo-class can be used to add highlighting, outlining, and other indications that the Content Document element is active.
The following example shows an EPUB Style Sheet that uses media-overlay-active.
:media-overlay-active { background-color: rgb(255, 255, 0); overflow: hidden; }
The item elements of the manifest in the Package Document may specify a Media Overlay via the media-overlay attribute. Media Overlays are themselves manifest items and must be referred to by their IDs.
The following example shows how to include Media Overlays in the manifest of a Package Document.
<manifest>
  <item id="ch1" href="chapter1.xhtml" media-type="application/xhtml+xml" media-overlay="ch1_audio"/>
  <item id="ch1_audio" href="chapter1_audio.smil" media-type="application/smil+xml"/>
</manifest>
Manifest items that refer to Media Overlays must have the media-type application/smil+xml, as specified in the Core Media Types section of [Publications30].
The media-overlay attribute must only be attached to manifest items that reference EPUB Content Documents. The attribute must not be attached to items that reference Foreign Content Documents as defined in [Publications30].
Not every manifest item is required to have an associated Media Overlay, but the relationship between Media Overlay files and the manifest items that reference them must be one-to-one: multiple EPUB Content Documents cannot share a single Media Overlay file.
This is a forwards-compatible addition: EPUB 2.0 Reading Systems may safely ignore the media-overlay attribute and process documents in their normal fashion.
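The manifest rules above can be checked with a short Python sketch. It maps each content item to its Media Overlay item and rejects a media-overlay attribute on a non-Content-Document item, a reference to an item that is not application/smil+xml, or an overlay shared by two documents. It assumes that XHTML (application/xhtml+xml) and SVG (image/svg+xml) are the only Content Document media types, and it matches item elements with or without a namespace, because the example manifest omits one. The function name overlay_map is hypothetical.

```python
import xml.etree.ElementTree as ET

CONTENT_TYPES = {"application/xhtml+xml", "image/svg+xml"}  # assumed Content Document types
OVERLAY_TYPE = "application/smil+xml"


def overlay_map(manifest_xml):
    """Return {content item id: overlay item id}, raising ValueError on rule violations."""
    manifest = ET.fromstring(manifest_xml)
    # Accept item elements whether or not they are namespaced.
    items = {el.get("id"): el for el in manifest.iter()
             if el.tag.rsplit("}", 1)[-1] == "item"}
    mapping = {}
    for item_id, item in items.items():
        overlay_id = item.get("media-overlay")
        if overlay_id is None:
            continue  # not every manifest item needs a Media Overlay
        if item.get("media-type") not in CONTENT_TYPES:
            raise ValueError(f"{item_id}: media-overlay on a non-Content-Document item")
        overlay = items.get(overlay_id)
        if overlay is None or overlay.get("media-type") != OVERLAY_TYPE:
            raise ValueError(f"{item_id}: {overlay_id} is not an {OVERLAY_TYPE} item")
        if overlay_id in mapping.values():
            raise ValueError(f"{overlay_id} is shared by more than one Content Document")
        mapping[item_id] = overlay_id
    return mapping


print(overlay_map(
    '<manifest>'
    '<item id="ch1" href="chapter1.xhtml" media-type="application/xhtml+xml" media-overlay="ch1_audio"/>'
    '<item id="ch1_audio" href="chapter1_audio.smil" media-type="application/smil+xml"/>'
    '</manifest>'))  # {'ch1': 'ch1_audio'}
```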
The Package Document must include metadata about Media Overlay Documents. The following tables detail the available properties.
TODO: verify fields are correct/necessary
› duration
Description: The duration of the entire presentation or of a specific Media Overlay.
Allowed value(s): A subset of SMIL Wallclock-sync values (defined in [SMIL]), expressed as hh:mm:ss.s or as a single unit of h (hours), min (minutes), s (seconds), or ms (milliseconds). See Appendix B, Examples of Wallclock Values.
Cardinality: Exactly one for the Publication and for each Overlay.
Example: <meta property="media:duration">1:36:20</meta>
› narrator
Description: Name of the narrator.
Allowed value(s): xsd:string
Cardinality: Zero or more.
Example: <meta property="media:narrator">Joe Speaker</meta>
The Package Document must include the duration of each Media Overlay as well as of the entire publication. The Package Document may also include narrator information, in particular when each Media Overlay has its own narrator or when one narrator is specified for the entire publication. When a meta element is specific to a single Media Overlay Document, its about attribute references the manifest item of that Media Overlay. meta elements with no about attribute are considered to be about the entire publication.
The following example shows a Package Document with metadata about Media Overlays.
<package>
  <metadata>
    …
    <meta property="media:duration" about="#ch1_audio">0:32:29</meta>
    <meta property="media:duration" about="#ch2_audio">0:34:02</meta>
    <meta property="media:duration" about="#ch3_audio">0:29:49</meta>
    <meta property="media:duration">1:36:20</meta>
    <meta property="media:narrator">Joe Speaker</meta>
    …
  </metadata>
  …
</package>
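In the example above, the three per-overlay durations add up to the publication-level value. The following Python sketch derives the publication duration by summing the per-overlay values. It assumes that the publication duration is intended to be this sum, which this specification does not state explicitly. It handles only the h:mm:ss form and rounds to whole seconds. The helper names to_seconds and to_clock are hypothetical.

```python
def to_seconds(clock):
    """Convert an h:mm:ss or h:mm:ss.s clock value to seconds."""
    hours, minutes, seconds = clock.split(":")
    return int(hours) * 3600 + int(minutes) * 60 + float(seconds)


def to_clock(total_seconds):
    """Format a number of seconds as h:mm:ss, rounded to the nearest second."""
    whole = round(total_seconds)
    hours, rest = divmod(whole, 3600)
    minutes, seconds = divmod(rest, 60)
    return f"{hours}:{minutes:02d}:{seconds:02d}"


# Per-overlay durations, keyed by manifest item id, from the example metadata.
overlay_durations = {"ch1_audio": "0:32:29", "ch2_audio": "0:34:02", "ch3_audio": "0:29:49"}
publication_duration = to_clock(sum(to_seconds(v) for v in overlay_durations.values()))
print(publication_duration)  # 1:36:20
```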
If a Reading System supports Media Overlays, it must adhere to the conformance requirements in this section. Reading Systems that do not support Media Overlays must ignore both the media-overlay attribute on manifest item elements and the manifest item elements whose media-type attribute value equals application/smil+xml.
When the Reading System loads a Package Document, it must refer to the media-overlay attributes of the manifest item elements to discover the corresponding Media Overlays for EPUB Content Documents. When the Reading System loads an EPUB Content Document, it also loads the corresponding Media Overlay Document. Playback must start either at the beginning or at a specific location within the Media Overlay Document (for example, when loading a bookmarked position). When the Media Overlay Document has finished playing, the Reading System must load the next EPUB Content Document (as specified in the Package Document spine) and also load its corresponding Media Overlay Document.
The Media Overlay elements associated with synchronization behavior are seq (sequence) and par (parallel). In its simplest form, a Media Overlay is a sequence of parallel (i.e., rendered together) text and audio media objects. Reading Systems must render the immediate children of the body element in sequence. Each child element must be a seq or a par element. A seq element's children must be rendered in sequence, and playback completes when the last child has finished playing. A par element's children must be rendered in parallel (with each starting at the same time), and playback completes when all the children have finished playing. When the body element's last child has finished playing, playback of the file is done.
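These timing rules can be modeled in Python by placing every par on a presentation timeline. A seq starts each child when the previous child finishes. A par starts its text and audio together and finishes when its audio clip ends, because the text element has no intrinsic duration of its own. The sketch models the overlay with hypothetical Par and Seq dataclasses, assumes clip offsets are already converted to seconds, and ignores par elements whose text refers to embedded audio or video.

```python
from dataclasses import dataclass, field
from typing import List, Tuple, Union


@dataclass
class Par:
    """A par element; clip offsets are assumed to be already parsed into seconds."""
    id: str
    text_src: str
    clip_begin: float
    clip_end: float


@dataclass
class Seq:
    """A seq (or body) element holding par and seq children in document order."""
    children: List[Union["Seq", Par]] = field(default_factory=list)


def schedule(node, start=0.0, timeline=None) -> Tuple[list, float]:
    """Return (timeline, end), where timeline holds (par id, text src, start, end)."""
    if timeline is None:
        timeline = []
    if isinstance(node, Par):
        # Text and audio start together; the par ends when the audio clip ends.
        end = start + (node.clip_end - node.clip_begin)
        timeline.append((node.id, node.text_src, start, end))
        return timeline, end
    current = start
    for child in node.children:
        # Each child of a seq starts when the previous child has finished.
        timeline, current = schedule(child, current, timeline)
    return timeline, current


body = Seq([
    Par("par1", "chapter1.xhtml#sentence1", 0.0, 10.0),
    Seq([Par("par2", "chapter1.xhtml#sentence2", 10.0, 20.0)]),
    Par("par3", "chapter1.xhtml#sentence3", 20.0, 30.0),
])
timeline, end = schedule(body)
print(end)  # 30.0
```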
When presented with a Media Overlay audio element, Reading Systems must play the audio resource referenced by the src attribute, starting at the time given by the clipBegin attribute and ending at the time given by the clipEnd attribute. The following rules must be observed:
› If clipBegin is not specified, its value is assumed to be 0.
› If clipEnd is not specified, its value is assumed to be the end of the physical media.
› If clipEnd exceeds the duration of the physical media, its value is assumed to be the end of the physical media.
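Taken together, the three rules above determine the portion of the audio file that is actually played. The following Python sketch applies them, using None for an omitted attribute. It assumes that both offsets are already in seconds and that the duration of the physical media is known. The function name effective_clip is hypothetical.

```python
from typing import Optional, Tuple


def effective_clip(clip_begin: Optional[float], clip_end: Optional[float],
                   media_duration: float) -> Tuple[float, float]:
    """Return the (start, end) offsets in seconds that a Reading System plays.

    None stands for an omitted attribute; media_duration is the length of the
    physical audio file referenced by src.
    """
    start = 0.0 if clip_begin is None else clip_begin        # omitted clipBegin -> 0
    end = media_duration if clip_end is None else clip_end   # omitted clipEnd -> end of media
    return start, min(end, media_duration)                   # clipEnd capped at end of media


print(effective_clip(None, None, 120.0))   # (0.0, 120.0)
print(effective_clip(10.0, 500.0, 120.0))  # (10.0, 120.0)
```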
User-controllable audio playback options should include timescale modification, in which the playback rate is altered without distorting the pitch. The suggested range is half speed to double speed.
When presented with a Media Overlay text element, Reading Systems must ensure the EPUB Content Document element referenced by the src attribute is visible in the Viewport. Reading Systems with a CSS Viewport must also apply the styling rules indicated by the EPUB Style Sheet pseudo-class media-overlay-active to this EPUB Content Document element.
An EPUB Content Document with which a Media Overlay is associated may itself contain embedded video and audio media objects. Unlike text and images, video and audio clips have an intrinsic duration. Consequently, when an EPUB Reading System renders the text/audio synchronization described by a Media Overlay, the default playback behavior of audio and video clips embedded within the associated text document must be overridden.
Note that the rules below apply only to text elements (within the Media Overlay) that point, via the src attribute, to video or audio elements (within the associated EPUB Content Document).
All audio and video media objects embedded within an EPUB Content Document must have their public playback interface deactivated (typically: play/pause control, time slider, volume level, etc.). This behavior is required to avoid interference between the scheduled playback sequence defined by the Media Overlay, and the arbitrary playback behavior due to user interaction or script execution. As a result, when the Reading System is in playback mode, it must:
Hide the individual video/audio UI controls from the page, which overrides the default behavior defined by the controls HTML5 attribute.
Prevent scripts embedded within the EPUB Content Document (i.e., authored as part of the default publication behavior) from invoking the JavaScript audio/video playback API. As this requirement may be hard to implement in practice, content producers should avoid publishing embedded scripts dedicated to controlling the playback of inline audio/video media objects. The published Media Overlay can then retain full control of the synchronized text/audio presentation without any risk of interference from script-enabled custom behaviors.
All audio and video media objects embedded within an EPUB Content Document must be initialized to their "stopped" state, and be ready to be played from the zero position within their content stream (possibly displaying the poster image specified using the HTML5 markup). This requirement overrides the default behavior defined by the autoplay HTML5 attribute.
When a text element becomes active within the Media Overlay, the EPUB Style Sheet visual highlighting rules apply regardless of the content type referred to by the src attribute (e.g., visible video and audio players within the host EPUB Content Document must be decorated as per the media-overlay-active CSS rules).
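As a rough illustration of the playback-mode overrides for embedded media, the following Python sketch rewrites a serialized XHTML Content Document. It removes the controls and autoplay attributes from every embedded video and audio element and leaves other attributes, such as poster, in place. The function name prepare_embedded_media is hypothetical. An actual Reading System would more likely act on the live DOM, and this sketch does not attempt to stop scripts from calling the media playback API.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"


def prepare_embedded_media(xhtml: str) -> str:
    """Hypothetical helper: put embedded audio/video into Media Overlay playback mode.

    Removes the HTML5 `controls` attribute (hiding the public playback UI) and the
    `autoplay` attribute (so each object stays stopped at position zero). Other
    attributes, such as `poster`, are left untouched.
    """
    ET.register_namespace("", XHTML_NS)  # serialize XHTML without an ns0: prefix
    root = ET.fromstring(xhtml)
    for name in ("video", "audio"):
        for element in root.iter(f"{{{XHTML_NS}}}{name}"):
            element.attrib.pop("controls", None)
            element.attrib.pop("autoplay", None)
    return ET.tostring(root, encoding="unicode")
```

Because the attributes are removed rather than rewritten, the browser defaults apply: no native controls are shown and nothing starts playing until the Media Overlay activates the element.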
In addition to the default behavior of Media Overlay activation for textual fragments, audio and video playback must be started and stopped according to the duration implied by the authored Media Overlay synchronization (as per the standard [SMIL] timing model). There are two possible scenarios:
When a Media Overlay text element has no audio sibling within its par parent container, the referenced audio or video media object must play until it ends, at which point the text element's lifespan terminates. In this case, the implicit duration of the text element (and by inference, of the parent par container) is that of the referenced audio or video clip.
When a Media Overlay text element has an audio sibling within its par parent container, the playback duration of the audio or video media object referenced by the text element must be constrained by the duration of the audio sibling in the Media Overlay. In this case, the actual duration of the parent par container is that of the child audio clip, regardless of the duration of the video or audio media pointed to by the text element. This behavior may cause an embedded video or audio media object to end playback prematurely (before reaching its full duration). It may also cause the embedded object to end before the parallel Media Overlay audio has finished, in which case the last-played video frame should remain visible until the parent par container ends. This behavior is equivalent to the parent par container carrying an endsync attribute that designates the Media Overlay audio element, as defined in [SMIL].
Furthermore, Reading Systems should expose user controls for the volume level of each independent audio track (i.e., the audio element of the Media Overlay, and the embedded audio or video media objects within the Content Document), so that audio output can be adjusted to match listeners' requirements. Note that overlapping audio tracks are typically an authoring-time concern: they usually arise when content producers layer descriptive audio over a video track. It is recommended that overlapping audio situations be carefully examined and resolved at the production stage, as Reading Systems are not required to handle simultaneous volume levels in any particular way.
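The two timing scenarios can be summarized in a small Python sketch. It assumes all times are already expressed in seconds. The Par and Schedule types and the schedule_par function are hypothetical names introduced for illustration, not part of this specification. Given the intrinsic duration of the embedded clip and the optional clipBegin/clipEnd of a sibling audio element, the sketch computes three values: how long the par container lasts, how long the embedded clip actually plays, and how much time remains after the clip ends.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Par:
    """Hypothetical model of a par whose text element points to embedded media."""
    media_duration: float  # intrinsic duration of the referenced video/audio, in seconds
    audio_clip: Optional[Tuple[float, float]] = None  # sibling audio (clipBegin, clipEnd), if any


@dataclass
class Schedule:
    par_duration: float     # duration of the parent par container
    media_play_time: float  # how long the embedded clip actually plays
    hold_time: float        # time left after the clip ends (video: last frame stays visible)


def schedule_par(par: Par) -> Schedule:
    if par.audio_clip is None:
        # No audio sibling: the embedded clip plays to its end and defines the par duration.
        return Schedule(par.media_duration, par.media_duration, 0.0)
    clip_begin, clip_end = par.audio_clip
    # Audio sibling present: the par lasts exactly as long as the Media Overlay audio clip.
    par_duration = clip_end - clip_begin
    # The embedded clip is cut off if it is longer, or ends early if it is shorter.
    played = min(par.media_duration, par_duration)
    return Schedule(par_duration, played, par_duration - played)
```

For example, a 12-second embedded video paired with a 15-second Media Overlay audio clip plays in full. Its last frame is then held for the remaining 3 seconds.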
When a text element becomes inactive in the Media Overlay and points to a video or audio media object, the referenced media object must be reset to its initial "stopped" state, ready to be played from the zero-position within its content stream (possibly displaying the poster image specified using the HTML5 markup).
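Taken together with the initialization rule for embedded media, this gives each referenced audio or video object a simple lifecycle. The hypothetical EmbeddedMedia class below sketches that lifecycle in Python. The object starts stopped at position zero, advances only while its text element is active, and is reset when that element becomes inactive. The class and method names are illustrative assumptions, not an API defined by this specification.

```python
from enum import Enum


class State(Enum):
    STOPPED = "stopped"
    PLAYING = "playing"


class EmbeddedMedia:
    """Hypothetical model of an audio/video element driven by a Media Overlay."""

    def __init__(self, element_id: str) -> None:
        self.element_id = element_id
        self.state = State.STOPPED  # initial "stopped" state
        self.position = 0.0         # seconds from the start of the content stream

    def on_text_active(self) -> None:
        # The Media Overlay, not the user or a script, starts playback.
        self.state = State.PLAYING

    def advance(self, seconds: float) -> None:
        # Only a playing object moves forward in its content stream.
        if self.state is State.PLAYING:
            self.position += seconds

    def on_text_inactive(self) -> None:
        # Reset so the object is ready to play again from the zero-position.
        self.state = State.STOPPED
        self.position = 0.0
```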
While reading, users may want to turn certain features of the publication on or off, such as sidebars, footnotes, page numbers, or other types of secondary content. This feature is called skippability. Reading Systems should use the semantic information provided by the epub:type attribute on Media Overlay elements to determine when to offer users the option to skip features. In the following example, a Reading System should offer the user the option of turning the page break/page number announcements on and off, as these are often cumbersome to listen to.
The following example shows a Media Overlay Document with a pagebreak.
<smil xmlns="http://www.w3.org/ns/SMIL" xmlns:epub="http://www.idpf.org/2011/epub" version="3.0" profile="http://www.idpf.org/epub/30/profile/content/">
  <body>
    <!-- a paragraph -->
    <par id="id1">
      <text src="chapter1.xhtml#para1"/>
      <audio src="chapter1_audio.mp3" clipBegin="0:23:22.000" clipEnd="0:24:15.000"/>
    </par>
    <!-- a page number -->
    <par id="id2" epub:type="pagebreak">
      <text src="chapter1.xhtml#g1"/>
      <audio src="chapter1_audio.mp3" clipBegin="0:24:15.000" clipEnd="0:24:18.123"/>
    </par>
    <!-- another paragraph -->
    <par id="id3">
      <text src="chapter1.xhtml#g2"/>
      <audio src="chapter1_audio.mp3" clipBegin="0:24:18.123" clipEnd="0:25:28.530"/>
    </par>
  </body>
</smil>
The following selection of terms from the [StructureVocab], for which Reading Systems should offer users the option of skippability, is provided as an informative reference:
sidebar
practice
marginalia
annotation
help
note
footnote
rearnote
pagebreak
Media Overlays may use vocabularies in addition to those specified in the profile by declaring them in the prefix attribute on the root smil element. Reading System support for skippability based on epub:type values from such additional vocabularies should not be assumed.
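One way a Reading System might apply skippability is sketched below in Python. The function parses a Media Overlay Document and returns the ids of the par elements that remain in the playback order once the user has switched some features off. The SKIPPABLE set mirrors the informative term list given for skippability. The function name playback_order is hypothetical. The sketch assumes epub:type holds a whitespace-separated list of terms, and that switching off a term also suppresses every par nested inside a seq carrying that term.

```python
import xml.etree.ElementTree as ET
from typing import List, Set

SMIL_NS = "http://www.w3.org/ns/SMIL"
EPUB_TYPE = "{http://www.idpf.org/2011/epub}type"

# Informative list of skippable terms from the structural semantics vocabulary.
SKIPPABLE = {"sidebar", "practice", "marginalia", "annotation", "help",
             "note", "footnote", "rearnote", "pagebreak"}


def playback_order(smil: str, switched_off: Set[str]) -> List[str]:
    """Return the ids of the par elements to play, in document order.

    A par is omitted when it, or an enclosing seq, carries an epub:type term
    that is in SKIPPABLE and that the user has switched off.
    """
    root = ET.fromstring(smil)
    disabled = switched_off & SKIPPABLE
    order: List[str] = []

    def walk(element: ET.Element, suppressed: bool) -> None:
        types = set(element.get(EPUB_TYPE, "").split())
        suppressed = suppressed or bool(types & disabled)
        if element.tag == f"{{{SMIL_NS}}}par" and not suppressed:
            order.append(element.get("id"))
        for child in element:
            walk(child, suppressed)

    walk(root, False)
    return order
```

Applied to the pagebreak example with "pagebreak" switched off, the sketch keeps id1 and id3 and drops the page number announcement id2.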
Escapable items are nested structures such as tables, lists, and sidebars that listeners may wish to skip over, continuing to read from the point immediately after the nested structure. Escapable items differ from skippable features in that they do not enable or disable entire types of items, but provide an exit from them (e.g., a user can listen to some of the content before choosing to escape). Reading Systems should allow escaping of nested structure items. Reading Systems shall determine the start of nested structures by the value of the epub:type attribute (e.g., glossary) and should offer users the option to skip playback of that structure and resume with whatever content comes after it.
The following example shows the Media Overlay Document for an EPUB Content Document containing a paragraph, a glossary, and another paragraph. A Reading System that supports escapability would give the user the option to interrupt playback of the glossary and continue with the paragraph that follows it.
<smil xmlns="http://www.w3.org/ns/SMIL" version="3.0" xmlns:epub="http://www.idpf.org/2011/epub" profile="http://www.idpf.org/epub/30/profile/content/">
  <body>
    <!-- a paragraph, part of the regular document text -->
    <par id="id1">
      <text src="chapter1.xhtml#para1"/>
      <audio src="chapter1_audio.mp3" clipBegin="0:23:22.000" clipEnd="0:24:15.000"/>
    </par>
    <!-- a glossary, which is a nested structure -->
    <seq id="id2" epub:textref="chapter1.xhtml#g0" epub:type="glossary">
      <par id="id3" epub:type="glossterm">
        <text src="chapter1.xhtml#g1"/>
        <audio src="chapter1_audio.mp3" clipBegin="0:24:15.000" clipEnd="0:24:18.123"/>
      </par>
      <par id="id4" epub:type="glossdef">
        <text src="chapter1.xhtml#g2"/>
        <audio src="chapter1_audio.mp3" clipBegin="0:24:18.123" clipEnd="0:25:28.530"/>
      </par>
      <par id="id5" epub:type="glossterm">
        <text src="chapter1.xhtml#g3"/>
        <audio src="chapter1_audio.mp3" clipBegin="0:25:28.530" clipEnd="0:25:45.515"/>
      </par>
      <par id="id6" epub:type="glossdef">
        <text src="chapter1.xhtml#g4"/>
        <audio src="chapter1_audio.mp3" clipBegin="0:25:45.515" clipEnd="0:27:04.123"/>
      </par>
    </seq>
    <!-- another paragraph, part of the document text that comes after the glossary -->
    <par id="id7">
      <text src="chapter1.xhtml#para2"/>
      <audio src="chapter1_audio.mp3" clipBegin="0:27:04.123" clipEnd="0:27:59.000"/>
    </par>
  </body>
</smil>
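Escaping can be sketched in Python as a lookup over the Media Overlay Document. Given the id of the par currently playing, the function finds the innermost enclosing element whose epub:type marks an escapable structure. It then returns the id of the first par that follows that structure in document order. The function name escape_target is hypothetical. The set of escapable epub:type values is passed in as a parameter because this sketch makes no assumption about which vocabulary terms denote escapable structures; the usage below uses glossary, as in the text.

```python
import xml.etree.ElementTree as ET
from typing import Optional, Set

SMIL_NS = "http://www.w3.org/ns/SMIL"
EPUB_TYPE = "{http://www.idpf.org/2011/epub}type"


def escape_target(smil: str, current_id: str, escapable: Set[str]) -> Optional[str]:
    """Return the id of the first par after the innermost escapable structure
    containing the par `current_id`, or None if there is nothing to escape
    (or nothing follows the structure)."""
    root = ET.fromstring(smil)
    parent = {child: node for node in root.iter() for child in node}
    pars = list(root.iter(f"{{{SMIL_NS}}}par"))
    current = next((p for p in pars if p.get("id") == current_id), None)
    if current is None:
        raise KeyError(current_id)

    # Walk up from the current par to find the innermost escapable structure.
    structure = None
    node = current
    while node is not None:
        if set(node.get(EPUB_TYPE, "").split()) & escapable:
            structure = node
            break
        node = parent.get(node)
    if structure is None:
        return None

    # Resume with the first par, in document order, that lies outside the structure.
    inside = set(structure.iter())
    for candidate in pars[pars.index(current) + 1:]:
        if candidate not in inside:
            return candidate.get("id")
    return None
```

With the glossary example, escaping from any glossary entry (id3 through id6) resumes at id7. From id1 there is no enclosing escapable structure, so the function returns None.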
The schema for Media Overlays is available at ../schema/media-overlay-30.nvdl.
This schema is normative. In case of conflicts between the specification prose and this schema, the schema shall be considered definitive.
This section is informative
Validation of Media Overlays using this schema requires a processor that supports [NVDL], [RelaxNG] and [ISOSchematron].
Note, however, that the NVDL layer can be replaced by a two-pass validation using the embedded RELAX NG and ISO Schematron schemas alone.
This appendix is informative
The following examples show different values for the clipBegin and clipEnd attributes, found in the Media Overlay Document.
<audio clipBegin="5:34:31.396" clipEnd="5:35:21.875"/>
<audio clipBegin="124:59:36" clipEnd="125:01:22"/>
<audio clipBegin="0:05:01.2" clipEnd="0:05:02.4"/>
<audio clipBegin="76.2s" clipEnd="98.6s"/>
<audio clipBegin="3.2h" clipEnd="3.5h"/>
<audio clipBegin="13min" clipEnd="15min"/>
<audio clipBegin="2345ms" clipEnd="5678ms"/>
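The clipBegin and clipEnd values above use the [SMIL] clock value syntax. This syntax has three forms: full clock values (hours:minutes:seconds), partial clock values (minutes:seconds), and timecount values with an optional h, min, s, or ms metric, where seconds are the default. The Python sketch below converts any of these forms to seconds. It uses Decimal so that values such as 0:05:01.2 are represented exactly. The names clock_seconds and clip_duration are hypothetical helpers, not part of this specification.

```python
import re
from decimal import Decimal

# SMIL clock value forms: full clock, partial clock, and timecount.
_FULL = re.compile(r"(\d+):([0-5]\d):([0-5]\d(?:\.\d+)?)")
_PARTIAL = re.compile(r"([0-5]\d):([0-5]\d(?:\.\d+)?)")
_TIMECOUNT = re.compile(r"(\d+(?:\.\d+)?)(h|min|s|ms)?")
_SCALE = {"h": Decimal(3600), "min": Decimal(60), "s": Decimal(1), "ms": Decimal("0.001")}


def clock_seconds(value: str) -> Decimal:
    """Convert a SMIL clock value (e.g. '0:05:01.2', '13min') to seconds."""
    v = value.strip()
    if m := _FULL.fullmatch(v):
        hours, minutes, seconds = m.groups()
        return Decimal(hours) * 3600 + Decimal(minutes) * 60 + Decimal(seconds)
    if m := _PARTIAL.fullmatch(v):
        minutes, seconds = m.groups()
        return Decimal(minutes) * 60 + Decimal(seconds)
    if m := _TIMECOUNT.fullmatch(v):
        number, metric = m.groups()
        # A timecount without a metric is interpreted as seconds.
        return Decimal(number) * _SCALE[metric or "s"]
    raise ValueError(f"not a SMIL clock value: {value!r}")


def clip_duration(clip_begin: str, clip_end: str) -> Decimal:
    """Duration of an audio clip, in seconds, from its clipBegin/clipEnd values."""
    return clock_seconds(clip_end) - clock_seconds(clip_begin)
```

For instance, clip_duration("5:34:31.396", "5:35:21.875") gives 50.479 seconds.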
The following examples show the duration metadata property, found in the Package Document.
<meta property="media:duration">5:34:31.396</meta>
<meta property="media:duration">124:59:36</meta>
<meta property="media:duration">0:05:01.2</meta>
<meta property="media:duration">76.2s</meta>
<meta property="media:duration">3.2h</meta>
<meta property="media:duration">13min</meta>
<meta property="media:duration">2345ms</meta>
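A producer generating the media:duration value might total the clips of a Media Overlay and write the result as a full clock value. The Python sketch below makes three assumptions: the value is the sum of the audio clip durations (clipEnd minus clipBegin), the clip times have already been converted to seconds, and millisecond precision is sufficient. The names overlay_duration and format_full_clock are hypothetical.

```python
from decimal import Decimal, ROUND_HALF_UP
from typing import Iterable, Tuple


def overlay_duration(clips: Iterable[Tuple[Decimal, Decimal]]) -> Decimal:
    """Sum of (clipEnd - clipBegin) over the audio clips, in seconds."""
    return sum((end - begin for begin, end in clips), Decimal(0))


def format_full_clock(seconds: Decimal, places: int = 3) -> str:
    """Format seconds as a SMIL full clock value, e.g. '0:02:06.530'."""
    if seconds < 0:
        raise ValueError("duration must not be negative")
    # Round to the requested number of fractional digits before splitting.
    total = seconds.quantize(Decimal(1).scaleb(-places), rounding=ROUND_HALF_UP)
    whole = int(total)
    fraction = f"{total - whole:.{places}f}"[1:] if places else ""
    hours, rest = divmod(whole, 3600)
    minutes, secs = divmod(rest, 60)
    return f"{hours}:{minutes:02d}:{secs:02d}{fraction}"
```

Applied to the three clips of the pagebreak example (0:23:22.000 to 0:25:28.530), this yields 126.530 seconds, formatted as 0:02:06.530.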
This appendix is informative
EPUB has been developed by the International Digital Publishing Forum in a cooperative effort, bringing together publishers, vendors, software developers, and experts in the relevant standards.
The EPUB 3 specifications were prepared by the International Digital Publishing Forum’s EPUB Maintenance Working Group, operating under a charter approved by the membership in May 2010 under the leadership of:
Markus Gylling, DAISY Consortium, Chair
Garth Conboy, Google Inc., Vice-chair
Brady Duga, Google Inc., Vice-chair
Bill McCoy, International Digital Publishing Forum (IDPF), Secretary
Active members of the working group included:
Alexis Wiles, Alicia Wise, … TODO : COMPLETE LIST OF CURRENT WG MEMBERS
For more detailed acknowledgements and information about contributors to each version of EPUB, refer to Contributors [EPUB3Overview].
[ContentDocs30] EPUB Content Documents 3.0.
[ISOSchematron] ISO/IEC 19757-3: Rule-based validation — Schematron.
[MediaOverlays30] EPUB Media Overlays 3.0.
[OCF3] Open Container Format 3.0.
[Publications30] EPUB Publications 3.0.
[RDFa11 Core] RDFa Core 1.1. Syntax and processing rules for embedding RDF through attributes. 26 October 2010.
[RFC2119] Key words for use in RFCs to Indicate Requirement Levels (RFC 2119). March 1997.
[RFC3987] Internationalized Resource Identifiers (IRIs) (RFC 3987). January 2005.
[RelaxNG] ISO/IEC 19757-2: Regular-grammar-based validation — RELAX NG. Second Edition. 2008-12-15.
[SMIL] SMIL Version 3.0. 01 December 2008.
[StructureVocab] EPUB 3 Structural Semantics Vocabulary. TODO Issue 74.
[Unicode] The Unicode Consortium. The Unicode Standard, Version 5.0.0, defined by: The Unicode Standard, Version 5.0 (Boston, MA, Addison-Wesley, 2007. ISBN 0-321-48091-0).
[XML] Extensible Markup Language (XML) 1.0 (Fifth Edition). 26 November 2008.
[EPUB3Overview] EPUB 3 Overview.