Recommended Specification 11 October 2011
Copyright © 2010, 2011 International Digital Publishing Forum™
All rights reserved. This work is protected under Title 17 of the United States Code. Reproduction and dissemination of this work with changes is prohibited except with the written permission of the International Digital Publishing Forum (IDPF).
EPUB is a registered trademark of the International Digital Publishing Forum.
The EPUB® specification is a distribution and interchange format standard for digital publications and documents. EPUB defines a means of representing, packaging and encoding structured and semantically enhanced Web content — including HTML5, CSS, SVG, images, and other resources — for distribution in a single-file format.
EPUB 3, the third major release of the standard, consists of a set of four specifications, each defining an important component of an overall EPUB Publication:
EPUB Publications 3.0 [Publications30], which defines publication-level semantics and overarching conformance requirements for EPUB Publications.
EPUB Content Documents 3.0 [ContentDocs30], which defines profiles of XHTML, SVG and CSS for use in the context of EPUB Publications.
EPUB Open Container Format (OCF) 3.0 [OCF3], which defines a file format and processing model for encapsulating a set of related resources into a single-file (ZIP) EPUB Container.
EPUB Media Overlays 3.0 [MediaOverlays30], which defines a format and a processing model for synchronization of text and audio.
EPUB has been widely adopted as the format for digital books (eBooks), and these new specifications significantly increase the format's capabilities in order to better support a wider range of publication requirements, including complex layouts, rich media and interactivity, and global typography features. The expectation is that EPUB 3 will be utilized for a broad range of content, including books, magazines and educational, professional and scientific publications.
This document provides a starting point for content authors and software developers wishing to understand these specifications. It consists of non-normative overview material, including a roadmap to the four building-block specification documents that compose EPUB 3.
Another non-normative document, EPUB 3 Changes from EPUB 2.0.1 [EPUB3Changes], describes changes in EPUB 3 from the previous version, but is intended primarily for Authors and EPUB Reading System vendors migrating from EPUB 2.0.1 to EPUB 3 and for those who anticipate supporting both versions.
This section provides an overview of the EPUB 3 specifications by explaining in brief the components of a Publication. Links to additional information within this document and to the specifications are included.
An EPUB Publication, at its most basic level, is a bundled collection of resources that can be reliably and predictably ingested by an EPUB Reading System in order to render its contents to a User. Some of these resources facilitate the discovery and processing of the EPUB Publication, while others make up the content of the source publication. The latter, EPUB Content Documents, are described in Content Documents and are fully defined in [ContentDocs30].
A Publication's resources are typically bundled for distribution as a ZIP-based archive with the file extension .epub. As conformant ZIP archives, Publications can be unzipped by many software programs, simplifying both their production and consumption. The container format is introduced in Container and defined in [OCF3].
The container format not only provides a means of determining that the zipped content represents an EPUB Publication (the mimetype file), but also provides a universally-named directory of informative resources (/META-INF). Key among these is the container.xml file, which directs Reading Systems to the root file of the Publication, the Package Document.
The Package Document is itself a kind of information warehouse for the Publication, storing metadata about the specific work contained in the Publication, providing an exhaustive list of resources and defining a default reading order. The Package Document is introduced in Package Document and defined in [Publications30].
The preceding components of an EPUB Publication are not new to EPUB 3, and will be familiar to anyone who has worked with Publications before, although they have been changed and enhanced in this version. A new core addition to EPUB 3, however, is the Media Overlay Document, which defines a means of synchronizing text and audio playback. The Overlay Document is introduced in Multimedia and defined in [MediaOverlays30].
The following example shows the resources a minimal "Hello World" Publication might contain:
mimetype
META-INF/container.xml
Content/HelloWorld.opf
Content/HelloWorld.xhtml
While conceptually simple, an EPUB Publication is more than just a collection of HTML pages and dependent assets in a ZIP package as represented in this example. The following sections of this document delve into more detail about the primary features and functionality that Publications provide to enhance the reading experience.
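As a rough illustration of that layout, the following Python sketch assembles the four files into a ZIP archive in memory. It is a minimal sketch, not a conformance-checked packager: the function name `build_hello_world` is hypothetical, the Package Document and Content Document bodies are supplied by the caller, and the treatment of the mimetype entry (written first, stored uncompressed, containing `application/epub+zip`) follows the container rules in [OCF3], which should be consulted for the authoritative requirements.

```python
import io
import zipfile

# container.xml points Reading Systems at the Package Document.
CONTAINER_XML = """<?xml version="1.0" encoding="UTF-8"?>
<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
  <rootfiles>
    <rootfile full-path="Content/HelloWorld.opf"
              media-type="application/oebps-package+xml"/>
  </rootfiles>
</container>
"""


def build_hello_world(package_xml, content_xhtml):
    """Return the bytes of a ZIP archive laid out like the listing above.

    package_xml and content_xhtml are caller-supplied strings; this sketch
    does not validate them.
    """
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        # The mimetype entry is written first and left uncompressed.
        archive.writestr("mimetype", "application/epub+zip",
                         compress_type=zipfile.ZIP_STORED)
        archive.writestr("META-INF/container.xml", CONTAINER_XML,
                         compress_type=zipfile.ZIP_DEFLATED)
        archive.writestr("Content/HelloWorld.opf", package_xml,
                         compress_type=zipfile.ZIP_DEFLATED)
        archive.writestr("Content/HelloWorld.xhtml", content_xhtml,
                         compress_type=zipfile.ZIP_DEFLATED)
    return buffer.getvalue()
```

Writing the result to a file with the .epub extension yields an archive that ordinary ZIP tools can open, although a Reading System will accept it only if the Package Document and Content Document it is given are themselves valid.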
This section covers the major features of EPUB, including important components and topics that apply to the process of authoring EPUB Publications as a whole.
Every EPUB Publication includes a single Package Document, which specifies all the Publication's constituent content documents and their required resources, defines a reading order for linear consumption, and associates Publication-level metadata and navigation information.
The Package Document represents a significant improvement on a typical Web site. A Web site, for example, embeds references to its resources within its content, which, while a simple and flexible means of identifying resources, makes it difficult to enumerate all the resources required to render it. In addition, there is no standard way for a Web site to define that a sequence of pages makes up a larger publication, which is precisely what EPUB's spine element does (i.e., it provides an external declarative means to explicitly specify navigation through a collection of documents). Finally, the Package Document defines a standard way to represent metadata globally applicable to a collection of pages.
The Package Document and other Publication-level constructs are specified in [Publications30].
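To make the relationship between the resource list and the reading order concrete, the following Python sketch resolves a spine against a manifest. It assumes the `manifest`/`item` and `spine`/`itemref` structure and the `http://www.idpf.org/2007/opf` namespace defined in [Publications30]; the sample package, its file names and the function name `reading_order` are hypothetical, and the sample omits the metadata a real Package Document requires.

```python
import xml.etree.ElementTree as ET

OPF_NS = {"opf": "http://www.idpf.org/2007/opf"}

# Hypothetical, abbreviated Package Document (metadata omitted for brevity).
SAMPLE_PACKAGE = """<package xmlns="http://www.idpf.org/2007/opf" version="3.0" unique-identifier="pub-id">
  <manifest>
    <item id="nav" href="nav.xhtml" media-type="application/xhtml+xml" properties="nav"/>
    <item id="c1" href="chapter1.xhtml" media-type="application/xhtml+xml"/>
    <item id="c2" href="chapter2.xhtml" media-type="application/xhtml+xml"/>
    <item id="css" href="style.css" media-type="text/css"/>
  </manifest>
  <spine>
    <itemref idref="c1"/>
    <itemref idref="c2"/>
  </spine>
</package>"""


def reading_order(package_xml):
    """Return manifest hrefs in the order given by the spine."""
    root = ET.fromstring(package_xml)
    # The manifest enumerates every resource, keyed here by its id.
    hrefs = {item.get("id"): item.get("href")
             for item in root.findall("opf:manifest/opf:item", OPF_NS)}
    # The spine lists ids only; each idref is resolved via the manifest.
    return [hrefs[ref.get("idref")]
            for ref in root.findall("opf:spine/opf:itemref", OPF_NS)]


print(reading_order(SAMPLE_PACKAGE))
```

Note that the style sheet and the navigation file appear in the manifest but not in the spine: the manifest is the exhaustive list, while the spine selects and orders the documents to be read linearly.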
The new EPUB Canonical Fragment Identifier (epubcfi) Specification [EPUBCFI] defines a standardized method for linking into a Publication.
Required support for this scheme in Reading Systems means that EPUB now has an interoperable linking mechanism, one that can, for example, facilitate the sharing of bookmarks and reading locations across devices.
EPUB Publications provide a rich array of options for adding Publication metadata. The Package Document includes a dedicated metadata section [Publications30] for general information about the Publication, allowing titles, authors, identifiers and other information about the Publication to be easily accessed. It also provides the means to attach complete bibliographic records to a Publication using the link element [Publications30].
The Package Document also allows a Unique Identifier to be established for a Publication using the unique-identifier attribute [Publications30]. The required last-modified date in the Package metadata section can be joined with this identifier to define a Package Identifier, which provides a means of distinguishing EPUB Publications that represent different versions of the same Manifestation (see Publication Identifiers [Publications30]). The Package Identifier addresses the issue of how to release a Publication without changing its Unique Identifier while still identifying it as a new version.
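The following Python sketch shows one way the two values could be read from a Package Document and compared. It assumes, per [Publications30], that the last-modified date is carried in a `meta` element with the property `dcterms:modified` and that the two values are joined with an `@` character; the sample identifiers, dates and function names are hypothetical.

```python
import xml.etree.ElementTree as ET

NS = {
    "opf": "http://www.idpf.org/2007/opf",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Hypothetical package metadata; %s is filled with a last-modified date.
SAMPLE = """<package xmlns="http://www.idpf.org/2007/opf" version="3.0" unique-identifier="pub-id">
  <metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
    <dc:identifier id="pub-id">urn:uuid:A1B0D67E-2E81-4DF5-9E67-A64CBE366809</dc:identifier>
    <meta property="dcterms:modified">%s</meta>
  </metadata>
</package>"""


def package_identifier(package_xml):
    """Return (unique_identifier, last_modified) for a Package Document."""
    root = ET.fromstring(package_xml)
    uid_ref = root.get("unique-identifier")
    unique_id = None
    for ident in root.findall("opf:metadata/dc:identifier", NS):
        if ident.get("id") == uid_ref:
            unique_id = (ident.text or "").strip()
    modified = None
    for meta in root.findall("opf:metadata/opf:meta", NS):
        # Only the publication-level date counts, not one that refines
        # another metadata item.
        if meta.get("property") == "dcterms:modified" and meta.get("refines") is None:
            modified = (meta.text or "").strip()
    if unique_id is None or modified is None:
        raise ValueError("missing unique identifier or modification date")
    return unique_id, modified


def format_identifier(identifier):
    """Join the two parts with '@' (assumed from [Publications30])."""
    return "%s@%s" % identifier


def compare(first, second):
    """Classify two Package Identifiers."""
    if first == second:
        return "identical"
    if first[0] == second[0]:
        return "different versions"
    return "unrelated"


old = package_identifier(SAMPLE % "2011-01-01T12:00:00Z")
new = package_identifier(SAMPLE % "2011-06-15T09:30:00Z")
print(format_identifier(old), compare(old, new))
```

Keeping the two parts separate until display makes the version comparison a simple check on the first element, which is the property the Package Identifier is meant to provide.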
XHTML Content Documents also include the means of annotating document markup with rich metadata, making them more semantically meaningful and useful both for processing and accessibility purposes (Semantic Inflection [ContentDocs30]).
Every EPUB Publication contains one or more EPUB Content Documents, as defined in [ContentDocs30]. These are XHTML or SVG documents that describe the readable content of a Publication and reference associated media resources (e.g., images, audio and video clips).
XHTML Content Documents are defined by a profile of HTML5 that requires the use of XML serialization [HTML5] in order to ensure that content can be reliably manipulated and rendered. This profile also adds two additional EPUB-specific language constructs: the epub:type attribute [ContentDocs30] for element-level metadata and the epub:trigger element [ContentDocs30] for declaratively associating controls with multimedia elements.
These additions do not affect the ability of an HTML5 User Agent [HTML5] to render EPUB XHTML Content Documents, but Publications might not render identically in all User Agents depending on their support.
A key concept of EPUB is that content presentation should adapt to the User rather than the User having to adapt to a particular presentation of content. HTML was originally designed to support dynamic rendering of structured content, but over time HTML as supported in Web browsers has become focused on the needs of Web applications, and most popular Web sites now have fixed-format layouts.
EPUB Publications, however, are designed to maximize accessibility for the visually impaired, and Reading Systems typically perform text line layout and pagination on the fly, adapting to the size of the display area, the User's preferred font size, and other environmental factors. This behavior is not guaranteed in EPUB; images, vector graphics, video and other non-reflowable content may be included, and some Reading Systems might not paginate on the fly, or at all. Nevertheless, supporting dynamic adaptive layout and accessibility has been a primary design consideration throughout the evolution of the EPUB standard.
EPUB Content Documents may optionally reference EPUB Style Sheets, allowing Authors to define the desired rendering properties. EPUB 3 defines a profile of CSS based on CSS 2.1 [CSS2.1] for this purpose, together with capabilities defined by various CSS3 Modules and several additional properties specific to EPUB.
CSS3 properties were selected based on their current level of support in Web browsers, but support for them in Reading Systems and User Agents is not guaranteed (EPUB-defined properties may similarly be ignored).
EPUB 3 also supports CSS styles that enable both horizontal and vertical layout and both left-to-right and right-to-left writing, but Reading Systems might not support all of these capabilities. Reading Systems may also support different rendering options than the Author intended. Refer to CSS in the Global Language Support section for more information.
EPUB 3 also supports the ability to include multiple style sheets that allow users, for example, to select between day/night reading modes or to change the rendering direction of the text. Refer to Alternate Style Tags [ContentDocs30] for more information.
EPUB 3 supports audio and video embedded in Content Documents via the new [HTML5] audio and video elements, inheriting all the functionality and features these elements provide. (For information on supported audio formats, please refer to Core Media Types [Publications30]. For recommendations on embedding video, refer to Reading System Conformance [Publications30].)
Another key new multimedia feature in EPUB 3 is the inclusion of Media Overlay Documents [MediaOverlays30]. When pre-recorded narration is available for a Publication, Media Overlays provide the ability to synchronize that audio with the text of a Content Document (see also Aural Renditions and Media Overlays).
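As an illustration of what such synchronization data looks like to a program, the following Python sketch extracts text and audio pairings from an overlay. It assumes the SMIL-based structure defined in [MediaOverlays30], in which each `par` element pairs a `text` reference with an `audio` clip; the sample file names, fragment identifiers, clip times and the function name `sync_points` are hypothetical.

```python
import xml.etree.ElementTree as ET

SMIL = "http://www.w3.org/ns/SMIL"
SMIL_NS = {"smil": SMIL}

# Hypothetical overlay pairing two sentences with two audio clips.
SAMPLE_OVERLAY = """<smil xmlns="http://www.w3.org/ns/SMIL" version="3.0">
  <body>
    <par id="par1">
      <text src="chapter1.xhtml#sentence1"/>
      <audio src="chapter1_audio.mp3" clipBegin="0s" clipEnd="10s"/>
    </par>
    <par id="par2">
      <text src="chapter1.xhtml#sentence2"/>
      <audio src="chapter1_audio.mp3" clipBegin="10s" clipEnd="20s"/>
    </par>
  </body>
</smil>"""


def sync_points(overlay_xml):
    """Return (text_src, audio_src, clip_begin, clip_end) for each par."""
    root = ET.fromstring(overlay_xml)
    points = []
    for par in root.iter("{%s}par" % SMIL):
        text = par.find("smil:text", SMIL_NS)
        audio = par.find("smil:audio", SMIL_NS)
        if text is None or audio is None:
            continue  # skip pairings this sketch cannot interpret
        points.append((text.get("src"), audio.get("src"),
                       audio.get("clipBegin"), audio.get("clipEnd")))
    return points


for point in sync_points(SAMPLE_OVERLAY):
    print(point)
```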
EPUB 3 supports two closely-related font formats — OpenType [OpenType] and WOFF [WOFF] — to accommodate both traditional publishing workflows and emerging Web-based workflows. Word processing programs used to create Publications are likely to have access only to a collection of installed OpenType fonts, for example, whereas Web-archival EPUB generators will likely only have access to WOFF resources (which cannot be converted to OpenType without undesirable, and potentially unlicensed, stripping of WOFF metadata).
EPUB 3 also supports both obfuscated and regular font resources for both OpenType and WOFF font formats. Support for obfuscated font resources is required to accommodate font licensing restrictions for many commercially-available fonts.
EPUB strives to treat content declaratively — as data that can be manipulated, not programs that must be executed — but does support scripting as defined in HTML5 and SVG (refer to Scripted Content Documents [ContentDocs30] for more information).
It is important to note, however, that scripting support is optional for Reading Systems and may be disabled for security reasons.
Authors should also note that scripting in an EPUB Publication can create security considerations that are different from scripting within a Web browser. For example, typical same-origin policies are not applicable to content that has been downloaded to a User's local system. Therefore, it is strongly encouraged that scripting be limited to container constrained contexts, as further described in Scripted Content Documents — Content Conformance [ContentDocs30].
Scripting consequently should be used only when essential to the User experience, since it greatly increases the likelihood that content will not be portable across all Reading Systems and creates barriers to accessibility and content reusability.
EPUB 3 provides the following text-to-speech (TTS) facilities for controlling aspects of speech synthesis, such as pronunciation, prosody and voice characteristics:
Pronunciation Lexicons
The inclusion of generic pronunciation lexicons using the W3C PLS format [PLS] enables Authors to provide pronunciation rules that apply to the entire EPUB Publication. Refer to PLS Documents [ContentDocs30] for more information.
Inline SSML Phonemes
The incorporation of SSML phonemes functionality [SSML] directly into an EPUB Content Document [ContentDocs30] enables fine-grained pronunciation control, taking precedence over default pronunciation rules and/or referenced pronunciation lexicons (as provided by the PLS format mentioned above). Refer to SSML Attributes [ContentDocs30] for more information.
CSS Speech Features
The inclusion of a select set of features from the CSS 3 Speech Module [CSS3Speech] (previously known as CSS 2.1 Aural Stylesheets [CSS2.1]) enables Authors to control further speech synthesis characteristics. Refer to CSS 3.0 Speech [ContentDocs30] for more information.
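The inline phoneme facility can be pictured with a short Python sketch that collects pronunciation hints from a Content Document. It assumes the `ssml:ph` and `ssml:alphabet` attributes in the `http://www.w3.org/2001/10/synthesis` namespace described in SSML Attributes [ContentDocs30]; the sample markup, the phonetic string and the function name `inline_phonemes` are hypothetical, and the sketch reads `ssml:alphabet` only from the element itself rather than from any ancestor.

```python
import xml.etree.ElementTree as ET

SSML = "http://www.w3.org/2001/10/synthesis"

# Hypothetical content; the phonetic string is illustrative only.
SAMPLE_XHTML = """<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:ssml="http://www.w3.org/2001/10/synthesis">
  <body>
    <p>I say <span ssml:alphabet="x-sampa" ssml:ph="t@meItoU">tomato</span>.</p>
  </body>
</html>"""


def inline_phonemes(xhtml):
    """Return (text, alphabet, phonemes) for each element carrying ssml:ph."""
    root = ET.fromstring(xhtml)
    hints = []
    for element in root.iter():
        phonemes = element.get("{%s}ph" % SSML)
        if phonemes is None:
            continue
        text = "".join(element.itertext())
        alphabet = element.get("{%s}alphabet" % SSML)
        hints.append((text, alphabet, phonemes))
    return hints


print(inline_phonemes(SAMPLE_XHTML))
```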
An EPUB Publication is transported and interchanged as a single file (a "portable document") that contains the Package Document, all Content Documents and all other required resources for processing the Publication. The single-file container format for EPUB is based on the widely adopted ZIP format. An XML manifest that specifies the location in the ZIP archive of the Package Document must be found at a well-defined location within the archive.
This approach provides a clear contract between any creator of an EPUB Publication and any system which consumes such Publications, as well as a reliable representation that is independent of network transport or file system specifics.
An EPUB Publication's representation as a container file is specified in [OCF3].
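The lookup described above can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the archive is already in memory, the well-defined location is META-INF/container.xml with the `rootfile` element and namespace defined in [OCF3], and the first `rootfile` listed is treated as the one to open. The function names and the sample container are hypothetical.

```python
import io
import xml.etree.ElementTree as ET
import zipfile

CONTAINER_NS = {"ocf": "urn:oasis:names:tc:opendocument:xmlns:container"}

# Hypothetical container.xml naming a single Package Document.
SAMPLE_CONTAINER = """<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
  <rootfiles>
    <rootfile full-path="Content/HelloWorld.opf"
              media-type="application/oebps-package+xml"/>
  </rootfiles>
</container>"""


def rootfile_paths(container_xml):
    """Return the full-path of every rootfile, in document order."""
    root = ET.fromstring(container_xml)
    return [rootfile.get("full-path")
            for rootfile in root.findall("ocf:rootfiles/ocf:rootfile", CONTAINER_NS)]


def package_document_path(epub_bytes):
    """Locate the Package Document inside an EPUB held in memory."""
    with zipfile.ZipFile(io.BytesIO(epub_bytes)) as archive:
        # The mimetype file identifies the archive as an EPUB Publication.
        if archive.read("mimetype").strip() != b"application/epub+zip":
            raise ValueError("not an EPUB container")
        paths = rootfile_paths(archive.read("META-INF/container.xml"))
    if not paths:
        raise ValueError("container.xml lists no rootfile")
    return paths[0]  # assumption: the first rootfile is the one to open
```

Because the lookup depends only on names inside the archive, the same code works whether the bytes came from disk, a network response or a database, which is the transport independence the container format is designed to provide.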
EPUB 3 supports alternate representations of all text metadata items in the package metadata section to improve global distribution of Publications. The alternate-script property can be combined with the xml:lang attribute to include and identify alternate script renditions of language-specific metadata.
Using this property, a Japanese Publication could, for example, include an alternate Roman-script representation of the author's name and/or one or more representations of the title in Romance languages. Refer to the alternate-script property [Publications30] for more information.
The page-progression-direction attribute also allows the content flow direction to be globally specified for all Content Documents to facilitate rendering (see the page-progression-direction attribute [Publications30]).
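The following Python sketch shows how these two pieces of package-level information might be read. It assumes the [Publications30] pattern in which a `meta` element with `property="alternate-script"` points at another metadata item through a `refines` attribute, and in which page-progression-direction is set on the `spine` element; the sample metadata and function names are hypothetical, and treating a missing attribute as `default` is likewise an assumption of this sketch.

```python
import xml.etree.ElementTree as ET

NS = {"opf": "http://www.idpf.org/2007/opf"}
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Hypothetical metadata for a Japanese Publication.
SAMPLE = """<package xmlns="http://www.idpf.org/2007/opf" version="3.0" unique-identifier="pub-id">
  <metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
    <dc:creator id="creator" xml:lang="ja">村上 春樹</dc:creator>
    <meta refines="#creator" property="alternate-script" xml:lang="en">Haruki Murakami</meta>
  </metadata>
  <spine page-progression-direction="rtl"/>
</package>"""


def alternate_scripts(package_xml, element_id):
    """Map language tag to alternate rendition for one metadata item."""
    root = ET.fromstring(package_xml)
    renditions = {}
    for meta in root.findall("opf:metadata/opf:meta", NS):
        if (meta.get("property") == "alternate-script"
                and meta.get("refines") == "#" + element_id):
            renditions[meta.get(XML_LANG)] = (meta.text or "").strip()
    return renditions


def page_progression(package_xml):
    """Return the global flow direction declared on the spine."""
    spine = ET.fromstring(package_xml).find("opf:spine", NS)
    if spine is None:
        return "default"
    return spine.get("page-progression-direction", "default")


print(alternate_scripts(SAMPLE, "creator"), page_progression(SAMPLE))
```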
XHTML Content Documents leverage the new HTML5 directionality features to improve support for bidirectional content rendering: the bdi element allows an instance of directional text to be isolated from the surrounding content, the bdo element allows directionality to be overridden for its child content and the dir attribute allows the directionality of any element to be explicitly set.
XHTML Content Documents also support ruby annotations as a pronunciation aid. Because the Navigation Document is itself an XHTML Content Document, ruby is available in its links as well.
SVG Content Documents support the rendering of bidirectional text, but do not include support for ruby.
EPUB 3's support for new CSS3 modules enables typography for many different languages and cultures. Some specific enhancements include:
support for vertical writing, which also provides Reading Systems the ability to allow users to toggle direction;
better handling of emphasis, such as the inclusion of bōten;
better control over line breaking, so that breaks can occur at the character level for languages that do not use spaces to delimit new words; and
better control over hyphenation, to further facilitate line breaking.
EPUB 3 does not require that Reading Systems come with any particular set of built-in system fonts. As occurs in Web contexts, Users in a particular locale may have installed fonts that omit characters required for other locales, and Reading Systems may utilize intrinsic fonts or font engines that do not utilize operating system installed fonts. As a result, the text content of a Publication might not natively render as intended on all Reading Systems.
To address this problem, EPUB 3 supports the embedding of fonts to facilitate the rendering of text content, and this practice is recommended in order to ensure content is rendered as intended.
Support for embedded fonts also ensures that Publication-specific characters and glyphs can be embedded for proper display.
EPUB 3's support for PLS documents and SSML attributes increases the pronunciation control that Authors have over the rendering of any natural language in text-to-speech-enabled Reading Systems. Refer to Text-to-speech in the Features section for more information on these capabilities.
The combination of CSS Speech and inline SSML phonemes also allows fine control over ruby.
The OCF container format supports UTF-8, allowing for internationalized file and directory naming of content resources.
A major goal of EPUB is to facilitate content accessibility, and a variety of features in EPUB 3 support this requirement. This section reviews these features, detailing some established best practices for ensuring that EPUB Publications are accessible where applicable.
It is important to note that while accessibility is important in its own right, accessible content is also more valuable content: an accessible Publication will be adaptable to more devices and be easier to reuse, in whole or in part, via human and automated workflows. The EPUB Working Group strongly recommends that Authors use EPUB tools that generate accessible content.
HTML5 supports a number of new elements intended to make markup more semantically meaningful (e.g., section, nav, aside) and introduces more clearly defined semantics for some HTML4 elements. These elements, in conjunction with best practices for authoring well-structured Web content, should be utilized when creating EPUB XHTML Content Documents. These additions allow content to be better grouped and defined, both for representing the structure of documents and to facilitate their logical navigation. XHTML Content Documents also natively support the inclusion of ARIA role and state attributes and events, enhancing the ability of Assistive Technologies to interact with the content.
EPUB 3 further introduces the epub:type [ContentDocs30] attribute, which is meant to be functionally equivalent to the W3C Role Attribute [Role]. This attribute allows any element in an XHTML Content Document to include additional information about its purpose and meaning within the work, using controlled vocabularies and terms. Refer to Semantic Inflection [ContentDocs30] for more information.
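A short Python sketch illustrates how a processing tool might read these annotations. It assumes the `epub` prefix is bound to the `http://www.idpf.org/2007/ops` namespace and that the attribute value is a whitespace-separated list of terms, as described in Semantic Inflection [ContentDocs30]; the sample markup and the function name `semantic_inflections` are hypothetical.

```python
import xml.etree.ElementTree as ET

EPUB_TYPE = "{http://www.idpf.org/2007/ops}type"

# Hypothetical content using two structural terms.
SAMPLE_XHTML = """<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:epub="http://www.idpf.org/2007/ops">
  <body>
    <section epub:type="chapter">
      <h1>Chapter 1</h1>
      <aside epub:type="footnote"><p>A note.</p></aside>
    </section>
  </body>
</html>"""


def semantic_inflections(xhtml):
    """Return (element_name, [terms]) for each element with epub:type."""
    root = ET.fromstring(xhtml)
    found = []
    for element in root.iter():
        value = element.get(EPUB_TYPE)
        if value:
            local_name = element.tag.split("}", 1)[-1]  # drop the namespace
            found.append((local_name, value.split()))
    return found


print(semantic_inflections(SAMPLE_XHTML))
```

Because XHTML Content Documents use the XML serialization, a generic XML parser is enough here; no HTML-specific parsing is needed to recover the annotations.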
The design center of EPUB is dynamic layout: content is typically intended to be formatted on the fly rather than being typeset in a paginated manner in advance (i.e., expecting a particular sized "page"). This core capability is useful, for example, for optimizing rendering onto different sized device screens or window sizes, and it facilitates and simplifies content accessibility.
While it is possible to incorporate more highly formatted content in EPUB — for example via bitmap images or SVG graphics, or even use of CSS explicit positioning and/or table elements to achieve particular visual layouts — Authors are strongly discouraged from utilizing such techniques. They are not reliable in EPUB since many Reading Systems render content in a paginated manner rather than creating a single scrolling Viewport and since each Reading System may define its own pagination algorithm. If these techniques are required to convey the content of the publication (for example, for graphic novels), fallbacks [Publications30] should always be included.
In general, it is preferable to achieve visual richness by using EPUB Style Sheets without absolute sizing or positioning.
Aural renditions of content are important for accessibility and are a desirable feature for many other Users. A baseline to facilitate aural rendering is to utilize semantic HTML designed for dynamic layout. Refer to Text-to-speech for more information on how to use the native facilities that EPUB XHTML Documents include.
Media Overlays provide the ability to synchronize the text and audio content of a Publication, a feature already familiar to readers of DAISY Digital Talking Books. Overlays are useful beyond the accessibility domain: the synchronization of text and audio can, for example, serve as a tool for learning to read.
Not all formats are accessible in their native format, and not all Users prefer to read in the default format provided. EPUB defines a variety of means for providing fallbacks so that alternate renditions of a Publication can be made available in these cases.
Publication and content-level fallbacks are defined in Restrictions and Fallbacks [Publications30]. These allow for the alternate rendition of specific resources within a Publication, such as SVG images or video clips.
In addition, multiple instances of a complete work can be delivered in a single Publication by defining multiple rootfile elements in the OCF container file (as described in Container – META-INF/container.xml [OCF3]). This kind of fallback may be used, for example, so that a formatted graphic novel defined via a sequence of SVG pages can be accompanied by an accessible text version defined via XHTML.
EPUB 3 adopts a progressive enhancement approach for scripted content, whereby scripting must not interfere with the integrity of the document (i.e., must not result in information loss when scripting is not available). Consequently, although documents that do employ scripting may provide fallbacks [ContentDocs30] to further facilitate access to their contents, the documents must be accessible without them.
Several mechanisms in EPUB can further minimize and constrain scripting within Publications to improve accessibility:
The declarative trigger element [ContentDocs30] added to the EPUB HTML5 profile enables image or textual elements to act as controls for audio and video playback (for example, to start, stop and pause playback). This element eliminates the common use of scripting to include similar functionality.
The mediaType element [Publications30] provides a means of encapsulating script-based support for rendering custom XML vocabularies or other custom content types, and future-proofs Publications in case such content types are natively supported in future Reading Systems.
The semantic inflection capability provided by the epub:type attribute [ContentDocs30] enables Authors to provide hints to Reading Systems about content properties. One use case is to define elements such as images and video as having a zoomable property value, in which case a Reading System may provide a means for Users to access an expanded view that is out-of-line with the normal layout. Such rollover effects are typically implemented via scripting in Web contexts, but scripting cannot be readily implemented given the wide variety of layouts that a Reading System may generate.
The switch element [ContentDocs30] provides a declarative means for Authors to tailor the content displayed to Users without having to resort to scripted solutions.
Best practices for accessible scripting in Web documents, such as provided in [WAI-ARIA], should always be consulted, and use of scripting should be reserved for situations in which interactivity is critical to the User experience.
This appendix is informative
EPUB Publication
A logical document entity consisting of a set of interrelated resources and packaged in an EPUB Container, as defined by the EPUB 3 specifications.
Publication Resource
A resource that contains content or instructions that contribute to the logic and rendering of the EPUB Publication. In the absence of this resource, the Publication might not render as intended by the Author. Examples of Publication Resources include the Package Document, EPUB Content Documents, EPUB Style Sheets, audio, video, images, embedded fonts and scripts.
With the exception of the Package Document itself, Publication Resources must be listed in the manifest [Publications30] and must be bundled in the EPUB container file unless specified otherwise in Publication Resource Locations [Publications30].
Examples of resources that are not Publication Resources include those identified by the Package Document link [Publications30] element and those identified in outbound hyperlinks that resolve outside the EPUB Container (e.g., referenced from an [HTML5] a element href attribute).
EPUB Content Document
A Publication Resource that conforms to one of the EPUB Content Document definitions (XHTML or SVG).
An EPUB Content Document is a Core Media Type, and may therefore be included in the EPUB Publication without the provision of fallbacks [Publications30].
XHTML Content Document
An EPUB Content Document conforming to the profile of [HTML5] defined in XHTML Content Documents [ContentDocs30].
XHTML Content Documents use the XHTML syntax of [HTML5].
SVG Content Document
An EPUB Content Document conforming to the constraints expressed in SVG Content Documents [ContentDocs30].
EPUB Navigation Document
A specialization of the XHTML Content Document, containing human- and machine-readable global navigation information, conforming to the constraints expressed in EPUB Navigation Documents [ContentDocs30].
Core Media Type
A set of Publication Resource types for which no fallback is required. Refer to Publication Resources [Publications30] for more information.
Package Document
A Publication Resource carrying bibliographical and structural metadata about the EPUB Publication, as defined in Package Documents [Publications30].
Manifestation
The digital (or physical) embodiment of a work of intellectual content. Changes to the content such as significant revision, abridgement, translation, or the realization of the content in a different digital or physical form result in a new manifestation. There may be many individual but identical copies of a manifestation, termed 'instances' or 'items'. The ISBN is an example of a manifestation identifier, and is shared by all instances of that manifestation.
All instances of a manifestation need not be bit-for-bit identical, as minor corrections or revisions are not judged to create a new manifestation or work.
Unique Identifier
The Unique Identifier is the primary identifier for an EPUB Publication, as identified by the unique-identifier attribute. The Unique Identifier may be shared by one or many Manifestations of the same work that conform to the EPUB standard and embody the same content, where the differences between the Manifestations are limited to those changes that take account of differences between EPUB Reading Systems (and which themselves may require changes in the ISBN).
The Unique Identifier is less granular than the ISBN. However, significant revision, abridgement, etc. of the content requires a new Unique Identifier.
Package Identifier
The Package Identifier allows any instance of an EPUB Publication to be compared against another to determine if they are identical, different versions of the same Manifestation, or unrelated.
Refer to Package Identifier [Publications30] for more information.
Media Overlay Document
An XML document that associates the XHTML Content Document with pre-recorded audio narration in order to provide a synchronized playback experience, as defined in [MediaOverlays30].
EPUB Style Sheet
A CSS Style Sheet conforming to the CSS profile defined in EPUB Style Sheets [ContentDocs30].
Viewport
The region of an EPUB Reading System in which the content of an EPUB Publication is rendered visually to a User.
EPUB Container
The ZIP-based packaging and distribution format for EPUB Publications defined in [OCF3].
Author
The person(s) or organization responsible for the creation of an EPUB Publication, which is not necessarily the creator of the content and resources it contains.
User
An individual that consumes an EPUB Publication using an EPUB Reading System.
EPUB Reading System
A system that processes EPUB Publications for presentation to a User in a manner conformant with the EPUB 3 specifications.
This appendix is informative
EPUB has been developed by the International Digital Publishing Forum in a cooperative effort, bringing together publishers, vendors, software developers, and experts in the relevant standards.
The EPUB 3 specifications were prepared by the International Digital Publishing Forum’s EPUB Maintenance Working Group, operating under a charter approved by the membership in May, 2010 under the leadership of:
Active members of the working group at the time of publication of revision 3.0 were:
IDPF Members
Invited Experts/Observers
Version 2.0.1 of this specification was prepared by the International Digital Publishing Forum’s EPUB Maintenance Working Group under the leadership of:
Active members of the working group at the time of publication of revision 2.0.1 were:
Version 1.0 of this specification was prepared by the International Digital Publishing Forum’s Unified OEBPS Container Format Working Group under the leadership of:
Active members of the working group at the time of publication of revision 1.0 were:
[CSS2.1] Cascading Style Sheets Level 2 Revision 1 (CSS 2.1) Specification. 7 June 2011.
[CSS3Speech] CSS3 Speech Module.
[ContentDocs30] EPUB Content Documents 3.0.
[MediaOverlays30] EPUB Media Overlays 3.0.
[OCF3] Open Container Format 3.0.
[OPS2] Open Publication Structure 2.0.1.
[OpenType] ISO/IEC 14496-22:2009 - Information technology -- Coding of audio-visual objects -- Part 22: Open Font Format.
[PLS] Pronunciation Lexicon Specification 1.0 (PLS). 14 October 2008.
[Publications30] EPUB Publications 3.0.
[SSML] Speech Synthesis Markup Language (SSML) Version 1.1. 7 September 2010.
[SVG] Scalable Vector Graphics (SVG) 1.1 (Second Edition). 9 June 2011.
[WAI-ARIA] Accessible Rich Internet Applications (WAI-ARIA) 1.0.
[WOFF] WOFF File Format 1.0.
[EPUB3Changes] EPUB 3 Changes from EPUB 2.0.1.
[Role] Role Attribute. An attribute to support the role classification of elements. 5 August 2010.