Working Group Draft 15 February 2011
Copyright © 2010, 2011 International Digital Publishing Forum™
All rights reserved. This work is protected under Title 17 of the United States Code. Reproduction and dissemination of this work with changes is prohibited except with the written permission of the International Digital Publishing Forum (IDPF).
EPUB is a registered trademark of the International Digital Publishing Forum.
This specification, EPUB Open Container Format (OCF) 3.0, defines a file format and processing model for encapsulating the set of related resources that comprise an EPUB® Publication — including any alternate renditions — into a single-file container. It also defines a standard method for obfuscating embedded fonts for those EPUB publications that require this functionality.
This specification is one of a family of related specifications that compose EPUB 3, the third major revision of an interchange and delivery format for digital publications based on XML and Web Standards. It is meant to be read and understood in concert with the other specifications that make up EPUB 3:
The EPUB 3 Overview [EPUB3Overview], which should be read first, provides an informative overview of EPUB and a roadmap to the rest of the EPUB 3 documents.
EPUB Publications 3.0 [Publications30], which defines publication-level semantics and overarching conformance requirements for EPUB Publications.
EPUB Content Documents 3.0 [ContentDocs30], which defines profiles of XHTML, SVG and CSS for use in the context of EPUB Publications.
EPUB Media Overlays 3.0 [MediaOverlays30], which defines a format and a processing model for synchronization of text and audio.
This specification supersedes Open Container Format (OCF) 2.0.1 [OCF2]. Refer to [EPUB3Changes] for information on differences between this specification and its predecessor.
A logical document entity consisting of a set of interrelated resources and packaged in an EPUB Container, as defined by this specification and its sibling specifications.
A resource that contains content or instructions that contribute, directly or indirectly, to the logic and rendering of the EPUB Publication (e.g., the Package Document, EPUB Content Documents, EPUB Style Sheets, audio, video, images, embedded fonts, scripts). In the absence of this resource, the Publication cannot be rendered as intended by the Author.
Publication resources are listed in the manifest [Publications30].
A Publication Resource that conforms to one of the EPUB Content Document definitions (XHTML or SVG).
An EPUB Content Document is a Core Media Type, and may therefore be included in the EPUB Publication without the provision of fallbacks [Publications30].
An EPUB Content Document conforming to the profile of [HTML5] defined in XHTML Content Documents [ContentDocs30].
XHTML Content Documents use the XHTML syntax of [HTML5].
An EPUB Content Document conforming to the constraints expressed in SVG Content Documents [ContentDocs30].
A set of Publication Resource types for which no fallback is required. Refer to Core Media Types [Publications30] for more information.
A Publication Resource carrying bibliographical and structural metadata about the EPUB Publication, as defined in Package Documents [Publications30].
A CSS Style Sheet conforming to the CSS profile defined in EPUB Style Sheets [ContentDocs30].
A ZIP-based packaging and distribution format for an EPUB Publication, as defined in OCF Physical Container: The ZIP Container.
The person(s) or organization responsible for the creation of an EPUB Publication, which may or may not be the creator of the content and resources it contains.
A system that processes EPUB Publications for presentation to Users in a manner conformant with this specification and its sibling specifications.
The keywords "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in [RFC2119].
All sections of this specification are normative except where identified by the informative status label "This section is informative". The application of informative status to sections and appendices applies to all child content and subsections they may contain.
All examples in this specification are informative.
The term “Conforming OCF Abstract Container” indicates an OCF Abstract Container that conforms to all of the relevant conformance criteria defined in this specification.
The term “Conforming OCF ZIP Container” indicates a ZIP archive that conforms to the relevant ZIP container conformance criteria (see OCF Physical Container: The ZIP Container) and whose contents are a Conforming OCF Abstract Container.
The term “Conforming EPUB Reading System” indicates an EPUB Reading System that supports all of the mandatory features defined by this specification.
OCF collects a related set of publication resources into a predictable, machine-readable structure, encapsulated in a single file. The standardized container structure enables easy transport of, management of, and random access to, the collection.
OCF is the required container technology for EPUB publications. OCF may play a role in the following workflows:
During the preparation steps in producing an electronic publication, OCF may be used as the container format when exchanging in-progress publications between different individuals and/or different organizations.
When providing an electronic publication from publisher or conversion house to the distribution or sales channel, OCF is the recommended container format to be used as the transport format.
When delivering the final publication to an EPUB Reading System or User, OCF is the required format for the container that holds all of the assets that make up the publication.
OCF defines the rules for structuring the file collection in the abstract: the “abstract container”. It also defines the rules for the representation of this abstract container within a ZIP archive: the "physical container". The rules for ZIP physical containers build upon, and are backward compatible with, the ZIP technologies used by [ODF]. OCF also defines a standard method for obfuscating embedded fonts for those EPUB publications that require this functionality.
An OCF Abstract Container defines a file system model for the contents of the
container. The file system model uses a single common root directory for all of the
contents of the container. All (non-remote) resources for embedded publications are
located within the directory tree headed by the container’s root directory, although
no specific file system structure is mandated for this. The file system model also
includes a mandatory directory named META-INF
that is a direct
child of the root directory and which is used to store the following special
files:
container.xml
[required]
Identifies the file that is the point of entry for each embedded publication.
signatures.xml
[optional]
Contains digital signatures for various assets.
encryption.xml
[optional]
Contains information about the encryption of publication resources. (This file is required if font obfuscation is used.)
metadata.xml
[optional]
Used to store metadata about the container.
rights.xml
[optional]
Used to store information about digital rights.
manifest.xml
[optional]
A manifest of container contents, compatible with Open Document Format (ODF).
Complete conformance requirements for the various files in
META-INF
are found in META-INF.
The virtual file system for the OCF Abstract Container must have a single common root directory for all of the contents of the container.
The OCF Abstract Container must include a directory named
META-INF
at the root level of the virtual file system.
Requirements for the contents of this directory are described in META-INF.
The file name mimetype
in the root directory is reserved for
use by OCF ZIP Containers, as explained in
OCF Physical Container: The ZIP Container
.
All other files used by the publication rendition(s) within the OCF Abstract
Container may be in any location descendant from the root directory except for
mimetype
at the root level or within the
META-INF
directory.
It is recommended that the contents of individual publications be stored within dedicated directories under the root to minimize potential file name collisions in the event that multiple renditions are used.
Files within the OCF Abstract Container must reference each other via Relative IRI
References ([RFC3987] and [RFC3986]). For
example, if a file named chapter1.html
references an image file
named image1.jpg
that is located in the same directory, then
chapter1.html
might contain the following as part of its
content:
<img src="image1.jpg" alt="…" />
For Relative IRI References, the Base IRI [RFC3986] is determined by the relevant language specifications for the given file formats. For example, the CSS specification defines how relative IRI references work in the context of CSS style sheets and property declarations. Note that some language specifications reference RFCs that preceded RFC3987, in which case the earlier RFC applies for content in that particular language.
Unlike most language specifications, the Base IRIs for all files within the
META-INF/
directory use the root folder for the Abstract
Container as the default Base IRI. For example, if
META-INF/container.xml
has the following content:
<?xml version="1.0"?>
<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
    <rootfiles>
        <rootfile full-path="OEBPS/Great Expectations.opf"
            media-type="application/oebps-package+xml" />
    </rootfiles>
</container>
then the path OEBPS/Great Expectations.opf
is relative to the
root directory for the OCF Abstract Container and not relative to the
META-INF/
directory.
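The two resolution rules above can be illustrated with a short Python sketch. The function name resolve_in_container is hypothetical, and the sketch assumes the common case in which the Base IRI of a file outside META-INF/ is that file's own location; the relevant language specification remains the authority. It handles plain relative paths only, not percent-encoding, queries, fragments, or absolute IRIs.

```python
import posixpath


def resolve_in_container(referencing_file, reference):
    """Resolve a relative reference to a path from the container root.

    Hypothetical helper: both arguments are "/"-separated paths.
    """
    if referencing_file.startswith("META-INF/"):
        # Files in META-INF/ use the container root as their Base IRI.
        base_dir = ""
    else:
        # Assumption: other files resolve against their own directory.
        base_dir = posixpath.dirname(referencing_file)
    return posixpath.normpath(posixpath.join(base_dir, reference))
```

Under these assumptions, a reference to image1.jpg from OEBPS/chapter1.html resolves to OEBPS/image1.jpg, while the full-path value in META-INF/container.xml resolves against the root and is returned unchanged.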
The term File Name represents the name of any type of file, either a directory or an ordinary file within a directory within an OCF Abstract Container. For a given directory within the OCF Abstract Container, the Path Name is a string holding all directory names in the full path concatenated together with a “/” (ASCII 0x2F) character separating the directory names. For a given file within the Abstract Container, the Path Name is the string holding all directory names concatenated together with a “/” character separating the directory names, followed by a “/” character and then the name of the file. The File Name restrictions described below are designed to allow directory names and file names to be used without modification on most commonly used operating systems. This specification does not specify how an OCF Processor that is unable to represent OCF conforming File Names would compensate for this incompatibility.
The following statements apply to Conforming OCF Abstract Containers:
File Names must be UTF-8 [Unicode] encoded.
File Names must not exceed 255 bytes.
The Path Name for any directory or file within the Abstract Container must not exceed 65535 bytes.
File Names must not use the following characters, as these characters might not be supported consistently across commonly used operating systems:
Slash: /
(ASCII 0x2F)
Double quote: "
(ASCII 0x22)
Asterisk: *
(ASCII 0x2A)
Period as the last character: .
(ASCII 0x2E)
Colon: :
(ASCII 0x3A)
Less than: <
(ASCII 0x3C)
Greater than: >
(ASCII 0x3E)
Question mark: ?
(ASCII 0x3F)
Backslash: \
(ASCII 0x5C)
File Names are case sensitive.
Two File Names within the same directory must not map to the same string following case normalization as described in section 3.13 of [Unicode]. Two File Names that differ only in case are disallowed within the same directory.
Two File Names within the same directory may map to the same string following accent normalization.
Some commercial ZIP tools do not support the full Unicode range and may only support the ASCII range for File Names. Content creators who want to use ZIP tools that have these restrictions may find it is best to restrict their File Names to the ASCII range. If the names of files cannot be preserved during the unzipping process, it will be necessary to compensate for any name translation which took place when the files are referenced by URI from within the content.
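As an informative illustration, the File Name restrictions listed above can be checked mechanically. The following Python sketch is not part of this specification; the function names are hypothetical, and it assumes that Python's str.casefold() is an acceptable stand-in for the case normalization referenced in [Unicode]. It checks single File Names only and does not cover the 65535-byte Path Name limit.

```python
# Characters that File Names must not contain, per the list above.
FORBIDDEN_CHARS = set('/"*:<>?\\')


def file_name_problems(name):
    """Return a list of restriction violations for one File Name."""
    problems = []
    if len(name.encode("utf-8")) > 255:
        problems.append("longer than 255 bytes in UTF-8")
    bad = sorted(set(name) & FORBIDDEN_CHARS)
    if bad:
        problems.append("contains forbidden characters: " + " ".join(bad))
    if name.endswith("."):
        problems.append("ends with a period")
    return problems


def case_collisions(names):
    """Group File Names from one directory that collide after case folding."""
    groups = {}
    for name in names:
        groups.setdefault(name.casefold(), []).append(name)
    return [group for group in groups.values() if len(group) > 1]
```

A producer might run file_name_problems on every entry before packaging and case_collisions once per directory.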
All valid OCF Abstract Containers must include a directory called
META-INF
at the root level of the container file system.
This directory contains the files specified below that describe the contents,
metadata, signatures, encryption, rights and other information about the contained
publication.
All files specified below must meet the conformance constraints for XML documents defined in XML Document Content Conformance [Publications30] .
Files other than the ones defined below may be included in the
META-INF
directory; OCF Processors must not fail
when encountering such files.
All valid OCF Containers must include a file called
container.xml
within the META-INF
directory at the root level of the container file system. The
container.xml
file must identify the MIME type of, and
path to, the rootfile for the EPUB rendition of the publication and any optional
alternate renditions included within the container.
The container.xml
file must not be encrypted.
The schema for container.xml
files is available in Schema for container.xml
. Conforming
container.xml
files must be valid according to this
schema after removing all elements and
attributes from other namespaces (including all child nodes of such elements).
The rootfiles
element must contain one or more
rootfile
elements, each of which must uniquely reference a
single rendition of the contained publication.
For EPUB Publications, there must be at least one
rootfile
element with a media-type
of
application/oebps-package+xml
. The target of such a
reference (a Package Document) must not be
encrypted.
A Reading System should consider the first rootfile
element
within the rootfiles
element to be the default rendition for the
contained publication.
The following example shows a sample container.xml
for an EPUB Publication with the root file OEBPS/My Crazy
Life.opf
(the OPF package file):
<?xml version="1.0"?>
<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
    <rootfiles>
        <rootfile full-path="OEBPS/My Crazy Life.opf"
            media-type="application/oebps-package+xml" />
    </rootfiles>
</container>
The following example adds an alternate PDF version of the publication:
<?xml version="1.0"?>
<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
    <rootfiles>
        <rootfile full-path="OEBPS/My Crazy Life.opf"
            media-type="application/oebps-package+xml" />
        <rootfile full-path="PDF/My Crazy Life.pdf"
            media-type="application/pdf" />
    </rootfiles>
</container>
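As an informative sketch, a Reading System might read the rootfile entries of a container.xml file along the following lines. The Python function names are hypothetical; the code uses only the standard xml.etree.ElementTree module, performs no schema validation, and silently skips elements from other namespaces, in keeping with the requirement to ignore unrecognized markup.

```python
import xml.etree.ElementTree as ET

OCF_NS = "urn:oasis:names:tc:opendocument:xmlns:container"
PACKAGE_MEDIA_TYPE = "application/oebps-package+xml"


def list_rootfiles(container_xml):
    """Return (full-path, media-type) pairs in document order.

    container_xml is the raw bytes of META-INF/container.xml.
    """
    root = ET.fromstring(container_xml)
    query = "{%s}rootfiles/{%s}rootfile" % (OCF_NS, OCF_NS)
    return [(el.get("full-path"), el.get("media-type"))
            for el in root.findall(query)]


def default_rendition(container_xml):
    """The first rootfile is treated as the default rendition."""
    rootfiles = list_rootfiles(container_xml)
    return rootfiles[0] if rootfiles else None


def has_epub_rendition(container_xml):
    """True if at least one rootfile points at a Package Document."""
    return any(media_type == PACKAGE_MEDIA_TYPE
               for _, media_type in list_rootfiles(container_xml))
```

Each full-path value returned here is relative to the root of the container, not to the META-INF directory.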
The manifest
element contained within the Package Document specifies the one and only manifest used
for EPUB processing. Ancillary manifest information contained in the ZIP archive
or in the optional manifest.xml
file must not be used for
EPUB processing purposes. Any extra files in the ZIP archive must not be used
in the processing of the EPUB publication (i.e., files within the ZIP archive that are
not listed within the package files’ manifest
element, such as
META-INF
files or alternate derived
renditions of the publication).
The value of the full-path
attribute must contain a path
component
(as defined by [RFC3986]) which must take only the form of a
path-rootless
(also defined by [RFC3986]). The path
components are relative to the root of the container in which they are
used.
Conforming OCF Processors must ignore unrecognized elements (and their
contents) and unrecognized attributes within a
container.xml
file, including unrecognized elements and
unrecognized attributes from other namespaces.
An optional encryption.xml
file within the
META-INF
directory at the root level of the container
file system holds all encryption information on the contents of the container.
This file is an XML document whose root element is encryption
. The
encryption
element contains child elements of type
EncryptedKey
and EncryptedData
as defined by [XML ENC Core]. Each EncryptedData
element describes
how one or more container files are encrypted. Consequently, if any resource
within the container is encrypted, encryption.xml
must be
present to indicate that the resource is encrypted and provide information on
how it is encrypted.
An EncryptedKey
element describes each encryption key used in the
container, while an EncryptedData
element describes each encrypted
file. Each EncryptedData
element refers to an
EncryptedKey
element, as described in XML Encryption.
The schema for encryption.xml
files is available in Schema for encryption.xml
. Conforming
encryption.xml
files must be valid according to this
schema after removing all elements and
attributes from other namespaces (including all child nodes of such elements).
When the encryption.xml
file is not present, the OCF
Abstract Container provides no information indicating any part of the container
is encrypted.
OCF encrypts individual files independently, trading off some security for improved performance, allowing the container contents to be incrementally decrypted. Encryption in this way exposes the directory structure and file naming of the whole package.
OCF uses XML Encryption [XML ENC Core] to provide a framework
for encryption, allowing a variety of algorithms to be used. XML Encryption
specifies a process for encrypting arbitrary data and representing the result in
XML. Even though an OCF Abstract Container may contain non-XML data, XML
Encryption can be used to encrypt all data in an OCF Abstract Container. OCF
encryption supports only encryption of whole files. The
encryption.xml
file, if present, must not be
encrypted.
Encrypted data replaces unencrypted data in an OCF Abstract Container. For
example, if an image named photo.jpeg
is encrypted, the
contents of the photo.jpeg
resource should be replaced by
its encrypted contents. When stored in a ZIP container, streams of data must be
compressed before they are encrypted and Deflate compression must be used. Within
the ZIP directory, encrypted files should be stored rather than
Deflate-compressed.
Some situations require obfuscating the storage of embedded fonts
referenced by an EPUB Publication to tie them to the “parent” publication and make them more
difficult to extract for unrestricted use. In these cases,
encryption.xml
should be used to provide requisite font
decoding information according to
Font Obfuscation
.
The following files must never be encrypted, regardless of whether default or specific encryption is requested:
mimetype
META-INF/container.xml
META-INF/encryption.xml
META-INF/manifest.xml
META-INF/metadata.xml
META-INF/rights.xml
META-INF/signatures.xml
EPUB rootfile (the Package Document)
Signed resources may subsequently be encrypted using the Decryption Transform for XML Signature [XML SIG Decrypt]. This feature enables an application such as an OCF agent to distinguish data that was encrypted before signing from data that was encrypted after signing. Only data that was encrypted after signing must be decrypted before computing the digest used to validate the signature.
In the following example, adapted from Section 2.2.1 of [XML ENC Core], the resource image.jpeg is encrypted using a symmetric key algorithm (AES), and the symmetric key is further encrypted using an asymmetric key algorithm (RSA) with the key named John Smith.
<encryption xmlns="urn:oasis:names:tc:opendocument:xmlns:container"
    xmlns:enc="http://www.w3.org/2001/04/xmlenc#"
    xmlns:ds="http://www.w3.org/2000/09/xmldsig#">
    <enc:EncryptedKey Id="EK">
        <enc:EncryptionMethod Algorithm="http://www.w3.org/2001/04/xmlenc#rsa-1_5"/>
        <ds:KeyInfo>
            <ds:KeyName>John Smith</ds:KeyName>
        </ds:KeyInfo>
        <enc:CipherData>
            <enc:CipherValue>xyzabc</enc:CipherValue>
        </enc:CipherData>
    </enc:EncryptedKey>
    <enc:EncryptedData Id="ED1">
        <enc:EncryptionMethod Algorithm="http://www.w3.org/2001/04/xmlenc#kw-aes128"/>
        <ds:KeyInfo>
            <ds:RetrievalMethod URI="#EK"
                Type="http://www.w3.org/2001/04/xmlenc#EncryptedKey"/>
        </ds:KeyInfo>
        <enc:CipherData>
            <enc:CipherReference URI="image.jpeg"/>
        </enc:CipherData>
    </enc:EncryptedData>
</encryption>
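As an informative sketch, a processor could list the resources that an encryption.xml file marks as encrypted and flag any that this specification says must never be encrypted. The Python names below are hypothetical. The sketch assumes that each CipherReference URI is a plain path relative to the container root and compares it literally, without percent-decoding or normalization. The Package Document path is passed in by the caller, since it is known only from container.xml.

```python
import xml.etree.ElementTree as ET

ENC_NS = "http://www.w3.org/2001/04/xmlenc#"

# Files that must never be encrypted (the Package Document is added per call).
NEVER_ENCRYPTED = {
    "mimetype",
    "META-INF/container.xml",
    "META-INF/encryption.xml",
    "META-INF/manifest.xml",
    "META-INF/metadata.xml",
    "META-INF/rights.xml",
    "META-INF/signatures.xml",
}


def encrypted_resources(encryption_xml):
    """Return the CipherReference URI of every EncryptedData element.

    encryption_xml is the raw bytes of META-INF/encryption.xml.
    """
    root = ET.fromstring(encryption_xml)
    query = "{%s}CipherData/{%s}CipherReference" % (ENC_NS, ENC_NS)
    uris = []
    for data in root.iter("{%s}EncryptedData" % ENC_NS):
        ref = data.find(query)
        if ref is not None and ref.get("URI"):
            uris.append(ref.get("URI"))
    return uris


def wrongly_encrypted(encryption_xml, package_document_path=None):
    """Return encrypted URIs that name a file that must stay unencrypted."""
    protected = set(NEVER_ENCRYPTED)
    if package_document_path:
        protected.add(package_document_path)
    return [uri for uri in encrypted_resources(encryption_xml)
            if uri in protected]
```

For the example above, encrypted_resources would report image.jpeg only, and wrongly_encrypted would report nothing.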
An optional file with the reserved name manifest.xml
may be included within the META-INF
directory at the root level of the container file system. If present, the content
of this file must be as defined in the ODF
1.0 manifest schema.
The manifest.xml
file, if present, must not be
encrypted.
An optional file with the reserved name metadata.xml
may be included within the
META-INF
directory at the root level of the container
file system. This file, if present,
must be used for container-level metadata. This version of the OCF specification does
not specify any container-level metadata.
If the META-INF/metadata.xml
file is present, its
contents should be only namespace-qualified elements [XMLNS] to avoid collision with future versions of OCF
that may specify a particular format for this file.
The metadata.xml
file, if present, must not be
encrypted.
An optional file with the reserved name rights.xml
may
be included within the META-INF
directory at the root level of the container file system. This file
is reserved for digital rights management (DRM) information for trusted exchange
of Publications among rights holders, intermediaries, and users. This version of the
OCF specification does not specify a required format for DRM information, but a future version
may specify a particular format for DRM
information.
If the META-INF/rights.xml
file is present, its contents
should be only namespace-qualified elements [XMLNS]
to avoid collision with future versions of OCF that may specify a particular
format for this file.
The rights.xml
file must not be encrypted.
When the rights.xml
file is not present, the OCF
container provides no information indicating any part of the container is rights
governed.
An optional signatures.xml
within the
META-INF
directory at the root level of the container
file system holds digital signatures of the container and its contents. This
file is an XML document whose root element is signatures
. The
signatures
element contains child elements of type
Signature
as defined by [XML DSIG Core].
Signatures can be applied to the publication and any alternate renditions as a
whole or to parts of the publication and renditions. XML Signature can specify
the signing of any kind of data, not just XML.
The signatures.xml
file must not be encrypted.
When the signatures.xml
file is not present, the OCF
container provides no information indicating any part of the container is
digitally signed at the container level. It is possible that digital signing
exists within any optional alternate contained renditions, however.
The schema for signatures.xml
files is available in Schema for signatures.xml
. Conforming
signatures.xml
files must be valid according to this
schema after removing all elements and
attributes from other namespaces (including all child nodes of such elements).
When an OCF agent creates a signature of data in a container, it should add
the new signature as the last child Signature
element of the
signatures
element in the signatures.xml
file.
Each Signature
in the signatures.xml
file identifies by IRI the data to which the signature applies, using the
XML Signature Manifest
element and its Reference
sub-elements. Individual contained files may be signed separately or
together. Separately signing each file creates a digest value for the
resource that can be validated independently. This approach may make a
Signature element larger. If files are signed together, the set of signed
files can be listed in a single XML Signature Manifest
element
and referenced by one or more Signature
elements.
Any or all files in the container can be signed in their entirety with the
exception of the signatures.xml
file since that file will
contain the computed signature information. Whether and how the
signatures.xml
file should be signed depends on the
objective of the signer.
If the signer wants to allow signatures to be added or removed from the
container without invalidating the signer’s signature, the
signatures.xml
file should not be signed.
If the signer wants any addition or removal of a signature to invalidate the
signer’s signature, the Enveloped Signature transform (defined in Section 6.6.4 of [XML DSIG Core]) can be used to
sign the entire preexisting signature file excluding the Signature
being created. This transform would sign all previous signatures, and the new signature would
become invalid if a subsequent signature were added to the package.
If the signer wants the removal of an existing signature to invalidate the signer’s signature but also wants to allow the addition of signatures, an XPath transform can be used to sign just the existing signatures. (This is only a suggestion. The particular XPath transform is not a part of the OCF specification.)
XML Signature does not associate any semantics with a signature; an agent may,
however, include semantic information, for example, by adding information to the
Signature element that describes the signature. XML Signature describes how
additional information can be added to a signature (for example, by using the
SignatureProperties
element).
The following XML expression shows the content of an example
signatures.xml
file, and is based on the examples
found in Section 2 of [XML DSIG Core]. It contains one
signature, and the signature applies to two resources,
OEBFPS/book.xml
and
OEBFPS/images/cover.jpeg
, in the container.
<signatures xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
    <Signature Id="sig" xmlns="http://www.w3.org/2000/09/xmldsig#">
        <SignedInfo>
            <CanonicalizationMethod Algorithm="http://www.w3.org/TR/2001/REC-xml-c14n-20010315"/>
            <SignatureMethod Algorithm="http://www.w3.org/2000/09/xmldsig#dsa-sha1"/>
            <Reference URI="#Manifest1">
                <DigestMethod Algorithm="http://www.w3.org/2000/09/xmldsig#sha1"/>
                <DigestValue>j6lwx3rvEPO0vKtMup4NbeVu8nk=</DigestValue>
            </Reference>
        </SignedInfo>
        <SignatureValue>…</SignatureValue>
        <KeyInfo>
            <KeyValue>
                <DSAKeyValue>
                    <P>…</P><Q>…</Q><G>…</G><Y>…</Y>
                </DSAKeyValue>
            </KeyValue>
        </KeyInfo>
        <Object>
            <Manifest Id="Manifest1">
                <Reference URI="OEBFPS/book.xml">
                    <Transforms>
                        <Transform Algorithm="http://www.w3.org/TR/2001/REC-xml-c14n-20010315"/>
                    </Transforms>
                    <DigestMethod Algorithm="http://www.w3.org/2000/09/xmldsig#sha1"/>
                    <DigestValue></DigestValue>
                </Reference>
                <Reference URI="OEBFPS/images/cover.jpeg">
                    <Transforms>
                        <Transform Algorithm="http://www.w3.org/TR/2001/REC-xml-c14n-20010315"/>
                    </Transforms>
                    <DigestMethod Algorithm="http://www.w3.org/2000/09/xmldsig#sha1"/>
                    <DigestValue></DigestValue>
                </Reference>
            </Manifest>
        </Object>
    </Signature>
</signatures>
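As an informative sketch, the resources covered by each signature can be listed by reading the Reference elements inside each XML Signature Manifest. The Python function name is hypothetical. The code only enumerates IRIs; it does not compute digests or verify any signature, and it assumes the Manifest is carried inside an Object child of the Signature, as in the example above.

```python
import xml.etree.ElementTree as ET

DS_NS = "http://www.w3.org/2000/09/xmldsig#"


def signed_resources(signatures_xml):
    """Map each Signature Id to the URIs listed in its Manifest elements.

    signatures_xml is the raw bytes of META-INF/signatures.xml.
    """
    root = ET.fromstring(signatures_xml)
    query = "{%s}Object/{%s}Manifest/{%s}Reference" % (DS_NS, DS_NS, DS_NS)
    result = {}
    for signature in root.iter("{%s}Signature" % DS_NS):
        result[signature.get("Id")] = [
            ref.get("URI") for ref in signature.findall(query)
        ]
    return result
```

For the example above, the result would map the signature with Id "sig" to the two Reference URIs in Manifest1.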
This section is informative
An OCF ZIP Container is a physical single-file manifestation of an abstract container.
An OCF ZIP Container uses the ZIP format as specified by [ZIP APPNOTE], but with the following constraints and clarifications:
› Conforming OCF ZIP Containers must not use the features in the ZIP application note that allow ZIP files to be split across multiple storage media. Conforming EPUB Reading Systems must treat any OCF files that specify that the ZIP file is split across multiple storage media as being in error.
› Conforming OCF ZIP Containers must only include uncompressed files or Deflate-compressed files within the ZIP archive. Conforming EPUB Reading Systems must treat any OCF Containers that use compression techniques other than Deflate as being in error.
› Conforming OCF ZIP Containers may use the ZIP64 extensions defined as "Version 1" in section V, subsection G of the application note at [ZIP APPNOTE] and should only use those extensions when the content requires them. Conforming OCF Processors must support the ZIP64 extensions defined as "Version 1".
› Conforming OCF ZIP Containers must not use the encryption features defined by the ZIP format; instead, encryption must be done using the features described in Encryption – META-INF/encryption.xml. Conforming EPUB Reading Systems must treat any OCF ZIP Containers that use ZIP encryption features as being in error.
› It is not a requirement that Conforming OCF
Processors preserve information from an OCF ZIP Container through load and
save operations that do not map to corresponding representation within the
OCF Abstract Container; in particular, a Conforming OCF Processor does not
have to preserve CRC values, comment fields or fields that hold file system
information corresponding to a particular operating system (e.g.,
External file attributes
and Extra
field
).
› Conforming OCF ZIP Containers must encode File System Names using UTF-8 [Unicode].
The following constraints apply to particular fields in the OCF ZIP Container archive:
› In the local file header table,
Conforming OCF ZIP Containers must set the version needed to
extract
fields to the values 10
, 20
or
45
in order to match the maximum version level needed by
the given file (e.g., 20
if Deflate is needed, 45
if ZIP64 is needed). Conforming OCF Processors must treat any other values
as being in error.
› In the local file header table, Conforming
OCF ZIP Containers must set the compression
method field to the
values 0
or 8
. Conforming OCF Processors must
treat any other values as being in error.
› Conforming OCF Processors must treat OCF ZIP
Containers with an Archive decryption header
or an
Archive extra data record
as being in error.
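As an informative sketch, several of the field constraints above can be checked with Python's standard zipfile module. The function names are hypothetical. Note one assumption: zipfile exposes values read from the ZIP central directory, not from the local file headers that the constraints name, so a strict checker would also need to parse the local headers directly. The sketch does not look for an Archive decryption header or an Archive extra data record.

```python
import zipfile

ALLOWED_METHODS = (zipfile.ZIP_STORED, zipfile.ZIP_DEFLATED)  # 0 and 8
ALLOWED_VERSIONS = (10, 20, 45)


def entry_problems(info):
    """Check one zipfile.ZipInfo entry against the field constraints."""
    problems = []
    if info.compress_type not in ALLOWED_METHODS:
        problems.append("%s: compression method %d"
                        % (info.filename, info.compress_type))
    if info.extract_version not in ALLOWED_VERSIONS:
        problems.append("%s: version needed to extract %d"
                        % (info.filename, info.extract_version))
    if info.flag_bits & 0x1:
        # Bit 0 of the general purpose flags marks a ZIP-encrypted entry.
        problems.append("%s: ZIP encryption flag is set" % info.filename)
    return problems


def container_problems(zip_file):
    """Collect problems for every entry; zip_file is a path or file object."""
    with zipfile.ZipFile(zip_file) as archive:
        return [problem
                for info in archive.infolist()
                for problem in entry_problems(info)]
```

An empty list from container_problems means only that none of these particular checks failed.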
It is frequently necessary for applications to determine the media type of a file,
which is usually accomplished by inspecting the file extension. OCF ZIP Containers
should facilitate this form of rapid determination of their format by processing
applications by using the .epub
extension.
In order to translate a file extension into a media type, a processing agent
typically will register the relationship between file extension and media type with
the operating system. Applications that are interested in OCF ZIP Container files
should register the media type application/epub+zip
as
corresponding to the file extension .epub
.
File extensions do not provide a reliable means of identifying file formats,
however. As a result, a more robust means of identifying files independent of their
file names or extensions is also necessary. The method for such identification in
ZIP archives is the inclusion of an uncompressed, unencrypted ASCII file named
mimetype
, where the contents of this file identify the
media type of the Container. OCF ZIP Containers must consequently include a
mimetype
file as the first file in the Container, and the
contents of this file must be the MIME type string
application/epub+zip
.
The contents of the mimetype
file must not contain any
leading padding or whitespace and the case of the MIME type string must be exactly
as presented above. The mimetype
file additionally must be
neither compressed nor encrypted, and there must not be an extra field in its ZIP
header.
When constructed with a conformant mimetype
file, the ZIP
Container offers convenient magic number support as described in RFC 2048 and the following will hold true:
the bytes PK
will be at the beginning of the
file;
the bytes mimetype
will be at position 30
;
and
the actual MIME type will begin at position 38
(i.e., the
ASCII string application/epub+zip
).
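As an informative sketch, the three byte positions above can be checked directly, and a container that satisfies them can be produced with Python's standard zipfile module. The function names are hypothetical. The builder assumes that zipfile writes no extra field for a small stored entry added with writestr to a seekable file, which is what places the MIME type string at position 38.

```python
import io
import zipfile

MIMETYPE = b"application/epub+zip"


def looks_like_epub(data):
    """Check the magic-number layout in the first bytes of a file."""
    return (data[0:2] == b"PK"
            and data[30:38] == b"mimetype"
            and data[38:38 + len(MIMETYPE)] == MIMETYPE)


def build_minimal_container(other_files):
    """Build ZIP bytes with an uncompressed mimetype entry written first.

    other_files maps container paths to their contents.
    """
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        # The mimetype entry must come first and must be stored, not deflated.
        archive.writestr("mimetype", MIMETYPE,
                         compress_type=zipfile.ZIP_STORED)
        for path, content in other_files.items():
            archive.writestr(path, content,
                             compress_type=zipfile.ZIP_DEFLATED)
    return buffer.getvalue()
```

A file that passes looks_like_epub is not necessarily a Conforming OCF ZIP Container; the check covers only the identification bytes.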
Since an OCF ZIP Container is fundamentally a ZIP file, commonly available ZIP tools can be used to extract any unencrypted content stream from the package. On some systems, the contents of the ZIP file may appear like any other native container (e.g., a folder). While the ability to do this is quite useful, it can pose a problem for an Author who wishes to include a third-party font. Many commercial fonts allow embedding, but embedding a font implies making it an integral part of the publication, not providing the original font file along with the content. Since integrated ZIP support is so ubiquitous in modern operating systems, simply placing the font in the ZIP archive is insufficient to signify that the font is not intended to be reused in other contexts. This uncertainty can undermine the otherwise very useful font embedding capability that OPF/OPS provides.
In order to discourage reuse of the font, some font vendors may allow use of their fonts in EPUB publications if those fonts are bound in some way to the publication. That is, the font file cannot be installed directly for use on an operating system with the built-in tools of that computing device, and it cannot be directly used by other EPUB publications. It is beyond the scope of this document to provide a digital rights management or enforcement system for font files. It instead defines a method of obfuscation that requires additional work on the part of the final OCF recipient to gain general access to any included fonts. It is the hope of the IDPF that this will meet the requirements of most font vendors. No claim is made in this document or by the IDPF, however, that this constitutes encryption, nor does it guarantee that the font file will be secure from copyright infringement. The mechanism simply provides a stumbling block for those who are unaware of the license details of the supplied font. It will not prevent a determined user from gaining full access to the font. Given the original OCF Container, it is possible to apply the algorithms described in this document to extract the raw font file. Whether this satisfies the requirements of individual font licenses remains a question for the licensor and licensee.
The algorithm employed to obfuscate the font file consists of modifying the first 1040 bytes (~1KB) of the font file. In the unlikely event that the file is shorter than 1040 bytes, the entire file will be modified. The key for the algorithm must be a 20-byte (160-bit) [SHA-1] digest of the publication's unique identifier. Details on generating this key are given in the section Generating the Obfuscation Key. To obfuscate the original data, the result of performing a logical exclusive or (XOR) on the first byte of the raw file and the first byte of the key is stored as the first byte of the embedded font file. This process is repeated with the next byte of source and key, until all bytes in the key have been used. At this point, the process continues starting with the first byte of the key and the 21st byte of the source. Once 1040 bytes have been encoded in this way (or the end of the source is reached), any remaining data in the source is directly copied to the destination. In pseudo-code, this is the algorithm:
set source to font file
set destination to obfuscated file
set keyData to key for font
set outer to 0
while outer < 52 and not (source at EOF)
    set inner to 0
    while inner < 20 and not (source at EOF)
        read 1 byte from source //Assumes read advances file position
        set sourceByte to result of read
        set keyByte to byte inner of keyData
        set obfuscatedByte to (sourceByte XOR keyByte)
        write obfuscatedByte to destination
        increment inner
    end while
    increment outer
end while
if not (source at EOF) then
    read source to EOF
    write result of read to destination
end if
To get the original font data back, the process is simply reversed. That is, the source file becomes the obfuscated data and the destination file will contain the raw font data.
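As an illustration only, the pseudo-code can be expressed compactly over in-memory bytes. This is a sketch under the assumption that the whole font fits in memory; the function name obfuscate_font is hypothetical. Because XOR is its own inverse, the same function performs both obfuscation and de-obfuscation.

```python
OBFUSCATED_LENGTH = 1040  # 52 passes over the 20-byte key
KEY_LENGTH = 20           # size of a SHA-1 digest in bytes


def obfuscate_font(data: bytes, key: bytes) -> bytes:
    """XOR the first 1040 bytes of data with the repeating key.

    Bytes beyond the first 1040 are copied unchanged. Applying the
    function twice with the same key returns the original data.
    """
    if len(key) != KEY_LENGTH:
        raise ValueError("key must be a 20-byte SHA-1 digest")
    limit = min(len(data), OBFUSCATED_LENGTH)
    head = bytes(b ^ key[i % KEY_LENGTH] for i, b in enumerate(data[:limit]))
    return head + data[limit:]


# Usage with placeholder values (not a real font or a real key).
sample_key = bytes(range(1, 21))
sample_font = bytes(range(256)) * 5  # 1280 bytes of stand-in data
obfuscated = obfuscate_font(sample_font, sample_key)
restored = obfuscate_font(obfuscated, sample_key)
print(restored == sample_font)
```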
To tie a font to a particular EPUB publication, it is necessary to bind to a unique property of that publication. Such a value is required by the EPUB Publications 3.0 specification, as detailed in Unique Identifier [Publications30]. Every compliant OPF file has a dc:identifier element which uniquely identifies the publication. The EPUB Publications 3.0 specification describes how to find this element by examining the unique-identifier attribute of the package file's package element. This element provides the required characteristic of being unique to a publication; it is not suitable for use directly as the obfuscation key, however (for instance, its length is not defined).
In order to create a suitable key that is tied to the publication, an SHA-1 digest of the unique identifier should be generated as specified by the Secure Hash Standard [SHA-1]. Before generating the digest, all whitespace characters as defined by the XML 1.0 specification [XML], section 2.3 are removed. Specifically, the Unicode code points 0x20, 0x09, 0x0D and 0x0A must be stripped from the string before the digest is computed. This digest is then directly used as the key for the algorithm described in Obfuscation Algorithm.
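The key derivation can be sketched as follows. This is an illustration rather than a normative procedure: the function name obfuscation_key is hypothetical, and the choice of UTF-8 to turn the identifier string into bytes is an assumption, since the text above does not name an encoding.

```python
import hashlib

# The four whitespace code points named above: space, tab, CR, LF.
XML_WHITESPACE = "\x20\x09\x0d\x0a"


def obfuscation_key(unique_identifier: str) -> bytes:
    """Return the 20-byte SHA-1 digest of the identifier with whitespace removed."""
    stripped = unique_identifier.translate({ord(c): None for c in XML_WHITESPACE})
    # Assumption: the identifier is encoded as UTF-8 before hashing.
    return hashlib.sha1(stripped.encode("utf-8")).digest()


key = obfuscation_key(" urn:uuid:B9B412F2-CAAD-4A44-B91F-A375068478A0\n")
print(len(key))
```

Identifiers that differ only in the stripped whitespace characters yield the same key.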
All encrypted data in an OCF Abstract Container must have an entry in the encryption.xml file accompanying the publication (see Encryption – META-INF/encryption.xml); this requirement includes fonts obfuscated using the method described here. For such obfuscated fonts, the EncryptionMethod element child of the EncryptedData element in the encryption.xml file must have an Algorithm attribute with the value http://www.idpf.org/2008/embedding. The presence of this attribute signals the use of the algorithm described in this specification. All resources that have been obfuscated using this approach must be listed in the CipherData element.
An example encryption.xml file might look like this:
<encryption xmlns="urn:oasis:names:tc:opendocument:xmlns:container"
            xmlns:enc="http://www.w3.org/2001/04/xmlenc#">
    <enc:EncryptedData>
        <enc:EncryptionMethod Algorithm="http://www.idpf.org/2008/embedding"/>
        <enc:CipherData>
            <enc:CipherReference URI="OEBPS/Fonts/BKANT.TTF"/>
        </enc:CipherData>
    </enc:EncryptedData>
</encryption>
To prevent trivial copying of the embedded font to other publications, the explicit key must not be provided in the encryption.xml file. Reading systems that implement this specification must derive the key from the package's unique identifier.
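A Reading System needs to know which resources were obfuscated before it can reverse the process. The following sketch, which is illustrative and not a required implementation, uses Python's standard xml.etree.ElementTree module to collect the CipherReference URIs of every EncryptedData entry whose Algorithm is the obfuscation identifier. The function name obfuscated_resources and the SAMPLE constant are hypothetical; the sample document mirrors the encryption.xml example above.

```python
import xml.etree.ElementTree as ET

ENC_NS = "http://www.w3.org/2001/04/xmlenc#"
OBFUSCATION_ALGORITHM = "http://www.idpf.org/2008/embedding"

SAMPLE = """
<encryption xmlns="urn:oasis:names:tc:opendocument:xmlns:container"
            xmlns:enc="http://www.w3.org/2001/04/xmlenc#">
  <enc:EncryptedData>
    <enc:EncryptionMethod Algorithm="http://www.idpf.org/2008/embedding"/>
    <enc:CipherData>
      <enc:CipherReference URI="OEBPS/Fonts/BKANT.TTF"/>
    </enc:CipherData>
  </enc:EncryptedData>
</encryption>
"""


def obfuscated_resources(encryption_xml: str) -> list:
    """Return the URIs of resources marked with the obfuscation algorithm."""
    root = ET.fromstring(encryption_xml)
    uris = []
    for encrypted in root.iter(f"{{{ENC_NS}}}EncryptedData"):
        method = encrypted.find(f"{{{ENC_NS}}}EncryptionMethod")
        if method is None or method.get("Algorithm") != OBFUSCATION_ALGORITHM:
            continue  # encrypted by some other method; not handled here
        for ref in encrypted.iter(f"{{{ENC_NS}}}CipherReference"):
            uris.append(ref.get("URI"))
    return uris


print(obfuscated_resources(SAMPLE))
```

Note that no key appears anywhere in the parsed document; the key is always derived from the package's unique identifier.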
The schemas in this Appendix are normative. In case of conflicts between the specification prose and the given schema, the schema shall be considered definitive.
container.xml
The schema for container.xml files is available at http://www.idpf.org/epub/30/schema/ocf-container-30.rnc.
encryption.xml
The schema for encryption.xml files is available at http://www.idpf.org/epub/30/schema/ocf-encryption-30.rnc.
signatures.xml
The schema for signatures.xml files is available at http://www.idpf.org/epub/30/schema/ocf-signatures-30.rnc.
The following example demonstrates the use of this OCF format to contain a signed and encrypted EPUB publication with an alternate PDF rendition within a ZIP Container.
Example B.1. Ordered list of files in the ZIP Container
mimetype
META-INF/container.xml
META-INF/signatures.xml
META-INF/encryption.xml
OEBPS/As You Like It.opf
OEBPS/book.html
OEBPS/images/cover.png
PDF/As You Like It.pdf
Example B.3. The contents of the META-INF/container.xml file
<?xml version="1.0"?>
<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
    <rootfiles>
        <rootfile full-path="OEBPS/As You Like It.opf"
                  media-type="application/oebps-package+xml" />
        <rootfile full-path="PDF/As You Like It.pdf"
                  media-type="application/pdf" />
    </rootfiles>
</container>
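To show how a processor might locate the renditions declared in container.xml, here is a small sketch using Python's standard xml.etree.ElementTree module. It is an illustration under stated assumptions, not a prescribed processing model: the function name list_rootfiles and the SAMPLE_CONTAINER constant are hypothetical, and the sample paths follow the file list in Example B.1.

```python
import xml.etree.ElementTree as ET

CONTAINER_NS = "urn:oasis:names:tc:opendocument:xmlns:container"

SAMPLE_CONTAINER = """<?xml version="1.0"?>
<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
  <rootfiles>
    <rootfile full-path="OEBPS/As You Like It.opf"
              media-type="application/oebps-package+xml" />
    <rootfile full-path="PDF/As You Like It.pdf"
              media-type="application/pdf" />
  </rootfiles>
</container>
"""


def list_rootfiles(container_xml: str) -> list:
    """Return (full-path, media-type) pairs in document order."""
    root = ET.fromstring(container_xml)
    return [
        (rootfile.get("full-path"), rootfile.get("media-type"))
        for rootfile in root.iter(f"{{{CONTAINER_NS}}}rootfile")
    ]


for path, media_type in list_rootfiles(SAMPLE_CONTAINER):
    print(media_type, path)
```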
Example B.4. The contents of the META-INF/signatures.xml file
<signatures xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
    <Signature Id="AsYouLikeItSignature" xmlns="http://www.w3.org/2000/09/xmldsig#">
        <!-- SignedInfo is the information that is actually signed. In this case -->
        <!-- the SHA1 algorithm is used to sign the canonical form of the XML -->
        <!-- documents enumerated in the Object element below -->
        <SignedInfo>
            <CanonicalizationMethod Algorithm="http://www.w3.org/TR/2001/REC-xml-c14n-20010315"/>
            <SignatureMethod Algorithm="http://www.w3.org/2000/09/xmldsig#dsa-sha1"/>
            <Reference URI="#AsYouLikeIt">
                <DigestMethod Algorithm="http://www.w3.org/2000/09/xmldsig#sha1"/>
                <DigestValue>…</DigestValue>
            </Reference>
        </SignedInfo>
        <!-- The signed value of the digest above using the DSA algorithm -->
        <SignatureValue>…</SignatureValue>
        <!-- The key to use to validate the signature -->
        <KeyInfo>
            <KeyValue>
                <DSAKeyValue>
                    <P>…</P>
                    <Q>…</Q>
                    <G>…</G>
                    <Y>…</Y>
                </DSAKeyValue>
            </KeyValue>
        </KeyInfo>
        <!-- The list of documents to sign. Note that the canonical form of XML -->
        <!-- documents is signed while the binary form of the other documents -->
        <!-- is used -->
        <Object>
            <Manifest Id="AsYouLikeIt">
                <Reference URI="OEBPS/As You Like It.opf">
                    <Transforms>
                        <Transform Algorithm="http://www.w3.org/TR/2001/REC-xml-c14n-20010315"/>
                    </Transforms>
                    <DigestMethod Algorithm="http://www.w3.org/2000/09/xmldsig#sha1"/>
                    <DigestValue></DigestValue>
                </Reference>
                <Reference URI="OEBPS/book.html">
                    <Transforms>
                        <Transform Algorithm="http://www.w3.org/TR/2001/REC-xml-c14n-20010315"/>
                    </Transforms>
                    <DigestMethod Algorithm="http://www.w3.org/2000/09/xmldsig#sha1"/>
                    <DigestValue></DigestValue>
                </Reference>
                <Reference URI="OEBPS/images/cover.png">
                    <DigestMethod Algorithm="http://www.w3.org/2000/09/xmldsig#sha1"/>
                    <DigestValue></DigestValue>
                </Reference>
                <Reference URI="PDF/As You Like It.pdf">
                    <DigestMethod Algorithm="http://www.w3.org/2000/09/xmldsig#sha1"/>
                    <DigestValue></DigestValue>
                </Reference>
            </Manifest>
        </Object>
    </Signature>
</signatures>
Example B.5. The contents of the META-INF/encryption.xml file
<?xml version="1.0"?>
<encryption xmlns="urn:oasis:names:tc:opendocument:xmlns:container"
            xmlns:enc="http://www.w3.org/2001/04/xmlenc#"
            xmlns:ds="http://www.w3.org/2000/09/xmldsig#">
    <!-- The RSA encrypted AES-128 symmetric key used to encrypt the data -->
    <enc:EncryptedKey Id="EK">
        <enc:EncryptionMethod Algorithm="http://www.w3.org/2001/04/xmlenc#rsa-1_5"/>
        <ds:KeyInfo>
            <ds:KeyName>John Smith</ds:KeyName>
        </ds:KeyInfo>
        <enc:CipherData>
            <enc:CipherValue>xyzabc…</enc:CipherValue>
        </enc:CipherData>
    </enc:EncryptedKey>
    <!-- Each EncryptedData block identifies a single document that has been -->
    <!-- encrypted using the AES-128 algorithm. The data remains stored in its -->
    <!-- encrypted form in the original file within the container. -->
    <enc:EncryptedData Id="ED1">
        <enc:EncryptionMethod Algorithm="http://www.w3.org/2001/04/xmlenc#kw-aes128"/>
        <ds:KeyInfo>
            <ds:RetrievalMethod URI="#EK" Type="http://www.w3.org/2001/04/xmlenc#EncryptedKey"/>
        </ds:KeyInfo>
        <enc:CipherData>
            <enc:CipherReference URI="OEBPS/book.html"/>
        </enc:CipherData>
    </enc:EncryptedData>
    <enc:EncryptedData Id="ED2">
        <enc:EncryptionMethod Algorithm="http://www.w3.org/2001/04/xmlenc#kw-aes128"/>
        <ds:KeyInfo>
            <ds:RetrievalMethod URI="#EK" Type="http://www.w3.org/2001/04/xmlenc#EncryptedKey"/>
        </ds:KeyInfo>
        <enc:CipherData>
            <enc:CipherReference URI="OEBPS/images/cover.png"/>
        </enc:CipherData>
    </enc:EncryptedData>
    <enc:EncryptedData Id="ED3">
        <enc:EncryptionMethod Algorithm="http://www.w3.org/2001/04/xmlenc#kw-aes128"/>
        <ds:KeyInfo>
            <ds:RetrievalMethod URI="#EK" Type="http://www.w3.org/2001/04/xmlenc#EncryptedKey"/>
        </ds:KeyInfo>
        <enc:CipherData>
            <enc:CipherReference URI="PDF/As You Like It.pdf"/>
        </enc:CipherData>
    </enc:EncryptedData>
</encryption>
Example B.6. The contents of the OEBPS/As You Like It.opf file
<?xml version="1.0"?>
<package version="2.0" xmlns="http://www.idpf.org/2007/opf" unique-identifier="Pub-ID">
    <metadata xmlns:dc="http://purl.org/dc/elements/1.1/"
              xmlns:opf="http://www.idpf.org/2007/opf">
        <dc:identifier id="Pub-ID">urn:uuid:B9B412F2-CAAD-4A44-B91F-A375068478A0</dc:identifier>
        <dc:title>As You Like It</dc:title>
        <dc:creator opf:role="aut">William Shakespeare</dc:creator>
        <dc:identifier>0-7410-1455-6</dc:identifier>
        <dc:subject/>
        <dc:type/>
        <dc:date opf:event="publication">3/24/2000</dc:date>
        <dc:date opf:event="copyright">1/1/9999</dc:date>
        <dc:identifier opf:scheme="ISBN">urn:isbn:9780741014559</dc:identifier>
        <dc:publisher>Project Gutenberg</dc:publisher>
        <dc:language>en</dc:language>
    </metadata>
    <manifest>
        <item id="4915" href="book.html" media-type="application/xhtml+xml"/>
        <item id="7184" href="images/cover.png" media-type="image/png"/>
        <item id="ncx" href="toc.ncx" media-type="application/x-dtbncx+xml"/>
    </manifest>
    <spine toc="ncx">
        <itemref idref="4915"/>
    </spine>
</package>
application/epub+zip media type
This appendix registers the media type application/epub+zip for the EPUB Open Container Format (OCF).
An OCF file is a container technology based on the ZIP archive format. It is used to encapsulate EPUB publications and optional alternate renditions thereof. OCF and its related standards are maintained and defined by the International Digital Publishing Forum (IDPF).
MIME media type name: application
MIME subtype name: epub+zip
Required parameters: None.
Optional parameters: None.
Encoding considerations: OCF files are binary files in ZIP (http://www.iana.org/assignments/media-types/application/zip) format.
Security considerations: All processors that read OCF files should rigorously check the size and validity of data retrieved.
In addition, because of the various content types that can be embedded in OCF files, it is possible that “application/epub+zip” may describe content that has security implications beyond those described here. However, only in the case where the processor recognizes and processes the additional content, or where further processing of that content is dispatched to other processors, would security issues potentially arise. And in that case, they would fall outside the domain of this registration document.
Security considerations that apply to application/zip also apply to OCF files.
Interoperability considerations: None.
Published specification: This media type registration is for the EPUB Open Container Format (OCF) as described by this specification, which is located at http://www.idpf.org/epub/30/spec/epub30-ocf.html.
This specification supersedes Open Container Format 2.0.1, which is located at http://www.idpf.org/doc_library/epub/OCF_2.0.1_draft.doc, and which also uses the application/epub+zip media type.
Applications which use this media type: This media type is in wide use for the distribution of ebooks in the EPUB format. The following list of applications is not exhaustive.
Adobe Digital Editions
Aldiko
Azardi
Apple iBooks
Barnes & Noble Nook
Calibre
Google Books
Ibis Reader
MobiPocket reader
Sony Reader
Stanza
Additional information:
Magic number(s): 0: PK, 30: mimetype, 38: application/epub+zip
File extension(s): OCF files are most often identified with the extension .epub.
Macintosh file type code(s): ZIP
Person & email address to contact for further information: William McCoy, [email protected]
Intended usage: COMMON
Author/Change controller: International Digital Publishing Forum (http://www.idpf.org)
This appendix is informative.
EPUB has been developed by the International Digital Publishing Forum in a cooperative effort, bringing together publishers, vendors, software developers, and experts in the relevant standards.
The EPUB 3 specifications were prepared by the International Digital Publishing Forum’s EPUB Maintenance Working Group, operating under a charter approved by the membership in May 2010, under the leadership of:
Markus Gylling, DAISY Consortium, Chair
Garth Conboy, Google Inc., Vice-chair
Brady Duga, Google Inc., Vice-chair
Bill McCoy, International Digital Publishing Forum (IDPF), Secretary
Active members of the working group included:
Alexis Wiles, Alicia Wise, … TODO : COMPLETE LIST OF CURRENT WG MEMBERS
For more detailed acknowledgements and information about contributors to each version of EPUB, refer to Contributors [EPUB3Overview].
[ContentDocs30] EPUB Content Documents 3.0.
[MediaOverlays30] EPUB Media Overlays 3.0.
[OCF2] Open Container Format 2.0.1.
[OCF3] Open Container Format 3.0.
[ODF] ODF Open Document Format.
[ODF10 Manifest Schema] ODF 1.0 Manifest Schema.
[Publications30] EPUB Publications 3.0.
[RFC2048] Multipurpose Internet Mail Extensions (MIME) Part Four: Registration Procedures (RFC 2048). November 1996.
[RFC2119] Key words for use in RFCs to Indicate Requirement Levels (RFC 2119). March 1997.
[RFC3986] Uniform Resource Identifier (URI): Generic Syntax (RFC 3986). January 2005.
[RFC3987] Internationalized Resource Identifiers (IRIs) (RFC 3987). January 2005.
[SHA-1] Federal Information Processing Standards Publication 180-3: Secure Hash Standard (SHS). October 2008.
[Unicode] The Unicode Consortium. The Unicode Standard, Version 5.0.0, defined by: The Unicode Standard, Version 5.0 (Boston, MA, Addison-Wesley, 2007. ISBN 0-321-48091-0).
[XML] Extensible Markup Language (XML) 1.0 (Fifth Edition). 26 November 2008.
[XML DSIG Core] XML-Signature Syntax and Processing (Second Edition). 10 June 2008.
[XML ENC Core] XML Encryption Syntax and Processing. 10 December 2002.
[XML SIG Decrypt] Decryption Transform for XML Signature. 10 December 2002.
[XMLNS] Namespaces in XML (Third Edition). 8 December 2009.
[EPUB3Changes] EPUB 3 Differences from EPUB 2.0.1.
[EPUB3Overview] EPUB 3 Overview.
[ZIP APPNOTE] ZIP File Format Specification. September 28, 2007. PKWARE, Inc.