EPUB Open Container Format (OCF) 3.0

Working Group Draft 15 February 2011

This version
http://www.idpf.org/epub/30/spec/epub30-ocf-20110215.html
Latest version
http://www.idpf.org/epub/30/spec/epub30-ocf.html
Previous version
N/A
Diffs to previous version
N/A

Editors (this version)

James Pritchett, Recording for the Blind & Dyslexic

Markus Gylling, DAISY Consortium

Editors (previous versions)

Garth Conboy, eBook Technologies

Table of Contents

1. Overview
1.1. Purpose and Scope
1.2. Terminology
1.3. Conformance
1.3.1. Conformance Statements
1.3.2. Conforming Containers
1.3.3. Conforming Processors
2. Introduction
3. OCF Abstract Container
3.1. Overview
3.2. File and Directory Structure
3.3. Relative IRIs for Referencing Other Components
3.4. File Names
3.5. META-INF
3.5.1. Container – META-INF/container.xml
3.5.2. Encryption – META-INF/encryption.xml
3.5.3. Manifest – META-INF/manifest.xml (Optional)
3.5.4. Metadata – META-INF/metadata.xml
3.5.5. Rights Management – META-INF/rights.xml
3.5.6. Digital Signatures – META-INF/signatures.xml
4. OCF Physical Container: The ZIP Container
4.1. Overview
4.2. ZIP File Requirements
4.3. OCF ZIP Container Media Type Identification
5. Font Obfuscation
5.1. Introduction
5.2. Obfuscation Algorithm
5.3. Generating the Obfuscation Key
5.4. Specifying Obfuscated Resources
A. Schemas
A.1. Schema for container.xml
A.2. Schema for encryption.xml
A.3. Schema for signatures.xml
B. Example
C. The application/epub+zip media type
D. Contributors
D.1. Acknowledgements and Contributors
References

 1 Overview

 1.1 Purpose and Scope

This specification, EPUB Open Container Format (OCF) 3.0, defines a file format and processing model for encapsulating the set of related resources that comprise an EPUB® Publication — including any alternate renditions — into a single-file container. It also defines a standard method for obfuscating embedded fonts for those EPUB publications that require this functionality.

This specification is one of a family of related specifications that compose EPUB 3, the third major revision of an interchange and delivery format for digital publications based on XML and Web Standards. It is meant to be read and understood in concert with the other specifications that make up EPUB 3:

  • The EPUB 3 Overview [EPUB3Overview], which should be read first, provides an informative overview of EPUB and a roadmap to the rest of the EPUB 3 documents.

  • EPUB Publications 3.0 [Publications30], which defines publication-level semantics and overarching conformance requirements for EPUB Publications.

  • EPUB Content Documents 3.0 [ContentDocs30], which defines profiles of XHTML, SVG and CSS for use in the context of EPUB Publications.

  • EPUB Media Overlays 3.0 [MediaOverlays30], which defines a format and a processing model for synchronization of text and audio.

This specification supersedes Open Container Format (OCF) 2.0.1 [OCF2]. Refer to [EPUB3Changes] for information on differences between this specification and its predecessor.

 1.2 Terminology

EPUB Publication (or Publication)

A logical document entity consisting of a set of interrelated resources and packaged in an EPUB Container, as defined by this specification and its sibling specifications.

Publication Resource

A resource that contains content or instructions that contribute, directly or indirectly, to the logic and rendering of the EPUB Publication (e.g., the Package Document, EPUB Content Documents, EPUB Style Sheets, audio, video, images, embedded fonts, scripts). In the absence of this resource, the Publication cannot be rendered as intended by the Author.

Publication resources are listed in the manifest [Publications30].

EPUB Content Document

A Publication Resource that conforms to one of the EPUB Content Document definitions (XHTML or SVG).

An EPUB Content Document is a Core Media Type, and may therefore be included in the EPUB Publication without the provision of fallbacks [Publications30].

XHTML Content Document

An EPUB Content Document conforming to the profile of [HTML5] defined in XHTML Content Documents [ContentDocs30].

XHTML Content Documents use the XHTML syntax of [HTML5].

SVG Content Document

An EPUB Content Document conforming to the constraints expressed in SVG Content Documents [ContentDocs30].

Core Media Type

A set of Publication Resource types for which no fallback is required. Refer to Core Media Types [Publications30] for more information.

Package Document

A Publication Resource carrying bibliographical and structural metadata about the EPUB Publication, as defined in Package Documents [Publications30].

EPUB Style Sheet (or Style Sheet)

A CSS Style Sheet conforming to the CSS profile defined in EPUB Style Sheets [ContentDocs30].

EPUB Container (or Container)

A ZIP-based packaging and distribution format for an EPUB Publication, as defined in OCF Physical Container: The ZIP Container.

Author

The person(s) or organization responsible for the creation of an EPUB Publication, which may or may not be the creator of the content and resources it contains.

EPUB Reading System (or Reading System)

A system that processes EPUB Publications for presentation to Users in a manner conformant with this specification and its sibling specifications.

 1.3 Conformance

 1.3.1 Conformance Statements

The keywords "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in [RFC2119].

All sections of this specification are normative except where identified by the informative status label "This section is informative". The application of informative status to sections and appendices applies to all child content and subsections they may contain.

All examples in this specification are informative.

 1.3.2 Conforming Containers

The term “Conforming OCF Abstract Container” indicates an OCF Abstract Container that conforms to all of the relevant conformance criteria defined in this specification.

The term “Conforming OCF ZIP Container” indicates a ZIP archive that conforms to the relevant ZIP container conformance criteria (see OCF Physical Container: The ZIP Container) and whose contents constitute a Conforming OCF Abstract Container.

 1.3.3 Conforming Processors

The term “Conforming EPUB Reading System” indicates an EPUB Reading System that supports all of the mandatory features defined by this specification.

 2 Introduction

OCF collects a related set of publication resources into a predictable, machine-readable structure, encapsulated in a single file. The standardized container structure enables easy transport of, management of, and random access to, the collection.

OCF is the required container technology for EPUB publications. OCF may play a role in the following workflows:

  • During the preparation steps in producing an electronic publication, OCF may be used as the container format when exchanging in-progress publications between different individuals and/or different organizations.

  • When providing an electronic publication from publisher or conversion house to the distribution or sales channel, OCF is the recommended container format to be used as the transport format.

  • When delivering the final publication to an EPUB Reading System or User, OCF is the required format for the container that holds all of the assets that make up the publication.

OCF defines the rules for structuring the file collection in the abstract: the “abstract container”. It also defines the rules for the representation of this abstract container within a ZIP archive: the "physical container". The rules for ZIP physical containers build upon, and are backward compatible with, the ZIP technologies used by [ODF]. OCF also defines a standard method for obfuscating embedded fonts for those EPUB publications that require this functionality.

 3 OCF Abstract Container

 3.1 Overview

An OCF Abstract Container defines a file system model for the contents of the container. The file system model uses a single common root directory for all of the contents of the container. All (non-remote) resources for embedded publications are located within the directory tree headed by the container’s root directory, although no specific file system structure is mandated for this. The file system model also includes a mandatory directory named META-INF that is a direct child of the root directory and which is used to store the following special files:

container.xml [required]

Identifies the file that is the point of entry for each embedded publication.

signatures.xml [optional]

Contains digital signatures for various assets.

encryption.xml [optional]

Contains information about the encryption of publication resources. (This file is required if font obfuscation is used.)

metadata.xml [optional]

Used to store metadata about the container.

rights.xml [optional]

Used to store information about digital rights.

manifest.xml [optional]

A manifest of container contents, compatible with Open Document Format (ODF).

Complete conformance requirements for the various files in META-INF are found in section 3.5, META-INF.

 3.2 File and Directory Structure

The virtual file system for the OCF Abstract Container must have a single common root directory for all of the contents of the container.

The OCF Abstract Container must include a directory named META-INF at the root level of the virtual file system. Requirements for the contents of this directory are described in section 3.5, META-INF.

The file name mimetype in the root directory is reserved for use by OCF ZIP Containers, as explained in OCF Physical Container: The ZIP Container.

All other files used by the publication rendition(s) within the OCF Abstract Container may be in any location descendant from the root directory, other than the root-level mimetype file or the META-INF directory.

It is recommended that the contents of individual publications be stored within dedicated directories under the root to minimize potential file name collisions in the event that multiple renditions are used.
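As an informal illustration of the placement rules in this section, the following Python sketch flags publication files that would occupy a reserved location. It assumes paths are container-relative strings using “/” separators; the function name misplaced_publication_files is hypothetical and not defined by OCF.

```python
import posixpath

RESERVED_ROOT_FILE = "mimetype"  # root-level name reserved for OCF ZIP Containers
META_INF_DIR = "META-INF"        # reserved directory for container-level files


def misplaced_publication_files(paths):
    """Return the publication file paths that fall in a reserved location.

    Hypothetical helper: `paths` are container-relative paths with "/"
    separators. File Names are case sensitive, so no case folding is done.
    """
    misplaced = []
    for path in paths:
        normalized = posixpath.normpath(path)
        top_level = normalized.split("/", 1)[0]
        if normalized == RESERVED_ROOT_FILE or top_level == META_INF_DIR:
            misplaced.append(path)
    return misplaced
```

For example, misplaced_publication_files(["OEBPS/content.opf", "mimetype", "META-INF/cover.jpg"]) returns ["mimetype", "META-INF/cover.jpg"], while OEBPS/mimetype is accepted because the reservation applies only at the root level.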

 3.3 Relative IRIs for Referencing Other Components

Files within the OCF Abstract Container must reference each other via Relative IRI References ([RFC3987] and [RFC3986]). For example, if a file named chapter1.html references an image file named image1.jpg that is located in the same directory, then chapter1.html might contain the following as part of its content:

<img src="image1.jpg" alt="…" />
                

For Relative IRI References, the Base IRI [RFC3986] is determined by the relevant language specifications for the given file formats. For example, the CSS specification defines how relative IRI references work in the context of CSS style sheets and property declarations. Note that some language specifications reference RFCs that preceded RFC3987, in which case the earlier RFC applies for content in that particular language.

Unlike the conventions of most language specifications, the default Base IRI for every file within the META-INF/ directory is the root directory of the Abstract Container, not the META-INF/ directory itself. For example, if META-INF/container.xml has the following content:

<?xml version="1.0"?>
<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
    <rootfiles>
        <rootfile full-path="OEBPS/Great Expectations.opf"
            media-type="application/oebps-package+xml" />	
    </rootfiles>
</container>
            

then the path OEBPS/Great Expectations.opf is relative to the root directory for the OCF Abstract Container and not relative to the META-INF/ directory.
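The resolution rule above can be sketched in Python as follows. This is a simplified, non-normative illustration under stated assumptions: ordinary files are assumed to use their own directory as the Base IRI (as HTML and CSS do), fragment identifiers are dropped, the path is percent-decoded, and references that carry a scheme, start with “/”, or climb above the container root are rejected. The function name resolve_in_container is hypothetical.

```python
import posixpath
from urllib.parse import unquote, urldefrag, urlsplit


def resolve_in_container(referring_file, relative_ref):
    """Resolve a Relative IRI Reference to a path from the container root.

    Hypothetical helper; `referring_file` is a container-relative path.
    """
    ref, _fragment = urldefrag(relative_ref)  # drop any "#fragment"
    if not ref:
        return referring_file                 # same-document reference
    if urlsplit(ref).scheme or ref.startswith("/"):
        raise ValueError(f"{relative_ref!r} is not a relative-path reference")
    ref = unquote(ref)                        # e.g. "%20" becomes a space
    if referring_file.startswith("META-INF/"):
        base_dir = ""                         # META-INF files: container root
    else:
        base_dir = posixpath.dirname(referring_file)
    resolved = posixpath.normpath(posixpath.join(base_dir, ref))
    if resolved == ".." or resolved.startswith("../"):
        raise ValueError(f"{relative_ref!r} points outside the container")
    return resolved
```

With this sketch, resolve_in_container("OEBPS/chapter1.html", "image1.jpg") gives "OEBPS/image1.jpg", while resolve_in_container("META-INF/container.xml", "OEBPS/Great Expectations.opf") gives "OEBPS/Great Expectations.opf" rather than a path under META-INF/.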

 3.4 File Names

The term File Name represents the name of any file within an OCF Abstract Container, whether that file is a directory or an ordinary file. For a given directory within the OCF Abstract Container, the Path Name is a string holding all directory names in the full path concatenated together with a “/” (ASCII 0x2F) character separating the directory names. For a given file within the Abstract Container, the Path Name is the string holding all directory names concatenated together with a “/” character separating the directory names, followed by a “/” character and then the name of the file. The File Name restrictions described below are designed to allow directory names and file names to be used without modification on most commonly used operating systems. This specification does not specify how an OCF Processor that is unable to represent OCF-conforming File Names would compensate for this incompatibility.

The following statements apply to Conforming OCF Abstract Containers:

  • File Names must be UTF-8 [Unicode] encoded.

  • File Names must not exceed 255 bytes.

  • The Path Name for any directory or file within the Abstract Container must not exceed 65535 bytes.

  • File Names must not use the following characters, as these characters may not be supported consistently across commonly used operating systems:

    • Slash: / (ASCII 0x2F)

    • Double quote: " (ASCII 0x22)

    • Asterisk: * (ASCII 0x2A)

    • Period as the last character: . (ASCII 0x2E)

    • Colon: : (ASCII 0x3A)

    • Less than: < (ASCII 0x3C)

    • Greater than: > (ASCII 0x3E)

    • Question mark: ? (ASCII 0x3F)

    • Backslash: \ (ASCII 0x5C)

  • File Names are case sensitive.

  • Two File Names within the same directory must not map to the same string following case normalization as described in section 3.13 of [Unicode]. Two File Names that differ only in case are disallowed within the same directory.

  • Two File Names within the same directory may map to the same string following accent normalization.
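The rules above can be checked mechanically. The Python sketch below is an informal aid, not a conformance checker; its function names are hypothetical, and it uses Python's str.casefold() as an assumed approximation of the case normalization described in section 3.13 of [Unicode].

```python
FORBIDDEN_CHARS = set('/"*:<>?\\')  # the characters listed above


def file_name_problems(name):
    """List the File Name rules from this section that `name` breaks."""
    problems = []
    if len(name.encode("utf-8")) > 255:
        problems.append("longer than 255 bytes when UTF-8 encoded")
    bad = sorted(FORBIDDEN_CHARS.intersection(name))
    if bad:
        problems.append("contains forbidden characters: " + " ".join(bad))
    if name.endswith("."):
        problems.append("ends with a period")
    return problems


def path_name_problems(path):
    """Check a container-relative Path Name and every File Name in it."""
    problems = []
    if len(path.encode("utf-8")) > 65535:
        problems.append("Path Name longer than 65535 bytes")
    for name in path.split("/"):  # "/" cannot occur inside a File Name
        problems.extend(f"{name!r}: {p}" for p in file_name_problems(name))
    return problems


def case_collisions(names_in_one_directory):
    """Return pairs of names that become equal after case folding."""
    seen = {}
    collisions = []
    for name in names_in_one_directory:
        key = name.casefold()  # approximation of Unicode case normalization
        if key in seen:
            collisions.append((seen[key], name))
        else:
            seen[key] = name
    return collisions
```

For instance, case_collisions(["Cover.jpg", "cover.JPG", "image1.jpg"]) reports the pair ("Cover.jpg", "cover.JPG"), which must not coexist in one directory.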

note

Some commercial ZIP tools do not support the full Unicode range and may only support the ASCII range for File Names. Content creators who want to use ZIP tools that have these restrictions may find it is best to restrict their File Names to the ASCII range. If the names of files cannot be preserved during the unzipping process, it will be necessary to compensate for any name translation which took place when the files are referenced by URI from within the content.

 3.5 META-INF

All valid OCF Abstract Containers must include a directory called META-INF at the root level of the container file system. This directory contains the files specified below that describe the contents, metadata, signatures, encryption, rights and other information about the contained publication.

All files specified below must meet the conformance constraints for XML documents defined in XML Document Content Conformance [Publications30].

Files other than the ones defined below may be included in the META-INF directory; OCF Processors must not fail when encountering such files.

 3.5.1 Container – META-INF/container.xml

All valid OCF Containers must include a file called container.xml within the META-INF directory at the root level of the container file system. The container.xml file must identify the MIME type of, and path to, the rootfile for the EPUB rendition of the publication and any optional alternate renditions included within the container.

The container.xml file must not be encrypted.

The schema for container.xml files is available in Appendix A.1, Schema for container.xml. Conforming container.xml files must be valid according to this schema after removing all elements and attributes from other namespaces (including all child nodes of such elements).

The rootfiles element must contain one or more rootfile elements, each of which must uniquely reference a single rendition of the contained publication.

TODO: consider supporting human-readable label and/or machine-readable type metadata to facilitate choosing from multiple alternatives

For EPUB Publications, there must be at least one rootfile element with a media-type of application/oebps-package+xml. The target of such a reference (a Package Document) must not be encrypted.

A Reading System should consider the first rootfile element within the rootfiles element to be the default rendition for the contained publication.

The following example shows a sample container.xml for an EPUB Publication with the root file OEBPS/My Crazy Life.opf (the OPF package file):

<?xml version="1.0"?>
<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
    <rootfiles>
        <rootfile full-path="OEBPS/My Crazy Life.opf"
            media-type="application/oebps-package+xml" />
    </rootfiles>
</container>
                    

The following example adds an alternate PDF version of the publication:

<?xml version="1.0"?>
<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
    <rootfiles>
        <rootfile full-path="OEBPS/My Crazy Life.opf"
            media-type="application/oebps-package+xml" />
        <rootfile full-path="PDF/My Crazy Life.pdf"
            media-type="application/pdf" />
    </rootfiles>
</container>
                    

The manifest element contained within the Package Document specifies the one and only manifest used for EPUB processing. Ancillary manifest information contained in the ZIP archive or in the optional manifest.xml file must not be used for EPUB processing purposes. Any extra files in the ZIP archive must not be used in the processing of the EPUB publication (i.e., files within the ZIP archive that are not listed within the Package Document’s manifest element, such as META-INF files or alternate derived renditions of the publication).

The value of the full-path attribute must contain a path component, as defined by [RFC3986], which must take the form of a path-rootless, also defined by [RFC3986]. The path is relative to the root of the container in which it is used.

Conforming OCF Processors must ignore unrecognized elements (and their contents) and unrecognized attributes within a container.xml file, including unrecognized elements and unrecognized attributes from other namespaces.
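A Reading System might read container.xml along the lines of the following Python sketch, which uses the standard library's xml.etree.ElementTree. It is illustrative only: the function names are hypothetical, and the path-rootless test is simplified to “non-empty and not starting with /” rather than a full [RFC3986] grammar check. Because only rootfile children of rootfiles are read, unrecognized elements and attributes are ignored, as required above.

```python
import xml.etree.ElementTree as ET

CONTAINER_NS = "urn:oasis:names:tc:opendocument:xmlns:container"
OPF_MEDIA_TYPE = "application/oebps-package+xml"


def read_rootfiles(container_xml):
    """Return (full-path, media-type) pairs in document order."""
    root = ET.fromstring(container_xml)
    if root.tag != f"{{{CONTAINER_NS}}}container":
        raise ValueError("root element is not container")
    rootfiles_el = root.find(f"{{{CONTAINER_NS}}}rootfiles")
    if rootfiles_el is None:
        raise ValueError("container.xml has no rootfiles element")
    pairs = []
    for rootfile in rootfiles_el.findall(f"{{{CONTAINER_NS}}}rootfile"):
        full_path = rootfile.get("full-path", "")
        # Simplified path-rootless check (assumption of this sketch).
        if not full_path or full_path.startswith("/"):
            raise ValueError(f"full-path is not path-rootless: {full_path!r}")
        pairs.append((full_path, rootfile.get("media-type")))
    return pairs


def default_rendition(pairs):
    """Require a Package Document, then treat the first rootfile as default."""
    if not any(media_type == OPF_MEDIA_TYPE for _, media_type in pairs):
        raise ValueError("no rootfile with media-type " + OPF_MEDIA_TYPE)
    return pairs[0]
```

Applied to the second container.xml example above, read_rootfiles returns the EPUB and PDF entries in order, and default_rendition returns ("OEBPS/My Crazy Life.opf", "application/oebps-package+xml").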

 3.5.2 Encryption – META-INF/encryption.xml

An optional encryption.xml file within the META-INF directory at the root level of the container file system holds all encryption information on the contents of the container. This file is an XML document whose root element is encryption. The encryption element contains child elements of type EncryptedKey and EncryptedData as defined by [XML ENC Core]. Each EncryptedData element describes how one or more container files are encrypted. If any resource within the container is encrypted, encryption.xml must be present to indicate that the resource is encrypted and to provide information on how it is encrypted.

An EncryptedKey element describes each encryption key used in the container, while an EncryptedData element describes each encrypted file. Each EncryptedData element refers to an EncryptedKey element, as described in XML Encryption.

The schema for encryption.xml files is available in Appendix A.2, Schema for encryption.xml. Conforming encryption.xml files must be valid according to this schema after removing all elements and attributes from other namespaces (including all child nodes of such elements).

When the encryption.xml file is not present, the OCF Abstract Container provides no information indicating any part of the container is encrypted.

OCF encrypts individual files independently, trading off some security for improved performance, allowing the container contents to be incrementally decrypted. Encryption in this way exposes the directory structure and file naming of the whole package.

OCF uses XML Encryption [XML ENC Core] to provide a framework for encryption, allowing a variety of algorithms to be used. XML Encryption specifies a process for encrypting arbitrary data and representing the result in XML. Even though an OCF Abstract Container may contain non-XML data, XML Encryption can be used to encrypt all data in an OCF Abstract Container. OCF encryption supports only encryption of whole files. The encryption.xml file, if present, must not be encrypted.

Encrypted data replaces unencrypted data in an OCF Abstract Container. For example, if an image named photo.jpeg is encrypted, the contents of the photo.jpeg resource should be replaced by its encrypted contents. When stored in a ZIP container, streams of data must be compressed before they are encrypted and Deflate compression must be used. Within the ZIP directory, encrypted files should be stored rather than Deflate-compressed.
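The ordering described above (compress with Deflate, then encrypt, then store the result without further ZIP compression) can be sketched in Python with the standard zlib and zipfile modules. This is an illustration under assumptions: the pre-encryption stream is produced here as raw Deflate (no zlib header), the form ZIP uses for compression method 8, and `encrypt` is a hypothetical callable standing in for whatever algorithm encryption.xml declares; no real cipher is shown.

```python
import zipfile
import zlib


def deflate_raw(data):
    """Compress to a raw Deflate stream (wbits=-15: no zlib header/trailer)."""
    compressor = zlib.compressobj(9, zlib.DEFLATED, -15)
    return compressor.compress(data) + compressor.flush()


def add_encrypted_entry(zf, arcname, plaintext, encrypt):
    """Compress first, then encrypt, then store the ciphertext uncompressed.

    `encrypt` is a hypothetical bytes -> bytes callable; this sketch does not
    choose or implement an encryption algorithm.
    """
    ciphertext = encrypt(deflate_raw(plaintext))
    zf.writestr(zipfile.ZipInfo(arcname), ciphertext,
                compress_type=zipfile.ZIP_STORED)
```

Compressing before encrypting matters because well-encrypted data looks random and no longer compresses, which is also why the encrypted entry is stored rather than Deflate-compressed again.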

Some situations require obfuscating the storage of embedded fonts referenced by an EPUB Publication to tie them to the “parent” publication and make them more difficult to extract for unrestricted use. In these cases, encryption.xml should be used to provide requisite font decoding information according to section 5, Font Obfuscation.

The following files must never be encrypted, regardless of whether default or specific encryption is requested:

mimetype
META-INF/container.xml
META-INF/encryption.xml
META-INF/manifest.xml
META-INF/metadata.xml
META-INF/rights.xml
META-INF/signatures.xml
EPUB rootfile (the Package Document)
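A validator could check the list above against an existing encryption.xml, as in the following Python sketch. It is informal and its names are hypothetical; it assumes CipherReference URIs are container-root-relative (the META-INF Base IRI rule from section 3.3) and compares them literally, without percent-decoding.

```python
import xml.etree.ElementTree as ET

ENC_NS = "http://www.w3.org/2001/04/xmlenc#"
NEVER_ENCRYPTED = {
    "mimetype",
    "META-INF/container.xml",
    "META-INF/encryption.xml",
    "META-INF/manifest.xml",
    "META-INF/metadata.xml",
    "META-INF/rights.xml",
    "META-INF/signatures.xml",
}


def encrypted_paths(encryption_xml):
    """Return the URI of every CipherReference element, in document order."""
    root = ET.fromstring(encryption_xml)
    return [ref.get("URI") for ref in root.iter(f"{{{ENC_NS}}}CipherReference")]


def wrongly_encrypted(encryption_xml, package_document_path):
    """Return listed resources that must never be encrypted."""
    forbidden = NEVER_ENCRYPTED | {package_document_path}
    return [uri for uri in encrypted_paths(encryption_xml) if uri in forbidden]
```

The Package Document path is passed in because it varies per container; in practice it would come from the rootfile element of META-INF/container.xml.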

Signed resources may subsequently be encrypted using the Decryption Transform for XML Signature [XML SIG Decrypt]. This feature enables an application such as an OCF agent to distinguish data that was encrypted before signing from data that was encrypted after signing. Only data that was encrypted after signing must be decrypted before computing the digest used to validate the signature.

In the following example, adapted from Section 2.2.1 of [XML ENC Core], the resource image.jpeg is encrypted using a symmetric key algorithm (AES), and the symmetric key is further encrypted using an asymmetric key algorithm (RSA) with the key named John Smith.

<encryption
    xmlns ="urn:oasis:names:tc:opendocument:xmlns:container"
    xmlns:enc="http://www.w3.org/2001/04/xmlenc#"
    xmlns:ds="http://www.w3.org/2000/09/xmldsig#">
    <enc:EncryptedKey Id="EK">
        <enc:EncryptionMethod Algorithm="http://www.w3.org/2001/04/xmlenc#rsa-1_5"/>
        <ds:KeyInfo>
            <ds:KeyName>John Smith</ds:KeyName>
        </ds:KeyInfo>
        <enc:CipherData>
            <enc:CipherValue>xyzabc</enc:CipherValue>
        </enc:CipherData>
    </enc:EncryptedKey>
    <enc:EncryptedData Id="ED1">
        <enc:EncryptionMethod Algorithm="http://www.w3.org/2001/04/xmlenc#aes128-cbc"/>
        <ds:KeyInfo>
            <ds:RetrievalMethod URI="#EK"
                Type="http://www.w3.org/2001/04/xmlenc#EncryptedKey"/>
        </ds:KeyInfo>
        <enc:CipherData>
            <enc:CipherReference URI="image.jpeg"/>
        </enc:CipherData>
    </enc:EncryptedData>
</encryption>
                    

 3.5.3 Manifest – META-INF/manifest.xml (Optional)

An optional file with the reserved name manifest.xml may be included within the META-INF directory at the root level of the container file system. If present, the content of this file must be as defined in the ODF 1.0 manifest schema.

The manifest.xml file, if present, must not be encrypted.

 3.5.4 Metadata – META-INF/metadata.xml

An optional file with the reserved name metadata.xml may be included within the META-INF directory at the root level of the container file system. This file, if present, must be used for container-level metadata. This version of the OCF specification does not specify any container-level metadata.

If the META-INF/metadata.xml file is present, its contents should be only namespace-qualified elements [XMLNS] to avoid collision with future versions of OCF that may specify a particular format for this file.

The metadata.xml file, if present, must not be encrypted.

 3.5.5 Rights Management – META-INF/rights.xml

An optional file with the reserved name rights.xml may be included within the META-INF directory at the root level of the container file system. This file is reserved for digital rights management (DRM) information for trusted exchange of Publications among rights holders, intermediaries, and users. This version of the OCF specification does not specify a required format for DRM information, but a future version may specify a particular format for DRM information.

If the META-INF/rights.xml file is present, its contents should be only namespace-qualified elements [XMLNS] to avoid collision with future versions of OCF that may specify a particular format for this file.
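The recommendation that metadata.xml and rights.xml contain only namespace-qualified elements can be checked with a few lines of Python. In this sketch (hypothetical function name), ElementTree represents a namespaced tag as "{namespace-uri}local-name", so a tag without a leading "{" belongs to no namespace; the root element is checked as well, which is an interpretation rather than a stated requirement.

```python
import xml.etree.ElementTree as ET


def unqualified_elements(xml_bytes):
    """Return the tags of all elements that are not in any namespace."""
    root = ET.fromstring(xml_bytes)
    # Comments and processing instructions are dropped by the default parser,
    # so every item yielded by iter() here is an element with a string tag.
    return [el.tag for el in root.iter() if not el.tag.startswith("{")]
```

A document such as `<m:metadata xmlns:m="urn:example:m"><title/></m:metadata>` (with a made-up namespace) would be reported as containing the unqualified element title.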

The rights.xml file must not be encrypted.

When the rights.xml file is not present, the OCF container provides no information indicating any part of the container is rights governed.

 3.5.6 Digital Signatures – META-INF/signatures.xml

An optional signatures.xml file within the META-INF directory at the root level of the container file system holds digital signatures of the container and its contents. This file is an XML document whose root element is signatures. The signatures element contains child elements of type Signature as defined by [XML DSIG Core]. Signatures can be applied to the publication and any alternate renditions as a whole or to parts of the publication and renditions. XML Signature can specify the signing of any kind of data, not just XML.

The signatures.xml file must not be encrypted.

When the signatures.xml file is not present, the OCF container provides no information indicating any part of the container is digitally signed at the container level. It is possible that digital signing exists within any optional alternate contained renditions, however.

The schema for signatures.xml files is available in Appendix A.3, Schema for signatures.xml. Conforming signatures.xml files must be valid according to this schema after removing all elements and attributes from other namespaces (including all child nodes of such elements).

When an OCF agent creates a signature of data in a container, it should add the new signature as the last child Signature element of the signatures element in the signatures.xml file.

note

Each Signature in the signatures.xml file identifies by IRI the data to which the signature applies, using the XML Signature Manifest element and its Reference sub-elements. Individual contained files may be signed separately or together. Separately signing each file creates a digest value for the resource that can be validated independently. This approach may make a Signature element larger. If files are signed together, the set of signed files can be listed in a single XML Signature Manifest element and referenced by one or more Signature elements.

Any or all files in the container can be signed in their entirety with the exception of the signatures.xml file since that file will contain the computed signature information. Whether and how the signatures.xml file should be signed depends on the objective of the signer.

If the signer wants to allow signatures to be added or removed from the container without invalidating the signer’s signature, the signatures.xml file should not be signed.

If the signer wants any addition or removal of a signature to invalidate the signer’s signature, the Enveloped Signature transform (defined in Section 6.6.4 of [XML DSIG Core]) can be used to sign the entire preexisting signature file excluding the Signature being created. This transform would sign all previous signatures, and it would become invalid if a subsequent signature was added to the package.

If the signer wants the removal of an existing signature to invalidate the signer’s signature but also wants to allow the addition of signatures, an XPath transform can be used to sign just the existing signatures. (This is only a suggestion. The particular XPath transform is not a part of the OCF specification.)

XML Signature does not associate any semantics with a signature; an agent may, however, include semantic information, for example, by adding information to the Signature element that describes the signature. XML Signature describes how additional information can be added to a signature (for example, by using the SignatureProperties element).

The following example shows the content of a signatures.xml file and is based on the examples found in Section 2 of [XML DSIG Core]. It contains one signature, and the signature applies to two resources in the container, OEBFPS/book.xml and OEBFPS/images/cover.jpeg.

<signatures xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
    <Signature Id="sig" xmlns="http://www.w3.org/2000/09/xmldsig#">
        <SignedInfo>
            <CanonicalizationMethod 
                Algorithm="http://www.w3.org/TR/2001/REC-xml-c14n-20010315"/>
            <SignatureMethod Algorithm="http://www.w3.org/2000/09/xmldsig#dsa-sha1"/>
            <Reference URI="#Manifest1">
                <DigestMethod Algorithm="http://www.w3.org/2000/09/xmldsig#sha1"/>
                <DigestValue>j6lwx3rvEPO0vKtMup4NbeVu8nk=</DigestValue>
            </Reference>
        </SignedInfo>
        <SignatureValue>…</SignatureValue>
        <KeyInfo>
            <KeyValue>
                <DSAKeyValue>
                    <P>…</P><Q>…</Q><G>…</G><Y>…</Y> 
                </DSAKeyValue>
            </KeyValue>
        </KeyInfo>
        <Object>
            <Manifest Id="Manifest1">
                <Reference URI="OEBFPS/book.xml">                    
                    <Transforms>                                                
                        <Transform
                            Algorithm="http://www.w3.org/TR/2001/REC-xml-c14n-20010315"/>                        
                    </Transforms>
                    <DigestMethod Algorithm="http://www.w3.org/2000/09/xmldsig#sha1"/>
                    <DigestValue></DigestValue>
                </Reference>
                <Reference URI="OEBFPS/images/cover.jpeg">
                    <DigestMethod Algorithm="http://www.w3.org/2000/09/xmldsig#sha1"/>
                    <DigestValue></DigestValue>
                </Reference>
            </Manifest>
        </Object>
    </Signature> 
</signatures>
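The DigestValue elements in the example above are left empty. For a Reference with no Transforms, such as a binary image, XML Signature digests the resource's octets directly, so the value could be filled in as in the following Python sketch. It is illustrative only: the function names are hypothetical, SHA-1 is used only to match the example's DigestMethod, resources with a canonicalization transform would need their canonical form digested instead, and computing the SignatureValue (which requires canonicalizing SignedInfo and a signing key) is not shown.

```python
import base64
import hashlib
import xml.etree.ElementTree as ET

DS_NS = "http://www.w3.org/2000/09/xmldsig#"


def digest_value(data):
    """Return the Base64-encoded SHA-1 digest carried in DigestValue."""
    return base64.b64encode(hashlib.sha1(data).digest()).decode("ascii")


def manifest_reference(uri, data):
    """Build a Manifest Reference element for a resource with no Transforms."""
    ref = ET.Element(f"{{{DS_NS}}}Reference", URI=uri)
    ET.SubElement(ref, f"{{{DS_NS}}}DigestMethod", Algorithm=DS_NS + "sha1")
    ET.SubElement(ref, f"{{{DS_NS}}}DigestValue").text = digest_value(data)
    return ref
```

As a sanity check, digest_value(b"") yields "2jmj7l5rSw0yVb/vlWAYkK/YBwk=", the Base64 form of the SHA-1 digest of empty input.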
                    

 4 OCF Physical Container: The ZIP Container

 4.1 Overview

This section is informative

An OCF ZIP Container is a physical single-file manifestation of an abstract container.

 4.2 ZIP File Requirements

An OCF ZIP Container uses the ZIP format as specified by [ZIP APPNOTE], but with the following constraints and clarifications:

  •  Conforming OCF ZIP Containers must not use the features in the ZIP application note that allow ZIP files to be split across multiple storage media. Conforming EPUB Reading Systems must treat any OCF files that specify that the ZIP file is split across multiple storage media as being in error.

  •  Conforming OCF ZIP Containers must only include uncompressed files or Deflate-compressed files within the ZIP archive. Conforming EPUB Reading Systems must treat any OCF Containers that use compression techniques other than Deflate as being in error.

  •   Conforming OCF ZIP Containers may use the ZIP64 extensions defined as "Version 1" in section V, subsection G of the application note at [ZIP APPNOTE] and should only use those extensions when the content requires them. Conforming OCF Processors must support the ZIP64 extensions defined as "Version 1".

  •  Conforming OCF ZIP Containers must not use the encryption features defined by the ZIP format; instead, encryption must be done using the features described in Encryption – META-INF/encryption.xml. Conforming EPUB Reading Systems must treat any OCF ZIP Containers that use ZIP encryption features as being in error.

  •  Conforming OCF Processors are not required to preserve, through load and save operations, information from an OCF ZIP Container that does not map to a corresponding representation within the OCF Abstract Container; in particular, a Conforming OCF Processor does not have to preserve CRC values, comment fields, or fields that hold file system information corresponding to a particular operating system (e.g., External file attributes and Extra field).

  •  Conforming OCF ZIP Containers must encode File System Names using UTF-8 [Unicode].
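Some of the entry-level constraints in the list above can be checked with Python's standard zipfile module. The following is a minimal sketch, not a complete validator: it inspects only each entry's compression method and the encryption bit (bit 0) of the general purpose bit flag, as recorded in the central directory. The function name check_zip_entries is hypothetical.

```python
import io
import zipfile

# Only "stored" (method 0) and Deflate (method 8) are permitted.
ALLOWED_METHODS = {zipfile.ZIP_STORED, zipfile.ZIP_DEFLATED}
ENCRYPTED_FLAG = 0x1  # bit 0 of the general purpose bit flag


def check_zip_entries(data: bytes) -> list:
    """Return human-readable problems; an empty list means these checks passed."""
    problems = []
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        for info in zf.infolist():
            if info.compress_type not in ALLOWED_METHODS:
                problems.append(
                    f"{info.filename}: compression method {info.compress_type}"
                )
            if info.flag_bits & ENCRYPTED_FLAG:
                problems.append(f"{info.filename}: uses ZIP encryption")
    return problems
```

Split archives and the UTF-8 file name rule are deliberately left out of this sketch.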

The following constraints apply to particular fields in the OCF ZIP Container archive:

  •  In the local file header table, Conforming OCF ZIP Containers must set the version needed to extract fields to the values 10, 20 or 45 in order to match the maximum version level needed by the given file (e.g., 20 if Deflate is needed, 45 if ZIP64 is needed). Conforming OCF Processors must treat any other values as being in error.

  •  In the local file header table, Conforming OCF ZIP Containers must set the compression method field to the values 0 or 8. Conforming OCF Processors must treat any other values as being in error.

  •  Conforming OCF Processors must treat OCF ZIP Containers with an Archive decryption header or an Archive extra data record as being in error.
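The field-level rules above apply to the raw local file headers, whose fields Python's zipfile module does not expose directly. A minimal sketch, assuming Python 3 with the standard struct and zipfile modules, locates each local header through its central directory offset (ZipInfo.header_offset) and unpacks the fixed 30-byte part of the header. The function name check_local_headers is hypothetical, and the Archive decryption header and Archive extra data record checks are not covered.

```python
import io
import struct
import zipfile

# Fixed 30-byte part of a ZIP local file header (little-endian):
# signature, version needed, flags, method, time, date, CRC-32,
# compressed size, uncompressed size, name length, extra length.
LOCAL_HEADER = struct.Struct("<IHHHHHIIIHH")
LOCAL_SIGNATURE = 0x04034B50  # the bytes "PK\x03\x04"
ALLOWED_VERSIONS = {10, 20, 45}
ALLOWED_METHODS = {0, 8}


def check_local_headers(data: bytes) -> list:
    """Check 'version needed to extract' and 'compression method' in each local header."""
    problems = []
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        for info in zf.infolist():
            fields = LOCAL_HEADER.unpack_from(data, info.header_offset)
            signature, version_needed, _flags, method = fields[:4]
            if signature != LOCAL_SIGNATURE:
                problems.append(f"{info.filename}: bad local header signature")
                continue
            if version_needed not in ALLOWED_VERSIONS:
                problems.append(f"{info.filename}: version needed {version_needed}")
            if method not in ALLOWED_METHODS:
                problems.append(f"{info.filename}: compression method {method}")
    return problems
```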

 4.3 OCF ZIP Container Media Type Identification

It is frequently necessary for applications to determine the media type of a file, which is usually accomplished by inspecting the file extension. To allow processing applications to determine their format quickly in this way, OCF ZIP Containers should use the .epub extension.

In order to translate a file extension into a media type, a processing agent typically will register the relationship between file extension and media type with the operating system. Applications that are interested in OCF ZIP Container files should register the media type application/epub+zip as corresponding to the file extension .epub.
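How an extension is registered with the operating system differs from platform to platform. As a rough in-process analogue only, Python's standard mimetypes module keeps an extension-to-media-type table that an application can extend; the sketch below affects the running process, not the operating system, and the file name is purely illustrative.

```python
import mimetypes

# Map the .epub extension to the OCF media type (this process only).
mimetypes.add_type("application/epub+zip", ".epub")

media_type, encoding = mimetypes.guess_type("As You Like It.epub")
print(media_type)  # application/epub+zip
```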

File extensions do not provide a reliable means of identifying file formats, however. As a result, a more robust means of identifying files independent of their file names or extensions is also necessary. The method for such identification in ZIP archives is the inclusion of an uncompressed, unencrypted ASCII file named mimetype, where the contents of this file identify the media type of the Container. OCF ZIP Containers must consequently include a mimetype file as the first file in the Container, and the contents of this file must be the MIME type string application/epub+zip.

The contents of the mimetype file must not contain any leading padding or whitespace and the case of the MIME type string must be exactly as presented above. The mimetype file additionally must be neither compressed nor encrypted, and there must not be an extra field in its ZIP header.

When constructed with a conformant mimetype file, the ZIP Container offers convenient magic number support as described in RFC 2048 and the following will hold true:

  • the bytes PK will be at the beginning of the file;

  • the bytes mimetype will be at position 30; and

  • the actual MIME type will begin at position 38 (i.e., the ASCII string application/epub+zip).
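These offsets follow from the ZIP layout: a local file header has a fixed 30-byte part, followed by the 8-byte file name mimetype, followed (with no extra field and no compression) by the file data at offset 38. A minimal Python sketch, assuming the standard zipfile module, writes a stored mimetype entry first and then checks the magic-number layout; the helper names build_ocf_zip and has_ocf_magic are hypothetical.

```python
import io
import zipfile

MEDIA_TYPE = b"application/epub+zip"


def build_ocf_zip(files: dict) -> bytes:
    """Write a ZIP whose first entry is a stored 'mimetype' file with no extra field."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        info = zipfile.ZipInfo("mimetype")  # default timestamp, empty extra field
        info.compress_type = zipfile.ZIP_STORED
        zf.writestr(info, MEDIA_TYPE)
        # Remaining entries may be Deflate-compressed.
        for name, data in files.items():
            zf.writestr(name, data, compress_type=zipfile.ZIP_DEFLATED)
    return buf.getvalue()


def has_ocf_magic(data: bytes) -> bool:
    """Check for PK at 0, 'mimetype' at 30 and the media type string at 38."""
    return (
        data[0:2] == b"PK"
        and data[30:38] == b"mimetype"
        and data[38:38 + len(MEDIA_TYPE)] == MEDIA_TYPE
    )
```

If the mimetype entry were compressed, or carried an extra field, the media type string would no longer begin at offset 38.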

 5 Font Obfuscation

 5.1 Introduction

Since an OCF ZIP Container is fundamentally a ZIP file, commonly available ZIP tools can be used to extract any unencrypted content stream from the package. On some systems, the contents of the ZIP file may appear as any other native container (e.g., a folder). While this ability is quite useful, it can pose a problem for an Author who wishes to include a third-party font. Many commercial fonts allow embedding, but embedding a font implies making it an integral part of the publication, not providing the original font file along with the content. Because integrated ZIP support is ubiquitous in modern operating systems, simply placing the font in the ZIP archive is insufficient to signify that the font is not intended to be reused in other contexts. This uncertainty can undermine the otherwise very useful font embedding capability that OPF/OPS provides.

In order to discourage reuse of the font, some font vendors may allow use of their fonts in EPUB publications if those fonts are bound in some way to the publication; that is, if the font file cannot be installed directly for use on an operating system with the built-in tools of that computing device, and cannot be used directly by other EPUB publications. It is beyond the scope of this document to provide a digital rights management or enforcement system for font files. Instead, this document proposes a method of obfuscation that requires additional work on the part of the final OCF recipient to gain general access to any included fonts. The IDPF hopes that this will meet the requirements of most font vendors. No claim is made in this document or by the IDPF, however, that this constitutes encryption or that it secures the font file against copyright infringement. The proposed mechanism simply provides a stumbling block for those who are unaware of the license details of the supplied font; it will not prevent a determined user from gaining full access to the font. Given the original OCF Container, it is possible to apply the algorithms described in this document to extract the raw font file. Whether this satisfies the requirements of individual font licenses remains a question for the licensor and licensee.

 5.2 Obfuscation Algorithm

The algorithm employed to obfuscate the font file consists of modifying the first 1040 bytes (~1KB) of the font file. In the unlikely event that the file is shorter than 1040 bytes, the entire file is modified. The key for the algorithm must be a 20-byte (160-bit) [SHA-1] digest of the publication's unique identifier. Details on generating this key are given in the section Generating the Obfuscation Key. To obfuscate the original data, the result of performing a logical exclusive or (XOR) on the first byte of the raw file and the first byte of the key is stored as the first byte of the embedded font file. This process is repeated with the next byte of source and key until all bytes in the key have been used. At this point, the process continues starting with the first byte of the key and the 21st byte of the source. Once 1040 bytes (52 passes through the 20-byte key) have been encoded in this way, or the end of the source is reached, any remaining data in the source is copied directly to the destination. In pseudo-code, this is the algorithm:

set source to font file
set destination to obfuscated file
set keyData to key for font
set outer to 0
while outer < 52 and not (source at EOF)
    set inner to 0
    while inner < 20 and not (source at EOF)
        read 1 byte from source     //Assumes read advances file position
        set sourceByte to result of read
        set keyByte to byte inner of keyData
        set obfuscatedByte to (sourceByte XOR keyByte)
        write obfuscatedByte to destination
        increment inner
    end while
    increment outer
end while
if not (source at EOF) then
    read source to EOF
    write result of read to destination
end if 
            

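To recover the original font data, the process is simply reversed: the obfuscated data becomes the source, and the destination receives the raw font data. Because XOR is its own inverse ((a XOR k) XOR k = a), applying the same procedure with the same key restores the original bytes.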
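The pseudo-code above can be expressed compactly in Python. This sketch assumes the whole font is held in memory as bytes; the names obfuscate and deobfuscate are hypothetical. Because XOR undoes itself, de-obfuscation is the same function.

```python
OBFUSCATED_LENGTH = 1040  # 52 passes through the 20-byte key
KEY_LENGTH = 20           # size of a SHA-1 digest


def obfuscate(data: bytes, key: bytes) -> bytes:
    """XOR the first 1040 bytes of data with a repeating 20-byte key; copy the rest."""
    if len(key) != KEY_LENGTH:
        raise ValueError("key must be a 20-byte SHA-1 digest")
    head = bytes(
        b ^ key[i % KEY_LENGTH] for i, b in enumerate(data[:OBFUSCATED_LENGTH])
    )
    return head + data[OBFUSCATED_LENGTH:]


# Applying the same XOR with the same key restores the original bytes.
deobfuscate = obfuscate
```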

 5.3 Generating the Obfuscation Key

To tie a font to a particular EPUB publication, it is necessary to bind to a unique property of that publication. Such a value is required by the EPUB Publications 3.0 specification, as detailed in Unique Identifier [Publications30]. Every compliant OPF file has a dc:identifier element that uniquely identifies the publication. The OPF 3.0 specification explains how to locate this element by examining the unique-identifier attribute of the package file's package element. This element provides the required characteristic of being unique to a publication; it is not suitable for use directly as the obfuscation key, however (for instance, its length is not defined).
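Locating the unique identifier amounts to following the package element's unique-identifier attribute to the dc:identifier with the matching id. A minimal sketch with Python's xml.etree.ElementTree, assuming the OPF document is already available as a string; the function name unique_identifier is hypothetical, and whitespace is left in place here because it is removed during key generation.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"


def unique_identifier(opf_xml: str) -> str:
    """Return the text of the dc:identifier referenced by package/@unique-identifier."""
    package = ET.fromstring(opf_xml)
    id_ref = package.get("unique-identifier")
    if id_ref is None:
        raise ValueError("package element has no unique-identifier attribute")
    # Several dc:identifier elements may exist; only the referenced one counts.
    for ident in package.iter(f"{{{DC_NS}}}identifier"):
        if ident.get("id") == id_ref:
            return ident.text or ""
    raise ValueError(f"no dc:identifier with id {id_ref!r}")
```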

In order to create a suitable key that is tied to the publication, a SHA-1 digest of the unique identifier should be generated as specified by the Secure Hash Standard [SHA-1]. Before the digest is generated, all whitespace characters as defined by the XML 1.0 specification [XML], section 2.3, are removed from the identifier; specifically, the Unicode code points 0x20, 0x09, 0x0D and 0x0A must be stripped from the string. The resulting digest is then used directly as the key for the algorithm described in Obfuscation Algorithm.
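A sketch of this key derivation using Python's hashlib. The text above does not say how the identifier string is converted to bytes before hashing; this sketch assumes UTF-8, which is an assumption rather than a stated requirement. The function name obfuscation_key is hypothetical.

```python
import hashlib

# The four XML 1.0 whitespace characters: space, tab, CR, LF.
XML_WHITESPACE = "\x20\x09\x0d\x0a"


def obfuscation_key(unique_id: str) -> bytes:
    """Strip XML whitespace anywhere in the identifier, then return its SHA-1 digest."""
    cleaned = unique_id.translate({ord(c): None for c in XML_WHITESPACE})
    # Assumption: the identifier is encoded as UTF-8 before hashing.
    return hashlib.sha1(cleaned.encode("utf-8")).digest()
```

The 20-byte result is exactly the key length that the obfuscation algorithm expects.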

 5.4 Specifying Obfuscated Resources

All encrypted data in an OCF Abstract Container must have an entry in the encryption.xml file accompanying the publication (see Encryption – META-INF/encryption.xml); this requirement also covers fonts obfuscated using the method described here. For each such obfuscated font, the EncryptionMethod child of the EncryptedData element in encryption.xml must have an Algorithm attribute with the value http://www.idpf.org/2008/embedding. The presence of this attribute value signals the use of the algorithm described in this specification. All resources that have been obfuscated using this approach must be listed, through CipherReference elements, in the CipherData element.

An example encryption.xml file might look like this:

<encryption 
    xmlns="urn:oasis:names:tc:opendocument:xmlns:container" 
    xmlns:enc="http://www.w3.org/2001/04/xmlenc#">
    <enc:EncryptedData> 
        <enc:EncryptionMethod Algorithm="http://www.idpf.org/2008/embedding"/> 
        <enc:CipherData> 
            <enc:CipherReference URI="OEBPS/Fonts/BKANT.TTF"/>  
        </enc:CipherData> 
    </enc:EncryptedData>  
</encryption> 

                

To prevent trivial copying of the embedded font to other publications, the explicit key must not be provided in the encryption.xml file. Reading Systems that implement this specification must derive the key from the package's unique identifier.
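A Reading System first has to find which resources use this algorithm. The following Python sketch (the function name obfuscated_resources is hypothetical) collects the CipherReference URIs of every EncryptedData whose EncryptionMethod carries the value http://www.idpf.org/2008/embedding. Consistent with the rule above, it never looks for a key in encryption.xml; the key would be derived from the unique identifier instead.

```python
import xml.etree.ElementTree as ET

ENC_NS = "http://www.w3.org/2001/04/xmlenc#"
IDPF_OBFUSCATION = "http://www.idpf.org/2008/embedding"


def obfuscated_resources(encryption_xml: str) -> list:
    """Return CipherReference URIs whose EncryptionMethod is the IDPF algorithm."""
    root = ET.fromstring(encryption_xml)
    uris = []
    for encrypted in root.iter(f"{{{ENC_NS}}}EncryptedData"):
        method = encrypted.find(f"{{{ENC_NS}}}EncryptionMethod")
        if method is None or method.get("Algorithm") != IDPF_OBFUSCATION:
            continue  # encrypted by some other method; not handled here
        for ref in encrypted.iter(f"{{{ENC_NS}}}CipherReference"):
            uri = ref.get("URI")
            if uri:
                uris.append(uri)
    return uris
```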

 Appendix A. Schemas

The schemas in this Appendix are normative. In case of conflicts between the specification prose and the given schema, the schema shall be considered definitive.

 A.1 Schema for container.xml

The schema for container.xml files is available at http://www.idpf.org/epub/30/schema/ocf-container-30.rnc.

 A.2 Schema for encryption.xml

The schema for encryption.xml files is available at http://www.idpf.org/epub/30/schema/ocf-encryption-30.rnc.

 A.3 Schema for signatures.xml

The schema for signatures.xml files is available at http://www.idpf.org/epub/30/schema/ocf-signatures-30.rnc.

 Appendix B. Example

The following example demonstrates the use of this OCF format to contain a signed and encrypted EPUB publication with an alternate PDF rendition within a ZIP Container.

 

Example B.1. Ordered list of files in the ZIP Container

mimetype
META-INF/container.xml
META-INF/signatures.xml
META-INF/encryption.xml
OEBPS/As You Like It.opf
OEBPS/book.html
OEBPS/images/cover.png
PDF/As You Like It.pdf
            

 

Example B.2. The contents of the mimetype file

application/epub+zip
            

 

Example B.3. The contents of the META-INF/container.xml file

<?xml version="1.0"?>
<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
    <rootfiles>
        <rootfile full-path="OEBPS/As You Like It.opf"
            media-type="application/oebps-package+xml" />
        <rootfile full-path="PDF/As You Like It.pdf"
            media-type="application/pdf" />
    </rootfiles>
</container>
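A Reading System reads container.xml to find the renditions in the Container. This minimal Python sketch (the function name rootfiles is hypothetical) lists each rootfile's full-path and media-type in document order, so that a caller can choose a rendition whose media type it supports.

```python
import xml.etree.ElementTree as ET

CONTAINER_NS = "urn:oasis:names:tc:opendocument:xmlns:container"


def rootfiles(container_xml: str) -> list:
    """Return (full-path, media-type) pairs for every rootfile, in document order."""
    root = ET.fromstring(container_xml)
    return [
        (rf.get("full-path"), rf.get("media-type"))
        for rf in root.iter(f"{{{CONTAINER_NS}}}rootfile")
    ]
```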
            

 

Example B.4. The contents of the META-INF/signatures.xml file

<signatures xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
    <Signature Id="AsYouLikeItSignature" xmlns="http://www.w3.org/2000/09/xmldsig#">
        
        <!-- SignedInfo is the information that is actually signed. In this case -->
        <!-- the SHA1 algorithm is used to sign the canonical form of the XML    -->
        <!-- documents enumerated in the Object element below                    -->
        <SignedInfo>
            <CanonicalizationMethod Algorithm="http://www.w3.org/TR/2001/REC-xml-c14n-20010315"/>
            <SignatureMethod Algorithm="http://www.w3.org/2000/09/xmldsig#dsa-sha1"/>
            <Reference URI="#AsYouLikeIt">
                <DigestMethod Algorithm="http://www.w3.org/2000/09/xmldsig#sha1"/>
                <DigestValue>…</DigestValue>
            </Reference>
        </SignedInfo>
        
        <!-- The signed value of the digest above using the DSA algorithm -->
        <SignatureValue>…</SignatureValue>
        
        <!-- The key to use to validate the signature -->
        <KeyInfo>
            <KeyValue>
                <DSAKeyValue>
                    <P>…</P>
                    <Q>…</Q>
                    <G>…</G>
                    <Y>…</Y>
                </DSAKeyValue>
            </KeyValue>
        </KeyInfo>
        
        <!-- The list of documents to sign. Note that the canonical form of XML -->
        <!-- documents is signed, while the binary form of the other documents -->
        <!-- is used.                                                          -->
        <Object>
            <Manifest Id="AsYouLikeIt">
                <Reference URI="OEBPS/As You Like It.opf">
                    <Transforms>
                        <Transform Algorithm="http://www.w3.org/TR/2001/REC-xml-c14n-20010315"/>
                    </Transforms>
                    <DigestMethod Algorithm="http://www.w3.org/2000/09/xmldsig#sha1"/>
                    <DigestValue></DigestValue>
                </Reference>
                <Reference URI="OEBPS/book.html">
                    <Transforms>
                        <Transform Algorithm="http://www.w3.org/TR/2001/REC-xml-c14n-20010315"/>
                    </Transforms>
                    <DigestMethod Algorithm="http://www.w3.org/2000/09/xmldsig#sha1"/>
                    <DigestValue></DigestValue>
                </Reference>
                <Reference URI="OEBPS/images/cover.png">
                    <DigestMethod Algorithm="http://www.w3.org/2000/09/xmldsig#sha1"/>
                    <DigestValue></DigestValue>
                </Reference>
                <Reference URI="PDF/As You Like It.pdf">
                    <DigestMethod Algorithm="http://www.w3.org/2000/09/xmldsig#sha1"/>
                    <DigestValue></DigestValue>
                </Reference>
            </Manifest>
        </Object>        
    </Signature>
</signatures>
            

 

Example B.5. The contents of the META-INF/encryption.xml file

<?xml version="1.0"?>
<encryption xmlns="urn:oasis:names:tc:opendocument:xmlns:container"
    xmlns:enc="http://www.w3.org/2001/04/xmlenc#" xmlns:ds="http://www.w3.org/2000/09/xmldsig#">

    <!-- The RSA encrypted AES-128 symmetric key used to encrypt the data -->
    <enc:EncryptedKey Id="EK">
        <enc:EncryptionMethod Algorithm="http://www.w3.org/2001/04/xmlenc#rsa-1_5"/>
        <ds:KeyInfo>
            <ds:KeyName>John Smith</ds:KeyName>
        </ds:KeyInfo>
        <enc:CipherData>
            <enc:CipherValue>xyzabc…</enc:CipherValue>
        </enc:CipherData>
    </enc:EncryptedKey>

    <!-- Each EncryptedData block identifies a single document that has been    -->
    <!-- encrypted using the AES-128 algorithm. The data remains stored in its  -->
    <!-- encrypted form in the original file within the container.              -->
    <enc:EncryptedData Id="ED1">
        <enc:EncryptionMethod Algorithm="http://www.w3.org/2001/04/xmlenc#aes128-cbc"/>
        <ds:KeyInfo>
            <ds:RetrievalMethod URI="#EK" Type="http://www.w3.org/2001/04/xmlenc#EncryptedKey"/>
        </ds:KeyInfo>
        <enc:CipherData>
            <enc:CipherReference URI="OEBPS/book.html"/>
        </enc:CipherData>
    </enc:EncryptedData>

    <enc:EncryptedData Id="ED2">
        <enc:EncryptionMethod Algorithm="http://www.w3.org/2001/04/xmlenc#aes128-cbc"/>
        <ds:KeyInfo>
            <ds:RetrievalMethod URI="#EK" Type="http://www.w3.org/2001/04/xmlenc#EncryptedKey"/>
        </ds:KeyInfo>
        <enc:CipherData>
            <enc:CipherReference URI="OEBPS/images/cover.png"/>
        </enc:CipherData>
    </enc:EncryptedData>

    <enc:EncryptedData Id="ED3">
        <enc:EncryptionMethod Algorithm="http://www.w3.org/2001/04/xmlenc#aes128-cbc"/>
        <ds:KeyInfo>
            <ds:RetrievalMethod URI="#EK" Type="http://www.w3.org/2001/04/xmlenc#EncryptedKey"/>
        </ds:KeyInfo>
        <enc:CipherData>
            <enc:CipherReference URI="PDF/As You Like It.pdf"/>
        </enc:CipherData>
    </enc:EncryptedData>

</encryption>

            

 

Example B.6. The contents of the OEBPS/As You Like It.opf file

<?xml version="1.0"?>
<package version="2.0" xmlns="http://www.idpf.org/2007/opf" unique-identifier="Pub-ID">
    <metadata xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:opf="http://www.idpf.org/2007/opf">
        <dc:identifier id="Pub-ID">urn:uuid:B9B412F2-CAAD-4A44-B91F-A375068478A0</dc:identifier>
        <dc:title>As You Like It</dc:title>
        <dc:creator opf:role="aut">William Shakespeare</dc:creator>
        <dc:identifier>0-7410-1455-6</dc:identifier>
        <dc:subject/>
        <dc:type/>
        <dc:date opf:event="publication">3/24/2000</dc:date>
        <dc:date opf:event="copyright">1/1/9999</dc:date>
        <dc:identifier opf:scheme="ISBN">urn:isbn:9780741014559</dc:identifier>
        <dc:publisher>Project Gutenberg</dc:publisher>
        <dc:language>en</dc:language>

    </metadata>
    <manifest>
        <item id="4915" href="book.html" media-type="application/xhtml+xml"/>
        <item id="7184" href="images/cover.png" media-type="image/png"/>
        <item id="ncx" href="toc.ncx" media-type="application/x-dtbncx+xml"/>
    </manifest>
    <spine toc="ncx">
        <itemref idref="4915"/>
    </spine>
</package>

            

 Appendix C. The application/epub+zip media type

This appendix registers the media type application/epub+zip for the EPUB Open Container Format (OCF).

An OCF file is a container technology based on the ZIP archive format. It is used to encapsulate EPUB publications and optional alternate renditions thereof. OCF and its related standards are maintained and defined by the International Digital Publishing Forum (IDPF).

MIME media type name:

application

MIME subtype name:

epub+zip

Required parameters:

None.

Optional parameters:

None.

Encoding considerations:

OCF files are binary files in ZIP (http://www.iana.org/assignments/media-types/application/zip) format.

Security considerations:

All processors that read OCF files should rigorously check the size and validity of data retrieved.

In addition, because of the various content types that can be embedded in OCF files, “application/epub+zip” may describe content that has security implications beyond those described here. However, security issues would arise only where the processor recognizes and processes the additional content, or where further processing of that content is dispatched to other processors; in that case, they fall outside the domain of this registration document.

Security considerations that apply to application/zip also apply to OCF files.

Interoperability considerations:

None.

Published specification:

This media type registration is for the EPUB Open Container Format (OCF) as described by this specification which is located at http://www.idpf.org/epub/30/spec/epub30-ocf.html.

This specification supersedes Open Container Format 2.0.1 which is located at http://www.idpf.org/doc_library/epub/OCF_2.0.1_draft.doc, and which also uses the application/epub+zip media type.

Applications which use this media type:

This media type is in wide use for the distribution of ebooks in the EPUB format. The following list of applications is not exhaustive.

  • Adobe Digital Editions

  • Aldiko

  • Azardi

  • Apple iBooks

  • Barnes & Noble Nook

  • Calibre

  • Google Books

  • Ibis Reader

  • MobiPocket reader

  • Sony Reader

  • Stanza

Additional information:
Magic number(s):

0: PK, 30: mimetype, 38: application/epub+zip

File extension(s):

OCF files are most often identified with the extension .epub.

Macintosh File Type Code(s):

ZIP

Person & email address to contact for further information:

William McCoy, [email protected]

Intended usage:

COMMON

Author/Change controller:

International Digital Publishing Forum (http://www.idpf.org)

 Appendix D. Contributors

This appendix is informative

 D.1 Acknowledgements and Contributors

EPUB has been developed by the International Digital Publishing Forum in a cooperative effort, bringing together publishers, vendors, software developers, and experts in the relevant standards.

The EPUB 3 specifications were prepared by the International Digital Publishing Forum’s EPUB Maintenance Working Group, operating under a charter approved by the membership in May, 2010 under the leadership of:

Markus Gylling, DAISY Consortium, Chair
Garth Conboy, Google Inc., Vice-chair
Brady Duga, Google Inc., Vice-chair
Bill McCoy, International Digital Publishing Forum (IDPF), Secretary

Active members of the working group included:

Alexis Wiles, Alicia Wise, … TODO : COMPLETE LIST OF CURRENT WG MEMBERS

For more detailed acknowledgements and information about contributors to each version of EPUB, refer to Contributors [EPUB3Overview].

 References

Normative References

[ContentDocs30] EPUB Content Documents 3.0.

[MediaOverlays30] EPUB Media Overlays 3.0.

[ODF] Open Document Format (ODF). (JP: Current version of spec is 1.1; update?)

[ODF10 Manifest Schema] ODF 1.0 Manifest Schema.

[Publications30] EPUB Publications 3.0.

[RFC3986] Uniform Resource Identifier (URI): Generic Syntax (RFC 3986). Berners-Lee, et al. January 2005.

[RFC3987] Internationalized Resource Identifiers (IRIs) (RFC 3987). M. Duerst, et al. January 2005.

[Unicode] The Unicode Consortium. The Unicode Standard, Version 5.0.0, defined by: The Unicode Standard, Version 5.0 (Boston, MA, Addison-Wesley, 2007. ISBN 0-321-48091-0).

[XML] Extensible Markup Language (XML) 1.0 (Fifth Edition). T. Bray, et al. 26 November 2008.

[XML DSIG Core] XML-Signature Syntax and Processing (Second Edition). M. Bartel, et al. 10 June 2008. (JP: 1.1 going to CR in Q1 2011. Update reference as necessary.)

[XML ENC Core] XML Encryption Syntax and Processing. D. Eastlake, et al. 10 December 2002. (JP: 1.1 due to go to CR in Q1 2011; update reference as needed.)

[XML SIG Decrypt] Decryption Transform for XML Signature. M. Hughes, et al. 10 December 2002.

[XMLNS] Namespaces in XML (Third Edition). T. Bray, D. Hollander, A. Layman, R. Tobin. W3C. 8 December 2009.

Informative References

[EPUB3Changes] EPUB 3 Differences from EPUB 2.0.1. TODO.

[EPUB3Overview] EPUB 3 Overview. TODO.

[ZIP APPNOTE] ZIP File Format Specification. PKWARE, Inc. September 28, 2007.