EPUB Open Container Format (OCF) 3.1

Proposed Specification 14 October 2016

This version
http://www.idpf.org/epub/31/spec/epub-ocf-20161014.html
Latest version
http://www.idpf.org/epub3/latest/ocf
Previous version
http://www.idpf.org/epub/301/spec/epub-ocf-20160801.html
Previous recommendation
http://www.idpf.org/epub/31/spec/epub-ocf-20160906.html
Document history

Editors

James Pritchett, Learning Ally (formerly Recording for the Blind & Dyslexic)

Markus Gylling, International Digital Publishing Forum (IDPF)

Copyright © 2010-2016 International Digital Publishing Forum™

All rights reserved. This work is protected under Title 17 of the United States Code. Reproduction and dissemination of this work with changes is prohibited except with the written permission of the International Digital Publishing Forum (IDPF).

EPUB is a registered trademark of the International Digital Publishing Forum.

Status of this Document

This section describes the status of this document at the time of its publication. Other documents might supersede this document.

This document is a draft produced by the EPUB Working Group under the EPUB Working Group Charter approved on 8 July 2015.

This document is not considered stable and might be updated, replaced or obsoleted at any time. Its publication as a draft does not imply endorsement by IDPF membership or the IDPF Board. When citing this document, clearly refer to it as a work in progress.

Feedback on this document can be provided to the EPUB Working Group's mailing list or issue tracker.

This document is governed by the IDPF Policies and Procedures.

  1 Overview

  1.1 Purpose and Scope

This section is informative

This specification, EPUB Open Container Format (OCF) 3.1, defines a file format and processing model for encapsulating the set of related resources that comprise an EPUB® Publication into a single-file container, the EPUB Container.

This specification defines the rules for structuring the file collection in the abstract (the "abstract container") and the rules for the representation of this abstract container within a ZIP archive (the "ZIP container"). The rules for ZIP containers build upon the ZIP technologies used by [ODF]. OCF also defines a standard method for obfuscating embedded resources, such as fonts, for those EPUB Publications that require this functionality.

This specification is one of a family of specifications that compose [EPUB 3.1], an interchange and delivery format for digital publications based on XML and Web Standards. It is meant to be read and understood in concert with the other specifications that make up EPUB 3.1.

Refer to [EPUB3 Changes] for more information on the differences between this specification and its predecessor.

  1.2 Terminology

Terms with meanings specific to EPUB 3.1 are capitalized in this document (e.g., "Author", "Reading System"). A complete list of these terms and definitions is provided in [EPUB 3.1].

Only the first instance of a term in a section is linked to its definition.

In addition, the following terminology is defined for use in this specification:

Codec

Codec refers to content that has intrinsic binary format qualities, such as video and audio media types which are already designed for optimum compression, or which provide optimized streaming capabilities.

Default Rendition

The Rendition listed in the first rootfile element in the container.xml file.

File Name

The name of any type of file within an OCF Abstract Container, whether a directory or a file within a directory.

Non-Codec

Non-Codec refers to content types that benefit from compression due to the nature of their internal data structure, such as file formats based on character strings (for example, HTML, CSS, etc.).

OCF Abstract Container

The OCF Abstract Container defines a file system model for the contents of the OCF ZIP Container, as defined in OCF Abstract Container.

OCF Processor

A software application that processes OCF ZIP Containers according to the requirements of this specification.

OCF ZIP Container

The ZIP-based packaging and distribution format for EPUB Publications, as defined in OCF ZIP Container.

OCF ZIP Container and EPUB Container are synonymous.

Path Name

For a given directory within the OCF Abstract Container, the string holding all directory File Names in the full path concatenated together with a / (U+002F) character separating the directory File Names.

For a given file within the OCF Abstract Container, the Path Name is the string holding all directory File Names concatenated together with a / character separating the directory File Names, followed by a / character and then the File Name of the file.

Root Directory

The root directory represents the base of the OCF Abstract Container file system. This directory is virtual in nature: an EPUB Reading System might or might not generate a physical root directory for the contents of the OCF Abstract Container if the contents are unzipped.

  1.3 Typographic Conventions

The following typographic conventions are used in this specification:

markup

All markup (elements, attributes, properties), code (JavaScript, pseudo-code), machine-readable values (string, characters, media types) and file names are in red monospace font.

markup link

Links to markup and code definitions are in underlined red monospace font.

http://www.idpf.org/

URIs are in navy blue monospace font.

hyperlink

Hyperlinks are underlined and blue.

[reference]

Normative and informative references are enclosed in square brackets.

Term

Terms defined in the Terminology are in capital case.

Term Link

Links to term definitions have a dotted blue underline.

Normative element, attribute and property definitions are in blue boxes.

Informative markup examples are in light gray boxes.

note

Informative notes are in green boxes with a "Note" header.

caution

Informative cautionary notes are in red boxes with a "Caution" header.

  1.4 Conformance Statements

The keywords MUST, MUST NOT, REQUIRED, SHALL, SHALL NOT, SHOULD, SHOULD NOT, RECOMMENDED, MAY, and OPTIONAL in this document are to be interpreted as described in [RFC2119].

All sections and appendixes of this specification are normative except where identified by the informative status label "This section is informative". The application of informative status to sections and appendixes applies to all child content and subsections they contain.

All examples in this specification are informative.

  2 OCF Conformance

  2.1 Container Conformance

  2.2 Reading System Conformance

An EPUB Reading System must meet all of the following criteria:

  3 OCF Abstract Container

  3.1 Introduction

This section is informative

The OCF Abstract Container file system model uses a single common Root Directory for all of the contents. All Local Resources for the EPUB Publication are located within the directory tree headed by the Root Directory, but no specific file system structure for them is mandated by this specification.

The file system model also includes a mandatory directory named META-INF that is a direct child of the Root Directory and is used to store the following special files:

container.xml [required]

Identifies the Package Documents that define each Rendition of the EPUB Publication.

signatures.xml [optional]

Contains digital signatures for various assets.

encryption.xml [optional]

Contains information about the encryption of Publication Resources. This file is mandatory when obfuscation is used.

metadata.xml [optional]

Used to store metadata about the OCF ZIP Container.

rights.xml [optional]

Used to store information about digital rights.

manifest.xml [optional]

A manifest of container contents as allowed by Open Document Format [ODF].

Conformance requirements for the various files in the META-INF directory are defined in META-INF Directory.

  3.2 File and Directory Structure

The virtual file system for the OCF Abstract Container must have a single common Root Directory for all of the contents of the container.

The OCF Abstract Container must include a directory named META-INF that is a direct child of the container's Root Directory. Requirements for the contents of this directory are described in META-INF Directory.

The file name mimetype in the Root Directory is reserved for use by OCF ZIP Containers, as explained in OCF ZIP Container.

All other files within the OCF Abstract Container may be in any location descendant from the Root Directory, provided they are not within the META-INF directory.

It is recommended that the contents of the EPUB Publication be stored within its own dedicated directory under the Root Directory.
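The layout rules above can be illustrated with a minimal Python sketch. The helper names `classify` and `structure_problems` are hypothetical and are not defined by this specification; the sketch assumes Path Names are given relative to the Root Directory.

```python
# Reserved file names listed for the META-INF directory in this specification.
RESERVED_META_INF = {
    "container.xml", "signatures.xml", "encryption.xml",
    "metadata.xml", "rights.xml", "manifest.xml",
}


def classify(path_name):
    """Classify a file Path Name relative to the Root Directory (hypothetical helper)."""
    if path_name == "mimetype":
        return "mimetype"
    if path_name.startswith("META-INF/"):
        name = path_name[len("META-INF/"):]
        return "reserved" if name in RESERVED_META_INF else "other-meta-inf"
    return "publication"


def structure_problems(path_names):
    """Return a list of structural problems for a collection of file Path Names."""
    problems = []
    if "META-INF/container.xml" not in path_names:
        problems.append("META-INF/container.xml is missing")
    return problems
```

A checker built on this sketch would typically report any Publication Resource whose classification is not "publication".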

  3.3 Relative IRIs for Referencing Other Components

Files within the OCF Abstract Container must reference each other via relative IRI references ([RFC3987] and [RFC3986]).

The following example shows how a file named image1.jpg in the same directory as an XHTML Content Document can be referenced from an [HTML] img element.

<img src="image1.jpg" alt="…" />
                

For relative IRI references, the Base IRI [RFC3986] is determined by the relevant language specifications for the given file formats. For example, CSS defines how relative IRI references work in the context of CSS style sheets and property declarations [CSS Snapshot].

note

Some language specifications reference Requests For Comments [RFC] that preceded [RFC3987], in which case the earlier RFC applies for content in that particular language.

Unlike most language specifications, the Base IRIs for all files within the META-INF directory use the Root Directory of the OCF Abstract Container as the default Base IRI.

For example, if META-INF/container.xml has the following content:

<?xml version="1.0"?>
<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
    <rootfiles>
        <rootfile full-path="EPUB/Great_Expectations.opf"
            media-type="application/oebps-package+xml" />	
    </rootfiles>
</container>
            

then the path EPUB/Great_Expectations.opf is relative to the root directory for the OCF Abstract Container and not relative to the META-INF directory.

All relative IRI references must resolve to resources within the OCF Abstract Container (i.e., at or below the Root Directory).
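As an informal illustration, the following Python sketch resolves a relative reference to a Path Name and rejects references that leave the container. The function name `resolve_in_container` is hypothetical. The sketch assumes that the Base IRI is the directory of the referencing file (the Root Directory for files in META-INF), which is a simplification: the actual Base IRI is determined by the relevant language specification.

```python
import posixpath
from urllib.parse import unquote, urlsplit


def resolve_in_container(source_path, reference):
    """Resolve `reference`, found in the file at `source_path`, to a Path Name."""
    parts = urlsplit(reference)
    if parts.scheme or parts.netloc or parts.path.startswith("/"):
        raise ValueError("not a relative path reference: %r" % reference)
    if not parts.path:
        # A fragment-only or empty reference points at the referencing file itself.
        return source_path
    if source_path.startswith("META-INF/"):
        base_dir = ""  # META-INF files resolve against the Root Directory
    else:
        base_dir = posixpath.dirname(source_path)
    target = posixpath.normpath(posixpath.join(base_dir, unquote(parts.path)))
    if target == ".." or target.startswith("../"):
        raise ValueError("reference escapes the Root Directory: %r" % reference)
    return target
```

For instance, under these assumptions `resolve_in_container("META-INF/container.xml", "EPUB/Great_Expectations.opf")` yields `EPUB/Great_Expectations.opf`, not a path under META-INF.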

  3.4 File Names

The File Name restrictions described in this section are designed to allow Path Names and File Names to be used without modification on most commonly used operating systems. This specification does not specify how an OCF Processor that is unable to represent OCF File and Path Names would compensate for this incompatibility.

In the context of an OCF Abstract Container, File and Path Names are case sensitive and must meet all of the following criteria:

  •   File Names must be UTF-8 [Unicode] encoded.

  •   File Names must not exceed 255 bytes.

  •   The Path Name for any directory or file within the OCF Abstract Container must not exceed 65535 bytes.

  •   File Names must not use the following [Unicode] characters, as these characters might not be supported consistently across commonly-used operating systems:

    • SOLIDUS: / (U+002F)

    • QUOTATION MARK: " (U+0022)

    • ASTERISK: * (U+002A)

    • FULL STOP as the last character: . (U+002E)

    • COLON: : (U+003A)

    • LESS-THAN SIGN: < (U+003C)

    • GREATER-THAN SIGN: > (U+003E)

    • QUESTION MARK: ? (U+003F)

    • REVERSE SOLIDUS: \ (U+005C)

    • DEL (U+007F)

    • C0 range (U+0000 … U+001F)

    • C1 range (U+0080 … U+009F)

    • Private Use Area (U+E000 … U+F8FF)

    • Non characters in Arabic Presentation Forms-A (U+FDD0 … U+FDEF)

    • Specials (U+FFF0 … U+FFFF)

    • Tags and Variation Selectors Supplement (U+E0000 … U+E0FFF)

    • Supplementary Private Use Area-A (U+F0000 … U+FFFFF)

    • Supplementary Private Use Area-B (U+100000 … U+10FFFF)

  •   All File Names within the same directory must be unique following case normalization as described in section 3.13 of [Unicode].

  •   All File Names within the same directory should be unique following NFC or NFD normalization [TR15].
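The criteria above could be checked along the following lines. This is a minimal sketch, not a conformance checker: the function names are hypothetical, Python's `str.casefold()` is used as an approximation of the case normalization described in [Unicode], and the collision check folds NFC and case together even though the specification treats the former as a recommendation and the latter as a requirement.

```python
import unicodedata

# Individual characters that File Names must not use (DEL included).
FORBIDDEN_CHARS = set('/"*:<>?\\\x7f')

# Inclusive code point ranges that File Names must not use.
FORBIDDEN_RANGES = [
    (0x0000, 0x001F), (0x0080, 0x009F), (0xE000, 0xF8FF), (0xFDD0, 0xFDEF),
    (0xFFF0, 0xFFFF), (0xE0000, 0xE0FFF), (0xF0000, 0xFFFFF),
    (0x100000, 0x10FFFF),
]


def file_name_problems(name):
    """Return a list of problems found in a single File Name."""
    problems = []
    if len(name.encode("utf-8")) > 255:
        problems.append("longer than 255 bytes")
    if name.endswith("."):
        problems.append("ends with FULL STOP")
    for ch in name:
        cp = ord(ch)
        if ch in FORBIDDEN_CHARS or any(lo <= cp <= hi for lo, hi in FORBIDDEN_RANGES):
            problems.append(f"forbidden character U+{cp:04X}")
    return problems


def find_collisions(names):
    """Return pairs of File Names in one directory that collide after folding."""
    seen = {}
    clashes = []
    for name in names:
        key = unicodedata.normalize("NFC", name).casefold()
        if key in seen:
            clashes.append((seen[key], name))
        else:
            seen[key] = name
    return clashes
```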

note

Some commercial ZIP tools do not support the full Unicode range and might support only the [US-ASCII] range for File Names. Authors who want to use ZIP tools that have these restrictions might find it is best to restrict their File Names to the [US-ASCII] range. If the names of files cannot be preserved during the unzipping process, it will be necessary to compensate for any name translation which took place when the files are referenced by URI from within the content.

  3.5 META-INF Directory

  3.5.1 Inclusion

All OCF Abstract Containers must include a directory called META-INF in their Root Directory.

This directory contains the files specified in META-INF Reserved Files. Files other than the ones listed in that section may be included in the META-INF directory; OCF Processors must not fail when encountering such files.

  3.5.2 Reserved Files

  3.5.2.1 Container File (container.xml)

The required container.xml file in the META-INF directory identifies the EPUB Packages in the OCF Abstract Container.

The contents of this file must be valid to the schema in Schema for container.xml after removing all elements and attributes from other namespaces (including all attributes and contents of such elements).

Each rootfile element must identify the location of a Package Document representing one Rendition of the EPUB Publication.

An OCF Processor must consider the first rootfile element within the rootfiles element to represent the Default Rendition for the contained EPUB Publication. Reading Systems are required to present the Default Rendition, but may present other Renditions in the container.

The following example shows a sample container.xml for an EPUB Publication with the root file EPUB/My_Crazy_Life.opf (the Package Document):

<?xml version="1.0"?>
<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
    <rootfiles>
        <rootfile full-path="EPUB/My_Crazy_Life.opf"
            media-type="application/oebps-package+xml" />
    </rootfiles>
</container>
                    

The following example shows SVG and XHTML Renditions bundled in the same container:

<?xml version="1.0"?>
<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
    <rootfiles>
        <rootfile full-path="SVG/Sandman.opf"
            media-type="application/oebps-package+xml" />
        <rootfile full-path="XHTML/Sandman.opf"
            media-type="application/oebps-package+xml" />
    </rootfiles>
</container>
                    

The optional links element identifies resources necessary for the processing of the OCF ZIP Container. Each of its child link elements must include an href attribute whose value identifies the location of a resource. Each link element also must include a rel attribute whose value identifies the relationship of the resource, and may include a media-type attribute whose value must be a media type [RFC2046] that specifies the type and format of the resource referenced by the link.

The value of the rootfile element full-path attribute and the link element href attribute must contain a path component [RFC3986] which must take the form of a path-rootless [RFC3986] only. The path components are relative to the Root Directory.

OCF Processors must ignore foreign elements and attributes within a container.xml file.
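The following Python sketch shows one way an OCF Processor might read container.xml and pick the Default Rendition. The function name `rootfile_paths` is hypothetical, and rejecting a leading "/" is only a rough stand-in for the path-rootless requirement. Foreign elements and attributes are ignored simply because only elements in the container namespace are selected.

```python
import xml.etree.ElementTree as ET

NS = {"c": "urn:oasis:names:tc:opendocument:xmlns:container"}


def rootfile_paths(container_xml):
    """Return the full-path values of all rootfile elements, in document order."""
    root = ET.fromstring(container_xml)
    paths = []
    for rootfile in root.findall("c:rootfiles/c:rootfile", NS):
        path = rootfile.get("full-path")
        if not path or path.startswith("/"):
            raise ValueError("full-path must be a rootless path: %r" % path)
        paths.append(path)
    return paths


def default_rendition(container_xml):
    """Return the Package Document path of the Default Rendition."""
    paths = rootfile_paths(container_xml)
    if not paths:
        raise ValueError("container.xml has no rootfile element")
    return paths[0]  # the first rootfile element is the Default Rendition
```

Applied to the two-Rendition example above, `default_rendition` would return `SVG/Sandman.opf`.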

  3.5.2.2 Encryption File (encryption.xml)

The optional encryption.xml file in the META-INF directory holds all encryption information on the contents of the container. If any resource within the container is encrypted, encryption.xml must be present to indicate that the resource is encrypted and provide information on how it is encrypted.

This file is an XML document whose root element is encryption. The encryption element contains child elements of type EncryptedKey and EncryptedData as defined by [XML ENC Core]. An EncryptedKey element describes each encryption key used in the container, while an EncryptedData element describes each encrypted file. Each EncryptedData element refers to an EncryptedKey element, as described in XML Encryption.

The contents of the encryption.xml file must be valid to the schema in Schema for encryption.xml.

OCF encrypts individual files independently, trading off some security for improved performance, allowing the container contents to be incrementally decrypted. Encryption in this way exposes the directory structure and file naming of the whole package.

OCF uses XML Encryption [XML ENC Core] to provide a framework for encryption, allowing a variety of algorithms to be used. XML Encryption specifies a process for encrypting arbitrary data and representing the result in XML. Even though an OCF Abstract Container might contain non-XML data, XML Encryption can be used to encrypt all data in an OCF Abstract Container. OCF encryption supports only the encryption of entire files within the container, not parts of files. The encryption.xml file, if present, must not be encrypted.

Encrypted data replaces unencrypted data in an OCF Abstract Container. For example, if an image named photo.jpeg is encrypted, the contents of the photo.jpeg resource should be replaced by its encrypted contents. Within the ZIP directory, encrypted files should be stored rather than Deflate-compressed.

Note that some situations require obfuscating the storage of embedded resources referenced by a Rendition to tie them to the "parent" EPUB Publication and make them more difficult to extract for unrestricted use (e.g., fonts). Although obfuscation is not encryption, the encryption.xml file is used in conjunction with the IDPF resource obfuscation algorithm to identify resources that need to be de-obfuscated before they can be used.

The following files must not be encrypted, regardless of whether default or specific encryption is requested:

mimetype
META-INF/container.xml
META-INF/encryption.xml
META-INF/manifest.xml
META-INF/metadata.xml
META-INF/rights.xml
META-INF/signatures.xml
Package Document
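A simple check for this rule might look like the following Python sketch. The function name `wrongly_encrypted` is hypothetical; it assumes the caller already knows which Path Names are listed as encrypted and which Path Names are Package Documents.

```python
# Files that must never be encrypted, as listed above.
NEVER_ENCRYPTED = {
    "mimetype",
    "META-INF/container.xml",
    "META-INF/encryption.xml",
    "META-INF/manifest.xml",
    "META-INF/metadata.xml",
    "META-INF/rights.xml",
    "META-INF/signatures.xml",
}


def wrongly_encrypted(encrypted_paths, package_documents):
    """Return the encrypted Path Names that are not allowed to be encrypted."""
    protected = NEVER_ENCRYPTED | set(package_documents)
    return sorted(path for path in encrypted_paths if path in protected)
```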

Signed resources may subsequently be encrypted using the Decryption Transform for XML Signature [XML SIG Decrypt]. This feature enables an application such as an OCF agent to distinguish data that was encrypted before signing from data that was encrypted after signing. Only data that was encrypted after signing must be decrypted before computing the digest used to validate the signature.

In the following example, adapted from Section 2.2.1 of [XML ENC Core], the resource image.jpeg is encrypted using a symmetric key algorithm (AES) and the symmetric key is further encrypted using an asymmetric key algorithm (RSA) with the public key of John Smith.

<encryption
    xmlns ="urn:oasis:names:tc:opendocument:xmlns:container"
    xmlns:enc="http://www.w3.org/2001/04/xmlenc#"
    xmlns:ds="http://www.w3.org/2000/09/xmldsig#">
    <enc:EncryptedKey Id="EK">
        <enc:EncryptionMethod Algorithm="http://www.w3.org/2001/04/xmlenc#rsa-1_5"/>
        <ds:KeyInfo>
            <ds:KeyName>John Smith</ds:KeyName>
        </ds:KeyInfo>
        <enc:CipherData>
            <enc:CipherValue>xyzabc</enc:CipherValue>
        </enc:CipherData>
    </enc:EncryptedKey>
    <enc:EncryptedData Id="ED1">
        <enc:EncryptionMethod Algorithm="http://www.w3.org/2001/04/xmlenc#kw-aes128"/>
        <ds:KeyInfo>
            <ds:RetrievalMethod URI="#EK"
                Type="http://www.w3.org/2001/04/xmlenc#EncryptedKey"/>
        </ds:KeyInfo>
        <enc:CipherData>
            <enc:CipherReference URI="image.jpeg"/>
        </enc:CipherData>
    </enc:EncryptedData>
</encryption>
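To show how such a file might be consumed, the following Python sketch lists the resources named by EncryptedData elements together with their declared algorithms. The function name `encrypted_resources` is hypothetical, and the sketch only reads the file; it performs no decryption and no schema validation.

```python
import xml.etree.ElementTree as ET

NS = {"enc": "http://www.w3.org/2001/04/xmlenc#"}


def encrypted_resources(encryption_xml):
    """Map each CipherReference URI to its EncryptionMethod Algorithm (or None)."""
    root = ET.fromstring(encryption_xml)
    resources = {}
    for data in root.findall("enc:EncryptedData", NS):
        reference = data.find("enc:CipherData/enc:CipherReference", NS)
        if reference is None:
            continue  # no external resource is referenced by this element
        method = data.find("enc:EncryptionMethod", NS)
        algorithm = method.get("Algorithm") if method is not None else None
        resources[reference.get("URI")] = algorithm
    return resources
```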
                    

Order of Compression and Encryption

When stored in a ZIP container, streams of data with Non-Codec content types should be compressed before they are encrypted, and Deflate compression must be used. This practice ensures that file entries stored in the ZIP container have a smaller size.

Streams of data with Codec content types should not be compressed before they are encrypted. In such cases, additional compression would introduce unnecessary processing overhead at production time (especially with large resource files), and would impact audio/video playback performance at consumption time. In some cases, the combination of compression with some encryption schemes might even compromise the ability of Reading Systems to handle partial content requests (e.g. HTTP byte ranges), because the length of the full resource cannot be determined ahead of media playback (e.g. HTTP Content-Length header).

Streams of data that are compressed before they are encrypted should provide additional EncryptionProperties metadata to specify the size of the initial resource (i.e., before compression and encryption), as per the Compression XML element defined below. Streams of data that are not compressed before they are encrypted may provide the additional EncryptionProperties metadata to specify the size of the initial resource (i.e., before encryption).

Element Name

Compression

Namespace

http://www.idpf.org/2016/encryption#compression

Usage

Optional child of EncryptionProperty.

Attributes

Method [required]

Identifies the compression method used.

Value is either "0" (no compression) or "8" (Deflate algorithm).

OriginalLength [required]

Represents the size of the initial resource (number of bytes).

Value is a positive integer.

Content Model

Empty

The following example shows an MP4 file that has been Deflate compressed and whose original size was 3500000 bytes.

<encryption xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
    <enc:EncryptedData xmlns:enc="http://www.w3.org/2001/04/xmlenc#">
        ...
        <enc:CipherData>
            <enc:CipherReference URI="OEPBS/video.mp4"/>
        </enc:CipherData>
        <enc:EncryptionProperties>
            <enc:EncryptionProperty xmlns:ns="http://www.idpf.org/2016/encryption#compression">
                <ns:Compression Method="8" OriginalLength="3500000"/>
            </enc:EncryptionProperty>
        </enc:EncryptionProperties>
        ...
    </enc:EncryptedData>
</encryption>
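On the consuming side, the Compression element tells a Reading System what to do after decryption. The Python sketch below is one possible reading of that step. The function name `restore_original` is hypothetical, and it assumes that the compressed stream is a raw Deflate stream without zlib or gzip framing (as used inside ZIP entries); this specification text does not spell out the framing.

```python
import zlib


def restore_original(decrypted, method, original_length):
    """Undo the compression step for data that has already been decrypted."""
    if method == "8":
        # Assumption: raw Deflate stream, hence the negative window size.
        data = zlib.decompress(decrypted, wbits=-15)
    elif method == "0":
        data = decrypted
    else:
        raise ValueError("unsupported Compression Method: %r" % method)
    if len(data) != original_length:
        raise ValueError("size does not match OriginalLength")
    return data
```

Checking the result against OriginalLength is a cheap sanity test that the decryption and decompression steps were applied in the right order.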

  3.5.2.3 Manifest File (manifest.xml)

The optional manifest.xml file in the META-INF directory provides a manifest of files in the Container.

The OCF specification does not mandate a format for the manifest.

Note that the manifest element contained within a Package Document specifies the one and only manifest used for processing a given Rendition. Ancillary manifest information contained in the ZIP archive or in the optional manifest.xml file must not be used for processing the Rendition.

  3.5.2.4 Metadata File (metadata.xml)

The optional metadata.xml file in the META-INF directory, if present, must be used for container-level metadata.

If the metadata.xml file is present, its contents should be only namespace-qualified elements [XMLNS]. The file should contain the root element metadata in the namespace http://www.idpf.org/2013/metadata, but other root elements are allowed for backwards compatibility. Reading Systems should ignore metadata.xml files with unrecognized root elements.

This version of the OCF specification does not define metadata for use in the metadata.xml file. Container-level metadata may be defined in future versions of this specification and in IDPF-defined EPUB extension specifications.

  3.5.2.5 Rights Management File (rights.xml)

The optional rights.xml file in the META-INF directory is reserved for digital rights management (DRM) information for trusted exchange of EPUB Publications among rights holders, intermediaries, and users.

This version of the OCF specification does not require a specific format for DRM information, but a future version might. The contents of the rights.xml should be only namespace-qualified elements [XMLNS] to avoid collision with a future format.

When the rights.xml file is not present, no part of the OCF Abstract Container is rights governed at the container level. Rights expressions might still exist within the contained Renditions.

  3.5.2.6 Digital Signatures File (signatures.xml)

The optional signatures.xml file in the META-INF directory holds digital signatures for the container and its contents. The contents of this file must be valid to the schema in Schema for signatures.xml.

The root element of the signatures.xml file is the signatures element. This element contains child elements of type Signature, as defined by [XML DSIG Core]. Signatures can be applied to an EPUB Publication as a whole or to its parts, and can specify the signing of any kind of data (i.e., not just XML).

When the signatures.xml file is not present, no part of the container is digitally signed at the container level. Digital signing might exist within the EPUB Publication.

When a data signature is created for the container, the signature should be added as the last child Signature element of the signatures element.

note

Each Signature in the signatures.xml file identifies by IRI the data to which the signature applies, using the [XML DSIG Core] Manifest element and its Reference sub-elements. Individual contained files might be signed separately or together. Separately signing each file creates a digest value for the resource that can be validated independently. This approach might make a Signature element larger. If files are signed together, the set of signed files can be listed in a single XML Signature Manifest element and referenced by one or more Signature elements.

Any or all files in the container can be signed in their entirety with the exception of the signatures.xml file since that file will contain the computed signature information. Whether and how the signatures.xml file is signed depends on the objective of the signer.

If the signer wants to allow signatures to be added or removed from the container without invalidating the signer’s signature, the signatures.xml file should not be signed.

If the signer wants any addition or removal of a signature to invalidate the signer’s signature, the Enveloped Signature transform defined in Section 6.6.4 of [XML DSIG Core] can be used to sign the entire preexisting signature file excluding the Signature being created. This transform would sign all previous signatures, and the signer’s signature would become invalid if a subsequent signature were added to the package.

note

If the signer wants the removal of an existing signature to invalidate the signer’s signature, but also wants to allow the addition of signatures, an XPath transform could be used to sign just the existing signatures. The details of such a transform are outside the scope of this specification, however.

The [XML DSIG Core] specification does not associate any semantics with a signature; an agent might include semantic information, for example, by adding information to the Signature element that describes the signature. The [XML DSIG Core] specification describes how additional information can be added to a signature, such as by using the SignatureProperties element.

The following XML expression shows the content of an example signatures.xml file, and is based on the examples found in Section 2 of [XML DSIG Core]. It contains one signature, and the signature applies to two resources, EPUB/book.xhtml and EPUB/images/cover.jpeg, in the container.

<signatures xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
    <Signature Id="sig" xmlns="http://www.w3.org/2000/09/xmldsig#">
        <SignedInfo>
            <CanonicalizationMethod 
                Algorithm="http://www.w3.org/TR/2001/REC-xml-c14n-20010315"/>
            <SignatureMethod Algorithm="http://www.w3.org/2000/09/xmldsig#dsa-sha1"/>
            <Reference URI="#Manifest1">
                <DigestMethod Algorithm="http://www.w3.org/2000/09/xmldsig#sha1"/>
                <DigestValue>j6lwx3rvEPO0vKtMup4NbeVu8nk=</DigestValue>
            </Reference>
        </SignedInfo>
        <SignatureValue>…</SignatureValue>
        <KeyInfo>
            <KeyValue>
                <DSAKeyValue>
                    <P>…</P><Q>…</Q><G>…</G><Y>…</Y> 
                </DSAKeyValue>
            </KeyValue>
        </KeyInfo>
        <Object>
            <Manifest Id="Manifest1">
                <Reference URI="EPUB/book.xhtml">                    
                    <Transforms>                                                
                        <Transform
                            Algorithm="http://www.w3.org/TR/2001/REC-xml-c14n-20010315"/>                        
                    </Transforms>
                    <DigestMethod Algorithm="http://www.w3.org/2000/09/xmldsig#sha1"/>
                    <DigestValue></DigestValue>
                </Reference>
                <Reference URI="EPUB/images/cover.jpeg">
                    <Transforms>                                                
                        <Transform
                            Algorithm="http://www.w3.org/TR/2001/REC-xml-c14n-20010315"/>                        
                    </Transforms>
                    <DigestMethod Algorithm="http://www.w3.org/2000/09/xmldsig#sha1"/>
                    <DigestValue></DigestValue>
                </Reference>
            </Manifest>
        </Object>
    </Signature> 
</signatures>
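As an illustration of how the Manifest references in such a file could be enumerated, the following Python sketch maps each Signature to the resources it lists. The function name `signed_resources` is hypothetical, and the sketch does not validate any digest or signature value.

```python
import xml.etree.ElementTree as ET

NS = {"ds": "http://www.w3.org/2000/09/xmldsig#"}


def signed_resources(signatures_xml):
    """Map each Signature Id to the Reference URIs listed in its Manifest elements."""
    root = ET.fromstring(signatures_xml)
    result = {}
    for signature in root.findall("ds:Signature", NS):
        references = signature.findall("ds:Object/ds:Manifest/ds:Reference", NS)
        result[signature.get("Id")] = [ref.get("URI") for ref in references]
    return result
```

For the example above, the result would associate the signature `sig` with `EPUB/book.xhtml` and `EPUB/images/cover.jpeg`.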
                    

  4 OCF ZIP Container

  4.1 Introduction

This section is informative

An OCF ZIP Container is a physical single-file manifestation of an OCF Abstract Container. The Container is used:

  • to exchange in-progress EPUB Publications between different individuals and/or different organizations;

  • to provide EPUB Publications from a publisher or conversion house to the distribution or sales channel; and

  • to deliver EPUB Publications to EPUB Reading Systems or users.

  4.2 ZIP File Requirements

An OCF ZIP Container uses the ZIP format as specified by [ZIP APPNOTE 6.3.3], but with the following constraints and clarifications:

  •   The contents of the OCF ZIP Container must be a conforming OCF Abstract Container.

  •   OCF ZIP Containers must not use the features in the ZIP application note [ZIP APPNOTE 6.3.3] that allow ZIP files to be split across multiple storage media. OCF Processors must treat any OCF files that specify that the ZIP file is split across multiple storage media as being in error.

  •   OCF ZIP Containers must include only stored (uncompressed) and Deflate-compressed ZIP entries within the ZIP archive. OCF Processors must treat any OCF Containers that use compression techniques other than Deflate as being in error.

  •   OCF ZIP Containers may use the ZIP64 extensions defined as "Version 1" in section V, subsection G of the application note [ZIP APPNOTE 6.3.3] and should use only those extensions when the content requires them. OCF Processors must support the ZIP64 extensions defined as "Version 1".

  •   OCF ZIP Containers must not use the encryption features defined by the ZIP format; instead, encryption must be done using the features described in Encryption File (encryption.xml). OCF Processors must treat any OCF ZIP Containers that use ZIP encryption features as being in error.

  •   It is not a requirement that OCF Processors preserve information from an OCF ZIP Container through load and save operations that is not defined within the OCF Abstract Container; in particular, an OCF Processor does not have to preserve CRC values, comment fields or fields that hold file system information corresponding to a particular operating system (e.g., External file attributes and Extra field).

  •   OCF ZIP Containers must encode File System Names using UTF-8 [Unicode].

The following constraints apply to particular fields in the OCF ZIP Container archive:

  •   In the local file header table, OCF ZIP Containers must set the version needed to extract fields to the values 10, 20 or 45 in order to match the maximum version level needed by the given file (e.g., 20 if Deflate is needed, 45 if ZIP64 is needed). OCF Processors must treat any other values as being in error.

  •   In the local file header table, OCF ZIP Containers must set the compression method field to the values 0 or 8. OCF Processors must treat any other values as being in error.

  •   OCF Processors must treat OCF ZIP Containers with an Archive decryption header or an Archive extra data record as being in error.
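The field constraints above can be sketched as a simple check. This is a minimal illustration, not a conformance test: the function name check_entries is hypothetical, and Python's zipfile module reports these values from the central directory rather than from the local file header, so a strict checker would need to parse the local headers itself.

```python
import zipfile

ALLOWED_VERSIONS = {10, 20, 45}  # permitted "version needed to extract" values
ALLOWED_METHODS = {zipfile.ZIP_STORED, zipfile.ZIP_DEFLATED}  # methods 0 and 8
ENCRYPTED_FLAG = 0x1  # general purpose bit 0 marks a ZIP-encrypted entry


def check_entries(infos):
    """Return (filename, problem) tuples for entries that break the constraints."""
    problems = []
    for info in infos:
        if info.extract_version not in ALLOWED_VERSIONS:
            problems.append(
                (info.filename, f"version needed to extract is {info.extract_version}")
            )
        if info.compress_type not in ALLOWED_METHODS:
            problems.append(
                (info.filename, f"compression method is {info.compress_type}")
            )
        if info.flag_bits & ENCRYPTED_FLAG:
            problems.append((info.filename, "entry uses ZIP encryption"))
    return problems
```

For a container on disk, the entries could be obtained with zipfile.ZipFile(path).infolist(), where path is whatever file is being inspected.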

  4.3 OCF ZIP Container Media Type Identification

The first file in the OCF ZIP Container must be the mimetype file. The contents of this file must be the MIME media type [RFC2046] string application/epub+zip encoded in US-ASCII [US-ASCII].

The contents of the mimetype file must not contain any leading padding or white space, must not begin with the Unicode signature (or Byte Order Mark), and the case of the media type string must be exactly as presented above. The mimetype file additionally must not be compressed or encrypted, and there must not be an extra field in its ZIP header.

note

Refer to Appendix C, The application/epub+zip Media Type for further information about the application/epub+zip media type.
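As an illustration of the mimetype requirements, the following sketch writes the mimetype file as the first, uncompressed entry and then checks the byte offsets that Appendix C lists as magic numbers. The function names are hypothetical, and the sketch assumes that Python's zipfile module emits no extra field for a default ZipInfo entry written to a seekable stream.

```python
import io
import zipfile

MEDIA_TYPE = b"application/epub+zip"


def start_container(stream):
    """Write a ZIP archive whose first entry is the stored mimetype file."""
    with zipfile.ZipFile(stream, "w") as zf:
        # A default ZipInfo carries an empty extra field; ZIP_STORED keeps
        # the entry uncompressed.
        zf.writestr(
            zipfile.ZipInfo("mimetype"), MEDIA_TYPE, compress_type=zipfile.ZIP_STORED
        )
        # The remaining container files would be added here.


def looks_like_ocf(data):
    """Check the offsets 0, 30 and 38 given as magic numbers in Appendix C."""
    return (
        data[0:4] == b"PK\x03\x04"
        and data[30:38] == b"mimetype"
        and data[38:58] == MEDIA_TYPE
    )


buffer = io.BytesIO()
start_container(buffer)
container_bytes = buffer.getvalue()
```

The offset check only works because the file name is eight bytes long and the extra field is empty, which places the media type string immediately after the 30-byte local file header and the name.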

  5 Resource Obfuscation

  5.1 Introduction

This section is informative

Since an OCF ZIP Container is fundamentally a ZIP file, commonly available ZIP tools can be used to extract any unencrypted content stream from the package. Moreover, the nature of ZIP files means that their contents might appear like any other native container on some systems (e.g., a folder).

While this simplicity of ZIP files is quite useful, it also poses a problem when easy extraction of unencrypted resources is not desired. An Author who wishes to include a third-party font, for example, typically does not want that font extracted and re-used by others. More critically, many commercial fonts allow embedding, but embedding a font implies making it an integral part of the EPUB Publication, not just providing the original font file along with the content.

Since integrated ZIP support is so ubiquitous in modern operating systems, simply placing a font in the ZIP archive is insufficient to signify that it is not intended to be reused in other contexts. This uncertainty can undermine the otherwise very useful font embedding capability of EPUB Publications.

In order to discourage reuse of the font, some font vendors might only allow use of their fonts in EPUB Publications if those fonts are bound in some way to the EPUB Publication. That is, the font file cannot be installed directly for use on an operating system with the built-in tools of that computing device, and it cannot be used directly by other EPUB Publications.

It is beyond the scope of this specification to provide a digital rights management or enforcement system for such resources. This section instead defines a method of obfuscation that will require additional work on the part of the final OCF recipient to gain general access to any obfuscated resources.

Note that no claim is made in this specification, or by the IDPF, that this constitutes encryption, nor does it guarantee that the resource will be secure from copyright infringement. It is the hope of the IDPF, however, that this algorithm will meet the requirements of most vendors who require some assurance that their resources cannot simply be extracted by unzipping the Container.

In the case of fonts, the primary use case for obfuscation, the defined mechanism will simply provide a stumbling block for those who are unaware of the license details. It will not prevent a determined user from gaining full access to the font. Given an OCF Container, it is possible to apply the algorithms defined to extract the raw font file. Whether this method of obfuscation satisfies the requirements of individual font licenses remains a question for the licensor and licensee.

  5.2 Obfuscation Key

The key used in the obfuscation algorithm is derived from the Unique Identifier of the Default Rendition.

All white space characters, as defined in section 2.3 of the XML 1.0 specification [XML], must be removed from this identifier — specifically, the Unicode code points U+0020, U+0009, U+000D and U+000A.

A SHA-1 digest of the UTF-8 representation of the resulting string should be generated as specified by the Secure Hash Standard [SHA-1]. This digest is then directly used as the key for the algorithm.
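The key derivation described above might be sketched as follows. The function name obfuscation_key is hypothetical, and the caller is assumed to have already read the Unique Identifier string from the Package Document.

```python
import hashlib

# The four white space code points named above: space, tab, CR, LF.
XML_WHITESPACE = "\u0020\u0009\u000D\u000A"


def obfuscation_key(unique_identifier: str) -> bytes:
    """Strip XML white space, then return the 20-byte SHA-1 digest of the UTF-8 form."""
    stripped = unique_identifier.translate({ord(c): None for c in XML_WHITESPACE})
    return hashlib.sha1(stripped.encode("utf-8")).digest()
```

Because white space is removed from anywhere in the identifier, not only from its ends, a plain strip() call would not be sufficient here.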

  5.3 Obfuscation Algorithm

The algorithm employed to obfuscate a resource consists of modifying the first 1040 bytes (~1KB) of the file. In the unlikely event that the file is smaller than 1040 bytes, the entire file will be modified.

To obfuscate the original data, the result of performing a logical exclusive or (XOR) on the first byte of the raw file and the first byte of the obfuscation key is stored as the first byte of the embedded resource.

This process is repeated with the next byte of source and key, and continues until all bytes in the key have been used. At this point, the process continues starting with the first byte of the key and 21st byte of the source. Once 1040 bytes have been encoded in this way (or the end of the source is reached), any remaining data in the source is directly copied to the destination.

Obfuscation of resources must occur before they are compressed and added to the OCF Container. Note that as obfuscation is not encryption, this requirement is not a violation of the one in Encryption File (encryption.xml) to compress resources before encrypting them.

The following pseudo-code exemplifies the obfuscation algorithm.

set ocf to OCF container file
set source to file
set destination to obfuscated file
set keyData to key for file
set outer to 0
while outer < 52 and not (source at EOF)
    set inner to 0
    while inner < 20 and not (source at EOF)
        read 1 byte from source     //Assumes read advances file position
        set sourceByte to result of read
        set keyByte to byte inner of keyData
        set obfuscatedByte to (sourceByte XOR keyByte)
        write obfuscatedByte to destination
        increment inner
    end while
    increment outer
end while
if not (source at EOF) then
    read source to EOF
    write result of read to destination
end if
Deflate destination
store destination as source in ocf
            

To get the original font data back, the process is simply reversed: the source file becomes the obfuscated data and the destination file will contain the raw data.
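For comparison with the pseudo-code, here is a minimal in-memory sketch of the same XOR step. It assumes the caller supplies the raw resource bytes and the 20-byte key, and it leaves compression and storage in the container to other code. The name obfuscate is hypothetical.

```python
OBFUSCATED_LENGTH = 1040  # 52 passes over a 20-byte key


def obfuscate(data: bytes, key: bytes) -> bytes:
    """XOR the first 1040 bytes with the repeating key; copy the rest unchanged."""
    head = bytes(
        byte ^ key[index % len(key)]
        for index, byte in enumerate(data[:OBFUSCATED_LENGTH])
    )
    return head + data[OBFUSCATED_LENGTH:]
```

Since XOR is its own inverse, applying the same function with the same key to obfuscated data returns the original bytes.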

note

The obfuscation of fonts was allowed prior to EPUB 3.0.1, but the order of obfuscation and compression was not specified. As a result, invalid fonts might be encountered after decompression and de-obfuscation. In such instances, de-obfuscating the data before inflating it might return a valid font. This specification does not require support for this method of retrieval, as it is not compliant with this version of this specification, but it needs to be considered when supporting EPUB 3 content generally.

  5.4 Specifying Obfuscated Resources

Although not technically encrypted data, all obfuscated resources must have an entry in the encryption.xml file accompanying the EPUB Publication (see Encryption File (encryption.xml)).

An EncryptedData element must be included for each obfuscated resource. Each must include a child EncryptionMethod element whose Algorithm attribute is set to the value http://www.idpf.org/2008/embedding. The presence of this attribute value signals the use of the algorithm described in this specification. The path to the obfuscated resource must be listed in the CipherReference child of the CipherData element.

The following example shows an entry for an obfuscated font in the encryption.xml file.

<encryption 
    xmlns="urn:oasis:names:tc:opendocument:xmlns:container" 
    xmlns:enc="http://www.w3.org/2001/04/xmlenc#">
    <enc:EncryptedData> 
        <enc:EncryptionMethod Algorithm="http://www.idpf.org/2008/embedding"/> 
        <enc:CipherData> 
            <enc:CipherReference URI="EPUB/Fonts/BKANT.TTF"/>  
        </enc:CipherData> 
    </enc:EncryptedData>  
</encryption> 

                

To prevent trivial copying of the embedded resource to other EPUB Publications, the obfuscation key must not be provided in the encryption.xml file.
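A Reading System needs to find which resources are marked as obfuscated before it can reverse the algorithm. The following sketch shows one way to do that with Python's standard XML parser. The function name obfuscated_resources is hypothetical, and the sketch assumes the encryption.xml content has already been read from the container.

```python
import xml.etree.ElementTree as ET

NAMESPACES = {"enc": "http://www.w3.org/2001/04/xmlenc#"}
OBFUSCATION_ALGORITHM = "http://www.idpf.org/2008/embedding"


def obfuscated_resources(encryption_xml):
    """Return the CipherReference URIs of entries that use the obfuscation algorithm."""
    root = ET.fromstring(encryption_xml)
    paths = []
    for data in root.findall("enc:EncryptedData", NAMESPACES):
        method = data.find("enc:EncryptionMethod", NAMESPACES)
        reference = data.find("enc:CipherData/enc:CipherReference", NAMESPACES)
        if method is None or reference is None:
            continue
        if method.get("Algorithm") == OBFUSCATION_ALGORITHM:
            paths.append(reference.get("URI"))
    return paths
```

Entries that use other algorithms, such as real encryption, are skipped so that they can be handled separately.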

  Appendix A. Schemas

  A.1 Schema for container.xml

The schema for container.xml files is available at http://www.idpf.org/epub/31/schema/ocf-container-31.rnc.

Validation using this schema requires a processor that supports [RelaxNG] and [XSD-DATATYPES].

  A.2 Schema for encryption.xml

The schema for encryption.xml files is included in [XML Sec RNG Schemas].

  A.3 Schema for signatures.xml

The schema for signatures.xml files is included in [XML Sec RNG Schemas].

  Appendix B. Example

The following example demonstrates the use of this OCF format to contain a signed and encrypted EPUB Publication within an OCF ZIP Container.

  
Ordered list of files in the OCF ZIP Container
mimetype
META-INF/container.xml
META-INF/signatures.xml
META-INF/encryption.xml
EPUB/As_You_Like_It.opf
EPUB/book.html
EPUB/nav.html
EPUB/images/cover.png
  
The contents of the mimetype file
application/epub+zip
  
The contents of the META-INF/container.xml file
<?xml version="1.0"?>
<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
    <rootfiles>
        <rootfile full-path="EPUB/As_You_Like_It.opf"
            media-type="application/oebps-package+xml" />
    </rootfiles>
</container>
  
The contents of the META-INF/signatures.xml file
<signatures xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
    <Signature Id="AsYouLikeItSignature" xmlns="http://www.w3.org/2000/09/xmldsig#">
        
        <!-- 
             SignedInfo is the information that is actually signed.
             In this case the SHA1 algorithm is used to sign the 
             canonical form of the XML documents enumerated in the
             Object element below
        -->
        <SignedInfo>
            <CanonicalizationMethod Algorithm="http://www.w3.org/TR/2001/REC-xml-c14n-20010315"/>
            <SignatureMethod Algorithm="http://www.w3.org/2000/09/xmldsig#dsa-sha1"/>
            <Reference URI="#AsYouLikeIt">
                <DigestMethod Algorithm="http://www.w3.org/2000/09/xmldsig#sha1"/>
                <DigestValue>…</DigestValue>
            </Reference>
        </SignedInfo>
        
        <!-- 
             The signed value of the digest above using the DSA 
             algorithm
        -->
        <SignatureValue>…</SignatureValue>
        
        <!-- The key to use to validate the signature -->
        <KeyInfo>
            <KeyValue>
                <DSAKeyValue>
                    <P>…</P>
                    <Q>…</Q>
                    <G>…</G>
                    <Y>…</Y>
                </DSAKeyValue>
            </KeyValue>
        </KeyInfo>
        
        <!-- 
             The list of documents to sign. Note that the canonical
             form of XML documents is signed while the binary form
             of the other documents is used
        -->
        <Object>
            <Manifest Id="AsYouLikeIt">
                <Reference URI="EPUB/As_You_Like_It.opf">
                    <Transforms>
                        <Transform Algorithm="http://www.w3.org/TR/2001/REC-xml-c14n-20010315"/>
                    </Transforms>
                    <DigestMethod Algorithm="http://www.w3.org/2000/09/xmldsig#sha1"/>
                    <DigestValue></DigestValue>
                </Reference>
                <Reference URI="EPUB/book.html">
                    <Transforms>
                        <Transform Algorithm="http://www.w3.org/TR/2001/REC-xml-c14n-20010315"/>
                    </Transforms>
                    <DigestMethod Algorithm="http://www.w3.org/2000/09/xmldsig#sha1"/>
                    <DigestValue></DigestValue>
                </Reference>
                <Reference URI="EPUB/images/cover.png">
                    <DigestMethod Algorithm="http://www.w3.org/2000/09/xmldsig#sha1"/>
                    <DigestValue></DigestValue>
                </Reference>
            </Manifest>
        </Object>        
    </Signature>
</signatures>
  
The contents of the META-INF/encryption.xml file
<?xml version="1.0"?>
<encryption xmlns="urn:oasis:names:tc:opendocument:xmlns:container"
    xmlns:enc="http://www.w3.org/2001/04/xmlenc#" xmlns:ds="http://www.w3.org/2000/09/xmldsig#">

    <!--
         The RSA encrypted AES-128 symmetric key used to encrypt
         the data
    -->
    <enc:EncryptedKey Id="EK">
        <enc:EncryptionMethod Algorithm="http://www.w3.org/2001/04/xmlenc#rsa-1_5"/>
        <ds:KeyInfo>
            <ds:KeyName>John Smith</ds:KeyName>
        </ds:KeyInfo>
        <enc:CipherData>
            <enc:CipherValue>xyzabc…</enc:CipherValue>
        </enc:CipherData>
    </enc:EncryptedKey>

    <!-- 
         Each EncryptedData block identifies a single document
         that has been encrypted using the AES-128 algorithm.
         The data remains stored in its encrypted form in the
         original file within the container.
    -->
    <enc:EncryptedData Id="ED1">
        <enc:EncryptionMethod Algorithm="http://www.w3.org/2001/04/xmlenc#kw-aes128"/>
        <ds:KeyInfo>
            <ds:RetrievalMethod URI="#EK" Type="http://www.w3.org/2001/04/xmlenc#EncryptedKey"/>
        </ds:KeyInfo>
        <enc:CipherData>
            <enc:CipherReference URI="EPUB/book.html"/>
        </enc:CipherData>
    </enc:EncryptedData>

    <enc:EncryptedData Id="ED2">
        <enc:EncryptionMethod Algorithm="http://www.w3.org/2001/04/xmlenc#kw-aes128"/>
        <ds:KeyInfo>
            <ds:RetrievalMethod URI="#EK" Type="http://www.w3.org/2001/04/xmlenc#EncryptedKey"/>
        </ds:KeyInfo>
        <enc:CipherData>
            <enc:CipherReference URI="EPUB/images/cover.png"/>
        </enc:CipherData>
    </enc:EncryptedData>

</encryption>
  
The contents of the EPUB/As_You_Like_It.opf file
<?xml version="1.0"?>
<package version="3.1" 
         xml:lang="en"
         xmlns="http://www.idpf.org/2007/opf" 
         unique-identifier="pub-id">
    
    <metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
        <dc:identifier 
              id="pub-id">urn:uuid:B9B412F2-CAAD-4A44-B91F-A375068478A0</dc:identifier>
        
        <dc:language>en</dc:language>
        
        <dc:title>As You Like It</dc:title>
        
        <dc:creator id="creator">William Shakespeare</dc:creator>
        
        <meta property="dcterms:modified">2000-03-24T00:00:00Z</meta>
        
        <dc:publisher>Project Gutenberg</dc:publisher>
        
        <dc:date>2000-03-24</dc:date>
        
        <meta property="dcterms:dateCopyrighted">9999-01-01</meta>
        
        <dc:identifier 
              id="isbn13">urn:isbn:9780741014559</dc:identifier>
        
        <dc:identifier id="isbn10">0-7410-1455-6</dc:identifier>
        
        <link rel="xml-signature" 
              href="../META-INF/signatures.xml#AsYouLikeItSignature"/>
    </metadata>
    
    <manifest>
        <item id="r4915" 
              href="book.html" 
              media-type="application/xhtml+xml"/>
        <item id="r7184" 
              href="images/cover.png" 
              media-type="image/png"/>
        <item id="nav" 
              href="nav.html" 
              media-type="application/xhtml+xml" 
              properties="nav"/>
    </manifest>
    
    <spine>
        <itemref idref="r4915"/>
    </spine>
</package>

  Appendix C. The application/epub+zip Media Type

This appendix registers the media type application/epub+zip for the EPUB Open Container Format (OCF).

An OCF file is a container technology based on the ZIP archive format. It is used to encapsulate the Renditions of EPUB Publications. OCF and its related standards are maintained and defined by the International Digital Publishing Forum (IDPF).

MIME media type name:

application

MIME subtype name:

epub+zip

Required parameters:

None.

Optional parameters:

None.

Encoding considerations:

OCF files are binary files in ZIP (http://www.iana.org/assignments/media-types/application/zip) format.

Security considerations:

All processors that read OCF files should rigorously check the size and validity of data retrieved.

In addition, because of the various content types that can be embedded in OCF files, it is possible that application/epub+zip may describe content that has security implications beyond those described here. However, only in the case where the processor recognizes and processes the additional content, or where further processing of that content is dispatched to other processors, would security issues potentially arise. And in that case, they would fall outside the domain of this registration document.

Security considerations that apply to application/zip also apply to OCF files.

Interoperability considerations:

None.

Published specification:

This media type registration is for the EPUB Open Container Format (OCF), as described by the EPUB Open Container Format (OCF) 3.0 specification located at http://www.idpf.org/epub/30/spec/epub30-ocf.html.

The EPUB OCF 3.0 specification supersedes the Open Container Format 2.0.1 specification, which is located at http://www.idpf.org/doc_library/epub/OCF_2.0.1_draft.doc and which also uses the application/epub+zip media type.

Applications which use this media type:

This media type is in wide use for the distribution of ebooks in the EPUB format. The following list of applications is not exhaustive.

  • Adobe Digital Editions

  • Aldiko

  • Azardi

  • Apple iBooks

  • Barnes & Noble Nook

  • Calibre

  • Google Books

  • Ibis Reader

  • MobiPocket reader

  • Sony Reader

  • Stanza

Additional information:
Magic number(s):

0: PK 0x03 0x04, 30: mimetype, 38: application/epub+zip

File extension(s):

OCF files are most often identified with the extension .epub.

Macintosh File Type Code(s):

ZIP

Fragment Identifiers:

The IDPF maintains a registry of linking schemes at http://www.idpf.org/epub/linking/. Some of these schemes define custom fragment identifiers that resolve to application/epub+zip and application/oebps-package+xml documents.

Person & email address to contact for further information:

William McCoy, [email protected]

Intended usage:

COMMON

Author/Change controller:

International Digital Publishing Forum (http://www.idpf.org)

  Acknowledgements and Contributors

This section is informative

EPUB has been developed by the International Digital Publishing Forum in a cooperative effort, bringing together publishers, vendors, software developers, and experts in the relevant standards.

The EPUB 3.1 specifications were prepared by the International Digital Publishing Forum’s EPUB Maintenance Working Group, operating under a charter approved by the membership in July 2015, under the leadership of:

Active members of the working group included:

IDPF Members

Invited Experts/Observers

For more detailed acknowledgements and information about contributors to each version of EPUB, refer to Acknowledgements and Contributors [EPUB3 Overview].

  References

Normative References

[CSS Snapshot] CSS Snapshot.

[EPUB 3.1] EPUB 3.1.

[HTML] HTML.

[RFC2046] Multipurpose Internet Mail Extensions (MIME) Part Two: Media Types (RFC 2046). N. Freed, N. Borenstein. November 1996.

[RFC3986] Uniform Resource Identifier (URI): Generic Syntax (RFC 3986). Berners-Lee, et al. January 2005.

[RFC3987] Internationalized Resource Identifiers (IRIs) (RFC 3987). M Duerst, et al. January 2005.

[US-ASCII] "Coded Character Set - 7-bit American Standard Code for Information Interchange", ANSI X3.4, 1986.

[Unicode] The Unicode Consortium. The Unicode Standard.

[XML] Extensible Markup Language (XML) 1.0 (Fifth Edition). T. Bray, et al. 26 November 2008.

[XML DSIG Core] XML-Signature Syntax and Processing Version 1.1. M. Bartel, et al. 11 April 2013.

[XML ENC Core] XML Encryption Syntax and Processing Version 1.1. D. Eastlake, et al. 11 April 2013.

[XML SIG Decrypt] Decryption Transform for XML Signature. M. Hughes, et al. 10 December 2002.

[XML Sec RNG Schemas] XML Security RELAX NG Schemas (W3C Working Group Note). Makoto Murata, et al. 11 April 2013.

[XMLNS] Namespaces in XML (Third Edition). T. Bray, et al. 8 December 2009.

[XSD-DATATYPES] XML Schema Part 2: Datatypes Second Edition. Paul V. Biron et al. 28 October 2004.

[ZIP APPNOTE 6.3.3] ZIP File Format Specification. September 1, 2012. PKWARE, Inc.

Informative References

[EPUB3 Overview] EPUB 3.1 Overview.