EPUB Lightweight Content Protection: Use Cases & Requirements

Prepared for IDPF by Bill Rosenblatt, GiantSteps Media Technology Strategies (edited by IDPF mgmt.)
Publication Date: May 18, 2012

Introduction

This document is an initial statement of requirements for a potential “lightweight content protection” technology for EPUB®3 (“EPUB LCP”) for review and comment by IDPF members and other interested parties. For additional background and information on the review process, see: idpf.org/lcp_draft_reqs_announce.

Why Consider “Lightweight Content Protection” for EPUB?

Digital Rights Management (DRM) has been used on some eBook files since the beginning of the commercial eBook market during the first Internet bubble of the late 1990s. Although definitions of DRM vary somewhat, GiantSteps defines it (in a nutshell) as technology that encrypts content before distribution and requires users to employ special-purpose hardware or software to decrypt and view it. Thus, we distinguish DRM from other technologies for addressing unauthorized use of content (“piracy”), such as requiring users to log in to websites with usernames and passwords or “watermarking” files by inserting invisible data into them that denotes the user’s identity.

The EPUB standard has become the broadly adopted standard interchange and distribution format for eBooks and other digital publications. EPUB 2 defined the building blocks of file encryption, but not in a fully-specified way that guarantees interoperability among encryption solutions.

Most major publishers continue to require DRM in their eBook distribution agreements, and eBook retailers have used DRM to promote “lock-in” to their platforms. The lack of a standard DRM has led to fragmentation in the market, wherein different retailers use non-interoperable DRM schemes that are tied in with eBook reader devices or apps.

There is some question about whether DRM actually prevents piracy. DRM has various effects on the eBook market and the digital reading experience, but it is extremely difficult to quantify them; in fact, a 2010 U.S. Government Accountability Office study suggested that such quantification may never be possible.[1] However, in GiantSteps’ experience, two factors are axiomatic:

DRM does discourage “oversharing” (e.g. putting files on a website for anyone to download). Although DRMs will be and are cracked, breaking DRM acts as a “speed bump” that less determined users may judge not to be worth the trouble.
Cracking DRM is illegal in many countries, as discussed in more detail below, which puts a damper on the ease of use and availability of cracks to the general public.

In addition to these general practical facts about DRM, it is a goal for EPUB to be a reflowable (i.e., more cross-media-friendly), accessible, and Web-standards-based alternative to PDF for certain types of documents, such as STM publications. Some commercial and ad hoc document distribution workflows require encryption, so EPUB needs it in order to be an attractive alternative to PDF, which specifies a simple password-based encryption (which could be considered a form of lightweight DRM).

Meanwhile, attitudes towards DRM have shifted recently as the rapid rise of e-reading popularity has drawn more attention to it. There is a growing recognition among publishers that DRM has aspects that work against their interests, including its lack of user-friendliness and eBook distributors’ use of the technology to “lock in” consumers. (We describe these and related factors below.) Certain segments of the publishing industry are starting to conclude that they might be better off without DRM, while others insist on keeping or even strengthening it.

Yet a number of crucial points weigh in favor of adopting a standard “lightweight” DRM for EPUB now:

1. A standard method of protecting eBook content that becomes broadly adopted would materially increase interoperability, ameliorate some of the ease-of-use limitations in current DRMs, and possibly promote broader adoption of digital reading.
2. As explained below, it is possible to define a DRM that eases other usage restrictions, for example by permitting a reasonable level of sharing.
3. DRM is more widely agreed to be useful in non-retail business models, such as library e-lending; it would be useful to have a standard DRM to use in these scenarios for reasons stated in #1 above.
4. A lightweight encryption similar to PDF’s (which is not necessarily considered “DRM”) would promote content authors’ ability to protect EPUB content against unauthorized uses, easing the transition from PDF to EPUB in some ad hoc / enterprise document distribution scenarios.

What Is the Recommended Process For Defining “Lightweight Content Protection” for EPUB?

Because DRM technology is not well suited to development in typical working groups, GiantSteps recommends that IDPF solicit contributions of existing technology that could become the basis of a market-relevant solution for LCP within the next 12 calendar months or less. This draft requirements document is being distributed for member comment; in particular, to elicit additional use cases and requirements, and explicit feedback from the membership on the merits and priority of the potential activity.

After the review period, the IDPF could publish a modified version of this document as an RFP (Request for Proposal), which would solicit proposed contributions of technology (on a gratis or licensed basis). Submissions would be assessed against the requirements, and the most suitable submission selected to become the basis of LCP for EPUB. Should no suitable submission be forthcoming, IDPF could consider other alternatives, including developing an LCP for EPUB that is not based on any pre-existing technology.

The resulting EPUB LCP specifications, implementation, and related information would likely be published under licensing regimes. That is, content distributors and reading system suppliers would need to execute separate agreements with IDPF to obtain permission to use EPUB LCP and access to the specifications and reference implementation(s). Use of the technology would be expected to be charged on a cost recovery basis.

What Is “Lightweight Content Protection”?

After about fifteen years of experience with commercial DRM technologies, GiantSteps has come to understand the elements of DRM that make it “heavy” or “light.” There are three interrelated criteria to consider:

Implementation: lightweight DRMs are lower in implementation cost than heavyweight solutions, which particularly means per-client costs because they are the ones that affect makers of client hardware and software (EPUB readers in our case). Factors that contribute to client costs include:
1. Use of memory and processor power.
2. Requirements for secure hardware (e.g. secure memory for encryption keys) or software robustness (e.g. certain key obfuscation techniques that require lots of memory or processor power, or slow execution).
3. Complexity of client-server (or client-server-client) interactions.
User Experience:[2] lightweight DRMs are simpler to use and cause less end-user confusion. The FairPlay DRM for Apple iTunes is often cited as a good example of a DRM with a decent user experience.[3] Elements of good user experience include:
1. Does not get in the way of users’ expected interactions with the content.
2. Works seamlessly in the absence of a network connection – including permanent absence if the distributor ceases operations.
Intrusion: Lightweight DRMs should be minimally intrusive to users. Intrusiveness generally falls into one or more of these categories:
1. Imposes excessive restrictions on user behavior, such as prohibiting uses that could well be permissible under copyright law.
2. Impinges on users’ privacy, e.g. through excessive reporting of users’ actions to a server without an opt-out provision or even proper notification. An example of this was SunnComm’s copy protection for audio CDs, which “phoned home” even when the user opted out.
3. Jeopardizes the security or integrity of the user’s device. An example of this was First4Internet’s XCP copy protection technology for audio CDs, which was known to make changes in users’ PCs that rendered them susceptible to viruses.

A key objective in providing “some level of protection” is to take advantage of anticircumvention law, which is enacted in many countries including signatories to the Anti-Counterfeiting Trade Agreement (ACTA). The anticircumvention law in the United States, for example, is 17 USC § 1201(a), known colloquially as “DMCA” or “DMCA 1201” owing to its enactment as part of the Digital Millennium Copyright Act of 1998.[4]

Anticircumvention law makes it a criminal offense to circumvent an “effective technical protection measure.”[5] The law does a poor job of defining this term. While courts have refused to set a bar for “effectiveness” such that any technology below the bar is not entitled to protection under the law,[6] there is some evidence to suggest that a technology that is particularly ineffective could face such a challenge.[7] (Think of it this way: if there were a law criminalizing breaking locks on doors, and someone marketed a “lock” consisting of a piece of tape with “door lock” written on it, would the law apply to that product?)

Therefore, a lightweight DRM should be a technology that clearly enjoys protection under anticircumvention law. Technologies such as “watermarking” (inserting data about the user, retailer, and/or content into the file) do not qualify. If a technology is protected under anticircumvention law, then it’s illegal to distribute or use cracking tools for that technology. To be very clear on this point: we expect that a lightweight DRM (in reality, any DRM) will be cracked, and we are relying on anticircumvention law for some level of crack protection.

The worst kind of crack (from a security perspective) is one that we call a “one-click crack”: a crack that nontechnical users can use easily and that works quickly and permanently on any of one’s content files.[8] The DeCSS crack for DVDs is the classic example of a one-click crack. Put another way, a one-click crack is equivalent in user effort to a format conversion program, such as those available in apps like Calibre (for publications) and iTunes (for music files). Yet even one-click cracks are covered by anticircumvention law, while format conversion programs are not. Therefore, even a lightweight DRM that is susceptible to one-click cracks has more legal protection than, say, a watermark, whose removal tools fall outside anticircumvention law.

What Is a “Heavyweight” DRM?

In contrast to the above criteria for Lightweight DRM, it may be useful to provide some specific information on features and technologies found in “heavyweight” DRMs. Such attributes can be broken down into two categories:

Security enhancements: technologies that strengthen the DRM’s security, crack resistance, and crack recoverability:
1. Robustness Rules: these are sets of technical requirements attached to licensing agreements for DRM technologies. They specify in great detail the steps that implementers must take to make the DRM as robust (crack-proof) as possible. They are particularly relevant for all-software implementations (i.e. those that do not rely on hardware security). Robustness Rules are generally designed to help ensure that a DRM can only be broken with “professional” tools and techniques, as opposed to the “one-click” methods described above. For software implementations, Robustness Rules tend to dictate the use of sophisticated key-protection techniques that are complex and expensive to implement.[9] Most modern audiovisual-oriented DRMs, including Microsoft PlayReady, Intertrust Marlin, and Open Mobile Alliance (OMA) DRM 2.x, have Robustness Rules as part of their licensing agreements.
2. Execution monitoring: the ability to monitor behavior on the client device or app for suspicious activity, such as the presence of cracker-installed code that steals keys or the use of debugging tools to reverse-engineer client functionality.
3. Recoverability: the ability to recover from or limit the impact of cracks once they are detected. This includes capabilities such as key revocation and field-upgradable protection. With key revocation, the server will publish a list of revoked keys to all clients on a periodic basis. Clients must check keys against these lists before letting them be used. Field-upgradable protection is the ability to change the DRM client code on the fly from the server, perhaps by changing variables in a code diversity scheme so that it works differently and a crack no longer applies.
Business model enablers: technologies that enable content business models which are not possible with lightweight DRM (by itself). Examples include:
1. Domain authentication: Allowing a single user account to apply to multiple devices. Technologies that allow this include Adobe Content Server and UltraViolet. The problem is that this requires a server to communicate with devices continuously, which makes it impossible to use devices offline seamlessly.
2. License chaining: enabling licenses to be grouped together for purposes such as subscription models.
3. Master-slave schemes: downloading protected content onto one device (the “master”) and transferring it to other devices (“slaves”). This is one way to achieve interoperability, particularly with devices that do not have network connections.
4. Forward-and-delete: enabling “First Sale” a/k/a “Exhaustion” rights such as the right to lend, sell, or give to another user, after/during which the first user does not have access to the content. A limited version of this feature is present in Barnes & Noble’s publication DRM, which allows a 14-day lending period for some titles. The IEEE P1817 standard (Consumer Ownable Digital Personal Property) specifies a key management system that implements forward-and-delete. Lightweight DRMs allow such transfers, but the content does not become inaccessible to the first user after the transfer.

We should note, however, that defining EPUB LCP does not preclude the above capabilities and technologies; EPUB LCP is designed so that these capabilities can be added in specific implementations where appropriate. For example, there may well be an enhancement of EPUB LCP with stronger security for use in Higher Ed publishing, or another enhancement of EPUB LCP that supports license chaining for subscription download services in professional or STM publishing.[10]

Why Not Propose a “Heavyweight” DRM Standard for EPUB?

The DRM industry is subject to a single over-arching limitation: the entities that want DRM (i.e., publishers and copyright owners) do not typically pay for it. Instead, the cost of DRM is usually passed on to content distributors and retailers. Apart from its use for “lock-in,” these downstream entities have no incentive to protect content other than as a contractual obligation to content licensors. Thus it is understandable that distributors and retailers have been highly reluctant to pay for DRM-related features that do not directly benefit them.

The “heavyweight” DRM features described above are complex and expensive to implement. Some features, like robustness rule compliance, can hamper performance in addition to sometimes being inordinately expensive. Some lead to problems with usability and technical support.[11] And some heavyweight features cause problems with privacy and intrusion: for example, domain authentication and recoverability require frequent “phoning home.”

Furthermore, DRM interoperability is inherently in conflict with content distributors’ desires to “lock in” users.

Additionally, patents have been a longtime issue in DRM. A number of organizations received patents on DRM technology in the mid to late 1990s (and since then) but failed to commercialize their technologies, leading to a field that is unusually encumbered by patent licensing requirements. While it is impossible to say a priori what technologies may read on which patents, we can say generally that the more complex the DRM is, the more likely it is to read on one or more of them.

Finally, heavyweight DRM has generated significant resistance from consumers and consumer advocates, particularly in paid content business models, and this resistance has increased over time. Consumers object to intrusion (as described above), the technical and user experience glitches that are more likely to appear with more complex technology, and restrictions on content usages that they are accustomed to with physical products (or which should be allowed by law). It may well be impossible to overcome these objections, at least without radical changes in industry economics. At the same time, simple encryption technology in PDF is widely adopted, with, for example, built-in PDF “Security Options” in print settings, without generating significant controversy. Encryption that can make watermarks more tamper-resistant and keep content from being modified or extracted in source form is significantly less controversial than DRM technology that involves network authentication and device activation and limits authorized users. In addition, lightweight encryption enables legitimate content sharing when necessary or appropriate, which implicitly gives it more acceptability to consumers.[12]

All of these are reasons why downstream vendors (distributors, retailers, software vendors, reading device makers) have resisted complex, interoperable DRM technologies, especially standards-based ones that they do not control. In fact, by far the most successful open DRM standard thus far has been Version 1 of OMA DRM, which is a simple technology designed for low-functionality mobile devices and simple content such as graphics and ringtones. Moreover, Version 2 of OMA DRM, which is much more complex, has run into some of the limitations mentioned above and has so far failed to take off in the market. PDF file encryption is also widely used, but the vast majority of uses in commercial publishing (as opposed to corporate/enterprise applications) are to turn off functions like edit and print rather than to restrict reading.

To be clear, in exploring standardization of a lightweight content protection mechanism for EPUB, there is no recommendation to remove or deprecate the extensibility in the EPUB format that enables a multiplicity of proprietary heavyweight DRM mechanisms to be provided by vendors. That “ship has sailed” and there are applications where heavyweight DRM may be required.

EPUB LCP Use Cases

Here are the most basic use cases for EPUB LCP, which should make some of its design goals clear.

Purchase
1. Initial download and read on Device D1 (“Fulfillment”)
  1. Download File F1 onto Device D1
  2. Set password P1 for File F1 (user sets it or commerce software pre-assigns it)[13]
  3. Read file on Device D1 using Reading System
2. Subsequent read on Device D1
  1. Just open File F1 in Reading System
  2. This can be configured to require re-entry of the password or not.
3. Transfer the file to Device D2 (using any method – email, USB drive, cyberlocker, etc.)
  1. Open file on Device D2 with Password P1
  2. Read file on Device D2 using Reading System
4. Subsequent read on Device D2
  1. Open the file in Reading System
  2. This can be configured to require re-entry of the password or not.
5. Share with a Friend
  1. Send file to User U2 (using any method)
  2. User U2 loads file onto Device D3
  3. Send Password P1 to User U2 (using any method – IM, email, text, etc.)
  4. User U2 opens File F1 with P1 on D3 using Reading System
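
The Purchase steps above can be modeled in a few lines. This is a minimal sketch of the flow only, not of any actual EPUB LCP design: the class names, the `require_reentry` flag, and the SHA-256 verifier are all assumptions introduced for illustration, and no content encryption is shown.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Optional


def _digest(password: str) -> str:
    # Stand-in verifier for illustration only; not a recommended password hash.
    return hashlib.sha256(password.encode("utf-8")).hexdigest()


@dataclass
class ProtectedFile:
    """Hypothetical model of File F1: an ID plus a verifier for Password P1."""
    file_id: str
    verifier: str


@dataclass
class Device:
    """Hypothetical model of a device running a Reading System."""
    name: str
    require_reentry: bool = False                 # assumed distributor setting
    unlocked: set = field(default_factory=set)    # IDs of files already opened here

    def open(self, f: ProtectedFile, password: Optional[str] = None) -> bool:
        # Subsequent reads succeed without the password unless re-entry is required.
        if f.file_id in self.unlocked and not self.require_reentry:
            return True
        if password is None or _digest(password) != f.verifier:
            return False
        self.unlocked.add(f.file_id)
        return True


f1 = ProtectedFile("F1", _digest("P1-secret"))
d1, d2, d3 = Device("D1"), Device("D2"), Device("D3")

first_read = d1.open(f1, "P1-secret")     # step 1: fulfillment on D1
later_read = d1.open(f1)                  # step 2: subsequent read, no password
d2_without = d2.open(f1)                  # step 3: file copied to D2, password not yet entered
d2_with = d2.open(f1, "P1-secret")        # step 3: same user enters P1 once on D2
friend_read = d3.open(f1, "P1-secret")    # step 5: friend U2 received both file and P1
```

Note that the device never distinguishes the original user from the friend: authentication is tied to the file, so anyone holding both the file and the password can read it.
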
Library Lend
1. Download and read on Device D1
  1. Download the file onto Device D1
  2. Password P1 for File F1 is pre-assigned by library (e.g. library card number)
  3. Read file on Device D1 using Reading System
2. Subsequent read on Device D1
  1. Open the file in Reading System
3. Transfer the file to Device D2 (using any method: email, USB drive, cyberlocker, etc.)
  1. Authenticate User U1 on Device D2 with Password P1
  2. Read file on Device D2 using Reading System
4. Expiry
  1. Lending period elapses
  2. File no longer readable
5. Share with a Friend – same as item 5 under Purchase above.
Unauthorized Uses and Hacks
1. Changes to content
  1. Content is encrypted; no changes are possible unless encryption is hacked.
2. Extraction of content beyond configured limitations (see “Configurable limitations on usage” under EPUB LCP Requirements below)
  1. Content is encrypted; no ad-hoc extraction is possible unless encryption is hacked.
3. Key discovery
  1. Hacker invents technique for discovering content encryption key
  2. Hacker creates Reading System (or plug-in to existing app, or standalone tool) that enables opening files without user authentication (i.e. discovers key, opens files)
  3. Both the hacker and any user of the hacker’s tool are liable under anticircumvention law[14]
4. Password discovery
  1. Hacker creates software that intercepts file passwords as they are entered, before they are obfuscated.
  2. Hacker surreptitiously (e.g., via a virus) installs password-stealing software on user’s device.
  3. Hacker receives stolen passwords.
  4. Passwords are limited to only that user’s files, and not even necessarily all of them, so impact of hack is limited.

EPUB LCP Requirements

Finally, here are requirements for EPUB LCP that technology vendors should be able to use to create proposed solutions:

Authentication:
1. Authentication is based on files, as opposed to user or device authentication, although other modes of authentication can be added (see below).
2. Each file has a password associated with it. The password is set at the time the user acquires the publication. It is normally necessary for the user to enter the password the first time he opens a file on a device, but not thereafter.
3. To read the publication on another device, the user must simply enter the password once on the other device (though see below for exceptions). To share the publication with another user (on another device), the password must be communicated to the other user; otherwise sharing with another user is exactly the same as the same user reading the publication on another device; see the Use Cases above.
4. Names of password field(s) must be included in each publication as metadata (e.g. “User ID,” “Password,” “Kennwort,” “암호,” “Library card number,” etc.)
5. Password semantics are not defined by EPUB LCP but may be configured by the distributor as appropriate, for example:
  1. User settable on a per-file basis
  2. User settable, single password for all files
  3. User settable with minimum password strength requirements (e.g. at least N characters, one or more non-letters or non-alphanumerics, etc.)
  4. Re-entry required for every file open (or not)
  5. Distributor settable (e.g. to user’s email address, user ID, or credit card number)
6. Passwords must be stored on the client software or device in obfuscated and unrecoverable form, such as via a one-way hash function. In other words, any password-stealing hacks must be forced to grab the password between when the user types it in and when it is obfuscated.
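
The storage requirement in item 6 can be illustrated with a salted one-way hash. The following is a minimal sketch under stated assumptions: PBKDF2-HMAC-SHA256, the iteration count, and the record layout are choices made here for illustration and are not specified by this document.

```python
import hashlib
import hmac
import os

ITERATIONS = 200_000  # assumed work factor; not taken from this document


def make_record(password: str) -> dict:
    """Return what the client would store: a random salt and a one-way digest."""
    salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return {"salt": salt, "digest": digest}


def verify(password: str, record: dict) -> bool:
    """Recompute the digest from the typed password and compare in constant time."""
    candidate = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), record["salt"], ITERATIONS
    )
    return hmac.compare_digest(candidate, record["digest"])


record = make_record("library-card-12345")
ok = verify("library-card-12345", record)
rejected = verify("wrong-password", record)
```

Because only the salt and digest are kept, software that reads the stored record cannot recover the password directly; as item 6 states, an attacker would have to capture it while the user types it in.
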
Configurable limitations on usage:
1. Printing, expressed in amount of text that can be printed at a time and/or total content per title.
2. Copy to clipboard, expressed in amount of text that can be copied at a time and/or total content per title.
3. Editing (on/off).
4. Configurable usage start and end dates (e.g. for library lending), with time granularity down to the minute or smaller (to accommodate research libraries with lending periods measured in hours).
5. Optional separate passwords for some of the above operations, such as edit password or copy/print password.
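
As an illustration of how such limitations might be represented and checked, here is a minimal sketch. The `UsageLimits` class and all of its field names are hypothetical; this document does not define a data model, and printing limits (omitted here) would follow the same pattern as the copy limits.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class UsageLimits:
    """Hypothetical per-title limits; None means the limit is not configured."""
    copy_chars_per_request: Optional[int] = None
    copy_chars_per_title: Optional[int] = None
    editing_allowed: bool = False
    start: Optional[datetime] = None
    end: Optional[datetime] = None

    def readable_at(self, now: datetime) -> bool:
        # The title is readable from `start` (inclusive) until `end` (exclusive).
        if self.start is not None and now < self.start:
            return False
        if self.end is not None and now >= self.end:
            return False
        return True

    def may_copy(self, requested: int, already_copied: int) -> bool:
        # Enforce both the per-request cap and the running per-title cap.
        if (self.copy_chars_per_request is not None
                and requested > self.copy_chars_per_request):
            return False
        if (self.copy_chars_per_title is not None
                and already_copied + requested > self.copy_chars_per_title):
            return False
        return True


# A two-hour research-library loan, expressed with minute granularity.
loan = UsageLimits(
    copy_chars_per_request=1000,
    copy_chars_per_title=5000,
    start=datetime(2012, 5, 18, 9, 0, tzinfo=timezone.utc),
    end=datetime(2012, 5, 18, 11, 0, tzinfo=timezone.utc),
)
```

Using timezone-aware timestamps avoids ambiguity when the lending server and the reading device are in different time zones; whether the device clock can be trusted is a separate question that this sketch does not address.
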
Content protection:
1. At least one level of encryption is required: symmetric keys for encrypting content. Additional means may need to be specified to support all use cases, such as protocols (e.g. SSL) or asymmetric keys for protecting content keys in transit and on clients.
2. The algorithm used for content encryption must be AES with a key length of at least 128 bits; the mode of operation is up to the implementer. AES, which was invented in Belgium and adopted as a U.S. government standard in 2001, is used for this purpose in virtually every modern DRM. Code libraries are widely available.
3. The algorithm used for content key protection must be a standard public/private key algorithm that is efficient and inexpensive to implement, such as RSA-1024 (inexpensive) or ECC-NIST-256 (efficient).
4. Key hiding methods are intentionally left up to the implementer.[15]
5. Key revocation is not supported as it requires phoning home (see above).
Format support:
1. The expectation is that EPUB LCP will build on the EPUB Open Container Format (OCF) definition of encryption.xml, rights.xml, etc. It is not anticipated that EPUB LCP will be directly applicable to formats other than EPUB.
Code library:
1. Client
  1. Maximally portable
  2. Well-documented client APIs for integration into e-reader apps and devices
  3. Callable through a wide variety of programming languages
  4. Reference implementation on a popular platform of the implementer’s choice (e.g. Android, iOS, Windows, Mac OS).
2. Server
  1. RESTful web service interface
  2. Reference implementation on Windows or Linux

[1] United States Government Accountability Office, Observations on Efforts to Quantify the Economic Effects of Counterfeit and Pirated Goods, April 2010. Available at http://www.gao.gov/new.items/d10423.pdf. The report covers physical (i.e. counterfeiting) as well as digital piracy and does not address DRM per se.

[2] The User Experience and Intrusion criteria are influenced by the work of Reclaim Your Game (http://www.reclaimyourgame.com/), which rates DRMs for PC games according to some of these and other criteria.

[3] Ironically, FairPlay’s lack of interoperability beyond Apple devices and iTunes software, and Apple’s refusal to license the technology to other vendors, contribute to the quality of its user experience. Steve Jobs has claimed that one reason why Apple doesn’t license FairPlay to third parties is that doing so would introduce too much complexity to keep the user experience smooth.

[4] The US has the richest litigation history of any country (by far) around anticircumvention law, so our interpretation of the law stems from an understanding of US case precedents. Stated conversely, the standards used here are based on US case law, but other countries’ case laws around anticircumvention are not developed to the point where they could contravene the US precedents, so it is a “best guess for the time being” that US-derived standards will apply elsewhere.

[5] This is the United States definition; see previous footnote.

[6] See for example Universal v. Reimerdes, U.S. District Court for the Southern District of New York, 2000. The court was asked to rule that the CSS scheme for DVDs was too ineffective to qualify for protection under 17 USC § 1201(a); the court refused to rule one way or another.

[7] For example, SunnComm’s audio CD protection technology for PCs could be circumvented by holding down the Shift key on a PC while inserting a CD into the PC’s CD drive. The company sued a researcher who was going to publish a paper to that effect, but it backed down and withdrew the lawsuit. We believe that the suit was withdrawn under intense pressure from media industry interests that feared that the definition of “effective” would come under unwelcome scrutiny in court.

[8] In contrast, DRMs may have features that make cracks less effective, such as key revocability. The AACS scheme for Blu-rays is an example of this. Such features are independent of the strength of the encryption itself. It should be noted that AACS was itself cracked, although the crack was not a “one-click crack” as described herein.

[9] Examples include execution monitoring and code obfuscation. The latter includes techniques such as code diversity, virtual machines, and whitebox encryption – which are listed here in increasing order of complexity and required resources.

[10] The inspiration for layering functionality on top of a core DRM design comes from the Digital Media Project’s IDP (Interoperable DRM Platform, http://www.dmpf.org/specs/index.php).

[11] An oft-cited example of this is Microsoft’s now-obsolete “PlaysForSure” (a/k/a Windows Media DRM 10) technology, which enabled master-slave content portability across many different devices. This technology suffered from user experience glitches and various technical problems.

[12] For example, some of the cracks that the U.S. Copyright Office has identified as temporary exceptions to the U.S. anticircumvention law, such as circumvention for use by persons with disabilities, can be easily accommodated.

[13] “Password” is used here in a generic fashion to mean “shared secret” in the cryptographic sense. Implementations could present shared secrets as, for example, combinations of user IDs and passwords, and use a user’s ID in an e-commerce or library lending system as part of the shared secret.

[14] There is a DMCA exception that allows the cracking of digital books for use by persons with disabilities. The LCP must not lock out assistive technologies that present the publications to persons with disabilities in a variety of ways.

[15] We intentionally avoid Robustness Rules of the type found in most “heavyweight” DRMs because of the cost and complexity overheads they create.