
EDUPUB Output Profile Baseline Spec

IDPF Member Submission - 15 November 2013

Table of Contents

1. Introduction

1.2 Purpose

2. EDUPUB Guiding Principles

3. EDUPUB Basics: Elements, Classes and ePub:type

3.1 Starting with XHTML5

Sample 1: paragraph

Sample 2: blockquote

Sample 3: paragraph and an unordered list

3.2 Adding Meaning Through the Class Attribute and ePub:type

Primary Classes

Sample 1: differentiating sections

Markup

Explanation

Sample 2: differentiating asides

Markup

Explanation

3.3 Design Classes

Sample 1: Design Class

3.4 Local_Semantic Class

Sample 1: Local_Semantic Class

3.5 Literal_Style Class

Sample 1: Literal_Style

Explanation

4. Content Hierarchy

5. Numbering

Sample 1: simple lists

Markup

Explanation

Sample 2: list with literal text marker

Markup

Explanation

Sample 3: numbered figure

Markup

6. Linking and IDs

Local IDs

Sample 1: reference to asset

Markup

Explanation

7. Page Markers

8. Embedded Content

8.1 Video Recommendations

8.2 Audio Content

8.3 Widgets & Gadgets

Gadget Content Model

Gadget Template

Standard Object Attributes (http://www.w3.org/TR/html-markup/object.html#object)

Custom Attributes

Fallback Mechanism

9. File Naming and Folder Layout

9.1 File Naming Conventions

9.2 package.opf File Structure

9.3 Chunking Level

10. Image Conversion Specifications

Appendix A: Concepts and Definitions

CML

Learning Objects

MathML

QTI (IMS Question & Test Interoperability)

SVG

Appendix B: Accessibility Features

Separation of style with ability to adjust size/color

Semantic inflection

Logical reading order

Images with textual descriptions

Scalable Vector Graphics (SVG)

Easy to navigate tables

MathML

Resizable text without loss of functionality

Escapable and skippable elements

Rich video

Synchronized audio

Text-To-Speech (TTS)

Highlighted words during narration

Footnotes outside of text stream


1. Introduction

As part of its business transformation strategy toward standards-based, digital-first content creation workflows, Pearson is developing EPUB3 output profiles for educational content. To encourage open standards across our industry, Pearson is submitting one of these output profiles, EDUPUB, to the IDPF as the basis for an educational profile for EPUB3. In doing so, Pearson hopes to provide a standard that any publisher, vendor, and content distributor can embrace and contribute to.

The EDUPUB output profile will provide the following for educational content:

Benefits include:

By reducing the number of variant formats for similar content, publishers, vendors, and content distributors can devote more of their resources to improving content, services, and the end-user experience, and less to creating redundant output formats that provide no competitive advantage: truly a win-win for all involved.

1.2 Purpose

The purpose of this document is to define EDUPUB concepts, terms, and requirements, and to provide a general understanding of the markup and how it semantically describes a publisher's content. Together with the EDUPUB Content Model, this document is the specification for EDUPUB.

2. EDUPUB Guiding Principles

  1. Is fully conformant XHTML5, not only in element names and attributes but in the recommendations for how those elements are used.
  2. Uses microformats (class attribute) to add semantic distinctions to XHTML5. Class values should be intelligible to authors, publishing staff, and vendors, and should avoid conflicting with existing industry-standard microformat names (e.g., vcard, fn, n; see schema.org).

3. EDUPUB Basics: Elements, Classes and ePub:type

3.1 Starting with XHTML5

An EDUPUB document will be immediately familiar to anyone who understands XHTML5/EPUB 3. All the content markup is XHTML5. The following three code snippets are all fully conformant to XHTML5.

Sample 1: paragraph

<p>One of the most important factors in oil, gas, and coal exploration and extraction is technology. The ability to find and extract fossil fuels has changed dramatically with the development of new techniques...</p>

Sample 2: blockquote

<blockquote><p>The law isn’t justice. It’s a very imperfect mechanism. If you press exactly the right buttons and are also lucky, justice may show up in the answer. A mechanism is all the law was ever intended to be.</p></blockquote>

Sample 3: paragraph and an unordered list

<p>The advantages of a total patient care system include:</p>

<ul><li><p>Continuous, holistic, expert nursing care</p></li>

<li><p>Total accountability for the nursing care of the assigned patient(s) for that shift</p></li>

<li><p>Continuity of communication with the patient, family, physician(s), and staff from other departments</p></li></ul>

3.2 Adding Meaning Through the Class Attribute and ePub:type

The XHTML5 class attribute enables users to describe content in their own terms. EDUPUB uses the class attribute to provide additional semantic meaning. For example, the section element is defined in the W3C specification as "a generic section of a document or application". EDUPUB defines specific classes ("part", "chapter", "section", ...) that provide additional semantic information.

Primary Classes

Primary class names are a fixed list of editorial terms for important publishing, navigation, and accessibility structures (e.g., "summary", "sidebar", "objective", "nav", "longdesc").

The class names represent a superset of the semantics defined in the EPUB 3 Structural Semantics Vocabulary. Where possible, class names were derived from the corresponding ePub:type or DAISY accessibility term; the few exceptions are listed below.

Class name                   ePub:type
index-xxxxx                  index:xxxx
name-index, subject-index    index:body

EDUPUB requires that both the class and ePub:type attributes are provided as defined in the Content Model.
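The pairing rule above can be checked automatically. The Python sketch below is illustrative only. It assumes well-formed XHTML with epub:type bound to the EPUB 3 namespace http://www.idpf.org/2007/ops. The set PAIRED_CLASSES is a hypothetical stand-in for the class/epub:type pairs that the Content Model actually defines.

```python
import xml.etree.ElementTree as ET

EPUB_NS = "http://www.idpf.org/2007/ops"

# Hypothetical subset of Primary classes whose epub:type must repeat the class
# name; the authoritative pairing lives in the EDUPUB Content Model.
PAIRED_CLASSES = {"part", "chapter", "sidebar", "footnote"}


def missing_epub_types(xhtml: str) -> list[str]:
    """Return a message for each Primary class that lacks a matching epub:type."""
    root = ET.fromstring(xhtml)
    problems = []
    for elem in root.iter():
        classes = elem.get("class", "").split()
        # epub:type may hold several space-separated values
        types = elem.get(f"{{{EPUB_NS}}}type", "").split()
        for cls in classes:
            if cls in PAIRED_CLASSES and cls not in types:
                tag = elem.tag.split("}")[-1]  # drop the namespace part
                problems.append(f"<{tag} class='{cls}'> has no epub:type='{cls}'")
    return problems
```

Run against a chapter that contains a sidebar written without its epub:type, the function returns a single message naming that aside. Design classes such as sidebar_1 are ignored.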

Sample 1: differentiating sections

Markup

<section class="part" id="..." epub:type="part">...
  <section class="chapter" id="..." epub:type="chapter">...
    <section id="...">
    </section>
  </section>
</section>

Explanation

Sample 2: differentiating asides

Markup

<aside class="sidebar" id="..." epub:type="sidebar">...</aside>

<aside class="pullquote" id="...">...</aside>

<aside class="footnote" id="..." epub:type="footnote">...</aside>

Explanation

In addition to Primary classes, there are three more class types: Design, Local_Semantic, and Literal_Style classes, which are discussed below.

3.3 Design Classes

Sometimes there may be multiple "types" of a given Primary class. For example, a single title might have three different "types" of sidebars that appear throughout the title: one referred to as "Case Study", another called "Concepts", and another called "Active Research". EDUPUB allows these distinctions to be made by including both the Primary and the Design class in the class attribute (class="sidebar sidebar_1"). When adding a Design class to an element that does not have a Primary class, use the element name, an underscore, and a number (for example, <span class="span_1">).

Design class names follow the form xxxx_n, where xxxx is the element or base class (figure, sidebar, ...) and n is an integer value.

Design classes allow formatting distinctions to be made in a way that is often helpful to the reader, but is not essential to the understanding of the text. These distinctions will be lost when the content is aggregated with other content.

Design classes must be defined in design.css in the css folder. Any file that uses one of these classes should include a link to design.css.

Sample 1: Design Class

<link rel="stylesheet" type="text/css" href="../css/design.css" />

...

<aside class="sidebar sidebar_1" id="..." epub:type="sidebar">...</aside>

In design.css

.sidebar_1 header h1 {
  /* Green headings for sidebars about the web */
  color: #005a30;
}

3.4 Local_Semantic Class

Local_Semantic classes are used to add semantics to content where no Primary class exists. For example, a product discussing grammar may need a semantic class for proper nouns, or a product on programming may need a semantic class for methods.

Local_Semantic class names follow the form xxxx_lc_n, where xxxx is the element or base class (figure, sidebar, ...) and n is an integer value.

Local_Semantic classes allow formatting distinctions to be made where the distinction is essential to understanding the text. The specific formatting may change when this content is reused (colors might be used in one product, while a black-and-white product would need some other treatment), but the distinctions should be preserved.

If the product has text that describes a particular rendering, the paragraph(s) containing that rendering description are wrapped in <div class="rendering-notes">.

Local_Semantic classes must be defined in local.css in the css folder. Any file that uses one of these classes should include a link to local.css.
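A build step could enforce both stylesheet rules (Design classes in design.css, Local_Semantic classes in local.css) by scanning each file. This Python sketch makes a few simplifying assumptions:

- Base names contain no underscores.
- Links are matched by file name only.
- Literal_Style classes (xxxx_ls_n) are ignored.

missing_stylesheets is a hypothetical helper, not part of any EDUPUB tooling.

```python
import re
import xml.etree.ElementTree as ET

XHTML_NS = "{http://www.w3.org/1999/xhtml}"

# base_lc_n (Local_Semantic) and base_n (Design); base names are assumed to
# contain no underscores, so base_ls_n (Literal_Style) matches neither pattern.
LOCAL_RE = re.compile(r"[A-Za-z][A-Za-z0-9-]*_lc_\d+")
DESIGN_RE = re.compile(r"[A-Za-z][A-Za-z0-9-]*_\d+")


def missing_stylesheets(xhtml: str) -> set[str]:
    """Return the stylesheets (design.css, local.css) a file needs but does not link."""
    root = ET.fromstring(xhtml)
    linked = {
        link.get("href", "").rsplit("/", 1)[-1]  # keep only the file name
        for link in root.iter(f"{XHTML_NS}link")
        if link.get("rel") == "stylesheet"
    }
    needed = set()
    for elem in root.iter():
        for cls in elem.get("class", "").split():
            if LOCAL_RE.fullmatch(cls):
                needed.add("local.css")
            elif DESIGN_RE.fullmatch(cls):
                needed.add("design.css")
    return needed - linked
```

Consider a file that links design.css and also uses span_lc_1. The function would report local.css as missing.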

Sample 1: Local_Semantic Class

<link rel="stylesheet" type="text/css" title="local" href="../css/local.css" />

...

<div class="rendering-notes"><p>In the following section proper nouns are underlined...</p></div>

<p>Give the balance sheet to <span class="span_lc_1">Melissa</span></p>

In local.css

.span_lc_1 {
  text-decoration: underline;
}

3.5 Literal_Style Class

In EDUPUB, the markup indicates what objects are rather than how they look. However, in some cases the formatting of content is intrinsic to its meaning. For example, in content about how to write an effective letter, a sample letter may require a particular layout or style to convey the author's meaning. In these cases, classes are created to describe the characteristics that are intrinsic to the author's intent. These classes take precedence over any theme-provided CSS.

Literal_Style class names follow the form xxxx_ls_n, where xxxx is the element or base class (figure, sidebar, ...) and n is an integer value.
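The three class families (Design, Local_Semantic, and Literal_Style) differ only in their suffix, so a tool can tell them apart with a pattern match. The Python sketch below assumes lower-case base names without underscores. classify and ClassName are illustrative names. Any token that fits none of the three forms is reported as Primary/other.

```python
import re
from typing import NamedTuple, Optional

# Suffix forms defined in 3.3 to 3.5. Because base names are assumed to contain
# no underscores, "span_lc_1" can never be mistaken for a Design class.
CLASS_FORMS = [
    ("Local_Semantic", re.compile(r"(?P<base>[a-z][a-z0-9-]*)_lc_(?P<n>\d+)")),
    ("Literal_Style", re.compile(r"(?P<base>[a-z][a-z0-9-]*)_ls_(?P<n>\d+)")),
    ("Design", re.compile(r"(?P<base>[a-z][a-z0-9-]*)_(?P<n>\d+)")),
]


class ClassName(NamedTuple):
    kind: str          # "Design", "Local_Semantic", "Literal_Style" or "Primary/other"
    base: str          # element or base class, e.g. "sidebar" or "span"
    n: Optional[int]   # integer suffix, None for Primary/other tokens


def classify(token: str) -> ClassName:
    """Split one class token into its kind, base name and integer suffix."""
    for kind, pattern in CLASS_FORMS:
        m = pattern.fullmatch(token)
        if m:
            return ClassName(kind, m.group("base"), int(m.group("n")))
    return ClassName("Primary/other", token, None)
```

For example, classify("sidebar_1") yields a Design class with base sidebar and n = 1, and classify("div_ls_1") yields a Literal_Style class with base div.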

Sample 1: Literal_Style

<link rel="stylesheet" type="text/css" title="literal" href="../css/literal.css" />

...

<div class="div_ls_1">
7 Fairlane Road
...
</div>

In literal.css

.div_ls_1 {
  /* sample letter address appears flush right */
  text-align: right;
}

Explanation

4. Content Hierarchy

EDUPUB content is organized into 6 levels: product, volume, part, chapter, module and card. These levels are identified in the markup and will have consistent semantics across all products that are supported by the tools and documentation. Some of the levels are optional and some levels can be further subdivided as dictated by the content.

The levels and their definitions are:

  1. Product - The highest level. It is always required and is tagged in product.xhtml. The subclass is required and distinguishes a book:
    <section class="product product_book">
    from a course:
    <section class="product product_course">
  2. Volume - An optional level used primarily for multi-volume books. Represents content that may be sold separately or as a complete work (e.g., Volume 1, Volume 2, and Combined).
    <section class="volume" id="id">
  3. Part - An optional level for grouping Chapters. May have introductory and summary content; always has at least one Chapter.
    <section class="part"><header>...</header>
  4. Chapter - A required level. A Chapter contains a collection of Modules. May have introductory and summary content; always has at least one Module.
    <section class="chapter" id="id"><header><h1>...</h1></header>
  5. Module - An optional level. A Module has one or more topics, assignments, and assessment items.
    <section class="module" id="id"><header><h1>...</h1></header>
  6. Card - An optional level used in digital themes and print products. It identifies content visible to the student on a screen/page at some point during a lesson. This level may also carry layout-specific classes for presentation.
    <section class="card" id="id"><header><h1>...</h1></header>
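The nesting order of the six levels can be checked by walking the section tree. This Python sketch is a simplified reading of the rules above. It assumes the following:

- Each level is marked by its class on a section element.
- A level may only contain deeper levels.
- Further subdivisions use unclassed sections (as in Sample 1 of 3.2).

It checks only the two required levels: Product at the top, and a Chapter above any Module or Card. It does not enforce the "at least one" child rules. The function name hierarchy_errors is hypothetical.

```python
import xml.etree.ElementTree as ET

SECTION = "{http://www.w3.org/1999/xhtml}section"
# Levels from outermost to innermost, as listed in this section.
LEVELS = ["product", "volume", "part", "chapter", "module", "card"]
RANK = {name: i for i, name in enumerate(LEVELS)}


def level_of(elem):
    """Return the hierarchy level named in the element's class attribute, if any."""
    for cls in elem.get("class", "").split():
        if cls in RANK:
            return cls
    return None


def hierarchy_errors(xhtml: str) -> list[str]:
    """Report hierarchy sections that are nested out of order."""
    errors = []

    def walk(elem, chain):
        level = level_of(elem) if elem.tag == SECTION else None
        if level is not None:
            if not chain and level != "product":
                errors.append(f"{level} is not inside a product")
            if chain and RANK[level] <= RANK[chain[-1]]:
                errors.append(f"{level} nested inside {chain[-1]}")
            if RANK[level] > RANK["chapter"] and "chapter" not in chain:
                errors.append(f"{level} has no chapter ancestor")
            chain = chain + [level]
        for child in elem:  # unclassed sections simply pass the chain down
            walk(child, chain)

    walk(ET.fromstring(xhtml), [])
    return errors
```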

5. Numbering

Numbering in EDUPUB is handled using three methods depending on the object and numbering needs:

  1. Simple non-hierarchical lists (e.g., 1, 2, 3; a, b, c; i, ii, iii) that can easily be auto-numbered.
  2. Complex lists (e.g., multi-level lists such as 2.1, 2.2, or lists that skip numbers, such as 1, 3, 4, 7) that cannot be auto-numbered without significant effort, and lists whose markers are provided as text. These are static lists because the numbers are already part of the content. The staticlist class tells the CSS not to auto-number ol elements that carry this class value.
  3. Identifiers for "numbered" content (Chapter, Table, Figure, Steps) and references to those identifiers (See Chapter 4.5) are literal text within <span class="number">.
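A conversion tool has to decide, list by list, which of the first two methods applies. One way to sketch that decision in Python: if the printed markers form a consecutive run in decimal, single lower-case letters, or lower-case Roman numerals, the list can use a plain ol (with the HTML type and start attributes). Otherwise the markers stay in the text inside ol class="staticlist".

The scheme order, the use of the type attribute, and the function names are assumptions for illustration. Markers are assumed to be bare labels without punctuation such as "1." or "(a)".

```python
import re
from typing import Optional


def roman_value(numeral: str) -> int:
    """Value of a lower-case Roman numeral (assumed to be well formed)."""
    digits = {"i": 1, "v": 5, "x": 10, "l": 50, "c": 100, "d": 500, "m": 1000}
    total = 0
    for pos, ch in enumerate(numeral):
        value = digits[ch]
        # A smaller numeral before a larger one is subtracted, as in "iv" or "ix".
        if pos + 1 < len(numeral) and digits[numeral[pos + 1]] > value:
            total -= value
        else:
            total += value
    return total


# (ol type attribute, marker pattern, marker -> integer), tried in this order so
# that single letters such as "c" or "i" are read as letters, not Roman numerals.
SCHEMES = [
    ("1", re.compile(r"\d+"), int),
    ("a", re.compile(r"[a-z]"), lambda m: ord(m) - ord("a") + 1),
    ("i", re.compile(r"[ivxlcdm]+"), roman_value),
]


def auto_numbering(markers: list[str]) -> Optional[tuple[str, int]]:
    """Return (type, start) for a plain ol, or None if a staticlist is needed."""
    for ol_type, pattern, to_int in SCHEMES:
        if markers and all(pattern.fullmatch(m) for m in markers):
            values = [to_int(m) for m in markers]
            # Auto-numbering can only produce consecutive values.
            if all(b == a + 1 for a, b in zip(values, values[1:])):
                return ol_type, values[0]
    return None
```

Under these assumptions, the continued list in Sample 1 below (a single item "3") maps to ("1", 3), which matches its ol start="3". The markers 1.1 and 1.2 from Sample 2 return None, which matches its staticlist.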

Sample 1: Simple non-hierarchical auto-numbered lists

Markup

<ol>
  <li><p>lakes and rivers</p></li>
  <li><p>roads and bridges</p></li>
</ol>

<ol start="3">
  <li><p>buildings and structures</p></li>
</ol>

Explanation

Sample 2: Complex static lists that contain the number as part of the content

Markup

<ol class="staticlist">
  <li><p><span class="number">1.1</span> lakes and rivers</p></li>
  <li><p><span class="number">1.2</span> roads and bridges</p></li>
</ol>

Sample 3: Numbered figure

Markup

<figure>
  <figcaption><header>
    <h1><span class="label">Figure</span> <span class="number">9.2</span>
      House Bill on Ethics Reform</h1>
  </header></figcaption>
</figure>

6. Linking and IDs

This section describes references to assets, to elements within the current file, and to elements in other files of the same ePub.

EDUPUB references to content in the current ePub have three forms:

  1. References to asset files (src="localfilename.ext"). The filename is a relative reference.
  2. References to elements in the current file (href="#id"). The id is unique within the ePub.
  3. References to elements in a different file (href="localfilename.xhtml#id"). The filename is a relative reference.
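The three reference forms can be resolved the way a reading system would: split off the fragment, then resolve any file part relative to the folder of the referring file. The sketch below uses Python's posixpath and urllib.parse.urldefrag. The file paths are hypothetical, and percent-encoded names are not decoded.

```python
import posixpath
from urllib.parse import urldefrag


def resolve_reference(current_file: str, ref: str) -> tuple[str, str]:
    """Resolve an in-publication reference to (target file, fragment id).

    current_file is the referring file's path inside the ePub container,
    e.g. "xhtml/ch03.xhtml"; ref is an href or src value as written in it.
    """
    path, fragment = urldefrag(ref)
    if not path:
        # Form 2: href="#id" points into the current file.
        return current_file, fragment
    # Forms 1 and 3: a relative filename, resolved against the referring file's folder.
    base_dir = posixpath.dirname(current_file)
    return posixpath.normpath(posixpath.join(base_dir, path)), fragment
```

For example, the image in Sample 1 below, referenced from xhtml/ch03.xhtml, resolves to images/M03_SULL4546_i123.jpg with an empty fragment.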

Local IDs

The id attribute within EDUPUB must be a valid XML ID (it must begin with a letter or underscore, may continue with letters, digits, hyphens, underscores, and periods, and may not contain spaces or colons), and must be unique within the ePub.
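Both ID rules (valid form and uniqueness) can be checked across the whole publication at once. This Python sketch assumes the IDs have already been collected per file. It accepts only the ASCII subset of XML names. Non-ASCII letters, which XML also allows, would be rejected.

```python
import re
from collections import Counter

# ASCII subset of an XML name usable as an ID: a letter or underscore first,
# then letters, digits, hyphens, underscores or periods (no spaces or colons).
VALID_ID = re.compile(r"[A-Za-z_][A-Za-z0-9._-]*")


def id_problems(ids_by_file: dict[str, list[str]]) -> list[str]:
    """Report malformed IDs and IDs used more than once anywhere in the ePub."""
    problems = []
    counts = Counter()
    for filename, ids in ids_by_file.items():
        for value in ids:
            if not VALID_ID.fullmatch(value):
                problems.append(f"{filename}: invalid id {value!r}")
            counts[value] += 1
    for value, n in counts.items():
        if n > 1:
            problems.append(f"duplicate id {value!r} ({n} uses)")
    return problems
```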

Sample 1: reference to asset

Markup

<img src="../images/M03_SULL4546_i123.jpg" alt="..." />

Explanation

7. Page Markers

For a digital version of a print product, it is helpful but not required to provide markers for the start of each page. This lets readers navigate the digital content by page number, and it supports classrooms where students and the professor may be using either the print or the digital version.

When including print pagination references, the package document metadata must also include a dc:source element identifying the print source.

If an EDUPUB document contains page markers:

  1. Every page in the print version of the content must have exactly one corresponding page marker in EDUPUB.
  2. All markers in EDUPUB must be in the same order as the print pages.
  3. All floating elements and sections must be placed on the page on which they start. (This does not imply that all text runs can be placed correctly on pages.)

Page Markers are tagged as empty span elements and are allowed only where text content (PCDATA) is permitted (e.g., not between two list items or two chapters). They should be placed before the first visible content of the page they mark.
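Rules 1 and 2 above can be verified by comparing two sequences of page labels. The Python sketch assumes the labels have already been read from the page marker spans in document order. This section does not fix the exact span markup, so that extraction step is left out. Rule 3 depends on layout and is not checked.

```python
from collections import Counter


def page_marker_problems(print_pages: list[str], markers: list[str]) -> list[str]:
    """Compare page markers (document order) with the print pages (print order)."""
    problems = []
    counts = Counter(markers)
    # Rule 1: every print page has exactly one marker.
    for label in print_pages:
        if counts[label] != 1:
            problems.append(f"page {label}: {counts[label]} markers, expected exactly 1")
    known = set(print_pages)
    problems += [f"marker {label}: no such print page" for label in counts if label not in known]
    # Rule 2 is only meaningful once every page has exactly one marker.
    if not problems and markers != print_pages:
        problems.append("markers are not in the same order as the print pages")
    return problems
```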

8. Embedded Content

Elements that are from namespaces other than the HTML namespace and that convey content but not metadata are embedded content for the purposes of the content models defined in this specification (for example, MathML or SVG).

Some embedded content elements can have fallback content: content that is to be used when the external resource cannot be used (e.g. because it is of an unsupported format). The element definitions state what the fallback is, if any.

Embedded content is content that imports another resource (e.g., xml, swf, ...) into the document, or content from another vocabulary (e.g., CML) that is inserted into the document. The embed element is used for this purpose.

8.1 Video Recommendations

Video content is specified using the <video> element.

Best practice is to provide two formats of video: mp4 and webm.

<div class="fallback"> is required and should include an error message for platforms that are unable to play video.

Best practice is to provide the track element for captions.

<video controls="controls" poster="../images/fraser.jpg">
  <source src="../video/fraser_amrev_720480.mp4" type="video/mp4"/>
  <source src="../video/fraser_amrev_720480.webm" type="video/webm"/>
  <track src="../video/fraser_amrev_720480.vtt" kind="captions" srclang="en" label="English"/>
  <div class="fallback">
    <p>Sorry, it appears your system either does not support video playback or cannot play the MP4 format or WebM format provided.</p>
  </div>
</video>

8.2 Audio Content

Audio content is specified using the <audio> element.

Best practice is to provide two formats of audio: ogg and mp3.

<div class="fallback"> is required and should include an error message for platforms that are unable to play audio.

<audio controls="controls">
  <source src="audio/04_01.ogg" type="audio/ogg" />
  <source src="audio/04_01.mp3" type="audio/mpeg" />
  <div class="fallback">
    <p>Sorry, it appears your system either does not support audio playback or cannot play the MP3 format or OGG format provided.</p>
  </div>
</audio>
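A validator might apply the required fallback and the best-practice recommendations from 8.1 and 8.2 as follows. The Python sketch treats a missing div class="fallback" as an error. It treats missing recommended formats, or a missing captions track on video, as warnings. The severity split and the function name media_problems are assumptions.

```python
import xml.etree.ElementTree as ET

NS = "{http://www.w3.org/1999/xhtml}"
# Recommended source MIME types per element, from the best-practice notes.
RECOMMENDED_TYPES = {
    "video": {"video/mp4", "video/webm"},
    "audio": {"audio/ogg", "audio/mpeg"},
}


def media_problems(xhtml: str) -> list[str]:
    """Report missing fallbacks (required) and missing recommended sources/captions."""
    root = ET.fromstring(xhtml)
    problems = []
    for kind, wanted in RECOMMENDED_TYPES.items():
        for media in root.iter(NS + kind):
            # The fallback div is expected as a direct child of the media element.
            has_fallback = any(
                "fallback" in div.get("class", "").split()
                for div in media.findall(NS + "div")
            )
            if not has_fallback:
                problems.append(f"error: <{kind}> has no div.fallback")
            types = {source.get("type") for source in media.findall(NS + "source")}
            for missing in sorted(wanted - types):
                problems.append(f"warning: <{kind}> has no {missing} source")
            if kind == "video" and not any(
                track.get("kind") == "captions" for track in media.findall(NS + "track")
            ):
                problems.append("warning: <video> has no captions track")
    return problems
```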

8.3 Widgets & Gadgets

The object element is used to reference external resources in EDUPUB, such as "widgets/gadgets". This lets authoring tools and browsers display the content natively if needed (whereas a div would require special processing to display the gadget). On output, the object can be transformed (if needed) to other elements, such as a div for embedded display or a hyperlink to launch in a new window.

A fallback can be specified by using flow content. Fallbacks are commonly an image with some text.

To capture all the necessary information about a gadget, a mix of standard object attributes, custom data-* attributes, and param elements will be used. If parameters are needed to initialize a gadget, the object can contain one or more param elements.

The object can contain fallback content.

Gadget Content Model

Gadget Template

<object class="gadget gadget_dcat" data="#URI#" type="#Text#" height="#Text#" width="#Text#"
    lang="#Text#" title="#Text#"
    data-responsivedesigned="#yes/no#" data-minwidth="#Text#" data-minheight="#Text#"
    data-lmsrequired="#yes/no#" data-offlinesupport="#yes/no#"
    data-displaytarget="#embed/new_window#" data-icon="#URI#"
    data-iconwidth="#Text#" data-iconheight="#Text#">
  <!-- gadget params required to initialize the gadget -->
  <param name="#CDATA#" value="#CDATA#"/>
  <!-- fallback could be an image and/or flow content -->
  <span class="fallback"><img src="#URI#" alt="#Text#" /></span>
</object>

Standard Object Attributes (http://www.w3.org/TR/html-markup/object.html#object)

Custom Attributes

Fallback Mechanism

The object can contain fallback content. It should follow any param elements (if they exist). Fallback content is typically an image and/or some text, but can be any elements classified as "flow content" by the HTML5 spec.
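The template can be read back into a simple settings record, for example when deciding how to transform the object on output. In this Python sketch, three things are assumptions: the default display target, the div/a output mapping, and the function name read_gadget. The sketch also checks that every param comes before the fallback content.

```python
import xml.etree.ElementTree as ET

NS = "{http://www.w3.org/1999/xhtml}"


def read_gadget(obj_xml: str) -> dict:
    """Collect the settings of a gadget object element written from the template."""
    obj = ET.fromstring(obj_xml)  # XML comments in the template are dropped
    children = list(obj)
    is_param = [child.tag == NS + "param" for child in children]
    # Assumption: treat a missing data-displaytarget as "embed".
    display = obj.get("data-displaytarget", "embed")
    return {
        "uri": obj.get("data"),
        "type": obj.get("type"),
        "params": {
            child.get("name"): child.get("value")
            for child, param in zip(children, is_param)
            if param
        },
        # Fallback should follow every param, so the params must form a leading run.
        "params_before_fallback": all(is_param[: is_param.count(True)]),
        # Hypothetical output mapping: a div for embedded display, else a hyperlink.
        "output_element": "div" if display == "embed" else "a",
    }
```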

9. File Naming and Folder Layout

9.1 File Naming Conventions

9.2 package.opf File Structure

The EDUPUB specification does not add any new requirements beyond what is documented in the ePub3 specification.

9.3 Chunking Level

The EDUPUB specification does not have requirements on how parts/chapters/sections should be "chunked" into files within the ePub, but chunking at the first (A-Head) section within a chapter is considered a Best Practice.

10. Image Conversion Specifications

The EDUPUB specification does not add any new requirements beyond what is documented in the HTML5 specification. The following guidelines are recommended as best practices.

For digital media, the color space is sRGB.

Appendix A: Concepts and Definitions

Concepts and definitions were developed during the creation of this document. Not all of them are discussed in this document; some are defined here for future reference in related documents.

CML

Chemical Markup Language (CML) is still under evaluation for inclusion in the EDUPUB Spec.

CML has been developed by Peter Murray-Rust and Henry Rzepa since 1995. It is the de facto XML for chemistry, accepted by publishers and with more than 1 million lines of Open Source code supporting it. CML can be validated and built into authoring tools (for example the Chemistry Add-in for Microsoft Word).[2]

Learning Objects

A learning object is "a collection of content items, practice items, and assessment items that are combined based on a single learning objective" [Cisco Systems, Reusable Information Object Strategy].

EDUPUB embodies the notion of Learning Objects (LO). Content is created and presented as self-contained textual, audio-visual, interactive and assessment components that combine to satisfy learning objectives.

MathML

Math equations will be authored and stored in Presentation MathML or Content MathML.

MathML is the industry standard XML markup for displaying and processing mathematical equations and notations. One of the benefits of MathML is its ability to facilitate accessibility options. Tools will be provided to facilitate the authoring and transformation of content to MathML.

QTI (IMS Question & Test Interoperability)

Assessment content will use the IMS Global Learning Consortium's Question and Test Interoperability version 2.1 (QTI v2.1) Specification as an exchange format. EDUPUB can reference assessment content using the embed element.

In Q1 2013 we will add support for authoring "low-stakes" assessments in XHTML5 markup, included in the EDUPUB source.

SVG

Scalable Vector Graphics (SVG) is a family of specifications of an XML-based file format for two-dimensional vector graphics, both static and dynamic (i.e., interactive or animated). The SVG specification is an open standard that has been under development by the World Wide Web Consortium (W3C) since 1999.

SVG images and their behaviors are defined in XML text files. This means that they can be searched, indexed, scripted, and, if need be, compressed.[3] 

Appendix B: Accessibility Features

Separation of style with ability to adjust size/color

Description: Visual appearance of content is not the only way to convey meaning to readers.

Benefit: The meaning behind the text formatting won’t be lost when displayed or transformed. Also allows for adjustable font face, font size, and background/foreground color without loss of meaning or functionality.

Requirement: Avoid text formatting (e.g., italic, underline, bold, font size) as the only way to provide information that goes beyond emphasis. An example of this issue would be a quiz question asking: "What is the significance of the italic text in the following paragraph?" There must be a second way to locate the text in this type of situation.

Content will be marked up semantically using XHTML5 element and class names, leveraging EPUB3 conventions (e.g., epub:type) and QTI/APIP for assessment. Authoring guidelines and validation will be used to avoid specific references to formatting.

Semantic inflection

Description: Roles will be identified (e.g. heading, numbered list, bulleted list, data table, paragraph, emphasized text)

Benefit: When spoken text features are used, it is possible to skip entire lists and continue reading the main text. Also provides the ability to meaningfully skim the content and return to sections of the page without sight.

Requirement: Ensure that semantic tagging practices identify chunks of textual content with the appropriate role. Content will be marked-up semantically using element and class names as well as leveraging EPUB3 conventions (e.g., epub:type) and QTI/APIP for assessment.

Logical reading order

Description/Benefit: Content follows a logical and sequential order. Allows the content to be automatically transformed to other visual formats and to audio formats without loss of meaning.

Requirement: Establish standards and best practices to define proper reading order.

 Content will be authored in the logical reading order and preserved throughout. The aside element introduces content that can be read out of order (for example a sidebar) and is authored where the reader might consider reading it.

Images with textual descriptions

Description/Benefit: Accessible descriptions will be available for images (photographs and rendered art) except where the image is purely decorative.

Requirement: A content expert will create a text alternative that effectively conveys the instructional intent of the image and provides the same information that the image provides.

The alt attribute will be empty when the image is purely decorative; otherwise it will contain the text alternative. This will be enforced via validation rules.
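The alt-text rule can be enforced by sorting images into three groups. In this Python sketch, a missing alt attribute is what validation should reject, while an empty value is taken to mark a decorative image. The function name is hypothetical. Whether a description actually conveys the instructional intent still needs human review.

```python
import xml.etree.ElementTree as ET

IMG = "{http://www.w3.org/1999/xhtml}img"


def alt_text_report(xhtml: str) -> dict[str, list[str]]:
    """Group image sources by alt status: missing, decorative (empty) or described."""
    report = {"missing": [], "decorative": [], "described": []}
    for img in ET.fromstring(xhtml).iter(IMG):
        alt = img.get("alt")
        src = img.get("src", "?")
        if alt is None:
            report["missing"].append(src)       # fails validation
        elif not alt.strip():
            report["decorative"].append(src)    # empty alt marks decoration
        else:
            report["described"].append(src)
    return report
```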

Scalable Vector Graphics (SVG)

Description: SVG assets look great at any size; scaled objects do not appear pixelated. These assets can also have titles and descriptions.

Benefit: Ability to scale images without the need for specialized zoom software. Textual data (title and description) can be accessed via assistive technology.

Requirement: Production to create and/or convert line art images to SVG. Content expert will create a text alternative for the SVG image.

Referencing of SVG images is supported. SVG references will always appear in a switch/case architecture that allows for HTML, alternative text, or an image alternative if SVG cannot be natively supported.

Easy to navigate tables

Description/Benefit: Allows reader to quickly determine what they are reading at any given point in the table.

Requirement: Apply industry standard markup to indicate proper order and identify column/row headers.

This markup is supported but can be very labor-intensive, depending on the complexity and size of the table.

MathML

Description/Benefit: MathML eliminates the need for content experts to create alternative descriptive text of math structures because MathML is machine readable and can be understood by screen readers and other assistive technologies. Also eliminates the production and quality assurance cost of creating images of math structures.

Requirement: Production to ensure the composition of the book supports MathML.

MathML is supported and will always appear with alternative text or an image alternative if MathML cannot be natively supported.

Resizable text without loss of functionality

Description: Text will be zoomable up to at least 200%.

Benefit: No left or right swiping needed for basic text reading. Minimal up/down swiping needed when zoomed at or above 200%. This capability is part of the spec for the NexText platform.

 All textual content is in XHTML5. Zooming is a feature of the device/platform.

Escapable and skippable elements

Description/Benefit: Allows the user to jump to the end of a chunk of content in a list or table.

Requirement: Production will use the "seq" element in lists and tables to define smaller chunks of related data.

 

Rich video

Description: Video with: captions, transcripts, audio descriptions, and graceful fallbacks.

Benefit: Using text to describe a video, or communicate the audio portion of a video, allows people with hearing disabilities to have a similar experience compared to people who can hear. This also helps in noisy environments.

Requirement: Create captions, a transcript, and an audio description for the video. Include a "poster image" of the video as well as several renditions at different resolutions and/or codecs.

 All images and media are required to have alt text unless purely decorative. This will be enforced via validation rules. Native XHTML5 track element will be used for subtitle and caption information within the video element.

Synchronized audio

Description: Timed Tracks allow for properly synchronized text and audio.

Benefit: Captions and subtitles will be in sync with video.

 All images and media are required to have alt text unless purely decorative. This will be enforced via validation rules. Native XHTML5 track element will be used for subtitle and caption information within the audio element.

Text-To-Speech (TTS)

Description: Speech synthesized with proper pronunciation for the chosen language.

Benefit: An alternative to human narration.

Requirement: Production to provide code to explicitly declare the language. Spoken text is part of the spec for the NexText platform.

 SMIL and Media Overlay capabilities of EPUB3 will be supported.

Highlighted words during narration

Description: Media Overlays provide reading/listening options and the ability to easily switch between them.

Benefits: Simultaneous audio-only, text-only, and eBook production. Compliance with the SMIL standard.

Requirement: Production to link structured audio narration to its corresponding text or timestamp within a video.

 SMIL and media overlay capabilities of EPUB3 will be supported.

Footnotes outside of text stream

Description: Properly tagged footnotes do not commingle with the text stream.

Requirement: Production to compose the book with properly tagged footnotes.

 Use of aside element with epub:type="footnote" will uniquely identify footnotes.

Draft                                Page                                 11/15/2013


[1] Additional classing/subclassing of gadgets will occur once a complete matrix of widgets and gadgets is assembled and ready for analysis.

[2] "Chemical Markup Language | CML." 12 Jul. 2012 <http://www.xml-cml.org/>

[3] "Scalable Vector Graphics - Wikipedia, the free encyclopedia." 2003. 18 Jul. 2012 <http://en.wikipedia.org/wiki/Scalable_Vector_Graphics>