EPUB 3 Indexes Charter

Status of this proposal

This charter proposal was approved by the IDPF membership in January 2012.

Note that the Definitions section contains definitions for terms as used in this document. The definitions are intended to apply narrowly, within the scope of this document, and should not be construed as applying to the field of indexing in general or to EPUBs as a whole.

Need for this proposal

Indexes are specialized navigational and supplemental information tools that offer readers an interaction with content that is enhanced, more powerful, and more specific than simple search. Users will expect to have indexes available in the EPUB 3 ecosystem and accessible as easily as search. Publishers of EPUB 3 content wish to make this data available to users, to allow them to explore book contents beyond what search results reveal.

Readers use indexes in a variety of ways: to quickly locate discussions in content, to discover relevant content that is discussed with differing synonyms, to discover new terminology for concepts, and to see details of topics covered in an eBook. Indexes convey a sense of the depth of topic coverage in an eBook, break down large concepts into important subcategories, and allow exploration of content through granular and user-friendly access points. Indexes provide the added value of human analysis, enabling an interactive conversation between the reader and the book. Indexers are not constrained to the terms used by the author or, in some cases, even to terms that appear anywhere in the document: indexers focus on meanings, not just words. Indexes are also a pre-coordinate search system, whereas search tends to be post-coordinate.

Index information and metadata can be used by devices to provide navigation and supplemental search details to the reader. Search can be supplemented and fine-tuned by reading index metadata to provide better results. Index metadata can provide new views into the semantic underpinnings of an eBook.

This proposal describes the scope, required functionality, and timeline to deliver a standard for producing EPUB 3 publications that meet the use cases included in this proposal.

As a navigation feature, support for indexes relates directly to Item 6 in the EPUB Revision Working Group Charter, regarding enhanced navigation support (see here).

Main wiki page for the Working Group is here.

Scope

In-scope (Deliverables)

The scope of this project is to define a declarative mechanism for the representation of indexes in EPUB Publications. As further detailed in Use Cases, Needed Publication Properties, and Reading System Behaviors below, the delivered mechanism shall have the following top-level functional properties:

Out of Scope

Indexers write indexes using a variety of tools, ranging from built-in modules in page layout and XML content management software to dedicated index preparation software. Details of how to implement indexing in those tools are out of scope.

Ordering of main headings and subheadings in the index is part of the creation process and thus out of scope.

Index display format in chapter form (e.g., indenting, spacing, etc.) can vary greatly, depending on the writer and publisher. Suggested presentation formats are out of scope.

Low-level, system-oriented functionality for fast lookup, reverse lookup, and retrieval, typically described in terms of a database-like file, is out of scope.

Integration Constraints

The defined mechanism shall integrate with EPUB 3 as follows:

Timeline and Participation

Project participation is open to IDPF members and invited experts. (Note that invited expert status needs to be renewed for each IDPF project.)

The project charter spans one year in total. Once formed, the working group will decide on feature prioritization and possibly also versioning strategies, after which the milestones below can be dated.

Draft Charter Proposal to WG for review: December 2, 2011
Submission to Membership for Approval: January 6, 2012
WG creation, formal project start: January 23, 2012
WG Face-to-face: Feb timeframe TBD
First WG Draft: TBD
Second WG Draft: TBD
Proposed Specification: TBD
Recommended Specification: TBD
Maintenance/Tutorials: Through Jan 2013

This project is intended to be run concurrently with the project on dictionaries and glossaries, and so shares the charter span with that project.

Working Group Leads

Suggested Leads of this working group are:

Use Cases

  1. Chapter-like index:

Needed Publication Properties

Package metadata

Index links

Index presentation

Reading System Behaviors

Note: the intent of this project is not to mandate reading system behaviors. The list below only serves the purpose of illustrating Reading System/Index interactions.

Implied/assumed (existing functionality in EPUB readers that indexes will use)

Chapter-like index

Pop-up index

Reverse index

Standalone index

References

Definitions

This section contains definitions for terms as used in this document. The definitions are intended to apply narrowly, within the scope of this document, and should not be construed as applying to the field of indexing in general or to EPUBs as a whole.

Auto-fill
Auto-fill functionality pre-scrolls a pop-up index to main headings in the index matching the letters as they are typed in by the user.
Browsing
Reading/skimming index content.
Chapter-like index
An index presented in a book's content as a chapter, accessed from the table of contents and from special menus or icons. It can be paged through and browsed as normal content, with hyperlinks back into the book's content, and cross-reference hyperlinks to other areas of the index.
Cross reference
Entry in an index that directs the reader from one term to another term. An entry should be hyperlinked to the targeted term. There are three types: See references, See also references, and Generic cross references (defined below).
Decoration
A prefix, suffix, symbol or special formatting added to locators to indicate special content, such as tables, figures, or primary discussions.
Editor's note, inline
Editorial note that is part of an index entry, found inline after the main heading or subheadings.
Entry
A unit of an index, consisting of a main heading, zero or more subheadings, and at least one locator or cross reference.
Generic cross reference
Cross reference to a category of entries rather than a specific entry. For example, in a software manual: "Commands. See names of specific commands", or in a book on pets: "Dogs. See names of specific breeds, e.g. golden retriever".
Group break navigation data
A string of hyperlinked letters and/or digits (e.g., A-Z, 0-9) used to easily navigate to another section of the index: for example, clicking P would take the user to the section of the index beginning with P. Other alphabets and character systems would display the appropriate glyphs for any navigation data.
Headnote
Explanatory paragraph(s) at the head of the index that describe unique features of the index (e.g., special typography, scope of the index, omitted items, etc.) that the reader needs to know in order to effectively use the index.
Index
An intuitively sorted (usually alphabetical) list of entry terms providing a variety of different access points to all significant discussions of subjects, which might be concepts, entities, processes, individuals and organizations within a document, with associated locators indicating where these discussions are to be found.
Legend
A section of content that explains locator decorations, special symbols, or other typography for the user.
Level
Nested depth of subheadings beneath each main heading. A main heading is level 1; a subheading is level 2; a sub-subheading is level 3; and so on. There can be as many levels as the indexer and publisher feel necessary.
Locator
Pointer from an entry in the index to a significant treatment of the topic in the text, which may be a page number, section number, etc. In an EPUB the locator should appear as a hyperlink.
Main heading
Words, symbols, or phrases based on or selected from the book's content, expressing a concept, idea, or proper name. A main heading may or may not have subheadings, but must have one or more locators or a cross reference.
Master index
An index that covers more than one publication. A master index can be part of an EPUB with other content or part of a standalone index.
Package metadata
Data about the EPUB as a whole. Please see descriptions at package document and package metadata.
Pop-up index
Index view activated by the user while in the text and displayed in a separate window.
Post-coordinate system
System in which the user enters one or more terms which are matched character-by-character in the target text. Search engines are an example of post-coordinate systems.
Pre-coordinate system
System in which co-relations (e.g., broader/narrower relations, semantic connections) between topics have been determined by human analysis, adding an enhanced level of sophistication and specificity. An index is an example of a pre-coordinate system.
Range
A locator that indicates a span of text, i.e., where coverage of a subject begins and ends.
Reverse index
Index view activated when the reader highlights a range of text, which displays in a separate window the index entries associated with the range.
See also reference
Cross reference that directs the reader to related, broader, or narrower subjects covered at other main headings.
See reference
Cross reference that directs the user from a term not used in the index to the preferred term in the index.
Standalone index
A publication that consists only of one or more indexes to other EPUBs or external targets.
Stemming
Stemming engines supply root forms of words and incorporate multiple versions (grow, growing, grows, growth) into search, extending the search's results.
Subheading
Second-level, third-level, fourth-level, etc. headings subordinate to a main heading.
Target
Unique id code located in the book's content, available for links to use in navigation.
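
As an illustration only, the relationships among several of the terms defined above (entry, main heading, subheading, level, locator, range, cross reference, auto-fill) can be sketched in Python. This is a minimal sketch, not part of the charter or of any delivered mechanism. Every class, field, and function name below is hypothetical. Two assumptions are made: an entry is treated as well formed when at least one locator or cross reference appears anywhere in it, and auto-fill is modeled as a case-insensitive prefix match over main headings that are already sorted that way.

```python
from __future__ import annotations

from bisect import bisect_left
from dataclasses import dataclass, field


@dataclass
class Locator:
    """Pointer from an entry to a target id in the content (hypothetical fields)."""
    target_id: str                     # id of the target where coverage begins
    end_target_id: str | None = None   # set only when the locator is a range

    @property
    def is_range(self) -> bool:
        return self.end_target_id is not None


@dataclass
class CrossReference:
    """A See, See also, or generic cross reference (kind names are assumptions)."""
    kind: str   # "see", "see-also", or "generic"
    text: str   # the preferred term, or a category description for "generic"


@dataclass
class Heading:
    """A main heading (level 1) or a subheading (level 2, 3, ...)."""
    text: str
    locators: list[Locator] = field(default_factory=list)
    cross_references: list[CrossReference] = field(default_factory=list)
    subheadings: list[Heading] = field(default_factory=list)

    def depth(self) -> int:
        """Deepest level at or below this heading, counting this one as level 1."""
        return 1 + max((s.depth() for s in self.subheadings), default=0)

    def has_pointer(self) -> bool:
        """True if a locator or cross reference exists anywhere in this entry."""
        if self.locators or self.cross_references:
            return True
        return any(s.has_pointer() for s in self.subheadings)


def autofill_position(main_headings: list[str], typed: str) -> int | None:
    """Return the position of the first main heading that starts with `typed`.

    Assumes `main_headings` is already sorted by case-folded text.
    Returns None when no heading matches the typed letters.
    """
    keys = [h.casefold() for h in main_headings]
    prefix = typed.casefold()
    i = bisect_left(keys, prefix)
    if i < len(keys) and keys[i].startswith(prefix):
        return i
    return None
```

For example, a "Dogs" main heading that carries only a generic cross reference would satisfy has_pointer() at level 1, and a reading system could pass the letters typed so far to autofill_position() to decide where to pre-scroll a pop-up index. Sorting rules, locator syntax, and reading system behavior remain matters for the working group and are not implied by this sketch.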