This charter proposal was approved by the IDPF membership in January 2012.
Note that the Definitions section contains definitions for terms as used in this document. The definitions are intended to apply narrowly, within the scope of this document, and should not be construed as applying to the field of indexing in general or to EPUBs as a whole.
Indexes are specialized navigational and supplemental information tools that offer readers an interaction with content that is enhanced, more powerful, and more specific than simple search. Users will expect to have indexes available in the EPUB3 ecosystem and accessible as easily as search. Publishers of EPUB3 content wish to make this data available to users, to allow them to explore book contents beyond what search results reveal.
Readers use indexes in a variety of ways: to quickly locate discussions in content, to discover relevant content that is discussed under differing synonyms, to discover new terminology for concepts, and to see details of topics covered in an eBook. Indexes convey a sense of the depth of topic coverage in an eBook, break down large concepts into important subcategories, and allow exploration of content through granular and user-friendly access points. Indexes provide the added value of human analysis, enabling an interactive conversation between the reader and the book. Indexers are not constrained to use the author's terms as entries or, in some cases, even to use only terms that appear in the document: indexers are focused on meanings, not just words. Indexes are also a pre-coordinate search system, whereas search tends to be post-coordinate.
Index information and metadata can be used by devices to provide navigation and supplemental search details to the reader. Search can be supplemented and fine-tuned by reading index metadata to provide better results. Index metadata can provide new views into the semantic underpinnings of an eBook.
This proposal describes the scope, required functionality, and timeline to deliver a standard for producing EPUB3 publications that meet the use cases included in this proposal.
As a navigation feature, support for indexes relates directly to Item 6 in the EPUB Revision Working Group Charter, regarding enhanced navigation support (see here).
Main wiki page for the Working Group is here.
The scope of this project is to define a declarative mechanism for the representation of indexes in EPUB Publications. As further detailed in Use Cases, Needed Publication Properties, and Reading System Behaviors below, the delivered mechanism shall have the following top-level functional properties:
Indexers write indexes using a variety of tools, ranging from built-in modules in page layout and XML content management software to dedicated index preparation software. Details of how to implement indexing in those tools are out of scope.
Ordering of main headings and subheadings in the index is part of the creation process and thus out of scope.
Index display format in chapter form (e.g., indenting, spacing, etc.) can vary greatly, depending on the writer and publisher. Suggested presentation formats are out of scope.
Low-level, system-oriented functionality for fast lookup, reverse lookup, and retrieval, typically described in terms of a database-like file, is out of scope.
The defined mechanism shall integrate with EPUB 3 as follows:
Project participation is open to IDPF members and invited experts. (Note that invited expert status needs to be renewed for each IDPF project.)
The project charter spans one year in total. Once formed, the working group will decide on feature prioritization and possibly also versioning strategies, after which the milestones below can be dated.
| Milestone | Date |
| Draft Charter Proposal to WG for review | December 2, 2011 |
| Submission to Membership for Approval | January 6, 2012 |
| WG creation, formal project start | January 23, 2012 |
| WG Face-to-face | Feb timeframe TBD |
| First WG Draft | TBD |
| Second WG Draft | TBD |
| Proposed Specification | TBD |
| Recommended Specification | TBD |
| Maintenance/Tutorials | Through Jan 2013 |
This project is intended to be run concurrently with the project on dictionaries and glossaries, and so shares the charter span with that project.
Suggested Leads of this working group are:
Package metadata
Index links
Index presentation
Note: the intent of this project is not to mandate reading system behaviors. The list below only serves the purpose of illustrating Reading System/Index interactions.
Implied/assumed (existing functionality in EPUB readers that indexes will use)
Standalone index
This section contains definitions for terms as used in this document. The definitions are intended to apply narrowly, within the scope of this document, and should not be construed as applying to the field of indexing in general or to EPUBs as a whole.
Auto-fill functionality pre-scrolls a pop-up index to main headings in the index matching the letters as they are typed in by the user.
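As a rough illustration of the auto-fill behavior described above, the sketch below finds the main heading a pop-up index could pre-scroll to as the user types. It assumes the headings are already sorted case-insensitively; the function name `first_match_position` and the sample headings are hypothetical and do not come from this charter, which does not mandate any Reading System behavior.

```python
from bisect import bisect_left


def first_match_position(sorted_headings, typed):
    """Return the position of the first heading starting with `typed`, or None.

    Assumption: `sorted_headings` is sorted by its case-folded form.
    """
    keys = [heading.casefold() for heading in sorted_headings]
    prefix = typed.casefold()
    # Binary search for the first key that is >= the typed prefix.
    position = bisect_left(keys, prefix)
    if position < len(keys) and keys[position].startswith(prefix):
        return position
    return None


# Hypothetical main headings, for illustration only.
headings = ["Apples", "Bananas", "Barley", "Cherries"]
scroll_to = first_match_position(headings, "bar")
```

A Reading System could call such a function on each keystroke and scroll the pop-up index to the returned position, leaving the view unchanged when nothing matches.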
Reading/skimming index content.
An index presented in a book's content as a chapter, accessed from the table of contents and from special menus or icons. It can be paged through and browsed as normal content, with hyperlinks back into the book's content, and cross-reference hyperlinks to other areas of the index.
Entry in an index that directs the reader from one term to another term. An entry should be hyperlinked to the targeted term. There are three types: See references, See also references, and Generic cross references (defined below).
A prefix, suffix, symbol or special formatting added to locators to indicate special content, such as tables, figures, or primary discussions.
Editorial note that is part of an index entry, found inline after the main heading or subheadings.
A unit of an index, consisting of a main heading, zero or more subheadings, and at least one locator or cross reference.
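One possible in-memory model of such an entry is sketched below. The class name `Heading`, its field names, and the sample hrefs are assumptions made for illustration; the charter does not define a data model. The check also assumes that a locator or cross reference attached to any subheading is enough to satisfy the "at least one" requirement.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Heading:
    """A main heading or subheading (hypothetical model, not from the charter)."""
    text: str
    locators: List[str] = field(default_factory=list)          # e.g. hrefs into the book
    cross_references: List[str] = field(default_factory=list)  # texts of target headings
    subheadings: List["Heading"] = field(default_factory=list)


def is_valid_entry(entry: Heading) -> bool:
    """True if the entry, or any of its subheadings, has a locator or cross reference."""
    if entry.locators or entry.cross_references:
        return True
    return any(is_valid_entry(sub) for sub in entry.subheadings)


# Hypothetical entries, for illustration only.
dogs = Heading("Dogs", subheadings=[Heading("breeds", locators=["ch02.xhtml#breeds"])])
commands = Heading("Commands", cross_references=["names of specific commands"])
empty = Heading("Cats")
```

Here `dogs` and `commands` pass the check, while `empty` does not, because it has neither a locator nor a cross reference at any level.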
Cross reference to a category of entries rather than a specific entry. For example, in a software manual: "Commands. See names of specific commands", or in a book on pets: "Dogs. See names of specific breeds, e.g. golden retriever".
A string of hyperlinked letters and/or digits (e.g., A-Z, 0-9) used to easily navigate to another section of the index: for example, clicking P would take the user to the section of the index beginning with P. Other alphabets and character systems would display the appropriate glyphs for any navigation data.
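The sketch below shows one way such a navigation string could be derived from a sorted list of main headings, by recording where each initial character first appears. The function name `navigation_bar` is hypothetical, and using the uppercased first character as the grouping key is a simplifying assumption; real sorting and grouping rules depend on the language and on the indexer's choices.

```python
def navigation_bar(sorted_headings):
    """Map each initial character to the position of the first heading that starts with it."""
    bar = {}
    for position, heading in enumerate(sorted_headings):
        if not heading:
            continue  # skip empty headings rather than create an empty key
        initial = heading[:1].upper()
        # Keep only the first position seen for each initial.
        bar.setdefault(initial, position)
    return bar


# Hypothetical main headings, for illustration only.
bar = navigation_bar(["Apples", "Bananas", "Barley", "Cherries"])
```

Each key could then be rendered as a hyperlink to the index section that begins at the recorded position.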
Explanatory paragraph(s) at the head of the index that describe unique features of the index (e.g., special typography, scope of the index, omitted items, etc.) that the reader needs to know in order to effectively use the index.
An intuitively sorted (usually alphabetical) list of entry terms providing a variety of different access points to all significant discussions of subjects, which might be concepts, entities, processes, individuals and organizations within a document, with associated locators indicating where these discussions are to be found.
A section of content that explains locator decorations, special symbols, or other typography for the user.
Nested depth of subheadings beneath each main heading. A main heading is level 1; a subheading is level 2; a sub-subheading is level 3; and so on. There can be as many levels as the indexer and publisher feel necessary.
Pointer from an entry in the index to a significant treatment of the topic in the text, which may be a page number, section number, etc. In an EPUB the locator should appear as a hyperlink.
Words, symbols, or phrases based on or selected from the book's content, expressing a concept, idea, or proper name. A main heading may or may not have subheadings, but must have one or more locators or a cross reference.
An index that covers more than one publication. A master index can be part of an EPUB with other content or part of a standalone index.
Data about the EPUB as a whole. Please see descriptions at package document and package metadata.
Index view activated by the user while reading the text and displayed in a separate window.
System in which the user enters one or more terms which are matched character-by-character in the target text. Search engines are an example of post-coordinate systems.
System in which co-relations (e.g., broader/narrower relations, semantic connections) between topics have been determined by human analysis, adding an enhanced level of sophistication and specificity. An index is an example of a pre-coordinate system.
A locator that indicates a span of text, i.e., where coverage of a subject begins and ends.
Index view activated when the reader highlights a range of text, which displays in a separate window the index entries associated with the range.
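A minimal sketch of the lookup behind such a view follows. It assumes, purely for illustration, that each locator has been resolved to a pair of character offsets (start inclusive, end exclusive) in the text; an actual EPUB would more likely point at fragment identifiers or similar targets. The function name `entries_for_selection` and the sample data are hypothetical.

```python
def entries_for_selection(index, selection_start, selection_end):
    """Return the headings whose ranges overlap the highlighted span.

    `index` maps a heading to a list of (start, end) offsets, end exclusive.
    """
    matches = []
    for heading, ranges in index.items():
        # Two half-open ranges overlap when each starts before the other ends.
        if any(start < selection_end and selection_start < end for start, end in ranges):
            matches.append(heading)
    return sorted(matches)


# Hypothetical index data, for illustration only.
sample_index = {
    "dogs": [(100, 250)],
    "cats": [(300, 320)],
    "pets": [(0, 50), (240, 310)],
}
selected = entries_for_selection(sample_index, 200, 260)
```

With the sample data, highlighting offsets 200 to 260 returns the "dogs" and "pets" entries, since only their ranges overlap the selection.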
Cross reference that directs the reader to related, broader, or narrower subjects covered at other main headings.
Cross reference that directs the user from a term not used in the index to the preferred term in the index.
A publication that consists only of one or more indexes to other EPUBs or external targets.
Stemming engines supply root forms of words and incorporate multiple versions (grow, growing, grows, growth) into search, extending the search's results.
Second-level, third-level, fourth-level, etc. headings subordinate to a main heading.
Unique ID located in the book's content, which links can target for navigation.