EPUB 3 Dictionaries and Glossaries Charter

Status of this proposal

This charter proposal was approved by the IDPF membership in January 2012.

Need for this proposal

Dictionaries, glossaries, thesauri, and similar works are ubiquitous published resources that users expect to have available in the EPUB 3 ecosystem. From a user's point of view, the primary use of a dictionary or glossary is to search for a term and quickly retrieve its definition or translation. Currently, EPUB has no mechanism for an author to mark up the semantic information that reading systems need to offer such search features, so it is impossible to publish a dictionary in EPUB that serves its primary purpose. While EPUB-based reading systems often come bundled with a dictionary and offer a word lookup feature, this is achieved by storing the dictionary in a proprietary format and essentially treating it as part of the reading system software rather than as an independent publication.

The current situation does not allow users to choose the dictionary content that best suits their needs; instead, it limits them to a single bundled dictionary. Publishers of EPUB 3 content wish to make a broad range of reference resources available to users and to serve needs that the general monolingual dictionary typically bundled with a reading system cannot meet: children need dictionaries designed for their reading level, language learners need dictionaries that translate from a foreign language into their native tongue, and users reading material in fields such as medicine and law need dictionaries covering a broad specialized vocabulary. Publishers also wish to let users look up words in a publication's glossary while reading, thereby enhancing the experience of educational and other types of content.

Reading system developers wish to utilize and innovate around these types of publications.

This proposal describes the scope, required functionality, and timeline for delivering a standard for producing EPUB 3 Publications that satisfy the use cases also described in this proposal.

Scope

In-scope (Deliverables)

The scope of this project is to define a declarative mechanism for the representation of dictionaries and glossaries in EPUB Publications sufficient to enable development of reading system features specific to these publication types. As further detailed in Use Cases and Needed Publication Properties below, the delivered mechanism shall have the following top-level functional properties:

Out of Scope

Integration Constraints

The defined mechanism shall integrate with EPUB 3 as follows:

Timeline and Participation

Project participation is open to IDPF members and invited experts. (Note that invited expert status needs to be renewed for each IDPF project.)

The project charter spans one year in total. Once formed, the working group will decide on feature prioritization and possibly also versioning strategies, after which the milestones below can be dated.

Draft Charter Proposal to WG for review: December 2, 2011
Submission to Membership for Approval: January 6, 2012
WG creation, formal project start: January 23, 2012
WG Face-to-face: February timeframe, TBD
First WG Draft: TBD
Second WG Draft: TBD
Proposed Specification: TBD
Recommended Specification: TBD
Maintenance/Tutorials: Through January 2013

This project is intended to run concurrently with the project on indexes, and so shares the same charter span as that project.

Working Group Leads

Suggested Leads of this working group are:

Use Cases

Actors: publishers, users

System: reading system, content

Needed Publication Properties

Package Metadata

Entry Structure

Headwords and Inflections

Other Semantic Markup

Structure and Semantics

N.B.: The following terms are representative of the range of lexical and semantic qualities needed to support the stated use cases and to allow for innovation. For the purposes of this charter proposal to initiate a working group, these terms are not intended to be interpreted as strict requirements for inclusion in a specification.

Glossaries

Lexicographical

Morphological, Semantic

Dictionaries

Lexicographical

Phonetic, Morphological, Historical, Syntax/Grammar, Semantic

Bilingual / Multilingual Dictionaries

Lexicographical

Phonetic, Morphological, Syntax/Grammar, Semantic

Thesauri

Lexicographical

Morphological, Phonetic, Syntax/Grammar, Semantic

Definitions

affix
A prefix, infix, or suffix that is attached to another form to make a word with a distinct meaning, eg, laugh + ed. (1)
alternate headword
A form related to a primary headword but generally carrying a somewhat different meaning. For example, an entry with the primary headword aestivate might have aestivation as an alternate headword. An alternate headword should be indexed for search purposes along with the primary headword.
antonym
Terms with opposite sense or meaning.
audio pronunciation
An audio file containing a recording of the pronunciation of a particular headword. This feature of many electronic dictionaries can be offered in addition to or in place of the traditional written pronunciation.
case
An inflection of a noun, adjective, or pronoun according to its function in a sentence. German, Russian, and Latin are examples of languages in which words have many different written forms according to case.
cultural note
A note providing detailed cultural context on a headword.
date
The date of the first recorded use of a particular headword in a language.
definition
An explanation of the meaning of a particular sense of a headword.
dictionary resource
A collection of entries that have headwords in a particular source language and that a reading system can access to look up terms a user selects while reading a publication.
displayed inflection
An inflection of a headword that is part of the viewable content of an entry. Irregular inflections are often printed explicitly in entries to guide the user, eg, the displayed inflection "mice" in "mouse noun, plural mice".
entry
The fundamental organizational unit of a glossary or dictionary, consisting of at least one headword and a definition, translation, or equivalence cross-reference.
equivalence
A statement that a headword or particular sense of a headword is equivalent in meaning to another dictionary headword, typically supplied in lieu of a definition and acting as a cross-reference to the equivalent entry cited. An example would be a short entry for color in a British English dictionary that informs the user this is a US equivalent of colour: 'color noun (US) = colour'.
etymology
An explanation of the historical origin of a headword, eg, a statement that it is derived from a particular Latin word.
example
A sentence or phrase illustrating the usage of a headword in a particular sense.
gender
A label indicating the gender of a noun, generally subsumed in part-of-speech at the beginning of an entry; in bilingual dictionaries, often a stand-alone label associated with a particular translation.
glossary
A section of a publication that a reading system can access to look up a term a user selects while reading that particular publication.
headword
The word occurring at the start of an entry whose meanings the entry covers; in a broader sense, a word whose meanings are discussed at any point in the entry (see alternate headword, variant headword, run-in headword, and run-on headword). In a monolingual dictionary or glossary, the headword is defined, while in a bilingual dictionary the headword is translated, and in a thesaurus synonyms are provided. In most languages, entries are arranged alphabetically according to the spelling of the headword.
holonym
A relation between a whole and a part, eg, a wiki is a holonym of constituent wiki pages; 'has-parts'.
hypernym
A relation between a class and sub-class; 'has-types'.
hyponym
A relation between a sub-class and a class; 'is-type-of'.
idiom
An idiomatic expression that is defined or translated in an entry. For example, an entry for cold might contain the idiom 'to get cold feet'.
inflection
An affixed form of a headword that conveys a specific grammatical meaning; for example, the past tense of a verb (eg, 'ran' is an inflection of 'run') or plural form of a noun (eg, 'mice' is an inflection of 'mouse'). Related to the concept of stemming in indexes.
lookup
A search for a user-selected term in dictionary or glossary headwords (including alternate, variant, run-on, and run-in headwords) and inflections. When a user initiates a glossary lookup, the reading system should search the local publication's embedded glossary, while when a user initiates a dictionary lookup, the reading system should search the user's preferred resources. Matching glossary or dictionary entries are then displayed to the user, typically in a pop-up window.
meronym
A relation between a part and a whole, eg, a wiki page is a meronym of a wiki; 'is-a-part-of'.
part-of-speech
A label indicating the grammatical function of the headword (noun, verb, adjective, interjection, transitive verb, reflexive verb, etc.).
phrasal headword
A headword of two or more words typically formed from another headword and listed within that headword's entry. For example, the items 'get out' and 'get up' listed in the entry for 'get' would be phrasal headwords.
preferred resource
An available dictionary resource which a reading system uses during lookup based on a user's indicated preferences.
pronunciation
One or more written phonetic pronunciations given for a headword.
quotation
A quotation from a cited source illustrating the usage of a headword in a particular sense.
register label
A label indicating usage register of a headword or sense, eg, formal, slang, offensive.
regional label
A label indicating geographic range of a headword or sense, eg, Latin America, Western US, Australia.
run-in headword
A headword occurring in the middle of an entry, generally associated with a particular sense.
run-on headword
A headword that occurs at the end of an entry and is derived from that entry's headword. For example, the adverb softly at the end of the entry for the adjective soft would be a run-on headword.
sense
A particular meaning of a headword, and a unit for organizing information pertaining to this meaning. Sense units are typically distinguished from one another by numeric and/or alphabetic labels.
sense label
A short phrase that restricts and clarifies the meaning of a particular sense.
source language
The language of the term(s) that a user wishes to look up; in bilingual dictionaries, the language of the headwords in a section of the publication.
stylistic label
A label identifying stylistic usage of a headword or sense, eg, literary.
subject label
A label indicating subject area of a headword or sense, eg, biology, architecture.
synonym
Terms with identical or similar meanings. Groups of synonyms are often tied to a particular sense of a headword in a thesaurus or dictionary.
temporal label
A label indicating current usage status of a headword or sense, eg, archaic.
tense
An inflected form of a verb that indicates when the action is taking place.

text entry search
A feature by which a user can directly input text into a search field and select entries with matching headwords from a list. Reading system developers could implement such a feature in a variety of ways, depending on their preference: by displaying matching results only after the user has input a full string and launched the search, or displaying partial matches as the user types, or positioning a highlight in a scrollable, complete list of dictionary headwords (to cite just a few possibilities).
translation
In a bilingual dictionary, the translation of a particular sense of a source language headword into the translation language.
translation language
In bilingual dictionaries, the language in which translations are offered for headwords in the source language.
usage section
A note providing usage information on a headword, or a more extensive section covering the difficult and confusing aspects of a particular headword's usage.
variant headword
An alternative spelling of a primary headword that carries the same meaning and that should be treated as of equal rank to it for search purposes. For example, an entry with the primary headword kabbalah could have numerous variant headwords: 'kabbalah also kabbala or kabala or cabala or ...' (2)
voice
The relationship between the subject and object of a verb, expressed as either active or passive.
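The lookup definition above can be illustrated with a minimal Python sketch. It assumes a hypothetical in-memory model: the `Entry` and `Resource` classes and the `glossary_lookup` and `dictionary_lookup` functions are illustrative names, not part of any EPUB specification. Every headword form and inflection of an entry is indexed under a case-folded key. A glossary lookup searches only the current publication's glossary, while a dictionary lookup searches the user's preferred resources whose source language matches the selected term.

```python
from __future__ import annotations

from dataclasses import dataclass, field


def normalize(term: str) -> str:
    # Assumption: matching ignores case and surrounding whitespace.
    return term.strip().casefold()


@dataclass
class Entry:
    """Hypothetical in-memory model of a glossary or dictionary entry."""
    headword: str
    definition: str
    # Alternate, variant, run-in, and run-on headwords a lookup should also match.
    other_headwords: list[str] = field(default_factory=list)
    # Inflected forms, e.g. "mice" for "mouse" or "ran" for "run".
    inflections: list[str] = field(default_factory=list)


@dataclass
class Resource:
    """Hypothetical dictionary resource or publication glossary."""
    title: str
    source_language: str
    entries: list[Entry]
    _index: dict[str, list[Entry]] = field(default_factory=dict, init=False, repr=False)

    def __post_init__(self) -> None:
        # Index every searchable form of each entry under its normalized key.
        for entry in self.entries:
            for form in (entry.headword, *entry.other_headwords, *entry.inflections):
                bucket = self._index.setdefault(normalize(form), [])
                if not any(e is entry for e in bucket):
                    bucket.append(entry)

    def lookup(self, term: str) -> list[Entry]:
        return list(self._index.get(normalize(term), []))


def glossary_lookup(term: str, glossary: Resource) -> list[Entry]:
    """Glossary lookup: search only the current publication's glossary."""
    return glossary.lookup(term)


def dictionary_lookup(term: str, language: str, preferred: list[Resource]) -> list[Entry]:
    """Dictionary lookup: search the user's preferred resources in preference
    order, skipping any whose source language differs from the term's."""
    results: list[Entry] = []
    for resource in preferred:
        if resource.source_language == language:
            results.extend(resource.lookup(term))
    return results


glossary = Resource("Glossary", "en", [
    Entry("mouse", "A small rodent.", inflections=["mice"]),
    Entry("aestivate", "To spend a hot or dry period in a dormant state.",
          other_headwords=["aestivation"]),
])
print([e.headword for e in glossary_lookup("Mice", glossary)])  # ['mouse']
```

Indexing inflections alongside headwords lets a user who selects "mice" reach the entry for "mouse" without the reading system applying language-specific stemming rules; a real implementation might combine both approaches.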
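The text entry search feature defined above can be sketched in the same spirit, assuming a hypothetical `HeadwordList` class that keeps headwords sorted by a case-folded key. Because all headwords sharing a prefix form a contiguous run in sorted order, a binary search with Python's `bisect` module finds partial matches as the user types, and the same search gives the position at which to place a highlight in a complete, scrollable headword list. The plain code-point ordering used here is an assumption for illustration; real dictionaries typically need language-specific collation.

```python
from __future__ import annotations

import bisect


class HeadwordList:
    """Hypothetical sorted headword list supporting text entry search."""

    def __init__(self, headwords: list[str]) -> None:
        # Store (case-folded key, display form) pairs sorted by key, so all
        # headwords sharing a prefix form one contiguous run. Assumption:
        # code-point order stands in for language-specific collation.
        self._items = sorted((h.casefold(), h) for h in set(headwords))
        self._keys = [key for key, _ in self._items]

    def prefix_matches(self, typed: str, limit: int = 10) -> list[str]:
        """Return up to `limit` headwords that start with the text typed so far."""
        prefix = typed.casefold()
        i = bisect.bisect_left(self._keys, prefix)
        matches: list[str] = []
        while i < len(self._keys) and len(matches) < limit and self._keys[i].startswith(prefix):
            matches.append(self._items[i][1])
            i += 1
        return matches

    def highlight_position(self, typed: str) -> int:
        """Return the index of the first headword sorting at or after the typed
        text, i.e. where to place the highlight in a complete scrollable list."""
        return bisect.bisect_left(self._keys, typed.casefold())


words = HeadwordList(["get", "get out", "get up", "gel", "go", "kabbalah", "kabala"])
print(words.prefix_matches("get"))  # ['get', 'get out', 'get up']
print(words.highlight_position("gez"))  # 4
```

Calling `prefix_matches` after each keystroke corresponds to displaying partial matches as the user types, while `highlight_position` corresponds to positioning a highlight in a full headword list; both are just two of the possible designs the definition mentions.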

References

1. Crystal, David. (1995). The Cambridge Encyclopedia of the English Language (pp. 448-60). Cambridge: Cambridge University Press.

2. Merriam-Webster, Incorporated. (2003). Merriam-Webster's Collegiate Dictionary, Eleventh Edition. Springfield, Massachusetts: Merriam-Webster, Incorporated.