XML Schema definition for opf content

Hi,

Where can I find a valid XML Schema definition (.xsd) for EPUB OPF content?
I found the following file: http://code.google.com/p/epubcs/source/browse/html2epub/Content.xsd?spec... However, when I try to generate a C# class from it, I get validation warnings:

E:\Developer\ePubAnalyzer\EpubTools\xsd>xsd content.xsd /classes
Microsoft (R) Xml Schemas/DataTypes support utility
[Microsoft (R) .NET Framework, Version 4.0.30319.17929]
Copyright (C) Microsoft Corporation. All rights reserved.
Schema validation warning: The 'http://purl.org/dc/elements/1.1/:description' element is not declared. Line 19, position 18.
Schema validation warning: The 'http://purl.org/dc/elements/1.1/:language' element is not declared. Line 20, position 18.
Schema validation warning: The 'http://purl.org/dc/elements/1.1/:creator' element is not declared. Line 21, position 18.
Schema validation warning: The 'http://purl.org/dc/elements/1.1/:title' element is not declared. Line 28, position 18.
Schema validation warning: The 'http://purl.org/dc/elements/1.1/:date' element is not declared. Line 29, position 18.
Schema validation warning: The 'http://purl.org/dc/elements/1.1/:contributor' element is not declared. Line 30, position 18.
Schema validation warning: The 'http://purl.org/dc/elements/1.1/:identifier' element is not declared. Line 31, position 18.
Schema validation warning: The 'http://purl.org/dc/elements/1.1/:subject' element is not declared. Line 32, position 18.

Warning: Schema could not be validated. Class generation may fail or may produce incorrect results.

Thanks

You need to account for this import statement when you call the program:

<xs:import namespace="http://purl.org/dc/elements/1.1/" schemaLocation="references/Elements.xsd" />

Untested, but I believe you just need to modify your command line to:

> xsd content.xsd references/Elements.xsd /classes

It should then compile, as I don't get any errors viewing the schema in an xml editor like Oxygen.

You won't get the same level of validation as the official RelaxNG+Schematron schemas, but it should work for basic structural testing. IDPF doesn't maintain XSD versions of the EPUB schemas.

Thanks, it works. And what about nav? Is there an XSD I could use to load an EPUB 3.0 nav file?

Sorry, if such a thing exists already I don't know where you'd find it. I've only ever used the RelaxNG and Schematron files.

You could possibly use jing and trang to convert from RelaxNG to XSD.

The rough instructions are:

1) convert the rnc to rng using trang:

java -jar trang.jar epub-nav-30.rnc epub-nav-30.rng

2) save the simplified rng generated by jing to avoid external ref errors: 

java -jar jing.jar -s epub-nav-30.rng > nav.rng

3) use trang again to convert to an xsd:

java -jar trang.jar nav.rng nav.xsd

But on a quick try, I didn't get a working XSD. The HTML schemas that underpin the navigation document result in numerous UPA (Unique Particle Attribution) errors on conversion, and short of compromising your validation by removing conflicts until you get a working schema, I don't know that you'll be able to work around this problem.

OK, thanks for trying. Wouldn't it be possible to use a mixed approach? I could programmatically extract the interesting part and do whatever is needed to get the following XML nodes:

<ol xmlns="http://www.w3.org/1999/xhtml">
    <li>
        <a href="heftywater.xhtml#title">Hefty Water</a>
        <ol>
            <li>
                <a href="heftywater.xhtml#switch">The Switch</a>
            </li>
            <li>
                <a href="heftywater.xhtml#source">The Source</a>
            </li>
            <li>
                <a href="heftywater.xhtml#ruby">Hefty Ruby Water</a>
            </li>
        </ol>
    </li>
</ol>

Then maybe I could use an XSD to parse those nodes. Do you think it would be possible to create an XSD that can read that kind of XML structure?

Sure, and if you tweak at the schema you can reduce it to something workable. It's just that a simple programmatic translation from one language to the other is problematic.

If you're not worried about MathML and SVG appearing in your table of contents entries, you can make a working XSD without too much pain. I've put a quick translation at http://matt.garrish.ca/epub3/guidelines/res/nav.zip that you're free to take and play with. It only includes the minimal set of inline elements that you might expect in a toc entry, so you'll also find things like the form elements missing.

It does validate against a number of nav documents from the epub samples project, but it's also a hastily chopped up version of a machine translation so you may still find all kinds of problems in it that I haven't. Caveat emptor, in other words... :)

And an issue I noticed just before posting is that it doesn't allow a section to wrap the content, but I'm not sure if that's valid to the specification or not. It should be allowed, and is valid to the RelaxNG schemas, since then you can include a heading, but the specification definition does not mention one being allowed in detailing all the elements that are. Anyway, there's a second modified version if you need to allow sections here: http://matt.garrish.ca/epub3/guidelines/res/nav-section.zip

I'm going to make a note that the definition should be made clearer, too.

Actually, I'm probably over-prescribing the nav document as a set of 1..n nav elements. I shouldn't write while I'm still asleep. Beyond the restrictions on the nav elements, all that is required is the presence of a toc nav element and to present any embedded nav elements to the reader. There isn't any restriction on what else you could include in it, and it could include other information if also being used in the spine.

The schemas above assume only the presence of 1..n nav elements, which is most likely all you're going to find. If you are expecting really exotic documents with other prose and content, this reduced schema again won't work. You would need to extract the navs and independently verify them, as you noted, reduce the content document before verifying, or keep tweaking at the content model until you get what you need.
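The "extract the navs and handle them on their own" option can be sketched without any schema at all. The following is only an illustration in Python using the standard library; the function names and the dictionary layout are made up for the example, and the same walk should translate to C# easily. It assumes the toc is a nav element carrying epub:type="toc" with a nested ol/li/a structure like the one quoted earlier in the thread.

```python
import xml.etree.ElementTree as ET

XHTML = "http://www.w3.org/1999/xhtml"
EPUB = "http://www.idpf.org/2007/ops"  # namespace of the epub:type attribute


def parse_ol(ol):
    """Turn an XHTML <ol> into a list of {'title', 'href', 'children'} dicts."""
    entries = []
    for li in ol.findall(f"{{{XHTML}}}li"):
        a = li.find(f"{{{XHTML}}}a")          # link for this entry, if any
        child_ol = li.find(f"{{{XHTML}}}ol")  # nested list, if any
        entries.append({
            "title": "".join(a.itertext()).strip() if a is not None else None,
            "href": a.get("href") if a is not None else None,
            "children": parse_ol(child_ol) if child_ol is not None else [],
        })
    return entries


def extract_toc(xml_text):
    """Find the first nav whose epub:type contains 'toc' and parse its list.

    Everything else in the document (other prose, other navs) is ignored.
    """
    root = ET.fromstring(xml_text)
    for nav in root.iter(f"{{{XHTML}}}nav"):
        types = (nav.get(f"{{{EPUB}}}type") or "").split()
        if "toc" in types:
            ol = nav.find(f"{{{XHTML}}}ol")
            return parse_ol(ol) if ol is not None else []
    return []


# Hypothetical sample document, trimmed from the list quoted in this thread.
SAMPLE = """<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:epub="http://www.idpf.org/2007/ops">
<body>
<p>Other prose that a reduced schema would reject.</p>
<nav epub:type="toc">
<ol>
<li><a href="heftywater.xhtml#title">Hefty Water</a>
<ol>
<li><a href="heftywater.xhtml#switch">The Switch</a></li>
</ol>
</li>
</ol>
</nav>
</body>
</html>"""

if __name__ == "__main__":
    for entry in extract_toc(SAMPLE):
        print(entry["title"], entry["href"], len(entry["children"]))
```

This does no validation, so it only suits the case where you want to load the entries rather than check the document against the specification.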

Thanks again, I am so lucky to have found you and that you are an expert in this domain. I have now generated some C# classes from your XSD, and next I need to test them.

Actually, I think for my use I will remove all the xs:attribute declarations under nav and title:
<xs:attribute name="style" type="xs:string"/>
<xs:attribute name="onabort" type="xs:string"/>
<xs:attribute name="onblur" type="xs:string"/>
<xs:attribute name="oncanplay" type="xs:string"/>
<xs:attribute name="oncanplaythrough" type="xs:string"/>

because I don't want to validate a nav but just import the XML into a C# data structure that is a common abstraction for NCX and nav. So basically I will only access body, title, and nav, and I don't care about all the possible attributes like onBlabla or aria-xyz.

I'm just happy if I can come out from an excursion into XSD unbloodied and unbeaten... :)

Best of luck with it.

I found a small issue with the XSD used to parse the package. In the definition I have:

<xs:element name="spine">
    <xs:complexType>
        <xs:sequence>
            <xs:element maxOccurs="unbounded" name="itemref">
                <xs:complexType>
                    <xs:attribute name="idref" type="xs:string" use="required" />
                    <xs:attribute name="linear" type="xs:boolean" />
                    <xs:attribute name="id" type="xs:string" />
                    <xs:attribute name="properties" type="xs:string" />
                </xs:complexType>
            </xs:element>
        </xs:sequence>
        <xs:attribute name="toc" type="xs:string" use="required" />
    </xs:complexType>
</xs:element>

and one of the EPUBs I received contains the following line:

<itemref idref="cover" linear="no"/>

So I get an exception because linear is declared with the boolean type (true, false), but here it uses yes/no. How can I declare a type in XSD that accepts all 4 possible values (true, false, yes, no)?

Actually, I made a mistake: the EPUB spec says that the valid values for linear are yes/no. So how can I fix the following package.xsd and the references/elements.xsd it imports?

<?xml version="1.0" encoding="utf-8"?>
<xs:schema xmlns:dcterms="http://purl.org/dc/terms/"
           xmlns:calibre="http://calibre.kovidgoyal.net/2009/metadata"
           xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
           xmlns:opf="http://www.idpf.org/2007/opf"
           xmlns:dc="http://purl.org/dc/elements/1.1/"
           attributeFormDefault="unqualified"
           elementFormDefault="qualified"
           targetNamespace="http://www.idpf.org/2007/opf"
           xmlns:xs="http://www.w3.org/2001/XMLSchema">
    <xs:import namespace="http://purl.org/dc/elements/1.1/" schemaLocation="references/elements.xsd" />
    <xs:element name="package">
        <xs:complexType>
            <xs:sequence>
                <xs:element name="metadata">
                    <xs:complexType>
                        <xs:sequence>
                            <xs:choice maxOccurs="unbounded">
                                <xs:element ref="dc:description" />
                                <xs:element ref="dc:language" />
                                <xs:element ref="dc:creator" />
                                <xs:element name="meta">
                                    <xs:complexType>
                                        <xs:attribute name="name" type="xs:string" use="required" />
                                        <xs:attribute name="content" type="xs:string" use="required" />
                                    </xs:complexType>
                                </xs:element>
                                <xs:element ref="dc:title" />
                                <xs:element ref="dc:date" />
                                <xs:element ref="dc:contributor" />
                                <xs:element ref="dc:identifier" />
                                <xs:element ref="dc:subject" />
                            </xs:choice>
                        </xs:sequence>
                    </xs:complexType>
                </xs:element>
                <xs:element name="manifest">
                    <xs:complexType>
                        <xs:sequence>
                            <xs:element maxOccurs="unbounded" name="item">
                                <xs:complexType>
                                    <xs:attribute name="href" type="xs:string" use="required" />
                                    <xs:attribute name="id" type="xs:string" use="required" />
                                    <xs:attribute name="media-type" type="xs:string" use="required" />
                                    <xs:attribute name="properties" type="xs:string" />
                                </xs:complexType>
                            </xs:element>
                        </xs:sequence>
                    </xs:complexType>
                </xs:element>
                <xs:element name="spine">
                    <xs:complexType>
                        <xs:sequence>
                            <xs:element maxOccurs="unbounded" name="itemref">
                                <xs:complexType>
                                    <xs:attribute name="idref" type="xs:string" use="required" />
                                    <xs:attribute name="linear" type="xs:boolean" />
                                    <xs:attribute name="id" type="xs:string" />
                                    <xs:attribute name="properties" type="xs:string" />
                                </xs:complexType>
                            </xs:element>
                        </xs:sequence>
                        <xs:attribute name="toc" type="xs:string" use="required" />
                    </xs:complexType>
                </xs:element>
                <xs:element name="guide">
                    <xs:complexType>
                        <xs:sequence>
                            <xs:element name="reference">
                                <xs:complexType>
                                    <xs:attribute name="href" type="xs:string" use="required" />
                                    <xs:attribute name="type" type="xs:string" use="required" />
                                    <xs:attribute name="title" type="xs:string" use="required" />
                                </xs:complexType>
                            </xs:element>
                        </xs:sequence>
                    </xs:complexType>
                </xs:element>
            </xs:sequence>
            <xs:attribute name="version" type="xs:decimal" use="required" />
            <xs:attribute name="unique-identifier" type="xs:string" use="required" />
        </xs:complexType>
    </xs:element>
    <xs:attribute name="file-as" type="xs:string" />
    <xs:attribute name="role" type="xs:string" />
</xs:schema>

references/elements.xsd:

<?xml version="1.0" encoding="utf-8"?>
<xs:schema xmlns:tns="http://purl.org/dc/elements/1.1/" attributeFormDefault="unqualified" elementFormDefault="qualified" targetNamespace="http://purl.org/dc/elements/1.1/" xmlns:xs="http://www.w3.org/2001/XMLSchema">
    <xs:import namespace="http://www.idpf.org/2007/opf" />
    <xs:element name="description" type="xs:string" />
    <xs:element name="language" type="xs:string" />
    <xs:element name="creator">
        <xs:complexType>
            <xs:simpleContent>
                <xs:extension base="xs:string">
                    <xs:attribute xmlns:q1="http://www.idpf.org/2007/opf" ref="q1:file-as" use="required" />
                    <xs:attribute xmlns:q2="http://www.idpf.org/2007/opf" ref="q2:role" use="required" />
                </xs:extension>
            </xs:simpleContent>
        </xs:complexType>
    </xs:element>
    <xs:element name="title" type="xs:string" />
    <xs:element name="date" type="xs:dateTime" />
    <xs:element name="contributor">
        <xs:complexType>
            <xs:simpleContent>
                <xs:extension base="xs:string">
                    <xs:attribute xmlns:q3="http://www.idpf.org/2007/opf" ref="q3:role" use="required" />
                </xs:extension>
            </xs:simpleContent>
        </xs:complexType>
    </xs:element>
    <xs:element name="identifier">
        <xs:complexType>
            <xs:simpleContent>
                <xs:extension base="xs:string">
                    <xs:attribute name="id" type="xs:string" use="required" />
                </xs:extension>
            </xs:simpleContent>
        </xs:complexType>
    </xs:element>
    <xs:element name="subject" type="xs:string" />
</xs:schema>

After some googling I found the following definition, and I suppose the solution is to use something like it:

http://www.oxygenxml.com/samples/xml-schema-documentation/MusicXML-Schem...

What do you think?

Right, the xs:boolean type only allows 1/0 and true/false. You need to change this definition:

<xs:attribute name="linear" type="xs:boolean" />

to:

<xs:attribute name="linear">
    <xs:simpleType>
        <xs:restriction base="xs:token">
            <xs:enumeration value="yes"/>
            <xs:enumeration value="no"/>
        </xs:restriction>
    </xs:simpleType>
</xs:attribute>

What this definition does is tokenize the value of the attribute (collapsing and trimming whitespace) and ensure that the resulting value is one of the two enumerated values.
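If you end up reading the attribute by hand instead of through generated classes, the same check is easy to mimic. This is only a rough sketch in Python of what the xs:token restriction amounts to; the function names are invented for the example, and the fallback to "yes" for a missing attribute reflects the default the EPUB specification gives for linear.

```python
import re

ALLOWED_LINEAR = {"yes", "no"}


def normalize_token(value):
    """Mimic xs:token whitespace handling.

    Tabs, line feeds and carriage returns are treated as spaces, runs of
    spaces collapse to one, and leading/trailing spaces are removed.
    """
    return re.sub(r"[ \t\n\r]+", " ", value).strip(" ")


def parse_linear(value):
    """Map an itemref linear attribute to a bool.

    None (attribute absent) is treated as "yes". Anything other than
    "yes" or "no" after normalization raises ValueError, which is what
    the enumeration in the schema would reject, including "true"/"false".
    """
    if value is None:
        return True
    token = normalize_token(value)
    if token not in ALLOWED_LINEAR:
        raise ValueError(f"invalid linear value: {value!r}")
    return token == "yes"


if __name__ == "__main__":
    print(parse_linear("no"))
    print(parse_linear(" yes\n"))
```

Note that the enumeration is case-sensitive, so "No" or "YES" would be rejected by both the schema and this sketch.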

HTML is really hard... :(
