sections/parts in separated files

Submitted by Welblaud on November 12, 2014 - 7:25am

I'd like to know whether it is possible to split an XML section for a part across several files.

In my books I used to put every chapter in its own file. Later I decided to add some semantics, so every chapter is now enclosed in <section epub:type="chapter" id="c01"></section> tags. Now to parts: is it “allowed” to do it the way shown below? Validators remain quiet about that…

1st FILE
======
<section epub:type="part" id="p01">
<section epub:type="chapter" id="c01">
...
</section>
<section epub:type="chapter" id="c02">
...
</section>
</section>

2nd FILE
======
<section epub:type="part" id="p01">
<section epub:type="chapter" id="c03">
...
</section>
<section epub:type="chapter" id="c04">
...
</section>
</section>

Submitted by O.H. on November 12, 2014 - 9:52am

There is no technical problem with this, but as far as I understand your intention, this markup does not express what you mean.

(X)HTML markup, which you use, applies to one file only. Here this means that everything between the start and end tag of a section element belongs to that element and nothing else. The id attribute has a meaning only within the current document; it has no relation to other documents or to fragments in other documents that have the same id value. To indicate relations, you can use a link element with a rel attribute in the head, or an a element with a rel attribute. With RDFa attributes and a more advanced vocabulary than (X)HTML has, there is some chance to mark up relations in more detail. In general (X)HTML is not very good at marking up complex meta information about a collection of files. It does not even have many elements with semantic meaning to mark up books properly, which is why there are constructions like the type attribute from the OPS namespace or RDFa that allow the use of other vocabularies.
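As a rough sketch of the link approach, the head of the second file might point at its neighbours like this. The file names are hypothetical, and I would not assume that any reading system acts on these links.

```xml
<head>
  <title>Chapters 3 and 4</title>
  <!-- Hypothetical file names; prev and next are standard HTML link types -->
  <link rel="prev" href="file01.xhtml"/>
  <link rel="next" href="file03.xhtml"/>
</head>
```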
EPUB provides information about the book structure in the OPF file or in navigation documents.
The OPF file is more about the complete book, but in the navigation documents you can provide some structured information about all the documents the book consists of.
For the OPF file it might be useful to study the collection element, introduced in version 3.0.1.
I think this comes closest to your intentions.
Within navigation, both the NCX file and the (X)HTML-based navigation file available for EPUB 3 can provide some information about the structure of the book. However, due to the choice and restriction of elements in EPUB 3, the markup for more complex book structures in the (X)HTML-based navigation file is suboptimal and of low semantic relevance - for example, something like headings marked up with the span element ;o)
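A minimal sketch of what such a collection might look like in the OPF file, assuming a custom role. The role IRI, the title and the file names are all made up for illustration, and reading systems can be expected to ignore a role they do not know.

```xml
<!-- Inside the package element of the OPF file, after the spine (EPUB 3.0.1) -->
<collection role="http://example.org/roles/part">
  <metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
    <dc:title>Part I</dc:title>
  </metadata>
  <!-- Hypothetical files that together make up the part -->
  <link href="file01.xhtml"/>
  <link href="file02.xhtml"/>
</collection>
```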

Submitted by Welblaud on November 13, 2014 - 1:02am

Thanks a lot for the clarification! The collection element seems really convenient here. I meant the whole semantics thing is “ipso facto” for some future use. Theoretically, when parsing such a collection of sources, a parser would group the subchapters under the same part (like in a hash).

I think I should stick more to real basics ;o)

Submitted by Welblaud on May 4, 2015 - 1:05am

Now, after another bunch of books, I have realized this is a logical problem. I understand that the markup only counts within a single file, but what about bigger divisions above chapters? If every chapter has its own file, every file could be marked as 'bodymatter'. And this means the logic I was thinking about last time should still work.

FILE 01
<section epub:type="part" id="p01">
<section epub:type="chapter" id="c01">
</section>
</section>

FILE 02
<section epub:type="part" id="p01">
<section epub:type="chapter" id="c02">
</section>
</section>

Right?

Submitted by matt.garrish on May 4, 2015 - 6:01am

It's something of an intractable problem given the way that content has to be artificially chunked for distribution.

The problem with your approach of adding artificial wrappers to every file is that it monkeys with the outline of each document, as you're effectively stating that each file has a new part in it. Having matching IDs or whatnot is meaningless to the various kinds of user agents that will process the markup.

Following the HTML5 outline algorithm, for example, each subsequent use of a wrapper section for the part results in an "untitled section" starting each file. From an accessibility perspective, adding such wrappers can lead to the reader being told for each file that they're entering a new section, but one they won't realize has no purpose. Reannouncing the part heading over and over is no more helpful.
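To make that concrete, here is a sketch of one such file with the outline it would roughly produce. The file and its title are hypothetical, and the outline in the trailing comment is my reading of the algorithm, not the output of a tool.

```xml
<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:epub="http://www.idpf.org/2007/ops">
  <head><title>Chapter 3</title></head>
  <body>
    <!-- The part wrapper carries no heading of its own in this file -->
    <section epub:type="part">
      <section epub:type="chapter" id="c03">
        <h1>Chapter 3</h1>
      </section>
    </section>
  </body>
</html>
<!--
  Outline, roughly:
  1. (untitled body)
     1. Untitled section   (the part wrapper)
        1. Chapter 3
-->
```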

How to retain the original structure of the publication is a question that comes up periodically in the working group. One answer has been to reconfigure the spine to allow nesting so that the logical structure is retained there, but that would be hugely problematic from a backwards compatibility perspective. A more promising suggestion has been to look at adding semantics to the table of contents, since it already reflects the document hierarchy.
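For reference, a table of contents that reflects the hierarchy might look like the sketch below. The file names and labels are hypothetical; the nesting is the only place where the part/chapter relationship is recorded.

```xml
<nav epub:type="toc">
  <h1>Contents</h1>
  <ol>
    <li>
      <a href="part01.xhtml">Part I</a>
      <!-- Chapters nested under the part they belong to -->
      <ol>
        <li><a href="chapter01.xhtml">Chapter 1</a></li>
        <li><a href="chapter02.xhtml">Chapter 2</a></li>
      </ol>
    </li>
  </ol>
</nav>
```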

There are two problems that need to be solved before that could be practical, however. One is that epub:type identifies the nature of the content it is attached to, so it really should be a @rel attribute on the a elements. But that leads to having to expand the html5 link relationships list.

The other is that you still can't fully express front/body/back matter without adding a label-less list, which is a violation of the production rules. All you can do is note the positional start points for each.
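Noting the start points could look something like this landmarks sketch. The file names and labels are hypothetical; each entry only says where a division begins, not which files belong to it.

```xml
<nav epub:type="landmarks" hidden="hidden">
  <h2>Landmarks</h2>
  <ol>
    <!-- Each link marks the first file of a division -->
    <li><a epub:type="frontmatter" href="titlepage.xhtml">Front matter</a></li>
    <li><a epub:type="bodymatter" href="part01.xhtml">Start of content</a></li>
    <li><a epub:type="backmatter" href="bibliography.xhtml">Back matter</a></li>
  </ol>
</nav>
```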

Given that no one has had a pressing reason to solve the problem -- the need is largely predicated on being able to recombine an epub into its monolithic form, since chunking remains a necessary evil -- it's an issue that keeps getting kicked down the line.

Submitted by Welblaud on May 4, 2015 - 11:43am

Thank you for the explanation. Simply put, does that mean that marking several files as 'bodymatter', each including one or more chapters, is simply nonsense from this point of view? :o) It is the same approach as the one with nested sections.

Frankly, this is really sad. I would expect semantics to be one of the most important (and working) things for e-books. However, it seems e-books are still a matter of not so finely tuned CSS styling. The interesting point is that we are still pioneers of a sort, always trying to figure out what to use and what 'could' be reliable.

Submitted by O.H. on May 5, 2015 - 4:19am

In XHTML you can use RDFa for semantics, and within EPUB the EPUB/OPS:role extension as well.
I do this especially to indicate poetry and other structures not available in XHTML. EPUB has some vocabulary to indicate advanced semantics for books within XHTML documents. For items not covered by this EPUB vocabulary I typically reference items from LML (literature markup language).
With RDFa one builds complete RDF triples, which can also carry semantic information about other documents. One can therefore use it within the XHTML navigation document or other content documents as additional semantic information about the book structure.

XHTML (including HTML5) is not intended for books and is not good at indicating the semantic meaning of content. A few new elements in HTML5 do not change much here, but the addition of RDFa does, because it makes it possible to use other vocabularies. Of course, usual viewers will not care about it, but specific viewers interested in semantics can. The audience can also unzip the EPUB archive and transform the content into a better format, taking the semantic information into account and exposing it somehow.
If we add semantic information now, people at least have the option to take it into account.
Fortunately this is already possible: one can put more than tag soup into EPUB ;o)
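A small sketch of such a triple in a chapter file, assuming the Dublin Core terms vocabulary and hypothetical file names. It states that the current document is part of part01.xhtml; whether anything consumes the statement is a separate question.

```xml
<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:epub="http://www.idpf.org/2007/ops"
      prefix="dcterms: http://purl.org/dc/terms/">
  <head><title>Chapter 3</title></head>
  <!-- about="" makes this document the subject of the statements inside -->
  <body about="">
    <section epub:type="chapter" id="c03">
      <h1>Chapter 3</h1>
      <!-- Triple: (this file) dcterms:isPartOf (part01.xhtml) -->
      <p>From <a property="dcterms:isPartOf" href="part01.xhtml">Part I</a>.</p>
    </section>
  </body>
</html>
```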

Submitted by Welblaud on May 5, 2015 - 10:43am

Thank you for the direction! I will study that thoroughly!

However, it probably does not solve the problem of meta-sections playing the role of an "umbrella" for a couple of sub-sections in separate files. I would expect any smarter parsing system would see:

file 1
backmatter
resume

file 2
backmatter
bibliography

file 3
backmatter
index

... and figure out ah-ha! Here we have the backmatter part containing three separated things. Very common, very clean.

Submitted by matt.garrish on May 5, 2015 - 11:13am

That's the intractable part.

You must reduce your file size by breaking out the subsections, yet you want to retain information about the logical structure...

As I mentioned, the answer seems to lie in an external representation of the structure for now, but none are perfect.

I was looking at the EDUPUB spec again this morning, and the idea of using the table of contents is formalized there, but it still relies on the problematic epub:type attribute. I opened an issue to have them reconsider, since the table of contents can appear in the spine, in which case it suggests all the structures being pointed to are also present in the file. Anyway, that's a boring point of arcana that has to be resolved.

And I'm not suggesting there's no value in applying the semantics, but when it comes to groupings you have to think practically within the confines.

What most people do is put the part heading in a separate file from its chapters, so you'd get this:
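A minimal sketch of that split, with hypothetical file names and headings. The heading levels and the epub:type on the heading-only part file are assumptions on my side.

```xml
<!-- part01.xhtml: only the part heading -->
<section epub:type="part" id="p01">
  <h1>Part I</h1>
</section>

<!-- chapter01.xhtml: the chapter on its own, with no part wrapper -->
<section epub:type="chapter" id="c01">
  <h2>Chapter 1</h2>
  ...
</section>
```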

While suboptimal for processing by machine, the user doesn't lose any information. Non-visual readers will still move by heading, and are aware of where they are.

As for front/body/back matter, there's no consensus I've found. Some people like to attach the appropriate one to the body of every file as a coupling to the more specific semantics. Others use them as indicators of the transition points, as I mentioned above. I tend to prefer the latter approach, as there's no value in overwhelming files with semantics.
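The first approach could be sketched as below, with hypothetical file names. The second approach would leave the body untyped and mark only the file where the back matter begins.

```xml
<!-- bibliography.xhtml (hypothetical) -->
<body epub:type="backmatter">
  <section epub:type="bibliography">
    <h1>Bibliography</h1>
    ...
  </section>
</body>

<!-- index.xhtml (hypothetical) -->
<body epub:type="backmatter">
  <section epub:type="index">
    <h1>Index</h1>
    ...
  </section>
</body>
```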

But, each time you step back to a bigger grouping element -- whether part, volume, *matter, or what -- you face this same problem again.
