Choosing a format for EPUB

Dear Sirs and Madams,

I have a title on Amazon via CreateSpace but now wish to publish it via epub as well - but which one? It's a textbook and there can be no changes made to the original.

My essential requirements are that the copy appear in the epub format exactly as it does in the print format, especially the page numbers. The reason for this is that the book (320 pp.) is full of interior cross-referencing such that terms are repeated and lessons re-learned if need be. There are also photo insertions and several article insertions which have been placed in (MS Word) boxes.

The document is also full of any combination you can think of for underlining, italics, bold-face, three different fonts and all the various types of bracketing and indentation there are. They all must be preserved.

I've stripped out all the headers and manually inserted page numbers and I have created a .pdf version which contains everything there is in the original .doc document (except for page numbers, which I've manually inserted). It may be the way forward here but how do I secure it?

What I've been doing is waiting for Kindle to come up with an epub version that will replicate everything in the book as in the original; however, to my awareness, that hasn't happened yet.

I'm a writer, not a tech person, so it's hard for me to keep up with the advances made in e-publishing, and accordingly I have shied away from the whole conversation. But if anyone out there can give me a good hard push, I'd appreciate the direction now. There are many students and teachers out there who simply can't pay the freight for two lb. of the hard copy which makes up this book.

The e-pub world looks like the way forward for them and me but I am clueless about how to approach it. Your advice would be much appreciated.

You are using a digital format.
The main question is why you stick to print-book concepts like pages, which have no meaning for digital representations. Here it makes sense to structure by content, not by the size of a piece of paper.
So it is no surprise that you run into trouble if you insist on transferring your printed book unchanged into a digital format.

What you can do is simply scan your book in high resolution. I think there are specific archive formats for such scanned printed books. They are huge, inaccessible and of low quality, but you get a digital variant of your book relatively close to the printed version, with one scan for each page.
It is also possible to use such scans within EPUB if you use a format like PNG or GIF for the scans and embed them in XHTML files, but you need a text alternative for each such image as well. Effectively this means that you have to make your scanned book accessible by converting it to something machine-readable anyway.
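To make the embedding concrete, here is a minimal sketch in Python of one XHTML file wrapping one scanned page. The function name, the image path and the alt text are made up for illustration; a real book would need one such file per page, plus the usual EPUB package files.

```python
from xml.sax.saxutils import escape, quoteattr


def scan_page_xhtml(image_href: str, page_number: int, alt_text: str) -> str:
    """Return one XHTML document that shows a single scanned page.

    image_href and alt_text are hypothetical inputs; the alt text is
    the text alternative that the scan needs to be accessible.
    """
    # quoteattr() escapes the value and adds the surrounding quotes.
    return f"""<?xml version="1.0" encoding="UTF-8"?>
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">
<head><title>Page {page_number}</title></head>
<body>
<div><img src={quoteattr(image_href)} alt={quoteattr(alt_text)}/></div>
</body>
</html>
"""


page = scan_page_xhtml("images/page-012.png", 12, "Page 12: diagram of A & B")
print(page)
```

The alt text still has to be written or extracted by someone, which is the point made above: the scan alone does not save that work.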

Obviously a much better approach is to convert all your content manually into structured XHTML and SVG documents, add text alternatives to the images, and use this structured, accessible collection of documents as the content of an EPUB archive. Simply forget about how your printed book appears; it does not really matter for the digital variant, which has none of the constraints of printed books.

The way to preserve the pages from your printed book when converting to digital is to make a fixed-layout epub archive.
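As a rough sketch of what fixed layout means in practice: in EPUB 3 the package document declares the layout as pre-paginated, and each page's XHTML file states its size in a viewport meta element. The helper names, the 6 x 9 inch trim size and the 150 pixels per inch below are assumptions for illustration, not values from this thread.

```python
def page_size_px(width_in: float, height_in: float, ppi: int) -> tuple[int, int]:
    """Convert a print trim size in inches to pixel dimensions."""
    return round(width_in * ppi), round(height_in * ppi)


def package_layout_meta() -> str:
    """Metadata line for the EPUB 3 package document (fixed layout)."""
    return '<meta property="rendition:layout">pre-paginated</meta>'


def viewport_meta(width_px: int, height_px: int) -> str:
    """Viewport line for the <head> of each fixed-layout XHTML page."""
    return f'<meta name="viewport" content="width={width_px}, height={height_px}"/>'


# Assumed trim size and resolution; substitute the real book's values.
width, height = page_size_px(6, 9, 150)
print(package_layout_meta())
print(viewport_meta(width, height))
```

Each printed page then becomes one XHTML file of that size, so page numbers and cross-references stay where they are in print.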

Repeating that I do have a .pdf version of the book for which pages are linked vertically, and that there are offers out there to accept .pdf linked pages, how do I present the book using these facilities and avoid loss of control of the document?

Thanks for your replies.

The PDF typically comes a little too late in the production process to be a good source for conversion. Even the Word document you mention is not very good, but one can try to export it to a more meaningful format if the word processor cannot generate EPUB output itself.
If not, one either extracts only the text and images and restarts from there, or one tries to export XHTML. Judging by the examples I have seen, though, Word has a lot of trouble exporting structured and meaningful (X)HTML ...

I think calibre can import Word, maybe even PDF, to generate some tag-soup EPUB that can be used as a starting point for something meaningful. But of course, starting from PDF or Word, such programs cannot understand the meaning of the content, so one cannot expect much from the generated result. There is still a lot to do manually.
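For what it is worth, calibre ships a command-line tool, ebook-convert, which takes an input file and an output file. A minimal sketch of calling it from Python follows; the file names are hypothetical, and the conversion only runs if calibre is installed and on the PATH.

```python
import shutil
import subprocess


def build_convert_command(source: str, target: str) -> list[str]:
    """Build the calibre command line: ebook-convert INPUT OUTPUT."""
    return ["ebook-convert", source, target]


def convert(source: str, target: str) -> bool:
    """Run the conversion; return False if calibre is not installed."""
    if shutil.which("ebook-convert") is None:
        return False
    # check=True raises CalledProcessError if calibre reports a failure.
    subprocess.run(build_convert_command(source, target), check=True)
    return True


# Hypothetical file names, shown without running the conversion.
print(build_convert_command("textbook.docx", "textbook.epub"))
```

Calling convert("textbook.docx", "textbook.epub") would produce the starting-point EPUB described above, which then needs manual cleanup.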

In general it is better to start with semantically rich formats and to convert them to poorer formats like PDF only for printing.

Thanks again.

It's all pretty esoteric to me but the consensus I think seems to be that I should find another format and retype all 320 pp. of the book. Other activity has taken precedence in my life, so I suppose I'll just give the thing a rest for now.

One more question - Do I understand correctly that the only printed books currently being converted to digital (e.g. Kindle, Kobo, etc.) are bodies of paragraphs where the elemental conversion is of individual characters?

PDF and PostScript mainly describe how to position glyphs and other graphics on a canvas, without much indication of what such a structure means: a heading, paragraph, chapter, link, etc.
This is similar to SVG, but with a more cryptic syntax. SVG also has the advantage that it is an XML format that is extensible and has accessibility features.

XHTML describes mainly meaning and functionality of text, it is an XML format as well.

The newest format from Amazon can be seen as an obfuscated EPUB, intentionally made incompatible to protect Amazon and to bind people to Amazon, so it is not really relevant for the content itself. But Amazon provides a program to convert simple EPUBs into this obfuscated format automatically.

On the one hand one can distinguish between XML formats (used by EPUB and some others) and older, more complex formats like PDF and PostScript. If one creates such older formats with LaTeX, for example, there are scripts to convert to XML to simplify further processing, but otherwise it is difficult. In any case, such automatic conversion typically loses a lot of semantic information about the content, if it existed at all in the old format.
If one wants a high-quality book, one has some work to do, even with tools for automatic conversion. If not, with some luck the automatic conversion can be sufficient for simple content and low-quality results.

If you need precise control over page layout and glyph positioning for printing, PostScript and PDF are good formats. If that works for you, it is not a big problem, but such formats may cause more accessibility problems than formats like XHTML and SVG. Screen readers today do not handle SVG very well either, so if one has a lot of graphics, both solutions may cause problems in practice. Fixed layout may cause accessibility problems as well; it depends on the audience and on how the fixed layout is realised.

Currently the general approach, if one needs both printed and screen versions of a book, is to start with a semantically rich XML format containing all relevant information.
This is used to create the different output formats with scripts or programs, which take into account, more or less, the specific capabilities and gaps of each output format, including workarounds for features a given output format lacks.
If one starts with a format of poor quality, optimised for one specific output medium, one obviously has problems with conversion, because information necessary or useful for other output media is missing from the source. Another problem is that conversion programs and scripts are typically simple: they do not understand the content and cannot make use of all the capabilities of the desired output format. So one gets poor conversion results for several reasons. If one starts with a semantically rich source, it is obviously simpler to write or use better conversion tools that generate something meaningful automatically.
But using the right tool is essential for the quality of the output.
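A toy illustration of this single-source approach, using only the Python standard library. The element names (book, chapter, title, para) are invented for this sketch and do not belong to any real schema; the same source is rendered once as XHTML for the screen and once as plain text.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Invented, minimal "semantic" source: it says what things are,
# not how they should look on a page.
SOURCE = """<book>
  <chapter id="c1"><title>Verbs</title><para>Words for actions.</para></chapter>
  <chapter id="c2"><title>Nouns</title><para>Names of things.</para></chapter>
</book>"""


def to_xhtml(book: ET.Element) -> str:
    """Render the chapters as XHTML sections for an EPUB."""
    parts = []
    for chapter in book.findall("chapter"):
        chapter_id = chapter.get("id")
        title = escape(chapter.findtext("title", default=""))
        parts.append(f'<section id="{chapter_id}">')
        parts.append(f"<h1>{title}</h1>")
        for para in chapter.findall("para"):
            parts.append(f"<p>{escape(para.text or '')}</p>")
        parts.append("</section>")
    return "\n".join(parts)


def to_plain_text(book: ET.Element) -> str:
    """Render the same chapters as plain text, e.g. for a quick proof."""
    parts = []
    for chapter in book.findall("chapter"):
        title = chapter.findtext("title", default="")
        parts.append(title.upper())
        for para in chapter.findall("para"):
            parts.append(para.text or "")
        parts.append("")
    return "\n".join(parts)


book = ET.fromstring(SOURCE)
print(to_xhtml(book))
print(to_plain_text(book))
```

Because the source records that "Verbs" is a chapter title, each output can present it in its own way; a PDF or a Word export usually no longer carries that information.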

Thanks again, O.H.

I tried some sample conversions of my ms to e-pub formats and the results were actually better than I expected - but there were many imperfections which made the exercise a failure. I'll just wait on technology for a usable e-pub format.

Best idea for me at this stage will be to ship out the document as a .pdf with appropriate security - I'm working on that feature - and let the end recipient decide what gets printed and what doesn't. I emailed the document to myself and printed some sample pages, which included embedded page numbers. The result was spot on.

The web site around this discussion is Have a look when you have a chance.

- Gahans

You can unzip the EPUB archive and fix problems manually with a text editor. This is one of the advantages of the formats used in EPUB; it would be difficult in Word or PDF ...
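A minimal sketch of that round trip in Python, using only the standard zipfile module. The function names are made up for illustration. The one detail to watch when zipping again is that the file named mimetype must be the first entry in the archive and must be stored uncompressed.

```python
import zipfile
from pathlib import Path


def unpack_epub(epub_path, out_dir) -> None:
    """Extract every file of the EPUB archive into out_dir."""
    with zipfile.ZipFile(epub_path) as archive:
        archive.extractall(out_dir)


def repack_epub(src_dir, epub_path) -> None:
    """Zip src_dir back into an EPUB.

    epub_path should lie outside src_dir, otherwise the archive
    being written would be picked up as one of its own files.
    """
    src = Path(src_dir)
    with zipfile.ZipFile(epub_path, "w") as archive:
        # The mimetype entry goes first and is stored, not compressed.
        archive.write(src / "mimetype", "mimetype",
                      compress_type=zipfile.ZIP_STORED)
        for path in sorted(src.rglob("*")):
            rel = path.relative_to(src).as_posix()
            if path.is_file() and rel != "mimetype":
                archive.write(path, rel, compress_type=zipfile.ZIP_DEFLATED)
```

Typical use would be unpack_epub("book.epub", "work"), editing the XHTML files under work in a text editor, then repack_epub("work", "book-fixed.epub"); the file names here are placeholders.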
