Making an ePub book from LaTeX

Ian Hutchinson

Contents

1  What is ePub?
2  Mathematics
3  Translation to XHTML
4  Constructing the ePub from XHTML


TeX and LaTeX are well suited to producing electronically publishable documents. What is likely to be a continuing need is to translate LaTeX documents into standard electronic book format, notably ePub, which is the format adopted by most electronic reader publishers except the biggest one (you know who).

1  What is ePub?

You don't really want to know what the ePub standard is. And you are not going to find out much here. In short, though, an ePub book is a XHTML1 version of the book packaged together with its figures and other resources into a zip file. It has a couple of files of metadata. One describes the contents of the zip file and the other is the table of contents of the book, giving navigation links to chapters and sections.
Converting a LaTeX book to ePub is basically a two step process: (1) translate it to XHTML; (2) package up the XHTML into an ePub file.
If you are to be satisfied with your ePub version, you need to realize the difference between logical mark-up and page layout. I suggest you consult my brief discussion Should I translate to HTML or not at http://hutchinson.belmont.ma.us/tth/shouldi.html. The ePub format is XHTML; so you don't have a choice.

2  Mathematics

Mathematics is problematic in ePub. The natural solution is to use MathML. Unfortunately that is not yet part of the official ePub format. Stay tuned.

3  Translation to XHTML

There are several quite capable LaTeX to HTML translators. I wrote TtH so I recommend it, and I'm interested to hear about ways TtH output can be optimized for ePub. Because TtH uses HTML to represent equations, many simple equations will translate correctly. But complex equations probably won't, and that can be fixed only when MathML is adopted by ePub. You can get TtH free from http://hutchinson.belmont.ma.us/tth/
Once you have TtH installed, translation is as simple as issuing the command:
tth -w2 -e2 mybook.tex
This will create a file mybook.html in the directory you are working in. The switch -w2 tells TtH to use XHTML output, and -e2 tells it to include figures inline in the text. That's it. You are done with step 1.
Generally TtH will emit a variety of informational and warning messages for unusual LaTeX constructs. As long as TtH ends by saying "Number of lines processed approximately ..." it has probably translated the whole of your document and you should proceed. If not, or if there's something unsatisfactory about the translation, you need to dig into your TeX code and read the TtH manual.

4  Constructing the ePub from XHTML

There are a couple of open source applications that can construct the ePub zip file for you, and construct the required metadata. One is called Calibre, but the one I like best is Sigil. It can be a little tricky to get going on linux operating systems that are older than a year or so, because it depends upon recent Qt libraries, but there's a way to get a newer library and make it work. Windows you ask? Sorry, I don't do windows (but someone else probably can give advice).
To create the ePub file, just open Sigil, and do File->Open and choose your mybook.html. It will suck in your XHTML file, and also all the figures that are referred to in it, and construct the complete ePub format file. You can then save it and you would be done.
Actually you are not quite done because you have to generate a table of contents and fix some metadata before this is really a valid ePub file. (And by the way publishers and booksellers are very picky about ePub books having to validate against the standard, before they accept them.) There are many more details about getting HTML into Sigil at http://code.google.com/p/sigil/wiki/BasicTutorial, but here's a brief summary of essentials.
Table of Contents.   Your LaTeX file contains an automatic table of contents, right? If so and you made it in the standard LaTeX way, TtH will have translated it and put it in your XHTML file. But still the ePub navigation TOC must be created. So first, in Sigil click on Generate TOC from headings. It will give you the chance to include or not various headings. Go ahead.
ePub Validation   Now do Tools->Validate ePub. Unfortunately, you'll get two errors: "The < language > element is missing" and "The < title > element is missing". You can enter the title and language metadata can be by using Tools->Meta data... This pops up an entry form where you can type the title and author in. Then you should be fine.
Those errors refer to the file content.opf which Sigil generated. If you prefer you can edit it directly by hand. Double click on that file in the Sigil column that shows the content of your archive. It will protest that this is for experts only. Be brave and say ok. You'll then see a mess of markup, and near the top will be something like
<dc:identifier id="BookId" 
opf:scheme="UUID">urn:uuid:bb3f2792-ead3-42bb-80fd-727934819cec
</dc:identifier>

You need to enter either immediately before or immediately afterward two new markup elements as follows:
<dc:title>My Book's Title</dc:title>
<dc:language>en</dc:language>

Don't do anything else in the content.opf file unless you really are an expert.
Then validate again; you should see the report: "No problems found". You can save the mybook.epub file and you are done.
It is possible that on validation you will discover XHTML errors that arise from unusual LaTeX. If so, then you might have to dig back into your LaTeX source and find out why. Actually Sigil usually does quite a decent job of cleaning up XHTML code so that it satisfies the tight 1.1 standard.
There are other requirements (beyond the ePub standard) for title page and other stuff that publishers usually demand. They require you to do further editing. But that's between you and Sigil (or Calibre).

Footnotes:

1Strictly XHTML 1.1, which is a very picky form of the HTML language of the world wide web.


File translated from TEX by TTH, version 4.03.
On 20 Feb 2012, 18:43.