Should I translate TeX to HTML or not?

Ian Hutchinson

1  Page Representation Formats

TeX and LaTeX are well suited to producing electronically publishable documents. However, it is important to recognize the difference between page layout and functional mark-up. TeX is capable of extremely detailed page layout, specifying precisely where on the page each symbol goes. HTML is not, because HTML is a functional mark-up language (specifying primarily document structure), not a page layout language. The exact rendering of HTML is not specified by the published document but is, to some degree, left to the discretion of the browser. This is a deliberate choice. It recognizes that the size, resolution, and shape of the window in which a document is viewed will vary from reader to reader, and that layout, font size, and other choices affecting readability should therefore be at least partly up to the reader, not the author. The result is that well-designed HTML is excellent for browsing, but clumsy for printing.
Most authors are not used to such flexibility. They are used to producing static documents whose appearance is the same for everyone because, for example, they are copies of a piece of paper. If you require your readers to see an exact replica of what your document looks like to you, then you cannot use HTML to transmit it, no matter what format it starts in. That is true not just for translated TeX but for any authoring tool from which HTML is produced. The only way to produce documents whose appearance is completely controlled is to represent them in a page layout language such as PDF or PostScript or, for that matter, DVI. These formats are not as good as HTML for browsing, despite the substantial hyperlinking ability of PDF, but they are better for transmitting a printable copy. Parenthetically, word processor formats are less satisfactory for transmitting printable copy, hopeless for browsing, and unreliable for archiving because of the instability of the format.

2  Mathematics

TeX's excellent mathematical capabilities are absent from HTML and browsers. There are therefore two main choices for representing equations in HTML: using bit-mapped images, or using browser fonts and tables for layout. The advantage of the bit-mapped approach is that it relies only on capabilities that are essentially universal among graphical browsers. Its disadvantage is that it requires a separate graphics file for every equation, which becomes cumbersome to manage and slow to download. In addition, the alignment and sizing of the graphical equations relative to the surrounding text are uncertain. The advantages of the font and table approach used by TtH are that a single HTML document contains all the information, giving portability and speed of download. The disadvantages are that it depends on the symbol font being accessible to the browser, and that the equation layout is not as compact or elegant as TeX's.
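To make the two choices concrete, here is a minimal Python sketch that emits each kind of HTML for the fraction (a+b)/c. It is only an illustration: the function names, the file name eq1.png, and the exact table mark-up are assumptions made for this example, and it is not the output that TtH or any image-based translator actually produces.

```python
from html import escape


def equation_as_image(image_url: str, tex_source: str) -> str:
    """Bit-mapped approach: the equation lives in a separate image file.

    The TeX source is kept as alt text so that non-graphical readers
    still see something meaningful.
    """
    src = escape(image_url, quote=True)
    alt = escape(tex_source, quote=True)
    return f'<img src="{src}" alt="{alt}" align="middle">'


def fraction_as_table(numerator: str, denominator: str) -> str:
    """Font-and-table approach: the fraction is laid out in the page itself.

    Three table rows hold the numerator, a horizontal rule, and the
    denominator, so no external file is needed.
    """
    rows = [
        f'<tr><td align="center">{escape(numerator)}</td></tr>',
        "<tr><td><hr></td></tr>",
        f'<tr><td align="center">{escape(denominator)}</td></tr>',
    ]
    return (
        '<table border="0" cellspacing="0" cellpadding="0">'
        + "".join(rows)
        + "</table>"
    )


if __name__ == "__main__":
    # eq1.png is a hypothetical file that would have to be generated separately.
    print(equation_as_image("eq1.png", r"\frac{a+b}{c}"))
    print(fraction_as_table("a + b", "c"))
```

The first function yields a short tag but implies one extra file per equation. The second keeps everything in a single document, at the cost of leaving the final spacing and fonts to the browser.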
The MathML standard has been developed to represent mathematics in electronic documents. MathML is not HTML. Popular browsers do not currently (Mar 2003) render MathML without additional plug-in software or fonts. In any case, the standard specifies that MathML is supported within XML, not, strictly speaking, HTML. What is holding up wider adoption of MathML is not the production of MathML, since translators such as TtM are fully up to that job, but the weakness of support in leading browsers. Even if and when MathML is routinely supported by browsers out of the box, a document's appearance will still be in the hands of the browser, not the author.
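For comparison, the same fraction (a+b)/c can be written in MathML presentation mark-up. The Python sketch below builds it with the standard library's ElementTree. The function name is an assumption made for this example, and the result is not claimed to match what TtM emits. Note that the mark-up says only that there is a fraction with a given numerator and denominator; how it is drawn remains up to whatever renders it.

```python
import xml.etree.ElementTree as ET

# The XML namespace defined for MathML by the W3C.
MATHML_NS = "http://www.w3.org/1998/Math/MathML"


def fraction_as_mathml() -> str:
    """Return presentation MathML for the fraction (a+b)/c as a string."""
    math = ET.Element("math", {"xmlns": MATHML_NS})
    mfrac = ET.SubElement(math, "mfrac")

    # First child of <mfrac> is the numerator: the row "a + b".
    numerator = ET.SubElement(mfrac, "mrow")
    ET.SubElement(numerator, "mi").text = "a"
    ET.SubElement(numerator, "mo").text = "+"
    ET.SubElement(numerator, "mi").text = "b"

    # Second child of <mfrac> is the denominator: the identifier "c".
    ET.SubElement(mfrac, "mi").text = "c"

    return ET.tostring(math, encoding="unicode")


if __name__ == "__main__":
    print(fraction_as_mathml())
```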

3  Conclusion

So should I translate to HTML? If you want to provide the most easily browsable format, yes. If you feel it is essential to control the precise layout, for aesthetic or other reasons, no. But notice that the answer has nothing to do with whether the document starts as TeX.



File translated from TeX by TTHgold, version 3.70.
On 28 Aug 2005, 18:01.