Approaches to WWW Mathematics Documents

Ian Hutchinson
Original version Nov 1997; footnote updates Jul 1998 and Mar 2002


This is a brief summary, including some personal critiques, of different approaches to publishing mathematical documents on the World Wide Web. Fuller discussions exist elsewhere [1], e.g. the W3C Math project.

Background

Layout and typesetting of mathematics is extremely demanding. The TeX system, developed by Donald Knuth (together with its spin-offs, LaTeX and other extensions), largely solved this problem for computer-generated printed documents. Word processors came into use after TeX, and the prominent ones added equation capabilities only much later still. Even now, for mathematics they do not equal the power of TeX, although their WYSIWYG interfaces appeal to many users. The standard mark-up language of the World Wide Web, HTML, does not currently support mathematics directly. Although draft standards for mathematics have been [2] (and will be) proposed, no popular web browser has yet adopted them. Interim solutions are needed, since mathematics is the language of science, which brought the web into being.
Interim solutions can be considered to be of five generic types.
1. Graphical page display.
2. Native HTML using symbol fonts.
3. Embedded graphics in HTML.
4. Java applet equation generation.
5. Plug-in equation rendering. [3]
These approaches have different strengths and weaknesses, which are discussed below. In summary, approach 1 gives high-quality printable distribution but is too cumbersome for browsing. Approach 2 is best for browsing but requires some compromises on layout. Approach 3 is being supplanted by 2, and approaches 4 and 5 are not currently in widespread use.

Graphical Page Display

This is the approach of Adobe Acrobat and the .pdf file format. The page is rendered into a page-description language such as PostScript or PDF, and the viewer has to use a helper or plug-in application to display it. Its advantage is the ability to render pages with layout essentially exactly as in an original (e.g. TeX) document. Its disadvantages are that it requires a verbose and bulky file format that includes all the non-standard fonts that are used, and that it restricts searchability because the document is not text. PDF partially addresses these problems but cannot escape them all. A helper or plug-in has to be installed on the viewing system. In this respect, the approach has little advantage, other than market share, over simply putting the (e.g.) TeX source on the web and expecting people to be able to render it via a helper (TeX plus xdvi or the equivalent). That way you lose hyperlinkability, though, unless you use HyperTeX. An even simpler alternative is to convert each TeX page into a GIF file and somehow index the pages together. This requires no extra browser plug-in, since graphical browsers all support GIF images directly, but including linkages is virtually impossible, the files remain bulky, and high-quality printing is lost.

Native HTML using symbol fonts

This approach aims less for exact similarity of rendering and more for a full translation into HTML, accepting compromises on the aesthetics of layout. The key problems to be overcome are the need for appropriate symbol fonts and the method of laying out the document. The translator TtH (for plain TeX and LaTeX) uses the symbol fonts available through browser-supported (HTML 4.0 compatible) tags on most viewing systems, and HTML 3.2 tables for mathematics layout, thus requiring no viewer installation. It is clearly the fastest and most convenient approach. An extra benefit is that document structure, such as footnotes, lists, cross-references and sectioning, is automatically included with internal hyperlinks.
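To make the idea of table-based layout concrete, here is a minimal Python sketch that builds an HTML fragment stacking a numerator over a denominator inside a one-cell table, with a horizontal rule standing in for the fraction bar. It illustrates the general technique only; the function name `fraction_html` and the exact markup are assumptions made for this example and are not TtH's actual output.

```python
from html import escape


def fraction_html(numerator: str, denominator: str) -> str:
    """Return an HTML table fragment that stacks numerator over denominator.

    Illustrative only: the markup is an assumption, not what TtH emits.
    """
    num = escape(numerator)    # make plain text safe to place inside HTML
    den = escape(denominator)
    # One table cell: numerator, a horizontal rule as the fraction bar,
    # then the denominator, all centred.
    return (
        '<table border="0"><tr><td nowrap="nowrap" align="center">'
        f'{num}<hr noshade="noshade">{den}'
        "</td></tr></table>"
    )


if __name__ == "__main__":
    print(fraction_html("a + b", "c"))
```

Because the result is ordinary HTML text, the browser lays it out with whatever fonts it already has, and no image or plug-in is involved.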

Embedded graphics in HTML

This is the approach of LaTeX2HTML, and a similar approach is adopted in TeX4ht. LaTeX markup is translated into HTML, including many layout constructs such as lists, sections and headings. Equations, which are not easily translated, are represented by graphical (e.g. GIF) images, whose inclusion is supported by all graphical browsers. This allows the equations to be presented in their original rendering and does not require the viewer to have any additional fonts. Its disadvantages are that the images are generally low resolution; there are problems with alignment and sizing of the equations; and, in complicated documents, a great many files are needed to represent the original document. Each of the GIF images has to be downloaded. Its main advantage is that no additional installation is required of the viewer.
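The file-count problem can be seen in a small Python sketch of the substitution step. It replaces each `$...$` span in a piece of text with an `<img>` tag pointing at a numbered GIF, and returns the list of TeX sources that some external tool would still have to render to images. This is not how LaTeX2HTML or TeX4ht is implemented; the function name `embed_equations` and the `img1.gif`, `img2.gif` naming scheme are hypothetical.

```python
import re
from html import escape
from typing import List, Tuple


def embed_equations(text: str, prefix: str = "img") -> Tuple[str, List[str]]:
    """Replace each $...$ span with an <img> tag for a numbered GIF file.

    Returns the rewritten text and the TeX sources, in order. Rendering the
    sources to actual GIF files is left to an external tool.
    """
    sources: List[str] = []

    def replace(match):
        tex = match.group(1)
        sources.append(tex)
        name = f"{prefix}{len(sources)}.gif"  # hypothetical naming scheme
        # The alt text keeps the TeX source visible if the image fails to load.
        return f'<img src="{name}" alt="{escape(tex, quote=True)}" align="middle">'

    html = re.sub(r"\$([^$]+)\$", replace, text)
    return html, sources


if __name__ == "__main__":
    page, equations = embed_equations("Energy $E = mc^2$ and momentum $p = mv$.")
    print(page)
    print(len(equations), "image files needed")
```

Every equation adds one entry to the returned list, and therefore one more file for the browser to fetch.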

Java applet equation generation

This approach is currently best represented by WebEQ, for which equation markup is included in the source HTML document as a parameter to the applet. The input processing module is determined by a parameter in the applet tag, so that authors can select between various input formats. At present, WebEQ processes a LaTeX-like markup language called WebTeX. Its advantages (and disadvantages) include most of those of the embedded graphics approach. Compared with that approach, the Java applet has much more initial download overhead, making the first page very slow, but it has the ability, in principle, to adapt to the viewer to improve layout.
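The general shape of passing equation markup to an applet as parameters can be sketched in Python as below. The `<applet>` and `<param>` elements are standard HTML of the period, but everything else is a placeholder: the class name `HypotheticalEquationApplet.class` and the parameter names `input_format` and `equation` are invented for illustration and are not WebEQ's actual names.

```python
from html import escape


def applet_tag(equation: str, input_format: str,
               width: int = 200, height: int = 60) -> str:
    """Build an <applet> element carrying equation markup in <param> tags.

    The class name and parameter names are hypothetical placeholders,
    not those of WebEQ or any other real applet.
    """
    # Attribute values must be escaped so that quotes and angle brackets
    # in the equation source cannot break the surrounding HTML.
    eq = escape(equation, quote=True)
    fmt = escape(input_format, quote=True)
    return (
        f'<applet code="HypotheticalEquationApplet.class" '
        f'width="{width}" height="{height}">\n'
        f'  <param name="input_format" value="{fmt}">\n'
        f'  <param name="equation" value="{eq}">\n'
        "</applet>"
    )


if __name__ == "__main__":
    print(applet_tag(r"\frac{a}{b}", "WebTeX"))
```

The equation travels as text inside the page, so only the applet itself, not a separate image per equation, has to be downloaded.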

Plug-in equation rendering

The main distinction in principle between this approach and the Java applet approach is that the plug-in has to be installed on the viewing system before it can be used. IBM techexplorer is a representative example under development. The benefits of the plug-in and Java approaches might well come into their own if they became an integral part of the popular browsers, but that requires the adoption by browser programmers of a comprehensive math specification for HTML, which doesn't seem to be happening [4], [5]. [6]

[Additions after 1997 in the form of footnotes.]

Footnotes:

[1] Unfortunately, much of the existing W3C discussion of mathematics in the web context misrepresents the actual situation by assuming that equations can be rendered only by images, whereas TtH shows this to be palpably false. Another aspect of W3C attention is a focus not merely on rendering typeset mathematics but on expressing its content. That interesting topic is not discussed here.
[2] Mathematics rendering was part of the HTML 3.0 proposal, and substantial work was done on a TeX to HTML 3.0 math equation converter. However, that draft standard died when it was not adopted by the major browsers.
[3] One can now (Mar 2002) consider adding a further category, MathML, since the new Mozilla browser is beginning to be adopted and it now has native MathML. However, formally, Mozilla does not recognize the possibility of including MathML in HTML documents, only in XML documents.
[4] It seems (mid 1998) that the MathML draft specification is gaining support from mathematics-oriented organizations, and it is partly implemented in the Amaya browser from W3C.
[5] Even if browsers adopt MathML, it is far too verbose to author directly, so document translators will be required.
[6] As of Mar 2002, there are encouraging signs that MathML may finally get included by default in major browsers.


File translated from TEX by TTHgold, version 3.70.
On 28 Aug 2005, 18:01.