Original version Nov 1997; footnote updates Jul 1998 and Mar 2002.
This is a brief summary, including some personal critiques, of
different approaches to publishing mathematical documents on the World
Wide Web. Other, fuller discussions exist [1], e.g. the W3C Math
project.
Background
Layout and typesetting of mathematics is extremely demanding. The
system TeX, developed by Donald Knuth, (and its spin-offs LaTeX and
other extensions) largely solved this problem for computer-generated
printed documents. Word-processors came into use after TeX, and the
prominent ones added equation capabilities only much later still. Even
now, for mathematics they do not equal the power of TeX, although
their WYSIWYG interfaces appeal to many users. The standard mark-up
language of the world wide web, HTML, does not currently support
mathematics directly. Although draft standards for mathematics have
been proposed [2] (and more will be), no popular web browser has yet
adopted
them. Interim solutions are needed, since mathematics is the language
of science, which brought the web into being.
Interim solutions can be considered to be of five generic types [3],
taken in turn below: (1) graphical page display, (2) native HTML using
symbol fonts, (3) embedded graphics in HTML, (4) Java applet equation
generation, and (5) plug-in equation rendering. These approaches have
different strengths and weaknesses. In summary, 1 gives high-quality
printable distribution but is too cumbersome for browsing; 2 is best
for browsing but requires some compromises on layout; 3 is being
supplanted by 2; and 4 and 5 are not currently in widespread use.
Graphical Page Display
This is the approach of Adobe Acrobat and the .pdf file format. The
page is rendered into a page-description language such as PostScript
or PDF. The viewer has to use a helper or plug-in application to
display it. Its advantage is the ability to render pages with layout
essentially exactly as in an original (e.g. TeX) document. Its
disadvantages are a verbose and bulky file format, which must include
all the non-standard fonts that are used, and restricted
searchability, because the document is not stored as text. PDF
partially addresses these problems but cannot escape them all. A
helper or plug-in has to be installed on the viewing system. In this
respect, the approach has little advantage other than market share
over simply putting the (e.g.) TeX source on the web and expecting
people to be able to render it via a helper (TeX plus xdvi or the
equivalent). That way you lose hyperlinkability, though, unless you use
HyperTeX. An even simpler alternative is to convert each TeX page
into a GIF file and somehow
index the pages together. This approach requires no extra browser
plug-in, since graphical browsers all support GIF images directly, but
including linkages is virtually impossible, the files remain bulky,
and high-quality printing is lost.
Native HTML using symbol fonts
This approach aims less for exact similarity of rendering and more for
a full translation into HTML, accepting compromises on the aesthetics
of layout. The key problems to be overcome are the need for appropriate
symbol fonts and the method to lay out the document. The translator
TtH (for plain TeX and LaTeX) uses the symbol fonts available through
browser-supported (HTML4.0 compatible) tags on most viewing systems,
and HTML3.2 tables for mathematics layout, thus requiring no viewer
installation. It is clearly the fastest and most convenient approach
for browsing. An extra benefit is that document structure, such as
footnotes, lists, cross-references and sectioning, is automatically
included with internal hyperlinks.
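To make the idea concrete, here is a minimal sketch in Python of how a
built-up fraction might be expressed with a symbol-font tag and an
HTML table. It is an illustration only, not TtH's actual output or
algorithm; the function names symbol and fraction are hypothetical,
and the Greek letters rely on the usual Symbol font encoding (a for
alpha, p for pi) being available on the viewing system.

```python
from html import escape


def symbol(ch: str) -> str:
    """Wrap a Latin letter so that a browser with a Symbol font
    displays the corresponding Greek glyph (assumed encoding)."""
    return f'<font face="symbol">{escape(ch)}</font>'


def fraction(numerator: str, denominator: str) -> str:
    """Lay out a fraction as a two-row table, with a horizontal
    rule under the numerator acting as the fraction bar."""
    return (
        '<table border="0" cellspacing="0" cellpadding="0">'
        f'<tr><td nowrap="nowrap" align="center">{numerator}'
        '<hr noshade="noshade"></td></tr>'
        f'<tr><td nowrap="nowrap" align="center">{denominator}</td></tr>'
        '</table>'
    )


# (alpha + 1) / (2 pi), built from plain HTML with no images.
html = fraction(symbol("a") + " + 1", "2" + symbol("p"))
print(html)
```

The result is ordinary text markup, so it stays small and needs no
helper application, at the cost of cruder layout than TeX itself
would give.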
Embedded graphics in HTML
This is the approach of LaTeX2HTML, and a similar approach is adopted
in TeX4ht. LaTeX markup is translated into HTML, including many layout
constructs
such as lists, sections, headings etc. Equations, which are not easily
translated, are represented by graphical (e.g. GIF) images, whose
inclusion is supported by all graphical browsers. This allows the
equations to be presented in their original rendering and does not
require the viewer to have any additional fonts. Its disadvantages are
that the images are generally low resolution; there are problems with
alignment and sizing of the equations; and, in complicated documents,
a great many files are needed to represent the original document, each
GIF image having to be downloaded separately. Its main advantage is
that no additional installation is required of the viewer.
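The file-count problem can be seen in a small Python sketch of the
general technique. This is not how LaTeX2HTML or TeX4ht is actually
implemented: the function embed_equations and the file naming scheme
(img1.gif, img2.gif, ...) are assumptions made for illustration, and
the step that renders each piece of TeX into a GIF is left out.

```python
import re
from html import escape


def embed_equations(text: str, prefix: str = "img"):
    """Replace each $...$ span with an <img> tag.

    Returns the resulting HTML together with the list of
    (filename, tex) pairs that would still have to be rendered
    into image files by some external tool.
    """
    images = []

    def replace(match):
        tex = match.group(1)
        name = f"{prefix}{len(images) + 1}.gif"
        images.append((name, tex))
        # The TeX source is kept as alt text for non-graphical use.
        alt = escape(tex, quote=True)
        return f'<img src="{name}" alt="{alt}" align="middle">'

    html = re.sub(r"\$([^$]+)\$", replace, text)
    return html, images


html, images = embed_equations("Since $a^2+b^2=c^2$ and $x>0$, we are done.")
print(html)
print(len(images), "image files needed")
```

Even this one-sentence input needs two separate image files, which is
why a long document with many equations ends up as a large collection
of small downloads.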
Java applet equation generation
This approach is best currently represented by WebEQ, for which
equation markup is included in the source HTML document as a parameter
to the applet. The input processing module is determined by a
parameter in the applet tag, so that authors can select between
various input formats. At present, WebEQ processes a LaTeX-like markup
language called WebTeX. Its advantages (and disadvantages) include
most of those of the embedded graphics approach. Compared with the
embedded graphics approach, the Java applet has much more initial
download overhead, making the first page very slow, but it has the
ability in principle to adapt to the viewer to improve layout.
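As a rough illustration of passing equation markup as an applet
parameter, the Python sketch below writes out such a tag. The applet
class name and the parameter names (input_format, equation) are
placeholders invented for this example; they are not WebEQ's actual
interface, which should be taken from its own documentation.

```python
from html import escape


def applet_tag(equation: str, input_format: str = "webtex",
               width: int = 200, height: int = 50) -> str:
    """Build an <applet> element that carries the equation source
    and the chosen input format as <param> values.

    All names in the generated markup are hypothetical.
    """
    return (
        f'<applet code="HypotheticalEquationApplet.class" '
        f'width="{width}" height="{height}">\n'
        f'  <param name="input_format" '
        f'value="{escape(input_format, quote=True)}">\n'
        f'  <param name="equation" '
        f'value="{escape(equation, quote=True)}">\n'
        '</applet>'
    )


tag = applet_tag(r"\frac{a}{b} < 1")
print(tag)
```

The equation travels as text inside the page, so no image files are
needed, but nothing is displayed until the applet code itself has been
downloaded and started.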
Plug-in equation rendering
The main distinction in principle between this approach and the Java
applet approach is that the plug-in has to be installed on the viewing
operating system before it can be used.
IBM techexplorer is a representative example under development. The
benefits of the
plug-in and Java approaches might well come into their own if they
became an integral part of the popular browsers, but that requires the
adoption by browser programmers of a comprehensive math specification
for HTML, which doesn't seem to be happening [4], [5].
[6] [Additions after 1997 in the form of footnotes.]
Footnotes:
[1] Unfortunately much of the existing W3C discussion of mathematics
in the web context misrepresents the actual situation by assuming that
equations can be rendered only by images, whereas TtH shows this to be
palpably false. Another aspect of W3C attention is a focus not merely
on rendering typeset mathematics but on expressing its content. That
interesting topic is not discussed here.
[2] Mathematics rendering was part of the HTML3.0 proposal, and
substantial work was done on a TeX to HTML3.0 math equation converter.
However, that draft standard died when it was not adopted by the major
browsers.
[3] One can now (Mar 2002) consider adding an additional category:
MathML, since the new Mozilla browser is beginning to be adopted, and
it now has native MathML support. However, formally, Mozilla does not
recognize the possibility of including MathML in HTML documents, only
in XML documents.
[4] It seems (mid 1998) that the MathML draft specification is gaining
support from mathematics-oriented organizations, and it is partly
implemented in the Amaya browser from W3C.
[5] Even if browsers adopt MathML, it is far too verbose to author
directly, so document translators will be required.
[6] As of Mar 2002, there are encouraging signs that MathML may finally
get included by default in major browsers.
File translated from TeX by TtHgold, version 3.70, on 28 Aug 2005, 18:01.