13 Browser and Server Problems
TTH translates TeX into standard HTML and takes account as far as
possible of the idiosyncrasies of the major browsers. Nevertheless,
there are several problems that are associated with the browsers, and
a few that are associated with web servers. Authors and publishers
should recognize that these are not TTH bugs. Font-related
problems are complicated. If you don't need all the gory details, you
might want to read section 13.1 and then skip to section 13.3.
13.1 Accessing Symbol Fonts: Overview
Many of the most serious difficulties of Mathematics rendering in HTML
are associated with the need for extra symbols. In addition to various
Greek letters and mathematical operators, one needs access to the
glyphs used to build up from parts the large brackets matching the
height of built-up fractions. These symbols are almost universally
present on systems with graphical browsers, which all have a
"Symbol" font, generally based on that made freely available by
Adobe. The problem lies in accessing the font because of
shortcomings in the browsers and the HTML standards that relate to
font use.
In brief, there are three ways to access the symbol fonts; these will
be described in more detail below. The following table indicates which of
these approaches to accessing the symbol fonts works with which
browser. It also outlines which of the mathematics rendering
improvements via CSS positioning are satisfactory.
                  | Symbol Encoding                             | CSS Positioning            |
                  | 8-bit numeric | Adobe Private | Unicode 3.2 | relative | height compress |
TTH switch        | -u0           | -u1           | -u2         | -y2      | -y1             |
Browser:          |               |               |             |          |                 |
MSIE 5.0          | Yes           | No            | No          | Yes      | Buggy           |
Mozilla 1.x X     | Alias/Font    | Buggy         | Buggy       | Yes      | Yes             |
Firefox 1.x X     | Alias/Font    | Buggy         | Buggy       | Yes      | Yes             |
Firefox 1.x Win   | Yes           | Buggy         | Buggy       | Yes      | Yes             |
Konqueror 1.9.8   | Alias         | No            | No          | Yes      | Yes             |
Firefox 3.5 X     | No            | Buggy         | Ugly        | Yes      | Yes             |
Chrome 4.0 X      | No            | Buggy         | Ugly        | Yes      | Yes             |
Firefox 3.5 Win   | Yes           | No            | Buggy       | Yes      | Yes             |
MSIE 8.0 Win      | Yes           | No            | Ugly        | Yes      | Yes             |
This situation is painful. The 8-bit numeric style symbol access
method, which was the approach originally pioneered by TTH, used to
work with a significant number of browsers but needed additional font
settings for X-window systems. This is the approach that TTH used
to use by default. However, Mozilla and Firefox have systematically moved
towards disabling this method under Linux and OS X, presumably because
they consider it not standards-compliant. They have not properly
implemented the Unicode 3.2 alternative, because the glyphs they use
for built-up delimiters are incorrectly sized and leave ugly gaps. In
some cases the spacing is completely erroneous. One is left with the
choice between the traditional 8-bit approach, which works well with all
MSWindows systems up to Vista, but does not work with most recent
X-based operating systems; or Unicode 3.2 which works with most
browsers, but is badly buggy in Windows Firefox and ugly everywhere.
In the interests of an eventual rationalization of this situation, TTH
has made the Unicode 3.2 coding its default from the 2010
version 3.87 on, but this is by no means universally satisfactory.
13.2 Accessing Symbol Fonts: Details
Prior to HTML4.0, that is, during the major phase of the evolution of
HTML, the default encoding for HTML documents was ISO-8859-1
(sometimes called ISO Latin-1). The document encoding defines a
mapping between the bytes of the file itself and characters. The
HTML4.0 standard draws a strict (but often confused) distinction
between the document "character set", sometimes referred to more
recently as the character "repertoire" (which refers to all the
characters that might be used in it) and the "document encoding"
(which encodes a subset of the character set by mapping them to
bytes). The confusion is compounded by the entrenched usage of the term
"charset" to refer to the "document encoding" (not the character
set). This usage is presumably a reflection of the prior lack of any
significant distinction between the two.
Purists since the adoption of HTML4.0 regard the selection of a glyph
as governed by the process: (byte) code → glyph-name → font-glyph. In this view, even
though the font contains the glyphs in a well defined order, the
glyph is accessed not by its position in the font but by its name. For
example, in a document with ISO-8859-1 encoding, the byte with decimal
value 97 maps to the "latin small letter a" which is accessed from
the font on that basis. On this view, it is not possible, or rather
ought not to be possible, to access the Greek letter alpha by
specifying that the font is Symbol and the byte coding decimal value
is 97, despite the fact that the Greek alpha is indeed in the same
position in the Symbol font as the lower case a in its font. This is
because (the story goes) 97 means "latin small letter a" and the
Symbol font simply does not contain the latin small letter a.
In practice, of course, most browsers, including Internet Explorer (to
8.x), have not taken so pedantic an approach. In a document that is
encoded in the same order as the fonts on the system, as is the case
for ISO-8859 on systems other than the (old) Macintosh, the browser maps
code to glyph directly on the basis of numeric position in the
font. Therefore it is perfectly sensible to specify eight-bit code 97
and Symbol font to obtain alpha. In other words, the browsers treat
the Symbol font as if it were an ISO-8859 font even though, as far as
the glyph names are concerned, it is not. It can be argued, even
within the world-view of standards lawyers, that a document that does
not explicitly specify its encoding (and TTH documents do not) could
be considered to obey its own font encoding or some unspecified
encoding, in which case, bytes ought to be permitted to refer directly
to numeric font positions, in just this fashion, regardless of whether
the font is identified as ISO 8859. But such arguments are usually a
waste of breath. In any case, recent versions of Mozilla and its
derivatives on the Windows operating system will properly render
symbols provided they are told that the DOCTYPE is HTML 4.0, not HTML
4.01. This is the reason why TTH has reverted to giving its
documents this rather out of date DOCTYPE.
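The difference between the purist view (code to glyph name) and the positional view that browsers actually took (code to font position) can be sketched in a few lines of Python. The table `SYMBOL_POSITIONS` is a small hand-picked subset of the Adobe Symbol font layout, and both function names are hypothetical; nothing here is part of TTH itself.

```python
import unicodedata

# Illustrative subset of the Adobe Symbol font layout: the glyph found at
# each numeric position, written as the equivalent Unicode character.
SYMBOL_POSITIONS = {
    97: "\u03b1",   # the position of 'a' holds Greek alpha
    98: "\u03b2",   # the position of 'b' holds Greek beta
    112: "\u03c0",  # the position of 'p' holds Greek pi
}

def purist_lookup(byte_value):
    """Code -> glyph name: the byte is decoded by the document encoding."""
    char = bytes([byte_value]).decode("iso-8859-1")
    return unicodedata.name(char)

def positional_lookup(byte_value):
    """Code -> font position: the byte indexes straight into the Symbol font."""
    return unicodedata.name(SYMBOL_POSITIONS[byte_value])

print(purist_lookup(97))      # LATIN SMALL LETTER A
print(positional_lookup(97))  # GREEK SMALL LETTER ALPHA
```

On the purist view only the first lookup is legitimate, whatever font is selected; the 8-bit numeric method relies on the browser performing the second.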
On the X-windows system, a distinction between fonts is provided
directly in the system via the font naming conventions. Mozilla takes
notice of this font allocation by permitting access only to fonts
whose names end 8859-1, for default encoded documents. The symbol font
is not one of those fonts unless additional steps are taken. The
enabling of the symbol font requires specification of some system font
aliases, or installation of a specially encoded Symbol font, which
then ensures that the Symbol font is treated as if it were ISO-8859-1
encoded. Notice that this type of problem arises for any document that
wants to access more than one language of font. Thus, any document
desiring a mixture of, for example, western and Cyrillic characters
would face the same problem.
To summarise, the symbol font is present on practically every computer
on the planet that runs a graphical browser. Under the MSWindows
operating system, IE to version 8.x, and Mozilla (gecko)-based
browsers treat the symbol font as if it were a numerically encoded
font and compatible with ISO 8859-1 encoding, provided the DOCTYPE is
HTML 4.0 Transitional. Treating the font as such enables the glyphs to be
accessed using eight-bit codes in just the same way as standard
ASCII characters. This is the way that documents have accessed these
glyphs for years.
The HTML4.01 standard says that Unicode (ISO 10646, also called UCS) is
the character set of HTML, and that the way characters outside the
current document encoding should be accessed is through Unicode
code points. Unicode is backwardly compatible with ISO 8859-1 in a way that
we need not dwell on. Unicode is supposed to fix all the font problems
that are described here, and with luck eventually it will indeed
help. The problems are that (1) Unicode is enormous, so only a tiny
fraction of it is so far supported, and (2) in its original incarnation
Unicode does not even assign points to the parts of large delimiters
that are needed for mathematics. They are present in the newer
version of Unicode, 3.2, which is becoming current. However, as the
table above shows, no browser cleanly supports the new Unicode
assignments. Mozilla used to support some assignments of points in
Unicode's designated "private use area" to the glyphs we
need. Apparently these assignments have become de-facto standards for
the Adobe Symbol font in typographic circles. No other browser
supports them. They are not and, according to Unicode principles,
never will be part of the Unicode standard, and appear to be on the
way out.
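As an illustration of the delimiter parts in question, the sketch below lists the three Unicode 3.2 code points for the pieces of a tall left parenthesis and stacks them as HTML numeric character references. The helper `tall_left_paren` and the one-reference-per-row layout are assumptions made for the example; they are not a description of the markup TTH emits.

```python
import unicodedata

# Code points added in Unicode 3.2 for the pieces of a tall left parenthesis.
LEFT_PAREN_PARTS = {
    "top": 0x239B,
    "extension": 0x239C,
    "bottom": 0x239D,
}

def tall_left_paren(rows):
    """Return one HTML numeric character reference per row of the delimiter."""
    if rows < 2:
        raise ValueError("a built-up delimiter needs at least two rows")
    middle = [LEFT_PAREN_PARTS["extension"]] * (rows - 2)
    points = [LEFT_PAREN_PARTS["top"], *middle, LEFT_PAREN_PARTS["bottom"]]
    return [f"&#{point};" for point in points]

for part, point in LEFT_PAREN_PARTS.items():
    print(part, f"U+{point:04X}", unicodedata.name(chr(point)))
print(tall_left_paren(4))
```

Whether the stacked pieces join up without gaps depends entirely on the font and browser, which is the problem recorded in the "Unicode 3.2" column of the table.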
The option that mathematics web publishing currently has, then, is
either an approach that works with Windows browsers but which purists
say is not consistent with latest standards, or a representation that
is consistent with the standard but useless with some browsers. It
would be really nice if the browsers would get their act together on
mathematical symbols.
13.3 Printing
In many browsers, the printing fonts are hard coded into the browser
and the font-changing commands are ignored when printing. For that
reason, visitors viewing TTH documents will often not be able to
print readable versions of documents with lots of mathematics. This
problem could, and should, be fixed in the browsers. However, if you
want your readers to be able to print a high-quality paper copy of the
file, then you probably want to make available to them either the
TeX source or a common page-description format such as PostScript or
PDF. Since HTML documents download and display so much faster and
better than these other formats on the screen, TTH's translation
provides the natural medium for people to browse, but not
necessarily the best medium for paper production.
13.4 Netscape/Mozilla Composer
Netscape Composer and Mozilla Composer are
too clever for their own good. If you run an HTML document produced by TTH
through Netscape Composer, all sorts of internal translations are
performed that are detrimental to its eventual display. For example,
if you subsequently save the document with the usual encoding set
(Western), the eight-bit codes that work with Macs are replaced with
HTML4.0 entities such as &ograve; or &pound;. This effectively
breaks the document for viewing on Macs because it undoes everything
just explained. Even if you use User-Defined encoding, which prevents
this particular substitution, Composer will rearrange the document in
various ways that it thinks are better, but that make the display of
the document worse. The moral is, don't run TTH documents through
Netscape Composer.
You therefore cannot use the "publish" facility
of Composer. Transferring the document to the server with plain old FTP
will keep it away from Composer's clutches.
13.5 Other Browser Bugs
Font changing commands do not propagate from cell to cell of HTML
tables. In rendering equations (using tables) TTH circumvents this
bug (excuse me, feature) at the cost of significant extra effort and
slightly verbose HTML. However, for tables generated by
\halign or \begin{tabular} TTH takes no special steps
to avoid this problem. A change of font face in a cell, for example by
\it will not carry over to the next cell. A document
containing this problem will not pass some HTML validations. It is
prevented if every cell of a TeX table is enclosed in braces and the
required style applied separately to every cell - a serious
annoyance.
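For tables with many cells, the bracing can be generated instead of typed by hand. The following Python sketch builds one row of a TeX table with each cell enclosed in its own braces and the style applied inside; the function `style_row` is hypothetical and is not a TTH feature.

```python
def style_row(cells, style=r"\it"):
    """Wrap every cell in its own braces with the style applied inside."""
    return " & ".join(f"{{{style} {cell}}}" for cell in cells) + r" \\"

print(style_row(["alpha", "beta", "gamma"]))
# {\it alpha} & {\it beta} & {\it gamma} \\
```

Because each cell carries its own font change, no change of font has to propagate across a cell boundary.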
Tables are incapable of being properly embedded within a line of text.
They generally force a new line. This is quite a significant handicap
when translating in-line material that could use a table. It can be
argued that this behaviour is required by the HTML
standard. Specifically, the <p> element is defined as having
in-line attributes which prevent it from containing any elements
defined as being block type, of which <table> or
actually strictly <td> is one. However, even if you ensure that
text is not inside a <p>, most browsers force a new line.
13.6 Web server problems
The HTML files that TTH produces are encoded using the charset
ISO-8859-1, like most web files. In newer Linux systems the default
file encoding on the computer is in many cases now UTF-8. For the
characters with codes above 127, this can cause problems with the web
server. The web server may wrongly assume that the HTML file is a
UTF-8-encoded file, and declare this assumption in the HTTP Content-Type
header that it sends to browsers when they access the file. For
Gecko-based browsers, the HTTP Content-Type declaration overrides any
internal file declaration of the encoding of the file. Consequently,
the browser treats this file as if it is UTF-8 encoded, with the
result that codes higher than 127 are misinterpreted. This is an
inadequacy in the web server (Apache is known to behave this way in
some situations).
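The effect of the wrong declaration is easy to reproduce. In this Python sketch the sample text and the helper `decode_as` are made up for the illustration; substituting U+FFFD for undecodable bytes only approximates what a browser does.

```python
# "café ò" stored as ISO-8859-1: bytes 0xE9 and 0xF2 are single characters
# there, but on their own they are not valid UTF-8 sequences.
latin1_bytes = "caf\u00e9 \u00f2".encode("iso-8859-1")

def decode_as(data, charset):
    """Decode roughly as a browser would, replacing bad bytes with U+FFFD."""
    return data.decode(charset, errors="replace")

right = decode_as(latin1_bytes, "iso-8859-1")  # encoding the file really has
wrong = decode_as(latin1_bytes, "utf-8")       # encoding the server declared

print(right)
print(wrong)
```

The ASCII characters survive either way; only the bytes above 127 are damaged, which is why the problem tends to show up only in accented text and symbols.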
There are several options to work around this problem.
It is possible to convert all files from ISO-8859-1 to UTF-8 encoding,
using a utility called iconv, present on most modern Linux
installations. This is not an attractive solution because then when
the files are browsed locally (via file://...) they will display
incorrectly. Locally, the browser does not have the HTTP Content-Type
declaration to guide (or misguide) it, and it thinks the files are
ISO-8859-1 encoded. But if they've been converted, they are not.
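The conversion itself, and the reason it backfires for local viewing, can be sketched in Python. The function `latin1_to_utf8` is a hypothetical stand-in for what a command such as `iconv -f ISO-8859-1 -t UTF-8` does to a file's contents.

```python
def latin1_to_utf8(data):
    """Re-encode bytes from ISO-8859-1 to UTF-8."""
    return data.decode("iso-8859-1").encode("utf-8")

original = "\u00f2".encode("iso-8859-1")  # one byte, 0xF2
converted = latin1_to_utf8(original)      # two bytes, 0xC3 0xB2

# A browser that still assumes ISO-8859-1 reads the two bytes as two characters.
seen_locally = converted.decode("iso-8859-1")
print(converted, seen_locally)
```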
The better approach seems to be to fix the web server so that it gets
the file content-type right. This can be done on a per-directory basis
by creating a file called .htaccess in the directory. This file
should contain the line:
AddType text/html;charset=ISO-8859-1 html
This tells the server that all files in this directory and its
subdirectories that have extension html are to be considered of
type HTML and encoded with the ISO-8859-1 charset.
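To confirm what the server declares, it is enough to look at the charset parameter of the Content-Type header it sends. The sketch below uses Python's standard `email.message.Message` class to parse a header value; the function name `declared_charset` is hypothetical.

```python
from email.message import Message

def declared_charset(content_type):
    """Return the lower-cased charset parameter of a Content-Type value, or None."""
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset()

print(declared_charset("text/html;charset=ISO-8859-1"))  # iso-8859-1
print(declared_charset("text/html; charset=UTF-8"))      # utf-8
print(declared_charset("text/html"))                     # None
```

The header value to feed in could be obtained, for instance, from `urllib.request.urlopen(url).headers["Content-Type"]`, where `url` is the address of one of your own pages.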
Unfortunately some web servers are configured not to pay attention to
the .htaccess file. If yours is one, you have to get the
webmaster to edit the server configuration file
(/etc/httpd/conf/httpd.conf). The lines that read
AllowOverride None must read instead
AllowOverride FileInfo. Alternatively, get the webmaster to
change the line in that configuration file that reads
AddDefaultCharset UTF-8 to read instead
AddDefaultCharset ISO-8859-1
and once the server is restarted all your troubles will be over
without any of those pesky .htaccess files.
There are other ways of accomplishing the same thing in the web
server, if you are a guru. Information is available at the W3C FAQ.