In DSSSL, the input characters are normalized
into a sequence of characters, each of which represents a specific
meaning regardless of whether it was originally encoded as a
single character, as multiple characters in a particular character
set, or as an entity reference. Each DSSSL specification defines a
single character repertoire. The character repertoire shall include
all characters used in the DSSSL specification, in the source groves, and in
the flow object tree; consequently, only characters in this repertoire
may be used. The declaration of each
character also includes a set of properties that may be significant in
the formatting process, for example, that the character represents a
word space.
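
The normalization described above can be illustrated with a minimal sketch in Python. It is not part of DSSSL: the character names, the `REPERTOIRE` and `ENCODINGS` tables, and the `normalize` function are all hypothetical, chosen only to show three different encodings collapsing to one repertoire character that carries its own properties.

```python
import re

# Hypothetical repertoire: character name -> properties relevant to formatting.
REPERTOIRE = {
    "latin-small-letter-e-with-acute": {"is_space": False},
    "latin-small-letter-a": {"is_space": False},
    "space": {"is_space": True},
}

# Hypothetical input encodings; several may denote the same repertoire character.
ENCODINGS = {
    "\u00e9": "latin-small-letter-e-with-acute",    # single coded character
    "e\u0301": "latin-small-letter-e-with-acute",   # base letter + combining accent
    "&eacute;": "latin-small-letter-e-with-acute",  # entity reference
    "a": "latin-small-letter-a",
    " ": "space",
}

# Longest alternatives first, so multi-character encodings are matched whole.
_TOKEN = re.compile(
    "|".join(re.escape(k) for k in sorted(ENCODINGS, key=len, reverse=True))
)


def normalize(text):
    """Return the list of repertoire character names denoted by text."""
    names = []
    pos = 0
    while pos < len(text):
        match = _TOKEN.match(text, pos)
        if match is None:
            # Only characters declared in the repertoire may be used.
            raise ValueError(f"character at offset {pos} is not in the repertoire")
        names.append(ENCODINGS[match.group()])
        pos = match.end()
    return names
```

In this sketch, `normalize("\u00e9")`, `normalize("e\u0301")`, and `normalize("&eacute;")` all yield the same one-element list, and the word-space property of a result can be looked up as `REPERTOIRE[name]["is_space"]`.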
The DSSSL specification, which may have been encoded using a
coded character set different from that of the source document, is also
translated into a sequence of characters belonging to the same
repertoire as the characters used in the DSSSL trees. All
comparisons, such as matching an element name, are performed by
comparing these characters rather than using the coded characters of
the original SGML document.
A sequence of characters in the input grove may be converted by a
transformation process into another sequence under the control of a
character-to-character map. This technique is typically used when
parts of the source document contain transliterated text.
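
As an illustration only, a sequence-to-sequence mapping of this kind might be sketched in Python as follows. The map contents (a toy Latin-to-Greek transliteration table), the function name `apply_char_map`, and the longest-match strategy are assumptions made for this example; DSSSL's own facility for declaring such maps is not shown here.

```python
# Hypothetical map from transliterated Latin sequences to Greek characters.
CHAR_MAP = {"th": "θ", "e": "ε", "a": "α"}


def apply_char_map(chars, char_map):
    """Rewrite chars using char_map, preferring the longest matching key."""
    max_len = max((len(key) for key in char_map), default=1)
    out = []
    i = 0
    while i < len(chars):
        # Try the longest candidate sequence first, then shorter ones.
        for n in range(min(max_len, len(chars) - i), 0, -1):
            key = chars[i:i + n]
            if key in char_map:
                out.append(char_map[key])
                i += n
                break
        else:
            # Assumption of this sketch: unmapped characters pass through.
            out.append(chars[i])
            i += 1
    return "".join(out)
```

With this table, `apply_char_map("thea", CHAR_MAP)` returns `"θεα"`, while a character absent from the map, such as `"t"` on its own, is left unchanged.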
The characters in the input grove to the formatter are transformed
into glyph identifiers during the formatting process. The
transformation is controlled by character-to-glyph and
ligature-to-glyph maps in which one or more characters are mapped into
one or more glyph identifiers. The map to be used is not fixed for a
document, but is expressed as a formatting characteristic that may be
specified for an area or for a portion of the input grove. Ligatures
are specified by mapping more than one character to a single glyph.
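
The following Python sketch illustrates the idea of a selectable character-to-glyph map, under stated assumptions. The glyph identifiers (`glyph-f`, `glyph-fi`, and so on), the two example maps, and the function `to_glyphs` are hypothetical; they are not DSSSL syntax. A ligature appears simply as a key of more than one character whose value is a single glyph identifier.

```python
# Hypothetical maps: tuple of characters -> list of glyph identifiers.
PLAIN_MAP = {
    ("f",): ["glyph-f"],
    ("i",): ["glyph-i"],
    ("n",): ["glyph-n"],
    ("\u00e9",): ["glyph-e", "glyph-acute"],  # one character, two glyphs
}

# The same map extended with a ligature: two characters, one glyph.
LIGATURE_MAP = dict(PLAIN_MAP)
LIGATURE_MAP[("f", "i")] = ["glyph-fi"]


def to_glyphs(chars, glyph_map):
    """Convert chars to glyph identifiers, preferring the longest match."""
    max_len = max((len(key) for key in glyph_map), default=1)
    glyphs = []
    i = 0
    while i < len(chars):
        for n in range(min(max_len, len(chars) - i), 0, -1):
            key = tuple(chars[i:i + n])
            if key in glyph_map:
                glyphs.extend(glyph_map[key])
                i += n
                break
        else:
            raise KeyError(f"no glyph mapping for character {chars[i]!r}")
    return glyphs
```

Because the map is passed as a parameter, the same characters can be rendered differently in different portions of a document: `to_glyphs("fin", LIGATURE_MAP)` gives `["glyph-fi", "glyph-n"]`, whereas `to_glyphs("fin", PLAIN_MAP)` gives `["glyph-f", "glyph-i", "glyph-n"]`.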
Additional properties specify the font to be used. This
information, together with the glyph identifier, selects an actual
shape to be used in rendering. Hyphenation points are determined based
on the characters, but width calculations are based on the metrics of
the actual rendering shapes (i.e., based on the glyphs).
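
To make the last point concrete, the sketch below computes a width from glyph metrics rather than from characters. The font name, glyph identifiers, and advance widths are invented for illustration, and `run_width` is a hypothetical helper; real metrics would come from the font resource.

```python
# Hypothetical advance widths, keyed by (font, glyph identifier).
METRICS = {
    ("Hypothetical-Roman", "glyph-f"): 333,
    ("Hypothetical-Roman", "glyph-i"): 278,
    ("Hypothetical-Roman", "glyph-fi"): 556,
}


def run_width(font, glyph_ids, metrics):
    """Sum the advance widths of the shapes selected by font and glyph id."""
    return sum(metrics[(font, glyph_id)] for glyph_id in glyph_ids)
```

With these invented figures, the characters "f" and "i" occupy 611 units when rendered as two separate glyphs but 556 units when rendered as a single ligature glyph, which is why width cannot be derived from the character sequence alone.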