Handling of the SGML declaration in SP

Extended Naming Rules

SP supports the Extended Naming Rules as specified in Annex J of ISO 8879:1986 (added by the 1996 technical corrigendum).

Default SGML declaration

If the SGML declaration is omitted and there is no applicable SGMLDECL entry in a catalog, the following declaration will be implied:

                    <!SGML "ISO 8879:1986"
                            CHARSET
BASESET  "ISO 646-1983//CHARSET
          International Reference Version (IRV)//ESC 2/5 4/0"
DESCSET    0  9 UNUSED
           9  2  9
          11  2 UNUSED
          13  1 13
          14 18 UNUSED
          32 95 32
         127  1 UNUSED
CAPACITY PUBLIC    "ISO 8879:1986//CAPACITY Reference//EN"
SCOPE    DOCUMENT
SYNTAX
SHUNCHAR CONTROLS 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17
         18 19 20 21 22 23 24 25 26 27 28 29 30 31 127 255
BASESET  "ISO 646-1983//CHARSET International Reference Version
          (IRV)//ESC 2/5 4/0"
DESCSET  0 128 0
FUNCTION RE                    13
         RS                    10
         SPACE                 32
         TAB       SEPCHAR     9
NAMING   LCNMSTRT  ""
         UCNMSTRT  ""
         LCNMCHAR  "-."
         UCNMCHAR  "-."
         NAMECASE  GENERAL     YES
                   ENTITY      NO
DELIM    GENERAL   SGMLREF
         SHORTREF  SGMLREF
NAMES    SGMLREF
QUANTITY SGMLREF
         ATTCNT    99999999
         ATTSPLEN  99999999
         DTEMPLEN  24000
         ENTLVL    99999999
         GRPCNT    99999999
         GRPGTCNT  99999999
         GRPLVL    99999999
         LITLEN    24000
         NAMELEN   99999999
         PILEN     24000
         TAGLEN    99999999
         TAGLVL    99999999
                           FEATURES
MINIMIZE DATATAG   NO
         OMITTAG   YES
         RANK      YES
         SHORTTAG  YES
LINK     SIMPLE    YES 1000
         IMPLICIT  YES
         EXPLICIT  YES 1
OTHER    CONCUR    NO
         SUBDOC    YES 99999999
         FORMAL    YES
                          APPINFO NONE>

with the exception that all characters that are neither significant nor shunned will be assigned to DATACHAR.
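
To see which character numbers the CHARSET part of the implied declaration actually describes, the DESCSET triples can be expanded mechanically. The following Python sketch is illustrative only and is not part of SP; the names DESCSET and expand_descset are hypothetical, and UNUSED is represented as None.

```python
# Illustrative only: expands the CHARSET DESCSET triples of the implied
# declaration. Not part of SP; all names here are hypothetical.
DESCSET = [
    (0, 9, None),     # 0..8     UNUSED
    (9, 2, 9),        # 9..10    -> 9..10
    (11, 2, None),    # 11..12   UNUSED
    (13, 1, 13),      # 13       -> 13
    (14, 18, None),   # 14..31   UNUSED
    (32, 95, 32),     # 32..126  -> 32..126
    (127, 1, None),   # 127      UNUSED
]


def expand_descset(triples):
    """Map each described character number to its base-set number.

    A value of None means the character is declared UNUSED.
    """
    mapping = {}
    for start, count, base in triples:
        for offset in range(count):
            mapping[start + offset] = None if base is None else base + offset
    return mapping


mapping = expand_descset(DESCSET)
# Character numbers that are actually described (not UNUSED).
used = sorted(n for n, base in mapping.items() if base is not None)
```

Under these assumptions, used contains 9, 10, 13 and 32 through 126; the triples together cover all 128 character numbers from 0 to 127.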

Character sets

A character in a base character set is described either by giving its number in a universal character set, or by specifying a minimum literal. The first 65536 character numbers in the universal character set are assumed to be the same as in Unicode 2.0 (ISO/IEC 10646). The remaining character numbers can be assigned in any way convenient.

The public identifier of a base character set can be associated with an entity that describes it by using a PUBLIC entry in the catalog entry file. The entity must be a fragment of an SGML declaration consisting of the portion of a character set description that follows the DESCSET keyword. That is, it must be a sequence of character descriptions, where each character description specifies a described character number, the number of characters, and either a character number in the universal character set, a minimum literal, or the keyword UNUSED. Character numbers in the universal character set can be as large as 99999999.
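
The shape of such an entity (a sequence of character descriptions) can be illustrated with a small reader. This is a rough Python sketch, not SP's actual parser: the function name parse_char_descriptions is hypothetical, the sample fragment and its minimum literal are invented for illustration, and SGML comments and literal validation are deliberately ignored.

```python
import re

# Hypothetical helper, not SP's parser. A token is either a quoted
# minimum literal (which may contain spaces) or a run of non-blanks.
TOKEN = re.compile(r'"[^"]*"|\'[^\']*\'|\S+')


def parse_char_descriptions(text):
    """Return (described_number, count, target) triples.

    target is an int (universal character number), a str (minimum
    literal with its quotes removed) or None (the keyword UNUSED).
    """
    tokens = TOKEN.findall(text)
    if len(tokens) % 3 != 0:
        raise ValueError("expected groups of three tokens")
    result = []
    for i in range(0, len(tokens), 3):
        start, count, target = tokens[i:i + 3]
        if target.upper() == "UNUSED":
            value = None
        elif target[0] in "\"'":
            value = target[1:-1]
        else:
            value = int(target)
        result.append((int(start), int(count), value))
    return result


# Invented sample fragment: what might follow the DESCSET keyword.
fragment = """
  0  9 UNUSED
  9  2 9
 11  1 "an example minimum literal"
"""
descriptions = parse_char_descriptions(fragment)
```

Each resulting triple corresponds to one character description: the described character number, the number of characters, and the target.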

In addition, SP has built-in knowledge of many character sets. These are identified by the designating sequence in the public identifier. The following designating sequences are recognized:

ESC 2/5 4/0
The full set of ISO 646 IRV. This is not a registered character set, but is recommended by ISO 8879 (clause 10.2.2.4).
ESC 2/8 4/0
G0 set of ISO 646 IRV, ISO Registration Number 2.
ESC 2/8 4/2
G0 set of ASCII, ISO Registration Number 6.
ESC 2/1 4/0
C0 set of ISO 646, ISO Registration Number 1.
ESC 2/13 4/1
G1 set of ISO 8859-1
ESC 2/13 4/2
G1 set of ISO 8859-2
ESC 2/13 4/3
G1 set of ISO 8859-3
ESC 2/13 4/4
G1 set of ISO 8859-4
ESC 2/13 4/12
G1 set of ISO 8859-5
ESC 2/13 4/7
G1 set of ISO 8859-6
ESC 2/13 4/6
G1 set of ISO 8859-7
ESC 2/13 4/8
G1 set of ISO 8859-8
ESC 2/13 4/13
G1 set of ISO 8859-9
ESC 2/8 4/10
Roman set from JIS X 0201. JIS version of ISO 646. ISO Registration Number 14.
ESC 2/8 4/9
Katakana set from JIS X 0201. ISO Registration Number 13.
ESC 2/4 4/2
ESC 2/6 4/0 ESC 2/4 4/2
JIS X 0208-1990. ISO Registration Numbers 87 and 168.
ESC 2/4 2/8 4/4
JIS X 0212-1990. ISO Registration Number 159.
ESC 2/4 4/1
GB 2312-80. ISO Registration Number 58.
ESC 2/4 2/8 4/3
KS C 5601-1987. ISO Registration Number 149.
ESC 2/5 2/15 4/0
ESC 2/5 2/15 4/3
ESC 2/5 2/15 4/5
ISO/IEC 10646 UCS-2
ESC 2/5 2/15 4/1
ESC 2/5 2/15 4/4
ESC 2/5 2/15 4/6
ISO/IEC 10646 UCS-4
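
The designating sequences above are written in column/row notation, where x/y denotes the byte with value 16*x + y and ESC is the escape character (1/11). As a rough illustration, the following Python sketch converts a sequence in this notation to the bytes it denotes; the function name designator_bytes is hypothetical and this is not how SP itself is implemented.

```python
def designator_bytes(sequence):
    """Convert a designating sequence such as 'ESC 2/13 4/1' to bytes.

    Each column/row token x/y stands for the byte 16*x + y; the token
    ESC stands for the escape character, 0x1B.
    """
    out = bytearray()
    for token in sequence.split():
        if token == "ESC":
            out.append(0x1B)
        else:
            column, row = token.split("/")
            out.append(int(column) * 16 + int(row))
    return bytes(out)


# G1 set of ISO 8859-1: ESC, '-', 'A'
latin1_g1 = designator_bytes("ESC 2/13 4/1")
# Full set of ISO 646 IRV: ESC, '%', '@'
irv_full = designator_bytes("ESC 2/5 4/0")
```

For example, 2/13 is byte 45 (the hyphen) and 4/1 is byte 65 (the letter A), so the designator for the G1 set of ISO 8859-1 is ESC followed by "-A".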

Concrete syntaxes

The public identifier for a public concrete syntax can be associated with an entity that describes it by using a PUBLIC entry in the catalog entry file. The entity must be a fragment of an SGML declaration consisting of a concrete syntax description starting with the SHUNCHAR keyword, as in an SGML declaration. The entity can also make use of the following extensions:

Capacity sets

The public identifier for a public capacity set can be associated with an entity that describes it by using a PUBLIC entry in the catalog entry file. The entity must be a fragment of an SGML declaration consisting of a sequence of capacity names and numbers.

James Clark
jjc@jclark.com