Class com.jclark.xml.tok.Encoding
java.lang.Object
|
+----com.jclark.xml.tok.Encoding
- public abstract class Encoding
- extends Object
An Encoding object corresponds to a possible encoding (a mapping from characters to sequences of bytes).
It provides operations on byte arrays that represent all or part of a parsed XML entity in that encoding.
The ASCII characters other than $@\^`{}~ have a special status; these are called XML significant characters.
This class imposes certain restrictions on an encoding:
- the encoding must be stateless;
- a single byte must not encode more than one character;
- all XML significant characters must be encoded by the same number
of bytes, and no character may be encoded by fewer bytes.
Several methods operate on byte subarrays. A subarray is specified by a byte array buf and two integers, off and end; off gives the index in buf of the first byte of the subarray and end gives the index in buf of the byte immediately after the last byte.
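The half-open convention described above means that a subarray holds end - off bytes and that off == end denotes an empty subarray. A minimal sketch of the convention, using a hypothetical helper class (SubarrayDemo is an illustration only and is not part of this package):

```java
// Hypothetical helper illustrating the (buf, off, end) convention.
// Not part of com.jclark.xml.tok.
class SubarrayDemo {
    // Number of bytes in the subarray; end is exclusive.
    static int length(byte[] buf, int off, int end) {
        if (off < 0 || end > buf.length || off > end) {
            throw new IndexOutOfBoundsException("off=" + off + ", end=" + end);
        }
        return end - off;
    }

    // Copies the subarray into a fresh array, mainly to make the bounds visible.
    static byte[] copy(byte[] buf, int off, int end) {
        byte[] out = new byte[length(buf, off, end)];
        System.arraycopy(buf, off, out, 0, out.length);
        return out;
    }

    public static void main(String[] args) {
        byte[] buf = "<doc/>".getBytes(java.nio.charset.StandardCharsets.US_ASCII);
        // The name "doc" occupies indexes 1, 2 and 3, so off = 1 and end = 4.
        byte[] name = copy(buf, 1, 4);
        System.out.println(new String(name, java.nio.charset.StandardCharsets.US_ASCII));
    }
}
```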
Use the getInitialEncoding method to get an Encoding object to use to start parsing an entity.
The main operations provided by Encoding are tokenizeProlog, tokenizeContent and tokenizeCdataSection; these are used to divide up an XML entity into tokens.
tokenizeProlog is used for the prolog of an XML document as well as for the external subset and parameter entities (except when referenced in an EntityValue); it can also be used for parsing the Misc* that follows the document element.
tokenizeContent is used for the document element and for parsed general entities that are referenced in content except for CDATA sections.
tokenizeCdataSection is used for CDATA sections, following the <![CDATA[ up to and including the ]]>.
tokenizeAttributeValue and tokenizeEntityValue are used to further divide up tokens returned by tokenizeProlog and tokenizeContent; they are also used to divide up entities referenced in attribute values or entity values.
-
TOK_ATTRIBUTE_VALUE_S
- Represents a white space character in an attribute value,
excluding white space characters that are part of line boundaries.
-
TOK_CDATA_SECT_CLOSE
- Represents the end of a CDATA section
]]>
.
-
TOK_CDATA_SECT_OPEN
- Represents the start of a CDATA section
<![CDATA[
.
-
TOK_CHAR_PAIR_REF
- Represents a numeric character reference (decimal or hexadecimal),
when the referenced character is greater than 0xFFFF and so is
represented by a pair of chars.
-
TOK_CHAR_REF
- Represents a numeric character reference (decimal or hexadecimal),
when the referenced character is less than or equal to 0xFFFF
and so is represented by a single char.
-
TOK_CLOSE_BRACKET
- Represents
]
in the prolog.
-
TOK_CLOSE_PAREN
- Represents a
)
in the prolog that is not
followed immediately by any of
*
, +
or ?
.
-
TOK_CLOSE_PAREN_ASTERISK
- Represents
)*
in the prolog.
-
TOK_CLOSE_PAREN_PLUS
- Represents
)+
in the prolog.
-
TOK_CLOSE_PAREN_QUESTION
- Represents
)?
in the prolog.
-
TOK_COMMA
- Represents
,
in the prolog.
-
TOK_COMMENT
- Represents a comment
<!-- comment -->
.
-
TOK_COND_SECT_CLOSE
- Represents
]]>
in the prolog.
-
TOK_COND_SECT_OPEN
- Represents
<![
in the prolog.
-
TOK_DATA_CHARS
- Represents one or more characters of data.
-
TOK_DATA_NEWLINE
- Represents a newline (CR, LF or CR followed by LF) in data.
-
TOK_DECL_CLOSE
- Represents
>
in the prolog.
-
TOK_DECL_OPEN
- Represents
<!NAME
in the prolog.
-
TOK_EMPTY_ELEMENT_NO_ATTS
- Represents an empty element tag
<name/>
,
that doesn't have any attribute specifications.
-
TOK_EMPTY_ELEMENT_WITH_ATTS
- Represents an empty element tag
<name att="val"/>
,
that contains one or more attribute specifications.
-
TOK_END_TAG
- Represents a complete end-tag
</name>
.
-
TOK_ENTITY_REF
- Represents a general entity reference.
-
TOK_LITERAL
- Represents a literal (EntityValue, AttValue, SystemLiteral or
PubidLiteral).
-
TOK_MAGIC_ENTITY_REF
- Represents a general entity reference to one of the five predefined entities amp, lt, gt, quot, apos.
-
TOK_NAME
- Represents a name in the prolog.
-
TOK_NAME_ASTERISK
- Represents a name followed immediately by
*
.
-
TOK_NAME_PLUS
- Represents a name followed immediately by
+
.
-
TOK_NAME_QUESTION
- Represents a name followed immediately by
?
.
-
TOK_NMTOKEN
- Represents a name token in the prolog that is not a name.
-
TOK_OPEN_BRACKET
- Represents
[
in the prolog.
-
TOK_OPEN_PAREN
- Represents a
(
in the prolog.
-
TOK_OR
- Represents
|
in the prolog.
-
TOK_PARAM_ENTITY_REF
- Represents a parameter entity reference in the prolog.
-
TOK_PERCENT
- Represents a
%
in the prolog that does not start
a parameter entity reference.
-
TOK_PI
- Represents a processing instruction.
-
TOK_POUND_NAME
- Represents
#NAME
in the prolog.
-
TOK_PROLOG_S
- Represents whitespace in the prolog.
-
TOK_START_TAG_NO_ATTS
- Represents a complete start-tag
<name>
,
that doesn't have any attribute specifications.
-
TOK_START_TAG_WITH_ATTS
- Represents a complete start-tag
<name att="val">
,
that contains one or more attribute specifications.
-
TOK_XML_DECL
- Represents an XML declaration or text declaration (a processing
instruction whose target is
xml
).
-
convert(byte[], int, int, char[], int)
- Convert bytes to characters.
-
getEncoding(String)
- Returns an Encoding corresponding to the specified IANA character set name.
-
getFixedBytesPerChar()
- Returns the number of bytes required to represent each char, or zero if different chars are represented by different numbers of bytes.
-
getInitialEncoding(byte[], int, int, Token)
- Returns an encoding object to be used to start parsing an external entity.
-
getInternalEncoding()
- Returns an
Encoding
object for use with internal entities.
-
getMinBytesPerChar()
- Returns the minimum number of bytes required to represent a single
character in this encoding.
-
getPublicId(byte[], int, int)
- Checks that a literal contained in the specified byte subarray
is a legal public identifier and returns a string with
the normalized content of the public id.
-
getSingleByteEncoding(String)
- Returns an
Encoding
for entities encoded with
a single-byte encoding (an encoding in which each byte represents
exactly one character).
-
matchesXMLString(byte[], int, int, String)
- Returns true if the specified byte subarray is equal to the string.
-
movePosition(byte[], int, int, Position)
- Moves a position forward.
-
skipIgnoreSect(byte[], int, int)
- Skips over an ignored conditional section.
-
skipS(byte[], int, int)
- Skips over XML whitespace characters at the start of the specified
subarray.
-
tokenizeAttributeValue(byte[], int, int, Token)
- Scans the first token of a byte subarray that contains part of a
literal attribute value.
-
tokenizeCdataSection(byte[], int, int, Token)
- Scans the first token of a byte subarray that starts with the
content of a CDATA section.
-
tokenizeContent(byte[], int, int, ContentToken)
- Scans the first token of a byte subarray that contains content.
-
tokenizeEntityValue(byte[], int, int, Token)
- Scans the first token of a byte subarray that contains part of a
literal entity value.
-
tokenizeProlog(byte[], int, int, Token)
- Scans the first token of a byte subarray that contains part of a
prolog.
TOK_DATA_CHARS
public static final int TOK_DATA_CHARS
- Represents one or more characters of data.
TOK_DATA_NEWLINE
public static final int TOK_DATA_NEWLINE
- Represents a newline (CR, LF or CR followed by LF) in data.
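The three newline forms listed above can be recognized by looking at most one byte ahead, which is also why a carriage return at the very end of a subarray is ambiguous: a line feed might still follow. The sketch below assumes a single-byte, ASCII-compatible encoding; NewlineDemo and NEED_MORE are hypothetical names, and this is not the library's implementation.

```java
// Hypothetical sketch for a single-byte, ASCII-compatible encoding.
// Not the library's implementation.
class NewlineDemo {
    // Returned when a trailing CR might still be followed by an LF.
    static final int NEED_MORE = -1;

    // Returns the number of bytes in the newline starting at off,
    // 0 if the byte at off does not start a newline,
    // or NEED_MORE for a CR that is the last byte of the subarray.
    static int newlineLength(byte[] buf, int off, int end) {
        if (off >= end) return 0;
        if (buf[off] == '\n') return 1;
        if (buf[off] != '\r') return 0;
        if (off + 1 == end) return NEED_MORE;
        return buf[off + 1] == '\n' ? 2 : 1;
    }

    public static void main(String[] args) {
        byte[] data = {'a', '\r', '\n', 'b'};
        System.out.println(newlineLength(data, 1, data.length)); // CR LF seen together
        System.out.println(newlineLength(data, 1, 2));           // CR alone at the end
    }
}
```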
TOK_START_TAG_NO_ATTS
public static final int TOK_START_TAG_NO_ATTS
- Represents a complete start-tag
<name>
,
that doesn't have any attribute specifications.
TOK_START_TAG_WITH_ATTS
public static final int TOK_START_TAG_WITH_ATTS
- Represents a complete start-tag
<name att="val">
,
that contains one or more attribute specifications.
TOK_EMPTY_ELEMENT_NO_ATTS
public static final int TOK_EMPTY_ELEMENT_NO_ATTS
- Represents an empty element tag
<name/>
,
that doesn't have any attribute specifications.
TOK_EMPTY_ELEMENT_WITH_ATTS
public static final int TOK_EMPTY_ELEMENT_WITH_ATTS
- Represents an empty element tag
<name att="val"/>
,
that contains one or more attribute specifications.
TOK_END_TAG
public static final int TOK_END_TAG
- Represents a complete end-tag
</name>
.
TOK_CDATA_SECT_OPEN
public static final int TOK_CDATA_SECT_OPEN
- Represents the start of a CDATA section
<![CDATA[
.
TOK_CDATA_SECT_CLOSE
public static final int TOK_CDATA_SECT_CLOSE
- Represents the end of a CDATA section
]]>
.
TOK_ENTITY_REF
public static final int TOK_ENTITY_REF
- Represents a general entity reference.
TOK_MAGIC_ENTITY_REF
public static final int TOK_MAGIC_ENTITY_REF
- Represents a general entity reference to one of the five predefined entities amp, lt, gt, quot, apos.
TOK_CHAR_REF
public static final int TOK_CHAR_REF
- Represents a numeric character reference (decimal or hexadecimal),
when the referenced character is less than or equal to 0xFFFF
and so is represented by a single char.
TOK_CHAR_PAIR_REF
public static final int TOK_CHAR_PAIR_REF
- Represents a numeric character reference (decimal or hexadecimal),
when the referenced character is greater than 0xFFFF and so is
represented by a pair of chars.
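The split between TOK_CHAR_REF and TOK_CHAR_PAIR_REF follows from Java's 16-bit char type: code points above 0xFFFF are held as a surrogate pair. The sketch below shows the distinction using only java.lang.Character; CharRefDemo is a hypothetical name and does not call into this package.

```java
// Illustrates why a character reference above 0xFFFF needs two Java chars.
// CharRefDemo is a hypothetical name; only java.lang.Character is used.
class CharRefDemo {
    // Returns the char(s) for a referenced code point: one char up to 0xFFFF,
    // a surrogate pair above it.
    static char[] charsFor(int codePoint) {
        if (!Character.isValidCodePoint(codePoint)) {
            throw new IllegalArgumentException("not a Unicode code point: " + codePoint);
        }
        return Character.toChars(codePoint);
    }

    public static void main(String[] args) {
        System.out.println(charsFor(0x00E9).length);  // e.g. &#xE9; fits in a single char
        System.out.println(charsFor(0x1F600).length); // e.g. &#x1F600; needs a pair of chars
    }
}
```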
TOK_PI
public static final int TOK_PI
- Represents a processing instruction.
TOK_XML_DECL
public static final int TOK_XML_DECL
- Represents an XML declaration or text declaration (a processing
instruction whose target is
xml
).
TOK_COMMENT
public static final int TOK_COMMENT
- Represents a comment
<!-- comment -->
.
This can occur both in the prolog and in content.
TOK_ATTRIBUTE_VALUE_S
public static final int TOK_ATTRIBUTE_VALUE_S
- Represents a white space character in an attribute value,
excluding white space characters that are part of line boundaries.
TOK_PARAM_ENTITY_REF
public static final int TOK_PARAM_ENTITY_REF
- Represents a parameter entity reference in the prolog.
TOK_PROLOG_S
public static final int TOK_PROLOG_S
- Represents whitespace in the prolog.
The token contains one or more whitespace characters.
TOK_DECL_OPEN
public static final int TOK_DECL_OPEN
- Represents
<!NAME
in the prolog.
TOK_DECL_CLOSE
public static final int TOK_DECL_CLOSE
- Represents
>
in the prolog.
TOK_NAME
public static final int TOK_NAME
- Represents a name in the prolog.
TOK_NMTOKEN
public static final int TOK_NMTOKEN
- Represents a name token in the prolog that is not a name.
TOK_POUND_NAME
public static final int TOK_POUND_NAME
- Represents
#NAME
in the prolog.
TOK_OR
public static final int TOK_OR
- Represents
|
in the prolog.
TOK_PERCENT
public static final int TOK_PERCENT
- Represents a
%
in the prolog that does not start
a parameter entity reference.
This can occur in an entity declaration.
TOK_OPEN_PAREN
public static final int TOK_OPEN_PAREN
- Represents a
(
in the prolog.
TOK_CLOSE_PAREN
public static final int TOK_CLOSE_PAREN
- Represents a
)
in the prolog that is not
followed immediately by any of
*
, +
or ?
.
TOK_OPEN_BRACKET
public static final int TOK_OPEN_BRACKET
- Represents
[
in the prolog.
TOK_CLOSE_BRACKET
public static final int TOK_CLOSE_BRACKET
- Represents
]
in the prolog.
TOK_LITERAL
public static final int TOK_LITERAL
- Represents a literal (EntityValue, AttValue, SystemLiteral or
PubidLiteral).
TOK_NAME_QUESTION
public static final int TOK_NAME_QUESTION
- Represents a name followed immediately by
?
.
TOK_NAME_ASTERISK
public static final int TOK_NAME_ASTERISK
- Represents a name followed immediately by
*
.
TOK_NAME_PLUS
public static final int TOK_NAME_PLUS
- Represents a name followed immediately by
+
.
TOK_COND_SECT_OPEN
public static final int TOK_COND_SECT_OPEN
- Represents
<![
in the prolog.
TOK_COND_SECT_CLOSE
public static final int TOK_COND_SECT_CLOSE
- Represents
]]>
in the prolog.
TOK_CLOSE_PAREN_QUESTION
public static final int TOK_CLOSE_PAREN_QUESTION
- Represents
)?
in the prolog.
TOK_CLOSE_PAREN_ASTERISK
public static final int TOK_CLOSE_PAREN_ASTERISK
- Represents
)*
in the prolog.
TOK_CLOSE_PAREN_PLUS
public static final int TOK_CLOSE_PAREN_PLUS
- Represents
)+
in the prolog.
TOK_COMMA
public static final int TOK_COMMA
- Represents
,
in the prolog.
convert
public abstract int convert(byte sourceBuf[],
int sourceStart,
int sourceEnd,
char targetBuf[],
int targetStart)
- Convert bytes to characters.
The bytes in sourceBuf between sourceStart and sourceEnd are converted to characters and stored in targetBuf starting at targetStart.
(targetBuf.length - targetStart) * getMinBytesPerChar() must be greater than or equal to sourceEnd - sourceStart.
If getFixedBytesPerChar returns a value greater than 0, then the return value will be equal to (sourceEnd - sourceStart)/getFixedBytesPerChar().
- Returns:
- the number of characters stored into
targetBuf
- See Also:
- getFixedBytesPerChar
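The buffer-size precondition for convert can be turned into a capacity calculation: the target needs at least the ceiling of (sourceEnd - sourceStart) / getMinBytesPerChar() free chars. The sketch below uses a hypothetical stand-in for a fixed-width, two-bytes-per-char big-endian encoding; ConvertDemo, requiredChars and BYTES_PER_CHAR are illustrative names, not part of this class.

```java
// Hypothetical stand-in for a fixed-width, 2-bytes-per-char big-endian encoding.
// Not the library's implementation.
class ConvertDemo {
    // Plays the role of both getMinBytesPerChar() and getFixedBytesPerChar() here.
    static final int BYTES_PER_CHAR = 2;

    // Smallest number of free chars in the target that satisfies the precondition.
    static int requiredChars(int sourceStart, int sourceEnd, int minBytesPerChar) {
        int n = sourceEnd - sourceStart;
        return (n + minBytesPerChar - 1) / minBytesPerChar; // ceiling division
    }

    static int convert(byte[] sourceBuf, int sourceStart, int sourceEnd,
                       char[] targetBuf, int targetStart) {
        if ((targetBuf.length - targetStart) * BYTES_PER_CHAR < sourceEnd - sourceStart) {
            throw new IllegalArgumentException("target buffer too small");
        }
        int t = targetStart;
        for (int s = sourceStart; s + 1 < sourceEnd; s += BYTES_PER_CHAR) {
            // High byte first, then low byte.
            targetBuf[t++] = (char) (((sourceBuf[s] & 0xFF) << 8) | (sourceBuf[s + 1] & 0xFF));
        }
        return t - targetStart; // number of chars stored
    }

    public static void main(String[] args) {
        byte[] src = {0, 'h', 0, 'i'};
        char[] dst = new char[requiredChars(0, src.length, BYTES_PER_CHAR)];
        int n = convert(src, 0, src.length, dst, 0);
        System.out.println(new String(dst, 0, n));
    }
}
```

With a fixed width of 2, the return value equals (sourceEnd - sourceStart) / 2, matching the rule stated for getFixedBytesPerChar.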
getFixedBytesPerChar
public abstract int getFixedBytesPerChar()
- Returns the number of bytes required to represent each char, or zero if different chars are represented by different numbers of bytes. The value returned will be 0, 1, 2, or 4.
movePosition
public abstract void movePosition(byte buf[],
int off,
int end,
Position pos)
- Moves a position forward.
On entry, pos gives the position of the byte at index off in buf.
On exit, pos will give the position of the byte at index end, which must be greater than or equal to off.
The bytes between off and end must encode one or more complete characters.
A carriage return followed by a line feed will be treated as a single line delimiter provided that they are given to movePosition together.
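The remark about carriage return and line feed can be seen in a small line and column tracker. The sketch below assumes a single-byte, ASCII-compatible encoding and a hypothetical Pos holder with a 1-based line and a 0-based column; the real Position class and its numbering are not described in this page, so treat both as assumptions.

```java
// Hypothetical sketch of position tracking; not the library's implementation.
class PositionDemo {
    // Stand-in for a position: 1-based line, 0-based column (an assumption).
    static final class Pos {
        int line = 1;
        int column = 0;
    }

    // Advances pos over the bytes in [off, end).
    static void movePosition(byte[] buf, int off, int end, Pos pos) {
        while (off < end) {
            byte b = buf[off++];
            if (b == '\n') {
                pos.line++;
                pos.column = 0;
            } else if (b == '\r') {
                pos.line++;
                pos.column = 0;
                // CR LF counts once, but only if the LF is inside this subarray.
                if (off < end && buf[off] == '\n') off++;
            } else {
                pos.column++;
            }
        }
    }

    public static void main(String[] args) {
        byte[] text = "a\r\nb".getBytes(java.nio.charset.StandardCharsets.US_ASCII);
        Pos together = new Pos();
        movePosition(text, 0, text.length, together);
        System.out.println(together.line + ":" + together.column);
    }
}
```

If the carriage return and the line feed arrive in separate calls, this sketch counts two line delimiters, which is the situation the caveat above warns about.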
tokenizeCdataSection
public final int tokenizeCdataSection(byte buf[],
int off,
int end,
Token token) throws EmptyTokenException, PartialTokenException, InvalidTokenException, ExtensibleTokenException
- Scans the first token of a byte subarray that starts with the
content of a CDATA section.
Returns one of the following integers according to the type of token
that the subarray starts with:
TOK_DATA_CHARS
TOK_DATA_NEWLINE
TOK_CDATA_SECT_CLOSE
Information about the token is stored in token.
After TOK_CDATA_SECT_CLOSE is returned, the application should use tokenizeContent.
- Throws: EmptyTokenException
- if the subarray is empty
- Throws: PartialTokenException
- if the subarray contains only part of
a legal token
- Throws: InvalidTokenException
- if the subarray does not start
with a legal token or part of one
- Throws: ExtensibleTokenException
- if the subarray encodes just a carriage
return ('\r')
- See Also:
- TOK_DATA_CHARS, TOK_DATA_NEWLINE, TOK_CDATA_SECT_CLOSE, Token, EmptyTokenException, PartialTokenException, InvalidTokenException, ExtensibleTokenException, tokenizeContent
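The calling pattern described above (scan one token, advance to the token's end, repeat until the section closes) can be sketched with a simplified stand-in tokenizer. Everything in the block is hypothetical: CdataLoopDemo, Tok and the integer constants only mimic the shape of the API, the encoding is assumed to be single-byte and ASCII-compatible, and partial or invalid input is not handled the way the real method handles it.

```java
// Simplified stand-in for the CDATA tokenizing loop; not the library's implementation.
class CdataLoopDemo {
    static final int DATA_CHARS = 0, DATA_NEWLINE = 1, CDATA_SECT_CLOSE = 2;
    static final String[] NAMES = {"DATA_CHARS", "DATA_NEWLINE", "CDATA_SECT_CLOSE"};

    // Stand-in for a token object that records where the token ends.
    static final class Tok {
        int tokenEnd;
    }

    static boolean isClose(byte[] buf, int i, int end) {
        return end - i >= 3 && buf[i] == ']' && buf[i + 1] == ']' && buf[i + 2] == '>';
    }

    // Scans the first token of [off, end) and stores its end index in tok.
    static int tokenize(byte[] buf, int off, int end, Tok tok) {
        if (off >= end) throw new IllegalArgumentException("empty subarray");
        byte b = buf[off];
        if (b == '\r' || b == '\n') {
            boolean crlf = b == '\r' && off + 1 < end && buf[off + 1] == '\n';
            tok.tokenEnd = off + (crlf ? 2 : 1);
            return DATA_NEWLINE;
        }
        if (isClose(buf, off, end)) {
            tok.tokenEnd = off + 3;
            return CDATA_SECT_CLOSE;
        }
        int i = off + 1;
        while (i < end && buf[i] != '\r' && buf[i] != '\n' && !isClose(buf, i, end)) i++;
        tok.tokenEnd = i;
        return DATA_CHARS;
    }

    // Calls tokenize repeatedly until the closing delimiter has been consumed.
    static String tokenTypes(byte[] buf, int off, int end) {
        StringBuilder sb = new StringBuilder();
        Tok tok = new Tok();
        int type;
        do {
            type = tokenize(buf, off, end, tok);
            if (sb.length() > 0) sb.append(' ');
            sb.append(NAMES[type]);
            off = tok.tokenEnd; // the next token starts where this one ended
        } while (type != CDATA_SECT_CLOSE);
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] buf = "a]b\nc]]>rest".getBytes(java.nio.charset.StandardCharsets.US_ASCII);
        System.out.println(tokenTypes(buf, 0, buf.length));
    }
}
```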
tokenizeContent
public final int tokenizeContent(byte buf[],
int off,
int end,
ContentToken token) throws PartialTokenException, InvalidTokenException, EmptyTokenException, ExtensibleTokenException
- Scans the first token of a byte subarray that contains content.
Returns one of the following integers according to the type of token
that the subarray starts with:
TOK_START_TAG_NO_ATTS
TOK_START_TAG_WITH_ATTS
TOK_EMPTY_ELEMENT_NO_ATTS
TOK_EMPTY_ELEMENT_WITH_ATTS
TOK_END_TAG
TOK_DATA_CHARS
TOK_DATA_NEWLINE
TOK_CDATA_SECT_OPEN
TOK_ENTITY_REF
TOK_MAGIC_ENTITY_REF
TOK_CHAR_REF
TOK_CHAR_PAIR_REF
TOK_PI
TOK_XML_DECL
TOK_COMMENT
Information about the token is stored in token.
When TOK_CDATA_SECT_OPEN is returned, tokenizeCdataSection should be called until it returns TOK_CDATA_SECT_CLOSE.
- Throws: EmptyTokenException
- if the subarray is empty
- Throws: PartialTokenException
- if the subarray contains only part of
a legal token
- Throws: InvalidTokenException
- if the subarray does not start
with a legal token or part of one
- Throws: ExtensibleTokenException
- if the subarray encodes just a carriage
return ('\r')
- See Also:
- TOK_START_TAG_NO_ATTS, TOK_START_TAG_WITH_ATTS, TOK_EMPTY_ELEMENT_NO_ATTS, TOK_EMPTY_ELEMENT_WITH_ATTS, TOK_END_TAG, TOK_DATA_CHARS, TOK_DATA_NEWLINE, TOK_CDATA_SECT_OPEN, TOK_ENTITY_REF, TOK_MAGIC_ENTITY_REF, TOK_CHAR_REF, TOK_CHAR_PAIR_REF, TOK_PI, TOK_XML_DECL, TOK_COMMENT, ContentToken, EmptyTokenException, PartialTokenException, InvalidTokenException, ExtensibleTokenException, tokenizeCdataSection
getInitialEncoding
public static final Encoding getInitialEncoding(byte buf[],
int off,
int end,
Token token)
- Returns an encoding object to be used to start parsing an external entity.
The encoding is chosen based on the initial 4 bytes of the entity.
- Parameters:
- buf - the byte array containing the initial bytes of the entity
- off - the index in buf of the first byte of the entity
- end - the index in buf following the last available byte of the entity; end - off must be greater than or equal to 4 unless the entity has fewer than 4 bytes, in which case it must be equal to the length of the entity
- token - receives information about the presence of a byte order mark; if the entity starts with a byte order mark then token.getTokenEnd() will return off + 2, otherwise it will return off
- See Also:
- TextDecl, XmlDecl, TOK_XML_DECL, getEncoding, getInternalEncoding
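The token parameter reports either off + 2 or off, which corresponds to the presence or absence of a two-byte UTF-16 byte order mark (FE FF or FF FE). The sketch below shows only that check; BomDemo and skipByteOrderMark are hypothetical names, and how the library actually chooses an encoding from the initial bytes is not covered here.

```java
// Hypothetical sketch of byte order mark detection; not the library's implementation.
class BomDemo {
    // Returns the index of the first byte after a UTF-16 byte order mark,
    // or off if the subarray does not start with one.
    static int skipByteOrderMark(byte[] buf, int off, int end) {
        if (end - off >= 2) {
            int b0 = buf[off] & 0xFF;
            int b1 = buf[off + 1] & 0xFF;
            boolean bigEndian = b0 == 0xFE && b1 == 0xFF;
            boolean littleEndian = b0 == 0xFF && b1 == 0xFE;
            if (bigEndian || littleEndian) return off + 2;
        }
        return off;
    }

    public static void main(String[] args) {
        byte[] withBom = {(byte) 0xFE, (byte) 0xFF, 0, '<'};
        byte[] withoutBom = {'<', '?', 'x', 'm'};
        System.out.println(skipByteOrderMark(withBom, 0, withBom.length));
        System.out.println(skipByteOrderMark(withoutBom, 0, withoutBom.length));
    }
}
```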
getEncoding
public final Encoding getEncoding(String name)
- Returns an Encoding corresponding to the specified IANA character set name.
Returns this Encoding if the name is null.
Returns null if the specified encoding is not supported.
Note that there are two distinct Encoding objects associated with the name UTF-16, one for each possible byte order; if this Encoding is UTF-16 with little-endian byte ordering, then getEncoding("UTF-16") will return this, otherwise it will return an Encoding for UTF-16 with big-endian byte ordering.
- Parameters:
- name - a string specifying the IANA name of the encoding; this is
case insensitive
getSingleByteEncoding
public final Encoding getSingleByteEncoding(String map)
- Returns an
Encoding
for entities encoded with
a single-byte encoding (an encoding in which each byte represents
exactly one character).
- Parameters:
- map - a string specifying the character represented by each byte; the string must have a length of 256; map.charAt(b) specifies the character encoded by byte b; bytes that do not represent any character should be mapped to ?
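A 256-character map of the kind described above can be built programmatically. The sketch below builds one for ISO-8859-1, where byte b simply encodes the character with code b, so no byte is left unmapped. SingleByteMapDemo and decode are hypothetical names; decode only shows how such a map is read, under the assumption that the byte is treated as an unsigned index, and is not the library's decoder.

```java
// Hypothetical sketch of building and reading a single-byte encoding map.
class SingleByteMapDemo {
    // Builds the 256-character map for ISO-8859-1: byte b encodes the char with code b.
    static String latin1Map() {
        StringBuilder sb = new StringBuilder(256);
        for (int b = 0; b < 256; b++) {
            sb.append((char) b);
        }
        return sb.toString();
    }

    // Reads the map the way the parameter description suggests: map.charAt(b),
    // with the byte treated as an unsigned value (an assumption of this sketch).
    static String decode(String map, byte[] buf, int off, int end) {
        StringBuilder sb = new StringBuilder(end - off);
        for (int i = off; i < end; i++) {
            sb.append(map.charAt(buf[i] & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String map = latin1Map();
        byte[] bytes = {'c', 'a', 'f', (byte) 0xE9};
        System.out.println(map.length());
        System.out.println(decode(map, bytes, 0, bytes.length).length());
    }
}
```

The resulting string is what would be passed as the map argument of getSingleByteEncoding.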
getInternalEncoding
public static final Encoding getInternalEncoding()
- Returns an
Encoding
object for use with internal entities.
This is a UTF-16 big endian encoding, except that newlines
are assumed to have been normalized into line feed,
so carriage return is treated like a space.
tokenizeProlog
public final int tokenizeProlog(byte buf[],
int off,
int end,
Token token) throws PartialTokenException, InvalidTokenException, EmptyTokenException, ExtensibleTokenException, EndOfPrologException
- Scans the first token of a byte subarray that contains part of a
prolog.
Returns one of the following integers according to the type of token
that the subarray starts with:
TOK_PI
TOK_XML_DECL
TOK_COMMENT
TOK_PARAM_ENTITY_REF
TOK_PROLOG_S
TOK_DECL_OPEN
TOK_DECL_CLOSE
TOK_NAME
TOK_NMTOKEN
TOK_POUND_NAME
TOK_OR
TOK_PERCENT
TOK_OPEN_PAREN
TOK_CLOSE_PAREN
TOK_OPEN_BRACKET
TOK_CLOSE_BRACKET
TOK_LITERAL
TOK_NAME_QUESTION
TOK_NAME_ASTERISK
TOK_NAME_PLUS
TOK_COND_SECT_OPEN
TOK_COND_SECT_CLOSE
TOK_CLOSE_PAREN_QUESTION
TOK_CLOSE_PAREN_ASTERISK
TOK_CLOSE_PAREN_PLUS
TOK_COMMA
- Throws: EmptyTokenException
- if the subarray is empty
- Throws: PartialTokenException
- if the subarray contains only part of
a legal token
- Throws: InvalidTokenException
- if the subarray does not start
with a legal token or part of one
- Throws: EndOfPrologException
- if the subarray starts with the document
element;
tokenizeContent
should be used on the remainder
of the entity
- Throws: ExtensibleTokenException
- if the subarray is a legal token
but subsequent bytes in the same entity could be part of the token
- See Also:
- TOK_PI, TOK_XML_DECL, TOK_COMMENT, TOK_PARAM_ENTITY_REF, TOK_PROLOG_S, TOK_DECL_OPEN, TOK_DECL_CLOSE, TOK_NAME, TOK_NMTOKEN, TOK_POUND_NAME, TOK_OR, TOK_PERCENT, TOK_OPEN_PAREN, TOK_CLOSE_PAREN, TOK_OPEN_BRACKET, TOK_CLOSE_BRACKET, TOK_LITERAL, TOK_NAME_QUESTION, TOK_NAME_ASTERISK, TOK_NAME_PLUS, TOK_COND_SECT_OPEN, TOK_COND_SECT_CLOSE, TOK_CLOSE_PAREN_QUESTION, TOK_CLOSE_PAREN_ASTERISK, TOK_CLOSE_PAREN_PLUS, TOK_COMMA, ContentToken, EmptyTokenException, PartialTokenException, InvalidTokenException, ExtensibleTokenException, EndOfPrologException
tokenizeAttributeValue
public final int tokenizeAttributeValue(byte buf[],
int off,
int end,
Token token) throws PartialTokenException, InvalidTokenException, EmptyTokenException, ExtensibleTokenException
- Scans the first token of a byte subarray that contains part of a
literal attribute value. The opening and closing delimiters
are not included in the subarray.
Returns one of the following integers according to the type of
token that the subarray starts with:
TOK_DATA_CHARS
TOK_DATA_NEWLINE
TOK_ATTRIBUTE_VALUE_S
TOK_MAGIC_ENTITY_REF
TOK_ENTITY_REF
TOK_CHAR_REF
TOK_CHAR_PAIR_REF
- Throws: EmptyTokenException
- if the subarray is empty
- Throws: PartialTokenException
- if the subarray contains only part of
a legal token
- Throws: InvalidTokenException
- if the subarray does not start
with a legal token or part of one
- Throws: ExtensibleTokenException
- if the subarray encodes just a carriage
return ('\r')
- See Also:
- TOK_DATA_CHARS, TOK_DATA_NEWLINE, TOK_ATTRIBUTE_VALUE_S, TOK_MAGIC_ENTITY_REF, TOK_ENTITY_REF, TOK_CHAR_REF, TOK_CHAR_PAIR_REF, Token, EmptyTokenException, PartialTokenException, InvalidTokenException, ExtensibleTokenException
tokenizeEntityValue
public final int tokenizeEntityValue(byte buf[],
int off,
int end,
Token token) throws PartialTokenException, InvalidTokenException, EmptyTokenException, ExtensibleTokenException
- Scans the first token of a byte subarray that contains part of a
literal entity value. The opening and closing delimiters
are not included in the subarray.
Returns one of the following integers according to the type of
token that the subarray starts with:
TOK_DATA_CHARS
TOK_DATA_NEWLINE
TOK_PARAM_ENTITY_REF
TOK_MAGIC_ENTITY_REF
TOK_ENTITY_REF
TOK_CHAR_REF
TOK_CHAR_PAIR_REF
- Throws: EmptyTokenException
- if the subarray is empty
- Throws: PartialTokenException
- if the subarray contains only part of
a legal token
- Throws: InvalidTokenException
- if the subarray does not start
with a legal token or part of one
- Throws: ExtensibleTokenException
- if the subarray encodes just a carriage
return ('\r')
- See Also:
- TOK_DATA_CHARS, TOK_DATA_NEWLINE, TOK_MAGIC_ENTITY_REF, TOK_ENTITY_REF, TOK_PARAM_ENTITY_REF, TOK_CHAR_REF, TOK_CHAR_PAIR_REF, Token, EmptyTokenException, PartialTokenException, InvalidTokenException, ExtensibleTokenException
skipIgnoreSect
public final int skipIgnoreSect(byte buf[],
int off,
int end) throws PartialTokenException, InvalidTokenException
- Skips over an ignored conditional section.
The subarray starts following the <![ IGNORE [.
- Returns:
- the index of the character following the closing
]]>
- Throws: PartialTokenException
- if the subarray does not contain the
complete ignored conditional section
- Throws: InvalidTokenException
- if the ignored conditional section
contains illegal characters
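In XML 1.0, ignored conditional sections can nest: each <![ inside the section opens another level and each ]]> closes one. A minimal sketch of that scan over a single-byte, ASCII-compatible buffer follows. IgnoreSectDemo is a hypothetical name; the sketch returns -1 for an incomplete section where the real method throws PartialTokenException, and it does not check for illegal characters.

```java
// Hypothetical sketch of skipping a (possibly nested) ignored conditional section.
// Not the library's implementation.
class IgnoreSectDemo {
    // The subarray starts just after the opening "<![ IGNORE [".
    // Returns the index following the matching "]]>", or -1 if it is not present.
    static int skipIgnoreSect(byte[] buf, int off, int end) {
        int depth = 1;
        int i = off;
        while (i < end) {
            if (startsWith(buf, i, end, "<![")) {
                depth++;          // a nested conditional section opens
                i += 3;
            } else if (startsWith(buf, i, end, "]]>")) {
                i += 3;
                if (--depth == 0) return i;
            } else {
                i++;
            }
        }
        return -1;
    }

    static boolean startsWith(byte[] buf, int i, int end, String s) {
        if (end - i < s.length()) return false;
        for (int k = 0; k < s.length(); k++) {
            if (buf[i + k] != s.charAt(k)) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        String s = "a<![IGNORE[b]]>c]]>tail";
        byte[] buf = s.getBytes(java.nio.charset.StandardCharsets.US_ASCII);
        int next = skipIgnoreSect(buf, 0, buf.length);
        System.out.println(s.substring(next));
    }
}
```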
getPublicId
public final String getPublicId(byte buf[],
int off,
int end) throws InvalidTokenException
- Checks that a literal contained in the specified byte subarray
is a legal public identifier and returns a string with
the normalized content of the public id.
The subarray includes the opening and closing quotes.
- Throws: InvalidTokenException
- if it is not a legal public identifier
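What "normalized" means is not spelled out above. The sketch below follows the rule in the XML 1.0 specification (runs of white space become a single space, and leading and trailing white space is removed) and assumes getPublicId behaves the same way. PublicIdDemo is a hypothetical name; it works on a String for brevity rather than on a byte subarray, and it signals failure with IllegalArgumentException rather than InvalidTokenException.

```java
// Hypothetical sketch of public identifier checking and normalization,
// following the XML 1.0 PubidChar production. Not the library's implementation.
class PublicIdDemo {
    static final String PUBID_PUNCT = "-'()+,./:=?;!*#@$_%";

    static boolean isPubidChar(char c) {
        return c == ' ' || c == '\r' || c == '\n'
            || (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || (c >= '0' && c <= '9')
            || PUBID_PUNCT.indexOf(c) >= 0;
    }

    // The literal includes its opening and closing quotes.
    static String normalize(String literal) {
        if (literal.length() < 2) {
            throw new IllegalArgumentException("literal must include both quotes");
        }
        StringBuilder sb = new StringBuilder();
        boolean pendingSpace = false;
        for (int i = 1; i < literal.length() - 1; i++) {
            char c = literal.charAt(i);
            if (!isPubidChar(c)) {
                throw new IllegalArgumentException("illegal character at index " + i);
            }
            if (c == ' ' || c == '\r' || c == '\n') {
                // Remember the gap, but never emit leading white space.
                pendingSpace = sb.length() > 0;
            } else {
                if (pendingSpace) sb.append(' ');
                pendingSpace = false;
                sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(normalize("\"  -//W3C//DTD  XHTML\n1.0//EN \""));
    }
}
```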
matchesXMLString
public final boolean matchesXMLString(byte buf[],
int off,
int end,
String str)
- Returns true if the specified byte subarray is equal to the string.
The string must contain only XML significant characters.
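Because every XML significant character is encoded by the same number of bytes, a comparison like this can walk the subarray in fixed steps. The sketch below assumes a big-endian fixed-width layout in which each character is its ASCII code padded with leading zero bytes (1 byte per character for ASCII-compatible encodings, 2 for big-endian UTF-16). MatchDemo and the bytesPerChar parameter are hypothetical and not part of this class.

```java
// Hypothetical sketch of comparing a byte subarray with an ASCII-only string.
// Not the library's implementation.
class MatchDemo {
    static boolean matches(byte[] buf, int off, int end, String str, int bytesPerChar) {
        if (end - off != str.length() * bytesPerChar) return false;
        for (int i = 0; i < str.length(); i++) {
            int p = off + i * bytesPerChar;
            // All bytes before the last one must be zero padding.
            for (int k = 0; k < bytesPerChar - 1; k++) {
                if (buf[p + k] != 0) return false;
            }
            if ((buf[p + bytesPerChar - 1] & 0xFF) != str.charAt(i)) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        byte[] ascii = {'x', 'm', 'l'};
        byte[] utf16be = {0, 'x', 0, 'm', 0, 'l'};
        System.out.println(matches(ascii, 0, ascii.length, "xml", 1));
        System.out.println(matches(utf16be, 0, utf16be.length, "xml", 2));
    }
}
```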
skipS
public final int skipS(byte buf[],
int off,
int end)
- Skips over XML whitespace characters at the start of the specified
subarray.
- Returns:
- the index of the first non-whitespace character, or end if the subarray is all whitespace
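The return contract of skipS (the index of the first non-whitespace character, or end when nothing else remains) can be sketched for a single-byte, ASCII-compatible encoding. SkipSDemo is a hypothetical name; XML whitespace is taken to be space, tab, carriage return and line feed.

```java
// Hypothetical sketch of skipping XML whitespace in a single-byte encoding.
// Not the library's implementation.
class SkipSDemo {
    static int skipS(byte[] buf, int off, int end) {
        while (off < end) {
            byte b = buf[off];
            if (b != ' ' && b != '\t' && b != '\r' && b != '\n') break;
            off++;
        }
        return off; // equals end when the whole subarray is whitespace
    }

    public static void main(String[] args) {
        byte[] buf = " \t\r\nversion".getBytes(java.nio.charset.StandardCharsets.US_ASCII);
        System.out.println(skipS(buf, 0, buf.length));
    }
}
```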
getMinBytesPerChar
public final int getMinBytesPerChar()
- Returns the minimum number of bytes required to represent a single
character in this encoding. The value will be 1, 2 or 4.