Currently XSLT and XQuery use very similar syntaxes for element and attribute construction. For example,
<p>This is a <a href="{@ref}">link</a>.</p>
works the same in both XSLT and XQuery. However, there are numerous minor differences:
XSLT uses xsl:element to construct elements with computed names.
In XSLT, < and & are subject to the usual XML rules. In particular, in XSLT when < is used in a comparison operator it needs to be escaped as &lt;. In XQuery, parts of the query are processed using an XML-like parse, and parts are processed using an XPath-like parse; the XPath-like parsing does not operate on the results of the XML-like parsing. So you can use "<" as a comparison operator without escaping it.
In XSLT, character references are recognized uniformly as in XML; in XQuery, they are recognized only in a few places.
I believe that many users will need to work with both XSLT and XQuery, and that these numerous subtle differences will be extremely confusing to such users. I believe we should take the best aspects of the XSLT and XQuery element construction syntaxes and unify them into a single syntax to be used by both XSLT 2.0 and XQuery 1.0.
The XQuery and the XSLT syntaxes each have their advantages.
XQuery is more human-readable. For example, many users will prefer:
if ($x = 1) then <foo/> else <bar/>
to
<xsl:choose>
  <xsl:when test="$x = 1">
    <foo/>
  </xsl:when>
  <xsl:otherwise>
    <bar/>
  </xsl:otherwise>
</xsl:choose>
XQuery has better composability. Element constructors are expressions that can be embedded directly in other expressions. With the XSLT syntax (as in XSLT 1.1), an element constructor has to be used to define a variable, which can then be referenced inside an expression.
XSLT is well-formed XML. Moreover, it is well-formed in a non-trivial way. You don't have to wrap everything in a CDATA section. Element constructors are recognized as elements by the XML parse. This has lots of benefits:
It is easiest to describe the proposed solution in terms of a change from the current XQuery syntax. The fundamental idea is that the XQuery parser would operate on an XML infoset rather than on a string. The XML infoset might be produced by an XML parser but it could be produced by other means, for example by construction of a DOM or even by the evaluation of a query. Since XQueries may appear embedded in XML documents as attributes or element content, we do not want to restrict an XQuery to be a single document info item or element info item. Instead, we specify that an XQuery is represented as a sequence of element info items and characters. Call such a sequence a parseable sequence.
The basic parsing model would be as follows. To parse a parseable sequence, the parser first processes each element info item. The processing of an element info item will be explained in more detail below. At this stage, what's important is that the processing of an element info item results in an expression. Call these expressions element expressions. By processing the element info items, the parseable sequence is turned into a sequence of characters interspersed with element expressions. In order to parse this sequence we treat each element expression as an additional terminal symbol (a special token); in effect we are parsing a string over an alphabet consisting of Unicode plus one additional symbol. To avoid confusion I will call such a string an xstring.
Note that XML does not allow all Unicode characters to appear in XML documents, not even by using character references. Thus, an implementation can use one of the Unicode characters disallowed by XML to represent an ElementExpression token, and can represent an xstring by a Unicode string together with a parallel array of expressions. This will allow standard grammar tools that operate on Unicode strings, such as JavaCC, to continue to be used.
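The representation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: U+FFFF is chosen as the placeholder because it falls outside the XML 1.0 character range, and the names ElementExpr, XString and to_xstring are hypothetical, not part of any proposal or library.

```python
from dataclasses import dataclass
from typing import List, Union

# Assumption: U+FFFF is not a legal XML character, so it cannot occur in
# parsed character data; here it stands in for one ElementExpression token.
ELEMENT_TOKEN = "\uffff"


@dataclass
class ElementExpr:
    """Hypothetical stand-in for the expression produced by processing an element."""
    name: str


@dataclass
class XString:
    text: str                 # characters plus ELEMENT_TOKEN placeholders
    exprs: List[ElementExpr]  # the i-th placeholder refers to exprs[i]


def to_xstring(items: List[Union[str, ElementExpr]]) -> XString:
    """Flatten an already processed parseable sequence into an xstring."""
    parts: List[str] = []
    exprs: List[ElementExpr] = []
    for item in items:
        if isinstance(item, ElementExpr):
            parts.append(ELEMENT_TOKEN)  # one placeholder per element expression
            exprs.append(item)
        else:
            parts.append(item)
    return XString("".join(parts), exprs)


xs = to_xstring(["if ($x = 1) then ", ElementExpr("foo"), " else ", ElementExpr("bar")])
```

A string-based tokenizer can then treat ELEMENT_TOKEN as a single terminal and look up the corresponding expression by counting placeholders.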
There are two modes for parsing an xstring: literal mode and expression mode. Each mode is specified by a grammar in which ElementExpression can appear as a terminal. The grammar for expression mode would be the same as the grammar for ExprSequence in the current XQuery grammar, except that grammar for ElementConstructor would be replaced by the terminal for ElementExpression. The grammar for literal mode would be similar to the ElementContent production in the current XQuery grammar, but much simpler because it would be operating on the results of an XML parser.
LiteralModeContent ::= (NonBraceChar | EnclosedExpr | QuotedBrace)*
QuotedBrace ::= '{{' | '}}'
EnclosedExpr ::= '{' ExprSequence '}'
NonBraceChar ::= [^{}]
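As a rough illustration of how small the literal-mode grammar is, the following Python sketch splits a string into text runs and enclosed expressions. It makes one simplifying assumption that the grammar itself does not: an enclosed expression is taken to end at the first '}', so expressions that themselves contain braces are not handled. The function name parse_literal_mode and the tuple representation are hypothetical.

```python
from typing import List, Tuple

Part = Tuple[str, str]  # ("text", chars) or ("expr", source of an ExprSequence)


def parse_literal_mode(s: str) -> List[Part]:
    """Split literal-mode content into text runs and enclosed expressions."""
    parts: List[Part] = []
    text: List[str] = []
    i = 0
    while i < len(s):
        two = s[i:i + 2]
        if two == "{{" or two == "}}":
            text.append(two[0])       # QuotedBrace: doubled brace -> one literal brace
            i += 2
        elif s[i] == "{":
            end = s.find("}", i + 1)  # simplification: no '}' inside the expression
            if end == -1:
                raise ValueError("unterminated enclosed expression")
            if text:
                parts.append(("text", "".join(text)))
                text = []
            parts.append(("expr", s[i + 1:end]))
            i = end + 1
        elif s[i] == "}":
            raise ValueError("unmatched '}' in literal content")
        else:
            text.append(s[i])         # NonBraceChar
            i += 1
    if text:
        parts.append(("text", "".join(text)))
    return parts
```

A real implementation would hand the text between the braces to the expression-mode parser rather than keep it as a string.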
Now we can explain how an element info item is processed. The processing of an element info item depends on its namespace URI and local name. If the namespace URI is not the XQuery (or XSLT) namespace, then the element info item represents an element constructor. To process an element constructor, each attribute value is treated as a sequence of characters and parsed in literal mode. The children of the element info item are turned into a parseable sequence by ignoring info items other than character and element info items; this parseable sequence is also parsed in literal mode. The expressions for the attributes and children are combined into an element constructor expression. This element constructor expression is the result of processing the element info item.
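The processing of an element constructor might look roughly like the following Python sketch, which uses the standard xml.etree.ElementTree module as a stand-in for an XML infoset. The dictionary layout, the function names and the compact literal_mode helper are assumptions made for illustration; the helper again assumes that enclosed expressions contain no braces. ElementTree's default parser discards comments and processing instructions, which matches the rule of ignoring other info items.

```python
import re
import xml.etree.ElementTree as ET

XQ_NS = "http://www.w3.org/2001/XML/Query"

# Doubled brace, enclosed expression (no nested braces), or a run of other characters.
_TOKEN = re.compile(r"\{\{|\}\}|\{([^{}]*)\}|[^{}]+")


def literal_mode(s):
    """Compact literal-mode parse: a list of text strings and ('expr', source) pairs."""
    parts, pos = [], 0
    while pos < len(s):
        m = _TOKEN.match(s, pos)
        if m is None:
            raise ValueError("stray brace at offset %d" % pos)
        if m.group(1) is not None:
            parts.append(("expr", m.group(1)))
        elif m.group(0) in ("{{", "}}"):
            parts.append(m.group(0)[0])   # quoted brace becomes a literal brace
        else:
            parts.append(m.group(0))
        pos = m.end()
    return parts


def process_element(elem):
    """Turn a non-XQuery element into a nested element-constructor description."""
    if elem.tag.startswith("{" + XQ_NS + "}"):
        raise NotImplementedError("elements in the XQuery namespace are handled separately")
    content = literal_mode(elem.text or "")
    for child in elem:
        content.append(process_element(child))          # an element expression
        content.extend(literal_mode(child.tail or ""))  # characters after the child
    attrs = {name: literal_mode(value) for name, value in elem.attrib.items()}
    return {"element": elem.tag, "attributes": attrs, "content": content}


result = process_element(ET.fromstring('<p>This is a <a href="{@ref}">link</a>.</p>'))
```

Here result describes a p constructor whose content is a text run, a nested a constructor with an href attribute computed from @ref, and a final text run.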
The XQuery namespace would contain at least a top-level element to contain the query. For example,
<xq:expr xmlns:xq="http://www.w3.org/2001/XML/Query"> //order[@id = 'xyzzy'] </xq:expr>
To process an xq:expr element info item, the children are turned into a parseable sequence by ignoring comments and processing instructions (as with element constructors); the parseable sequence is then parsed in expression mode.
Namespace bindings to be used for interpreting the XQuery would naturally be expressed using namespace declarations on the xq:expr element (similarly to XSLT and XML Schema).
<xq:expr xmlns:xq="http://www.w3.org/2001/XML/Query" xmlns:eg="http://www.example.com"> //eg:order[@id = 'xyzzy'] </xq:expr>
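To make the role of the namespace declarations concrete, here is a small Python sketch that reads an xq:expr wrapper and returns the expression text together with the prefix bindings declared on it. It relies on the start-ns events of xml.etree.ElementTree.iterparse; the function name read_query is hypothetical, and the sketch assumes the content consists of characters only (no literal element constructors).

```python
import io
import xml.etree.ElementTree as ET

XQ_NS = "http://www.w3.org/2001/XML/Query"


def read_query(xml_text):
    """Return (expression source, prefix -> URI bindings) for a top-level xq:expr."""
    bindings = {}
    root = None
    source = io.BytesIO(xml_text.encode("utf-8"))
    for event, payload in ET.iterparse(source, events=("start-ns", "start")):
        if event == "start-ns" and root is None:
            prefix, uri = payload      # declarations on the top-level element
            bindings[prefix] = uri
        elif event == "start" and root is None:
            root = payload             # the top-level element itself
    if root is None or root.tag != "{%s}expr" % XQ_NS:
        raise ValueError("expected a top-level xq:expr element")
    # Assumption: character-only content, so root.text holds the whole expression.
    return (root.text or "").strip(), bindings


doc = ('<xq:expr xmlns:xq="http://www.w3.org/2001/XML/Query" '
       'xmlns:eg="http://www.example.com"> //eg:order[@id = \'xyzzy\'] </xq:expr>')
expr, namespaces = read_query(doc)
```

The bindings can then be passed to the expression-mode parser so that a prefix such as eg in the query resolves the same way it would in any other XML vocabulary.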
It might be desirable to have a slightly more elaborate wrapper around the expression in order to accommodate other declarations. For example,
<xq:query xmlns:xq="http://www.w3.org/2001/XML/Query">
  <xsd:complexType name="foo"> ... </xsd:complexType>
  <xq:functions>
    function bar() { ... }
  </xq:functions>
  <xq:expr>
    bar(//order[@id='xyzzy'])
  </xq:expr>
</xq:query>
The xq:expr wrapper would not be needed in all circumstances. For example, XSLT might allow an XQuery expression to be used as the value of the select attribute of xsl:apply-templates. In this context, the XQuery would be a parseable sequence containing only characters and so would not be able to contain literal element constructors.
As far as the composable syntax of XQuery is concerned, the changes from a user's perspective relative to the current syntax are not large.
The current XQuery syntax for constructing elements with names specified by expressions would have to change since it is not well-formed XML. One possibility is to use an xq:element element. This could have a name attribute whose value would be interpreted in expression mode as an expression returning a QName; the content would be interpreted in literal mode.
<xq:element name="$x">The value of $x is {$x}.</xq:element>
Alternatively, there could be an element function that constructed an element. In this case, it would probably be convenient to have an xq:content element which parsed its content in literal mode. For example,
element($x, <xq:content>The value of $x is {$x}.</xq:content>)
There would be a similar issue for constructing attributes with names specified by expressions.
The rules for quoting would be different, and would use the standard XML rules rather than XQuery-specific rules. Character references would be recognized in more places. < and & characters that are not part of tags or references would need to be escaped. In particular, < in comparison operators would need to be written as &lt;.
The major change from an XSLT perspective is that {} would be recognized in templates (that is, within the content of elements that allow characters, instructions and literal result elements as children) as enclosing an expression, just as it now is in attribute content. There could be an attribute on xsl:stylesheet to turn this off. Literal result elements and XSLT instructions would be allowed inside curly braces and would be treated as ElementExpression tokens. For example,
<xsl:template name="foo">{
  if (count(author) = 1) then
    <name>{author}</name>
  else
    <nameList>
      <xsl:for-each select="author"><name>{.}</name></xsl:for-each>
    </nameList>
}</xsl:template>
This implies that the semantics of XSLT instruction execution must be explained in a similar way to XPath expression evaluation. For any XSLT instruction, the semantics must specify the expression language object to which it evaluates, in terms of the expression language objects produced by subinstructions/subexpressions. This was not possible in XSLT 1.0 because the expression language lacked sequences. This implies a complete rewrite of the XSLT spec, but it should result in a much more rigorously defined language.
With expressions being allowed to contain elements, some additional changes become natural. In particular, it would be desirable to have an xsl:expr element whose content is an expression, and whose semantics are that it evaluates to the result of evaluating its content. For the select attribute of xsl:for-each and xsl:apply-templates it would be natural to allow the expression to be specified alternatively in a select child element, and similarly for the test attribute of xsl:when and xsl:if. Thus xsl:choose instructions and if expressions become semantically equivalent: both are equally powerful; the difference is purely syntactic.
Embedding XSLT within XQuery and vice-versa becomes trivial.
If XSLT allows element syntax within expressions and some XSLT expressions have equivalent semantics to XSLT elements, XQuery and XSLT do not need to be completely different languages.
The logical conclusion would seem to be that we have one set of semantic constructs. For each semantic construct, we have a syntax that uses elements and one that uses characters (an XML and non-XML syntax). These can be freely mixed. Constructs using element syntax can contain constructs using non-element syntax and vice-versa without restriction.
James Clark
$Id: construct.html,v 1.5 2001/05/27 12:22:21 jjc Exp $