<!-- neon XML interface -*- text -*- -->

<sect1 id="xml">

  <title>Parsing XML</title>

  <para>The &neon; XML interface is exposed by the
  <filename>ne_xml.h</filename> header file.  This interface provides a
  wrapper around the standard <ulink
  url="http://www.saxproject.org/">SAX</ulink> API used by XML
  parsers, adding an abstraction called <firstterm>stacked SAX
  handlers</firstterm> and consistent <ulink
  url="http://www.w3.org/TR/REC-xml-names">XML Namespace</ulink> support.</para>

<sect2 id="xml-sax">
  <title>Introduction to SAX</title>

  <para>A SAX-based parser works by emitting a sequence of
  <firstterm>events</firstterm> to reflect the tokens being parsed
  from the XML document.  For example, parsing the following document
  fragment:

<programlisting><![CDATA[
<hello>world</hello>
]]></programlisting>

  results in the following events:

  <orderedlist>
    <listitem>
      <simpara>&startelm; "hello"</simpara>
    </listitem>
    <listitem>
      <simpara>&cdata; "world"</simpara>
    </listitem>
    <listitem>
      <simpara>&endelm; "hello"</simpara>
    </listitem>
  </orderedlist>

  This example demonstrates the three event types used in the
  subset of SAX exposed by the &neon; XML interface: &startelm;,
  &cdata; and &endelm;.  In a C API, an <quote>event</quote> is
  implemented as a function callback; &neon; uses three callback
  types, one for each type of event.</para>
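
  <para>The callback model can be sketched in plain C as follows.  This
  is a self-contained illustration, not the &neon; API: the callback
  types <type>startelm_fn</type>, <type>cdata_fn</type> and
  <type>endelm_fn</type>, and the function <function>emit_hello</function>,
  which stands in for a real parser by issuing the three events listed
  above, are hypothetical names chosen for this sketch.  The callbacks
  simply record each event in a text log reached through a
  <literal>userdata</literal> pointer.</para>

<programlisting><![CDATA[
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical callback types, one per SAX event (not neon's own). */
typedef void (*startelm_fn)(void *userdata, const char *name);
typedef void (*cdata_fn)(void *userdata, const char *text, size_t len);
typedef void (*endelm_fn)(void *userdata, const char *name);

/* Userdata for the callbacks: a text log of the events received. */
struct event_log {
    char buf[256];
};

static void log_event(struct event_log *elog, const char *kind,
                      const char *text, size_t len)
{
    size_t used = strlen(elog->buf);

    /* snprintf truncates rather than overflowing if the log fills up. */
    snprintf(elog->buf + used, sizeof elog->buf - used, "%s \"%.*s\"\n",
             kind, (int)len, text);
}

static void on_start(void *userdata, const char *name)
{
    log_event(userdata, "startelm", name, strlen(name));
}

static void on_cdata(void *userdata, const char *text, size_t len)
{
    /* A parser may pass character data that is not NUL-terminated,
     * hence the explicit length. */
    log_event(userdata, "cdata", text, len);
}

static void on_end(void *userdata, const char *name)
{
    log_event(userdata, "endelm", name, strlen(name));
}

/* Stand-in for a parser: issues the events for <hello>world</hello>. */
static void emit_hello(startelm_fn start, cdata_fn cdata, endelm_fn end,
                       void *userdata)
{
    start(userdata, "hello");
    cdata(userdata, "world", 5);
    end(userdata, "hello");
}
]]></programlisting>

  <para>Calling <literal>emit_hello(on_start, on_cdata, on_end,
  &amp;elog)</literal> with an empty <literal>struct event_log
  elog</literal> leaves one line per event in
  <literal>elog.buf</literal>, in the order shown in the list
  above.</para>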

</sect2>

<sect2 id="xml-stacked">
  <title>Stacked SAX handlers</title>

  <para>WebDAV property values are represented as fragments of XML,
  transmitted as parts of larger XML documents over HTTP (notably in
  the body of the response to a <literal>PROPFIND</literal> request).
  When &neon; parses such documents, the SAX events generated for
  these property value fragments may need to be handled by the
  application, since &neon; has no knowledge of the structure of
  properties used by the application.</para>

  <para>To solve this problem,<footnote id="foot.xml.sax"><para>This
  <quote>problem</quote> only needs solving because the SAX interface
  is so inflexible when implemented as C function callbacks; a better
  approach would be to use an XML parser interface which is not based
  on callbacks.</para></footnote> the &neon; XML interface introduces
  the concept of a <firstterm>SAX handler</firstterm>.  A SAX handler
  comprises a &startelm;, &cdata; and &endelm; callback; the
  &startelm; callback is defined such that each handler may
  <emphasis>accept</emphasis> or <emphasis>decline</emphasis> the
  &startelm; event.  Handlers are composed into a <firstterm>handler
  stack</firstterm> before parsing a document.  When the XML parser
  generates a new &startelm; event, &neon; invokes each &startelm;
  callback in the handler stack in turn until one accepts the event.
  The handler which accepts the event is subsequently passed &cdata;
  events if the element contains character data, followed by an
  &endelm; event when the element is closed.  If no handler in the
  stack accepts a &startelm; event, the branch of the tree rooted at
  that element is ignored.</para>

  <para>To illustrate, given a handler A, which accepts the
  <literal>cat</literal> and <literal>age</literal> elements, and a
  handler B, which accepts the <literal>name</literal> element, the
  following document:

<example id="xml-example">
<title>An example XML document</title>
<programlisting><![CDATA[
<cat>
  <age>3</age>    
  <name>Bob</name>
</cat>
]]></programlisting></example>

  would be parsed as follows:
  
  <orderedlist>
    <listitem>
      <simpara>A &startelm; "cat" &rarr; <emphasis>accept</emphasis></simpara>
    </listitem>
    <listitem>
      <simpara>A &startelm; "age" &rarr; <emphasis>accept</emphasis></simpara>
    </listitem>
    <listitem>
      <simpara>A &cdata; "3"</simpara>
    </listitem>
    <listitem>
      <simpara>A &endelm; "age"</simpara>
    </listitem>
    <listitem>
      <simpara>A &startelm; "name" &rarr; <emphasis>decline</emphasis></simpara>
    </listitem>
    <listitem>
      <simpara>B &startelm; "name" &rarr; <emphasis>accept</emphasis></simpara>
    </listitem>
    <listitem>
      <simpara>B &cdata; "Bob"</simpara>
    </listitem>
    <listitem>
      <simpara>B &endelm; "name"</simpara>
    </listitem>
    <listitem>
      <simpara>A &endelm; "cat"</simpara>
    </listitem>
  </orderedlist></para>

  <para>The search for a handler which will accept a &startelm; event
  begins at the handler of the parent element and continues toward the
  top of the stack.  For the root element, it begins at the base of
  the stack.  In the above example, handler A is at the base, and
  handler B at the top; if the <literal>name</literal> element had any
  children, only B's &startelm; callback would be invoked for
  them.</para>
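
  <para>The dispatch rules described in this section can be modelled in
  a few dozen lines of C.  The following is a simplified sketch, not
  &neon;'s implementation: the structure <type>stack_parser</type> and
  the functions <function>sp_push</function>,
  <function>sp_start</function>, <function>sp_cdata</function> and
  <function>sp_end</function> are hypothetical, each handler's
  &startelm; callback is assumed to return non-zero to accept or zero
  to decline, and events are fed in by hand rather than produced by an
  XML parser.  Each open element remembers which handler accepted it
  (or <literal>-1</literal> if none did), so the search for its
  children starts at that handler, and a declined element's whole
  subtree is skipped.</para>

<programlisting><![CDATA[
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

#define MAX_HANDLERS 8
#define MAX_DEPTH 32

/* Hypothetical handler: startelm returns non-zero to accept, 0 to decline. */
struct handler {
    const char *label;                 /* "A", "B", ... for the trace */
    int (*startelm)(const char *name);
};

struct stack_parser {
    struct handler handlers[MAX_HANDLERS]; /* index 0 is the base */
    int nhandlers;
    int owner[MAX_DEPTH];  /* accepting handler per open element, -1 if none */
    int depth;
    char trace[512];       /* log of callback invocations */
};

static void add_trace(struct stack_parser *p, const char *fmt, ...)
{
    size_t used = strlen(p->trace);
    va_list ap;

    va_start(ap, fmt);
    vsnprintf(p->trace + used, sizeof p->trace - used, fmt, ap);
    va_end(ap);
}

static void sp_init(struct stack_parser *p)
{
    memset(p, 0, sizeof *p);
}

/* Push a handler on top of the stack; returns -1 if the stack is full. */
static int sp_push(struct stack_parser *p, const char *label,
                   int (*startelm)(const char *))
{
    if (p->nhandlers == MAX_HANDLERS)
        return -1;
    p->handlers[p->nhandlers].label = label;
    p->handlers[p->nhandlers].startelm = startelm;
    p->nhandlers++;
    return 0;
}

/* Offer a start-tag to each handler in turn, starting at the parent's
 * handler (the base for the root) and moving towards the top. */
static int sp_start(struct stack_parser *p, const char *name)
{
    int first = p->depth > 0 ? p->owner[p->depth - 1] : 0;
    int chosen = -1;

    if (p->depth == MAX_DEPTH)
        return -1;
    /* If first is -1 the parent was declined: skip the whole branch. */
    for (int i = first; first >= 0 && i < p->nhandlers; i++) {
        int accepted = p->handlers[i].startelm(name);

        add_trace(p, "%s startelm \"%s\" -> %s\n", p->handlers[i].label,
                  name, accepted ? "accept" : "decline");
        if (accepted) {
            chosen = i;
            break;
        }
    }
    p->owner[p->depth++] = chosen;
    return 0;
}

/* Character data goes to the handler that accepted the current element. */
static void sp_cdata(struct stack_parser *p, const char *text)
{
    int h = p->depth > 0 ? p->owner[p->depth - 1] : -1;

    if (h >= 0)
        add_trace(p, "%s cdata \"%s\"\n", p->handlers[h].label, text);
}

static void sp_end(struct stack_parser *p, const char *name)
{
    int h = p->depth > 0 ? p->owner[--p->depth] : -1;

    if (h >= 0)
        add_trace(p, "%s endelm \"%s\"\n", p->handlers[h].label, name);
}

/* Handlers A and B from the example: A takes cat and age, B takes name. */
static int a_start(const char *name)
{
    return strcmp(name, "cat") == 0 || strcmp(name, "age") == 0;
}

static int b_start(const char *name)
{
    return strcmp(name, "name") == 0;
}
]]></programlisting>

  <para>Pushing <function>a_start</function> and then
  <function>b_start</function> and replaying the events of the
  <sgmltag>cat</sgmltag> document leaves a trace in
  <literal>p.trace</literal> matching the event list above, line for
  line.  Replaying a root element which neither handler accepts, such
  as a hypothetical <sgmltag>dog</sgmltag>, logs only the two declines;
  none of its children reach any handler.</para>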

</sect2>

<sect2 id="xml-state">
  <title>Maintaining state</title>

  <para>To facilitate communication between independent handlers, a
  <firstterm>state integer</firstterm> is associated with each element
  being parsed.  This integer is returned by the &startelm; callback
  and is passed to the subsequent &cdata; and &endelm; callbacks
  associated with the element.  The state integer of the parent
  element is also passed to each &startelm; callback, with zero passed
  as the parent state for the root element (which by definition has
  no parent).</para>

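  <para>To further extend <xref linkend="xml-example"/>: if handler A
  assigns the state <literal>42</literal> to the root element
  <sgmltag>cat</sgmltag> and <literal>50</literal> to
  <sgmltag>age</sgmltag>, and handler B assigns <literal>99</literal>
  to <sgmltag>name</sgmltag>, the event trace would be as follows: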

  <orderedlist>
    <listitem>
      <simpara>A &startelm; (parent = 0, "cat") &rarr;
      <emphasis>accept</emphasis>, state = 42
      </simpara>
    </listitem>
    <listitem>
      <simpara>A &startelm; (parent = 42, "age") &rarr; 
      <emphasis>accept</emphasis>, state = 50
      </simpara>
    </listitem>
    <listitem>
      <simpara>A &cdata; (state = 50, "3")</simpara>
    </listitem>
    <listitem>
      <simpara>A &endelm; (state = 50, "age")</simpara>
    </listitem>
    <listitem>
      <simpara>A &startelm; (parent = 42, "name") &rarr; 
      <emphasis>decline</emphasis></simpara>
    </listitem>
    <listitem>
      <simpara>B &startelm; (parent = 42, "name") &rarr;
      <emphasis>accept</emphasis>, state = 99</simpara>
    </listitem>
    <listitem>
      <simpara>B &cdata; (state = 99, "Bob")</simpara>
    </listitem>
    <listitem>
      <simpara>B &endelm; (state = 99, "name")</simpara>
    </listitem>
    <listitem>
      <simpara>A &endelm; (state = 42, "cat")</simpara>
    </listitem>
  </orderedlist></para>

  <para>To avoid collisions between state integers used by different
  handlers, the interface definition of any handler includes the range
  of integers it will use.</para>
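
  <para>A minimal sketch of how the two handlers might use state
  integers from disjoint ranges to cooperate is shown below.  The
  ranges (40 to 59 for A, 90 to 99 for B), the callback signatures and
  the convention that returning zero declines an element are
  assumptions made for this illustration; they are not taken from
  <filename>ne_xml.h</filename>.  The state values themselves match
  the trace above.</para>

<programlisting><![CDATA[
#include <stdlib.h>
#include <string.h>

/* Parent state passed when starting the root element (as in the text). */
#define ROOT_STATE 0
/* Assumed for this sketch: returning zero declines an element. */
#define DECLINE 0

/* Hypothetical, disjoint state ranges published by each handler. */
enum {
    A_STATE_FIRST = 40,
    A_STATE_CAT = 42,
    A_STATE_AGE = 50,
    A_STATE_LAST = 59,

    B_STATE_FIRST = 90,
    B_STATE_NAME = 99,
    B_STATE_LAST = 99
};

/* Handler A: <cat> at the root, <age> directly inside <cat>. */
static int a_startelm(int parent, const char *name)
{
    if (parent == ROOT_STATE && strcmp(name, "cat") == 0)
        return A_STATE_CAT;
    if (parent == A_STATE_CAT && strcmp(name, "age") == 0)
        return A_STATE_AGE;
    return DECLINE;
}

/* Handler B: <name> directly inside A's <cat>, recognised by A's state. */
static int b_startelm(int parent, const char *name)
{
    if (parent == A_STATE_CAT && strcmp(name, "name") == 0)
        return B_STATE_NAME;
    return DECLINE;
}

/* Application data filled in by handler A. */
struct cat_info {
    int age;
};

/* A's cdata callback: the state tells it which element the text is in. */
static void a_cdata(struct cat_info *info, int state, const char *text)
{
    if (state == A_STATE_AGE)
        info->age = atoi(text);
}

/* Non-zero if a state value lies within a handler's published range. */
static int in_range(int state, int first, int last)
{
    return state >= first && state <= last;
}
]]></programlisting>

  <para>Because A's range is published, B can tell that its
  <sgmltag>name</sgmltag> element sits directly inside A's
  <sgmltag>cat</sgmltag> by testing the parent state against
  <literal>A_STATE_CAT</literal>, without any other coupling between
  the two handlers.</para>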

</sect2>

<sect2 id="xml-ns">
  <title>XML namespaces</title>

  <para>To support XML namespaces, every element name is represented
  as a <emphasis>(namespace, name)</emphasis> pair.  The &startelm;
  and &endelm; callbacks are passed namespace and name strings
  accordingly.  If an element in the XML document has no declared
  namespace, the namespace given will be the empty string,
  <literal>""</literal>.</para>
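
  <para>For an element written with a namespace prefix, the (namespace,
  name) pair is obtained by looking the prefix up among the namespace
  declarations in scope.  The following C sketch shows that mapping in
  simplified form.  It is a hypothetical helper for illustration only
  (the <type>ns_binding</type> structure and
  <function>resolve_qname</function> function are not part of &neon;);
  the callbacks described above receive the already-resolved strings.
  The example binding uses <literal>DAV:</literal>, the namespace name
  used by WebDAV.</para>

<programlisting><![CDATA[
#include <stddef.h>
#include <string.h>

/* One in-scope namespace declaration, xmlns:prefix="uri"; the prefix is
 * "" for a default declaration, xmlns="uri".  Hypothetical type. */
struct ns_binding {
    const char *prefix;
    const char *uri;
};

/* Map a qualified element name such as "D:href" to a (namespace, name)
 * pair.  Bindings later in the array shadow earlier ones, as inner
 * declarations shadow outer ones.  An unprefixed name with no default
 * namespace in scope gets the empty namespace "".  Returns 0 on
 * success, or -1 if the prefix has not been declared.  The reserved
 * "xml" prefix is not handled in this sketch. */
static int resolve_qname(const char *qname, const struct ns_binding *ns,
                         size_t nns, const char **nspace, const char **name)
{
    const char *colon = strchr(qname, ':');
    size_t plen = colon ? (size_t)(colon - qname) : 0;

    *name = colon ? colon + 1 : qname;
    *nspace = "";
    /* Search the innermost (last) binding first. */
    for (size_t i = nns; i-- > 0;) {
        if (strlen(ns[i].prefix) == plen
            && strncmp(ns[i].prefix, qname, plen) == 0) {
            *nspace = ns[i].uri;
            return 0;
        }
    }
    return colon ? -1 : 0;
}
]]></programlisting>

  <para>A &startelm; callback can then recognise an element by comparing
  both strings, for example <literal>strcmp(nspace, "DAV:") == 0
  &amp;&amp; strcmp(name, "href") == 0</literal>; comparing the local
  name alone would confuse elements from different namespaces.</para>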

</sect2>

</sect1>