1 <?xml version="1.0" encoding="iso-8859-1"?> |
|
2 <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" |
|
3 "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"> |
|
4 <html> |
|
5 <head> |
|
6 <!-- Copyright 1999,2000 Clark Cooper <coopercc@netheaven.com> |
|
7 All rights reserved. |
|
8 This is free software. You may distribute or modify according to |
|
9 the terms of the MIT/X License --> |
|
10 <title>Expat XML Parser</title> |
|
11 <meta name="author" content="Clark Cooper, coopercc@netheaven.com" /> |
|
12 <meta http-equiv="Content-Style-Type" content="text/css" /> |
|
13 <link href="style.css" rel="stylesheet" type="text/css" /> |
|
14 </head> |
|
15 <body> |
|
16 <table cellspacing="0" cellpadding="0" width="100%"> |
|
17 <tr> |
|
18 <td class="corner"><img src="expat.png" alt="(Expat logo)" /></td> |
|
19 <td class="banner"><h1>The Expat XML Parser</h1></td> |
|
20 </tr> |
|
21 <tr> |
|
22 <td class="releaseno">Release 2.0.1</td> |
|
23 <td></td> |
|
24 </tr> |
|
25 </table> |
|
26 <div class="content"> |
|
27 |
|
28 <p>Expat is a library, written in C, for parsing XML documents. It's |
|
29 the underlying XML parser for the open source Mozilla project, Perl's |
|
30 <code>XML::Parser</code>, Python's <code>xml.parsers.expat</code>, and |
|
31 other open-source XML parsers.</p> |
|
32 |
|
<p>This library is the creation of James Clark, who's also given us
groff (an nroff look-alike), Jade (an implementation of ISO's DSSSL
stylesheet language for SGML), XP (a Java XML parser package), and XT (a
Java XSL engine). James was also the technical lead on the XML
Working Group at W3C that produced the XML specification.</p>
|
38 |
|
39 <p>This is free software, licensed under the <a |
|
40 href="../COPYING">MIT/X Consortium license</a>. You may download it |
|
41 from <a href="http://www.libexpat.org/">the Expat home page</a>. |
|
42 </p> |
|
43 |
|
44 <p>The bulk of this document was originally commissioned as an article |
|
45 by <a href="http://www.xml.com/">XML.com</a>. They graciously allowed |
|
46 Clark Cooper to retain copyright and to distribute it with Expat. |
|
47 This version has been substantially extended to include documentation |
|
48 on features which have been added since the original article was |
|
49 published, and additional information on using the original |
|
50 interface.</p> |
|
51 |
|
52 <hr /> |
|
53 <h2>Table of Contents</h2> |
|
54 <ul> |
|
55 <li><a href="#overview">Overview</a></li> |
|
56 <li><a href="#building">Building and Installing</a></li> |
|
57 <li><a href="#using">Using Expat</a></li> |
|
58 <li><a href="#reference">Reference</a> |
|
59 <ul> |
|
60 <li><a href="#creation">Parser Creation Functions</a> |
|
61 <ul> |
|
62 <li><a href="#XML_ParserCreate">XML_ParserCreate</a></li> |
|
63 <li><a href="#XML_ParserCreateNS">XML_ParserCreateNS</a></li> |
|
64 <li><a href="#XML_ParserCreate_MM">XML_ParserCreate_MM</a></li> |
|
65 <li><a href="#XML_ExternalEntityParserCreate">XML_ExternalEntityParserCreate</a></li> |
|
66 <li><a href="#XML_ParserFree">XML_ParserFree</a></li> |
|
67 <li><a href="#XML_ParserReset">XML_ParserReset</a></li> |
|
68 </ul> |
|
69 </li> |
|
70 <li><a href="#parsing">Parsing Functions</a> |
|
71 <ul> |
|
72 <li><a href="#XML_Parse">XML_Parse</a></li> |
|
73 <li><a href="#XML_ParseBuffer">XML_ParseBuffer</a></li> |
|
74 <li><a href="#XML_GetBuffer">XML_GetBuffer</a></li> |
|
75 <li><a href="#XML_StopParser">XML_StopParser</a></li> |
|
76 <li><a href="#XML_ResumeParser">XML_ResumeParser</a></li> |
|
77 <li><a href="#XML_GetParsingStatus">XML_GetParsingStatus</a></li> |
|
78 </ul> |
|
79 </li> |
|
80 <li><a href="#setting">Handler Setting Functions</a> |
|
81 <ul> |
|
82 <li><a href="#XML_SetStartElementHandler">XML_SetStartElementHandler</a></li> |
|
83 <li><a href="#XML_SetEndElementHandler">XML_SetEndElementHandler</a></li> |
|
84 <li><a href="#XML_SetElementHandler">XML_SetElementHandler</a></li> |
|
85 <li><a href="#XML_SetCharacterDataHandler">XML_SetCharacterDataHandler</a></li> |
|
86 <li><a href="#XML_SetProcessingInstructionHandler">XML_SetProcessingInstructionHandler</a></li> |
|
87 <li><a href="#XML_SetCommentHandler">XML_SetCommentHandler</a></li> |
|
88 <li><a href="#XML_SetStartCdataSectionHandler">XML_SetStartCdataSectionHandler</a></li> |
|
89 <li><a href="#XML_SetEndCdataSectionHandler">XML_SetEndCdataSectionHandler</a></li> |
|
90 <li><a href="#XML_SetCdataSectionHandler">XML_SetCdataSectionHandler</a></li> |
|
91 <li><a href="#XML_SetDefaultHandler">XML_SetDefaultHandler</a></li> |
|
92 <li><a href="#XML_SetDefaultHandlerExpand">XML_SetDefaultHandlerExpand</a></li> |
|
93 <li><a href="#XML_SetExternalEntityRefHandler">XML_SetExternalEntityRefHandler</a></li> |
|
94 <li><a href="#XML_SetExternalEntityRefHandlerArg">XML_SetExternalEntityRefHandlerArg</a></li> |
|
95 <li><a href="#XML_SetSkippedEntityHandler">XML_SetSkippedEntityHandler</a></li> |
|
96 <li><a href="#XML_SetUnknownEncodingHandler">XML_SetUnknownEncodingHandler</a></li> |
|
97 <li><a href="#XML_SetStartNamespaceDeclHandler">XML_SetStartNamespaceDeclHandler</a></li> |
|
98 <li><a href="#XML_SetEndNamespaceDeclHandler">XML_SetEndNamespaceDeclHandler</a></li> |
|
99 <li><a href="#XML_SetNamespaceDeclHandler">XML_SetNamespaceDeclHandler</a></li> |
|
100 <li><a href="#XML_SetXmlDeclHandler">XML_SetXmlDeclHandler</a></li> |
|
101 <li><a href="#XML_SetStartDoctypeDeclHandler">XML_SetStartDoctypeDeclHandler</a></li> |
|
102 <li><a href="#XML_SetEndDoctypeDeclHandler">XML_SetEndDoctypeDeclHandler</a></li> |
|
103 <li><a href="#XML_SetDoctypeDeclHandler">XML_SetDoctypeDeclHandler</a></li> |
|
104 <li><a href="#XML_SetElementDeclHandler">XML_SetElementDeclHandler</a></li> |
|
105 <li><a href="#XML_SetAttlistDeclHandler">XML_SetAttlistDeclHandler</a></li> |
|
106 <li><a href="#XML_SetEntityDeclHandler">XML_SetEntityDeclHandler</a></li> |
|
107 <li><a href="#XML_SetUnparsedEntityDeclHandler">XML_SetUnparsedEntityDeclHandler</a></li> |
|
108 <li><a href="#XML_SetNotationDeclHandler">XML_SetNotationDeclHandler</a></li> |
|
109 <li><a href="#XML_SetNotStandaloneHandler">XML_SetNotStandaloneHandler</a></li> |
|
110 </ul> |
|
111 </li> |
|
112 <li><a href="#position">Parse Position and Error Reporting Functions</a> |
|
113 <ul> |
|
114 <li><a href="#XML_GetErrorCode">XML_GetErrorCode</a></li> |
|
115 <li><a href="#XML_ErrorString">XML_ErrorString</a></li> |
|
116 <li><a href="#XML_GetCurrentByteIndex">XML_GetCurrentByteIndex</a></li> |
|
117 <li><a href="#XML_GetCurrentLineNumber">XML_GetCurrentLineNumber</a></li> |
|
118 <li><a href="#XML_GetCurrentColumnNumber">XML_GetCurrentColumnNumber</a></li> |
|
119 <li><a href="#XML_GetCurrentByteCount">XML_GetCurrentByteCount</a></li> |
|
120 <li><a href="#XML_GetInputContext">XML_GetInputContext</a></li> |
|
121 </ul> |
|
122 </li> |
|
123 <li><a href="#miscellaneous">Miscellaneous Functions</a> |
|
124 <ul> |
|
125 <li><a href="#XML_SetUserData">XML_SetUserData</a></li> |
|
126 <li><a href="#XML_GetUserData">XML_GetUserData</a></li> |
|
127 <li><a href="#XML_UseParserAsHandlerArg">XML_UseParserAsHandlerArg</a></li> |
|
128 <li><a href="#XML_SetBase">XML_SetBase</a></li> |
|
129 <li><a href="#XML_GetBase">XML_GetBase</a></li> |
|
130 <li><a href="#XML_GetSpecifiedAttributeCount">XML_GetSpecifiedAttributeCount</a></li> |
|
131 <li><a href="#XML_GetIdAttributeIndex">XML_GetIdAttributeIndex</a></li> |
|
132 <li><a href="#XML_SetEncoding">XML_SetEncoding</a></li> |
|
133 <li><a href="#XML_SetParamEntityParsing">XML_SetParamEntityParsing</a></li> |
|
134 <li><a href="#XML_UseForeignDTD">XML_UseForeignDTD</a></li> |
|
135 <li><a href="#XML_SetReturnNSTriplet">XML_SetReturnNSTriplet</a></li> |
|
136 <li><a href="#XML_DefaultCurrent">XML_DefaultCurrent</a></li> |
|
137 <li><a href="#XML_ExpatVersion">XML_ExpatVersion</a></li> |
|
138 <li><a href="#XML_ExpatVersionInfo">XML_ExpatVersionInfo</a></li> |
|
139 <li><a href="#XML_GetFeatureList">XML_GetFeatureList</a></li> |
|
140 <li><a href="#XML_FreeContentModel">XML_FreeContentModel</a></li> |
|
141 <li><a href="#XML_MemMalloc">XML_MemMalloc</a></li> |
|
142 <li><a href="#XML_MemRealloc">XML_MemRealloc</a></li> |
|
143 <li><a href="#XML_MemFree">XML_MemFree</a></li> |
|
144 </ul> |
|
145 </li> |
|
146 </ul> |
|
147 </li> |
|
148 </ul> |
|
149 |
|
150 <hr /> |
|
151 <h2><a name="overview">Overview</a></h2> |
|
152 |
|
<p>Expat is a stream-oriented parser. You register callback (or
handler) functions with the parser and then start feeding it the
document. As the parser recognizes parts of the document, it will
call the appropriate handler for that part (if you've registered one).
The document is fed to the parser in pieces, so you can start parsing
before you have all of the document. This also allows you to parse
documents that are too large to fit into memory.</p>
|
160 |
|
161 <p>Expat can be intimidating due to the many kinds of handlers and |
|
162 options you can set. But you only need to learn four functions in |
|
163 order to do 90% of what you'll want to do with it:</p> |
|
164 |
|
165 <dl> |
|
166 |
|
167 <dt><code><a href= "#XML_ParserCreate" |
|
168 >XML_ParserCreate</a></code></dt> |
|
169 <dd>Create a new parser object.</dd> |
|
170 |
|
171 <dt><code><a href= "#XML_SetElementHandler" |
|
172 >XML_SetElementHandler</a></code></dt> |
|
173 <dd>Set handlers for start and end tags.</dd> |
|
174 |
|
175 <dt><code><a href= "#XML_SetCharacterDataHandler" |
|
176 >XML_SetCharacterDataHandler</a></code></dt> |
|
177 <dd>Set handler for text.</dd> |
|
178 |
|
179 <dt><code><a href= "#XML_Parse" |
|
180 >XML_Parse</a></code></dt> |
|
<dd>Pass a buffer full of document to the parser.</dd>
|
182 </dl> |
|
183 |
|
184 <p>These functions and others are described in the <a |
|
185 href="#reference">reference</a> part of this document. The reference |
|
186 section also describes in detail the parameters passed to the |
|
187 different types of handlers.</p> |
|
188 |
|
189 <p>Let's look at a very simple example program that only uses 3 of the |
|
190 above functions (it doesn't need to set a character handler.) The |
|
191 program <a href="../examples/outline.c">outline.c</a> prints an |
|
192 element outline, indenting child elements to distinguish them from the |
|
193 parent element that contains them. The start handler does all the |
|
194 work. It prints two indenting spaces for every level of ancestor |
|
195 elements, then it prints the element and attribute |
|
196 information. Finally it increments the global <code>Depth</code> |
|
197 variable.</p> |
|
198 |
|
<pre class="eg">
int Depth;

void XMLCALL
start(void *data, const char *el, const char **attr) {
  int i;

  /* Indent two spaces per level of ancestor elements. */
  for (i = 0; i < Depth; i++)
    printf("  ");

  printf("%s", el);

  /* attr is a NULL-terminated list of name/value pairs. */
  for (i = 0; attr[i]; i += 2) {
    printf(" %s='%s'", attr[i], attr[i + 1]);
  }

  printf("\n");
  Depth++;
}  /* End of start handler */
</pre>
|
219 |
|
<p>The end handler simply does the bookkeeping work of decrementing
<code>Depth</code>.</p>
<pre class="eg">
void XMLCALL
end(void *data, const char *el) {
  Depth--;
}  /* End of end handler */
</pre>
|
228 |
|
<p>Note the <code>XMLCALL</code> annotation used for the callbacks.
This is used to ensure that Expat and the callbacks are using the
same calling convention in case the compiler options used for Expat
itself and the client code are different. Expat tries not to care
what the default calling convention is, though it may require that it
be compiled with a default convention of "cdecl" on some platforms.
For code which uses Expat, however, the calling convention is
specified by the <code>XMLCALL</code> annotation on most platforms;
callbacks should be defined using this annotation.</p>
|
238 |
|
239 <p>The <code>XMLCALL</code> annotation was added in Expat 1.95.7, but |
|
240 existing working Expat applications don't need to add it (since they |
|
241 are already using the "cdecl" calling convention, or they wouldn't be |
|
242 working). The annotation is only needed if the default calling |
|
243 convention may be something other than "cdecl". To use the annotation |
|
244 safely with older versions of Expat, you can conditionally define it |
|
245 <em>after</em> including Expat's header file:</p> |
|
246 |
|
<pre class="eg">
#include <expat.h>

#ifndef XMLCALL
#if defined(_MSC_EXTENSIONS) && !defined(__BEOS__) && !defined(__CYGWIN__)
#define XMLCALL __cdecl
#elif defined(__GNUC__)
#define XMLCALL __attribute__((cdecl))
#else
#define XMLCALL
#endif
#endif
</pre>
|
260 |
|
261 <p>After creating the parser, the main program just has the job of |
|
262 shoveling the document to the parser so that it can do its work.</p> |
|
263 |
|
264 <hr /> |
|
265 <h2><a name="building">Building and Installing Expat</a></h2> |
|
266 |
|
<p>The Expat distribution comes as a compressed (with GNU gzip) tar
file. You may download the latest version from <a href=
"http://sourceforge.net/projects/expat/" >SourceForge</a>. After
unpacking this, cd into the directory. Then follow either the Win32
directions or Unix directions below.</p>
|
272 |
|
273 <h3>Building under Win32</h3> |
|
274 |
|
275 <p>If you're using the GNU compiler under cygwin, follow the Unix |
|
276 directions in the next section. Otherwise if you have Microsoft's |
|
277 Developer Studio installed, then from Windows Explorer double-click on |
|
278 "expat.dsp" in the lib directory and build and install in the usual |
|
279 manner.</p> |
|
280 |
|
281 <p>Alternatively, you may download the Win32 binary package that |
|
282 contains the "expat.h" include file and a pre-built DLL.</p> |
|
283 |
|
284 <h3>Building under Unix (or GNU)</h3> |
|
285 |
|
286 <p>First you'll need to run the configure shell script in order to |
|
287 configure the Makefiles and headers for your system.</p> |
|
288 |
|
289 <p>If you're happy with all the defaults that configure picks for you, |
|
290 and you have permission on your system to install into /usr/local, you |
|
291 can install Expat with this sequence of commands:</p> |
|
292 |
|
<pre class="eg">
./configure
make
make install
</pre>
|
298 |
|
299 <p>There are some options that you can provide to this script, but the |
|
300 only one we'll mention here is the <code>--prefix</code> option. You |
|
301 can find out all the options available by running configure with just |
|
302 the <code>--help</code> option.</p> |
|
303 |
|
304 <p>By default, the configure script sets things up so that the library |
|
305 gets installed in <code>/usr/local/lib</code> and the associated |
|
306 header file in <code>/usr/local/include</code>. But if you were to |
|
307 give the option, <code>--prefix=/home/me/mystuff</code>, then the |
|
308 library and header would get installed in |
|
309 <code>/home/me/mystuff/lib</code> and |
|
310 <code>/home/me/mystuff/include</code> respectively.</p> |
|
311 |
|
312 <h3>Configuring Expat Using the Pre-Processor</h3> |
|
313 |
|
<p>Expat's feature set can be configured using a small number of
pre-processor definitions. The definition of these symbols does not
affect the set of entry points for Expat, only the behavior of the API
and the definition of character types in the case of
<code>XML_UNICODE_WCHAR_T</code>. The symbols are:</p>
|
319 |
|
320 <dl class="cpp-symbols"> |
|
321 <dt>XML_DTD</dt> |
|
322 <dd>Include support for using and reporting DTD-based content. If |
|
323 this is defined, default attribute values from an external DTD subset |
|
324 are reported and attribute value normalization occurs based on the |
|
325 type of attributes defined in the external subset. Without |
|
326 this, Expat has a smaller memory footprint and can be faster, but will |
|
327 not load external entities or process conditional sections. This does |
|
328 not affect the set of functions available in the API.</dd> |
|
329 |
|
330 <dt>XML_NS</dt> |
|
331 <dd>When defined, support for the <cite><a href= |
|
332 "http://www.w3.org/TR/REC-xml-names/" >Namespaces in XML</a></cite> |
|
333 specification is included.</dd> |
|
334 |
|
335 <dt>XML_UNICODE</dt> |
|
336 <dd>When defined, character data reported to the application is |
|
337 encoded in UTF-16 using wide characters of the type |
|
338 <code>XML_Char</code>. This is implied if |
|
339 <code>XML_UNICODE_WCHAR_T</code> is defined.</dd> |
|
340 |
|
341 <dt>XML_UNICODE_WCHAR_T</dt> |
|
342 <dd>If defined, causes the <code>XML_Char</code> character type to be |
|
343 defined using the <code>wchar_t</code> type; otherwise, <code>unsigned |
|
344 short</code> is used. Defining this implies |
|
345 <code>XML_UNICODE</code>.</dd> |
|
346 |
|
347 <dt>XML_LARGE_SIZE</dt> |
|
348 <dd>If defined, causes the <code>XML_Size</code> and <code>XML_Index</code> |
|
349 integer types to be at least 64 bits in size. This is intended to support |
|
350 processing of very large input streams, where the return values of |
|
351 <code><a href="#XML_GetCurrentByteIndex" >XML_GetCurrentByteIndex</a></code>, |
|
352 <code><a href="#XML_GetCurrentLineNumber" >XML_GetCurrentLineNumber</a></code> and |
|
353 <code><a href="#XML_GetCurrentColumnNumber" >XML_GetCurrentColumnNumber</a></code> |
|
354 could overflow. It may not be supported by all compilers, and is turned |
|
355 off by default.</dd> |
|
356 |
|
357 <dt>XML_CONTEXT_BYTES</dt> |
|
<dd>The number of input bytes of markup context which the parser will
ensure are available for reporting via <code><a href=
"#XML_GetInputContext" >XML_GetInputContext</a></code>. This is
normally set to 1024, and must be set to a positive integer. If this
is not defined, the input context will not be available and <code><a
href= "#XML_GetInputContext" >XML_GetInputContext</a></code> will
always report NULL. Without this, Expat has a smaller memory
footprint and can be faster.</dd>
|
366 |
|
367 <dt>XML_STATIC</dt> |
|
<dd>On Windows, this should be set if Expat is going to be linked
statically with the code that calls it; this is required so that the
MSVC-specific import/export annotations on the API declarations come
out right. This is ignored on other platforms.</dd>
|
372 </dl> |
|
373 |
|
374 <hr /> |
|
375 <h2><a name="using">Using Expat</a></h2> |
|
376 |
|
377 <h3>Compiling and Linking Against Expat</h3> |
|
378 |
|
379 <p>Unless you installed Expat in a location not expected by your |
|
380 compiler and linker, all you have to do to use Expat in your programs |
|
381 is to include the Expat header (<code>#include <expat.h></code>) |
|
382 in your files that make calls to it and to tell the linker that it |
|
383 needs to link against the Expat library. On Unix systems, this would |
|
384 usually be done with the <code>-lexpat</code> argument. Otherwise, |
|
385 you'll need to tell the compiler where to look for the Expat header |
|
386 and the linker where to find the Expat library. You may also need to |
|
387 take steps to tell the operating system where to find this library at |
|
388 run time.</p> |
|
389 |
|
390 <p>On a Unix-based system, here's what a Makefile might look like when |
|
391 Expat is installed in a standard location:</p> |
|
392 |
|
<pre class="eg">
CC=cc
LDFLAGS=
LIBS= -lexpat
xmlapp: xmlapp.o
	$(CC) $(LDFLAGS) -o xmlapp xmlapp.o $(LIBS)
</pre>
|
400 |
|
401 <p>If you installed Expat in, say, <code>/home/me/mystuff</code>, then |
|
402 the Makefile would look like this:</p> |
|
403 |
|
<pre class="eg">
CC=cc
CFLAGS= -I/home/me/mystuff/include
LDFLAGS=
LIBS= -L/home/me/mystuff/lib -lexpat
xmlapp: xmlapp.o
	$(CC) $(LDFLAGS) -o xmlapp xmlapp.o $(LIBS)
</pre>
|
412 |
|
413 <p>You'd also have to set the environment variable |
|
414 <code>LD_LIBRARY_PATH</code> to <code>/home/me/mystuff/lib</code> (or |
|
415 to <code>${LD_LIBRARY_PATH}:/home/me/mystuff/lib</code> if |
|
416 LD_LIBRARY_PATH already has some directories in it) in order to run |
|
417 your application.</p> |
|
418 |
|
419 <h3>Expat Basics</h3> |
|
420 |
|
421 <p>As we saw in the example in the overview, the first step in parsing |
|
422 an XML document with Expat is to create a parser object. There are <a |
|
423 href="#creation">three functions</a> in the Expat API for creating a |
|
424 parser object. However, only two of these (<code><a href= |
|
425 "#XML_ParserCreate" >XML_ParserCreate</a></code> and <code><a href= |
|
426 "#XML_ParserCreateNS" >XML_ParserCreateNS</a></code>) can be used for |
|
427 constructing a parser for a top-level document. The object returned |
|
428 by these functions is an opaque pointer (i.e. "expat.h" declares it as |
|
429 void *) to data with further internal structure. In order to free the |
|
430 memory associated with this object you must call <code><a href= |
|
431 "#XML_ParserFree" >XML_ParserFree</a></code>. Note that if you have |
|
432 provided any <a href="#userdata">user data</a> that gets stored in the |
|
433 parser, then your application is responsible for freeing it prior to |
|
434 calling <code>XML_ParserFree</code>.</p> |
|
435 |
|
<p>The objects returned by the parser creation functions are good for
parsing only one XML document or external parsed entity. If your
application needs to parse many XML documents, then it needs to create
a parser object for each one, or to prepare a finished parser for reuse
with <code><a href="#XML_ParserReset" >XML_ParserReset</a></code>. The
best way to deal with this is to create a higher level object that
contains all the default initialization you want for your parser
objects.</p>
|
442 |
|
443 <p>Walking through a document hierarchy with a stream oriented parser |
|
444 will require a good stack mechanism in order to keep track of current |
|
445 context. For instance, to answer the simple question, "What element |
|
446 does this text belong to?" requires a stack, since the parser may have |
|
447 descended into other elements that are children of the current one and |
|
448 has encountered this text on the way out.</p> |
|
449 |
|
<p>The things you're likely to want to keep on a stack are the
currently opened element and its attributes. You push this
information onto the stack in the start handler and you pop it off in
the end handler.</p>
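<p>One way to keep such a stack is sketched below, under some stated
assumptions: a fixed maximum depth (<code>MAX_DEPTH</code>), only element
names stored (attributes are left out for brevity), and the default build in
which <code>XML_Char</code> is <code>char</code>. The names
<code>ElemStack</code>, <code>stack_start</code>, <code>stack_end</code> and
<code>stack_top</code> are hypothetical, not part of Expat. The sketch copies
each name instead of keeping the pointer, on the conservative assumption that
strings passed to a handler are only valid for the duration of the call.</p>

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#ifndef XMLCALL
#define XMLCALL   /* normally supplied by <expat.h> */
#endif

#define MAX_DEPTH 64   /* illustrative limit */

typedef struct {
  char *names[MAX_DEPTH];  /* copies of the names of the open elements */
  int depth;               /* number of names currently stored */
  int overflow;            /* open elements that could not be stored */
} ElemStack;

/* Start handler: push a copy of the element name. */
void XMLCALL
stack_start(void *data, const char *el, const char **attr) {
  ElemStack *st = (ElemStack *) data;
  char *copy;

  (void) attr;
  assert(st != NULL && el != NULL);
  if (st->overflow > 0 || st->depth == MAX_DEPTH) {
    st->overflow++;        /* too deep: only count, so that ends still match */
    return;
  }
  copy = (char *) malloc(strlen(el) + 1);
  if (copy == NULL) {
    st->overflow++;        /* out of memory: treat like an overflow */
    return;
  }
  strcpy(copy, el);
  st->names[st->depth++] = copy;
}

/* End handler: pop the most recently opened element. */
void XMLCALL
stack_end(void *data, const char *el) {
  ElemStack *st = (ElemStack *) data;

  (void) el;
  if (st->overflow > 0) {
    st->overflow--;
    return;
  }
  if (st->depth > 0)
    free(st->names[--st->depth]);
}

/* Name of the innermost stored element, or NULL at the top level. */
const char *
stack_top(const ElemStack *st) {
  return st->depth > 0 ? st->names[st->depth - 1] : NULL;
}
```

<p>With a real parser, the stack would be registered through
<code>XML_SetUserData</code> and the two handlers through
<code>XML_SetElementHandler</code>; a character data handler could then call
<code>stack_top</code> to find out which element its text belongs to.</p>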
|
454 |
|
<p>For some tasks, it is sufficient to just keep information on what
the depth of the stack is (or would be if you had one). The outline
program shown above presents one example. Another such task would be
skipping over a complete element. When you see the start tag for the
element you want to skip, you set a skip flag and record the depth at
which the element started. When the end tag handler encounters the
same depth, the skipped element has ended and the flag may be
cleared. If you follow the convention that the root element starts at
depth 1, then you can use the same variable for skip flag and skip
depth: a value of 0 means "not skipping".</p>
|
465 |
|
<pre class="eg">
void
init_info(Parseinfo *info) {
  info->skip = 0;
  info->depth = 1;
  /* Other initializations here */
}  /* End of init_info */

void XMLCALL
rawstart(void *data, const char *el, const char **attr) {
  Parseinfo *inf = (Parseinfo *) data;

  if (! inf->skip) {
    if (should_skip(inf, el, attr)) {
      inf->skip = inf->depth;
    }
    else
      start(inf, el, attr);     /* This does rest of start handling */
  }

  inf->depth++;
}  /* End of rawstart */

void XMLCALL
rawend(void *data, const char *el) {
  Parseinfo *inf = (Parseinfo *) data;

  inf->depth--;

  if (! inf->skip)
    end(inf, el);               /* This does rest of end handling */

  if (inf->skip == inf->depth)
    inf->skip = 0;
}  /* End rawend */
</pre>
|
502 |
|
503 <p>Notice in the above example the difference in how depth is |
|
504 manipulated in the start and end handlers. The end tag handler should |
|
505 be the mirror image of the start tag handler. This is necessary to |
|
506 properly model containment. Since, in the start tag handler, we |
|
507 incremented depth <em>after</em> the main body of start tag code, then |
|
508 in the end handler, we need to manipulate it <em>before</em> the main |
|
509 body. If we'd decided to increment it first thing in the start |
|
510 handler, then we'd have had to decrement it last thing in the end |
|
511 handler.</p> |
|
512 |
|
513 <h3 id="userdata">Communicating between handlers</h3> |
|
514 |
|
<p>In order to be able to pass information between different handlers
without using globals, you'll need to define a data structure to hold
the shared variables. You can then tell Expat (with the <code><a href=
"#XML_SetUserData" >XML_SetUserData</a></code> function) to pass a
pointer to this structure to the handlers. This is the first
argument received by most handlers. In the <a href="#reference"
>reference section</a>, an argument to a callback function is named
<code>userData</code> and has type <code>void *</code> if the user
data is passed; it will have the type <code>XML_Parser</code> if the
parser itself is passed. When the parser is passed, the user data may
be retrieved using <code><a href="#XML_GetUserData"
>XML_GetUserData</a></code>.</p>
|
527 |
|
528 <p>One common case where multiple calls to a single handler may need |
|
529 to communicate using an application data structure is the case when |
|
530 content passed to the character data handler (set by <code><a href= |
|
531 "#XML_SetCharacterDataHandler" |
|
532 >XML_SetCharacterDataHandler</a></code>) needs to be accumulated. A |
|
533 common first-time mistake with any of the event-oriented interfaces to |
|
534 an XML parser is to expect all the text contained in an element to be |
|
535 reported by a single call to the character data handler. Expat, like |
|
536 many other XML parsers, reports such data as a sequence of calls; |
|
537 there's no way to know when the end of the sequence is reached until a |
|
538 different callback is made. A buffer referenced by the user data |
|
539 structure proves both an effective and convenient place to accumulate |
|
540 character data.</p> |
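<p>The accumulation described above might be sketched as follows. The
<code>TextBuf</code> structure and the function names
<code>accumulate</code> and <code>textbuf_reset</code> are hypothetical, and
the sketch assumes the default build in which <code>XML_Char</code> is
<code>char</code>. The handler appends each chunk to a growable buffer that
is kept NUL-terminated, since the data passed by Expat comes with a length
and no terminator.</p>

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#ifndef XMLCALL
#define XMLCALL   /* normally supplied by <expat.h> */
#endif

typedef struct {
  char *text;    /* accumulated character data, NUL-terminated once allocated */
  size_t len;    /* bytes in use, not counting the terminator */
  size_t cap;    /* bytes allocated */
  int failed;    /* set if memory ran out */
} TextBuf;

/* Character data handler: append the chunk s[0..len) to the buffer. */
void XMLCALL
accumulate(void *userData, const char *s, int len) {
  TextBuf *tb = (TextBuf *) userData;
  size_t need;

  assert(tb != NULL && len >= 0);
  need = tb->len + (size_t) len + 1;
  if (need > tb->cap) {
    size_t cap = tb->cap ? tb->cap : 64;
    char *grown;

    while (cap < need)
      cap *= 2;
    grown = (char *) realloc(tb->text, cap);
    if (grown == NULL) {
      tb->failed = 1;      /* keep what we have; the caller can check later */
      return;
    }
    tb->text = grown;
    tb->cap = cap;
  }
  memcpy(tb->text + tb->len, s, (size_t) len);
  tb->len += (size_t) len;
  tb->text[tb->len] = '\0';
}

/* Call from another handler (for example the end handler) once the
   accumulated text has been consumed. */
void
textbuf_reset(TextBuf *tb) {
  tb->len = 0;
  if (tb->text != NULL)
    tb->text[0] = '\0';
}
```

<p>In an application, a pointer to the <code>TextBuf</code> (or to a larger
structure containing it) would be registered with
<code>XML_SetUserData</code>, and the start and end handlers would read and
reset the buffer, since those callbacks mark the end of a run of character
data.</p>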
|
541 |
|
542 <!-- XXX example needed here --> |
|
543 |
|
544 |
|
545 <h3>XML Version</h3> |
|
546 |
|
547 <p>Expat is an XML 1.0 parser, and as such never complains based on |
|
548 the value of the <code>version</code> pseudo-attribute in the XML |
|
549 declaration, if present.</p> |
|
550 |
|
551 <p>If an application needs to check the version number (to support |
|
552 alternate processing), it should use the <code><a href= |
|
553 "#XML_SetXmlDeclHandler" >XML_SetXmlDeclHandler</a></code> function to |
|
554 set a handler that uses the information in the XML declaration to |
|
555 determine what to do. This example shows how to check that only a |
|
556 version number of <code>"1.0"</code> is accepted:</p> |
|
557 |
|
<pre class="eg">
static int wrong_version;
static XML_Parser parser;

static void XMLCALL
xmldecl_handler(void            *userData,
                const XML_Char  *version,
                const XML_Char  *encoding,
                int              standalone)
{
  static const XML_Char Version_1_0[] = {'1', '.', '0', 0};

  int i;

  for (i = 0; i < (sizeof(Version_1_0) / sizeof(Version_1_0[0])); ++i) {
    if (version[i] != Version_1_0[i]) {
      wrong_version = 1;
      /* also clear all other handlers: */
      XML_SetCharacterDataHandler(parser, NULL);
      ...
      return;
    }
  }
  ...
}
</pre>
|
584 |
|
585 <h3>Namespace Processing</h3> |
|
586 |
|
<p>When the parser is created using the <code><a href=
"#XML_ParserCreateNS" >XML_ParserCreateNS</a></code> function, Expat
performs namespace processing. Under namespace processing, Expat
consumes <code>xmlns</code> and <code>xmlns:...</code> attributes,
which declare namespaces for the scope of the element in which they
occur. This means that your start handler will not see these
attributes. Your application can still be informed of these
declarations by setting namespace declaration handlers with <a href=
"#XML_SetNamespaceDeclHandler"
><code>XML_SetNamespaceDeclHandler</code></a>.</p>
|
597 |
|
598 <p>Element type and attribute names that belong to a given namespace |
|
599 are passed to the appropriate handler in expanded form. By default |
|
600 this expanded form is a concatenation of the namespace URI, the |
|
601 separator character (which is the 2nd argument to <code><a href= |
|
602 "#XML_ParserCreateNS" >XML_ParserCreateNS</a></code>), and the local |
|
603 name (i.e. the part after the colon). Names with undeclared prefixes |
|
604 are not well-formed when namespace processing is enabled, and will |
|
605 trigger an error. Unprefixed attribute names are never expanded, |
|
606 and unprefixed element names are only expanded when they are in the |
|
607 scope of a default namespace.</p> |
|
608 |
|
609 <p>However if <code><a href= "#XML_SetReturnNSTriplet" |
|
610 >XML_SetReturnNSTriplet</a></code> has been called with a non-zero |
|
611 <code>do_nst</code> parameter, then the expanded form for names with |
|
612 an explicit prefix is a concatenation of: URI, separator, local name, |
|
613 separator, prefix.</p> |
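<p>A handler that receives such expanded names usually has to take them
apart again. The helper below is a sketch of one way to do that; the
<code>NameParts</code> structure and the <code>split_name</code> function
are hypothetical, and the code assumes a separator character that cannot
occur in a URI or a name (such as <code>'|'</code>) and the default build in
which <code>XML_Char</code> is <code>char</code>. It handles unexpanded
names, the two-part form, and the three-part form without copying.</p>

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Pointers into the original name; lengths are given because the URI and
   local parts are not NUL-terminated. */
typedef struct {
  const char *uri;     /* NULL if the name was not expanded */
  size_t uri_len;
  const char *local;
  size_t local_len;
  const char *prefix;  /* NULL unless the three-part form was used */
  size_t prefix_len;
} NameParts;

/* Split "local", "URI<sep>local" or "URI<sep>local<sep>prefix". */
void
split_name(const char *name, char sep, NameParts *out) {
  const char *first;
  const char *second;

  assert(name != NULL && out != NULL && sep != '\0');
  out->uri = NULL;
  out->uri_len = 0;
  out->prefix = NULL;
  out->prefix_len = 0;

  first = strchr(name, sep);
  if (first == NULL) {               /* no separator: name was not expanded */
    out->local = name;
    out->local_len = strlen(name);
    return;
  }
  out->uri = name;
  out->uri_len = (size_t) (first - name);
  out->local = first + 1;

  second = strchr(first + 1, sep);
  if (second == NULL) {              /* URI + local name */
    out->local_len = strlen(first + 1);
  } else {                           /* URI + local name + prefix */
    out->local_len = (size_t) (second - (first + 1));
    out->prefix = second + 1;
    out->prefix_len = strlen(second + 1);
  }
}
```

<p>For example, with <code>'|'</code> as the separator, the name
<code>http://example.org/ns|item|ex</code> splits into the URI
<code>http://example.org/ns</code>, the local name <code>item</code>, and
the prefix <code>ex</code>.</p>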
|
614 |
|
615 <p>You can set handlers for the start of a namespace declaration and |
|
616 for the end of a scope of a declaration with the <code><a href= |
|
617 "#XML_SetNamespaceDeclHandler" >XML_SetNamespaceDeclHandler</a></code> |
|
618 function. The StartNamespaceDeclHandler is called prior to the start |
|
619 tag handler and the EndNamespaceDeclHandler is called after the |
|
620 corresponding end tag that ends the namespace's scope. The namespace |
|
621 start handler gets passed the prefix and URI for the namespace. For a |
|
622 default namespace declaration (xmlns='...'), the prefix will be null. |
|
623 The URI will be null for the case where the default namespace is being |
|
624 unset. The namespace end handler just gets the prefix for the closing |
|
625 scope.</p> |
|
626 |
|
627 <p>These handlers are called for each declaration. So if, for |
|
628 instance, a start tag had three namespace declarations, then the |
|
629 StartNamespaceDeclHandler would be called three times before the start |
|
630 tag handler is called, once for each declaration.</p> |
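<p>As an illustration, a pair of namespace declaration handlers might look
like the sketch below. The <code>NsLog</code> structure and the names
<code>ns_start</code> and <code>ns_end</code> are hypothetical, and the
default build in which <code>XML_Char</code> is <code>char</code> is
assumed. The handlers print each declaration and track how many scopes are
open, taking care of the null prefix (default namespace) and null URI
(default namespace being unset) cases.</p>

```c
#include <assert.h>
#include <stdio.h>

#ifndef XMLCALL
#define XMLCALL   /* normally supplied by <expat.h> */
#endif

typedef struct {
  FILE *out;         /* where to report declarations */
  int open_scopes;   /* declarations whose scope has not ended yet */
} NsLog;

/* Called once per declaration, before the start handler of the element. */
void XMLCALL
ns_start(void *userData, const char *prefix, const char *uri) {
  NsLog *log = (NsLog *) userData;

  assert(log != NULL && log->out != NULL);
  log->open_scopes++;
  fprintf(log->out, "declare %s -> %s\n",
          prefix ? prefix : "(default)",
          uri ? uri : "(unset)");
}

/* Called once per declaration, after the end handler of the element. */
void XMLCALL
ns_end(void *userData, const char *prefix) {
  NsLog *log = (NsLog *) userData;

  assert(log != NULL && log->out != NULL);
  log->open_scopes--;
  fprintf(log->out, "end scope of %s\n", prefix ? prefix : "(default)");
}
```

<p>These would be registered with
<code>XML_SetNamespaceDeclHandler(parser, ns_start, ns_end)</code> on a
parser created by <code>XML_ParserCreateNS</code>, with a pointer to the
<code>NsLog</code> passed through <code>XML_SetUserData</code>.</p>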
|
631 |
|
632 <h3>Character Encodings</h3> |
|
633 |
|
<p>While XML is based on Unicode, and every XML processor is required
to recognize UTF-8 and UTF-16 (encodings of Unicode built on 8-bit and
16-bit code units, respectively), other encodings may be declared in
XML documents or entities. For the main document, an XML declaration
may contain an encoding declaration:</p>
<pre>
<?xml version="1.0" encoding="ISO-8859-2"?>
</pre>
|
642 |
|
643 <p>External parsed entities may begin with a text declaration, which |
|
644 looks like an XML declaration with just an encoding declaration:</p> |
|
<pre>
<?xml encoding="Big5"?>
</pre>
|
648 |
|
<p>With Expat, you may also specify an encoding at the time of
creating a parser. This is useful when the encoding information may
come from a source outside the document itself (like a higher level
protocol).</p>
|
653 |
|
654 <p><a name="builtin_encodings"></a>There are four built-in encodings |
|
655 in Expat:</p> |
|
656 <ul> |
|
657 <li>UTF-8</li> |
|
658 <li>UTF-16</li> |
|
659 <li>ISO-8859-1</li> |
|
660 <li>US-ASCII</li> |
|
661 </ul> |
|
662 |
|
<p>Anything else discovered in an encoding declaration, or in the
protocol encoding specified in the parser constructor, triggers a call
to the <code>UnknownEncodingHandler</code>. This handler gets passed
the encoding name and a pointer to an <code>XML_Encoding</code> data
structure. Your handler must fill in this structure and return
<code>XML_STATUS_OK</code> if it knows how to deal with the
encoding. Otherwise the handler should return
<code>XML_STATUS_ERROR</code>. The handler also gets passed a pointer
to an optional application data structure that you may indicate when
you set the handler.</p>
|
673 |
|
<p>Expat places the following restrictions on the character encodings
that it can support through the <code>XML_Encoding</code> structure:</p>
<ol>
<li>Every ASCII character that can appear in a well-formed XML document
must be represented by a single byte, and that byte must correspond to
its ASCII encoding (except for the characters $@\^'{}~).</li>
<li>Characters must be encoded in 4 bytes or fewer.</li>
<li>All characters encoded must have Unicode scalar values less than or
equal to 65535 (0xFFFF). <em>This does not apply to the built-in support
for UTF-16 and UTF-8.</em></li>
<li>No character may be encoded by more than one distinct sequence of
bytes.</li>
</ol>
|
688 |
|
689 <p><code>XML_Encoding</code> contains an array of integers that |
|
690 correspond to the 1st byte of an encoding sequence. If the value in |
|
691 the array for a byte is zero or positive, then the byte is a single |
|
692 byte encoding that encodes the Unicode scalar value contained in the |
|
693 array. A -1 in this array indicates a malformed byte. If the value is |
|
694 -2, -3, or -4, then the byte is the beginning of a 2, 3, or 4 byte |
|
695 sequence respectively. Multi-byte sequences are sent to the convert |
|
696 function pointed at in the <code>XML_Encoding</code> structure. This |
|
697 function should return the Unicode scalar value for the sequence or -1 |
|
698 if the sequence is malformed.</p> |
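<p>To make the meaning of the array concrete, here is a sketch that fills
in a 256-entry map for Windows-1252, a single-byte encoding. The function
name <code>fill_cp1252_map</code> is made up for this example. Inside an
unknown encoding handler you would presumably call it on the
<code>map</code> member of the <code>XML_Encoding</code> structure and
leave the convert function unset, since no entry is -2, -3, or -4.</p>

```c
/* Unicode scalar values for Windows-1252 bytes 0x80..0x9F.  A -1 marks
   the five bytes that the encoding leaves undefined (malformed input). */
static const int cp1252_high[32] = {
    0x20AC,     -1, 0x201A, 0x0192, 0x201E, 0x2026, 0x2020, 0x2021,
    0x02C6, 0x2030, 0x0160, 0x2039, 0x0152,     -1, 0x017D,     -1,
        -1, 0x2018, 0x2019, 0x201C, 0x201D, 0x2022, 0x2013, 0x2014,
    0x02DC, 0x2122, 0x0161, 0x203A, 0x0153,     -1, 0x017E, 0x0178
};

/* Fill MAP so that every byte is a complete single-byte sequence. */
void
fill_cp1252_map(int map[256])
{
    int i;

    /* ASCII and 0xA0..0xFF coincide with ISO-8859-1, where the byte
       value is the Unicode scalar value. */
    for (i = 0; i < 256; i++)
        map[i] = i;

    /* 0x80..0x9F differ: take them from the table above. */
    for (i = 0; i < 32; i++)
        map[0x80 + i] = cp1252_high[i];
}
```

<p>Every ASCII byte maps to itself and no value exceeds 0xFFFF, so this
map stays within the restrictions listed above.</p>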
|
699 |
|
700 <p>One pitfall that novice Expat users are likely to fall into is that |
|
701 although Expat may accept input in various encodings, the strings that |
|
702 it passes to the handlers are always encoded in UTF-8 or UTF-16 |
|
703 (depending on how Expat was compiled). Your application is responsible |
|
704 for any translation of these strings into other encodings.</p> |
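<p>As an example of such a translation, the sketch below turns UTF-8
text, as a handler would receive it in a UTF-8 build, into ISO-8859-1.
The function <code>utf8_to_latin1</code> is not part of Expat; it assumes
well-formed input that holds whole characters, and it replaces anything
outside ISO-8859-1 with a question mark.</p>

```c
#include <stddef.h>

/* Convert LEN bytes of UTF-8 at S to ISO-8859-1, writing at most CAP
   bytes to OUT.  Characters above U+00FF become '?'.  Returns the
   number of bytes written.  The output is not nul-terminated. */
size_t
utf8_to_latin1(const char *s, int len, char *out, size_t cap)
{
    size_t n = 0;
    int i = 0;

    while (i < len && n < cap) {
        unsigned char c = (unsigned char)s[i];

        if (c < 0x80) {
            /* one byte: plain ASCII */
            out[n++] = (char)c;
            i += 1;
        } else if ((c & 0xE0) == 0xC0 && i + 1 < len) {
            /* two bytes: U+0080..U+07FF */
            unsigned cp = ((unsigned)(c & 0x1F) << 6)
                        | ((unsigned char)s[i + 1] & 0x3F);
            out[n++] = cp <= 0xFF ? (char)cp : '?';
            i += 2;
        } else {
            /* three or four bytes (or unexpected input): no
               ISO-8859-1 equivalent, so substitute and skip ahead */
            out[n++] = '?';
            i += (c & 0xF0) == 0xE0 ? 3 : (c & 0xF8) == 0xF0 ? 4 : 1;
        }
    }
    return n;
}
```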
|
705 |
|
706 <h3>Handling External Entity References</h3> |
|
707 |
|
708 <p>Expat does not read or parse external entities directly. Note that |
|
709 any external DTD is a special case of an external entity. If you've |
|
710 set no <code>ExternalEntityRefHandler</code>, then external entity |
|
711 references are silently ignored. Otherwise, it calls your handler with |
|
712 the information needed to read and parse the external entity.</p> |
|
713 |
|
714 <p>Your handler isn't actually responsible for parsing the entity, but |
|
715 it is responsible for creating a subsidiary parser with <code><a href= |
|
716 "#XML_ExternalEntityParserCreate" |
|
717 >XML_ExternalEntityParserCreate</a></code> that will do the job. This |
|
718 returns an instance of <code>XML_Parser</code> that has handlers and |
|
719 other data structures initialized from the parent parser. You may then |
|
720 use <code><a href= "#XML_Parse" >XML_Parse</a></code> or <code><a |
|
721 href= "#XML_ParseBuffer">XML_ParseBuffer</a></code> calls against this |
|
parser. Since external entities may refer to other external entities,
your handler should be prepared to be called recursively.</p>
|
724 |
|
725 <h3>Parsing DTDs</h3> |
|
726 |
|
727 <p>In order to parse parameter entities, before starting the parse, |
|
728 you must call <code><a href= "#XML_SetParamEntityParsing" |
|
729 >XML_SetParamEntityParsing</a></code> with one of the following |
|
730 arguments:</p> |
|
731 <dl> |
|
732 <dt><code>XML_PARAM_ENTITY_PARSING_NEVER</code></dt> |
|
733 <dd>Don't parse parameter entities or the external subset</dd> |
|
734 <dt><code>XML_PARAM_ENTITY_PARSING_UNLESS_STANDALONE</code></dt> |
|
<dd>Parse parameter entities and the external subset unless
|
736 <code>standalone</code> was set to "yes" in the XML declaration.</dd> |
|
737 <dt><code>XML_PARAM_ENTITY_PARSING_ALWAYS</code></dt> |
|
738 <dd>Always parse parameter entities and the external subset</dd> |
|
739 </dl> |
|
740 |
|
741 <p>In order to read an external DTD, you also have to set an external |
|
742 entity reference handler as described above.</p> |
|
743 |
|
744 <h3 id="stop-resume">Temporarily Stopping Parsing</h3> |
|
745 |
|
<p>Expat 1.95.8 introduced a new feature: it's now possible to stop
parsing temporarily from within a handler function, even if more data
has already been passed into the parser. Applications for this
include:</p>
|
750 |
|
751 <ul> |
|
752 <li>Supporting the <a href= "http://www.w3.org/TR/xinclude/" |
|
753 >XInclude</a> specification.</li> |
|
754 |
|
755 <li>Delaying further processing until additional information is |
|
756 available from some other source.</li> |
|
757 |
|
758 <li>Adjusting processor load as task priorities shift within an |
|
759 application.</li> |
|
760 |
|
761 <li>Stopping parsing completely (simply free or reset the parser |
|
762 instead of resuming in the outer parsing loop). This can be useful |
|
if an application-domain error is found in the XML being parsed or if
|
764 the result of the parse is determined not to be useful after |
|
765 all.</li> |
|
766 </ul> |
|
767 |
|
768 <p>To take advantage of this feature, the main parsing loop of an |
|
769 application needs to support this specifically. It cannot be |
|
770 supported with a parsing loop compatible with Expat 1.95.7 or |
|
771 earlier (though existing loops will continue to work without |
|
772 supporting the stop/resume feature).</p> |
|
773 |
|
774 <p>An application that uses this feature for a single parser will have |
|
775 the rough structure (in pseudo-code):</p> |
|
776 |
|
<pre class="pseudocode">
fd = open_input();
p = create_parser();

if (parse_xml(p, fd)) {
  /* suspended */

  int suspended = 1;

  while (suspended) {
    do_something_else();
    if (ready_to_resume()) {
      suspended = continue_parsing(p, fd);
    }
  }
}
</pre>
|
794 |
|
795 <p>An application that may resume any of several parsers based on |
|
796 input (either from the XML being parsed or some other source) will |
|
797 certainly have more interesting control structures.</p> |
|
798 |
|
799 <p>This C function could be used for the <code>parse_xml</code> |
|
800 function mentioned in the pseudo-code above:</p> |
|
801 |
|
<pre class="eg">
#define BUFF_SIZE 10240

/* Parse a document from the open file descriptor 'fd' until the parse
   is complete (the document has been completely parsed, or there's
   been an error), or the parse is stopped.  Return non-zero when
   the parse is merely suspended.
*/
int
parse_xml(XML_Parser p, int fd)
{
  for (;;) {
    int bytes_read;
    enum XML_Status status;

    void *buff = XML_GetBuffer(p, BUFF_SIZE);
    if (buff == NULL) {
      /* handle error... */
      return 0;
    }
    bytes_read = read(fd, buff, BUFF_SIZE);
    if (bytes_read < 0) {
      /* handle error... */
      return 0;
    }
    status = XML_ParseBuffer(p, bytes_read, bytes_read == 0);
    switch (status) {
      case XML_STATUS_ERROR:
        /* handle error... */
        return 0;
      case XML_STATUS_SUSPENDED:
        return 1;
      case XML_STATUS_OK:
        break;
    }
    if (bytes_read == 0)
      return 0;
  }
}
</pre>
|
841 |
|
<p>The corresponding <code>continue_parsing</code> function is
somewhat simpler, since it only needs to deal with the return code from
<code><a href= "#XML_ResumeParser">XML_ResumeParser</a></code>; it can
delegate the input handling to the <code>parse_xml</code>
function:</p>
|
847 |
|
<pre class="eg">
/* Continue parsing a document which had been suspended.  The 'p' and
   'fd' arguments are the same as passed to parse_xml().  Return
   non-zero when the parse is suspended.
*/
int
continue_parsing(XML_Parser p, int fd)
{
  enum XML_Status status = XML_ResumeParser(p);
  switch (status) {
    case XML_STATUS_ERROR:
      /* handle error...  XML_GetErrorCode(p) reports
         XML_ERROR_NOT_SUSPENDED if the parser was not suspended. */
      return 0;
    case XML_STATUS_SUSPENDED:
      return 1;
    case XML_STATUS_OK:
      break;
  }
  return parse_xml(p, fd);
}
</pre>
|
870 |
|
871 <p>Now that we've seen what a mess the top-level parsing loop can |
|
872 become, what have we gained? Very simply, we can now use the <code><a |
|
873 href= "#XML_StopParser" >XML_StopParser</a></code> function to stop |
|
874 parsing, without having to go to great lengths to avoid additional |
|
875 processing that we're expecting to ignore. As a bonus, we get to stop |
|
876 parsing <em>temporarily</em>, and come back to it when we're |
|
877 ready.</p> |
|
878 |
|
879 <p>To stop parsing from a handler function, use the <code><a href= |
|
880 "#XML_StopParser" >XML_StopParser</a></code> function. This function |
|
takes two arguments: the parser being stopped and a flag indicating
|
882 whether the parse can be resumed in the future.</p> |
|
883 |
|
884 <!-- XXX really need more here --> |
|
885 |
|
886 |
|
887 <hr /> |
|
888 <!-- ================================================================ --> |
|
889 |
|
890 <h2><a name="reference">Expat Reference</a></h2> |
|
891 |
|
892 <h3><a name="creation">Parser Creation</a></h3> |
|
893 |
|
894 <pre class="fcndec" id="XML_ParserCreate"> |
|
895 XML_Parser XMLCALL |
|
896 XML_ParserCreate(const XML_Char *encoding); |
|
897 </pre> |
|
898 <div class="fcndef"> |
|
899 Construct a new parser. If encoding is non-null, it specifies a |
|
900 character encoding to use for the document. This overrides the document |
|
901 encoding declaration. There are four built-in encodings: |
|
902 <ul> |
|
903 <li>US-ASCII</li> |
|
904 <li>UTF-8</li> |
|
905 <li>UTF-16</li> |
|
906 <li>ISO-8859-1</li> |
|
907 </ul> |
|
908 Any other value will invoke a call to the UnknownEncodingHandler. |
|
909 </div> |
|
910 |
|
911 <pre class="fcndec" id="XML_ParserCreateNS"> |
|
912 XML_Parser XMLCALL |
|
913 XML_ParserCreateNS(const XML_Char *encoding, |
|
914 XML_Char sep); |
|
915 </pre> |
|
916 <div class="fcndef"> |
|
917 Constructs a new parser that has namespace processing in effect. Namespace |
|
918 expanded element names and attribute names are returned as a concatenation |
|
919 of the namespace URI, <em>sep</em>, and the local part of the name. This |
|
920 means that you should pick a character for <em>sep</em> that can't be |
|
921 part of a legal URI. There is a special case when <em>sep</em> is the null |
|
922 character <code>'\0'</code>: the namespace URI and the local part will be |
|
923 concatenated without any separator - this is intended to support RDF processors. |
|
924 It is a programming error to use the null separator with |
|
925 <a href= "#XML_SetReturnNSTriplet">namespace triplets</a>.</div> |
|
926 |
|
927 <pre class="fcndec" id="XML_ParserCreate_MM"> |
|
928 XML_Parser XMLCALL |
|
929 XML_ParserCreate_MM(const XML_Char *encoding, |
|
930 const XML_Memory_Handling_Suite *ms, |
|
931 const XML_Char *sep); |
|
932 </pre> |
|
933 <pre class="signature"> |
|
934 typedef struct { |
|
935 void *(XMLCALL *malloc_fcn)(size_t size); |
|
936 void *(XMLCALL *realloc_fcn)(void *ptr, size_t size); |
|
937 void (XMLCALL *free_fcn)(void *ptr); |
|
938 } XML_Memory_Handling_Suite; |
|
939 </pre> |
|
940 <div class="fcndef"> |
|
941 <p>Construct a new parser using the suite of memory handling functions |
|
942 specified in <code>ms</code>. If <code>ms</code> is NULL, then use the |
|
943 standard set of memory management functions. If <code>sep</code> is |
|
non-NULL, then namespace processing is enabled in the created parser
and the character pointed at by <code>sep</code> is used as the separator between
|
946 the namespace URI and the local part of the name.</p> |
|
947 </div> |
|
948 |
|
949 <pre class="fcndec" id="XML_ExternalEntityParserCreate"> |
|
950 XML_Parser XMLCALL |
|
951 XML_ExternalEntityParserCreate(XML_Parser p, |
|
952 const XML_Char *context, |
|
953 const XML_Char *encoding); |
|
954 </pre> |
|
955 <div class="fcndef"> |
|
Construct a new <code>XML_Parser</code> object for parsing an external
general entity. <code>context</code> is the context argument passed in a
call to an ExternalEntityRefHandler. Other state information such as
handlers, user data, and namespace processing is inherited from the
parser passed as the first argument. So you shouldn't need to call any of
the behavior-changing functions on this parser (unless you want it to act
differently than the parent parser).
|
963 </div> |
|
964 |
|
965 <pre class="fcndec" id="XML_ParserFree"> |
|
966 void XMLCALL |
|
967 XML_ParserFree(XML_Parser p); |
|
968 </pre> |
|
969 <div class="fcndef"> |
|
970 Free memory used by the parser. Your application is responsible for |
|
971 freeing any memory associated with <a href="#userdata">user data</a>. |
|
972 </div> |
|
973 |
|
974 <pre class="fcndec" id="XML_ParserReset"> |
|
975 XML_Bool XMLCALL |
|
976 XML_ParserReset(XML_Parser p, |
|
977 const XML_Char *encoding); |
|
978 </pre> |
|
979 <div class="fcndef"> |
|
980 Clean up the memory structures maintained by the parser so that it may |
|
be used again. After this has been called, <code>p</code> is
|
982 ready to start parsing a new document. All handlers are cleared from |
|
983 the parser, except for the unknownEncodingHandler. The parser's external |
|
984 state is re-initialized except for the values of ns and ns_triplets. |
|
985 This function may not be used on a parser created using <code><a href= |
|
986 "#XML_ExternalEntityParserCreate" >XML_ExternalEntityParserCreate</a |
|
987 ></code>; it will return <code>XML_FALSE</code> in that case. Returns |
|
988 <code>XML_TRUE</code> on success. Your application is responsible for |
|
989 dealing with any memory associated with <a href="#userdata">user data</a>. |
|
990 </div> |
|
991 |
|
992 <h3><a name="parsing">Parsing</a></h3> |
|
993 |
|
994 <p>To state the obvious: the three parsing functions <code><a href= |
|
995 "#XML_Parse" >XML_Parse</a></code>, <code><a href= "#XML_ParseBuffer"> |
|
996 XML_ParseBuffer</a></code> and <code><a href= "#XML_GetBuffer"> |
|
997 XML_GetBuffer</a></code> must not be called from within a handler |
|
998 unless they operate on a separate parser instance, that is, one that |
|
999 did not call the handler. For example, it is OK to call the parsing |
|
1000 functions from within an <code>XML_ExternalEntityRefHandler</code>, |
|
1001 if they apply to the parser created by |
|
1002 <code><a href= "#XML_ExternalEntityParserCreate" |
|
1003 >XML_ExternalEntityParserCreate</a></code>.</p> |
|
1004 |
|
1005 <p>Note: the <code>len</code> argument passed to these functions |
|
1006 should be considerably less than the maximum value for an integer, |
|
1007 as it could create an integer overflow situation if the added |
|
1008 lengths of a buffer and the unprocessed portion of the previous buffer |
|
1009 exceed the maximum integer value. Input data at the end of a buffer |
|
1010 will remain unprocessed if it is part of an XML token for which the |
|
1011 end is not part of that buffer.</p> |
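<p>One way to keep <code>len</code> small is to hand a large in-memory
document to the parser in bounded pieces. The sketch below shows the
chunking arithmetic only. So that it compiles without Expat,
<code>feed_fn</code> is a hypothetical stand-in for a call such as
<code>XML_Parse(p, s, len, isFinal)</code>; <code>CHUNK_MAX</code>,
<code>feed_in_chunks</code> and <code>tally_feed</code> are likewise
names invented here, and the chunk size is an arbitrary choice.</p>

```c
#include <stddef.h>

#define CHUNK_MAX 65536   /* arbitrary; far below the maximum int value */

/* Stand-in for the parsing call: returns non-zero on success. */
typedef int (*feed_fn)(void *ctx, const char *s, int len, int is_final);

/* Pass SIZE bytes at DATA to FEED in pieces of at most CHUNK_MAX bytes.
   The last piece is flagged as final.  Returns 1 if every piece was
   accepted, 0 as soon as one is rejected. */
int
feed_in_chunks(feed_fn feed, void *ctx, const char *data, size_t size)
{
    size_t off = 0;

    do {
        size_t left = size - off;
        int len = (int)(left > CHUNK_MAX ? CHUNK_MAX : left);
        int is_final = (off + (size_t)len == size);

        if (!feed(ctx, data + off, len, is_final))
            return 0;
        off += (size_t)len;
    } while (off < size);

    return 1;
}

/* Example callback that only tallies what it is given. */
struct tally { int calls; size_t bytes; int saw_final; };

int
tally_feed(void *ctx, const char *s, int len, int is_final)
{
    struct tally *t = (struct tally *)ctx;
    (void)s;
    t->calls++;
    t->bytes += (size_t)len;
    if (is_final)
        t->saw_final++;
    return 1;
}
```

<p>An empty document still results in one call, with a length of zero
and the final flag set.</p>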
|
1012 |
|
1013 <pre class="fcndec" id="XML_Parse"> |
|
1014 enum XML_Status XMLCALL |
|
1015 XML_Parse(XML_Parser p, |
|
1016 const char *s, |
|
1017 int len, |
|
1018 int isFinal); |
|
1019 </pre> |
|
<pre class="signature">
enum XML_Status {
  XML_STATUS_ERROR = 0,
  XML_STATUS_OK = 1,
  XML_STATUS_SUSPENDED = 2
};
</pre>
|
1026 <div class="fcndef"> |
|
Parse some more of the document. The string <code>s</code> is a buffer
containing part (or perhaps all) of the document. The number of bytes of
<code>s</code> that are part of the document is indicated by
<code>len</code>. This means that <code>s</code> doesn't have to be null
terminated. It also means that if <code>len</code> is larger than the
number of bytes in the block of memory that <code>s</code> points at, then
a memory fault is likely. The <code>isFinal</code> parameter informs the
parser that this is the last piece of the document. Frequently, the last
piece is empty (i.e. <code>len</code> is zero).
If a parse error occurred, it returns <code>XML_STATUS_ERROR</code>.
Otherwise it returns <code>XML_STATUS_OK</code>.
|
1038 </div> |
|
1039 |
|
1040 <pre class="fcndec" id="XML_ParseBuffer"> |
|
1041 enum XML_Status XMLCALL |
|
1042 XML_ParseBuffer(XML_Parser p, |
|
1043 int len, |
|
1044 int isFinal); |
|
1045 </pre> |
|
1046 <div class="fcndef"> |
|
1047 This is just like <code><a href= "#XML_Parse" >XML_Parse</a></code>, |
|
1048 except in this case Expat provides the buffer. By obtaining the |
|
1049 buffer from Expat with the <code><a href= "#XML_GetBuffer" |
|
1050 >XML_GetBuffer</a></code> function, the application can avoid double |
|
1051 copying of the input. |
|
1052 </div> |
|
1053 |
|
1054 <pre class="fcndec" id="XML_GetBuffer"> |
|
1055 void * XMLCALL |
|
1056 XML_GetBuffer(XML_Parser p, |
|
1057 int len); |
|
1058 </pre> |
|
1059 <div class="fcndef"> |
|
1060 Obtain a buffer of size <code>len</code> to read a piece of the document |
|
1061 into. A NULL value is returned if Expat can't allocate enough memory for |
|
1062 this buffer. This has to be called prior to every call to |
|
1063 <code><a href= "#XML_ParseBuffer" >XML_ParseBuffer</a></code>. A |
|
1064 typical use would look like this: |
|
1065 |
|
<pre class="eg">
for (;;) {
  int bytes_read;
  void *buff = XML_GetBuffer(p, BUFF_SIZE);
  if (buff == NULL) {
    /* handle error */
    break;
  }

  bytes_read = read(docfd, buff, BUFF_SIZE);
  if (bytes_read < 0) {
    /* handle error */
    break;
  }

  if (! XML_ParseBuffer(p, bytes_read, bytes_read == 0)) {
    /* handle parse error */
    break;
  }

  if (bytes_read == 0)
    break;
}
</pre>
|
1087 </div> |
|
1088 |
|
1089 <pre class="fcndec" id="XML_StopParser"> |
|
1090 enum XML_Status XMLCALL |
|
1091 XML_StopParser(XML_Parser p, |
|
1092 XML_Bool resumable); |
|
1093 </pre> |
|
1094 <div class="fcndef"> |
|
1095 |
|
1096 <p>Stops parsing, causing <code><a href= "#XML_Parse" |
|
1097 >XML_Parse</a></code> or <code><a href= "#XML_ParseBuffer" |
|
1098 >XML_ParseBuffer</a></code> to return. Must be called from within a |
|
1099 call-back handler, except when aborting (when <code>resumable</code> |
|
1100 is <code>XML_FALSE</code>) an already suspended parser. Some |
|
1101 call-backs may still follow because they would otherwise get |
|
1102 lost, including |
|
1103 <ul> |
|
1104 <li> the end element handler for empty elements when stopped in the |
|
1105 start element handler,</li> |
|
1106 <li> the end namespace declaration handler when stopped in the end |
|
1107 element handler,</li> |
|
1108 <li> the character data handler when stopped in the character data handler |
|
1109 while making multiple call-backs on a contiguous chunk of characters,</li> |
|
1110 </ul> |
|
1111 and possibly others.</p> |
|
1112 |
|
1113 <p>This can be called from most handlers, including DTD related |
|
1114 call-backs, except when parsing an external parameter entity and |
|
1115 <code>resumable</code> is <code>XML_TRUE</code>. Returns |
|
1116 <code>XML_STATUS_OK</code> when successful, |
|
1117 <code>XML_STATUS_ERROR</code> otherwise. The possible error codes |
|
1118 are:</p> |
|
1119 <dl> |
|
1120 <dt><code>XML_ERROR_SUSPENDED</code></dt> |
|
1121 <dd>when suspending an already suspended parser.</dd> |
|
1122 <dt><code>XML_ERROR_FINISHED</code></dt> |
|
1123 <dd>when the parser has already finished.</dd> |
|
1124 <dt><code>XML_ERROR_SUSPEND_PE</code></dt> |
|
1125 <dd>when suspending while parsing an external PE.</dd> |
|
1126 </dl> |
|
1127 |
|
1128 <p>Since the stop/resume feature requires application support in the |
|
1129 outer parsing loop, it is an error to call this function for a parser |
|
1130 not being handled appropriately; see <a href= "#stop-resume" |
|
1131 >Temporarily Stopping Parsing</a> for more information.</p> |
|
1132 |
|
1133 <p>When <code>resumable</code> is <code>XML_TRUE</code> then parsing |
|
1134 is <em>suspended</em>, that is, <code><a href= "#XML_Parse" |
|
1135 >XML_Parse</a></code> and <code><a href= "#XML_ParseBuffer" |
|
1136 >XML_ParseBuffer</a></code> return <code>XML_STATUS_SUSPENDED</code>. |
|
1137 Otherwise, parsing is <em>aborted</em>, that is, <code><a href= |
|
1138 "#XML_Parse" >XML_Parse</a></code> and <code><a href= |
|
1139 "#XML_ParseBuffer" >XML_ParseBuffer</a></code> return |
|
1140 <code>XML_STATUS_ERROR</code> with error code |
|
1141 <code>XML_ERROR_ABORTED</code>.</p> |
|
1142 |
|
1143 <p><strong>Note:</strong> |
|
1144 This will be applied to the current parser instance only, that is, if |
|
1145 there is a parent parser then it will continue parsing when the |
|
1146 external entity reference handler returns. It is up to the |
|
1147 implementation of that handler to call <code><a href= |
|
1148 "#XML_StopParser" >XML_StopParser</a></code> on the parent parser |
|
1149 (recursively), if one wants to stop parsing altogether.</p> |
|
1150 |
|
1151 <p>When suspended, parsing can be resumed by calling <code><a href= |
|
1152 "#XML_ResumeParser" >XML_ResumeParser</a></code>.</p> |
|
1153 |
|
1154 <p>New in Expat 1.95.8.</p> |
|
1155 </div> |
|
1156 |
|
1157 <pre class="fcndec" id="XML_ResumeParser"> |
|
1158 enum XML_Status XMLCALL |
|
1159 XML_ResumeParser(XML_Parser p); |
|
1160 </pre> |
|
1161 <div class="fcndef"> |
|
1162 <p>Resumes parsing after it has been suspended with <code><a href= |
|
1163 "#XML_StopParser" >XML_StopParser</a></code>. Must not be called from |
|
within a handler call-back. Returns the same status codes as <code><a
|
1165 href= "#XML_Parse">XML_Parse</a></code> or <code><a href= |
|
1166 "#XML_ParseBuffer" >XML_ParseBuffer</a></code>. An additional error |
|
1167 code, <code>XML_ERROR_NOT_SUSPENDED</code>, will be returned if the |
|
1168 parser was not currently suspended.</p> |
|
1169 |
|
1170 <p><strong>Note:</strong> |
|
1171 This must be called on the most deeply nested child parser instance |
|
1172 first, and on its parent parser only after the child parser has |
|
1173 finished, to be applied recursively until the document entity's parser |
|
1174 is restarted. That is, the parent parser will not resume by itself |
|
1175 and it is up to the application to call <code><a href= |
|
1176 "#XML_ResumeParser" >XML_ResumeParser</a></code> on it at the |
|
1177 appropriate moment.</p> |
|
1178 |
|
1179 <p>New in Expat 1.95.8.</p> |
|
1180 </div> |
|
1181 |
|
1182 <pre class="fcndec" id="XML_GetParsingStatus"> |
|
1183 void XMLCALL |
|
1184 XML_GetParsingStatus(XML_Parser p, |
|
1185 XML_ParsingStatus *status); |
|
1186 </pre> |
|
1187 <pre class="signature"> |
|
1188 enum XML_Parsing { |
|
1189 XML_INITIALIZED, |
|
1190 XML_PARSING, |
|
1191 XML_FINISHED, |
|
1192 XML_SUSPENDED |
|
1193 }; |
|
1194 |
|
1195 typedef struct { |
|
1196 enum XML_Parsing parsing; |
|
1197 XML_Bool finalBuffer; |
|
1198 } XML_ParsingStatus; |
|
1199 </pre> |
|
1200 <div class="fcndef"> |
|
1201 <p>Returns status of parser with respect to being initialized, |
|
1202 parsing, finished, or suspended, and whether the final buffer is being |
|
1203 processed. The <code>status</code> parameter <em>must not</em> be |
|
1204 NULL.</p> |
|
1205 |
|
1206 <p>New in Expat 1.95.8.</p> |
|
1207 </div> |
|
1208 |
|
1209 |
|
1210 <h3><a name="setting">Handler Setting</a></h3> |
|
1211 |
|
1212 <p>Although handlers are typically set prior to parsing and left alone, an |
|
1213 application may choose to set or change the handler for a parsing event |
|
1214 while the parse is in progress. For instance, your application may choose |
|
1215 to ignore all text not descended from a <code>para</code> element. One |
|
1216 way it could do this is to set the character handler when a para start tag |
|
1217 is seen, and unset it for the corresponding end tag.</p> |
|
1218 |
|
1219 <p>A handler may be <em>unset</em> by providing a NULL pointer to the |
|
1220 appropriate handler setter. None of the handler setting functions have |
|
1221 a return value.</p> |
|
1222 |
|
1223 <p>Your handlers will be receiving strings in arrays of type |
|
1224 <code>XML_Char</code>. This type is conditionally defined in expat.h as |
|
1225 either <code>char</code>, <code>wchar_t</code> or <code>unsigned short</code>. |
|
The first implies UTF-8 encoding, the latter two imply UTF-16 encoding.
|
1227 Note that you'll receive them in this form independent of the original |
|
1228 encoding of the document.</p> |
|
1229 |
|
1230 <div class="handler"> |
|
1231 <pre class="setter" id="XML_SetStartElementHandler"> |
|
1232 void XMLCALL |
|
1233 XML_SetStartElementHandler(XML_Parser p, |
|
1234 XML_StartElementHandler start); |
|
1235 </pre> |
|
1236 <pre class="signature"> |
|
1237 typedef void |
|
1238 (XMLCALL *XML_StartElementHandler)(void *userData, |
|
1239 const XML_Char *name, |
|
1240 const XML_Char **atts); |
|
1241 </pre> |
|
1242 <p>Set handler for start (and empty) tags. Attributes are passed to the start |
|
1243 handler as a pointer to a vector of char pointers. Each attribute seen in |
|
1244 a start (or empty) tag occupies 2 consecutive places in this vector: the |
|
1245 attribute name followed by the attribute value. These pairs are terminated |
|
1246 by a null pointer.</p> |
|
1247 <p>Note that an empty tag generates a call to both start and end handlers |
|
1248 (in that order).</p> |
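<p>Walking the attribute vector two entries at a time might look like
the following sketch. It assumes a UTF-8 build, where
<code>XML_Char</code> is <code>char</code>; <code>find_attr</code> is a
helper made up for this example, not an Expat function.</p>

```c
#include <stddef.h>
#include <string.h>

/* Return the value of attribute NAME in the NULL-terminated name/value
   vector ATTS, or NULL when the attribute is absent. */
const char *
find_attr(const char **atts, const char *name)
{
    size_t i;

    /* atts[i] is a name, atts[i + 1] its value; a NULL name ends the list */
    for (i = 0; atts[i] != NULL; i += 2) {
        if (strcmp(atts[i], name) == 0)
            return atts[i + 1];
    }
    return NULL;
}
```

<p>A start handler could call it as <code>find_attr(atts, "id")</code>
and treat a NULL result as "attribute not given".</p>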
|
1249 </div> |
|
1250 |
|
1251 <div class="handler"> |
|
<pre class="setter" id="XML_SetEndElementHandler">
void XMLCALL
XML_SetEndElementHandler(XML_Parser p,
                         XML_EndElementHandler end);
</pre>
|
1257 <pre class="signature"> |
|
1258 typedef void |
|
1259 (XMLCALL *XML_EndElementHandler)(void *userData, |
|
1260 const XML_Char *name); |
|
1261 </pre> |
|
1262 <p>Set handler for end (and empty) tags. As noted above, an empty tag |
|
1263 generates a call to both start and end handlers.</p> |
|
1264 </div> |
|
1265 |
|
1266 <div class="handler"> |
|
1267 <pre class="setter" id="XML_SetElementHandler"> |
|
1268 void XMLCALL |
|
1269 XML_SetElementHandler(XML_Parser p, |
|
1270 XML_StartElementHandler start, |
|
1271 XML_EndElementHandler end); |
|
1272 </pre> |
|
1273 <p>Set handlers for start and end tags with one call.</p> |
|
1274 </div> |
|
1275 |
|
1276 <div class="handler"> |
|
<pre class="setter" id="XML_SetCharacterDataHandler">
void XMLCALL
XML_SetCharacterDataHandler(XML_Parser p,
                            XML_CharacterDataHandler charhndl);
</pre>
|
1282 <pre class="signature"> |
|
1283 typedef void |
|
1284 (XMLCALL *XML_CharacterDataHandler)(void *userData, |
|
1285 const XML_Char *s, |
|
1286 int len); |
|
1287 </pre> |
|
1288 <p>Set a text handler. The string your handler receives |
|
1289 is <em>NOT nul-terminated</em>. You have to use the length argument |
|
1290 to deal with the end of the string. A single block of contiguous text |
|
1291 free of markup may still result in a sequence of calls to this handler. |
|
1292 In other words, if you're searching for a pattern in the text, it may |
|
1293 be split across calls to this handler. Note: Setting this handler to NULL |
|
1294 may <em>NOT immediately</em> terminate call-backs if the parser is currently |
|
1295 processing such a single block of contiguous markup-free text, as the parser |
|
1296 will continue calling back until the end of the block is reached.</p> |
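<p>Because the text may arrive in several pieces, and none of them is
nul-terminated, a handler usually collects it in a buffer of its own.
The sketch below shows one way to do that. It assumes a UTF-8 build
(<code>XML_Char</code> is <code>char</code>) and omits the
<code>XMLCALL</code> decoration so that it compiles without Expat;
<code>struct text_buf</code> and <code>append_char_data</code> are
hypothetical names.</p>

```c
#include <stdlib.h>
#include <string.h>

/* Growable text buffer, passed to the handler as user data. */
struct text_buf {
    char *data;     /* nul-terminated once anything has been appended */
    size_t len;     /* bytes stored, not counting the terminator */
    size_t cap;     /* bytes allocated */
    int failed;     /* set when an allocation fails */
};

/* Shape of a character data handler: append LEN bytes from S. */
void
append_char_data(void *userData, const char *s, int len)
{
    struct text_buf *buf = (struct text_buf *)userData;
    size_t need;

    if (buf->failed || len <= 0)
        return;

    need = buf->len + (size_t)len + 1;       /* +1 for our own terminator */
    if (need > buf->cap) {
        size_t cap = buf->cap ? buf->cap : 64;
        char *grown;
        while (cap < need)
            cap *= 2;
        grown = (char *)realloc(buf->data, cap);
        if (grown == NULL) {
            buf->failed = 1;
            return;
        }
        buf->data = grown;
        buf->cap = cap;
    }
    memcpy(buf->data + buf->len, s, (size_t)len);
    buf->len += (size_t)len;
    buf->data[buf->len] = '\0';
}
```

<p>Searching for a pattern then happens on the collected buffer, for
instance in the end element handler, rather than on each individual
piece.</p>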
|
1297 </div> |
|
1298 |
|
1299 <div class="handler"> |
|
<pre class="setter" id="XML_SetProcessingInstructionHandler">
void XMLCALL
XML_SetProcessingInstructionHandler(XML_Parser p,
                                    XML_ProcessingInstructionHandler proc);
</pre>
|
1305 <pre class="signature"> |
|
1306 typedef void |
|
1307 (XMLCALL *XML_ProcessingInstructionHandler)(void *userData, |
|
1308 const XML_Char *target, |
|
1309 const XML_Char *data); |
|
1310 |
|
1311 </pre> |
|
1312 <p>Set a handler for processing instructions. The target is the first word |
|
1313 in the processing instruction. The data is the rest of the characters in |
|
1314 it after skipping all whitespace after the initial word.</p> |
|
1315 </div> |
|
1316 |
|
1317 <div class="handler"> |
|
<pre class="setter" id="XML_SetCommentHandler">
void XMLCALL
XML_SetCommentHandler(XML_Parser p,
                      XML_CommentHandler cmnt);
</pre>
|
1323 <pre class="signature"> |
|
1324 typedef void |
|
1325 (XMLCALL *XML_CommentHandler)(void *userData, |
|
1326 const XML_Char *data); |
|
1327 </pre> |
|
1328 <p>Set a handler for comments. The data is all text inside the comment |
|
1329 delimiters.</p> |
|
1330 </div> |
|
1331 |
|
1332 <div class="handler"> |
|
1333 <pre class="setter" id="XML_SetStartCdataSectionHandler"> |
|
1334 void XMLCALL |
|
1335 XML_SetStartCdataSectionHandler(XML_Parser p, |
|
1336 XML_StartCdataSectionHandler start); |
|
1337 </pre> |
|
1338 <pre class="signature"> |
|
1339 typedef void |
|
1340 (XMLCALL *XML_StartCdataSectionHandler)(void *userData); |
|
1341 </pre> |
|
1342 <p>Set a handler that gets called at the beginning of a CDATA section.</p> |
|
1343 </div> |
|
1344 |
|
1345 <div class="handler"> |
|
1346 <pre class="setter" id="XML_SetEndCdataSectionHandler"> |
|
1347 void XMLCALL |
|
1348 XML_SetEndCdataSectionHandler(XML_Parser p, |
|
1349 XML_EndCdataSectionHandler end); |
|
1350 </pre> |
|
1351 <pre class="signature"> |
|
1352 typedef void |
|
1353 (XMLCALL *XML_EndCdataSectionHandler)(void *userData); |
|
1354 </pre> |
|
1355 <p>Set a handler that gets called at the end of a CDATA section.</p> |
|
1356 </div> |
|
1357 |
|
1358 <div class="handler"> |
|
<pre class="setter" id="XML_SetCdataSectionHandler">
void XMLCALL
XML_SetCdataSectionHandler(XML_Parser p,
                           XML_StartCdataSectionHandler start,
                           XML_EndCdataSectionHandler end);
</pre>
|
1365 <p>Sets both CDATA section handlers with one call.</p> |
|
1366 </div> |
|
1367 |
|
1368 <div class="handler"> |
|
<pre class="setter" id="XML_SetDefaultHandler">
void XMLCALL
XML_SetDefaultHandler(XML_Parser p,
                      XML_DefaultHandler hndl);
</pre>
|
1374 <pre class="signature"> |
|
1375 typedef void |
|
1376 (XMLCALL *XML_DefaultHandler)(void *userData, |
|
1377 const XML_Char *s, |
|
1378 int len); |
|
1379 </pre> |
|
1380 |
|
1381 <p>Sets a handler for any characters in the document which wouldn't |
|
1382 otherwise be handled. This includes both data for which no handlers |
|
1383 can be set (like some kinds of DTD declarations) and data which could |
|
1384 be reported but which currently has no handler set. The characters |
|
1385 are passed exactly as they were present in the XML document except |
|
1386 that they will be encoded in UTF-8 or UTF-16. Line boundaries are not |
|
1387 normalized. Note that a byte order mark character is not passed to the |
|
1388 default handler. There are no guarantees about how characters are |
|
1389 divided between calls to the default handler: for example, a comment |
|
1390 might be split between multiple calls. Setting the handler with |
|
1391 this call has the side effect of turning off expansion of references |
|
1392 to internally defined general entities. Instead these references are |
|
1393 passed to the default handler.</p> |
|
1394 |
|
1395 <p>See also <code><a |
|
1396 href="#XML_DefaultCurrent">XML_DefaultCurrent</a></code>.</p> |
|
1397 </div> |
|
1398 |
|
1399 <div class="handler"> |
|
1400 <pre class="setter" id="XML_SetDefaultHandlerExpand"> |
|
1401 void XMLCALL |
|
1402 XML_SetDefaultHandlerExpand(XML_Parser p, |
|
1403 XML_DefaultHandler hndl) |
|
1404 </pre> |
|
1405 <pre class="signature"> |
|
1406 typedef void |
|
1407 (XMLCALL *XML_DefaultHandler)(void *userData, |
|
1408 const XML_Char *s, |
|
1409 int len); |
|
1410 </pre> |
|
1411 <p>This sets a default handler, but doesn't inhibit the expansion of |
|
1412 internal entity references. The entity reference will not be passed |
|
1413 to the default handler.</p> |
|
1414 |
|
1415 <p>See also <code><a |
|
1416 href="#XML_DefaultCurrent">XML_DefaultCurrent</a></code>.</p> |
|
1417 </div> |
|
1418 |
|
1419 <div class="handler"> |
|
1420 <pre class="setter" id="XML_SetExternalEntityRefHandler"> |
|
1421 void XMLCALL |
|
1422 XML_SetExternalEntityRefHandler(XML_Parser p, |
|
1423 XML_ExternalEntityRefHandler hndl) |
|
1424 </pre> |
|
1425 <pre class="signature"> |
|
1426 typedef int |
|
1427 (XMLCALL *XML_ExternalEntityRefHandler)(XML_Parser p, |
|
1428 const XML_Char *context, |
|
1429 const XML_Char *base, |
|
1430 const XML_Char *systemId, |
|
1431 const XML_Char *publicId); |
|
1432 </pre> |
|
1433 <p>Set an external entity reference handler. This handler is also |
|
1434 called for processing an external DTD subset if parameter entity parsing |
|
1435 is in effect. (See <a href="#XML_SetParamEntityParsing"> |
|
1436 <code>XML_SetParamEntityParsing</code></a>.)</p> |
|
1437 |
|
<p>The <code>context</code> parameter specifies the parsing context in
the format expected by the <code>context</code> argument to <code><a
href="#XML_ExternalEntityParserCreate"
>XML_ExternalEntityParserCreate</a></code>. <code>context</code> is
valid only until the handler returns, so if the referenced entity is
to be parsed later, it must be copied. <code>context</code> is NULL
only when the entity is a parameter entity, which is how one can
differentiate between general and parameter entities.</p>
|
1446 |
|
1447 <p>The <code>base</code> parameter is the base to use for relative |
|
1448 system identifiers. It is set by <code><a |
|
1449 href="#XML_SetBase">XML_SetBase</a></code> and may be NULL. The |
|
1450 <code>publicId</code> parameter is the public id given in the entity |
|
1451 declaration and may be NULL. <code>systemId</code> is the system |
|
1452 identifier specified in the entity declaration and is never NULL.</p> |
|
1453 |
|
1454 <p>There are a couple of ways in which this handler differs from |
|
1455 others. First, this handler returns a status indicator (an |
|
1456 integer). <code>XML_STATUS_OK</code> should be returned for successful |
|
1457 handling of the external entity reference. Returning |
|
1458 <code>XML_STATUS_ERROR</code> indicates failure, and causes the |
|
1459 calling parser to return an |
|
1460 <code>XML_ERROR_EXTERNAL_ENTITY_HANDLING</code> error.</p> |
|
1461 |
|
1462 <p>Second, instead of having the user data as its first argument, it |
|
1463 receives the parser that encountered the entity reference. This, along |
|
1464 with the context parameter, may be used as arguments to a call to |
|
1465 <code><a href= "#XML_ExternalEntityParserCreate" |
|
1466 >XML_ExternalEntityParserCreate</a></code>. Using the returned |
|
1467 parser, the body of the external entity can be recursively parsed.</p> |
|
1468 |
|
<p>Since this handler may be called recursively, it should not save
information in global or static variables.</p>
|
1471 </div> |
|
1472 |
|
1473 <pre class="fcndec" id="XML_SetExternalEntityRefHandlerArg"> |
|
1474 void XMLCALL |
|
1475 XML_SetExternalEntityRefHandlerArg(XML_Parser p, |
|
1476 void *arg) |
|
1477 </pre> |
|
1478 <div class="fcndef"> |
|
1479 <p>Set the argument passed to the ExternalEntityRefHandler. If |
|
1480 <code>arg</code> is not NULL, it is the new value passed to the |
|
1481 handler set using <code><a href="#XML_SetExternalEntityRefHandler" |
|
1482 >XML_SetExternalEntityRefHandler</a></code>; if <code>arg</code> is |
|
1483 NULL, the argument passed to the handler function will be the parser |
|
1484 object itself.</p> |
|
1485 |
|
1486 <p><strong>Note:</strong> |
|
1487 The type of <code>arg</code> and the type of the first argument to the |
|
1488 ExternalEntityRefHandler do not match. This function takes a |
|
1489 <code>void *</code> to be passed to the handler, while the handler |
|
1490 accepts an <code>XML_Parser</code>. This is a historical accident, |
|
1491 but will not be corrected before Expat 2.0 (at the earliest) to avoid |
|
1492 causing compiler warnings for code that's known to work with this |
|
1493 API. It is the responsibility of the application code to know the |
|
1494 actual type of the argument passed to the handler and to manage it |
|
1495 properly.</p> |
|
1496 </div> |
|
1497 |
|
1498 <div class="handler"> |
|
1499 <pre class="setter" id="XML_SetSkippedEntityHandler"> |
|
1500 void XMLCALL |
|
1501 XML_SetSkippedEntityHandler(XML_Parser p, |
|
1502 XML_SkippedEntityHandler handler) |
|
1503 </pre> |
|
1504 <pre class="signature"> |
|
1505 typedef void |
|
1506 (XMLCALL *XML_SkippedEntityHandler)(void *userData, |
|
1507 const XML_Char *entityName, |
|
1508 int is_parameter_entity); |
|
1509 </pre> |
|
1510 <p>Set a skipped entity handler. This is called in two situations:</p> |
|
1511 <ol> |
|
1512 <li>An entity reference is encountered for which no declaration |
|
1513 has been read <em>and</em> this is not an error.</li> |
|
1514 <li>An internal entity reference is read, but not expanded, because |
|
1515 <a href="#XML_SetDefaultHandler"><code>XML_SetDefaultHandler</code></a> |
|
1516 has been called.</li> |
|
1517 </ol> |
|
<p>The <code>is_parameter_entity</code> argument will be non-zero for
a parameter entity and zero for a general entity.</p>

<p>Note: skipped parameter entities in declarations and skipped
general entities in attribute values cannot be reported, because the
event would be out of sync with the reporting of the declarations or
attribute values.</p>
|
1523 </div> |
|
1524 |
|
1525 <div class="handler"> |
|
1526 <pre class="setter" id="XML_SetUnknownEncodingHandler"> |
|
1527 void XMLCALL |
|
1528 XML_SetUnknownEncodingHandler(XML_Parser p, |
|
1529 XML_UnknownEncodingHandler enchandler, |
|
1530 void *encodingHandlerData) |
|
1531 </pre> |
|
1532 <pre class="signature"> |
|
1533 typedef int |
|
1534 (XMLCALL *XML_UnknownEncodingHandler)(void *encodingHandlerData, |
|
1535 const XML_Char *name, |
|
1536 XML_Encoding *info); |
|
1537 |
|
1538 typedef struct { |
|
1539 int map[256]; |
|
1540 void *data; |
|
1541 int (XMLCALL *convert)(void *data, const char *s); |
|
1542 void (XMLCALL *release)(void *data); |
|
1543 } XML_Encoding; |
|
1544 </pre> |
|
<p>Set a handler to deal with encodings other than the <a
href="#builtin_encodings">built in set</a>. This should be done before
<code><a href= "#XML_Parse" >XML_Parse</a></code> or <code><a href=
"#XML_ParseBuffer" >XML_ParseBuffer</a></code> has been called on the
given parser.</p>

<p>If the handler knows how to deal with an encoding
with the given name, it should fill in the <code>info</code> data
structure and return <code>XML_STATUS_OK</code>. Otherwise it
should return <code>XML_STATUS_ERROR</code>. The handler will be called
at most once per parsed (external) entity. The optional application
data pointer <code>encodingHandlerData</code> will be passed back to
the handler.</p>
|
1556 |
|
<p>The <code>map</code> array contains information for every possible leading
byte in a byte sequence. If the corresponding value is &gt;= 0, then it's
a single byte sequence and the byte encodes that Unicode value. If the
value is -1, then that byte is invalid as the initial byte in a sequence.
If the value is -n, where n is an integer &gt; 1, then n is the number of
bytes in the sequence and the actual conversion is accomplished by a
call to the function pointed at by <code>convert</code>. This function may return -1
if the sequence itself is invalid. The <code>convert</code> pointer may be NULL if
there are only single byte codes. The <code>data</code> parameter passed to the convert
function is the data pointer from <code>XML_Encoding</code>. The
string <code>s</code> is <em>NOT</em> nul-terminated and points at the sequence of
bytes to be converted.</p>
|
1569 |
|
1570 <p>The function pointed at by <code>release</code> is called by the |
|
1571 parser when it is finished with the encoding. It may be NULL.</p> |
|
1572 </div> |
|
1573 |
|
1574 <div class="handler"> |
|
1575 <pre class="setter" id="XML_SetStartNamespaceDeclHandler"> |
|
1576 void XMLCALL |
|
1577 XML_SetStartNamespaceDeclHandler(XML_Parser p, |
|
1578 XML_StartNamespaceDeclHandler start); |
|
1579 </pre> |
|
1580 <pre class="signature"> |
|
1581 typedef void |
|
1582 (XMLCALL *XML_StartNamespaceDeclHandler)(void *userData, |
|
1583 const XML_Char *prefix, |
|
1584 const XML_Char *uri); |
|
1585 </pre> |
|
<p>Set a handler to be called when a namespace is declared. Namespace
declarations occur inside start tags, but the namespace declaration start
handler is called before the start tag handler, once for each namespace
declared in that start tag.</p>
|
1590 </div> |
|
1591 |
|
1592 <div class="handler"> |
|
1593 <pre class="setter" id="XML_SetEndNamespaceDeclHandler"> |
|
1594 void XMLCALL |
|
1595 XML_SetEndNamespaceDeclHandler(XML_Parser p, |
|
1596 XML_EndNamespaceDeclHandler end); |
|
1597 </pre> |
|
1598 <pre class="signature"> |
|
1599 typedef void |
|
1600 (XMLCALL *XML_EndNamespaceDeclHandler)(void *userData, |
|
1601 const XML_Char *prefix); |
|
1602 </pre> |
|
1603 <p>Set a handler to be called when leaving the scope of a namespace |
|
1604 declaration. This will be called, for each namespace declaration, |
|
1605 after the handler for the end tag of the element in which the |
|
1606 namespace was declared.</p> |
|
1607 </div> |
|
1608 |
|
1609 <div class="handler"> |
|
1610 <pre class="setter" id="XML_SetNamespaceDeclHandler"> |
|
1611 void XMLCALL |
|
1612 XML_SetNamespaceDeclHandler(XML_Parser p, |
|
1613 XML_StartNamespaceDeclHandler start, |
|
1614 XML_EndNamespaceDeclHandler end) |
|
1615 </pre> |
|
1616 <p>Sets both namespace declaration handlers with a single call.</p> |
|
1617 </div> |
|
1618 |
|
1619 <div class="handler"> |
|
1620 <pre class="setter" id="XML_SetXmlDeclHandler"> |
|
1621 void XMLCALL |
|
1622 XML_SetXmlDeclHandler(XML_Parser p, |
|
1623 XML_XmlDeclHandler xmldecl); |
|
1624 </pre> |
|
1625 <pre class="signature"> |
|
1626 typedef void |
|
1627 (XMLCALL *XML_XmlDeclHandler)(void *userData, |
|
1628 const XML_Char *version, |
|
1629 const XML_Char *encoding, |
|
1630 int standalone); |
|
1631 </pre> |
|
1632 <p>Sets a handler that is called for XML declarations and also for |
|
1633 text declarations discovered in external entities. The way to |
|
1634 distinguish is that the <code>version</code> parameter will be NULL |
|
1635 for text declarations. The <code>encoding</code> parameter may be NULL |
|
1636 for an XML declaration. The <code>standalone</code> argument will |
|
1637 contain -1, 0, or 1 indicating respectively that there was no |
|
1638 standalone parameter in the declaration, that it was given as no, or |
|
1639 that it was given as yes.</p> |
|
1640 </div> |
|
1641 |
|
1642 <div class="handler"> |
|
1643 <pre class="setter" id="XML_SetStartDoctypeDeclHandler"> |
|
1644 void XMLCALL |
|
1645 XML_SetStartDoctypeDeclHandler(XML_Parser p, |
|
1646 XML_StartDoctypeDeclHandler start); |
|
1647 </pre> |
|
1648 <pre class="signature"> |
|
1649 typedef void |
|
1650 (XMLCALL *XML_StartDoctypeDeclHandler)(void *userData, |
|
1651 const XML_Char *doctypeName, |
|
1652 const XML_Char *sysid, |
|
1653 const XML_Char *pubid, |
|
1654 int has_internal_subset); |
|
1655 </pre> |
|
1656 <p>Set a handler that is called at the start of a DOCTYPE declaration, |
|
1657 before any external or internal subset is parsed. Both <code>sysid</code> |
|
1658 and <code>pubid</code> may be NULL. The <code>has_internal_subset</code> |
|
1659 will be non-zero if the DOCTYPE declaration has an internal subset.</p> |
|
1660 </div> |
|
1661 |
|
1662 <div class="handler"> |
|
1663 <pre class="setter" id="XML_SetEndDoctypeDeclHandler"> |
|
1664 void XMLCALL |
|
1665 XML_SetEndDoctypeDeclHandler(XML_Parser p, |
|
1666 XML_EndDoctypeDeclHandler end); |
|
1667 </pre> |
|
1668 <pre class="signature"> |
|
1669 typedef void |
|
1670 (XMLCALL *XML_EndDoctypeDeclHandler)(void *userData); |
|
1671 </pre> |
|
1672 <p>Set a handler that is called at the end of a DOCTYPE declaration, |
|
1673 after parsing any external subset.</p> |
|
1674 </div> |
|
1675 |
|
1676 <div class="handler"> |
|
1677 <pre class="setter" id="XML_SetDoctypeDeclHandler"> |
|
1678 void XMLCALL |
|
1679 XML_SetDoctypeDeclHandler(XML_Parser p, |
|
1680 XML_StartDoctypeDeclHandler start, |
|
1681 XML_EndDoctypeDeclHandler end); |
|
1682 </pre> |
|
1683 <p>Set both doctype handlers with one call.</p> |
|
1684 </div> |
|
1685 |
|
1686 <div class="handler"> |
|
1687 <pre class="setter" id="XML_SetElementDeclHandler"> |
|
1688 void XMLCALL |
|
1689 XML_SetElementDeclHandler(XML_Parser p, |
|
1690 XML_ElementDeclHandler eldecl); |
|
1691 </pre> |
|
1692 <pre class="signature"> |
|
1693 typedef void |
|
1694 (XMLCALL *XML_ElementDeclHandler)(void *userData, |
|
1695 const XML_Char *name, |
|
1696 XML_Content *model); |
|
1697 </pre> |
|
1698 <pre class="signature"> |
|
1699 enum XML_Content_Type { |
|
1700 XML_CTYPE_EMPTY = 1, |
|
1701 XML_CTYPE_ANY, |
|
1702 XML_CTYPE_MIXED, |
|
1703 XML_CTYPE_NAME, |
|
1704 XML_CTYPE_CHOICE, |
|
1705 XML_CTYPE_SEQ |
|
1706 }; |
|
1707 |
|
1708 enum XML_Content_Quant { |
|
1709 XML_CQUANT_NONE, |
|
1710 XML_CQUANT_OPT, |
|
1711 XML_CQUANT_REP, |
|
1712 XML_CQUANT_PLUS |
|
1713 }; |
|
1714 |
|
1715 typedef struct XML_cp XML_Content; |
|
1716 |
|
1717 struct XML_cp { |
|
1718 enum XML_Content_Type type; |
|
1719 enum XML_Content_Quant quant; |
|
1720 const XML_Char * name; |
|
1721 unsigned int numchildren; |
|
1722 XML_Content * children; |
|
1723 }; |
|
1724 </pre> |
|
1725 <p>Sets a handler for element declarations in a DTD. The handler gets |
|
1726 called with the name of the element in the declaration and a pointer |
|
1727 to a structure that contains the element model. It is the |
|
1728 application's responsibility to free this data structure using |
|
1729 <code><a href="#XML_FreeContentModel" |
|
1730 >XML_FreeContentModel</a></code>.</p> |
|
1731 |
|
1732 <p>The <code>model</code> argument is the root of a tree of |
|
1733 <code>XML_Content</code> nodes. If <code>type</code> equals |
|
1734 <code>XML_CTYPE_EMPTY</code> or <code>XML_CTYPE_ANY</code>, then |
|
1735 <code>quant</code> will be <code>XML_CQUANT_NONE</code>, and the other |
|
1736 fields will be zero or NULL. If <code>type</code> is |
|
1737 <code>XML_CTYPE_MIXED</code>, then <code>quant</code> will be |
|
1738 <code>XML_CQUANT_NONE</code> or <code>XML_CQUANT_REP</code> and |
|
1739 <code>numchildren</code> will contain the number of elements that are |
|
1740 allowed to be mixed in and <code>children</code> points to an array of |
|
1741 <code>XML_Content</code> structures that will all have type |
|
<code>XML_CTYPE_NAME</code> with no quantification. Only the root node can be type
|
1743 <code>XML_CTYPE_EMPTY</code>, <code>XML_CTYPE_ANY</code>, or |
|
1744 <code>XML_CTYPE_MIXED</code>.</p> |
|
1745 |
|
1746 <p>For type <code>XML_CTYPE_NAME</code>, the <code>name</code> field |
|
1747 points to the name and the <code>numchildren</code> and |
|
1748 <code>children</code> fields will be zero and NULL. The |
|
1749 <code>quant</code> field will indicate any quantifiers placed on the |
|
1750 name.</p> |
|
1751 |
|
<p>Types <code>XML_CTYPE_CHOICE</code> and <code>XML_CTYPE_SEQ</code>
indicate a choice or sequence respectively. The
<code>numchildren</code> field indicates how many nodes are in the choice
or sequence and <code>children</code> points to the nodes.</p>
|
1756 </div> |
|
1757 |
|
1758 <div class="handler"> |
|
1759 <pre class="setter" id="XML_SetAttlistDeclHandler"> |
|
1760 void XMLCALL |
|
1761 XML_SetAttlistDeclHandler(XML_Parser p, |
|
1762 XML_AttlistDeclHandler attdecl); |
|
1763 </pre> |
|
1764 <pre class="signature"> |
|
1765 typedef void |
|
1766 (XMLCALL *XML_AttlistDeclHandler)(void *userData, |
|
1767 const XML_Char *elname, |
|
1768 const XML_Char *attname, |
|
1769 const XML_Char *att_type, |
|
1770 const XML_Char *dflt, |
|
1771 int isrequired); |
|
1772 </pre> |
|
1773 <p>Set a handler for attlist declarations in the DTD. This handler is |
|
1774 called for <em>each</em> attribute. So a single attlist declaration |
|
1775 with multiple attributes declared will generate multiple calls to this |
|
1776 handler. The <code>elname</code> parameter returns the name of the |
|
1777 element for which the attribute is being declared. The attribute name |
|
1778 is in the <code>attname</code> parameter. The attribute type is in the |
|
1779 <code>att_type</code> parameter. It is the string representing the |
|
1780 type in the declaration with whitespace removed.</p> |
|
1781 |
|
1782 <p>The <code>dflt</code> parameter holds the default value. It will be |
|
1783 NULL in the case of "#IMPLIED" or "#REQUIRED" attributes. You can |
|
1784 distinguish these two cases by checking the <code>isrequired</code> |
|
1785 parameter, which will be true in the case of "#REQUIRED" attributes. |
|
Attributes which are "#FIXED" will also have a true
|
1787 <code>isrequired</code>, but they will have the non-NULL fixed value |
|
1788 in the <code>dflt</code> parameter.</p> |
|
1789 </div> |
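<p>The four combinations of <code>dflt</code> and
<code>isrequired</code> described above can be captured in a small
helper that an attlist declaration handler might call. The enum and
function names here are hypothetical, and the helper takes a plain
<code>const char *</code> on the assumption that <code>XML_Char</code>
is <code>char</code>, so it can be shown without the Expat headers.</p>

```c
#include <stddef.h>

/* Hypothetical classification of an attribute's default declaration. */
enum attr_default {
  ATTR_IMPLIED,    /* dflt == NULL, isrequired false */
  ATTR_REQUIRED,   /* dflt == NULL, isrequired true  */
  ATTR_FIXED,      /* dflt != NULL, isrequired true  */
  ATTR_DEFAULTED   /* dflt != NULL, isrequired false */
};

/* Map the dflt and isrequired handler arguments to one of the
   four cases. */
static enum attr_default
classify_attr_default(const char *dflt, int isrequired)
{
  if (dflt == NULL)
    return isrequired ? ATTR_REQUIRED : ATTR_IMPLIED;
  return isrequired ? ATTR_FIXED : ATTR_DEFAULTED;
}
```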
|
1790 |
|
1791 <div class="handler"> |
|
1792 <pre class="setter" id="XML_SetEntityDeclHandler"> |
|
1793 void XMLCALL |
|
1794 XML_SetEntityDeclHandler(XML_Parser p, |
|
1795 XML_EntityDeclHandler handler); |
|
1796 </pre> |
|
1797 <pre class="signature"> |
|
1798 typedef void |
|
1799 (XMLCALL *XML_EntityDeclHandler)(void *userData, |
|
1800 const XML_Char *entityName, |
|
1801 int is_parameter_entity, |
|
1802 const XML_Char *value, |
|
1803 int value_length, |
|
1804 const XML_Char *base, |
|
1805 const XML_Char *systemId, |
|
1806 const XML_Char *publicId, |
|
1807 const XML_Char *notationName); |
|
1808 </pre> |
|
1809 <p>Sets a handler that will be called for all entity declarations. |
|
1810 The <code>is_parameter_entity</code> argument will be non-zero in the |
|
1811 case of parameter entities and zero otherwise.</p> |
|
1812 |
|
<p>For internal entities (<code>&lt;!ENTITY foo "bar"&gt;</code>),
<code>value</code> will be non-NULL and <code>systemId</code>,
<code>publicId</code>, and <code>notationName</code> will all be NULL.
The value string is <em>not</em> NUL-terminated; the length is
provided in the <code>value_length</code> parameter. Do not use
<code>value_length</code> to test for internal entities, since it is
legal to have zero-length values. Instead check for whether or not
<code>value</code> is NULL.</p>

<p>The <code>notationName</code> argument will have a non-NULL value
only for unparsed entity declarations.</p>
|
1823 </div> |
|
1824 |
|
1825 <div class="handler"> |
|
1826 <pre class="setter" id="XML_SetUnparsedEntityDeclHandler"> |
|
1827 void XMLCALL |
|
1828 XML_SetUnparsedEntityDeclHandler(XML_Parser p, |
|
1829 XML_UnparsedEntityDeclHandler h) |
|
1830 </pre> |
|
1831 <pre class="signature"> |
|
1832 typedef void |
|
1833 (XMLCALL *XML_UnparsedEntityDeclHandler)(void *userData, |
|
1834 const XML_Char *entityName, |
|
1835 const XML_Char *base, |
|
1836 const XML_Char *systemId, |
|
1837 const XML_Char *publicId, |
|
1838 const XML_Char *notationName); |
|
1839 </pre> |
|
1840 <p>Set a handler that receives declarations of unparsed entities. These |
|
1841 are entity declarations that have a notation (NDATA) field:</p> |
|
1842 |
|
1843 <div id="eg"><pre> |
|
&lt;!ENTITY logo SYSTEM "images/logo.gif" NDATA gif&gt;
|
1845 </pre></div> |
|
<p>This handler is obsolete and is provided for backwards
compatibility. Use <a href= "#XML_SetEntityDeclHandler"
>XML_SetEntityDeclHandler</a> instead.</p>
|
1849 </div> |
|
1850 |
|
1851 <div class="handler"> |
|
1852 <pre class="setter" id="XML_SetNotationDeclHandler"> |
|
1853 void XMLCALL |
|
1854 XML_SetNotationDeclHandler(XML_Parser p, |
|
1855 XML_NotationDeclHandler h) |
|
1856 </pre> |
|
1857 <pre class="signature"> |
|
1858 typedef void |
|
1859 (XMLCALL *XML_NotationDeclHandler)(void *userData, |
|
1860 const XML_Char *notationName, |
|
1861 const XML_Char *base, |
|
1862 const XML_Char *systemId, |
|
1863 const XML_Char *publicId); |
|
1864 </pre> |
|
1865 <p>Set a handler that receives notation declarations.</p> |
|
1866 </div> |
|
1867 |
|
1868 <div class="handler"> |
|
1869 <pre class="setter" id="XML_SetNotStandaloneHandler"> |
|
1870 void XMLCALL |
|
1871 XML_SetNotStandaloneHandler(XML_Parser p, |
|
1872 XML_NotStandaloneHandler h) |
|
1873 </pre> |
|
1874 <pre class="signature"> |
|
1875 typedef int |
|
1876 (XMLCALL *XML_NotStandaloneHandler)(void *userData); |
|
1877 </pre> |
|
<p>Set a handler that is called if the document is not "standalone".
This happens when there is an external subset or a reference to a
parameter entity, but the document does not have standalone set to
"yes" in an XML declaration. If this handler returns
<code>XML_STATUS_ERROR</code>, then the parser will report an
<code>XML_ERROR_NOT_STANDALONE</code> error.</p>
|
1884 </div> |
|
1885 |
|
1886 <h3><a name="position">Parse position and error reporting functions</a></h3> |
|
1887 |
|
<p>These are the functions you'll want to call when the parse
functions return <code>XML_STATUS_ERROR</code> (a parse error has
occurred), although the position reporting functions are useful outside
of errors. The position reported is the byte position (in the original
document or entity encoding) of the first of the sequence of
characters that generated the current event (or the error that caused
the parse functions to return <code>XML_STATUS_ERROR</code>). The
exceptions are callbacks triggered by declarations in the document
prologue, in which case the exact position reported is somewhere in the
relevant markup, but not necessarily as meaningful as for other
events.</p>
|
1899 |
|
1900 <p>The position reporting functions are accurate only outside of the |
|
1901 DTD. In other words, they usually return bogus information when |
|
1902 called from within a DTD declaration handler.</p> |
|
1903 |
|
1904 <pre class="fcndec" id="XML_GetErrorCode"> |
|
1905 enum XML_Error XMLCALL |
|
1906 XML_GetErrorCode(XML_Parser p); |
|
1907 </pre> |
|
1908 <div class="fcndef"> |
|
1909 Return what type of error has occurred. |
|
1910 </div> |
|
1911 |
|
1912 <pre class="fcndec" id="XML_ErrorString"> |
|
1913 const XML_LChar * XMLCALL |
|
1914 XML_ErrorString(enum XML_Error code); |
|
1915 </pre> |
|
1916 <div class="fcndef"> |
|
1917 Return a string describing the error corresponding to code. |
|
1918 The code should be one of the enums that can be returned from |
|
1919 <code><a href= "#XML_GetErrorCode" >XML_GetErrorCode</a></code>. |
|
1920 </div> |
|
1921 |
|
1922 <pre class="fcndec" id="XML_GetCurrentByteIndex"> |
|
1923 XML_Index XMLCALL |
|
1924 XML_GetCurrentByteIndex(XML_Parser p); |
|
1925 </pre> |
|
1926 <div class="fcndef"> |
|
1927 Return the byte offset of the position. This always corresponds to |
|
1928 the values returned by <code><a href= "#XML_GetCurrentLineNumber" |
|
1929 >XML_GetCurrentLineNumber</a></code> and <code><a href= |
|
1930 "#XML_GetCurrentColumnNumber" >XML_GetCurrentColumnNumber</a></code>. |
|
1931 </div> |
|
1932 |
|
1933 <pre class="fcndec" id="XML_GetCurrentLineNumber"> |
|
1934 XML_Size XMLCALL |
|
1935 XML_GetCurrentLineNumber(XML_Parser p); |
|
1936 </pre> |
|
1937 <div class="fcndef"> |
|
1938 Return the line number of the position. The first line is reported as |
|
1939 <code>1</code>. |
|
1940 </div> |
|
1941 |
|
1942 <pre class="fcndec" id="XML_GetCurrentColumnNumber"> |
|
1943 XML_Size XMLCALL |
|
1944 XML_GetCurrentColumnNumber(XML_Parser p); |
|
1945 </pre> |
|
1946 <div class="fcndef"> |
|
1947 Return the offset, from the beginning of the current line, of |
|
1948 the position. |
|
1949 </div> |
|
1950 |
|
1951 <pre class="fcndec" id="XML_GetCurrentByteCount"> |
|
1952 int XMLCALL |
|
1953 XML_GetCurrentByteCount(XML_Parser p); |
|
1954 </pre> |
|
1955 <div class="fcndef"> |
|
Return the number of bytes in the current event. Returns
<code>0</code> if the event is inside a reference to an internal
entity and for the end-tag event for empty element tags (the latter can
be used to distinguish empty-element tags from empty elements using
separate start and end tags).
|
1961 </div> |
|
1962 |
|
1963 <pre class="fcndec" id="XML_GetInputContext"> |
|
1964 const char * XMLCALL |
|
1965 XML_GetInputContext(XML_Parser p, |
|
1966 int *offset, |
|
1967 int *size); |
|
1968 </pre> |
|
1969 <div class="fcndef"> |
|
1970 |
|
<p>Returns the parser's input buffer, sets the integer pointed at by
<code>offset</code> to the offset within this buffer of the current
parse position, and sets the integer pointed at by <code>size</code> to
the size of the returned buffer.</p>
|
1975 |
|
1976 <p>This should only be called from within a handler during an active |
|
1977 parse and the returned buffer should only be referred to from within |
|
1978 the handler that made the call. This input buffer contains the |
|
1979 untranslated bytes of the input.</p> |
|
1980 |
|
1981 <p>Only a limited amount of context is kept, so if the event |
|
1982 triggering a call spans over a very large amount of input, the actual |
|
1983 parse position may be before the beginning of the buffer.</p> |
|
1984 |
|
1985 <p>If <code>XML_CONTEXT_BYTES</code> is not defined, this will always |
|
1986 return NULL.</p> |
|
1987 </div> |
|
1988 |
|
1989 <h3><a name="miscellaneous">Miscellaneous functions</a></h3> |
|
1990 |
|
<p>The functions in this section either obtain state information from
the parser or can be used to dynamically set parser options.</p>
|
1993 |
|
1994 <pre class="fcndec" id="XML_SetUserData"> |
|
1995 void XMLCALL |
|
1996 XML_SetUserData(XML_Parser p, |
|
1997 void *userData); |
|
1998 </pre> |
|
1999 <div class="fcndef"> |
|
2000 This sets the user data pointer that gets passed to handlers. It |
|
2001 overwrites any previous value for this pointer. Note that the |
|
2002 application is responsible for freeing the memory associated with |
|
2003 <code>userData</code> when it is finished with the parser. So if you |
|
2004 call this when there's already a pointer there, and you haven't freed |
|
2005 the memory associated with it, then you've probably just leaked |
|
2006 memory. |
|
2007 </div> |
|
2008 |
|
2009 <pre class="fcndec" id="XML_GetUserData"> |
|
2010 void * XMLCALL |
|
2011 XML_GetUserData(XML_Parser p); |
|
2012 </pre> |
|
2013 <div class="fcndef"> |
|
2014 This returns the user data pointer that gets passed to handlers. |
|
2015 It is actually implemented as a macro. |
|
2016 </div> |
|
2017 |
|
2018 <pre class="fcndec" id="XML_UseParserAsHandlerArg"> |
|
2019 void XMLCALL |
|
2020 XML_UseParserAsHandlerArg(XML_Parser p); |
|
2021 </pre> |
|
2022 <div class="fcndef"> |
|
2023 After this is called, handlers receive the parser in their |
|
2024 <code>userData</code> arguments. The user data can still be obtained |
|
2025 using the <code><a href= "#XML_GetUserData" |
|
2026 >XML_GetUserData</a></code> function. |
|
2027 </div> |
|
2028 |
|
2029 <pre class="fcndec" id="XML_SetBase"> |
|
2030 enum XML_Status XMLCALL |
|
2031 XML_SetBase(XML_Parser p, |
|
2032 const XML_Char *base); |
|
2033 </pre> |
|
2034 <div class="fcndef"> |
|
2035 Set the base to be used for resolving relative URIs in system |
|
2036 identifiers. The return value is <code>XML_STATUS_ERROR</code> if |
|
2037 there's no memory to store base, otherwise it's |
|
2038 <code>XML_STATUS_OK</code>. |
|
2039 </div> |
|
2040 |
|
2041 <pre class="fcndec" id="XML_GetBase"> |
|
2042 const XML_Char * XMLCALL |
|
2043 XML_GetBase(XML_Parser p); |
|
2044 </pre> |
|
2045 <div class="fcndef"> |
|
2046 Return the base for resolving relative URIs. |
|
2047 </div> |
|
2048 |
|
2049 <pre class="fcndec" id="XML_GetSpecifiedAttributeCount"> |
|
2050 int XMLCALL |
|
2051 XML_GetSpecifiedAttributeCount(XML_Parser p); |
|
2052 </pre> |
|
2053 <div class="fcndef"> |
|
When attributes are reported to the start handler in the <code>atts</code> vector,
|
2055 attributes that were explicitly set in the element occur before any |
|
2056 attributes that receive their value from default information in an |
|
2057 ATTLIST declaration. This function returns the number of attributes |
|
2058 that were explicitly set times two, thus giving the offset in the |
|
2059 <code>atts</code> array passed to the start tag handler of the first |
|
2060 attribute set due to defaults. It supplies information for the last |
|
2061 call to a start handler. If called inside a start handler, then that |
|
2062 means the current call. |
|
2063 </div> |
|
2064 |
|
2065 <pre class="fcndec" id="XML_GetIdAttributeIndex"> |
|
2066 int XMLCALL |
|
2067 XML_GetIdAttributeIndex(XML_Parser p); |
|
2068 </pre> |
|
2069 <div class="fcndef"> |
|
2070 Returns the index of the ID attribute passed in the atts array in the |
|
2071 last call to <code><a href= "#XML_StartElementHandler" |
|
2072 >XML_StartElementHandler</a></code>, or -1 if there is no ID |
|
2073 attribute. If called inside a start handler, then that means the |
|
2074 current call. |
|
2075 </div> |
|
2076 |
|
<pre class="fcndec" id="XML_SetEncoding">
enum XML_Status XMLCALL
XML_SetEncoding(XML_Parser p,
                const XML_Char *encoding);
</pre>
<div class="fcndef">
Set the encoding to be used by the parser. It is equivalent to
passing a non-null encoding argument to the parser creation functions.
It must not be called after <code><a href= "#XML_Parse"
>XML_Parse</a></code> or <code><a href= "#XML_ParseBuffer"
>XML_ParseBuffer</a></code> has been called on the given parser.
Returns <code>XML_STATUS_OK</code> on success or
<code>XML_STATUS_ERROR</code> on error.
</div>

<pre class="fcndec" id="XML_SetParamEntityParsing">
int XMLCALL
XML_SetParamEntityParsing(XML_Parser p,
                          enum XML_ParamEntityParsing code);
</pre>
<div class="fcndef">
This enables parsing of parameter entities, including the external
parameter entity that is the external DTD subset, according to
<code>code</code>.
The choices for <code>code</code> are:
<ul>
<li><code>XML_PARAM_ENTITY_PARSING_NEVER</code></li>
<li><code>XML_PARAM_ENTITY_PARSING_UNLESS_STANDALONE</code></li>
<li><code>XML_PARAM_ENTITY_PARSING_ALWAYS</code></li>
</ul>
</div>

<pre class="fcndec" id="XML_UseForeignDTD">
enum XML_Error XMLCALL
XML_UseForeignDTD(XML_Parser parser, XML_Bool useDTD);
</pre>
<div class="fcndef">
<p>This function allows an application to provide an external subset
for the document type declaration for documents which do not specify
an external subset of their own. For documents which specify an
external subset in their DOCTYPE declaration, the application-provided
subset will be ignored. If the document does not contain a DOCTYPE
declaration at all and <code>useDTD</code> is true, the
application-provided subset will be parsed, but the
<code>startDoctypeDeclHandler</code> and
<code>endDoctypeDeclHandler</code> functions, if set, will not be
called. The setting of parameter entity parsing, controlled using
<code><a href= "#XML_SetParamEntityParsing"
>XML_SetParamEntityParsing</a></code>, will be honored.</p>

<p>The application-provided external subset is read by calling the
external entity reference handler set via <code><a href=
"#XML_SetExternalEntityRefHandler"
>XML_SetExternalEntityRefHandler</a></code> with both
<code>publicId</code> and <code>systemId</code> set to NULL.</p>

<p>If this function is called after parsing has begun, it returns
<code>XML_ERROR_CANT_CHANGE_FEATURE_ONCE_PARSING</code> and ignores
<code>useDTD</code>. If called when Expat has been compiled without
DTD support, it returns
<code>XML_ERROR_FEATURE_REQUIRES_XML_DTD</code>. Otherwise, it
returns <code>XML_ERROR_NONE</code>.</p>

<p><b>Note:</b> For the purpose of checking WFC: Entity Declared, passing
<code>useDTD == XML_TRUE</code> will make the parser behave as if
the document had a DTD with an external subset. This holds true even if
the external entity reference handler returns without action.</p>
</div>

<pre class="fcndec" id="XML_SetReturnNSTriplet">
void XMLCALL
XML_SetReturnNSTriplet(XML_Parser parser,
                       int        do_nst);
</pre>
<div class="fcndef">
<p>
This function only has an effect when using a parser created with
<code><a href= "#XML_ParserCreateNS" >XML_ParserCreateNS</a></code>,
i.e. when namespace processing is in effect. The <code>do_nst</code>
argument sets whether or not prefixes are returned with names qualified
with a namespace prefix. If this function is called with <code>do_nst</code>
non-zero, then afterwards namespace-qualified names (that is, names qualified
with a prefix as opposed to belonging to a default namespace) are
returned as a triplet with the three parts separated by the namespace
separator specified when the parser was created. The order of
returned parts is URI, local name, and prefix.</p>

<p>If <code>do_nst</code> is zero, then namespaces are reported in the
default manner, URI then local name separated by the namespace
separator.</p>
</div>

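<p>To make the name layout concrete, here is a small sketch of how an
application might take such a name apart. The type <code>ns_name</code> and
the function <code>split_ns_name</code> are hypothetical helpers, not part of
Expat. The sketch assumes that <code>XML_Char</code> is <code>char</code>,
that the separator character does not occur inside the URI, and that the
caller passes a writable copy of the name rather than the string Expat
handed to the handler.</p>

```c
#include <stddef.h>
#include <string.h>

/* The parts of a name as reported when namespace processing is on.
 * Each pointer refers into the caller's buffer; NULL means "absent". */
struct ns_name {
    const char *uri;
    const char *local;
    const char *prefix;
};

/* Split `name` in place at the namespace separator `sep`.
 *   "local"                 -> no namespace
 *   "URI<sep>local"         -> default reporting, or no prefix
 *   "URI<sep>local<sep>pfx" -> triplet reporting
 */
void split_ns_name(char *name, char sep, struct ns_name *out)
{
    char *first;
    char *second;

    out->uri = NULL;
    out->local = name;
    out->prefix = NULL;

    first = strchr(name, sep);
    if (first == NULL)
        return;                 /* the name is not in any namespace */

    *first = '\0';              /* terminate the URI */
    out->uri = name;
    out->local = first + 1;

    second = strchr(first + 1, sep);
    if (second != NULL) {
        *second = '\0';         /* terminate the local name */
        out->prefix = second + 1;
    }
}
```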
<pre class="fcndec" id="XML_DefaultCurrent">
void XMLCALL
XML_DefaultCurrent(XML_Parser parser);
</pre>
<div class="fcndef">
This can be called within a handler for a start element, end element,
processing instruction or character data. It causes the corresponding
markup to be passed to the default handler set by <code><a
href="#XML_SetDefaultHandler" >XML_SetDefaultHandler</a></code> or
<code><a href="#XML_SetDefaultHandlerExpand"
>XML_SetDefaultHandlerExpand</a></code>. It does nothing if there is
not a default handler.
</div>

<pre class="fcndec" id="XML_ExpatVersion">
XML_LChar * XMLCALL
XML_ExpatVersion();
</pre>
<div class="fcndef">
Return the library version as a string (e.g. <code>"expat_1.95.1"</code>).
</div>

<pre class="fcndec" id="XML_ExpatVersionInfo">
XML_Expat_Version XMLCALL
XML_ExpatVersionInfo();
</pre>
<pre class="signature">
typedef struct {
  int major;
  int minor;
  int micro;
} XML_Expat_Version;
</pre>
<div class="fcndef">
Return the library version information as a structure.
Some macros are also defined that support compile-time tests of the
library version:
<ul>
<li><code>XML_MAJOR_VERSION</code></li>
<li><code>XML_MINOR_VERSION</code></li>
<li><code>XML_MICRO_VERSION</code></li>
</ul>
Testing these constants is currently the best way to determine if
particular parts of the Expat API are available.
</div>

<pre class="fcndec" id="XML_GetFeatureList">
const XML_Feature * XMLCALL
XML_GetFeatureList();
</pre>
<pre class="signature">
enum XML_FeatureEnum {
  XML_FEATURE_END = 0,
  XML_FEATURE_UNICODE,
  XML_FEATURE_UNICODE_WCHAR_T,
  XML_FEATURE_DTD,
  XML_FEATURE_CONTEXT_BYTES,
  XML_FEATURE_MIN_SIZE,
  XML_FEATURE_SIZEOF_XML_CHAR,
  XML_FEATURE_SIZEOF_XML_LCHAR,
  XML_FEATURE_NS,
  XML_FEATURE_LARGE_SIZE
};

typedef struct {
  enum XML_FeatureEnum  feature;
  XML_LChar            *name;
  long int              value;
} XML_Feature;
</pre>
<div class="fcndef">
<p>Returns a list of "feature" records, providing details on how
Expat was configured at compile time. Most applications should not
need to worry about this, but this information is otherwise not
available from Expat. This function allows code that does need to
check these features to do so at runtime.</p>

<p>The return value is an array of <code>XML_Feature</code>,
terminated by a record with a <code>feature</code> of
<code>XML_FEATURE_END</code> and <code>name</code> of NULL,
identifying the feature-test macros Expat was compiled with. Since an
application that requires this kind of information needs to determine
the type of character the <code>name</code> points to, records for the
<code>XML_FEATURE_SIZEOF_XML_CHAR</code> and
<code>XML_FEATURE_SIZEOF_XML_LCHAR</code> will be located at the
beginning of the list, followed by <code>XML_FEATURE_UNICODE</code>
and <code>XML_FEATURE_UNICODE_WCHAR_T</code>, if they are present at
all.</p>

<p>Some features have an associated value. If there isn't an
associated value, the <code>value</code> field is set to 0. At this
time, the following features have been defined to have values:</p>

<dl>
  <dt><code>XML_FEATURE_SIZEOF_XML_CHAR</code></dt>
  <dd>The number of bytes occupied by one <code>XML_Char</code>
  character.</dd>
  <dt><code>XML_FEATURE_SIZEOF_XML_LCHAR</code></dt>
  <dd>The number of bytes occupied by one <code>XML_LChar</code>
  character.</dd>
  <dt><code>XML_FEATURE_CONTEXT_BYTES</code></dt>
  <dd>The maximum number of characters of context which can be
  reported by <code><a href= "#XML_GetInputContext"
  >XML_GetInputContext</a></code>.</dd>
</dl>
</div>

<pre class="fcndec" id="XML_FreeContentModel">
void XMLCALL
XML_FreeContentModel(XML_Parser parser, XML_Content *model);
</pre>
<div class="fcndef">
Function to deallocate the <code>model</code> argument passed to the
<code>XML_ElementDeclHandler</code> callback set using <code><a
href="#XML_SetElementDeclHandler" >XML_SetElementDeclHandler</a></code>.
This function should not be used for any other purpose.
</div>

<p>The following functions allow external code to share the memory
allocator an <code>XML_Parser</code> has been configured to use. This
is especially useful for third-party libraries that interact with a
parser object created by application code, or heavily layered
applications. This can be essential when using dynamically loaded
libraries which use different C standard libraries (this can happen on
Windows, at least).</p>

<pre class="fcndec" id="XML_MemMalloc">
void * XMLCALL
XML_MemMalloc(XML_Parser parser, size_t size);
</pre>
<div class="fcndef">
Allocate <code>size</code> bytes of memory using the allocator the
<code>parser</code> object has been configured to use. Returns a
pointer to the memory or NULL on failure. Memory allocated in this
way must be freed using <code><a href="#XML_MemFree"
>XML_MemFree</a></code>.
</div>

<pre class="fcndec" id="XML_MemRealloc">
void * XMLCALL
XML_MemRealloc(XML_Parser parser, void *ptr, size_t size);
</pre>
<div class="fcndef">
Allocate <code>size</code> bytes of memory using the allocator the
<code>parser</code> object has been configured to use.
<code>ptr</code> must point to a block of memory allocated by <code><a
href="#XML_MemMalloc" >XML_MemMalloc</a></code> or
<code>XML_MemRealloc</code>, or be NULL. This function tries to
expand the block pointed to by <code>ptr</code> if possible. Returns
a pointer to the memory or NULL on failure. On success, the original
block has either been expanded or freed. On failure, the original
block has not been freed; the caller is responsible for freeing the
original block. Memory allocated in this way must be freed using
<code><a href="#XML_MemFree"
>XML_MemFree</a></code>.
</div>

<pre class="fcndec" id="XML_MemFree">
void XMLCALL
XML_MemFree(XML_Parser parser, void *ptr);
</pre>
<div class="fcndef">
Free a block of memory pointed to by <code>ptr</code>. The block must
have been allocated by <code><a href="#XML_MemMalloc"
>XML_MemMalloc</a></code> or <code>XML_MemRealloc</code>, or be NULL.
</div>

<hr />
<p><a href="http://validator.w3.org/check/referer"><img
        src="valid-xhtml10.png" alt="Valid XHTML 1.0!"
        height="31" width="88" class="noborder" /></a></p>
</div>
</body>
</html>