<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/1999/REC-html401-19991224/loose.dtd">
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1">
<link rel="SHORTCUT ICON" href="/favicon.ico">
<style type="text/css"><!--
TD {font-family: Verdana,Arial,Helvetica}
BODY {font-family: Verdana,Arial,Helvetica; margin-top: 2em; margin-left: 0em; margin-right: 0em}
H1 {font-family: Verdana,Arial,Helvetica}
H2 {font-family: Verdana,Arial,Helvetica}
H3 {font-family: Verdana,Arial,Helvetica}
A:link, A:visited, A:active { text-decoration: underline }
--></style>
<title>Entities or no entities</title>
</head>
<body bgcolor="#8b7765" text="#000000" link="#000000" vlink="#000000">
<table border="0" width="100%" cellpadding="5" cellspacing="0" align="center"><tr>
<td width="180">
<a href="http://www.gnome.org/"><img src="gnome2.png" alt="Gnome2 Logo"></a><a href="http://www.w3.org/Status"><img src="w3c.png" alt="W3C Logo"></a><a href="http://www.redhat.com/"><img src="redhat.gif" alt="Red Hat Logo"></a><div align="left"><a href="http://xmlsoft.org/"><img src="Libxml2-Logo-180x168.gif" alt="Made with Libxml2 Logo"></a></div>
</td>
<td><table border="0" width="90%" cellpadding="2" cellspacing="0" align="center" bgcolor="#000000"><tr><td><table width="100%" border="0" cellspacing="1" cellpadding="3" bgcolor="#fffacd"><tr><td align="center">
<h1>The XML C library for Gnome</h1>
<h2>Entities or no entities</h2>
</td></tr></table></td></tr></table></td>
</tr></table>
<table border="0" cellpadding="4" cellspacing="0" width="100%" align="center"><tr><td bgcolor="#8b7765"><table border="0" cellspacing="0" cellpadding="2" width="100%"><tr>
<td valign="top" width="200" bgcolor="#8b7765"><table border="0" cellspacing="0" cellpadding="1" width="100%" bgcolor="#000000"><tr><td>
<table width="100%" border="0" cellspacing="1" cellpadding="3">
<tr><td colspan="1" bgcolor="#eecfa1" align="center"><center><b>Main Menu</b></center></td></tr>
<tr><td bgcolor="#fffacd">
<form action="search.php" enctype="application/x-www-form-urlencoded" method="GET">
<input name="query" type="TEXT" size="20" value=""><input name="submit" type="submit" value="Search ...">
</form>
<ul>
<li><a href="index.html">Home</a></li>
<li><a href="intro.html">Introduction</a></li>
<li><a href="FAQ.html">FAQ</a></li>
<li><a href="docs.html">Documentation</a></li>
<li><a href="bugs.html">Reporting bugs and getting help</a></li>
<li><a href="help.html">How to help</a></li>
<li><a href="downloads.html">Downloads</a></li>
<li><a href="news.html">News</a></li>
<li><a href="XMLinfo.html">XML</a></li>
<li><a href="XSLT.html">XSLT</a></li>
<li><a href="python.html">Python and bindings</a></li>
<li><a href="architecture.html">libxml architecture</a></li>
<li><a href="tree.html">The tree output</a></li>
<li><a href="interface.html">The SAX interface</a></li>
<li><a href="xmldtd.html">Validation &amp; DTDs</a></li>
<li><a href="xmlmem.html">Memory Management</a></li>
<li><a href="encoding.html">Encodings support</a></li>
<li><a href="xmlio.html">I/O Interfaces</a></li>
<li><a href="catalog.html">Catalog support</a></li>
<li><a href="library.html">The parser interfaces</a></li>
<li><a href="entities.html">Entities or no entities</a></li>
<li><a href="namespaces.html">Namespaces</a></li>
<li><a href="upgrade.html">Upgrading 1.x code</a></li>
<li><a href="threads.html">Thread safety</a></li>
<li><a href="DOM.html">DOM Principles</a></li>
<li><a href="example.html">A real example</a></li>
<li><a href="contribs.html">Contributions</a></li>
<li><a href="xmlreader.html">The Reader Interface</a></li>
<li><a href="tutorial/index.html">Tutorial</a></li>
<li><a href="guidelines.html">XML Guidelines</a></li>
<li>
<a href="xml.html">flat page</a>, <a href="site.xsl">stylesheet</a>
</li>
</ul>
</td></tr>
</table>
<table width="100%" border="0" cellspacing="1" cellpadding="3">
<tr><td colspan="1" bgcolor="#eecfa1" align="center"><center><b>Related links</b></center></td></tr>
<tr><td bgcolor="#fffacd"><ul>
<li><a href="http://mail.gnome.org/archives/xml/">Mail archive</a></li>
<li><a href="http://xmlsoft.org/XSLT/">XSLT libxslt</a></li>
<li><a href="http://phd.cs.unibo.it/gdome2/">DOM gdome2</a></li>
<li><a href="http://www.aleksey.com/xmlsec/">XML-DSig xmlsec</a></li>
<li><a href="ftp://xmlsoft.org/">FTP</a></li>
<li><a href="http://www.zlatkovic.com/projects/libxml/">Windows binaries</a></li>
<li><a href="http://garypennington.net/libxml2/">Solaris binaries</a></li>
<li><a href="http://www.zveno.com/open_source/libxml2xslt.html">MacOsX binaries</a></li>
<li><a href="http://sourceforge.net/projects/libxml2-pas/">Pascal bindings</a></li>
<li><a href="http://bugzilla.gnome.org/buglist.cgi?product=libxml&amp;product=libxml2">Bug Tracker</a></li>
</ul></td></tr>
</table>
<table width="100%" border="0" cellspacing="1" cellpadding="3">
<tr><td colspan="1" bgcolor="#eecfa1" align="center"><center><b>API Indexes</b></center></td></tr>
<tr><td bgcolor="#fffacd"><ul>
<li><a href="APIchunk0.html">Alphabetic</a></li>
<li><a href="APIconstructors.html">Constructors</a></li>
<li><a href="APIfunctions.html">Functions/Types</a></li>
<li><a href="APIfiles.html">Modules</a></li>
<li><a href="APIsymbols.html">Symbols</a></li>
</ul></td></tr>
</table>
</td></tr></table></td>
<td valign="top" bgcolor="#8b7765"><table border="0" cellspacing="0" cellpadding="1" width="100%"><tr><td><table border="0" cellspacing="0" cellpadding="1" width="100%" bgcolor="#000000"><tr><td><table border="0" cellpadding="3" cellspacing="1" width="100%"><tr><td bgcolor="#fffacd">
<p>Entities in principle are similar to simple C macros. An entity defines an
abbreviation for a given string that you can reuse many times throughout the
content of your document. Entities are especially useful when a given string
occurs frequently within a document, or when you want to confine future
changes to a single place: the internal subset at the beginning of the
document. Example:</p>
<pre>1 &lt;?xml version=&quot;1.0&quot;?&gt;
2 &lt;!DOCTYPE EXAMPLE SYSTEM &quot;example.dtd&quot; [
3 &lt;!ENTITY xml &quot;Extensible Markup Language&quot;&gt;
4 ]&gt;
5 &lt;EXAMPLE&gt;
6    &amp;xml;
7 &lt;/EXAMPLE&gt;</pre>
<p>Line 3 declares the xml entity. Line 6 references the xml entity by
prefixing its name with '&amp;' and following it with ';', without any spaces.
There are 5 predefined entities in libxml that let you escape characters with
a special meaning in some parts of the XML document content:
<strong>&amp;lt;</strong> for the character '&lt;', <strong>&amp;gt;</strong>
for the character '&gt;',  <strong>&amp;apos;</strong> for the character ''',
<strong>&amp;quot;</strong> for the character '&quot;', and
<strong>&amp;amp;</strong> for the character '&amp;'.</p>
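<p>As an illustration of the escaping described above, here is a minimal
sketch in plain C. The helper name <code>escape_predefined</code> is
hypothetical and is not part of the libxml API; it only shows the mapping from
the five special characters to their predefined entity references.</p>

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, NOT a libxml function: copy `in` to `out`, replacing
 * each of the five special characters by its predefined entity reference.
 * Returns 0 on success, -1 if `out` is too small to hold the result. */
int escape_predefined(const char *in, char *out, size_t outsz)
{
    assert(in != NULL && out != NULL);
    size_t used = 0;

    for (; *in != '\0'; in++) {
        char single[2] = { *in, '\0' };   /* the character copied unchanged */
        const char *rep;

        switch (*in) {
        case '<':  rep = "&lt;";   break;
        case '>':  rep = "&gt;";   break;
        case '\'': rep = "&apos;"; break;
        case '"':  rep = "&quot;"; break;
        case '&':  rep = "&amp;";  break;
        default:   rep = single;   break;
        }

        size_t len = strlen(rep);
        if (used + len + 1 > outsz)       /* keep room for the final NUL */
            return -1;
        memcpy(out + used, rep, len);
        used += len;
    }

    if (outsz == 0)
        return -1;
    out[used] = '\0';
    return 0;
}
```

<p>Called on the string <code>a &lt; b</code> with a large enough buffer, the
helper leaves the letters and spaces alone and rewrites only the comparison
sign. In real code you would normally let libxml do this when it saves the
document.</p>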
<p>One of the problems related to entities is that you have to choose between
two behaviours. You may want the parser to substitute an entity's content so
that you can see the replacement text in your application. Or you may prefer
to keep entity references as such in the content, so that you can save the
document back without losing this usually precious information (if the user
went through the pain of explicitly defining entities, they may have a rather
negative attitude if you blindly substitute them at save time). The <a href="html/libxml-parser.html#XMLSUBSTITUTEENTITIESDEFAULT">xmlSubstituteEntitiesDefault()</a>
function allows you to check and change the behaviour, which is to not
substitute entities by default.</p>
<p>Here is the DOM tree built by libxml for the previous document in the
default case:</p>
<pre>/gnome/src/gnome-xml -&gt; ./xmllint --debug test/ent1
DOCUMENT
version=1.0
   ELEMENT EXAMPLE
     TEXT
     content=
     ENTITY_REF
       INTERNAL_GENERAL_ENTITY xml
       content=Extensible Markup Language
     TEXT
     content=</pre>
<p>And here is the result when substituting entities:</p>
<pre>/gnome/src/gnome-xml -&gt; ./xmllint --debug --noent test/ent1
DOCUMENT
version=1.0
   ELEMENT EXAMPLE
     TEXT
     content=     Extensible Markup Language</pre>
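<p>Conceptually, the substituted result above is what a simple textual macro
expansion would produce. The toy model below, in plain C, makes the analogy
concrete. It is not how libxml implements substitution, and the names
<code>struct entity</code> and <code>expand_entities</code> are hypothetical.
References to names missing from the table are copied through unchanged.</p>

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One declared entity: its name and its replacement text. */
struct entity {
    const char *name;
    const char *value;
};

/* Toy model, NOT libxml code: copy `in` to `out`, replacing every "&name;"
 * whose name appears in `table` by the matching replacement text.
 * Returns 0 on success, -1 if `out` is too small. */
int expand_entities(const char *in, const struct entity *table, size_t count,
                    char *out, size_t outsz)
{
    assert(in != NULL && out != NULL);
    size_t used = 0;

    while (*in != '\0') {
        const char *piece = in;      /* default: copy a single character */
        size_t piece_len = 1;
        size_t consumed = 1;

        /* A candidate reference starts with '&' and ends at the next ';'. */
        const char *semi = (*in == '&') ? strchr(in, ';') : NULL;
        if (semi != NULL) {
            size_t name_len = (size_t) (semi - in - 1);
            for (size_t i = 0; i < count; i++) {
                if (strlen(table[i].name) == name_len &&
                    strncmp(table[i].name, in + 1, name_len) == 0) {
                    piece = table[i].value;
                    piece_len = strlen(piece);
                    consumed = name_len + 2;   /* '&' + name + ';' */
                    break;
                }
            }
        }

        if (used + piece_len + 1 > outsz)      /* keep room for the NUL */
            return -1;
        memcpy(out + used, piece, piece_len);
        used += piece_len;
        in += consumed;
    }

    if (outsz == 0)
        return -1;
    out[used] = '\0';
    return 0;
}
```

<p>A real parser does much more than this, for example handling nested
entities, external entities and character references, which is a good reason
to let libxml do the work.</p>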
<p>So, entities or no entities? Basically, it depends on your use case. I
suggest that you keep the non-substituting default behaviour, and that you
avoid using entities in your XML document or data if you are not willing to
handle the entity reference elements in the DOM tree.</p>
<p>Note that at save time libxml enforces the conversion of the predefined
entities where necessary to prevent well-formedness problems. At parse time
it transparently replaces them with the corresponding characters (i.e. it
will not generate entity reference elements in the DOM tree or call the
reference() SAX callback when finding them in the input).</p>
<p>
<span style="background-color: #FF0000">WARNING</span>: handling entities
on top of the libxml SAX interface is difficult!!! If you plan to use
non-predefined entities in your documents, then the learning curve to handle
them using the SAX API may be long. If you plan to use complex documents, I
strongly suggest you consider using the DOM interface instead and let libxml
deal with the complexity rather than trying to do it yourself.</p>
<p><a href="bugs.html">Daniel Veillard</a></p>
</td></tr></table></td></tr></table></td></tr></table></td>
</tr></table></td></tr></table>
</body>
</html>