<?xml version="1.0" encoding="iso-8859-2"?>
<!DOCTYPE article 
SYSTEM "file:///usr/share/docbook/docbook-xml-4.3/docbookx.dtd">

<article id="libxslt">
<articleinfo>
  <author><firstname>Panos</firstname><surname>Louridas</surname></author>
  <copyright>
    <year>2004</year>
    <holder>Panagiotis Louridas</holder>
  </copyright>
  <legalnotice>
    <para>Permission is hereby granted, free of charge, to
  any person obtaining a copy of this software and associated
  documentation files (the "Software"), to deal in the Software
  without restriction, including without limitation the rights to use,
  copy, modify, merge, publish, distribute, sublicense, and/or sell
  copies of the Software, and to permit persons to whom the Software
  is furnished to do so, subject to the following conditions:
  </para>

  <para>The above copyright notice and this permission notice shall be
  included in all copies or substantial portions of the Software.
  </para>

  <para>THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
  EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
  MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
  NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE
  LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION
  OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
  WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.</para>

  </legalnotice>
</articleinfo>

<title>libxslt: An Extended Tutorial</title>

<sect1><title>Introduction</title>

<para>The Extensible Stylesheet Language Transformations (XSLT)
specification defines an XML template language for transforming XML
documents. An XSLT engine reads an XSLT file and an XML document and
transforms the document accordingly.</para>

<para>We want to apply a series of XSLT transformations to a series
of documents. An obvious solution is to use the operating system's
pipe mechanism and start a series of transformation processes, each
one taking as input the output of the previous transformation. It
would be interesting, though, and perhaps more efficient, to do the
whole job within a single process.</para>

<para>libxslt is a library for doing XSLT transformations. It is built
on libxml, which is a library for handling XML documents. libxml and
libxslt are used by the GNOME project. Although developed in the
*NIX world, both libxml and libxslt have been
ported to the MS-Windows platform. In principle an application using
libxslt should be easily portable between the two systems. In
practice, however, various wrinkles arise. These have nothing to do
with libxml or libxslt per se, but rather with the different
compilation and linking procedures of each system.</para>

<para>The presented solution is an extension of <ulink
url="http://xmlsoft.org/XSLT/tutorial/libxslttutorial.html">John
Fleck's libxslt tutorial</ulink>, but the present tutorial tries to be
self-contained. It develops a minimal libxslt application
(libxslt_pipes) that can apply a series of transformations to a
series of files in a pipe-like manner. An invocation might be:</para>

<para>
  <userinput>
    libxslt_pipes --out results.xml foo.xsl bar.xsl doc1.xml doc2.xml
  </userinput>
</para>

<para>The <filename>foo.xsl</filename> stylesheet will be applied to
<filename>doc1.xml</filename> and the <filename>bar.xsl</filename>
stylesheet will be applied to the resulting document; then the two
stylesheets will be applied in the same sequence to
<filename>doc2.xml</filename>. The results are sent to
<filename>results.xml</filename> (if no output is specified they are
sent to standard output).</para>

<para>The application compiles on both *NIX systems and MS-Windows,
where by *NIX systems we mean Linux, BSD, and other members of the
family. The gcc suite is used on the *NIX platform, and the Microsoft
compiler and linker are used on the MS-Windows platform.</para>

</sect1>

<sect1><title>Setting the Scene</title>

<para>
We need to include the necessary header files:

<programlisting>
  <![CDATA[
  #include <stdio.h>
  #include <string.h>
  #include <stdlib.h>
  
  #include <libxslt/transform.h>
  #include <libxslt/xsltutils.h>
  ]]>
</programlisting>
</para>

<para>The first group of include directives brings in the standard C
library headers. The headers we need to make libxslt work are in the
second group. The <filename>transform.h</filename> header file
declares the API that does the bulk of the actual processing. The
<filename>xsltutils.h</filename> header file declares the API for some
generic utility functions of the XSLT engine, among them saving a
result to a file, which is what we need it for.</para>

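<para>
If our input files use entities declared in an external DTD subset, we
need to tell the parser to load that subset. The global variable
<varname>xmlLoadExtDtdDefaultValue</varname>, declared in
<filename>libxml/globals.h</filename>, is responsible for that. As the
variable is defined in libxml rather than in our program, we refer to
it through a declaration with external linkage:
  <programlisting>
    extern int xmlLoadExtDtdDefaultValue;
  </programlisting>
Declaring the variable does not change its value, which is zero by
default; it has to be set to a nonzero value, for instance with
<literal>xmlLoadExtDtdDefaultValue = 1;</literal>, before the
documents are parsed.
</para>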

<para>
The program is called from the command line. We anticipate that the
user may not call it the right way, so we define a function for
describing its usage:
<programlisting>
  static void usage(const char *name) {
      printf("Usage: %s [options] stylesheet [stylesheet ...] file [file ...]\n",
          name);
      printf("      --out file: send output to file\n");
      printf("      --param name value: pass a (parameter,value) pair\n");
  }
</programlisting>
</para>
</sect1>

<sect1><title>Program Start</title>

<para>We need to define a few variables that are used throughout the
program:
<programlisting>
    int main(int argc, char **argv) {
        int arg_indx;
        const char *params[16 + 1];
        int params_indx = 0;
        int stylesheet_indx = 0;
        int file_indx = 0;
        int i, j, k;
        FILE *output_file = stdout;
        xsltStylesheetPtr *stylesheets =
            (xsltStylesheetPtr *) calloc(argc, sizeof(xsltStylesheetPtr));
        xmlDocPtr *files = (xmlDocPtr *) calloc(argc, sizeof(xmlDocPtr));
        xmlDocPtr doc, res;  /* document being transformed, latest result */
        int return_value = 0;
</programlisting>
</para>

<para>The <varname>arg_indx</varname> integer is an index used to
iterate over the program arguments. The <varname>params</varname>
string array collects the XSLT parameters. In XSLT, additional
information may be passed to the processor via parameters. The user of
the program specifies these as name-value pairs on the command line,
each pair following a <userinput>--param</userinput> argument. We
accept up to 8 such pairs, that is, 16 strings, which we count with
the <varname>params_indx</varname> integer. libxslt expects the
parameters array to be null-terminated, so we have to allocate one
extra place (16 + 1) for the terminator. The
<varname>stylesheet_indx</varname> and <varname>file_indx</varname>
integers count the stylesheets and documents collected so far. The
<varname>i</varname>, <varname>j</varname>, and <varname>k</varname>
integers are additional loop indices, and
<varname>return_value</varname> is the value the program returns to
the operating system. We expect the result of the transformation to go
to standard output in most cases, but the user may choose otherwise
via the <option>--out</option> command line option, so we record the
destination in the <varname>output_file</varname> file pointer.</para>
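<para>The bookkeeping for <varname>params</varname> can be kept in one
place with a small helper. In the sketch below, which assumes the same
limit of 8 pairs, the names <function>add_param</function> and
<literal>MAX_PARAMS</literal> are hypothetical and not part of
libxslt. The helper checks for room before storing a pair, so that
exactly 8 pairs fit, and keeps the array null-terminated after every
call.</para>

<para>
<programlisting>
<![CDATA[
```c
#include <stddef.h>

#define MAX_PARAMS 8   /* hypothetical limit on name/value pairs */

/*
 * Hypothetical helper: append a name/value pair to a params array with
 * room for MAX_PARAMS pairs plus a terminating NULL. *count is the
 * number of strings stored so far. Returns 0 on success, or -1 (leaving
 * the array untouched) if another pair does not fit.
 */
int add_param(const char *params[2 * MAX_PARAMS + 1], int *count,
              const char *name, const char *value)
{
    if (*count + 2 > 2 * MAX_PARAMS)
        return -1;
    params[(*count)++] = name;
    params[(*count)++] = value;
    params[*count] = NULL;   /* libxslt expects a NULL terminator */
    return 0;
}
```
]]>
</programlisting>
</para>

<para>With such a helper, the argument loop would pass it the two
strings following <option>--param</option> and report an error when it
returns -1.</para>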

<para>In libxslt, XSLT stylesheets are stored internally in
<structname>xsltStylesheet</structname> structures; similarly, in
libxml, XML documents are stored in <structname>xmlDoc</structname>
structures. <type>xsltStylesheetPtr</type> and <type>xmlDocPtr</type>
are simply typedefs of pointers to them. The user may specify any
number of stylesheets, which will be applied to the documents one
after the other. To save time we parse the stylesheets and the
documents as we read them from the command line and keep their parsed
representations, so that each stylesheet is parsed only once no matter
how many documents it is applied to. The parsed results are kept in
arrays, which are dynamically allocated and sized to the number of
arguments; this wastes some space, but not much (each element of type
<type>xsltStylesheetPtr</type> or <type>xmlDocPtr</type> is just a
pointer) and simplifies code later on. The arrays are allocated with
<function>calloc</function> so that all entries start out as zero,
that is, as null pointers. Since <varname>argv[0]</varname>, the
program name, is never stored, at least one null entry always follows
the last stylesheet or document, and the processing loops later rely
on it as an end marker.
</para>

</sect1>

<sect1><title>Arguments Collection</title>

<para>If the program gets no arguments at all, we print the usage
description, set the program return value to 1 and exit. Instead of
returning directly, we jump (literally, with a <literal>goto</literal>)
to the end of the program text, where some housekeeping takes
place.</para>

<para>
<programlisting>
  <![CDATA[
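    if (argc <= 1) {
        usage(argv[0]);
        return_value = 1;
        goto finish;
    }

    /* Collect arguments */
    for (arg_indx = 1; arg_indx < argc; arg_indx++) {
        if (argv[arg_indx][0] != '-')
            break;
        if ((!strcmp(argv[arg_indx], "-param"))
                || (!strcmp(argv[arg_indx], "--param"))) {
            /* A parameter needs both a name and a value */
            if (arg_indx + 2 >= argc) {
                fprintf(stderr, "%s needs a name and a value\n", argv[arg_indx]);
                usage(argv[0]);
                return_value = 1;
                goto finish;
            }
            /* Check for room first, so that exactly 8 pairs fit */
            if (params_indx + 2 > 16) {
                fprintf(stderr, "too many params\n");
                return_value = 1;
                goto finish;
            }
            params[params_indx++] = argv[++arg_indx];
            params[params_indx++] = argv[++arg_indx];
        } else if ((!strcmp(argv[arg_indx], "-o"))
                || (!strcmp(argv[arg_indx], "--out"))) {
            if (arg_indx + 1 >= argc) {
                fprintf(stderr, "%s needs a file name\n", argv[arg_indx]);
                usage(argv[0]);
                return_value = 1;
                goto finish;
            }
            arg_indx++;
            if (output_file != stdout)   /* --out given more than once */
                fclose(output_file);
            output_file = fopen(argv[arg_indx], "w");
            if (output_file == NULL) {
                fprintf(stderr, "cannot open %s for writing\n", argv[arg_indx]);
                return_value = 1;
                goto finish;
            }
        } else {
            fprintf(stderr, "Unknown option %s\n", argv[arg_indx]);
            usage(argv[0]);
            return_value = 1;
            goto finish;
        }
    }
    params[params_indx] = 0;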
    ]]>
</programlisting>
</para>

<para>If the user passes arguments, we have to collect them. This is a
matter of iterating over the program argument list for as long as we
encounter arguments starting with a dash. The XSLT parameters are put
into the <varname>params</varname> array, and
<varname>output_file</varname> is set to the file the user requests,
if any. After all the parameter name-value pairs have been processed,
we set the element following the last one to null, terminating the
<varname>params</varname> array.
</para>
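<para>One detail is easy to miss: libxslt evaluates each parameter
value passed to <function>xsltApplyStylesheet</function> as an XPath
expression, not as a plain string. A value such as
<userinput>Report</userinput> is therefore taken as a path selecting
elements named <literal>Report</literal>, not as the string "Report";
to pass a string, the user has to quote it, for example
<userinput>--param title "'Report'"</userinput>. The sketch below
shows a hypothetical helper, <function>xpath_quote</function>, that
could do the quoting instead. XPath 1.0 has no escape mechanism inside
string literals, so the helper wraps the value in whichever quote
character it does not contain and gives up if the value contains
both.</para>

<para>
<programlisting>
<![CDATA[
```c
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: turn an arbitrary string into an XPath 1.0
 * string literal so that it can be passed as a parameter value.
 * Returns a malloc'd string, or NULL if the value contains both kinds
 * of quote (it would need concat()) or if allocation fails.
 */
char *xpath_quote(const char *value)
{
    char quote;
    size_t len = strlen(value);
    char *out;

    if (strchr(value, '\'') == NULL)
        quote = '\'';
    else if (strchr(value, '"') == NULL)
        quote = '"';
    else
        return NULL;

    out = malloc(len + 3);   /* two quotes plus the terminating NUL */
    if (out == NULL)
        return NULL;
    out[0] = quote;
    memcpy(out + 1, value, len);
    out[len + 1] = quote;
    out[len + 2] = '\0';
    return out;
}
```
]]>
</programlisting>
</para>

<para>The result is allocated with <function>malloc</function> and has
to be freed after the transformation. The <command>xsltproc</command>
tool that comes with libxslt makes the same distinction through its
separate <option>--param</option> and <option>--stringparam</option>
options.</para>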
</sect1>

<sect1><title>Parsing</title>

<para>The rest of the argument list is taken to be stylesheets and
files to be transformed. Stylesheets are identified by their suffix,
which is expected to be xsl (case sensitive). All other files are
assumed to be XML documents, regardless of suffix.</para>

<para>
<programlisting>
  <![CDATA[
    /* Collect and parse stylesheets and files to be transformed */
    for (; arg_indx < argc; arg_indx++) {
        /* A stylesheet's name ends in ".xsl": look at the last dot only */
        const char *suffix = strrchr(argv[arg_indx], '.');
        if (suffix && !strcmp(suffix + 1, "xsl")) {
            stylesheets[stylesheet_indx] =
                xsltParseStylesheetFile((const xmlChar *) argv[arg_indx]);
            if (stylesheets[stylesheet_indx] == NULL) {
                fprintf(stderr, "cannot parse stylesheet %s\n", argv[arg_indx]);
                return_value = 1;
                goto finish;
            }
            stylesheet_indx++;
        } else {
            files[file_indx] = xmlParseFile(argv[arg_indx]);
            if (files[file_indx] == NULL) {
                fprintf(stderr, "cannot parse document %s\n", argv[arg_indx]);
                return_value = 1;
                goto finish;
            }
            file_indx++;
        }
    }
  ]]>
</programlisting>
</para>

<para>Stylesheets are parsed using the
<function>xsltParseStylesheetFile</function> function, which takes as
argument a pointer to <type>xmlChar</type> (libxml's typedef for
unsigned char); in effect, the filename of the stylesheet, hence the
cast. The resulting <type>xsltStylesheetPtr</type> is placed in the
<varname>stylesheets</varname> array. In the same vein, XML files are
parsed using the <function>xmlParseFile</function> function, which
takes as argument the file's name; the resulting
<type>xmlDocPtr</type> is placed in the <varname>files</varname>
array. Both functions return a null pointer if the file cannot be read
or parsed.
</para>
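<para>The classification rule is easy to get subtly wrong when file or
directory names contain more than one dot. As a sketch, a hypothetical
predicate, <function>is_stylesheet_name</function>, can look only at
the last dot of the final path component, so that
<filename>report.v2.xsl</filename> counts as a stylesheet while
<filename>notes.xsl.bak</filename>, or a file inside a directory named
<filename class="directory">styles.xsl</filename>, does not. Both / and
\ are treated as directory separators, to cover the MS-Windows build as
well.</para>

<para>
<programlisting>
<![CDATA[
```c
#include <string.h>

/*
 * Hypothetical predicate: does this argument name a stylesheet, i.e. is
 * the last suffix of its final path component exactly "xsl"?
 * Matching is case sensitive, like the program's own test.
 */
int is_stylesheet_name(const char *name)
{
    const char *base = name;
    const char *p;
    const char *dot;

    /* Skip any directory part; accept both / and \ as separators. */
    for (p = name; *p != '\0'; p++)
        if (*p == '/' || *p == '\\')
            base = p + 1;

    dot = strrchr(base, '.');
    return dot != NULL && strcmp(dot + 1, "xsl") == 0;
}
```
]]>
</programlisting>
</para>

<para>The parsing loop could then call this predicate instead of
testing the suffix inline.</para>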

</sect1>

<sect1><title>File Processing</title>

<para>All stylesheets are applied to each file one after the
other. Stylesheets are applied with the
<function>xsltApplyStylesheet</function> function, which takes as
arguments the stylesheet to be applied, the document to be transformed
and any parameters we have collected, and returns the transformed
document, or a null pointer if the transformation fails. The in-memory
representation of an XML document takes space, so each input document
is freed with the <function>xmlFreeDoc</function> function as soon as
the next stylesheet has produced its successor. The final result is
saved to the specified output and then freed as well.</para>

<para>
<programlisting>
  <![CDATA[
    /* Process files */
    for (i = 0; files[i]; i++) {
        doc = files[i];
        res = doc;
        for (j = 0; stylesheets[j]; j++) {
            res = xsltApplyStylesheet(stylesheets[j], doc, params);
            xmlFreeDoc(doc);        /* the input is no longer needed */
            doc = res;
            if (res == NULL)        /* transformation failed */
                break;
        }

        if (res == NULL) {
            fprintf(stderr, "transformation of document %d failed\n", i + 1);
            return_value = 1;
            continue;
        }
        if (stylesheets[0]) {
            /* honor the output settings of the last stylesheet applied */
            xsltSaveResultToFile(output_file, res, stylesheets[j-1]);
        } else {
            xmlDocDump(output_file, res);
        }
        xmlFreeDoc(res);
    }

    fclose(output_file);

    for (k = 0; stylesheets[k]; k++) {
        xsltFreeStylesheet(stylesheets[k]);
    }

    xsltCleanupGlobals();
    xmlCleanupParser();

 finish:
    free(stylesheets);
    free(files);
    return(return_value);
    ]]>
</programlisting>
</para>

<para>To output an XML document we have in memory we use the
<function>xsltSaveResultToFile</function> function, where we specify
the destination, the document, and the stylesheet that has been
applied to it. The stylesheet is required so that the output settings
it declares in its <literal>xsl:output</literal> element, such as the
encoding to be used, are honored in the output. If no transformation
has taken place, which will happen when the user specifies no
stylesheets at all on the command line, we use the
<function>xmlDocDump</function> libxml function, which saves the
source document to the file without further ado.</para>

<para>As parsed stylesheets take up space in memory, we take care to
free that memory after use with a call to
<function>xsltFreeStylesheet</function>. When all work is done, we
clean up all global variables used by the XSLT library using
<function>xsltCleanupGlobals</function>. Likewise, all global memory
allocated for the XML parser is reclaimed by a call to
<function>xmlCleanupParser</function>. Before returning we deallocate
the memory allocated for holding the pointers to the XML documents
and stylesheets.</para>

</sect1>

<sect1><title>*NIX Compiling and Linking</title>

<para>Compiling and linking in a *NIX environment is easy, as the
required libraries are almost certain to be in place already (remember
that libxml and libxslt are used by the GNOME project, so they are
present in most installations), although the development packages
that provide the header files may have to be installed separately. The
program can be dynamically linked so that its footprint is minimized,
or statically linked, so that it stands by itself, carrying all
required code.</para>

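<para>For dynamic linking the following one-liner will do (the
libraries are listed after the source file because the linker resolves
references in command-line order):</para>

<para>
<userinput>gcc -o libxslt_pipes -Wall -I/usr/include/libxml2
libxslt_pipes.c -L/usr/lib -lxslt -lxml2</userinput>
</para>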

<para>We assume that the necessary header files are in <filename
class="directory">/usr/include/libxml2</filename> and that the
required libraries (<filename>libxslt.so</filename>,
<filename>libxml2.so</filename>) are in <filename
class="directory">/usr/lib</filename>.</para>

<para>In general, a program may need to link to additional libraries,
depending on the processing it actually performs. A good way to start
is to use the <command>xslt-config</command> script. The
<option>--help</option> option displays usage
information. Running</para>

<para>
  <userinput>
    xslt-config --cflags
  </userinput>
</para>

<para>we get compile flags, while running</para>

<para>
  <userinput>
    xslt-config --libs
  </userinput>
</para>

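<para>we get the library settings for the linker. The two can be
combined in a single command, for example <userinput>gcc -o
libxslt_pipes -Wall `xslt-config --cflags` libxslt_pipes.c
`xslt-config --libs`</userinput>.</para>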

<para>For static linking we must list more libraries than we did for
dynamic linking, as the libraries on which libxml and libxslt depend
are also needed. Using <command>xslt-config</command> on a particular
installation we create the following one-liner:</para>

<para>
<userinput>
gcc -o libxslt_pipes -Wall -I/usr/include/libxml2 libxslt_pipes.c
-static -L/usr/lib -lxslt -lxml2 -lz -lpthread -lm
</userinput>
</para>

<para>If the linker warns that using some function in statically
linked applications requires at runtime the shared libraries from the
glibc version used for linking, the binary is not completely
static. Although we linked statically against glibc, the GNU C runtime
library, glibc loads external libraries at runtime to perform some of
its functions (name resolution, for example). Libraries of the same
version must then be present on the system where the application is to
run. One way to avoid this is to use an alternative C runtime, for
example <ulink url="http://www.uclibc.org">uClibc</ulink>, which
requires obtaining and building a uClibc toolchain first (if the
reason for trying to get a statically linked version of the program is
to embed it somewhere, using uClibc might be a good idea anyway).
</para>

</sect1>

<sect1 id="windows-build"><title>MS-Windows Compiling and
Linking</title>

<para>Compiling and linking in MS-Windows requires some
attention. First, the MS-Windows ports must be downloaded and
installed on the development workstation. The ports are available at
<ulink url="http://www.zlatkovic.com/libxml.en.html">Igor
Zlatkovi&#263;'s site</ulink>. We need the ports for iconv, zlib,
libxml, and libxslt. In contrast to *NIX environments, we cannot
assume that the libraries needed will be present on the other
computers where the program will be used. One solution is to
distribute the program along with the necessary dynamic
libraries. Another solution is to statically link the program so that
only a single executable file will have to be distributed.</para>

<para>We assume that we have unpacked the downloaded ports and placed
the required contents of their <filename
class="directory">include</filename> directories in an <filename
class="directory">include</filename> directory in our file system. The
required contents include everything apart from the <filename
class="directory">libexslt</filename> directory of the libxslt port,
as we are not using EXSLT (an initiative to provide extensions to
XSLT) in this project. In order to compile the program we have to make
sure that all necessary header files can be found. When using the
Microsoft compiler this translates to adding the required
<option>/I</option> switches to the command line. If using a Visual
Studio product, the same effect is attained by specifying additional
include directories in the compilation options. In the end, if the
headers have been copied to <filename
class="directory">C:\include</filename>, the command line must contain
<option>/I"C:\include" /I"C:\include\libxslt"
/I"C:\include\libxml"</option>.</para>

<para>This being a C program, it needs to be compiled against an
implementation of the C runtime library. Microsoft provides various
implementations. The ports, however, have been compiled against the
<filename>msvcrt.dll</filename> implementation, so it is wise to use
the same runtime in our project, unless we want to run into
unexpected runtime crashes. <filename>msvcrt.dll</filename> is a
multi-threaded implementation and is selected by giving
<option>/MD</option> as a compiler option. Unfortunately, the
correspondence between the <option>/MD</option> switch and
<filename>msvcrt.dll</filename> breaks after version 6 of the
Microsoft compiler. In version 7 and later (i.e., Visual Studio .NET),
<option>/MD</option> links against a different DLL; in version 7.1
this is <filename>msvcr71.dll</filename>. The end result of this bit
of esoterica is that if you dynamically link your application using a
compiler whose version is greater than 6, your program is likely to
crash unexpectedly, because it mixes two C runtimes; for example, a
<type>FILE</type> pointer opened by our code and passed to
<function>xsltSaveResultToFile</function> belongs to a different
runtime from the one the library uses. Alternatively, you may wish to
compile iconv, zlib, libxml and libxslt yourself, using the new
runtime library. This is not a tall order, and some details are given
<link linkend="windows-ports-build">below</link>.</para>

<para>There are three kinds of libraries in MS-Windows. Dynamically
Linked Libraries (DLLs), like <filename>msvcrt.dll</filename> we met
above, are used for dynamic linking; an application links to them at
runtime, so the application does not include the code contained in
them. Static libraries are used for static linking; an application
adds the libraries' code to its own code at link time. Import
libraries are used when building an application that uses DLLs. For
the application to be built, the linker must somehow find the
definitions of the functions that will be provided at runtime by the
DLLs; otherwise it will complain about unresolved references. Import
libraries contain function stubs that, for each DLL function we want
to call, know where to look for it in the DLL. In essence, in order to
use a DLL we must link against its corresponding import library. DLLs
have a <filename>.dll</filename> suffix; static and import libraries
both have a <filename>.lib</filename> suffix. In the MS-Windows ports
of libxml and libxslt static libraries are distinguished by their name
ending in <filename>_a.lib</filename>, while in the zlib port the
import library is <filename>zdll.lib</filename> and the static library
is <filename>zlib.lib</filename>. In what follows we assume we have a
<filename class="directory">lib</filename> directory in our filesystem
where we place the libraries we need for linking.</para>

<para>If we want to link dynamically we must make sure the <filename
class="directory">lib</filename> directory contains
<filename>iconv.lib</filename>, <filename>libxslt.lib</filename>,
<filename>libxml2.lib</filename>, and
<filename>zdll.lib</filename>. When using the Microsoft linker this
translates to adding the required <option>/LIBPATH</option>
switch and the necessary libraries in the command line. In Visual
Studio we must specify an additional library directory for <filename
class="directory">lib</filename> and put the necessary libraries in
the additional dependencies. In the end, the command line must include
<option>/LIBPATH:"C:\lib" "lib\iconv.lib" "lib\libxslt.lib"
"lib\libxml2.lib" "lib\zdll.lib"</option>, provided the libraries'
directory is <filename class="directory">C:\lib</filename>. In order
for the resulting executable to run, the ports' DLLs must be present;
one way is to place all DLLs contained in the ports in the home
directory of our application, and make sure they are distributed
together.</para>

<para>If we want to link statically we must make sure the <filename
class="directory">lib</filename> directory contains
<filename>iconv_a.lib</filename>, <filename>libxslt_a.lib</filename>,
<filename>libxml2_a.lib</filename>, and
<filename>zlib.lib</filename>. Adding <filename
class="directory">lib</filename> as a library directory and putting
the necessary libraries in the additional dependencies, we get a
command line that should include <option>/LIBPATH:"C:\lib"
"lib\iconv_a.lib" "lib\libxslt_a.lib" "lib\libxml2_a.lib"
"lib\zlib.lib"</option>. The resulting executable is much bigger
than if we linked dynamically; it is, however, self-contained and can
be distributed more easily, in theory at least. In practice, however,
the executable is not completely static. We saw that the ports are
compiled against <filename>msvcrt.dll</filename>, so the program does
require that DLL at runtime. Moreover, with Microsoft developer tools
later than version 6, our own code is compiled against a newer
runtime, such as <filename>msvcr71.dll</filename>, so that DLL is
required as well. Unlike <filename>msvcrt.dll</filename>, it may not
be present on the target computer, so we may have to distribute it
along with the program.</para>

<sect2 id="windows-ports-build"><title>Building the Ports in
MS-Windows</title>

<para>The source code of the ports is readily available on the web,
from the sites of the respective projects. Each port can be built without
problems in an MS-Windows environment using Microsoft development
tools.  The necessary command line tools (compiler, linker,
<command>nmake</command>) must be available. This means running a
batch file called <command>vcvars32.bat</command> that comes with
Visual Studio (its exact location in the directory tree may vary
depending on the version of Visual Studio, but a file search will find
it anyway). Makefiles for the Microsoft tools are found in all
ports. They are distinguished by their suffix, e.g.,
<filename>Makefile.msvc</filename> or
<filename>Makefile.msc</filename>. To build zlib it suffices to run
<command>nmake</command> against <filename>Makefile.msc</filename>
(i.e., with the <option>/F</option> option); similarly, to build
<filename>iconv</filename> it suffices to run <command>nmake</command>
against <filename>Makefile.msvc</filename>. Building libxml and
libxslt requires an extra configuration step; we must run the
<filename>configure.js</filename> configuration script with the
<command>cscript</command> command. <filename>configure.js</filename>
is found in the <filename class="directory">win32</filename> directory
in the distributions. It is written in JScript, Microsoft's
implementation of the ECMA-262 language specification (ECMAScript
Edition 3), a JavaScript offspring. The configuration script takes a
number of parameters detailing our environment and needs;
<userinput>cscript configure.js help</userinput> documents
them.</para>

<para>It is wise to read all documentation files in the source
distributions before starting; moreover, pay attention to the
dependencies between the ports. If we configure libxml and libxslt to
use iconv and zlib we must build these two first and make sure their
headers and libraries can be found by the compiler and the
linker when building libxml and libxslt.</para>

</sect2>

</sect1>

<sect1><title>zlib, iconv and All That</title>

<para>We saw that libxml and libxslt depend on various other
libraries, for instance zlib, iconv, and so forth. Looking at them
gives us clues about the capabilities of libxml and libxslt.</para>

<para><ulink url="http://www.zlib.org">zlib</ulink> is a free general
purpose lossless data compression library. It is a venerable
workhorse; more than <ulink
url="http://www.gzip.org/zlib/apps.html">500 applications</ulink>
(both commercial and open source) seem to use the library. libxml uses
zlib so that it can read from or write to compressed files
directly. The <function>xmlParseFile</function> function can
transparently parse a compressed document to produce an
<structname>xmlDoc</structname>. If we want to create a compressed
document with libxml we can use an
<type>xmlTextWriterPtr</type> (obtained through
<function>xmlNewTextWriterDoc</function>), or another related
structure from <filename>libxml/xmlwriter.h</filename>, with
compression enabled.</para>

<para>XML allows documents to use a variety of different character
encodings. <ulink
url="http://www.gnu.org/software/libiconv">iconv</ulink> is a free
library for converting between different character encodings.  libxml
provides a set of default converters for some encodings: UTF-8, UTF-16
(little endian and big endian), ISO-8859-1, ASCII, and HTML (a
specific handler for the conversion of UTF-8 to ASCII with HTML
predefined entities like &amp;copy; for the copyright sign). However,
when compiled with iconv support, libxml and libxslt can handle the
full range of encodings provided by iconv; these should cover most
needs.</para>

<para>libxml and libxslt can be used in multi-threaded
applications. In MS-Windows they are linked against
<filename>MSVCRT.DLL</filename> (or one of its descendants, as we saw
<link linkend="windows-build">above</link>). In *NIX the pthreads
(POSIX threads) library is used.</para>

</sect1>

<sect1><title>The Complete Program</title>

<para>
The complete program listing is given below. The program is also
<ulink url="libxslt_pipes.c">available online</ulink>.
</para>

<para>
<programlisting>
<xi:include href="libxslt_pipes.c" parse="text"
	    xmlns:xi="http://www.w3.org/2003/XInclude"/>
</programlisting>
</para>

</sect1>

</article>