author ph10 <ph10@6239d852-aaf2-0410-a92c-79f79f948069> 2014-11-23 18:38:38 +0000
committer ph10 <ph10@6239d852-aaf2-0410-a92c-79f79f948069> 2014-11-23 18:38:38 +0000
commit 469ce4c0cdbb50172723c20b6ce2590a5e593023 (patch)
tree 59e68ed0cc94d6367c7d19e230a778b58a89e46b /doc/html/pcre2build.html
parent ed4ed4376d5c874b42ca5817e91189b6ca1c7298 (diff)
More documentation and test updates.
git-svn-id: svn://vcs.exim.org/pcre2/code/trunk@158 6239d852-aaf2-0410-a92c-79f79f948069
Diffstat (limited to 'doc/html/pcre2build.html')
-rw-r--r-- doc/html/pcre2build.html 115
1 files changed, 60 insertions, 55 deletions
diff --git a/doc/html/pcre2build.html b/doc/html/pcre2build.html
index c6ba6de..b87dad7 100644
--- a/doc/html/pcre2build.html
+++ b/doc/html/pcre2build.html
@@ -17,9 +17,9 @@ please consult the man page, in case the conversion went wrong.
<li><a name="TOC2" href="#SEC2">PCRE2 BUILD-TIME OPTIONS</a>
<li><a name="TOC3" href="#SEC3">BUILDING 8-BIT, 16-BIT AND 32-BIT LIBRARIES</a>
<li><a name="TOC4" href="#SEC4">BUILDING SHARED AND STATIC LIBRARIES</a>
-<li><a name="TOC5" href="#SEC5">Unicode and UTF SUPPORT</a>
+<li><a name="TOC5" href="#SEC5">UNICODE AND UTF SUPPORT</a>
<li><a name="TOC6" href="#SEC6">JUST-IN-TIME COMPILER SUPPORT</a>
-<li><a name="TOC7" href="#SEC7">CODE VALUE OF NEWLINE</a>
+<li><a name="TOC7" href="#SEC7">NEWLINE RECOGNITION</a>
<li><a name="TOC8" href="#SEC8">WHAT \R MATCHES</a>
<li><a name="TOC9" href="#SEC9">HANDLING VERY LARGE PATTERNS</a>
<li><a name="TOC10" href="#SEC10">AVOIDING EXCESSIVE STACK USAGE</a>
@@ -91,12 +91,12 @@ respectively. These can be interpreted either as single-unit characters or
UTF-16/UTF-32 strings. To build these additional libraries, add one or both of
the following to the <b>configure</b> command:
<pre>
- --enable-pcre16
- --enable-pcre32
+ --enable-pcre2-16
+ --enable-pcre2-32
</pre>
If you do not want the 8-bit library, add
<pre>
- --disable-pcre8
+ --disable-pcre2-8
</pre>
as well. At least one of the three libraries must be built. Note that the POSIX
wrapper is for the 8-bit library only, and that <b>pcre2grep</b> is an 8-bit
@@ -106,14 +106,15 @@ libraries.
<br><a name="SEC4" href="#TOC1">BUILDING SHARED AND STATIC LIBRARIES</a><br>
<P>
The Autotools PCRE2 building process uses <b>libtool</b> to build both shared
-and static libraries by default. You can suppress one of these by adding one of
+and static libraries by default. You can suppress an unwanted library by adding
+one of
<pre>
--disable-shared
--disable-static
</pre>
-to the <b>configure</b> command, as required.
+to the <b>configure</b> command.
</P>
-<br><a name="SEC5" href="#TOC1">Unicode and UTF SUPPORT</a><br>
+<br><a name="SEC5" href="#TOC1">UNICODE AND UTF SUPPORT</a><br>
<P>
By default, PCRE2 is built with support for Unicode and UTF character strings.
To build it without Unicode support, add
@@ -126,20 +127,15 @@ in the same configuration.
</P>
<P>
Of itself, Unicode support does not make PCRE2 treat strings as UTF-8, UTF-16
-or UTF-32. To do that you have have to set the PCRE2_UTF option when you call
-<b>pcre2_compile()</b> to compile a pattern.
+or UTF-32. To do that, applications that use the library have to set the
+PCRE2_UTF option when they call <b>pcre2_compile()</b> to compile a pattern.
</P>
<P>
-It is not possible to support both EBCDIC and UTF-8 codes in the same version
-of the library. Consequently, --enable-unicode and --enable-ebcdic are mutually
-exclusive.
-</P>
-<P>
-UTF support allows the libraries to process character codepoints up to 0x10ffff
-in the strings that they handle. It also provides support for accessing the
-properties of such characters, using pattern escapes such as \P, \p, and \X.
-Only the general category properties such as <i>Lu</i> and <i>Nd</i> are
-supported. Details are given in the
+UTF support allows the libraries to process character code points up to
+0x10ffff in the strings that they handle. It also provides support for
+accessing the Unicode properties of such characters, using pattern escapes such
+as \P, \p, and \X. Only the general category properties such as <i>Lu</i> and
+<i>Nd</i> are supported. Details are given in the
<a href="pcre2pattern.html"><b>pcre2pattern</b></a>
documentation.
</P>
@@ -150,7 +146,7 @@ Just-in-time compiler support is included in the build by specifying
--enable-jit
</pre>
This support is available only for certain hardware architectures. If this
-option is set for an unsupported architecture, a compile time error occurs.
+option is set for an unsupported architecture, a building error occurs.
See the
<a href="pcre2jit.html"><b>pcre2jit</b></a>
documentation for a discussion of JIT usage. When JIT support is enabled,
@@ -160,7 +156,7 @@ pcre2grep automatically makes use of it, unless you add
</pre>
to the "configure" command.
</P>
-<br><a name="SEC7" href="#TOC1">CODE VALUE OF NEWLINE</a><br>
+<br><a name="SEC7" href="#TOC1">NEWLINE RECOGNITION</a><br>
<P>
By default, PCRE2 interprets the linefeed (LF) character as indicating the end
of a line. This is the normal newline character on Unix-like systems. You can
@@ -168,12 +164,13 @@ compile PCRE2 to use carriage return (CR) instead, by adding
<pre>
--enable-newline-is-cr
</pre>
-to the <b>configure</b> command. There is also a --enable-newline-is-lf option,
+to the <b>configure</b> command. There is also an --enable-newline-is-lf option,
which explicitly specifies linefeed as the newline character.
-<br>
-<br>
-Alternatively, you can specify that line endings are to be indicated by the two
-character sequence CRLF. If you want this, add
+</P>
+<P>
+Alternatively, you can specify that line endings are to be indicated by the
+two-character sequence CRLF (CR immediately followed by LF). If you want this,
+add
<pre>
--enable-newline-is-crlf
</pre>
@@ -186,22 +183,26 @@ indicating a line ending. Finally, a fifth option, specified by
<pre>
--enable-newline-is-any
</pre>
-causes PCRE2 to recognize any Unicode newline sequence.
+causes PCRE2 to recognize any Unicode newline sequence. The Unicode newline
+sequences are the three just mentioned, plus the single characters VT (vertical
+tab, U+000B), FF (form feed, U+000C), NEL (next line, U+0085), LS (line
+separator, U+2028), and PS (paragraph separator, U+2029).
</P>
<P>
-Whatever line ending convention is selected when PCRE2 is built can be
-overridden when the library functions are called. At build time it is
+Whatever default line ending convention is selected when PCRE2 is built can be
+overridden by applications that use the library. At build time it is
conventional to use the standard for your operating system.
</P>
<br><a name="SEC8" href="#TOC1">WHAT \R MATCHES</a><br>
<P>
By default, the sequence \R in a pattern matches any Unicode newline sequence,
-whatever has been selected as the line ending sequence. If you specify
+independently of what has been selected as the line ending sequence. If you
+specify
<pre>
--enable-bsr-anycrlf
</pre>
the default is changed so that \R matches only CR, LF, or CRLF. Whatever is
-selected when PCRE2 is built can be overridden when the library functions are
+selected when PCRE2 is built can be overridden by applications that use the
called.
</P>
<br><a name="SEC9" href="#TOC1">HANDLING VERY LARGE PATTERNS</a><br>
@@ -210,10 +211,10 @@ Within a compiled pattern, offset values are used to point from one part to
another (for example, from an opening parenthesis to an alternation
metacharacter). By default, in the 8-bit and 16-bit libraries, two-byte values
are used for these offsets, leading to a maximum size for a compiled pattern of
-around 64K. This is sufficient to handle all but the most gigantic patterns.
-Nevertheless, some people do want to process truly enormous patterns, so it is
-possible to compile PCRE2 to use three-byte or four-byte offsets by adding a
-setting such as
+around 64K code units. This is sufficient to handle all but the most gigantic
+patterns. Nevertheless, some people do want to process truly enormous patterns,
+so it is possible to compile PCRE2 to use three-byte or four-byte offsets by
+adding a setting such as
<pre>
--with-link-size=3
</pre>
@@ -294,16 +295,20 @@ hand".)
<br><a name="SEC13" href="#TOC1">USING EBCDIC CODE</a><br>
<P>
PCRE2 assumes by default that it will run in an environment where the character
-code is ASCII (or Unicode, which is a superset of ASCII). This is the case for
+code is ASCII or Unicode, which is a superset of ASCII. This is the case for
most computer operating systems. PCRE2 can, however, be compiled to run in an
-EBCDIC environment by adding
+8-bit EBCDIC environment by adding
<pre>
--enable-ebcdic --disable-unicode
</pre>
to the <b>configure</b> command. This setting implies
--enable-rebuild-chartables. You should only use it if you know that you are in
-an EBCDIC environment (for example, an IBM mainframe operating system). The
---enable-ebcdic option is incompatible with Unicode support.
+an EBCDIC environment (for example, an IBM mainframe operating system).
+</P>
+<P>
+It is not possible to support both EBCDIC and UTF-8 codes in the same version
+of the library. Consequently, --enable-unicode and --enable-ebcdic are mutually
+exclusive.
</P>
<P>
The EBCDIC character that corresponds to an ASCII LF is assumed to have the
@@ -347,8 +352,8 @@ parameter value by adding, for example,
<pre>
--with-pcre2grep-bufsize=50K
</pre>
-to the <b>configure</b> command. The caller of \fPpcre2grep\fP can, however,
-override this value by specifying a run-time option.
+to the <b>configure</b> command. The caller of <b>pcre2grep</b> can override this
+value by using --buffer-size on the command line.
</P>
<br><a name="SEC16" href="#TOC1">PCRE2TEST OPTION FOR LIBREADLINE SUPPORT</a><br>
<P>
@@ -362,16 +367,16 @@ to the <b>configure</b> command, <b>pcre2test</b> is linked with the
from a terminal, it reads it using the <b>readline()</b> function. This provides
line-editing and history facilities. Note that <b>libreadline</b> is
GPL-licensed, so if you distribute a binary of <b>pcre2test</b> linked in this
-way, there may be licensing issues. These can be avoided by linking with
-<b>libedit</b> (which has a BSD licence) instead.
+way, there may be licensing issues. These can be avoided by linking instead
+with <b>libedit</b>, which has a BSD licence.
</P>
<P>
-Setting this option causes the <b>-lreadline</b> option to be added to the
-<b>pcre2test</b> build. In many operating environments with a sytem-installed
-readline library this is sufficient. However, in some environments (e.g. if an
-unmodified distribution version of readline is in use), some extra
-configuration may be necessary. The INSTALL file for <b>libreadline</b> says
-this:
+Setting --enable-pcre2test-libreadline causes the <b>-lreadline</b> option to be
+added to the <b>pcre2test</b> build. In many operating environments with a
+system-installed readline library this is sufficient. However, in some
+environments (e.g. if an unmodified distribution version of readline is in
+use), some extra configuration may be necessary. The INSTALL file for
+<b>libreadline</b> says this:
<pre>
"Readline uses the termcap functions, but does not link with
the termcap or curses library itself, allowing applications
@@ -386,13 +391,13 @@ immediately before the <b>configure</b> command.
</P>
<br><a name="SEC17" href="#TOC1">DEBUGGING WITH VALGRIND SUPPORT</a><br>
<P>
-By adding the
+If you add
<pre>
--enable-valgrind
</pre>
-option to to the <b>configure</b> command, PCRE2 will use valgrind annotations
-to mark certain memory regions as unaddressable. This allows it to detect
-invalid memory accesses, and is mostly useful for debugging PCRE2 itself.
+to the <b>configure</b> command, PCRE2 will use valgrind annotations to mark
+certain memory regions as unaddressable. This allows it to detect invalid
+memory accesses, and is mostly useful for debugging PCRE2 itself.
</P>
<br><a name="SEC18" href="#TOC1">CODE COVERAGE REPORTING</a><br>
<P>
@@ -466,7 +471,7 @@ Cambridge, England.
</P>
<br><a name="SEC21" href="#TOC1">REVISION</a><br>
<P>
-Last updated: 03 November 2014
+Last updated: 23 November 2014
<br>
Copyright &copy; 1997-2014 University of Cambridge.
<br>