summaryrefslogtreecommitdiff
path: root/doc/pcrebuild.3
diff options
context:
space:
mode:
Diffstat (limited to 'doc/pcrebuild.3')
-rw-r--r--doc/pcrebuild.3112
1 files changed, 112 insertions, 0 deletions
diff --git a/doc/pcrebuild.3 b/doc/pcrebuild.3
new file mode 100644
index 0000000..52ebefb
--- /dev/null
+++ b/doc/pcrebuild.3
@@ -0,0 +1,112 @@
+.TH PCRE 3
+.SH NAME
+PCRE - Perl-compatible regular expressions
+.SH PCRE BUILD-TIME OPTIONS
+.rs
+.sp
+This document describes the optional features of PCRE that can be selected when
+the library is compiled. They are all selected, or deselected, by providing
+options to the \fBconfigure\fR script which is run before the \fBmake\fR
+command. The complete list of options for \fBconfigure\fR (which includes the
+standard ones such as the selection of the installation directory) can be
+obtained by running
+
+ ./configure --help
+
+The following sections describe certain options whose names begin with --enable
+or --disable. These settings specify changes to the defaults for the
+\fBconfigure\fR command. Because of the way that \fBconfigure\fR works,
+--enable and --disable always come in pairs, so the complementary option always
+exists as well, but as it specifies the default, it is not described.
+
+.SH UTF-8 SUPPORT
+.rs
+.sp
+To build PCRE with support for UTF-8 character strings, add
+
+ --enable-utf8
+
+to the \fBconfigure\fR command. Of itself, this does not make PCRE treat
+strings as UTF-8. As well as compiling PCRE with this option, you also have
+have to set the PCRE_UTF8 option when you call the \fBpcre_compile()\fR
+function.
+
+.SH CODE VALUE OF NEWLINE
+.rs
+.sp
+By default, PCRE treats character 10 (linefeed) as the newline character. This
+is the normal newline character on Unix-like systems. You can compile PCRE to
+use character 13 (carriage return) instead by adding
+
+ --enable-newline-is-cr
+
+to the \fBconfigure\fR command. For completeness there is also a
+--enable-newline-is-lf option, which explicitly specifies linefeed as the
+newline character.
+
+.SH BUILDING SHARED AND STATIC LIBRARIES
+.rs
+.sp
+The PCRE building process uses \fBlibtool\fR to build both shared and static
+Unix libraries by default. You can suppress one of these by adding one of
+
+ --disable-shared
+ --disable-static
+
+to the \fBconfigure\fR command, as required.
+
+.SH POSIX MALLOC USAGE
+.rs
+.sp
+When PCRE is called through the POSIX interface (see the \fBpcreposix\fR
+documentation), additional working storage is required for holding the pointers
+to capturing substrings because PCRE requires three integers per substring,
+whereas the POSIX interface provides only two. If the number of expected
+substrings is small, the wrapper function uses space on the stack, because this
+is faster than using \fBmalloc()\fR for each call. The default threshold above
+which the stack is no longer used is 10; it can be changed by adding a setting
+such as
+
+ --with-posix-malloc-threshold=20
+
+to the \fBconfigure\fR command.
+
+.SH LIMITING PCRE RESOURCE USAGE
+.rs
+.sp
+Internally, PCRE has a function called \fBmatch()\fR which it calls repeatedly
+(possibly recursively) when performing a matching operation. By limiting the
+number of times this function may be called, a limit can be placed on the
+resources used by a single call to \fBpcre_exec()\fR. The limit can be changed
+at run time, as described in the \fBpcreapi\fR documentation. The default is 10
+million, but this can be changed by adding a setting such as
+
+ --with-match-limit=500000
+
+to the \fBconfigure\fR command.
+
+.SH HANDLING VERY LARGE PATTERNS
+.rs
+.sp
+Within a compiled pattern, offset values are used to point from one part to
+another (for example, from an opening parenthesis to an alternation
+metacharacter). By default two-byte values are used for these offsets, leading
+to a maximum size for a compiled pattern of around 64K. This is sufficient to
+handle all but the most gigantic patterns. Nevertheless, some people do want to
+process enormous patterns, so it is possible to compile PCRE to use three-byte
+or four-byte offsets by adding a setting such as
+
+ --with-link-size=3
+
+to the \fBconfigure\fR command. The value given must be 2, 3, or 4. Using
+longer offsets slows down the operation of PCRE because it has to load
+additional bytes when handling them.
+
+If you build PCRE with an increased link size, test 2 (and test 5 if you are
+using UTF-8) will fail. Part of the output of these tests is a representation
+of the compiled pattern, and this changes with the link size.
+
+.in 0
+Last updated: 21 January 2003
+.br
+Copyright (c) 1997-2003 University of Cambridge.