author | ph10 <ph10@6239d852-aaf2-0410-a92c-79f79f948069> | 2014-11-23 18:38:38 +0000 |
---|---|---|
committer | ph10 <ph10@6239d852-aaf2-0410-a92c-79f79f948069> | 2014-11-23 18:38:38 +0000 |
commit | 469ce4c0cdbb50172723c20b6ce2590a5e593023 (patch) | |
tree | 59e68ed0cc94d6367c7d19e230a778b58a89e46b /doc/html/pcre2build.html | |
parent | ed4ed4376d5c874b42ca5817e91189b6ca1c7298 (diff) | |
download | pcre2-469ce4c0cdbb50172723c20b6ce2590a5e593023.tar.gz |
More documentation and test updates.
git-svn-id: svn://vcs.exim.org/pcre2/code/trunk@158 6239d852-aaf2-0410-a92c-79f79f948069
Diffstat (limited to 'doc/html/pcre2build.html')
-rw-r--r-- | doc/html/pcre2build.html | 115 |
1 files changed, 60 insertions, 55 deletions
diff --git a/doc/html/pcre2build.html b/doc/html/pcre2build.html index c6ba6de..b87dad7 100644 --- a/doc/html/pcre2build.html +++ b/doc/html/pcre2build.html @@ -17,9 +17,9 @@ please consult the man page, in case the conversion went wrong. <li><a name="TOC2" href="#SEC2">PCRE2 BUILD-TIME OPTIONS</a> <li><a name="TOC3" href="#SEC3">BUILDING 8-BIT, 16-BIT AND 32-BIT LIBRARIES</a> <li><a name="TOC4" href="#SEC4">BUILDING SHARED AND STATIC LIBRARIES</a> -<li><a name="TOC5" href="#SEC5">Unicode and UTF SUPPORT</a> +<li><a name="TOC5" href="#SEC5">UNICODE AND UTF SUPPORT</a> <li><a name="TOC6" href="#SEC6">JUST-IN-TIME COMPILER SUPPORT</a> -<li><a name="TOC7" href="#SEC7">CODE VALUE OF NEWLINE</a> +<li><a name="TOC7" href="#SEC7">NEWLINE RECOGNITION</a> <li><a name="TOC8" href="#SEC8">WHAT \R MATCHES</a> <li><a name="TOC9" href="#SEC9">HANDLING VERY LARGE PATTERNS</a> <li><a name="TOC10" href="#SEC10">AVOIDING EXCESSIVE STACK USAGE</a> @@ -91,12 +91,12 @@ respectively. These can be interpreted either as single-unit characters or UTF-16/UTF-32 strings. To build these additional libraries, add one or both of the following to the <b>configure</b> command: <pre> - --enable-pcre16 - --enable-pcre32 + --enable-pcre2-16 + --enable-pcre2-32 </pre> If you do not want the 8-bit library, add <pre> - --disable-pcre8 + --disable-pcre2-8 </pre> as well. At least one of the three libraries must be built. Note that the POSIX wrapper is for the 8-bit library only, and that <b>pcre2grep</b> is an 8-bit @@ -106,14 +106,15 @@ libraries. <br><a name="SEC4" href="#TOC1">BUILDING SHARED AND STATIC LIBRARIES</a><br> <P> The Autotools PCRE2 building process uses <b>libtool</b> to build both shared -and static libraries by default. You can suppress one of these by adding one of +and static libraries by default. You can suppress an unwanted library by adding +one of <pre> --disable-shared --disable-static </pre> -to the <b>configure</b> command, as required. +to the <b>configure</b> command. 
</P> -<br><a name="SEC5" href="#TOC1">Unicode and UTF SUPPORT</a><br> +<br><a name="SEC5" href="#TOC1">UNICODE AND UTF SUPPORT</a><br> <P> By default, PCRE2 is built with support for Unicode and UTF character strings. To build it without Unicode support, add @@ -126,20 +127,15 @@ in the same configuration. </P> <P> Of itself, Unicode support does not make PCRE2 treat strings as UTF-8, UTF-16 -or UTF-32. To do that you have have to set the PCRE2_UTF option when you call -<b>pcre2_compile()</b> to compile a pattern. +or UTF-32. To do that, applications that use the library have to set the +PCRE2_UTF option when they call <b>pcre2_compile()</b> to compile a pattern. </P> <P> -It is not possible to support both EBCDIC and UTF-8 codes in the same version -of the library. Consequently, --enable-unicode and --enable-ebcdic are mutually -exclusive. -</P> -<P> -UTF support allows the libraries to process character codepoints up to 0x10ffff -in the strings that they handle. It also provides support for accessing the -properties of such characters, using pattern escapes such as \P, \p, and \X. -Only the general category properties such as <i>Lu</i> and <i>Nd</i> are -supported. Details are given in the +UTF support allows the libraries to process character code points up to +0x10ffff in the strings that they handle. It also provides support for +accessing the Unicode properties of such characters, using pattern escapes such +as \P, \p, and \X. Only the general category properties such as <i>Lu</i> and +<i>Nd</i> are supported. Details are given in the <a href="pcre2pattern.html"><b>pcre2pattern</b></a> documentation. </P> @@ -150,7 +146,7 @@ Just-in-time compiler support is included in the build by specifying --enable-jit </pre> This support is available only for certain hardware architectures. If this -option is set for an unsupported architecture, a compile time error occurs. +option is set for an unsupported architecture, a building error occurs. 
See the <a href="pcre2jit.html"><b>pcre2jit</b></a> documentation for a discussion of JIT usage. When JIT support is enabled, @@ -160,7 +156,7 @@ pcre2grep automatically makes use of it, unless you add </pre> to the "configure" command. </P> -<br><a name="SEC7" href="#TOC1">CODE VALUE OF NEWLINE</a><br> +<br><a name="SEC7" href="#TOC1">NEWLINE RECOGNITION</a><br> <P> By default, PCRE2 interprets the linefeed (LF) character as indicating the end of a line. This is the normal newline character on Unix-like systems. You can @@ -168,12 +164,13 @@ compile PCRE2 to use carriage return (CR) instead, by adding <pre> --enable-newline-is-cr </pre> -to the <b>configure</b> command. There is also a --enable-newline-is-lf option, +to the <b>configure</b> command. There is also an --enable-newline-is-lf option, which explicitly specifies linefeed as the newline character. -<br> -<br> -Alternatively, you can specify that line endings are to be indicated by the two -character sequence CRLF. If you want this, add +</P> +<P> +Alternatively, you can specify that line endings are to be indicated by the +two-character sequence CRLF (CR immediately followed by LF). If you want this, +add <pre> --enable-newline-is-crlf </pre> @@ -186,22 +183,26 @@ indicating a line ending. Finally, a fifth option, specified by <pre> --enable-newline-is-any </pre> -causes PCRE2 to recognize any Unicode newline sequence. +causes PCRE2 to recognize any Unicode newline sequence. The Unicode newline +sequences are the three just mentioned, plus the single characters VT (vertical +tab, U+000B), FF (form feed, U+000C), NEL (next line, U+0085), LS (line +separator, U+2028), and PS (paragraph separator, U+2029). </P> <P> -Whatever line ending convention is selected when PCRE2 is built can be -overridden when the library functions are called. At build time it is +Whatever default line ending convention is selected when PCRE2 is built can be +overridden by applications that use the library. 
At build time it is conventional to use the standard for your operating system. </P> <br><a name="SEC8" href="#TOC1">WHAT \R MATCHES</a><br> <P> By default, the sequence \R in a pattern matches any Unicode newline sequence, -whatever has been selected as the line ending sequence. If you specify +independently of what has been selected as the line ending sequence. If you +specify <pre> --enable-bsr-anycrlf </pre> the default is changed so that \R matches only CR, LF, or CRLF. Whatever is -selected when PCRE2 is built can be overridden when the library functions are +selected when PCRE2 is built can be overridden by applications that use the called. </P> <br><a name="SEC9" href="#TOC1">HANDLING VERY LARGE PATTERNS</a><br> @@ -210,10 +211,10 @@ Within a compiled pattern, offset values are used to point from one part to another (for example, from an opening parenthesis to an alternation metacharacter). By default, in the 8-bit and 16-bit libraries, two-byte values are used for these offsets, leading to a maximum size for a compiled pattern of -around 64K. This is sufficient to handle all but the most gigantic patterns. -Nevertheless, some people do want to process truly enormous patterns, so it is -possible to compile PCRE2 to use three-byte or four-byte offsets by adding a -setting such as +around 64K code units. This is sufficient to handle all but the most gigantic +patterns. Nevertheless, some people do want to process truly enormous patterns, +so it is possible to compile PCRE2 to use three-byte or four-byte offsets by +adding a setting such as <pre> --with-link-size=3 </pre> @@ -294,16 +295,20 @@ hand".) <br><a name="SEC13" href="#TOC1">USING EBCDIC CODE</a><br> <P> PCRE2 assumes by default that it will run in an environment where the character -code is ASCII (or Unicode, which is a superset of ASCII). This is the case for +code is ASCII or Unicode, which is a superset of ASCII. This is the case for most computer operating systems. 
PCRE2 can, however, be compiled to run in an -EBCDIC environment by adding +8-bit EBCDIC environment by adding <pre> --enable-ebcdic --disable-unicode </pre> to the <b>configure</b> command. This setting implies --enable-rebuild-chartables. You should only use it if you know that you are in -an EBCDIC environment (for example, an IBM mainframe operating system). The ---enable-ebcdic option is incompatible with Unicode support. +an EBCDIC environment (for example, an IBM mainframe operating system). +</P> +<P> +It is not possible to support both EBCDIC and UTF-8 codes in the same version +of the library. Consequently, --enable-unicode and --enable-ebcdic are mutually +exclusive. </P> <P> The EBCDIC character that corresponds to an ASCII LF is assumed to have the @@ -347,8 +352,8 @@ parameter value by adding, for example, <pre> --with-pcre2grep-bufsize=50K </pre> -to the <b>configure</b> command. The caller of \fPpcre2grep\fP can, however, -override this value by specifying a run-time option. +to the <b>configure</b> command. The caller of \fPpcre2grep\fP can override this +value by using --buffer-size on the command line.. </P> <br><a name="SEC16" href="#TOC1">PCRE2TEST OPTION FOR LIBREADLINE SUPPORT</a><br> <P> @@ -362,16 +367,16 @@ to the <b>configure</b> command, <b>pcre2test</b> is linked with the from a terminal, it reads it using the <b>readline()</b> function. This provides line-editing and history facilities. Note that <b>libreadline</b> is GPL-licensed, so if you distribute a binary of <b>pcre2test</b> linked in this -way, there may be licensing issues. These can be avoided by linking with -<b>libedit</b> (which has a BSD licence) instead. +way, there may be licensing issues. These can be avoided by linking instead +with <b>libedit</b>, which has a BSD licence. </P> <P> -Setting this option causes the <b>-lreadline</b> option to be added to the -<b>pcre2test</b> build. 
In many operating environments with a sytem-installed -readline library this is sufficient. However, in some environments (e.g. if an -unmodified distribution version of readline is in use), some extra -configuration may be necessary. The INSTALL file for <b>libreadline</b> says -this: +Setting --enable-pcre2test-libreadline causes the <b>-lreadline</b> option to be +added to the <b>pcre2test</b> build. In many operating environments with a +sytem-installed readline library this is sufficient. However, in some +environments (e.g. if an unmodified distribution version of readline is in +use), some extra configuration may be necessary. The INSTALL file for +<b>libreadline</b> says this: <pre> "Readline uses the termcap functions, but does not link with the termcap or curses library itself, allowing applications @@ -386,13 +391,13 @@ immediately before the <b>configure</b> command. </P> <br><a name="SEC17" href="#TOC1">DEBUGGING WITH VALGRIND SUPPORT</a><br> <P> -By adding the +If you add <pre> --enable-valgrind </pre> -option to to the <b>configure</b> command, PCRE2 will use valgrind annotations -to mark certain memory regions as unaddressable. This allows it to detect -invalid memory accesses, and is mostly useful for debugging PCRE2 itself. +to the <b>configure</b> command, PCRE2 will use valgrind annotations to mark +certain memory regions as unaddressable. This allows it to detect invalid +memory accesses, and is mostly useful for debugging PCRE2 itself. </P> <br><a name="SEC18" href="#TOC1">CODE COVERAGE REPORTING</a><br> <P> @@ -466,7 +471,7 @@ Cambridge, England. </P> <br><a name="SEC21" href="#TOC1">REVISION</a><br> <P> -Last updated: 03 November 2014 +Last updated: 23 November 2014 <br> Copyright © 1997-2014 University of Cambridge. <br> |