author | ph10 <ph10@2f5784b3-3f2a-0410-8824-cb99058d5e15> | 2013-11-19 15:36:57 +0000 |
---|---|---|
committer | ph10 <ph10@2f5784b3-3f2a-0410-8824-cb99058d5e15> | 2013-11-19 15:36:57 +0000 |
commit | 3d0a81dbf14c870df3ab19470bee0c29e1999521 (patch) | |
tree | 511475cb8f16f29d723c3fed946eb744895e2fed /doc/html/pcresyntax.html | |
parent | 840d6a79dec6e3b5c1324207043fb93dea810223 (diff) | |
download | pcre-3d0a81dbf14c870df3ab19470bee0c29e1999521.tar.gz |
Source tidies for 8.34-RC1.
git-svn-id: svn://vcs.exim.org/pcre/code/trunk@1404 2f5784b3-3f2a-0410-8824-cb99058d5e15
Diffstat (limited to 'doc/html/pcresyntax.html')
-rw-r--r-- | doc/html/pcresyntax.html | 26 |
1 files changed, 18 insertions, 8 deletions
diff --git a/doc/html/pcresyntax.html b/doc/html/pcresyntax.html
index b32e8b1..0764a33 100644
--- a/doc/html/pcresyntax.html
+++ b/doc/html/pcresyntax.html
@@ -65,10 +65,14 @@ documentation. This document contains a quick-reference summary of the syntax.
   \n         newline (hex 0A)
   \r         carriage return (hex 0D)
   \t         tab (hex 09)
+  \0dd       character with octal code 0dd
   \ddd       character with octal code ddd, or backreference
+  \o{ddd..}  character with octal code ddd..
   \xhh       character with hex code hh
   \x{hhh..}  character with hex code hhh..
-</PRE>
+</pre>
+Note that \0dd is always an octal code, and that \8 and \9 are the literal
+characters "8" and "9".
 </P>
 <br><a name="SEC4" href="#TOC1">CHARACTER TYPES</a><br>
 <P>
@@ -92,9 +96,11 @@ documentation. This document contains a quick-reference summary of the syntax.
   \W         a "non-word" character
   \X         a Unicode extended grapheme cluster
 </pre>
-In PCRE, by default, \d, \D, \s, \S, \w, and \W recognize only ASCII
-characters, even in a UTF mode. However, this can be changed by setting the
-PCRE_UCP option.
+By default, \d, \s, and \w match only ASCII characters, even in UTF-8 mode
+or in the 16- bit and 32-bit libraries. However, if locale-specific matching is
+happening, \s and \w may also match characters with code points in the range
+128-255. If the PCRE_UCP option is set, the behaviour of these escape sequences
+is changed to use Unicode properties and they match many more characters.
 </P>
 <br><a name="SEC5" href="#TOC1">GENERAL CATEGORY PROPERTIES FOR \p and \P</a><br>
 <P>
@@ -150,11 +156,13 @@ PCRE_UCP option.
 <pre>
   Xan        Alphanumeric: union of properties L and N
   Xps        POSIX space: property Z or tab, NL, VT, FF, CR
-  Xsp        Perl space: property Z or tab, NL, FF, CR
+  Xsp        Perl space: property Z or tab, NL, VT, FF, CR
   Xuc        Univerally-named character: one that can be
              represented by a Universal Character Name
   Xwd        Perl word: property Xan or underscore
-</PRE>
+</pre>
+Perl and POSIX space are now the same. Perl added VT to its space character set
+at release 5.18 and PCRE changed at release 8.34.
 </P>
 <br><a name="SEC7" href="#TOC1">SCRIPT NAMES FOR \p AND \P</a><br>
 <P>
@@ -385,7 +393,9 @@ newline-setting options with similar syntax:
   (*UTF32)        set UTF-32 mode: 32-bit library (PCRE_UTF32)
   (*UTF)          set appropriate UTF mode for the library in use
   (*UCP)          set PCRE_UCP (use Unicode properties for \d etc)
-</PRE>
+</pre>
+Note that LIMIT_MATCH and LIMIT_RECURSION can only reduce the value of the
+limits set by the caller of pcre_exec(), not increase them.
 </P>
 <br><a name="SEC17" href="#TOC1">LOOKAHEAD AND LOOKBEHIND ASSERTIONS</a><br>
 <P>
@@ -516,7 +526,7 @@ Cambridge CB2 3QH, England.
 </P>
 <br><a name="SEC27" href="#TOC1">REVISION</a><br>
 <P>
-Last updated: 26 April 2013
+Last updated: 12 November 2013
 <br>
 Copyright © 1997-2013 University of Cambridge.
 <br>
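
The Xsp change in the third hunk can be illustrated for the ASCII range. The sketch below is not PCRE code and does not call the PCRE library: the set names are hypothetical, and it models only the ASCII members listed in the Xps and Xsp table rows, assuming that Unicode property Z contributes just the space character (U+0020) below code point 128.

```python
# Hypothetical model of the ASCII members of Xps and Xsp; not PCRE source code.
# Assumption: within ASCII, Unicode property Z contains only U+0020 (space).
Z_ASCII = {" "}

# POSIX space: property Z or tab, NL, VT, FF, CR
XPS = Z_ASCII | {"\t", "\n", "\v", "\f", "\r"}

# Perl space as documented before this commit: no VT
XSP_BEFORE_8_34 = Z_ASCII | {"\t", "\n", "\f", "\r"}

# Perl space as documented after this commit: VT added
XSP_FROM_8_34 = Z_ASCII | {"\t", "\n", "\v", "\f", "\r"}

# The only ASCII character whose classification changes is VT (hex 0B).
changed = sorted(XSP_FROM_8_34 - XSP_BEFORE_8_34)
print([hex(ord(c)) for c in changed])  # prints ['0xb']
```

With VT included, the modelled Xsp and Xps sets are equal, which matches the added note that Perl and POSIX space are now the same.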