author | ph10 <ph10@2f5784b3-3f2a-0410-8824-cb99058d5e15> | 2010-05-05 10:44:20 +0000 |
---|---|---|
committer | ph10 <ph10@2f5784b3-3f2a-0410-8824-cb99058d5e15> | 2010-05-05 10:44:20 +0000 |
commit | 85b995f30cc9bf0bb04f5b3b3707a216a56b6bdf (patch) | |
tree | 6410c80d7502e73c04d5eb0fa4f8ae885e6e3449 /doc | |
parent | 2bcdcbf324bea8939d73f9b32e3625539a4d209e (diff) | |
download | pcre-85b995f30cc9bf0bb04f5b3b3707a216a56b6bdf.tar.gz |
Add new special properties Xan, Xps, Xsp, Xwd to help with \w etc.
git-svn-id: svn://vcs.exim.org/pcre/code/trunk@517 2f5784b3-3f2a-0410-8824-cb99058d5e15
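The commit message names four new properties, and the diff below defines them in both pcrepattern.3 and pcresyntax.3. As a rough illustration of those definitions, the following Python sketch models Xan, Xps, Xsp, and Xwd as predicates over Unicode general categories, using the standard `unicodedata` module.

This is not PCRE's implementation. The function names (`is_xan`, `is_xps`, `is_xsp`, `is_xwd`, `major_category`) are hypothetical. Python's Unicode tables are also likely newer than the Unicode version PCRE shipped in 2010, so a few code points may be categorized differently.

```python
import unicodedata

# Hypothetical predicates modelling the documented definitions of PCRE's
# special properties; they are not taken from PCRE's source code.

# Xps adds these control characters to the Z (separator) property:
# tab, linefeed, vertical tab, formfeed, carriage return.
POSIX_SPACE_CONTROLS = frozenset("\t\n\x0b\x0c\r")
# Xsp is the same as Xps except that vertical tab is excluded.
PERL_SPACE_CONTROLS = POSIX_SPACE_CONTROLS - {"\x0b"}


def major_category(ch: str) -> str:
    """First letter of the general category, e.g. L for Lu."""
    return unicodedata.category(ch)[0]


def is_xan(ch: str) -> bool:
    # Alphanumeric: the character has the L (letter) or N (number) property.
    return major_category(ch) in ("L", "N")


def is_xps(ch: str) -> bool:
    # POSIX space: property Z, or one of tab, NL, VT, FF, CR.
    return major_category(ch) == "Z" or ch in POSIX_SPACE_CONTROLS


def is_xsp(ch: str) -> bool:
    # Perl space: like Xps, but without vertical tab.
    return major_category(ch) == "Z" or ch in PERL_SPACE_CONTROLS


def is_xwd(ch: str) -> bool:
    # Perl word: anything Xan matches, plus underscore.
    return is_xan(ch) or ch == "_"


if __name__ == "__main__":
    # a, e-acute, Arabic-Indic digit three, underscore, VT, no-break space, hyphen
    for ch in ["a", "\u00e9", "\u0663", "_", "\x0b", "\u00a0", "-"]:
        print(f"U+{ord(ch):04X}", is_xan(ch), is_xps(ch), is_xsp(ch), is_xwd(ch))
```

In this model, vertical tab (U+000B) satisfies `is_xps` but not `is_xsp`. The no-break space U+00A0 (category Zs) satisfies both. That differs from the ASCII-only \s that pcresyntax.3 describes.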
Diffstat (limited to 'doc')
-rw-r--r-- | doc/pcrepattern.3 | 45 |
-rw-r--r-- | doc/pcresyntax.3 | 14 |
2 files changed, 48 insertions, 11 deletions
diff --git a/doc/pcrepattern.3 b/doc/pcrepattern.3
index 083ae56..4552c59 100644
--- a/doc/pcrepattern.3
+++ b/doc/pcrepattern.3
@@ -505,10 +505,16 @@ The extra escape sequences are:
   \eX       an extended Unicode sequence
 .sp
 The property names represented by \fIxx\fP above are limited to the Unicode
-script names, the general category properties, and "Any", which matches any
-character (including newline). Other properties such as "InMusicalSymbols" are
-not currently supported by PCRE. Note that \eP{Any} does not match any
-characters, so always causes a match failure.
+script names, the general category properties, "Any", which matches any
+character (including newline), and some special PCRE properties (described
+in the
+.\" HTML <a href="#extraprops">
+.\" </a>
+next section).
+.\"
+Other Perl properties such as "InMusicalSymbols" are not currently supported by
+PCRE. Note that \eP{Any} does not match any characters, so always causes a
+match failure.
 .P
 Sets of Unicode characters are defined as belonging to certain scripts. A
 character from one of these sets can be matched using a script name. For
@@ -613,10 +619,10 @@ Ugaritic,
 Vai,
 Yi.
 .P
-Each character has exactly one general category property, specified by a
-two-letter abbreviation. For compatibility with Perl, negation can be specified
-by including a circumflex between the opening brace and the property name. For
-example, \ep{^Lu} is the same as \eP{Lu}.
+Each character has exactly one Unicode general category property, specified by
+a two-letter abbreviation. For compatibility with Perl, negation can be
+specified by including a circumflex between the opening brace and the property
+name. For example, \ep{^Lu} is the same as \eP{Lu}.
 .P
 If only one letter is specified with \ep or \eP, it includes all the general
 category properties that start with that letter. In this case, in the absence
@@ -718,6 +724,27 @@ why the traditional escape sequences such as \ed and \ew do not use Unicode
 properties in PCRE.
 .
 .
+.\" HTML <a name="extraprops"></a>
+.SS PCRE's additional properties
+.rs
+.sp
+As well as the standard Unicode properties described in the previous
+section, PCRE supports four more that make it possible to convert traditional
+escape sequences such as \ew and \es and POSIX character classes to use Unicode
+properties. These are:
+.sp
+  Xan   Any alphanumeric character
+  Xps   Any POSIX space character
+  Xsp   Any Perl space character
+  Xwd   Any Perl "word" character
+.sp
+Xan matches characters that have either the L (letter) or the N (number)
+property. Xps matches the characters tab, linefeed, vertical tab, formfeed, or
+carriage return, and any other character that has the Z (separator) property.
+Xsp is the same as Xps, except that vertical tab is excluded. Xwd matches the
+same characters as Xan, plus underscore.
+.
+.
 .\" HTML <a name="resetmatchstart"></a>
 .SS "Resetting the match start"
 .rs
@@ -2597,6 +2624,6 @@ Cambridge CB2 3QH, England.
 .rs
 .sp
 .nf
-Last updated: 03 May 2010
+Last updated: 05 May 2010
 Copyright (c) 1997-2010 University of Cambridge.
 .fi
diff --git a/doc/pcresyntax.3 b/doc/pcresyntax.3
index da32b61..9de639e 100644
--- a/doc/pcresyntax.3
+++ b/doc/pcresyntax.3
@@ -45,6 +45,7 @@ syntax.
   \eD         a character that is not a decimal digit
   \eh         a horizontal whitespace character
   \eH         a character that is not a horizontal whitespace character
+  \eN         a character that is not a newline
   \ep{\fIxx\fP}   a character with the \fIxx\fP property
   \eP{\fIxx\fP}   a character without the \fIxx\fP property
   \eR         a newline sequence
@@ -59,7 +60,7 @@ syntax.
 In PCRE, \ed, \eD, \es, \eS, \ew, and \eW recognize only ASCII characters.
 .
 .
-.SH "GENERAL CATEGORY PROPERTY CODES FOR \ep and \eP"
+.SH "GENERAL CATEGORY PROPERTIES FOR \ep and \eP"
 .rs
 .sp
   C          Other
@@ -108,6 +109,15 @@ In PCRE, \ed, \eD, \es, \eS, \ew, and \eW recognize only ASCII characters.
   Zs         Space separator
 .
 .
+.SH "PCRE SPECIAL CATEGORY PROPERTIES FOR \ep and \eP"
+.rs
+.sp
+  Xan        Alphanumeric: union of properties L and N
+  Xps        POSIX space: property Z or tab, NL, VT, FF, CR
+  Xsp        Perl space: property Z or tab, NL, FF, CR
+  Xwd        Perl word: property Xan or underscore
+.
+.
 .SH "SCRIPT NAMES FOR \ep AND \eP"
 .rs
 .sp
@@ -459,6 +469,6 @@ Cambridge CB2 3QH, England.
 .rs
 .sp
 .nf
-Last updated: 01 March 2010
+Last updated: 05 May 2010
 Copyright (c) 1997-2010 University of Cambridge.
 .fi
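The pcrepattern.3 text shown in this patch states three rules for \p and \P:

- A circumflex after the opening brace negates the property, so \p{^Lu} is the same as \P{Lu}.
- A single letter covers every general category that starts with that letter.
- "Any" matches every character, so \P{Any} never matches.

The hypothetical `has_property` and `matches` functions below sketch those rules in Python on top of `unicodedata.category`. The escape parsing is deliberately minimal: it handles only the braced form. It is an assumption-laden model, not how PCRE compiles or evaluates patterns.

```python
import unicodedata


def has_property(ch: str, name: str) -> bool:
    """Model a positive property test for a category name or Any (hypothetical helper)."""
    if name == "Any":
        # "Any" matches every character, including newline.
        return True
    category = unicodedata.category(ch)  # two-letter code such as Lu
    if len(name) == 1:
        # A single letter covers every general category that starts with it.
        return category[0] == name
    return category == name


def matches(ch: str, escape: str) -> bool:
    """Evaluate a braced escape such as \\p{Lu}, \\P{L} or \\p{^Lu} (braced form only)."""
    negate = escape.startswith("\\P")
    body = escape[3:-1]  # the text between the curly brackets
    if body.startswith("^"):
        # A circumflex after the opening brace negates the property, as in Perl.
        negate = not negate
        body = body[1:]
    return has_property(ch, body) != negate


if __name__ == "__main__":
    for ch in ["A", "a", "1", "\n"]:
        print(repr(ch), matches(ch, r"\p{Lu}"), matches(ch, r"\p{^Lu}"),
              matches(ch, r"\p{L}"), matches(ch, r"\P{Any}"))
```

Keeping negation as a single boolean flag makes \p{^Lu} and \P{Lu} collapse to the same test. That mirrors the equivalence the documentation states.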