author ph10 <ph10@2f5784b3-3f2a-0410-8824-cb99058d5e15> 2010-05-05 10:44:20 +0000
committer ph10 <ph10@2f5784b3-3f2a-0410-8824-cb99058d5e15> 2010-05-05 10:44:20 +0000
commit 85b995f30cc9bf0bb04f5b3b3707a216a56b6bdf
tree 6410c80d7502e73c04d5eb0fa4f8ae885e6e3449 /doc
parent 2bcdcbf324bea8939d73f9b32e3625539a4d209e
Add new special properties Xan, Xps, Xsp, Xwd to help with \w etc.
git-svn-id: svn://vcs.exim.org/pcre/code/trunk@517 2f5784b3-3f2a-0410-8824-cb99058d5e15
Diffstat (limited to 'doc')
-rw-r--r-- doc/pcrepattern.3 | 45
-rw-r--r-- doc/pcresyntax.3 | 14
2 files changed, 48 insertions, 11 deletions
diff --git a/doc/pcrepattern.3 b/doc/pcrepattern.3
index 083ae56..4552c59 100644
--- a/doc/pcrepattern.3
+++ b/doc/pcrepattern.3
@@ -505,10 +505,16 @@ The extra escape sequences are:
\eX an extended Unicode sequence
.sp
The property names represented by \fIxx\fP above are limited to the Unicode
-script names, the general category properties, and "Any", which matches any
-character (including newline). Other properties such as "InMusicalSymbols" are
-not currently supported by PCRE. Note that \eP{Any} does not match any
-characters, so always causes a match failure.
+script names, the general category properties, "Any", which matches any
+character (including newline), and some special PCRE properties (described
+in the
+.\" HTML <a href="#extraprops">
+.\" </a>
+next section).
+.\"
+Other Perl properties such as "InMusicalSymbols" are not currently supported by
+PCRE. Note that \eP{Any} does not match any characters, so always causes a
+match failure.
.P
Sets of Unicode characters are defined as belonging to certain scripts. A
character from one of these sets can be matched using a script name. For
@@ -613,10 +619,10 @@ Ugaritic,
Vai,
Yi.
.P
-Each character has exactly one general category property, specified by a
-two-letter abbreviation. For compatibility with Perl, negation can be specified
-by including a circumflex between the opening brace and the property name. For
-example, \ep{^Lu} is the same as \eP{Lu}.
+Each character has exactly one Unicode general category property, specified by
+a two-letter abbreviation. For compatibility with Perl, negation can be
+specified by including a circumflex between the opening brace and the property
+name. For example, \ep{^Lu} is the same as \eP{Lu}.
.P
If only one letter is specified with \ep or \eP, it includes all the general
category properties that start with that letter. In this case, in the absence
@@ -718,6 +724,27 @@ why the traditional escape sequences such as \ed and \ew do not use Unicode
properties in PCRE.
.
.
+.\" HTML <a name="extraprops"></a>
+.SS PCRE's additional properties
+.rs
+.sp
+As well as the standard Unicode properties described in the previous
+section, PCRE supports four more that make it possible to convert traditional
+escape sequences such as \ew and \es and POSIX character classes to use Unicode
+properties. These are:
+.sp
+ Xan Any alphanumeric character
+ Xps Any POSIX space character
+ Xsp Any Perl space character
+ Xwd Any Perl "word" character
+.sp
+Xan matches characters that have either the L (letter) or the N (number)
+property. Xps matches the characters tab, linefeed, vertical tab, formfeed, or
+carriage return, and any other character that has the Z (separator) property.
+Xsp is the same as Xps, except that vertical tab is excluded. Xwd matches the
+same characters as Xan, plus underscore.
+.
+.
.\" HTML <a name="resetmatchstart"></a>
.SS "Resetting the match start"
.rs
@@ -2597,6 +2624,6 @@ Cambridge CB2 3QH, England.
.rs
.sp
.nf
-Last updated: 03 May 2010
+Last updated: 05 May 2010
Copyright (c) 1997-2010 University of Cambridge.
.fi
diff --git a/doc/pcresyntax.3 b/doc/pcresyntax.3
index da32b61..9de639e 100644
--- a/doc/pcresyntax.3
+++ b/doc/pcresyntax.3
@@ -45,6 +45,7 @@ syntax.
\eD a character that is not a decimal digit
\eh a horizontal whitespace character
\eH a character that is not a horizontal whitespace character
+ \eN a character that is not a newline
\ep{\fIxx\fP} a character with the \fIxx\fP property
\eP{\fIxx\fP} a character without the \fIxx\fP property
\eR a newline sequence
@@ -59,7 +60,7 @@ syntax.
In PCRE, \ed, \eD, \es, \eS, \ew, and \eW recognize only ASCII characters.
.
.
-.SH "GENERAL CATEGORY PROPERTY CODES FOR \ep and \eP"
+.SH "GENERAL CATEGORY PROPERTIES FOR \ep and \eP"
.rs
.sp
C Other
@@ -108,6 +109,15 @@ In PCRE, \ed, \eD, \es, \eS, \ew, and \eW recognize only ASCII characters.
Zs Space separator
.
.
+.SH "PCRE SPECIAL CATEGORY PROPERTIES FOR \ep and \eP"
+.rs
+.sp
+ Xan Alphanumeric: union of properties L and N
+ Xps POSIX space: property Z or tab, NL, VT, FF, CR
+ Xsp Perl space: property Z or tab, NL, FF, CR
+ Xwd Perl word: property Xan or underscore
+.
+.
.SH "SCRIPT NAMES FOR \ep AND \eP"
.rs
.sp
@@ -459,6 +469,6 @@ Cambridge CB2 3QH, England.
.rs
.sp
.nf
-Last updated: 01 March 2010
+Last updated: 05 May 2010
Copyright (c) 1997-2010 University of Cambridge.
.fi
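
The four properties this patch documents (Xan, Xps, Xsp, Xwd) are defined in terms of Unicode general categories plus a few explicit characters. The following is a minimal Python sketch of those definitions as worded in the patch, not PCRE's actual implementation. The helper names (is_xan, is_xps, is_xsp, is_xwd) are hypothetical, and the sketch assumes Python's unicodedata tables, which may correspond to a different Unicode version than the one PCRE was built with.

```python
import unicodedata

# Control characters that Xps accepts in addition to the Z (separator)
# categories: tab, linefeed, vertical tab, formfeed, carriage return.
POSIX_SPACE_CONTROLS = {"\t", "\n", "\v", "\f", "\r"}


def is_xan(ch: str) -> bool:
    """Xan: the general category starts with L (letter) or N (number)."""
    return unicodedata.category(ch)[0] in ("L", "N")


def is_xps(ch: str) -> bool:
    """Xps: one of the five listed control characters, or any Z category."""
    return ch in POSIX_SPACE_CONTROLS or unicodedata.category(ch).startswith("Z")


def is_xsp(ch: str) -> bool:
    """Xsp: the same as Xps, except that vertical tab is excluded."""
    return ch != "\v" and is_xps(ch)


def is_xwd(ch: str) -> bool:
    """Xwd: the same characters as Xan, plus underscore."""
    return ch == "_" or is_xan(ch)
```

Each helper expects a single-character string. Under these definitions a vertical tab satisfies is_xps but not is_xsp, and an underscore satisfies is_xwd but not is_xan.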