author ph10 <ph10@2f5784b3-3f2a-0410-8824-cb99058d5e15> 2010-05-18 15:47:01 +0000
committer ph10 <ph10@2f5784b3-3f2a-0410-8824-cb99058d5e15> 2010-05-18 15:47:01 +0000
commit 8f8b41c565c70ab99dd21a81f71a512958f867b5
tree bb68328d748e596f734ebd085e4b3f648452ab61 /doc/html/pcresyntax.html
parent 85b995f30cc9bf0bb04f5b3b3707a216a56b6bdf
Added PCRE_UCP and related stuff to make \w etc use Unicode properties.
git-svn-id: svn://vcs.exim.org/pcre/code/trunk@518 2f5784b3-3f2a-0410-8824-cb99058d5e15
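The four special category properties this commit documents (Xan, Xps, Xsp, Xwd) are defined in the diff purely in terms of Unicode general categories plus a few control characters. The following Python sketch restates those definitions as predicates so the differences between them are easy to check. It is an illustration only, not PCRE's implementation: the function names are hypothetical, it uses Python's `unicodedata` tables rather than PCRE's own, and it assumes "NL" in the table means linefeed (U+000A).

```python
import unicodedata

# Control characters the table lists in addition to property Z.
# Assumption: "NL" is linefeed (U+000A).
POSIX_SPACE_EXTRAS = {"\t", "\n", "\v", "\f", "\r"}  # tab, NL, VT, FF, CR
PERL_SPACE_EXTRAS = {"\t", "\n", "\f", "\r"}         # tab, NL, FF, CR (no VT)


def major_category(ch):
    """Return the first letter of the general category, e.g. 'L' for 'Lu'."""
    return unicodedata.category(ch)[0]


def is_xan(ch):
    """Xan: alphanumeric, the union of properties L and N."""
    return major_category(ch) in ("L", "N")


def is_xps(ch):
    """Xps: POSIX space, property Z or tab, NL, VT, FF, CR."""
    return major_category(ch) == "Z" or ch in POSIX_SPACE_EXTRAS


def is_xsp(ch):
    """Xsp: Perl space, property Z or tab, NL, FF, CR."""
    return major_category(ch) == "Z" or ch in PERL_SPACE_EXTRAS


def is_xwd(ch):
    """Xwd: Perl word, property Xan or underscore."""
    return is_xan(ch) or ch == "_"
```

Under these definitions the only difference between Xps and Xsp is the vertical tab, and the only difference between Xan and Xwd is the underscore.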
Diffstat (limited to 'doc/html/pcresyntax.html')
-rw-r--r-- doc/html/pcresyntax.html 101
1 files changed, 56 insertions, 45 deletions
diff --git a/doc/html/pcresyntax.html b/doc/html/pcresyntax.html
index 1a2749f..ad4399d 100644
--- a/doc/html/pcresyntax.html
+++ b/doc/html/pcresyntax.html
@@ -17,28 +17,29 @@ man page, in case the conversion went wrong.
<li><a name="TOC2" href="#SEC2">QUOTING</a>
<li><a name="TOC3" href="#SEC3">CHARACTERS</a>
<li><a name="TOC4" href="#SEC4">CHARACTER TYPES</a>
-<li><a name="TOC5" href="#SEC5">GENERAL CATEGORY PROPERTY CODES FOR \p and \P</a>
-<li><a name="TOC6" href="#SEC6">SCRIPT NAMES FOR \p AND \P</a>
-<li><a name="TOC7" href="#SEC7">CHARACTER CLASSES</a>
-<li><a name="TOC8" href="#SEC8">QUANTIFIERS</a>
-<li><a name="TOC9" href="#SEC9">ANCHORS AND SIMPLE ASSERTIONS</a>
-<li><a name="TOC10" href="#SEC10">MATCH POINT RESET</a>
-<li><a name="TOC11" href="#SEC11">ALTERNATION</a>
-<li><a name="TOC12" href="#SEC12">CAPTURING</a>
-<li><a name="TOC13" href="#SEC13">ATOMIC GROUPS</a>
-<li><a name="TOC14" href="#SEC14">COMMENT</a>
-<li><a name="TOC15" href="#SEC15">OPTION SETTING</a>
-<li><a name="TOC16" href="#SEC16">LOOKAHEAD AND LOOKBEHIND ASSERTIONS</a>
-<li><a name="TOC17" href="#SEC17">BACKREFERENCES</a>
-<li><a name="TOC18" href="#SEC18">SUBROUTINE REFERENCES (POSSIBLY RECURSIVE)</a>
-<li><a name="TOC19" href="#SEC19">CONDITIONAL PATTERNS</a>
-<li><a name="TOC20" href="#SEC20">BACKTRACKING CONTROL</a>
-<li><a name="TOC21" href="#SEC21">NEWLINE CONVENTIONS</a>
-<li><a name="TOC22" href="#SEC22">WHAT \R MATCHES</a>
-<li><a name="TOC23" href="#SEC23">CALLOUTS</a>
-<li><a name="TOC24" href="#SEC24">SEE ALSO</a>
-<li><a name="TOC25" href="#SEC25">AUTHOR</a>
-<li><a name="TOC26" href="#SEC26">REVISION</a>
+<li><a name="TOC5" href="#SEC5">GENERAL CATEGORY PROPERTIES FOR \p and \P</a>
+<li><a name="TOC6" href="#SEC6">PCRE SPECIAL CATEGORY PROPERTIES FOR \p and \P</a>
+<li><a name="TOC7" href="#SEC7">SCRIPT NAMES FOR \p AND \P</a>
+<li><a name="TOC8" href="#SEC8">CHARACTER CLASSES</a>
+<li><a name="TOC9" href="#SEC9">QUANTIFIERS</a>
+<li><a name="TOC10" href="#SEC10">ANCHORS AND SIMPLE ASSERTIONS</a>
+<li><a name="TOC11" href="#SEC11">MATCH POINT RESET</a>
+<li><a name="TOC12" href="#SEC12">ALTERNATION</a>
+<li><a name="TOC13" href="#SEC13">CAPTURING</a>
+<li><a name="TOC14" href="#SEC14">ATOMIC GROUPS</a>
+<li><a name="TOC15" href="#SEC15">COMMENT</a>
+<li><a name="TOC16" href="#SEC16">OPTION SETTING</a>
+<li><a name="TOC17" href="#SEC17">LOOKAHEAD AND LOOKBEHIND ASSERTIONS</a>
+<li><a name="TOC18" href="#SEC18">BACKREFERENCES</a>
+<li><a name="TOC19" href="#SEC19">SUBROUTINE REFERENCES (POSSIBLY RECURSIVE)</a>
+<li><a name="TOC20" href="#SEC20">CONDITIONAL PATTERNS</a>
+<li><a name="TOC21" href="#SEC21">BACKTRACKING CONTROL</a>
+<li><a name="TOC22" href="#SEC22">NEWLINE CONVENTIONS</a>
+<li><a name="TOC23" href="#SEC23">WHAT \R MATCHES</a>
+<li><a name="TOC24" href="#SEC24">CALLOUTS</a>
+<li><a name="TOC25" href="#SEC25">SEE ALSO</a>
+<li><a name="TOC26" href="#SEC26">AUTHOR</a>
+<li><a name="TOC27" href="#SEC27">REVISION</a>
</ul>
<br><a name="SEC1" href="#TOC1">PCRE REGULAR EXPRESSION SYNTAX SUMMARY</a><br>
<P>
@@ -80,6 +81,7 @@ syntax.
\D a character that is not a decimal digit
\h a horizontal whitespace character
\H a character that is not a horizontal whitespace character
+ \N a character that is not a newline
\p{<i>xx</i>} a character with the <i>xx</i> property
\P{<i>xx</i>} a character without the <i>xx</i> property
\R a newline sequence
@@ -93,7 +95,7 @@ syntax.
</pre>
In PCRE, \d, \D, \s, \S, \w, and \W recognize only ASCII characters.
</P>
-<br><a name="SEC5" href="#TOC1">GENERAL CATEGORY PROPERTY CODES FOR \p and \P</a><br>
+<br><a name="SEC5" href="#TOC1">GENERAL CATEGORY PROPERTIES FOR \p and \P</a><br>
<P>
<pre>
C Other
@@ -142,7 +144,16 @@ In PCRE, \d, \D, \s, \S, \w, and \W recognize only ASCII characters.
Zs Space separator
</PRE>
</P>
-<br><a name="SEC6" href="#TOC1">SCRIPT NAMES FOR \p AND \P</a><br>
+<br><a name="SEC6" href="#TOC1">PCRE SPECIAL CATEGORY PROPERTIES FOR \p and \P</a><br>
+<P>
+<pre>
+ Xan Alphanumeric: union of properties L and N
+ Xps POSIX space: property Z or tab, NL, VT, FF, CR
+ Xsp Perl space: property Z or tab, NL, FF, CR
+ Xwd Perl word: property Xan or underscore
+</PRE>
+</P>
+<br><a name="SEC7" href="#TOC1">SCRIPT NAMES FOR \p AND \P</a><br>
<P>
Arabic,
Armenian,
@@ -237,7 +248,7 @@ Ugaritic,
Vai,
Yi.
</P>
-<br><a name="SEC7" href="#TOC1">CHARACTER CLASSES</a><br>
+<br><a name="SEC8" href="#TOC1">CHARACTER CLASSES</a><br>
<P>
<pre>
[...] positive character class
@@ -264,7 +275,7 @@ Yi.
In PCRE, POSIX character set names recognize only ASCII characters. You can use
\Q...\E inside a character class.
</P>
-<br><a name="SEC8" href="#TOC1">QUANTIFIERS</a><br>
+<br><a name="SEC9" href="#TOC1">QUANTIFIERS</a><br>
<P>
<pre>
? 0 or 1, greedy
@@ -285,7 +296,7 @@ In PCRE, POSIX character set names recognize only ASCII characters. You can use
{n,}? n or more, lazy
</PRE>
</P>
-<br><a name="SEC9" href="#TOC1">ANCHORS AND SIMPLE ASSERTIONS</a><br>
+<br><a name="SEC10" href="#TOC1">ANCHORS AND SIMPLE ASSERTIONS</a><br>
<P>
<pre>
\b word boundary (only ASCII letters recognized)
@@ -302,19 +313,19 @@ In PCRE, POSIX character set names recognize only ASCII characters. You can use
\G first matching position in subject
</PRE>
</P>
-<br><a name="SEC10" href="#TOC1">MATCH POINT RESET</a><br>
+<br><a name="SEC11" href="#TOC1">MATCH POINT RESET</a><br>
<P>
<pre>
\K reset start of match
</PRE>
</P>
-<br><a name="SEC11" href="#TOC1">ALTERNATION</a><br>
+<br><a name="SEC12" href="#TOC1">ALTERNATION</a><br>
<P>
<pre>
expr|expr|expr...
</PRE>
</P>
-<br><a name="SEC12" href="#TOC1">CAPTURING</a><br>
+<br><a name="SEC13" href="#TOC1">CAPTURING</a><br>
<P>
<pre>
(...) capturing group
@@ -326,19 +337,19 @@ In PCRE, POSIX character set names recognize only ASCII characters. You can use
capturing groups in each alternative
</PRE>
</P>
-<br><a name="SEC13" href="#TOC1">ATOMIC GROUPS</a><br>
+<br><a name="SEC14" href="#TOC1">ATOMIC GROUPS</a><br>
<P>
<pre>
(?&#62;...) atomic, non-capturing group
</PRE>
</P>
-<br><a name="SEC14" href="#TOC1">COMMENT</a><br>
+<br><a name="SEC15" href="#TOC1">COMMENT</a><br>
<P>
<pre>
(?#....) comment (not nestable)
</PRE>
</P>
-<br><a name="SEC15" href="#TOC1">OPTION SETTING</a><br>
+<br><a name="SEC16" href="#TOC1">OPTION SETTING</a><br>
<P>
<pre>
(?i) caseless
@@ -355,7 +366,7 @@ newline-setting options with similar syntax:
(*UTF8) set UTF-8 mode
</PRE>
</P>
-<br><a name="SEC16" href="#TOC1">LOOKAHEAD AND LOOKBEHIND ASSERTIONS</a><br>
+<br><a name="SEC17" href="#TOC1">LOOKAHEAD AND LOOKBEHIND ASSERTIONS</a><br>
<P>
<pre>
(?=...) positive look ahead
@@ -365,7 +376,7 @@ newline-setting options with similar syntax:
</pre>
Each top-level branch of a look behind must be of a fixed length.
</P>
-<br><a name="SEC17" href="#TOC1">BACKREFERENCES</a><br>
+<br><a name="SEC18" href="#TOC1">BACKREFERENCES</a><br>
<P>
<pre>
\n reference by number (can be ambiguous)
@@ -379,7 +390,7 @@ Each top-level branch of a look behind must be of a fixed length.
(?P=name) reference by name (Python)
</PRE>
</P>
-<br><a name="SEC18" href="#TOC1">SUBROUTINE REFERENCES (POSSIBLY RECURSIVE)</a><br>
+<br><a name="SEC19" href="#TOC1">SUBROUTINE REFERENCES (POSSIBLY RECURSIVE)</a><br>
<P>
<pre>
(?R) recurse whole pattern
@@ -398,7 +409,7 @@ Each top-level branch of a look behind must be of a fixed length.
\g'-n' call subpattern by relative number (PCRE extension)
</PRE>
</P>
-<br><a name="SEC19" href="#TOC1">CONDITIONAL PATTERNS</a><br>
+<br><a name="SEC20" href="#TOC1">CONDITIONAL PATTERNS</a><br>
<P>
<pre>
(?(condition)yes-pattern)
@@ -417,7 +428,7 @@ Each top-level branch of a look behind must be of a fixed length.
(?(assert)... assertion condition
</PRE>
</P>
-<br><a name="SEC20" href="#TOC1">BACKTRACKING CONTROL</a><br>
+<br><a name="SEC21" href="#TOC1">BACKTRACKING CONTROL</a><br>
<P>
The following act immediately they are reached:
<pre>
@@ -435,7 +446,7 @@ pattern is not anchored.
(*THEN) local failure, backtrack to next alternation
</PRE>
</P>
-<br><a name="SEC21" href="#TOC1">NEWLINE CONVENTIONS</a><br>
+<br><a name="SEC22" href="#TOC1">NEWLINE CONVENTIONS</a><br>
<P>
These are recognized only at the very start of the pattern or after a
(*BSR_...) or (*UTF8) option.
@@ -447,7 +458,7 @@ These are recognized only at the very start of the pattern or after a
(*ANY) any Unicode newline sequence
</PRE>
</P>
-<br><a name="SEC22" href="#TOC1">WHAT \R MATCHES</a><br>
+<br><a name="SEC23" href="#TOC1">WHAT \R MATCHES</a><br>
<P>
These are recognized only at the very start of the pattern or after a
(*...) option that sets the newline convention or UTF-8 mode.
@@ -456,19 +467,19 @@ These are recognized only at the very start of the pattern or after a
(*BSR_UNICODE) any Unicode newline sequence
</PRE>
</P>
-<br><a name="SEC23" href="#TOC1">CALLOUTS</a><br>
+<br><a name="SEC24" href="#TOC1">CALLOUTS</a><br>
<P>
<pre>
(?C) callout
(?Cn) callout with data n
</PRE>
</P>
-<br><a name="SEC24" href="#TOC1">SEE ALSO</a><br>
+<br><a name="SEC25" href="#TOC1">SEE ALSO</a><br>
<P>
<b>pcrepattern</b>(3), <b>pcreapi</b>(3), <b>pcrecallout</b>(3),
<b>pcrematching</b>(3), <b>pcre</b>(3).
</P>
-<br><a name="SEC25" href="#TOC1">AUTHOR</a><br>
+<br><a name="SEC26" href="#TOC1">AUTHOR</a><br>
<P>
Philip Hazel
<br>
@@ -477,9 +488,9 @@ University Computing Service
Cambridge CB2 3QH, England.
<br>
</P>
-<br><a name="SEC26" href="#TOC1">REVISION</a><br>
+<br><a name="SEC27" href="#TOC1">REVISION</a><br>
<P>
-Last updated: 01 March 2010
+Last updated: 05 May 2010
<br>
Copyright &copy; 1997-2010 University of Cambridge.
<br>