author ph10 <ph10@2f5784b3-3f2a-0410-8824-cb99058d5e15> 2008-04-28 15:10:02 +0000
committer ph10 <ph10@2f5784b3-3f2a-0410-8824-cb99058d5e15> 2008-04-28 15:10:02 +0000
commit 5866158e01cc19c2a8fff7fffa61de5376a938d0 (patch)
tree 7759638de83997a18a99299a741082b8e5b32477 /doc/html/pcrepattern.html
parent ccea1b4ed51d39d72efa77127d0ebbc10c1ea7fe (diff)
Tidies for the 7.7-RC1 distribution.
git-svn-id: svn://vcs.exim.org/pcre/code/trunk@345 2f5784b3-3f2a-0410-8824-cb99058d5e15
Diffstat (limited to 'doc/html/pcrepattern.html')
-rw-r--r-- doc/html/pcrepattern.html 75
1 files changed, 58 insertions, 17 deletions
diff --git a/doc/html/pcrepattern.html b/doc/html/pcrepattern.html
index 237816f..9cc055c 100644
--- a/doc/html/pcrepattern.html
+++ b/doc/html/pcrepattern.html
@@ -35,18 +35,25 @@ man page, in case the conversion went wrong.
<li><a name="TOC20" href="#SEC20">COMMENTS</a>
<li><a name="TOC21" href="#SEC21">RECURSIVE PATTERNS</a>
<li><a name="TOC22" href="#SEC22">SUBPATTERNS AS SUBROUTINES</a>
-<li><a name="TOC23" href="#SEC23">CALLOUTS</a>
-<li><a name="TOC24" href="#SEC24">BACKTRACKING CONTROL</a>
-<li><a name="TOC25" href="#SEC25">SEE ALSO</a>
-<li><a name="TOC26" href="#SEC26">AUTHOR</a>
-<li><a name="TOC27" href="#SEC27">REVISION</a>
+<li><a name="TOC23" href="#SEC23">ONIGURUMA SUBROUTINE SYNTAX</a>
+<li><a name="TOC24" href="#SEC24">CALLOUTS</a>
+<li><a name="TOC25" href="#SEC25">BACKTRACKING CONTROL</a>
+<li><a name="TOC26" href="#SEC26">SEE ALSO</a>
+<li><a name="TOC27" href="#SEC27">AUTHOR</a>
+<li><a name="TOC28" href="#SEC28">REVISION</a>
</ul>
<br><a name="SEC1" href="#TOC1">PCRE REGULAR EXPRESSION DETAILS</a><br>
<P>
The syntax and semantics of the regular expressions that are supported by PCRE
are described in detail below. There is a quick-reference syntax summary in the
<a href="pcresyntax.html"><b>pcresyntax</b></a>
-page. Perl's regular expressions are described in its own documentation, and
+page. PCRE tries to match Perl syntax and semantics as closely as it can. PCRE
+also supports some alternative regular expression syntax (which does not
+conflict with the Perl syntax) in order to provide some compatibility with
+regular expressions in Python, .NET, and Oniguruma.
+</P>
+<P>
+Perl's regular expressions are described in its own documentation, and
regular expressions in general are covered in a number of books, some of which
have copious examples. Jeffrey Friedl's "Mastering Regular Expressions",
published by O'Reilly, covers regular expressions in great detail. This
@@ -312,6 +319,17 @@ following the discussion of
<a href="#subpattern">parenthesized subpatterns.</a>
</P>
<br><b>
+Absolute and relative subroutine calls
+</b><br>
+<P>
+For compatibility with Oniguruma, the non-Perl syntax \g followed by a name or
+a number enclosed either in angle brackets or single quotes, is an alternative
+syntax for referencing a subpattern as a "subroutine". Details are discussed
+<a href="#onigurumasubroutines">later.</a>
+Note that \g{...} (Perl syntax) and \g&#60;...&#62; (Oniguruma syntax) are <i>not</i>
+synonymous. The former is a back reference; the latter is a subroutine call.
+</P>
+<br><b>
Generic character types
</b><br>
<P>
@@ -1231,7 +1249,11 @@ which may be several bytes long (and they may be of different lengths).
</P>
<P>
The quantifier {0} is permitted, causing the expression to behave as if the
-previous item and the quantifier were not present.
+previous item and the quantifier were not present. This may be useful for
+subpatterns that are referenced as
+<a href="#subpatternsassubroutines">subroutines</a>
+from elsewhere in the pattern. Items other than subpatterns that have a {0}
+quantifier are omitted from the compiled pattern.
</P>
<P>
For convenience, the three most common quantifiers have single-character
@@ -2031,8 +2053,26 @@ changed for different calls. For example, consider this pattern:
</pre>
It matches "abcabc". It does not match "abcABC" because the change of
processing option does not affect the called subpattern.
+<a name="onigurumasubroutines"></a></P>
+<br><a name="SEC23" href="#TOC1">ONIGURUMA SUBROUTINE SYNTAX</a><br>
+<P>
+For compatibility with Oniguruma, the non-Perl syntax \g followed by a name or
+a number enclosed either in angle brackets or single quotes, is an alternative
+syntax for referencing a subpattern as a subroutine, possibly recursively. Here
+are two of the examples used above, rewritten using this syntax:
+<pre>
+ (?&#60;pn&#62; \( ( (?&#62;[^()]+) | \g&#60;pn&#62; )* \) )
+ (sens|respons)e and \g'1'ibility
+</pre>
+PCRE supports an extension to Oniguruma: if a number is preceded by a
+plus or a minus sign it is taken as a relative reference. For example:
+<pre>
+ (abc)(?i:\g&#60;-1&#62;)
+</pre>
+Note that \g{...} (Perl syntax) and \g&#60;...&#62; (Oniguruma syntax) are <i>not</i>
+synonymous. The former is a back reference; the latter is a subroutine call.
</P>
-<br><a name="SEC23" href="#TOC1">CALLOUTS</a><br>
+<br><a name="SEC24" href="#TOC1">CALLOUTS</a><br>
<P>
Perl has a feature whereby using the sequence (?{...}) causes arbitrary Perl
code to be obeyed in the middle of matching a regular expression. This makes it
@@ -2067,7 +2107,7 @@ description of the interface to the callout function is given in the
<a href="pcrecallout.html"><b>pcrecallout</b></a>
documentation.
</P>
-<br><a name="SEC24" href="#TOC1">BACKTRACKING CONTROL</a><br>
+<br><a name="SEC25" href="#TOC1">BACKTRACKING CONTROL</a><br>
<P>
Perl 5.10 introduced a number of "Special Backtracking Control Verbs", which
are described in the Perl documentation as "experimental and subject to change
@@ -2076,9 +2116,10 @@ production code should be noted to avoid problems during upgrades." The same
remarks apply to the PCRE features described in this section.
</P>
<P>
-Since these verbs are specifically related to backtracking, they can be used
-only when the pattern is to be matched using <b>pcre_exec()</b>, which uses a
-backtracking algorithm. They cause an error if encountered by
+Since these verbs are specifically related to backtracking, most of them can be
+used only when the pattern is to be matched using <b>pcre_exec()</b>, which uses
+a backtracking algorithm. With the exception of (*FAIL), which behaves like a
+failing negative assertion, they cause an error if encountered by
<b>pcre_dfa_exec()</b>.
</P>
<P>
@@ -2182,11 +2223,11 @@ the end of the group if FOO succeeds); on failure the matcher skips to the
second alternative and tries COND2, without backtracking into COND1. If (*THEN)
is used outside of any alternation, it acts exactly like (*PRUNE).
</P>
-<br><a name="SEC25" href="#TOC1">SEE ALSO</a><br>
+<br><a name="SEC26" href="#TOC1">SEE ALSO</a><br>
<P>
<b>pcreapi</b>(3), <b>pcrecallout</b>(3), <b>pcrematching</b>(3), <b>pcre</b>(3).
</P>
-<br><a name="SEC26" href="#TOC1">AUTHOR</a><br>
+<br><a name="SEC27" href="#TOC1">AUTHOR</a><br>
<P>
Philip Hazel
<br>
@@ -2195,11 +2236,11 @@ University Computing Service
Cambridge CB2 3QH, England.
<br>
</P>
-<br><a name="SEC27" href="#TOC1">REVISION</a><br>
+<br><a name="SEC28" href="#TOC1">REVISION</a><br>
<P>
-Last updated: 17 September 2007
+Last updated: 19 April 2008
<br>
-Copyright &copy; 1997-2007 University of Cambridge.
+Copyright &copy; 1997-2008 University of Cambridge.
<br>
<p>
Return to the <a href="index.html">PCRE index page</a>.
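
The patch above stresses that \g{...} (a back reference) and \g<...> (a subroutine call) are not synonymous. The following is a minimal sketch of that difference, not part of the commit. It assumes Python's standard re module, which has no subroutine-call syntax, so the call is emulated by writing the subpattern out a second time; that emulation is only equivalent for non-recursive calls. The variable names are illustrative.

```python
import re

# Subpattern taken from the "(sens|respons)e and ...ibility" example in the patch.
GROUP = r"(sens|respons)"

# Back reference: \1 must match the exact text that group 1 captured.
backref = re.compile(GROUP + r"e and \1ibility")

# Subroutine-call semantics, emulated by hand (assumption: Python's re has no
# \g<1> or \g'1' subroutine call, so the subpattern itself is repeated).
subroutine_like = re.compile(GROUP + r"e and (?:sens|respons)ibility")

same = "sense and sensibility"
mixed = "sense and responsibility"

backref_same = backref.fullmatch(same) is not None
backref_mixed = backref.fullmatch(mixed) is not None
sub_same = subroutine_like.fullmatch(same) is not None
sub_mixed = subroutine_like.fullmatch(mixed) is not None

print(backref_same, backref_mixed)  # back reference: only the repeated text matches
print(sub_same, sub_mixed)          # re-executed subpattern: either alternative matches
```

The back reference accepts only the subject in which the captured text repeats, while the re-executed subpattern may take a different alternative the second time, which is the behaviour the documentation attributes to a subroutine call.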