author | Jarkko Hietaniemi <jhi@iki.fi> | 2001-02-09 15:40:21 +0000 |
---|---|---|
committer | Jarkko Hietaniemi <jhi@iki.fi> | 2001-02-09 15:40:21 +0000 |
commit | fb55449c820151ec18475c38cc3361fa88eb0a1b (patch) | |
tree | cd1ca4961163dccd84d54b09577b0bc6f8e5fef9 /pod/perlre.pod | |
parent | cbe1151c894397456eb4168363b69bdac01b932b (diff) | |
download | perl-fb55449c820151ec18475c38cc3361fa88eb0a1b.tar.gz |
Integrate change #8733 from maintperl.
Subject: Re: [PATCH: 5.6.1 trial 2 && perl@8671] some coded char set issues in perlre.pod
p4raw-link: @8733 on //depot/maint-5.6/perl: f672c3f4b56d3d7aaa92fcde49f628e864a3320a
p4raw-id: //depot/perl@8734
p4raw-integrated: from //depot/maint-5.6/perl@8731 'merge in'
pod/perlre.pod (@8630..)
Diffstat (limited to 'pod/perlre.pod')
-rw-r--r-- | pod/perlre.pod | 36 |
1 files changed, 20 insertions, 16 deletions
diff --git a/pod/perlre.pod b/pod/perlre.pod
index 2e2f59cdfd..02dd2cda5d 100644
--- a/pod/perlre.pod
+++ b/pod/perlre.pod
@@ -40,7 +40,7 @@ is, no matter what C<$*> contains, C</s> without C</m> will force
 "^" to match only at the beginning of the string and "$" to match
 only at the end (or just before a newline at the end) of the string.
 Together, as /ms, they let the "." match any character whatsoever,
-while yet allowing "^" and "$" to match, respectively, just after
+while still allowing "^" and "$" to match, respectively, just after
 and just before newlines within the string.
 
 =item x
@@ -334,12 +334,14 @@ I<backreference>.
 
 There is no limit to the number of captured substrings that you may
 use. However Perl also uses \10, \11, etc. as aliases for \010,
-\011, etc. (Recall that 0 means octal, so \011 is the 9'th ASCII
-character, a tab.) Perl resolves this ambiguity by interpreting
-\10 as a backreference only if at least 10 left parentheses have
-opened before it. Likewise \11 is a backreference only if at least
-11 left parentheses have opened before it. And so on. \1 through
-\9 are always interpreted as backreferences."
+\011, etc. (Recall that 0 means octal, so \011 is the character at
+number 9 in your coded character set; which would be the 10th character,
+a horizontal tab under ASCII.) Perl resolves this
+ambiguity by interpreting \10 as a backreference only if at least 10
+left parentheses have opened before it. Likewise \11 is a
+backreference only if at least 11 left parentheses have opened
+before it. And so on. \1 through \9 are always interpreted as
+backreferences.
 
 Examples:
 
@@ -955,10 +957,10 @@ escape it with a backslash. "-" is also taken literally when it is
 at the end of the list, just before the closing "]". (The
 following all specify the same class of three characters: C<[-az]>,
 C<[az-]>, and C<[a\-z]>. All are different from C<[a-z]>, which
-specifies a class containing twenty-six characters.)
-Also, if you try to use the character classes C<\w>, C<\W>, C<\s>,
-C<\S>, C<\d>, or C<\D> as endpoints of a range, that's not a range,
-the "-" is understood literally.
+specifies a class containing twenty-six characters, even on EBCDIC
+based coded character sets.) Also, if you try to use the character
+classes C<\w>, C<\W>, C<\s>, C<\S>, C<\d>, or C<\D> as endpoints of
+a range, that's not a range, the "-" is understood literally.
 
 Note also that the whole range idea is rather unportable between
 character sets--and even within character sets they may cause results
@@ -970,11 +972,11 @@ spell out the character sets in full.
 Characters may be specified using a metacharacter syntax much like that
 used in C: "\n" matches a newline, "\t" a tab, "\r" a carriage return,
 "\f" a form feed, etc. More generally, \I<nnn>, where I<nnn> is a string
-of octal digits, matches the character whose ASCII value is I<nnn>.
-Similarly, \xI<nn>, where I<nn> are hexadecimal digits, matches the
-character whose ASCII value is I<nn>. The expression \cI<x> matches the
-ASCII character control-I<x>. Finally, the "." metacharacter matches any
-character except "\n" (unless you use C</s>).
+of octal digits, matches the character whose coded character set value
+is I<nnn>. Similarly, \xI<nn>, where I<nn> are hexadecimal digits,
+matches the character whose numeric value is I<nn>. The expression \cI<x>
+matches the character control-I<x>. Finally, the "." metacharacter
+matches any character except "\n" (unless you use C</s>).
 
 You can specify a series of alternatives for a pattern using "|" to
 separate them, so that C<fee|fie|foe> will match any of "fee", "fie",
@@ -1278,5 +1280,7 @@ L<perlfunc/pos>.
 
 L<perllocale>.
 
+L<perlebcdic>.
+
 I<Mastering Regular Expressions> by Jeffrey Friedl, published
 by O'Reilly and Associates.