author | Karl Williamson <khw@khw-desktop.(none)> | 2010-02-19 14:42:16 -0700 |
---|---|---|
committer | Steve Hay <steve.m.hay@googlemail.com> | 2010-02-20 10:59:24 +0000 |
commit | c3c4140635dd08363a20c93a8c8b6d8e7464b891 (patch) | |
tree | 182f6a8aec2ef6ffd6a7bbacd9ae8db2c2052f20 /pod | |
parent | 749123ff5f0f5da3f3eb842fc225137c6821a6fe (diff) | |
download | perl-c3c4140635dd08363a20c93a8c8b6d8e7464b891.tar.gz |
Improve handling of qq(\N{...}); and /x
It is possible to bypass the lexer's parsing of \N. This patch causes
the regex compiler to deal with that better. The compiler no longer
assumes that the lexer parsed the \N. It generates an error message if
the \N isn't in a form it is expecting, and invalid hexadecimal digits
are now fatal errors, with the position of the error more clearly
marked.
The diagnostic pod has been updated to reflect the new error messages,
with some slight clarifications to the previous ones as well.
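
The lexer bypass described above can be sketched as follows. This is an illustration, not code from the patch: the variable names are hypothetical, and how the second match behaves depends on the Perl version, so the sketch reports the outcome instead of assuming one.

```perl
use strict;
use warnings;
use charnames ':full';   # needed on older perls for \N{NAME} in strings

# Double-quotish context: the lexer resolves \N{SPACE} to a real space
# before the regex compiler ever sees the pattern.
my $resolved = "\N{SPACE}";
print "double-quoted form matches\n" if "a b" =~ /a${resolved}b/;

# Single-quoted context: the string holds the literal text \N{SPACE},
# so the regex compiler receives a \N that the lexer never parsed.
my $raw = '\N{SPACE}';
my $ok  = eval { ("a b" =~ /a${raw}b/) ? 1 : 0 };

# Report whichever outcome this perl produces (assumption: it varies
# by version, so nothing is hard-coded here).
print defined $ok ? "compiled at runtime; matched=$ok\n"
                  : "compile error: $@";
```

Wrapping the second match in `eval` keeps the script alive on a perl where the pattern is a fatal error.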
Diffstat (limited to 'pod')
-rw-r--r-- | pod/perldiag.pod | 54 |
1 files changed, 43 insertions, 11 deletions
diff --git a/pod/perldiag.pod b/pod/perldiag.pod
index 4a1288955e..95b45f761d 100644
--- a/pod/perldiag.pod
+++ b/pod/perldiag.pod
@@ -1912,7 +1912,7 @@ about 250 characters for simple names, and somewhat more for compound
 names (like C<$A::B>). You've exceeded Perl's limits. Future
 versions of Perl are likely to eliminate these arbitrary limitations.
 
-=item Ignoring zero length \N{} in character class"
+=item Ignoring zero length \N{} in character class
 
 (W) Named Unicode character escapes (\N{...}) may return a
 zero length sequence. When such an escape is used in a character class
@@ -2474,7 +2474,10 @@ immediately after the switch, without intervening spaces.
 =item Missing braces on \N{}
 
 (F) Wrong syntax of character name literal C<\N{charname}> within
-double-quotish context.
+double-quotish context. This can also happen when there is a space (or
+comment) between the C<\N> and the C<{> in a regex with the C</x> modifier.
+This modifier does not change the requirement that the brace immediately follow
+the C<\N>.
 
 =item Missing comma after first argument to %s function
 
@@ -2524,18 +2527,18 @@ double-quoted strings and regular expression patterns. In patterns, it doesn't
 have the meaning an unescaped C<*> does.
 
 Starting in Perl 5.12.0, C<\N> also can have an additional meaning (only) in
-patterns, namely to match a non-newline character. (This is like C<.> but is
-not affected by the C</s> modifier.)
+patterns, namely to match a non-newline character. (This is short for
+C<[^\n]>, and like C<.> but is not affected by the C</s> regex modifier.)
 
 This can lead to some ambiguities. When C<\N> is not followed immediately by a
-left brace, Perl assumes the "match non-newline character" meaning. Also, if
+left brace, Perl assumes the C<[^\n]> meaning. Also, if
 the braces form a valid quantifier such as C<\N{3}> or C<\N{5,}>, Perl assumes
 that this means to match the given quantity of non-newlines (in these examples,
 3; and 5 or more, respectively). In all other case, where there is a C<\N{>
 and a matching C<}>, Perl assumes that a character name is desired.
 
 However, if there is no matching C<}>, Perl doesn't know if it was mistakenly
-omitted, or if "match non-newline" followed by "match a C<{>" was desired, and
+omitted, or if C<[^\n]{> was desired, and
 raises this error. If you meant the former, add the right brace; if you meant
 the latter, escape the brace with a backslash, like so: C<\N\{>
 
@@ -2626,10 +2629,38 @@ local() if you want to localize a package variable.
 
 =item \\N in a character class must be a named character: \\N{...}
 
-The new (5.12) meaning of C<\N> to match non-newlines is not valid in a
-bracketed character class, for the same reason that C<.> in a character class
-loses its specialness: it matches almost everything, which is probably not what
-you want.
+(F) The new (5.12) meaning of C<\N> as C<[^\n]> is not valid in a bracketed
+character class, for the same reason that C<.> in a character class loses its
+specialness: it matches almost everything, which is probably not what you want.
+
+=item \\N{NAME} must be resolved by the lexer
+
+(F) When compiling a regex pattern, an unresolved named character or sequence
+was encountered. This can happen in any of several ways that bypass the lexer,
+such as using single-quotish context:
+
+    $re = '\N{SPACE}'; # Wrong!
+    /$re/;
+
+Instead, use double-quotes:
+
+    $re = "\N{SPACE}"; # ok
+    /$re/;
+
+The lexer can be bypassed as well by creating the pattern from smaller
+components:
+
+    $re = '\N';
+    /${re}{SPACE}/; # Wrong!
+
+It's not a good idea to split a construct in the middle like this, and it
+doesn't work here. Instead use the solution above.
+
+Finally, the message also can happen under the C</x> regex modifier when the
+C<\N> is separated by spaces from the C<{>, in which case, remove the spaces.
+
+    /\N {SPACE}/x; # Wrong!
+    /\N{SPACE}/x; # ok
 
 =item Name "%s::%s" used only once: possible typo
 
@@ -2646,7 +2677,8 @@ will not trigger this warning.
 =item Invalid hexadecimal number in \\N{U+...}
 
 (F) The character constant represented by C<...> is not a valid hexadecimal
-number.
+number. Either it is empty, or you tried to use a character other than 0 - 9
+or A - F, a - f in a hexadecimal number.
 
 =item Negative '/' count in unpack
 
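
The three readings of C<\N> that the updated perldiag text distinguishes (bare non-newline, quantified non-newline, and named character) can be checked with a short script. This is a minimal sketch assuming Perl 5.12 or later; the variable names are illustrative only and do not come from the patch.

```perl
use strict;
use warnings;
use charnames ':full';   # needed on older perls for \N{NAME}

# Bare \N: any character except newline, and unlike "." it stays
# that way under the /s modifier.
my $dot_s = ("a\nb" =~ /a.b/s)  ? 1 : 0;   # "." matches newline under /s
my $n_s   = ("a\nb" =~ /a\Nb/s) ? 1 : 0;   # \N still refuses newline

# \N{3}: the braces form a valid quantifier, so this means
# exactly three non-newline characters.
my $quant = ("abc" =~ /^\N{3}$/) ? 1 : 0;

# \N{ASTERISK}: braces that are not a quantifier name a character,
# which matches literally and does not act as the "*" metacharacter.
my $named = ("a*b" =~ /^a\N{ASTERISK}b$/) ? 1 : 0;

print "dot_s=$dot_s n_s=$n_s quant=$quant named=$named\n";
```

Only the bare form and the quantifier form are handled purely by the regex compiler; the named form is the one the lexer is expected to resolve.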