Subject: [perl #58182] partial: Add uni \s,\w matching

This commit causes the regex sequences \b, \s, and \w (and their complements) to match in the Latin1 range within the scope of feature 'unicode_strings' or with the /u regex modifier. It uses the previously unused flags field in the respective regnodes to indicate the type of matching, and regexec.c uses that field to decide which of the handy.h macros to use, native or Latin1.

I chose this for now rather than creating new nodes for each type of match. An earlier version of this patch did that, and in every case the switch case: statements were adjacent, offering no performance advantage. If regexec.c were modified to use inline functions or more macros for various short sections of it, then it would be faster to have new nodes rather than using the flags field. But using that field simplified things, as this change flies under the radar in a number of places where it would not if separate nodes were used.
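
The flags-field approach described above can be illustrated with a minimal sketch. This is not the code from regexec.c: the node struct, the enum names, and the two classifier functions are all hypothetical stand-ins (the real code uses regnodes and the handy.h macros), and the sketch assumes an ASCII platform and covers only \w and \W.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical matching modes, stored in a node's otherwise unused flags field. */
enum match_mode { MATCH_NATIVE = 0, MATCH_LATIN1 = 1 };

/* Hypothetical node types standing in for the \w and \W regnodes. */
enum node_type { NODE_WORDCHAR, NODE_NOT_WORDCHAR };

/* Much-simplified stand-in for a regnode: an opcode plus a flags byte. */
struct node {
    uint8_t type;
    uint8_t flags;
};

/* Stand-in for the native (ASCII-only) word-character test. */
static bool is_word_native(uint8_t c)
{
    return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')
        || (c >= '0' && c <= '9') || c == '_';
}

/* Stand-in for the Latin1 test: ASCII word characters plus the
 * Latin-1 letters (everything from 0xC0 up except the multiplication
 * and division signs, plus 0xAA, 0xB5 and 0xBA). */
static bool is_word_latin1(uint8_t c)
{
    if (c < 0x80)
        return is_word_native(c);
    return c == 0xAA || c == 0xB5 || c == 0xBA
        || (c >= 0xC0 && c != 0xD7 && c != 0xF7);
}

/* One node type serves both kinds of matching: the flags field picks
 * the classifier, so no additional node types are required. */
static bool node_matches(const struct node *n, uint8_t c)
{
    bool is_word = (n->flags == MATCH_LATIN1) ? is_word_latin1(c)
                                              : is_word_native(c);
    switch (n->type) {
    case NODE_WORDCHAR:     return is_word;
    case NODE_NOT_WORDCHAR: return !is_word;
    }
    return false;
}
```

In this sketch, a NODE_WORDCHAR node with flags set to MATCH_NATIVE rejects byte 0xE9 (LATIN SMALL LETTER E WITH ACUTE), while the same node type with MATCH_LATIN1 accepts it. Code that only inspects the node type does not need to know about the new mode, which mirrors the simplification the commit message describes.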
author: Karl Williamson <public@khwilliamson.com> 2010-09-23 23:36:40 -0600
committer: Jesse Vincent <jesse@bestpractical.com> 2010-10-15 23:14:29 +0900
commit: a12cf05f80a65e40fe339b086ab2d10e18d838c1 (patch)
tree: bd1254d24bac6bb121801a2a06d01c7e17703b92 /pod/perlunicode.pod
parent: bdc22dd52e899130c8c4111c985fcbd7eec164a5 (diff)
download: perl-a12cf05f80a65e40fe339b086ab2d10e18d838c1.tar.gz
1 files changed, 8 insertions, 5 deletions
diff --git a/pod/perlunicode.pod b/pod/perlunicode.pod
index fc6a8a907c..8ff5bb0653 100644
--- a/pod/perlunicode.pod
+++ b/pod/perlunicode.pod
@@ -1509,17 +1509,20 @@ ASCII range (except in a locale), along with Perl's desire to add Unicode
 support seamlessly.  The result wasn't seamless: these characters were
 orphaned.
 
-Work is being done to correct this, but only some of it was complete in time
-for the 5.12 release.  What has been finished is the important part of the case
+Work is being done to correct this, but only some of it is complete.
+What has been finished is the matching of C<\b>, C<\s>, C<\w> and their
+complements in regular expressions, and the important part of the case
 changing component.  Due to concerns, and some evidence, that older code might
 have come to rely on the existing behavior, the new behavior must be explicitly
 enabled by the feature C<unicode_strings> in the L<feature> pragma, even though
 no new syntax is involved.
 
 See L<perlfunc/lc> for details on how this pragma works in combination with
-various others for casing.  Even though the pragma only affects casing
-operations in the 5.12 release, it is planned to have it affect all the
-problematic behaviors in later releases: you can't have one without them all.
+various others for casing.
+
+Even though the implementation is incomplete, it is planned to have this
+pragma affect all the problematic behaviors in later releases: you can't
+have one without them all.
 
 In the meantime, a workaround is to always call utf8::upgrade($string), or to
 use the standard module L<Encode>.   Also, a scalar that has any characters