Any Common digit set can match in any script

This fixes a design flaw in script runs that, in 5.30, effectively prevented Common-script digits other than the ASCII [0-9] from being part of any meaningful script run.
author: Karl Williamson <khw@cpan.org> 2019-03-14 11:48:11 -0600
committer: Karl Williamson <khw@cpan.org> 2019-03-14 12:18:01 -0600
commit: f4e61fc03836484ea88518e8bf04cc1b32a6a1a0
tree: 54a697a00fe9ed00a15d86abb46a359b95f7407e /pod/perlre.pod
parent: bfa9f5ee70ce509f0e66dcff9e9fda131ea8a133
1 file changed, 8 insertions, 11 deletions
diff --git a/pod/perlre.pod b/pod/perlre.pod
index 209cac7f8d..4898f94d9f 100644
--- a/pod/perlre.pod
+++ b/pod/perlre.pod
@@ -2550,15 +2550,12 @@ Katakana and Hiragana are commonly mixed together in practice, along
 with some Chinese characters, and hence are treated as being in a single
 script run by Perl.
 
-The rules used for matching decimal digits are somewhat different.  Many
+The rules used for matching decimal digits are slightly stricter.  Many
 scripts have their own sets of digits equivalent to the Western C<0>
 through C<9> ones.  A few, such as Arabic, have more than one set.  For
 a string to be considered a script run, all digits in it must come from
-the same set, as determined by the first digit encountered. The ASCII
-C<[0-9]> are accepted as being in any script, even those that have their
-own set.  This is because these are often used in commerce even in such
-scripts.  But any mixing of the ASCII and other digits will cause the
-sequence to not be a script run, failing the match.  As an example,
+the same set of ten, as determined by the first digit encountered.
+As an example,
 
  qr/(*script_run: \d+ \b )/x
 
@@ -2579,11 +2576,11 @@ accent of some type.  These are considered to be in the script of the
 master character, and so never cause a script run to not match.
 
 The other one is "Common".  This consists of mostly punctuation, emoji,
-and characters used in mathematics and music, and the ASCII digits C<0>
-through C<9>.  These characters can appear intermixed in text in many of
-the world's scripts.  These also don't cause a script run to not match,
-except any ASCII digits encountered have to obey the decimal digit rules
-described above.
+and characters used in mathematics and music, the ASCII digits C<0>
+through C<9>, and full-width forms of these digits.  These characters
+can appear intermixed in text in many of the world's scripts.  These
+also don't cause a script run to not match.  But like other scripts, all
+digits in a run must come from the same set of 10.
 
 This construct is non-capturing.  You can add parentheses to I<pattern>
 to capture, if desired.  You will have to do this if you plan to use
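The patch's digit rule can be modeled outside Perl. Under that rule, every decimal digit in a run must come from the same set of ten as the first digit. The ASCII digits and their full-width forms are both Common, yet they count as separate sets.

The sketch below is a hypothetical Python model, not Perl's implementation. It checks only digits and ignores the scripts of other characters. It relies on the Unicode property that each set of decimal digits (General Category Nd) occupies ten consecutive code points in order 0 through 9, so a set can be identified by the code point of its zero. The function names are illustrative.

```python
import unicodedata
from typing import Optional


def digit_set_zero(ch: str) -> Optional[int]:
    """Return the code point of the zero in ch's set of ten decimal digits,
    or None if ch is not a decimal digit (General Category Nd)."""
    if unicodedata.category(ch) != "Nd":
        return None
    # Decimal digits come in runs of ten consecutive code points (0..9),
    # so subtracting the digit's value lands on that run's zero.
    return ord(ch) - unicodedata.decimal(ch)


def digits_from_one_set(s: str) -> bool:
    """Hypothetical model of the digit rule: every decimal digit in s must
    belong to the same set of ten as the first digit encountered.
    Non-digit characters are skipped; Perl's real script-run check also
    looks at their scripts, which this sketch does not."""
    first_zero: Optional[int] = None
    for ch in s:
        zero = digit_set_zero(ch)
        if zero is None:
            continue  # not a decimal digit
        if first_zero is None:
            first_zero = zero  # the first digit fixes the set
        elif zero != first_zero:
            return False  # digit from a different set of ten
    return True


if __name__ == "__main__":
    print(digits_from_one_set("2019"))                      # True: ASCII only
    print(digits_from_one_set("\uFF12\uFF10\uFF11\uFF19"))  # True: full-width only
    print(digits_from_one_set("1\uFF12"))                   # False: ASCII mixed with full-width
    print(digits_from_one_set("\u0661\u06F2"))              # False: Arabic-Indic mixed with Extended Arabic-Indic
```

Under this model, an all-ASCII or all-full-width digit string passes, and mixing sets fails. The last case shows the "more than one set" situation the text mentions for Arabic.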