author Karl Williamson <khw@cpan.org> 2019-03-14 11:48:11 -0600
committer Steve Hay <steve.m.hay@googlemail.com> 2019-04-05 17:51:48 +0100
commit a278791cdb58f3c735071800cce0e927b4f4b72a (patch)
tree df6650160c98804bb9ad4ae22f05e15f66fc23ac /pod
parent 0a42cc2422c0013fd499b5cc33654466d7bb1286 (diff)
Any Common digit set can match in any script
This fixes a design flaw in script runs that in 5.30 effectively prevented digits from the Common script except the ASCII [0-9] from being in any meaningful script run.
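
As an illustration of the behaviour described above, here is a minimal sketch, not taken from the commit. It assumes a perl that already contains this fix and in which script runs are no longer experimental (hence the "use 5.030" line). The helper name is_script_run and the sample strings are hypothetical and exist only for this example.

```perl
use 5.030;
use strict;
use warnings;

# Hypothetical helper: true if the entire string forms one script run.
# The \A and \z anchors force the script run to cover the whole string.
sub is_script_run {
    my ($str) = @_;
    return $str =~ /\A(*script_run: .+ )\z/xs ? 1 : 0;
}

my $greek     = "\x{3B1}\x{3B2}\x{3B3}";     # GREEK SMALL LETTER ALPHA, BETA, GAMMA
my $fullwidth = "\x{FF11}\x{FF12}\x{FF13}";  # FULLWIDTH DIGIT ONE, TWO, THREE

for my $case (
    [ 'Greek + ASCII digits',     $greek . '123' ],
    [ 'Greek + fullwidth digits', $greek . $fullwidth ],
    [ 'ASCII + fullwidth mixed',  $greek . '12' . "\x{FF13}" ],
) {
    my ($label, $str) = @$case;
    # Only the ASCII label is printed, so no output encoding layer is needed.
    printf "%-26s %s\n", $label,
        is_script_run($str) ? 'script run' : 'not a script run';
}
```

With the fix in place, the first two strings should be accepted, because each takes all of its digits from a single Common set of ten. The third mixes ASCII and fullwidth digits, so it should be rejected.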
Diffstat (limited to 'pod')
-rw-r--r-- pod/perldelta.pod 21
-rw-r--r-- pod/perlre.pod 19
2 files changed, 26 insertions, 14 deletions
diff --git a/pod/perldelta.pod b/pod/perldelta.pod
index 3a40c91660..471fe4c2a6 100644
--- a/pod/perldelta.pod
+++ b/pod/perldelta.pod
@@ -14,9 +14,19 @@ L<perl5281delta>, which describes differences between 5.28.0 and 5.28.1.
=head1 Incompatible Changes
-There are no changes intentionally incompatible with 5.28.1. If any exist,
-they are bugs, and we request that you submit a report. See L</Reporting
-Bugs> below.
+=head2 Any set of digits in the Common script are legal in a script run
+of another script
+
+There are several sets of digits in the Common script. C<[0-9]> is the
+most familiar. But there are also C<[\x{FF10}-\x{FF19}]> (FULLWIDTH
+DIGIT ZERO - FULLWIDTH DIGIT NINE), and several sets for use in
+mathematical notation, such as the MATHEMATICAL DOUBLE-STRUCK DIGITs.
+Any of these sets should be able to appear in script runs of, say,
+Greek. But the design of 5.30 overlooked all but the ASCII digits
+C<[0-9]>, so the design was flawed. This has been fixed, so is both a
+bug fix and an incompatibility. [perl #133547]
+
+All digits in a run still have to come from the same set of ten digits.
=head1 Modules and Pragmata
@@ -113,6 +123,11 @@ perl if compilation continued.
L<[perl #132158]|https://rt.perl.org/Ticket/Display.html?id=132158>
+=item *
+
+See L</Any set of digits in the Common script are legal in a script run
+of another script>.
+
=back
=head1 Acknowledgements
diff --git a/pod/perlre.pod b/pod/perlre.pod
index 70c53f1536..c587437c75 100644
--- a/pod/perlre.pod
+++ b/pod/perlre.pod
@@ -2529,15 +2529,12 @@ characters from their native scripts and base Chinese. Perl follows
Unicode's UTS 39 (L<http://unicode.org/reports/tr39/>) Unicode Security
Mechanisms in allowing such mixtures.
-The rules used for matching decimal digits are somewhat different. Many
+The rules used for matching decimal digits are slightly stricter. Many
scripts have their own sets of digits equivalent to the Western C<0>
through C<9> ones. A few, such as Arabic, have more than one set. For
a string to be considered a script run, all digits in it must come from
-the same set, as determined by the first digit encountered. The ASCII
-C<[0-9]> are accepted as being in any script, even those that have their
-own set. This is because these are often used in commerce even in such
-scripts. But any mixing of the ASCII and other digits will cause the
-sequence to not be a script run, failing the match. As an example,
+the same set of ten, as determined by the first digit encountered.
+As an example,
qr/(*script_run: \d+ \b )/x
@@ -2558,11 +2555,11 @@ accent of some type. These are considered to be in the script of the
master character, and so never cause a script run to not match.
The other one is "Common". This consists of mostly punctuation, emoji,
-and characters used in mathematics and music, and the ASCII digits C<0>
-through C<9>. These characters can appear intermixed in text in many of
-the world's scripts. These also don't cause a script run to not match,
-except any ASCII digits encountered have to obey the decimal digit rules
-described above.
+and characters used in mathematics and music, the ASCII digits C<0>
+through C<9>, and full-width forms of these digits. These characters
+can appear intermixed in text in many of the world's scripts. These
+also don't cause a script run to not match. But like other scripts, all
+digits in a run must come from the same set of 10.
This construct is non-capturing. You can add parentheses to I<pattern>
to capture, if desired. You will have to do this if you plan to use