author | Karl Williamson <khw@cpan.org> | 2019-03-14 11:48:11 -0600 |
---|---|---|
committer | Steve Hay <steve.m.hay@googlemail.com> | 2019-04-05 17:51:48 +0100 |
commit | a278791cdb58f3c735071800cce0e927b4f4b72a (patch) | |
tree | df6650160c98804bb9ad4ae22f05e15f66fc23ac /pod | |
parent | 0a42cc2422c0013fd499b5cc33654466d7bb1286 (diff) | |
Any Common digit set can match in any script
This fixes a design flaw in script runs that in 5.30 effectively
prevented digits from the Common script except the ASCII [0-9] from
being in any meaningful script run.
Diffstat (limited to 'pod')
-rw-r--r-- | pod/perldelta.pod | 21 |
-rw-r--r-- | pod/perlre.pod | 19 |
2 files changed, 26 insertions, 14 deletions
diff --git a/pod/perldelta.pod b/pod/perldelta.pod
index 3a40c91660..471fe4c2a6 100644
--- a/pod/perldelta.pod
+++ b/pod/perldelta.pod
@@ -14,9 +14,19 @@ L<perl5281delta>, which describes differences between 5.28.0 and 5.28.1.
 
 =head1 Incompatible Changes
 
-There are no changes intentionally incompatible with 5.28.1. If any exist,
-they are bugs, and we request that you submit a report. See L</Reporting
-Bugs> below.
+=head2 Any set of digits in the Common script are legal in a script run
+of another script
+
+There are several sets of digits in the Common script. C<[0-9]> is the
+most familiar. But there are also C<[\x{FF10}-\x{FF19}]> (FULLWIDTH
+DIGIT ZERO - FULLWIDTH DIGIT NINE), and several sets for use in
+mathematical notation, such as the MATHEMATICAL DOUBLE-STRUCK DIGITs.
+Any of these sets should be able to appear in script runs of, say,
+Greek. But the design of 5.30 overlooked all but the ASCII digits
+C<[0-9]>, so the design was flawed. This has been fixed, so is both a
+bug fix and an incompatibility. [perl #133547]
+
+All digits in a run still have to come from the same set of ten digits.
 
 =head1 Modules and Pragmata
 
@@ -113,6 +123,11 @@ perl if compilation continued.
 
 L<[perl #132158]|https://rt.perl.org/Ticket/Display.html?id=132158>
 
+=item *
+
+See L</Any set of digits in the Common script are legal in a script run
+of another script>.
+
 =back
 
 =head1 Acknowledgements
diff --git a/pod/perlre.pod b/pod/perlre.pod
index 70c53f1536..c587437c75 100644
--- a/pod/perlre.pod
+++ b/pod/perlre.pod
@@ -2529,15 +2529,12 @@ characters from their native scripts and base Chinese. Perl follows
 Unicode's UTS 39 (L<http://unicode.org/reports/tr39/>) Unicode Security
 Mechanisms in allowing such mixtures.
 
-The rules used for matching decimal digits are somewhat different. Many
+The rules used for matching decimal digits are slightly stricter. Many
 scripts have their own sets of digits equivalent to the Western C<0>
 through C<9> ones. A few, such as Arabic, have more than one set. For
 a string to be considered a script run, all digits in it must come from
-the same set, as determined by the first digit encountered. The ASCII
-C<[0-9]> are accepted as being in any script, even those that have their
-own set. This is because these are often used in commerce even in such
-scripts. But any mixing of the ASCII and other digits will cause the
-sequence to not be a script run, failing the match. As an example,
+the same set of ten, as determined by the first digit encountered.
+As an example,
 
  qr/(*script_run: \d+ \b )/x
 
@@ -2558,11 +2555,11 @@ accent of some type. These are considered to be in the script of the
 master character, and so never cause a script run to not match.
 
 The other one is "Common". This consists of mostly punctuation, emoji,
-and characters used in mathematics and music, and the ASCII digits C<0>
-through C<9>. These characters can appear intermixed in text in many of
-the world's scripts. These also don't cause a script run to not match,
-except any ASCII digits encountered have to obey the decimal digit rules
-described above.
+and characters used in mathematics and music, the ASCII digits C<0>
+through C<9>, and full-width forms of these digits. These characters
+can appear intermixed in text in many of the world's scripts. These
+also don't cause a script run to not match. But like other scripts, all
+digits in a run must come from the same set of 10.
 
 This construct is non-capturing. You can add parentheses to I<pattern>
 to capture, if desired. You will have to do this if you plan to use
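
Illustration of the "same set of ten" digit rule described in the patched text. This is a minimal Python sketch, not Perl's implementation and not part of this commit. It covers only the digit rule and ignores the script check that a real script run also performs. The helper names `digit_set_zero` and `digits_from_one_set` are hypothetical, and the sketch assumes each set of ten decimal digits occupies contiguous code points, so a set can be identified by the code point of its zero.

```python
import unicodedata


def digit_set_zero(ch):
    """Return the code point of the zero in ch's digit set, or None for non-digits."""
    value = unicodedata.decimal(ch, None)  # 0..9 for decimal digits, else None
    if value is None:
        return None
    # Assumption: the ten digits of a set are contiguous, so subtracting
    # the digit's value from its code point lands on that set's zero.
    return ord(ch) - value


def digits_from_one_set(text):
    """Return True if every decimal digit in text belongs to the same set of ten."""
    zeros = {digit_set_zero(ch) for ch in text}
    zeros.discard(None)  # characters that are not decimal digits do not count
    return len(zeros) <= 1


print(digits_from_one_set("123"))           # ASCII digits only
print(digits_from_one_set("\uFF11\uFF12"))  # FULLWIDTH digits only
print(digits_from_one_set("1\uFF12"))       # ASCII mixed with FULLWIDTH
```

The first two calls see a single digit set each; the third mixes two sets, which is the situation the patched perlre text says still prevents a script run.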