author Karl Williamson <khw@cpan.org> 2015-09-10 22:31:39 -0600
committer Karl Williamson <khw@cpan.org> 2015-09-11 09:40:39 -0600
commit 6b5cf123f371e012d9812b37b13d50c6e06bf555
tree 236af9c72384ad78bf168b40fd51baa962b4b07c /pod/perlunicode.pod
parent 2efb8b4b644d5f3c28974a8f577081b4142decd2
pods: Discourage use of 'In' prefix for Unicode Block property
This changes perluniprops to not list the equivalent 'In' single-form method of specifying the Block property, and to discourage its use. The reason is that this is a Perl extension, the use of which is unstable. A future Unicode release could take over the 'In...' name for a new purpose, and perl would follow along, breaking code that assumed the former meaning. Unicode does not know about this Perl extension, and they wouldn't care if they did know. The reason I'm doing this now is that the latest Unicode version introduced some properties whose names begin with 'In', though no conflicts arose. But it is clear that such conflicts could arise in the future. So only the documentation is changed, to warn people of this potential. perlunicode is updated accordingly.
Diffstat (limited to 'pod/perlunicode.pod')
-rw-r--r-- pod/perlunicode.pod 52
1 files changed, 28 insertions, 24 deletions
diff --git a/pod/perlunicode.pod b/pod/perlunicode.pod
index 545adf5f20..a407faf306 100644
--- a/pod/perlunicode.pod
+++ b/pod/perlunicode.pod
@@ -697,30 +697,34 @@ with the nuts and bolts of Unicode.
Block names are matched in the compound form, like C<\p{Block: Arrows}> or
C<\p{Blk=Hebrew}>. Unlike most other properties, only a few block names have a
-Unicode-defined short name. But Perl does provide a (slight, no longer
-recommended) shortcut: You can say, for example C<\p{In_Arrows}> or
-C<\p{In_Hebrew}>.
-
-For backwards compatibility, the C<In> prefix may be
-omitted if there is no naming conflict with a script or any other
-property, and you can even use an C<Is> prefix instead in those cases.
-But don't do this for new code because your code could break in new
-releases, and this has already happened: There was a time in very
-early Unicode releases when C<\p{Hebrew}> would have matched the
-I<block> Hebrew; now it doesn't.
-
-Using the C<In> prefix avoids this ambiguity, so far. But new versions
-of Unicode continue to add new properties whose names begin with C<In>.
-There is a possibility that one of them someday will conflict with your
-usage. Since this is just a Perl extension, Unicode's name will take
-precedence and your code will become broken. Also, Unicode is free to
-add a script whose name begins with C<In>; that would cause problems.
-
-So it's clearer and best to use the compound form when specifying
-blocks. And be sure that is what you really really want to do. In most
-cases scripts are what you want instead.
-
-A complete list of blocks and their shortcuts is in L<perluniprops>.
+Unicode-defined short name.
+
+Perl also defines single-form synonyms for the block property in cases
+where these do not conflict with something else. But don't use any of
+these, because they are unstable. Since these are Perl extensions, they
+are subordinate to official Unicode property names; Unicode neither knows
+nor cares about Perl's extensions. It may happen that a name that
+currently means the Perl extension will later be changed without warning
+to mean a different Unicode property in a future version of the perl
+interpreter that uses a later Unicode release, and your code would no
+longer work. The extensions are mentioned here for completeness: Take
+the block name and prefix it with one of: C<In> (for example
+C<\p{Blk=Arrows}> can currently be written as C<\p{In_Arrows}>); or
+sometimes C<Is> (like C<\p{Is_Arrows}>); or sometimes no prefix at all
+(C<\p{Arrows}>). As of this writing (Unicode 8.0) there are no
+conflicts with using the C<In_> prefix, but there are plenty with the
+other two forms. For example, C<\p{Is_Hebrew}> and C<\p{Hebrew}> mean
+C<\p{Script=Hebrew}>, which is NOT the same thing as C<\p{Blk=Hebrew}>. Our
+advice used to be to use the C<In_> prefix as a single form way of
+specifying a block. But Unicode 8.0 added properties whose names begin
+with C<In>, and it's now clear that it's only luck that's so far
+prevented a conflict. Using C<In> is only marginally less typing than
+C<Blk:>, and the latter's meaning is clearer anyway, and guaranteed to
+never conflict. So don't take chances. Use C<\p{Blk=foo}> for new
+code. And be sure that a block is what you really really want. In
+most cases scripts are what you want instead.
+
+A complete list of blocks is in L<perluniprops>.
=head3 B<Other Properties>
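To illustrate the block-versus-script distinction the new text warns about, here is a minimal Perl sketch (not part of the commit). It uses only the compound forms C<\p{Blk=...}> and C<\p{Script=...}> recommended above. The helper name hebrew_membership is hypothetical, and the sample code points assume standard Unicode data: U+0590 is an unassigned code point inside the Hebrew block (U+0590..U+05FF), U+FB1D HEBREW LETTER YOD WITH HIRIQ is Hebrew script but lives in the Alphabetic Presentation Forms block, and U+05D0 HEBREW LETTER ALEF is both.

```perl
use strict;
use warnings;

# Hypothetical helper: report whether one character is in the Hebrew
# *block* and/or has the Hebrew *script*, using only compound forms.
sub hebrew_membership {
    my ($char) = @_;
    my $blk    = $char =~ /\p{Blk=Hebrew}/    ? 'yes' : 'no';
    my $script = $char =~ /\p{Script=Hebrew}/ ? 'yes' : 'no';
    return "Blk=Hebrew:$blk Script=Hebrew:$script";
}

# U+0590: unassigned, in the Hebrew block, so its script is Unknown.
# U+FB1D: Hebrew script, but in the Alphabetic Presentation Forms block.
# U+05D0: HEBREW LETTER ALEF, in both.
for my $char ("\x{0590}", "\x{FB1D}", "\x{05D0}") {
    printf "U+%04X %s\n", ord($char), hebrew_membership($char);
}

# Expected output:
#   U+0590 Blk=Hebrew:yes Script=Hebrew:no
#   U+FB1D Blk=Hebrew:no Script=Hebrew:yes
#   U+05D0 Blk=Hebrew:yes Script=Hebrew:yes
```

The first two rows are the cases where a single-form name that silently resolves to the script instead of the block would give a different answer, which is why the text recommends spelling out C<Blk=>.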