path: root/pod/perldiag.pod
author: Karl Williamson <public@khwilliamson.com> 2011-01-09 13:50:18 -0700
committer: Karl Williamson <public@khwilliamson.com> 2011-01-09 19:29:02 -0700
commit: 949cf4983af707fbd15e422845f4f3df20505f97
tree: d317093ddaeba7799370f31a8ee4537edce8d090 /pod/perldiag.pod
parent: 6ee84de2b1afaa2b442cdbaa59f3cf83e3a562e1
download: perl-949cf4983af707fbd15e422845f4f3df20505f97.tar.gz
utf8.c(): Default to allow problematic code points
Surrogates, non-character code points, and code points above the Unicode range are now allowed by default; callers no longer have to pass a flag to allow them. (Most code already passed those flags anyway.) This affects uvuni_to_utf8_flags(), utf8n_to_uvuni(), and the various routines that are specialized interfaces to them. A new set of flags now disallows those code points instead. Further, all 66 non-character code points are now recognized and handled consistently, rather than just U+FFFF. Code that requires these code points to be forbidden will have to change to use the new flags. I have looked at all the (few) places on CPAN where these routines are used, and the only one that appears to need this, Encode, has already been patched to accommodate the change. Of course, I may have overlooked some subtleties.
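The message above names three classes of problematic code points that are now allowed by default, with opt-in flags to reject them. The following is a minimal C sketch of that model, assuming the standard Unicode definitions of each class (surrogates U+D800..U+DFFF, the 66 non-characters, and anything above U+10FFFF). The SKETCH_DISALLOW_* names and all helper functions are hypothetical illustrations; they are not the flags or functions that perl's utf8.c actually defines.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical opt-in flags for this sketch only; not perl's real flag names. */
enum {
    SKETCH_DISALLOW_SURROGATE = 0x1,
    SKETCH_DISALLOW_NONCHAR   = 0x2,
    SKETCH_DISALLOW_SUPER     = 0x4
};

#define UNICODE_MAX 0x10FFFFu

/* UTF-16 surrogates: U+D800..U+DFFF inclusive. */
static int is_surrogate(uint32_t cp) {
    return cp >= 0xD800u && cp <= 0xDFFFu;
}

/* Non-characters: U+FDD0..U+FDEF (32 code points), plus the last two code
   points of each of the 17 planes, U+xFFFE and U+xFFFF (34 more). */
static int is_nonchar(uint32_t cp) {
    if (cp >= 0xFDD0u && cp <= 0xFDEFu)
        return 1;
    return cp <= UNICODE_MAX && (cp & 0xFFFEu) == 0xFFFEu;
}

/* Code points above U+10FFFF are not part of Unicode at all. */
static int is_super(uint32_t cp) {
    return cp > UNICODE_MAX;
}

/* Everything is allowed unless the caller sets a flag rejecting its class. */
static int is_allowed(uint32_t cp, unsigned flags) {
    if ((flags & SKETCH_DISALLOW_SURROGATE) && is_surrogate(cp))
        return 0;
    if ((flags & SKETCH_DISALLOW_NONCHAR) && is_nonchar(cp))
        return 0;
    if ((flags & SKETCH_DISALLOW_SUPER) && is_super(cp))
        return 0;
    return 1;
}

/* Sanity check: the predicate should find exactly 66 non-characters. */
static unsigned count_nonchars(void) {
    unsigned n = 0;
    for (uint32_t cp = 0; cp <= UNICODE_MAX; cp++)
        n += (unsigned)is_nonchar(cp);
    assert(n == 66);
    return n;
}
```

With flags of 0 every code point passes, matching the new default described above; a caller that must forbid these code points ORs in the classes it wants rejected, e.g. `is_allowed(0xD800, SKETCH_DISALLOW_SURROGATE)` returns 0.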
Diffstat (limited to 'pod/perldiag.pod')
-rw-r--r-- pod/perldiag.pod 19
1 files changed, 10 insertions, 9 deletions
diff --git a/pod/perldiag.pod b/pod/perldiag.pod
index c88df90590..2c5a6377b8 100644
--- a/pod/perldiag.pod
+++ b/pod/perldiag.pod
@@ -5208,15 +5208,16 @@ C<HERE> was retained; anything to the right was discarded.
=item Unicode surrogate U+%X is illegal in UTF-8
-=item UTF-16 surrogate 0x%x
-
-(W utf8) You tried to generate half of a UTF-16 surrogate by
-requesting a Unicode character between the code points 0xD800 and
-0xDFFF (inclusive). That range is reserved exclusively for the use of
-UTF-16 encoding (by having two 16-bit UCS-2 characters); but Perl
-encodes its characters in UTF-8, so what you got is a very illegal
-character. If you really really know what you are doing you can turn off
-this warning by C<no warnings 'utf8';>.
+=item UTF-16 surrogate U+%X
+
+(W utf8) You had a UTF-16 surrogate in a context where they are
+not considered acceptable. These code points, between U+D800 and
+U+DFFF (inclusive), are used by Unicode only for UTF-16. However, Perl
+internally allows all unsigned integer code points (up to the size limit
+available on your platform), including surrogates. But these can cause
+problems when being input or output, which is likely where this message
+came from. If you really really know what you are doing you can turn
+off this warning by C<no warnings 'utf8';>.
=item Value of %s can be "0"; test with defined()