author | Karl Williamson <public@khwilliamson.com> | 2011-01-09 13:50:18 -0700 |
---|---|---|
committer | Karl Williamson <public@khwilliamson.com> | 2011-01-09 19:29:02 -0700 |
commit | 949cf4983af707fbd15e422845f4f3df20505f97 | |
tree | d317093ddaeba7799370f31a8ee4537edce8d090 /pod/perldiag.pod | |
parent | 6ee84de2b1afaa2b442cdbaa59f3cf83e3a562e1 | |
utf8.c(): Default to allow problematic code points
Surrogates, non-character code points, and code points that aren't in Unicode
(those above U+10FFFF) are now allowed by default, instead of requiring a flag
to allow them.
(Most code did specify those flags anyway.)
This affects uvuni_to_utf8_flags(), utf8n_to_uvuni() and various routines that
are specialized interfaces to them.
Now there is a new set of flags to disallow those code points. Further, all 66
non-character code points are recognized and handled consistently, instead of
just U+FFFF.
Code that requires these code points to be forbidden will have to change to use
the new flags. I have looked at all the (few) places in CPAN where these
routines are used, and the only one I found that appears to need this, Encode,
has already been patched to accommodate the change. Of course, I may have
overlooked some subtleties.
Diffstat (limited to 'pod/perldiag.pod')
-rw-r--r-- | pod/perldiag.pod | 19 |
1 files changed, 10 insertions, 9 deletions
diff --git a/pod/perldiag.pod b/pod/perldiag.pod
index c88df90590..2c5a6377b8 100644
--- a/pod/perldiag.pod
+++ b/pod/perldiag.pod
@@ -5208,15 +5208,16 @@ C<HERE> was retained; anything to the right was discarded.
 
 =item Unicode surrogate U+%X is illegal in UTF-8
 
-=item UTF-16 surrogate 0x%x
-
-(W utf8) You tried to generate half of a UTF-16 surrogate by
-requesting a Unicode character between the code points 0xD800 and
-0xDFFF (inclusive). That range is reserved exclusively for the use of
-UTF-16 encoding (by having two 16-bit UCS-2 characters); but Perl
-encodes its characters in UTF-8, so what you got is a very illegal
-character. If you really really know what you are doing you can turn off
-this warning by C<no warnings 'utf8';>.
+=item UTF-16 surrogate U+%X
+
+(W utf8) You had a UTF-16 surrogate in a context where they are
+not considered acceptable. These code points, between U+D800 and
+U+DFFF (inclusive), are used by Unicode only for UTF-16. However, Perl
+internally allows all unsigned integer code points (up to the size limit
+available on your platform), including surrogates. But these can cause
+problems when being input or output, which is likely where this message
+came from. If you really really know what you are doing you can turn
+off this warning by C<no warnings 'utf8';>.
 
 =item Value of %s can be "0"; test with defined()
 