utf8.c(): Default to allow problematic code points

Surrogates, non-character code points, and code points that aren't in Unicode are now allowed by default; previously a flag had to be specified to allow them. (Most code specified those flags anyway.) This affects uvuni_to_utf8_flags(), utf8n_to_uvuni(), and the various routines that are specialized interfaces to them. A new set of flags now disallows those code points. Further, all 66 of the non-character code points are now recognized and handled consistently, instead of just U+FFFF.

Code that requires these code points to be forbidden will have to change to use the new flags. I have looked at all the (few) instances on CPAN where these routines are used; the only one I found that appears to need this, Encode, has already been patched to accommodate the change. Of course, I may have overlooked some subtleties.
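
For readers unfamiliar with the three classes of code point named above, here is a minimal standalone sketch of how they can be told apart. It is not code from utf8.c: the helper names (cp_is_surrogate, cp_is_nonchar, cp_is_above_unicode, count_nonchars) are hypothetical and exist only for illustration. It assumes "not in Unicode" means above U+10FFFF, and it relies on the Unicode definition of non-characters: U+FDD0..U+FDEF plus the last two code points of each of the 17 planes.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helpers for illustration; not part of Perl's API. */

/* Surrogates: U+D800..U+DFFF, reserved by Unicode for UTF-16. */
static bool cp_is_surrogate(uint32_t cp)
{
    return cp >= 0xD800 && cp <= 0xDFFF;
}

/* Assumption: "not in Unicode" means above the maximum, U+10FFFF. */
static bool cp_is_above_unicode(uint32_t cp)
{
    return cp > 0x10FFFF;
}

/* Non-characters: the 32 code points U+FDD0..U+FDEF, plus the last
 * two code points (xxFFFE and xxFFFF) of each of the 17 planes. */
static bool cp_is_nonchar(uint32_t cp)
{
    if (cp_is_above_unicode(cp))
        return false;
    if (cp >= 0xFDD0 && cp <= 0xFDEF)
        return true;
    return (cp & 0xFFFE) == 0xFFFE;
}

/* Count every non-character in the Unicode range by brute force. */
static unsigned count_nonchars(void)
{
    unsigned n = 0;
    for (uint32_t cp = 0; cp <= 0x10FFFF; cp++) {
        if (cp_is_nonchar(cp))
            n++;
    }
    return n;
}
```

Under these definitions count_nonchars() arrives at the 66 mentioned in the message (32 in the U+FDD0 block plus 2 for each of 17 planes), and U+FFFF is just one member of that set.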
author: Karl Williamson <public@khwilliamson.com> 2011-01-09 13:50:18 -0700
committer: Karl Williamson <public@khwilliamson.com> 2011-01-09 19:29:02 -0700
commit: 949cf4983af707fbd15e422845f4f3df20505f97 (patch)
tree: d317093ddaeba7799370f31a8ee4537edce8d090 /pod/perldiag.pod
parent: 6ee84de2b1afaa2b442cdbaa59f3cf83e3a562e1 (diff)
download: perl-949cf4983af707fbd15e422845f4f3df20505f97.tar.gz
1 files changed, 10 insertions, 9 deletions
diff --git a/pod/perldiag.pod b/pod/perldiag.pod
index c88df90590..2c5a6377b8 100644
--- a/pod/perldiag.pod
+++ b/pod/perldiag.pod
@@ -5208,15 +5208,16 @@ C<HERE> was retained; anything to the right was discarded.
 
 =item Unicode surrogate U+%X is illegal in UTF-8
 
-=item UTF-16 surrogate 0x%x
-
-(W utf8) You tried to generate half of a UTF-16 surrogate by
-requesting a Unicode character between the code points 0xD800 and
-0xDFFF (inclusive).  That range is reserved exclusively for the use of
-UTF-16 encoding (by having two 16-bit UCS-2 characters); but Perl
-encodes its characters in UTF-8, so what you got is a very illegal
-character.  If you really really know what you are doing you can turn off
-this warning by C<no warnings 'utf8';>.
+=item UTF-16 surrogate U+%X
+
+(W utf8) You had a UTF-16 surrogate in a context where they are
+not considered acceptable.  These code points, between U+D800 and
+U+DFFF (inclusive), are used by Unicode only for UTF-16.  However, Perl
+internally allows all unsigned integer code points (up to the size limit
+available on your platform), including surrogates.  But these can cause
+problems when being input or output, which is likely where this message
+came from.  If you really really know what you are doing you can turn
+off this warning by C<no warnings 'utf8';>.
 
 =item Value of %s can be "0"; test with defined()