From ba210ebec161cde003bc967e8e460c72f71fb70c Mon Sep 17 00:00:00 2001
From: Jarkko Hietaniemi <jhi@iki.fi>
Date: Tue, 24 Oct 2000 02:55:33 +0000
Subject: Make the UTF-8 decoding stricter and more verbose when malformation
 happens.  This involved adding an argument to utf8_to_uv_chk(), which
 involved changing its prototype, and prefer STRLEN over I32 for the UTF-8
 length, which as a domino effect necessitated changing the prototypes of
 scan_bin(), scan_oct(), scan_hex(), and reg_uni(). The stricter UTF-8
 decoding checking uses Markus Kuhn's UTF-8 Decode Stress Tester from
 http://www.cl.cam.ac.uk/~mgk25/ucs/examples/UTF-8-test.txt

p4raw-id: //depot/perl@7416
---
 pod/perlunicode.pod | 5 +++++
 1 file changed, 5 insertions(+)

(limited to 'pod/perlunicode.pod')
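
Reviewer note (illustrative only, not part of the commit): the POD
paragraph added below says that 0xFFFF and the bytes 0xFE and 0xFF start
to cause warnings. The sketch that follows shows one way a strict decoder
could tell those cases apart. It is not Perl's utf8_to_uv_chk(): the names
sketch_status and sketch_decode are hypothetical, and it assumes the
modern four-byte limit on UTF-8, so it rejects every lead byte from 0xF8
upward, not just 0xFE and 0xFF.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical result codes for this sketch (not Perl's API). */
enum sketch_status { SKETCH_OK, SKETCH_MALFORMED, SKETCH_NONCHAR };

/* Decode one UTF-8 sequence starting at s, with len bytes available.
 * On SKETCH_OK or SKETCH_NONCHAR, *cp is the code point and *used the
 * number of bytes consumed. On SKETCH_MALFORMED both are left at 0. */
enum sketch_status sketch_decode(const unsigned char *s, size_t len,
                                 uint32_t *cp, size_t *used)
{
    size_t need, i;
    uint32_t v, min;

    *cp = 0;
    *used = 0;
    if (len == 0)
        return SKETCH_MALFORMED;

    if (s[0] < 0x80)      { need = 0; v = s[0];        min = 0; }
    else if (s[0] < 0xC0) { return SKETCH_MALFORMED; } /* stray continuation */
    else if (s[0] < 0xE0) { need = 1; v = s[0] & 0x1F; min = 0x80; }
    else if (s[0] < 0xF0) { need = 2; v = s[0] & 0x0F; min = 0x800; }
    else if (s[0] < 0xF8) { need = 3; v = s[0] & 0x07; min = 0x10000; }
    else                  { return SKETCH_MALFORMED; } /* includes 0xFE, 0xFF */

    if (len < need + 1)
        return SKETCH_MALFORMED;            /* truncated sequence */

    for (i = 1; i <= need; i++) {
        if ((s[i] & 0xC0) != 0x80)
            return SKETCH_MALFORMED;        /* not a continuation byte */
        v = (v << 6) | (uint32_t)(s[i] & 0x3F);
    }

    if (v < min)
        return SKETCH_MALFORMED;            /* overlong encoding */
    if (v > 0x10FFFF || (v >= 0xD800 && v <= 0xDFFF))
        return SKETCH_MALFORMED;            /* out of range or surrogate */

    *cp = v;
    *used = need + 1;
    /* Well-formed bytes, but the character this patch singles out. */
    return (v == 0xFFFF) ? SKETCH_NONCHAR : SKETCH_OK;
}
```

With this sketch, the three bytes 0xEF 0xBF 0xBF decode to 0xFFFF and are
reported as SKETCH_NONCHAR, while a lone 0xFE or 0xFF is reported as
SKETCH_MALFORMED. A caller would decide whether either case warns.
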
diff --git a/pod/perlunicode.pod b/pod/perlunicode.pod
index c9954d8e96..145c953099 100644
--- a/pod/perlunicode.pod
+++ b/pod/perlunicode.pod
@@ -71,6 +71,11 @@ on Windows.
 Regardless of the above, the C<bytes> pragma can always be used to force
 byte semantics in a particular lexical scope.  See L<bytes>.
 
+One effect of the C<utf8> pragma is that the internal UTF-8 decoding
+becomes stricter: the character 0xFFFF (UTF-8 bytes 0xEF 0xBF 0xBF)
+and the bytes 0xFE and 0xFF, which never occur in well-formed UTF-8,
+cause warnings if they appear in the data.
+
 The C<utf8> pragma is primarily a compatibility device that enables
 recognition of UTF-8 in literals encountered by the parser.  It may also
 be used for enabling some of the more experimental Unicode support features.
-- 
cgit v1.2.1