perlreguts: Bring up-to-date

Various changes have been made to regcomp.c that didn't make it into perlreguts until now.
author: Karl Williamson <public@khwilliamson.com> 2013-09-12 19:42:51 -0600
committer: Karl Williamson <public@khwilliamson.com> 2013-09-24 11:36:11 -0600
commit: c8849eb1c2c14b1c9a128a8f8a696ae1eac43f63 (patch)
tree: 90c3275ec01c423695636c0b5883dde3eeee3567 /pod/perlreguts.pod
parent: bd650281b855f8970df792fc7c58cffd5bfe272e (diff)
download: perl-c8849eb1c2c14b1c9a128a8f8a696ae1eac43f63.tar.gz
1 files changed, 12 insertions, 36 deletions
diff --git a/pod/perlreguts.pod b/pod/perlreguts.pod
index 039b48c82d..d93c799ce6 100644
--- a/pod/perlreguts.pod
+++ b/pod/perlreguts.pod
@@ -168,23 +168,29 @@ multiple of four bytes:
 
 =item C<regnode_charclass>
 
-Bracketed character classes are represented by C<regnode_charclass> structures,
-which have a four-byte argument and then a 32-byte (256-bit) bitmap
-indicating which characters are included in the class.
+Bracketed character classes are represented by C<regnode_charclass>
+structures, which have a four-byte argument and then a 32-byte (256-bit)
+bitmap indicating which characters in the Latin1 range are included in
+the class.
 
     regnode_charclass        U32 arg1;
                              char bitmap[ANYOF_BITMAP_SIZE];
 
+Various flags whose names begin with C<ANYOF_> are used for special
+situations.  Above Latin1 matches and things not known until run-time
+are stored in L</Perl's pprivate structure>.
+
 =item C<regnode_charclass_class>
 
 There is also a larger form of a char class structure used to represent
-POSIX char classes called C<regnode_charclass_class> which has an
-additional 4-byte (32-bit) bitmap indicating which POSIX char classes
+POSIX char classes under C</l> matching,
+called C<regnode_charclass_class> which has an
+additional 32-bit bitmap indicating which POSIX char classes
 have been included.
 
    regnode_charclass_class  U32 arg1;
                             char bitmap[ANYOF_BITMAP_SIZE];
-                            char classflags[ANYOF_CLASSBITMAP_SIZE];
+                            U32 classflags;
 
 =back
 
@@ -761,36 +767,6 @@ Care must be taken when making changes to make sure that you handle
 UTF-8 properly, both at compile time and at execution time, including
 when the string and pattern are mismatched.
 
-The following comment in F<regcomp.h> gives an example of exactly how
-tricky this can be:
-
-    Two problematic code points in Unicode casefolding of EXACT nodes:
-
-    U+0390 - GREEK SMALL LETTER IOTA WITH DIALYTIKA AND TONOS
-    U+03B0 - GREEK SMALL LETTER UPSILON WITH DIALYTIKA AND TONOS
-
-    which casefold to
-
-    Unicode                      UTF-8
-
-    U+03B9 U+0308 U+0301         0xCE 0xB9 0xCC 0x88 0xCC 0x81
-    U+03C5 U+0308 U+0301         0xCF 0x85 0xCC 0x88 0xCC 0x81
-
-    This means that in case-insensitive matching (or "loose matching",
-    as Unicode calls it), an EXACTF of length six (the UTF-8 encoded
-    byte length of the above casefolded versions) can match a target
-    string of length two (the byte length of UTF-8 encoded U+0390 or
-    U+03B0). This would rather mess up the minimum length computation.
-
-    What we'll do is to look for the tail four bytes, and then peek
-    at the preceding two bytes to see whether we need to decrease
-    the minimum length by four (six minus two).
-
-    Thanks to the design of UTF-8, there cannot be false matches:
-    A sequence of valid UTF-8 bytes cannot be a subsequence of
-    another valid sequence of UTF-8 bytes.
-
-
 =head2 Base Structures
 
 The C<regexp> structure described in L<perlreapi> is common to all
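
Note on the patched text: the two layouts it documents can be sketched in C to show how a 32-byte bitmap covers the 256 Latin1-range code points and how a single U32 can hold one bit per POSIX class. This is only an illustration under stated assumptions. Every sketch_ and SKETCH_ name is hypothetical, the structs model only the fields listed in the POD, and the bit order within a byte and the class bit positions are assumptions rather than what regcomp.h defines.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* 32 bytes = 256 bits, one bit per code point 0..255 (the Latin1 range). */
#define SKETCH_BITMAP_SIZE 32

/* Hypothetical stand-ins for the documented layouts; not perl's own types. */
struct sketch_charclass {
    uint32_t arg1;
    char     bitmap[SKETCH_BITMAP_SIZE];
};

struct sketch_charclass_class {
    uint32_t arg1;
    char     bitmap[SKETCH_BITMAP_SIZE];
    uint32_t classflags;            /* one bit per POSIX class, for /l matching */
};

/* Hypothetical class bit positions, chosen only for this sketch. */
enum { SKETCH_CLASS_ALPHA = 0, SKETCH_CLASS_DIGIT = 1 };

/* Mark code point c as a member (assumes least-significant-bit-first order). */
static void sketch_bitmap_set(char *bitmap, unsigned char c)
{
    ((unsigned char *)bitmap)[c >> 3] |= (unsigned char)(1u << (c & 7));
}

/* Return 1 if code point c is a member, 0 otherwise. */
static int sketch_bitmap_test(const char *bitmap, unsigned char c)
{
    return (((const unsigned char *)bitmap)[c >> 3] >> (c & 7)) & 1;
}

/* Build something like [a-c\xE9[:digit:]] and return the number of failed checks. */
static int sketch_demo(void)
{
    struct sketch_charclass_class node;
    int failures = 0;

    memset(&node, 0, sizeof node);
    sketch_bitmap_set(node.bitmap, 'a');
    sketch_bitmap_set(node.bitmap, 'b');
    sketch_bitmap_set(node.bitmap, 'c');
    sketch_bitmap_set(node.bitmap, 0xE9);           /* a code point above ASCII */
    node.classflags |= (uint32_t)1 << SKETCH_CLASS_DIGIT;

    failures += !sketch_bitmap_test(node.bitmap, 'b');
    failures += sketch_bitmap_test(node.bitmap, 'd');
    failures += !sketch_bitmap_test(node.bitmap, 0xE9);
    failures += !((node.classflags >> SKETCH_CLASS_DIGIT) & 1);
    failures += (int)((node.classflags >> SKETCH_CLASS_ALPHA) & 1);
    return failures;
}
```

Calling sketch_demo() should return 0 if the set and test helpers agree with each other. Anything that cannot be expressed as a bit in the 0..255 range, such as above-Latin1 matches or things not known until run-time, falls outside this sketch, as the patched text notes.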