1 files changed, 28 insertions, 7 deletions
diff --git a/ext/pcre/pcrelib/doc/Tech.Notes b/ext/pcre/pcrelib/doc/Tech.Notes
index dd01932f8d..73c31c7ca1 100644
--- a/ext/pcre/pcrelib/doc/Tech.Notes
+++ b/ext/pcre/pcrelib/doc/Tech.Notes
@@ -48,7 +48,9 @@ These items are all just one byte long
 
   OP_END                 end of pattern
   OP_ANY                 match any character
+  OP_ANYBYTE             match any single byte, even in UTF-8 mode
   OP_SOD                 match start of data: \A
+  OP_SOM                 start of match (subject + offset): \G
   OP_CIRC                ^ (start of data, or after \n in multiline)
   OP_NOT_WORD_BOUNDARY   \W
   OP_WORD_BOUNDARY       \w
@@ -61,7 +63,6 @@ These items are all just one byte long
   OP_EODN                match end of data or \n at end: \Z
   OP_EOD                 match end of data: \z
   OP_DOLL                $ (end of data, or before \n in multiline)
-  OP_RECURSE             match the pattern recursively
 
 
 Repeating single characters
@@ -119,8 +120,7 @@ instances of OP_CHARS are used.
 Character classes
 -----------------
 
-When characters less than 256 are involved, OP_CLASS is used for a character
-class. If there is only one character, OP_CHARS is used for a positive class,
+If a class contains only one character, OP_CHARS is used for a positive class,
 and OP_NOT for a negative one (that is, for something like [^a]). However, in 
 UTF-8 mode, this applies only to characters with values < 128, because OP_NOT 
 is confined to single bytes.
@@ -129,9 +129,15 @@ Another set of repeating opcodes (OP_NOTSTAR etc.) are used for a repeated,
 negated, single-character class. The normal ones (OP_STAR etc.) are used for a
 repeated positive single-character class.
 
-OP_CLASS is followed by a 32-byte bit map containing a 1 bit for every
-character that is acceptable. The bits are counted from the least significant
-end of each byte.
+When there's more than one character in a class and all the characters are less
+than 256, OP_CLASS is used for a positive class, and OP_NCLASS for a negative 
+one. In either case, the opcode is followed by a 32-byte bit map containing a 1
+bit for every character that is acceptable. The bits are counted from the least
+significant end of each byte.
+
+OP_CLASS and OP_NCLASS are distinct so that, in UTF-8 mode, subject characters
+with values greater than 255, which lie outside the bit map, are handled
+correctly: they never match OP_CLASS and always match OP_NCLASS.
 
 For classes containing characters with values > 255, OP_XCLASS is used. It
 optionally uses a bit map (if any characters lie within it), followed by a list
@@ -243,6 +249,21 @@ same scheme is used, with a "reference number" of 0xffff. Otherwise, a
 conditional subpattern always starts with one of the assertions.
 
 
+Recursion
+---------
+
+Recursion either matches the current regex, or some subexpression. The opcode
+OP_RECURSE is followed by a value which is the offset to the starting bracket
+from the start of the whole pattern.
+
+
+Callout
+-------
+
+OP_CALLOUT is followed by one byte of data that holds a callout number in the 
+range 0 to 255.
+
+
 Changing options
 ----------------
 
@@ -257,4 +278,4 @@ at compile time, and so does not cause anything to be put into the compiled
 data.
 
 Philip Hazel
-August 2002
+August 2003
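
Reviewer's sketch (not part of the patch): the class bit map described in the
"Character classes" hunk can be illustrated roughly as follows. This is a
minimal sketch under stated assumptions, not PCRE's actual matching code. The
helper names class_map_set, class_map_test and class_matches are hypothetical,
and the "negated" flag merely stands in for the OP_CLASS / OP_NCLASS
distinction. As the text says, the map holds a 1 bit for every acceptable
character in either case, so only characters above 255 need the flag.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helpers for illustration only; these are not PCRE functions. */

/* Mark character c (0..255) as acceptable. Bits are counted from the
   least significant end of each byte, as the notes describe. */
static void class_map_set(uint8_t map[32], unsigned c)
{
    map[c / 8] |= (uint8_t)(1u << (c % 8));
}

/* Return 1 if character c (0..255) has its bit set in the map, else 0. */
static int class_map_test(const uint8_t map[32], unsigned c)
{
    return (map[c / 8] >> (c % 8)) & 1;
}

/* negated == 0 models OP_CLASS, negated != 0 models OP_NCLASS.
   Characters above 255 lie outside the 32-byte map, so the opcode
   alone decides: no match for OP_CLASS, match for OP_NCLASS. */
static int class_matches(const uint8_t map[32], int negated, unsigned c)
{
    if (c > 255)
        return negated ? 1 : 0;
    return class_map_test(map, c);
}
```

With a map built for [ab], class_matches(map, 0, 'a') yields 1, while a
subject character such as U+0100 yields 0 under OP_CLASS and 1 under
OP_NCLASS.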