author: Karl Williamson <public@khwilliamson.com> 2011-05-29 09:58:22 -0600
committer: Karl Williamson <public@khwilliamson.com> 2011-07-03 14:05:46 -0600
commit: fa2d2a23f92127e5309991ee009d1b6291408da8
tree: dfb14b1e5a2730720e498e0bc7780fe683f5409c /regcomp.c
parent: 874816bcbdc3fcae69cb12fbf36a8479cd9e5872
regcomp.c: Add comments
Diffstat (limited to 'regcomp.c')
-rw-r--r-- regcomp.c 36
1 files changed, 29 insertions, 7 deletions
diff --git a/regcomp.c b/regcomp.c
index 24f1af3be8..6ed7cc8aa7 100644
--- a/regcomp.c
+++ b/regcomp.c
@@ -5824,19 +5824,41 @@ S_reg_scan_name(pTHX_ RExC_state_t *pRExC_state, U32 flags)
/* This section of code defines the inversion list object and its methods. The
* interfaces are highly subject to change, so as much as possible is static to
- * this file. An inversion list is here implemented as a malloc'd C array with
- * some added info that is placed as UVs at the beginning in a header portion.
+ * this file. An inversion list is here implemented as a malloc'd C UV array
+ * with some added info that is placed as UVs at the beginning in a header
+ * portion. An inversion list for Unicode is an array of code points, sorted
+ * by ordinal number. The zeroth element is the first code point in the list.
+ * The 1th element is the first element beyond that not in the list. In other
+ * words, the first range is
+ * invlist[0]..(invlist[1]-1)
+ * The other ranges follow. Thus every element that is divisible by two marks
+ * the beginning of a range that is in the list, and every element not
+ * divisible by two marks the beginning of a range not in the list. A single
+ * element inversion list that contains the single code point N generally
+ * consists of two elements
+ * invlist[0] == N
+ * invlist[1] == N+1
+ * (The exception is when N is the highest representable value on the
+ * machine, in which case the list containing just it would be a single
+ * element, itself. By extension, if the last range in the list extends to
+ * infinity, then the first element of that range will be in the inversion list
+ * at a position that is divisible by two, and is the final element in the
+ * list.)
+ * More about inversion lists can be found in "Unicode Demystified"
+ * Chapter 13 by Richard Gillam, published by Addison-Wesley.
* More will be coming when functionality is added later.
*
- * It is currently implemented as an SV pointing to an array of UVs that the SV
- * thinks are bytes. This allows us to have an array of UV whose memory
- * management is automatically handled by the existing facilities for SV's.
+ * The inversion list data structure is currently implemented as an SV pointing
+ * to an array of UVs that the SV thinks are bytes. This allows us to have an
+ * array of UV whose memory management is automatically handled by the existing
+ * facilities for SV's.
*
* Some of the methods should always be private to the implementation, and some
* should eventually be made public */
-#define INVLIST_LEN_OFFSET 0
-#define INVLIST_ITER_OFFSET 1
+#define INVLIST_LEN_OFFSET 0 /* Number of elements in the inversion list */
+#define INVLIST_ITER_OFFSET 1 /* Current iteration position */
+
#define HEADER_LENGTH (INVLIST_ITER_OFFSET + 1)
/* Internally things are UVs */
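
The range layout described in the comment above (even-indexed elements start ranges that are in the list, odd-indexed elements start ranges that are not) can be illustrated with a membership test. This is only a minimal sketch and is not code from regcomp.c: the function name invlist_contains is hypothetical, uint64_t stands in for Perl's UV, and the array passed in is assumed to hold just the list elements, with the header portion already skipped.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not part of regcomp.c.
 * invlist points at the sorted list elements (header assumed skipped),
 * len is the number of elements, and cp is the code point to test.
 * uint64_t is an assumed stand-in for Perl's UV type. */
static bool
invlist_contains(const uint64_t *invlist, size_t len, uint64_t cp)
{
    size_t lo = 0;
    size_t hi = len;

    /* Binary search for the number of elements that are <= cp. */
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;

        if (invlist[mid] <= cp) {
            lo = mid + 1;
        }
        else {
            hi = mid;
        }
    }

    /* lo is now the count of elements <= cp.  If it is odd, the last such
     * element sits at an even index and so begins a range that is in the
     * list.  If it is even (including zero), cp falls in a gap. */
    return (lo & 1) != 0;
}
```

With the two-element list { N, N+1 } from the comment, the test is true only for N. With a list whose final element sits at an even index, every code point from that element upward tests as present, which matches the "extends to infinity" case.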