summaryrefslogtreecommitdiff
path: root/regcomp.h
diff options
context:
space:
mode:
authorKarl Williamson <khw@cpan.org>2015-12-21 13:37:20 -0700
committerKarl Williamson <khw@cpan.org>2015-12-22 13:16:51 -0700
commite2f5e63d76a696ba4c441146b555fcbb3ac3c077 (patch)
treeef23063af35f565e595162e9053c36cd91a07ac8 /regcomp.h
parent6f2a89df6e4e9fc861c86650325fb46a163e4cd0 (diff)
downloadperl-e2f5e63d76a696ba4c441146b555fcbb3ac3c077.tar.gz
regcomp.h: reword some comments
Diffstat (limited to 'regcomp.h')
-rw-r--r--regcomp.h65
1 files changed, 32 insertions, 33 deletions
diff --git a/regcomp.h b/regcomp.h
index 5c12a214db..a8955f3b95 100644
--- a/regcomp.h
+++ b/regcomp.h
@@ -403,7 +403,7 @@ struct regnode_ssc {
* 2) A subset of item 1) is if all possible code points outside the bitmap
* match. This is a common occurrence when the class is complemented,
* like /[^ij]/. Therefore a bit is reserved to indicate this,
- * ANYOF_MATCHES_ALL_ABOVE_BITMAP. If it became necessary, this bit could
+ * ANYOF_MATCHES_ALL_ABOVE_BITMAP. If it became necessary, this flag could
* be replaced by using the normal swash mechanism, but with a performance
* penalty.
* 3) Under /d rules, it can happen that code points that are in the upper
@@ -420,7 +420,7 @@ struct regnode_ssc {
* A swash could be created for this case, but this is relatively common,
* and it turns out that it's all or nothing: if any one of these code
* points matches, they all do. Hence a single bit suffices. We use a
- * shared bit that doesn't take up space by itself:
+ * shared flag that doesn't take up space by itself:
* ANYOF_SHARED_d_MATCHES_ALL_NON_UTF8_NON_ASCII_non_d_WARN_SUPER.
* This also implies 1), with one exception: [:^cntrl:].
* 5) A user-defined \p{} property may not have been defined by the time the
@@ -433,11 +433,11 @@ struct regnode_ssc {
* is a better way to accomplish what this feature does. This case also
* implies 1).
* ANYOF_SHARED_d_UPPER_LATIN1_UTF8_STRING_MATCHES_non_d_RUNTIME_USER_PROP
- * is the shared bit.
+ * is the shared flag.
* 6) /[foo]/il may have folds that are only valid if the runtime locale is a
* UTF-8 one. These are quite rare, so it would be good to avoid the
* expense of looking for them. But /l matching is slow anyway, and we've
- * traditionally not worried to much about its performance. And this
+ * traditionally not worried too much about its performance. And this
* condition requires the ANYOF_LOC_FOLD flag to be set, so testing for
* that flag would be sufficient to rule out most cases of this. So it is
* unclear if this should have a flag or not. But, one is currently
@@ -445,13 +445,13 @@ struct regnode_ssc {
* text below indicates how to share it, should another bit be needed).
*
* At the moment, there are no spare bits, but this could be changed by various
- * tricks. Notice that item 6) is not independent of the ANYOF_LOC_FOLD flag
- * below. Also, the ANYOF_LOC_REQ_UTF8 flag is set only if both these aren't.
- * We can therefore use a 2-bit field to represent these 3 flags, as follows:
- * 00 => ANYOF_LOC_REQ_UTF8
- * 01 => no folding
- * 10 => ANYOF_LOC_FOLD alone
- * 11 => ANYOF_ONLY_UTF8_LOC_FOLD_MATCHES
+ * tricks.
+ *
+ * Note that item ANYOF_ONLY_UTF8_LOC_FOLD_MATCHES is not independent of the
+ * ANYOF_LOC_FOLD flag below. Also, the ANYOF_LOC_REQ_UTF8 flag is set only if
+ * both these aren't. We can therefore share ANYOF_ONLY_UTF8_LOC_FOLD_MATCHES
+ * with ANYOF_LOC_REQ_UTF8, so what the shared flag means depends on the
+ * ANYOF_LOC_FOLD flag.
*
* Beyond that, note that the information may be conveyed by creating new
* regnode types. This is not the best solution, as shown later in this
@@ -459,31 +459,30 @@ struct regnode_ssc {
* for ANYOF_INVERT, for example. A complication of this is that the regexec.c
* REGINCLASS macro assumes that it can just use the bitmap if no flags are
* set. This would have to be changed to add extra tests for the node type, or
- * a special bit reserved that means unspecified special handling, and then the
+ * a special flag reserved that means unspecified special handling, and then the
* node-type would be used internally to sort that out. So we could gain a bit
- * by having an ANYOF_SPECIAL bit, and a node type for INVERT, and another for
+ * by having an ANYOF_SPECIAL flag, and a node type for INVERT, and another for
* POSIXL, and still another for INVERT_POSIXL. This example illustrates one
* problem with this, a combinatorial explosion of node types. The one node
* type khw can think of that doesn't have this explosion issue is
- * ANYOF_LOC_REQ_UTF8, but you'd do this only if you haven't done the 2-bit
- * field trick above. This bit is a natural candidate for being a separate
+ * ANYOF_LOC_REQ_UTF8. This flag is a natural candidate for being a separate
* node type because it is a specialization of the current ANYOFL, and because
- * no other ANYOFL-only bits are set when it is; also most of its uses are
+ * no other ANYOFL-only flags are set when it is; also most of its uses are
* actually outside the reginclass() function, so this could be done with no
- * performance penalty. But again, the 2-bit field trick combines this bit so
- * it doesn't take up space anyway. Another issue when turning a bit into a
- * node type, is that a SSC may use that bit -- not just a regular ANYOF[DL]?.
- * In the case of ANYOF_LOC_REQ_UTF8, the only likely problem is accurately
- * settting the SSC node-type to the new one, which would likely involve
- * S_ssc_or and S_ssc_and, and not how the SSC currently gets set to ANYOFL.
+ * performance penalty. But since it can be shared, as noted above, it doesn't
+ * take up space anyway. Another issue when turning a flag into a node type, is
+ * that a SSC may use that flag -- not just a regular ANYOF[DL]?. In the case
+ * of ANYOF_LOC_REQ_UTF8, the only likely problem is accurately settting the
+ * SSC node-type to the new one, which would likely involve S_ssc_or and
+ * S_ssc_and, and not how the SSC currently gets set to ANYOFL.
*
- * Another possibility is to instead rename the ANYOF_POSIXL bit to be
+ * Another possibility is to instead rename the ANYOF_POSIXL flag to be
* ANYOFL_LARGE, to mean that the ANYOF node has an extra 32 bits beyond what a
* regular one does. That's what it effectively means now, with the extra
- * space all for the POSIX class bits. But those classes actually only occupy
- * 30 bits, so the 2-bit field or 2 of the locale bits could be moved to that
- * extra space. The downside of this is that ANYOFL nodes with whichever of
- * the bits get moved would have to have the extra space always allocated.
+ * space all for the POSIX class flags. But those classes actually only occupy
+ * 30 bits, so 2 of the locale flags could be moved to that extra space. The
+ * downside of this is that ANYOFL nodes with whichever of the flags get moved
+ * would have to have the extra space always allocated.
*
* One could completely remove ANYOFL_LARGE and make all ANYOFL nodes large.
* The 30 bits in the extra word would indicate if a posix class should be
@@ -491,12 +490,12 @@ struct regnode_ssc {
* anyway, and the SSC could be set to this node type. REGINCLASS would have
* to be modified so that if the node type were this, it would call
* reginclass(), as the flag bit that indicates to do this now would be gone.
- * If the 2-bit field is used and moved to the larger structure, this would
- * free up a total of 4 bits. If this were done, we could create an
- * ANYOF_INVERT node-type without a combinatorial explosion, getting us to 5
- * bits. And, keep in mind that ANYOF_MATCHES_ALL_ABOVE_BITMAP is solely for
- * performance, so could be removed. The other performance-related bits are
- * shareable with bits that are required.
+ * If 2 locale flags are moved to the larger structure, this would free up a
+ * total of 4 bits. If this were done, we could create an ANYOF_INVERT
+ * node-type without a combinatorial explosion, getting us to 5 bits. And,
+ * keep in mind that ANYOF_MATCHES_ALL_ABOVE_BITMAP is solely for performance,
+ * so could be removed. The other performance-related flags are shareable with
+ * flags that are required.
*
* Several flags are not used in synthetic start class (SSC) nodes, so could be
* shared should new flags be needed for SSCs, like SSC_MATCHES_EMPTY_STRING