diff options
author | Karl Williamson <khw@cpan.org> | 2015-12-21 13:37:20 -0700 |
---|---|---|
committer | Karl Williamson <khw@cpan.org> | 2015-12-22 13:16:51 -0700 |
commit | e2f5e63d76a696ba4c441146b555fcbb3ac3c077 (patch) | |
tree | ef23063af35f565e595162e9053c36cd91a07ac8 /regcomp.h | |
parent | 6f2a89df6e4e9fc861c86650325fb46a163e4cd0 (diff) | |
download | perl-e2f5e63d76a696ba4c441146b555fcbb3ac3c077.tar.gz |
regcomp.h: reword some comments
Diffstat (limited to 'regcomp.h')
-rw-r--r-- | regcomp.h | 65 |
1 files changed, 32 insertions, 33 deletions
@@ -403,7 +403,7 @@ struct regnode_ssc { * 2) A subset of item 1) is if all possible code points outside the bitmap * match. This is a common occurrence when the class is complemented, * like /[^ij]/. Therefore a bit is reserved to indicate this, - * ANYOF_MATCHES_ALL_ABOVE_BITMAP. If it became necessary, this bit could + * ANYOF_MATCHES_ALL_ABOVE_BITMAP. If it became necessary, this flag could * be replaced by using the normal swash mechanism, but with a performance * penalty. * 3) Under /d rules, it can happen that code points that are in the upper @@ -420,7 +420,7 @@ struct regnode_ssc { * A swash could be created for this case, but this is relatively common, * and it turns out that it's all or nothing: if any one of these code * points matches, they all do. Hence a single bit suffices. We use a - * shared bit that doesn't take up space by itself: + * shared flag that doesn't take up space by itself: * ANYOF_SHARED_d_MATCHES_ALL_NON_UTF8_NON_ASCII_non_d_WARN_SUPER. * This also implies 1), with one exception: [:^cntrl:]. * 5) A user-defined \p{} property may not have been defined by the time the @@ -433,11 +433,11 @@ struct regnode_ssc { * is a better way to accomplish what this feature does. This case also * implies 1). * ANYOF_SHARED_d_UPPER_LATIN1_UTF8_STRING_MATCHES_non_d_RUNTIME_USER_PROP - * is the shared bit. + * is the shared flag. * 6) /[foo]/il may have folds that are only valid if the runtime locale is a * UTF-8 one. These are quite rare, so it would be good to avoid the * expense of looking for them. But /l matching is slow anyway, and we've - * traditionally not worried to much about its performance. And this + * traditionally not worried too much about its performance. And this * condition requires the ANYOF_LOC_FOLD flag to be set, so testing for * that flag would be sufficient to rule out most cases of this. So it is * unclear if this should have a flag or not. But, one is currently @@ -445,13 +445,13 @@ struct regnode_ssc { * text below indicates how to share it, should another bit be needed). * * At the moment, there are no spare bits, but this could be changed by various - * tricks. Notice that item 6) is not independent of the ANYOF_LOC_FOLD flag - * below. Also, the ANYOF_LOC_REQ_UTF8 flag is set only if both these aren't. - * We can therefore use a 2-bit field to represent these 3 flags, as follows: - * 00 => ANYOF_LOC_REQ_UTF8 - * 01 => no folding - * 10 => ANYOF_LOC_FOLD alone - * 11 => ANYOF_ONLY_UTF8_LOC_FOLD_MATCHES + * tricks. + * + * Note that item ANYOF_ONLY_UTF8_LOC_FOLD_MATCHES is not independent of the + * ANYOF_LOC_FOLD flag below. Also, the ANYOF_LOC_REQ_UTF8 flag is set only if + * both these aren't. We can therefore share ANYOF_ONLY_UTF8_LOC_FOLD_MATCHES + * with ANYOF_LOC_REQ_UTF8, so what the shared flag means depends on the + * ANYOF_LOC_FOLD flag. * * Beyond that, note that the information may be conveyed by creating new * regnode types. This is not the best solution, as shown later in this @@ -459,31 +459,30 @@ struct regnode_ssc { * for ANYOF_INVERT, for example. A complication of this is that the regexec.c * REGINCLASS macro assumes that it can just use the bitmap if no flags are * set. This would have to be changed to add extra tests for the node type, or - * a special bit reserved that means unspecified special handling, and then the + * a special flag reserved that means unspecified special handling, and then the * node-type would be used internally to sort that out. So we could gain a bit - * by having an ANYOF_SPECIAL bit, and a node type for INVERT, and another for + * by having an ANYOF_SPECIAL flag, and a node type for INVERT, and another for * POSIXL, and still another for INVERT_POSIXL. This example illustrates one * problem with this, a combinatorial explosion of node types. The one node * type khw can think of that doesn't have this explosion issue is - * ANYOF_LOC_REQ_UTF8, but you'd do this only if you haven't done the 2-bit - * field trick above. This bit is a natural candidate for being a separate + * ANYOF_LOC_REQ_UTF8. This flag is a natural candidate for being a separate * node type because it is a specialization of the current ANYOFL, and because - * no other ANYOFL-only bits are set when it is; also most of its uses are + * no other ANYOFL-only flags are set when it is; also most of its uses are * actually outside the reginclass() function, so this could be done with no - * performance penalty. But again, the 2-bit field trick combines this bit so - * it doesn't take up space anyway. Another issue when turning a bit into a - * node type, is that a SSC may use that bit -- not just a regular ANYOF[DL]?. - * In the case of ANYOF_LOC_REQ_UTF8, the only likely problem is accurately - * settting the SSC node-type to the new one, which would likely involve - * S_ssc_or and S_ssc_and, and not how the SSC currently gets set to ANYOFL. + * performance penalty. But since it can be shared, as noted above, it doesn't + * take up space anyway. Another issue when turning a flag into a node type, is + * that a SSC may use that flag -- not just a regular ANYOF[DL]?. In the case + * of ANYOF_LOC_REQ_UTF8, the only likely problem is accurately settting the + * SSC node-type to the new one, which would likely involve S_ssc_or and + * S_ssc_and, and not how the SSC currently gets set to ANYOFL. * - * Another possibility is to instead rename the ANYOF_POSIXL bit to be + * Another possibility is to instead rename the ANYOF_POSIXL flag to be * ANYOFL_LARGE, to mean that the ANYOF node has an extra 32 bits beyond what a * regular one does. That's what it effectively means now, with the extra - * space all for the POSIX class bits. But those classes actually only occupy - * 30 bits, so the 2-bit field or 2 of the locale bits could be moved to that - * extra space. The downside of this is that ANYOFL nodes with whichever of - * the bits get moved would have to have the extra space always allocated. + * space all for the POSIX class flags. But those classes actually only occupy + * 30 bits, so 2 of the locale flags could be moved to that extra space. The + * downside of this is that ANYOFL nodes with whichever of the flags get moved + * would have to have the extra space always allocated. * * One could completely remove ANYOFL_LARGE and make all ANYOFL nodes large. * The 30 bits in the extra word would indicate if a posix class should be @@ -491,12 +490,12 @@ struct regnode_ssc { * anyway, and the SSC could be set to this node type. REGINCLASS would have * to be modified so that if the node type were this, it would call * reginclass(), as the flag bit that indicates to do this now would be gone. - * If the 2-bit field is used and moved to the larger structure, this would - * free up a total of 4 bits. If this were done, we could create an - * ANYOF_INVERT node-type without a combinatorial explosion, getting us to 5 - * bits. And, keep in mind that ANYOF_MATCHES_ALL_ABOVE_BITMAP is solely for - * performance, so could be removed. The other performance-related bits are - * shareable with bits that are required. + * If 2 locale flags are moved to the larger structure, this would free up a + * total of 4 bits. If this were done, we could create an ANYOF_INVERT + * node-type without a combinatorial explosion, getting us to 5 bits. And, + * keep in mind that ANYOF_MATCHES_ALL_ABOVE_BITMAP is solely for performance, + * so could be removed. The other performance-related flags are shareable with + * flags that are required. * * Several flags are not used in synthetic start class (SSC) nodes, so could be * shared should new flags be needed for SSCs, like SSC_MATCHES_EMPTY_STRING |