author: Karl Williamson <khw@cpan.org> 2022-03-16 19:11:08 -0600
committer: Karl Williamson <khw@cpan.org> 2022-03-19 23:17:51 -0600
commit: 1df06faef81d6e2c5662cfb5d6cfc0844c38766e
tree: 3b56aea1f63f56697d9df59148f2d9f6f5a38b72 /lib/feature.pm
parent: 835f2666d2ae366f7af912303f061f066b8376c4
unicode_constants.pl: Refactor to catch more paired delims
Previously, only characters that Unicode includes in its bidirectional algorithm were eligible to be found by this program as mirrored string delimiters. This commit adds 5 quotation mark character pairs that are omitted from the bidirectional algorithm, as most quotes are, because, as the Standard says, their "directionality and pairing status is less predictable than paired brackets." But we're not particularly interested in those semantics; most string delimiters will be selected only for their visual appearance.

Because they aren't in the bidi algorithm, there is no property that maps one member of a pair to its mate. However, two characters whose names differ only by LEFT vs RIGHT are almost certainly a mirrored pair. This doesn't catch all possibilities; future commits will expand the ones caught.

The commit also refactors the code so that future commits can more easily consider even more delimiter possibilities.
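The LEFT vs RIGHT name heuristic described above can be illustrated with a minimal sketch. This is not the code in unicode_constants.pl; the helper name `mate_by_name` is hypothetical, and the sketch assumes only the core `charnames` module (`charnames::viacode` and `charnames::vianame`) for looking up names and code points.

```perl
use strict;
use warnings;
use charnames ();

# Hypothetical helper, not taken from unicode_constants.pl: given a code
# point, return the code point whose name differs only in LEFT vs RIGHT,
# or undef if the name contains neither word or no such mate exists.
sub mate_by_name {
    my ($cp) = @_;

    my $name = charnames::viacode($cp) // return;

    # Swap the first whole-word LEFT or RIGHT in the name.
    my $mate_name = $name;
    $mate_name =~ s/\b(LEFT|RIGHT)\b/$1 eq 'LEFT' ? 'RIGHT' : 'LEFT'/e
        or return;

    # Look the swapped name up; undef means no character has that name.
    return charnames::vianame($mate_name);
}

# Try the heuristic on a few of the characters this commit adds.
for my $cp (0x2018, 0x201C, 0xFD3E) {
    my $mate = mate_by_name($cp);
    printf "U+%04X %s <-> %s\n",
        $cp,
        charnames::viacode($cp),
        defined $mate ? sprintf("U+%04X", $mate) : "(no mate found)";
}
```

A name-based lookup like this needs no bidi property data, which is why it can reach quotation marks that the bidirectional algorithm omits. As the commit message notes, it does not catch every possible pair.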
Diffstat (limited to 'lib/feature.pm')
-rw-r--r-- lib/feature.pm 5
1 files changed, 5 insertions, 0 deletions
diff --git a/lib/feature.pm b/lib/feature.pm
index f6764f8ec9..a619a32492 100644
--- a/lib/feature.pm
+++ b/lib/feature.pm
@@ -478,6 +478,10 @@ The complete list of accepted paired delimiters as of Unicode 14.0 is:
༼ ༽ U+0F3C, U+0F3D TIBETAN MARK ANG KHANG GYON, TIBETAN MARK ANG
KHANG GYAS
᚛ ᚜ U+169B, U+169C OGHAM FEATHER MARK, OGHAM REVERSED FEATHER MARK
+ ‘ ’ U+2018, U+2019 LEFT/RIGHT SINGLE QUOTATION MARK
+ ’ ‘ U+2019, U+2018 RIGHT/LEFT SINGLE QUOTATION MARK
+ “ ” U+201C, U+201D LEFT/RIGHT DOUBLE QUOTATION MARK
+ ” “ U+201D, U+201C RIGHT/LEFT DOUBLE QUOTATION MARK
‹ › U+2039, U+203A SINGLE LEFT/RIGHT-POINTING ANGLE QUOTATION MARK
› ‹ U+203A, U+2039 SINGLE RIGHT/LEFT-POINTING ANGLE QUOTATION MARK
⁅ ⁆ U+2045, U+2046 LEFT/RIGHT SQUARE BRACKET WITH QUILL
@@ -549,6 +553,7 @@ The complete list of accepted paired delimiters as of Unicode 14.0 is:
〖 〗 U+3016, U+3017 LEFT/RIGHT WHITE LENTICULAR BRACKET
〘 〙 U+3018, U+3019 LEFT/RIGHT WHITE TORTOISE SHELL BRACKET
〚 〛 U+301A, U+301B LEFT/RIGHT WHITE SQUARE BRACKET
+ ﴿ ﴾ U+FD3F, U+FD3E ORNATE RIGHT/LEFT PARENTHESIS
﹙ ﹚ U+FE59, U+FE5A SMALL LEFT/RIGHT PARENTHESIS
﹛ ﹜ U+FE5B, U+FE5C SMALL LEFT/RIGHT CURLY BRACKET
﹝ ﹞ U+FE5D, U+FE5E SMALL LEFT/RIGHT TORTOISE SHELL BRACKET