path: root/docutils/tools/dev/generate_punctuation_chars.py
author: aa-turner <aa-turner@929543f6-e4f2-0310-98a6-ba3bd3dd1d04> 2022-11-16 19:55:59 +0000
committer: aa-turner <aa-turner@929543f6-e4f2-0310-98a6-ba3bd3dd1d04> 2022-11-16 19:55:59 +0000
commit: 924d4f7f374709f545d42a99ef7c32e26b58ad29
tree: 831e374a6286179728367e1369cdf600821feea1 /docutils/tools/dev/generate_punctuation_chars.py
parent: 415c15d7d91d116056d8ceb4d9fab3b8eeaaef62
download: docutils-924d4f7f374709f545d42a99ef7c32e26b58ad29.tar.gz
Fix ``punctuation_chars`` regeneration for tests
Unicode 7.0 introduced U+2E42, "DOUBLE LOW-REVERSED-9 QUOTATION MARK", in the Ps category. This code point has no corresponding character in the Pe category, which breaks the pairing assumed by the ``openers`` and ``closers`` strings. The only usages I have been able to identify are in Old Hungarian, where the character does not seem to have been used as a quotation mark, so the concept of an opening character with a corresponding closing character does not easily fit.

As the simplest fix to allow tests to pass following regeneration of the ``punctuation_chars`` file without manual editing, we remove U+2E42 from the list of Ps characters (the openers).

git-svn-id: https://svn.code.sf.net/p/docutils/code/trunk@9253 929543f6-e4f2-0310-98a6-ba3bd3dd1d04
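The pairing problem described above can be checked against the Unicode database bundled with the local Python interpreter. The following is a minimal sketch, not the code from ``generate_punctuation_chars.py``: the helper names ``chars_in_category`` and ``without_unpaired`` are hypothetical, and the result assumes a Python build whose ``unicodedata`` module is based on Unicode 7.0 or later.

```python
import sys
import unicodedata


def chars_in_category(category):
    """Collect every character whose Unicode general category matches."""
    return [chr(cp) for cp in range(sys.maxunicode + 1)
            if unicodedata.category(chr(cp)) == category]


def without_unpaired(openers, unpaired='\u2e42'):
    """Return a copy of `openers` with the unpaired character dropped.

    Hypothetical helper for illustration; the commit itself calls
    list.remove() on ucharlists['Ps'] in place.
    """
    return [ch for ch in openers if ch != unpaired]


openers = chars_in_category('Ps')   # open punctuation
closers = chars_in_category('Pe')   # close punctuation
trimmed = without_unpaired(openers)

print('Unicode database version:', unicodedata.unidata_version)
print('U+2E42 category:', unicodedata.category('\u2e42'))
print('U+2E42 name:', unicodedata.name('\u2e42', '<unassigned>'))
print('openers before/after:', len(openers), len(trimmed))
```

Returning a filtered copy rather than mutating the list keeps the sketch safe to run on an older Unicode database, where ``list.remove()`` would raise ``ValueError`` because U+2E42 is not yet classified as Ps.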
Diffstat (limited to 'docutils/tools/dev/generate_punctuation_chars.py')
-rwxr-xr-x docutils/tools/dev/generate_punctuation_chars.py | 7
1 files changed, 7 insertions, 0 deletions
diff --git a/docutils/tools/dev/generate_punctuation_chars.py b/docutils/tools/dev/generate_punctuation_chars.py
index 11c0108d0..5a7bf9842 100755
--- a/docutils/tools/dev/generate_punctuation_chars.py
+++ b/docutils/tools/dev/generate_punctuation_chars.py
@@ -213,6 +213,13 @@ def character_category_patterns():
     # 301F LOW DOUBLE PRIME QUOTATION MARK misses the opening pendant:
     ucharlists['Ps'].insert(ucharlists['Pe'].index('\u301f'), '\u301d')
+    # 2E42 DOUBLE LOW-REVERSED-9 QUOTATION MARK has no pair, and the only
+    # usages identified thus far are in old hungarian, where it doesn't seem to
+    # be used as a quoting character. Remove from openers (Ps) for now, for
+    # simplicity.
+    # https://www.unicode.org/L2/L2012/12168r-n4268r-oldhungarian.pdf#page=26
+    ucharlists['Ps'].remove('⹂')
+
     # print(''.join(ucharlists['Ps']).encode('utf-8')
     # print(''.join(ucharlists['Pe']).encode('utf-8')
     # print(''.join(ucharlists['Pi']).encode('utf-8')