Fix ``punctuation_chars`` regeneration for tests

Unicode version 7.0 introduced U+2E42, "DOUBLE LOW-REVERSED-9 QUOTATION MARK", in the Ps category. This codepoint has no corresponding pair in the Pe category, which breaks the assumptions in the ``openers`` and ``closers`` strings.

The only usages I have been able to identify are in Old Hungarian, where the character does not seem to have been used as a quotation mark, so the concept of an opening character with a corresponding closing character does not easily fit.

As the simplest fix to allow tests to pass following regeneration of the ``punctuation_chars`` file without manual editing, we remove the U+2E42 character from the list of Ps characters (the openers).

git-svn-id: https://svn.code.sf.net/p/docutils/code/trunk@9253 929543f6-e4f2-0310-98a6-ba3bd3dd1d04
author: aa-turner <aa-turner@929543f6-e4f2-0310-98a6-ba3bd3dd1d04> 2022-11-16 19:55:59 +0000
committer: aa-turner <aa-turner@929543f6-e4f2-0310-98a6-ba3bd3dd1d04> 2022-11-16 19:55:59 +0000
commit: 924d4f7f374709f545d42a99ef7c32e26b58ad29 (patch)
tree: 831e374a6286179728367e1369cdf600821feea1 /docutils/tools/dev/generate_punctuation_chars.py
parent: 415c15d7d91d116056d8ceb4d9fab3b8eeaaef62 (diff)
download: docutils-924d4f7f374709f545d42a99ef7c32e26b58ad29.tar.gz
1 files changed, 7 insertions, 0 deletions
diff --git a/docutils/tools/dev/generate_punctuation_chars.py b/docutils/tools/dev/generate_punctuation_chars.py
index 11c0108d0..5a7bf9842 100755
--- a/docutils/tools/dev/generate_punctuation_chars.py
+++ b/docutils/tools/dev/generate_punctuation_chars.py
@@ -213,6 +213,13 @@ def character_category_patterns():
     # 301F  LOW DOUBLE PRIME QUOTATION MARK misses the opening pendant:
     ucharlists['Ps'].insert(ucharlists['Pe'].index('\u301f'), '\u301d')
 
+    # 2E42  DOUBLE LOW-REVERSED-9 QUOTATION MARK has no pair, and the only
+    # usages identified thus far are in Old Hungarian, where it doesn't seem to
+    # be used as a quoting character. Remove from openers (Ps) for now, for
+    # simplicity.
+    # https://www.unicode.org/L2/L2012/12168r-n4268r-oldhungarian.pdf#page=26
+    ucharlists['Ps'].remove('⹂')
+
     # print(''.join(ucharlists['Ps']).encode('utf-8')
     # print(''.join(ucharlists['Pe']).encode('utf-8')
     # print(''.join(ucharlists['Pi']).encode('utf-8')
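
The effect of the change can be reproduced outside the generator script. The following is a minimal sketch, assuming a Python whose Unicode database is version 7.0 or later; the helper names `chars_in_category` and `openers_without_unpaired` are hypothetical and are not part of `generate_punctuation_chars.py`, and the sketch does not reproduce the script's other adjustments (such as the U+301D insertion shown in the context lines above).

```python
import sys
import unicodedata

# U+2E42 DOUBLE LOW-REVERSED-9 QUOTATION MARK: in category Ps, but with no
# matching closing character in category Pe.
UNPAIRED_OPENER = '\u2e42'


def chars_in_category(category):
    """Return all characters whose Unicode general category is `category`.

    Hypothetical helper for illustration only.
    """
    return [chr(cp) for cp in range(sys.maxunicode + 1)
            if unicodedata.category(chr(cp)) == category]


def openers_without_unpaired(ps_chars):
    """Return a copy of `ps_chars` with U+2E42 dropped, if it is present.

    Unlike list.remove(), this does not raise ValueError when the character
    is missing (for example with a pre-7.0 Unicode database).
    """
    return [c for c in ps_chars if c != UNPAIRED_OPENER]


ps = chars_in_category('Ps')
pe = chars_in_category('Pe')
cleaned = openers_without_unpaired(ps)

# Report the Unicode database version and the list sizes before and after.
print(unicodedata.unidata_version, len(ps), len(cleaned), len(pe))
```

The filtering variant is used here only so that the sketch runs on any Unicode database version; the commit itself calls `ucharlists['Ps'].remove('⹂')`, which assumes the character is present.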