| Field | Value | Date |
|---|---|---|
| author | aa-turner <aa-turner@929543f6-e4f2-0310-98a6-ba3bd3dd1d04> | 2022-11-16 19:55:59 +0000 |
| committer | aa-turner <aa-turner@929543f6-e4f2-0310-98a6-ba3bd3dd1d04> | 2022-11-16 19:55:59 +0000 |
| commit | 924d4f7f374709f545d42a99ef7c32e26b58ad29 | |
| tree | 831e374a6286179728367e1369cdf600821feea1 /docutils/tools/dev/generate_punctuation_chars.py | |
| parent | 415c15d7d91d116056d8ceb4d9fab3b8eeaaef62 | |
Fix ``punctuation_chars`` regeneration for tests
Unicode 7.0 introduced U+2E42, "DOUBLE LOW-REVERSED-9 QUOTATION
MARK", in the Ps (open punctuation) category. This code point has no
corresponding character in the Pe (close punctuation) category, which
breaks the assumption that the ``openers`` and ``closers`` strings
correspond position by position. The only usages I have been able to
identify are in Old Hungarian, where the character does not seem to
have been used as a quotation mark, so the concept of an opening
character with a corresponding closing character does not easily
apply. As the simplest fix that lets the tests pass after regenerating
the ``punctuation_chars`` file without manual editing, we remove U+2E42
from the generated list of Ps characters.
git-svn-id: https://svn.code.sf.net/p/docutils/code/trunk@9253 929543f6-e4f2-0310-98a6-ba3bd3dd1d04
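The category assignment this commit relies on can be checked with Python's standard `unicodedata` module. The sketch below assumes a Python build whose Unicode Character Database is version 7.0 or later (any Python 3.6+ qualifies). `describe` and `U2E42` are illustrative names and are not part of docutils.

```python
import unicodedata

U2E42 = '\u2e42'

def describe(char):
    """Return (code point, Unicode name, general category) for one character."""
    return (f'U+{ord(char):04X}', unicodedata.name(char), unicodedata.category(char))

# Version of the Unicode Character Database bundled with this Python build.
print('UCD version:', unicodedata.unidata_version)

# U+2E42 is classified as an opener (Ps) but has no dedicated closer (Pe).
print(describe(U2E42))

# For contrast, the U+301D..U+301F group handled by the script's existing
# workaround: one Ps opener shared by two Pe closers.
for char in ('\u301d', '\u301e', '\u301f'):
    print(describe(char))
```

On such a build, `describe(U2E42)` returns `('U+2E42', 'DOUBLE LOW-REVERSED-9 QUOTATION MARK', 'Ps')`.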
Diffstat (limited to 'docutils/tools/dev/generate_punctuation_chars.py')
-rwxr-xr-x | docutils/tools/dev/generate_punctuation_chars.py | 7 |
1 files changed, 7 insertions, 0 deletions
```diff
diff --git a/docutils/tools/dev/generate_punctuation_chars.py b/docutils/tools/dev/generate_punctuation_chars.py
index 11c0108d0..5a7bf9842 100755
--- a/docutils/tools/dev/generate_punctuation_chars.py
+++ b/docutils/tools/dev/generate_punctuation_chars.py
@@ -213,6 +213,13 @@ def character_category_patterns():
     # 301F LOW DOUBLE PRIME QUOTATION MARK misses the opening pendant:
     ucharlists['Ps'].insert(ucharlists['Pe'].index('\u301f'), '\u301d')
 
+    # 2E42 DOUBLE LOW-REVERSED-9 QUOTATION MARK has no pair, and the only
+    # usages identified thus far are in old hungarian, where it doesn't seem to
+    # be used as a quoting character. Remove from openers (Ps) for now, for
+    # simplicity.
+    # https://www.unicode.org/L2/L2012/12168r-n4268r-oldhungarian.pdf#page=26
+    ucharlists['Ps'].remove('⹂')
+
     # print(''.join(ucharlists['Ps']).encode('utf-8')
     # print(''.join(ucharlists['Pe']).encode('utf-8')
     # print(''.join(ucharlists['Pi']).encode('utf-8')
```
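To illustrate what the two list operations in this hunk do, the sketch below applies them to a hypothetical, heavily trimmed `ucharlists` dictionary. The real script builds these lists from the full Unicode database, so the contents here are invented for illustration. The sketch assumes, as the index-based `insert` suggests, that the opener at position *i* in `openers` is closed by the character at position *i* in `closers`. The diff writes U+2E42 as the literal `'⹂'`. Here it is spelled `'\u2e42'` for readability.

```python
# Hypothetical, heavily trimmed stand-ins for the script's per-category
# lists; the real script fills these from the whole Unicode database.
ucharlists = {
    'Ps': ['(', '[', '{', '\u2e42', '\u3008', '\u301d'],
    'Pe': [')', ']', '}', '\u3009', '\u301e', '\u301f'],
}

# Pre-existing fix: U+301F has no opener of its own, so insert a second copy
# of U+301D into Ps at the position where U+301F sits in Pe.
ucharlists['Ps'].insert(ucharlists['Pe'].index('\u301f'), '\u301d')

# Fix from this commit: drop the unpaired opener U+2E42.
ucharlists['Ps'].remove('\u2e42')

openers = ''.join(ucharlists['Ps'])
closers = ''.join(ucharlists['Pe'])

# Positional pairing only works if both strings have the same length.
assert len(openers) == len(closers)
for opener, closer in zip(openers, closers):
    print(f'U+{ord(opener):04X} -> U+{ord(closer):04X}')
```

Without the `remove` call, `openers` would be one character longer than `closers`, and every pair after `{` would be shifted by one position. For example, U+2E42 would be paired with U+3009.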