author | Christopher Kent Hoadley <chris.hoadley@gmail.com> | 2020-04-29 20:12:01 -0500
---|---|---
committer | GitHub <noreply@github.com> | 2020-04-29 21:12:01 -0400
commit | 665d86fa0003584b6a62da66be56309e59333d79 |
tree | 0dc52bdda20c94510f30c1a0c8c425927d59a995 |
parent | 30f5fb5ad771d886f2e40e0da7fd8bf273370c7b |
Fix a2x option collection from input file with non-ascii encoding (#104)
* Fix a2x Option Collection From Input File With Non-ASCII Encoding
As an alternative to passing every option on the command line, a2x lets you specify options in the body of the source file: on any line beginning with "//a2x: <option>" (whitespace is allowed between "//" and "a2x:"), the text "<option>" is processed as if it had been given on the command line. a2x performs this scan on every input file, even when the user does not use the feature.
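A minimal Python sketch of that scan, working on lines that are already decoded. The helper name `collect_a2x_options` and the sample lines are hypothetical; the pattern `^//\s*a2x:` is the one a2x.py uses. The real code goes on to split the collected string into separate arguments (respecting double quotes), which this sketch leaves out.

```python
import re

# Same prefix pattern as a2x.py: "//", optional whitespace, then "a2x:".
A2X_OPTION = re.compile(r'^//\s*a2x:')


def collect_a2x_options(lines):
    """Return the text after every '//a2x:' prefix, joined by spaces.

    Hypothetical helper: every line is checked, whether or not the
    document actually uses in-file options.
    """
    options = ''
    for line in lines:
        mo = A2X_OPTION.search(line)
        if mo:
            # Keep only what follows the prefix.
            options += ' ' + line[mo.end():].strip()
    return options.strip()


source = [
    "= My Document\n",
    "// a2x: -f pdf\n",
    "//a2x: --verbose\n",
    "Some body text.\n",
]
print(collect_a2x_options(source))  # -f pdf --verbose
```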
The code that scans the input file for options opens it without specifying an encoding, so the file is decoded with the locale's default encoding. When the file's actual encoding does not match the locale (e.g. the input file is UTF-8, but the user has set LC_ALL="C" in the environment), reading the file raises a UnicodeDecodeError.
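The failure can be reproduced without changing the locale. The sketch below writes a small UTF-8 file and reads it back in text mode with `encoding='ascii'`, which stands in (as an assumption) for a C locale whose default text encoding is ASCII. The file contents are made up for illustration.

```python
import os
import tempfile

# Write a small UTF-8 file containing a non-ASCII character ("é").
with tempfile.NamedTemporaryFile('wb', suffix='.txt', delete=False) as tmp:
    tmp.write('= Café\n//a2x: -f pdf\n'.encode('utf-8'))
    path = tmp.name

try:
    # encoding='ascii' simulates a locale whose default encoding is ASCII
    # (assumption); iterating over the lines forces decoding.
    with open(path, encoding='ascii') as f:
        for line in f:
            pass
    error = None
except UnicodeDecodeError as exc:
    error = exc
finally:
    os.remove(path)

print(type(error).__name__, error.encoding)  # UnicodeDecodeError ascii
```

The error is raised before any line is even checked against the option pattern, which is why a file that contains no "//a2x:" lines at all can still fail.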
The fix opens the file in binary mode while scanning for options and matches the option regular expression against the raw bytes. Only the option text on a line that already matches the expression is decoded, as ASCII. If that decoding fails, a2x issues a warning giving the line number and file name, skips that option, and continues.
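A minimal Python sketch of that approach, assuming a simplified interface: the function name `scan_options_binary`, its `warn` callback, and the sample file contents are hypothetical, and the options are returned as one joined string rather than split into arguments as a2x does. It counts lines with `enumerate` where the actual patch keeps a manual counter; the effect is the same.

```python
import os
import re
import sys
import tempfile

# Bytes pattern matched against undecoded lines, so merely scanning a line
# can never raise a decoding error.
A2X_OPTION = re.compile(rb'^//\s*a2x:')


def _stderr_warning(message):
    print('WARNING: ' + message, file=sys.stderr)


def scan_options_binary(path, warn=_stderr_warning):
    """Collect '//a2x:' option text from the file at *path*.

    Hypothetical stand-in for a2x's get_source_options(): only the text
    after the prefix on a matching line is decoded, as ASCII.
    """
    options = []
    with open(path, 'rb') as f:  # binary mode: no locale-based decoding
        for line_number, line in enumerate(f, start=1):
            mo = A2X_OPTION.search(line)
            if not mo:
                continue
            try:
                options.append(line[mo.end():].strip().decode('ascii'))
            except UnicodeDecodeError as e:
                # Skip this option, report it, and keep scanning.
                warn('Could not decode option to %s on line %s in %s'
                     % (e.encoding, line_number, path))
    return ' '.join(options)


# Usage: a UTF-8 file with one ASCII option and one non-ASCII option.
messages = []
with tempfile.NamedTemporaryFile('wb', suffix='.txt', delete=False) as tmp:
    tmp.write('= Café\n//a2x: -f pdf\n//a2x: -a author=José\n'.encode('utf-8'))
    path = tmp.name
try:
    result = scan_options_binary(path, warn=messages.append)
finally:
    os.remove(path)

print(result)         # -f pdf
print(len(messages))  # 1
```

Because decoding is limited to matched option text, the rest of the file (headings, body text, even bytes that are not valid UTF-8) no longer affects the scan.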
NOTE: This change requires that all a2x options given in the input file (on "//a2x:" lines) be plain ASCII text, which is most likely what everyone wants anyway.
Co-authored-by: Matthew Peveler <matt.peveler@gmail.com>
-rw-r--r-- | CHANGELOG.txt | 1
-rwxr-xr-x | a2x.py | 14
2 files changed, 12 insertions, 3 deletions
```diff
diff --git a/CHANGELOG.txt b/CHANGELOG.txt
index d3d2f5c..31451fd 100644
--- a/CHANGELOG.txt
+++ b/CHANGELOG.txt
@@ -26,6 +26,7 @@ Version 9.0.0 (Unreleased)
 - Fix index terms requiring two characters instead of just one (see https://github.com/asciidoc/asciidoc-py3/pull/2#issuecomment-392605876)
 - Properly capture and use colophon, dedication, and preface for docbooks in Japanese (see https://github.com/asciidoc/asciidoc-py3/pull/2#issuecomment-392623181)
 - make install did not include the unwraplatex.py filter
+- Fix a2x option collection from input file with non-ascii encoding

 .Testing
 - Commit generated test files to the repository for continuous integration
diff --git a/a2x.py b/a2x.py
--- a/a2x.py
+++ b/a2x.py
@@ -364,11 +364,19 @@ def get_source_options(asciidoc_file):
     result = []
     if os.path.isfile(asciidoc_file):
         options = ''
-        with open(asciidoc_file) as f:
+        with open(asciidoc_file, 'rb') as f:
+            line_number = 0
             for line in f:
-                mo = re.search(r'^//\s*a2x:', line)
+                line_number += 1
+                mo = re.search(b'^//\s*a2x:', line)
                 if mo:
-                    options += ' ' + line[mo.end():].strip()
+                    try:
+                        options += ' ' + line[mo.end():].strip().decode('ascii')
+                    except UnicodeDecodeError as e:
+                        warning(
+                            "Could not decode option to %s " % e.encoding +
+                            "on line %s in %s" % (line_number, asciidoc_file)
+                        )
         parse_options()
     return result
```
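In binary mode every line is a bytes object, which is why the diff also turns the pattern from a str literal into a bytes literal: the re module will not apply a str pattern to bytes and raises TypeError instead. The short check below (sample lines invented) confirms that, and also that the diff's b'^//\s*a2x:' holds the same bytes as the raw literal rb'^//\s*a2x:', because \s is not a recognized escape sequence. The raw spelling is the conventional one for regular expressions and avoids the invalid-escape warning that recent Python versions report.

```python
import re

# Pattern from the diff, spelled as a raw bytes literal. Because "\s" is not
# a recognized escape, the diff's b'^//\s*a2x:' holds exactly these bytes.
A2X_PATTERN = rb'^//\s*a2x:'
assert A2X_PATTERN == b'^//\\s*a2x:'

# Lines read in binary mode are bytes, so the pattern must be bytes too.
samples = [b'//a2x: -f pdf\n', b'//   a2x: --verbose\n', b'// a note\n']
found = []
for line in samples:
    mo = re.search(A2X_PATTERN, line)
    if mo:
        found.append(line[mo.end():].strip())
print(found)  # [b'-f pdf', b'--verbose']

# Mixing types fails: a str pattern cannot search a bytes line.
try:
    re.search(r'^//\s*a2x:', samples[0])
    mixed_ok = True
except TypeError:
    mixed_ok = False
print(mixed_ok)  # False
```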