Fix a2x option collection from input file with non-ascii encoding (#104)

* Fix a2x Option Collection From Input File With Non-ASCII Encoding

As an alternative to passing all of the options on the command line, a2x allows one to specify options in the body of the text of the source file (e.g. lines starting with "//a2x: <option>" will have "<option>" processed as if it were on the command line). Even if a user does not use this feature, a2x will always perform the scan.

The code that scans the input file for options does not specify an encoding. So, in those cases where the actual encoding of the file does not match the locale (e.g. input file is UTF-8, yet user has set LC_ALL="C" in the environment), a UnicodeDecodeError exception is thrown.

The fix is to open the file in binary mode when scanning for options. A line is converted to ASCII only if it already matches the regular expression for an a2x option. If the line cannot be converted to ASCII, then a warning message will be issued, and execution will continue.

NOTE: This change requires that all a2x options specified on the command line be normal ASCII text (which is most likely everyone's desire anyway).

Co-authored-by: Matthew Peveler <matt.peveler@gmail.com>
author: Christopher Kent Hoadley <chris.hoadley@gmail.com> 2020-04-29 20:12:01 -0500
committer: GitHub <noreply@github.com> 2020-04-29 21:12:01 -0400
commit: 665d86fa0003584b6a62da66be56309e59333d79 (patch)
tree: 0dc52bdda20c94510f30c1a0c8c425927d59a995
parent: 30f5fb5ad771d886f2e40e0da7fd8bf273370c7b (diff)
download: asciidoc-py3-665d86fa0003584b6a62da66be56309e59333d79.tar.gz
2 files changed, 12 insertions, 3 deletions
diff --git a/CHANGELOG.txt b/CHANGELOG.txt
index d3d2f5c..31451fd 100644
--- a/CHANGELOG.txt
+++ b/CHANGELOG.txt
@@ -26,6 +26,7 @@ Version 9.0.0 (Unreleased)
 - Fix index terms requiring two characters instead of just one (see https://github.com/asciidoc/asciidoc-py3/pull/2#issuecomment-392605876)
 - Properly capture and use colophon, dedication, and preface for docbooks in Japanese (see https://github.com/asciidoc/asciidoc-py3/pull/2#issuecomment-392623181)
 - make install did not include the unwraplatex.py filter
+- Fix a2x option collection from input file with non-ascii encoding
 
 .Testing
 - Commit generated test files to the repository for continuous integration
diff --git a/a2x.py b/a2x.py
index 1602bf6..881fc4a 100755
--- a/a2x.py
+++ b/a2x.py
@@ -364,11 +364,19 @@ def get_source_options(asciidoc_file):
     result = []
     if os.path.isfile(asciidoc_file):
         options = ''
-        with open(asciidoc_file) as f:
+        with open(asciidoc_file, 'rb') as f:
+            line_number = 0
             for line in f:
-                mo = re.search(r'^//\s*a2x:', line)
+                line_number += 1
+                mo = re.search(b'^//\s*a2x:', line)
                 if mo:
-                    options += ' ' + line[mo.end():].strip()
+                    try:
+                        options += ' ' + line[mo.end():].strip().decode('ascii')
+                    except UnicodeDecodeError as e:
+                        warning(
+                            "Could not decode option to %s " % e.encoding +
+                            "on line %s in %s" % (line_number, asciidoc_file)
+                        )
         parse_options()
     return result
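
The behaviour described in the commit message can be tried outside of a2x with a small standalone sketch. This is not the a2x code itself: `collect_source_options`, `ascii_text_read_fails` and `write_sample` are hypothetical names used only for illustration, and the sketch returns the numbers of undecodable lines instead of calling a2x's `warning()`. It also writes the pattern as a raw bytes literal (`rb'...'`) so that `\s` reaches the regex engine unchanged.

```python
import os
import re
import tempfile

# Raw bytes pattern: matches the "// a2x:" prefix on a line read in binary mode.
A2X_OPTION = re.compile(rb'^//\s*a2x:')


def collect_source_options(path):
    """Scan `path` in binary mode and return (options, bad_line_numbers).

    Only lines that already match the a2x option prefix are decoded,
    so non-ASCII text elsewhere in the file is never decoded at all.
    """
    options = []
    bad_line_numbers = []
    with open(path, 'rb') as f:
        for line_number, line in enumerate(f, start=1):
            mo = A2X_OPTION.search(line)
            if not mo:
                continue
            try:
                options.append(line[mo.end():].strip().decode('ascii'))
            except UnicodeDecodeError:
                # a2x issues a warning here; the sketch just records the line.
                bad_line_numbers.append(line_number)
    return options, bad_line_numbers


def ascii_text_read_fails(path):
    """Return True if reading `path` as ASCII text raises UnicodeDecodeError.

    This imitates a text-mode open() under an ASCII locale such as LC_ALL=C.
    """
    try:
        with open(path, encoding='ascii') as f:
            f.read()
    except UnicodeDecodeError:
        return True
    return False


def write_sample():
    """Write a UTF-8 sample file (assumed content) and return its path."""
    fd, path = tempfile.mkstemp(suffix='.txt')
    with os.fdopen(fd, 'wb') as f:
        f.write('= Café Guide\n'.encode('utf-8'))        # line 1: non-ASCII title
        f.write(b'// a2x: --no-xmllint\n')                # line 2: ASCII option
        f.write('// a2x: --title=Café\n'.encode('utf-8'))  # line 3: non-ASCII option
        f.write(b'Body text.\n')
    return path
```

With the sample file, the binary scan collects the ASCII option on line 2 and reports line 3 as undecodable, while the non-ASCII title on line 1 is ignored because it never matches the option pattern. A plain ASCII text read of the same file fails outright, which is the situation the commit addresses.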