author | Adam Hupp <adam@hupp.org> | 2020-11-06 10:43:39 -0800 |
---|---|---|
committer | Adam Hupp <adam@hupp.org> | 2020-11-06 10:43:39 -0800 |
commit | a74c994b704d3476e2054cc6332c0a4c49ea1c69 | |
tree | 2583a12cf9c7a3889958edff6c550a89ec5708f2 | |
parent | 77b8cbea6ceffecb4cbc471d1e8fa22843389439 | |
Handle undecodable characters in description
We've historically expected that the return values from libmagic are
ASCII, since they are constant strings or values such as dates and numbers.
In some cases, however, libmagic returns information such as the title of
the document in the document's native character set, which is unknown to
us. Decoding that output as UTF-8 then raises a UnicodeDecodeError.
I have not been able to obtain a document that triggers this behavior,
but the safest change is to decode with 'backslashreplace', which
replaces each undecodable byte with a backslash escape such as \xe9
instead of raising.
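To show what the handler changes, this sketch decodes the same hypothetical byte string under three of Python's built-in error handlers. `decode_with` is a hypothetical helper for the comparison, not part of python-magic. Note that `'backslashreplace'` only works for decoding on Python 3.5 and later.

```python
raw = b"Title: Caf\xe9"  # hypothetical output containing one non-UTF-8 byte


def decode_with(data: bytes, handler: str):
    """Decode data as UTF-8 using the given error handler; None if it raises."""
    try:
        return data.decode("utf-8", handler)
    except UnicodeDecodeError:
        return None


results = {h: decode_with(raw, h) for h in ("strict", "replace", "backslashreplace")}
# results == {'strict': None,
#             'replace': 'Title: Caf\ufffd',
#             'backslashreplace': 'Title: Caf\\xe9'}
```

Both `'replace'` and `'backslashreplace'` always return a `str`. `'replace'` collapses the bad byte into U+FFFD and loses its value, while `'backslashreplace'` keeps the original byte visible as `\xe9`.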
-rw-r--r-- | magic.py | 3 |
1 file changed, 2 insertions, 1 deletion

    @@ -239,7 +239,8 @@ def maybe_decode(s):
         if str == bytes:
             return s
         else:
    -        return s.decode('utf-8')
    +        # backslashreplace here because sometimes
    +        return s.decode('utf-8', 'backslashreplace')
     
     
     def coerce_filename(filename):
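A standalone copy of the patched helper makes it easy to check that well-formed output is unaffected. This sketch copies the function body from the hunk; the sample inputs are hypothetical. On Python 3 only the `else` branch runs, so valid UTF-8 (including plain ASCII) decodes exactly as before, and only invalid bytes change from an exception to an escape.

```python
def maybe_decode(s):
    """Standalone copy of the patched helper from the hunk."""
    if str == bytes:
        # Python 2: str is already a byte string, so return it unchanged.
        return s
    else:
        # Python 3: valid UTF-8 decodes as before; only invalid bytes
        # are turned into backslash escapes instead of raising.
        return s.decode('utf-8', 'backslashreplace')


samples = (b"ASCII text", b"Title: Caf\xc3\xa9", b"Title: Caf\xe9")
decoded = [maybe_decode(s) for s in samples]
# decoded == ['ASCII text', 'Title: Café', 'Title: Caf\\xe9']
```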