author Adam Hupp <adam@hupp.org> 2020-11-06 10:43:39 -0800
committer Adam Hupp <adam@hupp.org> 2020-11-06 10:43:39 -0800
commit a74c994b704d3476e2054cc6332c0a4c49ea1c69
tree 2583a12cf9c7a3889958edff6c550a89ec5708f2
parent 77b8cbea6ceffecb4cbc471d1e8fa22843389439
Handle undecodable characters in description
We've historically expected that the return values from libmagic are ASCII, since they are constant strings or values such as dates and numbers. In some cases, however, libmagic returns information such as the title of the document in the document's native character set, which is unknown to us, and decoding it as UTF-8 raises an error. I have not been able to obtain a document that triggers this behavior, but the safest change is to decode with 'backslashreplace', which replaces each undecodable byte with a backslash escape sequence.
-rw-r--r-- magic.py 3
1 files changed, 2 insertions, 1 deletions
diff --git a/magic.py b/magic.py
index aab7987..92005bd 100644
--- a/magic.py
+++ b/magic.py
@@ -239,7 +239,8 @@ def maybe_decode(s):
     if str == bytes:
         return s
     else:
-        return s.decode('utf-8')
+        # backslashreplace here because sometimes
+        return s.decode('utf-8', 'backslashreplace')
 
 
 def coerce_filename(filename):
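
For illustration only, a minimal sketch of what the new error handler does. The helper name `decode_description` and the sample bytes are hypothetical and not part of magic.py; the sample imitates a description whose title is Latin-1 encoded. Note that 'backslashreplace' is accepted for decoding only on Python 3.5 and later.

```python
def decode_description(raw):
    """Decode bytes as UTF-8, escaping any byte that is not valid UTF-8.

    Hypothetical helper for illustration; it mirrors the decode call
    introduced by this commit but is not the library's function.
    """
    return raw.decode('utf-8', 'backslashreplace')


# Hypothetical libmagic output: ASCII text followed by a Latin-1 'e acute'.
raw = b'Title: caf\xe9'

# Strict decoding, the previous behavior, raises on the stray byte.
try:
    raw.decode('utf-8')
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True

# With backslashreplace the byte survives as the four characters \xe9.
escaped = decode_description(raw)
print(strict_failed)
print(escaped)
```

Valid UTF-8 input passes through unchanged, so the handler only affects descriptions that would previously have raised.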