author | Austin Seipp <austin@well-typed.com> | 2015-05-19 04:56:40 -0500 |
---|---|---|
committer | Austin Seipp <austin@well-typed.com> | 2015-05-28 16:19:35 -0500 |
commit | e28462de700240288519a016d0fe44d4360d9ffd (patch) | |
tree | c947a4b0ac30e9b16767dad1bb9f14f1effd439e /libraries/base/GHC/IO/Encoding.hs | |
parent | 640fe14255706ab9c6a1fa101d9b05dfabdc6556 (diff) | |
download | haskell-e28462de700240288519a016d0fe44d4360d9ffd.tar.gz |
base: fix #10298 & #7695
Summary:
This applies a patch from Reid Barton and Sylvain Henry, which fixes a
disastrous infinite loop when iconv fails to load locale files, as
specified in #10298.
The fix is a bit of a hack but should be fine. For the reasoning
behind it, see `Note [Disaster and iconv]`.
In addition to this fix, we also patch up the IO Encoding utilities to
recognize several variations of the 'ASCII' encoding (including its
aliases) directly so that GHC can do conversions without iconv. This
allows a static binary to sit in an initramfs.
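The dispatch described above can be sketched in isolation. This is only an illustration of the matching order, not the code in `GHC.IO.Encoding`: the `Codec` type and the `chooseCodec` function are hypothetical stand-ins for `TextEncoding` and `mkTextEncoding'`, and the `case` covers only the three UTF names visible in the diff (the real function handles more). The alias list is copied from the patch.

```haskell
import Data.Char (toUpper)

-- ASCII aliases, copied verbatim from the patch.
asciiAliases :: [String]
asciiAliases =
  [ "ANSI_X3.4-1968", "iso-ir-6", "ANSI_X3.4-1986", "ISO_646.irv:1991"
  , "US-ASCII", "us", "IBM367", "cp367", "csASCII", "ASCII", "ISO646-US"
  ]

-- Hypothetical stand-in for TextEncoding: records which path was chosen.
data Codec
  = BuiltinUTF8        -- handled in Haskell, no iconv needed
  | BuiltinUTF16
  | BuiltinUTF16LE
  | ViaIconv String    -- anything else is handed to iconv by name
  deriving (Eq, Show)

-- Hypothetical analogue of mkTextEncoding' without the failure mode
-- argument and without IO.
chooseCodec :: String -> Codec
chooseCodec enc
  -- Exact, case-sensitive match against the alias list comes first.
  -- ASCII is a subset of UTF-8, so the patch reuses the UTF-8 codec.
  | enc `elem` asciiAliases = BuiltinUTF8
  -- Otherwise normalise (drop '-', upper-case) and try the built-ins,
  -- falling back to iconv for unknown names.
  | otherwise = case [toUpper c | c <- enc, c /= '-'] of
      "UTF8"    -> BuiltinUTF8
      "UTF16"   -> BuiltinUTF16
      "UTF16LE" -> BuiltinUTF16LE
      _         -> ViaIconv enc
```

In this sketch `chooseCodec "ANSI_X3.4-1968"` and `chooseCodec "utf-8"` both select `BuiltinUTF8`, while `chooseCodec "ascii"` falls through to `ViaIconv "ascii"`, because the alias comparison is an exact string match and the lower-case spelling is not in the list.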
Authored-by: Reid Barton <rwbarton@gmail.com>
Authored-by: Sylvain Henry <hsyl20@gmail.com>
Signed-off-by: Austin Seipp <austin@well-typed.com>
Test Plan: Eyeballed it.
Reviewers: rwbarton, hvr
Subscribers: bgamari, thomie
Differential Revision: https://phabricator.haskell.org/D898
GHC Trac Issues: #10298, #7695
Diffstat (limited to 'libraries/base/GHC/IO/Encoding.hs')
-rw-r--r-- | libraries/base/GHC/IO/Encoding.hs | 14 |
1 files changed, 13 insertions, 1 deletions
diff --git a/libraries/base/GHC/IO/Encoding.hs b/libraries/base/GHC/IO/Encoding.hs
index 31683b4e68..014b61b8b0 100644
--- a/libraries/base/GHC/IO/Encoding.hs
+++ b/libraries/base/GHC/IO/Encoding.hs
@@ -235,7 +235,14 @@ mkTextEncoding e = case mb_coding_failure_mode of
     _ -> Nothing
 
 mkTextEncoding' :: CodingFailureMode -> String -> IO TextEncoding
-mkTextEncoding' cfm enc = case [toUpper c | c <- enc, c /= '-'] of
+mkTextEncoding' cfm enc
+  -- First, specifically match on ASCII encodings directly using
+  -- several possible aliases (specified by RFC 1345 & co), which
+  -- allows us to handle ASCII conversions without iconv at all (see
+  -- trac #10298).
+  | any (== enc) ansiEncNames = return (UTF8.mkUTF8 cfm)
+  -- Otherwise, handle other encoding needs via iconv.
+  | otherwise = case [toUpper c | c <- enc, c /= '-'] of
     "UTF8" -> return $ UTF8.mkUTF8 cfm
     "UTF16" -> return $ UTF16.mkUTF16 cfm
     "UTF16LE" -> return $ UTF16.mkUTF16le cfm
@@ -249,6 +256,11 @@ mkTextEncoding' cfm enc = case [toUpper c | c <- enc, c /= '-'] of
 #else
     _ -> Iconv.mkIconvEncoding cfm enc
 #endif
+  where
+    ansiEncNames = -- ASCII aliases
+      [ "ANSI_X3.4-1968", "iso-ir-6", "ANSI_X3.4-1986", "ISO_646.irv:1991"
+      , "US-ASCII", "us", "IBM367", "cp367", "csASCII", "ASCII", "ISO646-US"
+      ]
 
 latin1_encode :: CharBuffer -> Buffer Word8 -> IO (CharBuffer, Buffer Word8)
 latin1_encode input output = fmap (\(_why,input',output') -> (input',output')) $ Latin1.latin1_encode input output -- unchecked, used for char8