author | Reid Barton <rwbarton@gmail.com> | 2015-07-23 11:43:07 +0200 |
---|---|---|
committer | Ben Gamari <ben@smart-cactus.org> | 2015-07-23 11:43:08 +0200 |
commit | e78841b518ee9c0b92437899c3a4a2307dfd4ac8 (patch) | |
tree | 2e9362b2bb1cb874861bec85e4591209853145e8 | |
parent | 76e2341afdc050549067a18cac41373f64daf4c2 (diff) | |
Update encoding001 to test the full range of non-surrogate code points
GHC has used surrogate code points for roundtripping since 7.4.
See Note [Roundtripping].
Also, improve the wording of that Note slightly.
Test Plan: validate still passes
Reviewers: austin, hvr, bgamari
Reviewed By: bgamari
Subscribers: thomie
Differential Revision: https://phabricator.haskell.org/D1087
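The roundtripping scheme the commit message refers to can be illustrated with a minimal sketch. It assumes only what the Note states: an escaped byte is represented by a lone surrogate in the range 0xDC00 to 0xDCFF. The names `escapeByte` and `unescapeChar` are hypothetical and are not taken from GHC's sources; the real implementation may treat some byte values differently.

```haskell
import Data.Char (chr, ord)
import Data.Word (Word8)

-- Hypothetical helper (not GHC's API): represent an undecodable byte as a
-- lone surrogate in the range 0xDC00..0xDCFF described by the Note.
escapeByte :: Word8 -> Char
escapeByte b = chr (0xDC00 + fromIntegral b)

-- Hypothetical inverse: recover the original byte if the character is one
-- of the escape surrogates, otherwise report that it is ordinary text.
unescapeChar :: Char -> Maybe Word8
unescapeChar c
  | 0xDC00 <= x && x <= 0xDCFF = Just (fromIntegral (x - 0xDC00))
  | otherwise                  = Nothing
  where
    x = ord c
```

In GHCi, `unescapeChar (escapeByte 0xFF)` evaluates to `Just 255`, while `unescapeChar 'a'` evaluates to `Nothing`. Because the escapes live in the surrogate block rather than in a private-use block, no valid non-surrogate code point collides with them, which is why the test can now cover the whole range.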
-rw-r--r-- | libraries/base/GHC/IO/Encoding/Failure.hs | 9 |
-rw-r--r-- | libraries/base/tests/IO/encoding001.hs | 9 |
2 files changed, 6 insertions, 12 deletions
diff --git a/libraries/base/GHC/IO/Encoding/Failure.hs b/libraries/base/GHC/IO/Encoding/Failure.hs
index df5a99235a..3f9360d731 100644
--- a/libraries/base/GHC/IO/Encoding/Failure.hs
+++ b/libraries/base/GHC/IO/Encoding/Failure.hs
@@ -74,21 +74,22 @@ data CodingFailureMode
 -- unicode input that includes lone surrogate codepoints is invalid by
 -- definition.
 --
+--
 -- When we used private-use characters there was a technical problem when it
 -- came to encoding back to bytes using iconv. The iconv code will not fail when
 -- it tries to encode a private-use character (as it would if trying to encode
--- a surrogate), which means that we won't get a chance to replace it
+-- a surrogate), which means that we wouldn't get a chance to replace it
 -- with the byte we originally escaped.
 --
 -- To work around this, when filling the buffer to be encoded (in
 -- writeBlocks/withEncodedCString/newEncodedCString), we replaced the
 -- private-use characters with lone surrogates again! Likewise, when
--- reading from a buffer (unpack/unpack_nl/peekEncodedCString) we have
+-- reading from a buffer (unpack/unpack_nl/peekEncodedCString) we had
 -- to do the inverse process.
 --
 -- The user of String would never see these lone surrogates, but it
--- ensures that iconv will throw an error when encountering them. We
--- use lone surrogates in the range 0xDC00 to 0xDCFF for this purpose.
+-- ensured that iconv will throw an error when encountering them. We
+-- used lone surrogates in the range 0xDC00 to 0xDCFF for this purpose.

 codingFailureModeSuffix :: CodingFailureMode -> String
 codingFailureModeSuffix ErrorOnCodingFailure = ""
diff --git a/libraries/base/tests/IO/encoding001.hs b/libraries/base/tests/IO/encoding001.hs
index 9480abb09d..c92f8a3ef5 100644
--- a/libraries/base/tests/IO/encoding001.hs
+++ b/libraries/base/tests/IO/encoding001.hs
@@ -29,14 +29,7 @@ main = do
           chr (fromIntegral (x `shiftR` 8) .&. 0xff),
           chr (fromIntegral x .&. 0xff) ]
   hPutStr h (concatMap expand32 [ 0, 32 .. 0xD7ff ])
-  -- We avoid the private-use characters at 0xEF00..0xEFFF
-  -- that reserved for GHC's PEP383 roundtripping implementation.
-  --
-  -- The reason is that currently normal text containing those
-  -- characters will be mangled, even if we aren't using an encoding
-  -- created using //ROUNDTRIP.
-  hPutStr h (concatMap expand32 [ 0xE000, 0xE000+32 .. 0xEEFF ])
-  hPutStr h (concatMap expand32 [ 0xF000, 0xF000+32 .. 0x10FFFF ])
+  hPutStr h (concatMap expand32 [ 0xE000, 0xE000+32 .. 0x10FFFF ])
   hClose h

   -- convert the UTF-32BE file into each other encoding
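
The effect of the test change can be sketched in isolation: the updated test samples every 32nd code point on both sides of the surrogate block, no longer skipping 0xEF00..0xEFFF, and expands each one to four big-endian bytes. The names `sampledCodePoints`, `utf32be`, and `isSurrogate` below are hypothetical and chosen for illustration; the test itself uses a local `expand32` that produces `Char`s rather than `Word8`s.

```haskell
import Data.Bits (shiftR, (.&.))
import Data.Word (Word32, Word8)

-- Hypothetical name: every 32nd code point outside the surrogate block,
-- mirroring the two ranges written by the updated test.
sampledCodePoints :: [Word32]
sampledCodePoints = [0, 32 .. 0xD7FF] ++ [0xE000, 0xE000 + 32 .. 0x10FFFF]

-- Hypothetical name: the four big-endian bytes of a code point (UTF-32BE).
utf32be :: Word32 -> [Word8]
utf32be x = [ fromIntegral ((x `shiftR` s) .&. 0xff) | s <- [24, 16, 8, 0] ]

-- True for code points in the surrogate block 0xD800..0xDFFF.
isSurrogate :: Word32 -> Bool
isSurrogate x = 0xD800 <= x && x <= 0xDFFF
```

In GHCi, `any isSurrogate sampledCodePoints` evaluates to `False`, and `0xEF00 `elem` sampledCodePoints` evaluates to `True`, which is the region the old test had to avoid.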