author Reid Barton <rwbarton@gmail.com> 2015-07-23 11:43:07 +0200
committer Ben Gamari <ben@smart-cactus.org> 2015-07-23 11:43:08 +0200
commit e78841b518ee9c0b92437899c3a4a2307dfd4ac8
tree 2e9362b2bb1cb874861bec85e4591209853145e8
parent 76e2341afdc050549067a18cac41373f64daf4c2
Update encoding001 to test the full range of non-surrogate code points

GHC has used surrogate code points for roundtripping since 7.4. See
Note [Roundtripping]. Also, improve the wording of that Note slightly.

Test Plan: validate still passes

Reviewers: austin, hvr, bgamari

Reviewed By: bgamari

Subscribers: thomie

Differential Revision: https://phabricator.haskell.org/D1087
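
The roundtripping scheme the commit message refers to can be sketched as follows. This is a minimal illustration, not the code in GHC.IO.Encoding.Failure: the names `escapeByte`, `unescapeChar`, `testedCodePoints`, and `isSurrogate` are hypothetical, and the choice to pass ASCII bytes through unescaped is an assumption of this sketch. It shows why the test can now cover the whole private-use area: escaped bytes live in the lone-surrogate range, which the tested code points never touch.

```haskell
import Data.Char (chr, ord)
import Data.Word (Word8)

-- Hypothetical helper: map an undecodable byte (>= 0x80) to a lone
-- low surrogate in 0xDC80..0xDCFF. ASCII bytes pass through unchanged
-- (an assumption of this sketch).
escapeByte :: Word8 -> Char
escapeByte b
  | b < 0x80  = chr (fromIntegral b)
  | otherwise = chr (0xDC00 + fromIntegral b)

-- Hypothetical inverse: recover the original byte from an escape
-- character, or Nothing if the character is not an escape.
unescapeChar :: Char -> Maybe Word8
unescapeChar c
  | 0xDC80 <= x && x <= 0xDCFF = Just (fromIntegral (x - 0xDC00))
  | otherwise                  = Nothing
  where x = ord c

-- The code points the updated test writes out: every 32nd code point
-- below the surrogate block, then every 32nd one above it.
testedCodePoints :: [Int]
testedCodePoints = [0, 32 .. 0xD7FF] ++ [0xE000, 0xE000 + 32 .. 0x10FFFF]

-- True for code points in the surrogate block 0xD800..0xDFFF.
isSurrogate :: Int -> Bool
isSurrogate n = 0xD800 <= n && n <= 0xDFFF

main :: IO ()
main = do
  -- Every non-ASCII byte survives an escape/unescape round trip.
  print (all (\b -> unescapeChar (escapeByte b) == Just b) [0x80 .. 0xFF])
  -- None of the tested code points is a surrogate.
  print (any isSurrogate testedCodePoints)
```

Under these assumptions the program prints `True` and then `False`.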
-rw-r--r-- libraries/base/GHC/IO/Encoding/Failure.hs 9
-rw-r--r-- libraries/base/tests/IO/encoding001.hs 9
2 files changed, 6 insertions, 12 deletions
diff --git a/libraries/base/GHC/IO/Encoding/Failure.hs b/libraries/base/GHC/IO/Encoding/Failure.hs
index df5a99235a..3f9360d731 100644
--- a/libraries/base/GHC/IO/Encoding/Failure.hs
+++ b/libraries/base/GHC/IO/Encoding/Failure.hs
@@ -74,21 +74,22 @@ data CodingFailureMode
-- unicode input that includes lone surrogate codepoints is invalid by
-- definition.
--
+--
-- When we used private-use characters there was a technical problem when it
-- came to encoding back to bytes using iconv. The iconv code will not fail when
-- it tries to encode a private-use character (as it would if trying to encode
--- a surrogate), which means that we won't get a chance to replace it
+-- a surrogate), which means that we wouldn't get a chance to replace it
-- with the byte we originally escaped.
--
-- To work around this, when filling the buffer to be encoded (in
-- writeBlocks/withEncodedCString/newEncodedCString), we replaced the
-- private-use characters with lone surrogates again! Likewise, when
--- reading from a buffer (unpack/unpack_nl/peekEncodedCString) we have
+-- reading from a buffer (unpack/unpack_nl/peekEncodedCString) we had
-- to do the inverse process.
--
-- The user of String would never see these lone surrogates, but it
--- ensures that iconv will throw an error when encountering them. We
--- use lone surrogates in the range 0xDC00 to 0xDCFF for this purpose.
+-- ensured that iconv will throw an error when encountering them. We
+-- used lone surrogates in the range 0xDC00 to 0xDCFF for this purpose.
codingFailureModeSuffix :: CodingFailureMode -> String
codingFailureModeSuffix ErrorOnCodingFailure = ""
diff --git a/libraries/base/tests/IO/encoding001.hs b/libraries/base/tests/IO/encoding001.hs
index 9480abb09d..c92f8a3ef5 100644
--- a/libraries/base/tests/IO/encoding001.hs
+++ b/libraries/base/tests/IO/encoding001.hs
@@ -29,14 +29,7 @@ main = do
chr (fromIntegral (x `shiftR` 8) .&. 0xff),
chr (fromIntegral x .&. 0xff) ]
hPutStr h (concatMap expand32 [ 0, 32 .. 0xD7ff ])
- -- We avoid the private-use characters at 0xEF00..0xEFFF
- -- that reserved for GHC's PEP383 roundtripping implementation.
- --
- -- The reason is that currently normal text containing those
- -- characters will be mangled, even if we aren't using an encoding
- -- created using //ROUNDTRIP.
- hPutStr h (concatMap expand32 [ 0xE000, 0xE000+32 .. 0xEEFF ])
- hPutStr h (concatMap expand32 [ 0xF000, 0xF000+32 .. 0x10FFFF ])
+ hPutStr h (concatMap expand32 [ 0xE000, 0xE000+32 .. 0x10FFFF ])
hClose h
-- convert the UTF-32BE file into each other encoding