test_wsgirequest_charset: Use UTF-8 instead of iso-8859-1

This is because UTF-8 seems to be the de facto standard for encoding URIs. I've been reading about URL encoding, and using any encoding other than UTF-8 appears to be very non-standard and not well supported (this test is trying to use `iso-8859-1`).

From http://en.wikipedia.org/wiki/Percent-encoding:

> For a non-ASCII character, it is typically converted to its byte sequence in
> UTF-8, and then each byte value is represented as above.
>
> The generic URI syntax mandates that new URI schemes that provide for the
> representation of character data in a URI must, in effect, represent
> characters from the unreserved set without translation, and should convert
> all other characters to bytes according to UTF-8, and then percent-encode
> those values. This requirement was introduced in January 2005 with the
> publication of RFC 3986.

From http://tools.ietf.org/html/rfc3986:

> Non-ASCII characters must first be encoded according to UTF-8 [STD63], and
> then each octet of the corresponding UTF-8 sequence must be percent-encoded
> to be represented as URI characters. URI producing applications must not use
> percent-encoding in host unless it is used to represent a UTF-8 character
> sequence.

From http://tools.ietf.org/html/rfc3987:

> Conversions from URIs to IRIs MUST NOT use any character encoding other than
> UTF-8 in steps 3 and 4, even if it might be possible to guess from the
> context that another character encoding than UTF-8 was used in the URI. For
> example, the URI "http://www.example.org/r%E9sum%E9.html" might with some
> guessing be interpreted to contain two e-acute characters encoded as
> iso-8859-1. It must not be converted to an IRI containing these e-acute
> characters. Otherwise, in the future the IRI will be mapped to
> "http://www.example.org/r%C3%A9sum%C3%A9.html", which is a different URI from
> "http://www.example.org/r%E9sum%E9.html".

See issue #7, which I think this commit at least partially fixes.
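To make the RFC 3987 example concrete, here is a minimal Python 3 sketch that uses only the standard library's `urllib.parse.quote` and `unquote` (nothing from Paste). It percent-encodes the same path once as UTF-8 and once as iso-8859-1, showing that the two produce different URIs and that a UTF-8 decoder cannot recover the iso-8859-1 form.

```python
from urllib.parse import quote, unquote

# The path from the RFC 3987 example, with e-acute written as U+00E9.
path = "/r\u00e9sum\u00e9.html"

# Standards-conforming form: encode to UTF-8 bytes, then percent-encode them.
utf8_uri = quote(path, encoding="utf-8")         # "/r%C3%A9sum%C3%A9.html"

# Non-standard form: encode to iso-8859-1 bytes, then percent-encode them.
latin1_uri = quote(path, encoding="iso-8859-1")  # "/r%E9sum%E9.html"

# unquote() decodes as UTF-8 by default, so only the UTF-8 form round-trips;
# the lone 0xE9 bytes are invalid UTF-8 and become U+FFFD replacement chars.
utf8_roundtrip = unquote(utf8_uri)
latin1_as_utf8 = unquote(latin1_uri)

# Recovering the original from the iso-8859-1 form requires already knowing
# (or guessing) the charset, which is exactly what RFC 3987 forbids.
latin1_guessed = unquote(latin1_uri, encoding="iso-8859-1")
```

Note that `utf8_uri` and `latin1_uri` are distinct strings, i.e. distinct URIs, even though they started from the same characters.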
author: Marc Abramowitz <marc@marc-abramowitz.com> 2015-04-30 17:39:24 -0700
committer: Marc Abramowitz <marc@marc-abramowitz.com> 2015-04-30 17:39:24 -0700
commit: fa100c92c06d3a8a61a0dda1a2e06018437b09c6
tree: a1cc50f93fbf257685c3849e03496c5e33949281 /tests/test_util/test_quoting.py
1 file changed, 28 insertions, 0 deletions
diff --git a/tests/test_util/test_quoting.py b/tests/test_util/test_quoting.py
new file mode 100644
index 0000000..5f5e0a8
--- /dev/null
+++ b/tests/test_util/test_quoting.py
@@ -0,0 +1,28 @@
+from paste.util import quoting
+import six
+import unittest
+
+class TestQuoting(unittest.TestCase):
+    def test_html_unquote(self):
+        self.assertEqual(quoting.html_unquote(b'&lt;hey&nbsp;you&gt;'),
+                         u'<hey\xa0you>')
+        self.assertEqual(quoting.html_unquote(b''),
+                         u'')
+        self.assertEqual(quoting.html_unquote(b'&blahblah;'),
+                         u'&blahblah;')
+        self.assertEqual(quoting.html_unquote(b'\xe1\x80\xa9'),
+                         u'\u1029')
+
+    def test_html_quote(self):
+        self.assertEqual(quoting.html_quote(1),
+                         '1')
+        self.assertEqual(quoting.html_quote(None),
+                         '')
+        self.assertEqual(quoting.html_quote('<hey!>'),
+                         '&lt;hey!&gt;')
+        if six.PY3:
+            self.assertEqual(quoting.html_quote(u'<\u1029>'),
+                             u'&lt;\u1029&gt;')
+        else:
+            self.assertEqual(quoting.html_quote(u'<\u1029>'),
+                             '&lt;\xe1\x80\xa9&gt;')
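For reference, here is a rough Python 3 approximation of the behaviour these tests pin down, built on the standard library's `html.escape` and `html.unescape`. The helper names `html_quote_sketch` and `html_unquote_sketch` are hypothetical; this is not how `paste.util.quoting` is implemented, and it only mirrors the `six.PY3` branch. The point it illustrates is that byte input is decoded as UTF-8, so `b'\xe1\x80\xa9'` comes back as `u'\u1029'`.

```python
import html
from typing import Union


def html_unquote_sketch(s: Union[bytes, str]) -> str:
    """Hypothetical helper: decode bytes as UTF-8, then resolve entities."""
    if isinstance(s, bytes):
        # Assumption: input bytes are UTF-8, matching the test expectations.
        s = s.decode("utf-8")
    # Known entities (&lt;, &nbsp;, ...) are replaced; unknown ones such as
    # "&blahblah;" are left untouched by html.unescape.
    return html.unescape(s)


def html_quote_sketch(v: object) -> str:
    """Hypothetical helper: stringify a value and escape HTML metacharacters."""
    if v is None:
        return ""
    if not isinstance(v, str):
        v = str(v)
    # Non-ASCII text passes through unchanged on Python 3.
    return html.escape(v)
```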