author | Marc Abramowitz <marc@marc-abramowitz.com> | 2015-04-30 17:39:24 -0700 |
---|---|---|
committer | Marc Abramowitz <marc@marc-abramowitz.com> | 2015-04-30 17:39:24 -0700 |
commit | fa100c92c06d3a8a61a0dda1a2e06018437b09c6 (patch) | |
tree | a1cc50f93fbf257685c3849e03496c5e33949281 /tests/test_request_form.py | |
test_wsgirequest_charset: Use UTF-8 instead of iso-8859-1
because it seems that the de facto standard for encoding URIs is to use UTF-8.
I've been reading about URL encoding, and it seems that using an encoding
other than UTF-8 is highly non-standard and not well supported (this test
currently uses `iso-8859-1`).
From http://en.wikipedia.org/wiki/Percent-encoding:
> For a non-ASCII character, it is typically converted to its byte sequence in
> UTF-8, and then each byte value is represented as above.
> The generic URI syntax mandates that new URI schemes that provide for the
> representation of character data in a URI must, in effect, represent
> characters from the unreserved set without translation, and should convert
> all other characters to bytes according to UTF-8, and then percent-encode
> those values. This requirement was introduced in January 2005 with the
> publication of RFC 3986
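Python's standard library follows this convention. A minimal sketch using `urllib.parse.quote` and `unquote` (assuming Python 3.7 or later, where `~` is treated as unreserved); this illustrates stdlib behaviour, not anything in Paste itself:

```python
from urllib.parse import quote, unquote

# quote() encodes str input as UTF-8 by default, then percent-encodes
# every resulting byte that is not in its safe set.
print(quote("résumé"))               # r%C3%A9sum%C3%A9

# Unreserved ASCII characters (letters, digits, "-", ".", "_", "~")
# pass through without translation.
print(quote("a-b_c.d~e"))            # a-b_c.d~e

# unquote() also assumes UTF-8 by default, so the round trip is lossless.
print(unquote("r%C3%A9sum%C3%A9"))   # résumé
```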
From http://tools.ietf.org/html/rfc3986:
> Non-ASCII characters must first be encoded according to UTF-8 [STD63], and
> then each octet of the corresponding UTF-8 sequence must be percent-encoded
> to be represented as URI characters. URI producing applications must not use
> percent-encoding in host unless it is used to represent a UTF-8 character
> sequence.
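The two steps RFC 3986 describes (encode to UTF-8, then percent-encode each octet) can be written out by hand. A minimal sketch; `percent_encode_utf8` is a hypothetical helper for illustration, and because it also percent-encodes reserved characters such as `/`, it is only appropriate for a data component (for example a query value), not for a whole URI:

```python
from urllib.parse import quote

# RFC 3986 unreserved set: ALPHA / DIGIT / "-" / "." / "_" / "~"
UNRESERVED = frozenset(
    b"ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    b"abcdefghijklmnopqrstuvwxyz"
    b"0123456789-._~"
)

def percent_encode_utf8(text: str) -> str:
    """Encode text as UTF-8, then percent-encode every octet outside UNRESERVED."""
    out = []
    for octet in text.encode("utf-8"):
        if octet in UNRESERVED:
            out.append(chr(octet))
        else:
            out.append("%{:02X}".format(octet))
    return "".join(out)

print(percent_encode_utf8("résumé"))  # r%C3%A9sum%C3%A9
print(percent_encode_utf8("日本"))     # %E6%97%A5%E6%9C%AC
# Matches the stdlib when no extra "safe" characters are allowed.
print(percent_encode_utf8("a b/ü") == quote("a b/ü", safe=""))  # True
```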
From http://tools.ietf.org/html/rfc3987:
> Conversions from URIs to IRIs MUST NOT use any character encoding other than
> UTF-8 in steps 3 and 4, even if it might be possible to guess from the
> context that another character encoding than UTF-8 was used in the URI. For
> example, the URI "http://www.example.org/r%E9sum%E9.html" might with some
> guessing be interpreted to contain two e-acute characters encoded as
> iso-8859-1. It must not be converted to an IRI containing these e-acute
> characters. Otherwise, in the future the IRI will be mapped to
> "http://www.example.org/r%C3%A9sum%C3%A9.html", which is a different URI from
> "http://www.example.org/r%E9sum%E9.html".
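The RFC 3987 example can be reproduced with Python's `urllib.parse` (a stdlib illustration, not part of Paste). A strict UTF-8 decode of the legacy URI fails, while the iso-8859-1 guess yields an IRI whose UTF-8 re-encoding is a different URI. For simplicity the sketch quotes the whole URI string with `:` and `/` marked safe, which is an assumption that happens to hold for this particular URI:

```python
from urllib.parse import quote, unquote

legacy = "http://www.example.org/r%E9sum%E9.html"

# Decoding strictly as UTF-8 (as RFC 3987 requires) fails: a lone 0xE9
# byte is not a valid UTF-8 sequence.
try:
    unquote(legacy, errors="strict")
except UnicodeDecodeError as exc:
    print("not UTF-8:", exc.reason)

# Guessing iso-8859-1 recovers "résumé"...
guessed = unquote(legacy, encoding="iso-8859-1")
print(guessed)               # http://www.example.org/résumé.html

# ...but mapping that IRI back to a URI uses UTF-8, producing a different URI.
round_trip = quote(guessed, safe=":/")
print(round_trip)            # http://www.example.org/r%C3%A9sum%C3%A9.html
print(round_trip == legacy)  # False
```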
See issue #7, which I think this at least partially fixes.
Diffstat (limited to 'tests/test_request_form.py')
-rw-r--r-- | tests/test_request_form.py | 36 |
1 files changed, 36 insertions, 0 deletions
```diff
diff --git a/tests/test_request_form.py b/tests/test_request_form.py
new file mode 100644
index 0000000..cf43721
--- /dev/null
+++ b/tests/test_request_form.py
@@ -0,0 +1,36 @@
+import six
+
+from paste.request import *
+from paste.util.multidict import MultiDict
+
+def test_parse_querystring():
+    e = {'QUERY_STRING': 'a=1&b=2&c=3&b=4'}
+    d = parse_querystring(e)
+    assert d == [('a', '1'), ('b', '2'), ('c', '3'), ('b', '4')]
+    assert e['paste.parsed_querystring'] == (
+        (d, e['QUERY_STRING']))
+    e = {'QUERY_STRING': 'a&b&c=&d=1'}
+    d = parse_querystring(e)
+    assert d == [('a', ''), ('b', ''), ('c', ''), ('d', '1')]
+
+def make_post(body):
+    e = {
+        'CONTENT_TYPE': 'application/x-www-form-urlencoded',
+        'CONTENT_LENGTH': str(len(body)),
+        'REQUEST_METHOD': 'POST',
+        'wsgi.input': six.BytesIO(body),
+    }
+    return e
+
+def test_parsevars():
+    e = make_post(b'a=1&b=2&c=3&b=4')
+    #cur_input = e['wsgi.input']
+    d = parse_formvars(e)
+    assert isinstance(d, MultiDict)
+    assert d == MultiDict([('a', '1'), ('b', '2'), ('c', '3'), ('b', '4')])
+    assert e['paste.parsed_formvars'] == (
+        (d, e['wsgi.input']))
+    # XXX: http://trac.pythonpaste.org/pythonpaste/ticket/125
+    #assert e['wsgi.input'] is not cur_input
+    #cur_input.seek(0)
+    #assert e['wsgi.input'].read() == cur_input.read()
```
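For comparison, the effect of the charset choice on a form body like the one `make_post()` builds can be sketched with the standard library alone. `decode_form_body` is a hypothetical helper, not Paste's `parse_formvars`, and it assumes the request body is pure ASCII once percent-encoded:

```python
import io
from urllib.parse import parse_qsl, urlencode

def make_post(body: bytes) -> dict:
    # Same environ shape as the make_post() helper in the test file,
    # using io.BytesIO instead of six.BytesIO.
    return {
        "CONTENT_TYPE": "application/x-www-form-urlencoded",
        "CONTENT_LENGTH": str(len(body)),
        "REQUEST_METHOD": "POST",
        "wsgi.input": io.BytesIO(body),
    }

def decode_form_body(environ: dict, charset: str = "utf-8") -> list:
    """Hypothetical helper: read a urlencoded body and decode values with `charset`."""
    length = int(environ.get("CONTENT_LENGTH") or 0)
    raw = environ["wsgi.input"].read(length).decode("ascii")
    return parse_qsl(raw, keep_blank_values=True, encoding=charset)

# A client following RFC 3986 sends the UTF-8 octets, percent-encoded.
body = urlencode({"name": "résumé"}).encode("ascii")
print(body)                                # b'name=r%C3%A9sum%C3%A9'
print(decode_form_body(make_post(body)))   # [('name', 'résumé')]

# Decoding the same octets as iso-8859-1 produces mojibake.
print(decode_form_body(make_post(body), charset="iso-8859-1"))  # [('name', 'rÃ©sumÃ©')]
```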