author | Johan Lundberg <lundberg@sunet.se> | 2019-01-21 14:51:34 +0100 |
---|---|---|
committer | Ivan Kanakarakis <ivan.kanak@gmail.com> | 2019-01-25 15:47:00 +0200 |
commit | 56f75da775b01aac7eec18cad3ddd47976ab8312 (patch) | |
tree | 729401c3080af73de0f2d21bbd273ed57de9f545 /src/saml2/sigver.py | |
parent | fbff99e4d3cbd1b53150019f41d88654058bb751 (diff) | |
download | pysaml2-56f75da775b01aac7eec18cad3ddd47976ab8312.tar.gz |
Convert sign_statement result to native string
Using `lxml.etree.tostring` without an encoding in Python 3 results in an
unparsable XML document. To fix this, we always set the encoding to UTF-8 and
omit the XML declaration. We then convert the result to the native string type
before returning it.
---
Our preferred encoding (in general) is `utf-8`. `lxml` defaults to `ASCII`
unless we provide an encoding. Given an encoding, `lxml` serializes the tree
representation of the XML document into bytes using that encoding. If it is
directed to include an XML declaration, it records that encoding in the
declaration as the `encoding` property
(e.g., `<?xml version='1.0' encoding='iso-8859-7'?>`).
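As a small illustration of the behaviour described above, the following sketch serializes a tiny document with an explicit encoding and an XML declaration. The element name and text are made up for the example, and it assumes an `lxml` build whose libxml2 supports `iso-8859-7`.

```python
from lxml import etree

# Hypothetical sample document; the element name and text are illustrative only.
root = etree.fromstring("<greeting>γειά</greeting>")

# With an explicit encoding the result is bytes in that encoding, and the
# requested XML declaration names the same encoding.
serialized = etree.tostring(root, xml_declaration=True, encoding="iso-8859-7")

print(type(serialized))
print(serialized)
```

The text node is stored in the output as `iso-8859-7` bytes, so a consumer has to honour the declaration in order to read it back correctly.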
`lxml` allows for some _special_ values as an encoding.
- In python2 those are: `"unicode"` and `unicode`.
- In python3 those are: `"unicode"` and `str`.
When one of those values is specified, the result is _decoded_ from bytes to
unicode (`"unicode"` is not an actual encoding; the actual encoding will be
utf-8). In that case the _type_ of the result already tells the consumer how to
interpret it, which is why an XML declaration is not allowed: the result is not
bytes that have to be read according to some encoding rules, but decoded data
whose type dictates how it is handled.
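A minimal sketch of the special values under Python 3 follows. The sample document is hypothetical; the point is only that the `"unicode"` value yields a `str`, a real encoding yields `bytes`, and combining `"unicode"` with an XML declaration is rejected.

```python
from lxml import etree

# Hypothetical sample document with a non-ASCII character.
root = etree.fromstring("<name>Åsa</name>")

# Special value: the result is decoded text (str on Python 3), not bytes.
as_text = etree.tostring(root, encoding="unicode")

# Real encoding: the result is bytes in that encoding.
as_bytes = etree.tostring(root, encoding="UTF-8")

# Asking for an XML declaration together with the special value is an error.
try:
    etree.tostring(root, encoding="unicode", xml_declaration=True)
    declaration_rejected = False
except ValueError:
    declaration_rejected = True

print(type(as_text), type(as_bytes), declaration_rejected)
```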
With the latest changes, what we do is:
1. we always encode the result as UTF-8
2. we do not include an xml declaration (because of _(3)_)
3. we convert to the native string type (that is, `bytes`/`str` for Python 2,
and `str` for Python 3, the equivalent of `unicode` in Python 2)
Consumers should treat the result as UTF-8-encoded bytes in Python 2, and as a
decoded (unicode) string in Python 3.
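The three steps can be sketched outside of pysaml2 roughly as follows. The helper name `serialize_to_native_str` and the sample element are assumptions made for this illustration. The actual change checks against `six.string_types`, whereas this sketch checks against the built-in `str` and leaves out the signing step.

```python
from lxml import etree


def serialize_to_native_str(element):
    """Serialize an lxml element to the native string type (illustrative helper)."""
    # Steps 1 and 2: always encode as UTF-8 and omit the XML declaration.
    serialized = etree.tostring(element, xml_declaration=False, encoding="UTF-8")
    # Step 3: on Python 3 the result is bytes, so decode it to str.
    # On Python 2 the bytes result already is the native str type.
    if not isinstance(serialized, str):
        serialized = serialized.decode("utf-8")
    return serialized


# Hypothetical sample element; a signed statement would be handled the same way.
root = etree.fromstring("<Issuer>Malmö</Issuer>")
result = serialize_to_native_str(root)

# The native string can be parsed again without any encoding hints.
reparsed = etree.fromstring(result)
print(type(result), reparsed.text)
```

Because the returned string carries no XML declaration, there is no `encoding` property that could disagree with how the text is later encoded or transported.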
Signed-off-by: Ivan Kanakarakis <ivan.kanak@gmail.com>
Diffstat (limited to 'src/saml2/sigver.py')
-rw-r--r-- | src/saml2/sigver.py | 5 |
1 files changed, 4 insertions, 1 deletions
diff --git a/src/saml2/sigver.py b/src/saml2/sigver.py
index 6e9ebf9b..0541535a 100644
--- a/src/saml2/sigver.py
+++ b/src/saml2/sigver.py
@@ -957,7 +957,10 @@ class CryptoBackendXMLSecurity(CryptoBackend):
 
         xml = xmlsec.parse_xml(statement)
         signed = xmlsec.sign(xml, key_file)
-        return lxml.etree.tostring(signed, xml_declaration=True)
+        signed_str = lxml.etree.tostring(signed, xml_declaration=False, encoding="UTF-8")
+        if not isinstance(signed_str, six.string_types):
+            signed_str = signed_str.decode("utf-8")
+        return signed_str
 
     def validate_signature(self, signedtext, cert_file, cert_type, node_name, node_id, id_attr):
         """