summaryrefslogtreecommitdiff
path: root/README.rst
blob: 25e55bd3790162d8bd2c351e6c2ff18beb0f1f04 (plain)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
rfc3986
=======

A Python implementation of `RFC 3986`_ including validation and authority 
parsing. Coming soon: `Reference Resolution <http://tools.ietf.org/html/rfc3986#section-5>`_.

Installation
------------

Simply use pip to install ``rfc3986`` like so::

    pip install rfc3986

License
-------

`Apache License Version 2.0`_

Example Usage
-------------

To parse a URI into a convenient named tuple, you can simply::

    from rfc3986 import uri_reference

    example = uri_reference('http://example.com')
    email = uri_reference('mailto:user@domain.com')
    ssh = uri_reference('ssh://user@git.openstack.org:29418/openstack/keystone.git')

With a parsed URI you can access data about the components::

    print(example.scheme)  # => http
    print(email.path)  # => user@domain.com
    print(ssh.userinfo)  # => user
    print(ssh.host)  # => git.openstack.org
    print(ssh.port)  # => 29418

It can also parse URIs with unicode present::

    uni = uri_reference(b'http://httpbin.org/get?utf8=\xe2\x98\x83')  # ☃
    print(uni.query)  # utf8=%E2%98%83

With a parsed URI you can also validate it::

    if ssh.is_valid():
        subprocess.call(['git', 'clone', ssh.unsplit()])

You can also take a parsed URI and normalize it::

    mangled = uri_reference('hTTp://exAMPLe.COM')
    print(mangled.scheme)  # => hTTp
    print(mangled.authority)  # => exAMPLe.COM

    normal = mangled.normalize()
    print(normal.scheme)  # => http
    print(mangled.authority)  # => example.com

But these two URIs are (functionally) equivalent::

    if normal == mangled:
        webbrowser.open(normal.unsplit())

Your paths, queries, and fragments are safe with us though::

    mangled = uri_reference('hTTp://exAMPLe.COM/Some/reallY/biZZare/pAth')
    normal = mangled.normalize()
    assert normal == 'hTTp://exAMPLe.COM/Some/reallY/biZZare/pAth'
    assert normal == 'http://example.com/Some/reallY/biZZare/pAth'
    assert normal != 'http://example.com/some/really/bizzare/path'

If you do not actually need a real reference object and just want to normalize
your URI::

    from rfc3986 import normalize_uri

    assert (normalize_uri('hTTp://exAMPLe.COM/Some/reallY/biZZare/pAth') ==
            'http://example.com/Some/reallY/biZZare/pAth')

You can also very simply validate a URI::

    from rfc3986 import is_valid_uri

    assert is_valid_uri('hTTp://exAMPLe.COM/Some/reallY/biZZare/pAth')

Requiring Components
~~~~~~~~~~~~~~~~~~~~

You can validate that a particular string is a valid URI and require
independent components::

    from rfc3986 import is_valid_uri

    assert is_valid_uri('http://localhost:8774/v2/resource',
                        require_scheme=True,
                        require_authority=True,
                        require_path=True)

    # Assert that a mailto URI is invalid if you require an authority
    # component
    assert is_valid_uri('mailto:user@example.com', require_authority=True) is False

If you have an instance of a ``URIReference``, you can pass the same arguments
to ``URIReference#is_valid``, e.g.,

.. code::

    from rfc3986 import uri_reference

    http = uri_reference('http://localhost:8774/v2/resource')
    assert uri.is_valid(require_scheme=True,
                        require_authority=True,
                        require_path=True)

    # Assert that a mailto URI is invalid if you require an authority
    # component
    mailto = uri_reference('mailto:user@example.com')
    assert uri.is_valid(require_authority=True) is False

Alternatives
------------

- `rfc3987 <https://pypi.python.org/pypi/rfc3987/1.3.4>`_

  This is a direct competitor to this library, with extra features,
  licensed under the GPL.

- `uritools <https://pypi.python.org/pypi/uritools/0.5.1>`_

  This can parse URIs in the manner of RFC 3986 but provides no validation and
  only recently added Python 3 support.

- Standard library's `urlparse`/`urllib.parse`

  The functions in these libraries can only split a URI (valid or not) and
  provide no validation.

Contributing
------------

This project follows and enforces the Python Software Foundation's `Code of
Conduct <https://www.python.org/psf/codeofconduct/>`_.

If you would like to contribute but do not have a bug or feature in mind, feel
free to email Ian and find out how you can help.

.. _RFC 3986: http://tools.ietf.org/html/rfc3986
.. _Apache License Version 2.0: https://www.apache.org/licenses/LICENSE-2.0