This module contains functions for parsing and handling URIs (RFC 3986)
and form-urlencoded query strings.
Parsing and serializing non-UTF-8 form-urlencoded query strings are also supported.
A URI is an identifier consisting of a sequence of characters matching the syntax
rule named URI in RFC 3986.
The generic URI syntax consists of a hierarchical sequence of components referred to as the scheme, authority, path, query, and fragment:
    URI         = scheme ":" hier-part [ "?" query ] [ "#" fragment ]
    hier-part   = "//" authority path-abempty
                / path-absolute
                / path-rootless
                / path-empty
    scheme      = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." )
    authority   = [ userinfo "@" ] host [ ":" port ]
    userinfo    = *( unreserved / pct-encoded / sub-delims / ":" )

    reserved    = gen-delims / sub-delims
    gen-delims  = ":" / "/" / "?" / "#" / "[" / "]" / "@"
    sub-delims  = "!" / "$" / "&" / "'" / "(" / ")"
                / "*" / "+" / "," / ";" / "="

    unreserved  = ALPHA / DIGIT / "-" / "." / "_" / "~"
The interpretation of a URI depends only on the characters used and not on how those characters are represented in a network protocol.
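The functions implemented by this module cover the following use cases:
- Parsing URIs into their components and returning a map
- Recomposing a map of URI components into a URI string
- Resolving a URI reference against a base URI
- Changing the binary encoding and percent-encoding of URIs
- Transforming URIs into a normalized form
- Composing form-urlencoded query strings from a list of key-value pairs
- Dissecting form-urlencoded query strings into a list of key-value pairs
- Decoding percent-encoded triplets in a URI map or in a specific URI component
- Preparing and retrieving application-specific data included in URI components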
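There are four different encodings present during the handling of URIs:
- Inbound binary encoding in binaries
- Inbound percent-encoding in lists and binaries
- Outbound binary encoding in binaries
- Outbound percent-encoding in lists and binaries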
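Functions with a uri_string() argument accept lists, binaries, and mixed lists (lists with binary elements) as input.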
Unless otherwise specified, the return value type and encoding are the same as the input type and encoding: binary input returns binary output and list input returns list output, but mixed input returns list output.
Lists carry only a percent-encoding. Binaries, however, carry both a binary encoding
and a percent-encoding, and both must be considered.
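The type rule above can be checked with a small sketch. The module name `uri_types_demo` and the function `normalize_both/0` are hypothetical names used only for illustration; the sketch assumes `uri_string:normalize/1` accepts both lists and binaries, as described in this module.

```erlang
-module(uri_types_demo).
-export([normalize_both/0]).

%% Normalizes the same URI once as a list and once as a binary,
%% then reports the type of each result.
normalize_both() ->
    ListOut = uri_string:normalize("http://localhost:80"),
    BinOut = uri_string:normalize(<<"http://localhost:80">>),
    %% List input is expected to give a list, binary input a binary.
    {is_list(ListOut), is_binary(BinOut)}.
```

Both elements of the returned tuple should be `true` if the output type follows the input type.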
Quoting functions are intended to be used by URI-producing applications
during the component preparation or retrieval phase, to avoid conflicts between
application data and characters used in URI syntax. Quoting functions use
percent-encoding, but with different rules than, for example, recompose/1.
Quoting functions can, for instance, be used to construct a path
component with a segment containing a '/' character that must not collide with
the '/' used as the general delimiter in the path component.
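As a minimal sketch of the path-segment case described above, the hypothetical helper `item_path/1` (in a hypothetical module `quote_demo`) quotes application data before placing it into a path. It assumes an OTP release that provides `uri_string:quote/1`; the `"/items/"` prefix is an illustrative assumption.

```erlang
-module(quote_demo).
-export([item_path/1]).

%% Builds "/items/<segment>", where the segment is application data
%% that may itself contain '/' characters. Quoting the data first keeps
%% its '/' from being read as a path delimiter.
item_path(Id) ->
    "/items/" ++ uri_string:quote(Id).
```

With this sketch, `quote_demo:item_path("SomeId/04")` would produce `"/items/SomeId%2F04"`, and applying `uri_string:unquote/1` to the segment recovers the original data.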
Error tuple indicating the type of error. The second component is an atom naming the error type.
The third component is a term providing additional information about the cause of the error.
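One way to handle the three-element error tuple is to match on it explicitly. The wrapper below is only a sketch; `uri_error_demo` and `safe_parse/1` are hypothetical names, not part of this module.

```erlang
-module(uri_error_demo).
-export([safe_parse/1]).

%% Wraps uri_string:parse/1 so that callers get a tagged result.
%% Reason is the error type, Info the additional term from the tuple.
safe_parse(URI) ->
    case uri_string:parse(URI) of
        {error, Reason, Info} ->
            {error, {Reason, Info}};
        Map when is_map(Map) ->
            {ok, Map}
    end.
```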
Map holding the main components of a URI.
List of unicode codepoints, a UTF-8 encoded binary, or a mix of the two,
representing an RFC 3986 compliant URI (percent-encoded form).
This is a utility function meant to be used in the shell for printing
the allowed characters in each
major URI component, and also in the most important character sets.
Note that this function does not replace the ABNF rules defined by
the standards; these character sets are derived directly from those
rules. For more information, see the Uniform Resource Identifiers chapter in the STDLIB User's Guide.
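Composes a form-urlencoded QueryString based on a QueryList of key-value pairs.
See also the opposite operation dissect_query/1.
Example:
    1> uri_string:compose_query([{"foo bar","1"},{"city","örebro"}]).
    "foo+bar=1&city=%C3%B6rebro"
    2> uri_string:compose_query([{<<"foo bar">>,<<"1">>},
    2> {<<"city">>,<<"örebro"/utf8>>}]).
    <<"foo+bar=1&city=%C3%B6rebro">>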
Same as compose_query/1, but with an additional Options parameter that controls the encoding used by the encoder.
Each character in the entry's name and value that cannot be expressed using the selected character encoding is replaced by a string consisting of a U+0026 AMPERSAND character (&), a "#" (U+0023) character, one or more ASCII digits representing the Unicode code point of the character in base ten, and finally a ";" (U+003B) character.
Bytes other than 0x2A, 0x2D, 0x2E, 0x30 to 0x39, 0x41 to 0x5A, 0x5F, and 0x61 to 0x7A are percent-encoded (a U+0025 PERCENT SIGN character (%) followed by uppercase ASCII hex digits representing the hexadecimal value of the byte).
See also the opposite operation dissect_query/1.
Example:
    1> uri_string:compose_query([{"foo bar","1"},{"city","örebro"}],
    1> [{encoding, latin1}]).
    "foo+bar=1&city=%F6rebro"
    2> uri_string:compose_query([{<<"foo bar">>,<<"1">>},
    2> {<<"city">>,<<"東京"/utf8>>}], [{encoding, latin1}]).
    <<"foo+bar=1&city=%26%2326481%3B%26%2320140%3B">>
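Dissects a form-urlencoded QueryString and returns a QueryList containing name-value pairs, where both name and value are decoded.
See also the opposite operation compose_query/1.
Example:
    1> uri_string:dissect_query("foo+bar=1&city=%C3%B6rebro").
    [{"foo bar","1"},{"city","örebro"}]
    2> uri_string:dissect_query(<<"foo+bar=1&city=%26%2326481%3B%26%2320140%3B">>).
    [{<<"foo bar">>,<<"1">>},
     {<<"city">>,<<230,157,177,228,186,172>>}]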
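Dissecting a query string is often combined with parsing a whole URI. The following is a sketch under that assumption; `query_demo` and `param/2` are hypothetical names, and returning `undefined` for every failure case is an illustrative design choice rather than anything prescribed by this module.

```erlang
-module(query_demo).
-export([param/2]).

%% Returns the decoded value stored under Key in the query of URI,
%% or undefined if the URI has no query, cannot be parsed, or the
%% query string cannot be dissected.
param(URI, Key) ->
    case uri_string:parse(URI) of
        #{query := Query} ->
            case uri_string:dissect_query(Query) of
                Pairs when is_list(Pairs) ->
                    proplists:get_value(Key, Pairs);
                {error, _Reason, _Info} ->
                    undefined
            end;
        _NoQueryOrError ->
            undefined
    end.
```

For example, `query_demo:param("http://example.com/search?q=foo+bar&page=2", "q")` would return `"foo bar"`, since "+" is decoded to a space as in the example above.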
Transforms a URI into a normalized form using syntax-based normalization as defined by RFC 3986.
This function implements case normalization, percent-encoding normalization, path segment normalization, and scheme-based normalization for HTTP(S), with basic support for FTP, SSH, SFTP, and TFTP.
Example:
    1> uri_string:normalize("/a/b/c/./../../g").
    "/a/g"
    2> uri_string:normalize(<<"mid/content=5/../6">>).
    <<"mid/6">>
    3> uri_string:normalize("http://localhost:80").
    "http://localhost/"
    4> uri_string:normalize(#{scheme => "http",port => 80,path => "/a/b/c/./../../g",
    4> host => "localhost-örebro"}).
    "http://localhost-%C3%B6rebro/a/g"
Same as normalize/1, but with an additional Options parameter that controls whether the normalized URI is returned as a map. There is one supported option: return_map.
Example:
    1> uri_string:normalize("/a/b/c/./../../g", [return_map]).
    #{path => "/a/g"}
    2> uri_string:normalize(<<"mid/content=5/../6">>, [return_map]).
    #{path => <<"mid/6">>}
    3> uri_string:normalize("http://localhost:80", [return_map]).
    #{scheme => "http",path => "/",host => "localhost"}
    4> uri_string:normalize(#{scheme => "http",port => 80,path => "/a/b/c/./../../g",
    4> host => "localhost-örebro"}, [return_map]).
    #{scheme => "http",path => "/a/g",host => "localhost-örebro"}
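Parses an RFC 3986 compliant URI string into a map that holds the parsed components of the URI. If parsing fails, an error tuple is returned.
See also the opposite operation recompose/1.
Example:
    1> uri_string:parse("foo://user@example.com:8042/over/there?name=ferret#nose").
    #{fragment => "nose",host => "example.com",
      path => "/over/there",port => 8042,query => "name=ferret",
      scheme => "foo",userinfo => "user"}
    2> uri_string:parse(<<"foo://user@example.com:8042/over/there?name=ferret">>).
    #{host => <<"example.com">>,path => <<"/over/there">>,
      port => 8042,query => <<"name=ferret">>,scheme => <<"foo">>,
      userinfo => <<"user">>}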
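Because parsing yields an ordinary map, a component can be replaced with map update syntax before the URI is put back together with recompose/1. The sketch below assumes that workflow; `recompose_demo` and `set_query/2` are hypothetical names.

```erlang
-module(recompose_demo).
-export([set_query/2]).

%% Replaces the query component of URI with NewQuery and returns the
%% resulting URI string. A parse error is passed through unchanged.
set_query(URI, NewQuery) ->
    case uri_string:parse(URI) of
        {error, _Reason, _Info} = Error ->
            Error;
        Map ->
            uri_string:recompose(Map#{query => NewQuery})
    end.
```

For example, `recompose_demo:set_query("http://example.com/a?x=1", "y=2")` would return `"http://example.com/a?y=2"`.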
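Decodes all percent-encoded triplets in the input, which can be either a URI string or a URI map. This function performs raw decoding and is meant to be used on already parsed URI components.
If the input encoding is not UTF-8, an error tuple is returned.
Example:
    1> uri_string:percent_decode(#{host => "localhost-%C3%B6rebro",path => [],
    1> scheme => "http"}).
    #{host => "localhost-örebro",path => [],scheme => "http"}
    2> uri_string:percent_decode(<<"%C3%B6rebro">>).
    <<"örebro"/utf8>>
Using percent_decode/1 directly on a whole URI is not safe. Each consecutive application of the function changes the resulting URI, and none of these URIs refer to the same resource:
    3> uri_string:percent_decode(<<"http://local%252Fhost/path">>).
    <<"http://local%2Fhost/path">>
    4> uri_string:percent_decode(<<"http://local%2Fhost/path">>).
    <<"http://local/host/path">>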
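A safer pattern, sketched here under the assumption that only one component needs decoding, is to parse the URI first and decode that component alone. `decode_demo` and `decoded_path/1` are hypothetical names.

```erlang
-module(decode_demo).
-export([decoded_path/1]).

%% Parses URI and percent-decodes only its path component, so that
%% decoded characters cannot change how the URI is split into parts.
decoded_path(URI) ->
    case uri_string:parse(URI) of
        {error, _Reason, _Info} = Error ->
            Error;
        #{path := Path} ->
            uri_string:percent_decode(Path)
    end.
```

For example, `decode_demo:decoded_path("http://example.com/a%20b")` would return `"/a b"`.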
Replaces characters outside the unreserved set with their percent-encoded equivalents.
Unreserved characters, as defined in RFC 3986, are not quoted.
Example:
    1> uri_string:quote("SomeId/04").
    "SomeId%2F04"
    2> uri_string:quote(<<"SomeId/04">>).
    <<"SomeId%2F04">>
This function is not aware of any URI component context and should not be used on a whole URI. Applying it more than once to the same data might produce unexpected results.
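Same as quote/1, but with an additional argument that lists characters that are not to be quoted.
Example:
    1> uri_string:quote("SomeId/04", "/").
    "SomeId/04"
    2> uri_string:quote(<<"SomeId/04">>, "/").
    <<"SomeId/04">>
This function is not aware of any URI component context and should not be used on a whole URI. Applying it more than once to the same data might produce unexpected results.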
Creates an RFC 3986 compliant URI string (percent-encoded) based on the components of a URI map. If the map is invalid, an error tuple is returned.
See also the opposite operation parse/1.
Example:
    1> URIMap = #{fragment => "nose", host => "example.com", path => "/over/there",
    1> port => 8042, query => "name=ferret", scheme => "foo", userinfo => "user"}.
    #{fragment => "nose",host => "example.com",
      path => "/over/there",port => 8042,query => "name=ferret",
      scheme => "foo",userinfo => "user"}
    2> uri_string:recompose(URIMap).
    "foo://user@example.com:8042/over/there?name=ferret#nose"
Converts a URI reference, which might be relative to a given base URI, into the reference's target URI.
Example:
    1> uri_string:resolve("/abs/ol/ute", "http://localhost/a/b/c?q").
    "http://localhost/abs/ol/ute"
    2> uri_string:resolve("../relative", "http://localhost/a/b/c?q").
    "http://localhost/a/relative"
    3> uri_string:resolve("http://localhost/full", "http://localhost/a/b/c?q").
    "http://localhost/full"
    4> uri_string:resolve(#{path => "path", query => "xyz"}, "http://localhost/a/b/c?q").
    "http://localhost/a/b/path?xyz"
Same as resolve/2, but with an additional Options parameter that controls whether the target URI is returned as a map. There is one supported option: return_map.
Example:
    1> uri_string:resolve("/abs/ol/ute", "http://localhost/a/b/c?q", [return_map]).
    #{host => "localhost",path => "/abs/ol/ute",scheme => "http"}
    2> uri_string:resolve(#{path => "/abs/ol/ute"}, #{scheme => "http",
    2> host => "localhost", path => "/a/b/c?q"}, [return_map]).
    #{host => "localhost",path => "/abs/ol/ute",scheme => "http"}
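Transcodes an RFC 3986 compliant URI string, where Options is a list of tagged tuples specifying the inbound (in_encoding) and outbound (out_encoding) encodings.
Example:
    1> uri_string:transcode(<<"foo%00%00%00%F6bar"/utf32>>,
    1> [{in_encoding, utf32},{out_encoding, utf8}]).
    <<"foo%C3%B6bar"/utf8>>
    2> uri_string:transcode("foo%F6bar", [{in_encoding, latin1},
    2> {out_encoding, utf8}]).
    "foo%C3%B6bar"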
Percent-decodes characters.
Example:
    1> uri_string:unquote("SomeId%2F04").
    "SomeId/04"
    2> uri_string:unquote(<<"SomeId%2F04">>).
    <<"SomeId/04">>
This function is not aware of any URI component context and should not be used on a whole URI. Applying it more than once to the same data might produce unexpected results.