path: root/rdflib/util.py
Commit message | Author | Age | Files | Lines
* fix: IRI to URI conversion (#2304) | Iwan Aucamp | 2023-03-23 | 1 | -15/+75
| The IRI to URI conversion was percent-quoting characters that should not have been quoted, like equals in the query string. It was also not quoting things that should have been quoted, like the username and password components of a URI.
| This change improves the conversion by only quoting characters that are not allowed in specific parts of the URI and quoting previously unquoted components. The safe characters for each segment are taken from [RFC3986](https://datatracker.ietf.org/doc/html/rfc3986).
| The new behavior is heavily inspired by [`werkzeug.urls.iri_to_uri`](https://github.com/pallets/werkzeug/blob/92c6380248c7272ee668e1f8bbd80447027ccce2/src/werkzeug/urls.py#L926-L931) though there are some differences.
| - Closes <https://github.com/RDFLib/rdflib/issues/2120>.
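The per-component quoting described in the commit above can be sketched roughly as follows. This is a minimal illustration using only the standard library, not RDFLib's actual `_iri2uri` code: the name `iri_to_uri_sketch` and the exact safe-character sets are assumptions, and the authority part (user, password, host, port) is deliberately left untouched.

```python
from urllib.parse import quote, urlsplit

# Characters RFC 3986 allows unescaped in a path segment ("pchar"), beyond the
# letters, digits and "-._~" that quote() never escapes. "%" is kept so that
# existing percent-escapes are not quoted a second time.
_PCHAR_EXTRA = "!$&'()*+,;=:@%"
_PATH_SAFE = _PCHAR_EXTRA + "/"
_QUERY_SAFE = _PCHAR_EXTRA + "/?"  # query and fragment also allow "/" and "?"


def iri_to_uri_sketch(iri: str) -> str:
    """Hypothetical helper: percent-encode only what each component disallows."""
    parts = urlsplit(iri)
    return parts._replace(
        path=quote(parts.path, safe=_PATH_SAFE),
        query=quote(parts.query, safe=_QUERY_SAFE),
        fragment=quote(parts.fragment, safe=_QUERY_SAFE),
    ).geturl()


# The "=" in the query string survives; the non-ASCII path is UTF-8 encoded.
print(iri_to_uri_sketch("http://example.com/resource/日本?q=a=b#frag"))
```

Using a different safe set per component is what keeps `=` unescaped in the query while still escaping characters that are not allowed there.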
* feat: add typing to `rdflib.util` (#2262) | Iwan Aucamp | 2023-03-12 | 1 | -16/+33
| Mainly so that users can use RDFLib in a safer way, and so that we can make safer changes to RDFLib in future. There are also some accommodating type-hint related changes outside of `rdflib.util`. This change does not have a runtime impact.
* fix: small InputSource related issues (#2255) | Iwan Aucamp | 2023-03-11 | 1 | -1/+2
| I have added a bunch of tests for `InputSource` handling, checking most kinds of input source with most parsers. During this, I detected the following issues that I fixed:
| - `rdflib.util._iri2uri()` was URL quoting the `netloc` parameter, but this is wrong and the `idna` encoding already takes care of special characters. I removed the URL quoting of `netloc`.
| - HexTuple parsing was handling the input source in a way that would only work for some input sources, and not raising errors for other input sources. I changed the input source handling to be more generic.
| - `rdflib.parser.create_input_source()` incorrectly used `file.buffer` instead of `source.buffer` when dealing with IO stream sources.
| Other changes with no runtime impact include:
| - Changed the HTTP mocking stuff in test slightly to accommodate serving arbitrary files, as I used this in the `InputSource` tests.
| - Don't use Google in tests, as we keep getting `urllib.error.HTTPError: HTTP Error 429: Too Many Requests` from it.
* fix: handling of Literal datatype (#2076) | Iwan Aucamp | 2022-08-12 | 1 | -31/+57
| Check datatype against `None` instead of checking its truthiness (i.e. `if datatype is not None:` instead of `if datatype:`). Checking truthiness instead of `is not None` causes a blank string to be treated the same as `None`. The consequence of this was that `Literal.datatype` could be a `str`, a `URIRef` or `None`, instead of just a `URIRef` or `None` as was seemingly intended.
| Other changes:
| - Changed the type of `Literal.datatype` to be `Optional[URIRef]` instead of `Optional[str]` now that `str` will always be converted to `URIRef` even if it is a blank string.
| - Changed `rdflib.util._coalesce` to make it easier and safer to use with a non-`None` default value.
| - Changed `rdflib.util` to avoid issues with circular imports.
* More type hints for `rdflib.graph` and related (#1853) | Iwan Aucamp | 2022-05-26 | 1 | -9/+8
| This patch primarily adds more type hints for `rdflib.graph`, but also adds type hints to some related modules in order to work with the new type hints for `rdflib.graph`. I'm mainly doing this as a baseline for adding type hints to `rdflib.store`.
| I have created type aliases to make it easier to type everything consistently and to make type hints easier to change in the future. The type aliases are private however (i.e. `_`-prefixed) and should be kept as such for now.
| This patch only contains typing changes and does not change runtime behavior.
| Broken off from https://github.com/RDFLib/rdflib/pull/1850
* [pre-commit.ci] auto fixes from pre-commit.com hooks | pre-commit-ci[bot] | 2022-05-19 | 1 | -3/+3
| for more information, see https://pre-commit.ci
* Fixes #1429, add `iri2uri` (#1902) | Graham Higgins | 2022-05-19 | 1 | -0/+35
| Add an iri-to-uri conversion utility to encode IRIs to URIs for `Graph.parse()` sources. Added a couple of tests because feeding it with a suite of IRIs to check seems overkill (not that I could find one).
| Fixes #1429
| Co-authored-by: Iwan Aucamp <aucampia@gmail.com>
* feat: add tests and typing for `rdflib.util.{get_tree,find_roots}` (#1935) | Iwan Aucamp | 2022-05-15 | 1 | -8/+32
| I was unsure what exactly these functions do and whether they are useful for what I wanted to do, so I added some tests to check the current behavior.
| Also:
| - Add rdftest.ttl to `test/data/defined_namespaces/` using fetcher. I'm using it to test `get_tree` and `find_roots`.
* Remove testing and debug code from rdflib | Iwan Aucamp | 2022-04-19 | 1 | -20/+0
| This patch removes code from `rdflib/` that does not seem like it belongs in `rdflib/`. Most of it is related to doctest; some of it belongs in `test/` and was moved to `test/test_misc/test_collection.py`; and yet more of it seems to just be there for debugging purposes, though it would possibly be better to put that in a separate place if it is needed again, or to debug using tests if possible.
| Other changes:
| - Removed an invocation of `rdflib.util.test` from `test_util.py`. This seems like an attempt to invoke doctest; however, pytest takes care of that, so this is not needed.
* Fixes, improvements and test for namespace (re)-binding on stores. | Iwan Aucamp | 2022-04-19 | 1 | -1/+25
| - Added store specific tests for namespace binding and rebinding.
| - Copied @gjhiggins fix for `rdflib.plugins.stores.memory.Memory.bind` to `rdflib.plugins.stores.memory.SimpleMemory.bind`.
| - Changed `bind` on `Memory`, `SimpleMemory`, `BerkeleyDB` stores to explicitly check for `is not None`/`is None` instead of falsey/truthy.
| - Added `rdflib.util._coalesce` to do null coalescing, for more info see deferred PEP-505 https://peps.python.org/pep-0505/.
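The difference between a truthiness check and an explicit `is not None` check, and the kind of null coalescing the commit above refers to, can be sketched as follows. The `coalesce` function here is a hypothetical stand-in written for illustration; it is not claimed to match the signature or body of the private `rdflib.util._coalesce`.

```python
from typing import Optional, TypeVar

T = TypeVar("T")


def coalesce(*values: Optional[T], default: Optional[T] = None) -> Optional[T]:
    """Return the first value that is not None, else the default."""
    for value in values:
        if value is not None:
            return value
    return default


# An empty string is falsy but is not None, so the two checks disagree.
prefix = ""
print(repr(prefix or "fallback"))          # 'fallback': truthiness discards ""
print(repr(coalesce(prefix, "fallback")))  # '': the None check keeps ""
```

This matters for namespace binding because an empty prefix is a legitimate value there and should not be treated as missing.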
* [pre-commit.ci] auto fixes from pre-commit.com hooks | pre-commit-ci[bot] | 2022-04-15 | 1 | -13/+5
| for more information, see https://pre-commit.ci
* Remove `(TypeCheck|SubjectType|PredicateType|ObjectType)Error` and related | Iwan Aucamp | 2022-04-14 | 1 | -68/+0
| Also remove `check_(context|subject|predicate|object|statement|pattern)`.
| It seems nothing is using these exceptions and functions. Technically this does remove parts of the public API, but I would argue they are "buggy" parts, as anything that uses them would be sorely disappointed to find that the behaviour is not as expected, and these functions/exceptions don't serve any function related to the aim of RDFLib unless they are integrated with the rest of RDFLib.
* Move tests into test_parser_turtlelike.py | Iwan Aucamp | 2022-04-02 | 1 | -11/+5
| Also refactor as per suggestion from @nicholascar
* blacked | Graham Higgins | 2022-03-23 | 1 | -1/+7
|
* Fix for issue 1768, from_n3 handling numeric shortcuts | Graham Higgins | 2022-03-23 | 1 | -3/+14
|
* add nquads to recognised file extensions | Graham Higgins | 2022-01-08 | 1 | -0/+1
|
* Make util.guess_format() pass with mypy-strict-reviewed consumers | Alex Nelson | 2021-12-16 | 1 | -1/+2
| Signed-off-by: Alex Nelson <alexander.nelson@nist.gov>
* guess_format() cater for JSON-LD files ending .json-ld | nicholascar | 2021-12-01 | 1 | -0/+1
|
* Add JSON-LD to guess_format() | Alex Nelson | 2021-09-07 | 1 | -0/+4
| Now that the `rdflib` package provides the functionality formerly in the package `rdflib-jsonld`, it would be helpful for the utility function `guess_format` to recognize file extensions indicative of JSON-LD content.
| This patch adds recognition of two file extensions, preserving syntax of the `SUFFIX_FORMAT_MAP` variable:
| * `.json`
| * `.jsonld`
| The latter is in recognition of the IANA media type `application/ld+json` reporting the `.jsonld` file extension.
| References:
| https://www.iana.org/assignments/media-types/application/ld+json
| via https://www.iana.org/assignments/media-types/media-types.xhtml
| via https://w3c.github.io/json-ld-syntax/#iana-considerations
| Signed-off-by: Alex Nelson <alexander.nelson@nist.gov>
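The suffix-based lookup that the `guess_format` commits above extend can be pictured with a small self-contained sketch. Everything here is an assumption made for illustration: `SUFFIX_FORMAT_SKETCH` is a tiny stand-in for the real `SUFFIX_FORMAT_MAP`, the format names are assumed, and `guess_format_sketch` is a hypothetical function, not the RDFLib implementation.

```python
from pathlib import PurePath
from typing import Dict, Optional

# Illustrative subset only; the real SUFFIX_FORMAT_MAP in rdflib.util has
# more entries, and the format names used here are assumptions.
SUFFIX_FORMAT_SKETCH: Dict[str, str] = {
    "ttl": "turtle",
    "nq": "nquads",
    "json": "json-ld",
    "jsonld": "json-ld",
}


def guess_format_sketch(
    path: str, fmap: Optional[Dict[str, str]] = None
) -> Optional[str]:
    """Hypothetical stand-in: map a file suffix to a parser format name."""
    fmap = SUFFIX_FORMAT_SKETCH if fmap is None else fmap
    # Take the text after the last dot, case-insensitively.
    suffix = PurePath(path).suffix.lstrip(".").lower()
    return fmap.get(suffix)  # None when the suffix is not recognised


print(guess_format_sketch("data/graph.jsonld"))
print(guess_format_sketch("notes.txt"))
```

Supporting a new extension is then a one-line addition to the mapping, which fits the small line counts of these commits.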
* blacked everything (tag: 6.0.0) | Nicholas Car | 2021-07-20 | 1 | -3/+3
|
* Prevent `from_n3` from unescaping `\xhh` | Iwan Aucamp | 2021-06-26 | 1 | -1/+4
| This is a fairly pragmatic fix to a problem which should be solved by changing `from_n3` to do the same as the actual n3/turtle parser.
| There are still many issues with this function, some of which I added tests for.
* re-run black | Ashley Sommer | 2020-08-27 | 1 | -2/+2
|
* Merge remote-tracking branch 'origin/master' into t0b3_master | Ashley Sommer | 2020-08-27 | 1 | -1/+1
|\
| | # Conflicts: rdflib/namespace.py, rdflib/parser.py, rdflib/plugins/memory.py, rdflib/plugins/parsers/ntriples.py, test/test_iomemory.py
| * improved Graph().parse() | Nicholas Car | 2020-08-14 | 1 | -1/+1
| |
* | 2to3 whole sourcebase | t0b3 | 2020-06-22 | 1 | -3/+0
|/
| Signed-off-by: t0b3 <thomas.bettler@gmail.com>
* Merge remote-tracking branch 'upstream/master' into autodetect-parse-format | Donny Winston | 2020-05-26 | 1 | -53/+60
|\
| * changes for flake8 | Nicholas Car | 2020-05-17 | 1 | -2/+2
| |
| * blacked all python files | Nicholas Car | 2020-05-16 | 1 | -55/+62
| |
* | Remove 'as' for import | Donny Winston | 2020-05-12 | 1 | -5/+5
| |
* | refactor imports; fix try block | Donny Winston | 2020-05-12 | 1 | -6/+5
|/
* fixed URIRef including native unicode characters | Kempei Igarashi | 2020-02-21 | 1 | -1/+3
| URIRef should be able to include native unicode, so that <http://ja.dbpedia.org/resource/日本> is equal to <http://ja.dbpedia.org/resource/\\u65e5\\u672c>. This adds the same code as at lines 185-187.
* a slightly opinionated autopep8 run | Gunnar Aastrand Grimnes | 2018-10-30 | 1 | -5/+1
| The opinion is mainly: no to long lines, but not at any cost.
| notation3.py crashes autopep :D
| Also rdflib/__init__.py gets completely broken
* moved all compat code to rdflib.compat (branch: six_2to3) | Gunnar Aastrand Grimnes | 2017-01-31 | 1 | -1/+1
|
* removed most of the six import from py3compat | Gunnar Aastrand Grimnes | 2017-01-30 | 1 | -1/+1
| now six is used throughout.
* six: util.py: cleanup unused StringIO | Joern Hees | 2017-01-30 | 1 | -1/+0
|
* six: util.py: headers, StringIO | Joern Hees | 2017-01-30 | 1 | -2/+5
|
* fix double reduction of \ escapes in from_n3 | Joern Hees | 2015-11-22 | 1 | -1/+1
| the reduction is actually already performed by .decode("unicode-escape").
* util.from_n3() allows specifying a NamespaceManager to parse CURIEs | Joern Hees | 2015-07-27 | 1 | -8/+24
|
* util.from_n3() now correctly parses literals with datatypes, see #502 | Joern Hees | 2015-07-27 | 1 | -1/+3
|
* Fix mapping of ttl to turtle for guess_format | Niklas Lindström | 2015-02-04 | 1 | -2/+2
|
* doc updates | Gunnar Aastrand Grimnes | 2013-05-09 | 1 | -1/+1
|
* made query res pprinter into a serializer | Gunnar Aastrand Grimnes | 2013-05-09 | 1 | -56/+0
|
* cleanup - move stuff from rdfextras to sensible packages - entry_points for console scripts | Gunnar Aastrand Grimnes | 2013-05-03 | 1 | -1/+206
|
* minor cleanup - some docstrings | Gunnar Aastrand Grimnes | 2013-03-28 | 1 | -0/+4
|
* Conform to PEP8 | Graham Higgins | 2013-02-07 | 1 | -18/+7
|
* apply autopep8 standards. | Graham Higgins | 2013-01-11 | 1 | -28/+53
| $ flake8 rdflib --exclude=pyRdfa,host,extras,transform,rdfs,pyMicrodata
| rdflib/graph.py:192: W801 redefinition of unused 'BytesIO' from line 189
| rdflib/graph.py:194: W402 'BytesIO' imported but unused
| rdflib/graph.py:680:80: E501 line too long (80 > 79 characters)
| rdflib/graph.py:682:80: E501 line too long (80 > 79 characters)
| rdflib/graph.py:686:80: E501 line too long (83 > 79 characters)
| rdflib/graph.py:690:80: E501 line too long (83 > 79 characters)
| rdflib/graph.py:692:80: E501 line too long (83 > 79 characters)
| rdflib/graph.py:695:80: E501 line too long (83 > 79 characters)
| rdflib/graph.py:698:80: E501 line too long (83 > 79 characters)
| rdflib/parser.py:21: W801 redefinition of unused 'BytesIO' from line 19
| rdflib/compat.py:12: W801 redefinition of unused 'defaultdict' from line 10
| rdflib/py3compat.py:10: W801 redefinition of unused 'wraps' from line 7
| rdflib/py3compat.py:81: W806 redefinition of function 'b' from line 44
| rdflib/py3compat.py:87: W806 redefinition of function 'format_doctest_out' from line 50
| rdflib/py3compat.py:97: W806 redefinition of function 'type_cmp' from line 61
| rdflib/term.py:54: W801 redefinition of unused 'md5' from line 52
| rdflib/store.py:73: W801 redefinition of unused 'BytesIO' from line 71
| rdflib/query.py:10: W801 redefinition of unused 'BytesIO' from line 8
| rdflib/__init__.py:73: W402 'plugin' imported but unused
| rdflib/__init__.py:74: W402 'query' imported but unused
| rdflib/util.py:43: W806 redefinition of function 'sign' from line 50
| rdflib/plugins/parsers/hturtle.py:25: W801 redefinition of unused 'html5lib' from line 24
| rdflib/plugins/parsers/ntriples.py:141: W402 'BytesIO' imported but unused
| rdflib/plugins/parsers/structureddata.py:23: W801 redefinition of unused 'html5lib' from line 22
* Removed unused code that broke Windows tests | DzinX | 2012-09-21 | 1 | -5/+0
| There are two statements in utils.parse_date_time that don't do anything useful (the result is replaced by statements that follow), but that don't work on Windows for given doctests. I removed them.
* optimisations - generation-comprehensions instead of list-comprehensions | gromgull | 2012-04-23 | 1 | -0/+4
|
* Applied Jorn's patch, fixes issue 208, thanks! | Graham Higgins | 2012-01-25 | 1 | -17/+36
|
* Uncomment __all__, tests do all pass | Graham Higgins | 2012-01-16 | 1 | -1/+1
|