path: root/rdflib/parser.py
Commit message  Author  Age  Files  Lines
* fix: HTTP 308 Permanent Redirect status code handling (#2389)  Iwan Aucamp  2023-05-17  1  -17/+2
|
| Change the handling of HTTP status code 308 to behave more like
| `urllib.request.HTTPRedirectHandler`. Most critically, the new 308 handling
| creates a new `urllib.request.Request` object with the new URL, which
| prevents state from being carried over from the original request.
|
| One case where this is important is when the domain name changes, for
| example, when the original URL is `http://www.w3.org/ns/adms.ttl` and the
| redirect URL is `https://uri.semic.eu/w3c/ns/adms.ttl`. With the previous
| behaviour, the redirect would contain a `Host` header with the value
| `www.w3.org` instead of `uri.semic.eu` because the `Host` header is placed
| in `Request.unredirected_hdrs` and takes precedence over the `Host` header
| in `Request.headers`.
|
| Other changes:
|
| - Only handle HTTP status code 308 on Python versions before 3.11 as Python
|   3.11 will handle 308 by default
|   [[ref](https://docs.python.org/3.11/whatsnew/changelog.html#id128)].
| - Move code which uses `http://www.w3.org/ns/adms.ttl` and
|   `http://www.w3.org/ns/adms.rdf` out of `test_guess_format_for_parse` into
|   a separate parameterized test, which instead uses the embedded http
|   server. This allows the test to fully control the `Content-Type` header
|   in the response instead of relying on the value that the server is
|   sending. This is needed because the server is sending
|   `Content-Type: text/plain` for the `adms.ttl` file, which is not a valid
|   RDF format, and the test is expecting `Content-Type: text/turtle`.
|
| Fixes:
|
| - <https://github.com/RDFLib/rdflib/issues/2382>.
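The `Host` header problem described in this commit can be reproduced with the standard library alone. The sketch below is not rdflib's code: it assumes the original request object is reused by changing its URL, and it adds the unredirected `Host` header by hand to stand in for what urllib's HTTP handler records when a request is first sent.

```python
from urllib.request import Request

ORIGINAL_URL = "http://www.w3.org/ns/adms.ttl"
REDIRECT_URL = "https://uri.semic.eu/w3c/ns/adms.ttl"

# Stand-in for a request that has already been sent once: the Host header
# ends up in unredirected_hdrs (added manually here as an assumption).
reused = Request(ORIGINAL_URL)
reused.add_unredirected_header("Host", reused.host)

# Reusing the same object for the redirect target updates the URL ...
reused.full_url = REDIRECT_URL
# ... but the stale Host value is still stored on the request.
print(reused.host, reused.unredirected_hdrs["Host"])

# A fresh Request for the redirect target carries no state over.
fresh = Request(REDIRECT_URL)
print(fresh.host, fresh.unredirected_hdrs)
```

Building a new `Request` for the `Location` URL, as the commit describes, avoids having to clean up such leftover state by hand.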
* refactor: eliminate inheritance from object (#2339)  Iwan Aucamp  2023-04-10  1  -1/+1
|
| This change removes the redundant inheritance from `object`
| (i.e. `class Foo(object): pass`) that is no longer needed in Python 3 and
| is a relic from Python 2.
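As a quick illustration of why the explicit base is redundant, the following sketch (class names are made up for the example) shows that both spellings produce a class whose only base is `object` in Python 3.

```python
class WithObject(object):  # Python 2 era spelling
    pass


class WithoutObject:  # equivalent in Python 3
    pass


# Both classes have exactly the same bases.
print(WithObject.__bases__, WithoutObject.__bases__)
```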
* build(deps-dev): bump mypy from 1.0.1 to 1.1.1 (#2274)  dependabot[bot]  2023-03-19  1  -1/+2
|
| build(deps-dev): bump mypy from 1.0.1 to 1.1.1
|
| Bumps [mypy](https://github.com/python/mypy) from 1.0.1 to 1.1.1.
| - [Release notes](https://github.com/python/mypy/releases)
| - [Commits](https://github.com/python/mypy/compare/v1.0.1...v1.1.1)
|
| updated-dependencies:
| - dependency-name: mypy
|   dependency-type: direct:development
|   update-type: version-update:semver-minor
|
| Also added type ignores for newly detected type errors.
|
| Signed-off-by: dependabot[bot] <support@github.com>
| Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
| Co-authored-by: Iwan Aucamp <aucampia@gmail.com>
* fix: small InputSource related issues (#2255)  Iwan Aucamp  2023-03-11  1  -1/+17
|
| I have added a bunch of tests for `InputSource` handling, checking most
| kinds of input source with most parsers. During this, I detected the
| following issues that I fixed:
|
| - `rdflib.util._iri2uri()` was URL quoting the `netloc` parameter, but this
|   is wrong and the `idna` encoding already takes care of special
|   characters. I removed the URL quoting of `netloc`.
| - HexTuple parsing was handling the input source in a way that would only
|   work for some input sources, and not raising errors for other input
|   sources. I changed the input source handling to be more generic.
| - `rdflib.parser.create_input_source()` incorrectly used `file.buffer`
|   instead of `source.buffer` when dealing with IO stream sources.
|
| Other changes with no runtime impact include:
|
| - Changed the HTTP mocking stuff in test slightly to accommodate serving
|   arbitrary files, as I used this in the `InputSource` tests.
| - Don't use Google in tests, as we keep getting
|   `urllib.error.HTTPError: HTTP Error 429: Too Many Requests` from it.
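The `source.buffer` item in this commit relies on text streams exposing their underlying binary stream through a `buffer` attribute. A minimal standard-library sketch of that idea follows; `as_binary_stream` is a hypothetical helper written for this example and is not rdflib's implementation.

```python
import io


def as_binary_stream(source):
    """Hypothetical helper: prefer the binary stream behind a text stream."""
    if isinstance(source, io.TextIOBase):
        # Not every text stream has a buffer (io.StringIO does not).
        buffer = getattr(source, "buffer", None)
        if buffer is not None:
            return buffer
    return source


text_stream = io.TextIOWrapper(io.BytesIO(b"<a> <b> <c> .\n"), encoding="utf-8")
print(as_binary_stream(text_stream).read())
```

The attribute has to be read from the stream that was actually passed in, which is the distinction between `source.buffer` and `file.buffer` that the commit points out.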
* build(deps-dev): bump mypy from 0.991 to 1.0.1 (#2247)  dependabot[bot]  2023-03-06  1  -1/+1
|
| * build(deps-dev): bump mypy from 0.991 to 1.0.1
|
|   Bumps [mypy](https://github.com/python/mypy) from 0.991 to 1.0.1.
|   - [Release notes](https://github.com/python/mypy/releases)
|   - [Commits](https://github.com/python/mypy/compare/v0.991...v1.0.1)
|
|   ---
|   updated-dependencies:
|   - dependency-name: mypy
|     dependency-type: direct:development
|     update-type: version-update:semver-major
|   ...
|
|   Signed-off-by: dependabot[bot] <support@github.com>
|
| * fix type errors
|
| ---------
|
| Signed-off-by: dependabot[bot] <support@github.com>
| Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
| Co-authored-by: Iwan Aucamp <aucampia@gmail.com>
* feat: add parser type hints (#2232)  Iwan Aucamp  2023-03-05  1  -22/+25
|
| Add type hints to:
|
| - `rdflib/parser.py`
| - `rdflib/plugins/parser/*.py`
| - some JSON-LD utils
| - `rdflib/exceptions.py`.
|
| This is mainly because the work I'm doing to fix
| <https://github.com/RDFLib/rdflib/issues/1844> is touching some of this
| parser stuff and the type hints are useful to avoid mistakes.
|
| No runtime changes are included in this PR.
* Fix type errors resulting from new mypy (#2161)  Iwan Aucamp  2022-11-19  1  -1/+1
|
| New mypy version is reporting new errors. In the long run we need to switch
| to poetry so we can better control this.
* Fix/ignore flake8 errors in `rdflib/parser.py` (#2016)  Iwan Aucamp  2022-07-13  1  -6/+6
|
| Fix or ignore flake8 errors in `rdflib/parser.py` so that changes to this
| file do not cause flake8 errors when they invalidate the flakehell
| baseline.
* [pre-commit.ci] auto fixes from pre-commit.com hooks  pre-commit-ci[bot]  2022-05-19  1  -4/+4
|
| for more information, see https://pre-commit.ci
* Fixes #1429, add `iri2uri` (#1902)  Graham Higgins  2022-05-19  1  -1/+2
|
| Add an iri-to-uri conversion utility to encode IRIs to URIs for
| `Graph.parse()` sources.
|
| Added a couple of tests because feeding it with a suite of IRIs to check
| seems overkill (not that I could find one).
|
| Fixes #1429
|
| Co-authored-by: Iwan Aucamp <aucampia@gmail.com>
* Replace rdlib.net and rdflib.net with rdflib.github.io (#1901)  Graham Higgins  2022-05-19  1  -1/+2
|
| This is being done because `rdlib.net` is a typo and `rdflib.net` is not
| owned or associated with this project.
|
| Also:
|
| - Expand testing for some parts of the code impacted by this.
|
| Co-authored-by: Iwan Aucamp <aucampia@gmail.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks  pre-commit-ci[bot]  2022-04-15  1  -11/+6
|
| for more information, see https://pre-commit.ci
* [pre-commit.ci] auto fixes from pre-commit.com hooks  pre-commit-ci[bot]  2022-03-16  1  -0/+1
|
| for more information, see https://pre-commit.ci
* Fix typing errors  Iwan Aucamp  2022-03-16  1  -3/+3
|
| Note re `_urlopen(full_link)` -> `_urlopen(Request(full_link))`:
|
| `_urlopen` expects a request object and will potentially use
| `Request.full_url` which won't be available on a string.
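The point of this note can be checked with the standard library: a plain string has no `full_url` attribute, while a `urllib.request.Request` built from it does. The URL below is a placeholder chosen for the example.

```python
from urllib.request import Request

full_link = "http://example.com/data.jsonld"  # placeholder URL

# A plain string does not provide the attribute ...
print(hasattr(full_link, "full_url"))

# ... whereas wrapping it in a Request does.
request = Request(full_link)
print(request.full_url)
```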
* Add doc string  Natanael Arndt  2022-03-16  1  -1/+1
|
* add type to URLInputSource.links (ref: jsonld_conneg)  Nicholas Car  2022-01-04  1  -0/+2
|
* Merge branch 'master' into jsonld_conneg  Nicholas Car  2022-01-03  1  -30/+66
|\
| * Merge pull request #1643 from RDFLib/url_source_hdrs  Nicholas Car  2022-01-02  1  -7/+18
| |\
| | |
| | | Allow parse of RDF from URL with all RDF Media Types
| | * improved use of plugins()  nicholascar  2022-01-02  1  -4/+3
| | |
| | * Update parser.py (ref: url_source_hdrs)  Nicholas Car  2022-01-02  1  -1/+1
| | |
| | * all RDF Media Types in Accept header  nicholascar  2022-01-02  1  -7/+19
| | |
| * | Add typing for parsers  Iwan Aucamp  2021-12-29  1  -23/+50
| |/
| |
| | This changeset includes no runtime changes in rdflib.
* | Import urljoin  Iwan Aucamp  2021-12-28  1  -0/+1
| |
| | This is being used and should be imported.
* | Merge remote-tracking branch 'origin/master' into jsonld_conneg  Iwan Aucamp  2021-12-28  1  -4/+3
|\ \
| |/
| * blacked all files  nicholascar  2021-12-07  1  -2/+1
| |
| * Fix typing of create_input_source  Iwan Aucamp  2021-12-01  1  -2/+2
| |
| | Change the type of `data` parameter to also allow values of `Dict` type
| | as these are used with `PythonInputSource`.
* | Merge branch 'master' into jsonld_conneg  Nicholas Car  2021-12-01  1  -7/+56
|\ \
| |/
| * Merge pull request #1463 from joshmoore/python-graph  Nicholas Car  2021-12-01  1  -5/+49
| |\
| | |
| | | RFC: Add PythonInputSource to create py-based graphs
| | * Add a json example to docs  jmoore  2021-11-26  1  -2/+12
| | |
| | * Run flake8  jmoore  2021-11-23  1  -2/+0
| | |
| | * Add PythonInputSource to __all__ and minimal docs  jmoore  2021-11-23  1  -1/+3
| | |
| | * RFC: Add PythonInputSource to create py-based graphs  jmoore  2021-11-15  1  -4/+38
| | |
| | | In order to manipulate a JSON-LD structure before creating a graph
| | | (specifically to simplify rdf:Seq handling), it is currently necessary
| | | to use `json.loads` followed by `dumps` and then let `Graph().parse()`
| | | re-load. By detecting `dict` instances and creating a
| | | `PythonInputSource`, a single call to `loads` suffices.
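The saving described in this commit can be sketched without rdflib. The example below only illustrates the dict-detection idea: `load_document` is a hypothetical function, not `PythonInputSource` or any rdflib API, and the JSON-LD snippet is made up.

```python
import json

text = '{"@id": "http://example.com/s", "http://example.com/p": [{"@value": "o"}]}'

# One call to json.loads gives a dict that can be edited in place.
doc = json.loads(text)
doc["http://example.com/p"].append({"@value": "added"})


def load_document(data):
    """Hypothetical dispatcher: dicts pass through, text is decoded once."""
    if isinstance(data, dict):
        return data  # no dumps/loads round trip needed
    return json.loads(data)


# The edited dict is used as-is; the string form still works.
print(load_document(doc) is doc)
print(load_document(json.dumps(doc)) == doc)
```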
| * | Add type hints  Iwan Aucamp  2021-10-24  1  -1/+7
| |/
| |
| | This commit only adds type hints and comments and does not make any
| | changes that should affect runtime.
| |
| | The type hints added here derive from work done for #1418.
* | Adapt for pytest and add back import of os in rdflib/parser.py  Iwan Aucamp  2021-11-28  1  -0/+1
| |
| | test/jsonld/test_onedotone.py got a bit messed up with a merge from
| | master.
| |
| | Looking at the original changes from @ashleysommer, all he did was change
| | a condition. This applies the same change but essentially rebased on
| | master.
| |
| | For comparison see:
| | https://github.com/RDFLib/rdflib/compare/ab31c5ed3772cb63806a4614ca4431e9bd7de8d3...c4b679ff6660f4287e7738c93c8c06072642ce1a
| |
| | Also add back `import os` in rdflib/parser.py
| |
| | This is now needed after https://github.com/RDFLib/rdflib/pull/1441 was
| | merged.
* | Update rdflib/parser.py  Nicholas Car  2021-11-21  1  -4/+1
| |
| | Co-authored-by: Christian Clauss <cclauss@me.com>
* | Update rdflib/parser.py  Nicholas Car  2021-11-21  1  -5/+1
| |
| | Co-authored-by: Christian Clauss <cclauss@me.com>
* | Merge branch 'master' into jsonld_conneg  Nicholas Car  2021-11-21  1  -4/+7
|\ \
| |/
| * Fix Graph.parse URL handling on windows  Iwan Aucamp  2021-10-12  1  -4/+7
| |
| | Using `pathlib.Path("http://example.com/").exists()` on Windows causes an
| | exception as a URL is not a valid path, while
| | `os.path.exists("http://example.com/")` just returns false.
| |
| | This patch reverts _create_input_source_from_location to using
| | `os.path.exists()` instead of pathlib.Path to make it possible to parse
| | graphs from http URLs on Windows.
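The `os.path.exists()` behaviour this commit relies on can be seen in a few lines. The helper name below is hypothetical and this is only a sketch of the check, not the body of `_create_input_source_from_location`.

```python
import os.path


def is_existing_local_path(location: str) -> bool:
    """Hypothetical helper: True only if location names something on disk."""
    # os.path.exists reports False for strings that are not usable paths,
    # such as URLs, instead of raising.
    return os.path.exists(location)


print(is_existing_local_path("http://example.com/"))
print(is_existing_local_path(os.curdir))
```

A location that fails this check can then be treated as a URL rather than as a file.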
* | moved http.client imports into a TYPE_CHECKING guard  Ashley Sommer  2021-10-14  1  -4/+6
| |
| | removed unused import os
* | Allow URLInputSource to get content-negotiation links from the Link headers of HTTP responses.  Ashley Sommer  2021-10-12  1  -6/+57
|/
|
| Use Links to resolve schema.org-style json-ld conneg redirections.
| Fix the ability to run the `remote-url` arm of the JSON-LD test suite (got
| most of them working!)
* blacked everything (tag: 6.0.0)  Nicholas Car  2021-07-20  1  -5/+12
|
* Merge pull request #1288 from tgbugs/path2url-removal  Nicholas Car  2021-07-03  1  -6/+7
|\
| |
| | pathname2url removal
| * parser.py fix pathlib mismatches  Tom Gillespie  2021-03-29  1  -5/+6
| |
| * tweaks to hierarchy to improve load times  Tom Gillespie  2021-03-26  1  -4/+4
| |
| | I think that most of the difference actually comes from patching uuid to
| | not waste 25 milliseconds every time it is imported but there is a bit of
| | improvement here
* | Add pathlib.PurePath support for Graph.{serialize,parse}  Iwan Aucamp  2021-06-29  1  -1/+1
| |
| | Graph.parse did already support pathlib.Path but there is no good reason
| | to not support PurePath AFAICT.
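Accepting `PurePath` also covers `Path`, since `Path` is a subclass of `PurePath`. The sketch below shows one way a location argument could be normalised; `location_to_str` is a hypothetical helper for illustration and not the code this commit changed.

```python
import os
from pathlib import Path, PurePath, PurePosixPath


def location_to_str(location):
    """Hypothetical helper: accept a str or any pathlib path object."""
    if isinstance(location, PurePath):
        return os.fspath(location)
    return location


# Path instances are PurePath instances too, so one check handles both.
print(issubclass(Path, PurePath))
print(location_to_str(PurePosixPath("data/graph.ttl")))
```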
* | Merge pull request #1342 from iafork/iwana-issue1040  Nicholas Car  2021-06-29  1  -1/+17
|\ \
| | |
| | | Add handling for 308 (Permanent Redirect)
| * | Use HTTPError.headers instead of HTTPError.hdrs  Iwan Aucamp  2021-06-26  1  -1/+1
| | |
| | | Even though HTTPError.hdrs works, it is not in the published API and
| | | also not in python typeshed so not recognized by mypy.
| * | Add handling for 308 (Permanent Redirect)  Iwan Aucamp  2021-06-24  1  -1/+17
| |/
| |
| | Some/all supported versions of Python do not handle 308 (Permanent
| | Redirect) in `urlopen`. See https://bugs.python.org/issue40321 for more
| | info on this.
| |
| | This patch adds handling for 308 in rdflib.
| |
| | I also extracted the HTTP mock from `test_sparqlstore.py` and reused it
| | in test_conneg. Any feedback on it will be appreciated.
* | Add mypy to CI pipelines and fix errors it raises  Iwan Aucamp  2021-05-30  1  -1/+1
|/
* Support parsing paths specified with pathlib  Anton Lodder  2020-10-07  1  -0/+3
|
| pathlib was added to the standard library as of Python 3.4.
| This adds support for calling Graph.parse on a file specified using a
| pathlib object.