author: Julian Berman <Julian@GrayVines.com> 2023-02-22 14:00:02 +0200
committer: Julian Berman <Julian@GrayVines.com> 2023-02-22 15:26:28 +0200
commit: 34d19dc706a1d57640e6f84333500e5f734dbf9a
tree: 6c24bdef0556cc170bfe17d5209538a4f6e6378f /docs
parent: fcea5ad0472b6141b1f1b686200fb9f26c0af326
Add some prose documentation on the new referencing API.
Diffstat (limited to 'docs')
-rw-r--r-- docs/faq.rst 52
-rw-r--r-- docs/index.rst 1
-rw-r--r-- docs/referencing.rst 314
-rw-r--r-- docs/requirements.txt 4
-rw-r--r-- docs/spelling-wordlist.txt 3
5 files changed, 321 insertions, 53 deletions
diff --git a/docs/faq.rst b/docs/faq.rst
index 2236390..5ae3e62 100644
--- a/docs/faq.rst
+++ b/docs/faq.rst
@@ -88,6 +88,7 @@ The JSON object ``{}`` is simply the Python `dict` ``{}``, and a JSON Schema lik
Specifically, in the case where `jsonschema` is asked to resolve a remote reference, it has no choice but to assume that the remote reference is serialized as JSON, and to deserialize it using the `json` module.
One cannot today therefore reference some remote piece of YAML and have it deserialized into Python objects by this library without doing some additional work.
+ See `Resolving References to Schemas Written in YAML <referencing:Resolving References to Schemas Written in YAML>` for details.
In practice what this means for JSON-like formats like YAML and TOML is that indeed one can generally schematize and then validate them exactly as if they were JSON by simply first deserializing them using libraries like ``PyYAML`` or the like, and passing the resulting Python objects into functions within this library.
@@ -99,57 +100,6 @@ In such cases one is recommended to first pre-process the data such that the res
In the previous example, if the desired behavior is to transparently coerce numeric properties to strings, as Javascript might, then do the conversion explicitly before passing data to this library.
-How do I configure a base URI for $ref resolution using local files?
---------------------------------------------------------------------
-
-`jsonschema` supports loading schemas from the filesystem.
-
-The most common mistake when configuring reference resolution to retrieve schemas from the local filesystem is to specify a base URI which points to a directory, but forget to add a trailing slash.
-
-For example, given a directory ``/tmp/foo/`` with ``bar/schema.json`` within it, you should use something like:
-
-.. code-block:: python
-
- from pathlib import Path
-
- import jsonschema.validators
-
- path = Path("/tmp/foo")
- resolver = jsonschema.validators.RefResolver(
- base_uri=f"{path.as_uri()}/",
- referrer=True,
- )
- jsonschema.validate(
- instance={},
- schema={"$ref": "bar/schema.json"},
- resolver=resolver,
- )
-
-where note:
-
- * the base URI has a trailing slash, even though
- `pathlib.PurePath.as_uri` does not add it!
- * any relative refs are now given relative to the provided directory
-
-If you forget the trailing slash, you'll find references are resolved a
-directory too high.
-
-You're likely familiar with this behavior from your browser. If you
-visit a page at ``https://example.com/foo``, then links on it like
-``<a href="./bar">`` take you to ``https://example.com/bar``, not
-``https://example.com/foo/bar``. For this reason many sites will
-redirect ``https://example.com/foo`` to ``https://example.com/foo/``,
-i.e. add the trailing slash, so that relative links on the page will keep the
-last path component.
-
-There are, in summary, 2 ways to do this properly:
-
-* Remember to include a trailing slash, so your base URI is
- ``file:///foo/bar/`` rather than ``file:///foo/bar``, as shown above
-* Use a file within the directory as your base URI rather than the
- directory itself, i.e. ``file://foo/bar/baz.json``, which will of course
- cause ``baz.json`` to be removed while resolving relative URIs
-
Why doesn't my schema's default property set the default on my instance?
------------------------------------------------------------------------
diff --git a/docs/index.rst b/docs/index.rst
index d66aa8b..949ab44 100644
--- a/docs/index.rst
+++ b/docs/index.rst
@@ -12,6 +12,7 @@ Contents
validate
errors
+ referencing
creating
faq
api/index
diff --git a/docs/referencing.rst b/docs/referencing.rst
new file mode 100644
index 0000000..20a1ca9
--- /dev/null
+++ b/docs/referencing.rst
@@ -0,0 +1,314 @@
+=========================
+JSON (Schema) Referencing
+=========================
+
+The JSON Schema :kw:`$ref` and :kw:`$dynamicRef` keywords allow schema authors to combine multiple schemas (or subschemas) together for reuse or deduplication.
+
+The `referencing <referencing:index>` library was written in order to provide a simple, well-behaved and well-tested implementation of this kind of reference resolution [1]_.
+It has its own documentation, but this page serves as a quick introduction which is tailored more specifically to JSON Schema, and even more specifically to how to configure `referencing <referencing:index>` for use with `Validator` objects in order to customize the behavior of :kw:`$ref` and friends in your schemas.
+
+Configuring `jsonschema` for custom referencing behavior is essentially a two-step process:
+
+ * Create a `referencing.Registry` object that behaves the way you wish
+
+ * Pass the `referencing.Registry` to your `Validator` when instantiating it
+
+The examples below essentially follow these two steps.
+
+.. [1] One that in fact is independent of this `jsonschema` library itself, and may some day be used by other tools or implementations.
+
+
+Common Scenarios
+----------------
+
+.. _in-memory-schemas:
+
+Making Additional In-Memory Schemas Available
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+The most common scenario one is likely to encounter is the desire to include a small number of additional in-memory schemas, making them available for use during validation.
+
+For instance, imagine the below schema for non-negative integers:
+
+.. code:: json
+
+ {
+ "$schema": "https://json-schema.org/draft/2020-12/schema",
+ "type": "integer",
+ "minimum": 0
+ }
+
+We may wish to have other schemas we write be able to make use of this schema, and refer to it as ``http://example.com/nonneg-int-schema`` and/or as ``urn:nonneg-integer-schema``.
+
+To do so we make use of APIs from the referencing library to create a `referencing.Registry` which maps the URIs above to this schema:
+
+.. code:: python
+
+ from referencing import Registry, Resource
+ schema = Resource.from_contents(
+ {
+ "$schema": "https://json-schema.org/draft/2020-12/schema",
+ "type": "integer",
+ "minimum": 0,
+ },
+ )
+ registry = Registry().with_resources(
+ [
+ ("http://example.com/nonneg-int-schema", schema),
+ ("urn:nonneg-integer-schema", schema),
+ ],
+ )
+
+What's above is likely mostly self-explanatory, other than the presence of the `referencing.Resource.from_contents` function.
+Its purpose is to convert a piece of "opaque" JSON (or really a Python `dict` containing deserialized JSON) into an object which indicates what *version* of JSON Schema the schema is meant to be interpreted under.
+Calling it will inspect a :kw:`$schema` keyword present in the given schema and use that to associate the JSON with an appropriate `specification <referencing.Specification>`.
+If your schemas do not contain ``$schema`` dialect identifiers, and you intend for them to be interpreted always under a specific dialect -- say Draft 2020-12 of JSON Schema -- you may instead use e.g.:
+
+.. code:: python
+
+ from referencing import Registry
+ from referencing.jsonschema import DRAFT202012
+ schema = DRAFT202012.create_resource({"type": "integer", "minimum": 0})
+ registry = Registry().with_resources(
+ [
+ ("http://example.com/nonneg-int-schema", schema),
+ ("urn:nonneg-integer-schema", schema),
+ ],
+ )
+
+which has the same functional effect.
+
+You can now pass this registry to your `Validator`, which allows a schema passed to it to make use of the aforementioned URIs to refer to our non-negative integer schema.
+Here for instance is an example which validates that instances are JSON objects with non-negative integral values:
+
+.. code:: python
+
+ from jsonschema import Draft202012Validator
+ validator = Draft202012Validator(
+ {
+ "type": "object",
+ "additionalProperties": {"$ref": "urn:nonneg-integer-schema"},
+ },
+ registry=registry, # the critical argument, our registry from above
+ )
+ validator.validate({"foo": 37})
+ validator.validate({"foo": -37}) # raises ValidationError: -37 is less than the minimum of 0
+
+.. _ref-filesystem:
+
+Resolving References from the File System
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Another common request from schema authors is to be able to map URIs to the file system, perhaps while developing a set of schemas in different local files.
+The referencing library supports doing so dynamically by configuring a callable which can be used to retrieve any schema which is *not* already pre-loaded in the manner described `above <in-memory-schemas>`.
+
+Here we resolve any schema beginning with ``http://localhost`` to a directory ``/tmp/schemas`` on the local filesystem (note of course that this will not work if run directly unless you have populated that directory with some schemas):
+
+.. code:: python
+
+ from pathlib import Path
+ import json
+
+ from referencing import Registry, Resource
+ from referencing.exceptions import NoSuchResource
+
+ SCHEMAS = Path("/tmp/schemas")
+
+ def retrieve_from_filesystem(uri: str):
+ if not uri.startswith("http://localhost/"):
+ raise NoSuchResource(ref=uri)
+ path = SCHEMAS / Path(uri.removeprefix("http://localhost/"))
+ contents = json.loads(path.read_text())
+ return Resource.from_contents(contents)
+
+ registry = Registry(retrieve=retrieve_from_filesystem)
+
+Such a registry can then be used with `Validator` objects in the same way shown above, and any such references to URIs which are not already in-memory will be retrieved from the configured directory.
+
+We can mix the two examples above if we wish for some in-memory schemas to be available in addition to the filesystem schemas, e.g.:
+
+.. code:: python
+
+ from referencing.jsonschema import DRAFT7
+ registry = Registry(retrieve=retrieve_from_filesystem).with_resource(
+ "urn:non-empty-array", DRAFT7.create_resource({"type": "array", "minItems": 1}),
+ )
+
+where we've made use of the similar `referencing.Registry.with_resource` function to add a single additional resource.
+
+Resolving References to Schemas Written in YAML
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Generalizing slightly, the retrieval function provided need not even assume that it is retrieving JSON.
+As long as you deserialize what you have retrieved into Python objects, you may equally be retrieving references to YAML documents or any other format.
+
+Here for instance we retrieve YAML documents in a way similar to the `above <ref-filesystem>` using PyYAML:
+
+.. code:: python
+
+ from pathlib import Path
+ import yaml
+
+ from referencing import Registry, Resource
+ from referencing.exceptions import NoSuchResource
+
+ SCHEMAS = Path("/tmp/yaml-schemas")
+
+ def retrieve_yaml(uri: str):
+ if not uri.startswith("http://localhost/"):
+ raise NoSuchResource(ref=uri)
+ path = SCHEMAS / Path(uri.removeprefix("http://localhost/"))
+ contents = yaml.safe_load(path.read_text())
+ return Resource.from_contents(contents)
+
+ registry = Registry(retrieve=retrieve_yaml)
+
+.. note::
+
+ Not all YAML fits within the JSON data model.
+
+ JSON Schema is defined specifically for JSON, and has well-defined behavior strictly for Python objects which could have possibly existed as JSON.
+
+ If you stick to the subset of YAML for which this is the case then you shouldn't have issues, but if you pass schemas (or instances) around whose structure could never have possibly existed as JSON (e.g. a mapping whose keys are not strings), all bets are off.
+
+One could similarly imagine a retrieval function which switches on whether to call ``yaml.safe_load`` or ``json.loads`` by file extension (or some more reliable mechanism) and thereby support retrieving references of various different file formats.
+
+.. _http:
+
+Automatically Retrieving Resources Over HTTP
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+In the general case, the JSON Schema specifications tend to `discourage <https://json-schema.org/draft/2020-12/json-schema-core.html#name-loading-a-referenced-schema>`_ implementations (like this one) from automatically retrieving references over the network, or even assuming such a thing is feasible (as schemas may be identified by URIs which are strictly identifiers, and not necessarily downloadable from the URI even when such a thing is sensical).
+
+However, if you as a schema author are in a situation where you indeed do wish to do so for convenience (and understand the implications of doing so), you may do so by making use of the ``retrieve`` argument to `referencing.Registry`.
+
+Here is how one would configure a registry to automatically retrieve schemas from the `JSON Schema Store <https://www.schemastore.org>`_ on the fly using the `httpx <https://www.python-httpx.org/>`_ library:
+
+.. code:: python
+
+ from referencing import Registry, Resource
+ import httpx
+
+ def retrieve_via_httpx(uri: str):
+ response = httpx.get(uri)
+ response.raise_for_status() # fail on HTTP errors rather than trying to parse an error page
+ return Resource.from_contents(response.json())
+
+ registry = Registry(retrieve=retrieve_via_httpx)
+
+Given such a registry, we can now, for instance, validate instances against schemas from the schema store by passing the ``registry`` we configured to our `Validator` as in previous examples:
+
+.. code:: python
+
+ from jsonschema import Draft202012Validator
+ Draft202012Validator(
+ {"$ref": "https://json.schemastore.org/pyproject.json"},
+ registry=registry,
+ ).validate({"project": {"name": 12}})
+
+which should in this case indicate the example data is invalid:
+
+.. code:: pytb
+
+ Traceback (most recent call last):
+ File "example.py", line 14, in <module>
+ ).validate({"project": {"name": 12}})
+ ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+ File "jsonschema/validators.py", line 345, in validate
+ raise error
+ jsonschema.exceptions.ValidationError: 12 is not of type 'string'
+
+ Failed validating 'type' in schema['properties']['project']['properties']['name']:
+ {'pattern': '^([a-zA-Z\\d]|[a-zA-Z\\d][\\w.-]*[a-zA-Z\\d])$',
+ 'title': 'Project name',
+ 'type': 'string'}
+
+ On instance['project']['name']:
+ 12
+
+Retrieving resources from a SQLite database, or from some other network-accessible service, should be broadly similar, replacing the HTTP client with the appropriate client for your storage of course.
+
+.. warning::
+
+ Be sure you understand the security implications of the reference resolution you configure.
+ And if you accept untrusted schemas, doubly sure!
+
+ You wouldn't want a user causing your machine to go off and retrieve giant files off the network by passing it a ``$ref`` to some huge blob, or exploiting similar vulnerabilities in your setup.
+
+
+Migrating From ``RefResolver``
+------------------------------
+
+Older versions of `jsonschema` used a different object -- `_RefResolver` -- for reference resolution, which you, as a schema author, may already be configuring for your own use.
+
+`_RefResolver` is now fully deprecated and replaced by the use of `referencing.Registry` as shown in examples above.
+
+If you are not already constructing your own `_RefResolver`, this change should be transparent to you (or even recognizably improved, as the point of the migration was to improve the quality of the referencing implementation and enable some new functionality).
+
+If you *were* configuring your own `_RefResolver`, here's how to migrate to the newer APIs:
+
+The ``store`` argument
+~~~~~~~~~~~~~~~~~~~~~~
+
+`_RefResolver`\ 's ``store`` argument was essentially the equivalent of `referencing.Registry`\ 's in-memory schema storage.
+
+If you currently pass a set of schemas via e.g.:
+
+.. code:: python
+
+ from jsonschema import Draft202012Validator, RefResolver
+ resolver = RefResolver.from_schema(
+ schema={"title": "my schema"},
+ store={"http://example.com": {"type": "integer"}},
+ )
+ validator = Draft202012Validator(
+ {"$ref": "http://example.com"},
+ resolver=resolver,
+ )
+ validator.validate("foo") # raises ValidationError, as "foo" is not an integer
+
+you should be able to simply move to something like:
+
+.. code:: python
+
+ from referencing import Registry
+ from referencing.jsonschema import DRAFT202012
+
+ from jsonschema import Draft202012Validator
+
+ registry = Registry().with_resource(
+ "http://example.com",
+ DRAFT202012.create_resource({"type": "integer"}),
+ )
+ validator = Draft202012Validator(
+ {"$ref": "http://example.com"},
+ registry=registry,
+ )
+ validator.validate("foo") # still raises ValidationError, as "foo" is not an integer
+
+Handlers
+~~~~~~~~
+
+The ``handlers`` functionality from `_RefResolver` was a way to support additional URI schemes for schema retrieval.
+
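+Here you should move to a custom ``retrieve`` function which dispatches on the URI scheme however you'd like.
+E.g. (with the ``custom`` branch left as a placeholder):
+
+.. code:: python
+
+ import json
+ from pathlib import Path
+ from urllib.parse import urlsplit
+ from urllib.request import url2pathname
+
+ from referencing import Registry, Resource
+ from referencing.exceptions import NoSuchResource
+
+ def retrieve(uri: str):
+ parsed = urlsplit(uri)
+ if parsed.scheme == "file":
+ path = Path(url2pathname(parsed.path))
+ return Resource.from_contents(json.loads(path.read_text()))
+ elif parsed.scheme == "custom":
+ ... # placeholder: fetch and return a Resource for your own scheme here
+ raise NoSuchResource(ref=uri)
+
+ registry = Registry(retrieve=retrieve)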
+
+
+Other Key Functional Differences
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Whilst `_RefResolver` *did* automatically retrieve remote references (against the recommendation of the spec, and in a way which could therefore raise security concerns when combined with untrusted schemas), `referencing.Registry` does *not* do so.
+If you rely on this behavior, you should follow the `above example of retrieving resources over HTTP <http>`.
diff --git a/docs/requirements.txt b/docs/requirements.txt
index 4dc78ed..1df64d8 100644
--- a/docs/requirements.txt
+++ b/docs/requirements.txt
@@ -82,7 +82,7 @@ pytz==2022.7.1
# via babel
pyyaml==6.0
# via sphinx-autoapi
-referencing==0.18.6
+referencing==0.20.0
# via
# jsonschema
# jsonschema-specifications
@@ -113,7 +113,7 @@ sphinx-basic-ng==1.0.0b1
# via furo
sphinx-copybutton==0.5.1
# via -r docs/requirements.in
-sphinx-json-schema-spec==2.3.3
+sphinx-json-schema-spec==2023.2.2
# via -r docs/requirements.in
sphinxcontrib-applehelp==1.0.4
# via sphinx
diff --git a/docs/spelling-wordlist.txt b/docs/spelling-wordlist.txt
index a2c7d56..38eb537 100644
--- a/docs/spelling-wordlist.txt
+++ b/docs/spelling-wordlist.txt
@@ -10,6 +10,7 @@ callables
# non-codeblocked cls from autoapi
cls
deque
+deduplication
dereferences
deserialize
deserialized
@@ -30,6 +31,7 @@ online
outputter
pre
programmatically
+pseudocode
recurses
regex
repr
@@ -41,6 +43,7 @@ submodules
subschema
subschemas
subscopes
+untrusted
uri
validator
validators