Designating a dialect for custom metaschemas in 4.18 #1061

Open
eslavich opened this issue Mar 19, 2023 · 13 comments
Labels
Bug Something doesn't work the way it should.

Comments

@eslavich

eslavich commented Mar 19, 2023

I'm working with a custom metaschema that is a superset of draft 4. Is there (or will there be) a way to select the draft 4 dialect when creating a Validator class? Currently the create method chooses the opaque dialect when it fails to recognize our metaschema's id:

https://github.com/python-jsonschema/jsonschema/blob/v4.18.0a1/jsonschema/validators.py#L187

and the referencing package doesn't appear to offer a sanctioned method for registering a new dialect id:

https://github.com/python-jsonschema/referencing/blob/v0.24.4/referencing/jsonschema.py#L544-L556

Is there another way to accomplish this that I'm missing? And if not, are you open to adding something like a default_specification argument to the create method?

By the way, thanks a lot for providing an alpha release, it's super helpful to be able to work through our issues ahead of time.
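
A minimal sketch of the kind of fallback being asked for above, assuming a hypothetical table that maps dialect URIs to names. `KNOWN_DIALECTS`, `dialect_for`, and its `default` parameter are illustrative only and are not the API of jsonschema or referencing:

```python
# Hypothetical table of dialect IDs a library recognizes out of the box.
KNOWN_DIALECTS = {
    "http://json-schema.org/draft-04/schema": "draft4",
    "http://json-schema.org/draft-07/schema": "draft7",
}


def dialect_for(metaschema, default="opaque"):
    """Pick a dialect from a metaschema's id, falling back to `default`."""
    # Draft 4 style metaschemas use "id"; later drafts use "$id".
    dialect_id = metaschema.get("$id", metaschema.get("id", ""))
    # Ignore a trailing empty fragment so ".../schema#" matches ".../schema".
    return KNOWN_DIALECTS.get(dialect_id.rstrip("#"), default)


custom = {"id": "https://example.com/yaml-schema/draft-01"}
print(dialect_for(custom))                    # unrecognized id, no fallback given
print(dialect_for(custom, default="draft4"))  # caller-supplied fallback
```

With no fallback, the unrecognized id lands on the opaque dialect; a caller-supplied default is what would let a draft 4 superset keep draft 4 behavior.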

@Julian
Member

Julian commented Mar 19, 2023

This will happen before the real release, probably this week! Thanks for indicating someone is paying attention :D

Expect an update in the next few days but it'll look basically like what you expect I hope!

@Julian
Member

Julian commented Mar 19, 2023

(to be even more specific, no changes should be required on your part, though you certainly can choose to make some to get improved behavior, and also there will be another beta!)

@eslavich
Author

Huzzah! I'll keep an eye out for it.

@Julian
Member

Julian commented Apr 25, 2023

I want to clarify something that I didn't notice until now --

@eslavich are you specifically calling jsonschema.validators.create and not extend? I assumed (or misread) that you meant the latter (and were surprised that extending a validator didn't preserve its resolving behavior, which is why I labelled this a bug).

Are you instead talking about a totally unrelated validator/dialect you created with .create which you happened to define a keyword called $ref for?

@braingram

I've been testing asdf with the new 4.18 and I think I might have a minimal example that illustrates why we need to define a specification (and the opaque specification results in resolution errors).

For asdf we define a meta-schema based on draft4 (the details probably aren't important, but I'm happy to supply as much as you'd like). Since this metaschema is not registered with referencing.jsonschema._SPECIFICATIONS, creating a validator with the metaschema results in an opaque specification and failures due to the inability to resolve references.

Here's a minimal example that was compatible with 4.17:

import jsonschema

meta_schema = {
    "id": "https://example.com/yaml-schema/draft-01",
    "$schema": "http://json-schema.org/draft-04/schema#",
    "allOf": [{"$ref": "http://json-schema.org/draft-04/schema"}],
}

s0 = {
    "id": "http://example.com/foo",
    "$schema": "http://example.com/yaml-schema/draft-01#",
}

s1 = {
    "id": "http://example.com/bar",
    "$schema": "http://example.com/yaml-schema/draft-01#",
    "allOf": [{"$ref": "foo"}]
}

by_id = {s['id']: s for s in (meta_schema, s0, s1)}


def retrieve(uri):
    return by_id[uri]


handlers = {'http': retrieve}
resolver = jsonschema.validators.RefResolver(
    "", {}, cache_remote=False, handlers=handlers)
Validator = jsonschema.validators.create(
    meta_schema=meta_schema,
    type_checker=jsonschema.validators.Draft4Validator.TYPE_CHECKER,
    validators=jsonschema.validators.Draft4Validator.VALIDATORS,
    id_of=jsonschema.validators.Draft4Validator.ID_OF,
    format_checker=jsonschema.validators.Draft4Validator.FORMAT_CHECKER,
)
validator = Validator(s1, resolver=resolver)
validator.validate({})

When run with 4.17.3 this executes with no error. When run with 4.18.0 this shows the expected DeprecationWarning for RefResolver and errors out as follows:

/Users/bgraham/projects/230314_jsonschema_ref_resolver/tests/ref_resolution/03_ref.py:28: DeprecationWarning: jsonschema.RefResolver is deprecated as of v4.18.0, in favor of the https://github.com/python-jsonschema/referencing library, which provides more compliant referencing behavior as well as more flexible APIs for customization. A future release will remove RefResolver. Please file a feature request (on referencing) if you are missing an API for the kind of customization you need.
  resolver = jsonschema.validators.RefResolver(
Traceback (most recent call last):
  File "/Users/bgraham/.pyenv/versions/jsonschema_ref_resolver/lib/python3.10/site-packages/jsonschema/validators.py", line 1082, in resolve_from_url
    document = self.store[url]
  File "/Users/bgraham/.pyenv/versions/jsonschema_ref_resolver/lib/python3.10/site-packages/jsonschema/_utils.py", line 20, in __getitem__
    return self.store[self.normalize(uri)]
KeyError: 'foo'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/bgraham/.pyenv/versions/jsonschema_ref_resolver/lib/python3.10/site-packages/jsonschema/validators.py", line 1085, in resolve_from_url
    document = self.resolve_remote(url)
  File "/Users/bgraham/.pyenv/versions/jsonschema_ref_resolver/lib/python3.10/site-packages/jsonschema/validators.py", line 1189, in resolve_remote
    with urlopen(uri) as url:
  File "/Users/bgraham/.pyenv/versions/3.10.6/lib/python3.10/urllib/request.py", line 216, in urlopen
    return opener.open(url, data, timeout)
  File "/Users/bgraham/.pyenv/versions/3.10.6/lib/python3.10/urllib/request.py", line 503, in open
    req = Request(fullurl, data)
  File "/Users/bgraham/.pyenv/versions/3.10.6/lib/python3.10/urllib/request.py", line 322, in __init__
    self.full_url = url
  File "/Users/bgraham/.pyenv/versions/3.10.6/lib/python3.10/urllib/request.py", line 348, in full_url
    self._parse()
  File "/Users/bgraham/.pyenv/versions/3.10.6/lib/python3.10/urllib/request.py", line 377, in _parse
    raise ValueError("unknown url type: %r" % self.full_url)
ValueError: unknown url type: 'foo'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/bgraham/projects/230314_jsonschema_ref_resolver/tests/ref_resolution/03_ref.py", line 38, in <module>
    validator.validate({})
  File "/Users/bgraham/.pyenv/versions/jsonschema_ref_resolver/lib/python3.10/site-packages/jsonschema/validators.py", line 420, in validate
    for error in self.iter_errors(*args, **kwargs):
  File "/Users/bgraham/.pyenv/versions/jsonschema_ref_resolver/lib/python3.10/site-packages/jsonschema/validators.py", line 354, in iter_errors
    for error in errors:
  File "/Users/bgraham/.pyenv/versions/jsonschema_ref_resolver/lib/python3.10/site-packages/jsonschema/_validators.py", line 335, in allOf
    yield from validator.descend(instance, subschema, schema_path=index)
  File "/Users/bgraham/.pyenv/versions/jsonschema_ref_resolver/lib/python3.10/site-packages/jsonschema/validators.py", line 402, in descend
    for error in errors:
  File "/Users/bgraham/.pyenv/versions/jsonschema_ref_resolver/lib/python3.10/site-packages/jsonschema/_validators.py", line 284, in ref
    yield from validator._validate_reference(ref=ref, instance=instance)
  File "/Users/bgraham/.pyenv/versions/jsonschema_ref_resolver/lib/python3.10/site-packages/jsonschema/validators.py", line 447, in _validate_reference
    scope, resolved = resolve(ref)
  File "/Users/bgraham/.pyenv/versions/jsonschema_ref_resolver/lib/python3.10/site-packages/jsonschema/validators.py", line 1071, in resolve
    return url, self._remote_cache(url)
  File "/Users/bgraham/.pyenv/versions/jsonschema_ref_resolver/lib/python3.10/site-packages/jsonschema/validators.py", line 1087, in resolve_from_url
    raise exceptions._RefResolutionError(exc)
jsonschema.exceptions._RefResolutionError: unknown url type: 'foo'
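
For reference, the relative `$ref` value `foo` in the traceback above is presumably meant to be resolved against the enclosing schema's `id`. A small standard-library illustration of that join (this is not the library's own resolution code):

```python
from urllib.parse import urljoin

base_uri = "http://example.com/bar"  # the "id" of s1
ref = "foo"                          # the "$ref" inside s1's allOf

# RFC 3986 reference resolution replaces the last path segment of the base.
resolved = urljoin(base_uri, ref)
print(resolved)  # http://example.com/foo
```

The resolved URI matches the `id` of `s0`, which is the key the `retrieve` handler expects. The traceback instead shows the bare string `foo` reaching the resolver, which is consistent with the base URI never having been applied.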

If I modify the example to use referencing (am I doing this right?):

import jsonschema
import referencing

meta_schema = {
    "id": "https://example.com/yaml-schema/draft-01",
    "$schema": "http://json-schema.org/draft-04/schema#",
    "allOf": [{"$ref": "http://json-schema.org/draft-04/schema"}],
}

s0 = {
    "id": "http://example.com/foo",
    "$schema": "http://example.com/yaml-schema/draft-01#",
}

s1 = {
    "id": "http://example.com/bar",
    "$schema": "http://example.com/yaml-schema/draft-01#",
    "allOf": [{"$ref": "foo"}]
}

by_id = {s['id']: s for s in (meta_schema, s0, s1)}


def retrieve(uri):
    if uri in by_id:
        return referencing.Resource(by_id[uri], referencing.jsonschema.DRAFT4)
    raise referencing.exceptions.NoSuchResource(uri)


registry = referencing.Registry(retrieve=retrieve)
Validator = jsonschema.validators.create(
    meta_schema=meta_schema,
    type_checker=jsonschema.validators.Draft4Validator.TYPE_CHECKER,
    validators=jsonschema.validators.Draft4Validator.VALIDATORS,
    id_of=jsonschema.validators.Draft4Validator.ID_OF,
    format_checker=jsonschema.validators.Draft4Validator.FORMAT_CHECKER,
)
validator = Validator(s1, registry=registry)
validator.validate({})

The example fails with the following traceback:

Traceback (most recent call last):
  File "/Users/bgraham/.pyenv/versions/jsonschema_ref_resolver/lib/python3.10/site-packages/jsonschema/validators.py", line 432, in _validate_reference
    resolved = self._resolver.lookup(ref)
  File "/Users/bgraham/.pyenv/versions/jsonschema_ref_resolver/lib/python3.10/site-packages/referencing/_core.py", line 588, in lookup
    raise exceptions.Unresolvable(ref=ref) from None
referencing.exceptions.Unresolvable: foo

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/bgraham/projects/230314_jsonschema_ref_resolver/tests/ref_resolution/02_ref.py", line 39, in <module>
    validator.validate({})
  File "/Users/bgraham/.pyenv/versions/jsonschema_ref_resolver/lib/python3.10/site-packages/jsonschema/validators.py", line 420, in validate
    for error in self.iter_errors(*args, **kwargs):
  File "/Users/bgraham/.pyenv/versions/jsonschema_ref_resolver/lib/python3.10/site-packages/jsonschema/validators.py", line 354, in iter_errors
    for error in errors:
  File "/Users/bgraham/.pyenv/versions/jsonschema_ref_resolver/lib/python3.10/site-packages/jsonschema/_validators.py", line 335, in allOf
    yield from validator.descend(instance, subschema, schema_path=index)
  File "/Users/bgraham/.pyenv/versions/jsonschema_ref_resolver/lib/python3.10/site-packages/jsonschema/validators.py", line 402, in descend
    for error in errors:
  File "/Users/bgraham/.pyenv/versions/jsonschema_ref_resolver/lib/python3.10/site-packages/jsonschema/_validators.py", line 284, in ref
    yield from validator._validate_reference(ref=ref, instance=instance)
  File "/Users/bgraham/.pyenv/versions/jsonschema_ref_resolver/lib/python3.10/site-packages/jsonschema/validators.py", line 434, in _validate_reference
    raise exceptions._WrappedReferencingError(err)
jsonschema.exceptions._WrappedReferencingError: Unresolvable: foo

@Julian
Member

Julian commented Jul 6, 2023

Thanks, that's definitely helpful, I'll have a look more carefully in the morning, but just to be sure, why are you calling create and not extend there if you're simply trying to add some stuff to draft4?

@braingram

Thanks for the quick response and for the good question. I have not tried swapping 'create' for 'extend'. A simple swap (and removing the id_of and meta_schema arguments) in asdf does not appear to work but I'm not quite sure why yet.

Is there a way to define a meta schema with extend?

@Julian
Member

Julian commented Jul 7, 2023

Is there a way to define a meta schema with extend?

When I first added the API I mistakenly didn't add one, assuming that generally one wasn't going to change the metaschema -- if this is what's preventing you from using it I'm happy to add an argument for it, though otherwise I wasn't planning on it because eventually the entire API may need deprecating unfortunately due to the new "Vocabulary System" in newer drafts of JSON Schema (which mean that now there's some concept of groups of validators). But yeah if it's useful I can add it if it turns out there's some other reason your quick experiment didn't work.

Initially what you shared looks like a bug (at least inasmuch as the behavior should not change for RefResolver certainly) but will need to do some more diagnosis. Thanks again for the feedback, I definitely do want to make this work in a way that requires no hacks for you guys and is indisputably better than before.

@jpmckinney

I'm currently using the suggestion in #994 (comment)

@Julian
Member

Julian commented Jul 10, 2023

I don't think that's the same question, though I'm not 100% sure.

The discussion there is about users (perhaps like yourself) who are trying to change what it means to be "Draft 4".

This question here is about an explicit new draft which extends another -- i.e. a user who is properly specifying a different $schema URI -- I could be wrong though of course.

Julian added a commit that referenced this issue Jul 12, 2023
…rafts

We need a bit more state management to serve `RefResolver` until it's
fully removed :/

(The handler here is simply to avoid needing to hit some remote
reference.)

Refs: #1061 (comment)
@Julian
Member

Julian commented Jul 12, 2023

@braingram this should be at least partially addressed with a bugfix in v4.18.1 (out in a few minutes).

Can you let me know how much progress that gets you?

Appreciated!

EDIT: To be clear, "this" is your example comment more so than the title of the issue (being able to create referencing.Specification objects and pass them in).

@braingram

@Julian Thanks for the update!

I pulled down 4.18.1 and tested the two examples.

The one using the deprecated RefResolver now works on 4.18.1.

However the second example using referencing.Registry fails with the same error. Is this expected because the Resource is created with the Draft4 specification?

@Julian
Member

Julian commented Jul 12, 2023

However the second example using referencing.Registry fails with the same error. Is this expected because the Resource is created with the Draft4 specification?

It has more to do with the original title of this issue, which I'm not yet sure how best to solve (or well, I know a good way, but it involves touching this API even though it's unlikely to be a good long-term solution for other reasons, so I'm not sure yet whether there's some other one).

Specifically to explain the issue --

You have the schema

{
    "id": "http://example.com/bar",
    "$schema": "http://example.com/yaml-schema/draft-01#",
    "allOf": [{"$ref": "foo"}]
}

That $schema is saying "I am a schema that belongs to some version of JSON Schema identified by that URI http://example.com/yaml-schema/draft-01#", which of course you have invented.

jsonschema (the library) does not know what the referencing semantics are meant to be in your invented specification -- you of course want them to be "draft 4 semantics, probably with some additional keywords or something (otherwise you'd just use draft 4)" -- but there's no way for the library to know that; it needs to be told that's the behavior you mean to have. So the example is failing quite simply because your schema's id keyword is saying "my ID is http://example.com/bar" but the library doesn't know that your specification uses the id keyword to represent schema identifiers.

One way of doing that would be, as you said, to have jsonschema.validators.create(...) take a specification argument and you'd need to provide a referencing.Specification object which implements the behavior you want (possibly by simply using referencing.jsonschema.DRAFT4 if you literally wanted exactly the draft 4 behavior with no changes).

But as I say, adding that argument is not completely trivial. In the current API you can provide an id_of function specifying how schemas identify themselves, but it turns out that is not enough information to know how a version of JSON Schema defines its referencing semantics. In particular (and this gets complicated to explain), part of what makes referencing so tedious is that each version defines which keywords contain subresources. So it's not enough to know id_of(yourversion) is "http://example.com/bar": the library also needs to know which keywords in your version may contain schemas, because those schemas may themselves contain subresources.
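
A rough sketch of why an id function alone falls short, assuming an abbreviated and hypothetical description of where a draft-4-like dialect nests schemas. The keyword sets and `collect_ids` below are illustrative and are not how referencing implements this:

```python
from urllib.parse import urljoin

# Assumed, abbreviated lists of keywords whose values contain schemas.
SCHEMA_VALUED = {"not", "additionalProperties"}  # value is one schema
SCHEMA_LISTS = {"allOf", "anyOf", "oneOf"}       # value is a list of schemas
SCHEMA_MAPS = {"properties", "definitions"}      # value maps names to schemas


def collect_ids(schema, base_uri="", id_keyword="id"):
    """Return {absolute URI: schema} for a schema and its subresources."""
    found = {}
    if not isinstance(schema, dict):
        return found  # e.g. a boolean additionalProperties
    if isinstance(schema.get(id_keyword), str):
        # A nested id is resolved against the closest enclosing id.
        base_uri = urljoin(base_uri, schema[id_keyword])
        found[base_uri] = schema
    for keyword, value in schema.items():
        if keyword in SCHEMA_VALUED:
            children = [value]
        elif keyword in SCHEMA_LISTS and isinstance(value, list):
            children = value
        elif keyword in SCHEMA_MAPS and isinstance(value, dict):
            children = list(value.values())
        else:
            continue  # Unknown keyword: its contents are opaque to the walker.
        for child in children:
            found.update(collect_ids(child, base_uri, id_keyword))
    return found


schema = {
    "id": "http://example.com/bar",
    "definitions": {"inner": {"id": "inner", "type": "string"}},
    "x-custom": {"id": "hidden"},
}
print(sorted(collect_ids(schema)))
```

The walker finds the nested `inner` resource only because it was told that `definitions` holds schemas; the `id` under the unknown `x-custom` keyword is never registered. That per-dialect knowledge is the part an `id_of` function cannot express.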

And now basically that means that the id_of argument needs deprecating, because referencing.Specification is really what's needed to define all of this.

So, tl;dr... yes, I'm aware the second example doesn't work yet. If that's what's blocking you, I will consider adding the specification argument to jsonschema.validators.create, but I want to think a bit harder about whether there's another possible solution, and I would have to add it carefully, in a way that doesn't make it really easy to define a conflicting referencing.Specification and id_of argument.


4 participants