Breaking reference cycles during validation #1062

eslavich · 2023-03-19T21:30:28Z

As you may know, over in asdf we have a history of taking outrageous liberties with jsonschema's implementation details. Thanks to some new ideas from @braingram and the changes coming in 4.18, we're going to be able to eliminate most of that and use public interfaces instead.

One remaining problem we need to solve is how to break reference cycles during validation. For example, validating this document leads to RecursionError:

from referencing import Registry, Resource
import jsonschema

schema = {
    "$schema": "http://json-schema.org/draft-04/schema#",
    "id": "http://example.com/reference-cycle",
    "type": "object",
    "properties": {
        "foo": {"$ref": "#"}
    }
}

instance = {}
instance["foo"] = instance

resource = Resource.from_contents(schema)
registry = resource @ Registry()
validator = jsonschema.Draft4Validator(schema, registry=registry)

validator.validate(instance)

My best idea so far is to replace the validator methods with our own doctored up versions. We might replace the "properties" method with something like this:

seen = {}

def properties(validator, properties, instance, schema):
    if not validator.is_type(instance, "object"):
        return

    for property, subschema in properties.items():
	if property in instance:
            key = (id(instance[property]), id(subschema))
            if key in seen:
                for error in list(seen[key]):
                    yield ValidationError(error.message)
            else:
                errors = []
                seen[key] = errors
	        for error in validator.descend(
                    instance[property],
	            subschema,
                    path=property,
                    schema_path=property,
                ):
                    errors.append(error)
                    yield error

(and store that seen variable somewhere we can reliably clear it after each validation)

Is that a reasonable strategy? In doing so are we using anything that should not be considered a public API?

The text was updated successfully, but these errors were encountered:

Julian · 2023-03-20T16:04:49Z

Just acknowledging that I see this :) -- it'll probably take me a few days at very least to look at it more carefully. I will say straight off the bat that I know that one thing downstream folks, probably like ASDF very specifically, want/need is an API for doing stateful things during validation. This is very "unlike" what JSON Schema "official" specifications do certainly historically (though that's changed more recently, which I personally find quite unfortunate :/ but alas...), but which is why such a thing is not as easy as it could be -- providing some official public APIs for doing such a thing is something I definitely want to do at some point but it's careful work to not break other existing code I think.

Anyways, yeah, I need to get a few more things in progress for the upcoming release (like #1061) but I'll try to give you some advice here sometime soon.

Julian · 2023-04-12T00:44:02Z

So I've had a quick look at this (been more than a few days, but alas).

That's quite an odd instance :). Your instance is infinite, so I'd definitely expect you to have to go out of your way to support it. Such a thing is certainly beyond any behavior of JSON Schema the specification, JSON has no such thing as infinite data, so the spec doesn't really say anything about what the behavior should be there, and therefore of course as you notice, the properties keyword which implements the spec doesn't deal with such a thing.

I can certainly understand if you extend the spec's behavior to such data, that you'd want the result you're aiming for.

It's sort of more an induction-y kind of thing than breaking reference cycles, as even though I haven't thought too hard, my suspicion is you could have some set of mutually referential data with some more complex relationship in your schema and it will not be completely obvious to prove that just because you've validated the first time through each one that that means recursively the schema is valid. But I'm choosing not to think too hard :), I don't know precisely what you're trying to make work in general --

Anyways if it's just infinitely nested exact copies of data, the basic strategy of caching which instances are valid under a schema probably is good enough, yeah -- I'm also not really thinking hard about which validators you need to override / extend, but that API is of course public, so find which ones you need to override and define them similar to how you have that properties validator and you should indeed be OK.

(n.b. if in X JSON Schema versions the spec adds a crazy leafCrazyValidator keyword that says "valid if your instance has a 37 in its deepest key and invalid otherwise, your library is going to have a bad time with these instances, but hopefully that's unlikely anyhow)

(And n.b. 2, as a Python programmer I would never really call id in "real code", so stylistically I'd more do what you did by using immutable objects for your schema and instance, and then you can directly cache the validation result even in lru_cache or whatever, but of course it's your party in the end :) -- if you're going to do naughty things like that you can use one of the bad stdlib types -- WeakKeyDictionary probably).

But yeah that seems the safest today I think, if you really mean to support such a thing.

Julian added the Enhancement Some new desired functionality label Mar 21, 2023

braingram mentioned this issue Mar 21, 2023

Attempt to use only public jsonschema interfaces asdf-format/asdf#1490

Closed

Julian removed the Enhancement Some new desired functionality label Apr 12, 2023

python-jsonschema locked and limited conversation to collaborators Apr 12, 2023

Julian converted this issue into discussion #1084 Apr 12, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

This issue was moved to a discussion.

Breaking reference cycles during validation #1062

Breaking reference cycles during validation #1062

eslavich commented Mar 19, 2023 •

edited

Julian commented Mar 20, 2023

Julian commented Apr 12, 2023

This issue was moved to a discussion.

This issue was moved to a discussion.

Breaking reference cycles during validation #1062

Breaking reference cycles during validation #1062

Comments

eslavich commented Mar 19, 2023 • edited

Julian commented Mar 20, 2023

Julian commented Apr 12, 2023

This issue was moved to a discussion.

eslavich commented Mar 19, 2023 •

edited