Skip to content

This issue was moved to a discussion.

You can continue the conversation there. Go to discussion →

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Breaking reference cycles during validation #1062

Closed
eslavich opened this issue Mar 19, 2023 · 2 comments
Closed

Breaking reference cycles during validation #1062

eslavich opened this issue Mar 19, 2023 · 2 comments

Comments

@eslavich
Copy link

eslavich commented Mar 19, 2023

As you may know, over in asdf we have a history of taking outrageous liberties with jsonschema's implementation details. Thanks to some new ideas from @braingram and the changes coming in 4.18, we're going to be able to eliminate most of that and use public interfaces instead.

One remaining problem we need to solve is how to break reference cycles during validation. For example, validating this document leads to RecursionError:

from referencing import Registry, Resource
import jsonschema

schema = {
    "$schema": "http://json-schema.org/draft-04/schema#",
    "id": "http://example.com/reference-cycle",
    "type": "object",
    "properties": {
        "foo": {"$ref": "#"}
    }
}

instance = {}
instance["foo"] = instance

resource = Resource.from_contents(schema)
registry = resource @ Registry()
validator = jsonschema.Draft4Validator(schema, registry=registry)

validator.validate(instance)

My best idea so far is to replace the validator methods with our own doctored up versions. We might replace the "properties" method with something like this:

seen = {}

def properties(validator, properties, instance, schema):
    if not validator.is_type(instance, "object"):
        return

    for property, subschema in properties.items():
	if property in instance:
            key = (id(instance[property]), id(subschema))
            if key in seen:
                for error in list(seen[key]):
                    yield ValidationError(error.message)
            else:
                errors = []
                seen[key] = errors
	        for error in validator.descend(
                    instance[property],
	            subschema,
                    path=property,
                    schema_path=property,
                ):
                    errors.append(error)
                    yield error

(and store that seen variable somewhere we can reliably clear it after each validation)

Is that a reasonable strategy? In doing so are we using anything that should not be considered a public API?

@Julian
Copy link
Member

Julian commented Mar 20, 2023

Just acknowledging that I see this :) -- it'll probably take me a few days at very least to look at it more carefully. I will say straight off the bat that I know that one thing downstream folks, probably like ASDF very specifically, want/need is an API for doing stateful things during validation. This is very "unlike" what JSON Schema "official" specifications do certainly historically (though that's changed more recently, which I personally find quite unfortunate :/ but alas...), but which is why such a thing is not as easy as it could be -- providing some official public APIs for doing such a thing is something I definitely want to do at some point but it's careful work to not break other existing code I think.

Anyways, yeah, I need to get a few more things in progress for the upcoming release (like #1061) but I'll try to give you some advice here sometime soon.

@Julian
Copy link
Member

Julian commented Apr 12, 2023

So I've had a quick look at this (been more than a few days, but alas).

That's quite an odd instance :). Your instance is infinite, so I'd definitely expect you to have to go out of your way to support it. Such a thing is certainly beyond any behavior of JSON Schema the specification, JSON has no such thing as infinite data, so the spec doesn't really say anything about what the behavior should be there, and therefore of course as you notice, the properties keyword which implements the spec doesn't deal with such a thing.

I can certainly understand if you extend the spec's behavior to such data, that you'd want the result you're aiming for.

It's sort of more an induction-y kind of thing than breaking reference cycles, as even though I haven't thought too hard, my suspicion is you could have some set of mutually referential data with some more complex relationship in your schema and it will not be completely obvious to prove that just because you've validated the first time through each one that that means recursively the schema is valid. But I'm choosing not to think too hard :), I don't know precisely what you're trying to make work in general --

Anyways if it's just infinitely nested exact copies of data, the basic strategy of caching which instances are valid under a schema probably is good enough, yeah -- I'm also not really thinking hard about which validators you need to override / extend, but that API is of course public, so find which ones you need to override and define them similar to how you have that properties validator and you should indeed be OK.

(n.b. if in X JSON Schema versions the spec adds a crazy leafCrazyValidator keyword that says "valid if your instance has a 37 in its deepest key and invalid otherwise, your library is going to have a bad time with these instances, but hopefully that's unlikely anyhow)

(And n.b. 2, as a Python programmer I would never really call id in "real code", so stylistically I'd more do what you did by using immutable objects for your schema and instance, and then you can directly cache the validation result even in lru_cache or whatever, but of course it's your party in the end :) -- if you're going to do naughty things like that you can use one of the bad stdlib types -- WeakKeyDictionary probably).

But yeah that seems the safest today I think, if you really mean to support such a thing.

@Julian Julian removed the Enhancement Some new desired functionality label Apr 12, 2023
@python-jsonschema python-jsonschema locked and limited conversation to collaborators Apr 12, 2023
@Julian Julian converted this issue into discussion #1084 Apr 12, 2023

This issue was moved to a discussion.

You can continue the conversation there. Go to discussion →

Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants