Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

jsonschema should not change its behavior depending on which packages are installed #1035

Closed
dechamps opened this issue Jan 1, 2023 · 4 comments

Comments

@dechamps
Copy link

dechamps commented Jan 1, 2023

According to the docs, depending on which packages are installed, jsonschema will validate certain formats but not others.

I would like to suggest that changing the behavior of a piece of code depending on what packages appear to be present (for whatever reason) in the current environment might not be a good idea, because it opens up potential for "spooky action at a distance" where a seemingly unrelated change to the environment can suddenly cause failures that can be pretty difficult to understand because they seem to come out of nowhere.

For example, consider the case of two packages: altair and jupyter, which are frequently used together but are otherwise unrelated. jupyter_events, a dependency of jupyter, started depending on jsonschema[format-nongpl] a few months ago in jupyter/jupyter_events@decd0ec. This caused rfc3986-validator to be installed, which in turn changed the behavior of jsonschema to start validating uri-references. This had the surprising consequence of turning a benign schema error in Altair (which also pulls jsonschema, but without [format-nongpl]) into a serious problem that suddenly broke my code (see vega/altair#2794). It should not be possible for a change to the dependency spec of package A to cause a breaking change in behavior in completely unrelated package B.

I would suggest that jsonschema users should be required to explicitly list in caller code which format validators they want to use, instead of having validators magically "pop up" as soon as some package is installed.

@Julian
Copy link
Member

Julian commented Jan 1, 2023

It sounds like the schema upstream was invalid -- if anything that's a reason format validation should be more strict is it not? jsonschema has no way to make judgements on how benign or not the mistake in the schema is.

It should not be possible for a change to the dependency spec of package A to cause a breaking change in behavior in completely unrelated package B.

I suspect you'll find many other cases of this unfortunately -- optionally enabling functionality when another package is installed is not uncommon in Python, and whilst I agree it can lead to this kind of annoying downstream issue, you're pinning your dependencies right? So you should at least not be encountering it until you've manually made a change that should tip you off.

Regardless though, this behavior is essentially constrained partly by the specification, which says that format validation is off by default, but even when on, unknown formats must be ignored. -- and partly by backwards compatiblity, where this behavior has worked like this forever. I agree that stricter behavior would be less error prone, but it's only now that $vocabulary exists in the most recent JSON Schema versions that there's a spec-compliant way to get that behavior -- but $vocabulary is likely to change in the next version, so even targeting it with a Python API is taking a guess on how it'll turn out (one I regardless will likely do).

In short: yes the behavior around format isn't ideal, but there isn't really a good alternative that's backwards compatible, spec compliant, and future-proof.

@dechamps
Copy link
Author

dechamps commented Jan 1, 2023

Pinging @binste as he's the one working on a fix for Altair in vega/altair#2771. Several people got bit by this before I was, apparently.

format validation should be more strict

I don't disagree. The problem is not how strict the validation is. The problem is that the level of strictness suddenly changed for no apparent reason just because some package happened to be installed on the system.

The level of strictness should be under the control of the caller code, not a side effect of the package manager.

optionally enabling functionality when another package is installed is not uncommon in Python

I'll admit I'm not that familiar with what's idiomatic in the Python ecosystem, but my understanding is that, usually, something starts working when a package is installed, which seems fine. Something breaking because a package is installed is way more objectionable. Because of the way jsonschema is used, increasing strictness because a package happens to be present is likely to do more harm than good in many situations.

Regardless though, this behavior is essentially constrained partly by the specification, which says that format validation is off by default, but even when on, unknown formats must be ignored.

I don't see how that relates to the issue at hand? What I'm suggesting is a change in the jsonschema user API where validators have to be explicitly specified instead of popping up by surprise through the set of installed packages. It's a change in the details of the mechanics of how jsonschema format validators are enabled - what does that has to do with the spec, and what does that has to do with ignoring unknown formats?

@Julian
Copy link
Member

Julian commented Jan 1, 2023

I'll admit I'm not that familiar with what's idiomatic in the Python ecosystem, but my understanding is that, usually, something starts working when a package is installed, which seems fine.

This library is a validation library, it's meant to help developers catch problems -- "starts working" in this context is precisely "raises errors", the definitions are backwards from the usual ones --a buggy schema which was silently passing suddenly started raising errors it sounds like.

What I'm suggesting is a change in the jsonschema user API where validators have to be explicitly specified instead of popping up by surprise through the set of installed packages.

I don't think this is an accurate characterization. Right now, other than for schema validation (your case), format validation is opt-in already. For schema validation, indeed it's on by default if packages are installed, and otherwise off. In general, I don't think users want to be manually configuring format validation for schema validation, certainly not in this specific case, they likely don't care or even know what formats are used in the metaschema, they just want to know whether their schema is correct or not, where here it seems it wasn't.

What I was saying about the spec was that if the spec didn't mandate that format be off by default for the general case (i.e. for data, not schema validation), then it'd be even more critical to essentially always install the format dependencies, and you likely would have noticed the invalid schema even earlier.

The level of strictness should be under the control of the caller code, not a side effect of the package manager.

The level of strictness by the way is indeed under your control, you're free to turn it on or off by not enabling format validation, if you indeed want to keep the buggy schema as-is.

@Julian
Copy link
Member

Julian commented Jan 4, 2023

This is being handled upstream it seems. The general behavior as I mentioned I don't think can or will change -- format validation in general will stay opt-in, but in the case of invalid schemas, it's currently "best effort", which indeed may change with the set of installed packages, unfortunate as that might be. If format dependencies become mandatory, which also seems unlikely given some of them are not trivial deps, that may change, but given I'm not sure there's anything else to do here, going to close. Additional thoughts are of course still welcome regardless, if I'm indeed still missing the point.

@Julian Julian closed this as completed Jan 4, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants