Vega-lite JSON schema validation fails with uri-reference errors #2794

Closed
dechamps opened this issue Jan 1, 2023 · 5 comments

@dechamps

dechamps commented Jan 1, 2023

Steps to reproduce:

virtualenv venv &&
cd venv &&
bin/pip install altair rfc3986-validator &&
bin/python <<EOF
import altair as alt
print(alt.Chart().mark_line().properties(usermeta={}).to_json())
EOF

With altair-4.2.0, the above fails with:

Traceback (most recent call last):
  File "<stdin>", line 2, in <module>
  File "…/venv/lib/python3.10/site-packages/altair/vegalite/v4/api.py", line 588, in properties
    self.validate_property(key, val)
  File "…/venv/lib/python3.10/site-packages/altair/utils/schemapi.py", line 464, in validate_property
    return jsonschema.validate(value, props.get(name, {}), resolver=resolver)
  File "…/venv/lib/python3.10/site-packages/jsonschema/validators.py", line 1117, in validate
    cls.check_schema(schema)
  File "…/venv/lib/python3.10/site-packages/jsonschema/validators.py", line 231, in check_schema
    raise exceptions.SchemaError.create_from(error)
jsonschema.exceptions.SchemaError: '#/definitions/Dict<unknown>' is not a 'uri-reference'

Failed validating 'format' in metaschema['allOf'][0]['properties']['$ref']:
    {'format': 'uri-reference', 'type': 'string'}

On schema['$ref']:
    '#/definitions/Dict<unknown>'

But here's the twist: if you remove rfc3986-validator from the pip install command in the above repro, it works!

As you can imagine, this is a bit of a head-scratcher. Here's what's going on:

jsonschema fails to validate the Vega-lite schema. To be clear, the problem is not the vega-lite output - it's the schema itself that's invalid, because it contains $ref values that are not valid uri-references. In the example above, the $ref is #/definitions/Dict<unknown>, which is invalid because the < and > characters are not allowed in an RFC 3986 URI reference.
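To make the point about < and > concrete, here is a rough stdlib-only sketch. It is not how jsonschema or rfc3986-validator checks things; the helper name invalid_fragment_chars is made up for illustration, and the character class is my reading of the RFC 3986 fragment grammar (unreserved, sub-delims, ":", "@", "/", "?" and percent-encoded triplets).

```python
import re
from urllib.parse import quote

# Characters RFC 3986 allows unencoded in a fragment (assumption: transcribed
# by hand from the grammar: unreserved / sub-delims / ":" / "@" / "/" / "?").
_ALLOWED = re.compile(r"[A-Za-z0-9\-._~!$&'()*+,;=:@/?]")
_PCT_ENCODED = re.compile(r"%[0-9A-Fa-f]{2}")


def invalid_fragment_chars(ref):
    """Return the characters in the fragment of `ref` that must be percent-encoded.

    Hypothetical helper for illustration only.
    """
    _, _, fragment = ref.partition("#")
    # Valid percent-encoded triplets are fine, so drop them before checking.
    remainder = _PCT_ENCODED.sub("", fragment)
    return [ch for ch in remainder if not _ALLOWED.fullmatch(ch)]


bad_ref = "#/definitions/Dict<unknown>"
print(invalid_fragment_chars(bad_ref))

# Percent-encoding the definition name gives a fragment that passes the check.
encoded_ref = "#/definitions/" + quote("Dict<unknown>", safe="")
print(encoded_ref, invalid_fragment_chars(encoded_ref))
```

Whether the schema should percent-encode such names or rename the definitions is a separate question; the sketch only shows why the $ref as written is rejected.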

Even though the schema is invalid, that goes unnoticed most of the time because jsonschema only validates uri-references if the rfc3986-validator or rfc3987 package is installed! This explains why Altair seems to work fine if these packages are missing.
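A quick way to tell which situation an environment is in is to check whether either of those packages is importable. This is only a diagnostic sketch: the function name installed_uri_validators is made up, and the import names rfc3987 and rfc3986_validator are what I understand the two packages to install.

```python
from importlib.util import find_spec

# Import names of the optional packages (assumption: these are the module
# names that the rfc3987 and rfc3986-validator distributions provide).
URI_FORMAT_MODULES = ("rfc3987", "rfc3986_validator")


def installed_uri_validators(candidates=URI_FORMAT_MODULES):
    """Return the candidate modules that can be imported in this environment.

    Hypothetical helper; an empty list suggests uri-reference checks are skipped.
    """
    return [name for name in candidates if find_spec(name) is not None]


found = installed_uri_validators()
if found:
    print("uri-reference format checks are likely active via:", ", ".join(found))
else:
    print("no URI validator package found; uri-reference checks are likely skipped")
```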

You may wonder how I ended up in this situation. Well, this is where things get a bit worrying, because I ended up triggering this latent bug simply by installing jupyter! This is because, a few months ago, jupyter_events (which jupyter depends on) added a dependency on jsonschema[format-nongpl] (see jupyter/jupyter_events@decd0ec), which in turn pulls in rfc3986-validator (figuring out that dependency chain was surprisingly hard - see pypa/pip#11683). Hilarity ensues, with the somewhat mind-blowing outcome that Altair is broken when a recent jupyter is also installed.

One workaround is to uninstall rfc3986-validator right after installing the package that pulled it in (e.g. jupyter):

pip uninstall rfc3986-validator
@mattijn
Contributor

mattijn commented Jan 1, 2023

Ouch! What a trip... It's not nice that Jupyter now breaks Altair. Luckily you found, the hard way, a method to make it work again🫥.
There is a PR waiting that circumvents this in Altair: #2771.
But as you said, the culprit seems to be the vega-lite schema itself, so it would be nice to get some feedback from the Vega-Lite side as well, to double-check that the schema really is the source of this and to make sure we are fixing the right thing. Cc: @domoritz.

@dechamps
Author

dechamps commented Jan 1, 2023

Thanks. I naively thought I was the first one to uncover this because my Google searches came back empty, but as you pointed out, it looks like similar issues are already being discussed in #2705, #2767 and #2771. In particular, @binste already figured out the problem, including the tricky interaction with jsonschema dependencies. One thing that I didn't see being brought up is the fact that this breaks the latest Altair when it is installed alongside the latest Jupyter, due to the recent addition of the jsonschema[format-nongpl] dependency to jupyter_events.

Sorry about the duplicate report. Hopefully, at some point there will be enough of these that they will turn up in Google searches :/

dechamps added a commit to dechamps/LoudspeakerExplorer that referenced this issue Jan 1, 2023
dechamps added a commit to dechamps/LoudspeakerExplorer that referenced this issue Jan 1, 2023
@mattijn
Contributor

mattijn commented Jan 1, 2023

Yes, that is new information, as is your suggestion for how to circumvent it with the latest jsonschema installed.
Since multiple issues about the same problem have been raised by different people within a short period, and if it is accurate that Altair is (auto)installed around 300k times a day, then I would not be surprised if ~2k users are currently facing this issue*.

* Scientific references on this are welcome.

@joelostblom
Contributor

Thanks for reporting this in such detail. It is indeed quite worrying that an update to Jupyter breaks Altair, given the large number of people who would be affected. Happy to see that there is already a PR in progress for this.

And @mattijn, it seems you are correct that Altair is installed around 300-400k times daily, and most of these seem to be automated installs, judging by the fact that Linux is the dominant platform.

0237h added a commit to 0237h/code4rena-scraper that referenced this issue Jan 3, 2023
@mattijn
Contributor

mattijn commented Jan 4, 2023

Closed #2794 as completed via #2771.
