Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

No format validation for uri-reference #220

Closed
binste opened this issue Jan 7, 2023 · 5 comments
Closed

No format validation for uri-reference #220

binste opened this issue Jan 7, 2023 · 5 comments
Labels
bug Something isn't working

Comments

@binste
Copy link

binste commented Jan 7, 2023

Hi @sirosen, I'm trying out the suggestion you made here vega/altair#2771 (comment) for the Altair repository (related Altair issue vega/altair#2801). The vega-lite schema I'm validating has an invalid uri-reference format and so I'd expect the validation to fail. This is the case if I use jsonschema

import json
import jsonschema
with open("altair/vegalite/v5/schema/vega-lite-schema.json", "r") as f:
    schema = json.load(f)

jsonschema.Draft7Validator.check_schema(schema)
ValueError: '#/definitions/FieldOrDatumDefWithCondition<MarkPropFieldDef<TypeForShape>,(string|null)>' is not a valid 'URI_reference'.

But not if I use check-jsonschema:

check-jsonschema --check-metaschema altair/vegalite/v5/schema/vega-lite-schema.json
# ok -- validation done

I only found this reference in the docs on format validation which mentions that it is enabled by default. Any pointers towards the cause of this would be greatly appreciated :) In a next step, I'd then actually need to disable that specific uri-reference check which I might do by uninstalling the relevant format package in the CI but I first wanted to make sure that check-jsonschema actually runs the format checks.

Versions:

  • check-jsonschema: 0.20.0
  • jsonschema: 4.17.3 (installed as pip install 'jsonschema[format])
@sirosen sirosen added the bug Something isn't working label Jan 7, 2023
@sirosen
Copy link
Member

sirosen commented Jan 7, 2023

I just did a little bit of work to validate this, but this is a bug! Thanks for surfacing it.

The main issue is that the validator is being built without a format checker.
check-jsonschema has some custom code for building format checkers (to support --disable-format and --format-regex), so it's not a totally trivial change, but it's pretty simple.

I have a patch which I tried and which gets errors from the schema. It also fixes a slight (I think not significant) mistake in how the validator class gets selected:

--- a/src/check_jsonschema/schema_loader/main.py
+++ b/src/check_jsonschema/schema_loader/main.py
@@ -168,5 +168,12 @@ class MetaSchemaLoader(SchemaLoaderBase):
         format_opts: FormatOptions,
         fill_defaults: bool,
     ) -> jsonschema.Validator:
-        validator = jsonschema.validators.validator_for(instance_doc)
-        return t.cast(jsonschema.Validator, validator(validator.META_SCHEMA))
+        schema_validator = jsonschema.validators.validator_for(instance_doc)
+        meta_validator_class = jsonschema.validators.validator_for(schema_validator.META_SCHEMA, default=schema_validator)
+
+        # format checker (which may be None)
+        meta_schema_dialect = schema_validator.META_SCHEMA.get("$schema")
+        format_checker = make_format_checker(format_opts, meta_schema_dialect)
+
+        meta_validator = meta_validator_class(schema_validator.META_SCHEMA, format_checker=format_checker)
+        return meta_validator

I was also able to add an option (I'll need to think more about the naming and interface) which disables specific format checks. If you want, I could pursue it, with usage something like this:

$ check-jsonschema --check-metaschema vega-schema.json --disable-formats uri-reference  
ok -- validation done

I think that might appeal to you, as it is more direct than installing jsonschema[format] but then carefully removing the URI library (rfc3987).

@binste
Copy link
Author

binste commented Jan 7, 2023

Thanks for looking into this so quickly! Indeed, I'd definitely prefer a syntax like --disable-formats over having to remove the URI library. If this is possible, it would be great :)

sirosen added a commit that referenced this issue Jan 20, 2023
Metaschema checking was not building the validator correctly.
Primarily two fixes are applied here:
- the metaschema's schema dialect is no longer copied from the schema
  being checked, but is taken from the metaschema document
- metaschema checking now automatically includes format checking and
  applies the CLI parameters for format checking (including the
  ability to disable format checking)

Add test for an invalid format under metaschema. This test requires
one of the format checking libraries, and therefore drives the need
for new features in the testsuite, including the addition of config
for the example file tests. Example file config is schema-validated,
which serves as another dogfooding self-check.

The test file imitates the driving use-case in #220

The acceptance test definition is refactored to make managing the test
data a little easier.
@sirosen
Copy link
Member

sirosen commented Jan 20, 2023

It's been difficult to make time to work on this more, but I have a fix in flight for this, which I'm putting through its paces in CI right now. To separate the issues and not lose track, I'll write down a separate ticket for --disable-formats.

sirosen added a commit that referenced this issue Jan 20, 2023
Metaschema checking was not building the validator correctly.
Primarily two fixes are applied here:
- the metaschema's schema dialect is no longer copied from the schema
  being checked, but is taken from the metaschema document
- metaschema checking now automatically includes format checking and
  applies the CLI parameters for format checking (including the
  ability to disable format checking)

Add test for an invalid format under metaschema. This test requires
one of the format checking libraries, and therefore drives the need
for new features in the testsuite, including the addition of config
for the example file tests. Example file config is schema-validated,
which serves as another dogfooding self-check.

The test file imitates the driving use-case in #220

The acceptance test definition is refactored to make managing the test
data a little easier.
sirosen added a commit that referenced this issue Jan 20, 2023
* Simplify: remove `make_validator`

This was an intermediate method which doesn't appear to be necessary,
and leverages caching behavior which is never used. Remove it and move
the body into `get_validator` (its only caller).

* Minor cleanup to acceptance test naming

* Fix how '--check-metaschema' builds a validator

Metaschema checking was not building the validator correctly.
Primarily two fixes are applied here:
- the metaschema's schema dialect is no longer copied from the schema
  being checked, but is taken from the metaschema document
- metaschema checking now automatically includes format checking and
  applies the CLI parameters for format checking (including the
  ability to disable format checking)

Add test for an invalid format under metaschema. This test requires
one of the format checking libraries, and therefore drives the need
for new features in the testsuite, including the addition of config
for the example file tests. Example file config is schema-validated,
which serves as another dogfooding self-check.

The test file imitates the driving use-case in #220

The acceptance test definition is refactored to make managing the test
data a little easier.
@sirosen
Copy link
Member

sirosen commented Jan 24, 2023

I've just published a version with the fix for this (v0.21.0). So please let me know if it doesn't work for you!

I haven't gotten to --disable-formats support yet, but per #231 I think it will play well into cleaning up the interface a bit, so I want to do it when I get a chance.

@sirosen sirosen closed this as completed Jan 24, 2023
@binste
Copy link
Author

binste commented Jan 25, 2023

Thanks @sirosen! I can confirm that the validation of the schema now fails as expected based on the wrong URI formats. I'll wait with implementing this in Altair until you got a chance to look into #231 but no stress! :) Appreciate your help.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

2 participants