Add support for draft 2020-12 #817

Merged
merged 27 commits into from Aug 4, 2021

Conversation

nezhar
Contributor

@nezhar nezhar commented Jun 14, 2021

No description provided.

@Julian
Member

Julian commented Jun 15, 2021

Hello there!

Thanks very much for this, at first glance it looks like it's great, definitely in the right ballpark.

The first thing I think we'll need to address are the changes to the json/tests folder -- those come from the upstream JSON-Schema-Test-Suite (which I also help co-maintain with others from cross-language communities).

If there are gaps you noticed there, we need to basically take them upstream (and then we'll be able to pull those changes into that folder, which is a git subtree).

Could you perhaps consider pulling those out as PRs to that repository? Ideally in self-contained chunks for each piece.

I'll have to look more carefully at the changes to give more specific comments, but from my first skim it seemed exactly like what I'd expect.

Let's see whether CI passes as well, which I just manually enabled.

And thanks again!

@nezhar
Contributor Author

nezhar commented Jun 15, 2021

Thanks 🙂, good to know it's going the right way.

Sure, I can remove that commit and create a dedicated PR if this fits better.
I usually use a git submodule to include another repository; this could also be done with https://github.com/json-schema-org/JSON-Schema-Test-Suite. What do you think?

The implementation is still in progress. The current state of the draft2020-12 tests is 1334 total, 168 failed, 111 ignored, 1055 passed. The number of ignored tests should drop once the missing format validations are implemented.

I added isodate as a new dependency for the duration validation, but it seems that the library does not cover some cases: https://github.com/Julian/jsonschema/pull/817/files#diff-0953f1b9ffe16c7d4aa18ae8bf21287c552dcd9b241f66fb51e099724af8b722R616. Any suggestions here? There seems to be no better implementation of the standard. My idea was to cover the missing cases inside the validation function, as the library has had no release since 2017.

@Julian
Member

Julian commented Jun 15, 2021

Sure, I can remove that commit and create a dedicated PR if this fits better.
I usually use a git submodule to include another repository; this could also be done with https://github.com/json-schema-org/JSON-Schema-Test-Suite. What do you think?

That's already the case here functionally, that directory is a git-subtree (https://www.atlassian.com/git/tutorials/git-subtree), which is essentially a better version of git submodules.

The implementation is still in progress. The current state of the draft2020-12 tests is 1334 total, 168 failed, 111 ignored, 1055 passed. The number of ignored tests should drop once the missing format validations are implemented.

Cool! That sounds promising, and yeah matches my experience when implementing earlier drafts.

Any suggestions here? There seems to be no better implementation of the standard. My idea was to cover the missing cases inside the validation function, as the library has had no release since 2017.

I'll have to do some research myself. If we can't find something, the alternative is to possibly just not support the format. I don't really want to maintain a ton of format-specific code internally in the library. But let me see what I can find myself.

@tschmidtb51

I added isodate as a new dependency for the duration validation

Not quite sure, but could isoduration do the job?

@nezhar
Contributor Author

nezhar commented Jun 18, 2021

I added isodate as a new dependency for the duration validation

Not quite sure, but could isoduration do the job?

I already tried isoduration, but the implementation seems to be worse than the one in isodate, so I decided to go back, as there are only a few cases that the tests consider invalid but that isodate parses. They are collected in duration_format_validation.

@nezhar
Contributor Author

nezhar commented Jun 18, 2021

Inside ecmascript_regex_validation there are several skips related to the ecmascript-regex tests that were introduced with draft2020-12. I saw there was once an issue related to this topic, #612, and js_regex tried to solve such issues, which stem from the incompatibility of regular expression implementations across languages.

The package was later removed due to Zac-HD/js-regex#4 and the repository is now archived. Are there any plans to cover this?

@nezhar
Contributor Author

nezhar commented Jun 18, 2021

Some tests are skipped in format_validation_annotation, as the format validation is a bit confusing. Is there any difference between https://github.com/Julian/jsonschema/blob/main/json/tests/draft2020-12/format.json#L775 and https://github.com/Julian/jsonschema/blob/main/json/tests/draft2020-12/optional/format/duration.json#L12? In terms of testing, they both define the same schema and use the same data instance, but expect different results.

@tschmidtb51

tschmidtb51 commented Jun 18, 2021

Some tests are skipped in format_validation_annotation, as the format validation is a bit confusing. Is there any difference between https://github.com/Julian/jsonschema/blob/main/json/tests/draft2020-12/format.json#L775 and https://github.com/Julian/jsonschema/blob/main/json/tests/draft2020-12/optional/format/duration.json#L12? In terms of testing, they both define the same schema and use the same data instance, but expect different results.

If I remember correctly, that is due to the new rules/behavior for the Format Vocabulary (see the description in 2019-09).

@nezhar
Contributor Author

nezhar commented Jun 18, 2021

I added isodate as a new dependency for the duration validation

Not quite sure, but could isoduration do the job?

I already tried isoduration, but the implementation seems to be worse than the one in isodate, so I decided to go back, as there are only a few cases that the tests consider invalid but that isodate parses. They are collected in duration_format_validation.

I tried isoduration again and it seems to work well; maybe I passed the wrong exception in the raises param on the first run. 7a63ea3 now switches from isodate to isoduration.
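
As a rough illustration of what the duration tests are checking, here is a stdlib-only sketch of a duration check. It is not how isoduration or isodate work, the name looks_like_duration is made up for this example, and it deliberately ignores fractional values. It only shows why a string such as PT1D is rejected: a day designator cannot follow the T time marker.

```python
import re

# Assumption: a simplified grammar, integers only, no fractions.
# Weeks stand alone; otherwise date parts come in Y, M, D order and
# time parts (after "T") in H, M, S order, with at least one part present.
_DURATION = re.compile(
    r"P(?:"
    r"\d+W"
    r"|(?=\d|T\d)"                               # require at least one component
    r"(?:\d+Y)?(?:\d+M)?(?:\d+D)?"
    r"(?:T(?=\d)(?:\d+H)?(?:\d+M)?(?:\d+S)?)?"
    r")"
)


def looks_like_duration(value):
    """Hypothetical helper: True if `value` fits the simplified grammar."""
    if not isinstance(value, str):
        return True  # format checks only constrain strings
    return _DURATION.fullmatch(value) is not None


for candidate in ("P1Y2M3DT4H5M6S", "P4W", "PT1D", "P", "P1DT"):
    print(candidate, looks_like_duration(candidate))
```

Delegating to a maintained parser, as the PR ends up doing, avoids carrying a grammar like this inside the validator.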

@Julian
Member

Julian commented Jun 18, 2021

The package was later removed due to Zac-HD/js-regex#4 and the repository is now archived. Are there any plans to cover this?

It's not a priority for me personally at the minute -- if a library comes along in the Python ecosystem that provides JS regexes, it'll be considered, but yeah there are way more important things to address (including this PR!)

Contributor

@robherring robherring left a comment

I gave your branch a spin (using https://github.com/devicetree-org/dt-schema) and found a few issues.

Ultimately, 202012 is too incompatible for me to test this further easily. I'm primarily interested in 'unevaluatedProperties'. The big issue is 'items' lists have been replaced with prefixItems and that's used everywhere. So I need 201909 support. Is that something you plan?
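
For context on the incompatibility mentioned here: in draft 2019-09 a tuple-style array is written as an "items" list plus "additionalItems", while draft 2020-12 uses "prefixItems" plus "items". The sketch below shows that rewrite for a single schema object. The function name upgrade_array_keywords is invented for this example, and it only recurses into the tuple entries and "additionalItems", not into other subschemas such as "properties".

```python
def upgrade_array_keywords(schema):
    """Return a copy of `schema` with 2019-09 tuple keywords renamed for 2020-12."""
    if not isinstance(schema, dict):
        return schema  # boolean schemas pass through unchanged
    upgraded = dict(schema)
    items = upgraded.get("items")
    if isinstance(items, list):
        # 2019-09 "items": [...]  ->  2020-12 "prefixItems": [...]
        upgraded["prefixItems"] = [upgrade_array_keywords(s) for s in items]
        del upgraded["items"]
        # 2019-09 "additionalItems"  ->  2020-12 "items"
        if "additionalItems" in upgraded:
            upgraded["items"] = upgrade_array_keywords(upgraded.pop("additionalItems"))
    return upgraded


old = {
    "type": "array",
    "items": [{"type": "string"}, {"type": "integer"}],
    "additionalItems": False,
}
new = upgrade_array_keywords(old)
print(new)
```

A schema base that uses list-form "items" everywhere would need this applied throughout, which is why 2019-09 support is the easier route for testing 'unevaluatedProperties' in that case.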

Contributor

@robherring robherring left a comment

I managed to hack in 2019-09 support and test out 'unevaluatedProperties'. Works well for me, but a few comments.

@Julian
Member

Julian commented Jun 24, 2021

@nezhar I pulled in the subtree changes from the test suite.

If you rebase or merge your branch, it should now remove those changes from this PR, so any outstanding changes to the json/ folder (i.e. changes you made to tests) we should take to the test suite repo.

@bolsote

bolsote commented Jun 24, 2021

@nezhar Hello, I'm the author of isoduration. First of all, thank you for picking the library.

Since there's not been much activity recently, I'd like to mention I'm currently working on these two issues:

  • Solve all isodate issues bolsote/isoduration#9: In this one we are making sure all known isodate issues do not affect isoduration. The test suite in isoduration should be a strict superset of that of isodate. If you find it is not, please report it, as that would be a bug.
  • Properly support decimal points bolsote/isoduration#5: As of right now we are not really ISO/DIS 8601-1 compliant, as you can have decimal points at any place, while they are only accepted at the very last element of each segment. This attempts to fix it.

The first of those issues already has a PR attached, adding a few tests. No functional changes. The second one will have a PR soon (most likely tomorrow). I've reserved a healthy chunk of time in the next few weeks to solve all other issues, particularly the one related to supporting repeating intervals.

I also understand there were some teething troubles while integrating the library, maybe due to unexpected interfaces? If so, please do let me know, and we can see if the situation can be improved.

@nezhar
Contributor Author

nezhar commented Jun 25, 2021

@nezhar Hello, I'm the author of isoduration. First of all, thank you for picking the library.

Since there's not been much activity recently, I'd like to mention I'm currently working on these two issues:

  • Solve all isodate issues bolsote/isoduration#9: In this one we are making sure all known isodate issues do not affect isoduration. The test suite in isoduration should be a strict superset of that of isodate. If you find it is not, please report it, as that would be a bug.
  • Properly support decimal points bolsote/isoduration#5: As of right now we are not really ISO/DIS 8601-1 compliant, as you can have decimal points at any place, while they are only accepted at the very last element of each segment. This attempts to fix it.

The first of those issues already has a PR attached, adding a few tests. No functional changes. The second one will have a PR soon (most likely tomorrow). I've reserved a healthy chunk of time in the next few weeks to solve all other issues, particularly the one related to supporting repeating intervals.

I also understand there were some teething troubles while integrating the library, maybe due to unexpected interfaces? If so, please do let me know, and we can see if the situation can be improved.

Hello @bolsote, thanks for creating and maintaining the library 🥇

As mentioned in #817 (comment) it works well and covers all test cases specified in https://github.com/json-schema-org/JSON-Schema-Test-Suite/blob/master/tests/draft2020-12/optional/format/duration.json and it is already part of this PR.

@nezhar
Contributor Author

nezhar commented Jun 25, 2021

Some tests are skipped in format_validation_annotation, as the format validation is a bit confusing. Is there any difference between https://github.com/Julian/jsonschema/blob/main/json/tests/draft2020-12/format.json#L775 and https://github.com/Julian/jsonschema/blob/main/json/tests/draft2020-12/optional/format/duration.json#L12? In terms of testing, they both define the same schema and use the same data instance, but expect different results.

If I remember correctly, that is due to the new rules/behavior for the Format Vocabulary (see the description in 2019-09).

Maybe I'm missing something here. Given this schema:

{ "format": "duration" }

This test should pass (format.json)

{
    "description": "invalid duration string is only an annotation by default",
    "data": "PT1D",
    "valid": true
}

But this test should fail (optional/format/duration.json)

{
    "description": "an invalid duration string",
    "data": "PT1D",
    "valid": false
}

Same schema and same data, and this applies to all format tests. Is this supposed to be controlled with an extra param or something?

@codecov

codecov bot commented Jun 28, 2021

Codecov Report

Merging #817 (4547b2a) into main (72a0c60) will increase coverage by 0.60%.
The diff coverage is 99.68%.

@@            Coverage Diff             @@
##             main     #817      +/-   ##
==========================================
+ Coverage   96.41%   97.01%   +0.60%     
==========================================
  Files          18       18              
  Lines        2730     3011     +281     
  Branches      308      412     +104     
==========================================
+ Hits         2632     2921     +289     
+ Misses         78       73       -5     
+ Partials       20       17       -3     

Impacted Files                                    Coverage Δ
jsonschema/__init__.py                            77.77% <ø> (ø)
jsonschema/validators.py                          95.31% <97.61%> (+0.73%) ⬆️
jsonschema/_format.py                             87.28% <100.00%> (+3.70%) ⬆️
jsonschema/_legacy_validators.py                  100.00% <100.00%> (ø)
jsonschema/_types.py                              100.00% <100.00%> (ø)
jsonschema/_utils.py                              94.30% <100.00%> (+4.77%) ⬆️
jsonschema/_validators.py                         99.65% <100.00%> (+0.11%) ⬆️
jsonschema/tests/test_cli.py                      99.19% <100.00%> (ø)
jsonschema/tests/test_format.py                   100.00% <100.00%> (+5.76%) ⬆️
jsonschema/tests/test_jsonschema_test_suite.py    86.04% <100.00%> (+1.04%) ⬆️
... and 2 more

Δ = absolute <relative> (impact), ø = not affected, ? = missing data

@Julian
Member

Julian commented Jun 28, 2021

Is this supposed to be controlled with an extra param or something?

It's a bit clunky, but the format.json file is intended to be run with format-as-annotation (in this library that means with no configured FormatChecker()) and the latter with one enabled. It's indeed the same test, but run with two different configurations, which changes the expected result.

It may be we need a small tweak to the way we load the test suite to accommodate that.
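
To make the two configurations concrete, here is a toy, stdlib-only model of the behavior described above. It is not this library's code: is_valid, the checkers mapping, and the loose regex are all stand-ins invented for the example. The point is only that the same schema and instance give different results depending on whether a format checker is supplied.

```python
import re

# Deliberately loose stand-in for a real duration parser (assumption).
_DURATION = re.compile(r"P(?:\d+[YMWD])*(?:T(?:\d+[HMS])+)?")


def is_valid(instance, schema, format_checkers=None):
    """Toy validator that only understands the `format` keyword."""
    fmt = schema.get("format")
    if fmt is None or format_checkers is None:
        return True  # annotation only: `format` asserts nothing
    check = format_checkers.get(fmt)
    if check is None or not isinstance(instance, str):
        return True  # unknown formats and non-strings are not rejected
    return check(instance)


schema = {"format": "duration"}
checkers = {"duration": lambda value: _DURATION.fullmatch(value) is not None}

print(is_valid("PT1D", schema))            # True: format.json configuration
print(is_valid("PT1D", schema, checkers))  # False: optional/format configuration
```

Loading format.json with no checker and optional/format/*.json with one is the split that the test-suite loader needs to reproduce.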

@nezhar
Contributor Author

nezhar commented Jun 28, 2021

Is this supposed to be controlled with an extra param or something?

It's a bit clunky, but the format.json file is intended to be run with format-as-annotation (in this library that means with no configured FormatChecker()) and the latter with one enabled. It's indeed the same test, but run with two different configurations, which changes the expected result.

It may be we need a small tweak to the way we load the test suite to accommodate that.

Splitting the tests seems to be working fine: 2bb7f52#diff-0953f1b9ffe16c7d4aa18ae8bf21287c552dcd9b241f66fb51e099724af8b722R493

@nezhar nezhar marked this pull request as ready for review June 30, 2021 12:13
(id, validator.META_SCHEMA)
for id, validator in meta_schemas.items()
)
self.store = get_store()
Member

Why are we turning what was previously local state to mutable global state?

Contributor Author

Since draft2020-12 relies on references, it takes some time to initialize the validator, as it fetches all these URLs from remote. The tests for draft2020-12 take over 30 minutes without this approach, and several hours with the current pipeline configuration, as there is a limit on parallel GitHub Actions jobs.

Member

That sounds like there are some URLs missing from Draft202012Validator's own schema store.

IIRC from the last time I did this myself, basically we need to be caching the vocabulary schemas as well (globally), not just the metaschema.

But yeah we shouldn't move to global state here for all URLs.

Contributor Author

Yes, the meta schema is loaded from the local file draft2020-12.json. All references are loaded afterwards:

{
    "$schema": "https://json-schema.org/draft/2020-12/schema",
    "$id": "https://json-schema.org/draft/2020-12/schema",
    ...
    "allOf": [
        {"$ref": "meta/core"},
        {"$ref": "meta/applicator"},
        {"$ref": "meta/unevaluated"},
        {"$ref": "meta/validation"},
        {"$ref": "meta/meta-data"},
        {"$ref": "meta/format-annotation"},
        {"$ref": "meta/content"}
    ],
    ...
}

In this case I will try to load only the meta schemas in a global store, everything else will be loaded in the validator store.

Contributor Author

Now the vocabulary schemas are also stored in the schemas directory and are added to the store via vocabulary_schemas when a validator is created.
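
The idea discussed in this thread can be sketched roughly as follows: keep the bundled metaschema and vocabulary schemas in a read-only, process-wide table, and give every resolver its own mutable store seeded from it. Everything here is a stand-in, not the PR's code: BUNDLED, SketchResolver and _fetch are invented names, and the schema bodies are placeholders for the JSON files shipped with the package.

```python
from types import MappingProxyType
from urllib.parse import urljoin

BASE = "https://json-schema.org/draft/2020-12/schema"
VOCABULARY_REFS = [
    "meta/core", "meta/applicator", "meta/unevaluated", "meta/validation",
    "meta/meta-data", "meta/format-annotation", "meta/content",
]

# Read-only table of schemas shipped with the package (placeholder bodies).
# Each relative $ref is resolved against the metaschema's $id.
BUNDLED = MappingProxyType({
    BASE: {"$id": BASE},
    **{urljoin(BASE, ref): {"$id": urljoin(BASE, ref)} for ref in VOCABULARY_REFS},
})


class SketchResolver:
    """Hypothetical resolver with a per-instance store."""

    def __init__(self, store=()):
        self.store = dict(BUNDLED)   # private copy, never shared
        self.store.update(store)
        self.remote_fetches = 0

    def resolve(self, url):
        if url not in self.store:
            self.remote_fetches += 1
            self.store[url] = self._fetch(url)
        return self.store[url]

    def _fetch(self, url):
        return {"$id": url}          # placeholder for a network request


resolver = SketchResolver()
for ref in VOCABULARY_REFS:
    resolver.resolve(urljoin(BASE, ref))
print(resolver.remote_fetches)
```

Because only the bundled schemas are shared, and only as an immutable mapping, anything a resolver fetches later stays local to that resolver.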

@nezhar nezhar requested a review from Julian July 9, 2021 07:24
@@ -14,14 +14,14 @@ jobs:
       - name: Set up Python
         uses: actions/setup-python@v2
         with:
-          python-version: pypy3
+          python-version: 3.9
Member

Hi! Sorry for the delay on this one. I'll get you a full review soon, but this change (moving coverage to run on 3.9) is another good one to extract into its own tiny PR please! Appreciated.

Contributor Author

Moved to new PR #823

Also I cleaned the commits for the fix that has been applied with #820

@nezhar
Contributor Author

nezhar commented Jul 14, 2021

I made another contribution to the tests repository to cover some cases of unevaluated properties: json-schema-org/JSON-Schema-Test-Suite#500

The other contribution I made has been more or less reverted, as the checks were moved to format.json: json-schema-org/JSON-Schema-Test-Suite@ba1f1a7. This means that coverage will drop, as there are no longer any non-string tests for the functions inside _format.py. Any suggestions on how to handle this better? I was thinking of adding some new unit tests specific to the functions in this repository.

@Julian
Member

Julian commented Jul 21, 2021

Thanks @nezhar! This is likely getting close to merging, I'll have another look and perhaps leave one more round of comments but then likely merge and move forward from there.

There are definitely some changes I want to make (specifically around vocabulary support), but we can do so after things are merged, and perhaps after a beta release.

narrow_unicode_build(test)
or ecmascript_regex_validation(test)
or skip(
message="ToDo: Extend validation",
Member

I'm ok if we are missing support for something here, but can you change the message or add a ticket to indicate what we are missing and what'd be required to get it?

Contributor Author

There are two additional failing cases that were introduced with the latest update of the tests. I'm having a hard time understanding how the dynamic scopes are supposed to work; maybe they need to be split when loading anchors and dynamicAnchors. I'm still looking into this; for now I fixed the message.

@Julian
Member

Julian commented Aug 4, 2021

@nezhar I've merged this just now -- apologies for this taking so long, and quite grateful for the work!

There definitely still are things I want to fix, you'll see I've just sent a PR upstream because the UUID formatting validation here isn't correct, and I think the relative pointer validation may not be either.

But I want to get this merged and fix things on the main branch going forward, so we at least aren't sitting here with the PR waiting.

I'll follow up perhaps here if there are any major updates/changes that need doing, just in case you have comments on them, but thanks again!

yield error


def defs(validator, defs, instance, schema):
Member

@Julian Julian Aug 17, 2021

In playing around myself here the past few days, I'm pretty sure something is wrong here, but I don't know what yet.

$defs doesn't really have any behavior, it's just a rename of definitions (and thereby just a standard place to put definitions used elsewhere in the schema), so there's no validation that should happen when encountering it.

But I see tests fail if I remove this -- I think that's an issue with the dynamicRef support (which is what should be generating the error in that test), not with $defs itself, the test that fails is one that validates an invalid schema against the metaschema.

If anyone sees where the issue is, let me know, otherwise will keep playing.

Julian added a commit that referenced this pull request Aug 26, 2021
These indeed can be improved, as mentioned in
#817 (comment)
but it's a bit less clear exactly how yet -- rather than putting $ref
in the schema path, instead using relative_schema_path to only refer
to the schema post-$ref lookup is a bit more consistent with the current
norms, wherein what's in schema_path should be lookup-able via indexing.

But for now, they're distinguishable via .schema, which shows only the
$ref'ed schema for the second error.