Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Less Specific Validation Errors for "oneOf" Schemas - Take 2 #991

Closed
dtantsur opened this issue Aug 30, 2022 · 11 comments
Closed

Less Specific Validation Errors for "oneOf" Schemas - Take 2 #991

dtantsur opened this issue Aug 30, 2022 · 11 comments

Comments

@dtantsur
Copy link

Moving this out of #977 since that bug was a user error apparently.

We're seeing a significant regression in user messages, presumably because of bd5ea73. Granted, it's hard to provide good error messages for something like

{'anyOf': [{'additionalProperties': False,
                'properties': {'hostname': {'type': 'string'},
                               'network_type': {'enum': ['managed',
                                                         'unmanaged'],
                                                'type': 'string'},
                               'port_id': {'type': 'string'},
                               'switch_id': {'type': 'string'},
                               'switch_info': {'type': 'string'}},
                'required': ['port_id', 'switch_id'],
                'type': 'object'},
               {'additionalProperties': False,
                'properties': {'hostname': {'type': 'string'},
                               'network_type': {'enum': ['managed',
                                                         'unmanaged'],
                                                'type': 'string'},
                               'port_id': {'type': 'string'},
                               'switch_id': {'type': 'string'},
                               'switch_info': {'type': 'string'}},
                'required': ['port_id', 'hostname'],
                'type': 'object'},
               {'additionalProperties': False,
                'properties': {'hostname': {'type': 'string'},
                               'network_type': {'enum': ['unmanaged'],
                                                'type': 'string'},
                               'port_id': {'type': 'string'},
                               'switch_id': {'type': 'string'},
                               'switch_info': {'type': 'string'}},
                'required': ['network_type'],
                'type': 'object'},
               {'additionalProperties': False, 'type': 'object'}]}

in all case, but e.g. all of them have 'additionalProperties': False. Previously (<= 4.7 I think) we used to have Additional properties are not allowed, now we only have ... is not valid under any of the given schemas.

Another pretty unfortunately example from our API:

STANDARD_TRAITS = # ... some list
CUSTOM_TRAIT_PATTERN = "^CUSTOM_[A-Z0-9_]+$"
TRAITS_SCHEMA = {'anyOf': [
    {'type': 'string', 'minLength': 1, 'maxLength': 255,
     'pattern': CUSTOM_TRAIT_PATTERN},
    {'type': 'string', 'enum': STANDARD_TRAITS},
]}

{
    'type': 'object',
    'properties': {
        'description': {'type': ['string', 'null'], 'maxLength': 255},
        'extra': {'type': ['object', 'null']},
        'name': TRAITS_SCHEMA,
        'steps': {'type': 'array', 'items': api_utils.DEPLOY_STEP_SCHEMA,
                  'minItems': 1},
        'uuid': {'type': ['string', 'null']},
    },
    'required': ['steps', 'name'],
    'additionalProperties': False,
}

With CUSTOM_AAAAA (a really long string) we used to have 'CUSTOM_AAAAA....' is too long, now again the generic error message. It basically means that we need to accept unspecific API errors or migrate away from jsonscheme.

@Julian
Copy link
Member

Julian commented Aug 30, 2022

In and of itself producing less specific error messages isn't a bug, it's precisely the intended behavior of the change there (and produces better error messages in many other cases), so there'd need to be some specific heuristic one could use to again figure out a better error for the cases you're presenting. Do you have a suggested one?

@dtantsur
Copy link
Author

I'd be fine with a way to opt-out of the change in #977 (i.e. going back to the 4.7 behavior).

@dtantsur
Copy link
Author

Another data point: I tried rewriting the first schema to use deeper anyOf:

    {'additionalProperties': False,
     'anyOf': [{'anyOf': [{'required': ['port_id', 'switch_id']},
                          {'required': ['port_id', 'hostname']}],
                'properties': {'hostname': {'type': 'string'},
                               'network_type': {'enum': ['managed',
                                                         'unmanaged'],
                                                'type': 'string'},
                               'port_id': {'type': 'string'},
                               'switch_id': {'type': 'string'},
                               'switch_info': {'type': 'string'}}},
               {'properties': {'hostname': {'type': 'string'},
                               'network_type': {'enum': ['unmanaged'],
                                                'type': 'string'},
                               'port_id': {'type': 'string'},
                               'switch_id': {'type': 'string'},
                               'switch_info': {'type': 'string'}},
                'required': ['network_type']}],
     'type': 'object'}

but now it rejects any properties, including the ones defined here... I guess you cannot test properties inside anyOf this way?

@dtantsur
Copy link
Author

I did get some progress with rewriting the second schema to

TRAITS_SCHEMA = {
    'type': 'string', 'minLength': 1, 'maxLength': 255,
    'anyOf': [
        {'pattern': CUSTOM_TRAIT_PATTERN},
        {'enum': STANDARD_TRAITS},
    ]
}

Now it at least handles the lengths correctly, but not the failing patterns. My reasoning would be to put pattern above enum with the logic: if it is not one of the predefined values, it must match the pattern, so the final error message is about the pattern. YMMV.

@Julian
Copy link
Member

Julian commented Aug 30, 2022

I'd be fine with a way to opt-out of the change in #977 (i.e. going back to the 4.7 behavior).

If you want to indiscretely recurse into applicators you certainly still can by doing something like:

error = best_match(...)
while error.context:
    error = best_match(error.context)

but now it rejects any properties, including the ones defined here... I guess you cannot test properties inside anyOf this way?

Yes, that's the defined behavior of additionalProperties, but may mean you want to read about unevaluatedProperties.

@dtantsur
Copy link
Author

dtantsur commented Aug 30, 2022

Yes, that's the defined behavior of additionalProperties

Sorry, I must be missing the place where it says that anyOf cannot be used with it.

may mean you want to read about unevaluatedProperties.

Same result with 'unevaluatedProperties': False.

I'll see what the manual recursion gives me, thanks!

@Julian
Copy link
Member

Julian commented Aug 30, 2022

It's:

Validation with "additionalProperties" applies only to the child values of instance names that do not appear in the annotation results of either "properties" or "patternProperties".

which basically means additionalProperties applies to properties defined as a "sibling" keyword, not somewhere nested inside a subschema that's inside anyOf.

@dtantsur
Copy link
Author

Gotcha. And with unevaluatedProperties it will descend into subschemas, but if none of them match, it will report a wrong error ("unexpected fields" even for fields defined in all subschemas).

@Julian
Copy link
Member

Julian commented Aug 30, 2022

Yep exactly.

@dtantsur
Copy link
Author

Thank you again! With your incredible help I think I managed to produce error messages I want: https://review.opendev.org/c/openstack/ironic/+/855166. Do you think it's worth an FAQ item? Otherwise feel free to close this issue.

@Julian
Copy link
Member

Julian commented Aug 30, 2022

My pleasure, glad to hear.

Do you think it's worth an FAQ item?

I suspect a FAQ entry would end up highly specific to the use case there -- but there are 2 open tickets to enhance best_match yet again with better heuristics (#698 and #812). It may be worth trying your schema again after one or both of those are done, perhaps they cover your case (and if not as I say as long as we can define some well-defined heuristic best_match is always open to improving).

@Julian Julian closed this as completed Aug 30, 2022
openstack-mirroring pushed a commit to openstack/openstack that referenced this issue Aug 30, 2022
* Update ironic from branch 'master'
  to 62f9c61ae6657227fd1f5dbc86c59a22b6bac28f
  - Improve error message heuristics with jsonschema>=4.8
    
    Before version 4.8, jsonschema did some wild guessing when producing
    error messages for schemas with several equivalent subschemas. In
    version 4.8 it is no longer done, causing error messages that are more
    correct but also more generic.
    
    This change restores guessing the potential root cause without claiming
    that it's the only possible root cause. Also the traits schema is
    simplified to make it less ambiguous.
    
    See python-jsonschema/jsonschema#991 for details.
    
    Change-Id: Ia75cecd2bfbc602b8b2b85bdda20fdc04c5eadf4
openstack-mirroring pushed a commit to openstack/ironic that referenced this issue Aug 30, 2022
Before version 4.8, jsonschema did some wild guessing when producing
error messages for schemas with several equivalent subschemas. In
version 4.8 it is no longer done, causing error messages that are more
correct but also more generic.

This change restores guessing the potential root cause without claiming
that it's the only possible root cause. Also the traits schema is
simplified to make it less ambiguous.

See python-jsonschema/jsonschema#991 for details.

Change-Id: Ia75cecd2bfbc602b8b2b85bdda20fdc04c5eadf4
openstack-mirroring pushed a commit to openstack/ironic that referenced this issue Aug 31, 2022
Before version 4.8, jsonschema did some wild guessing when producing
error messages for schemas with several equivalent subschemas. In
version 4.8 it is no longer done, causing error messages that are more
correct but also more generic.

This change restores guessing the potential root cause without claiming
that it's the only possible root cause. Also the traits schema is
simplified to make it less ambiguous.

See python-jsonschema/jsonschema#991 for details.

Change-Id: Ia75cecd2bfbc602b8b2b85bdda20fdc04c5eadf4
(cherry picked from commit 62f9c61)
openstack-mirroring pushed a commit to openstack/ironic that referenced this issue Aug 31, 2022
Before version 4.8, jsonschema did some wild guessing when producing
error messages for schemas with several equivalent subschemas. In
version 4.8 it is no longer done, causing error messages that are more
correct but also more generic.

This change restores guessing the potential root cause without claiming
that it's the only possible root cause. Also the traits schema is
simplified to make it less ambiguous.

See python-jsonschema/jsonschema#991 for details.

Change-Id: Ia75cecd2bfbc602b8b2b85bdda20fdc04c5eadf4
(cherry picked from commit 62f9c61)
openstack-mirroring pushed a commit to openstack/ironic that referenced this issue Aug 31, 2022
Before version 4.8, jsonschema did some wild guessing when producing
error messages for schemas with several equivalent subschemas. In
version 4.8 it is no longer done, causing error messages that are more
correct but also more generic.

This change restores guessing the potential root cause without claiming
that it's the only possible root cause. Also the traits schema is
simplified to make it less ambiguous.

See python-jsonschema/jsonschema#991 for details.

Change-Id: Ia75cecd2bfbc602b8b2b85bdda20fdc04c5eadf4
(cherry picked from commit 62f9c61)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants