Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Greedy (?) object typing leads to semantically wrong error message #812

Open
sdruskat opened this issue Jun 2, 2021 · 1 comment
Open
Labels
Enhancement Some new desired functionality Error Reporting Issues related to clearer or more robust validation error reporting

Comments

@sdruskat
Copy link

sdruskat commented Jun 2, 2021

Hi, thanks for a great project!

We've stumbled across an issue with misleading error messages, although I'm not sure if there is a useful fix.

  • jsonschema version: 3.2.0
  • Python: 3.8.5

When you define two objects (e.g., person and entity), which share a number of properties but define others which are mutually exclusive across the two, and one object (say, entity) also requires fields where the other doesn't, properties are reported in a ValidationError as unexpected when from the user perspective, they aren't.

Expected behaviour

Given the definitions

  • person (properties: person_property (optional), alias (optional))
  • entity (properties: entity_property (required), alias (optional))

and data that introduces an additional property BREAK_ENTITY (which is disallowed via "additionalProperties": false) to an otherwise valid entity data object, I expect the error message to raise the additional key, but not any other (semantically valid) keys.

Actual behaviour

The error messages raises two unexpected keys: the additional one (correct), and entity_property (incorrect).

Without having had a look at the source code, I assume that the additional property trips the error, and then trying to type the object and collecting unexpected properties is greedy, and in this case assumes person whereas we're looking at entity really, which would be valid if only the additional property was removed. I guess changing the heuristic to find the best match may come at a performance penalty though, as you'd have to look at all potential best matches?

This also happens independently from which of the two objects is defined first, but depends on the order of objects in anyOf.

Reproducing the issue

(Minimalized (well, more or less) schema

{
    "$schema": "http://json-schema.org/draft-07/schema",
    "additionalProperties": false,
    "type": "object",
    "properties": {
        "parties": {
            "items": {
                "anyOf": [
                    {
                        "$ref": "#/definitions/person"
                    },
                    {
                        "$ref": "#/definitions/entity"
                    }
                ]
            },
            "minItems": 1,
            "type": "array",
            "uniqueItems": true
        }
    },
    "required": [
        "parties"
    ],
    "definitions": {
        "person": {
            "type": "object",
            "additionalProperties": false,
            "properties": {
                "person_property": {
                    "type": "string",
                    "minLength": 1
                },
                "alias": {
                    "$ref": "#/definitions/alias"
                }
            }
        },
        "entity": {
            "additionalProperties": false,
            "properties": {
                "entity_property": {
                    "type": "string",
                    "minLength": 1
                },
                "alias": {
                    "$ref": "#/definitions/alias"
                }
            },
            "required": [
                "entity_property"
            ],
            "type": "object"
        },
        "alias": {
            "type": "string",
            "minLength": 1
        }
    }
}

Valid data

{
    "parties": [
        {
            "entity_property": "Entity",
            "alias": "Entity alias"
        },
        {
            "person_property": "Person",
            "alias": "Person alias"
        }
    ]
}

Invalid person data (yields expected message)

The following invalid person data breaks and yields the expected error message: jsonschema.exceptions.ValidationError: Additional properties are not allowed ('BREAK_PERSON' was unexpected).

{
    "parties": [
        {
            "entity_property": "Entity",
            "alias": "Entity alias"
        },
        {
            "person_property": "Person",
            "alias": "Person alias",
            "BREAK_PERSON": "BREAK"
        }
    ]
}

ISSUE: Invalid entity data (yields unexpected message)

The following invalid entity data yields the expected error, albeit with an unexpected error message jsonschema.exceptions.ValidationError: Additional properties are not allowed ('BREAK_ENTITY', 'entity_property' were unexpected) (, 'entity_property' being the unexpected part, full trace further below)

{
    "parties": [
        {
            "entity_property": "Entity",
            "alias": "Entity alias",
            "BREAK_ENTITY": "BREAK"
        },
        {
            "person_property": "Person",
            "alias": "Person alias"
        }
    ]
}

Trace

Linebreaks included for legibility.

Traceback (most recent call last):
  File "repr_bug.py", line 27, in <module>
    validate(args.data, args.schema)
  File "repr_bug.py", line 14, in validate
    jsonschema.validate(instance=data, schema=schema_data)
  File "~/venv/lib/python3.8/site-packages/jsonschema/validators.py", line 934, in validate
    raise error
jsonschema.exceptions.ValidationError: 
  Additional properties are not allowed ('BREAK_ENTITY', 'entity_property' were unexpected)

Failed validating 'additionalProperties' in schema[0]:
    {'additionalProperties': False,
     'properties': {'alias': {'$ref': '#/definitions/alias'},
                    'person_property': {'minLength': 1, 'type': 'string'}},
     'type': 'object'}

On instance:
    {'BREAK_ENTITY': 'BREAK',
     'alias': 'Entity alias',
     'entity_property': 'Entity'}
@Julian
Copy link
Member

Julian commented Jun 2, 2021

Thanks for the detailed report (and for the kind words).

I'll have to try to understand the details a bit more as you say, to see whether there's a sufficient improvement that could be made to best_match that'd produce the right result here without damaging other cases. There are I think 2 other known improvements that haven't yet been implemented (#728 and #698 I think) -- not sure if either of those cover this case but might be worth checking.

If it's neither of those but still well defined, obviously improving error messages is always good!

EDIT: It seems improving this needs some more in depth heuristic that probably counts overlapping properties between the schema and instance -- but doing so correctly (including for patternProperties et al) probably requires full annotation support.

@Julian Julian added the Enhancement Some new desired functionality label Aug 6, 2021
@Julian Julian added the Error Reporting Issues related to clearer or more robust validation error reporting label Aug 19, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Enhancement Some new desired functionality Error Reporting Issues related to clearer or more robust validation error reporting
Projects
None yet
Development

No branches or pull requests

2 participants