Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Poor errors when anyOf is used #1002

Open
ssbarnea opened this issue Sep 13, 2022 · 5 comments
Open

Poor errors when anyOf is used #1002

ssbarnea opened this issue Sep 13, 2022 · 5 comments
Labels
Enhancement Some new desired functionality Error Reporting Issues related to clearer or more robust validation error reporting

Comments

@ssbarnea
Copy link
Contributor

While refactoring as schema I had to make use of anyOf to distinguish between a v1 of the format and v2 of it (based on presence of a version const property). This worked well with the json validator used by vscode but I observed that on python-jsonschema the error became too generic to be useful to the user.

As my schema is bit more complex, I will try to provide a minimal reproducer for the issue, hoping that we can tune it to improve the messaging.

I did create a repository with a minimal scheme that reproduce the current behavior at https://github.com/ssbarnea/rep-a

Instead of reporting the fact that "description" is not of type string, the library reports the generic message:

{'version': 1, 'description': 0}: {'version': 1, 'description': 0} is not valid under any of the given schemas

Is there any way we can write such a schema to avoid getting such a generic error?

Please note that in reality the schemas are considerably more complex, with lots of properties for each version, but if the user only gets one such top-lever error, they will never be able to spot the root cause of the issue.

@Julian
Copy link
Member

Julian commented Sep 14, 2022

This is possibly similar to #698 or #812.

It's possible it's fixable, I haven't had a chance to revisit best_match the past few weeks but it simply needs some reliable heuristic for when to descend.

@Julian Julian added Enhancement Some new desired functionality Error Reporting Issues related to clearer or more robust validation error reporting labels Sep 14, 2022
@ssbarnea
Copy link
Contributor Author

I found a that using if/then/else as a valid workaround for improving the messaging but these are considerably uglier and harder to scale above 2-options. Feel free to close if you think it counts as a duplicate.

I will link my workaround, in case someone else finds it useful.

ssbarnea added a commit to ssbarnea/schemas that referenced this issue Sep 14, 2022
This greatly improves the quality of the validating error messages.

Related: python-jsonschema/jsonschema#1002
ssbarnea added a commit to ssbarnea/schemas that referenced this issue Sep 14, 2022
This greatly improves the quality of the validating error messages.

Improvement of error messages comes with a minor downside, as from
now on schema validation for meta would not pass on empty files. Users
will have to either put real content in meta files to pass schema
validation, such as version at least to pass. That was needed for
technical reasons as without it the error messages would be
considerably less easy to understand.

Related: python-jsonschema/jsonschema#1002
ssbarnea added a commit to ansible/schemas that referenced this issue Sep 14, 2022
This greatly improves the quality of the validating error messages.

Improvement of error messages comes with a minor downside, as from
now on schema validation for meta would not pass on empty files. Users
will have to either put real content in meta files to pass schema
validation, such as version at least to pass. That was needed for
technical reasons as without it the error messages would be
considerably less easy to understand.

Related: python-jsonschema/jsonschema#1002
@cscetbon
Copy link

Unfortunately I'm having the same issue with a schema that uses any to validate different types of messages using one global schema and the validation error just dumps the whole schema each time which is kinda useless. Here is an extract showing only what is important relating to the issue


ValidationError: {
  ...
  'metadata': {..., 'eventType': 'SIS_ADDRESS', ...},
  'address': {..., 'addressType': 'HOME', ...},
  ...
} is not valid under any of the given schemas

Failed validating 'anyOf' in schema:
    {'$schema': 'http://json-schema.org/draft-04/schema#',
     'anyOf': [{'$ref': '#/definitions/eem/events/SISAddress'},
               {'$ref': '#/definitions/eem/events/SISCourse'},
               {'$ref': '#/definitions/eem/events/SISEnrollment'},
               {'$ref': '#/definitions/eem/events/SISPerson'},
               {'$ref': '#/definitions/eem/events/SISRestriction'},
               {'$ref': '#/definitions/eem/events/SISSection'}],
     'definitions': {'edm': {'$schema': 'http://json-schema.org/draft-04/schema#',
...
'id': '#Address',
                                     'properties': {'addressType': {'enumLabels': {'BILLING_PRIMARY': 'Billing '
                                                                                                      'Primary',
                                                                                   'BILLING_SECONDARY': 'Billing '
                                                                                                      'Secondary',
                                                                                   'CAMPUS_PRIMARY': 'Campus '
                                                                                                      'Primary',
                                                                                   'HOME_PRIMARY': 'Home '
                                                                                                      'Primary',
                                                                                   'HOME_SECONDARY': 'Home '
                                                                                                      'Secondary',
                                                                                              .....
                                                                                   'WORK_SECONDARY': 'Work '
                                                                                                      'Secondary'},
....
           'SISAddress': {'$id': 'https://eem-schema.prod.xx.com/EEM.json#SISAddress'
...
                                            'required': ['addressType',
                                                         'city',
                                                         'country']},
....

Above the error is just that HOME is not a supported value for address.address_type and is specified in #/definitions/eem/events/SISAddress

@Julian
Copy link
Member

Julian commented Apr 24, 2024

(Just having re-looked at this after fixing #1250) -- this one indeed still gives the same result.

A minimal reproducer is:

from jsonschema import validate
schema = {
    "anyOf": [
        {
            "properties": {
                "version": {"const": 1},
                "description": {"type": "string"}
            },
        },
        {
            "properties": {
                "version": {"const": 2},
                "description": {"type": "string"}
            },
        },
    ]
}
validate(instance={"version": 1, "description": 0}, schema=schema)

which gives:

jsonschema.exceptions.ValidationError: {'version': 1, 'description': 0} is not valid under any of the given schemas

Failed validating 'anyOf' in schema:
    {'anyOf': [{'properties': {'description': {'type': 'string'},
                               'version': {'const': 1}}},
               {'properties': {'description': {'type': 'string'},
                               'version': {'const': 2}}}]}

On instance:
    {'description': 0, 'version': 1}

instead of descending into the branch that at least matched one property.

I'm not sure it's easy to differentiate there, but will come back and give this another look at some point.

Similarly/interestingly, there's a related bug it seems when we replace a subschema with False:

schema = {
    "anyOf": [
        {
            "properties": {
                "version": {"const": 1},
                "description": False
            },
        },
        {
            "properties": {
                "version": {"const": 2},
                "description": False
            },
        },
    ]
}
validate(instance={"version": 1, "description": 0}, schema=schema)

namely there we seem to indeed descend, and get:

jsonschema.exceptions.ValidationError: 2 was expected

Failed validating 'const' in schema[1]['properties']['version']:
    {'const': 2}

On instance['version']:
    1

which seems doubly wrong.

@macintacos
Copy link

I believe that I'm hitting this as well trying to create a JSON schema for some of Slack Block Kit's JSON objects. For example, take the following schema:

{
  "$defs": {
    "SlackBlockElementImage": {
      "description": "Validation for Slack Block element - Image.\n\nReference: https://api.slack.com/reference/block-kit/block-elements#image",
      "properties": {
        "type": {
          "const": "image",
          "description": "The type of element. In this case type is always image.",
          "enum": [
            "image"
          ],
          "title": "Type",
          "type": "string"
        },
        "image_url": {
          "description": "The URL of the image to be displayed.",
          "format": "uri",
          "maxLength": 2083,
          "minLength": 1,
          "title": "Image Url",
          "type": "string"
        },
        "alt_text": {
          "description": "A plain-text summary of the image. This should not contain any markup.",
          "title": "Alt Text",
          "type": "string"
        }
      },
      "required": [
        "type",
        "image_url",
        "alt_text"
      ],
      "title": "SlackBlockElementImage",
      "type": "object"
    },
    "SlackBlockText": {
      "allOf": [
        {
          "if": {
            "$comment": "Is 'type' set to 'mrkdwn'?",
            "properties": {
              "type": {
                "const": "mrkdwn"
              }
            }
          },
          "then": {
            "$comment": "Then 'emoji' must be null.",
            "properties": {
              "emoji": {
                "type": "null"
              }
            }
          }
        },
        {
          "if": {
            "$comment": "Is 'type' set to 'plain_text'?",
            "properties": {
              "type": {
                "const": "plain_text"
              }
            }
          },
          "then": {
            "$comment": "Then 'verbatim' must be null.",
            "properties": {
              "verbatim": {
                "type": "null"
              }
            }
          }
        }
      ],
      "description": "Validation for Slack Block object - Text.\n\nThis model assumed that all 'None' values mean that the field will be omitted when serialized to JSON with the `.json()` method. 'None' values are automatically excluded when exporting to json with `.json()`.\n\nReference: https://api.slack.com/reference/block-kit/composition-objects#text",
      "properties": {
        "type": {
          "description": "The formatting to use for this text object. Can be one of 'plain_text' or 'mrkdwn'.",
          "enum": [
            "plain_text",
            "mrkdwn"
          ],
          "title": "Type",
          "type": "string"
        },
        "text": {
          "description": "The text for the block. This field accepts any of the standard text formatting markup when type is mrkdwn. The maximum length is 3000 characters when used alone.",
          "maxLength": 3000,
          "title": "Text",
          "type": "string"
        },
        "emoji": {
          "anyOf": [
            {
              "type": "boolean"
            },
            {
              "type": "null"
            }
          ],
          "default": null,
          "description": "Use only when `type` is `mrkdwn`.\n\n            Indicates whether emojis in a text field should be escaped into the colon emoji format. This field is only usable when type is 'plain_text'.",
          "title": "Emoji"
        },
        "verbatim": {
          "anyOf": [
            {
              "type": "boolean"
            },
            {
              "type": "null"
            }
          ],
          "default": null,
          "description": "OPTIONAL. Use only when `type` is `mrkdwn`.\n\n            When set to false (as is default) URLs will be auto-converted into links, conversation names will be link-ified, and certain mentions will be automatically parsed.\n\n            Using a value of true will skip any preprocessing of this nature, although you can still include manual parsing strings. This field is only usable when type is mrkdwn.",
          "title": "Verbatim"
        }
      },
      "required": [
        "type",
        "text"
      ],
      "title": "SlackBlockText",
      "type": "object"
    }
  },
  "description": "Validation for Slack Block - Context.\n\nReference: https://api.slack.com/reference/block-kit/blocks#context",
  "properties": {
    "type": {
      "const": "context",
      "description": "The type of block. For a context block, the type is always 'context'.",
      "enum": [
        "context"
      ],
      "title": "Type",
      "type": "string"
    },
    "elements": {
      "description": "An array of image elements and text objects. Minimum number of items is 1, maximum is 10.",
      "items": {
        "anyOf": [
          {
            "$ref": "#/$defs/SlackBlockElementImage"
          },
          {
            "$ref": "#/$defs/SlackBlockText"
          }
        ]
      },
      "maxItems": 10,
      "minItems": 1,
      "title": "Elements",
      "type": "array"
    },
    "block_id": {
      "anyOf": [
        {
          "maxLength": 255,
          "type": "string"
        },
        {
          "type": "null"
        }
      ],
      "default": null,
      "description": "A string acting as a unique identifier for a block. If not specified, one will be generated. Maximum length for this field is 255 characters. block_id should be unique for each message and each iteration of a message. If a message is updated, use a new block_id.",
      "title": "Block Id"
    }
  },
  "required": [
    "type",
    "elements"
  ],
  "title": "SlackBlockContext",
  "type": "object"
}

The above schema validates this object just fine:

{
    "type": "context",
    "elements": [
        {
            "type": "image",
            "image_url": "https://google.com/blah",
            "alt_text": "This is some description for the image URL"
        }
    ]
}

Using this code to demonstrate (note: I am using the format checker to make sure the URLs are being validated):

if __name__ == "__main__":
    import jsonschema

    json_obj = json.loads(
        """
        {
            "type": "context",
            "elements": [
                {
                    "type": "image",
                    "image_url": "https://google.com/blah",
                    "alt_text": "This is some description for the image URL"
                }
            ]
        }
        """
    )
    schema = # Load the schema in as a dictionary here

    jsonschema.validate(
        instance=json_obj,
        schema=schema,
        format_checker=jsonschema.Draft202012Validator.FORMAT_CHECKER,
    )

If you run the above, it completes just fine. but if you change image_url to "abc" instead of a valid URL, you get an unhelpful error:

Failed validating 'anyOf' in schema['properties']['elements']['items']:
    {'anyOf': [{'$ref': '#/$defs/SlackBlockElementImage'},
               {'$ref': '#/$defs/SlackBlockText'}]}

On instance['elements'][0]:
    {'alt_text': 'This is some description for the image URL',
     'image_url': 'abc',
     'type': 'image'}

Instead of telling me that the image_url isn't valid, it tells me it couldn't figure out which $ref object validates the object, and just gives up.

I get that this might be hard to figure out at runtime, but I wonder if perhaps this library could make its best guess based on the other fields that are valid? For example, the type field in the above example is "context", which is only valid in the SlackBlockElementImage definition, so it would be interesting to see if there was a way to get the validate command to use that to sniff out which object to validate against and show a better error for.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Enhancement Some new desired functionality Error Reporting Issues related to clearer or more robust validation error reporting
Projects
None yet
Development

No branches or pull requests

4 participants