Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rediscuss about validate on assignment #170

Open
digitaldogsbody opened this issue Mar 10, 2023 · 4 comments
Open

Rediscuss about validate on assignment #170

digitaldogsbody opened this issue Mar 10, 2023 · 4 comments
Labels
discuss For discussion in next catchup meeting help wanted Extra attention is needed

Comments

@digitaldogsbody
Copy link
Member

Currently, we have Pydantic set to validate property values on assignment. This has generally worked fine, although it imposes extra requirements on users (e.g you must supply a label value in the arguments of a new instance of a Manifest class, rather than being able to set it later).

With the work for #144, the new validation on Canvas to ensure the existence of height & width and/or duration broke a lot of tests which were relying on the previous behaviour of those fields being Optional (itself an issue with how datamodel-code-generator interprets the constraints in the JSON Schema) and thus not needing to be set at instantiation.

Example of broken tests: https://github.com/iiif-prezi/iiif-prezi3/actions/runs/4380645190/jobs/7667931447#step:6:42

It also broke the behavior of some recipes/helpers:

Example of broken recipe: https://github.com/iiif-prezi/iiif-prezi3/blob/main/docs/recipes/scripts/0003-mvm-video-method1.py

This recipe relies on setting the HWD after the Canvas creation, originally because I wanted to use the same dictionary values for the resource and the canvas (slightly stymied, it turns out by IIIF/cookbook-recipes#374, but the idea stands 😅)

A broken helper would be create_canvas_from_iiif, as this starts by instantiating a canvas object with kwargs, which wouldn't include HWD (because you're expecting to get them from the image service) and then uses set_hwd to set them afterwards.

We have several approaches available to us:

  1. Keep the behaviour as in the validator POC - this means
    a) The addition of HWD validators becomes a breaking change because it could cause existing code to stop working (the fact that the output JSON could be invalid is a slightly different matter - of which more later) - we would (probably) need to move to version 2.0, although it's not a "change in the public interface" so maybe not. Wahey semver discussions!
    b) We need to update the existing tests and helpers, potentially involving a slightly gnarly hack for create_canvas_from_iiif of setting fake values first (although we do this already for id)
  2. Turn off validate_assignment in the Pydantic config - this would allow HWD to be set after instantiation
    a) We would have to shift the validation to being called elsewhere in order to ensure we did not output invalid JSON, the json function itself seems a sensible place
    b) We would need to rewrite the tests that currently expect a ValidationError to be raised on assignment (e.g https://github.com/iiif-prezi/iiif-prezi3/blob/main/tests/test_basic.py#L70-L81)
    c) This could "break" existing implementations in as much as if they weren't setting HWD on a Canvas we would currently let them output JSON, and having post-hoc validation would stop that. I don't see this as breaking in quite the same way, as it's really a fix for a bug. Maybe this is semantics?
    d) This would temporarily allow users to create objects with garbage data - although they wouldn't be able to output them, so maybe that is just their lookout?
  3. Scrap the validators and leave the current behaviour on the library, allowing the users to output invalid JSON if they don't set HWD on a canvas
@digitaldogsbody
Copy link
Member Author

My personal position is we should adopt 1 or 2, but I can't decide which I prefer.

In favour of 1 - We already require other non-optional fields to be set at instantiation (even if we let the AutoHelpers deal with setting some of them and the user doesn't necessarily explicitly have to), so why should HWD be any different? Yes it's slightly inconvenient to change existing behaviour, but it's to fix a mistake (and we have also technically done this before because the latest skeleton introduced the regexp constraints of label languages, which was technically breaking)

In favour of 2 - Outputting the JSON is the only time the data actually needs to be correct, and saving validation until then might make the library slightly friendlier to work with / more flexible in allowing the user to use the workflow best for them - maybe they don't have the HWD (or label, or etc etc) available at object instantiation, but as long as they supply it before outputting JSON, why do we care?

@digitaldogsbody digitaldogsbody added help wanted Extra attention is needed discuss For discussion in next catchup meeting labels Mar 10, 2023
@helrond
Copy link
Contributor

helrond commented Mar 14, 2023

To me I would favor consistency first and foremost in a library, so would lean towards #1. If this is how the library behaves in other circumstances, it feels like it would be confusing for this one instance to behave differently. I get that it's a "larger" change, but it also feels like we're fixing something that was broken (even though we didn't realize it until now) so it's also a necessary fix.

@digitaldogsbody
Copy link
Member Author

@iiif-prezi/prezi3-maintainers Any other thoughts on this? I'd like to do a bit more work on the library over the next week (in expectation that a few more people might look at it after the conference), so it would be good to get this resolved.

@giacomomarchioro
Copy link
Contributor

I agree with Hillel but in my opinion doing a post validation would be much more flexible and could also overcome some limitations of the JSON schema. I wanted to add a validation step before generating the JSON also to my library in particular for testing disjoint behaviours where you need to know where is your object to understand if the behaviour is correct.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
discuss For discussion in next catchup meeting help wanted Extra attention is needed
Projects
None yet
Development

No branches or pull requests

3 participants