feat: Add unique items validation to constrained lists #2618
Conversation
Seems like a good start, but needs documentation, fixes and some tweaks.

We need to decide (what do others think?) whether we should support uniqueness checks for things that can't be hashed but can be compared for equality - e.g. `list`, `dict`, `set`, etc.
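For context, a minimal illustration (not from the thread) of why a hash-based uniqueness check excludes those types: `set()` needs hashable items, while equality comparison does not.

```python
# Hashable items: a set-based length comparison detects duplicates.
v = [1, 2, 2]
print(len(set(v)) != len(v))  # True -> duplicates present

# Unhashable items: set() raises before any check can run, even though
# the values compare equal with ==.
try:
    set([{'a': 1}, {'a': 1}])
except TypeError as exc:
    print(exc)  # unhashable type: 'dict'
print({'a': 1} == {'a': 1})  # True
```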
pydantic/types.py
Outdated

```python
@classmethod
def unique_items_validator(cls, v: 'Optional[List[T]]') -> 'Optional[List[T]]':
    if cls.unique_items and v and len(set(v)) != len(v):
        raise errors.ListUniqueItemsError(not_unique=len(v) - len(set(v)))
```
you can avoid duplicating the `len(v)` and `len(set(v))` calls.
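A minimal sketch of that suggestion (my own; it assumes the diff's class attributes and error type, and that the validator returns `v` on success, which the excerpt above truncates):

```python
@classmethod
def unique_items_validator(cls, v: 'Optional[List[T]]') -> 'Optional[List[T]]':
    if cls.unique_items and v:
        # compute both lengths once and reuse them in the error message
        length, unique_length = len(v), len(set(v))
        if unique_length != length:
            raise errors.ListUniqueItemsError(not_unique=length - unique_length)
    return v
```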
What are we going to do for lists of unhashable types?
Maybe we just decide that is a limitation of this feature, but at the least we should document it.
They are incidentally duplicated because the exception message is just a placeholder: as the specs do not state which mathematical object corresponds to this schema, there's no clearly defined concept of multiplicity. E.g. in `[1, 1, 2, 2, 2, 3]` one could understand: two items (1, 2), three items (1, 2, 2), or five items (1, 1, 2, 2, 2). I tend to see it as a set, so the first reading makes more sense to me.
I still don't know how many elements are supposed to be leftover if validation fails. Do you have an opinion on this? If not I can try to open an issue at the JSON Schema repo.
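To make the three readings concrete, a small illustration (mine, not from the PR) with `collections.Counter`:

```python
from collections import Counter

v = [1, 1, 2, 2, 2, 3]
counts = Counter(v)  # {1: 2, 2: 3, 3: 1}

duplicated_values = [x for x, c in counts.items() if c > 1]  # [1, 2]          -> "two items"
surplus_copies = len(v) - len(counts)                        # 6 - 3 = 3       -> "three items"
members_of_duplicates = [x for x in v if counts[x] > 1]      # [1, 1, 2, 2, 2] -> "five items"
```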
But in this case it would also be a micro-optimization, since the second calls are only made on exceptions, and this would save two `STORE_FAST` and two `LOAD_FAST` in a successful validation.
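One way to inspect that claim (my own sketch; function names are mine) is to compare the bytecode of the two variants with `dis`. The hoisted version pays two extra `STORE_FAST`/`LOAD_FAST` pairs on the success path:

```python
import dis

def recomputed(v):
    # lengths recomputed only when the error is raised
    if len(set(v)) != len(v):
        raise ValueError(len(v) - len(set(v)))

def hoisted(v):
    # lengths stored in locals up front, loaded again for the comparison
    length, unique_length = len(v), len(set(v))
    if unique_length != length:
        raise ValueError(length - unique_length)

dis.dis(recomputed)
dis.dis(hoisted)  # shows the extra STORE_FAST/LOAD_FAST pairs
```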
We can add a failover for unhashable types, e.g. something like (with the appropriate exceptions):

```python
try:
    if v and len(set(v)) != len(v):
        raise ValueError
except TypeError:
    unique = list()
    if len([unique.append(i) for i in v if i not in unique]) != len(v):
        raise ValueError from None
```
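For illustration (my own snippet, not from the PR), the fallback path on unhashable items:

```python
v = [{'a': 1}, {'b': 2}, {'a': 1}]  # dicts are unhashable but support ==

# set(v) would raise TypeError here, so the equality-based path runs:
unique = []
[unique.append(i) for i in v if i not in unique]
print(len(unique), len(v))  # 2 3 -> duplicates detected without hashing
```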
I can benchmark some options. IMHO, as in the previous case, I think it is worth optimizing the check at the expense of a more expensive validation error. What do you think?
The failover for unhashable types seems too tricky for mypy 😅 Would you accept a `type: ignore` if it turns out to be the most performant?
There's currently no instance-based output structure for validation errors (only schema-based). So, if it's okay with you, I'll continue with a generic error message to move on to the unhashable types validation.
please update.
Will this be merged without support for unhashable types (for now)? I'd love to use this with lists.
please review! @WilliamDEdwards it's already mergeable for hashable and unhashable types 😉
Ah, I should've read the full discussion. Can't wait :)
- add failover for unhashable types
- check keyword value to call the validator
- add some tests
Does this PR need any work? I'd contribute, but it looks pretty finished.
a few comments, let me know what you think.

The "hash or not to hash" question is a difficult one. I'm not sure what I think about it.
pydantic/types.py
Outdated

```python
def conlist(item_type: Type[T], *, min_items: int = None, max_items: int = None) -> Type[List[T]]:

if v and len([unique.append(i) for i in v if i not in unique]) != len(v):  # type: ignore
```
modifying `unique` inside a comprehension is somewhat unorthodox; also (if we don't care about how many items are not unique, which will mostly be the case) we can do this faster by raising the error as soon as we find a duplicate. Something like:

```python
for i, value in enumerate(v, start=1):
    if value in v[i:]:
        raise errors.ListUniqueItemsError()
```

I haven't done any profiling but I feel something like this should be quicker.
I've just profiled it and you're totally right about performance. Also, the difference between the `set` and the loop checks is almost negligible (even when the duplicate is at the end of the iterable).

I also agree that most of the time we don't care about how many items aren't unique, and furthermore, this calculation reduces performance. So I'm going to refactor this PR and push it again.
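A rough way to reproduce that profiling (my own sketch; absolute numbers will vary by machine and by where the duplicate sits):

```python
import timeit

v = list(range(300)) + [0]  # duplicate at the very end: worst case for the loop

def set_check():
    return len(set(v)) != len(v)

def loop_check():
    for i, value in enumerate(v, start=1):
        if value in v[i:]:
            return True
    return False

print('set: ', timeit.timeit(set_check, number=1_000))
print('loop:', timeit.timeit(loop_check, number=1_000))
```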
please update.
coverage is failing.
Yes, I'm sorry. I mixed up different checks. But this issue in
thanks so much.
@samuelcolvin When are you planning to release a new version which includes this patch?
For future readers: this has been included in v1.9.0.
I think this may not be working well with optional lists:
Haven't done any further investigation yet, so not 100% sure this is because of
@WilliamDEdwards Please, could you provide a minimal reproducible example? This is currently working in 1.9.0:

```python
from typing import Any, Optional

from pydantic import BaseModel, conlist

class Model(BaseModel):
    prop: Optional[conlist(Any, unique_items=True)]

m = Model()
print(m.prop)
#> None
```
@nuno-andre Sorry for not being more specific yesterday. I hadn't been able to make an MRE, but wanted to report the issue (if it is an issue, I don't exclude PEBKAC). Here's the MRE. It's worth noting that I use

Code:

Example:
Guys, I want to make my schema columns unique, with no duplicates: `Class whiskey_dantic(BaseModel):` I am getting this error:
Hi @nuno-andre,
The problem is in specifying
@WilliamDEdwards Thank you for raising this issue!
Change Summary

Add `unique_items` property and validation to `ConstrainedList` (and `conlist`).

Related issue number

fix #2011

Checklist

- `changes/<pull request or issue id>-<github username>.md` file added describing change (see changes/README.md for details)

Rationale

This proposal solves two related issues:

1. `{"type": "array", "uniqueItems": true}` schemas being interpreted as a Python set.

   JSON Schema specs define `array` as an ordered sequence of zero or more values, and its instance equality as "both are arrays, and have an equal value item-for-item". So both casting JSON arrays to Python sets and defining Python sets as JSON arrays in the schema can lead to validation errors and/or data corruption.

2. `conset` used instead of `uniqueItems`.

   In addition to the ordering issue, `ConstrainedSet` casts arrays to Python sets with no previous validation of item uniqueness, which is fully consistent with the Python `set` behavior, but opposite to that of the schema with which it is equated.

A `unique_items` validation in `ConstrainedList` would solve both issues while keeping the current `conset` behavior untouched.

This PR is a POC and lacks the proper tests and docs, and also it does not replicate the property into other data models (`FieldInfo`?). If you consider this proposal relevant, I will be glad to complete it for evaluation.

To be evaluated

- Implementation follows the specs, with one exception: the default value is `None` instead of `False`, so that `"uniqueItems": false` will not be added to the schema unless explicitly stated.
- In the feature request the property name is `unique`. This PR named it `unique_items` for consistency with `max_items` and `min_items`.
- Current error message is a placeholder. As specs do not state which mathematical object corresponds to this schema, there's no clearly defined concept of multiplicity. E.g. in `[1, 1, 2, 2, 2, 3]` one could understand: two items (1, 2), three items (1, 2, 2), or five items (1, 1, 2, 2, 2). I tend to see it as a set, so the first statement makes more sense to me.
one could understand: two items (1, 2), three items (1, 2, 2), or five items (1, 1, 2, 2, 2). I tend to see it as a set, so the first statement makes more sense to me.