Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Url Type #317

Merged
merged 6 commits into from Nov 2, 2022
Merged

Url Type #317

merged 6 commits into from Nov 2, 2022

Conversation

samuelcolvin
Copy link
Member

@samuelcolvin samuelcolvin commented Nov 1, 2022

Ref pydantic/pydantic#4694

This will cause some changes in functionality:

  • as per V2: replace URL types with a 3rd party library pydantic#4694 user_required has been dropped (it could be kept, but I don't see the point)
  • tld_required and tld have gone - they were unreliable already, we could keep them being unreliable but I'm not a fan
  • other url types will need to be implemented as

Other decedent types will need to be implemented via Annotated like:

HttpUrl = Annotated[Url, UrlConstraints(allowed_schemas=['https', 'https'], host_required=True, max_length=2083)]

Feedback welcome.


This creates the size of binaries by a non-trivial amount:

-rw-rw-r-- 1 samuel samuel 1315737 Nov  2 12:15 pydantic_core-0.6.0-cp310-cp310-linux_x86_64.whl
-rw-rw-r-- 1 samuel samuel  154722 Nov  2 12:15 pydantic_core-0.6.0.tar.gz
-rw-rw-r-- 1 samuel samuel 1478375 Nov  2 12:20 pydantic_core-0.6.1-cp310-cp310-linux_x86_64.whl
-rw-rw-r-- 1 samuel samuel  158389 Nov  2 12:20 pydantic_core-0.6.1.tar.gz

(0.6.1 is this branch)

@tiangolo @PrettyWood what do you think? Is getting this right without another python dependency (much more performant, and likely more correct) worth the increase? For me it is, but I know some people care a lot about binary size.

@codecov-commenter
Copy link

codecov-commenter commented Nov 1, 2022

Codecov Report

Merging #317 (b74eb11) into main (cf4a5cd) will decrease coverage by 0.15%.
The diff coverage is 93.52%.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #317      +/-   ##
==========================================
- Coverage   97.36%   97.20%   -0.16%     
==========================================
  Files          55       57       +2     
  Lines        6334     6450     +116     
  Branches       44       45       +1     
==========================================
+ Hits         6167     6270     +103     
- Misses        167      179      +12     
- Partials        0        1       +1     
Impacted Files Coverage Δ
src/input/input_abstract.rs 93.40% <0.00%> (-3.19%) ⬇️
src/errors/types.rs 77.77% <66.66%> (-0.47%) ⬇️
src/validators/url.rs 96.05% <96.05%> (ø)
src/url.rs 96.55% <96.55%> (ø)
pydantic_core/core_schema.py 99.52% <100.00%> (-0.48%) ⬇️
src/errors/validation_exception.rs 98.19% <100.00%> (+0.01%) ⬆️
src/input/input_python.rs 98.01% <100.00%> (-0.18%) ⬇️
src/lib.rs 100.00% <100.00%> (ø)
src/validators/literal.rs 97.83% <100.00%> (ø)
src/validators/mod.rs 98.95% <100.00%> (+<0.01%) ⬆️
... and 3 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update cf4a5cd...b74eb11. Read the comment docs.

@samuelcolvin
Copy link
Member Author

If we wanted to support top level domains, we should use https://github.com/datasets/top-level-domain-names/blob/master/top-level-domain-names.csv.

@samuelcolvin samuelcolvin added the Full Build cause CI to do a full build label Nov 2, 2022
@samuelcolvin samuelcolvin changed the title Url type Url Type Nov 2, 2022
@tiangolo
Copy link
Member

tiangolo commented Nov 2, 2022

From 1.31 MB (for the wheel) to 1.47 MB for a better, faster, safer implementation seems perfectly reasonable to me.

@samuelcolvin
Copy link
Member Author

ye, I think I agree. If people really care about size, we could publish a pydantic-core-minimal package with this removed and mimalloc removed.

Though since (AFAIK) there's no way for optional dependencies in python to say "also don't install XXX", there's no easy way to let users call pip install pydantic[minimal] to remove pydantic-core with pydantic-core-minimal??

@PrettyWood
Copy link
Member

From 1.31 MB (for the wheel) to 1.47 MB for a better, faster, safer implementation seems perfectly reasonable to me.

Completely agree with @tiangolo! I think people care about size when it becomes "too big" (which is currently the case for compiled pydantic). But here the package remains small!

And well done @samuelcolvin for adding this on the rust side. Looks great

@samuelcolvin
Copy link
Member Author

thanks both, I learnt more about URLs doing this - I never new that there are 6 "special" schemes ("http" | "https" | "ws" | "wss" | "ftp" | "file") which causing punycode encoding of host, while other schemes cause percent encoding of the host part.

If we wanted to support top level domains, we should use https://github.com/datasets/top-level-domain-names/blob/master/top-level-domain-names.csv.

While this would be cool, I wonder how many people would really need it, and it would cause a bigger increase in size since we would need to include that full list in the codebase, let's leave it until people complain.

@samuelcolvin samuelcolvin merged commit f23c6a7 into main Nov 2, 2022
@samuelcolvin samuelcolvin deleted the url branch November 2, 2022 14:52
Copy link
Member

@PrettyWood PrettyWood left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great! Just a question

@@ -81,6 +81,7 @@ impl ValidationError {
}

fn errors(&self, py: Python, include_context: Option<bool>) -> PyResult<PyObject> {
// TODO remove `collect` when we have https://github.com/PyO3/pyo3/pull/2676
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pyo3 0.17.3 has been released! Can probably be done in a PR afterwards

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

great, will do.

} else {
result.push_str(chunk);
}
result.push('.');
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

aren't there some cases where it might reallocate the string because the capacity is not enough?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Possibly, but unlikely since the punycode representation should always be longer than the "unicode" (is there a better name?) representation.

Even if it does, it shouldn't cause an error?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(@PrettyWood just mentioning it here since you not have seen, I just sent you a message on twitter, answer on pydantic/pydantic#4665 would be great when you have a chance, sorry to hassle 🙇)

This was referenced Nov 3, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Full Build cause CI to do a full build
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants