Deserialization of email with a dot in the name in NameEmail #2955
Comments
I've also come across this issue recently, having needed apostrophes in the name. It's because the current regex only looks for word characters: https://github.com/samuelcolvin/pydantic/blob/5ccbdcb5904f35834300b01432a665c75dc02296/pydantic/networks.py#L465 I've had a look through RFC 5322, and it seems that the name should support the characters below, and otherwise should be wrapped in double quotes.
I've come up with this regex, using the examples from this article. (Note that I haven't bothered to match the first case, since the current implementation has a fallback if the regex does not match, and attempts to validate the entire string as an address.) I understand that the RFC is often not taken as gospel when it comes to email validation, and there may be a simpler regex that can still support the majority of cases. In fact, the Microsoft article also allows (but does not recommend) spaces in the name without double quotes, so I have included that in the regex. I'm happy to implement this and submit a PR, but would be good to get feedback first, as I may have missed something, and regex really is not my strong suit! |
I have also come across this problem when using dashes. @henrybetts Did you start on a PR/implementation for this? |
I've been using this custom implementation of NameEmail for the past year, which has worked well. I'll try to find a spare moment this week to review the regex and create a PR!
"""
Hopefully temporary email validation, until pydantic supports special characters in the name part.
"""
import re
from typing import Optional, Tuple

from pydantic.networks import NameEmail as PydanticNameEmail
from pydantic.networks import validate_email as pydantic_validate_email
from pydantic.validators import str_validator

# Matches `name <address>`; the name is either unquoted (group 1) or double-quoted (group 2).
pretty_email_regex = re.compile(
    r"\s*(?:(?:([\w!#$%&'*+\-/=?^_`{|}~ ]+?)|(?:\"((?:[^\"]|\\\")*)\"))\s+)?<\s*(\S+)\s*>\s*")


def validate_email(value: str) -> Tuple[str, str]:
    m = pretty_email_regex.fullmatch(value)
    name1: Optional[str] = None
    name2: Optional[str] = None
    if m:
        name1, name2, value = m.groups()
    # Validate the address part, or the whole string if the pattern did not match.
    local_part, email = pydantic_validate_email(value)
    return name1 or name2 or local_part, email


class NameEmail(PydanticNameEmail):
    @classmethod
    def validate(cls, value) -> 'NameEmail':
        if value.__class__ == cls:
            return value
        value = str_validator(value)
        return cls(*validate_email(value))
|
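For anyone who wants to try the pattern without installing pydantic, here is a minimal sketch that exercises only the name-splitting step. The regex is copied from the comment above; `PRETTY_EMAIL_REGEX` and `split_name_email` are hypothetical names for illustration, and the address is not validated at all. Note that the unquoted alternative's character class has no `.`, so a dotted name seems to match only in its quoted form.

```python
import re
from typing import Optional, Tuple

# Pattern copied from the comment above; everything else here is illustrative.
PRETTY_EMAIL_REGEX = re.compile(
    r"\s*(?:(?:([\w!#$%&'*+\-/=?^_`{|}~ ]+?)|(?:\"((?:[^\"]|\\\")*)\"))\s+)?<\s*(\S+)\s*>\s*"
)


def split_name_email(value: str) -> Optional[Tuple[Optional[str], str]]:
    """Split `name <address>` into (name, address); None if the pattern does not match.

    Hypothetical helper: it only separates the two parts and does not
    check that the address is a valid email.
    """
    m = PRETTY_EMAIL_REGEX.fullmatch(value)
    if m is None:
        return None
    unquoted, quoted, address = m.groups()
    return unquoted or quoted, address


if __name__ == "__main__":
    samples = [
        "O'Brien <ob@example.com>",
        "Jean-Luc Picard <jl@example.com>",
        '"Unit/Service Team" <team@example.com>',
        '"firstname.lastname" <firstname.lastname@example.com>',
        "firstname.lastname <firstname.lastname@example.com>",
    ]
    for sample in samples:
        print(repr(sample), "->", split_name_email(sample))
```

In the full implementation above, a string that does not match falls through to address validation as a whole, so the last sample would still be rejected unless the name is quoted.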
I also have this issue, having a lot of corporate display names following this generic pattern: "Unit/Service Team", which fail the validation and which should pass according to the RFC (or am I wrong?). |
Happy to accept a PR to either fix the regex, or (even better) move all email parsing into Rust in pydantic-core and use an external library. |
Is anyone working on this @Kludex ? If not, I would be happy to try |
Hi, @r3dm1ke ! I'm not aware of anyone on the team working on that. Go for it. We would be happy to accept a PR fixing this. The best thing would be bringing the parsing logic into pydantic-core as Samuel has mentioned earlier. |
@lig I have a question: by "bringing the parsing logic into pydantic-core", do you mean rewriting python-email-validator in Rust and adding it to pydantic-core? Or writing just the parsing part which separates the name from the email, and still call the |
I've taken @henrybetts's approach (thank you!) and cleaned it up a tad in #6125. Regarding doing things in Rust — I think we looked into replacing email-validator with a rust implementation a while back (but still after Samuel's comment above), and ultimately decided it wasn't worth the hassle given it's complicated and we couldn't find a great crate implementing email validation that didn't have various incompatibilities (that we weren't happy with) when compared with the email-validator package. I think we could perhaps still do the splitting of the name from the email in Rust (in pydantic-core) though. I'm not sure if the performance benefits would be worth the squeeze though given it's already implemented as a regex. |
@dmontagu I think that it could be worth trying the available crates again. As far as I can see, there are several of them and some are being actively developed. Maybe the situation has changed since then. I would agree that rewriting email-validator is overkill. Maybe it could be easier to fix the incompatibilities in one of the available crates. But this is still a quest someone should be willing to pursue. |
Checks
Bug
If you try to deserialize a string like
firstname.lastname <firstname.lastname@example.com>
into a NameEmail object (as a pydantic model argument), you get a pydantic.error_wrappers.ValidationError exception.
Output of
python -c "import pydantic.utils; print(pydantic.utils.version_info())"
:
Traceback
If the name is specified without a dot (for example, with a space), everything works as it should.
A string like
firstname.lastname <firstname.lastname@example.com>
can be received in real life if an email like firstname.lastname@example.com has been serialized from NameEmail to a string (as a JSON value, for example).
Debugging shows that validation fails at this point: https://github.com/JoshData/python-email-validator/blob/primary/email_validator/__init__.py#L334, but this invalid value was created by pydantic itself (the same way of deserialization as in PR #2479) from a valid email.
This seems to be more of a problem with python-email-validator
So far I have not found an appropriate way to control the validation or deserialization of emails. Maybe you have an idea?
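To make the round trip concrete, here is a small sketch that does not use pydantic at all. It assumes the serialized form is `name <email>`, as in the example string above, and uses a word-characters-only pattern as a stand-in for the one pydantic applies; the exact pattern in pydantic may differ, and `WORD_ONLY_REGEX`, `to_string` and `split` are hypothetical names.

```python
import re
from typing import Optional, Tuple

# Assumption: a stand-in for a name pattern that accepts only word characters
# and spaces. The pattern actually used by pydantic may differ.
WORD_ONLY_REGEX = re.compile(r"([\w ]*?) *<(.*)> *")


def to_string(name: str, email: str) -> str:
    """Serialize a name/email pair in the `name <email>` form."""
    return f"{name} <{email}>"


def split(value: str) -> Optional[Tuple[str, str]]:
    """Return (name, email) if the stand-in pattern matches, else None."""
    m = WORD_ONLY_REGEX.fullmatch(value)
    return (m.group(1), m.group(2)) if m else None


if __name__ == "__main__":
    email = "firstname.lastname@example.com"
    # The name defaults to the local part, which contains a dot.
    print(split(to_string("firstname.lastname", email)))
    # The same address with a space in the name splits cleanly.
    print(split(to_string("firstname lastname", email)))
```

When the split fails, the whole `name <email>` string is presumably what ends up being validated as an address, which would explain the error coming from python-email-validator.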