Deserialization of email with a dot in the name in NameEmail #2955

Closed · ozeranskii opened this issue Jul 2, 2021 · 10 comments · Fixed by #6125
Labels: bug V1 (Bug related to Pydantic V1.X), bug V2 (Bug related to Pydantic V2), help wanted (Pull Request welcome)

Comments

@ozeranskii

ozeranskii commented Jul 2, 2021

Checks

  • I added a descriptive title to this issue
  • I have searched (google, github) for similar issues and couldn't find anything
  • I have read and followed the docs and still think this is a bug

Bug

If you try to deserialize a string like firstname.lastname <firstname.lastname@example.com> into a NameEmail object (as a pydantic model field), a pydantic.error_wrappers.ValidationError is raised.

Output of python -c "import pydantic.utils; print(pydantic.utils.version_info())":

             pydantic version: 1.8.2
            pydantic compiled: False
                 install path: /<some_path>/pypoetry/virtualenvs/<some_project_name>/lib/python3.7/site-packages/pydantic
               python version: 3.7.10 (default, Apr 27 2021, 08:49:44)  [Clang 12.0.0 (clang-1200.0.32.29)]
                     platform: Darwin-20.5.0-x86_64-i386-64bit
     optional deps. installed: ['dotenv', 'email-validator', 'typing-extensions']

from pydantic import BaseModel, NameEmail


class Message(BaseModel):
    to: NameEmail


Message(to='firstname.lastname <firstname.lastname@example.com>')

Traceback

Traceback (most recent call last):
  File "/<some_path>/scratch_1.py", line 8, in <module>
    Message(to='firstname.lastname <firstname.lastname@example.com>')
  File "/<some_path>/pypoetry/virtualenvs/<some_project_name>/lib/python3.7/site-packages/pydantic/main.py", line 406, in __init__
    raise validation_error
pydantic.error_wrappers.ValidationError: 1 validation error for Message
to
  value is not a valid email address (type=value_error.email)

If the name is written without a dot (for example, with a space instead), everything works as expected.

from pydantic import BaseModel, NameEmail


class Message(BaseModel):
    to: NameEmail


msg = Message(to='firstname lastname <firstname.lastname@example.com>')

print(msg)
# output 
# to=NameEmail(name='firstname lastname', email='firstname.lastname@example.com')

A string like firstname.lastname <firstname.lastname@example.com> can arise in practice when a NameEmail created from a bare address such as firstname.lastname@example.com is serialized to a string (for example, as a JSON value): the name defaults to the local part of the address, so it contains a dot.

import json

from pydantic import BaseModel, NameEmail


class Message(BaseModel):
    to: NameEmail

    class Config:
        # workaround until the fix is included in the new version of pydantic
        # https://github.com/samuelcolvin/pydantic/issues/2341
        json_encoders = {
            NameEmail: str,
        }


msg1 = Message(to='firstname.lastname@example.com')
msg1_as_json = msg1.json()

print(f'msg1_as_json: {msg1_as_json}')
# output
# msg1_as_json: {"to": "firstname.lastname <firstname.lastname@example.com>"}

# deserialize the JSON string back into a dict
msg1_as_dict = json.loads(msg1_as_json)
print(f'msg1_as_dict: {msg1_as_dict}')
# output
# msg1_as_dict: {'to': 'firstname.lastname <firstname.lastname@example.com>'}

msg2 = Message(**msg1_as_dict)
# pydantic.error_wrappers.ValidationError: 1 validation error for Message
# to
#   value is not a valid email address (type=value_error.email)

Debugging shows that validation fails at this point: https://github.com/JoshData/python-email-validator/blob/primary/email_validator/__init__.py#L334. However, this invalid value was created by pydantic itself (in the same way as in PR #2479) from a valid email address.

This seems to be more of a problem with python-email-validator. So far I have not found an appropriate way to control the validation or deserialization of emails. Maybe you have an idea?

ozeranskii added the bug V1 (Bug related to Pydantic V1.X) label on Jul 2, 2021
@henrybetts

I've also come across this issue recently, since I needed apostrophes in the name. It happens because the current regex only matches word characters: https://github.com/samuelcolvin/pydantic/blob/5ccbdcb5904f35834300b01432a665c75dc02296/pydantic/networks.py#L465
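A minimal sketch of the failure, assuming the pattern at the linked line is ([\w ]*?) *<(.*)> * as in pydantic 1.8's networks.py. The helper split_pretty_email is hypothetical. Because the name group only accepts word characters and spaces, fullmatch returns None for a dotted or apostrophe-containing name; pydantic then falls back to validating the whole string as a bare address, which email-validator rejects.

```python
import re
from typing import Optional, Tuple

# Assumption: mirrors the pydantic 1.8 "pretty email" pattern; the name part
# may only contain word characters and spaces.
V1_PRETTY_EMAIL_REGEX = re.compile(r'([\w ]*?) *<(.*)> *')


def split_pretty_email(value: str) -> Optional[Tuple[str, str]]:
    """Hypothetical helper: return (name, address) if the regex accepts value.

    None corresponds to the fallback path, where the whole string is
    validated as a bare address (and rejected).
    """
    m = V1_PRETTY_EMAIL_REGEX.fullmatch(value)
    if m is None:
        return None
    name, address = m.groups()
    return name, address


print(split_pretty_email('firstname lastname <firstname.lastname@example.com>'))
# ('firstname lastname', 'firstname.lastname@example.com')
print(split_pretty_email('firstname.lastname <firstname.lastname@example.com>'))
# None: '.' is not matched by [\w ]
print(split_pretty_email("O'Brien <obrien@example.com>"))
# None: "'" is not matched by [\w ]
```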

I've had a look through RFC 5322, and it seems that the name should support the characters below (the atext rule); otherwise it should be wrapped in double quotes.

   atext           =   ALPHA / DIGIT /    ; Printable US-ASCII
                       "!" / "#" /        ;  characters not including
                       "$" / "%" /        ;  specials.  Used for atoms.
                       "&" / "'" /
                       "*" / "+" /
                       "-" / "/" /
                       "=" / "?" /
                       "^" / "_" /
                       "`" / "{" /
                       "|" / "}" /
                       "~"
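To make the rule above concrete, here is a small sketch that checks whether each whitespace-separated word of a display name is an atom, i.e. made only of ALPHA, DIGIT and the listed symbols. The helpers is_atom and is_unquoted_phrase are hypothetical and ASCII-only. Note that "." is not in atext (it is one of the RFC 5322 specials), so under the current grammar a name like firstname.lastname is not a plain phrase and needs quoting; the obsolete obs-phrase syntax in the RFC does still permit dots between words.

```python
import string

# atext from RFC 5322, section 3.2.3: ALPHA / DIGIT / the listed symbols.
ATEXT = frozenset(string.ascii_letters + string.digits + "!#$%&'*+-/=?^_`{|}~")


def is_atom(word: str) -> bool:
    """True if word is a non-empty run of atext characters."""
    return bool(word) and all(ch in ATEXT for ch in word)


def is_unquoted_phrase(name: str) -> bool:
    """True if every whitespace-separated word of name is an atom.

    Simplification: ignores comments, folding whitespace and the obsolete
    obs-phrase syntax (which also allows '.').
    """
    words = name.split()
    return bool(words) and all(is_atom(w) for w in words)


for name in ["O'Brien", 'Smith-Jones', 'firstname.lastname', 'Smith, John']:
    print(f'{name!r}: {is_unquoted_phrase(name)}')
# "O'Brien": True
# 'Smith-Jones': True
# 'firstname.lastname': False
# 'Smith, John': False
```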

I've come up with this regex, using the examples from this article.

(Note that I haven't bothered to match the first case, since the current implementation has a fallback if the regex does not match, and attempts to validate the entire string as an address.)

I understand that the RFC is often not taken as gospel when it comes to email validation, and there may be a simpler regex that can still support the majority of cases. In fact, the Microsoft article also allows (but does not recommend) spaces in the name without double quotes, so I have included that in the regex.

I'm happy to implement this and submit a PR, but would be good to get feedback first, as I may have missed something, and regex really is not my strong suit!

@vikahl

vikahl commented May 3, 2022

I have also run into this problem when using dashes '-' or parentheses '(' in the name part.

@henrybetts Did you start on a PR/implementation for this?

@henrybetts

I've been using the custom implementation of NameEmail below for the past year, and it has worked well. I'll try to find a spare moment this week to review the regex and create a PR!

"""
Hopefully temporary email validation, until pydantic supports special characters in the name part.
"""

from pydantic.networks import validate_email as pydantic_validate_email, NameEmail as PydanticNameEmail
from pydantic.validators import str_validator
import re
from typing import Tuple, Optional


pretty_email_regex = re.compile(
    r"\s*(?:(?:([\w!#$%&'*+\-/=?^_`{|}~ ]+?)|(?:\"((?:[^\"]|\\\")*)\"))\s+)?<\s*(\S+)\s*>\s*")


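def validate_email(value: str) -> Tuple[str, str]:
    # Try to split "name <address>"; the name is either unquoted (group 1)
    # or double-quoted (group 2).
    m = pretty_email_regex.fullmatch(value)
    name1: Optional[str] = None
    name2: Optional[str] = None
    if m:
        name1, name2, value = m.groups()

    # Validate the address with pydantic's own validator; if the regex did
    # not match, the whole string is validated as a bare address.
    local_part, email = pydantic_validate_email(value)

    # Fall back to the local part when no display name was given.
    return name1 or name2 or local_part, email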


class NameEmail(PydanticNameEmail):
    @classmethod
    def validate(cls, value) -> 'NameEmail':
        if value.__class__ == cls:
            return value
        value = str_validator(value)
        return cls(*validate_email(value))
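A quick way to see what the regex above accepts, without pydantic installed, is to call fullmatch on it directly. This sketch copies the pattern verbatim; the helper groups and the sample inputs are illustrative. Note that "." is not in the unquoted-name character class, so an unquoted dotted name like the one in the original report still does not match (the code then falls back to validating the whole string), while the quoted form does.

```python
import re

# Pattern copied verbatim from the custom NameEmail implementation above.
pretty_email_regex = re.compile(
    r"\s*(?:(?:([\w!#$%&'*+\-/=?^_`{|}~ ]+?)|(?:\"((?:[^\"]|\\\")*)\"))\s+)?<\s*(\S+)\s*>\s*")


def groups(value: str):
    """Hypothetical helper: (unquoted_name, quoted_name, address), or None."""
    m = pretty_email_regex.fullmatch(value)
    return m.groups() if m else None


print(groups("O'Brien <obrien@example.com>"))
# ("O'Brien", None, 'obrien@example.com')
print(groups('Anne-Marie Smith <am@example.com>'))
# ('Anne-Marie Smith', None, 'am@example.com')
print(groups('"firstname.lastname" <firstname.lastname@example.com>'))
# (None, 'firstname.lastname', 'firstname.lastname@example.com')
print(groups('firstname.lastname <firstname.lastname@example.com>'))
# None: '.' is outside the unquoted-name class
```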

@wookiesh

wookiesh commented Feb 2, 2023

I also have this issue: we have a lot of corporate display names following the generic pattern "Unit/Service Team", which fail validation but should pass according to the RFC (or am I wrong?).
We also have emails from background services using names like "[Service] Something", but it seems these brackets are indeed not intended to be accepted there.
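Going by the atext list quoted earlier in this thread, "/" is an atom character, so Unit/Service Team is a valid unquoted display name, whereas "[" and "]" are RFC 5322 specials, so [Service] Something is only valid as a quoted string. A minimal sketch of the matching serialization rule (format_name_email is a hypothetical name): write the name bare only when every word is an atom, otherwise wrap it in double quotes, escaping backslashes and quotes. Provided the parser accepts quoted names, as the regex in the custom implementation above does, this would also let a dotted name survive a round trip.

```python
import string

# RFC 5322 atext (ASCII only); '/' is included, '[', ']' and '.' are not.
ATEXT = frozenset(string.ascii_letters + string.digits + "!#$%&'*+-/=?^_`{|}~")


def format_name_email(name: str, email: str) -> str:
    """Hypothetical serializer: quote the display name unless it is a plain phrase."""
    words = name.split()
    if words and all(all(ch in ATEXT for ch in word) for word in words):
        return f'{name} <{email}>'
    # Quoted-string: escape backslashes first, then double quotes.
    escaped = name.replace('\\', '\\\\').replace('"', '\\"')
    return f'"{escaped}" <{email}>'


print(format_name_email('Unit/Service Team', 'team@example.com'))
# Unit/Service Team <team@example.com>
print(format_name_email('[Service] Something', 'service@example.com'))
# "[Service] Something" <service@example.com>
print(format_name_email('firstname.lastname', 'firstname.lastname@example.com'))
# "firstname.lastname" <firstname.lastname@example.com>
```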

@samuelcolvin
Member

Happy to accept a PR to either fix the regex, or (even better) move all email parsing into Rust in pydantic-core and use an external library.

Kludex added the bug V2 (Bug related to Pydantic V2), v2-reviewed and help wanted (Pull Request welcome) labels on Apr 29, 2023
@r3dm1ke

r3dm1ke commented Jun 9, 2023

Is anyone working on this, @Kludex? If not, I would be happy to try.

@lig
Contributor

lig commented Jun 9, 2023

Hi @r3dm1ke! I'm not aware of anyone on the team working on that. Go for it. We would be happy to accept a PR fixing this. The best thing would be bringing the parsing logic into pydantic-core, as Samuel mentioned earlier.

@r3dm1ke

r3dm1ke commented Jun 13, 2023

@lig I have a question: by "bringing the parsing logic into pydantic-core", do you mean rewriting python-email-validator in Rust and adding it to pydantic-core? Or writing just the parsing part that separates the name from the email, while still calling the python-email-validator library?

@dmontagu
Contributor

dmontagu commented Jun 13, 2023

I've taken @henrybetts's approach (thank you!) and cleaned it up a tad in #6125.

Regarding doing things in Rust: I think we looked into replacing email-validator with a Rust implementation a while back (though still after Samuel's comment above), and ultimately decided it wasn't worth the hassle, since email validation is complicated and we couldn't find a good crate that didn't have incompatibilities we weren't happy with compared to the email-validator package.

That said, we could perhaps still do the splitting of the name from the email in Rust (in pydantic-core). I'm not sure the performance benefits would be worth the squeeze, though, given that it's already implemented as a regex.

@lig
Contributor

lig commented Jun 14, 2023

@dmontagu I think it could be worth trying the available crates again. As far as I can see, there are several of them and some are actively developed. Maybe the situation has changed since then. I agree that rewriting email-validator is overkill. Maybe it would be easier to fix incompatibilities in one of the available crates. But this is still a quest someone would need to be willing to pursue.


9 participants