Validate local part unicode correctness #192
Comments
Per RFC 6531, Section 3.3, we'll want to ensure that the local part can be converted into valid UTF-8. This includes validating that the local part does not include unpaired or misordered surrogates. We can easily determine this by rejecting runes in the range U+D800 to U+DFFF when iterating by code point, as valid surrogate pairs will be returned as non-surrogate code points. An inefficient solution for identifying the incorrect use of surrogates would be [...]

Testing for the actual validity of UTF-8 encoded data is outside the scope of this module. We expect email addresses to be provided in their UTF-16 form, in keeping with the bulk of the ECMAScript language specification.

Note that IDNs must be NFC-normalized, whereas the local part need merely be valid UTF-8 (though a normalized form is encouraged, we must be permissive in what we accept). Also note that our normalization routine may want to prefer the A-label form unless the local part contains Unicode characters.
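For illustration, the code point check described above could look roughly like the following. This is a minimal sketch, not the module's actual implementation; the function name `hasOnlyPairedSurrogates` is hypothetical. It relies on the fact that iterating a JavaScript string with `for...of` yields whole code points, so only lone surrogates ever show up in the U+D800 to U+DFFF range.

```javascript
// Hypothetical helper (not part of the module): returns true when
// `localPart` contains no unpaired or misordered surrogates.
function hasOnlyPairedSurrogates(localPart) {
  // String iteration yields whole code points, so a valid surrogate pair
  // arrives as a single code point above U+FFFF. Only lone surrogates
  // remain in the range U+D800..U+DFFF.
  for (const char of localPart) {
    const codePoint = char.codePointAt(0);
    if (codePoint >= 0xD800 && codePoint <= 0xDFFF) {
      return false;
    }
  }
  return true;
}

console.log(hasOnlyPairedSurrogates('jos\u00E9'));     // true
console.log(hasOnlyPairedSurrogates('\uD83D\uDE00'));  // true (valid pair)
console.log(hasOnlyPairedSurrogates('\uD83D'));        // false (unpaired high surrogate)
console.log(hasOnlyPairedSurrogates('\uDE00\uD83D'));  // false (misordered pair)
```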
Additionally, it's not clear to me whether labels may contain noncharacters. The general expectation is likely to preserve such characters, and there's nothing in SMTPUTF8 that mentions special treatment of these characters. I propose we accept such characters unless the IETF publishes errata clarifying the validity of noncharacters. I'm also not seeing anything that suggests that the Unicode characters between U+007B and U+00C0 are not valid in labels, so we should likely drop that restriction (cc @WesTyler).
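For reference, should a stricter policy ever be needed, the 66 Unicode noncharacters can be identified with a small predicate. This is only a sketch to make the term concrete; `isNoncharacter` is a hypothetical name, and the proposal above is to accept these code points rather than reject them.

```javascript
// Hypothetical helper: true for the 66 Unicode noncharacters, namely
// U+FDD0..U+FDEF plus the last two code points of each of the 17 planes
// (U+FFFE, U+FFFF, U+1FFFE, U+1FFFF, ..., U+10FFFE, U+10FFFF).
function isNoncharacter(codePoint) {
  if (codePoint >= 0xFDD0 && codePoint <= 0xFDEF) {
    return true;
  }
  // The low 16 bits are FFFE or FFFF exactly when masking with 0xFFFE
  // leaves 0xFFFE.
  return codePoint >= 0 && codePoint <= 0x10FFFF &&
    (codePoint & 0xFFFE) === 0xFFFE;
}

console.log(isNoncharacter(0xFDD0));   // true
console.log(isNoncharacter(0x1FFFE));  // true
console.log(isNoncharacter(0xFFFD));   // false (replacement character)
```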
Fixed in #193.
In addition to checking whether the email could contain unicode code points, we should also ensure that the string representation is valid for UTF-8 conversion.
The challenge here is that we don't want to pull in additional dependencies, so we'll need to accomplish this with just the Node.js built-ins.
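One dependency-free way to approach this, sketched under the assumption that only language built-ins are available: newer runtimes expose `String.prototype.isWellFormed()` (ES2024), and on older ones `encodeURIComponent` throws a `URIError` when it meets a lone surrogate. The helper name `isValidForUtf8` is hypothetical, and this is not necessarily the approach the maintainers adopted.

```javascript
// Hypothetical helper: true when `value` can be converted to UTF-8
// without any lone surrogate being lost or replaced.
function isValidForUtf8(value) {
  // Prefer the native well-formedness check where the runtime has it.
  if (typeof value.isWellFormed === 'function') {
    return value.isWellFormed();
  }
  // Fallback: encodeURIComponent throws a URIError on lone surrogates.
  try {
    encodeURIComponent(value);
    return true;
  } catch (err) {
    if (err instanceof URIError) {
      return false;
    }
    throw err;
  }
}

console.log(isValidForUtf8('user\u00E9@example.com'));  // true
console.log(isValidForUtf8('user\uD800@example.com'));  // false
```

The fallback does more work than strictly needed, since it builds an encoded string that is then discarded, but it keeps the check to a single built-in call.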