Validate local part unicode correctness #192

skeggse · 2018-10-12T05:15:55Z

In addition to checking whether the email address could contain Unicode code points, we should also ensure that its string representation can be validly converted to UTF-8.

The challenge here is that we don't want to pull in additional dependencies, so we'll need to accomplish this with just the Node.js built-ins.

skeggse · 2018-10-21T06:20:51Z

Per RFC 6531, Section 3.3, we'll want to ensure that the local part can be converted into valid UTF-8.
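Since only Node.js built-ins are allowed, one way to test UTF-8 convertibility directly is a round trip through `Buffer`. This is a minimal sketch, not the module's actual code. `isUtf8Representable` is a hypothetical helper name, and the sketch assumes a Node.js runtime where `Buffer` is a global. It relies on Node's UTF-8 encoder replacing unpaired surrogates with U+FFFD, so any string that cannot be represented in UTF-8 comes back changed.

```javascript
// Hypothetical helper (not the module's API): returns true when `localPart`
// survives a UTF-8 encode/decode round trip unchanged. Node's encoder replaces
// unpaired surrogates with U+FFFD, so those strings fail the comparison.
function isUtf8Representable(localPart) {
  return Buffer.from(localPart, 'utf8').toString('utf8') === localPart;
}

console.log(isUtf8Representable('josé'));         // true
console.log(isUtf8Representable('\uD83D\uDE00'));  // true (valid surrogate pair)
console.log(isUtf8Representable('bad\uD800x'));   // false (lone high surrogate)
```

The round trip allocates a buffer for every check, which is simple but not the cheapest option for a hot validation path.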

This includes validating that the local part does not contain unpaired or misordered surrogates. We can determine this by iterating over the string by code point and rejecting any rune in the range U+D800 to U+DFFF: valid surrogate pairs are returned as a single non-surrogate code point, so only misused surrogates fall in that range. An inefficient solution for identifying the incorrect use of surrogates would be /^[\ud800-\udfff]$/.test(rune).
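A sketch of the same idea with a numeric range check in place of the per-rune regex. `hasUnpairedSurrogate` is a hypothetical name, not part of the module. It relies on the fact that `for...of` iterates a string by code point, so a well-formed pair arrives as one astral code point (above U+FFFF), while a lone or misordered surrogate arrives by itself.

```javascript
// Hypothetical helper: true if `str` contains an unpaired or misordered
// surrogate. `for...of` yields whole code points, so a valid pair becomes one
// code point above U+FFFF and only misused surrogates land in U+D800..U+DFFF.
function hasUnpairedSurrogate(str) {
  for (const rune of str) {
    const cp = rune.codePointAt(0);
    if (cp >= 0xd800 && cp <= 0xdfff) return true;
  }
  return false;
}

console.log(hasUnpairedSurrogate('user\uD83D\uDE00')); // false (valid pair)
console.log(hasUnpairedSurrogate('user\uDE00\uD83D')); // true (misordered)
```

Runtimes implementing ES2024 also provide `String.prototype.isWellFormed()`, which performs this check natively; it was not available when this issue was filed.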

Validating actual UTF-8 encoded data (raw byte sequences) is outside the scope of this module. We expect email addresses to be provided as JavaScript strings, i.e. in their UTF-16 form, in keeping with the bulk of the ECMAScript language specification.

Note that IDNs must be NFC-normalized, whereas the local part need only be valid UTF-8 (a normalized form is encouraged, but we must be permissive in what we accept).
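The asymmetry above can be sketched with the built-in `String.prototype.normalize`. The helper names `isNfc` and `acceptsNormalization` are hypothetical, and the sketch assumes the address has already been split into a local part and a domain. A string is treated as NFC when normalizing it to NFC leaves it unchanged.

```javascript
// Hypothetical helper: a string is in NFC if NFC normalization is a no-op.
function isNfc(str) {
  return str === str.normalize('NFC');
}

// Hypothetical policy check: the IDN must be NFC-normalized. The local part is
// intentionally not checked here; any normalization form is accepted as long
// as it is valid UTF-8 (checked separately).
function acceptsNormalization(localPart, domain) {
  return isNfc(domain);
}

// 'cafe\u0301' is "café" spelled with U+0301 COMBINING ACUTE ACCENT (not NFC).
console.log(acceptsNormalization('cafe\u0301', 'example.com')); // true
console.log(acceptsNormalization('user', 'cafe\u0301.com'));     // false
```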

Also note that our normalization routine may want to prefer the A-label form of the domain unless the local part contains Unicode characters.
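One way to express that preference with Node's built-in `url.domainToASCII`, which converts a domain to its Punycode ("xn--") A-label form. This is a sketch under assumptions: `preferredDomainForm` is a hypothetical name, and the rationale in the comments (an all-ASCII address does not need SMTPUTF8) is one plausible reading of the note above, not a confirmed design decision. `domainToASCII` returns an empty string for an invalid domain, which a real routine would need to handle.

```javascript
const { domainToASCII } = require('url');

// Hypothetical sketch: if the local part is pure ASCII, emit the domain as
// A-labels so the whole address is ASCII and does not require SMTPUTF8.
// If the local part already contains non-ASCII characters, SMTPUTF8 is needed
// anyway, so the U-label form is kept as given.
function preferredDomainForm(localPart, domain) {
  const localIsAscii = /^[\x00-\x7F]*$/.test(localPart);
  return localIsAscii ? domainToASCII(domain) : domain;
}

console.log(preferredDomainForm('user', 'español.com')); // 'xn--espaol-zwa.com'
console.log(preferredDomainForm('josé', 'español.com')); // 'español.com'
```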

skeggse · 2018-10-21T06:26:36Z

Additionally, it's not clear to me whether labels may contain noncharacters. The general expectation is likely to preserve such characters, and nothing in SMTPUTF8 mentions special treatment of them. I propose we accept such characters unless the IETF publishes errata clarifying the validity of noncharacters.
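For reference, Unicode defines 66 noncharacters: U+FDD0 to U+FDEF, plus the last two code points of every plane (U+xxFFFE and U+xxFFFF). The sketch below only identifies them, for example for a hypothetical diagnostic; under the proposal above they would still be accepted. `isNoncharacter` and `listNoncharacters` are hypothetical names, and the last line assumes Node.js for `Buffer`. It shows that noncharacters are valid Unicode scalar values, so a UTF-8 representability check would not reject them anyway.

```javascript
// Hypothetical helper: true for the 66 Unicode noncharacters.
// (cp & 0xfffe) === 0xfffe matches U+xxFFFE and U+xxFFFF in every plane.
function isNoncharacter(cp) {
  return (cp >= 0xfdd0 && cp <= 0xfdef) || (cp & 0xfffe) === 0xfffe;
}

// Hypothetical diagnostic: list noncharacters (as hex) without rejecting them.
function listNoncharacters(str) {
  const found = [];
  for (const ch of str) {
    const cp = ch.codePointAt(0);
    if (isNoncharacter(cp)) found.push(cp.toString(16).toUpperCase());
  }
  return found;
}

const sample = 'a\uFDD0b\uFFFF';
console.log(listNoncharacters(sample)); // [ 'FDD0', 'FFFF' ]
// Noncharacters survive a UTF-8 round trip unchanged.
console.log(Buffer.from(sample, 'utf8').toString('utf8') === sample); // true
```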

I'm also not seeing anything that suggests that Unicode characters between U+007B and U+00C0 are invalid in labels, so we should likely drop that restriction (cc @WesTyler).


skeggse · 2018-10-21T17:34:28Z

Fixed in #193.

skeggse added the enhancement label Oct 12, 2018

skeggse self-assigned this Oct 12, 2018

skeggse added the help wanted label Oct 12, 2018

skeggse added six commits that referenced this issue Oct 21, 2018, each titled "Check local part for UTF-8 representability" with the message "For #192.":

cff5a24, 2e50282, aa90e0d, 1110afa, bacd3ea, 376864a

skeggse closed this as completed Oct 21, 2018
