Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make the Name API more principled #1919

Draft
wants to merge 4 commits into
base: main
Choose a base branch
from
Draft

Make the Name API more principled #1919

wants to merge 4 commits into from

Conversation

djc
Copy link
Collaborator

@djc djc commented Apr 5, 2023

So I started to work towards a better API to accomodate the discussion in #1904. Ideally I think we'd want to allow underscore by default in host names, and potentially provide a separate API that does not allow them. So here are some things that I ran into that seem to have surprising error potential:

  • Name implements Default, and Default yields an empty Name that is not root (is_fqdn is false). This seems like a weird default value that's probably not valid in most cases? My second commit here removes the Default impl for Name and LowerName instead.

  • Name has a FromStr impl and several implementations for the local IntoName trait, but FromStr relies on Name::from_str_relaxed() while the IntoName impls rely on from_utf8(). I think a reasonably expectation would be that FromStr exactly calls from_utf8(), so I implemented that and deprecated Name::from_utf8() in favor of the FromStr impl for good measure, which seems like a cleaner API. I think maybe the IntoName trait predates TryFrom and could be removed in favor of TryFrom impls these days?

  • I noticed that from_ascii() and from_utf8() have quite different behavior. Label::from_utf8() switches to calling Label::from_ascii() when the first character of the label is _. However, Name::from_utf8() and Name::from_ascii() have the documented behavioral difference that from_utf8() always normalizes to lowercase whereas from_ascii() preserves case. This seems like very surprising behavior -- IMO from_ascii() should also normalize to lowercase. What do you think?

cc @nrempel @cpu

djc added 4 commits April 5, 2023 14:05
It seems surprising for Name::default() to result in an empty,
but not-FQDN domain name. Callers should probably use Name::root()
instead.
It seems like `from_utf8()` and `from_str()` should do exactly the
same thing. Align the behavior (which was strangely different)
and deprecate `from_utf8()` in favor of the widely used FromStr trait.

(Note that the `IntoName` impls also used `from_utf8()` instead of
`from_str_relaxed()`, which made things even more confusing.)
@djc djc requested a review from bluejekyll April 5, 2023 13:16
Copy link
Contributor

@cpu cpu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm still trying to wrap my head around the contexts of where the various from_ funcs are used, but I think these changes are looking sensible. I left some low-hanging comments from a first pass through.

fn from_encoded_str<E: LabelEnc>(
local: &str,
origin: Option<&Self>,
//allow_underscore: bool,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Leftover commented code?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's actually scaffolding for my next commit, where I was going to introduce from_host_utf8() and from_host_ascii() (naming to be bikeshedded) that retain the stricter semantics.

@@ -224,14 +224,28 @@ impl Borrow<[u8]> for Label {
}
}

fn is_safe_ascii(c: char, is_first: bool, for_encoding: bool) -> bool {
fn is_valid_ascii(c: char, is_first: bool, for_encoding: bool) -> bool {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it make sense to keep the is_first special casing if we've reached the conclusion the _ character is allowed in non-hostnames without any requirement on position in the label?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a good point, I can't think of a reason it should be retained...

Comment on lines +232 to +238
if is_first && c == '-' {
false
} else if for_encoding && c == '.' {
false
} else {
true
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just noting clippy doesn't like the false duplicate conditional bodies and this will need a tweak for CI to pass.

@@ -88,8 +88,8 @@ impl Label {

if !s.is_empty()
&& s.is_ascii()
&& s.chars().take(1).all(|c| is_safe_ascii(c, true, false))
&& s.chars().skip(1).all(|c| is_safe_ascii(c, false, false))
&& s.chars().take(1).all(|c| is_valid_ascii(c, true, false))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can't leave a comment on the line in question because it hasn't changed in this diff but I think the doc string on from_ascii should try to enumerate more of the reasons that it may return an error above and beyond invalid ASCII characters.

@djc djc mentioned this pull request Apr 26, 2023
4 tasks
Copy link
Member

@bluejekyll bluejekyll left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall I like the simplification effort here. It seems like a good idea to reduce complexity of all the interfaces. There might be an issue with the wildcard checks, can you verify that was intentional?

'.' if !for_encoding => true, // needed to allow dots, for things like email addresses
'-' => true, // dash is allowed
'_' => true, // allowed in names that don't represent hostnames, https://github.com/bluejekyll/trust-dns/issues/1904
'*' => true, // wild card
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think wildcard is only allowed as the first and only character.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants