New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Make the Name API more principled #1919
base: main
Are you sure you want to change the base?
Conversation
It seems surprising for Name::default() to result in an empty, but not-FQDN domain name. Callers should probably use Name::root() instead.
It seems like `from_utf8()` and `from_str()` should do exactly the same thing. Align the behavior (which was strangely different) and deprecate `from_utf8()` in favor of the widely used FromStr trait. (Note that the `IntoName` impls also used `from_utf8()` instead of `from_str_relaxed()`, which made things even more confusing.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm still trying to wrap my head around the contexts of where the various from_
funcs are used, but I think these changes are looking sensible. I left some low-hanging comments from a first pass through.
fn from_encoded_str<E: LabelEnc>( | ||
local: &str, | ||
origin: Option<&Self>, | ||
//allow_underscore: bool, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Leftover commented code?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's actually scaffolding for my next commit, where I was going to introduce from_host_utf8()
and from_host_ascii()
(naming to be bikeshedded) that retain the stricter semantics.
@@ -224,14 +224,28 @@ impl Borrow<[u8]> for Label { | |||
} | |||
} | |||
|
|||
fn is_safe_ascii(c: char, is_first: bool, for_encoding: bool) -> bool { | |||
fn is_valid_ascii(c: char, is_first: bool, for_encoding: bool) -> bool { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does it make sense to keep the is_first
special casing if we've reached the conclusion the _
character is allowed in non-hostnames without any requirement on position in the label?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's a good point, I can't think of a reason it should be retained...
if is_first && c == '-' { | ||
false | ||
} else if for_encoding && c == '.' { | ||
false | ||
} else { | ||
true | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just noting clippy
doesn't like the false
duplicate conditional bodies and this will need a tweak for CI to pass.
@@ -88,8 +88,8 @@ impl Label { | |||
|
|||
if !s.is_empty() | |||
&& s.is_ascii() | |||
&& s.chars().take(1).all(|c| is_safe_ascii(c, true, false)) | |||
&& s.chars().skip(1).all(|c| is_safe_ascii(c, false, false)) | |||
&& s.chars().take(1).all(|c| is_valid_ascii(c, true, false)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can't leave a comment on the line in question because it hasn't changed in this diff but I think the doc string on from_ascii
should try to enumerate more of the reasons that it may return an error above and beyond invalid ASCII characters.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall I like the simplification effort here. It seems like a good idea to reduce complexity of all the interfaces. There might be an issue with the wildcard checks, can you verify that was intentional?
'.' if !for_encoding => true, // needed to allow dots, for things like email addresses | ||
'-' => true, // dash is allowed | ||
'_' => true, // allowed in names that don't represent hostnames, https://github.com/bluejekyll/trust-dns/issues/1904 | ||
'*' => true, // wild card |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think wildcard is only allowed as the first and only character.
So I started to work towards a better API to accomodate the discussion in #1904. Ideally I think we'd want to allow underscore by default in host names, and potentially provide a separate API that does not allow them. So here are some things that I ran into that seem to have surprising error potential:
Name
implementsDefault
, andDefault
yields an emptyName
that is notroot
(is_fqdn
isfalse
). This seems like a weird default value that's probably not valid in most cases? My second commit here removes theDefault
impl forName
andLowerName
instead.Name
has aFromStr
impl and several implementations for the localIntoName
trait, butFromStr
relies onName::from_str_relaxed()
while theIntoName
impls rely onfrom_utf8()
. I think a reasonably expectation would be thatFromStr
exactly callsfrom_utf8()
, so I implemented that and deprecatedName::from_utf8()
in favor of theFromStr
impl for good measure, which seems like a cleaner API. I think maybe theIntoName
trait predatesTryFrom
and could be removed in favor ofTryFrom
impls these days?I noticed that
from_ascii()
andfrom_utf8()
have quite different behavior.Label::from_utf8()
switches to callingLabel::from_ascii()
when the first character of the label is_
. However,Name::from_utf8()
andName::from_ascii()
have the documented behavioral difference thatfrom_utf8()
always normalizes to lowercase whereasfrom_ascii()
preserves case. This seems like very surprising behavior -- IMOfrom_ascii()
should also normalize to lowercase. What do you think?cc @nrempel @cpu