Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make the Name API more principled #1919

Draft
wants to merge 4 commits into
base: main
Choose a base branch
from
Draft
Show file tree
Hide file tree
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Jump to
Jump to file
Failed to load files.
Diff view
Diff view
34 changes: 24 additions & 10 deletions crates/proto/src/rr/domain/label.rs
Original file line number Diff line number Diff line change
Expand Up @@ -55,8 +55,8 @@ impl Label {
return Ok(Self::wildcard());
}

// special case for SRV type records
if s.starts_with('_') {
// Skip IDNA processing for ASCII strings
if s.chars().all(is_safe_ascii) {
return Self::from_ascii(s);
}

Expand Down Expand Up @@ -88,8 +88,8 @@ impl Label {

if !s.is_empty()
&& s.is_ascii()
&& s.chars().take(1).all(|c| is_safe_ascii(c, true, false))
&& s.chars().skip(1).all(|c| is_safe_ascii(c, false, false))
&& s.chars().take(1).all(|c| is_valid_ascii(c, true, false))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can't leave a comment on the line in question because it hasn't changed in this diff but I think the doc string on from_ascii should try to enumerate more of the reasons that it may return an error above and beyond invalid ASCII characters.

&& s.chars().skip(1).all(|c| is_valid_ascii(c, false, false))
{
Self::from_raw_bytes(s.as_bytes())
} else {
Expand Down Expand Up @@ -189,7 +189,7 @@ impl Label {
let to_single_escape = |ch: char| format!("\\{ch}");

match char::from(byte) {
c if is_safe_ascii(c, is_first, true) => f.write_char(c)?,
c if is_valid_ascii(c, is_first, true) => f.write_char(c)?,
// it's not a control and is printable as well as inside the standard ascii range
c if byte > b'\x20' && byte < b'\x7f' => f.write_str(&to_single_escape(c))?,
_ => f.write_str(&to_triple_escape(byte))?,
Expand Down Expand Up @@ -224,14 +224,28 @@ impl Borrow<[u8]> for Label {
}
}

fn is_safe_ascii(c: char, is_first: bool, for_encoding: bool) -> bool {
fn is_valid_ascii(c: char, is_first: bool, for_encoding: bool) -> bool {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it make sense to keep the is_first special casing if we've reached the conclusion the _ character is allowed in non-hostnames without any requirement on position in the label?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a good point, I can't think of a reason it should be retained...

if !is_safe_ascii(c) {
return false;
}

if is_first && c == '-' {
false
} else if for_encoding && c == '.' {
false
} else {
true
}
Comment on lines +232 to +238
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just noting clippy doesn't like the false duplicate conditional bodies and this will need a tweak for CI to pass.

}

fn is_safe_ascii(c: char) -> bool {
match c {
c if !c.is_ascii() => false,
c if c.is_alphanumeric() => true,
'-' if !is_first => true, // dash is allowed
'_' => true, // SRV like labels
'*' if is_first => true, // wildcard
'.' if !for_encoding => true, // needed to allow dots, for things like email addresses
'-' => true, // dash is allowed
'_' => true, // allowed in names that don't represent hostnames, https://github.com/bluejekyll/trust-dns/issues/1904
'*' => true, // wild card
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think wildcard is only allowed as the first and only character.

'.' => true, // needed to allow dots, for things like email addresses
_ => false,
}
}
Expand Down
20 changes: 19 additions & 1 deletion crates/proto/src/rr/domain/name.rs
Original file line number Diff line number Diff line change
Expand Up @@ -198,7 +198,11 @@ impl Name {
Self::from_str(name).or_else(|_| Self::from_ascii(name))
}

fn from_encoded_str<E: LabelEnc>(local: &str, origin: Option<&Self>) -> ProtoResult<Self> {
fn from_encoded_str<E: LabelEnc>(
local: &str,
origin: Option<&Self>,
//allow_underscore: bool,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Leftover commented code?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's actually scaffolding for my next commit, where I was going to introduce from_host_utf8() and from_host_ascii() (naming to be bikeshedded) that retain the stricter semantics.

) -> ProtoResult<Self> {
let mut name = Self::new();
let mut label = String::new();

Expand Down Expand Up @@ -1798,8 +1802,22 @@ mod tests {
#[test]
fn test_underscore() {
Name::from_str("_begin.example.com").expect("failed at beginning");
assert_eq!(
Name::from_str("_begin.example.com").unwrap(),
Name::from_ascii("_begin.example.com").unwrap(),
);

Name::from_str_relaxed("mid_dle.example.com").expect("failed in the middle");
assert_eq!(
Name::from_str("mid_dle.example.com").is_ok(),
Name::from_ascii("mid_dle.example.com").is_ok(),
);

Name::from_str_relaxed("end_.example.com").expect("failed at the end");
assert_eq!(
Name::from_str("end_.example.com").is_ok(),
Name::from_ascii("end_.example.com").is_ok(),
);
}

#[test]
Expand Down