Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Validate the normalized hostname #158

Merged
merged 1 commit into from Oct 7, 2022

Conversation

dentarg
Copy link
Collaborator

@dentarg dentarg commented Oct 5, 2022

According to the "preferred format" used by DNS.

See https://en.wikipedia.org/wiki/Domain_Name_System#Domain_name_syntax,_internationalization

Moves one valid URL to the set of invalid URLs (if you enter http://www..twingly..com/ in the address bar in Chrome, it does a search, doesn't try to visit any site).

Close #62

lib/twingly/url.rb Outdated Show resolved Hide resolved
lib/twingly/url.rb Outdated Show resolved Hide resolved
@@ -564,12 +565,6 @@ def leading_and_trailing_whitespace
it { is_expected.to eq(expected) }
end

context "oddly enough, does not alter URLs with consecutive dots" do
Copy link
Contributor

@walro walro Oct 5, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hehe, the quirk with twingly-url is that it was intentionally meant to mimic the behavior of the good ol' .NET implementation, the change here seems to make the two implementation diverge as I gather, not sure it matters anymore though 🤷

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed, but I can't see the benefit with that really :) Shouldn't really be possible to collect anything from sites with URLs like http://www..twingly..com/

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just mentioning that normalization of links is/used to be a thing, in those cases it didn't necessarily mean that they'd have to be "visitable".

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

True... but all things that are normalised should start out as a visitable URL? I suspect we constructed this URL by hand during development, it didn't come from the real world

lib/twingly/url.rb Outdated Show resolved Hide resolved
lib/twingly/url.rb Outdated Show resolved Hide resolved
lib/twingly/url.rb Outdated Show resolved Hide resolved
@dentarg dentarg force-pushed the update-for-addressable-2.8.1 branch from e093ccc to 1522f47 Compare October 6, 2022 11:04
@@ -103,11 +108,27 @@ def try_addressable_normalize(addressable_uri)
raise
end

def valid_hostname?(hostname)
# No need to check the TLD, the public suffix list does that
labels = hostname.split(DOT)[0...-1].map(&:to_s)
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dentarg
Copy link
Collaborator Author

dentarg commented Oct 6, 2022

This is ready now

Copy link
Contributor

@Pontus4 Pontus4 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Noticed one tiny thing, then this LGTM!

EDIT: Didn't include the comment in this approve and now I can't delete it :P

Copy link
Contributor

@Pontus4 Pontus4 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Noticed one tiny thing, then this LGTM!

lib/twingly/url.rb Outdated Show resolved Hide resolved
Copy link
Contributor

@walro walro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

Copy link
Member

@roback roback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🎉

According to the "preferred format" used by DNS.

See https://en.wikipedia.org/wiki/Domain_Name_System#Domain_name_syntax,_internationalization

Moves one invalid URL to the set of invalid URLs (if you enter
http://www..twingly..com/ in the address bar in Chrome, it does a
search, doesn't try to visit any site).
@dentarg dentarg force-pushed the update-for-addressable-2.8.1 branch from 1522f47 to ae2d264 Compare October 6, 2022 14:06
Copy link
Contributor

@Chrizpy Chrizpy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dentarg
Copy link
Collaborator Author

dentarg commented Oct 6, 2022

:shipit:

@Pontus4 Pontus4 merged commit f95151c into twingly:master Oct 7, 2022
@dentarg dentarg deleted the update-for-addressable-2.8.1 branch October 7, 2022 11:43
dentarg added a commit to dentarg/twingly-url that referenced this pull request Oct 8, 2022
These are now invalid after twingly#158. I think we can close twingly#74.

Close twingly#74
dentarg added a commit to dentarg/twingly-url that referenced this pull request Oct 29, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Bug in normalized_host in Addressable (ArgumentError: invalid byte sequence in UTF-8)
5 participants