Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

FIX: broken onebox images due to escape_uri bugs #17840

Merged
merged 2 commits into from Aug 9, 2022

Conversation

SamSaffron
Copy link
Member

normalized_encode in addressable is buggy due to:
sporkmonger/addressable#472

New implementation avoids any escaping (and only performs basic normalization)
if URL is already valid.

Vast majority to the calls to escape_uri start with valid urls.

This also leaves an edge case around "part" escaped urls, where some chars
are escaped and others are not. In those cases addressable may corrupt stuff.

Also added support for unicode domain names and emoji domain names
with escape uri

This removes an uneeded hack checking for pre-signed urls, which are now
handled by the general case due to starting off valid and only being minimally
normalized. Previous test case continues to pass.

UrlHelper.s3_presigned_url? which was somewhat wide was removed.

url = uri.to_s

# edge case where we expect mailto:test%40test.com to normalize to mailto:test@test.com
if url.match?(/\A#{URI::regexp}\z/) && !url.match(/\Amailto/)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/\Amailto/ would catch "mailto.example.com". We should check for the colon as well

Suggested change
if url.match?(/\A#{URI::regexp}\z/) && !url.match(/\Amailto/)
if url.match?(/\A#{URI::regexp}\z/) && !url.match(/\Amailto:/)

will push a patch 👀

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately, matching URI::regexp does not guarantee that the URL can be parsed.

pry(main)> URI::regexp.match?("https://éxample.com")
=> true
pry(main)> URI.parse("https://éxample.com")
URI::InvalidURIError: URI must be ascii only "https://\u00E9xample.com"

I think we should try URI.parse, and then fallback to addressable if an exception is raised. Will push a patch 👀

SamSaffron and others added 2 commits August 9, 2022 11:27
normalized_encode in addressable has a number of issues, including sporkmonger/addressable#472

To temporaily work around those issues for the majority of cases, we try parsing with `::URI`. If that fails (e.g. due to non-ascii characters) then we will fall back to addressable.

Hopefully we can simplify this back to `Addressable::URI.normalized_encode` in the future.

This commit also adds support for unicode domain names and emoji domain names with escape_uri.

This removes an unneeded hack checking for pre-signed urls, which are now handled by the general case due to starting off valid and only being minimally normalized. Previous test case continues to pass.

UrlHelper.s3_presigned_url? which was somewhat wide was removed.
This is a much better description of its function. It performs idempotent normalization of a URL. If consumers truly need to `encode` a URL (including double-encoding of existing encoded entities), they can use the existing `.encode` method.
@davidtaylorhq davidtaylorhq merged commit 3c81683 into main Aug 9, 2022
@davidtaylorhq davidtaylorhq deleted the normalize-implementation branch August 9, 2022 10:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
3 participants