Bug with normalization/unencode and leave_encoded #196

Open
DreadPirateShawn opened this issue May 9, 2015 · 3 comments

@DreadPirateShawn

Normalization breaks superscripts in a URL path.

Consider http://en.wiktionary.org/wiki/³, which is a distinctly different page from http://en.wiktionary.org/wiki/3. Calling normalize converts the former into the latter.

> require 'addressable/template'
 => true

> Addressable::URI.parse("http://en.wiktionary.org/wiki/³")
 => #<Addressable::URI:0x500b93c URI:http://en.wiktionary.org/wiki/³>

> Addressable::URI.parse("http://en.wiktionary.org/wiki/³").normalize
 => #<Addressable::URI:0x500f014 URI:http://en.wiktionary.org/wiki/3>

> Addressable::URI.unencode("http://en.wiktionary.org/wiki/%C2%B3")
 => "http://en.wiktionary.org/wiki/³"

> Addressable::URI.parse("http://en.wiktionary.org/wiki/%C2%B3").normalize
 => #<Addressable::URI:0x50290c2 URI:http://en.wiktionary.org/wiki/3>

I also tried to normalize the path directly so that I could pass the leave_encoded parameter, but that did not work either. As the last two examples show, leave_encoded was respected for the ampersand (it remains encoded) but not for the superscript (it still changes to a regular 3).

> require 'addressable/template'
 => true

> Addressable::URI.normalize_component("/wiki/³", leave_encoded=/[³]/)
 => "/wiki/3"

> Addressable::URI.normalize_component("/wiki/%C2%B3", leave_encoded=/[³]/)
 => "/wiki/3"

> Addressable::URI.normalize_component("/wiki/³%26³")
 => "/wiki/3&3"

> Addressable::URI.normalize_component("/wiki/³%26³", leave_encoded=/[&³]/)
 => "/wiki/3%263"

> Addressable::URI.normalize_component("/wiki/%C2%B3%26%C2%B3", leave_encoded=/[&³]/)
 => "/wiki/3%263"

This may be related to issue #100, or at least is likely related to the same section of code.
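
One thing worth checking in the calls above: in Ruby, writing name=value inside an argument list is a local variable assignment whose value is passed positionally. It is not a keyword argument. The sketch below uses a hypothetical stand-in method, show_args, whose parameter order (component, character_class, leave_encoded) is an assumption modeled on the documented shape of normalize_component. If that assumption holds, the regexp in those calls would be received as character_class rather than leave_encoded.

```ruby
# Hypothetical stand-in, NOT part of Addressable. The parameter order is an
# assumption modeled on the documented shape of normalize_component.
def show_args(component, character_class = "default", leave_encoded = "")
  { component: component,
    character_class: character_class,
    leave_encoded: leave_encoded }
end

# `leave_encoded = /[&]/` assigns a local variable and passes its value as
# the SECOND positional argument; the parameter named leave_encoded is
# left at its default.
result = show_args("/wiki/x", leave_encoded = /[&]/)

puts result[:character_class].inspect # => /[&]/
puts result[:leave_encoded].inspect   # => ""
```

To reach the third parameter of a method shaped like this, the second one has to be supplied explicitly.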

@sporkmonger
Owner

The bug here is with leave_encoded. See http://intertwingly.net/blog/2004/07/31/URI-Equivalence and referenced discussion for why this behavior is correct in the absence of leave_encoded.
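
For readers wondering why the superscript turns into a plain 3 at all: this is what Unicode compatibility normalization (NFKC) does to U+00B3. The sketch below uses only Ruby's standard String#unicode_normalize. That Addressable's normalize applies this same form internally is an assumption here, not something the thread states.

```ruby
superscript = "\u00B3" # SUPERSCRIPT THREE, the character in the wiktionary URL

# Canonical normalization (NFC) leaves the character untouched.
nfc = superscript.unicode_normalize(:nfc)

# Compatibility normalization (NFKC) folds it into the ASCII digit.
nfkc = superscript.unicode_normalize(:nfkc)

puts nfc == superscript # => true
puts nfkc               # => 3
```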

@AnthonyClark

AnthonyClark commented Apr 26, 2019

Ran into this issue, and it seems like it's still around.
Actual:
Addressable::URI.unencode_component("%E2%84%A2", String, "%E2%84%A2") => "™"

Expected:
Addressable::URI.unencode_component("%E2%84%A2", String, "%E2%84%A2") => "%E2%84%A2"
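
To make the expected behavior concrete, here is a minimal sketch of a decoder that leaves selected characters percent-encoded even when they span several bytes. The helper name unencode_except and its second argument (a string of literal characters to keep encoded) are hypothetical. This is not Addressable's implementation or API.

```ruby
# Hypothetical helper, NOT part of Addressable. Decodes percent-encoded
# sequences in str, except for characters listed in keep_chars.
def unencode_except(str, keep_chars)
  # Match whole runs of %XX so multibyte UTF-8 characters decode as a unit.
  str.gsub(/(?:%\h{2})+/) do |run|
    bytes = run.scan(/%(\h{2})/).flatten.map { |hex| hex.to_i(16) }
    decoded = bytes.pack("C*").force_encoding(Encoding::UTF_8)

    # Leave the run alone if it is not valid UTF-8.
    next run unless decoded.valid_encoding?

    decoded.each_char.map do |ch|
      if keep_chars.include?(ch)
        # Re-encode protected characters (uppercase hex).
        ch.bytes.map { |b| format("%%%02X", b) }.join
      else
        ch
      end
    end.join
  end
end

puts unencode_except("%E2%84%A2", "™") # => %E2%84%A2
puts unencode_except("%E2%84%A2", "")  # => ™
```

Working on runs of bytes rather than one %XX at a time is the design choice that matters here: a per-byte check can never match a multibyte character such as ™.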

@sporkmonger I know this issue is super old, but do you know if there was any attempt to fix it?

@dentarg
Collaborator

dentarg commented Mar 14, 2021

@AnthonyClark I don't think there's been any attempt to address this (links to the blame views: unencode_component, normalize_component)

@dentarg changed the title from "Normalization breaks superscripts in path" to "Bug with normalization/unencode and leave_encoded" on Mar 14, 2021