New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Overly-eager decoding of reserved chars when performing HTTP requests #24932
Comments
Can you please provide a reproduction, for example an URL we can embed into a post and see the issue? |
Thank you @renchap for looking into this. Here is a post: |
Ok, I can reproduce.
Do you have a pointer for this? |
@renchap I was linked to this on stackoverflow: https://en.wikipedia.org/wiki/Query_string#Web_forms
I'm not saying (at least, this is as much I understand. Not a programmer.) |
When you look at the EventBrite page you linked above (
So the This seems incorrect to me and an issue from EventBrite, they should not HTML-encode the attribute value, as it is contained in a |
Ok this is very weird. In Chrome, when using "View source", the I am not sure if this is in the HTML spec, or a common behaviour. I think it all comes from a bug in React, where it HTML-encodes the content of those attributes: facebook/react#13838 NextJS issue, with a proposed workaround: vercel/next.js#2006 I am not sure if we need to handle this specific case in Mastodon, as this is not conforming to any spec I know. But on the other hand many sites are most probably showing this issue and dont have a real way to fix it. @ClearlyClaire do you have an opinion here? |
@renchap I’m not sure if this behavior is a bug. For example; when looking at a Hex-scrambled email address in the inspector, it appears like a proper email. And yes, 20% of OG-image validators don’t show the picture. |
I was confused because initially reading the report it sounded like Eventbrite was using plain My understanding of HTML is that HTML attributes can contain character references that should be interpreted as such. |
Upon investigation, I find that Mastodon does interpret character references properly, but that the image still can't be fetched from a different reason. From my testing, it appears that Mastodon is actually trying to fetch the following URL: Note that the URL percent-encoding is not exactly the same as in the correct link. I will be investigating why. @eWalthert how did you observe the “Actual behavior” from your original post? |
This is because we “normalize” the given URL: Line 31 in bf3ebeb
But… stopping doing that would not prevent the issue as HTTP.rb is doing the same thing (httprb/http#654): Disabling normalization completely is possible but pretty awkward: diff --git a/app/lib/request.rb b/app/lib/request.rb
index 4bde6fc911..98e8427f3d 100644
--- a/app/lib/request.rb
+++ b/app/lib/request.rb
@@ -28,7 +28,7 @@ class Request
raise ArgumentError if url.blank?
@verb = verb
- @url = Addressable::URI.parse(url).normalize
+ @url = Addressable::URI.parse(url)
@http_client = options.delete(:http_client)
@allow_local = options.delete(:allow_local)
@options = options.merge(socket_class: use_proxy? || @allow_local ? ProxySocket : Socket)
@@ -92,7 +92,7 @@ class Request
end
def http_client
- HTTP.use(:auto_inflate).timeout(TIMEOUT.dup).follow(max_hops: 3)
+ HTTP.use(:auto_inflate).use(:normalize_uri => { normalizer: ->(x) { x } }).timeout(TIMEOUT.dup).follow(max_hops: 3)
end
end
I honestly don't know if “normalizing” the URL is advisable or if we are doing it right. EDIT: The proper RFC to cover that would probably be https://datatracker.ietf.org/doc/html/rfc3986#section-6.2.2.2 according to it, the normalization process should not touch |
Dear @ClearlyClaire this was actually obseved by @charakterziffer@typo.social but as you can see, aslo the "other-way" https://typo.social/@charakterziffer/109644820380669393 Maybe one of you can report to support@eventbrite.com, to strengthen the awreness of the issue. |
@eWalthert the bug is in (a library used by) Mastodon, but it's different than what was originally reported and is more subtle; I'm gonna edit the issue's title and post accordingly |
This has been resolved with the last update 4.1.4 💙 |
Steps to reproduce the problem
Link to an Eventbrite event in a post:
e.g. https://typo.social/@lttrspc/110344236591977314 which links to https://letterspace39.eventbrite.com/
Expected behaviour
The preview card includes an image fetched from:
https://img.evbuc.com/https%3A%2F%2Fcdn.evbuc.com%2Fimages%2F503170239%2F16956914287%2F1%2Foriginal.20230428-131449?w=1000&auto=format%2Ccompress&q=75&sharp=10&rect=0%2C0%2C2160%2C1080&s=e263a07d9865502aeaa826c331043fa7
Actual behaviour
Mastodon fails to fetch the image, getting a 403 from Eventbrite because of an incorrect URL (notice that
%3A
and%2C
got decoded even though they are reserved characters):https://img.evbuc.com/https:%2F%2Fcdn.evbuc.com%2Fimages%2F503170239%2F16956914287%2F1%2Foriginal.20230428-131449?w=1000&auto=format,compress&q=75&sharp=10&rect=0,0,2160,1080&s=e263a07d9865502aeaa826c331043fa7
Detailed description
This is more general than preview cards and can be tracked down to
Addressable::URI
decoding percent-encoded reserved characters (sporkmonger/addressable#472).Specifications
Mastodon 4.1.2, Ruby 2.7.0p0, PostfreSQL 15.2, Redis 5.0.7, all browsers and apps
The text was updated successfully, but these errors were encountered: