Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ensure that binary characters are escaped correctly in the location header #246

Closed
wants to merge 2 commits into from

Conversation

netroy
Copy link

@netroy netroy commented Feb 21, 2024

I'm seeing the issue #125 on the following URLs, because the server returns binary characters in the location header.

These URLs redirect correctly on curl, Firefox, Chrome, and a few other cli tools and scripts that I tried. I was also able to use https.get by manually converting the binary string in the location header to utf-8, and then doing an encodeURI on it like this:

import https from 'node:https';
import { parse } from 'node:url';

const url = 'https://chromewebstore.google.com/detail/aeblfdkhhhdcdjpifhhbdiojplfjncoa';
const headers = {
	'User-Agent':
		'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/121.0.0.0 Safari/537.36',
};

https.get({ ...parse(url), headers }, (response1) => {
	const location = encodeURI(Buffer.from(response1.headers.location, 'binary').toString('utf8'));
	https.get({ ...parse(location), headers }, ({ statusCode }) => {
		console.log(statusCode);
	});
});

So, I've ported over the fix, and added a test for it.

@netroy netroy mentioned this pull request Feb 21, 2024
@RubenVerborgh
Copy link
Collaborator

RubenVerborgh commented Feb 21, 2024

Thanks @netroy! The issue is that this behavior is not defined in the specification. So while others might do it, that's not necessarily the one thing to do:

Historically, HTTP has allowed field content with text in the
ISO-8859-1 charset [ISO-8859-1], supporting other charsets only
through use of [RFC2047] encoding. In practice, most HTTP header
field values use only a subset of the US-ASCII charset [USASCII].
Newly defined header fields SHOULD limit their field values to
US-ASCII octets. A recipient SHOULD treat other octets in field
content (obs-text) as opaque data.

https://www.rfc-editor.org/rfc/rfc7230.html

Note: Some recipients attempt to recover from Location header fields that are not valid URI references. This specification does not mandate or define such processing, but does allow it for the sake of robustness.

https://www.rfc-editor.org/rfc/rfc9110.html#field.location

So as far as I can see, the server is at fault here; and there is not one correct recovery strategy.

@netroy
Copy link
Author

netroy commented Feb 21, 2024

server is at fault here

Unfortunately, I gathered that from all the issues I've read through today across a dozen or so repos 😞.

However since the URLs I mentioned in the description work in browsers, and on curl, I have a customer who believes that the nodejs based product I work on is the one with a bug.
I've let them know about this, but I just wanted to check with you folks.
Maybe I can fix this in the application using a pnpm-patch instead 🤔

@RubenVerborgh
Copy link
Collaborator

RubenVerborgh commented Feb 22, 2024

A local fix is probably best indeed; it's hard to predict how non-spec behavior would affect millions of other users.

As you can see from the failing tests (which are of course fixable, but still), there is a potential for unpredictable behaviors, and there might even be security issues as well with guessing/assuming the used encoding is UTF-8. We don't know Chrome's or Firefox' exact implementation; they might only attempt UTF-8 decoding after very specific (security and other) constraints are satisfied for a specific request.

@netroy netroy deleted the fix-125 branch February 22, 2024 13:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants