Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Assume ISO-8859-1 (instead of UTF-8) encoding for ASGI HTTP headers #1911

Closed
n0w4k opened this issue Apr 23, 2021 · 6 comments · Fixed by #1914
Closed

Assume ISO-8859-1 (instead of UTF-8) encoding for ASGI HTTP headers #1911

n0w4k opened this issue Apr 23, 2021 · 6 comments · Fixed by #1914
Assignees
Labels
Milestone

Comments

@n0w4k
Copy link

n0w4k commented Apr 23, 2021

Currently, headers in Falcon's ASGI package will be decoded using the Python's default UTF-8 decoding.
For instance:

name.decode(): value.decode()

and
return self._asgi_headers[asgi_name].decode()

When headers have an encoding different from UTF-8, this leads to UnicodeDecodeError. An option to use different charsets here would be much appreciated.

Even though headers are supposed to be plain ASCII where it doesn't matter, in reality, there are still headers sent which contain characters especially from the ISO-8859-1 charset. Compare also RFC 7230, section 3.2.4:

Historically, HTTP has allowed field content with text in the
ISO-8859-1 charset [ISO-8859-1], supporting other charsets only
through use of [RFC2047] encoding. In practice, most HTTP header
field values use only a subset of the US-ASCII charset [USASCII].

@open-collective-bot
Copy link

Hi 👋,

Thanks for using Falcon. The large amount of time and effort needed to
maintain the project and develop new features is not sustainable without
the generous financial support of community members like you.

Please consider helping us secure the future of the Falcon framework with a
one-time or recurring donation.

Thank you for your support!

@vytas7
Copy link
Member

vytas7 commented Apr 23, 2021

Hi @n0w4k !
And thanks for filing this.

I'll schedule this for 3.1 to ensure we make a decision on this.

I've tried digging into this quickly. It seems that, for instance, WSGI (PEP 3333) implies that headers should be treated as latin1 (ISO-8859-1), OTOH it's clearly articulated only for response headers, not request header parsing.
The ASGI spec conveniently evades this issue by simply providing byte headers, as discussed in your description.

The popular Gunicorn server also opts to treat headers as latin1 (ISO-8859-1) encoded, see its bytes_to_str() function.

In that light, I'm inclined to think that we probably don't want to support custom header encodings, but OTOH, it is a bug in Falcon's asgi.App implementation that we assume UTF-8, and not ISO-8859-1.

@kgriffs thoughts?

@vytas7
Copy link
Member

vytas7 commented Apr 23, 2021

@n0w4k Just to confirm, would defaulting to ISO-8859-1 solve your problem at hand, or are you dealing with other incompatible encodings as well?

@n0w4k
Copy link
Author

n0w4k commented Apr 23, 2021

Thanks for the reply, @vytas7. I can confirm that all the bytes mentioned in the UnicodeDecodeError exceptions that we checked (mostly umlauts) could be decoded using ISO-8859-1, so yes, I am pretty optimistic that defaulting to it would solve our problem. Furthermore, we never had this issue using WSGI (using Falcon 2).

@vytas7 vytas7 modified the milestones: Version 3.1, Version 3.0.1 Apr 23, 2021
@vytas7
Copy link
Member

vytas7 commented Apr 23, 2021

Right, thanks!

It looks like almost all RFCs, WSGI servers and ASGI frameworks assume latin1 by default, this was simply an oversight on our end. I've rescheduled this to the first bugfix release (we have another edge case incompatibility already slated for the release in question).

@vytas7 vytas7 changed the title Allow charsets other then UTF-8 when parsing headers in falcon.asgi Assume ISO-8859-1 (instead of UTF-8) encoding for ASGI HTTP headers Apr 23, 2021
@vytas7 vytas7 self-assigned this Apr 25, 2021
@kgriffs
Copy link
Member

kgriffs commented May 10, 2021

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants