Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Tiny maybe-bug in obsolete line folding handling #59

Open
njsmith opened this issue Mar 25, 2018 · 5 comments
Open

Tiny maybe-bug in obsolete line folding handling #59

njsmith opened this issue Mar 25, 2018 · 5 comments

Comments

@njsmith
Copy link
Member

njsmith commented Mar 25, 2018

I was looking at urllib3/urllib3#1318, and realized I wasn't quite sure what the right rules were for handling whitespace in headers using the "obsolete line-folding rule". Specifically, if we have a header like Name: value1${WHITESPACE1}\r\n${WHITESPACE2}value2 then what is the header's logical value?

Currently h11 treats it as: value1${WHITESPACE1} value2

That PR makes urllib3 treat it as: value1 value2

RFC 7230 says:

     header-field   = field-name ":" OWS field-value OWS

     field-name     = token
     field-value    = *( field-content / obs-fold )
     field-content  = field-vchar [ 1*( SP / HTAB ) field-vchar ]
     field-vchar    = VCHAR / obs-text

     obs-fold       = CRLF 1*( SP / HTAB )

(And then in the text it specifies that obs-fold tokens should be replaced by one or more SP.)

As written, this actually says that ${WHITESPACE1} being non-empty is an error, and only Name: value1\r\n${WHITESPACE2}value2 is legal (and it becomes Name: value1 value2). So h11 doesn't quite follow that.

The reason h11 doesn't quite follow it though is that the spec's ABNF is buggy – its field-content rule is just wrong: https://www.rfc-editor.org/errata/eid4189

The suggestion in that errata is:

     field-content  = field-vchar [ 1*( SP / HTAB / field-vchar ) field-vchar ]
     obs-fold       = OWS CRLF 1*( SP / HTAB )

This matches the urllib3 interpretation, but not the current h11 interpretation.

Of course, the comment on that errata is "this is not right", but OTOH "The [fix] is the best approach for now, and a better fix will be developed", someday, maybe.

So I guess ideally h11 should switch to strip trailing whitespace from header lines that are followed by obs-fold lines.

This is the opposite of a high-priority bug.

@pquentin
Copy link

@njsmith
Copy link
Member Author

njsmith commented May 15, 2018

Fortunately, that wget bug is very different in practice: if I'm reading this right they were interpreting the header as value1${WHITESPACE1}\r\n${WHITESPACE2}value2 which is, uh.... very very definitely not right, given that it's supposed to be impossible for \r and \n to appear in a header value. h11 doesn't do that.

@pquentin
Copy link

Right! I just thought it was an interesting client-side header-parsing security bug, but maybe this issue was not the right place to say that.

@njsmith
Copy link
Member Author

njsmith commented May 16, 2018

Oh, sure, it's a good reminder of how big problems can arise from small parsing in obscure things like the handling of whitespace in obsolete line folding :-). I think h11 is pretty careful about these and that the specific issue discussed in this bug is not security relevant, but some day I'm sure I'll be wrong so it's good to check :-)

@njsmith
Copy link
Member Author

njsmith commented May 28, 2019

The obsolete RFC 2616 says:

All linear white space, including folding, has the same semantics as SP. A recipient MAY replace any linear white space with a single SP before interpreting the field value or forwarding the message downstream.

Which seems consistent with urllib3's interpretation. Though it also seems to bless h11's interpretation.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants