-
-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Change fields.py to use HTML5 non-ASCII File Names by Default instead of RFC2231 for encoding #1492
Conversation
…e HTML5 working draft, and made this the default. Support for RFC 2231-style encoding remains. Which style encoding to use can be selected by setting `filename_encoding_style` when creating a `RequestField` (by `__init__` or by `from_tuples`).
Made sure the `format_header_param_*` methods returned unicode strings on all Python versions on all code paths. Added a few more comments, including "Must be unicode." to `RequestField.__init__`. This was to match the requirement of `from_tuples` which requires field names and file names to be unicode.
I know @sigmavirus24 had some thoughts on the previous PR so just CC-ing in case you'd like to weigh in on this change. |
I think that the tests need to be updated to address how this works but I think it would make sense for someone else to review this so I don't just rewrite the test to match the current behaviour. For the first failure I do think that it is checking to see if the field is in the old RFC2231 format that this removes. The second one I'm less clear on. |
Hi folks! this PR has been open for a while. Is this still something we should do? @sigmavirus24, you were identified as the subject-matter expert, do you have time to review or should I? |
I'd be happy to refine it further with review. Personally I found that RFC2231 only was resulting in errors in my project and a number of other projects that rely upon requests but I'm also no subject expert. I just had a brief deep dive while trying to figure out an issue with a module I was developing for my project. I can take a look at modifying the tests because I think they just aren't written to test the new expected values but I'd like someone else to chime in for sure. |
Thanks. I believe that is the case for the tests.
…On Tue, Jan 22, 2019, 5:08 AM Robb ***@***.***> wrote:
I'd be happy to refine it further with review. Personally I found that
RFC2231 only was resulting in errors in my project and a number of other
projects that rely upon requests but I'm also no subject expert. I just had
a brief deep dive while trying to figure out an issue with a module I was
developing for my project. I can take a look at modifying the tests because
I think they just aren't written to test the new expected values but I'd
like someone else to chime in for sure.
—
You are receiving this because you commented.
Reply to this email directly, view it on GitHub
<#1492 (comment)>, or mute
the thread
<https://github.com/notifications/unsubscribe-auth/AAPUc7yjOcc_nx6Oa0WreTmBTj7DBbG2ks5vFw1ngaJpZM4Y_S3g>
.
|
Yeah, I know a bit too much about 2231 but I'm far more grumpy that predominantly US-developed frameworks have such terrible support for accepting 2231 encoded filenames given the spec is well over a decade old at this point. People just assume everything is ascii and the HTML5 spec has about as much adoption server-side as 2231 did last I checked, so we're trading one poorly supported thing for another. 🤷♂️ |
@sigmavirus24 I might not be seeing this correctly but does this only impact requests made with What are your thoughts on making it configurable in a similar way to our multipart boundary can be configured as a part of |
I think that making it an option that can be configured is probably the best of both worlds. I also understand that we might not want to switch the default behaviour to HTML5 right off the bat and could just add the option to use HTML5. Unfortunately I didn't do the work on this PR in the first place I just revived it from a previously abandoned PR because I needed this functionality to get requests to work for me and I'd like to do what I can to make this an option for other people who I saw were running into this issue, but I'm not an expert on the requests framework. |
You are correct. Specifically,
To be clear, my concern isn't backwards compatibility, just the lack of certainty that this makes things any better for people. Exposing this as an option would require extra work in requests to utilize that or to pass the configuration options onto users. [1] As a result, if this is optional and not default, Requests users won't get it for free [1] Unfortunately, our multipart API is already atrocious so adding more to it for this would be a non-starter for me. |
I haven't studied the RFC's in depth but the discussion in #303 seems valid and it looks like this was first attempted in #304 - when RFC 7578 was still a draft but as of 2015 it has now rendered RFC2388 obsolete. If I'm not mistaken the usage of RFC2231 encoding was done in accordance with RFC2388. If we aren't worried about backwards compatibility then it would seem like going with the new standard method of multi-part encoding of filenames would make the most sense. From my reasoning it is unlikely that anyone will be adding support for a 22+ year old standard that has been rendered obsolete whereas I know that a number of people from various bug reports have had issues with the way UTF8 filenames are encoded. I could understand if there were counter-examples where a server would only work with RFC2231 and not support the newer standard but moving forward I think it would make sense to support the newest standard. Like I said before I can't totally vouch that this code does everything it needs to do at this point but it did solve the problem I ran into and so I'm willing to help try to hash it out so others can benefit from this change. I have contributed far less and have far less skin in the game than anyone else here at this point so my interest is just trying to figure out the best technical approach. If this isn't going to be merged I think that I'll have to revisit the curl script that someone wrote to accomplish the task I was trying to do but I'd really rather accomplish the whole task in python3 as the code is already written but I don't want to distribute a module that requires someone to use my non-standard repository for basic functionality. |
Okay, I'm dragging this back up as these unicode issues seem to still be popping up for users. Broadly speaking, I'm in favor of us matching browser behavior, especially Firefox. If there is a server out there that can't read files uploaded using @sigmavirus24 can you help me understand your reservations beyond just not being sure that the HTML5 encoding scheme won't fix all problems? Do we have counter-examples where using it would be problematic? (e.g., if you know already that Flask or Django outright chokes on the HTML5 format, that would be a great datapoint for us) Otherwise, I'd like to move forward with a slightly modified version of this:
Thoughts? @sethmlarson? |
I'm in favor of doing what browsers find to be best. Wonder what curl does, probably propagates whatever you give it? I'm +1 to what you've mentioned. |
Great, let's wait to hear from @sigmavirus24. If he's on board, I can take on bringing this PR into the proposed state. |
Okay, here's a comparison of httpieHere's the command:
As expected, it uses RFC2231:
Relevant line:
curlCurl apparently uses HTTP encoding. Here's the command:
And the output:
Relevant section:
Where |
I don't know that already. I'm just familiar with the ASCII-centric view of most WSGI/Rack/etc based frameworks and the fact that they flat out don't keep up with decades old web standards, let alone those released a handful of years ago. I'm not certain this will break things that aren't already broken either so :Shrug: As for using environment variables to control behaviour, that way lies dragons from my experiences in Requests. It's can be a great control rod for "power" users but I'm not sure this is a control rod we need right now. Besides, it's easier to say no today and yes tomorrow. |
Totally agree with "Besides, it's easier to say no today and yes tomorrow."
but we've been saying no to this for so long that I think we're actively
harming users. I don't want to do something that'll break Requests, but I
also don't want to continue sitting on our hands here.
…On Mon, Mar 11, 2019, 6:53 AM Ian Stapleton Cordasco < ***@***.***> wrote:
(e.g., if you know already that Flask or Django outright chokes on the
HTML5 format, that would be a great datapoint for us)
I don't know that already. I'm just familiar with the ASCII-centric view
of most WSGI/Rack/etc based frameworks and the fact that they flat out
don't keep up with decades old web standards, let alone those released a
handful of years ago.
I'm not certain this will break things that aren't already broken either
so :Shrug:
As for using environment variables to control behaviour, that way lies
dragons from my experiences in Requests. It's can be a great control rod
for "power" users but I'm not sure this is a control rod we need right now.
Besides, it's easier to say no today and yes tomorrow.
—
You are receiving this because you were assigned.
Reply to this email directly, view it on GitHub
<#1492 (comment)>, or mute
the thread
<https://github.com/notifications/unsubscribe-auth/AAPUc1vwWKpvo6u5lchH4_YbSpzOXc-Kks5vVl-9gaJpZM4Y_S3g>
.
|
@theacodes I might be wrong but sigma may have wanted that comment to apply to the environment variable controlling behavior instead of the general idea of using HTML5 encoding. Correct me if I'm wrong, that's how I understood your comment @sigmavirus24 |
Oh in that case I'm happy to leave that out until someone needs it
…On Mon, Mar 11, 2019, 9:29 AM Seth Michael Larson ***@***.***> wrote:
@theacodes <https://github.com/theacodes> I might be wrong but sigma
might wanted that comment to apply to the environment variable controlling
behavior instead of the general idea of using HTML5 encoding.
Correct me if I'm wrong, that's how I understood your comment
@sigmavirus24 <https://github.com/sigmavirus24>
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#1492 (comment)>, or mute
the thread
<https://github.com/notifications/unsubscribe-auth/AAPUcxLXyhmNeziAXHud5Qad-tr5tJBHks5vVoRSgaJpZM4Y_S3g>
.
|
+1 to making the encoding pluggable (maybe we can provide an encoder/decoder to override the default; we can even provide the two encoder options out of the box and consumers can choose whichever downstream), but -0.5 to making it controlled by an environment variable. I feel env-based internal knobs in libraries produce unexpected precedence bugs, and it's hard to draw a line of what should be in an env var vs not, and also makes it hard to deprecate noticeably. IMO env knobs make more sense in end-user applications, rather than libraries. FWIW, "breaking things that are already broken" at least puts us on the right side of history in the long run vs perpetuating broken behaviour. :) I generally vote for some acute short term pain in favour of progress. It's also usually the less burdensome path for maintainers. |
Confirmed that this works as expected with Flask. Urllib3 request: import urllib3
http = urllib3.PoolManager()
http.request("POST", "http://localhost:5000", fields={"αλήθεια": ("αλήθεια.txt", "Meep", "text/plain")}) the request headers read: ImmutableMultiDict([('αλήθεια', <FileStorage: 'αλήθεια.txt' ('text/plain')>)]) |
Giving up for today: I can't reproduce the CI's failure locally, which makes debugging this hard. |
Codecov Report
@@ Coverage Diff @@
## master #1492 +/- ##
=========================================
- Coverage 99.94% 99.9% -0.05%
=========================================
Files 22 22
Lines 1826 2075 +249
=========================================
+ Hits 1825 2073 +248
- Misses 1 2 +1
Continue to review full report at Codecov.
|
Alright, @sethmlarson this is ready for review. Most importantly, I want to make sure you're comfortable with the new parameter name |
Personally +1 on |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Found some issues and I have some questions as well.
Also thoughts on adding a section to the docs about how to switch to previous or custom behavior? Basically documenting the pluggability |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@sethmlarson thanks for the thorough review (apparently I'm quite rusty!) should be much better now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added one more test case for control characters and fixed the docs failure on Travis, you can take a look at my commits @theacodes.
Assuming I didn't break anything with my latest commits this looks good to me! Thanks for picking this up and running with it. :)
Thanks, @sethmlarson! You can merge when you're ready. :) |
So happy to finally put this one to bed. Thanks, everyone! 💜
…On Fri, Mar 22, 2019, 8:52 PM Seth Michael Larson ***@***.***> wrote:
Merged #1492 <#1492> into master.
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#1492 (comment)>, or mute
the thread
<https://github.com/notifications/unsubscribe-auth/AAPUc25U9-tX4OOfl_77SycYAXHne8J2ks5vZaTugaJpZM4Y_S3g>
.
|
Woo! 🎉 Thank you @Spryttan for opening the original PR and @Robbt for rebasing. This is a great change. :) |
This is a re-open of #856 and basically just a merge of the work that Spryttan did with the current HEAD so that it can be tested and hopefully merged after review.
I haven't spent the time fully reading the different RFC's and protocol docs but just arrived at this because I realized a module I was writing was failing on filenames that had UTF8 and so I decided to test this and this resolved the problem we were experiencing. I noticed it is failing one test but I think the test needs updating as it is trying to confirm that requests.py is converting a field to RFC2231 vs. just UTF8 encoding. And so I think the tests will pass when that is updated.
I understand that previously there was some concern about merging this without notice because it would mean a fundamental change in the way fields are encoded but it seems like that never happened. I'm happy to try to steward this because I think it will fix a number of problems that people using the Requests library that builds upon urllib3 have ran into. Let me know what I need to do.
Closes #303