Change fields.py to use HTML5 non-ASCII File Names by Default instead of RFC2231 for encoding #1492

Robbt · 2018-12-03T19:52:29Z

This is a re-open of #856 and basically just a merge of the work that Spryttan did with the current HEAD so that it can be tested and hopefully merged after review.

I haven't taken the time to fully read the relevant RFCs and protocol docs. I arrived at this because a module I was writing was failing on filenames containing UTF-8 characters, so I tested this change and it resolved the problem we were experiencing. One test is failing, but I think the test needs updating: it tries to confirm that requests.py converts a field to RFC2231 rather than just encoding it as UTF-8. I expect the tests to pass once that is updated.

I understand that previously there was some concern about merging this without notice, because it would mean a fundamental change in the way fields are encoded, but it seems like that never happened. I'm happy to try to steward this because I think it will fix a number of problems that people using the Requests library, which builds upon urllib3, have run into. Let me know what I need to do.

Closes #303

…e HTML5 working draft, and made this the default. Support for RFC 2231-style encoding remains. Which style encoding to use can be selected by setting `filename_encoding_style` when creating a `RequestField` (by `__init__` or by `from_tuples`).

Made sure the `format_header_param_*` methods returned unicode strings on all Python versions on all code paths. Added a few more comments, including "Must be unicode." to `RequestField.__init__`. This was to match the requirement of `from_tuples` which requires field names and file names to be unicode.

…able Directly

sethmlarson · 2018-12-03T20:05:45Z

I know @sigmavirus24 had some thoughts on the previous PR so just CC-ing in case you'd like to weigh in on this change.

Robbt · 2018-12-03T22:17:47Z

I think that the tests need to be updated to address how this works but I think it would make sense for someone else to review this so I don't just rewrite the test to match the current behaviour. For the first failure I do think that it is checking to see if the field is in the old RFC2231 format that this removes.

The second one I'm less clear on.

theacodes · 2019-01-22T04:00:46Z

Hi folks! This PR has been open for a while. Is this still something we should do? @sigmavirus24, you were identified as the subject-matter expert, do you have time to review or should I?

Robbt · 2019-01-22T13:08:54Z

I'd be happy to refine it further with review. Personally, I found that RFC2231-only encoding was causing errors in my project and in a number of other projects that rely upon requests, but I'm no subject-matter expert either. I just had a brief deep dive while trying to figure out an issue with a module I was developing for my project. I can take a look at modifying the tests, because I think they just aren't written to test the new expected values, but I'd like someone else to chime in for sure.

theacodes · 2019-01-22T15:53:54Z

Thanks. I believe that is the case for the tests.

sigmavirus24 · 2019-01-22T18:54:44Z

Yeah, I know a bit too much about 2231 but I'm far more grumpy that predominantly US-developed frameworks have such terrible support for accepting 2231 encoded filenames given the spec is well over a decade old at this point. People just assume everything is ascii and the HTML5 spec has about as much adoption server-side as 2231 did last I checked, so we're trading one poorly supported thing for another. 🤷‍♂️

sethmlarson · 2019-01-22T19:39:17Z

@sigmavirus24 I might not be seeing this correctly but does this only impact requests made with encode_multipart=True? If this is not true disregard below, I haven't given this more than 2 minutes of investigation.

What are your thoughts on making it configurable, in a similar way to how our multipart boundary can be configured as a part of RequestMethods.request_encode_body, and defaulting to RFC2231? Something like multipart_field_format='rfc2231' or 'html5'?

Robbt · 2019-01-22T19:42:57Z

I think that making it a configurable option is probably the best of both worlds. I also understand that we might not want to switch the default behaviour to HTML5 right off the bat and could just add the option to use HTML5. Unfortunately, I didn't do the original work on this PR; I just revived it from a previously abandoned PR because I needed this functionality to get requests to work for me. I'd like to do what I can to make this an option for other people who I saw were running into this issue, but I'm not an expert on the requests framework.

sigmavirus24 · 2019-01-22T20:11:03Z

> I might not be seeing this correctly but does this only impact requests made with encode_multipart=True?

You are correct. Specifically, multipart/form-data uses the same header encoding strategy as email for each part. And the proper way to do this for email clients is to use RFC 2231.
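For reference, here is a minimal sketch of what RFC 2231 parameter encoding produces, using only Python's standard `email.utils.encode_rfc2231`. The helper name `rfc2231_param` is made up for illustration; this is an assumption-laden sketch, not urllib3's implementation, and it skips escaping of quotes inside ASCII values.

```python
from email.utils import encode_rfc2231


def rfc2231_param(name: str, value: str) -> str:
    """Format one header parameter, using RFC 2231 only when needed."""
    try:
        value.encode("ascii")
    except UnicodeEncodeError:
        # Non-ASCII: emit name*=charset'language'percent-encoded-bytes.
        return f"{name}*={encode_rfc2231(value, 'utf-8')}"
    # Plain ASCII values can stay in the simple quoted form
    # (escaping of embedded quotes is omitted in this sketch).
    return f'{name}="{value}"'


print(rfc2231_param("filename", "αλήθεια.txt"))
print(rfc2231_param("filename", "report.txt"))
```

A receiver that only understands the plain `filename="..."` form will not see a filename at all in the first case, which is the interoperability concern being discussed here.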

> What are your thoughts on making it configurable, in a similar way to how our multipart boundary can be configured as a part of RequestMethods.request_encode_body, and defaulting to RFC2231? Something like multipart_field_format='rfc2231' or 'html5'?

To be clear, my concern isn't backwards compatibility, just the lack of certainty that this makes things any better for people. Exposing this as an option would require extra work in requests to utilize that or to pass the configuration options on to users. [1] As a result, if this is optional and not default, Requests users won't get it for free.

[1] Unfortunately, our multipart API is already atrocious so adding more to it for this would be a non-starter for me.

Robbt · 2019-01-22T21:08:21Z

I haven't studied the RFCs in depth, but the discussion in #303 seems valid, and it looks like this was first attempted in #304, when RFC 7578 was still a draft. As of 2015 it has rendered RFC2388 obsolete. If I'm not mistaken, the usage of RFC2231 encoding was done in accordance with RFC2388.

If we aren't worried about backwards compatibility then it would seem like going with the new standard method of multi-part encoding of filenames would make the most sense. From my reasoning it is unlikely that anyone will be adding support for a 22+ year old standard that has been rendered obsolete whereas I know that a number of people from various bug reports have had issues with the way UTF8 filenames are encoded.

I could understand if there were counter-examples where a server would only work with RFC2231 and not support the newer standard but moving forward I think it would make sense to support the newest standard. Like I said before I can't totally vouch that this code does everything it needs to do at this point but it did solve the problem I ran into and so I'm willing to help try to hash it out so others can benefit from this change. I have contributed far less and have far less skin in the game than anyone else here at this point so my interest is just trying to figure out the best technical approach.

If this isn't going to be merged I think that I'll have to revisit the curl script that someone wrote to accomplish the task I was trying to do but I'd really rather accomplish the whole task in python3 as the code is already written but I don't want to distribute a module that requires someone to use my non-standard repository for basic functionality.

theacodes · 2019-03-11T01:57:51Z

Okay, I'm dragging this back up as these unicode issues seem to still be popping up for users.

Broadly speaking, I'm in favor of us matching browser behavior, especially Firefox. If there is a server out there that can't read files uploaded using multipart/form-data from a web browser it is broken beyond our help, as multipart/form-data literally exists for the benefit of browsers.

@sigmavirus24 can you help me understand your reservations beyond just not being sure that the HTML5 encoding scheme will fix all problems? Do we have counter-examples where using it would be problematic? (e.g., if you know already that Flask or Django outright chokes on the HTML5 format, that would be a great datapoint for us)

Otherwise, I'd like to move forward with a slightly modified version of this:

- We default to html5 parsing.
- We allow changing the parsing scheme in code.
- We allow changing the default back to RFC2231 using an environment variable, such as URLLIB3_USE_RFC2231_BY_DEFAULT, for the benefit of Requests users that might run into issues with the new default.

Thoughts? @sethmlarson?

sethmlarson · 2019-03-11T02:16:50Z

I'm in favor of doing what browsers find to be best. Wonder what curl does, probably propagates whatever you give it? I'm +1 to what you've mentioned.

theacodes · 2019-03-11T02:22:34Z

Great, let's wait to hear from @sigmavirus24. If he's on board, I can take on bringing this PR into the proposed state.

theacodes · 2019-03-11T02:48:08Z

Okay, here's a comparison of httpie (which uses Requests) and curl.

httpie

Here's the command:

http -v -f POST https://httpbin.org/post αλήθεια@αλήθεια.txt

As expected, it uses RFC2231:

http -v -f POST https://httpbin.org/post αλήθεια@αλήθεια.txt
POST /post HTTP/1.1
Accept: */*
Accept-Encoding: gzip, deflate
Connection: keep-alive
Content-Length: 263
Content-Type: multipart/form-data; boundary=095b687d6199b2436196354e281e2e8e
Host: httpbin.org
User-Agent: HTTPie/1.0.2

--095b687d6199b2436196354e281e2e8e
Content-Disposition: form-data; name*=utf-8''%CE%B1%CE%BB%CE%AE%CE%B8%CE%B5%CE%B9%CE%B1; filename*=utf-8''%CE%B1%CE%BB%CE%AE%CE%B8%CE%B5%CE%B9%CE%B1.txt
Content-Type: text/plain

Meep

--095b687d6199b2436196354e281e2e8e--

HTTP/1.1 200 OK
Access-Control-Allow-Credentials: true
Access-Control-Allow-Origin: *
Connection: keep-alive
Content-Encoding: gzip
Content-Length: 304
Content-Type: application/json
Date: Mon, 11 Mar 2019 02:39:57 GMT
Server: nginx

{
    "args": {},
    "data": "",
    "files": {
        "αλήθεια": "Meep\n"
    },
    "form": {},
    "headers": {
        "Accept": "*/*",
        "Accept-Encoding": "gzip, deflate",
        "Content-Length": "263",
        "Content-Type": "multipart/form-data; boundary=095b687d6199b2436196354e281e2e8e",
        "Host": "httpbin.org",
        "User-Agent": "HTTPie/1.0.2"
    },
    "json": null,
    "origin": "104.232.115.83, 104.232.115.83",
    "url": "https://httpbin.org/post"
}

Relevant line:

Content-Disposition: form-data; name*=utf-8''%CE%B1%CE%BB%CE%AE%CE%B8%CE%B5%CE%B9%CE%B1; filename*=utf-8''%CE%B1%CE%BB%CE%AE%CE%B8%CE%B5%CE%B9%CE%B1.txt
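To make the line above concrete, here is a small sketch of what a receiver has to do to recover the original name from a `name*=` / `filename*=` value. It uses only `urllib.parse.unquote` from the standard library; the function name `decode_ext_value` is hypothetical, and real parsers also have to handle continuations and malformed input, which this sketch ignores.

```python
from urllib.parse import unquote


def decode_ext_value(ext_value: str) -> str:
    """Decode the value of a `name*=` parameter: charset'language'pct-bytes."""
    charset, _language, encoded = ext_value.split("'", 2)
    # Percent-decode to bytes, then interpret them in the declared charset.
    return unquote(encoded, encoding=charset or "us-ascii", errors="strict")


print(decode_ext_value("utf-8''%CE%B1%CE%BB%CE%AE%CE%B8%CE%B5%CE%B9%CE%B1.txt"))
```

A server that skips this step, or that only looks for a plain `filename=` parameter, is the failure mode people have been reporting.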

curl

Curl apparently uses HTML5-style encoding: it sends the raw UTF-8 bytes inside the quoted name and filename.

Here's the command:

curl --trace - -F "αλήθεια=@αλήθεια.txt" https://httpbin.org/post

And the output:

curl --trace - -F "αλήθεια=@αλήθεια.txt" https://httpbin.org/post
== Info:   Trying 3.85.154.144...
== Info: Connected to httpbin.org (3.85.154.144) port 443 (#0)
== Info: found 173 certificates in /etc/ssl/certs/ca-certificates.crt
== Info: found 694 certificates in /etc/ssl/certs
== Info: ALPN, offering http/1.1
== Info: SSL connection using TLS1.2 / ECDHE_RSA_AES_128_GCM_SHA256
== Info:         server certificate verification OK
== Info:         server certificate status verification SKIPPED
== Info:         common name: httpbin.org (matched)
== Info:         server certificate expiration date OK
== Info:         server certificate activation date OK
== Info:         certificate public key: RSA
== Info:         certificate version: #3
== Info:         subject: CN=httpbin.org
== Info:         start date: Sun, 17 Feb 2019 00:00:00 GMT
== Info:         expire date: Tue, 17 Mar 2020 12:00:00 GMT
== Info:         issuer: C=US,O=Amazon,OU=Server CA 1B,CN=Amazon
== Info:         compression: NULL
== Info: ALPN, server did not agree to a protocol
=> Send header, 209 bytes (0xd1)
0000: 50 4f 53 54 20 2f 70 6f 73 74 20 48 54 54 50 2f POST /post HTTP/
0010: 31 2e 31 0d 0a 48 6f 73 74 3a 20 68 74 74 70 62 1.1..Host: httpb
0020: 69 6e 2e 6f 72 67 0d 0a 55 73 65 72 2d 41 67 65 in.org..User-Age
0030: 6e 74 3a 20 63 75 72 6c 2f 37 2e 34 37 2e 30 0d nt: curl/7.47.0.
0040: 0a 41 63 63 65 70 74 3a 20 2a 2f 2a 0d 0a 43 6f .Accept: */*..Co
0050: 6e 74 65 6e 74 2d 4c 65 6e 67 74 68 3a 20 32 31 ntent-Length: 21
0060: 31 0d 0a 45 78 70 65 63 74 3a 20 31 30 30 2d 63 1..Expect: 100-c
0070: 6f 6e 74 69 6e 75 65 0d 0a 43 6f 6e 74 65 6e 74 ontinue..Content
0080: 2d 54 79 70 65 3a 20 6d 75 6c 74 69 70 61 72 74 -Type: multipart
0090: 2f 66 6f 72 6d 2d 64 61 74 61 3b 20 62 6f 75 6e /form-data; boun
00a0: 64 61 72 79 3d 2d 2d 2d 2d 2d 2d 2d 2d 2d 2d 2d dary=-----------
00b0: 2d 2d 2d 2d 2d 2d 2d 2d 2d 2d 2d 2d 2d 34 62 32 -------------4b2
00c0: 38 37 64 37 30 38 62 66 63 62 34 36 33 0d 0a 0d 87d708bfcb463...
00d0: 0a                                              .
<= Recv header, 23 bytes (0x17)
0000: 48 54 54 50 2f 31 2e 31 20 31 30 30 20 43 6f 6e HTTP/1.1 100 Con
0010: 74 69 6e 75 65 0d 0a                            tinue..
=> Send data, 158 bytes (0x9e)
0000: 2d 2d 2d 2d 2d 2d 2d 2d 2d 2d 2d 2d 2d 2d 2d 2d ----------------
0010: 2d 2d 2d 2d 2d 2d 2d 2d 2d 2d 34 62 32 38 37 64 ----------4b287d
0020: 37 30 38 62 66 63 62 34 36 33 0d 0a 43 6f 6e 74 708bfcb463..Cont
0030: 65 6e 74 2d 44 69 73 70 6f 73 69 74 69 6f 6e 3a ent-Disposition:
0040: 20 66 6f 72 6d 2d 64 61 74 61 3b 20 6e 61 6d 65  form-data; name
0050: 3d 22 ce b1 ce bb ce ae ce b8 ce b5 ce b9 ce b1 ="..............
0060: 22 3b 20 66 69 6c 65 6e 61 6d 65 3d 22 ce b1 ce "; filename="...
0070: bb ce ae ce b8 ce b5 ce b9 ce b1 2e 74 78 74 22 ............txt"
0080: 0d 0a 43 6f 6e 74 65 6e 74 2d 54 79 70 65 3a 20 ..Content-Type:
0090: 74 65 78 74 2f 70 6c 61 69 6e 0d 0a 0d 0a       text/plain....
=> Send data, 5 bytes (0x5)
0000: 4d 65 65 70 0a                                  Meep.
=> Send data, 48 bytes (0x30)
0000: 0d 0a 2d 2d 2d 2d 2d 2d 2d 2d 2d 2d 2d 2d 2d 2d ..--------------
0010: 2d 2d 2d 2d 2d 2d 2d 2d 2d 2d 2d 2d 34 62 32 38 ------------4b28
0020: 37 64 37 30 38 62 66 63 62 34 36 33 2d 2d 0d 0a 7d708bfcb463--..
<= Recv header, 17 bytes (0x11)
0000: 48 54 54 50 2f 31 2e 31 20 32 30 30 20 4f 4b 0d HTTP/1.1 200 OK.
0010: 0a                                              .
<= Recv header, 40 bytes (0x28)
0000: 41 63 63 65 73 73 2d 43 6f 6e 74 72 6f 6c 2d 41 Access-Control-A
0010: 6c 6c 6f 77 2d 43 72 65 64 65 6e 74 69 61 6c 73 llow-Credentials
0020: 3a 20 74 72 75 65 0d 0a                         : true..
<= Recv header, 32 bytes (0x20)
0000: 41 63 63 65 73 73 2d 43 6f 6e 74 72 6f 6c 2d 41 Access-Control-A
0010: 6c 6c 6f 77 2d 4f 72 69 67 69 6e 3a 20 2a 0d 0a llow-Origin: *..
<= Recv header, 32 bytes (0x20)
0000: 43 6f 6e 74 65 6e 74 2d 54 79 70 65 3a 20 61 70 Content-Type: ap
0010: 70 6c 69 63 61 74 69 6f 6e 2f 6a 73 6f 6e 0d 0a plication/json..
<= Recv header, 37 bytes (0x25)
0000: 44 61 74 65 3a 20 4d 6f 6e 2c 20 31 31 20 4d 61 Date: Mon, 11 Ma
0010: 72 20 32 30 31 39 20 30 32 3a 34 31 3a 35 32 20 r 2019 02:41:52
0020: 47 4d 54 0d 0a                                  GMT..
<= Recv header, 15 bytes (0xf)
0000: 53 65 72 76 65 72 3a 20 6e 67 69 6e 78 0d 0a    Server: nginx..
<= Recv header, 21 bytes (0x15)
0000: 43 6f 6e 74 65 6e 74 2d 4c 65 6e 67 74 68 3a 20 Content-Length:
0010: 34 35 35 0d 0a                                  455..
<= Recv header, 24 bytes (0x18)
0000: 43 6f 6e 6e 65 63 74 69 6f 6e 3a 20 6b 65 65 70 Connection: keep
0010: 2d 61 6c 69 76 65 0d 0a                         -alive..
<= Recv header, 2 bytes (0x2)
0000: 0d 0a                                           ..
<= Recv data, 455 bytes (0x1c7)
0000: 7b 0a 20 20 22 61 72 67 73 22 3a 20 7b 7d 2c 20 {.  "args": {},
0010: 0a 20 20 22 64 61 74 61 22 3a 20 22 22 2c 20 0a .  "data": "", .
0020: 20 20 22 66 69 6c 65 73 22 3a 20 7b 0a 20 20 20   "files": {.
0030: 20 22 5c 75 30 33 62 31 5c 75 30 33 62 62 5c 75  "\u03b1\u03bb\u
0040: 30 33 61 65 5c 75 30 33 62 38 5c 75 30 33 62 35 03ae\u03b8\u03b5
0050: 5c 75 30 33 62 39 5c 75 30 33 62 31 22 3a 20 22 \u03b9\u03b1": "
0060: 4d 65 65 70 5c 6e 22 0a 20 20 7d 2c 20 0a 20 20 Meep\n".  }, .
0070: 22 66 6f 72 6d 22 3a 20 7b 7d 2c 20 0a 20 20 22 "form": {}, .  "
0080: 68 65 61 64 65 72 73 22 3a 20 7b 0a 20 20 20 20 headers": {.
0090: 22 41 63 63 65 70 74 22 3a 20 22 2a 2f 2a 22 2c "Accept": "*/*",
00a0: 20 0a 20 20 20 20 22 43 6f 6e 74 65 6e 74 2d 4c  .    "Content-L
00b0: 65 6e 67 74 68 22 3a 20 22 32 31 31 22 2c 20 0a ength": "211", .
00c0: 20 20 20 20 22 43 6f 6e 74 65 6e 74 2d 54 79 70     "Content-Typ
00d0: 65 22 3a 20 22 6d 75 6c 74 69 70 61 72 74 2f 66 e": "multipart/f
00e0: 6f 72 6d 2d 64 61 74 61 3b 20 62 6f 75 6e 64 61 orm-data; bounda
00f0: 72 79 3d 2d 2d 2d 2d 2d 2d 2d 2d 2d 2d 2d 2d 2d ry=-------------
0100: 2d 2d 2d 2d 2d 2d 2d 2d 2d 2d 2d 34 62 32 38 37 -----------4b287
0110: 64 37 30 38 62 66 63 62 34 36 33 22 2c 20 0a 20 d708bfcb463", .
0120: 20 20 20 22 48 6f 73 74 22 3a 20 22 68 74 74 70    "Host": "http
0130: 62 69 6e 2e 6f 72 67 22 2c 20 0a 20 20 20 20 22 bin.org", .    "
0140: 55 73 65 72 2d 41 67 65 6e 74 22 3a 20 22 63 75 User-Agent": "cu
0150: 72 6c 2f 37 2e 34 37 2e 30 22 0a 20 20 7d 2c 20 rl/7.47.0".  },
0160: 0a 20 20 22 6a 73 6f 6e 22 3a 20 6e 75 6c 6c 2c .  "json": null,
0170: 20 0a 20 20 22 6f 72 69 67 69 6e 22 3a 20 22 31  .  "origin": "1
0180: 30 34 2e 32 33 32 2e 31 31 35 2e 38 33 2c 20 31 04.232.115.83, 1
0190: 30 34 2e 32 33 32 2e 31 31 35 2e 38 33 22 2c 20 04.232.115.83",
01a0: 0a 20 20 22 75 72 6c 22 3a 20 22 68 74 74 70 73 .  "url": "https
01b0: 3a 2f 2f 68 74 74 70 62 69 6e 2e 6f 72 67 2f 70 ://httpbin.org/p
01c0: 6f 73 74 22 0a 7d 0a                            ost".}.
{
  "args": {},
  "data": "",
  "files": {
    "\u03b1\u03bb\u03ae\u03b8\u03b5\u03b9\u03b1": "Meep\n"
  },
  "form": {},
  "headers": {
    "Accept": "*/*",
    "Content-Length": "211",
    "Content-Type": "multipart/form-data; boundary=------------------------4b287d708bfcb463",
    "Host": "httpbin.org",
    "User-Agent": "curl/7.47.0"
  },
  "json": null,
  "origin": "104.232.115.83, 104.232.115.83",
  "url": "https://httpbin.org/post"
}
== Info: Connection #0 to host httpbin.org left intact

Relevant section:

=> Send data, 158 bytes (0x9e)
0000: 2d 2d 2d 2d 2d 2d 2d 2d 2d 2d 2d 2d 2d 2d 2d 2d ----------------
0010: 2d 2d 2d 2d 2d 2d 2d 2d 2d 2d 34 62 32 38 37 64 ----------4b287d
0020: 37 30 38 62 66 63 62 34 36 33 0d 0a 43 6f 6e 74 708bfcb463..Cont
0030: 65 6e 74 2d 44 69 73 70 6f 73 69 74 69 6f 6e 3a ent-Disposition:
0040: 20 66 6f 72 6d 2d 64 61 74 61 3b 20 6e 61 6d 65  form-data; name
0050: 3d 22 ce b1 ce bb ce ae ce b8 ce b5 ce b9 ce b1 ="..............
0060: 22 3b 20 66 69 6c 65 6e 61 6d 65 3d 22 ce b1 ce "; filename="...
0070: bb ce ae ce b8 ce b5 ce b9 ce b1 2e 74 78 74 22 ............txt"
0080: 0d 0a 43 6f 6e 74 65 6e 74 2d 54 79 70 65 3a 20 ..Content-Type:
0090: 74 65 78 74 2f 70 6c 61 69 6e 0d 0a 0d 0a       text/plain....

Where the raw UTF-8 bytes of αλήθεια start at 0x52 (in name) and again at 0x6d (in filename).
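As a rough sketch of the HTML5-style encoding curl is producing here: the value stays in a quoted string and goes on the wire as raw UTF-8. The helper `html5_param` is hypothetical, and its escaping follows the rule in the current WHATWG HTML standard (`"`, CR and LF are percent-escaped); whatever urllib3 ends up merging may differ in detail.

```python
def html5_param(name: str, value: str) -> bytes:
    """Format one parameter HTML5-style: quoted, sent as raw UTF-8."""
    # Only characters that would break the quoted string or the header
    # line are percent-escaped; everything else is kept as-is.
    escaped = (
        value.replace('"', "%22").replace("\r", "%0D").replace("\n", "%0A")
    )
    return f'{name}="{escaped}"'.encode("utf-8")


header = b"Content-Disposition: form-data; " + html5_param("name", "αλήθεια")
header += b"; " + html5_param("filename", "αλήθεια.txt")
# Print as hex so the bytes can be compared with the curl trace.
print(header.hex(" "))
```

The hex output contains the same `ce b1 ce bb ce ae ce b8 ce b5 ce b9 ce b1` run that appears twice in the trace.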

sigmavirus24 · 2019-03-11T13:53:00Z

> (e.g., if you know already that Flask or Django outright chokes on the HTML5 format, that would be a great datapoint for us)

I don't know that already. I'm just familiar with the ASCII-centric view of most WSGI/Rack/etc based frameworks and the fact that they flat out don't keep up with decades old web standards, let alone those released a handful of years ago.

I'm not certain this will break things that aren't already broken either so :Shrug:

As for using environment variables to control behaviour, that way lies dragons from my experiences in Requests. It can be a great control rod for "power" users but I'm not sure this is a control rod we need right now. Besides, it's easier to say no today and yes tomorrow.

theacodes · 2019-03-11T16:05:32Z

Totally agree with "Besides, it's easier to say no today and yes tomorrow." but we've been saying no to this for so long that I think we're actively harming users. I don't want to do something that'll break Requests, but I also don't want to continue sitting on our hands here.

sethmlarson · 2019-03-11T16:29:05Z

@theacodes I might be wrong but sigma may have wanted that comment to apply to the environment variable controlling behavior instead of the general idea of using HTML5 encoding.

Correct me if I'm wrong, that's how I understood your comment @sigmavirus24

theacodes · 2019-03-11T16:33:09Z

Oh, in that case I'm happy to leave that out until someone needs it.

shazow · 2019-03-11T16:52:55Z

+1 to making the encoding pluggable (maybe we can provide an encoder/decoder to override the default; we can even provide the two encoder options out of the box and consumers can choose whichever downstream), but -0.5 to making it controlled by an environment variable.

I feel env-based internal knobs in libraries produce unexpected precedence bugs, and it's hard to draw a line of what should be in an env var vs not, and also makes it hard to deprecate noticeably. IMO env knobs make more sense in end-user applications, rather than libraries.

FWIW, "breaking things that are already broken" at least puts us on the right side of history in the long run vs perpetuating broken behaviour. :) I generally vote for some acute short term pain in favour of progress. It's also usually the less burdensome path for maintainers.
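The pluggable-encoder idea above could be sketched roughly like this: the field takes a callable that formats one header parameter, and both styles ship as ready-made callables. Every name here (`Field`, `format_html5`, `format_rfc2231`, `formatter`) is hypothetical and chosen for illustration only; it is not the API of this PR.

```python
from typing import Callable, Optional
from urllib.parse import quote

# A formatter turns (parameter name, value) into one header parameter.
Formatter = Callable[[str, str], str]


def format_html5(name: str, value: str) -> str:
    """Quoted value, non-ASCII left as-is (sent as UTF-8 on the wire)."""
    escaped = value.replace('"', "%22").replace("\r", "%0D").replace("\n", "%0A")
    return f'{name}="{escaped}"'


def format_rfc2231(name: str, value: str) -> str:
    """RFC 2231 extended form for non-ASCII values, quoted form otherwise."""
    if value.isascii():
        return f'{name}="{value}"'
    return f"{name}*=utf-8''{quote(value, safe='')}"


class Field:
    """Hypothetical stand-in for a multipart field; not urllib3's class."""

    def __init__(self, name: str, filename: Optional[str] = None,
                 formatter: Formatter = format_html5) -> None:
        self.name = name
        self.filename = filename
        self.formatter = formatter

    def content_disposition(self) -> str:
        parts = ["form-data", self.formatter("name", self.name)]
        if self.filename is not None:
            parts.append(self.formatter("filename", self.filename))
        return "; ".join(parts)


print(Field("αλήθεια", "αλήθεια.txt").content_disposition())
print(Field("αλήθεια", "αλήθεια.txt", formatter=format_rfc2231).content_disposition())
```

Passing the callable explicitly keeps the choice visible at the call site, which avoids the precedence questions an environment variable would raise.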

tox.ini

theacodes · 2019-03-11T21:44:50Z

Confirmed that this works as expected with Flask.

Urllib3 request:

import urllib3
http = urllib3.PoolManager()
http.request("POST", "http://localhost:5000", fields={"αλήθεια": ("αλήθεια.txt", "Meep", "text/plain")})

On the Flask side, the parsed files read:

ImmutableMultiDict([('αλήθεια', <FileStorage: 'αλήθεια.txt' ('text/plain')>)])

theacodes · 2019-03-11T22:25:24Z

Giving up for today: I can't reproduce the CI's failure locally, which makes debugging this hard.

codecov-io · 2019-03-12T02:00:20Z

Codecov Report

Merging #1492 into master will decrease coverage by 0.04%.
The diff coverage is 100%.

@@            Coverage Diff            @@
##           master   #1492      +/-   ##
=========================================
- Coverage   99.94%   99.9%   -0.05%     
=========================================
  Files          22      22              
  Lines        1826    2075     +249     
=========================================
+ Hits         1825    2073     +248     
- Misses          1       2       +1

Impacted Files	Coverage Δ
src/urllib3/fields.py	`100% <100%> (ø)`	⬆️
src/urllib3/connection.py	`99.01% <0%> (-0.99%)`	⬇️
src/urllib3/util/timeout.py	`100% <0%> (ø)`	⬆️
src/urllib3/response.py	`100% <0%> (ø)`	⬆️
src/urllib3/connectionpool.py	`100% <0%> (ø)`	⬆️
src/urllib3/util/url.py	`100% <0%> (+1.13%)`	⬆️

Δ = absolute <relative> (impact), ø = not affected, ? = missing data

theacodes · 2019-03-12T02:17:45Z

Alright, @sethmlarson this is ready for review. Most importantly, I want to make sure you're comfortable with the new parameter name header_encoder.

sigmavirus24 · 2019-03-12T11:50:32Z

Personally +1 on header_encoder. Also @sethmlarson understood my objection to be the environment variable appropriately. This all sounds good to me.

sethmlarson

Found some issues and I have some questions as well.

dummyserver/handlers.py

src/urllib3/fields.py

sethmlarson · 2019-03-12T12:27:35Z

Also, any thoughts on adding a section to the docs about how to switch to the previous or a custom behavior? Basically, documenting the pluggability.

theacodes

@sethmlarson thanks for the thorough review (apparently I'm quite rusty!) should be much better now.

src/urllib3/fields.py

sethmlarson

I added one more test case for control characters and fixed the docs failure on Travis, you can take a look at my commits @theacodes.

Assuming I didn't break anything with my latest commits this looks good to me! Thanks for picking this up and running with it. :)

theacodes · 2019-03-23T03:35:40Z

Thanks, @sethmlarson! You can merge when you're ready. :)

theacodes · 2019-03-23T03:53:00Z

So happy to finally put this one to bed. Thanks, everyone! 💜

sethmlarson · 2019-03-23T03:53:20Z

Woo! 🎉 Thank you @Spryttan for opening the original PR and @Robbt for rebasing. This is a great change. :)

…3#1492)

James Elmendorf and others added 5 commits May 3, 2016 16:40

Merge remote-tracking branch 'refs/remotes/shazow/master'

45b8f40

Changed Mapping from String to Formatting Function to Specifying Call…

d91467d

…able Directly

attempt to merge files

5cd19a5

sethmlarson self-requested a review December 3, 2018 20:05

theacodes added the Solution Proposed label Mar 11, 2019

theacodes self-assigned this Mar 11, 2019

Fixing up some stuff

ac01642

sethmlarson reviewed Mar 11, 2019

theacodes added 2 commits March 11, 2019 15:34

smh unicode

f28b7e8

Fix isinstance check

3eafc75

urllib3 deleted a comment from codecov-io Mar 12, 2019

Fix typo... :(

db63197

theacodes added Ready and removed Solution Proposed labels Mar 12, 2019

sethmlarson requested changes Mar 12, 2019

sethmlarson reviewed Mar 12, 2019

src/urllib3/fields.py

sethmlarson reviewed Mar 12, 2019

src/urllib3/fields.py

Address review comments

131d0d5

theacodes reviewed Mar 22, 2019

src/urllib3/fields.py

sethmlarson added 2 commits March 22, 2019 22:17

Fix the HTML5 Working Draft link

6d00228

Add unit test for HTML5 control character handling

ea9003e

sethmlarson approved these changes Mar 23, 2019

sethmlarson merged commit 46331f9 into urllib3:master Mar 23, 2019

joesecurity mentioned this pull request Apr 11, 2019

_post function mangles Unicode filenames joesecurity/jbxapi#10

Closed

This was referenced Aug 21, 2019

Mailgun: disable workaround for Unicode attachment filenames if fixed urllib3 in use anymail/django-anymail#157

Closed

Non-ASCII filename uploads don't comply with RFC 7578 psf/requests#4652

Closed

blueset mentioned this pull request Aug 25, 2019

is telegram.vendor.ptb_urllib3.urllib3 necessary for ETM? ehForwarderBot/efb-telegram-master#68

Closed

davidism mentioned this pull request Jun 9, 2021

Bug: Content-Disposition Header Lacking "filename" When "filename*" Is Present #1062

Closed

Dobatymo pushed a commit to Dobatymo/urllib3 that referenced this pull request Mar 16, 2022

Encode field names using HTML5 by default instead of RFC 2231 (urllib…

b2aabbc

…3#1492)

mkurz mentioned this pull request Jan 12, 2023

Escape Content-Disposition params according to WHATWG HTML living standard playframework/playframework#11571

Merged
