FormHttpMessageConverter writes ASCII encoded so that a multipart form data can not contain filenames with German Umlaute [SPR-12108] #16724
Comments
Leif Hanack commented Did this issue slip through the filters because of the Spring One conference :) or how long does a ticket normally wait for triage?
Rossen Stoyanchev commented At the end of the day ASCII is expected, so I'm not sure there is anything more we can do. Is preserving the exact file name in the part name crucial? When supplying the map with parts you could rewrite the part names, replacing non-ASCII characters with their Unicode escape sequences, but then the code on the other side would have to detect those escape sequences and turn them back into Unicode characters. Or you might consider dumbing down the "ö" into "o" (instead of the "?" you get from getBytes) if that is acceptable.
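The two workarounds suggested in this comment could look roughly like the sketch below. It uses only the JDK; the class and method names (`PartNameSanitizer`, `escapeNonAscii`, `stripDiacritics`) are hypothetical and not part of Spring, and the escape format is an assumption that the receiving side would have to agree on and reverse.

```java
import java.text.Normalizer;

// Hypothetical helper, not part of Spring: two ways to make a part name ASCII-safe.
class PartNameSanitizer {

    // Replace every non-ASCII char with a backslash-u escape (four hex digits).
    // The receiver must detect and decode these escapes itself.
    static String escapeNonAscii(String name) {
        StringBuilder sb = new StringBuilder();
        for (char c : name.toCharArray()) {
            if (c < 128) {
                sb.append(c);
            } else {
                sb.append(String.format("\\u%04x", (int) c));
            }
        }
        return sb.toString();
    }

    // Lossy alternative: decompose accented letters (NFD) and drop the
    // combining marks, so an o with diaeresis becomes a plain "o".
    static String stripDiacritics(String name) {
        String decomposed = Normalizer.normalize(name, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}", "");
    }
}
```

With this sketch, `PartNameSanitizer.stripDiacritics("Hallöle.png")` returns `Hallole.png`. Characters without a canonical decomposition (for example "ß") pass through `stripDiacritics` unchanged, so it is not a complete ASCII conversion.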
Leif Hanack commented What feels strange to me is that Spring uses the client's charset (or the UTF-8 default) when it reads from a client (browser). To me that suggests we have already moved away from the original spec; otherwise it would be consistent to accept ASCII only :) So far it is no problem if a client/browser sends non-ASCII encoded data to a Spring backend, but Spring backend to Spring backend communication is not able to do that. That sounds odd. I hate flags, but could this be made optional? What do you think? Maybe instead of getAsciiBytes(String name) something like:
Thanks, Leif
Rossen Stoyanchev commented
Are we still talking about writing and then reading headers? In other words, using FormHttpMessageConverter to create a multipart request which then cannot be processed on the server side? As far as I can see, on the server side we are simply getting the headers through the HttpServletRequest. Can you be more specific?
Leif Hanack commented When I send a multipart form from the browser to a Spring Boot app's RestController (backend A), it handles filenames with German umlauts properly. In this case "FormHttpMessageConverter" is not used. When I want to "forward" that multipart request to another backend (backend B), "FormHttpMessageConverter" is used. In that case the original filename is ASCII encoded, so the German umlauts get lost. Browser -> backend A -> backend B. This seems inconsistent. Either browser -> backend A should be W3C-conformant as well and only use ASCII, or backend A -> backend B should allow UTF-8 encoding as well. I hope I could make my point clear :) Regards, Leif
Rossen Stoyanchev commented Thanks for elaborating. After some further research it appears that Apache Commons FileUpload has a setHeaderEncoding property that is in turn supported in Spring's CommonsMultipartResolver as the setDefaultEncoding property. The encoding is used internally to parse the bytes representing part headers, falling back on the encoding of the request (and the platform encoding as a last resort) if necessary. From discussions it appears this is done to support what browsers actually do; see this discussion and also FILEUPLOAD-199, which shows that as of 1.3 FileUpload also supports the encoded-word syntax from RFC 2047. I haven't found anything in the Servlet 3 spec nor in the API that addresses header encoding, hence I'm not sure what the experience is across containers. I'd be curious to know which you're using (Servlet 3 or Apache Commons) and what server. Based on these findings I'm going to set this for 4.1.1. At the very least we can provide a property for the encoding to use for headers in multipart content, as you suggested, much like what Apache Commons already does on the server (reading) side.
Rossen Stoyanchev commented On further thought, while I think it's okay to adapt to non-standard behavior from browsers, it's not okay to perpetuate such behavior on the client side, which is what we are in this case. I committed a fix that lets you set a multipartCharset property on the converter. It then applies the encoded-word syntax from RFC 2047 via MimeUtility.encodeWord (from javax.mail). There is a test with Apache Commons FileUpload that successfully parses the written data, but could you please give it a try with 4.1.1.BUILD-SNAPSHOT? Thanks.
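To illustrate what the RFC 2047 encoded-word syntax mentioned here looks like, the following is a minimal JDK-only sketch of the "B" (Base64) form, `=?charset?B?data?=`. It is not the code from the fix and not how MimeUtility.encodeWord is implemented; the class name `EncodedWordSketch` is hypothetical, and the sketch ignores the 75-character limit that RFC 2047 places on a single encoded-word.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical illustration of the RFC 2047 "B" encoded-word form.
class EncodedWordSketch {

    // Pure-ASCII text is returned unchanged; anything else is wrapped as
    // =?<charset>?B?<base64 of the text's bytes in that charset>?=
    static String encodeWord(String text, Charset charset) {
        if (StandardCharsets.US_ASCII.newEncoder().canEncode(text)) {
            return text;
        }
        String base64 = Base64.getEncoder().encodeToString(text.getBytes(charset));
        return "=?" + charset.name() + "?B?" + base64 + "?=";
    }
}
```

The result contains only ASCII characters, so it survives an ASCII-only header writer, and a reader that understands encoded words (such as Commons FileUpload 1.3, per the comment above) can recover the original filename.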
Leif Hanack commented Looks great and works! To answer your previous question :) We use spring-boot with Tomcat on backend A and Spring with Tomcat on backend B. Thanks a lot for the quick fix.
Rossen Stoyanchev commented Note that since the |
Leif Hanack opened SPR-12108 and commented
Hi there,
We encountered a problem using FormHttpMessageConverter when writing a multipart request. Our use case is to create a multipart request on the server side and send it to another service. The multipart request contains a FileSystemResource whose filename contains German umlauts (e.g. "Hallöle.png").
Unfortunately FormHttpMessageConverter.write uses getAsciiBytes. This seems to be W3C compliant.
We tried a couple of things, even javax.mail.internet.MimeUtility.decode, but nothing worked for us.
We were only able to fix it by changing
name.getBytes("US-ASCII")
to
name.getBytes("UTF-8")
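The difference between the two calls can be seen in a small JDK-only sketch. The class name `FilenameBytesDemo` is hypothetical; the sketch only encodes a name and decodes it again with the same charset to show what a reader would get back.

```java
import java.nio.charset.Charset;

// Hypothetical demo: encode a filename to bytes and decode it again.
class FilenameBytesDemo {

    // String.getBytes(Charset) substitutes the charset's replacement byte
    // ('?' for US-ASCII) for every character it cannot encode.
    static String roundTrip(String name, Charset charset) {
        byte[] bytes = name.getBytes(charset);
        return new String(bytes, charset);
    }
}
```

With US-ASCII the "ö" cannot be encoded and comes back as "?", which matches the lost umlaut described in this report; with UTF-8 the name round-trips unchanged.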
What would you recommend? How should non-ASCII characters be handled?
Thanks a lot, Leif
Affects: 4.0.6
Issue Links:
Referenced from: commits 7f43f02, 9be0cf2