FormHttpMessageConverter writes ASCII encoded so that a multipart form data can not contain filenames with German Umlaute [SPR-12108] #16724

Closed
spring-projects-issues opened this issue Aug 21, 2014 · 9 comments
Labels
in: web Issues in web modules (web, webmvc, webflux, websocket) type: enhancement A general enhancement

spring-projects-issues commented Aug 21, 2014

Leif Hanack opened SPR-12108 and commented

Hi there,

We encountered a problem using FormHttpMessageConverter when writing a multipart request. Our use case is to create a multipart request on the server side and send it to another service. The multipart request contains a FileSystemResource whose filename includes German umlauts (e.g. "Hallöle.png").

Unfortunately FormHttpMessageConverter.write uses getAsciiBytes. This seems to be W3C compliant.

Control names originally encoded in non-ASCII character sets may be encoded using the method outlined in [RFC2045]. (http://www.w3.org/TR/html401/interact/forms.html#h-17.13.4.1)

We tried a couple of things, even javax.mail.internet.MimeUtility.decode, but nothing worked for us.

We were only able to get it fixed when changing
name.getBytes("US-ASCII")
to
name.getBytes("UTF-8")
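
To make the difference between the two calls concrete, here is a minimal sketch using only the JDK. The class and method names are made up for illustration and are not part of Spring; the umlaut is written as a Unicode escape so the source file encoding does not matter.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of Spring.
class FilenameBytesDemo {

    // US-ASCII cannot represent the umlaut, so String.getBytes substitutes '?'.
    static byte[] asciiBytes(String name) {
        return name.getBytes(StandardCharsets.US_ASCII);
    }

    // UTF-8 encodes the umlaut as a two-byte sequence, so nothing is lost.
    static byte[] utf8Bytes(String name) {
        return name.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String name = "Hall\u00f6le.png";
        // prints Hall?le.png
        System.out.println(new String(asciiBytes(name), StandardCharsets.US_ASCII));
        // 11 bytes as ASCII, 12 bytes as UTF-8
        System.out.println(asciiBytes(name).length + " vs " + utf8Bytes(name).length);
    }
}
```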

What would you recommend? How to deal with NON-ASCII characters?

Thanks a lot, Leif


Affects: 4.0.6

Issue Links:

Referenced from: commits 7f43f02, 9be0cf2


Leif Hanack commented

Did this issue slip through the filters because of the SpringOne conference :) or how long does a ticket normally wait for triage?
Thank you, Leif


Rossen Stoyanchev commented

At the end of the day ASCII is expected so I'm not sure there is anything more we can do.

Is preserving the exact file name in the part name crucial? When supplying the map with parts you could rewrite part names, replacing non-ASCII characters with their Unicode escape sequences, but then code on the other side would have to detect those escape sequences and turn them back into Unicode characters. Or you might consider dumbing down the "ö" to "o" (instead of the "?" you get from getBytes) if that is acceptable.
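
Both workarounds can be sketched with the JDK alone. The helper names below are hypothetical, and the escape format (a backslash, the letter u, and four hex digits) is just one possible convention that sender and receiver would have to agree on. Folding uses Unicode NFD normalization and drops combining marks, which handles "ö" but not characters without a decomposition such as "ß".

```java
import java.text.Normalizer;

// Hypothetical helpers, not part of Spring.
class AsciiSafeNames {

    // Replaces every non-ASCII char (and the backslash itself, to keep the
    // format unambiguous) with a backslash-u escape of four hex digits.
    static String escapeNonAscii(String name) {
        StringBuilder sb = new StringBuilder();
        for (char c : name.toCharArray()) {
            if (c < 128 && c != '\\') {
                sb.append(c);
            } else {
                sb.append(String.format("\\u%04x", (int) c));
            }
        }
        return sb.toString();
    }

    // Reverses escapeNonAscii on the receiving side.
    static String unescape(String escaped) {
        StringBuilder sb = new StringBuilder();
        int i = 0;
        while (i < escaped.length()) {
            if (escaped.startsWith("\\u", i) && i + 6 <= escaped.length()) {
                sb.append((char) Integer.parseInt(escaped.substring(i + 2, i + 6), 16));
                i += 6;
            } else {
                sb.append(escaped.charAt(i));
                i++;
            }
        }
        return sb.toString();
    }

    // Lossy alternative: decompose accented letters and drop the combining marks.
    static String foldToAscii(String name) {
        return Normalizer.normalize(name, Normalizer.Form.NFD).replaceAll("\\p{M}", "");
    }

    public static void main(String[] args) {
        String name = "Hall\u00f6le.png";
        String escaped = escapeNonAscii(name);
        System.out.println(escaped);
        System.out.println(unescape(escaped).equals(name)); // prints true
        System.out.println(foldToAscii(name));              // prints Hallole.png
    }
}
```

The escaped form round-trips exactly but needs cooperating code on the receiver; the folded form needs no cooperation but loses the original name.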


Leif Hanack commented

What feels strange to me is that Spring will use the client's charset (or default to UTF-8) when it reads from a client (browser). To me that means we have already moved away from the original spec; otherwise it would be consistent to accept ASCII only :) So far it is no problem for a client/browser to send non-ASCII encoded data to a Spring backend, but Spring backend to Spring backend communication cannot do that. That seems odd.

I hate flags but it might be an option to make it optional? What do you think?

Maybe instead of getAsciiBytes(String name) something like:

private byte[] getBytes(String name) {
  try {
    // useAscii is a configurable flag that is true by default;
    // this.charset is the converter's configured charset
    String charsetName = this.useAscii ? "US-ASCII" : this.charset.name();
    return name.getBytes(charsetName);
  }
  catch (UnsupportedEncodingException ex) {
    throw new IllegalStateException(ex);
  }
}

Thanks, Leif


Rossen Stoyanchev commented

What feels strange to me is that Spring will use the client's charset (or default to UTF-8) when it reads from a client (browser).

We're still talking about writing and then reading headers? In other words using FormHttpMessageConverter to create a multipart request which then cannot be processed on the server side? As far as I can see on the server side we are simply getting the headers through the HttpServletRequest. Can you be more specific?


Leif Hanack commented

When I send a multipart form from the browser to a Spring Boot app's RestController (backend A), it handles filenames with German umlauts properly. In this case FormHttpMessageConverter is not used. When I want to "forward" that multipart request to another backend (backend B), FormHttpMessageConverter is used. In that case the original filename is ASCII encoded, so the German umlauts get lost.

| Browser | -> | backend A | -> | backend B |
| | uses UTF-8, works | | uses ASCII, doesn't work as expected | |

This seems inconsistent. Either browser -> backend A should be W3C-conformant as well and use only ASCII, or backend A -> backend B should allow UTF-8 encoding as well.

I hope I made my point clear :)

Regards, Leif


Rossen Stoyanchev commented

Thanks for elaborating.

After some further research it appears that Apache Commons FileUpload has a setHeaderEncoding property that is in turn supported in Spring's CommonsMultipartResolver as the setDefaultEncoding property. The encoding is used internally to parse the bytes representing part headers, falling back on the encoding of the request (and the platform encoding as a last resort) if necessary. From discussions it appears this is done to support what browsers actually do -- see this discussion and also FILEUPLOAD-199, which shows that as of 1.3 FileUpload also supports the encoded-word syntax from RFC 2047.
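
The fallback order described above can be sketched as follows. This is an illustration of the idea only, not Commons FileUpload's actual code; the class and method names are hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch of the fallback chain, not FileUpload's implementation.
class HeaderCharsetFallback {

    // Explicit header encoding first, then the request encoding,
    // then the platform default as a last resort.
    static Charset resolve(String headerEncoding, String requestEncoding) {
        if (headerEncoding != null) {
            return Charset.forName(headerEncoding);
        }
        if (requestEncoding != null) {
            return Charset.forName(requestEncoding);
        }
        return Charset.defaultCharset();
    }

    // Decodes the raw bytes of a part header with the resolved charset.
    static String decodeHeader(byte[] raw, String headerEncoding, String requestEncoding) {
        return new String(raw, resolve(headerEncoding, requestEncoding));
    }

    public static void main(String[] args) {
        String header = "form-data; name=\"file\"; filename=\"Hall\u00f6le.png\"";
        byte[] raw = header.getBytes(StandardCharsets.UTF_8);
        // Decoding with the charset the sender used restores the header.
        System.out.println(decodeHeader(raw, "UTF-8", null).equals(header)); // prints true
        // Falling back to a mismatched request encoding garbles the umlaut.
        System.out.println(decodeHeader(raw, null, "ISO-8859-1").equals(header)); // prints false
    }
}
```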

I haven't found anything in the Servlet 3 spec nor in the API that addresses header encoding. Hence I'm not sure what the experience is across containers. I'd be curious to know which you're using (Servlet 3 or Apache commons) and what server?

Based on these findings I'm going to schedule this for 4.1.1. At the very least we can provide a property for the encoding, as you suggested, to use for headers in multipart content, much like what Apache Commons already does on the server (reading) side.


Rossen Stoyanchev commented

On further thought, while I think it's okay to adapt to non-standard behavior from browsers, it's not okay to perpetuate such behavior on the client side, which is what we are in this case. I committed a fix that lets you set a multipartCharset property on the converter. It then applies the encoded-word syntax from RFC 2047 via MimeUtility.encodeWord (from javax.mail). There is a test with Apache Commons FileUpload that successfully parses the written data, but please give it a try with 4.1.1.BUILD-SNAPSHOT. Thanks.
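
For readers unfamiliar with the encoded-word syntax, here is a rough JDK-only sketch of its Base64 ("B") variant, which has the shape =?charset?B?payload?=. It is not what MimeUtility.encodeWord does internally (that method may also choose the "Q" encoding and handles length limits); the class and method names here are hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical sketch of RFC 2047 "B" encoded-words; ignores the
// 75-character limit and the "Q" encoding defined by the RFC.
class EncodedWordSketch {

    // Wraps the Base64 of the encoded text in the =?charset?B?...?= envelope.
    static String encode(String text, Charset charset) {
        String payload = Base64.getEncoder().encodeToString(text.getBytes(charset));
        return "=?" + charset.name() + "?B?" + payload + "?=";
    }

    // Parses a single "B" encoded-word back into the original text.
    static String decode(String word) {
        // Expected pieces: "=", charset, "B", payload, "="
        String[] parts = word.split("\\?");
        if (parts.length != 5 || !parts[0].equals("=") || !parts[4].equals("=")
                || !parts[2].equalsIgnoreCase("B")) {
            throw new IllegalArgumentException("Not a B encoded-word: " + word);
        }
        return new String(Base64.getDecoder().decode(parts[3]), Charset.forName(parts[1]));
    }

    public static void main(String[] args) {
        String name = "Hall\u00f6le.png";
        String word = encode(name, StandardCharsets.UTF_8);
        System.out.println(word); // pure ASCII, safe to write into a part header
        System.out.println(decode(word).equals(name)); // prints true
    }
}
```

The point of the envelope is that the header bytes stay ASCII while the receiver can still recover the original name, provided it understands encoded-words.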


Leif Hanack commented

Looks great and works!

To answer your previous question :) We use Spring Boot with Tomcat on backend A and Spring with Tomcat on backend B.

Thanks a lot for your quickness.
Regards, Leif


spring-projects-issues commented Dec 22, 2016

Rossen Stoyanchev commented

Note that since the setMultipartCharset property was added, a more recent version of the multipart form-data spec RFC 7578 has changed recommendations. See this comment under #19072 for more details.
