Allow specifying encodings other than UTF-8 in XML declaration written #315

Open
cassiomolin opened this issue Oct 12, 2018 · 14 comments
Labels
adding-declarations: Issues related to adding non-content declarations to XML output
most-wanted: Tag to indicate that there is heavy user +1'ing action

Comments

@cassiomolin

cassiomolin commented Oct 12, 2018

The UTF-8 encoding is hard coded in the ToXmlGenerator source code:

if (Feature.WRITE_XML_1_1.enabledIn(_formatFeatures)) {
    _xmlWriter.writeStartDocument("UTF-8", "1.1");
} else if (Feature.WRITE_XML_DECLARATION.enabledIn(_formatFeatures)) {
    _xmlWriter.writeStartDocument("UTF-8", "1.0");
} else {
    return;
}

Since ToXmlGenerator is final, there may be no easy way to write other encodings such as ISO-8859-1:

<?xml version="1.0" encoding="ISO-8859-1"?>

See this question on Stack Overflow for reference.

@cassiomolin cassiomolin changed the title Allow encoding other than UTF-8 Allow encodings other than UTF-8 Oct 12, 2018
@cowtowncoder
Member

Ah. Yes, I see. So although the underlying writer may actually use a different encoding, the XML declaration claims it is UTF-8. That's not good.
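
To illustrate why a mismatched declaration matters, here is a minimal JDK-only sketch. It does not involve Jackson, and the class and method names are made up for illustration. The same ISO-8859-1 bytes are parsed once with a matching declaration and once with a declaration that claims UTF-8; with the JDK's built-in parser, the mismatched document is expected to be rejected as malformed.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.SAXException;

public class DeclarationMismatchDemo {

    /** Builds a document whose bytes are always ISO-8859-1, whatever the declaration claims. */
    static byte[] latin1Document(String declaredEncoding) {
        String xml = "<?xml version=\"1.0\" encoding=\"" + declaredEncoding + "\"?>"
                + "<name>caf\u00e9</name>";
        return xml.getBytes(StandardCharsets.ISO_8859_1);
    }

    /** Returns the root element's text, or null if the parser rejects the bytes. */
    static String parseText(byte[] bytes) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(bytes))
                    .getDocumentElement().getTextContent();
        } catch (SAXException | IOException e) {
            // The bytes are not valid in the encoding the declaration names.
            return null;
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Declaration matches the bytes: the text is recovered.
        System.out.println(parseText(latin1Document("ISO-8859-1")));
        // Declaration claims UTF-8 but the bytes are ISO-8859-1: parsing fails.
        System.out.println(parseText(latin1Document("UTF-8")));
    }
}
```

The single byte 0xE9 is a valid "é" in ISO-8859-1 but an incomplete multi-byte sequence in UTF-8, which is why a consumer that trusts the declaration cannot read the document.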

@cowtowncoder cowtowncoder added 2.11 and removed 2.10 labels Oct 3, 2019
@saimonsez

Additionally, XmlFactory outputs UTF-8 only:

protected XMLStreamWriter _createXmlWriter(IOContext ctxt, OutputStream out) throws IOException
{
    XMLStreamWriter sw;
    try {
        sw = _xmlOutputFactory.createXMLStreamWriter(_decorate(ctxt, out), "UTF-8");
    } catch (Exception e) {
        throw new JsonGenerationException(e.getMessage(), e, null);
    }
    return _initializeXmlWriter(sw);
}

@cowtowncoder
Member

@saimonsez That is intentional, however (partly because there is no mechanism to pass non-Unicode encodings); the caller is expected to create a Writer for alternate encodings.

However, I hope to introduce a mechanism to allow users to create document "header" (xml declaration and/or DOCTYPE declaration) via XMLStreamWriter, which would allow adding encoding in xml declaration.

@saimonsez

saimonsez commented May 14, 2020

I see, thank you for the clarification. In my case, the caller is spring-web (AbstractJackson2HttpMessageConverter), with no chance to configure an encoding other than Unicode, so I am stuck again. Are you by chance involved with Spring's integration of Jackson?

@saimonsez

I just created spring-projects/spring-framework#25076, which is related to this issue (if Jackson is used with Spring).

@cowtowncoder
Member

I am only involved whenever Spring folks file bugs, but do not know their code base (and they don't use, I think, Jackson JAX-RS provider).
Their involvement would be needed even if new functionality / endpoints were added, for what that is worth.

@kromit

kromit commented Aug 18, 2020

This has been pushed back for 2 years now, from 2.10 to 2.11 to 2.12. So next is 2.13?

Any stupid workaround would be great.

@cowtowncoder
Member

@kromit You are absolutely welcome to provide a fix as you seem to need it.

@kromit

kromit commented Aug 18, 2020

@cowtowncoder I've looked into this, and I would break significantly more things along the way than I would fix. 🙈
Not sure if my workaround is legit, but this is what I am using.

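// This is the XML declaration (not a DOCTYPE); the encoding it names must match the Writer's charset.
private final String XML_DECLARATION = "<?xml version=\"1.0\" encoding=\"iso-8859-1\"?>\n";

// Encode the output as ISO-8859-1 and write the declaration by hand.
Writer writer = new OutputStreamWriter(out, StandardCharsets.ISO_8859_1);
writer.write(XML_DECLARATION);

// WRITE_XML_DECLARATION must stay disabled (the default), or a second, UTF-8 declaration is written.
XmlMapper xmlMapper = new XmlMapper();
xmlMapper.writeValue(writer, value);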
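
One way to make the workaround above less error-prone is to derive both the declaration and the Writer's charset from a single Charset value, so the two cannot drift apart. The sketch below uses only the JDK; the class, the BodyWriter callback, and the demo method are hypothetical names for illustration. With Jackson, the callback body would be the call to xmlMapper.writeValue(writer, value). It also assumes the Java canonical charset name is acceptable in the declaration, which may not hold for every charset.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class DeclaredEncodingWriter {

    /** Hypothetical callback that writes the document body to the prepared Writer. */
    interface BodyWriter {
        void write(Writer writer) throws IOException;
    }

    /** Writes an XML declaration naming the charset, then the body, both encoded with that charset. */
    static void write(OutputStream out, Charset charset, BodyWriter body) {
        try (Writer writer = new OutputStreamWriter(out, charset)) {
            // Assumption: the canonical Java name is what consumers expect in the declaration.
            writer.write("<?xml version=\"1.0\" encoding=\"" + charset.name() + "\"?>\n");
            body.write(writer);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Small demo: a stand-in body instead of a real serializer call. */
    static byte[] demo() {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        write(out, StandardCharsets.ISO_8859_1, w -> w.write("<name>caf\u00e9</name>"));
        return out.toByteArray();
    }
}
```

The try-with-resources block closes the Writer even if the serializer has already closed it, which is harmless for OutputStreamWriter.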

@cowtowncoder
Member

Couple of possibly helpful pointers:

  • Add possibility to add DOCTYPE element #150 is related; there should be a way to customize writing of DOCTYPE as well as xml declaration
  • You could also manually create an XMLStreamWriter, use writer.writeStartDocument(...) to initialize it, and pass it to the XML-specific mapper.writeValue() method. But if you do so, you need to disable ToXmlGenerator.Feature.WRITE_XML_DECLARATION (in fact that may already be necessary in your case?)

The idea with #150 (which I really would like to get in 2.12 if I have time) would be to allow registering a writer callback that would write all preamble events (xml declaration and/or DOCTYPE) the way the caller wants. Conceptually simple; I just need to think of a way to do it that fits with the format-specific handling of Jackson's databind (most of the API is format-agnostic).
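
The second pointer above (manually creating an XMLStreamWriter) can be sketched with plain StAX from the JDK. This is only an illustration under stated assumptions: the class and method names are made up, and the element written here stands in for the point where the prepared writer would be handed to the mapper, as the comment describes.

```java
import java.io.ByteArrayOutputStream;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

public class ManualDeclarationSketch {

    /** Writes a tiny document whose declaration and byte encoding are both ISO-8859-1. */
    static byte[] writeLatin1(String text) {
        String encoding = "ISO-8859-1";
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try {
            // The stream writer encodes bytes with the same encoding the declaration names.
            XMLStreamWriter sw = XMLOutputFactory.newFactory().createXMLStreamWriter(out, encoding);
            sw.writeStartDocument(encoding, "1.0");

            // With Jackson, this is roughly where the prepared writer would be passed to
            // the mapper's XMLStreamWriter-based writeValue method, with
            // ToXmlGenerator.Feature.WRITE_XML_DECLARATION disabled.
            sw.writeStartElement("name");
            sw.writeCharacters(text);
            sw.writeEndElement();

            sw.writeEndDocument();
            sw.close(); // flushes buffered output; does not close the underlying stream
        } catch (XMLStreamException e) {
            throw new IllegalStateException(e);
        }
        return out.toByteArray();
    }
}
```

Because the caller owns both the encoding passed to the factory and the one passed to writeStartDocument, it is the caller's job to keep them identical.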

@cowtowncoder cowtowncoder added 2.13 and removed 2.12 labels Nov 13, 2020
@cowtowncoder cowtowncoder changed the title Allow encodings other than UTF-8 Allow specifying encodings other than UTF-8 in XML declaration written Nov 13, 2020
@cowtowncoder cowtowncoder added the most-wanted label Nov 13, 2020
@cowtowncoder cowtowncoder added the adding-declarations label Jul 7, 2021
@pjfanning
Member

pjfanning commented Apr 29, 2023

@cowtowncoder is there any appetite to take this on for v2.16?

If I was to look at this, I'd prefer not to use custom declaration handling and to concentrate on allowing this module to create XMLStreamWriter instances that could be created with other encodings.

In pseudocode, a user might be able to do this

XmlMapper mapper = new XmlMapper();
byte[] bytes = mapper.writeValueAsBytes(myObjectInstance, "Big5");

The bytes in the example above would pop out with Big5 as the encoding in the XML declaration and the chars would be encoded as Big5. We can add equivalent extra methods to XmlMapper for writeValue(OutputStream, ...).

We can decide afterwards if it makes sense to support this on the String/Reader write methods.

If you look at XmlFactory, there is some support for JsonEncoding but this enum has very limited values - all Unicode ones. This is why in the pseudocode, the encoding is a String - or it could be a java.nio.charset.Charset. I know that the Java encoding names may not be ideal - as they may not match the encoding names used in other applications but it should be a decent enough place to start.
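
As a small illustration of the naming point, here is how java.nio.charset.Charset resolves aliases to a canonical name. The class and method names are made up for this sketch, and whether the canonical Java name is the right one to put in an XML declaration is exactly the open question raised above.

```java
import java.nio.charset.Charset;

public class CharsetNameSketch {

    /** Resolves any name or alias known to the JVM to the charset's canonical name. */
    static String canonicalName(String requested) {
        // Charset.forName throws an unchecked exception for unknown or illegal names.
        return Charset.forName(requested).name();
    }

    public static void main(String[] args) {
        System.out.println(canonicalName("latin1"));
        System.out.println(canonicalName("utf8"));
    }
}
```

Accepting a Charset rather than a raw String would push this validation to the caller and fail early on unsupported names.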

I have a couple of changes here but the use of JsonEncoding is not a good place to start. And in practice, users don't typically create their own generators and use higher level methods like writeValueAsString/writeValueAsBytes.

@cowtowncoder
Member

@pjfanning Agreed, JsonEncoding is too limited to help. And yes, it'd be great to work on solving this in 2.16. I think your points are valid wrt it being difficult to add simple handler as one also needs to be able to create underlying XMLStreamWriter that uses encoding -- although this is not necessarily as problematic for other things one might want to "inject" (like DOCTYPE declaration).

So I am open to alternate takes here; I do not have current specific plans (or time) to tackle this myself, so should not try to dictate solution. But will definitely give feedback :)

@pjfanning
Member

@cowtowncoder
Member

@pjfanning I think using XMLStreamWriter and writing should work with earlier versions, but is not tested. If you wanted to backport tests to verify in 2.16 that'd be fine?

2.17 adds convenience factory methods but fundamentally those are not necessary I think.

This isn't a super convenient mechanism, of course, but at least it makes certain things possible.
