Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Figure out how to output/convert null values with TOML backend #253

Closed
cowtowncoder opened this issue Apr 9, 2021 · 10 comments
Closed

Figure out how to output/convert null values with TOML backend #253

cowtowncoder opened this issue Apr 9, 2021 · 10 comments
Assignees
Labels
TOML Issue related to TOML format backend

Comments

@cowtowncoder
Copy link
Member

Looks like TOMGenerator.writeNull() method throws an exception, since TOML does not support actual null values.

While this sort of makes sense, I immediately noticed that this is problematic for conversions: for example when doing conversion in jackson-benchmarks for test data.

One obvious solution would be, I think, to make this method just write empty String value, although other options exist, for example:

  1. Omit output -- difficult to do, however, since streaming generator does not have power to "veto" output in all cases. This is something that can be done at databind level, however
  2. Have a settting (TomlWriteFeature?) for something like FAIL_ON_NULL_WRITE (enabled or disabled by default?) that would let user select between fail, or output of default value (empty String).

My initial suggestion would be to add a feature to allow failing, but have default be "write Empty String value instead of null".

@cowtowncoder cowtowncoder added 2.13 TOML Issue related to TOML format backend labels Apr 9, 2021
@cowtowncoder
Copy link
Member Author

cc @yawkat

@yawkat
Copy link
Member

yawkat commented Apr 9, 2021

I can see the issue with conversions. A write feature for this sounds sensible.

With a default empty string, how does this interact with deserialization? It would yield an empty string JsonToken after all. That is incompatible with most fields that could contain null, but there isn't really a better default value for this either.

For this "symmetry" reason, it seems more reasonable to make it fail by default. Better fail in serialization and notice early than fail in deserialization.

@cowtowncoder
Copy link
Member Author

I don't think presented an empty String is bad form partly because this is how XML already works (although with newer optional support for xsi:nil attribute) -- Jackson does have coercion from empty String to null for many/most things.
Whether this is enabled by default or not has been inconsistent in the past (with DeserializationFeature.ACCEPT_EMPTY_STRING_AS_NULL_OBJECT being heavily overloaded).

But with 2.12 CoercionConfig was added and XmlMapper is now initialized with:

        enable(DeserializationFeature.ACCEPT_EMPTY_STRING_AS_NULL_OBJECT);
        // 13-May-2020, tatu: [dataformat-xml#377] Need to ensure we will keep XML-specific
        //    Base64 default as "MIME" (not MIME-NO-LINEFEEDS), to preserve pre-2.12 behavior
        setBase64Variant(Base64Variants.MIME);

        // 04-Jun-2020, tatu: Use new (2.12) "CoercionConfigs" to support coercion
        //   from empty and blank Strings to "empty" POJOs etc
        coercionConfigDefaults()
            // To allow indentation without problems, need to accept blank String as empty:
            .setAcceptBlankAsEmpty(Boolean.TRUE)
            // and then coercion from empty String to empty value, in general
            .setCoercion(CoercionInputShape.EmptyString, CoercionAction.AsEmpty);

and thus empty String (and blank too to account for common pretty-printing) is accepted as null for non-textual values (like numbers, date/time), mostly scalars, but also POJOs, Collections and so on.

Similarly, Properties backend defines writeNull() as:

    @Override
    public void writeNull() throws IOException
    {
        _verifyValueWrite("write null value");
        _writeUnescapedEntry("");
    }

although I am not sure if it defines similar default coercions as XML (it probably should).

So this is not really a new problem per se: lack of null presentation in formats (and some languages like... Perl I think?) has been encountered before.

I just happened to bump into this with conversions that jackson-benchmarks does.

Does this make sense?

@yawkat
Copy link
Member

yawkat commented Apr 9, 2021

Yea, if the other modules do it this way, it makes sense to adopt the behavior by default as well. I will work on implementing this then.

@yawkat
Copy link
Member

yawkat commented Apr 9, 2021

oh btw @cowtowncoder , can you give me permission to set issue labels and such here?

@cowtowncoder
Copy link
Member Author

@yawkat did you accept membership in team? If that does not have enough access I can add more; will try to figure this out.

@yawkat yawkat self-assigned this Apr 9, 2021
@yawkat
Copy link
Member

yawkat commented Apr 9, 2021

Yep works now :)

@cowtowncoder
Copy link
Member Author

Ok good. Yes, "Write" access should be able to manage labels too.

@yawkat
Copy link
Member

yawkat commented Apr 9, 2021

Looking at #256:

  • setAcceptBlankAsEmpty(Boolean.TRUE) seems unnecessary for toml, since toml always quotes strings and there's no case where whitespace would just appear.
  • what does .setCoercion(CoercionInputShape.EmptyString, CoercionAction.AsEmpty) actually do when ACCEPT_EMPTY_STRING_AS_NULL_OBJECT is also set? Does it coerce empty strings to 0 instead of null for boxed number types for example? If that's it, don't think it's a good idea for toml either.

@cowtowncoder
Copy link
Member Author

Yes, agreed that blank is less relevant for TOML specifically (one could argue against that for Properties too since I think parser trims leading space, and without any non-space there is no trailing). So that's fine.

ACCEPT_EMPTY_STRING_AS_NULL_OBJECT is a legacy setting and define mostly for code that may be relying on that, not yet upgraded to consider CoercionConfig. It seems like it should have lower precedence than CoercionConfig.
Unfortunately semantics for that feature are not well-defined.

And yes, "empty" would be Integer.valueOf(0) instead of null.

For TOML, which has bit more typing than XML or Properties, AsNull seems like a bit better option then?
Users may of course change default or per-logical/per-physical type rules.

(I was contemplating on empty-vs-null wrt Properties as well... it's bit tricky).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
TOML Issue related to TOML format backend
Projects
None yet
Development

No branches or pull requests

2 participants