Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add ability to serialise Decimal as string #600

Open
ameyarao98 opened this issue Jul 15, 2023 · 15 comments
Open

Add ability to serialise Decimal as string #600

ameyarao98 opened this issue Jul 15, 2023 · 15 comments

Comments

@ameyarao98
Copy link

Currently Decimal serialises to floats by default, which is undesirable when working with money for instance. The default option cannot be used for this since it does not raise an error, so a clean way to be able to specify that Decimal should be serialised as a string in dumps would be very helpful

@bwoodsend
Copy link
Collaborator

To be clear, the expected behaviour is:

>>> ujson.dumps(decimal.Decimal("1.123123123123123123123123123"))
'"1.123123123123123123123123123"'

i.e. the output is a JSON string and there's no rounding?

@bwoodsend
Copy link
Collaborator

And setting the default parameter to something that calls str() if given a Decimal() doesn't solve that problem because the default is only called if ujson can't handle the value on its own and ujson thinks that converting Decimal() instances to floats is intended handling.

@ameyarao98
Copy link
Author

Current behaviour:

>>> ujson.dumps({"amount": Decimal("15.35")})
'{"amount":15.35}'

Expected behaviour:

>>> ujson.dumps({"amount": Decimal("15.35")}, decimal_as_string=True)
'{"amount":"15.35"}'

@JustAnotherArchivist
Copy link
Collaborator

I would argue that the dumping behaviour is correct. Numbers in JSON are not floats but arbitrarily long sequences of digits.

For parsing, we should support parse_float = decimal.Decimal as is mentioned in the stdlib json documentation. There is already an open issue for that: #401

@bwoodsend
Copy link
Collaborator

Honestly, I'd rather delete the current handling of Decimal() and force people to either normalize their data into standard objects before passing it to ujson.dumps() or set default to a function using either str() or float(). This is the same issue as serialising datetime()s – it's ambiguous and shouldn't be the job of any JSON library to guess appropriate approximations of incompatible types.

@JustAnotherArchivist
Copy link
Collaborator

JustAnotherArchivist commented Jul 15, 2023

I disagree. decimal.Decimal can be directly mapped to a JSON number. It's just a -?\d+(\.\d+)?(E[+-]\d+)? number in almost all cases. There are two special cases, NaN and Infinity (positive or negative), which we already support for floats anyway as an unambiguous extension to JSON. Positive and negative zero are not a problem either. So encoding to JSON is not ambiguous. Parsing is a different story, but we rightly don't do anything there.

The current implementation is definitely not right though since it converts to a float first. This results in rounding and loss of number of significant figures.

@JustAnotherArchivist
Copy link
Collaborator

Some examples of what I would expect:

>>> ujson.dumps(decimal.Decimal('1.230'))
'1.230'
>>> ujson.dumps(decimal.Decimal('1.23000000000000001'))
'1.23000000000000001'
>>> ujson.dumps(decimal.Decimal('0.123456789012345678901234567890'))
'0.123456789012345678901234567890'
>>> ujson.dumps(decimal.Decimal('-0'))
'-0'
>>> ujson.dumps(decimal.Decimal('-0.00'))
'-0.00'
>>> ujson.dumps(decimal.Decimal('1e1'))
'1E+1'  # or equivalent; this is the Python 3.10 decimal normalisation behaviour

@bwoodsend
Copy link
Collaborator

I guess that if we went down the route of decimals written exactly as JSON numbers then we'd have to implement #401 or it's all a bit of a pointless exercise.

Sod it, I don't mind which route we choose.

@JustAnotherArchivist
Copy link
Collaborator

It wouldn't be entirely pointless, but yeah, I agree we should implement that anyway (if only for compatibility with stdlib).

I looked a bit into what would be needed for this. str(decimal.Decimal(...)) isn't documented really. I think it should always produce a valid JSON number, and simplejson uses it, so it's probably safe enough, but there's no documentation of the exact format. The relevant code for it is this, but I haven't read through it yet. The alternative would be to use decimal.Decimal.as_tuple and then format it as a JSON number ourselves, but I'd rather not.

@bwoodsend
Copy link
Collaborator

If you're worried about edge cases like infinity and negative zero, then as long as we have tests for them, I don't see any harm relying on str(decimal.Decimal(...)).

@JustAnotherArchivist
Copy link
Collaborator

Those edge cases are behaving. I can't think of anything that would produce weird output. I guess it just annoys me that it isn't documented, which means it shouldn't be relied on. Yeah, I'd add plenty of tests.

@bwoodsend
Copy link
Collaborator

I do find it weird that str(decimal.Decimal(math.inf)) != str(math.inf)

@JustAnotherArchivist
Copy link
Collaborator

Fortunately, the former produces the output we want, and same for math.nan. :-)

@JustAnotherArchivist
Copy link
Collaborator

I read through the code I linked above. There's one special case: signalling NaNs would become sNaN. I'm not sure you can produce an actual signalling NaN in Python, but you certainly can do decimal.Decimal('sNaN'). Everything else should result in a valid JSON number (or NaN/Infinity/-Infinity extension). The context is only used for deciding whether or not the exponent separator E should be capitalised, but both variants are valid JSON, so we can ignore that.

@FeldrinH
Copy link

I would argue that the dumping behaviour is correct. Numbers in JSON are not floats but arbitrarily long sequences of digits.

The claim "Numbers in JSON are not floats but arbitrarily long sequences of digits." is a bit dubious.
For example RFC 8259 states that implementations should, for maximum interoperability, assume that JSON numbers can get rounded to 64-bit floats or something even less precise and should not treat them as arbitrary precision numbers (see https://datatracker.ietf.org/doc/html/rfc8259#section-6).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants