Decoding Decimal from JSON number is lossy #9180

Open
1 task done
FeldrinH opened this issue Apr 6, 2024 · 8 comments · May be fixed by #9292
Labels
bug V2 Bug related to Pydantic V2 good first issue help wanted Pull Request welcome

Comments

@FeldrinH

FeldrinH commented Apr 6, 2024

Initial Checks

  • I confirm that I'm using Pydantic V2

Description

When decoding a JSON number into a Python Decimal, the precision seems to be limited: after a certain number of digits, the value is cut off. I would expect this for a fixed-precision float, but not for an arbitrary-precision Decimal. It only happens with numbers that contain a decimal point. I assume the number is internally converted to a float and then to a Decimal.
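The reporter's float hypothesis can be reproduced with the standard library alone. The sketch below assumes the intermediate float is turned back into text with its shortest round-trip form (Python's `repr`) before reaching `Decimal`; `via_float` is a hypothetical helper, not Pydantic code. For the first example it yields exactly the digits reported as the actual output.

```python
from decimal import Decimal


def via_float(json_number_text: str) -> Decimal:
    """Hypothetical model of the suspected path: JSON number -> float -> str -> Decimal."""
    as_float = float(json_number_text)  # binary64 keeps only about 15-17 significant digits
    return Decimal(repr(as_float))      # shortest text that round-trips to that float


print(via_float("1.234567890123456789012345678901234567890"))
# 1.2345678901234567
print(via_float("12345678901234567890123456789012345678.9"))
```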

As a user, this lossy internal conversion is an unexpected and unwelcome surprise. Both the initial JSON string and the final Decimal can contain the full precision of the value without loss, so I would have expected the conversion to be lossless as well.
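As a point of comparison, Python's standard `json` module shows that the JSON text itself carries every digit: by default it decodes fractional numbers to `float`, but `parse_float=Decimal` hands the original digit string straight to `Decimal`. This only illustrates that lossless decoding is possible; Pydantic parses JSON in pydantic-core, not with this module.

```python
import json
from decimal import Decimal

raw = '{"value": 1.234567890123456789012345678901234567890}'

# Default behaviour: a JSON number with a fraction becomes a Python float (lossy).
lossy = json.loads(raw)["value"]
# parse_float receives the number's original text, so Decimal keeps every digit.
exact = json.loads(raw, parse_float=Decimal)["value"]

print(repr(lossy))  # 1.2345678901234567
print(repr(exact))  # Decimal('1.234567890123456789012345678901234567890')
```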

PS: I'm not sure whether this is a bug per se, because I could not find any documentation that explicitly states the expected behavior. However, based on what the documentation does say, it was certainly unexpected to me.

Example Code

from decimal import Decimal
from pydantic import BaseModel

class Test(BaseModel):
    value: Decimal

print(Test.model_validate_json('{"value": 1.234567890123456789012345678901234567890}'))
# Expected output: value=Decimal('1.234567890123456789012345678901234567890')
# Actual output: value=Decimal('1.2345678901234567')

print(Test.model_validate_json('{"value": 12345678901234567890123456789012345678.9}'))
# Expected output: value=Decimal('12345678901234567890123456789012345678.9')
# Actual output: value=Decimal('12345678901234568000000000000000000000')

Python, Pydantic & OS Version

             pydantic version: 2.6.4
        pydantic-core version: 2.16.3
          pydantic-core build: profile=release pgo=true
                 install path: <redacted>
               python version: 3.10.11 (tags/v3.10.11:7d4cc5a, Apr  5 2023, 00:38:17) [MSC v.1929 64 bit (AMD64)]
                     platform: Windows-10-10.0.19045-SP0
             related packages: fastapi-0.109.2 mypy-1.3.0 typing_extensions-4.8.0
                       commit: unknown
@FeldrinH FeldrinH added bug V2 Bug related to Pydantic V2 pending Awaiting a response / confirmation labels Apr 6, 2024
@FeldrinH
Author

FeldrinH commented Apr 6, 2024

PS: There are a number of similar issues with Decimal decoding that are marked as resolved (#6807, #6295). As far as I can tell they are distinct from this issue (most importantly, those issues are resolved whereas this issue is present in the latest Pydantic version).

@sydney-runkle
Member

@FeldrinH,

Thanks for reporting. This definitely looks like a bug, and I'm guessing it will have to be fixed in pydantic-core. Adding this to our 2.8 milestone and marking it as a good first issue for anyone interested!

@sydney-runkle sydney-runkle added good first issue help wanted Pull Request welcome and removed pending Awaiting a response / confirmation labels Apr 9, 2024
@ybressler
Contributor

I can take this one.

@ybressler
Contributor

As an aside, the following test cases pass:

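import json
from decimal import Decimal

import pytest
from pydantic import BaseModel


@pytest.mark.parametrize(
    'value',
    [
        # Decimal(<float literal>) holds the float's exact binary value,
        # so these two inputs have already lost digits before validation.
        Decimal(1.234567890123456789012345678901234567890),
        Decimal(12345678901234567890123456789012345678.9),
        Decimal(1) / Decimal(7),
    ],
)
def test_long_decimal_decoding(value: Decimal) -> None:
    """
    Really large decimal values should not be lost when encoding or decoding from json (or other input formats).
    """

    class Obj(BaseModel):
        value: Decimal

    # default=str serializes the Decimal as a JSON string, not a JSON number
    m = Obj.model_validate_json(json.dumps({"value": value}, default=str))
    assert m.value == value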

But if the values are provided as raw floats rather than Decimal, the tests fail. Something to note.
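A likely reason these cases pass: `Decimal(<float literal>)` converts the already-rounded float exactly, and `json.dumps(..., default=str)` then writes that `Decimal` as a JSON string rather than a JSON number, so no float is involved during validation. The stdlib-only sketch below shows both steps without Pydantic.

```python
import json
from decimal import Decimal

# Decimal(<float>) records the float's exact binary value, not the literal typed in the source.
from_float = Decimal(1.234567890123456789012345678901234567890)
print(from_float)
# 1.2345678901234566904321354741114191710948944091796875

# default=str turns the Decimal into a JSON *string*, so its digits survive the round trip.
payload = json.dumps({"value": from_float}, default=str)
round_tripped = Decimal(json.loads(payload)["value"])
assert round_tripped == from_float

# The precision was already gone before any JSON was involved.
assert from_float != Decimal("1.234567890123456789012345678901234567890")
```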

@ybressler
Contributor

Found where the behavior is being caused:

        primitive_schema = core_schema.union_schema(
            [
                # if it's an int keep it like that and pass it straight to Decimal
                # but if it's not make it a string
                # we don't use JSON -> float because parsing to any float will cause
                # loss of precision
                core_schema.int_schema(strict=True),   # <--------------------- HERE
                core_schema.str_schema(strict=True, strip_whitespace=True),
                core_schema.no_info_plain_validator_function(str),
            ],
        )
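A rough, stdlib-only model of the union above, assuming the JSON parser hands it a `float` for any number with a decimal point (as `json.loads` does by default). `primitive_union` and `to_decimal` are hypothetical names, not Pydantic functions. Integers and strings come through exactly, but the `str` fallback can only stringify a float whose extra digits are already gone.

```python
import json
from decimal import Decimal


def primitive_union(value):
    """Hypothetical stand-in for the union: strict int, then strict str, then str(value)."""
    if type(value) is int:      # like int_schema(strict=True): bools are excluded
        return value
    if isinstance(value, str):  # like str_schema(strict=True, strip_whitespace=True)
        return value.strip()
    return str(value)           # plain validator `str`: anything else is stringified


def to_decimal(json_text: str) -> Decimal:
    parsed = json.loads(json_text)["value"]  # stdlib stand-in for the JSON parser
    return Decimal(primitive_union(parsed))


print(to_decimal('{"value": 12345678901234567890123456789012345678}'))       # int branch: exact
print(to_decimal('{"value": "1.234567890123456789012345678901234567890"}'))  # str branch: exact
print(to_decimal('{"value": 1.234567890123456789012345678901234567890}'))    # float -> str: lossy
```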

@ybressler
Contributor

Also, the same lossiness shows up in reverse, on JSON encoding:

    m = Test(value=Decimal(1.234567890123456789012345678901234567890))
    print(m.model_dump_json())
    # expected output:  {"value":"1.234567890123456789012345678901234567890"}
    # actual output:    {"value":"1.2345678901234566904321354741114191710948944091796875"}
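A stdlib-only check (not Pydantic's serializer) suggests where the digits in the snippet above go: the "actual output" is exactly the decimal expansion of the nearest binary float, which is what `Decimal(...)` receives when it is passed a float literal. Encoding a `Decimal` built from the string form keeps every digit.

```python
import json
from decimal import Decimal

literal = "1.234567890123456789012345678901234567890"

from_str = Decimal(literal)           # built from text: every digit is kept
from_float = Decimal(float(literal))  # built from a float: the float's exact binary value

# Writing str(Decimal) into a JSON string, the shape of the output above, adds no rounding.
encoded_from_str = json.dumps({"value": str(from_str)})
encoded_from_float = json.dumps({"value": str(from_float)})

print(encoded_from_str)    # {"value": "1.234567890123456789012345678901234567890"}
print(encoded_from_float)  # {"value": "1.2345678901234566904321354741114191710948944091796875"}
```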

@ybressler
Contributor

Alright, got a solution going. Need some help with the deserialization component. #9291

@ybressler
Contributor

ybressler commented Apr 19, 2024

Welp! That was on a previous release of pydantic. Back to square one, working through it.

New PR: #9292

3 participants