Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Incorrect serialization For LogicalType.Decimal #308

Open
sheinbergon opened this issue Dec 30, 2021 · 7 comments
Open

Incorrect serialization For LogicalType.Decimal #308

sheinbergon opened this issue Dec 30, 2021 · 7 comments

Comments

@sheinbergon
Copy link

sheinbergon commented Dec 30, 2021

Your following extension NonBSGenericDatumWriter of GenericDatumWriter ignores proper support for AVRO's LogicalType.Decimal (not calling Conversions.DecimalConversion) when dealing with BigDecimal types.

A possible fix (inside NonBSGenericDatumWriter) might be:

        case BYTES:
            if (datum instanceof byte[]) {
                super.writeWithoutConversion(schema, ByteBuffer.wrap((byte[]) datum), out);
                return;
            }
            // Try and convert Decimal types, if the logical type indicates it
            LogicalType logicalType = schema.getLogicalType(); 
            if (logicalType instanceof Decimal.LogicalType  ) {
                Conversion<BigDecimal> conversion = data.getConversionByClass(BigDecimal.class, logicalType);
                byte[] decimal = convert(schema, logicalType, conversion, datum);
                super.writeWithoutConversion(schema, ByteBuffer.wrap(decimal), out);
                return;
            }
            break;

I'm willing to contribute a PR (including tests), but I'd like to apply it as a fix to 2.13.x branch.

@cowtowncoder Let me know what you think

@sheinbergon
Copy link
Author

I believe #221 refers to the the same issue

@cowtowncoder
Copy link
Member

Sounds good -- fix against 2.13 would be appreciated.

@sheinbergon
Copy link
Author

@cowtowncoder While the write/(serialize) path was rather easy to fix, the read path seems a bit harder. serialization happens in a mostly Jackson generic fashion, unaware that the JsonParser implementation providing the tokens is an Avro related one.

I can register cutom JsonDeserializer for BigDecimal, just like you did in for date/time types in #283 (maybe we can bind this to a Mapper Feature), But I require schema information in order to set the scale properly.

In order to adhere to Avro Decimal serialization behavior described in Conversions.DecimalConversion - the scale is not serialized, and it is expected to be available from the schema when trying to deserialize (the number is is saved as an unscaled BigInteger)

Any ideas on how to get the correct location of the schema for that a deserializer?

10x

@cowtowncoder
Copy link
Member

I would have thought that reading/deserialization should also work fine, as long as low-level decoder uses the schema information appropriately? Although I guess if this is not the case, then various workarounds may be needed.

@sheinbergon
Copy link
Author

sheinbergon commented Jan 6, 2022

@cowtowncoder Well, it's not. Any ideas on how to get the schema information there?

It seems like the correct approach would to make JacksonAvroParserImpl and its counterpart avroContext (RecordReader$Std) be able to return the correct schema reference/location. Any tips ideas on how to best perform this?

@cowtowncoder
Copy link
Member

Schema is carried along when constructing readers/writers, so that'd be the place to store information.
It has unfortunately been a while since I looked into this so I do not remember exact flow; but basically only information deemed necessary from schema is retained for readers/writers, not the schema. To ideally avoid various by-name lookups.

@cowtowncoder cowtowncoder added 2.14 and removed 2.13 labels May 16, 2022
@cowtowncoder
Copy link
Member

One possible way forward here would be a (unit) test that shows incorrect handling; possibly contrasting Avro-backed decoder (which would expect properly encoded/serialized value) to Jackson one (which works with whatever serialization is currently used) to show the issue?

I could probably find time to work on this in near future, but would need help reproducing the issue first (as well as then verifying the fix).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants