DGS-4172 Bound size of Avro datumReader/Writer caches #2331
avro-serializer/src/main/java/io/confluent/kafka/serializers/AbstractKafkaAvroDeserializer.java
schema-serializer/src/main/java/io/confluent/kafka/serializers/AbstractKafkaSchemaSerDe.java
I tested the cache implementation on the latest master / 7.3.0, and my test shows that the AbstractKafkaAvroSerializer.datumWriterCache still contains duplicate entries. The code I used to test is here: master...tnn:schema-registry:datumWriterCache-bug-guava-cache
Hi @tnn. The cache for datumReader/Writer was introduced in CP 6.1.0 by a community contribution. Before that there was no cache. However, what we found is that using the cache with large schemas actually hurts performance. This is because the …
I understand that using the schema itself as the cache key is a suboptimal solution; generally, that is also not how a cache should be used. A suitable cache key can be constructed with a secure hash algorithm such as SHA-256, which can digest ~3GiB/sec per core in Java on a modern Intel processor, so it should not be a performance problem. If performance is a concern, may I suggest not creating a new AvroSchema instance for every record in KafkaAvroSerializer#serialize?
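To make the suggestion concrete, here is a minimal sketch of a bounded cache keyed by a SHA-256 digest of the schema text rather than by the schema object. It is not the schema-registry implementation: the class `DigestKeyedCache`, its methods, and the use of a plain `String` as the schema representation are all assumptions made for illustration, and the LRU bound is just one possible eviction policy.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical class, for illustration only.
public class DigestKeyedCache<V> {
    private final Map<String, V> entries;

    public DigestKeyedCache(final int maxEntries) {
        // An access-ordered LinkedHashMap that drops its eldest entry acts as a small LRU cache.
        this.entries = Collections.synchronizedMap(
            new LinkedHashMap<String, V>(16, 0.75f, true) {
                @Override
                protected boolean removeEldestEntry(Map.Entry<String, V> eldest) {
                    return size() > maxEntries;
                }
            });
    }

    // Hex-encoded SHA-256 of the schema text; this short string is the cache key.
    public static String sha256Hex(String text) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            byte[] digest = md.digest(text.getBytes(StandardCharsets.UTF_8));
            StringBuilder sb = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                sb.append(String.format("%02x", b & 0xff));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform is required to provide SHA-256.
            throw new IllegalStateException(e);
        }
    }

    // Returns the cached value for this schema text, building it with the loader on a miss.
    public V get(String schemaText, Function<String, V> loader) {
        return entries.computeIfAbsent(sha256Hex(schemaText), k -> loader.apply(schemaText));
    }

    public int size() {
        return entries.size();
    }
}
```

With this shape, two distinct but textually identical schema strings map to the same entry, and the number of entries never exceeds `maxEntries`. The digest still has to be computed on every lookup, so this only helps if hashing is cheaper than comparing or hashing the full schema object.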
Thanks for the recap of equality in Java. :)
From my findings, the cache miss holds for any case where the … I've made a new branch with regression tests that verify the cache efficiency for both equality and identity, and that it no longer causes OOM: master...tnn:datumWriterCache-bug-take2
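The difference between equality and identity lookups can be illustrated with a small sketch. It does not use the schema-registry classes: `SchemaKey` is a hypothetical stand-in for a schema wrapper, and `IdentityHashMap` stands in for any cache that compares keys by reference. The point is only that creating a fresh key object per record defeats an identity-based cache.

```java
import java.util.HashMap;
import java.util.IdentityHashMap;
import java.util.Map;

// Hypothetical demo class, for illustration only.
public class KeySemanticsDemo {

    // Stand-in for a schema wrapper with value-based equals/hashCode; not the real AvroSchema.
    static final class SchemaKey {
        private final String canonicalText;

        SchemaKey(String canonicalText) {
            this.canonicalText = canonicalText;
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof SchemaKey
                && canonicalText.equals(((SchemaKey) o).canonicalText);
        }

        @Override
        public int hashCode() {
            return canonicalText.hashCode();
        }
    }

    // Simulates handling `records` records, wrapping the same schema text in a new key each time.
    public static int entriesAfter(Map<SchemaKey, Object> cache, int records) {
        for (int i = 0; i < records; i++) {
            SchemaKey key = new SchemaKey("{\"type\":\"string\"}");
            cache.computeIfAbsent(key, k -> new Object());
        }
        return cache.size();
    }

    public static void main(String[] args) {
        // Identity-based lookup: every new key instance is a miss, so entries pile up.
        System.out.println(entriesAfter(new IdentityHashMap<>(), 1000));
        // Equality-based lookup: all keys are equal, so a single entry is reused.
        System.out.println(entriesAfter(new HashMap<>(), 1000));
    }
}
```

A regression test along these lines would assert on the entry count after many records, once for reused key instances and once for freshly created but equal ones.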
Hi @rayokota, we have run into the issue @tnn reported regarding the creation of AvroSchema instances, and it has a significant impact on performance in our case. Is there currently any plan to revisit this?