consider changing InternCache #946

Open

pjfanning opened this issue Mar 6, 2023 · 11 comments

Comments

@pjfanning (Member) commented Mar 6, 2023

InternCache clears its whole cache when a limit of 100 entries is reached. This can cause blocking, e.g. https://stackoverflow.com/questions/75616007/at-com-fasterxml-jackson-core-util-interncache-interninterncache-java54-in-bl
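For context, a minimal sketch of the "clear when full" pattern being described (an illustration only, not the actual jackson-core source): once the cache is full, every thread interning a new name ends up queuing on the same lock while the map is flushed.

```java
import java.util.concurrent.ConcurrentHashMap;

// Simplified sketch of the "clear when full" pattern described above;
// NOT the actual jackson-core source, just an illustration of why a
// flush can serialize otherwise independent reader threads.
public class ClearWhenFullInternCache extends ConcurrentHashMap<String, String> {
    private static final int MAX_ENTRIES = 100; // limit mentioned above

    private final Object lock = new Object();

    public String intern(String input) {
        String result = get(input);
        if (result != null) {
            return result;
        }
        // When the cache is full, flush everything under one lock; all
        // threads interning new names at this point queue up here.
        if (size() >= MAX_ENTRIES) {
            synchronized (lock) {
                if (size() >= MAX_ENTRIES) {
                    clear();
                }
            }
        }
        result = input.intern();
        putIfAbsent(result, result);
        return result;
    }
}
```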

LRUMap in jackson-databind used to do this but has now been changed to be a proper LRU cache.

InternCache can't use LRUMap because that class lives in jackson-databind, but we could consider moving it down to jackson-core.

Alternatives include allowing InternCache to be configured. Size could be configurable or the underlying map implementation could be pluggable.
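As a rough illustration of that direction, a size-bounded LRU intern cache could look something like the sketch below (hypothetical class, sizes and synchronization; not an existing Jackson API):

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical size-bounded LRU intern cache of the kind suggested above;
// the class name and details are illustrative, not a Jackson API.
public class LruInternCache {
    private final Map<String, String> map;

    public LruInternCache(final int maxEntries) {
        // accessOrder=true turns LinkedHashMap into an LRU structure;
        // removeEldestEntry evicts one entry instead of flushing everything.
        this.map = new LinkedHashMap<String, String>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, String> eldest) {
                return size() > maxEntries;
            }
        };
    }

    public synchronized String intern(String input) {
        String result = map.get(input);
        if (result == null) {
            result = input.intern();
            map.put(result, result);
        }
        return result;
    }
}
```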

Also open are #726 and #998

@cowtowncoder (Member)

My initial reaction is that whenever blocking is reported, it is a pretty good indicator of a problem quite different from InternCache itself. This is because the cache only gets heavy use when the actual SymbolTable implementation is not working the way it is intended.
Either because JsonParser is not getting closed, or JsonFactory/ObjectMapper is not reused.
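For reference, the intended usage pattern looks roughly like this (a minimal sketch; the helper class and method names are illustrative):

```java
import com.fasterxml.jackson.core.JsonParser;
import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;

public class IntendedUsage {
    // Reuse one ObjectMapper (and with it the underlying JsonFactory and
    // its recycled symbol tables) for the lifetime of the application.
    private static final ObjectMapper MAPPER = new ObjectMapper();

    JsonNode readTree(String json) throws Exception {
        // try-with-resources guarantees the parser is closed, which lets
        // its symbol-table state be released back to the factory for reuse.
        try (JsonParser p = MAPPER.getFactory().createParser(json)) {
            return MAPPER.readTree(p);
        }
    }
}
```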

Put another way, optimizing InternCache tends to amount to helping broken/bad use cases.

I am not against possible pluggability of InternCache. Although that would, I think, require making it bound to JsonFactory (which is not a bad idea per se). I think pluggability would be preferable to configurability.
In fact, making this change might resolve most if not all reported cases too.

I would be -1 for moving LRUMap into jackson-core, however.

Another thing that may be relevant: while interning is allowed in 3.0, it is disabled by default. So this issue will have less relevance if we ever move to Jackson 3.x.

schlosna added a commit to palantir/conjure-java-runtime that referenced this issue Apr 4, 2023
Interning introduces excessive contention
FasterXML/jackson-core#946
@schlosna (Contributor) commented Apr 4, 2023

I have seen some contention in com.fasterxml.jackson.core.util.InternCache.intern(String) via JFR with services processing high volumes of SMILE-encoded data, where we are reusing the ObjectMapper and I don't believe we are failing to close the JsonParser. We will test out disabling INTERN_FIELD_NAMES.

[attached: JFR profile screenshot showing contention in InternCache.intern]
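For anyone following along, disabling field-name interning for a SMILE-backed mapper can be done roughly like this (a sketch assuming the Jackson 2.10+ builder API; the class name is illustrative):

```java
import com.fasterxml.jackson.core.JsonFactory;
import com.fasterxml.jackson.databind.ObjectMapper;
import com.fasterxml.jackson.dataformat.smile.SmileFactory;

public class NoInternSmileMapper {
    // Build a SMILE-backed mapper with field-name intern()ing turned off.
    static final ObjectMapper SMILE_MAPPER = new ObjectMapper(
            SmileFactory.builder()
                    .disable(JsonFactory.Feature.INTERN_FIELD_NAMES)
                    .build());
}
```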

@cowtowncoder (Member)

@schlosna If there was a reproduction, I could try to have a look. This would suggest some problem with reuse of symbol tables, or a very big set of distinct names (filling the symbol table and leading to flushing). InternCache really should never become a bottleneck, as over time all symbols should be reused from the symbol table and no longer need intern()ing.

@schlosna (Contributor) commented Apr 4, 2023

Thanks @cowtowncoder

I'll see if we can create a smaller repro/benchmark. My sense is that in our case we have many distinct symbols, so the interning cache is constantly flushing and then becomes a synchronization bottleneck across threads using the same ObjectMapper instance for deserialization; we may be better off disabling the interning. Related, but slightly separate: we may benefit from enabling GC string deduplication to handle this for any long-lived strings.
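For reference, G1 string deduplication is enabled with JVM flags such as the following; duplicate backing arrays of long-lived Strings are then deduplicated at GC time, independently of intern():

```
# G1 string deduplication (available since JDK 8u20)
-XX:+UseG1GC -XX:+UseStringDeduplication
```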

@carterkozak (Contributor)

I believe the issue is the result of MapDeserializer._readAndBindStringKeyMap reaching through the canonicalizer and hitting InternCache with arbitrary user-provided string keys, causing each string to be interned under a lock.
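A minimal sketch of that scenario (not the failing test linked below; names are illustrative): each never-repeating key flows through the symbol table's canonicalizer and, with interning enabled, through InternCache.

```java
import java.util.Map;
import java.util.UUID;

import com.fasterxml.jackson.core.type.TypeReference;
import com.fasterxml.jackson.databind.ObjectMapper;

public class ArbitraryKeyExample {
    static final ObjectMapper MAPPER = new ObjectMapper();

    // Every distinct, never-repeating map key (here a random UUID) is
    // canonicalized by the symbol table and, with interning enabled,
    // ends up going through InternCache.intern().
    static Map<String, String> readOnce() throws Exception {
        String json = "{\"" + UUID.randomUUID() + "\": \"value\"}";
        return MAPPER.readValue(json, new TypeReference<Map<String, String>>() {});
    }
}
```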

@carterkozak (Contributor)

I've created a PR with a failing test here: FasterXML/jackson-databind#3859

@cowtowncoder (Member)

@carterkozak Ok, as per my notes, ALL JSON Object keys will get canonicalized -- or none will.
The concern has more to do with the value set of all keys; but yes, if the set exceeds something like 8k distinct keys then canonicalization should probably be disabled.

Or in cases where keys just do not repeat (like UUIDs).

In such cases there's very little point in optimizing InternCache as well, fwiw.
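A sketch of that kind of configuration (assuming the Jackson 2.10+ builder API; the class name is illustrative), disabling both interning and canonicalization of field names for payloads whose keys essentially never repeat:

```java
import com.fasterxml.jackson.core.JsonFactory;
import com.fasterxml.jackson.databind.ObjectMapper;

public class NonRepeatingKeysMapper {
    // Turn off both intern()ing and symbol-table canonicalization of field
    // names, for payloads whose keys (e.g. UUIDs) essentially never repeat.
    static final ObjectMapper MAPPER = new ObjectMapper(
            JsonFactory.builder()
                    .disable(JsonFactory.Feature.INTERN_FIELD_NAMES)
                    .disable(JsonFactory.Feature.CANONICALIZE_FIELD_NAMES)
                    .build());
}
```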

@pjfanning (Member, Author)

@cowtowncoder is there any chance you would consider making InternCache non-final and adding a static method that allows the cache instance to be replaced with an instance that has different size maintenance or max size, etc.?
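Purely as an illustration of that proposal (this API does not exist in jackson-core, and the idea is declined below), the static replacement hook might have looked something like:

```java
// Hypothetical sketch only: neither this class nor the replaceInstance()
// hook exists in jackson-core; the proposal was not adopted.
public class PluggableInternCacheSketch {
    private static volatile PluggableInternCacheSketch instance =
            new PluggableInternCacheSketch();

    public static PluggableInternCacheSketch instance() {
        return instance;
    }

    // The proposed static hook: swap in a subclass with a different max
    // size or eviction policy (would require the class to be non-final).
    public static void replaceInstance(PluggableInternCacheSketch replacement) {
        instance = replacement;
    }

    public String intern(String input) {
        return input.intern();
    }
}
```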

@cowtowncoder (Member)

@pjfanning No, I don't think I want to add static configurability this way.

cowtowncoder added a commit that referenced this issue Sep 8, 2023
@joshiste

As you're already working on this, I'd like to throw in my two cents:

  1. I think the cache should not hold strong references to the strings, so that they can be garbage collected
     (e.g. https://github.com/raphw/weak-lock-free). This way there would also be no need for an LRU with a size limit (see the sketch after this list).

  2. Is there any chance to make this an SPI where one could plug in a custom InternCache?
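As an illustration of point 1, a weak-reference interner along the lines of Guava's Interners (used here only as an example dependency, not something jackson-core uses) would let unused strings be collected without any fixed size limit:

```java
import com.google.common.collect.Interner;
import com.google.common.collect.Interners;

public class WeakInterningExample {
    // Entries are held via weak references, so strings that are no longer
    // referenced elsewhere can be garbage collected; no size limit needed.
    private static final Interner<String> INTERNER = Interners.newWeakInterner();

    static String intern(String value) {
        return INTERNER.intern(value);
    }
}
```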

@cowtowncoder (Member)

I think that, due to the need for low latency, weak references are probably non-starters for the general implementation. But if pluggability were added, users could definitely opt for that.
LRU is, to me, more of a nice-to-have than a necessity, as well.

In 3.0 (whenever we get there), intern()ing is not the default any more, so that may be more of a permanent solution eventually.

I am not against pluggability; maybe that could be worked on for 2.17.
