Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Threadsafety issue in org.jsoup.nodes.Entities #2042

Open
owengray-google opened this issue Nov 8, 2023 · 12 comments
Open

Threadsafety issue in org.jsoup.nodes.Entities #2042

owengray-google opened this issue Nov 8, 2023 · 12 comments
Labels
needs-more-info More information is needed from the reporter to progress the issue

Comments

@owengray-google
Copy link

owengray-google commented Nov 8, 2023

We have seen some hung processes when using our (Google's) dackka, built on Jetbrains' dokka, using jsoup.

I believe the underlying issue we're running into is in org.jsoup.nodes.Entities, where:
private static final HashMap<String, String> multipoints = new HashMap<>(); // name -> multiple character references
is static but is not threadsafe, and that this issue would be fixed by using ConcurrentHashMap there.

 
 
 

For reference, the relevant thread dump excerpts:

I think the dackka process was probably

4556 org.jetbrains.dokka.MainKt /buildbot/dist_dirs/aosp-androidx-main-linux-androidx/11066813/dackkaArgs-docs-public.json -loggingLevel WARN -Dfile.encoding=UTF-8 -Duser.country=US -Duser.language=en -Duser.variant

The full stacktrace for that process is pretty long but I notice that most threads have a stacktrace that looks like this:

"DefaultDispatcher-worker-1" #28 daemon prio=5 os_prio=0 cpu=243.46ms elapsed=3478.27s tid=0x00007f99a09cc450 nid=0x122d in Object.wait()  [0x00007f986edeb000]
   java.lang.Thread.State: RUNNABLE
	at org.jetbrains.dokka.base.translators.ParseWithNormalisedSpacesKt$parseHtmlEncodedWithNormalisedSpaces$1.invoke(parseWithNormalisedSpaces.kt:25)
	- waiting on the Class initialization monitor for org.jsoup.nodes.Entities
	at org.jetbrains.dokka.base.translators.ParseWithNormalisedSpacesKt$parseHtmlEncodedWithNormalisedSpaces$1.invoke(parseWithNormalisedSpaces.kt)
	at org.intellij.markdown.lexer.Compat.forEachCodePoint(Compat.kt:14)
	at org.jetbrains.dokka.base.translators.ParseWithNormalisedSpacesKt.parseHtmlEncodedWithNormalisedSpaces(parseWithNormalisedSpaces.kt:19)
	at org.jetbrains.dokka.base.translators.ParseWithNormalisedSpacesKt.parseWithNormalisedSpaces(parseWithNormalisedSpaces.kt:49)
	at org.jetbrains.dokka.base.parsers.factories.DocTagsFromIElementFactory.getInstance(DocTagsFromIElementFactory.kt:46)
	at org.jetbrains.dokka.base.parsers.factories.DocTagsFromIElementFactory.getInstance$default(DocTagsFromIElementFactory.kt:16)
	at org.jetbrains.dokka.base.parsers.MarkdownParser.textHandler(MarkdownParser.kt:243)
	at org.jetbrains.dokka.base.parsers.MarkdownParser.visitNode(MarkdownParser.kt:391)
	at org.jetbrains.dokka.base.parsers.MarkdownParser.evaluateChildren(MarkdownParser.kt:408)
	at org.jetbrains.dokka.base.parsers.MarkdownParser.evaluateChildren$default(MarkdownParser.kt:407)
	at org.jetbrains.dokka.base.parsers.MarkdownParser.defaultHandler(MarkdownParser.kt:355)
	at org.jetbrains.dokka.base.parsers.MarkdownParser.visitNode(MarkdownParser.kt:398)
	at org.jetbrains.dokka.base.parsers.MarkdownParser.evaluateChildren(MarkdownParser.kt:408)
	at org.jetbrains.dokka.base.parsers.MarkdownParser.evaluateChildren$default(MarkdownParser.kt:407)
	at org.jetbrains.dokka.base.parsers.MarkdownParser.markdownFileHandler(MarkdownParser.kt:201)
	at org.jetbrains.dokka.base.parsers.MarkdownParser.visitNode(MarkdownParser.kt:392)
	at org.jetbrains.dokka.base.parsers.MarkdownParser.visitNode$default(MarkdownParser.kt:358)
	at org.jetbrains.dokka.base.parsers.MarkdownParser.parseStringToDocNode(MarkdownParser.kt:40)
	at org.jetbrains.dokka.base.parsers.MarkdownParser$Companion$parseFromKDocTag$1.invoke(MarkdownParser.kt:514)
	at org.jetbrains.dokka.base.parsers.MarkdownParser$Companion.parseFromKDocTag(MarkdownParser.kt:526)
	at org.jetbrains.dokka.base.parsers.MarkdownParser$Companion.parseFromKDocTag$default(MarkdownParser.kt:508)
	at org.jetbrains.dokka.base.translators.descriptors.DokkaDescriptorVisitor.getDocumentation(DefaultDescriptorToDocumentableTranslator.kt:1029)
	at org.jetbrains.dokka.base.translators.descriptors.DokkaDescriptorVisitor.resolveDescriptorData(DefaultDescriptorToDocumentableTranslator.kt:904)
	at org.jetbrains.dokka.base.translators.descriptors.DokkaDescriptorVisitor.access$resolveDescriptorData(DefaultDescriptorToDocumentableTranslator.kt:135)
	at org.jetbrains.dokka.base.translators.descriptors.DokkaDescriptorVisitor$resolveClassDescriptionData$2.invokeSuspend(DefaultDescriptorToDocumentableTranslator.kt:939)
	at kotlin.coroutines.jvm.internal.BaseContinuationImpl.resumeWith(ContinuationImpl.kt:33)
	at kotlinx.coroutines.internal.ScopeCoroutine.afterResume(Scopes.kt:33)
	at kotlinx.coroutines.AbstractCoroutine.resumeWith(AbstractCoroutine.kt:102)
	at kotlin.coroutines.jvm.internal.BaseContinuationImpl.resumeWith(ContinuationImpl.kt:46)
	at kotlinx.coroutines.internal.ScopeCoroutine.afterResume(Scopes.kt:33)
	at kotlinx.coroutines.AbstractCoroutine.resumeWith(AbstractCoroutine.kt:102)
	at kotlin.coroutines.jvm.internal.BaseContinuationImpl.resumeWith(ContinuationImpl.kt:46)
	at kotlinx.coroutines.DispatchedTask.run(DispatchedTask.kt:106)
	at kotlinx.coroutines.scheduling.CoroutineScheduler.runSafely(CoroutineScheduler.kt:570)
	at kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.executeTask(CoroutineScheduler.kt:749)
	at kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.runWorker(CoroutineScheduler.kt:677)
	at kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.run(CoroutineScheduler.kt:664)

and I notice one thread whose stack looks like this:

"DefaultDispatcher-worker-15" #42 daemon prio=5 os_prio=0 cpu=697.27ms elapsed=3478.23s tid=0x00007f9824023400 nid=0x123b in Object.wait()  [0x00007f986d6f0000]
   java.lang.Thread.State: RUNNABLE
	at org.jsoup.nodes.Document$OutputSettings.<init>(Document.java:416)
	- waiting on the Class initialization monitor for org.jsoup.nodes.Entities$EscapeMode
	at org.jsoup.nodes.Document.<init>(Document.java:26)
	at org.jsoup.nodes.Document.createShell(Document.java:52)
	at org.jsoup.parser.Parser.parseBodyFragment(Parser.java:218)
	at org.jsoup.Jsoup.parseBodyFragment(Jsoup.java:241)
	at org.jetbrains.dokka.base.translators.psi.parsers.JavadocParser$Parse.invoke(JavadocParser.kt:465)
	at org.jetbrains.dokka.base.translators.psi.parsers.JavadocParser.convertJavadocElements(JavadocParser.kt:474)
	at org.jetbrains.dokka.base.translators.psi.parsers.JavadocParser.convertJavadocElements$default(JavadocParser.kt:471)
	at org.jetbrains.dokka.base.translators.psi.parsers.JavadocParser.getDescription(JavadocParser.kt:207)
	at org.jetbrains.dokka.base.translators.psi.parsers.JavadocParser.parseDocComment$base(JavadocParser.kt:59)
	at org.jetbrains.dokka.base.translators.psi.parsers.JavadocParser.parseDocumentation(JavadocParser.kt:52)
	at org.jetbrains.dokka.base.translators.psi.DefaultPsiToDocumentableTranslator$DokkaPsiParser$parseClasslike$2.invokeSuspend(DefaultPsiToDocumentableTranslator.kt:243)
	at kotlin.coroutines.jvm.internal.BaseContinuationImpl.resumeWith(ContinuationImpl.kt:33)
	at kotlinx.coroutines.DispatchedTask.run(DispatchedTask.kt:106)
	at kotlinx.coroutines.scheduling.CoroutineScheduler.runSafely(CoroutineScheduler.kt:570)
	at kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.executeTask(CoroutineScheduler.kt:749)
	at kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.runWorker(CoroutineScheduler.kt:677)
	at kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.run(CoroutineScheduler.kt:664)
@owengray-google
Copy link
Author

Note: there could be other similar static-but-not-threadsafe issues, this just appears to be the one we have run into. I believe some lint tools have lint against this case; I don't know if manual inspection of the codebase for it is viable.

@owengray-google
Copy link
Author

However, given that we observe what looks like a deadlock, the problem might not be the HashMap (which usually just gets corrupted from concurrent access), but something else.

There was a related bug in the class involving the initializers, #1910.
The fix to that was released in 1.6.1, and we are seeing this bug in 1.6.1.

Our exception looks similar to this bug in a different library, given - waiting on the Class initialization monitor for org.jsoup.nodes.Entities$EscapeMode, and the issue being related to static initializers ( private static void load is a static initializer).

I think this suggests that b6f652c might have just converted the NullPointerException case to a deadlock somehow?

@owengray-google
Copy link
Author

Actually, upon inspection while our dependency has updated to 1.6.1, we haven't updated to the version of the dependency that has 1.6.1.
I will see if taking up that fix fixes our issue.

@jhy
Copy link
Owner

jhy commented Nov 10, 2023

For some context - the static hashmap multipoints in Entities is write once, read many. I don't immediately see how using ConcurrentHashMap would help anything here. We don't want to concurrently write to it. When the class is initalized, the EscapeMode enums in their static initialization will serially load the entities from packed data into the hashmap. Thereafter there are no writes to the map, and so multi-thread reading is safe.

As far as I can see I don't believe this is related to #1910 as that was a straight single-thread path to a null pointer.

And I can't see a deadlock path in those stack traces (or in the code). Do you have more thread dumps that shows two threads are waiting on each other?

My expectation for this is that if you have many threads running, and then they all suddenly want to load Entities for the first time, there will be a race to lock the static initializer, and one thread will win. It will quickly (like, a few millis) parse the entities data and write to the hashmap, then unlock and the other threads will resume. The times you have in that trace look awful!

Is something else causing the Entities to be evicted from permgen? Does this lock happen at startup or after running for some time? How many concurrent threads are there?

@jhy
Copy link
Owner

jhy commented Nov 10, 2023

That they are waiting on different initializers is interesting and I wonder if there's a path there?

at org.jsoup.nodes.Document$OutputSettings.<init>(Document.java:416)
- waiting on the Class initialization monitor for org.jsoup.nodes.Entities$EscapeMode
at org.jsoup.nodes.Document.<init>(Document.java:26)
at org.jetbrains.dokka.base.translators.ParseWithNormalisedSpacesKt$parseHtmlEncodedWithNormalisedSpaces$1.invoke(parseWithNormalisedSpaces.kt:25)
- waiting on the Class initialization monitor for org.jsoup.nodes.Entities
at org.jetbrains.dokka.base.translators.ParseWithNormalisedSpacesKt$parseHtmlEncodedWithNormalisedSpaces$1.invoke(parseWithNormalisedSpaces.kt)

And particularly that there is any wait for Entities. That doesn't do anything - the only static init is to create the empty hashmap. The EscapeMode init (which does not happen in Entities but in other users of the enum) is what could take any time. Does feel like a lock somehow. Perhaps on something else?

@jhy
Copy link
Owner

jhy commented Nov 27, 2023

@owengray-google do you have any further info for this? Without a way to repro I haven't been able to make any progress here.

@jhy jhy added the needs-more-info More information is needed from the reporter to progress the issue label Nov 27, 2023
@owengray-google
Copy link
Author

Androidx is open-source, but this is a low-probability flake, less than 1%.
We recently tried to force the version we use to 1.16.2, but that landed shortly before thanksgiving when no builds ran, and we want to wait a bit to make sure this is still happening.

@owengray-google
Copy link
Author

owengray-google commented Nov 27, 2023

Alright, attached is a large threadump.

It actually does seem like OutputSettings is an entry point, which makes it seem like this is almost certainly related to the issue in #1910.

The clearly-jsoup-related segments:

"DefaultDispatcher-worker-1" #28 daemon prio=5 os_prio=0 cpu=144.31ms elapsed=3468.13s tid=0x00007fb42891b4c0 nid=0x13cb in Object.wait()  [0x00007fb2cf5f3000]
   java.lang.Thread.State: RUNNABLE
	at org.jetbrains.dokka.base.translators.ParseWithNormalisedSpacesKt$parseHtmlEncodedWithNormalisedSpaces$1.invoke(parseWithNormalisedSpaces.kt:25)
	- waiting on the Class initialization monitor for org.jsoup.nodes.Entities
	at org.jetbrains.dokka.base.translators.ParseWithNormalisedSpacesKt$parseHtmlEncodedWithNormalisedSpaces$1.invoke(parseWithNormalisedSpaces.kt)
	at org.intellij.markdown.lexer.Compat.forEachCodePoint(Compat.kt:14)
(x57 threads)


"DefaultDispatcher-worker-20" #47 daemon prio=5 os_prio=0 cpu=1743.79ms elapsed=3459.88s tid=0x00007fb2880013f0 nid=0x143a in Object.wait()  [0x00007fb38b5f3000]
   java.lang.Thread.State: RUNNABLE
	at org.jsoup.nodes.Document$OutputSettings.<init>(Document.java:416)
	- waiting on the Class initialization monitor for org.jsoup.nodes.Entities$EscapeMode
	at org.jsoup.nodes.Document.<init>(Document.java:26)
	at org.jsoup.nodes.Document.createShell(Document.java:52)
	at org.jsoup.parser.Parser.parseBodyFragment(Parser.java:218)
	at org.jsoup.Jsoup.parseBodyFragment(Jsoup.java:241)
	at org.jetbrains.dokka.base.translators.psi.parsers.JavadocParser$Parse.invoke(JavadocParser.kt:465)
(x5 threads)


"DefaultDispatcher-worker-36" #63 daemon prio=5 os_prio=0 cpu=228.34ms elapsed=3459.87s tid=0x00007fb2740023c0 nid=0x144b in Object.wait()  [0x00007fb21b7f5000]
   java.lang.Thread.State: RUNNABLE
	at org.jsoup.nodes.Entities$EscapeMode.<init>(Entities.java:54)
	- waiting on the Class initialization monitor for org.jsoup.nodes.Entities
	at org.jsoup.nodes.Entities$EscapeMode.<clinit>(Entities.java:35)
	at org.jsoup.nodes.Document$OutputSettings.<init>(Document.java:416)
	at org.jsoup.nodes.Document.<init>(Document.java:26)
	at org.jsoup.nodes.Document.createShell(Document.java:52)
	at org.jsoup.parser.Parser.parseBodyFragment(Parser.java:218)
	at org.jsoup.Jsoup.parseBodyFragment(Jsoup.java:241)
	at org.jetbrains.dokka.base.translators.psi.parsers.JavadocParser$Parse.invoke(JavadocParser.kt:465)
(x1 thread)

"DefaultDispatcher-worker-44" #71 daemon prio=5 os_prio=0 cpu=166.76ms elapsed=3459.87s tid=0x00007fb2800013f0 nid=0x1453 in Object.wait()  [0x00007fb21afec000]
   java.lang.Thread.State: RUNNABLE
	at org.jsoup.nodes.Document$OutputSettings.<init>(Document.java:416)
	- waiting on the Class initialization monitor for org.jsoup.nodes.Entities$EscapeMode
	at org.jsoup.nodes.Entities.<clinit>(Entities.java:29)
	at org.jetbrains.dokka.base.translators.ParseWithNormalisedSpacesKt$parseHtmlEncodedWithNormalisedSpaces$1.invoke(parseWithNormalisedSpaces.kt:25)
	at org.jetbrains.dokka.base.translators.ParseWithNormalisedSpacesKt$parseHtmlEncodedWithNormalisedSpaces$1.invoke(parseWithNormalisedSpaces.kt)
	at org.intellij.markdown.lexer.Compat.forEachCodePoint(Compat.kt:14)
(x1 thread)

thread dump information for jsoup threadlock 11-22-2023.txt

@jhy
Copy link
Owner

jhy commented Nov 29, 2023

Thanks for the update. It's interesting. I have been able to craft a kind-of repro by adding a Thread.sleep(2000) as a static initializer in each of Entities and OutputSettings. And then a runner which spawns a few threads which run a Entities.escape and a Document.parseBodyFragment. It will threadlock in a similar manner to those threads above.

However, I find that only happens when I rollback the changes for #1910. If I leave them in (the lazy init), even with adding extra sleeps about the place, I can't get it to threadlock like that.

I can take a pass on refactoring to further decouple OutputSettings and Entities during initialization. Would you be able to evaluate a snapshot / branch version of jsoup with a tentative fix?

@jhy
Copy link
Owner

jhy commented Nov 29, 2023

We recently tried to force the version we use to 1.16.2, but that landed shortly before thanksgiving when no builds ran, and

Can you confirm what version you are running that produced those dumps?

at org.jsoup.parser.Parser.parseBodyFragment(Parser.java:218)

That line must be from a version older than 1.16.2, because I added some lines for namespaces at the top of Parser which pushes it to line 227.

Is it possible that you are running a version earlier than 1.16.1? Prior to the OutputSettings lazy load being enabled?

@jhy
Copy link
Owner

jhy commented Jan 7, 2024

Ping @owengray-google, are you able to check that?

@owengray-google
Copy link
Author

Sorry about that. It turns out that our upstream dependency is jarjaring the old version of jsoup, so we're trying to upgrade to a newer version of it, but the version with a newer jsoup also includes an upgrade to the Kotlin K2 compiler that has a lot of breaking changes, so it's taking a bit.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
needs-more-info More information is needed from the reporter to progress the issue
Projects
None yet
Development

No branches or pull requests

2 participants