Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Live-heap profiler (alpha) in dd-trace-java v1.33.0 crashes the JVM. (After updated from v1.32.0) #6948

Open
stevenshim opened this issue Apr 25, 2024 · 6 comments
Assignees

Comments

@stevenshim
Copy link

Agent Version
dd-java-agent:1.33.0

Describe what happened
The JVM intermittently crashes and logs the following error.

Additional environment details
Java VM: OpenJDK 64-Bit Server VM Corretto-17.0.6.10.1
AWS EKS 1.25

Log

2024-04-25T16:52:49+09:00 #
2024-04-25T16:52:49+09:00 # A fatal error has been detected by the Java Runtime Environment:
2024-04-25T16:52:49+09:00 #
2024-04-25T16:52:49+09:00 #  SIGSEGV (0xb) at pc=0x00007f9fba87a340, pid=1, tid=50
2024-04-25T16:52:49+09:00 #
2024-04-25T16:52:49+09:00 # JRE version: OpenJDK Runtime Environment Corretto-17.0.6.10.1 (17.0.6+10) (build 17.0.6+10-LTS)
2024-04-25T16:52:49+09:00 # Java VM: OpenJDK 64-Bit Server VM Corretto-17.0.6.10.1 (17.0.6+10-LTS, mixed mode, sharing, tiered, compressed oops, compressed class ptrs, g1 gc, linux-amd64)
2024-04-25T16:52:49+09:00 # Problematic frame:
2024-04-25T16:52:49+09:00 # C  [libjavaProfiler8298632684405129428.so+0x40340]  Profiler::recordExternalSample(unsigned long long, int, _jvmtiFrameInfo*, int, bool, int, Event*)+0xb0
2024-04-25T16:52:49+09:00 #
2024-04-25T16:52:49+09:00 # Core dump will be written. Default location: /opt/goods-detail-api/core.1
2024-04-25T16:52:49+09:00 #
2024-04-25T16:52:49+09:00 # JFR recording file will be written. Location: /opt/goods-detail-api/hs_err_pid1.jfr
2024-04-25T16:52:49+09:00 #
2024-04-25T16:52:49+09:00 # An error report file with more information is saved as:
2024-04-25T16:52:49+09:00 # /opt/goods-detail-api/hs_err_pid1.log
2024-04-25T16:52:50+09:00 #
2024-04-25T16:52:50+09:00 # If you would like to submit a bug report, please visit:
2024-04-25T16:52:50+09:00 #   https://github.com/corretto/corretto-17/issues/
2024-04-25T16:52:50+09:00 # The crash happened outside the Java Virtual Machine in native code.
2024-04-25T16:52:50+09:00 # See problematic frame for where to report the bug.
2024-04-25T16:52:50+09:00 #
2024-04-25T16:52:50+09:00 
2024-04-25T16:52:50+09:00 [error occurred during error reporting (), id 0xb, SIGSEGV (0xb) at pc=0x00007fa02d41723b]
2024-04-25T16:52:50+09:00 

Hello, DataDog teams.
I have been facing java process crashing after update dd-agent from 1.32.0 to 1.33.0.

I guess this prevention (DataDog/java-profiler#84) might cause SIGSEGV.
I already opened a issue on java-profiler project. (DataDog/java-profiler#87)

I'm always grateful.

@richardstartin
Copy link
Member

Hi @stevenshim - apologies for the inconvenience caused here and thanks for reporting the error. Could you give a little more information please?

  1. the stack trace from /opt/goods-detail-api/hs_err_pid1.log and the siginfo line just below it
  2. your Datadog profiling config flags

@richardstartin
Copy link
Member

@stevenshim I suspect that you must be running the live heap profiler (DD_PROFILING_DDPROF_LIVEHEAP_ENABLED: true or -Ddd.profiling.ddprof.liveheap.enabled=true) - disabling this should be enough until we have a fix.

@jbachorik
Copy link
Contributor

Hello @stevenshim - the issue was identified and a fix is in progress (DataDog/java-profiler#88)

Thanks again for reporting this!

@richardstartin richardstartin changed the title dd-trace-java v1.33.0 crashes the JVM. (After updated from v1.32.0) Live-heap profiler (alpha) in dd-trace-java v1.33.0 crashes the JVM. (After updated from v1.32.0) Apr 29, 2024
@ghostdogpr
Copy link

ghostdogpr commented May 9, 2024

Is that issue happening only when Ddd.profiling.ddprof.liveheap.enabled=true? I've got 2 JVM crashes a few hours after upgrading to 1.33.0 that also say The crash happened outside the Java Virtual Machine in native code. (there's not much more detail though), but we didn't enable this flag.

@ghostdogpr
Copy link

I was able to replicate and save a crash dump. Maybe it's a different issue.

#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0x00007f5d72c0a43b, pid=1, tid=165464
#
# JRE version: OpenJDK Runtime Environment Temurin-21.0.3+9 (21.0.3+9) (build 21.0.3+9-LTS)
# Java VM: OpenJDK 64-Bit Server VM Temurin-21.0.3+9 (21.0.3+9-LTS, mixed mode, sharing, tiered, compressed class ptrs, z gc, linux-amd64)
# Problematic frame:
# C  [ld-musl-x86_64.so.1+0x5943b]  strlen+0xd
#
# Core dump will be written. Default location: //core.1
#
# JFR recording file will be written. Location: //hs_err_pid1.jfr
#
# If you would like to submit a bug report, please visit:
#   https://github.com/adoptium/adoptium-support/issues
# The crash happened outside the Java Virtual Machine in native code.
# See problematic frame for where to report the bug.
#

---------------  S U M M A R Y ------------

Command Line: -XX:MaxRAMPercentage=70.0 -XX:InitialRAMPercentage=70.0 -XX:-UseCompressedOops -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/coredump -XX:OnError=/app/scripts-deploy/jvm-crash-handler.sh -Dcom.sun.management.jmxremote=true -Dcom.sun.management.jmxremote.port=9900 -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.local.only=false -Dcom.sun.management.jmxremote.rmi.port=9900 -Djava.rmi.server.hostname=172.23.81.106 -javaagent:/app/external/dd-java-agent.jar -Ddd.profiling.enabled=true -Ddd.integration.zio.experimental.enabled=true -XX:+PrintCodeCache -XX:ProfiledCodeHeapSize=512m -XX:NonProfiledCodeHeapSize=1024m -XX:NonNMethodCodeHeapSize=32m -XX:ReservedCodeCacheSize=1568m -XX:+UseZGC -XX:+ZGenerational -Duser.timezone=Asia/Seoul -Dio.netty.allocator.numHeapArenas=0 -Ddd.integration.throwables.enabled=false --add-opens=java.base/java.util=ALL-UNNAMED --add-opens=java.base/java.lang=ALL-UNNAMED --add-opens=java.base/java.lang.invoke=ALL-UNNAMED --add-opens=java.sql/java.sql=ALL-UNNAMED com.devsisters.ck.CkMain

Host: Intel(R) Xeon(R) Platinum 8488C, 16 cores, 110G, Alpine Linux v3.19
Time: Thu May  9 13:38:28 2024 KST elapsed time: 76054.516656 seconds (0d 21h 7m 34s)

---------------  T H R E A D  ---------------

Current thread (0x00007f571bf42040):  JavaThread "SIGTERM handler" daemon [_thread_in_native, id=165464, stack(0x00007f56bd317000,0x00007f56bd417a78) (1026K)]

Stack: [0x00007f56bd317000,0x00007f56bd417a78],  sp=0x00007f56bd416748,  free space=1021k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
C  [ld-musl-x86_64.so.1+0x5943b]  strlen+0xd
C  [libjavaProfiler11068430143537826157.so+0x29217]  Recording::writeSettings(Buffer*, Arguments&)+0xd7
C  [libjavaProfiler11068430143537826157.so+0x35a49]  Recording::switchChunk(int)+0xf9
C  [libjavaProfiler11068430143537826157.so+0x35b2f]  FlightRecorder::dump(char const*, int)+0x6f
C  [libjavaProfiler11068430143537826157.so+0x457d0]  Profiler::dump(char const*, int)+0x200
C  [libjavaProfiler11068430143537826157.so+0x245c7]  Java_com_datadoghq_profiler_JavaProfiler_dump0+0x47
J 323292  com.datadoghq.profiler.JavaProfiler.dump0(Ljava/lang/String;)V (0 bytes) @ 0x00007f5d31c718ff [0x00007f5d31c71820+0x00000000000000df]
J 323289 c1 com.datadog.profiling.ddprof.DatadogProfilerRecording.snapshot(Ljava/time/Instant;Ldatadog/trace/api/profiling/ProfilingSnapshot$Kind;)Ldatadog/trace/api/profiling/RecordingData; (44 bytes) @ 0x00007f5d0b178be4 [0x00007f5d0b1788a0+0x0000000000000344]
J 323287 c1 com.datadog.profiling.controller.ddprof.DatadogProfilerOngoingRecording.snapshot(Ljava/time/Instant;Ldatadog/trace/api/profiling/ProfilingSnapshot$Kind;)Ldatadog/trace/api/profiling/RecordingData; (16 bytes) @ 0x00007f5d0a72d8d4 [0x00007f5d0a72d6e0+0x00000000000001f4]
J 260563 c1 com.datadog.profiling.agent.CompositeController$CompositeOngoingRecording$$Lambda+0x00007f58c81d9338.apply(Ljava/lang/Object;)Ljava/lang/Object; (16 bytes) @ 0x00007f5d1469d84c [0x00007f5d1469d5e0+0x000000000000026c]
J 4274 c2 java.util.stream.ReferencePipeline$3$1.accept(Ljava/lang/Object;)V java.base@21.0.3 (23 bytes) @ 0x00007f5d28ec3d18 [0x00007f5d28ec3ca0+0x0000000000000078]
J 47514 c2 java.util.ArrayList$ArrayListSpliterator.forEachRemaining(Ljava/util/function/Consumer;)V java.base@21.0.3 (127 bytes) @ 0x00007f5d2b963e38 [0x00007f5d2b963d40+0x00000000000000f8]
J 32061 c2 java.util.stream.ReferencePipeline.collect(Ljava/util/stream/Collector;)Ljava/lang/Object; java.base@21.0.3 (124 bytes) @ 0x00007f5d2a9254ec [0x00007f5d2a925180+0x000000000000036c]
J 323261 c1 com.datadog.profiling.agent.CompositeController$CompositeOngoingRecording.snapshot(Ljava/time/Instant;Ldatadog/trace/api/profiling/ProfilingSnapshot$Kind;)Ldatadog/trace/api/profiling/RecordingData; (12 bytes) @ 0x00007f5d0a4c4754 [0x00007f5d0a4c4000+0x0000000000000754]
J 323260 c1 com.datadog.profiling.controller.ProfilingSystem$SnapshotRecording.snapshot(Z)V (144 bytes) @ 0x00007f5d07f3b784 [0x00007f5d07f3b540+0x0000000000000244]
j  com.datadog.profiling.controller.ProfilingSystem.shutdown(Z)V+29
j  com.datadog.profiling.agent.ProfilingAgent.shutdown(Lcom/datadog/profiling/controller/ProfilingSystem;Lcom/datadog/profiling/uploader/ProfileUploader;Z)V+17
j  com.datadog.profiling.agent.ProfilingAgent.shutdown(Z)V+7
j  java.lang.invoke.LambdaForm$DMH+0x00007f58cc09e800.invokeStatic(Ljava/lang/Object;I)V+10 java.base@21.0.3
j  java.lang.invoke.LambdaForm$MH+0x00007f58cc09f800.invoke(Ljava/lang/Object;Ljava/lang/Object;Ljava/lang/Object;)Ljava/lang/Object;+33 java.base@21.0.3
J 45127 c2 java.lang.reflect.Method.invoke(Ljava/lang/Object;[Ljava/lang/Object;)Ljava/lang/Object; java.base@21.0.3 (108 bytes) @ 0x00007f5d2b6454b8 [0x00007f5d2b644f80+0x0000000000000538]
j  datadog.trace.bootstrap.Agent.shutdownProfilingAgent(Z)V+64
j  datadog.trace.bootstrap.Agent.shutdown(Z)V+15
j  datadog.trace.bootstrap.instrumentation.shutdown.ShutdownHelper.shutdownAgent()V+11
j  java.lang.Shutdown.runHooks()V+0 java.base@21.0.3
j  java.lang.Shutdown.exit(I)V+12 java.base@21.0.3
j  java.lang.Terminator$1.handle(Ljdk/internal/misc/Signal;)V+8 java.base@21.0.3
j  jdk.internal.misc.Signal$1.run()V+8 java.base@21.0.3
j  java.lang.Thread.runWith(Ljava/lang/Object;Ljava/lang/Runnable;)V+5 java.base@21.0.3
j  java.lang.Thread.run()V+19 java.base@21.0.3
v  ~StubRoutines::call_stub 0x00007f5d26ccbcbf
V  [libjvm.so+0x912765]  JavaCalls::call_helper(JavaValue*, methodHandle const&, JavaCallArguments*, JavaThread*)+0x2e5
V  [libjvm.so+0x9140ab]  JavaCalls::call_virtual(JavaValue*, Handle, Klass*, Symbol*, Symbol*, JavaThread*)+0x1cb
V  [libjvm.so+0x9ee64a]  thread_entry(JavaThread*, JavaThread*)+0x8a
V  [libjvm.so+0x928ccf]  JavaThread::thread_main_inner() [clone .part.0]+0xaf
V  [libjvm.so+0xf82c98]  Thread::call_run()+0xa8
V  [libjvm.so+0xd0b23a]  thread_native_entry(Thread*)+0x12a
Java frames: (J=compiled Java code, j=interpreted, Vv=VM code)
J 323292  com.datadoghq.profiler.JavaProfiler.dump0(Ljava/lang/String;)V (0 bytes) @ 0x00007f5d31c71887 [0x00007f5d31c71820+0x0000000000000067]
J 323289 c1 com.datadog.profiling.ddprof.DatadogProfilerRecording.snapshot(Ljava/time/Instant;Ldatadog/trace/api/profiling/ProfilingSnapshot$Kind;)Ldatadog/trace/api/profiling/RecordingData; (44 bytes) @ 0x00007f5d0b178be4 [0x00007f5d0b1788a0+0x0000000000000344]
J 323287 c1 com.datadog.profiling.controller.ddprof.DatadogProfilerOngoingRecording.snapshot(Ljava/time/Instant;Ldatadog/trace/api/profiling/ProfilingSnapshot$Kind;)Ldatadog/trace/api/profiling/RecordingData; (16 bytes) @ 0x00007f5d0a72d8d4 [0x00007f5d0a72d6e0+0x00000000000001f4]
J 260563 c1 com.datadog.profiling.agent.CompositeController$CompositeOngoingRecording$$Lambda+0x00007f58c81d9338.apply(Ljava/lang/Object;)Ljava/lang/Object; (16 bytes) @ 0x00007f5d1469d84c [0x00007f5d1469d5e0+0x000000000000026c]
J 4274 c2 java.util.stream.ReferencePipeline$3$1.accept(Ljava/lang/Object;)V java.base@21.0.3 (23 bytes) @ 0x00007f5d28ec3d18 [0x00007f5d28ec3ca0+0x0000000000000078]
J 47514 c2 java.util.ArrayList$ArrayListSpliterator.forEachRemaining(Ljava/util/function/Consumer;)V java.base@21.0.3 (127 bytes) @ 0x00007f5d2b963e38 [0x00007f5d2b963d40+0x00000000000000f8]
J 32061 c2 java.util.stream.ReferencePipeline.collect(Ljava/util/stream/Collector;)Ljava/lang/Object; java.base@21.0.3 (124 bytes) @ 0x00007f5d2a9254ec [0x00007f5d2a925180+0x000000000000036c]
J 323261 c1 com.datadog.profiling.agent.CompositeController$CompositeOngoingRecording.snapshot(Ljava/time/Instant;Ldatadog/trace/api/profiling/ProfilingSnapshot$Kind;)Ldatadog/trace/api/profiling/RecordingData; (12 bytes) @ 0x00007f5d0a4c4754 [0x00007f5d0a4c4000+0x0000000000000754]
J 323260 c1 com.datadog.profiling.controller.ProfilingSystem$SnapshotRecording.snapshot(Z)V (144 bytes) @ 0x00007f5d07f3b784 [0x00007f5d07f3b540+0x0000000000000244]
j  com.datadog.profiling.controller.ProfilingSystem.shutdown(Z)V+29
j  com.datadog.profiling.agent.ProfilingAgent.shutdown(Lcom/datadog/profiling/controller/ProfilingSystem;Lcom/datadog/profiling/uploader/ProfileUploader;Z)V+17
j  com.datadog.profiling.agent.ProfilingAgent.shutdown(Z)V+7
j  java.lang.invoke.LambdaForm$DMH+0x00007f58cc09e800.invokeStatic(Ljava/lang/Object;I)V+10 java.base@21.0.3
j  java.lang.invoke.LambdaForm$MH+0x00007f58cc09f800.invoke(Ljava/lang/Object;Ljava/lang/Object;Ljava/lang/Object;)Ljava/lang/Object;+33 java.base@21.0.3
J 45127 c2 java.lang.reflect.Method.invoke(Ljava/lang/Object;[Ljava/lang/Object;)Ljava/lang/Object; java.base@21.0.3 (108 bytes) @ 0x00007f5d2b6454b8 [0x00007f5d2b644f80+0x0000000000000538]
j  datadog.trace.bootstrap.Agent.shutdownProfilingAgent(Z)V+64
j  datadog.trace.bootstrap.Agent.shutdown(Z)V+15
j  datadog.trace.bootstrap.instrumentation.shutdown.ShutdownHelper.shutdownAgent()V+11
j  java.lang.Shutdown.runHooks()V+0 java.base@21.0.3
j  java.lang.Shutdown.exit(I)V+12 java.base@21.0.3
j  java.lang.Terminator$1.handle(Ljdk/internal/misc/Signal;)V+8 java.base@21.0.3
j  jdk.internal.misc.Signal$1.run()V+8 java.base@21.0.3
j  java.lang.Thread.runWith(Ljava/lang/Object;Ljava/lang/Runnable;)V+5 java.base@21.0.3
j  java.lang.Thread.run()V+19 java.base@21.0.3
v  ~StubRoutines::call_stub 0x00007f5d26ccbcbf

siginfo: si_signo: 11 (SIGSEGV), si_code: 1 (SEGV_MAPERR), si_addr: 0x00007f58030185e4

Registers:
RAX=0x00007f58030185e4, RBX=0x000000000000218a, RCX=0x000000007265746c, RDX=0x0000000000000006
RSP=0x00007f56bd416748, RBP=0x00007f56bd416790, RSI=0x0000000000000006, RDI=0x00007f58030185e4
R8 =0x0000000000002182, R9 =0x0000000000002171, R10=0x0000000000000009, R11=0x0000000000000246
R12=0x00007f57fcd6a010, R13=0x00007f57fcd6a010, R14=0x0000000000002171, R15=0x00007f58030185e4
RIP=0x00007f5d72c0a43b, EFLAGS=0x0000000000010202, CSGSFS=0x002b000000000033, ERR=0x0000000000000004
  TRAPNO=0x000000000000000e

@richardstartin
Copy link
Member

Hi @ghostdogpr that looks like another issue, apologies for the inconvenience and thanks for the backtrace. I recommend downgrading until we've figured out the cause (we'll update you).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants