Jmx_exporter can cause program to hang if it has a lot of threads #759

Open
Selikoff opened this issue Dec 17, 2022 · 2 comments


Selikoff commented Dec 17, 2022

Version: jmx_exporter 0.17.2

I noticed that if an application has a very large number of threads (15k or more), the jmx_exporter can cause the program to hang. The main thread stalls for however long the jmx_exporter collection takes to finish (10+ seconds in my tests). I wrote a simple program that reproduces the issue:

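public class ThreadHangRepro {
	public static void main(String[] args) throws Exception {
		// Start enough mostly-idle threads to make per-thread collection slow.
		final int count = 15_000;
		final Thread[] threads = new Thread[count];
		for (int i = 0; i < threads.length; i++) {
			threads[i] = new Thread(() -> {
				while (true) {
					try {
						Thread.sleep(500);
					} catch (InterruptedException e) {
						// Ignored: the worker just keeps sleeping.
					}
				}
			});
			threads[i].start();
		}

		// Print a timestamp every 100 ms; a gap in the output means main was stalled.
		while (true) {
			System.out.println("[time=" + System.currentTimeMillis() + "]");
			Thread.sleep(100);
		}
	}
}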

If you run this with the jmx_exporter attached and call curl http://localhost:123 in the background, the main thread freezes intermittently (about 30% of the time). You might have to adjust some of the timings for the freeze to appear.

I traced the source of the delay to the ThreadExports.java class, lines 110-124. That code is guarded by a filter which, when enabled, skips JVM_THREADS_STATE / jvm_threads_state. Enabling this filter prevents the issue from happening.

The problem, and the reason I'm reporting this as an issue, is that there's no way to disable just the jvm_threads_state collection in jmx_exporter. All of the rules in the config get executed after the collectors run, not before. I believe the fix would be to pass the filter information down to the HTTPServer.java class so that, instead of calling metricFamilySamples(), it calls filteredMetricFamilySamples().
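To illustrate why the filter has to be consulted before collection, here is a minimal sketch that is not the exporter's actual code. It assumes the costly step is a per-thread ThreadInfo lookup through the standard ThreadMXBean; the class name FilteredThreadStates, the collect method, and the THREADS_STATE constant are hypothetical names used only for this example.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.EnumMap;
import java.util.Map;
import java.util.function.Predicate;

public class FilteredThreadStates {
    // Hypothetical constant mirroring the metric name mentioned in the issue.
    static final String THREADS_STATE = "jvm_threads_state";

    // Counts live threads per state, but only if the name filter allows the metric.
    static Map<Thread.State, Integer> collect(Predicate<String> nameFilter) {
        Map<Thread.State, Integer> counts = new EnumMap<>(Thread.State.class);
        if (nameFilter != null && !nameFilter.test(THREADS_STATE)) {
            // Filtered out: return before touching any per-thread data.
            return counts;
        }
        ThreadMXBean bean = ManagementFactory.getThreadMXBean();
        // Assumed expensive step: one ThreadInfo per live thread.
        ThreadInfo[] infos = bean.getThreadInfo(bean.getAllThreadIds(), 0);
        for (ThreadInfo info : infos) {
            if (info != null) { // null entries are threads that already exited
                counts.merge(info.getThreadState(), 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println("unfiltered: " + collect(name -> true));
        System.out.println("filtered:   " + collect(name -> !name.equals(THREADS_STATE)));
    }
}
```

The point of the sketch is the early return: a rule that runs after the collector cannot avoid the per-thread lookup, whereas a name filter checked first can.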

Note: It is possible to filter metrics in the curl call to the server, e.g. curl http://localhost:123?name[]=my_metric, but this is extremely limited. In particular, you have to select metrics by name; you can't use regex or negation. Put another way, if you have 3,000 metrics and want to filter out 1, you would have to list the other 2,999 using this technique. Ideally, the solution should be part of the jmx_exporter config.
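To make the limitation concrete, the sketch below builds the query string needed to exclude a single metric with the name[] parameter. It is only an illustration: NameQueryBuilder and buildQuery are hypothetical names, and it assumes you already have the full list of metric names from somewhere.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.stream.Collectors;

public class NameQueryBuilder {
    // Builds "name[]=a&name[]=c..." listing every metric except the excluded one.
    static String buildQuery(List<String> allNames, String excluded) {
        return allNames.stream()
                .filter(name -> !name.equals(excluded))
                .map(name -> "name[]=" + URLEncoder.encode(name, StandardCharsets.UTF_8))
                .collect(Collectors.joining("&"));
    }

    public static void main(String[] args) {
        // Illustrative metric names only.
        List<String> names = List.of("jvm_threads_current", "jvm_threads_state", "my_metric");
        System.out.println("http://localhost:123?" + buildQuery(names, "jvm_threads_state"));
    }
}
```

With thousands of metrics the resulting URL grows with every name that should be kept, which is why an exclusion pattern in the config is the more practical option.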

@Selikoff
Author

Here's some sample output (I added % 100_000 to the timestamp in the main loop's print statement for readability):

[time=82539]
[time=82652]
[time=82767]
[time=82875]  <--- Moment in which the curl command was called
[time=10397]
[time=10809]
[time=10918]
[time=11027]
[time=11137]

In this sample, calling jmx_exporter locks the main thread for about 20 seconds. As mentioned, though, it's not consistent: I'd estimate it happens about 30% of the time, depending on your local hardware and the number of threads.

Selikoff commented Dec 17, 2022

Per this issue, I created a Pull Request that offers a fix: #760

I could also have modified code in java_client, such as the HTTPServer.java class, but since this class already offers a Predicate<String> sampleNameFilter, I used that instead.

Using the PR with the following config prevents the main thread from locking up while allowing all other metrics to go through:

collectorNamePattern : "^(?!jvm_threads_state$).*$"
rules:
  - pattern: ".*"

Even when the main thread doesn't lock up, the change shortens the time for curl http://localhost:123 to return from 20 seconds to 1 second in my earlier example.
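For anyone unsure what the pattern in the config above lets through, here is a small standalone check using java.util.regex. It does not use the PR's code; the class name CollectorPatternDemo and the sample metric names are made up for illustration.

```java
import java.util.List;
import java.util.function.Predicate;
import java.util.regex.Pattern;

public class CollectorPatternDemo {
    // Same regex as in the config: the negative lookahead rejects exactly
    // "jvm_threads_state" and accepts every other name.
    static final Pattern PATTERN = Pattern.compile("^(?!jvm_threads_state$).*$");

    // Wrap the regex as a Predicate<String>, the general shape of a name filter.
    static final Predicate<String> FILTER = name -> PATTERN.matcher(name).matches();

    public static void main(String[] args) {
        List<String> names = List.of(
                "jvm_threads_state",
                "jvm_threads_current",
                "jvm_threads_state_total",
                "my_metric");
        for (String name : names) {
            System.out.println(name + " -> " + (FILTER.test(name) ? "collected" : "skipped"));
        }
    }
}
```

Because the lookahead ends with $, only the exact name jvm_threads_state is skipped; a longer name that merely starts with it still passes.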
