
Huge spike on CPU and other resources when couchdb-prometheus-exporter is set database=_all_dbs with 2600 databases #259

Sdas0000 opened this issue Oct 20, 2023 · 8 comments

@Sdas0000

Since the metrics collection frequency is set to 1 minute, couchdb-prometheus-exporter attempts to collect information for all 2600 databases every minute, which impacts the performance of the cluster.
Is there a way to collect database information sequentially, or in batches of a configurable size?
Can we add a parameter to control the database collection frequency (e.g. every 6 hours or 12 hours)?

@gesellix
Owner

I think we'll have to change the collector to continuously (with a configurable frequency) perform scrapes across the databases, just like you suggested in your last question. This might not be a quick fix, though; I'll have to check.

You might work around the issue by running multiple couchdb-prometheus-exporter instances and configuring each for only a subset of your databases. The Prometheus configuration would then have to scrape all those exporters, obviously. This is only a workaround.
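As a sketch of that workaround, the Prometheus side could look like the fragment below. The host names, ports, and the `--databases` flag value in the comments are illustrative assumptions, not taken from a real deployment:

```yaml
# prometheus.yml (fragment) -- scrape two exporter instances,
# each started with a different subset of the CouchDB databases
scrape_configs:
  - job_name: couchdb
    static_configs:
      - targets:
          - exporter-a:9984   # e.g. started with --databases=<first half>
          - exporter-b:9984   # e.g. started with --databases=<second half>
```

Each exporter then only issues requests for its own database subset, so the per-instance load against CouchDB is reduced, at the cost of running and configuring several exporters.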

@gesellix
Owner

@Sdas0000 please have a look at the database.concurrent.requests parameter as introduced with #46. It allows you to limit the number of concurrent requests between the exporter and the CouchDB cluster, which might help in your environment.

Nevertheless I'm going to implement an option to decouple Prometheus' scrape interval (Prometheus -> Exporter) and the exporter's scrape interval (Exporter -> CouchDB). Beware that this might have the undesired effect of collecting stale metrics.

@gesellix
Owner

I just released v30.9.0 with a new flag to perform scrapes at a configurable interval independent of Prometheus scrapes. Example: --scrape.interval=6h for an interval of 6 hours (default is 0s).
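Putting the new flag together with the concurrency limit mentioned earlier, an invocation could look roughly like the sketch below. The CouchDB URI and the exact spelling of flags other than `--database.concurrent.requests` and `--scrape.interval` are assumptions here, so check the exporter's `--help` output for your version:

```shell
# Sketch: collect from CouchDB every 6 hours, with at most 10 concurrent
# requests; Prometheus keeps scraping the exporter as usual and receives
# the most recently cached metrics in between.
couchdb-prometheus-exporter \
  --couchdb.uri=http://localhost:5984 \
  --databases=_all_dbs \
  --database.concurrent.requests=10 \
  --scrape.interval=6h
```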

Please leave some feedback and let me know whether you need more optimization for your setup. Thanks!

@gesellix
Owner

gesellix commented Nov 6, 2023

Closing now, feel free to leave feedback here or file another issue in case you still run into performance issues.

@gesellix gesellix closed this as completed Nov 6, 2023
@Sdas0000
Author

Sdas0000 commented Nov 7, 2023

scrape.interval scrapes everything at that interval; our issue is that _all_dbs (2600 databases) are still scraped at the same time. We are looking for a database-level scraping interval.

@gesellix
Owner

gesellix commented Nov 7, 2023

I think you should give the option described in #259 (comment) a try. It would allow you to define "buckets" for requests to your cluster. Did you have a look at that option?

@gesellix gesellix reopened this Nov 7, 2023
@Sdas0000
Author

We tried database.concurrent.requests = 100, but that didn't help; we still see the same high CPU. What we are looking for is a scrape.interval specific to database-level metrics (like doc count, disk utilization, etc.), while other metrics continue to be collected as usual. It would also help to have a parameter like "database scrape batch size", which would scrape only one batch at a time and pick up the next batch after the first one finishes; that way it may use fewer resources. Basically we need disk usage, doc count, etc. only a few times a day, but the other information we need continuously throughout the day.
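The batching idea could be sketched roughly as below. This is not an existing exporter feature, just an illustration of the requested behavior; `scrape_database` is a hypothetical stand-in for one per-database metrics request:

```python
from itertools import islice

def batches(items, size):
    """Yield successive fixed-size batches from a list of database names."""
    it = iter(items)
    while batch := list(islice(it, size)):
        yield batch

def scrape_in_batches(databases, batch_size, scrape_database):
    """Scrape one batch at a time; the next batch starts only after the
    previous one has finished, so peak load on the cluster stays bounded."""
    for batch in batches(databases, batch_size):
        for db in batch:
            scrape_database(db)

# Example: 2600 databases in batches of 100 -> 26 sequential batches
dbs = [f"db{i:04d}" for i in range(2600)]
seen = []
scrape_in_batches(dbs, 100, seen.append)
```

With a small sleep between batches (or a batch interval flag), the same total work would be spread over time instead of hitting all 2600 databases at once.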

@gesellix
Owner

I think I need to reproduce the issue myself... monitoring 2600 databases... and then try to make it work using fewer resources. For the time being I don't have a better suggestion than the one above in #259 (comment): deploying multiple exporter instances, each dedicated to a specific range of databases.
