Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Misleading attribute names in PgBouncer metrics #83

Open
SpiegelMadis opened this issue Mar 18, 2022 · 1 comment
Open

Misleading attribute names in PgBouncer metrics #83

SpiegelMadis opened this issue Mar 18, 2022 · 1 comment
Labels
bug Categorizes issue or PR as related to a bug. priority/backlog Generalistic priority. triage/accepted Indicates an issue or PR is ready to be actively worked on.

Comments

@SpiegelMadis
Copy link

I can not imagine a backwards compatible fix for this, but based on pgBouncer documentation maxwait field in pool stats contains values in seconds, however in nri-postgresql the field name is called maxwaitInMilliseconds.

MaxWait *int64 `db:"maxwait" metric_name:"pgbouncer.pools.maxwaitInMilliseconds" source_type:"gauge"`

From pgBouncer documentation:

maxwait:
How long the first (oldest) client in the queue has waited, in seconds. If this starts increasing, then the current pool of servers does not handle requests quickly enough. The reason may be either an overloaded server or just too small of a pool_size setting.

As maxwait_us attribute is not used at the moment by New Relic, there is no way of monitoring sub-second wait time values for pgBouncer, although there is definitely a value in monitoring the trends.

In addition, there are other attribute names that are incorrectly named, as everything else pgBouncer related pointing to milliseconds actually contains microseconds, for example:

pgbouncer.stats.avgQueryDurationInMilliseconds
From pgBouncer documentation:

avg_query_time:
Average query duration, in microseconds.

These issues however can be worked around at the moment by dividing the values by 1000. If there is no good way of fixing this, at least it should be pointed out in documentation.

PS: While at it, it would be useful if avg_wait_time stats would be sent as a metric, it is useful for monitoring in order to detect problems with connection pooling issues.

@gsanchezgavier gsanchezgavier added bug Categorizes issue or PR as related to a bug. triage/accepted Indicates an issue or PR is ready to be actively worked on. labels Mar 21, 2022
@gsanchezgavier
Copy link
Contributor

Hi @SpiegelMadis thanks for bringing this up. I labeled as bug and we will work on it as soon it gets prioritized.
@mangulonr we could add the note to docs until the change the metric name is corrected WDYT?

@alvarocabanas alvarocabanas self-assigned this Apr 20, 2022
@davidgit davidgit added the priority/backlog Generalistic priority. label Apr 22, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Categorizes issue or PR as related to a bug. priority/backlog Generalistic priority. triage/accepted Indicates an issue or PR is ready to be actively worked on.
Projects
None yet
Development

No branches or pull requests

4 participants