
high cardinality tags in micrometer metrics #2875

Closed
grzegorz-moto opened this issue Aug 10, 2023 · 7 comments

@grzegorz-moto

The metric tags contain a random ID that causes high memory usage and eventually leads to a memory leak / memory exhaustion. This is due to the fact that Micrometer treats each new tag value as a new meter.
The example: https://github.com/reactor/reactor-netty/blob/main/reactor-netty-core/src/main/java/reactor/netty/resources/MicrometerPooledConnectionProviderMeterRegistrar.java#L55
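For illustration, here is a minimal, JDK-only model of why this matters (all names here are hypothetical; this is a sketch, not Micrometer's actual implementation): a meter registry keys each meter by its name plus its full tag set, so a randomly generated ID tag turns every short-lived pool into a brand-new meter ID that stays registered until it is explicitly removed.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

// Toy model of a meter registry: meters are keyed by (name, tags),
// so every distinct tag value creates a distinct entry that is
// retained until it is deregistered.
public class CardinalityDemo {
    static final Map<String, Integer> registry = new HashMap<>();

    static void gauge(String name, String idTag) {
        // The (name, id) pair plays the role of io.micrometer.core.instrument.Meter$Id
        registry.merge(name + "|id=" + idTag, 1, Integer::sum);
    }

    public static void main(String[] args) {
        // Simulate 1000 short-lived connection pools, each tagged with a unique random id
        for (int pool = 0; pool < 1000; pool++) {
            String id = UUID.randomUUID().toString();
            gauge("reactor.netty.connection.provider.active.connections", id);
            gauge("reactor.netty.connection.provider.total.connections", id);
        }
        // 2 metric names x 1000 unique ids = 2000 distinct meter ids retained
        System.out.println(registry.size()); // prints 2000
    }
}
```

With a bounded tag (for example, the remote address) the map would stay small no matter how many pools come and go; with a random ID it grows without limit.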

@grzegorz-moto grzegorz-moto added status/need-triage A new issue that still need to be evaluated as a whole type/bug A general bug labels Aug 10, 2023
@violetagg
Member

@grzegorz-moto Please specify Reactor Netty version. Also why do you need so many connection pools and do you need all of them operational all the time?

@violetagg violetagg added for/user-attention This issue needs user attention (feedback, rework, etc...) and removed status/need-triage A new issue that still need to be evaluated as a whole labels Aug 10, 2023
@violetagg violetagg self-assigned this Aug 10, 2023
@grzegorz-moto
Author

Version 1.0.25.
The application creates short-lived TCP connections to multiple targets, many of them at a time.
After some time the heap dump contains tons of instances of io.micrometer.core.instrument.Meter$Id for

reactor.netty.connection.provider.active.connections
reactor.netty.connection.provider.total.connections
reactor.netty.connection.provider.max.connections
reactor.netty.connection.provider.pending.connections
reactor.netty.connection.provider.idle.connections
reactor.netty.connection.provider.max.pending.connections

where only the ID tag makes each one unique.

@violetagg
Member

@grzegorz-moto Update at least to version 1.0.26, where we deregister the metrics for disposed connection pools (better, update to the latest one, 1.0.34):
https://github.com/reactor/reactor-netty/releases/tag/v1.0.26
#2608

Then ensure you have a configuration for disposeInactivePoolsInBackground; this will dispose all inactive connection pools and deregister their metrics.
See more info about the configuration here: https://projectreactor.io/docs/netty/release/reference/index.html#_connection_pool
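A minimal sketch of such a configuration (the provider name and the duration values below are illustrative, not recommendations; assumes Reactor Netty 1.0.26 or later):

```java
// Fragment, not a complete class: build a ConnectionProvider whose
// inactive pools are checked every 30s and disposed (metrics deregistered)
// once a pool has been inactive for 60s.
ConnectionProvider provider = ConnectionProvider
        .builder("my-provider")   // hypothetical name
        .metrics(true)
        .disposeInactivePoolsInBackground(Duration.ofSeconds(30), Duration.ofSeconds(60))
        .build();
```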

@violetagg
Member

violetagg commented Aug 10, 2023

where the only TAG ID makes it unique

This is interesting, because the metrics have ID, Remote Address and Name tags; ideally you should not see different IDs for the same Remote Address.

@grzegorz-moto
Author

I have ConnectionProvider and TcpClient configured in the following way:

```java
private static final ConnectionProvider CONNECTION_PROVIDER = ConnectionProvider
        .builder("connection-provider-with-metrics")
        .metrics(true)
        .maxIdleTime(Duration.ofSeconds(60))
        .evictInBackground(Duration.ofSeconds(30))
        .disposeInactivePoolsInBackground(Duration.ofSeconds(30), Duration.ofSeconds(60))
        .build();

private static final TcpClient TCP_CLIENT = TcpClient
        .create(CONNECTION_PROVIDER)
        .wiretap(true)
        .metrics(true)
        .option(ChannelOption.SO_KEEPALIVE, true)
        .option(EpollChannelOption.TCP_KEEPCNT, SipTransportTcpConnectionConfiguration.getTcpKeepCnt())
        .option(EpollChannelOption.TCP_KEEPIDLE, SipTransportTcpConnectionConfiguration.getTcpKeepIdle())
        .option(EpollChannelOption.TCP_KEEPINTVL, SipTransportTcpConnectionConfiguration.getTcpKeepIntvl())
        .option(ChannelOption.CONNECT_TIMEOUT_MILLIS, CONNECT_TIMEOUT)
        .doOnConnect(bootstrap -> log.debug("Trying to connect to the remote host [{}]",
                NetworkLoggingUtility.createRemoteHostKeyValue(bootstrap.remoteAddress().get())));

private Mono<? extends Connection> connect(InetSocketAddress socketAddress) {
    log.info("Establishing new TCP connection to [{}]", socketAddress);

    return TCP_CLIENT
            .remoteAddress(() -> socketAddress)
            .doOnConnected(this::addHandlers)
            .doOnConnected(tcpHandler::handleConnection)
            .doOnDisconnected(tcpHandler::handleDisconnected)
            .observe(tcpHandler.connectionObserver())
            .connect()
            .retryWhen(CONNECTION_RETRY_SPEC)
            .onErrorMap(e -> new TransportFailureException(TransportFailureException.FailureStatus.CONNECTION_FAILURE, e));
}
```

and in doOnDisconnected() dispose() is called on the connection. Without this manual dispose the connections seem to leak.
After all these changes I'm still observing that the number of instances of io.micrometer.core.instrument.Meter$Id grows constantly.

versions I'm using:
"io.projectreactor:reactor-bom:2022.0.9"
"io.netty:netty-bom:4.1.96.Final"

@violetagg
Member

@grzegorz-moto Is it necessary for the code below NOT to be included in the common configuration?

```java
                .doOnConnected(this::addHandlers)
                .doOnConnected(tcpHandler::handleConnection)
                .doOnDisconnected(tcpHandler::handleDisconnected)
                .observe(tcpHandler.connectionObserver())
```
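One way to read this question: if those callbacks do not need to vary per call, they can be applied once to the shared client, so each connect() reuses one fully configured instance instead of deriving a new one every time. A sketch under that assumption (fragment only; assumes tcpHandler is available when the shared client is built, which is not the case in the code above where it is an instance field):

```java
// Fragment: connection callbacks moved into the shared client
// configuration, so the per-connect path only sets the remote address.
private static final TcpClient TCP_CLIENT = TcpClient
        .create(CONNECTION_PROVIDER)
        .metrics(true)
        .doOnConnected(tcpHandler::handleConnection)
        .doOnDisconnected(tcpHandler::handleDisconnected)
        .observe(tcpHandler.connectionObserver());

private Mono<? extends Connection> connect(InetSocketAddress socketAddress) {
    return TCP_CLIENT
            .remoteAddress(() -> socketAddress)
            .connect();
}
```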

@violetagg
Member

@grzegorz-moto I'm closing this one; we can reopen it when you are able to reply to the comment above.

@violetagg violetagg added status/invalid We don't feel this issue is valid and removed type/bug A general bug for/user-attention This issue needs user attention (feedback, rework, etc...) labels Sep 11, 2023