Add metrics support for Netty 4.x #3742

Merged
bclozel merged 1 commit into micrometer-metrics:main on Apr 6, 2023


bclozel
Contributor

@bclozel bclozel commented Apr 4, 2023

This commit adds two new MeterBinder implementations for instrumenting
Netty 4.x: NettyAllocatorMetrics and NettyEventExecutorMetrics.

NettyAllocatorMetrics will instrument any ByteBufAllocatorMetricProvider
and gather information about heap/direct memory allocated; additional
metrics are provided for pooled allocators.

NettyEventExecutorMetrics will instrument EventExecutor (typically,
EventLoop instances) and count the number of pending tasks for each.

Metrics and tags are described in the NettyMeters class.

Closes gh-522

@franz1981

franz1981 commented Apr 4, 2023

In case you are interested, it is possible to get these additional metrics out of Netty event loops: netty/netty#9080

Sadly, I never completed it, but it should be simple.

The same goes for netty/netty#11293 (comment): use a sentinel periodic task per event loop to measure how busy it is.
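
As a rough illustration of the sentinel idea, here is a JDK-only sketch. It is not Netty or Micrometer code: `SentinelLagProbe` and `measureLagNanos` are hypothetical names, and a single-threaded `ExecutorService` stands in for an event loop. A no-op task records how long it waited in the queue before the thread picked it up; running such a probe periodically would give a busyness signal per loop.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical helper: measures how long a sentinel task waits in a single-threaded executor. */
final class SentinelLagProbe {

    private SentinelLagProbe() {
    }

    /** Returns the delay in nanoseconds between submitting the sentinel and the executor running it. */
    static long measureLagNanos(ExecutorService singleThreadExecutor) {
        long submittedAt = System.nanoTime();
        // The sentinel does no work: it only reports how long it sat in the queue.
        Future<Long> lag = singleThreadExecutor.submit(() -> System.nanoTime() - submittedAt);
        try {
            return lag.get(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for the sentinel", e);
        } catch (ExecutionException | TimeoutException e) {
            throw new IllegalStateException("sentinel task did not complete", e);
        }
    }

    public static void main(String[] args) {
        ExecutorService loop = Executors.newSingleThreadExecutor();
        try {
            // Occupy the only thread, as a blocking handler would on an event loop.
            loop.submit(() -> {
                try {
                    Thread.sleep(200);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            long lagMillis = TimeUnit.NANOSECONDS.toMillis(measureLagNanos(loop));
            System.out.println("sentinel waited about " + lagMillis + " ms");
        } finally {
            loop.shutdownNow();
        }
    }
}
```

The measured lag grows with whatever is queued ahead of the sentinel, which is why a per-loop probe can reveal an overloaded loop even when the group as a whole looks idle.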

@bclozel
Contributor Author

bclozel commented Apr 4, 2023

This PR focuses on Netty 4.x. Netty 5.x is still in alpha, and we can add support for it at any time.
I'll add a few notes here to highlight important points regarding the binder setup and the metrics themselves.

MeterBinder setup

The Reactor team is using a ConcurrentMap cache to avoid binding metrics to the same allocator/event loop multiple times. Since allocator and event loop resources can be configured in multiple ways, the project chose to instrument those lazily at runtime as they are encountered during channel initialization.

I initially adopted that approach but rolled it back for two reasons:

  • this is quite unusual in MeterBinder implementations
  • it works for Reactor Netty as it relies on a single registry, but supporting multiple registries would need a more complicated setup

Maybe this cache could still be handled as a Reactor Netty opinion while leveraging the binders provided here?
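
To make the multi-registry concern concrete, here is a JDK-only sketch of a bind-once guard keyed by registry and resource. It is an assumption about what such a cache could look like, not Reactor Netty's or Micrometer's code: `BindOnceGuard` and `bindOnce` are hypothetical names, and the `Runnable` stands in for a call such as `new NettyAllocatorMetrics(...).bindTo(registry)`.

```java
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/**
 * Hypothetical guard that runs a binder at most once per (registry, resource) pair.
 * R is the registry type and T the instrumented resource (allocator or event loop).
 */
final class BindOnceGuard<R, T> {

    // registry -> resources already bound to that registry
    private final Map<R, Set<T>> bound = new ConcurrentHashMap<>();

    /** Runs the binder only the first time this pair is seen; returns true if it ran. */
    boolean bindOnce(R registry, T resource, Runnable binder) {
        // Set.add is atomic on a concurrent key set, so only one caller wins the race.
        boolean first = bound.computeIfAbsent(registry, r -> ConcurrentHashMap.newKeySet()).add(resource);
        if (first) {
            binder.run();
        }
        return first;
    }
}
```

Note that this sketch keeps strong references to every registry and resource it has seen and relies on their `equals`/`hashCode`, so a real implementation would need to decide how entries are released.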

Metrics names configuration

In my initial proposal I said that I would try to provide a way to customize metric names. Because of the number and structure of metrics, this PR doesn't allow that. Instead, I think that libraries and apps could use a MeterFilter to rewrite metric names on the fly with a custom prefix. Is that acceptable? See the next section for the actual metrics.
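
The renaming rule itself is small. The sketch below shows it as a plain JDK function, assuming a hypothetical prefix and a hypothetical `MetricNamePrefixer` helper; in Micrometer, equivalent logic would typically live in a `MeterFilter` whose `map(Meter.Id)` method returns `id.withName(...)`.

```java
import java.util.function.UnaryOperator;

/** Hypothetical helper that prefixes only the Netty meter names and leaves others untouched. */
final class MetricNamePrefixer {

    private MetricNamePrefixer() {
    }

    static UnaryOperator<String> prefixNettyMeters(String prefix) {
        // Only names produced by the Netty binders start with "netty."
        return name -> name.startsWith("netty.") ? prefix + "." + name : name;
    }

    public static void main(String[] args) {
        UnaryOperator<String> rename = prefixNettyMeters("myapp");
        System.out.println(rename.apply("netty.allocator.memory.used"));
        System.out.println(rename.apply("jvm.memory.used"));
    }
}
```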

Metrics

  • "netty.allocator.memory.used" - Size of memory used by the allocator, in bytes
    Tags: "id" (unique id for the allocator), "allocator.type" (pooled, unpooled), "memory.type" (heap, direct)
  • "netty.allocator.memory.pinned" - Size of memory used by allocated buffers, in bytes.
    Tags: "id" (unique id for the allocator), "allocator.type" (pooled, unpooled), "memory.type" (heap, direct)
  • "netty.allocator.pooled.arenas" - Number of Arenas for a pooled allocator.
    Tags: "id" (unique id for the allocator), "allocator.type" (pooled, unpooled), "memory.type" (heap, direct)
  • "netty.allocator.pooled.cache.size" - Size of the cache for a pooled allocator, in bytes.
    Tags: "id" (unique id for the allocator), "allocator.type" (pooled, unpooled), "cache.type" (normal, small)
  • "netty.allocator.pooled.threadlocal.caches" - Number of ThreadLocal caches for a pooled allocator.
    Tags: "id" (unique id for the allocator), "allocator.type" (pooled, unpooled)
  • "netty.allocator.pooled.chunk.size" - Size of memory chunks for a pooled allocator, in bytes.
    Tags: "id" (unique id for the allocator), "allocator.type" (pooled, unpooled)
  • "netty.eventexecutor.tasks.pending" - Number of pending tasks in the event executor.
    Tags: "name" (unique name for the event executor)

This PR does not instrument the DNS infrastructure and I didn't dig much in that area. I'm not sure we can use a DnsQueryLifecycleObserver to record Timers, maybe only Counters are possible?

Prometheus format sample

Here is a sample of Prometheus format I captured while testing the instrumentation on a running Netty server.

# HELP netty_eventexecutor_tasks_pending  
# TYPE netty_eventexecutor_tasks_pending gauge
netty_eventexecutor_tasks_pending{name="nioEventLoopGroup-3-2",} 0.0
netty_eventexecutor_tasks_pending{name="nioEventLoopGroup-3-1",} 0.0
netty_eventexecutor_tasks_pending{name="nioEventLoopGroup-3-5",} 0.0
netty_eventexecutor_tasks_pending{name="nioEventLoopGroup-3-4",} 0.0
netty_eventexecutor_tasks_pending{name="nioEventLoopGroup-3-3",} 0.0
# HELP netty_allocator_memory_used  
# TYPE netty_allocator_memory_used gauge
netty_allocator_memory_used{allocator_type="pooled",id="814169746",memory_type="heap",} 2.097152E7
netty_allocator_memory_used{allocator_type="pooled",id="814169746",memory_type="direct",} 2.097152E7
# HELP netty_allocator_pooled_cache_size  
# TYPE netty_allocator_pooled_cache_size gauge
netty_allocator_pooled_cache_size{allocator_type="pooled",cache_type="small",id="814169746",} 256.0
netty_allocator_pooled_cache_size{allocator_type="pooled",cache_type="normal",id="814169746",} 64.0
# HELP netty_allocator_memory_pinned  
# TYPE netty_allocator_memory_pinned gauge
netty_allocator_memory_pinned{allocator_type="pooled",id="814169746",memory_type="heap",} 5734400.0
netty_allocator_memory_pinned{allocator_type="pooled",id="814169746",memory_type="direct",} 0.0
# HELP netty_allocator_pooled_arenas  
# TYPE netty_allocator_pooled_arenas gauge
netty_allocator_pooled_arenas{allocator_type="pooled",id="814169746",memory_type="heap",} 16.0
netty_allocator_pooled_arenas{allocator_type="pooled",id="814169746",memory_type="direct",} 16.0
# HELP netty_allocator_pooled_threadlocal_caches  
# TYPE netty_allocator_pooled_threadlocal_caches gauge
netty_allocator_pooled_threadlocal_caches{allocator_type="pooled",id="814169746",} 5.0
# HELP netty_allocator_pooled_chunk_size  
# TYPE netty_allocator_pooled_chunk_size gauge
netty_allocator_pooled_chunk_size{allocator_type="pooled",id="814169746",} 4194304.0
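
For reading the sample above, a small JDK-only sketch may help. `PrometheusSampleReader` is a hypothetical helper: `toPrometheusName` only mirrors the dot-to-underscore mapping visible in this sample (Micrometer's actual Prometheus naming convention applies more rules than this), and `bytesToMiB` converts the byte values into mebibytes.

```java
/** Hypothetical helper for relating the meter names and byte values to the Prometheus sample. */
final class PrometheusSampleReader {

    private PrometheusSampleReader() {
    }

    /** Mirrors the mapping seen in the sample: dots in meter and tag names become underscores. */
    static String toPrometheusName(String meterName) {
        return meterName.replace('.', '_');
    }

    /** Converts a byte count to mebibytes (1 MiB = 1024 * 1024 bytes). */
    static double bytesToMiB(double bytes) {
        return bytes / (1024.0 * 1024.0);
    }

    public static void main(String[] args) {
        System.out.println(toPrometheusName("netty.allocator.memory.used"));
        System.out.println("used:   " + bytesToMiB(2.097152E7) + " MiB");
        System.out.println("chunk:  " + bytesToMiB(4194304.0) + " MiB");
        System.out.println("pinned: " + bytesToMiB(5734400.0) + " MiB");
    }
}
```

With these conversions, the sample reads as 20 MiB used per memory type, a 4 MiB chunk size, and about 5.47 MiB of pinned heap memory.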

@shakuzen
Member

shakuzen commented Apr 5, 2023

Instead, I think that libraries and apps could use a MeterFilter to rewrite metric names on the fly with a custom prefix. Is that acceptable?

As someone who isn't a Netty expert, I think so. I'd love to hear from others who know more about Netty usage than me, though. In the past, the strong need to customize the metric name at the binder level came from there being multiple instances of the instrumented thing with potentially different tags on it, like ExecutorServiceMetrics. Will there be multiple instances of the binder in an app with a need to distinguish between the metrics from each? Netty being shaded was something I thought about, but if the package is different, these binders won't be usable anyway.

This PR does not instrument the DNS infrastructure and I didn't dig much in that area.

I think it's fine to leave that as out-of-scope for this PR. If any users would like us to add this, please open an issue requesting it (or a pull request).

if (eventExecutor instanceof SingleThreadEventExecutor) {
    SingleThreadEventExecutor singleThreadEventExecutor = (SingleThreadEventExecutor) eventExecutor;
    names.add(singleThreadEventExecutor.threadProperties().name());
    new NettyEventExecutorMetrics(eventExecutor).bindTo(this.registry);
Member


Without being a Netty expert, having users loop like this to bind each feels a bit weird to me. Would there be a reason a user would want metrics for some event executors in an EventExecutorGroup but not others? I wonder if we should use a higher level abstraction in NettyEventExecutorMetrics and add metrics for each executor for users rather than make them bind each one individually.

Contributor

@violetagg violetagg Apr 5, 2023


I don't get this question.
Basically, you have an EventLoopGroup with EventLoops. Every EventLoop has a name and a queue of pending tasks. You are proposing to have metrics on the EventLoopGroup, is that correct? Typically, the EventLoops are not equally loaded.

Member


I'm saying we should have metrics on all of the EventLoops like this test does, but without making users do new NettyEventExecutorMetrics(eventExecutor).bindTo(this.registry) for each individual EventExecutor. Instead, we could take the EventLoopGroup as a parameter and register metrics for each EventExecutor, so users only need to call, e.g., new NettyEventExecutorMetrics(eventLoopGroup).bindTo(this.registry) once rather than iterating over each element as they do now. Basically, the question is: does it make sense to keep things as granular as they are now? That only makes sense to me if there is a case where you would want metrics for some EventLoops in a group but not all of them. If you always want all of them, we should just take the group as a parameter and iterate internally so users don't have to. Does that make more sense?

Contributor


I agree with you, it is just easier in Reactor Netty at this point to do it per EventLoop. Definitely you want metrics for all EventLoops in the EventLoopGroup.

Contributor Author


I guess I could add a constructor variant that takes the entire EventLoopGroup?
This implementation was targeting the "lazy" case where the current EventLoop is found in the given channel during its initialization. Something like:

@Override
public void initChannel(SocketChannel channel) throws Exception {
  ByteBufAllocator alloc = channel.alloc();
  if (alloc instanceof ByteBufAllocatorMetricProvider) {
    // this concurrent check must be implemented by Micrometer users;
    // bind only if this allocator has not been instrumented yet
    if (!isAllocatorInstrumented(alloc)) {
      new NettyAllocatorMetrics((ByteBufAllocatorMetricProvider) alloc).bindTo(prometheusRegistry);
    }
  }
  // this concurrent check must be implemented by Micrometer users;
  // bind only if this event loop has not been instrumented yet
  if (!isEventLoopInstrumented(channel.eventLoop())) {
    new NettyEventExecutorMetrics(channel.eventLoop()).bindTo(prometheusRegistry);
  }
  channel.pipeline().addLast(new HttpRequestDecoder());
  channel.pipeline().addLast(new HttpResponseEncoder());
  channel.pipeline().addLast(new CustomHttpServerHandler());
}

@violetagg
Contributor

In case you are interested, it is possible to get these additional metrics out of Netty event loops: netty/netty#9080

Sadly, I never completed it, but it should be simple.

The same goes for netty/netty#11293 (comment): use a sentinel periodic task per event loop to measure how busy it is.

Yep that's something that Reactor Netty has also in its backlog reactor/reactor-netty#1433

@violetagg
Contributor

violetagg commented Apr 5, 2023

I think that libraries and apps could use a MeterFilter to rewrite metric names on the fly with a custom prefix. Is that acceptable?

Reactor Netty will need to change the name if we want to keep backwards compatibility ...

@bclozel
Contributor Author

bclozel commented Apr 5, 2023

In case you are interested, it is possible to get these additional metrics out of Netty event loops: netty/netty#9080

Sadly, I never completed it, but it should be simple.

The same goes for netty/netty#11293 (comment): use a sentinel periodic task per event loop to measure how busy it is.

@franz1981 @violetagg I think this type of instrumentation really belongs in Netty directly. I'd be happy to expand metrics here once this API is available in Netty. We do maintain more involved instrumentations, but they usually rely on official extension points that are not likely to change.

@bclozel
Contributor Author

bclozel commented Apr 5, 2023

I've just pushed additional changes in a separate commit that:

  • change the "allocator.type" tag for allocator metrics so that it now holds the actual Java simple class name, e.g. "UnpooledByteBufAllocator"
  • make the binder for executor metrics accept Iterable<EventExecutor>, which means both EventLoopGroup and EventLoop types are compatible.
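
To show why an Iterable parameter covers both cases, here is a JDK-only model. In Netty 4.x, `EventExecutorGroup` extends `Iterable<EventExecutor>` and `EventExecutor` extends `EventExecutorGroup`, so a single executor can be iterated as well. `FakeExecutor` and `PendingTaskGauges` below are hypothetical stand-ins, not the classes added in this PR, and a plain map plays the role of the meter registry.

```java
import java.util.Collections;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.IntSupplier;

/** Stand-in for an event executor: iterating a single executor yields only itself. */
final class FakeExecutor implements Iterable<FakeExecutor> {

    final String name;

    volatile int pendingTasks;

    FakeExecutor(String name, int pendingTasks) {
        this.name = name;
        this.pendingTasks = pendingTasks;
    }

    @Override
    public Iterator<FakeExecutor> iterator() {
        return Collections.singleton(this).iterator();
    }
}

/** Stand-in binder: registers one pending-tasks gauge per executor found in the Iterable. */
final class PendingTaskGauges {

    private PendingTaskGauges() {
    }

    static Map<String, IntSupplier> bind(Iterable<FakeExecutor> executors) {
        Map<String, IntSupplier> gauges = new LinkedHashMap<>();
        for (FakeExecutor executor : executors) {
            // The gauge reads the live value each time it is sampled.
            gauges.put(executor.name, () -> executor.pendingTasks);
        }
        return gauges;
    }
}
```

Passing a list of executors (the group case) produces one gauge per member, while passing a single executor produces exactly one gauge, so callers no longer need to loop themselves.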

I will squash this commit before merging this PR, once we're done with the review cycle.

@franz1981

@bclozel

I think this type of instrumentation really belongs in Netty directly. I'd be happy to expand metrics here once this API is available in Netty

For netty/netty#9080, I think that's fine, but please consider that in Netty we don't modify public APIs unless we mark them as Unstable (at least in Netty 4.1; for Netty 5, I have no idea).

netty/netty#11293 (comment)

This one is different, because it is something that Netty cannot provide or decide on its own, assuming the dynamic that allows it to work is clear. If not, I can explain it better here instead.

Member

@shakuzen shakuzen left a comment


Looks good to me. I would like to get rid of id on the allocator metrics if possible because it doesn't seem particularly meaningful to a user looking at the metrics. Due to my lack of experience with Netty, I don't know if there would ever be multiple instances of the same type of allocator in the same app to instrument. If not, it seems like we could get rid of id. However, it does seem theoretically possible for there to be multiple allocator instances of the same type.
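
The trade-off can be sketched with plain JDK code. This is an illustration under assumptions, not the binder's implementation: `AllocatorTags` is a hypothetical helper, and deriving the id from `System.identityHashCode` is an assumption (identity hashes are usually, but not guaranteed to be, distinct per instance). Without an id, two instances of the same class produce identical tag sets and would collapse into one time series.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper that builds the tag set for an allocator-like object. */
final class AllocatorTags {

    private AllocatorTags() {
    }

    static Map<String, String> tagsFor(Object allocator, boolean includeId) {
        Map<String, String> tags = new LinkedHashMap<>();
        tags.put("allocator.type", allocator.getClass().getSimpleName());
        if (includeId) {
            // Assumption: an identity-based id is used to tell instances apart.
            tags.put("id", String.valueOf(System.identityHashCode(allocator)));
        }
        return tags;
    }

    public static void main(String[] args) {
        Object first = new StringBuilder();
        Object second = new StringBuilder();
        // Same type, so the tag sets are equal when the id is left out.
        System.out.println(tagsFor(first, false).equals(tagsFor(second, false)));
        System.out.println(tagsFor(first, true));
        System.out.println(tagsFor(second, true));
    }
}
```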

* @since 1.11.0
* @see NettyMeters
*/
public class NettyAllocatorMetrics implements MeterBinder {
Member


Could we add a typical usage example to the Javadoc? I don't know whether it will be obvious to Netty users, from the API defined here, how and where to get the type to pass to the constructor.


@alesj alesj Apr 24, 2023


Tests do show this to some extent, but they are far from real usage, so +1 on @shakuzen's suggestion.

Contributor Author


@alesj we've added code snippets in the reference documentation. Does this work for you?

This commit adds two new `MeterBinder` implementations for instrumenting
Netty 4.x: `NettyAllocatorMetrics` and `NettyEventExecutorMetrics`.

`NettyAllocatorMetrics` will instrument any `ByteBufAllocatorMetricProvider`
and gather information about heap/direct memory allocated; additional
metrics are provided for pooled allocators.

`NettyEventExecutorMetrics` will instrument `Iterable<EventExecutor>`
(typically, `EventLoop` or `EventLoopGroup` instances) and count the
number of pending tasks for all.

Metrics and tags are described in the `NettyMeters` class.

Closes gh-522
@bclozel bclozel merged commit d985e62 into micrometer-metrics:main Apr 6, 2023
1 check passed
izeye added a commit to izeye/micrometer that referenced this pull request Apr 13, 2023
@bclozel bclozel deleted the netty-metrics branch April 24, 2023 08:12

Successfully merging this pull request may close these issues.

Metrics support for Netty allocators and event executors
5 participants