
Make No. of Transport Threads == Available CPUs #56488

Conversation

original-brownbear
Member

@original-brownbear original-brownbear commented May 9, 2020

We never do any file IO or other blocking work on the transport threads, so no tangible benefit can be derived from using more threads than CPUs for IO.
There are, however, significant downsides to using more threads than necessary with Netty in particular. Since we use the default setting for `io.netty.allocator.useCacheForAllThreads`, which is `true`, we end up using up to `16MB` of thread-local buffer cache for each transport thread (as Tim points out, this is the upper bound / worst-case scenario, not a fixed number).
Meaning we potentially waste up to 2 * CPUs * 16MB of heap across both the tcp and http transports.
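
For illustration, a minimal sketch of the sizing argued for here, assuming Netty's `NioEventLoopGroup`; the class and variable names are hypothetical and this is not the actual Elasticsearch change:

```java
import io.netty.channel.nio.NioEventLoopGroup;

// Hypothetical sketch: size each transport's worker group to the CPUs visible
// to the JVM instead of a multiple of them.
public class TransportThreadSizing {
    public static void main(String[] args) {
        int workerCount = Runtime.getRuntime().availableProcessors();

        // One worker group per transport (tcp and http). Every worker thread that
        // allocates from Netty's pooled allocator may hold a thread-local buffer
        // cache, so fewer threads means a smaller worst-case cache footprint.
        NioEventLoopGroup tcpWorkers = new NioEventLoopGroup(workerCount);
        NioEventLoopGroup httpWorkers = new NioEventLoopGroup(workerCount);

        // The Netty default discussed above; prints null unless the property was
        // overridden explicitly on the command line.
        System.out.println(System.getProperty("io.netty.allocator.useCacheForAllThreads"));

        tcpWorkers.shutdownGracefully();
        httpWorkers.shutdownGracefully();
    }
}
```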
@original-brownbear original-brownbear added :Distributed/Network Http and internode communication implementations team-discuss labels May 9, 2020
@elasticmachine
Collaborator

Pinging @elastic/es-distributed (:Distributed/Network)

@elasticmachine elasticmachine added the Team:Distributed Meta label for distributed team label May 9, 2020
@Tim-Brooks
Contributor

This is a discuss, so I'm sure we will discuss it. I'm on board with this, and I will also bring back and update the shared event loops PR.

we end up using 16MB of thread local buffer cache for each transport thread.

These chunks are allocated in arenas, and the number of arenas is by default the number of CPUs. The setting `useCacheForAllThreads` impacts caching recently used `ByteBuf`s thread-locally. Fewer threads mean that the allocations (usually 32KB for TLS or 64KB socket reads) from the arena will see more efficient reuse in the cache, but it is only related to the number of chunks allocated through a level of indirection.
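
As a rough sketch of the interaction described above, assuming Netty 4.1's `PooledByteBufAllocator` accessors (the surrounding snippet is illustrative, not Elasticsearch code):

```java
import io.netty.buffer.ByteBuf;
import io.netty.buffer.PooledByteBufAllocator;

public class AllocatorDefaults {
    public static void main(String[] args) {
        // The arena count tracks the CPU count by default, so chunk allocation
        // itself does not grow with extra transport threads.
        System.out.println("direct arenas: " + PooledByteBufAllocator.defaultNumDirectArena());

        // Each allocating thread may additionally keep a thread-local cache of
        // recently used buffers when useCacheForAllThreads is true (the default).
        System.out.println("cache for all threads: " + PooledByteBufAllocator.defaultUseCacheForAllThreads());
        System.out.println("normal cache size: " + PooledByteBufAllocator.defaultNormalCacheSize());

        // A transport-sized allocation (e.g. a 64KB socket read buffer); with
        // fewer threads these buffers are reused more effectively per thread.
        ByteBuf readBuffer = PooledByteBufAllocator.DEFAULT.directBuffer(64 * 1024);
        readBuffer.release();
    }
}
```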

@original-brownbear
Member Author

Thanks Tim, updated the OP with your input. The 16MB/thread is the upper bound (if getting really unlucky), not a set number.
Now that #46346 is merged this change is less impactful, but I think the same arguments still apply.

@original-brownbear
Member Author

We discussed this during our team meeting and since there were no objections to doing this I'm removing the discuss label.

@original-brownbear
Member Author

@elasticmachine update branch

Contributor

@Tim-Brooks Tim-Brooks left a comment


LGTM

@original-brownbear
Member Author

Thanks Tim!

@original-brownbear original-brownbear merged commit c98ceb8 into elastic:master May 14, 2020
@original-brownbear original-brownbear deleted the halve-transport-thread-counts branch May 14, 2020 16:18
original-brownbear added a commit to original-brownbear/elasticsearch that referenced this pull request May 14, 2020
We never do any file IO or other blocking work on the transport threads
so no tangible benefit can be derived from using more threads than CPUs
for IO.
There are however significant downsides to using more threads than necessary
with Netty in particular. Since we use the default setting for
`io.netty.allocator.useCacheForAllThreads` which is `true` we end up
using up to `16MB` of thread local buffer cache for each transport thread.
Meaning we potentially waste CPUs * 16MB of heap for unnecessary IO threads, in addition to the obvious inefficiency of artificially adding extra context switches.
original-brownbear added a commit that referenced this pull request May 14, 2020
Labels
:Distributed/Network Http and internode communication implementations >enhancement Team:Distributed Meta label for distributed team v7.9.0 v8.0.0-alpha1