Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

EnvFilter lockfree implementation #2951

Open
TroyKomodo opened this issue Apr 24, 2024 · 7 comments
Open

EnvFilter lockfree implementation #2951

TroyKomodo opened this issue Apr 24, 2024 · 7 comments

Comments

@TroyKomodo
Copy link

Feature Request

Crates

tracing_subscriber

Motivation

Having a lock free implementation would be very nice performance boost.

Proposal

We can already do something like this if we use the registry lookup span and extensions. however env filter currently does not require that trait.

Alternatives

Perhaps disable the lock code paths if there are no dynamic filters.

@TroyKomodo
Copy link
Author

Something like this, perhaps we can have a code path when the registry has this trait, not sure.
patch.txt

@mladedav
Copy link
Contributor

perhaps we can have a code path when the registry has this trait, not sure.

This sounds like specialization, so I'm afraid this will not be possible anytime soon. Or rather there would need to be two filter inplementations, one general and one with lookup and the user would have to choose which one to use.

That is if we want to avoid breaking changes which adding a bound there is.

@TroyKomodo
Copy link
Author

Is it possible to use the slab storage directly in the env_filter rather than using a rwlock?

@TroyKomodo
Copy link
Author

I also think at a minimum its worth considering only doing anything with the 2 mutexs if there is at least 1 dynamic directive.

@TroyKomodo
Copy link
Author

TroyKomodo commented Apr 24, 2024

I have done a benchmark using https://docs.rs/scc/2.1.0/scc/hash_map/struct.HashMap.html as the hash map rather then the RwLock<HashMap<_, _>>

by_id: RwLock<HashMap<span::Id, directive::SpanMatcher>>,
by_cs: RwLock<HashMap<callsite::Identifier, directive::CallsiteMatcher>>,

normal env filter

Running 10s test @ http://127.0.0.1:8080/hello
  40 threads and 1000 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     1.08ms    1.98ms  51.93ms   93.00%
    Req/Sec    32.08k     4.44k   93.11k    77.15%
  Latency Distribution
     50%  627.00us
     75%    1.15ms
     90%    2.42ms
     99%    5.30ms
  12868373 requests in 10.10s, 1.91GB read
Requests/sec: 1274081.70
Transfer/sec:    193.19MB

image

env filter with scc hash map

Running 10s test @ http://127.0.0.1:8080/hello
  40 threads and 1000 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   371.94us  846.78us  30.63ms   97.67%
    Req/Sec    73.95k     2.93k  102.92k    80.79%
  Latency Distribution
     50%  280.00us
     75%  391.00us
     90%  550.00us
     99%    1.57ms
  29728054 requests in 10.10s, 4.40GB read
Requests/sec: 2943392.96
Transfer/sec:    446.32MB

image

level filter

Running 10s test @ http://127.0.0.1:8080/hello
  40 threads and 1000 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   426.64us    1.58ms  45.88ms   99.59%
    Req/Sec    73.53k     4.19k  101.11k    88.06%
  Latency Distribution
     50%  272.00us
     75%  408.00us
     90%  625.00us
     99%    1.55ms
  29544121 requests in 10.10s, 4.37GB read
Requests/sec: 2925205.39
Transfer/sec:    443.56MB

image

From looking at these results it seems that env filter now adds no overhead and still works the exact same way. In the env filter i can still filter by spans and variables. The only difference is there is no mutex.

The reason its a big difference is because i am using a 128 core machine. Repeating the test but only using 4 cores.

normal env filter

Running 10s test @ http://127.0.0.1:8080/hello
  40 threads and 1000 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     5.65ms    5.41ms  96.66ms   86.83%
    Req/Sec     5.66k   645.49     9.36k    71.33%
  Latency Distribution
     50%    4.38ms
     75%    4.92ms
     90%   13.56ms
     99%   22.64ms
  2272637 requests in 10.10s, 344.61MB read
Requests/sec: 225020.20
Transfer/sec:     34.12MB

env filter with scc hash map

Running 10s test @ http://127.0.0.1:8080/hello
  40 threads and 1000 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     5.61ms    5.92ms  74.38ms   86.51%
    Req/Sec     6.20k   764.95    10.71k    71.84%
  Latency Distribution
     50%    3.99ms
     75%    4.52ms
     90%   15.37ms
     99%   25.34ms
  2487946 requests in 10.10s, 377.26MB read
Requests/sec: 246345.68
Transfer/sec:     37.35MB

level filter

Running 10s test @ http://127.0.0.1:8080/hello
  40 threads and 1000 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     4.16ms    1.47ms  53.88ms   96.18%
    Req/Sec     6.13k   410.84     9.86k    97.67%
  Latency Distribution
     50%    4.28ms
     75%    4.42ms
     90%    4.59ms
     99%    4.83ms
  2462271 requests in 10.10s, 373.36MB read
Requests/sec: 243786.75
Transfer/sec:     36.97MB

@mladedav
Copy link
Contributor

What do you mean by "level filter"?

It seems completely reasonable to me to replace the locked hashmap with a concurrent variant. I'd just like to check if we don't already (transitively) depend on another one (flurry maybe?) so that we wouldn't have to add different dependency.

@TroyKomodo
Copy link
Author

What do you mean by "level filter"?

https://docs.rs/tracing/latest/tracing/level_filters/struct.LevelFilter.html

It seems completely reasonable to me to replace the locked hashmap with a concurrent variant. I'd just like to check if we don't already (transitively) depend on another one (flurry maybe?) so that we wouldn't have to add different dependency.

That sounds good, the scc crate is 2 deps i believe, scc itself and the backer crate
image
image

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants