chore: cache calls to signing key #15337

Draft
wants to merge 1 commit into main

Conversation

miketheman (Member)

With this simple caching mechanism, each running instance should only have to make a single call at its first instantiation, and cache the result for the lifetime of the process.

This call rarely fails, and adds ~200ms to each inbound hook, so caching across requests should cut down the time it takes to complete the processing.

Instead of using a Redis cache and worrying about cache expiration strategies, if the cached key ever becomes invalid, a process restart will evict the in-memory cache and trigger a new HTTP call for the key.

Resolves #4463
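
For illustration, a minimal sketch of this kind of process-lifetime memoization, assuming a requests-based fetch; the function name and URL handling are hypothetical stand-ins, not the actual warehouse code:

# A minimal sketch of the caching idea, not the actual warehouse code.
from functools import lru_cache

import requests


@lru_cache(maxsize=None)
def fetch_signing_certificate(cert_url: str) -> bytes:
    # The first call in each worker process performs the ~200ms HTTP request;
    # later calls with the same URL return the cached bytes for the lifetime
    # of the process. A restart naturally evicts the cache.
    response = requests.get(cert_url, timeout=10)
    response.raise_for_status()
    return response.content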

Signed-off-by: Mike Fiedler <miketheman@gmail.com>
@miketheman miketheman requested a review from a team as a code owner February 6, 2024 21:00
ewdurbin (Member) left a comment

from a pure "ops" perspective i'm -1 on the assumed recovery scenario of restarting the process. while it is straightforward, how are we to recall that this could be the necessary action if it happened?

ewdurbin (Member) commented Feb 6, 2024

our gunicorn configuration simultaneously makes recovery a non-issue and reduces the impact of this cache:

max_requests = 2048
max_requests_jitter = 128

i'm now -0
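
For context, those settings behave roughly as sketched below in a gunicorn.conf.py; the comments are explanatory additions, not part of the warehouse configuration:

# gunicorn.conf.py (sketch using the values quoted above)
max_requests = 2048        # gracefully recycle a worker after it has served this many requests
max_requests_jitter = 128  # randomize each worker's limit so they don't all restart at the same time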

miketheman (Member, Author)

That is interesting, didn't know that. Does this mean we are recycling worker processes every few minutes now?
Since we don't set the workers value, we default to 1, so we run a single web worker per instance.

We have 40 deployed instances of web workers right now. With a request rate of ~500 requests per second (imprecise, but demonstrative) and each request taking ~150ms to complete (again, an estimation), they'd all exhaust their request budget and restart after ~2.5 minutes or so.
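
As a back-of-envelope check of that estimate (all inputs are the rough figures above, not measurements):

instances = 40          # deployed web worker instances, one gunicorn worker each
total_rps = 500         # approximate fleet-wide requests per second
max_requests = 2048     # gunicorn recycle threshold

per_instance_rps = total_rps / instances          # ~12.5 requests/second per worker
minutes_to_recycle = max_requests / per_instance_rps / 60
print(f"{minutes_to_recycle:.1f} minutes")        # ~2.7 minutes, roughly in line with the ~2.5 minute estimate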

Does that align with your understanding of our current universe? If it does, should we revisit the max_requests value from 2018?

I couldn't find a Redis-backed cache pattern in the codebase yet; I guess we don't do that very much, unless I'm looking in the wrong place?

ewdurbin (Member) commented Feb 7, 2024

Yes, this is effectively a diaper for memory leaks. Each worker gracefully reloads after 1920-2176 requests. This was put in place as our web processes slowly leaked memory over time, leading to non-graceful restarts.

Ideally we just don't leak memory! I'd reconsider the value if we had demonstrably stable memory usage without.

ewdurbin (Member) commented Feb 7, 2024

Note: We also tune worker count via the WEB_CONCURRENCY env var rather than hardcoding it, so we don't have to push code to tune it.

miketheman (Member, Author) commented Feb 7, 2024

> Ideally we just don't leak memory! I'd reconsider the value if we had demonstrably stable memory usage without.

Makes sense! Did you have a way to test/verify memory leakage/profiling prior to production changes in mind, or is this a "let's double max_requests and observe" kind of thing?

> Note: We also tune worker count via the WEB_CONCURRENCY env var rather than hardcoding it, so we don't have to push code to tune it.

WEB_CONCURRENCY was removed circa 2015 in https://github.com/pypi/warehouse/pull/741/files#diff-0a99231995da379e7aebabe76c9d849a23737a42c3b3a8994043e2aa80958424 and I can't find another tunable for gunicorn

ewdurbin (Member) commented Feb 7, 2024

WEB_CONCURRENCY is native to gunicorn: https://docs.gunicorn.org/en/stable/settings.html#workers
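
For reference, gunicorn derives the default for its workers setting from that environment variable, roughly as in this simplified sketch (not gunicorn's exact source):

import os

# gunicorn's `workers` setting defaults to WEB_CONCURRENCY when it is set,
# so worker count can be tuned per environment without a code push.
workers = int(os.environ.get("WEB_CONCURRENCY", 1))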

miketheman (Member, Author)

> WEB_CONCURRENCY is native to gunicorn: docs.gunicorn.org/en/stable/settings.html#workers

🤯 That'll learn me to read the docs more

ewdurbin (Member) commented Feb 7, 2024

> Makes sense! Did you have a way to test/verify memory leakage/profiling prior to production changes in mind, or is this a "let's double max_requests and observe" kind of thing?

Measured via Datadog, couldn't track it down. Gave up!

@miketheman miketheman added the "blocked" label (Issues we can't or shouldn't get to yet) Feb 7, 2024
@miketheman miketheman marked this pull request as draft February 7, 2024 17:10
ewdurbin (Member) commented Feb 8, 2024

FWIW the leak appears to still exist; here's a sample trace showing what I recall: a sudden spike in memory usage that remains until the worker is reaped.
[Screenshot: memory usage trace, 2024-02-08 9:58 AM]

Labels: blocked (Issues we can't or shouldn't get to yet)
Projects: None yet
Development: Successfully merging this pull request may close these issues:

Add resilience to the network call to fetch the SNS signing certificate

2 participants