Add resilience to the network call to fetch the SNS signing certificate #4463

dstufft · 2018-08-04T04:37:25Z

Whenever we're verifying an SNS message, we have to fetch the public certificate from an HTTP URL provided to us by Amazon. If fetching it fails for any reason, we error out and rely on SNS retrying the request to get the message accurately recorded.

We can do better!

There are two possible strategies I can think of here, and the right answer might be to use one or the other, or both.

Cache the public key.
- The HTTP response at the URL does not indicate that it can be cached; however, AWS has indicated on the AWS forums that if/when they change the certificate, they will serve it from a different URL. That means one option here is to simply cache the signing certificate for a long time. This could be either a simple in-memory cache (in which case we will refetch it any time we restart the process) or Redis storing the cached signing certificate, keyed by its URL, so that the cache survives restarts, is shared among processes, etc.
- This cache should expire somehow, probably via some sort of LRU that keeps a fixed number of keys but evicts older ones when needed.
Add retries.
- Whenever we get an error, simply try fetching it again! This will make the HTTP request take longer, and whatever network error is affecting us may last longer than we're willing to let a single request take, so it doesn't eliminate the problem, but it makes us survive momentary blips better.
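Both strategies above can be combined. The following is a minimal sketch, assuming a Python service: a bounded in-memory LRU cache keyed by the certificate URL, with a retrying fetch only on cache misses. The names `http_fetch`, `fetch_with_retries`, `CertificateCache`, and the default attempt count, backoff, and cache size are hypothetical illustrations, not Warehouse's actual code. The HTTP fetcher is passed in so it can be swapped out or stubbed.

```python
import time
import urllib.request
from collections import OrderedDict
from typing import Callable

# Hypothetical signature: takes a certificate URL, returns the PEM bytes.
Fetcher = Callable[[str], bytes]


def http_fetch(url: str, timeout: float = 5.0) -> bytes:
    """Download the PEM-encoded signing certificate at ``url``."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()


def fetch_with_retries(fetch: Fetcher, url: str, attempts: int = 3, backoff: float = 0.5) -> bytes:
    """Call ``fetch(url)``, retrying network errors with exponential backoff."""
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    for attempt in range(1, attempts + 1):
        try:
            return fetch(url)
        except OSError:  # URLError, HTTPError and socket timeouts all subclass OSError
            if attempt == attempts:
                raise  # out of retries: surface the error so SNS redelivers later
            time.sleep(backoff * 2 ** (attempt - 1))  # 0.5s, 1s, 2s, ...


class CertificateCache:
    """Bounded LRU cache of signing certificates, keyed by certificate URL.

    Assumes AWS serves a changed certificate from a new URL, so a cached
    entry never goes stale; eviction only bounds memory use.
    """

    def __init__(self, fetch: Fetcher = http_fetch, maxsize: int = 32,
                 attempts: int = 3, backoff: float = 0.5) -> None:
        self._fetch = fetch
        self._maxsize = maxsize
        self._attempts = attempts
        self._backoff = backoff
        self._entries: "OrderedDict[str, bytes]" = OrderedDict()

    def get(self, url: str) -> bytes:
        if url in self._entries:
            self._entries.move_to_end(url)  # mark as most recently used
            return self._entries[url]
        # Cache miss: only now do we pay for the (retried) network call.
        cert = fetch_with_retries(self._fetch, url, self._attempts, self._backoff)
        self._entries[url] = cert
        if len(self._entries) > self._maxsize:
            self._entries.popitem(last=False)  # evict the least recently used entry
        return cert
```

Caching first means retries only come into play on a miss, so the extra latency they add is paid rarely. This class is not thread-safe; if one instance is shared across threads, a lock around `get` would be needed.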

My opinion is that I'd start with caching, ideally with a Redis-based cache, and see where that leaves us. It will likely make the failures infrequent enough not to be worth worrying about, and it will make verifying the signature faster as well.
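A Redis-backed cache could look roughly like the sketch below, assuming the redis-py client, whose `get(name)` and `set(name, value, ex=...)` methods match the small `KeyValueStore` protocol used here. `RedisCertificateCache`, the key prefix, and the one-week TTL are hypothetical. Note that this swaps the LRU idea for a per-key TTL; Redis can also evict keys LRU-style server-side if configured with `maxmemory-policy allkeys-lru`.

```python
import hashlib
from typing import Callable, Optional, Protocol


class KeyValueStore(Protocol):
    """The subset of the redis-py client API this sketch relies on."""

    def get(self, name: str) -> Optional[bytes]: ...

    def set(self, name: str, value: bytes, ex: Optional[int] = None) -> object: ...


class RedisCertificateCache:
    """Certificate cache shared by all processes and surviving restarts."""

    def __init__(self, store: KeyValueStore, fetch: Callable[[str], bytes],
                 ttl_seconds: int = 7 * 24 * 3600) -> None:
        self._store = store  # e.g. redis.Redis.from_url("redis://localhost:6379/0")
        self._fetch = fetch  # network fetch; could itself retry on errors
        self._ttl = ttl_seconds

    @staticmethod
    def _key(url: str) -> str:
        # Hash the URL to get a fixed-length key under a hypothetical namespace.
        return "sns-signing-cert:" + hashlib.sha256(url.encode()).hexdigest()

    def get(self, url: str) -> bytes:
        key = self._key(url)
        cached = self._store.get(key)
        if cached is not None:
            return cached
        cert = self._fetch(url)
        # The TTL lets entries for certificates that are no longer used expire.
        self._store.set(key, cert, ex=self._ttl)
        return cert
```

Because the store is shared, only the first process to see a given certificate URL makes the HTTP call; every other process and every restarted worker reads it from Redis.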

dstufft · 2018-08-10T20:19:30Z

With retries and #4526, this is largely done. I'm going to leave this open because I believe that adding caching here would still be a good step.

With this simple caching mechanism, each running instance should only have to make a single call, on first use, and then cache the result for the lifetime of the process. This call rarely fails, but it adds ~200ms to each inbound hook, so caching across requests should cut down the time it takes to complete the processing. Instead of using a Redis cache and worrying about cache expiration strategies, if this ever fails, a restart will evict the in-memory cache and trigger a new HTTP call for the key.

Resolves pypi#4463

Signed-off-by: Mike Fiedler <miketheman@gmail.com>

dstufft added the feature request label Aug 4, 2018

ewdurbin mentioned this issue Aug 6, 2018

Add Resilience to HIBP API calls #4475

Merged

miketheman linked a pull request Feb 6, 2024 that will close this issue

chore: cache calls to signing key #15337

Draft

miketheman self-assigned this Feb 6, 2024
