Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

AutoRefreshingProvider caches erroneous responses indefinitely #1765

Open
samwho opened this issue Jun 2, 2020 · 4 comments
Open

AutoRefreshingProvider caches erroneous responses indefinitely #1765

samwho opened this issue Jun 2, 2020 · 4 comments

Comments

@samwho
Copy link

samwho commented Jun 2, 2020

The implementation of AutoRefreshingProvider will cache an error forever, and this behaviour is causing us some problems using the library in production.

Code in question:

#[async_trait]
impl<P: ProvideAwsCredentials + Send + Sync + 'static> ProvideAwsCredentials
for AutoRefreshingProvider<P>
{
async fn credentials(&self) -> Result<AwsCredentials, CredentialsError> {
loop {
let mut guard = self.current_credentials.lock().await;
match guard.as_ref() {
// no result from the future yet, let's keep using it
None => {
let res = self.credentials_provider.credentials().await;
*guard = Some(res);
}
Some(Err(e)) => return Err(e.clone()),
Some(Ok(creds)) => {
if creds.credentials_are_expired() {
*guard = None;
} else {
return Ok(creds.clone());
};
}
}
}
}
}

I think there are a couple potential solutions to this:

  1. Treat the Some(Err(e)) => return Err(e.clone()), match arm the same way expired credentials are treated and set the guard to None and loop again. This means it'll keep trying in the face of errors, so probably needs some sort of backoff / give-up mechanism.
  2. Cache failures for a defined length, probably shorter than the success cache length.

If you agree that this behaviour needs changing and you let me know which approach you prefer, I'd be happy to contribute a PR. 🙂

If this is working as intended I would be keen to hear what workarounds you suggest. The behaviour we seem to be seeing is that sometimes requesting the current role from the instance profile in EC2 will fail, and that failure gets cached forever, causing our binary to get stuck in a non-functional state.

@iliana
Copy link
Contributor

iliana commented Jun 4, 2020

I don't think this is working as intended and would be happy to review a PR to change the error behavior here.

@samwho
Copy link
Author

samwho commented Jun 5, 2020

🙏

Do you have a preferred approach?

@iliana
Copy link
Contributor

iliana commented Jun 12, 2020

Not off the top of my head. Our goal is to match the behaviors of other AWS SDKs when it comes to fundamentals like credentials, request signing, retries, etc., so your best reference is probably botocore.

@nbaztec
Copy link

nbaztec commented Jul 29, 2021

#1933 fixes it but seems the project isn't maintained anymore.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants