Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

set_session does not seem to be thread / jobs safe #854

Open
GuichardVictor opened this issue Mar 3, 2024 · 4 comments
Open

set_session does not seem to be thread / jobs safe #854

GuichardVictor opened this issue Mar 3, 2024 · 4 comments

Comments

@GuichardVictor
Copy link

Most functions call _s3_call which calls set_session. The function goal is to "connect" to aws.

In the context of multiple jobs (or batch_size > 1), each jobs will race to create the session despite the check if the session was already created.

Using process authentification, assume role and source_profile in the aws config, loading the credentials will fail with InfiniteLoopConfigError which can be replicated with the following example:

import botocore.session
import botocore.credentials

# this part will be done once by s3fs
session = botocore.session.Session(profile="profile")
resolver = botocore.credentials.create_credential_resolver(session, region_name=None)

# this part might be runned multiple times due to a potential race condition
credentials = resolver.load_credentials() # <--- works
credentials = resolver.load_credentials() # <--- fails

where profile depends on an other profile.

One way to fix this is to use asyncio.Lock when creating the session to ensure that only one job will create the session.

I'm open to contribute if this is something we want to fix.

@martindurant
Copy link
Member

Since you are talking about "jobs", do you mean you have threads running?

Agree, an syncio lock would be reasonable. The only await is at the call for s3creator, but load_credentials() happens earlier. I wonder if there's an asyncio clean way to say: "I have to await this, but I don't really want to yield control". Otherwise, we need to pass this new Lock object around.

@GuichardVictor
Copy link
Author

On my experiment it was only multiple asyncio futures that was running (using the batch size argument of the async file system of fsspec). I have added a asyncio.Lock and it seemed to fixed the problem. That's why I opened the issue.

@martindurant
Copy link
Member

You are welcome to propose that change as a PR

@GuichardVictor
Copy link
Author

Sure, I will open a PR this weekend.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants