redis.exceptions.LockError when cloning projects concurrently #5803

Open
1 task done
Tracked by #5694 ...
bisgaard-itis opened this issue May 10, 2024 · 4 comments
Labels
bug buggy, it does not work as expected

Comments

@bisgaard-itis
Contributor

Is there an existing issue for this?

  • I have searched the existing issues

Which deploy/s?

No response

Current Behavior

After resolving the performance issues in storage, I now see a lot of 502 status codes from the webserver when running https://github.com/wvangeit/osparc-pyapi-tests/tree/master/noninter1 against dalco-master. After digging into Graylog, I see that many (perhaps even all) of them arise from the same exception type in the wb-api-server:

Project [project_uuid='6dc3f228-06cb-11ef-bb37-02420a00f1d5'] already locked in state 'prj_states.locked.status='CLONING''. Please check with support.
Traceback (most recent call last):
  File "/home/scu/.venv/lib/python3.10/site-packages/simcore_service_webserver/projects/projects_api.py", line 1531, in lock_with_notification
    async with lock_project(
  File "/usr/local/lib/python3.10/contextlib.py", line 199, in __aenter__
    return await anext(self.gen)
  File "/home/scu/.venv/lib/python3.10/site-packages/simcore_service_webserver/projects/lock.py", line 60, in lock_project
    raise ProjectLockError(msg)
redis.exceptions.LockError: Lock for project '6dc3f228-06cb-11ef-bb37-02420a00f1d5' user 61349 could not be acquired

This exception makes sense because the test run attempts to create 100 clones of the project '6dc3f228-06cb-11ef-bb37-02420a00f1d5' at the same time. If cloning requires a lock on the project, then some of these requests will definitely fail. The question is how to solve it.

Expected Behavior

No response

Steps To Reproduce

No response

Anything else?

No response

@bisgaard-itis bisgaard-itis added the bug buggy, it does not work as expected label May 10, 2024
@bisgaard-itis
Contributor Author

@sanderegg I can see you have been working on this, it would be great to discuss what a potential solution could be.
My immediate intuition is that other tasks in the event loop which are also requiring the lock should await until the lock is released instead of throwing an exception straight away. I guess that's how a mutex would work when threading.

@bisgaard-itis bisgaard-itis added this to the Leeroy Jenkins milestone May 10, 2024
@bisgaard-itis bisgaard-itis changed the title from "redis.exceptions.LockError when running meta modelling project" to "redis.exceptions.LockError when cloning projects concurrently" May 10, 2024
@bisgaard-itis
Contributor Author

@sanderegg I can see you have been working on this, it would be great to discuss what a potential solution could be. My immediate intuition is that other tasks in the event loop which are also requiring the lock should await until the lock is released instead of throwing an exception straight away. I guess that's how a mutex would work when threading.

One approach would be to remove the blocking=False here and instead introduce a blocking timeout.
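
To make the difference concrete, here is a minimal in-process sketch of the two behaviours. It does not talk to Redis and is not the webserver code: `clone_project`, `run_batch` and `LockNotAcquired` are hypothetical names, and an `asyncio.Lock` stands in for the project lock. In redis-py the equivalent knobs would be the `blocking` and `blocking_timeout` arguments of the lock, but how they should be wired into `lock_project` is an assumption left open here.

```python
import asyncio
from typing import Optional, Tuple


class LockNotAcquired(Exception):
    """Stand-in for the lock error seen in the traceback (hypothetical)."""


async def clone_project(lock: asyncio.Lock, blocking_timeout: Optional[float]) -> str:
    """Simulate one clone request that must hold the project lock.

    blocking_timeout=None mimics blocking=False: fail at once if the lock is taken.
    Otherwise wait up to blocking_timeout seconds for the lock to be released.
    """
    if blocking_timeout is None:
        if lock.locked():
            raise LockNotAcquired("project already locked")
        await lock.acquire()
    else:
        try:
            await asyncio.wait_for(lock.acquire(), timeout=blocking_timeout)
        except asyncio.TimeoutError as err:
            raise LockNotAcquired("timed out waiting for project lock") from err
    try:
        await asyncio.sleep(0.01)  # pretend to copy the project
        return "cloned"
    finally:
        lock.release()


async def run_batch(n: int, blocking_timeout: Optional[float]) -> Tuple[int, int]:
    """Fire n clone requests at once and return (succeeded, failed)."""
    lock = asyncio.Lock()
    results = await asyncio.gather(
        *(clone_project(lock, blocking_timeout) for _ in range(n)),
        return_exceptions=True,
    )
    ok = sum(r == "cloned" for r in results)
    return ok, n - ok


if __name__ == "__main__":
    print("fail fast:", asyncio.run(run_batch(5, None)))
    print("wait up to 5 s:", asyncio.run(run_batch(5, 5.0)))
```

With five concurrent requests, the fail-fast variant lets one through and rejects the other four, whereas the variant with a generous timeout completes all five one after another. The trade-off is latency: with 100 clones the last request waits for the 99 before it, so the timeout has to be sized accordingly.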

@bisgaard-itis
Contributor Author

Closing this due to this

@bisgaard-itis
Contributor Author

Reopening this due to a comment by @sanderegg. Potential solutions:

  • Implement a read-only lock so that multiple read-only operations can run concurrently but the project is locked for editing. This could be a bit tricky, so maybe the best thing is to check if Redis already offers this.
  • Implement an endpoint for submitting a batch of study jobs. However, this poses a challenge in the api-server because the current create_study_job endpoint takes the inputs as the request body. So one would have to pass a list of input params or, alternatively, factor out the setting of the inputs into another endpoint.
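
For the first option, the semantics could look roughly like the following sketch. This is an in-process illustration only, under the assumption that cloning counts as a read-only operation on the source project: `SharedExclusiveLock` and `demo` are hypothetical names, not a Redis or webserver API, and a real version would have to be shared between webserver replicas.

```python
import asyncio
from contextlib import asynccontextmanager


class SharedExclusiveLock:
    """In-process readers-writer lock (hypothetical helper, not a Redis API)."""

    def __init__(self) -> None:
        self._cond = asyncio.Condition()
        self._readers = 0
        self._writer = False

    @asynccontextmanager
    async def shared(self):
        # Readers only wait while a writer holds the lock.
        async with self._cond:
            await self._cond.wait_for(lambda: not self._writer)
            self._readers += 1
        try:
            yield
        finally:
            async with self._cond:
                self._readers -= 1
                self._cond.notify_all()

    @asynccontextmanager
    async def exclusive(self):
        # A writer waits until there is no other writer and no reader.
        async with self._cond:
            await self._cond.wait_for(
                lambda: not self._writer and self._readers == 0
            )
            self._writer = True
        try:
            yield
        finally:
            async with self._cond:
                self._writer = False
                self._cond.notify_all()


async def demo(n_clones: int = 10) -> int:
    """Run n_clones read-only clones plus one edit; return peak concurrent clones."""
    lock = SharedExclusiveLock()
    active = 0
    peak = 0

    async def clone() -> None:
        nonlocal active, peak
        async with lock.shared():
            active += 1
            peak = max(peak, active)
            await asyncio.sleep(0.01)  # pretend to copy the project
            active -= 1

    async def edit() -> None:
        async with lock.exclusive():
            assert active == 0  # no clone may run while the project is edited
            await asyncio.sleep(0.01)

    await asyncio.gather(*(clone() for _ in range(n_clones)), edit())
    return peak


if __name__ == "__main__":
    print("peak concurrent clones:", asyncio.run(demo()))
```

In this sketch all clones hold the shared lock at the same time and the edit runs only after they finish. Note that this simple form lets a steady stream of readers starve a writer, which is part of what makes the approach tricky.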
