Enhancement: Locking mechanism for file-based stores #832

Open
Andrew-S-Rosen opened this issue Jul 27, 2023 · 3 comments

Comments

@Andrew-S-Rosen
Member

Andrew-S-Rosen commented Jul 27, 2023

As discussed in #828, most file-based database packages (including MontyDB, which backs the already-implemented MontyStore) have no built-in protection against multiple Python processes (or threads) reading from or writing to the same database at the same time. This makes them useful only for serial calculations and poorly suited to high-throughput settings, where the odds of a collision are very high.

Rather than relying on each external package to implement a file-locking system, we should introduce a file-locking mechanism within maggma that can be applied to all file-based data stores. py-filelock and portalocker are both good platform-agnostic options, with the former perhaps being slightly more actively maintained. There are built-in locking features in the MP monty package, but in my opinion we are better off using a battle-tested solution, since such packages are usually light on dependencies anyway (and the lock mechanism used in fireworks often caused headaches...).

I'm jotting this down so that I don't forget. I don't have plans to work on this right now, but I will likely need to implement it one day in the future.
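
To illustrate the kind of wrapper proposed above, here is a minimal sketch of a lock held around a read-modify-write of a file-based store. It is an assumption-laden stand-in, not maggma code: `file_lock`, `locked_update`, and the `.lock` suffix are hypothetical names, and it uses exclusive file creation from the standard library rather than py-filelock or portalocker, purely so that it runs without extra dependencies.

```python
import json
import os
import time
from contextlib import contextmanager


@contextmanager
def file_lock(lock_path, timeout=10.0, poll_interval=0.01):
    """Hypothetical helper: hold an exclusive lock for the duration of a block.

    The lock is represented by the existence of `lock_path`.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # O_CREAT | O_EXCL fails if the file already exists, so only one
            # process or thread can create the lock file at a time.
            fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
            break
        except FileExistsError:
            if time.monotonic() >= deadline:
                raise TimeoutError(f"could not acquire {lock_path}")
            time.sleep(poll_interval)
    try:
        yield
    finally:
        # Release the lock by closing and deleting the lock file.
        os.close(fd)
        os.remove(lock_path)


def locked_update(store_path, key, value):
    """Hypothetical helper: read-modify-write a JSON file under the lock."""
    with file_lock(store_path + ".lock"):
        data = {}
        if os.path.exists(store_path):
            with open(store_path) as f:
                data = json.load(f)
        data[key] = value
        with open(store_path, "w") as f:
            json.dump(data, f)
```

Without the lock, two writers can both read the old contents and the later write silently discards the earlier one. Note that this sketch leaves a stale lock file behind if a process dies while holding it, which is one reason to prefer a battle-tested library for the real implementation.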

@munrojm
Member

munrojm commented Jul 27, 2023

I like this idea

@Andrew-S-Rosen
Member Author

Andrew-S-Rosen commented Feb 19, 2024

FYI: Here is what happens when two processes try to write to a MontyStore at the same time. It looks like MontyDB has a locking mechanism, but it doesn't support concurrent processes.

@rkingsbury
Collaborator

I had started some work to replace mongomock with actual MongoDB in MemoryStore (see #846). Since JSONStore is backed by MemoryStore, I wonder whether doing this could also address the locking issue?

We have had success using JSONStore to run atomate2 workflows at low throughput, but I'm sure we would encounter a similar problem at high throughput.
