Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Initial work on transactions for chunkstore #776

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

bmoscon
Copy link
Collaborator

@bmoscon bmoscon commented May 19, 2019

  • Please do not merge, this is a work in progress - I do want to get some feedback now as I finish out the testing of this change
  1. On every write, creates a transaction document. When write completes, removes the transaction document. Reads and writes check for this, and if its present raise an error.

  2. Supplies a recovery function that can remove the data from a failed transaction. Chunkstore has no way of knowing if a write is in progress or if its has failed, so it cannot recover automatically.

Still testing the recovery method and adding unit tests

@shashank88
Copy link
Contributor

Is my understanding correct that this happens as you are basically writing the chunk (data + metadata) sequentially due to bulk_write not being atomic and any interrupts causes a bad intermediate state due to committed metadata?

  • Can having better SIGINT or even a try / finally help with this to start with?
  • Also personally I find using a context manager cleaner than explicit the start/end if possible.
  • Can you add the cleanup as part of some generic _fsck like op that fixes all of these cases?

I haven't spent much time with chunkstore so apologies in advance for stupid questions

@bmoscon
Copy link
Collaborator Author

bmoscon commented May 20, 2019

yes, chunkstore creates the chunks, and then writes them one at a time in a bulk operation, so any interruption of that write causes a corrupted state. A power outage or sigkill would be a case where a finally or the like would not be sufficient. There is also the case where some sort of multiprocessing code might try and read/write the same chunk, which would also cause issues. I can certainly change the code to use a context manager, but want to make sure the code works as expected before that (this is WIP). Also, trying to auto recover/fix is not really possible due to the nature of the issue. You can definitely hit an "invalid read" or write scenario where you wouldnt want to try and fix the issue (i.e. a concurrent reader and writer).

@scoriiu
Copy link

scoriiu commented Oct 9, 2019

@bmoscon Any updates on this?

@harryd31
Copy link

@bmoscon We would love to use this in production in our application however this issue is holding us back... Any update on when we can expect a fix please?

@bmoscon
Copy link
Collaborator Author

bmoscon commented Oct 18, 2019

@harryd31 fix is here, give it a try, let me know if it works or doesnt work

@harryd31
Copy link

@bmoscon Thank you, I will give it a try and let you know.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants