[QUESTION]: Is Badger rsync friendly during db in use ? #2029

Closed
sancar opened this issue Nov 8, 2023 · 3 comments
Labels
kind/question Something requiring a response

Comments

sancar commented Nov 8, 2023

Question.

In our setup we use rsync to copy a Badger DB directory without stopping writes to it. We prefer this over the Backup API because it is faster and needs no intermediate step, but we recently discovered data loss in the backup.

We assumed this would work based on the docs here:
https://dgraph.io/docs/badger/get-started/#database-backup

#!/bin/bash
set -o history
set -o histexpand
# Makes a complete copy of a Badger database directory.
# Repeat rsync if the MANIFEST and SSTables are updated.
rsync -avz --delete db/ dst
while !! | grep -q "(MANIFEST\|\.sst)$"; do :; done

But I have seen some vague comments here and there saying this is not safe while the database is in use, so I am asking for a final clarification.
If rsync is safe, I have a follow-up question about the order of the copies. For example, should the MANIFEST file be the last file copied? As we saw, simply running rsync does not work.

Depending on the answer, I can describe our test in more detail and try to come up with a minimal reproducer. If the answer is a clear no, there is no need.

@sancar sancar added the kind/question Something requiring a response label Nov 8, 2023

solracsf commented Nov 23, 2023

Same here. When we run badger info on an rsync backup, we get:

failed to open database err:
while opening memtables error:
while opening fid: 70 error:
while updating skiplist error:
end offset: 7764549 < size: 134217728 error:
Log truncate required to run DB. This might result in data loss

github.com/dgraph-io/badger/v4.init
        /home/runner/work/badger/badger/errors.go:101
runtime.doInit
        /opt/hostedtoolcache/go/1.19.11/x64/src/runtime/proc.go:6331
runtime.doInit
        /opt/hostedtoolcache/go/1.19.11/x64/src/runtime/proc.go:6308
runtime.doInit
        /opt/hostedtoolcache/go/1.19.11/x64/src/runtime/proc.go:6308
runtime.main
        /opt/hostedtoolcache/go/1.19.11/x64/src/runtime/proc.go:233
runtime.goexit
        /opt/hostedtoolcache/go/1.19.11/x64/src/runtime/asm_amd64.s:1594

However, if we run badger backup on that rsync backup folder and then badger restore, badger info reports no more errors on the restored copy.
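
For anyone who wants to try the same workaround, here is a minimal sketch of the sequence. It assumes the badger CLI is on PATH and that the backup, restore and info subcommands accept --dir and -f (--backup-file) as in Badger v4; please check badger backup --help on your version. The directory names, the backup file name and the BADGER_BIN variable are hypothetical placeholders, and this does not show that the rebuilt copy contains every write from the live DB.

```shell
#!/bin/bash
set -euo pipefail

# Hypothetical paths; adjust to your setup.
RSYNC_COPY="${RSYNC_COPY:-./rsync-copy}"     # directory produced by rsync
BACKUP_FILE="${BACKUP_FILE:-./badger.bak}"   # intermediate backup file
RESTORED_DIR="${RESTORED_DIR:-./restored}"   # new directory for the restored DB
BADGER_BIN="${BADGER_BIN:-badger}"           # badger CLI (assumed to be on PATH)

rebuild_from_copy() {
  # Dump the rsync copy into a single backup file.
  "$BADGER_BIN" backup --dir "$RSYNC_COPY" -f "$BACKUP_FILE"
  # Load the backup file into a new, not yet existing DB directory.
  "$BADGER_BIN" restore --dir "$RESTORED_DIR" -f "$BACKUP_FILE"
  # Inspect the restored DB.
  "$BADGER_BIN" info --dir "$RESTORED_DIR"
}
```

Call rebuild_from_copy once rsync has finished. With set -e the function stops at the first badger command that fails.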

dezren39 commented Feb 6, 2024

related: #1883 (comment)

sancar commented Mar 29, 2024

I am closing the issue since we got our answer in #1883 (comment), and the answer is no.

Copying the answer here as well for future readers.

"Since it's ACID compliant, it should work. But only if you take an atomic filesystem snapshot. Copying the database just with rsync during it's being used certainly is not the way to go."
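
To make the quoted advice concrete, here is a rough sketch of "snapshot first, then rsync". It is only one possible setup, not something from the Badger docs: it assumes the live DB directory is its own btrfs subvolume, that the snapshot path is on the same btrfs filesystem, and that the script runs with enough privileges. All paths and variable names are hypothetical. Other snapshot mechanisms (LVM, ZFS) would follow the same pattern with different commands.

```shell
#!/bin/bash
set -euo pipefail

# Hypothetical locations; the DB directory is assumed to be a btrfs subvolume.
DB_SUBVOL="${DB_SUBVOL:-/data/badger}"
SNAP_PATH="${SNAP_PATH:-/data/badger-snap}"
DEST="${DEST:-/backup/badger}"

snapshot_and_copy() {
  # Take a read-only, point-in-time snapshot of the subvolume.
  btrfs subvolume snapshot -r "$DB_SUBVOL" "$SNAP_PATH"
  # Copy from the frozen snapshot, never from the live directory.
  local status=0
  rsync -a --delete "$SNAP_PATH/" "$DEST/" || status=$?
  # Drop the snapshot whether or not the copy succeeded.
  btrfs subvolume delete "$SNAP_PATH"
  return "$status"
}
```

Because rsync reads only from the snapshot, writers can keep using the live DB during the copy. The copy is crash-consistent as of the snapshot time, so opening it should behave like opening the DB after a sudden power loss at that moment.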

@sancar sancar closed this as completed Mar 29, 2024