In our setup we run rsync against a Badger DB directory without blocking writes to it. We prefer this over the Backup API because it is faster and needs no intermediate step, but we recently discovered data loss in the backup.
#!/bin/bash
set -o history
set -o histexpand
# Makes a complete copy of a Badger database directory.
# Repeat rsync if the MANIFEST and SSTables are updated.
rsync -avz --delete db/ dst
while !! | grep -q "(MANIFEST\|\.sst)$"; do :; done
But I have seen some vague comments here and there saying this is not safe while the database is in use, so I am asking this question to get a definitive answer.
If rsync is safe, I have a follow-up question: does the order in which the files are copied matter? For example, should the MANIFEST file be the last one to be updated? In our experience, a plain rsync does not work.
I can describe our test in more detail and try to produce a minimal reproducer, depending on the answer. If the answer is a clear no, there is no need.
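A side note on the script above, as a hedged sketch rather than a fix: as far as I can tell, `grep` without `-E` treats the parentheses in `"(MANIFEST\|\.sst)$"` literally, so the loop may never actually repeat. The version below assumes bash, GNU grep and rsync's `--out-format='%n'` option, and avoids history expansion. The function names and the `src`/`dst` arguments are hypothetical. It only narrows the race window; it does not make the copy atomic.

```bash
#!/bin/bash
# Sketch only: repeat rsync until a pass transfers no MANIFEST or .sst file.
# This does NOT make the copy safe while the database is being written to.

# Succeeds if the file list on stdin names a MANIFEST or .sst file.
# -E makes ( | ) act as grouping and alternation instead of literals.
changed_core_files() {
  grep -E '(MANIFEST|\.sst)$' >/dev/null
}

# Hypothetical helper: src and dst are placeholder arguments.
# rsync's own exit status is ignored in this sketch.
copy_until_stable() {
  local src=$1 dst=$2
  while rsync -a --delete --out-format='%n' "$src" "$dst" | changed_core_files; do
    :
  done
}
```

Usage would be `copy_until_stable db/ dst`. Other files in the directory can still change between passes, so this is best read as an illustration of what the documented loop intends.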
Same here. When we run badger info on an rsync backup, we get:
failed to open database err: while opening memtables error: while opening fid: 70 error: while updating skiplist error: end offset: 7764549 < size: 134217728 error: Log truncate required to run DB. This might result in data loss
github.com/dgraph-io/badger/v4.init
/home/runner/work/badger/badger/errors.go:101
runtime.doInit
/opt/hostedtoolcache/go/1.19.11/x64/src/runtime/proc.go:6331
runtime.doInit
/opt/hostedtoolcache/go/1.19.11/x64/src/runtime/proc.go:6308
runtime.doInit
/opt/hostedtoolcache/go/1.19.11/x64/src/runtime/proc.go:6308
runtime.main
/opt/hostedtoolcache/go/1.19.11/x64/src/runtime/proc.go:233
runtime.goexit
/opt/hostedtoolcache/go/1.19.11/x64/src/runtime/asm_amd64.s:1594
However, if we run badger backup on that rsync backup folder and then badger restore, badger info no longer reports any errors on the restored copy.
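The backup-then-restore sequence described here might look roughly like the following. This is a sketch under assumptions: the function name and all paths are placeholders, and the `--dir` and `--backup-file` flag names are my understanding of the badger CLI, so please verify them with `badger backup --help` and `badger restore --help`. It only shows how to obtain a copy that opens cleanly; it does not imply that data missing from the rsync copy is recovered.

```bash
#!/bin/bash
# Sketch only: rebuild a clean DB from an rsync copy via backup + restore.
# Paths are placeholders; flag names are assumptions, verify with --help.
rebuild_from_copy() {
  local copy_dir=$1 backup_file=$2 restored_dir=$3
  # Dump the copied directory into a single backup file.
  badger backup --dir "$copy_dir" --backup-file "$backup_file" || return 1
  # Load that file into a fresh directory.
  badger restore --dir "$restored_dir" --backup-file "$backup_file" || return 1
  # Inspect the restored DB.
  badger info --dir "$restored_dir"
}
```

For example: `rebuild_from_copy dst dst.bak restored`.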
I am closing the issue since we got our answer in #1883 (comment): no.
Copying the answer here as well for the convenience of future readers.
"Since it's ACID compliant, it should work. But only if you take an atomic filesystem snapshot. Copying the database just with rsync during it's being used certainly is not the way to go."
For context, we assumed rsync would work based on the documentation here:
https://dgraph.io/docs/badger/get-started/#database-backup