BadgerDB database is too big for dataset #4187

Closed
solracsf opened this issue Nov 22, 2023 · 5 comments

solracsf commented Nov 22, 2023

There seems to be a problem somewhere with BadgerDB (maybe GC is not running as it should?).
In a production environment, the database is almost 10 GB in size:

# du -sh /opt/badger
9.2G    /opt/badger

But after dumping the metadata and loading it into a fresh database:

# juicefs dump badger:////opt/badger /tmp/latest.json
Scan keys: 7041370/7041370 [===========================================================]  136371.9/s used: 51.633579922s
Dumped entries: 2551157/2551157 [===========================================================]  48629.4/s  used: 52.461271285s

# juicefs load badger:///tmp/badgertest /tmp/latest.json
Loaded entries: 2551157/2551157 [===========================================================]  32701.7/s used: 1m18.01287635s

The new database is over 96% smaller!

# du -sh /tmp/badgertest
303M    /tmp/badgertest
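
As a quick check of that figure, the reduction can be computed from the two `du -sh` results. This is a minimal sketch, assuming `du` reports binary units (1G = 1024M); `percent_smaller` is an illustrative helper, not part of JuiceFS or Badger.

```python
def percent_smaller(before_mib: float, after_mib: float) -> float:
    """Return how much smaller `after_mib` is than `before_mib`, in percent."""
    return (1 - after_mib / before_mib) * 100

# Figures from the du output in this issue: 9.2G before, 303M after.
before = 9.2 * 1024  # GiB -> MiB, assuming du's binary units
after = 303

print(f"{percent_smaller(before, after):.1f}% smaller")  # prints: 96.8% smaller
```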

A backup taken with the badger CLI is also far smaller than the data directory:

# badger backup -f /tmp/badger.dump --dir /opt/badger

# du -sh /tmp/badger.dump
494M    /tmp/badger.dump

Version: JuiceFS 1.1.0+2023-09-04.08c4ae6

solracsf added the kind/bug label on Nov 22, 2023

solracsf commented Nov 23, 2023

In another instance (running for almost a year), also using Badger, I can confirm the problem by dumping the metadata and loading the dump:

# du -sh /opt/badger
6.4G   /opt/badger

# juicefs dump --keep-secret-key badger:///opt/badger /tmp/meta-dump.json
Scan keys: 38127/38127 [==============================================================]  255614.4/s used: 149.17406ms
Dumped entries: 15472/15472 [==============================================================]  171927.0/s used: 90.011918ms

# juicefs load badger:///opt/badger_new /tmp/meta-dump.json                                                     2023/11/23 10:49:07.989729
Loaded entries: 15472/15472 [==============================================================]  104029.6/s used: 148.735698ms

# du -sh /opt/badger_new
1.5M    /opt/badger_new


davies commented Nov 24, 2023

Garbage collection is triggered every hour, so there must be some other issue.

Can you list the sizes of all the files in the Badger directory?
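
One way to gather that listing is sketched below. It is a minimal example, not a JuiceFS or Badger tool: `list_file_sizes` is an illustrative helper, and it reports apparent file sizes (`st_size`), which can differ from the on-disk usage that `du` prints.

```python
import os

def list_file_sizes(directory: str) -> list[tuple[str, int]]:
    """Return (name, size_in_bytes) for each regular file, largest first."""
    entries = []
    with os.scandir(directory) as it:
        for entry in it:
            # Skip subdirectories and symlinks; only regular files are counted.
            if entry.is_file(follow_symlinks=False):
                size = entry.stat(follow_symlinks=False).st_size
                entries.append((entry.name, size))
    return sorted(entries, key=lambda item: item[1], reverse=True)

def format_listing(entries: list[tuple[str, int]]) -> str:
    """Render the entries as 'size-in-MiB  name' lines."""
    return "\n".join(f"{size / 2**20:10.1f} MiB  {name}" for name, size in entries)
```

Calling `print(format_listing(list_file_sizes("/opt/badger")))` would print one line per file, with the path taken from the commands earlier in this issue.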

davies removed the kind/bug label on Nov 24, 2023

solracsf commented

1.1G    000075.vlog
1.1G    000076.vlog
1.1G    000077.vlog
1.1G    000078.vlog
1.1G    000079.vlog
1.1G    000080.vlog
276M    000081.vlog
1.1G    000082.vlog
1.1G    000083.vlog
548M    000084.vlog
4.6M    000656.sst
68K     000668.sst
7.8M    000779.sst
4.5M    000782.sst
7.8M    000783.sst
7.8M    000787.sst
7.8M    000811.sst
896K    000816.sst
7.8M    000818.sst
2.3M    000819.sst
7.8M    000820.sst
3.5M    000822.sst
7.8M    000843.sst
4.0M    000846.sst
1.7M    000857.sst
28K     000876.sst
392K    000878.sst
7.8M    000880.sst
7.8M    000882.sst
7.8M    000883.sst
7.8M    000885.sst
7.8M    000886.sst
4.0K    000887.sst
1.3M    000888.sst
7.8M    000889.sst
7.8M    000891.sst
3.5M    000892.sst
7.8M    000893.sst
7.8M    000896.sst
200K    000897.sst
7.8M    000913.sst
7.8M    000914.sst
7.8M    000915.sst
7.8M    000916.sst
7.8M    000917.sst
7.8M    000918.sst
7.8M    000919.sst
1.3M    000920.sst
7.8M    000921.sst
7.8M    000922.sst
7.8M    000923.sst
7.8M    000924.sst
3.5M    000925.sst
7.4M    000926.sst
7.8M    000927.sst
7.1M    000928.sst
7.8M    000929.sst
204K    000930.sst
1.8M    000931.sst
444K    001970.sst
900K    001971.sst
3.9M    001972.sst
4.0M    001973.sst
3.9M    001974.sst
3.7M    001975.sst
4.1M    001976.sst
64K     001977.sst
2.1M    001978.sst
3.8M    001979.sst
20M     001980.sst
49M     00471.mem
4.0K    DISCARD
4.0K    KEYREGISTRY
4.0K    LOCK
36K     MANIFEST
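
To see which file type accounts for the space, a listing like the one above can be totalled per extension. This is a minimal sketch, assuming two-column `du -h` style output with binary units; `parse_size` and `total_by_extension` are illustrative helpers, and the sample contains only a few lines copied from the listing, not all of it.

```python
from collections import defaultdict

# Multipliers for the binary unit suffixes printed by `du -h`.
_UNITS = {"K": 2**10, "M": 2**20, "G": 2**30, "T": 2**40}

def parse_size(text: str) -> float:
    """Convert a du-style size such as '1.1G' or '276M' to bytes."""
    if text[-1] in _UNITS:
        return float(text[:-1]) * _UNITS[text[-1]]
    return float(text)  # no suffix: already a plain number

def total_by_extension(listing: str) -> dict[str, float]:
    """Sum sizes per file extension; files without one are keyed by name."""
    totals = defaultdict(float)
    for line in listing.splitlines():
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        size, name = parts
        key = name.rsplit(".", 1)[1] if "." in name else name
        totals[key] += parse_size(size)
    return dict(totals)

# A few lines copied from the listing above (not the full listing).
sample = """\
1.1G    000075.vlog
276M    000081.vlog
4.6M    000656.sst
49M     00471.mem
36K     MANIFEST
"""

# Print the totals, largest first.
for ext, size in sorted(total_by_extension(sample).items(), key=lambda kv: -kv[1]):
    print(f"{ext:10s} {size / 2**20:10.1f} MiB")
```

Run over the full listing, this makes it easy to see that the `.vlog` files dominate the directory size, while the `.sst` files are small by comparison.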


davies commented Nov 26, 2023

It looks like Badger can't compact the vlog files as expected. Can you file an issue with Badger?

solracsf commented

This seems similar to dgraph-io/badger#1995 and dgraph-io/badger#2003, so let's keep the discussion upstream.
