Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

opt(stream): add option to directly copy over tables from lower levels (#1700) #1872

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

mangalaman93
Copy link
Contributor

This PR adds FullCopy option in Stream. This allows sending the table entirely to the writer. If this option is set to true we directly copy over the tables from the last 2 levels. This option increases the stream speed while also lowering the memory consumption on the DB that is streaming the KVs.

For 71GB, compressed and encrypted DB we observed 3x improvement in speed. The DB contained ~65GB in the last 2 levels while remaining in the above levels.

To use this option, the following options should be set in Stream.

stream.KeyToList = nil
stream.ChooseKey = nil
stream.SinceTs = 0
db.managedTxns = true

If we use stream writer for receiving the KVs, the encryption mode has to be the same in sender and receiver. This will restrict db.StreamDB() to use the same encryption mode in both input and output DB. Added TODO for allowing different encryption modes.

@CLAassistant
Copy link

CLAassistant commented Feb 14, 2023

CLA assistant check
All committers have signed the CLA.

stream_writer.go Show resolved Hide resolved
level_handler.go Outdated Show resolved Hide resolved
Base automatically changed from aman/block-length to release/v4.0 February 21, 2023 06:22
@coveralls
Copy link

coveralls commented Feb 22, 2023

Coverage Status

Coverage: 61.149% (+0.4%) from 60.713% when pulling 081b41b on aman/copy-levels into a89c52c on main.

@mangalaman93
Copy link
Contributor Author

This code has a data race, I am looking into it:

==================
WARNING: DATA RACE
Write at 0x00c000b100e8 by goroutine 2957:
  github.com/dgraph-io/badger/v3.(*StreamWriter).Write.func1()
      /home/aman/gocode/pkg/mod/github.com/dgraph-io/badger/v3@v3.2103.6-0.20230214155941-0e7c6a7a614a/stream_writer.go:212 +0x167
  github.com/dgraph-io/ristretto/z.(*Buffer).SliceIterate()
      /home/aman/gocode/pkg/mod/github.com/dgraph-io/ristretto@v0.1.1/z/buffer.go:290 +0x30a
  github.com/dgraph-io/badger/v3.(*StreamWriter).Write()
      /home/aman/gocode/pkg/mod/github.com/dgraph-io/badger/v3@v3.2103.6-0.20230214155941-0e7c6a7a614a/stream_writer.go:146 +0x194
  github.com/dgraph-io/dgraph/dgraph/cmd/bulk.(*countIndexer).writeIndex()
      /home/aman/gocode/src/github.com/dgraph-io/dgraph/dgraph/cmd/bulk/count_index.go:188 +0x9c7
  github.com/dgraph-io/dgraph/dgraph/cmd/bulk.(*countIndexer).addCountEntry.func1()
      /home/aman/gocode/src/github.com/dgraph-io/dgraph/dgraph/cmd/bulk/count_index.go:100 +0x47

Previous write at 0x00c000b100e8 by goroutine 228:
  github.com/dgraph-io/badger/v3.(*StreamWriter).Write.func1()
      /home/aman/gocode/pkg/mod/github.com/dgraph-io/badger/v3@v3.2103.6-0.20230214155941-0e7c6a7a614a/stream_writer.go:212 +0x167
  github.com/dgraph-io/ristretto/z.(*Buffer).SliceIterate()
      /home/aman/gocode/pkg/mod/github.com/dgraph-io/ristretto@v0.1.1/z/buffer.go:290 +0x30a
  github.com/dgraph-io/badger/v3.(*StreamWriter).Write()
      /home/aman/gocode/pkg/mod/github.com/dgraph-io/badger/v3@v3.2103.6-0.20230214155941-0e7c6a7a614a/stream_writer.go:146 +0x194
  github.com/dgraph-io/dgraph/dgraph/cmd/bulk.(*reducer).startWriting.func2()
      /home/aman/gocode/src/github.com/dgraph-io/dgraph/dgraph/cmd/bulk/reduce.go:349 +0xd1
  github.com/dgraph-io/dgraph/dgraph/cmd/bulk.(*reducer).startWriting()
      /home/aman/gocode/src/github.com/dgraph-io/dgraph/dgraph/cmd/bulk/reduce.go:389 +0x284
  github.com/dgraph-io/dgraph/dgraph/cmd/bulk.(*reducer).reduce.func4()
      /home/aman/gocode/src/github.com/dgraph-io/dgraph/dgraph/cmd/bulk/reduce.go:475 +0x64

Goroutine 2957 (running) created at:
  github.com/dgraph-io/dgraph/dgraph/cmd/bulk.(*countIndexer).addCountEntry()
      /home/aman/gocode/src/github.com/dgraph-io/dgraph/dgraph/cmd/bulk/count_index.go:100 +0x4c9
  github.com/dgraph-io/dgraph/dgraph/cmd/bulk.(*reducer).startWriting.func1.2()
      /home/aman/gocode/src/github.com/dgraph-io/dgraph/dgraph/cmd/bulk/reduce.go:338 +0x4e
  github.com/dgraph-io/ristretto/z.(*Buffer).SliceIterate()
      /home/aman/gocode/pkg/mod/github.com/dgraph-io/ristretto@v0.1.1/z/buffer.go:290 +0x30a
  github.com/dgraph-io/dgraph/dgraph/cmd/bulk.(*reducer).startWriting.func1()
      /home/aman/gocode/src/github.com/dgraph-io/dgraph/dgraph/cmd/bulk/reduce.go:336 +0x1cb
  github.com/dgraph-io/dgraph/dgraph/cmd/bulk.(*reducer).startWriting()
      /home/aman/gocode/src/github.com/dgraph-io/dgraph/dgraph/cmd/bulk/reduce.go:392 +0x2bc
  github.com/dgraph-io/dgraph/dgraph/cmd/bulk.(*reducer).reduce.func4()
      /home/aman/gocode/src/github.com/dgraph-io/dgraph/dgraph/cmd/bulk/reduce.go:475 +0x64

Goroutine 228 (running) created at:
  github.com/dgraph-io/dgraph/dgraph/cmd/bulk.(*reducer).reduce()
      /home/aman/gocode/src/github.com/dgraph-io/dgraph/dgraph/cmd/bulk/reduce.go:475 +0x4c5
  github.com/dgraph-io/dgraph/dgraph/cmd/bulk.(*reducer).run.func1()
      /home/aman/gocode/src/github.com/dgraph-io/dgraph/dgraph/cmd/bulk/reduce.go:109 +0xbe4
  github.com/dgraph-io/dgraph/dgraph/cmd/bulk.(*reducer).run.func2()
      /home/aman/gocode/src/github.com/dgraph-io/dgraph/dgraph/cmd/bulk/reduce.go:123 +0x66
==================

@mangalaman93 mangalaman93 changed the base branch from main-deprecated-v4 to main March 1, 2023 07:18
@mangalaman93 mangalaman93 changed the base branch from main to aman/incr March 6, 2023 21:35
@mangalaman93 mangalaman93 marked this pull request as draft March 6, 2023 21:35
Base automatically changed from aman/incr to main March 7, 2023 19:41
@mangalaman93 mangalaman93 marked this pull request as ready for review March 7, 2023 19:49
@mangalaman93 mangalaman93 force-pushed the aman/copy-levels branch 3 times, most recently from 5d84509 to 8873caa Compare March 9, 2023 18:10
@mangalaman93 mangalaman93 changed the base branch from main to aman/sw March 9, 2023 18:10
Copy link
Contributor

@harshil-goel harshil-goel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you write a benchmark for this?

@mangalaman93 mangalaman93 force-pushed the aman/sw branch 2 times, most recently from 77973c6 to 908259a Compare June 12, 2023 04:07
Base automatically changed from aman/sw to aman/bug June 12, 2023 04:07
Base automatically changed from aman/bug to main June 12, 2023 04:37
#1700)

Also takes a bug fix from PR #1712, commit 58d0674

This PR adds FullCopy option in Stream. This allows sending the
table entirely to the writer. If this option is set to true we
directly copy over the tables from the last 2 levels. This option
increases the stream speed while also lowering the memory
consumption on the DB that is streaming the KVs.

For 71GB, compressed and encrypted DB we observed 3x improvement
in speed. The DB contained ~65GB in the last 2 levels while
remaining in the above levels.

To use this option, the following options should be set in Stream.

stream.KeyToList = nil
stream.ChooseKey = nil
stream.SinceTs = 0
db.managedTxns = true

If we use stream writer for receiving the KVs, the encryption mode
has to be the same in sender and receiver. This will restrict
db.StreamDB() to use the same encryption mode in both input and
output DB. Added TODO for allowing different encryption modes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants