
sstable: flush value blocks if 8MB are buffered #3188

Open. Wants to merge 1 commit into master (base) from the valblocks branch.

Conversation

@dt dt (Member) commented Dec 28, 2023

No description provided.

@dt dt requested a review from sumeerbhola December 28, 2023 01:39
@cockroach-teamcity (Member) commented:

This change is Reviewable

@dt dt force-pushed the valblocks branch 2 times, most recently from 1641955 to 5aae780 Compare December 28, 2023 15:14
@dt dt changed the title from "sstable: flush value blocks if 128 are buffered" to "sstable: flush value blocks if 8MB are buffered" on Dec 28, 2023
@sumeerbhola sumeerbhola (Collaborator) left a comment

This will break EstimateDiskUsage, which assumes that the data blocks are contiguous and interpolates the size contribution of the value blocks. There is a TODO to improve the situation, but it will need a format change:

pebble/sstable/reader.go, lines 1106 to 1137 at ccb96e5:

```go
	includeInterpolatedValueBlocksSize := func(dataBlockSize uint64) uint64 {
		// INVARIANT: r.Properties.DataSize > 0 since startIdxIter is not nil.
		// Linearly interpolate what is stored in value blocks.
		//
		// TODO(sumeer): if we need more accuracy, without loading any data blocks
		// (which contain the value handles, and which may also be insufficient if
		// the values are in separate files), we will need to accumulate the
		// logical size of the key-value pairs and store the cumulative value for
		// each data block in the index block entry. This increases the size of
		// the BlockHandle, so wait until this becomes necessary.
		return dataBlockSize +
			uint64((float64(dataBlockSize)/float64(r.Properties.DataSize))*
				float64(r.Properties.ValueBlocksSize))
	}
	if endIdxIter == nil {
		// The range spans beyond this file. Include data blocks through the last.
		return includeInterpolatedValueBlocksSize(r.Properties.DataSize - startBH.Offset), nil
	}
	key, val = endIdxIter.SeekGE(end, base.SeekGEFlagsNone)
	if key == nil {
		if err := endIdxIter.Error(); err != nil {
			return 0, err
		}
		// The range spans beyond this file. Include data blocks through the last.
		return includeInterpolatedValueBlocksSize(r.Properties.DataSize - startBH.Offset), nil
	}
	endBH, err := decodeBlockHandleWithProperties(val.InPlaceValue())
	if err != nil {
		return 0, errCorruptIndexEntry
	}
	return includeInterpolatedValueBlocksSize(
		endBH.Offset + endBH.Length + blockTrailerLen - startBH.Offset), nil
```
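
To make the breakage concrete, here is a small self-contained sketch of the interpolation above using made-up sizes (64MB of data blocks, 16MB of value blocks); the numbers and the stripped-down `estimate` helper are illustrative, not Pebble's actual code:

```go
// Illustrative only: how the offset-span interpolation in EstimateDiskUsage
// misbehaves once value blocks are flushed between data blocks instead of
// being written contiguously after them.
package main

import "fmt"

func main() {
	const (
		dataSize        = uint64(64 << 20) // stand-in for r.Properties.DataSize
		valueBlocksSize = uint64(16 << 20) // stand-in for r.Properties.ValueBlocksSize
	)
	// Same shape as includeInterpolatedValueBlocksSize: scale value-block
	// bytes by the fraction of data-block bytes the requested range covers.
	estimate := func(spanBytes uint64) uint64 {
		return spanBytes +
			uint64((float64(spanBytes)/float64(dataSize))*float64(valueBlocksSize))
	}

	// Contiguous layout: the offset span endBH-startBH contains only data
	// blocks, so treating it as data-block bytes is fine.
	fmt.Println(estimate(8 << 20)) // 8MB data + 2MB interpolated values = 10MB

	// Interleaved layout (this PR): the same logical range's offset span can
	// now also contain, say, 8MB of already-flushed value blocks. Treating
	// all 16MB as data-block bytes overstates the data fraction and
	// double-counts the value bytes.
	fmt.Println(estimate(16 << 20)) // 20MB for the same logical range
}
```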

Is cockroachdb/cockroach#117122 an adequate workaround in the absence of online restore? For backups used with online restore, is there any benefit to making file sizes smaller so that they often don't span range boundaries (I realize we will virtualize and can share files across ranges)? With online restore, we could benefit from value blocks until the file is materialized locally, but we could also live without that optimization.

Reviewed 1 of 6 files at r1.
Reviewable status: 1 of 6 files reviewed, 2 unresolved discussions (waiting on @dt)


sstable/options.go line 277 at r1 (raw file):

```go
	// presence of prefetching/read-ahead, page caching, etc.
	//
	// A value of 0 implies the default of max(8MB/BlockSize, 16) while a value of
```

I am curious about the 8MB default versus something like 64MB, which would leave most Pebble-written files unaffected by this change.
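
For a sense of what those budgets mean in buffered blocks, a quick sketch (my own arithmetic, assuming the limit is a block count as the `max(8MB/BlockSize, 16)` doc above suggests; `defaultBufferLimit` is an illustrative stand-in, not the PR's code):

```go
// Illustrative only: buffered-value-block counts implied by an 8MB vs a
// 64MB budget at a few block sizes, with the 16-block floor from the doc.
package main

import "fmt"

func defaultBufferLimit(budgetBytes, blockSize int) int {
	n := budgetBytes / blockSize
	if n < 16 {
		n = 16
	}
	return n
}

func main() {
	for _, blockSize := range []int{4 << 10, 32 << 10, 1 << 20} {
		fmt.Printf("BlockSize=%dKiB: 8MB budget => %d blocks, 64MB budget => %d blocks\n",
			blockSize>>10, defaultBufferLimit(8<<20, blockSize), defaultBufferLimit(64<<20, blockSize))
	}
}
```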


sstable/options.go line 282 at r1 (raw file):

```go
}

func (o WriterOptions) ensureDefaults() WriterOptions {
```

sstable.WriterOptions is usually constructed via pebble.MakeWriterOptions (including in CRDB's sst_writer.go), so we need to add this ValueBlockBufferLimit to pebble.Options so that it is actually possible to configure it.
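
Roughly the shape of the plumbing being asked for, sketched with simplified stand-in types; the `ValueBlockBufferLimit` field name comes from this thread, but where it lands on `pebble.Options` (e.g. under `Experimental`) and the real `MakeWriterOptions` signature are assumptions, not the PR's final API:

```go
// Illustrative only: a knob on the pebble-level options that the
// writer-options constructor copies through, so callers that build
// sstable writers via MakeWriterOptions (like CRDB's sst_writer.go)
// can actually configure it.
package main

import "fmt"

// Stand-in for sstable.WriterOptions with the field this PR adds.
type WriterOptions struct {
	BlockSize             int
	ValueBlockBufferLimit int // 0 => default of max(8MB/BlockSize, 16)
}

// Stand-in for pebble.Options; the new knob needs to live here so it is
// reachable by code that never touches sstable.WriterOptions directly.
type Options struct {
	BlockSize             int
	ValueBlockBufferLimit int
}

func (o *Options) MakeWriterOptions() WriterOptions {
	return WriterOptions{
		BlockSize:             o.BlockSize,
		ValueBlockBufferLimit: o.ValueBlockBufferLimit, // the plumbing step requested above
	}
}

func main() {
	// Assuming the limit is a block count: ask for roughly a 64MB budget
	// at a 32KiB block size.
	opts := Options{BlockSize: 32 << 10, ValueBlockBufferLimit: (64 << 20) / (32 << 10)}
	fmt.Printf("%+v\n", opts.MakeWriterOptions())
}
```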
