sstable: add writer option to remove a common prefix #3242

dt · 2024-01-23T22:33:34Z

No description provided.

dt · 2024-01-23T22:42:31Z

I'm guessing the elided-prefix prop is what's making the readers 16 bytes bigger, via the extra string header in the props? We don't strictly need this prop so I could... skip it? I figured it'd be nice to have so we could tell by looking at an sst if someone hacked some prefix out of its content during creation, but I suppose when everything works as intended we'll know this based on other metadata.

sumeerbhola

Reviewable status: 0 of 8 files reviewed, 1 unresolved discussion (waiting on @dt, @itsbilal, and @RaduBerinde)

sstable/writer.go line 1237 at r1 (raw file):

		}
		key.UserKey = key.UserKey[len(w.elidePrefix):]
	}

what about the encoded keyspan.Span.End that is contained in value? Don't we need to confirm that it has the elidePrefix and strip the prefix?
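A minimal sketch of the check being asked for, assuming a simplified span type: both bounds must carry the elided prefix before either is stripped. The names `span`, `elide`, and `elideSpan` are hypothetical and are not part of the sstable package. This version would also reject a span whose exclusive end bound lies just past the prefix, a case a real implementation would have to decide how to handle.

```go
package main

import (
	"bytes"
	"fmt"
)

// span is a hypothetical stand-in for a range key span; only the bounds matter here.
type span struct {
	Start, End []byte
}

// elide returns key with prefix removed, or an error if key lacks the prefix.
func elide(key, prefix []byte) ([]byte, error) {
	if !bytes.HasPrefix(key, prefix) {
		return nil, fmt.Errorf("key %q does not have prefix %q", key, prefix)
	}
	return key[len(prefix):], nil
}

// elideSpan strips prefix from both bounds of s and fails if either bound lacks it.
func elideSpan(s span, prefix []byte) (span, error) {
	start, err := elide(s.Start, prefix)
	if err != nil {
		return span{}, fmt.Errorf("start: %w", err)
	}
	end, err := elide(s.End, prefix)
	if err != nil {
		return span{}, fmt.Errorf("end: %w", err)
	}
	return span{Start: start, End: end}, nil
}
```

Stripping only the start key, as the quoted lines do for `key.UserKey`, would leave the end bound in the unelided keyspace, which is the inconsistency this comment points at.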

RaduBerinde · 2024-01-30T14:28:16Z

sstable/writer.go line 1237 at r1 (raw file):

Previously, sumeerbhola wrote…

what about the encoded keyspan.Span.End that is contained in value? Don't we need to confirm that it has the elidePrefix and strip the prefix?

It would be unfortunate to decode/reencode the values and account for all cases in here, when we just encoded them before calling this function.

I wonder if we should replace AddRangeKey with "already fragmented" variants of RangeKeySet, RangeKeyUnset, RangeKeyDelete. It would also make existing order-checking code easier.

sumeerbhola

Reviewable status: 0 of 8 files reviewed, 1 unresolved discussion (waiting on @itsbilal and @RaduBerinde)

sstable/writer.go line 1237 at r1 (raw file):

It would be unfortunate to decode/reencode the values and account for all cases in here, when we just encoded them before calling this function.

IIRC, there are two paths:

1. Fragmentation is done in the writer. These call into addRangeKeySpan.
2. Fragmentation is done outside the writer. Compactions and suffix rewriting use this path. The compaction case is a very shallow use of rangekey.Encode(&v, tw.AddRangeKey), so we could expose a Writer.AddRangeKeyPrefragmented(*keyspan.Span) and do the encoding here. Suffix rewriting is similarly shallow. I think this is similar to what you are suggesting, except we can make do with one method.
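To illustrate why a single entry point for already-fragmented spans could simplify order checking, here is a rough sketch under stated assumptions. `toyWriter`, `span`, and `addPrefragmented` are hypothetical names, keys are compared bytewise rather than with a user comparer, and nothing here reflects the actual signature or behavior of the proposed Writer.AddRangeKeyPrefragmented.

```go
package main

import (
	"bytes"
	"fmt"
)

// span is a hypothetical, simplified stand-in for keyspan.Span (bounds only).
type span struct {
	Start, End []byte
}

// toyWriter is a hypothetical writer that accepts already-fragmented spans.
type toyWriter struct {
	prevEnd []byte // end bound of the last accepted span; nil before the first
	spans   []span // accepted spans, in the order they were added
}

// addPrefragmented accepts s only if it is non-empty and does not start
// before the end of the previously accepted span.
func (w *toyWriter) addPrefragmented(s span) error {
	if bytes.Compare(s.Start, s.End) >= 0 {
		return fmt.Errorf("empty or inverted span [%q, %q)", s.Start, s.End)
	}
	if w.prevEnd != nil && bytes.Compare(s.Start, w.prevEnd) < 0 {
		return fmt.Errorf("span [%q, %q) overlaps previous end %q", s.Start, s.End, w.prevEnd)
	}
	// Copy the end bound so later mutation of s.End by the caller cannot affect the check.
	w.prevEnd = append(w.prevEnd[:0], s.End...)
	w.spans = append(w.spans, s)
	return nil
}
```

Because the method sees whole spans rather than individually encoded keys, the ordering invariant reduces to one comparison against the previous end bound, and any prefix stripping could happen on the bounds before encoding.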

dt requested review from itsbilal and RaduBerinde January 23, 2024 22:33

dt marked this pull request as ready for review January 23, 2024 22:34

dt added 2 commits January 23, 2024 22:38

sstable/writer: add support for removing common prefix

b5139bb

sstable: capture elided prefix in a prop

3b862af

dt force-pushed the de-prefix-writer branch from f1b98ed to 3b862af Compare January 23, 2024 22:40

sumeerbhola reviewed Jan 30, 2024

sumeerbhola requested changes Jan 30, 2024
