Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sstable: add writer option to remove a common prefix #3242

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

dt
Copy link
Member

@dt dt commented Jan 23, 2024

No description provided.

@cockroach-teamcity
Copy link
Member

This change is Reviewable

@dt dt marked this pull request as ready for review January 23, 2024 22:34
@dt
Copy link
Member Author

dt commented Jan 23, 2024

I'm guessing the elided-prefix prop is what's making the readers 16 bytes bigger, via the extra string header in the props? We don't strictly need this prop so I could... skip it? I figured it'd be nice to have so we could tell by looking at an sst if someone hacked some prefix out of its content during creation, but I suppose when everything works as intended we'll know this based on other metadata.

Copy link
Collaborator

@sumeerbhola sumeerbhola left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 8 files reviewed, 1 unresolved discussion (waiting on @dt, @itsbilal, and @RaduBerinde)


sstable/writer.go line 1237 at r1 (raw file):

		}
		key.UserKey = key.UserKey[len(w.elidePrefix):]
	}

what about the encoded keyspan.Span.End that is contained in value? Don't we need to confirm that it has the elidePrefix and strip the prefix?

@RaduBerinde
Copy link
Member

sstable/writer.go line 1237 at r1 (raw file):

Previously, sumeerbhola wrote…

what about the encoded keyspan.Span.End that is contained in value? Don't we need to confirm that it has the elidePrefix and strip the prefix?

It would be unfortunate to decode/reencode the values and account for all cases in here, when we just encoded them before calling this function.

I wonder if we should replace AddRangeKey with "already fragmented" variants of RangeKeySet, RangeKeyUnset, RangeKeyDelete. It would also make existing order-checking code easier.

Copy link
Collaborator

@sumeerbhola sumeerbhola left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 8 files reviewed, 1 unresolved discussion (waiting on @itsbilal and @RaduBerinde)


sstable/writer.go line 1237 at r1 (raw file):

It would be unfortunate to decode/reencode the values and account for all cases in here, when we just encoded them before calling this function.

IIRC, there are two paths:

  • where fragmentation is done in the writer. These call into addRangeKeySpan.
  • fragmentation is done outside the writer. Compactions and suffix rewriting are using this path. The compaction case is a very shallow use rangekey.Encode(&v, tw.AddRangeKey), so we could expose a Writer.AddRangeKeyPrefragmented(*keyspan.Span) and do the encoding here. Suffix rewriting is similarly shallow. I think this is similar to what you are suggesting, except we can make do with one method.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants