internal/base: add doc comment discussing TrySeekUsingNext #3329

Open
wants to merge 1 commit into
base: master

Conversation

jbowens
Collaborator

@jbowens jbowens commented Feb 21, 2024

No description provided.

@cockroach-teamcity
Member

This change is Reviewable

@jbowens jbowens marked this pull request as ready for review February 21, 2024 22:03
@jbowens jbowens requested review from sumeerbhola and a team February 21, 2024 22:03
Collaborator Author

@jbowens jbowens left a comment

This is still a work-in-progress, but I'm having trouble structuring it in a coherent way

Reviewable status: 0 of 2 files reviewed, all discussions resolved (waiting on @sumeerbhola)

Collaborator

@sumeerbhola sumeerbhola left a comment

this is definitely much better than what we currently have, so I am good with merging this
:lgtm:

Reviewed 2 of 2 files at r2, all commit messages.
Reviewable status: all files reviewed, 5 unresolved discussions (waiting on @jbowens)


internal/base/doc.go line 30 at r2 (raw file):

// beneath the range deletion. However in doing so, a TrySeekUsingNext flag
// passed by the merging iterator's client no longer transitively holds for
// subsequent seeks of child level iterators in all cases. The merging iterator

do we currently do this? I only see the usual test coverage case in merging_iter.go.

	if invariants.Enabled && flags.TrySeekUsingNext() && !m.forceEnableSeekOpt &&
		disableSeekOpt(key, uintptr(unsafe.Pointer(m))) {
		flags = flags.DisableTrySeekUsingNext()
	}

internal/base/doc.go line 72 at r2 (raw file):

// The pebble levelIter makes use of the TrySeekUsingNext flag to avoid a naive
// seek among a level's file metadatas. When TrySeekUsingNext is passed by the
// caller, the relevant key must fall within the current file or later.

if it's a later file we call DisableTrySeekUsingNext(), so isn't this just for the current file?


internal/base/doc.go line 79 at r2 (raw file):

//
// The sstable iterators use the TrySeekUsingNext flag to avoid naive seeks
// through a table's index structures:

perhaps point to the long comment in reader_iter.go


internal/base/iterator.go line 234 at r2 (raw file):

// instead focuses on the contract expected of the caller.

// TrySeekUsingNext is set when the caller has knowledge that no action has been

that it has performed no action ...


internal/base/iterator.go line 246 at r2 (raw file):

// not return a key less than the current iterator position even if a naive seek
// would land there.
//

We should probably also say that the above promise from the caller is the same for SeekPrefixGE. That is, the prefixes of k1 and k2 can be different. The callee must remember when it did not position itself for k1 (e.g. an sstable iterator that did not position itself for k1 due to the bloom filter not matching), because in that case it needs to do the full work for k2.

Collaborator

@sumeerbhola sumeerbhola left a comment

Reviewable status: all files reviewed, 6 unresolved discussions (waiting on @jbowens)


internal/base/doc.go line 68 at r2 (raw file):
hmm, I assume this relates to the other statement

If true, the callee should not return a key less than the current iterator position even if a naive seek would land there.

doesn't that remove the optionality on the iterator to ignore TrySeekUsingNext? We do that when the bloom filter did not match.

Collaborator Author

@jbowens jbowens left a comment

Reviewable status: all files reviewed, 6 unresolved discussions (waiting on @RaduBerinde and @sumeerbhola)


internal/base/doc.go line 30 at r2 (raw file):

Previously, sumeerbhola wrote…

do we currently do this? I only see the usual test coverage case in merging_iter.go.

	if invariants.Enabled && flags.TrySeekUsingNext() && !m.forceEnableSeekOpt &&
		disableSeekOpt(key, uintptr(unsafe.Pointer(m))) {
		flags = flags.DisableTrySeekUsingNext()
	}

we don't currently do this, but it's one of the options that was considered to address @RaduBerinde's problem in #3324.


internal/base/doc.go line 68 at r2 (raw file):

Previously, sumeerbhola wrote…

hmm, I assume this relates to the other statement

If true, the callee should not return a key less than the current iterator position even if a naive seek would land there.

doesn't that remove the optionality on the iterator to ignore TrySeekUsingNext? We do that when the bloom filter did not match.

it does remove the optionality, but in a way that the bloom filter use is still compliant.

The contract is that if the previous call to SeekPrefixGE(k1) returned some key k, then SeekPrefixGE(k2, TrySeekUsingNext()=true) must return some key ≥ k and ≥ k2. In the bloom filter case, the bloom filter exclusion causes the iterator to return no key at all. This ensures we won't position other levels' range deletions according to some key k > k2 returned during the previous seek.


internal/base/doc.go line 72 at r2 (raw file):

Previously, sumeerbhola wrote…

if it's a later file we call DisableTrySeekUsingNext(), so isn't this just for the current file?

yeah. this is trying to say that the seek among the file metadatas (not within the files themselves) is constrained to [current file, +∞) rather than (-∞,+∞). I expanded the comment to clarify
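
A minimal sketch of that constrained search, assuming a level's files are sorted and non-overlapping. The `fileMeta` type and `findFile` helper are hypothetical names for illustration, not Pebble's actual levelIter code; the only point is that the binary search runs over `files[from:]` instead of the whole slice.

```go
package main

import (
	"fmt"
	"sort"
)

// fileMeta is a hypothetical stand-in for a level's file metadata.
type fileMeta struct {
	smallest, largest string
}

// findFile returns the index of the first file in files[from:] whose largest
// key is >= key, or len(files) if there is none. With from == 0 this is the
// naive search over (-inf, +inf); with from == current file index the search
// is constrained to [current file, +inf).
func findFile(files []fileMeta, from int, key string) int {
	return from + sort.Search(len(files)-from, func(i int) bool {
		return files[from+i].largest >= key
	})
}

func main() {
	files := []fileMeta{{"a", "c"}, {"d", "f"}, {"g", "i"}, {"j", "l"}}
	current := 1 // assume the iterator is currently positioned in files[1]

	fmt.Println(findFile(files, 0, "h"))       // naive search: 2
	fmt.Println(findFile(files, current, "h")) // constrained search: 2
	fmt.Println(findFile(files, current, "b")) // never lands before the current file: 1
}
```

The last call shows the consequence of the constraint: a key that a naive search would place in an earlier file resolves to the current file instead.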


internal/base/doc.go line 79 at r2 (raw file):

Previously, sumeerbhola wrote…

perhaps point to the long comment in reader_iter.go

Done.


internal/base/iterator.go line 234 at r2 (raw file):

Previously, sumeerbhola wrote…

that it has performed no action ...

Done.


internal/base/iterator.go line 246 at r2 (raw file):

Previously, sumeerbhola wrote…

We should probably also say that the above promise from the caller is the same for SeekPrefixGE. That is, the prefixes of k1 and k2 can be different. The callee must remember when it did not position itself for k1 (e.g. an sstable iterator that did not position itself for k1 due to the bloom filter not matching), because in that case it needs to do the full work for k2.

Done.

Collaborator Author

@jbowens jbowens left a comment

Reviewable status: 0 of 3 files reviewed, 6 unresolved discussions (waiting on @RaduBerinde and @sumeerbhola)


internal/base/doc.go line 68 at r2 (raw file):

Previously, jbowens (Jackson Owens) wrote…

it does remove the optionality, but in a way that the bloom filter use is still compliant.

The contract is that if the previous call to SeekPrefixGE(k1) returned some key k, then SeekPrefixGE(k2, TrySeekUsingNext()=true) must return some key ≥ k and ≥ k2. In the bloom filter case, the bloom filter exclusion causes the iterator to return no key at all. This ensures we won't position other levels' range deletions according to some key k > k2 returned during the previous seek.

Hrm, maybe a better way of thinking about it is that the mergingIter is violating the contract. You might get a sequence like:

mergingIter.SeekPrefixGE(k1)
  # mergingIter observes a range deletion deleting the span [k1,k3)
  > levelIter.SeekPrefixGE(k3)
mergingIter.SeekPrefixGE(k2, TrySeekUsingNext()=true)
  # mergingIter does not observe the [k1,k3) range deletion because the
  # relevant iterator has already been positioned to a key ≥ k3
  > levelIter.SeekPrefixGE(k2, TrySeekUsingNext()=true)

The caller obeyed the contract, passing TrySeekUsingNext()=true because k2 ≥ k1. The merging iterator violated it, because in the second seek k2 < k3, and yet it passed TrySeekUsingNext()=true.

The semantics that the merging iterator is relying on are that, if TrySeekUsingNext()=true, an iter.Seek[Prefix]GE(k) should be interpreted as iter.Seek[Prefix]GE(max(k, iter.Key())).
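
To make the max(k, iter.Key()) reading concrete, here is a toy sketch over a sorted slice of keys. `sliceIter` and its `SeekGE` signature are hypothetical and far simpler than Pebble's iterators; the sketch only assumes that a seek with the flag set starts from the current position and so never moves backward.

```go
package main

import "fmt"

// sliceIter is a hypothetical toy iterator over a sorted slice of keys.
type sliceIter struct {
	keys []string
	pos  int // index of the current key; len(keys) means exhausted
}

// SeekGE positions the iterator at the first key >= k. When trySeekUsingNext
// is true the scan starts at the current position rather than at the
// beginning, so the result is the same as seeking to max(k, current key).
func (it *sliceIter) SeekGE(k string, trySeekUsingNext bool) (string, bool) {
	start := 0
	if trySeekUsingNext {
		start = it.pos
	}
	i := start
	for i < len(it.keys) && it.keys[i] < k {
		i++
	}
	it.pos = i
	if i == len(it.keys) {
		return "", false
	}
	return it.keys[i], true
}

func main() {
	it := &sliceIter{keys: []string{"k1", "k2", "k3", "k4"}}

	// The parent observed a range deletion over [k1,k3) and seeks this level to k3.
	key, _ := it.SeekGE("k3", false)
	fmt.Println(key) // k3

	// The next top-level seek targets k2 with the flag set: the iterator stays at k3.
	key, _ = it.SeekGE("k2", true)
	fmt.Println(key) // k3

	// Without the flag, a naive seek lands on k2.
	key, _ = it.SeekGE("k2", false)
	fmt.Println(key) // k2
}
```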

@RaduBerinde
Member

The semantics that the merging iterator is relying on are that, if TrySeekUsingNext()=true, an iter.Seek[Prefix]GE(k) should be interpreted as iter.Seek[Prefix]GE(max(k, iter.Key())).

The max is something the merging iterator could do instead of requiring all implementations to tolerate it, no?

@jbowens
Collaborator Author

jbowens commented Feb 29, 2024

The max is something the merging iterator could do instead of requiring all implementations to tolerate it, no?

That's true, but the original design of TrySeekUsingNext strove to limit the number of key comparisons by performing the comparison at the top level and propagating a flag to its children indicating that the optimization may be applied. It's possible to avoid performing any kind of max key comparison within the leaf implementations. I think there's a tradeoff between a complicated, subtle interface and eking out performance. I'm not sure the performance warrants the complexity, but it still feels like there's some possible refactor or recharacterization that lets us get the best of both worlds...
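
For comparison, the caller-side alternative raised above could look roughly like the following. `childSeekKey` and `maxKey` are hypothetical helpers, not part of Pebble; the sketch assumes plain string ordering and shows where the extra per-child key comparison would be paid.

```go
package main

import "fmt"

// maxKey returns the larger of two keys under plain string ordering.
func maxKey(a, b string) string {
	if a > b {
		return a
	}
	return b
}

// childSeekKey is a hypothetical helper: the key a parent iterator would pass
// to a child if the parent, rather than the child, were responsible for never
// seeking backward. It costs one key comparison per positioned child per seek.
func childSeekKey(target, childKey string, childValid, trySeekUsingNext bool) string {
	if trySeekUsingNext && childValid {
		return maxKey(target, childKey)
	}
	return target
}

func main() {
	// The child was previously moved to k3; the new target k2 is clamped up to k3.
	fmt.Println(childSeekKey("k2", "k3", true, true)) // k3
	// A target beyond the child's position passes through unchanged.
	fmt.Println(childSeekKey("k4", "k3", true, true)) // k4
	// Without the flag the target is used as-is.
	fmt.Println(childSeekKey("k2", "k3", true, false)) // k2
}
```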
