Add support for basic cursors and limits to LookupSubjects #1379

josephschorr · 2023-06-01T21:33:15Z

This change supports a limit (called the "concrete limit") on LookupSubjects and will filter concrete subjects based on the returned cursor.

This change does not filter intermediate lookups, which will be done in a followup PR.

vroldanbet

I didn't have time to finish the review, sorry! This is my first attempt. So far it's looking good, although I have a bunch of questions.

Something I noticed is that when one resumes with a cursor, we evaluate the whole graph again and send queries to the database, even though they will return empty results. It works, but a lot of that work is wasted after resuming, and the waste is directly proportional to the complexity of the schema. Ideally the evaluation of the schema would also resume from where the cursor left off, e.g. if permission view = a + b + c, you'd restart evaluation at c if that's where you were when the limit was hit. That would spare a bunch of dispatching (network calls across SpiceDBs!) and DB roundtrips.

vroldanbet · 2023-07-28T13:35:39Z

internal/graph/lookupsubjects.go

(unrelated to this file) I found that ErrLimitReached in limits.go is unused.

spicedb/internal/graph/limits.go

Line 9 in 28f9ccd

var ErrLimitReached = fmt.Errorf("limit has been reached")

vroldanbet · 2023-07-28T14:05:35Z

internal/services/v1/permissions.go

+				excludedSubjectIDs := make([]string, 0, len(foundSubject.ExcludedSubjects))
+				for _, excludedSubject := range foundSubject.ExcludedSubjects {
+					excludedSubjectIDs = append(excludedSubjectIDs, excludedSubject.SubjectId)
+				}


can we merge this into the loop below?

I did so, but I'm thinking maybe we just remove it entirely now? The field has been marked deprecated for a number of versions

vroldanbet · 2023-07-28T14:13:22Z

internal/services/v1/permissions.go

+				for _, excludedSubject := range foundSubject.ExcludedSubjects {
+					resolvedExcludedSubject, err := foundSubjectToResolvedSubject(ctx, excludedSubject, caveatContext, ds)
+					if err != nil {
+						return err


Let's add some context to those errors, given that we are calling foundSubjectToResolvedSubject in two different spots.
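A minimal sketch of what adding context could look like, assuming the usual `fmt.Errorf` with `%w` wrapping. The `resolve` and `resolveAll` functions and the `errResolve` value are hypothetical stand-ins, not the PR's code; the point is only that each call site gets its own message while the underlying error stays inspectable.

```go
package main

import (
	"errors"
	"fmt"
)

// errResolve is a hypothetical failure used only for this illustration.
var errResolve = errors.New("resolution failed")

// resolve is a stand-in for foundSubjectToResolvedSubject; it fails on an empty ID.
func resolve(subjectID string) (string, error) {
	if subjectID == "" {
		return "", errResolve
	}
	return "resolved:" + subjectID, nil
}

// resolveAll wraps each failure with the call site that produced it.
func resolveAll(subjectID string, excludedIDs []string) error {
	if _, err := resolve(subjectID); err != nil {
		return fmt.Errorf("failed to resolve found subject %q: %w", subjectID, err)
	}
	for _, excludedID := range excludedIDs {
		if _, err := resolve(excludedID); err != nil {
			return fmt.Errorf("failed to resolve excluded subject %q: %w", excludedID, err)
		}
	}
	return nil
}

func main() {
	err := resolveAll("tom", []string{"fred", ""})
	// errors.Is still matches the wrapped error.
	fmt.Println(err, errors.Is(err, errResolve))
}
```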

vroldanbet · 2023-07-28T14:36:24Z

internal/services/v1/permissions.go

-			if subject == nil {
-				continue
-			}
+				encodedCursor, err := cursor.EncodeFromDispatchCursor(


Do we need to allocate a new encoded cursor each time here? Could we reuse the same proto instance and just modify the corresponding field? I did something similar as an optimization in the ReadRelationships streaming bits, and it helped a bunch with allocations and GC overhead. It would also spare us calling revision.String() repeatedly.

It would be far less maintainable, but I'll see what I can do

Done, but it's a bit ugly

vroldanbet · 2023-07-28T14:46:50Z

internal/services/v1/permissions.go

+				if subject.SubjectObjectId != tuple.PublicWildcard {
+					countSubjectsFound++
+					if req.OptionalConcreteLimit > 0 && remainingConcreteLimit <= 0 {
+						return nil


Will subjects found continue to be streamed and discarded here even after the concrete limit is hit? Why not discard them earlier in this handler, given that once you reach the limit, you don't really care if a subject is resolved or not?

shouldn't we be signalling no more streaming is needed, e.g. cancelling?

Added cancelation; hopefully it doesn't break the stream
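The idea of signalling upstream once the limit is hit can be sketched with a cancellable context. This is an illustration only: `produce` and `collectUpTo` are hypothetical names, and a channel stands in for the dispatch stream; the actual handler in the PR may be wired differently.

```go
package main

import (
	"context"
	"fmt"
)

// produce emits subject IDs until the list is exhausted or the context is cancelled.
func produce(ctx context.Context, ids []string) <-chan string {
	out := make(chan string)
	go func() {
		defer close(out)
		for _, id := range ids {
			select {
			case out <- id:
			case <-ctx.Done():
				return // the consumer no longer wants results
			}
		}
	}()
	return out
}

// collectUpTo reads at most limit subjects and then cancels the producer,
// so no further work is done for results that would be discarded.
func collectUpTo(ids []string, limit int) []string {
	if limit <= 0 {
		return nil
	}
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()

	var found []string
	for id := range produce(ctx, ids) {
		found = append(found, id)
		if len(found) == limit {
			cancel() // calling cancel more than once is safe
			break
		}
	}
	return found
}

func main() {
	fmt.Println(collectUpTo([]string{"alice", "bob", "carol", "dan"}, 2))
}
```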

vroldanbet · 2023-07-28T17:01:14Z

internal/graph/lookupsubjects.go

+	if req.SubjectRelation.Relation != tuple.Ellipsis {
+		return nil
+	}


do we need to return earlier with this check given we are calling IsAllowedPublicNamespace below?

Yes, to avoid a database roundtrip

vroldanbet · 2023-07-28T17:06:24Z

internal/graph/lookupsubjects.go

-				return fmt.Errorf("failed to UnionWith under lookupSubjectsExclusion: %w", err)
+	afterSubjectID, _ := ci.headSectionValue()
+
+	// Filter down the subjects found by the cursor (if applicable) and the apply a limit.


typo

Suggested change

-	// Filter down the subjects found by the cursor (if applicable) and the apply a limit.
+	// Filter down the subjects found by the cursor (if applicable) and then apply a limit.

vroldanbet · 2023-07-28T17:18:56Z

internal/graph/lookupsubjects.go

+	for _, foundSubjects := range subjects {
+		for _, foundSubject := range foundSubjects.FoundSubjects {
+			// NOTE: wildcard is always returned, because it is needed by all branches, at all times.
+			if foundSubject.SubjectId == tuple.PublicWildcard || (afterSubjectID == "" || foundSubject.SubjectId > afterSubjectID) {


I'm not sure I understand how we guarantee that we do not skip found subjects that sort alphabetically after the current cursor but that haven't been seen by the stream before.

Because the stream always returns in sorted order from the root. That way, we are always guaranteed to have a defined ordering (alphabetical) coming out of each subproblem
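The filter condition in the hunk above can be sketched in isolation. This assumes the input is already sorted, as the reply describes, and that the wildcard is the string `"*"` (an assumption about the value of `tuple.PublicWildcard`); `filterAfterCursor` is a hypothetical name.

```go
package main

import "fmt"

// publicWildcard is assumed to match tuple.PublicWildcard.
const publicWildcard = "*"

// filterAfterCursor keeps the wildcard unconditionally and keeps concrete
// subject IDs only if they sort strictly after afterSubjectID. An empty
// afterSubjectID means there is no cursor, so everything is kept.
func filterAfterCursor(sortedIDs []string, afterSubjectID string) []string {
	var kept []string
	for _, id := range sortedIDs {
		if id == publicWildcard || afterSubjectID == "" || id > afterSubjectID {
			kept = append(kept, id)
		}
	}
	return kept
}

func main() {
	ids := []string{"alice", "bob", "carol", publicWildcard}
	fmt.Println(filterAfterCursor(ids, "bob"))
}
```

Because every subproblem emits in the same sorted order, any ID at or before the cursor has already been returned by an earlier page, which is why a plain string comparison is enough here.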

vroldanbet · 2023-07-28T17:19:55Z

internal/graph/lookupsubjects.go

+		if subjectID == tuple.PublicWildcard {
+			subjectIDsToPublish = append(subjectIDsToPublish, subjectID)
+			continue
 		}


even though we increased the limit by one to accommodate the wildcard, we do not call prepareForPublicshing here.

This is what I meant earlier with "special casing wildcards". Why increase the limit by one to account for it instead of just ignoring the limit when we find one?

The wildcard is never counted against the limit because it's "special". The limit above is the limit for the datastore call, not the stream.

vroldanbet · 2023-07-28T17:23:41Z

internal/graph/lookupsubjects.go

+		}
+
+		subjectIDsToPublish = append(subjectIDsToPublish, subjectID)
+		subjectIDsToPublishWithoutWildcard = append(subjectIDsToPublishWithoutWildcard, subjectID)


As far as I can tell, this slice is only used to identify the latest subjectID so it can be used in the cursor. Why keep track of all subjectIDs when we could just keep track of the last one?

I was being lazy? Changed

vroldanbet · 2023-08-21T13:54:31Z

pkg/genutil/slicez/chunking_test.go

+		for _, chunksize := range []uint16{1, 2, 3, 5, 10, 50} {
+			chunksize := chunksize
+			t.Run(fmt.Sprintf("test-%d-%d", datasize, chunksize), func(t *testing.T) {
+				data := []int{}


Suggested change

-				data := []int{}
+				var data []int
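For context on this suggestion (and the identical one for `found` below): `append` works on a nil slice, so `var data []int` avoids creating an empty slice value up front. A small sketch, with `collect` as a hypothetical helper:

```go
package main

import "fmt"

// collect builds a slice of n integers starting from a nil slice.
func collect(n int) []int {
	var data []int // nil slice; append allocates on first use
	for i := 0; i < n; i++ {
		data = append(data, i)
	}
	return data
}

func main() {
	fmt.Println(collect(3), collect(0) == nil)
}
```

One caveat worth keeping in mind in tests: a nil slice and an empty non-nil slice both have length 0, but `reflect.DeepEqual` treats them as different values.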

vroldanbet · 2023-08-21T13:54:44Z

pkg/genutil/slicez/chunking_test.go

+					data = append(data, i)
+				}
+
+				found := []int{}


Suggested change

-				found := []int{}
+				var found []int

vroldanbet · 2023-08-21T19:43:08Z

internal/services/v1/permissions_test.go

+				testutil.GenResourceTuples("document", "somedoc", "viewer", "user", "...", 580),
+				testutil.GenResourceTuplesWithOffset("document", "somedoc", "viewer", "user", "...", 1200, 100),
+				testutil.GenResourceTuplesWithOffset("group", "somegroup", "direct_member", "user", "...", 500, 500),
+				testutil.GenResourceTuplesWithOffset("group", "parentgroup", "direct_member", "user", "...", 700, 500),


nit: this would actually be a childgroup of somegroup

vroldanbet · 2023-08-21T19:53:06Z

internal/services/v1/permissions_test.go

+
+								dispatchCount, err := responsemeta.GetIntResponseTrailerMetadata(trailer, responsemeta.DispatchedOperationsCount)
+								req.NoError(err)
+								req.GreaterOrEqual(dispatchCount, 0)


There have been some regressions recently around dispatches, and I wonder if we could add a new test case that checks the number of dispatches over a cursored LR call.

You mean over the LS call? We do now have the test for LR calls

vroldanbet · 2023-08-21T19:54:55Z

internal/datasets/subjectsetbytype.go

@@ -56,6 +56,23 @@ func (s *SubjectByTypeSet) ForEachType(handler func(rr *core.RelationReference,
 	}
 }

+func (s *SubjectByTypeSet) ForEachTypeUntil(handler func(rr *core.RelationReference, subjects SubjectSet) (bool, error)) error {


add a unit test

vroldanbet · 2023-08-21T20:11:49Z

internal/services/integrationtesting/consistency_test.go

+								// Loop until all subjects have been found or we've hit max iterations.
+								var currentCursor *v1.Cursor
+								resolvedSubjects := map[string]*v1.LookupSubjectsResponse{}
+								for i := 0; i < 100; i++ {


should this be a for loop without the condition? as it stands it's confusing as we have to iterate indefinitely until all elements are streamed

I don't want it to hang indefinitely on failure so I put a "large" limit on it
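The bounded loop described in the reply can be sketched as follows. Everything here is hypothetical scaffolding (`page`, `fetchPage`, `collectAll`, and a string cursor holding the last returned ID); it only illustrates paging until the cursor is empty while failing loudly if the iteration cap is reached.

```go
package main

import "fmt"

// page is a hypothetical result page: some items plus a cursor ("" when done).
type page struct {
	items []string
	next  string
}

// fetchPage returns up to limit IDs that sort after the cursor.
func fetchPage(sorted []string, after string, limit int) page {
	var p page
	if limit <= 0 {
		return p
	}
	for _, id := range sorted {
		if after != "" && id <= after {
			continue
		}
		if len(p.items) == limit {
			// More results remain, so hand back a cursor at the last item.
			p.next = p.items[len(p.items)-1]
			return p
		}
		p.items = append(p.items, id)
	}
	return p
}

// collectAll pages until the cursor is empty, but gives up after
// maxIterations so that a broken cursor cannot hang the test.
func collectAll(sorted []string, limit, maxIterations int) ([]string, error) {
	var found []string
	cursor := ""
	for i := 0; i < maxIterations; i++ {
		p := fetchPage(sorted, cursor, limit)
		found = append(found, p.items...)
		if p.next == "" {
			return found, nil
		}
		cursor = p.next
	}
	return nil, fmt.Errorf("did not finish within %d iterations", maxIterations)
}

func main() {
	fmt.Println(collectAll([]string{"a", "b", "c", "d", "e"}, 2, 100))
}
```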

vroldanbet · 2023-08-21T20:14:53Z

internal/datasets/basesubjectset.go

@@ -264,6 +264,25 @@ func (bss BaseSubjectSet[T]) AsSlice() []T {
 	return values
 }

+// SubjectCount returns the number of subjects in the set.
+func (bss BaseSubjectSet[T]) SubjectCount() int {
+	if _, ok := bss.wildcard.get(); ok {


reuse HasWildcard?

vroldanbet · 2023-08-21T20:19:36Z

internal/dispatch/keys/computed.go

@@ -94,5 +94,7 @@ func lookupSubjectsRequestToKey(req *v1.DispatchLookupSubjectsRequest, option di
 		hashableRelationReference{req.ResourceRelation},
 		hashableRelationReference{req.SubjectRelation},
 		hashableIds(req.ResourceIds),
+		hashableCursor{req.OptionalCursor},


shouldn't we also consider the wildcard option as part of the dispatch cache key, or else 2 requests with different arguments would collide?

(edited) No, because the wildcard option is not passed to the dispatcher. It is only used to filter at the API level

vroldanbet · 2023-09-05T10:47:31Z

internal/graph/lookupsubjects.go

+	)
+}
+
+// yieldMatchingResources yields the current resource IDs iff the resource matches the target


Suggested change

-// yieldMatchingResources yields the current resource IDs iff the resource matches the target
+// yieldMatchingResources yields the current resource IDs if the resource matches the target

iff is correct: it means "if and only if"

vroldanbet · 2023-09-05T10:52:58Z

internal/graph/lookupsubjects.go

-	for _, subjectID := range subjectIds {
+// subjectsForConcreteIds returns a FoundSubjects map for the given *concrete* subject IDs, filtered by the cursor (if applicable).
+func subjectsForConcreteIds(subjectIDs []string, ci cursorInformation) (map[string]*v1.FoundSubjects, error) {
+	foundSubjects := make(map[string]*v1.FoundSubjects, len(subjectIDs))


Move this after afterSubjectID == tuple.PublicWildcard. I also wonder if allocating upfront is too aggressive, given that a subjectID may be discarded if it is below the current head section of the cursor.

It could be over allocating, but only if the cursor happens to hit the results set, which is unlikely in aggregate

vroldanbet · 2023-09-05T11:38:33Z

internal/graph/lookupsubjects.go

+	subjects map[string]*v1.FoundSubjects,
+	metadata *v1.ResponseMeta,
+) (*v1.DispatchLookupSubjectsResponse, func(), error) {
+	if subjects == nil {


if nil is an invalid state, then I assume zero value (or an empty map) is also an invalid state, so you may want to check with len instead

Suggested change

-	if subjects == nil {
+	if len(subjects) == 0 {

Empty is valid (if unusual), but nil is not
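The distinction in this exchange can be shown directly: `len` is 0 for both a nil map and an empty map, so a `len` check would reject the empty (valid) case too. A sketch with a hypothetical `validate` function and a simplified value type:

```go
package main

import (
	"errors"
	"fmt"
)

// validate rejects only a nil map; an empty, non-nil map is accepted.
func validate(subjects map[string][]string) error {
	if subjects == nil {
		return errors.New("subjects map must not be nil")
	}
	return nil
}

func main() {
	var nilMap map[string][]string
	emptyMap := map[string][]string{}

	// Both have length 0, but only the nil map fails validation.
	fmt.Println(len(nilMap), len(emptyMap))
	fmt.Println(validate(nilMap), validate(emptyMap))
}
```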

vroldanbet · 2023-09-05T12:03:32Z

internal/graph/lookupsubjects.go

+	reader datastore.Reader,
+) error {
+	// Check if the direct subject can be found on this relation and, if so, query for then.
+	directAllowed, err := validatedTS.IsAllowedDirectRelation(req.ResourceRelation.Relation, req.SubjectRelation.Namespace, req.SubjectRelation.Relation)


this is not expensive but it starts to add up for large datasets. Seems wasteful to run this again when it was just run in lookupDirectSubjects - we could pass it as argument if we really want to check directAllowed == namespace.DirectRelationNotValid

I have a long-standing TODO to cache type system operations like these; I'd rather fix it at that layer

vroldanbet · 2023-09-05T12:08:14Z

internal/graph/lookupsubjects.go

+	afterSubjectID, _ := ci.headSectionValue()
+
+	// If the cursor specifies the wildcard, then skip all further non-wildcard results.
+	if afterSubjectID == tuple.PublicWildcard {


I've seen this in a few places, and I'm a bit worried about the risk that we skip subjects because we published the wildcard prematurely. Is there any way we could detect this?

The only "good" way is via exhaustive unit and consistency testing, unfortunately

vroldanbet · 2023-09-05T12:19:49Z

internal/graph/lookupsubjects.go

+}
+
+// filterSubjectsMap filters the subjects found in the subjects map to only those allowed, returning an updated map.
+func filterSubjectsMap(subjects map[string]*v1.FoundSubjects, allowedSubjectIds ...string) map[string]*v1.FoundSubjects {


I see no use of the variadic allowedSubjectIds, so please pass it as a slice. NewSet should also support a slice: there are a lot of calls in the codebase that end up using ... just to pass a slice to it.

Changed here but not NewSet. We have some places where the variadic is helpful. If you like, I can add another NewSetFromSlice and change the call sites?

vroldanbet · 2023-09-05T12:52:46Z

internal/graph/lookupsubjects.go

+		return nil, done, nil
+	}
+
+	// Determine the subject ID for the cursor. If there are any concrete subject IDs, then the last


I'm very confused by this. It ties in with an earlier question about various bits of the code base ignoring any further subjectIDs once the wildcard is found, and with my concern about guaranteeing that we do not skip subjects if a wildcard is published earlier than it should be.

The code here indicates that the wildcard is used to denote that all concrete subjects have been consumed. But at the same time, we sometimes find the wildcard as a subject to publish because it was actually found. How do we disambiguate those two situations?

We always sort the returned results so the wildcard is last when retrieving from the datastore. This occurs on line 1193 after we union together the wildcard with the concrete results found. Thus, since we sort at every level (and wildcard always sorts last), we're guaranteed that it should always appear at the end. If you like I can add additional point unit tests for the individual reducers to validate this, but we can't really test everything directly
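The ordering invariant described in this reply, concrete IDs in ascending order with the wildcard last, can be sketched with a custom comparator. `sortWildcardLast` is a hypothetical name, and `"*"` is an assumed value for `tuple.PublicWildcard`; this is not the sorting code from the PR.

```go
package main

import (
	"fmt"
	"sort"
)

// publicWildcard is assumed to match tuple.PublicWildcard.
const publicWildcard = "*"

// sortWildcardLast sorts subject IDs in ascending order, except that the
// wildcard always sorts after every concrete ID.
func sortWildcardLast(ids []string) {
	sort.Slice(ids, func(i, j int) bool {
		if ids[i] == publicWildcard {
			return false // the wildcard is never "less than" anything
		}
		if ids[j] == publicWildcard {
			return true // any concrete ID sorts before the wildcard
		}
		return ids[i] < ids[j]
	})
}

func main() {
	ids := []string{"bob", publicWildcard, "alice"}
	sortWildcardLast(ids)
	fmt.Println(ids)
}
```

Without the special case, plain string ordering would place `"*"` before alphanumeric IDs, so a cursor sitting on the wildcard could not mean "all concrete subjects consumed".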

vroldanbet · 2023-09-05T13:06:15Z

internal/graph/lookupsubjects.go

+
+	updatedCI, err := ci.withOutgoingSection(cursorSubjectID)
+	if err != nil {
+		return nil, func() {}, err


Suggested change

-		return nil, func() {}, err
+		return nil, done, err

vroldanbet · 2023-09-05T13:16:22Z

internal/graph/lookupsubjects_test.go

+	}
+}
+
+func TestCreateFilteredAndLimitedResponse(t *testing.T) {


I think we should also assert expectations around the returned cursor, by checking where the cursor is at, given it's critical to denote a branch has been exhausted and there are no more concretes to return.

This example is me attempting to break it. It works as intended because the cursor will be at a. The next time we iterate we will get b, and the returned cursor head should be at *

{
	"blah",
	"",
	map[string]*v1.FoundSubjects{
		"foo": fsubs("a"),
		"bar": fsubs("*", "b"),
	},
	1,
	map[string]*v1.FoundSubjects{
		"foo": fsubs("a"),
		"bar": fsubs("*"),
	},
},

Can you clarify what, exactly, you want me to test? Is this a new unit test or an addition to this one?

vroldanbet · 2023-09-05T14:36:41Z

internal/dispatch/graph/lookupsubjects_test.go

+			},
+		},
+		{
+			"indirect with combined caveat direct",


isn't this test exactly the same as the previous one?

No. One is on viewer, one is on view

vroldanbet · 2023-09-05T14:46:28Z

internal/graph/lookupsubjects.go

+
+// CursorForFoundSubjectID returns an updated version of the afterResponseCursor (which must have been created
+// by this dispatcher), but with the specified subjectID as the starting point.
+func CursorForFoundSubjectID(subjectID string, afterResponseCursor *v1.Cursor) (*v1.Cursor, error) {


I couldn't find a test that exercised this

The API tests do so

josephschorr · 2023-09-06T17:57:58Z

Updated

josephschorr · 2024-03-11T17:41:04Z

Rebased

josephschorr · 2024-04-26T19:14:05Z

Rebased

josephschorr requested a review from vroldanbet June 1, 2023 21:33

josephschorr requested a review from a team as a code owner June 1, 2023 21:33

github-actions bot added the area/api v1 (Affects the v1 API), area/dependencies (Affects dependencies), area/dispatch (Affects dispatching of requests), and area/tooling (Affects the dev or user toolchain, e.g. tests, ci, build tools) labels Jun 1, 2023

josephschorr force-pushed the cursored-lookup-subjects branch 2 times, most recently from da48a6b to 19099a2 Compare June 1, 2023 21:50

josephschorr marked this pull request as draft June 20, 2023 23:38

josephschorr force-pushed the cursored-lookup-subjects branch 3 times, most recently from 21ae960 to c386a05 Compare July 12, 2023 19:36

josephschorr marked this pull request as ready for review July 12, 2023 19:58

vroldanbet reviewed Jul 28, 2023

vroldanbet reviewed Aug 21, 2023

vroldanbet reviewed Sep 5, 2023

josephschorr force-pushed the cursored-lookup-subjects branch from c386a05 to 137e08e Compare September 6, 2023 17:57

github-actions bot removed the area/dependencies (Affects dependencies) label Sep 6, 2023

josephschorr force-pushed the cursored-lookup-subjects branch from 137e08e to 1dec055 Compare March 11, 2024 17:40

josephschorr force-pushed the cursored-lookup-subjects branch 2 times, most recently from 3202a4b to b00913b Compare March 11, 2024 20:36

josephschorr force-pushed the cursored-lookup-subjects branch from b00913b to 83a96b2 Compare March 25, 2024 13:11

josephschorr force-pushed the cursored-lookup-subjects branch from 83a96b2 to 39f033f Compare April 26, 2024 19:13
