Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SpiceDB Spanner queries use different+worse index than e.g. same query in Spanner Studio #1687

Open
kruegener opened this issue Dec 19, 2023 · 6 comments
Labels
area/datastore Affects the storage system area/perf Affects performance or scalability kind/bug Something is broken or regressed

Comments

@kruegener
Copy link

What platforms are affected?

others

What architectures are affected?

others

What SpiceDB version are you using?

1.25.0 - 1.27.0

Not sure if this issue is present in earlier versions as well.

Steps to Reproduce

TLDR:

  • Use Spanner Datastore
  • Run a LookupResources call
  • Compare query performance and query plan from SpiceDB issued Spanner queries vs. ones ran in e.g. Spanner Studio
  • Observe significant performance deficit

Details:
A LookupResource will issue multiple SQL queries targeting the Spanner datastore that are very similar, but will use different indices for the lookup, which depending on the amount of relations present will result in large performance degradations.

First query, using the ix_relation_tuple_by_subject index and returning in <20ms:

SELECT
  namespace,
  object_id,
  relation,
  userset_namespace,
  userset_object_id,
  userset_relation,
  caveat_name,
  caveat_context
FROM
  relation_tuple
WHERE
  ((userset_namespace = '<USERSET_NAMESPACE>'
      AND userset_object_id IN (('<OBJECT_ID>'))
      AND userset_relation = '...'))
  AND namespace = '<NAMESPACE>'
  AND relation = '<RELATION>'
ORDER BY
  userset_namespace,
  userset_object_id,
  userset_relation,
  namespace,
  object_id,
  relation
LIMIT
  9223372036854775807

Later query, using ix_relation_tuple_by_subject_relation index, returns in 2000-4000ms:

SELECT
  namespace,
  object_id,
  relation,
  userset_namespace,
  userset_object_id,
  userset_relation,
  caveat_name,
  caveat_context
FROM
  relation_tuple
WHERE
  ((userset_namespace = '<USERSET_NAMESPACE>'
      AND userset_object_id IN (('<OBJECT_ID>'))
      AND userset_relation = '...'))
  AND namespace = '<NAMESPACE>'
  AND relation = '<RELATION>'
  AND ((object_id > '<SOME OBJECT ID>'))
ORDER BY
  userset_namespace,
  userset_object_id,
  userset_relation,
  namespace,
  object_id,
  relation
LIMIT
  9223372036854775807

This only occurs with queries issued by SpiceDB. Running the EXACT SAME query inside e.g. Spanner Studio will always use the correct index and return in <20ms as well, yielding the same rows.

This issue was investigated by adding the QueryWithStats() Spanner call in spanner.go and verifying the planning and execution times as well as which indices where used.

The used Query optimizer version and Statistics package are the same between the queries in Spanner Studio and issued by SpiceDB.

Expected Result

SpiceDB issued Spanner calls use the correct/fastest index for SQL queries

Actual Result

ix_relation_tuple_by_subject_relation is used instead of ix_relation_tuple_by_subject, leading to a severe performance degradation.

We have stopped using LookupResources calls for now for cases with many relations, as the performance penalty was too severe.

@kruegener kruegener added the kind/bug Something is broken or regressed label Dec 19, 2023
@vroldanbet
Copy link
Contributor

Thanks for the detailed report @kruegener! 👏🏻 ⭐

@jzelinskie jzelinskie added area/perf Affects performance or scalability area/datastore Affects the storage system labels Dec 20, 2023
@jzelinskie
Copy link
Member

I suspect the best way forward is actually to force the index. This is often discouraged for other databases, but it looks like the Spanner documentation actually recommends it: https://cloud.google.com/spanner/docs/secondary-indexes#index-directive

jzelinskie added a commit to jzelinskie/spicedb that referenced this issue Dec 20, 2023
jzelinskie added a commit to jzelinskie/spicedb that referenced this issue Dec 20, 2023
jzelinskie added a commit to jzelinskie/spicedb that referenced this issue Dec 20, 2023
jzelinskie added a commit to jzelinskie/spicedb that referenced this issue Dec 21, 2023
@jzelinskie
Copy link
Member

If you'd like you can try out #1692 which should fix this issue.

@kruegener
Copy link
Author

Thx for the quick MR on this! I hope you also had some time off in the recent weeks.
I just got to testing your MR for our use-case, but unfortunately with the same result.

There was a small typo in your MR, the index is called ix_relation_tuple_by_subject, not idx_relation_tuple_by_subject. But fixing that also didn't help.

I can confirm that the slow query is altered (what seems correct to me) with the forced index and now looks like this:

SELECT namespace,
       object_id,
       relation,
       userset_namespace,
       userset_object_id,
       userset_relation,
       caveat_name,
       caveat_context
FROM relation_tuple /*@ FORCE_INDEX = ix_relation_tuple_by_subject */
WHERE ((userset_namespace = @p1 AND userset_object_id IN (@p2) AND userset_relation = @p3))
  AND namespace = @p4
  AND relation = @p5
  AND ((object_id > @p6))
ORDER BY userset_namespace, userset_object_id, userset_relation, namespace, object_id,
         relation LIMIT 9223372036854775807

But QueryWithStats still reveals that one step of the query plan involves a full scan of the ix_relation_tuple_by_subject_relation index, which takes up 99% of the time.

Running the same query once again in Spanner Studio does not use that index, similar to the original situation without the forced index.
I've got no idea why this happens and the forced index is ignored, but the result and timings are exactly the same as before.

@jzelinskie
Copy link
Member

I've found some documentation interleaved indexes which is another lead to explore for making these queries perform.

Spanner stores index data in the same way as tables, with one row per index entry. Many of the design considerations for tables also apply to indexes. Non-interleaved indexes store data in root tables.

Totally spitballing: it's possible the full scan is of this table is worse than using a less-specific, but interleaved index.

@josephschorr
Copy link
Member

@kruegener Would you be willing to try modifying one of your Spanner indexes manually?

The current index being used is defined as CREATE INDEX ix_relation_tuple_by_subject ON relation_tuple (userset_object_id, userset_namespace, userset_relation, namespace, relation), which does not contain the object_id (likely because it wasn't used previously).

Now that LR has cursors, it might be worth adding another index of the form:

CREATE INDEX ix_full_relation_tuple_by_subject ON relation_tuple (userset_object_id, userset_namespace, userset_relation, namespace, relation, object_id)

If you're willing to add that manually and try on your slow query, I'd appreciate it

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/datastore Affects the storage system area/perf Affects performance or scalability kind/bug Something is broken or regressed
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants