Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Switch spanner datastore to use the built-in stats table for estimating rel count #1892

Merged
merged 1 commit into from May 13, 2024

Conversation

josephschorr
Copy link
Member

While this will be far less accurate of an estimate, it removes the need to write to a stats table on every write and delete, which should help with performance

@josephschorr josephschorr requested a review from a team as a code owner May 8, 2024 20:37
@github-actions github-actions bot added area/CLI Affects the command line area/datastore Affects the storage system area/tooling Affects the dev or user toolchain (e.g. tests, ci, build tools) labels May 8, 2024
@@ -213,6 +214,7 @@ func RegisterDatastoreFlagsWithPrefix(flagSet *pflag.FlagSet, prefix string, opt
flagSet.StringVar(&opts.SpannerEmulatorHost, flagName("datastore-spanner-emulator-host"), "", "URI of spanner emulator instance used for development and testing (e.g. localhost:9010)")
flagSet.Uint64Var(&opts.SpannerMinSessions, flagName("datastore-spanner-min-sessions"), 100, "minimum number of sessions across all Spanner gRPC connections the client can have at a given time")
flagSet.Uint64Var(&opts.SpannerMaxSessions, flagName("datastore-spanner-max-sessions"), 400, "maximum number of sessions across all Spanner gRPC connections the client can have at a given time")
flagSet.Uint64Var(&opts.SpannerEstimatedBytesPerRelationship, flagName("datastore-spanner-estimated-bytes-per-relationship"), spanner.DefaultEstimatedBytesPerRelationship, "estimated number of bytes per relationship tuple in the spanner instance")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This does seem like a strange flag to have the user provide. While I understand the motivation of performance and simplifying the code, it feels clunky because it requires internal knowledge of how Spanner stores data on disk.

Even just querying Spanner for the size of a random stored relationship tuple when Statistics() is called would be probably be good enough to estimate the number of relationships, instead of providing this flag.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is odd, true. I added it as a flag so users can set it IF they want more accurate estimates, not that the estimates are used for much as-is :)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given how many datastore-specific flags we already have, avoiding adding more when it isn't absolutely necessary will reduce potential tech debt if we decide to consolidate the flags in the future.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Think I should just hard code in then? Alternatively, we could make it load a random relationship and estimate based on that?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Loading a random relationship and using that as an estimate seems like a sane approach to me.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed to load some relationships and calculate the estimated size based on them, with a fallback if none found

alecmerdler
alecmerdler previously approved these changes May 10, 2024
Copy link
Contributor

@alecmerdler alecmerdler left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work!

@josephschorr josephschorr force-pushed the spanner-stats branch 4 times, most recently from 3b2232a to 557d5e2 Compare May 10, 2024 17:58
@josephschorr josephschorr marked this pull request as draft May 10, 2024 18:13
@josephschorr josephschorr marked this pull request as ready for review May 10, 2024 19:57
Copy link
Contributor

@vroldanbet vroldanbet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, please replace the fmt.Prinlnt statements with trace-level logging if need be

internal/datastore/spanner/spanner_test.go Show resolved Hide resolved
internal/datastore/spanner/stats.go Outdated Show resolved Hide resolved
internal/datastore/spanner/stats.go Outdated Show resolved Hide resolved
…ng rel count

While this will be far less accurate of an estimate, it removes the need to write to a stats table on every write and delete, which should help with performance
@josephschorr
Copy link
Member Author

Updated

@josephschorr josephschorr added this pull request to the merge queue May 13, 2024
Merged via the queue into authzed:main with commit 2ccd129 May 13, 2024
22 checks passed
@josephschorr josephschorr deleted the spanner-stats branch May 13, 2024 15:40
@github-actions github-actions bot locked and limited conversation to collaborators May 13, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area/CLI Affects the command line area/datastore Affects the storage system area/tooling Affects the dev or user toolchain (e.g. tests, ci, build tools)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants