Switch spanner datastore to use the built-in stats table for estimating rel count #1892
19718f3 to ae9eb77
pkg/cmd/datastore/datastore.go
@@ -213,6 +214,7 @@ func RegisterDatastoreFlagsWithPrefix(flagSet *pflag.FlagSet, prefix string, opt
flagSet.StringVar(&opts.SpannerEmulatorHost, flagName("datastore-spanner-emulator-host"), "", "URI of spanner emulator instance used for development and testing (e.g. localhost:9010)")
flagSet.Uint64Var(&opts.SpannerMinSessions, flagName("datastore-spanner-min-sessions"), 100, "minimum number of sessions across all Spanner gRPC connections the client can have at a given time")
flagSet.Uint64Var(&opts.SpannerMaxSessions, flagName("datastore-spanner-max-sessions"), 400, "maximum number of sessions across all Spanner gRPC connections the client can have at a given time")
flagSet.Uint64Var(&opts.SpannerEstimatedBytesPerRelationship, flagName("datastore-spanner-estimated-bytes-per-relationship"), spanner.DefaultEstimatedBytesPerRelationship, "estimated number of bytes per relationship tuple in the spanner instance")
This does seem like a strange flag to have the user provide. While I understand the motivation of performance and simplifying the code, it feels clunky because it requires internal knowledge of how Spanner stores data on disk.
Even just querying Spanner for the size of a random stored relationship tuple when Statistics() is called would probably be good enough to estimate the number of relationships, instead of providing this flag.
It is odd, true. I added it as a flag so users can set it IF they want more accurate estimates, not that the estimates are used for much as-is :)
Given how many datastore-specific flags we already have, avoiding adding more when it isn't absolutely necessary will reduce potential tech debt if we decide to consolidate the flags in the future.
Think I should just hard-code it then? Alternatively, we could make it load a random relationship and estimate based on that?
Loading a random relationship and using that as an estimate seems like a sane approach to me.
Changed to load some relationships and calculate the estimated size based on them, with a fallback if none are found.
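For readers following along, the approach described in this comment can be sketched roughly as below. This is only an illustration, not the code in this PR: the `relationship` struct, the `approxSize` heuristic (summing string lengths), the fallback constant, and all function names are hypothetical. It also assumes the table's total byte size has already been read from Spanner's statistics.

```go
package main

// relationship is a hypothetical stand-in for a stored relationship tuple.
type relationship struct {
	Namespace, ObjectID, Relation                      string
	SubjectNamespace, SubjectObjectID, SubjectRelation string
}

// fallbackBytesPerRelationship is an assumed default used when no
// relationships can be sampled; the real value would need tuning.
const fallbackBytesPerRelationship uint64 = 20

// approxSize approximates the stored size of one relationship as the sum
// of its string field lengths (an assumption; storage overhead is ignored).
func approxSize(r relationship) uint64 {
	return uint64(len(r.Namespace) + len(r.ObjectID) + len(r.Relation) +
		len(r.SubjectNamespace) + len(r.SubjectObjectID) + len(r.SubjectRelation))
}

// estimateBytesPerRelationship averages the approximate size over a sample,
// falling back to the default when the sample is empty.
func estimateBytesPerRelationship(sample []relationship) uint64 {
	if len(sample) == 0 {
		return fallbackBytesPerRelationship
	}
	var total uint64
	for _, r := range sample {
		total += approxSize(r)
	}
	avg := total / uint64(len(sample))
	if avg == 0 {
		// Guard against a degenerate sample of empty tuples.
		return fallbackBytesPerRelationship
	}
	return avg
}

// estimateRelationshipCount divides the table's reported byte size by the
// estimated bytes per relationship.
func estimateRelationshipCount(tableBytes uint64, sample []relationship) uint64 {
	return tableBytes / estimateBytesPerRelationship(sample)
}
```

Calling `estimateRelationshipCount(tableBytes, sample)` with an empty sample simply uses the fallback, so the statistics call still returns a (rougher) estimate on an empty or unreadable table.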
ae9eb77 to c43ad8c
Nice work!
3b2232a to 557d5e2
8cddb42 to df9b26c
LGTM, please replace the fmt.Println statements with trace-level logging if need be
…ng rel count While this will be far less accurate of an estimate, it removes the need to write to a stats table on every write and delete, which should help with performance
df9b26c to d05a0f9
Updated
While this will be far less accurate of an estimate, it removes the need to write to a stats table on every write and delete, which should help with performance