Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[IMPROVED] Fast and Direct access to stream messages. #3158

Merged
merged 2 commits into from Jun 5, 2022

Conversation

derekcollison
Copy link
Member

Stream get for KV was going through the API layer, but with the popularity, needed a more performant, lighter weight and direct approach. This skips the API layer, which currently still floods and does a bunch of API layer stats and client info tracking.

We could look into using a DQ for R>1 as well, or suggest that for KV just use mirrors for horizontal scalability for reads.

Signed-off-by: Derek Collison derek@nats.io

/cc @nats-io/core

…eded a more peformant and lighter weight and direct approach.

Signed-off-by: Derek Collison <derek@nats.io>
@@ -72,6 +72,8 @@ type StreamConfig struct {
// AllowRollup allows messages to be placed into the system and purge
// all older messages using a special msg header.
AllowRollup bool `json:"allow_rollup_hdrs"`
// Allow higher peformance, direct access to get individual messages.
AllowDirect bool `json:"allow_direct,omitempty"`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so this option makes it so that KV would have to examine not only the server version but the stream info to know if this feature is enabled. Why does this feature need stream opt-in, but the other doesn't?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You would only get true here on streams where its opted in AND the server is the correct version.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

correct it is actually simpler, but the point is that we have to enable some option to enable the API, but we didn't for the other, so what is so special about this one?

Copy link
Contributor

@ripienaar ripienaar May 31, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From discussions with Derek - but not yet from reading the whole PR - this sets up a new subject where you can request data bypassing various expensive encodings, audits and things. It's opt in so these things dont listen on all servers etc. Crucially also bypasses the all-servers system import of $JS.API.> - hence "direct"

I have concerns about upgrading to the new model and breaking old clients - KV is now WIDELY used - so this is opt in behavior and behavior appropriately versions client libraries can detect and choose to use if they wish. There are also concerns about breaking existing deployments wrt we are adding new subjects and their ACLs might not cater for them meaning the client either breaks or we now have a new way to exfil data users dont know about. Not to mention cross account/domain issues.

Feature is there to make N million clients work rather than your garden variety day to day, and will have complexities cross accounts/leafnodes etc being on a subject outside of $JS.API.> (all the domain denies etc).

So given all above, its opt-in for the moment.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is correct, only new servers will return stream info with this set to true. Clients will only be able to use direct method if this is set to true, otherwise use current behavior through $JS.API route.

1. Do not use original subject since this could use Request() and we want to use muxing.
2  Place original subject and timestamp into headers.

Signed-off-by: Derek Collison <derek@nats.io>
Copy link
Member

@aricart aricart left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

// Will return the message similar to how a consumer receives the message, no JSON processing.
// If the message can not be found we will use a status header of 404. If the stream does not exist the client will get a no-responders or timeout.
JSDirectMsgGet = "$JS.DS.GET.*"
JSDirectMsgGetT = "$JS.DS.GET.%s"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

doesnt domains allow 2 domains to host the same name stream? But the core subject space is shared by all domains in an account? So just saying stream name could lead to issues right?

Not that up to speed with domains so might just be confused

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair question.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This needs to be thought through a bit more, but even now if those streams have the same subjects they will be placed into both.

I also think we should place the direct get in a DQ group by default such that replicas can share the load.

This is experimental for now as we settle in the API and subjects and mechanics. Will merge.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm the domain scenario where 2 streams have the same name and subject space - a monumentally bad idea - isn’t about replicas it’s about like regional to hub migration/consolidation.

So it’s not about replicas sharing the load. Data might just not be there.

Wirh this I would say direct must not be default behaviour but sometning you opt in when you really grok your setup?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Direct is opt-in, agree you don't want to cross over with Domains, just saying that issues already is present with ingest subjects.

@derekcollison
Copy link
Member Author

Also we should decide if we want that subscriber to be a queue subscriber and make cure clients detect write them immediate read. Trivial change to enable and we could even monitor and drop peers that are too far behind from the group.

@aricart
Copy link
Member

aricart commented Jun 2, 2022

@derekcollison This works properly with mux subscriptions, it would be awesome if regular pull subscriber requests also honored the same behaviour. As some of the operations on the request status can be greatly simplified if this is the case.

@derekcollison
Copy link
Member Author

We need to do a direct access for pull requests as well, agree.

@derekcollison derekcollison merged commit fddc31a into main Jun 5, 2022
@derekcollison derekcollison deleted the kv-direct-get branch June 5, 2022 14:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants