Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Shim events streaming #9704

Closed
wants to merge 15 commits into from
Closed

Shim events streaming #9704

wants to merge 15 commits into from

Conversation

mxpv
Copy link
Member

@mxpv mxpv commented Jan 29, 2024

In current implementation, for each shim instance, we typically maintain two TTRPC/GRPC connections. One from containerd to the shim to manage tasks, and the other one from shim back to containerd to post shim events. Since both TTRPC and GRPC now support streaming, we can utilize one bi-directional connection.

This PR needs a couple of cleanups, I plan to follow up to keep the diff reasonable.
Also 2f37cb6 adds a workaround for containerd/ttrpc#126 that needs to be reverted once the issue is addressed.

@k8s-ci-robot
Copy link

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@mxpv mxpv force-pushed the event_streaming branch 2 times, most recently from f4c2f2e to 479c052 Compare January 30, 2024 03:56
@mxpv mxpv changed the title Event streaming Shim events streaming Jan 30, 2024
@mxpv mxpv force-pushed the event_streaming branch 5 times, most recently from 5ffab05 to 2f37cb6 Compare February 2, 2024 01:42
@mxpv mxpv marked this pull request as ready for review February 2, 2024 02:11
@mxpv mxpv added this to the 2.0 milestone Feb 2, 2024
@mxpv
Copy link
Member Author

mxpv commented Feb 2, 2024

/test all

@mxpv
Copy link
Member Author

mxpv commented Feb 6, 2024

  • Rebased against main
  • Removed workarounds (see referenced PRs above).

pkg/oom/v1/v1.go Show resolved Hide resolved
core/runtime/v2/manager.go Show resolved Hide resolved
Namespace: ns,
Event: evt,
Timestamp: timestamppb.Now(),
}); err != nil {
log.G(ctx).WithError(err).Error("post event")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If containerd is restarting, original connection has been closed. It seems that the server side is still running. After containerd restarted, there are two connections watching on one channel. If old connection wins, it seems containerd won't get the event. Is it correct?

Maybe we can fail-fast if connection is closed.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, that's fair. We should exit stream if failed to send event.
Added 1bb75f5

Copy link
Member

@fuweid fuweid Feb 17, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we still need Context() in streamer. When ttrpc connection is closed, we can get notification from context and stop receiving events from channel. Or introduce multiple event receiver, like #9661. It can be done in the follow-up.

image

Copy link
Member

@fuweid fuweid left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@mxpv
Copy link
Member Author

mxpv commented Apr 11, 2024

Rebased against main

cmd/containerd-shim-runc-v2/task/service.go Show resolved Hide resolved
core/runtime/v2/manager.go Outdated Show resolved Hide resolved
core/runtime/v2/manager.go Outdated Show resolved Hide resolved
mxpv added 14 commits April 26, 2024 21:07
Signed-off-by: Maksym Pavlenko <pavlenko.maksym@gmail.com>
Signed-off-by: Maksym Pavlenko <pavlenko.maksym@gmail.com>
Signed-off-by: Maksym Pavlenko <pavlenko.maksym@gmail.com>
Signed-off-by: Maksym Pavlenko <pavlenko.maksym@gmail.com>
Signed-off-by: Maksym Pavlenko <pavlenko.maksym@gmail.com>
Signed-off-by: Maksym Pavlenko <pavlenko.maksym@gmail.com>
Signed-off-by: Maksym Pavlenko <pavlenko.maksym@gmail.com>
Signed-off-by: Maksym Pavlenko <pavlenko.maksym@gmail.com>
Signed-off-by: Maksym Pavlenko <pavlenko.maksym@gmail.com>
Signed-off-by: Maksym Pavlenko <pavlenko.maksym@gmail.com>
Signed-off-by: Maksym Pavlenko <pavlenko.maksym@gmail.com>
Signed-off-by: Maksym Pavlenko <pavlenko.maksym@gmail.com>
This reverts commit 2f37cb6.

Signed-off-by: Maksym Pavlenko <pavlenko.maksym@gmail.com>
Signed-off-by: Maksym Pavlenko <pavlenko.maksym@gmail.com>
Signed-off-by: Maksym Pavlenko <pavlenko.maksym@gmail.com>
@k8s-ci-robot
Copy link

PR needs rebase.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@dmcgowan
Copy link
Member

I don't think this one should be considered for 2.0. We can accomplish this without changing the task interface and I think we need more time to test the push vs pull events model. This approach is better but we need to consider how the shim can know an event was handled after it is placed on the stream.

@mxpv
Copy link
Member Author

mxpv commented Apr 30, 2024

yes, fair enough. I'll reiterate this some time in future.

@mxpv mxpv closed this Apr 30, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Development

Successfully merging this pull request may close these issues.

None yet

5 participants