Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

migrate to controller-runtime controllers for k8s watch events #3510

Merged
merged 8 commits into from May 11, 2021

Conversation

stevesloka
Copy link
Member

@stevesloka stevesloka commented Mar 23, 2021

This replaces the internal informers and moves to the controller-runtime controllers
setting up Contour to migrate over to the controller-runtime caches.

Note: This only currently applies to gateway-api resources. Once a GA version of controller-runtime is released, we can migrate the rest of the resources over.

Updates #2683

Signed-off-by: Steve Sloka slokas@vmware.com

@stevesloka stevesloka requested a review from a team as a code owner March 23, 2021 15:39
@stevesloka stevesloka requested review from danehans and skriss and removed request for a team March 23, 2021 15:39
@codecov
Copy link

codecov bot commented Mar 23, 2021

Codecov Report

Merging #3510 (041e9c7) into main (7ed38de) will decrease coverage by 0.84%.
The diff coverage is 0.00%.

Impacted file tree graph

@@            Coverage Diff             @@
##             main    #3510      +/-   ##
==========================================
- Coverage   77.06%   76.21%   -0.85%     
==========================================
  Files         100      103       +3     
  Lines        7116     7195      +79     
==========================================
  Hits         5484     5484              
- Misses       1511     1590      +79     
  Partials      121      121              
Impacted Files Coverage Δ
cmd/contour/serve.go 10.92% <0.00%> (-0.35%) ⬇️
internal/controller/gateway.go 0.00% <0.00%> (ø)
internal/controller/httproute.go 0.00% <0.00%> (ø)
internal/controller/tlsroute.go 0.00% <0.00%> (ø)

@youngnick
Copy link
Member

Thanks for doing this Steve. It looks much more straightforward than I feared.

How does this fit into the overall "migrate everything to controller-runtime" plan? What's left to do after this?

@stevesloka
Copy link
Member Author

How does this fit into the overall "migrate everything to controller-runtime" plan? What's left to do after this?

This essentially just replaces the informers with controller-runtime. To use the runtime caches, we'd need to totally swap out cache.go and figure out how to do the new filtering bits since there's a lot of logic there now that would need to be done somewhere else in some way. That's the more difficult part and would need a bunch of thought as to how to approach that problem.

@danehans
Copy link
Contributor

danehans commented Apr 1, 2021

@stevesloka needs a rebase.

// Setup a Manager
mgr, err := manager.New(controller_config.GetConfigOrDie(), manager.Options{})
if err != nil {
log.Fatal(err, "unable to set up overall controller manager")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest dropping"overall", maybe something like:
"unable to create contour manager" or "unable to create contour server manager".

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I dropped "overall".

internal/dag/cache.go Outdated Show resolved Hide resolved
func (r *backendPolicyReconciler) Reconcile(ctx context.Context, request reconcile.Request) (reconcile.Result, error) {

// Fetch the BackendPolicy from the cache.
gateway := &gatewayapi_v1alpha1.BackendPolicy{}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I find this var name misleading, maybe something like policy or be.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm the best at copy/paste. =)

if err != nil {
return nil, err
}
if err := c.Watch(&source.Kind{Type: &gatewayapi_v1alpha1.Gateway{}}, &handler.EnqueueRequestForObject{}); err != nil {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should gateway resources be filtered using a predicate? For example, only enqueue gateways that match Contour's gateway config and reference a GatewayClass owned by contour or contour-operator?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd say "and/or" since the GatewayClass is not required, but aside from that I agree.

Copy link
Contributor

@danehans danehans left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I provided a few bits of feedback throughout.

Copy link
Member

@youngnick youngnick left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not a fan of this much boilerplate, but it seems like it's a necessary step on getting us closer to using controller-runtime fully, including caches.

A small non-blocking suggestion about keeping doServe small as well.

Thanks @stevesloka for getting this done.

cmd/contour/serve.go Outdated Show resolved Hide resolved

// NewEndpointController creates the endpoint controller from mgr. The controller will be pre-configured
// to watch for Endpoint objects across all namespaces.
func NewEndpointController(mgr manager.Manager, eventHandler cache.ResourceEventHandler, log logrus.FieldLogger) (controller.Controller, error) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tangentially related: We're also going to have to figure out what to do about EndpointSlice at some point, since Endpoints will go away (I think) in the future.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup I agree: #2696

if err != nil {
return nil, err
}
if err := c.Watch(&source.Kind{Type: &gatewayapi_v1alpha1.Gateway{}}, &handler.EnqueueRequestForObject{}); err != nil {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd say "and/or" since the GatewayClass is not required, but aside from that I agree.

@@ -37,34 +37,14 @@ type DynamicClientHandler struct {
}

func (d *DynamicClientHandler) OnAdd(obj interface{}) {
obj, err := d.Converter.FromUnstructured(obj)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it guaranteed that we'll never get Unstructured? If so, neat.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah we shouldn't since the controller-runtime is doing all of this for us now.

Copy link
Member

@sunjayBhatia sunjayBhatia left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the cmd/contour/ingressstatus.go file also sets up some informers so it should probably be updated as well to use the same pattern

I guess I would echo the sentiments about trying to deduplicate some of the repeated bits in the controller constructors/reconcile methods, if I need to make a change it would be great to do it one place rather than many since its pretty much the same code

maybe a constructor that takes a type and for any slight differences between the controllers for different types there could be an optional mutator function or something? not sure how much would be different between different controllers other than different predicates/filters

if err != nil {
log.WithError(err).WithField("resource", r).Fatal("failed to create informer")
}
//err = contour_api_v1.AddToScheme(mgr.GetScheme())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some extra content here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@@ -231,7 +243,7 @@ func doServe(log logrus.FieldLogger, ctx *serveContext) error {

// Before we can build the event handler, we need to initialize the converter we'll
// use to convert from Unstructured.
converter, err := k8s.NewUnstructuredConverter()
converter, err := k8s.NewUnstructuredConverter(mgr.GetScheme())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it make more sense to create the scheme with the gvk's that we care about and pass the scheme in as a manager.Option?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah that's sort of what was happening here:

func NewContourScheme(s *runtime.Scheme) error {

Copy link
Member

@skriss skriss left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@stevesloka could you provide some additional context on what the end state for this looks like? I'm having a little trouble giving feedback on approach since I'm not sure where we're trying to get to. My understanding is that we're already using some controller-runtime libraries for the informers (see https://github.com/projectcontour/contour/blob/main/internal/k8s/clients.go#L36 and https://github.com/projectcontour/contour/blob/main/internal/k8s/clients.go#L95). What is the role of these reconcilers going to be in the target state? Is that going to change how the DAG is processed?

@stevesloka
Copy link
Member Author

The goal of this change was to allow us to manage separate controllers for different objects, mainly the gatewayapi work. We have cleaner hooks now to work on changes to gatewayclasses, gateways, etc which may not be DAG relevant (i.e. a new gateway might cause an envoy service to be created, etc).

There has been discussion of removing the internal k8s cache and moving to a controller-runtime cache, but that needs some more thoughts as to how to approach since we filter on the k8s cache so much today.

@stevesloka
Copy link
Member Author

I guess I would echo the sentiments about trying to deduplicate some of the repeated bits in the controller constructors/reconcile methods, if I need to make a change it would be great to do it one place rather than many since its pretty much the same code

@sunjayBhatia yup I agree that there are duplicated bits at the moment, but I think when @danehans starts to add some of the "operator" logic that might make more sense. We've overloaded the cahce.go file to do so much, splitting out more I think would be helpful.

Things we could do would be to move the secret validation out, the logic to know if a service is referenced from an ingress object, etc. That way we'd have a cleaner approach to look at when making changes and the k8s cache turns into just a cache, not a cache w/logic.

@skriss
Copy link
Member

skriss commented Apr 7, 2021

The goal of this change was to allow us to manage separate controllers for different objects, mainly the gatewayapi work. We have cleaner hooks now to work on changes to gatewayclasses, gateways, etc which may not be DAG relevant (i.e. a new gateway might cause an envoy service to be created, etc).

So is this primarily useful only if we decide to move the gatewayclass/gateway controllers into Contour? If so, I'm wondering if it'd make sense to keep this in the experimental branch that you/@danehans are going to be working on for now until we make a decision on path forward there. Otherwise, this seems like a fair amount of overhead to add to Contour if all the controllers are doing is passing events through to the event handler since the informers are capable of doing that today. Maybe we can spend some time talking through this one in the maintainers meeting.

@stevesloka
Copy link
Member Author

Aside from GatewayAPI work, this allows us to further clean up cache.go file to not do so much, splitting out into different places which would also be helpful.

We do need to discuss if we do want to move to controller-runtime, doing that would be best done in steps vs a drastic change.

I think fine to keep experimental until we decide the above steps.

@github-actions
Copy link

Marking this PR stale since there has been no activity for 14 days. It will be closed if there is no activity for another 90 days.

@stevesloka
Copy link
Member Author

rebased @danehans 🚀

@danehans
Copy link
Contributor

danehans commented May 7, 2021

@stevesloka the updated PR no longer starts the controller-runtime manager. See this for the fix.

@stevesloka
Copy link
Member Author

@danehans ahh good catch! Interesting that the events still worked? The integration tests all passed fine.

Copy link
Member

@youngnick youngnick left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This still LGTM, I'd like to get this one in so we can unblock further Gateway API work.

cmd/contour/serve.go Outdated Show resolved Hide resolved
cmd/contour/serve.go Outdated Show resolved Hide resolved
internal/k8s/cache/gateway.go Outdated Show resolved Hide resolved
internal/k8s/cache/gateway.go Outdated Show resolved Hide resolved
internal/k8s/cache/gateway.go Outdated Show resolved Hide resolved
@skriss
Copy link
Member

skriss commented May 11, 2021

Looks like you hit #3660 in the E2E run (fixed by #3662). We can ignore since it's not a required check yet.


// NewGatewayController creates the gateway controller from mgr. The controller will be pre-configured
// to watch for Gateway objects across all namespaces.
func NewGatewayController(mgr manager.Manager, eventHandler cache.ResourceEventHandler, log logrus.FieldLogger) (controller.Controller, error) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks like 90% of code between the controller files is the same, seems like a future refactor could be to have a controller factory that creates a resource specific controller given a type and a few other fields so we don't have to duplicate?

do we foresee needing different logic for different resources?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup I think we will for sure need different logic. We'll need to manage finalizers as well as Envoy instances, etc.

@danehans
Copy link
Contributor

Yeah I think reconciler makes sense.

@stevesloka kubebuilder uses "controller" to name the pkg. You may want to consider following suit.

@stevesloka
Copy link
Member Author

Ok all fixed up I think and also renamed to controller. @danehans / @skriss

Copy link
Member

@skriss skriss left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM pending CI.

@skriss
Copy link
Member

skriss commented May 11, 2021

sorry, merged another PR and caused a merge conflict.

Signed-off-by: Steve Sloka <slokas@vmware.com>
Moves from informers to controller-runtime for gateway-api processing.

Signed-off-by: Steve Sloka <slokas@vmware.com>
Signed-off-by: Steve Sloka <slokas@vmware.com>
Signed-off-by: Steve Sloka <slokas@vmware.com>
Signed-off-by: Steve Sloka <slokas@vmware.com>
Signed-off-by: Steve Sloka <slokas@vmware.com>
Signed-off-by: Steve Sloka <slokas@vmware.com>
Signed-off-by: Steve Sloka <slokas@vmware.com>
@stevesloka stevesloka merged commit db368dc into projectcontour:main May 11, 2021
@stevesloka stevesloka deleted the controllerRuntime branch May 11, 2021 19:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants