
feat: crd provisioner use cache #176

Closed · wants to merge 4 commits

Conversation

sunpa93 (Collaborator) commented Sep 18, 2021

What type of PR is this?

/kind feature

What this PR does / why we need it:

  • Add the cached client managed by the controller manager to the CRD provisioner so that CRD provisioner calls can serve CRI queries from the cache (a sketch follows below).
  • This helps reduce load on the API server.
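
A minimal sketch of the intended read path, assuming SharedState exposes the manager's cached client through the GetCachedClient accessor added in this PR; the getAzVolume helper, the exact field names, and the usual imports (context, k8s.io/apimachinery/pkg/types, the v1alpha1 API package) are illustrative only:

// Sketch: a cached lookup from a CRD provisioner call. Reads are served by
// the controller manager's shared informer cache rather than going to the
// API server directly.
func (c *CrdProvisioner) getAzVolume(ctx context.Context, volumeName string) (*v1alpha1.AzVolume, error) {
    var azVolume v1alpha1.AzVolume
    cachedClient := c.controllerSharedState.GetCachedClient()
    if err := cachedClient.Get(ctx, types.NamespacedName{Namespace: c.namespace, Name: volumeName}, &azVolume); err != nil {
        return nil, err
    }
    return &azVolume, nil
}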

Which issue(s) this PR fixes:

Fixes #

Requirements:

Special notes for your reviewer:

Release note:

none

@@ -297,6 +298,7 @@ func (d *DriverV2) StartControllersAndDieOnExit(ctx context.Context) {
}

sharedState := controller.NewSharedState()
sharedState.RegisterCachedClient(mgr.GetClient())

Collaborator:

Consider passing the mgr.GetClient() to the NewSharedState() function instead of through a new interface method.
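
For illustration, a one-line sketch of that alternative; it assumes NewSharedState is changed to accept the client as a constructor argument:

// Hypothetical constructor-injection variant: the cached client becomes a
// required argument, so a SharedState can never exist without one.
sharedState := controller.NewSharedState(mgr.GetClient())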

sunpa93 (Collaborator, Author):

I actually meant to insert this line once the controller manager acquires the leader lock, so that the cached client is not registered for managers that failed to acquire the lock.

Wouldn't it make sense to initialize sharedState first and register the cached client afterwards in that case?

sunpa93 (Collaborator, Author):

Never mind. As long as we register sharedState with crdProvisioner only when the leader lock is acquired, that should do.

Collaborator:

Ideally, sharedState should be fully initialized (including setting the cached client) before being passed to its dependents through their constructor functions. Post-initialization injection of objects makes it difficult to track down dependencies and makes nil-reference bugs possible, since it may not be clear at the point of use whether the field was initialized. Consumers of this field then have to deal with the possibility that it may be nil, increasing complexity.

I am equally concerned about the RegisterSharedState method on the CrdProvisioner, but I don't see a good way of handling this unless we drastically change how the driver and controllers are initialized.

if c.controllerSharedState == nil {
    return nil
}
return c.getCachedClient()

sunpa93 (Collaborator, Author), Sep 20, 2021:

return c.controllerSharedState.GetCachedClient()

namespace string
azDiskClient azDiskClientSet.Interface
namespace string
controllerSharedState *controller.SharedState

Collaborator:

Consider renaming this field to simply sharedState.

@@ -55,7 +54,7 @@ func (r *ReconcilePod) Reconcile(ctx context.Context, request reconcile.Request)
var pod corev1.Pod
klog.V(5).Infof("Reconcile pod %s.", request.Name)
podKey := getQualifiedName(request.Namespace, request.Name)
if err := r.client.Get(ctx, request.NamespacedName, &pod); err != nil {
if err := r.controllerSharedState.cachedClient.Get(ctx, request.NamespacedName, &pod); err != nil {

Collaborator:

Consider renaming controllerSharedState to simply sharedState.

edreed (Collaborator) commented Sep 20, 2021

I have overall concerns with this approach. While it is convenient and desirable to share resources (i.e. the object cache, to reduce direct API server calls), the CrdProvisioner is logically part of the driver, whereas the controllers are separate services that just happen to be hosted in the driver executable/process for convenience.

The post-Mandalay plan intends to partition the work done by the controllers (specifically for volume creation and attachment management) across multiple nodes. We may repackage the controllers into a separate image at that point. It is also possible, regardless of packaging, that the driver and controllers will no longer coexist in the same process, reducing the opportunity for shared state.

It would also be beneficial to replace our current polling in the CrdProvisioner with a watch-based approach to waiting for CRI status updates. The controller-runtime support for using watches was added only recently, requires a separate client with a separate cache, and is not really intended for use with controllers. (See kubernetes-sigs/controller-runtime#1460.) Each watch also appears to make an explicit call to the API server to set up the watch.

I am of the opinion that the CrdProvisioner should instead use our auto-generated informers to provide a cached client, much like what is done in the scheduler extender. We could then use the informers' events to build an event-driven wait mechanism for CRI status changes, in addition to leveraging the informers' caching. Informer events are based on a multiplexed watch setup, reducing overall calls to the API server. That is, we only have a single watch per object type and namespace driving the events, rather than a new watch for each CRI monitored for status updates.
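
As a rough sketch of that direction (the generated informer factory package path, the Disk()/V1alpha1() group accessors, and the azDiskClient/defaultResync/stopCh variables are assumptions here, not the actual generated code):

// Build a shared informer factory from the generated AzDisk clientset,
// scoped to the driver namespace. A single watch per object type feeds both
// the cache and any event handlers registered on it.
informerFactory := azdiskinformers.NewSharedInformerFactoryWithOptions(
    azDiskClient,
    defaultResync,
    azdiskinformers.WithNamespace("azure-disk-csi"),
)
azVolumeAttachmentInformer := informerFactory.Disk().V1alpha1().AzVolumeAttachments()

// Start the informers and wait for their caches to sync before serving requests.
informerFactory.Start(stopCh)
informerFactory.WaitForCacheSync(stopCh)

// Cached reads then go through the generated lister.
attachment, err := azVolumeAttachmentInformer.Lister().AzVolumeAttachments("azure-disk-csi").Get(attachmentName)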

@abhisheksinghbaghel @landreasyan Do you have any thoughts about this?

sunpa93 (Collaborator, Author) commented Sep 20, 2021

I am for this opinion of adding an informer and revising the current polling method to use watch events. I think we should add caching to crdProvisioner at some point to avoid excessive API server calls. And per the post-Mandalay plan you just brought up, it wouldn't make sense for the controllers and crdProvisioner to share resources.

And maybe we can set up go channels for informers' event handlers to use to signal the controllerServer calls that volume attachment / detachment / creation / deletion / update has been completed?

edreed (Collaborator) commented Sep 20, 2021

> I am for this opinion of adding an informer and revising the current polling method to use watch events. I think we should add caching to crdProvisioner at some point to avoid excessive API server calls. And per the post-Mandalay plan you just brought up, it wouldn't make sense for the controllers and crdProvisioner to share resources.

The informers provide caching and use watches to drive the cache to eventual consistency. The cached objects are accessible through lister interfaces. e.g. AzVolumeAttachmentNamespaceLister:

obj, err := azVolumeAttachmentInformer.Lister().AzVolumeAttachments("azure-disk-csi").Get(attachmentName)

> And maybe we can set up go channels for informers' event handlers to use to signal the controllerServer calls that volume attachment / detachment / creation / deletion / update has been completed?

Yes. We could use the informers' event handlers to implement an interface like:

// waitResult would wrap (runtime.Object, error), since a Go channel carries a single value.
func (w *conditionWatcher) WatchForObjectCondition(ctx context.Context, obj runtime.Object, conditionFunc func(obj runtime.Object) bool) <-chan waitResult

WatchForObjectCondition watches for the condition of the specified object to be satisfied. Through the returned channel, it delivers the updated state of the object, or an error if the wait fails due to timeout or another error condition. The caller is expected to call this function before starting the operation that will change the object's condition.

Internally, conditionWatcher would maintain a map from object type to a map from the object's namespaced name to the pending waiters' condition functions and result channels. On receiving an event for an object, conditionWatcher looks up the pending waiters, executes their condition functions, and sends the updated object to each waiter whose condition returns true. It should also respect the deadline specified in the context.Context to handle timeouts and send a timeout error on expiry.

sunpa93 (Collaborator, Author) commented Sep 20, 2021

> The informers provide caching and use watches to drive the cache to eventual consistency. The cached objects are accessible through lister interfaces. e.g. AzVolumeAttachmentNamespaceLister:

Oh, I didn't mean to say that we should add an additional caching mechanism other than the one provided by the informer. I meant that 1) the necessity of caching and 2) the inability of crdProvisioner to use the controllers' shared resources give us the logical justification to add informers to the crdProvisioner.

edreed (Collaborator) commented Sep 20, 2021

> And maybe we can set up go channels for informers' event handlers to use to signal the controllerServer calls that volume attachment / detachment / creation / deletion / update has been completed?

One additional point here is that the watch-based wait should be transparent to the controllerServer. I'm not sure it makes sense to expose the watcher result channel to it, if that's what you were getting at.

sunpa93 (Collaborator, Author) commented Sep 20, 2021

> And maybe we can set up go channels for informers' event handlers to use to signal the controllerServer calls that volume attachment / detachment / creation / deletion / update has been completed?
>
> One additional point here is that the watch-based wait should be transparent to the controllerServer. I'm not sure it makes sense to expose the watcher result channel to it, if that's what you were getting at.

I actually meant to say crdProvisioner here, sorry. I guess what I was envisioning is not too different from what we have right now. Without going too much into the details, this is along the lines of what I envisioned.

type CrdProvisioner struct {
    // ... existing fields
    conditionWatcher *conditionWatcher
}

// waitResult pairs the updated object with any error, since a Go channel can
// only carry a single value.
type waitResult struct {
    obj runtime.Object
    err error
}

type waitEntry struct {
    conditionFunc func(obj runtime.Object) (bool, error)
    waitChan      chan waitResult
}

type conditionWatcher struct {
    informerMap map[reflect.Type]cache.SharedIndexInformer // maps object type to its informer
    waitMap     sync.Map                                    // maps namespaced name to *waitEntry
}

func (w *conditionWatcher) WatchForObjectCondition(ctx context.Context, objectName string, conditionFunc func(obj runtime.Object) (bool, error)) (runtime.Object, error) {
    waitChan := make(chan waitResult, 1)
    entry := &waitEntry{
        conditionFunc: conditionFunc,
        waitChan:      waitChan,
    }
    w.waitMap.Store(objectName, entry)
    defer w.waitMap.Delete(objectName)

    select {
    case <-ctx.Done():
        errMsg := fmt.Sprintf("context deadline exceeded while waiting for %s", objectName)
        klog.Error(errMsg)
        return nil, status.Errorf(codes.DeadlineExceeded, errMsg)
    case result := <-waitChan:
        return result.obj, result.err
    }
}

And as you said, at event handling, we would check the map, evaluate the conditionFunc, and send the result on the stored channel.

And every crdProvisioner PublishVolume, UnpublishVolume, CreateVolume, DeleteVolume, and UpdateVolume call would invoke WatchForObjectCondition to wait for the desired object condition.
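
For reference, a minimal sketch of the event-handling side under the waitEntry/waitResult types above; the azVolumeInformer variable, the assumption that w is the conditionWatcher, and the handler wiring are illustrative only:

// Hypothetical update handler registered on the AzVolume informer. On each
// update it looks up a pending waiter for the object, evaluates the waiter's
// conditionFunc, and, if satisfied (or failed), delivers the result on the
// stored channel and removes the entry.
azVolumeInformer.Informer().AddEventHandler(cache.ResourceEventHandlerFuncs{
    UpdateFunc: func(oldObj, newObj interface{}) {
        obj, ok := newObj.(runtime.Object)
        if !ok {
            return
        }
        accessor, err := meta.Accessor(newObj)
        if err != nil {
            return
        }
        key := getQualifiedName(accessor.GetNamespace(), accessor.GetName())
        value, ok := w.waitMap.Load(key)
        if !ok {
            return
        }
        entry := value.(*waitEntry)
        if satisfied, cerr := entry.conditionFunc(obj); cerr != nil || satisfied {
            entry.waitChan <- waitResult{obj: obj, err: cerr}
            w.waitMap.Delete(key)
        }
    },
})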

edreed (Collaborator) commented Sep 20, 2021

> I actually meant to say crdProvisioner here, sorry. I guess what I was envisioning is not too different from what we have right now. Without going too much into the details, this is along the lines of what I envisioned.
> ...

This looks pretty good, but the fact that you're doing the wait in WatchForObjectCondition requires that it be called after the operation that will result in the change. In this case, WatchForObjectCondition would need to Get the object after adding the condition function to the map, since the object may have been updated in the interim. This is fine, but we would need to handle the case where the object doesn't yet exist and ignore a "NotFound" error. This would be reasonably straightforward to implement. If you choose this route, I would recommend renaming the method WaitForObjectCondition instead.
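
Something along those lines, as a sketch: after registering the waiter, do one read through the lister and tolerate NotFound. The azVolumeLister variable, the surrounding names, and the apierrors alias for k8s.io/apimachinery/pkg/api/errors are illustrative:

// Read the object once after adding the entry to the map so that an update
// delivered before registration is not missed. NotFound is ignored because
// the object may not have been created yet.
if current, err := azVolumeLister.AzVolumes(namespace).Get(name); err == nil {
    if satisfied, cerr := entry.conditionFunc(current); cerr != nil || satisfied {
        entry.waitChan <- waitResult{obj: current, err: cerr}
    }
} else if !apierrors.IsNotFound(err) {
    return nil, err
}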

I had originally thought to have the caller wait on the result channel directly, but this means it also has to deal with context cancelation/timeout separately. I like how your solution handles both - it makes it easier to use. One enhancement I can think of to my original proposal would be to have WatchForObjectCondition instead return a conditionWaiter which could wrap waiting on both the result channel and context, e.g.

type conditionWaiter struct {
    objectType string
    objectName string
    watcher    *conditionWatcher
    resultChan chan waitResult
}

func (w *conditionWaiter) Wait(ctx context.Context) (runtime.Object, error) {
    select {
    case <-ctx.Done():
        errMsg := fmt.Sprintf("context deadline exceeded for %s (%s)", w.objectType, w.objectName)
        klog.Error(errMsg)
        return nil, status.Errorf(codes.DeadlineExceeded, errMsg)
    case result := <-w.resultChan:
        return result.obj, result.err
    }
}

func (w *conditionWaiter) Close() {
    w.watcher.waitMap.Delete(w.objectName)
}

func (w *conditionWatcher) WatchForObjectCondition(obj runtime.Object, conditionFunc func(obj runtime.Object) bool) (*conditionWaiter, error)

Here's an example of usage:

func (p *CrdProvisioner) CreateVolume(ctx context.Context, ...) (*v1alpha1.AzVolumeStatusParams, error) {
    newAzVolume := &v1alpha1.AzVolume{ObjectMeta: metav1.ObjectMeta{Name: newVolumeName, Namespace: ns}, ...}

    waiter, err := p.conditionWatcher.WatchForObjectCondition(newAzVolume, func(obj runtime.Object) bool {
        azVol := obj.(*v1alpha1.AzVolume)
        return azVol.Status.Detail != nil
    })
    if err != nil {
        return nil, err
    }
    defer waiter.Close()

    // Create AzVolume CRI here
    ...

    obj, err := waiter.Wait(ctx)
    if err != nil {
        return nil, err
    }
    newAzVolume = obj.(*v1alpha1.AzVolume)

    // Handle AzError returns here
    ...

    return newAzVolume.Status.Detail.ResponseObject, nil
}
I am fine with either approach, though.

edreed mentioned this pull request on Sep 23, 2021.
edreed added this to In progress in Mandalay on Sep 23, 2021.
sunpa93 (Collaborator, Author) commented Sep 29, 2021

replaced with #199

sunpa93 closed this on Sep 29, 2021.
Mandalay automation moved this from In progress to Done Sep 29, 2021