Skip to content

Commit

Permalink
WIP: Fix deadlock during NRI plugin registration
Browse files Browse the repository at this point in the history
During NRI external plugin registration:

* acceptPluginConnections() is called
* adaptation lock from `nri/pkg/adaptation` is acquired
* `syncFn` is invoked
* `syncFn` acquires NRI lock in `pkg/nri/nri.go`

During container lifecycle events such as `ContainerStart`
* NRI lock is acquired in pkg/nri.go
* adaptation lock is acquired in `StateChange()` in `nri/pkg/adaptation`

As a result, the locking order during NRI plugin registration is:
* adaptation lock -> NRI lock

While the locking order during container starts is:
* NRI lock -> adaptation lock

Due the fact that the locking order is inverted and not consistent, it
it possible to encounter a deadlock.

To fix the issue, during NRI plugin registration, first acquire the NRI
lock (done via `syncFn` call) and only after acquire the adaptation lock.
This ensures that NRI plugin registration the locking order is adaption
lock -> NRI lock, which is consistent with the locking order during
container lifecycle events.

Fixes containerd/containerd#10085
  • Loading branch information
bobbypage committed Apr 18, 2024
1 parent 53d3371 commit bfd1507
Showing 1 changed file with 2 additions and 4 deletions.
6 changes: 2 additions & 4 deletions pkg/adaptation/adaptation.go
Expand Up @@ -431,18 +431,16 @@ func (r *Adaptation) acceptPluginConnections(l net.Listener) error {
continue
}

r.Lock()

err = r.syncFn(ctx, p.synchronize)
if err != nil {
log.Infof(ctx, "failed to synchronize plugin: %v", err)
} else {
r.Lock()
r.plugins = append(r.plugins, p)
r.sortPlugins()
r.Unlock()
}

r.Unlock()

log.Infof(ctx, "plugin %q connected", p.name())
}
}()
Expand Down

0 comments on commit bfd1507

Please sign in to comment.