Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix close in use certificate providers after double Close() method call on wrapper object #7128

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

bozaro
Copy link

@bozaro bozaro commented Apr 16, 2024

wrappedProvider add refcounter to certprovider.Provider, but there no guards to prevent multiple Close() call after single Build() call.

The situation is complicated by:

  • the error does not appear where there was a double closure;
  • before the rotation of certificates, the problem may not manifest itself (that is, the effect is greatly delayed in time);
  • certprovider.Provider implementation don't panic/error on multiple Close() calls.

Fix errors like:

[core][Channel #39 SubChannel #941] grpc: addrConn.createTransport failed to connect to ...
Err: connection error: desc = "transport: authentication handshake failed:
xds: fetching trusted roots from CertificateProvider failed: provider instance is closed"

RELEASE NOTES:

  • Fix xds: fetching trusted roots from CertificateProvider failed: provider instance is closed error

…call on wrapper object

Fix errors like:
```
[core][Channel grpc#39 SubChannel grpc#941] grpc: addrConn.createTransport failed to connect to ...
Err: connection error: desc = "transport: authentication handshake failed:
xds: fetching trusted roots from CertificateProvider failed: provider instance is closed"
```
@aranjans aranjans added this to the 1.64 Release milestone Apr 17, 2024
@arvindbr8
Copy link
Member

arvindbr8 commented Apr 17, 2024

@bozaro -- Calling Close() feels more like a User error to me and wouldnt categorize this as a bug. I agree that the API could have been better by not providing a Close() method which can be called multiple times, which would cause incorrect accounting wrt wp.refCount.

That being said, inorder to fix this, I would rather make Provider.Close a grpcsync.OnceFunc. That way we can cleanly assert that a reference can only decrement their reference when calling Close().

@bozaro
Copy link
Author

bozaro commented Apr 18, 2024

Calling Close() feels more like a User error to me and wouldnt categorize this as a bug.

I think that even if the double Close() call is considered a user error, it is useful to do a check here to localize the consequences of it and not play the whack-a-mole game.

The situation when all neighboring uses must be written correctly for your code to work is very fragile.

I have found at least two places through which double Close() is possible:

That being said, inorder to fix this, I would rather make Provider.Close a grpcsync.OnceFunc.

I can also add a wrapper in grpcsync.OnceFunc, but this will not make the code simpler: I still need to check the closing status to return an error from provider.KeyMaterial of the closed wrapper.

@easwars
Copy link
Contributor

easwars commented May 7, 2024

@bozaro : Thank you for figuring this out and sending us a PR.

There are a few options available to fix this issue:

  • Make breaking API changes to this package
    • Remove the Close method from the Provider interface
    • Change the signature of the starter argument passed to NewBuildableConfig to return a tuple Provider, func(). The second return value is a close function to be invoked when the provider is to be closed.
    • Change BuildableConfig.Build() to return three values: (Provider, func(), error) . The second return value would the close function returned by the provider wrapped in a grpcsync.OnceFunc.
    • With this approach, the user of a provider will be able to close it how many ever times they want, and things will still continue to work cleanly. But this approach could hide potential bugs in the user's code that never show up in their tests, but only happen in production. Therefore we want to avoid this approach
  • The approach taken by you, where we return a separate per-instantiation wrapper that wraps the current wrappedProvider.
    • With this approach, the user of a provider instance that has called Close will start seeing errProviderClosed on subsequent invocations of KeyMaterial()
    • While this approach is a significant improvement over the existing code, this still will not help the user easily debug when and where they are double closing.

Instead the approach that we like (which is quite similar to your approach) is as follows:

  • Have a per-instantiation wrapper that wraps the current wrappedProvider. Call it something else, wrappedProviderCloser is too close to wrappedProvider. Maybe something like singleCloseWrappedProvider or singleRefWrappedProvider or something better.
type singleCloseWrappedProvider struct {
       mu       sync.Mutex
       provider Provider
}

func (s *singleCloseWrappedProvider) KeyMaterial(ctx context.Context) (*KeyMaterial, error) {
       s.mu.Lock()
       p := s.provider
       s.mu.Unlock()
       return p.KeyMaterial(ctx)
}
func (s *singleCloseWrappedProvider) Close() {
       s.mu.Lock()
       s.provider.Close()
       s.provider = panickingProvider{}
       s.mu.Unlock()
}
  • Have a provider implementation that panics in its Close() method.
type panickingProvider struct{}

func (panickingProvider) Close() {
       panic("Close called more than once on certificate provider")
}
func (panickingProvider) KeyMaterial(context.Context) (*KeyMaterial, error) {
       return nil, errProviderClosed
}
  • Change BuildableConfig.Build() to always return an instance of singleCloseWrappedProvider wrapping a newly created wrappedProvider or an existing one (in which case, the ref count on it would have to be incremented, as is done today)

Please let me know your thoughts on this approach.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants