Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Resource leak when creating private keys #4509

Closed
sfc-gh-jkowalski opened this issue Oct 18, 2022 · 7 comments · Fixed by #4556
Closed

Resource leak when creating private keys #4509

sfc-gh-jkowalski opened this issue Oct 18, 2022 · 7 comments · Fixed by #4556
Labels
Milestone

Comments

@sfc-gh-jkowalski
Copy link
Contributor

Describe the bug

We're frequently creating and closing KubernetesClient instances in our code, but it appears there's some kind of resource leak associated with private keys, leading to:

Caused by: com.cavium.cfm2.CFM2Exception: A call to the API importRSAPrivateKey from KeySpec failed with error code ab : HSM Error: Reached the limit of Max number of supported objects

Stack trace:

Caused by: io.fabric8.kubernetes.client.KubernetesClientException: An error has occurred.
    at io.fabric8.kubernetes.client.KubernetesClientException.launderThrowable(KubernetesClientException.java:103)
    at io.fabric8.kubernetes.client.KubernetesClientException.launderThrowable(KubernetesClientException.java:97)
    at io.fabric8.kubernetes.client.utils.HttpClientUtils.applyCommonConfiguration(HttpClientUtils.java:223)
    at io.fabric8.kubernetes.client.okhttp.OkHttpClientFactory.createHttpClient(OkHttpClientFactory.java:96)
    at io.fabric8.kubernetes.client.okhttp.OkHttpClientFactory.createHttpClient(OkHttpClientFactory.java:32)
    at io.fabric8.kubernetes.client.utils.HttpClientUtils.createHttpClient(HttpClientUtils.java:174)
    at io.fabric8.kubernetes.client.DefaultKubernetesClient.<init>(DefaultKubernetesClient.java:153)
    ... 48 more
Caused by: java.security.spec.InvalidKeySpecException: com.cavium.cfm2.CFM2Exception: A call to the API importRSAPrivateKey from KeySpec failed with error code ab : HSM Error: Reached the limit of Max number of supported objects
    at com.cavium.key.factory.CaviumRSAKeyFactory.engineGeneratePrivate(CaviumRSAKeyFactory.java:119)
    at java.base/java.security.KeyFactory.generatePrivate(KeyFactory.java:384)
    at io.fabric8.kubernetes.client.internal.CertUtils.handleOtherKeys(CertUtils.java:177)
    at io.fabric8.kubernetes.client.internal.CertUtils.loadKey(CertUtils.java:136)
    at io.fabric8.kubernetes.client.internal.CertUtils.createKeyStore(CertUtils.java:112)
    at io.fabric8.kubernetes.client.internal.CertUtils.createKeyStore(CertUtils.java:247)
    at io.fabric8.kubernetes.client.internal.SSLUtils.keyManagers(SSLUtils.java:168)
    at io.fabric8.kubernetes.client.internal.SSLUtils.keyManagers(SSLUtils.java:157)
    at io.fabric8.kubernetes.client.utils.HttpClientUtils.applyCommonConfiguration(HttpClientUtils.java:214)
    ... 52 more
Caused by: com.cavium.cfm2.CFM2Exception: A call to the API importRSAPrivateKey from KeySpec failed with error code ab : HSM Error: Reached the limit of Max number of supported objects
    at com.cavium.cfm2.Util.handleCfm2Failure(Util.java:410)
    at com.cavium.cfm2.ImportKey.importRSAPrivateKey(ImportKey.java:315)
    at com.cavium.key.factory.CaviumRSAKeyFactory.engineGeneratePrivate(CaviumRSAKeyFactory.java:117)
    ... 60 more

Fabric8 Kubernetes Client version

6.0.0

Steps to reproduce

  1. Use CaviumRSAKeyFactory
  2. Run on EC2 instance with CloudHSM
  3. In a loop: create then close KubernetesClients connecting to a single cluster.

Expected behavior

Clients can be created indefinitely.

Runtime

Kubernetes (vanilla)

Kubernetes API Server version

1.23

Environment

Linux

Fabric8 Kubernetes Client Logs

Not sure what additional logs I can provide here.

Additional context

No response

@shawkins
Copy link
Contributor

@sfc-gh-jkowalski I'm not sure this is a resource leak on the part of the client. The issue seems to be with how the cavium provider works. The KeyFactory.generatePrivate call goes to importRSAPrivateKey, which seems to have a stateful side-effect. That does not happen with other providers, such as BouncyCastle.

Also the java.security.KeyFactory interface does not provide any mechanism to "destroy" a key - I'm not sure there is a built-in way to do that.

Without knowing more of their internals / expectations (perhaps they have some cleanup logic based upon garbage collection, or some provider specific method to call) it's hard to tell what to do here. Can you find docs / source on the cavium provider related to this exception?

@sfc-gh-jkowalski
Copy link
Contributor Author

java.security.PrivateKey extends Destroyable which has destroy() method. Is there a chance the client is not calling that?

@shawkins
Copy link
Contributor

Is there a chance the client is not calling that?

It is not. For other providers that would be just a in-memory operation to clear sensitive fields. Can you confirm for cavium that it allows for more generate calls?

@sfc-gh-jkowalski
Copy link
Contributor Author

Hmm, looks like CaviumRSAPrivateKey has default implementation of destroy() which throws DestroyFailedException

@shawkins
Copy link
Contributor

Hmm, looks like CaviumRSAPrivateKey has default implementation of destroy() which throws DestroyFailedException

Without their code / docs, I'm not sure how to make progress.

Is it possible to come at this problem by reducing the number of times you open / close clients?

sfc-gh-jkowalski added a commit to sfc-gh-jkowalski/kubernetes-client that referenced this issue Nov 2, 2022
Fixes fabric8io#4509

This took a while to find the root cause:

The internal SPI fallback logic inside `KeyFactory.generatePrivate()`
has the weird side effect of latching onto the LAST registered provider
(which in our case was Cavium) after `InvalidKeySpecException` is thrown.

This choice is sticky for a single instance of KeyFactory and the fix
for our issue is to get fresh `KeyFactory` instance when retrying.
@sfc-gh-jkowalski
Copy link
Contributor Author

We found a very surprising fix for this issue, which ends up being caused by reusing KeyFactory inside handleOtherCerts method. See PR #4556.

@sfc-gh-dbrown
Copy link

Here is a repro showing that re-using the KeyFactory after an exception results in a different Provider than using a fresh KeyFactory:

import java.security.KeyFactory;
import java.security.Provider;
import java.security.Security;
import java.security.spec.InvalidKeySpecException;
import java.security.spec.PKCS8EncodedKeySpec;
import java.util.Base64;

public class Repro {
    public static void main(String[] args) throws Exception {
        System.out.println("List Providers that implement KeyFactory.RSA:");
        for (Provider p : Security.getProviders()) {
            if (p.getService("KeyFactory", "RSA") != null) {
                System.out.println(" " + p.getName());
            }
        }
        KeyFactory kf = KeyFactory.getInstance("RSA");
        // Uncommenting this will change behavior as calls to getProvider() pin the provider
        //System.out.println("Provider: " + kf.getProvider().getName());

        try {
            kf.generatePrivate(new PKCS8EncodedKeySpec(invalidPKCS8));
            System.out.println("success=bad");
        } catch (InvalidKeySpecException ex) {
            System.out.println("invalid key=good");
            var key = kf.generatePrivate(new PKCS8EncodedKeySpec(validPKCS8));
            System.out.println("Provider: " + kf.getProvider().getName());

            System.out.println("Do it again with a fresh KeyFactory and get a different Provider.");
            kf = KeyFactory.getInstance("RSA");
            key = kf.generatePrivate(new PKCS8EncodedKeySpec(validPKCS8));
            System.out.println("Provider: " + kf.getProvider().getName());
        }
    }

    private static final byte[] invalidPKCS8 = new byte[] {0,0,0};

    private static final byte[] validPKCS8 =
            Base64.getDecoder()
                    .decode(
                            "MIIBVgIBADANBgkqhkiG9w0BAQEFAASCAUAwggE8AgEAAkEAq7BFUpkGp3+LQmlQYx2eqzDV+xeG8kx/sQFV18S5JhzGeIJNA72wSeukEPojtqUyX2J0CciPBh7eqclQ2zpAswIDAQABAkAgisq4+zRdrzkwH1ITV1vpytnkO/NiHcnePQiOW0VUybPyHoGM/jf75C5xET7ZQpBe5kx5VHsPZj0CBb3b+wSRAiEA2mPWCBytosIU/ODRfq6EiV04lt6waE7I2uSPqIC20LcCIQDJQYIHQII+3YaPqyhGgqMexuuuGx+lDKD6/Fu/JwPb5QIhAKthiYcYKlL9h8bjDsQhZDUACPasjzdsDEdq8inDyLOFAiEAmCr/tZwA3qeAZoBzI10DGPIuoKXBd3nk/eBxPkaxlEECIQCNymjsoI7GldtujVnr1qT+3yedLfHKsrDVjIT3LsvTqw==");
}

@manusa manusa added the bug label Nov 8, 2022
@manusa manusa added this to the 6.3.0 milestone Nov 8, 2022
manusa pushed a commit that referenced this issue Nov 15, 2022
Fixes #4509

This took a while to find the root cause:

The internal SPI fallback logic inside `KeyFactory.generatePrivate()`
has the weird side effect of latching onto the LAST registered provider
(which in our case was Cavium) after `InvalidKeySpecException` is thrown.

This choice is sticky for a single instance of KeyFactory and the fix
for our issue is to get fresh `KeyFactory` instance when retrying.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants