Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Server cert hot reloading #5335

Closed
dsyzhu opened this issue Feb 7, 2019 · 42 comments
Closed

Server cert hot reloading #5335

dsyzhu opened this issue Feb 7, 2019 · 42 comments
Milestone

Comments

@dsyzhu
Copy link

dsyzhu commented Feb 7, 2019

Please answer these questions before submitting your issue.

What version of gRPC are you using?

v1.18.0

What did you expect to see?

Server can reload new cert without restarting

@dapengzhang0
Copy link
Member

This totally depends on whether or not the io.netty.handler.ssl.SslContext object that you provide to io.grpc.NettyServerBuilder supports hot reloading.

@sanjaypujare
Copy link
Contributor

To add to what @dapengzhang0 said, you will need to provide your own X509KeyManager or X509TrustManager (and possibly your own KayManagerFactory and TrustManagerFactory) that support hot reloading of certs/keys as per your requirement. Let us know if you are interested in implementing this.

@dsyzhu
Copy link
Author

dsyzhu commented Feb 7, 2019

Can I mimic DelegatingSslContext to create a SslContext wrapper class and update the wrapped SslContext object when it is necessary?

@sanjaypujare
Copy link
Contributor

@dsyzhu that might work and is worth trying. Although all you need is the Key and Trust Managers that support hot reloading, but your idea might be simpler. You have to be careful to not leak resources - I remember reading a warning somewhere in the source code in connection with SslContext.

@dsyzhu
Copy link
Author

dsyzhu commented Feb 8, 2019

@sanjaypujare and @dapengzhang0,

Thanks for your prompt response.

I have a question of mimic io.netty.handler.ssl.DelegatingSslContext. The abstract class has an abstract fucntion:

     /**
     * Init the {@link SSLEngine}.
     */
    protected abstract void initEngine(SSLEngine engine);

What is it used for? Can I just give an empty implementation?

Thanks

@sanjaypujare
Copy link
Contributor

sanjaypujare commented Feb 8, 2019

@dsyzhu it depends on the concrete subtype of SSLEngine that is being passed. If that needs initialization then you can do it here. That's what I think - I have not played with SSLEngine nor DelegatingSslContext.

BTW the warning I was talking about is for io.netty.handler.ssl.OpenSslServerContext and io.netty.handler.ssl.ReferenceCountedOpenSslServerContext. You can check the source code on the netty repo.

@sanjaypujare
Copy link
Contributor

@dsyzhu I am looking forward to your PR for this implementation :-)

@dsyzhu
Copy link
Author

dsyzhu commented Feb 8, 2019

@sanjaypujare,

Thanks for information.

I will try to submit a PR.

@dapengzhang0 dapengzhang0 added this to the Unscheduled milestone Feb 9, 2019
@ejona86
Copy link
Member

ejona86 commented Feb 11, 2019

The KeyStore API doesn't really define whether it is thread-safe. But it seems like it has to be and both PKCS12KeyStore and JavaKeyStore (for JKS) use locks.

So I think I would suggest just changing the KeyStore live.

You'll want to specify KeyManagerFactory to SslContextBuilder. To create it, you'd do something like:

KeyStore ks = KeyStore.getInstance(KeyStore.getDefaultType());
ks.load(null);
CertificateFactory cf = CertificateFactory.getInstance("X.509");
X509Certificate cert = (X509Certificate) cf.generateCertificate(inputStream);
// If you don't want to hard-code principalName, you can use cert.getSubjectX500Principal().getName();
String principalName = "theserver";
ks.setCertificateEntry(principalName, cert);
KeyManagerFactory kmf = KeyManagerFactory.getInstance(KeyManagerFactory.getDefaultAlgorithm());
kmf.init(ks, null); // I'm not 100% about the null "password" here

And then later when you want to change the certificate, you'd use:

CertificateFactory cf = CertificateFactory.getInstance("X.509");
X509Certificate cert = (X509Certificate) cf.generateCertificate(newInputStream);
ks.setCertificateEntry(principalName, cert);

I'd want to double-check that KeyManagerFactory is compatible with netty-tcnative, but I think it may be. If not, then you just need to use Java's SSL provider instead of tcnative.

@sanjaypujare
Copy link
Contributor

And if you want to change the private key (and associated public certificate) the code would be like this:

byte[] keyBytes = privateKey.getEncoded();
Certificate[] certChain = ...;
ks.setKeyEntry(principalName, keyBytes, certChain);

@dsyzhu if this thing works, do you still want to submit a PR?

@ejona86
Copy link
Member

ejona86 commented Feb 12, 2019

@sanjaypujare, a PR for what?

@sanjaypujare
Copy link
Contributor

@ejona86 exactly. He may not need a PR - but I would like to hear what he says

@dsyzhu
Copy link
Author

dsyzhu commented Feb 12, 2019

Our use case is to periodically reload cert before it is expired. I would focus on SslContextBuilder. Every time create a new SslContextBuilder object and call build() function to create SslContext to update SslContextWrapper class. Won't go deep inside SslContextBuilder. Any pitfalls if doing this way?

@grpc grpc deleted a comment from dsyzhu Feb 12, 2019
@ejona86
Copy link
Member

ejona86 commented Feb 12, 2019

@dsyzhu, the way I suggested seems to be less code and doesn't need re-creating the SslContext.

@yihuazhang
Copy link

@ejona86 Just to clarify, since an SslContext can be created by SslContextBuilder that takes KeyManagerFactory as an argument, each time the underlying KeyManagerFactory object gets refreshed with a new credential, the change will be automatically propagated the SslContex object, and thus there is no need to restart the server. Is it correct?

@ejona86
Copy link
Member

ejona86 commented Feb 15, 2019

@yihuazhang, yes, that's the expectation. When KeyManagerFactory is mutated (via it's KeyStore), the server will see those changes on future connections. Do test it, but it seems it should work and would be a limited amount of code to implement.

@yihuazhang
Copy link

@ejona86 SG. I will come up with a toy example to validate it.

I also found a thread that discusses the same issue (https://stackoverflow.com/questions/51411594/reload-netty-servers-ssl-context-for-grpc), but have different suggestions (also proposed by you Eric :)

@ejona86
Copy link
Member

ejona86 commented Feb 15, 2019

Yeah :). You can tell I was uncertain in that answer. I had seen that the KeyManager could change the alias, but I didn't think you could easily mutate the KeyStore (I had considered a DelegatingKeyStore, though). Noticing that KeyStore was mutable after init() was important, as well as figuring out whether KeyStore was thread-safe (technically the API doesn't say it is, but it sort of has to be, and the main implementations are... It's the same problem as HttpSession in the servlet API).

I had looked into it again for @sanjaypujare and had come up with this easier option. If things work out I'll add this approach to the SO question, because it is much easier.

@dsyzhu
Copy link
Author

dsyzhu commented Feb 19, 2019

Thanks for all your inputs. Am wondering why server cert reloading hasn't been provided. I think all the grpc applications using tls connection need this feature.

@dsyzhu
Copy link
Author

dsyzhu commented Feb 22, 2019

For my case I do something like:

public abstract class DelegatingSslContext extends SslContext {

private SslContext ctx;
private final SslContextBuilder ctxBuilder;

protected DelegatingSslContext(SslContextBuilder builder) throws SSLException {
    this.ctx = builder.build();
    this.ctxBuilder = builder;
}

@Override
public final boolean isClient() {
    return ctx.isClient();
}

@Override
public final List<String> cipherSuites() {
    return ctx.cipherSuites();
}

@Override
public final long sessionCacheSize() {
    return ctx.sessionCacheSize();
}

@Override
public final long sessionTimeout() {
    return ctx.sessionTimeout();
}

@Override
public final ApplicationProtocolNegotiator applicationProtocolNegotiator() {
    return ctx.applicationProtocolNegotiator();
}

@Override
public final SSLEngine newEngine(ByteBufAllocator alloc) {
    SSLEngine engine = ctx.newEngine(alloc);
    initEngine(engine);
    return engine;
}

@Override
public final SSLEngine newEngine(ByteBufAllocator alloc, String peerHost, int peerPort) {
    SSLEngine engine = ctx.newEngine(alloc, peerHost, peerPort);
    initEngine(engine);
    return engine;
}

@Override
protected final SslHandler newHandler(ByteBufAllocator alloc, boolean startTls) {
    SslHandler handler = ctx.newHandler(alloc, startTls);
    initHandler(handler);
    return handler;
}

@Override
protected final SslHandler newHandler(ByteBufAllocator alloc, String peerHost, int peerPort, boolean startTls) {
    SslHandler handler = ctx.newHandler(alloc, peerHost, peerPort, startTls);
    initHandler(handler);
    return handler;
}

@Override
public final SSLSessionContext sessionContext() {
    return ctx.sessionContext();
}

/**
 * Init the {@link SSLEngine}.
 */
protected abstract void initEngine(SSLEngine engine);

/**
 * Init the {@link SslHandler}. This will by default call {@link #initEngine(SSLEngine)}, sub-classes may override
 * this.
 */
protected void initHandler(SslHandler handler) {
    initEngine(handler.engine());
}

private synchronized void update() throws SSLException {
    ctx = GrpcSslContexts.configure(ctxBuilder).build();
}

}

GRPC server will use DelegatingSslContext and schedule a thread to call update() periodically or necessarily.

Am thinking this issue may be created for netty api

@creamsoup
Copy link
Contributor

fyi. i updated the comment (just formatted the code snippet)

@dsyzhu
Copy link
Author

dsyzhu commented Feb 22, 2019 via email

@sanjaypujare
Copy link
Contributor

@dsyzhu have you tested your code by any chance?

@sanjaypujare
Copy link
Contributor

CC @jiangtaoli2016 @yihuazhang to comment on the proposal and code

@dsyzhu
Copy link
Author

dsyzhu commented Feb 22, 2019

@sanjaypujare,

I did base function test. After I update cert I can the next connection will use the new cert and request succeeds. I can do more tests of launching a group of client thread to simulate real world traffic to do further verification. Do you think it is necessary? Or the code structure guarantees no hassle during cert switch?

BTW our use case is periodically refresh cert. So I use ScheduledThreadPoolExecutor to periodically call updateCert. That is why the function is private. For general cases it may be public to let other class to call it.

And I think it may be easier to make the change in netty.

Thanks

@jiangtaoli2016
Copy link
Contributor

@dsyzhu We are designing a gRPC SSL utility library that can handle

  • Credential reloading
  • Root certificate reloading
  • SPIFFE identity support
  • Server authorization check plugin

Would you be interested?

In gRPC C core we are providing something similar via SPIFFE credentials.

@gava2727
Copy link

gava2727 commented Nov 14, 2019

@jiangtaoli2016 I am interested in the library that automatically reloads the certs. Did you start working on it? do you have any timeline when it will be ready?

@magpor
Copy link

magpor commented Nov 22, 2019

Hi

@magpor
Copy link

magpor commented Nov 22, 2019

I am also interested in the util can you provide?

@sgaruda
Copy link

sgaruda commented Apr 9, 2020

@dsyzhu I liked your idea of using DelegatingSSLContext. I have one question:
So you will pass instance of DelegatingSSLContext while building the NettyServer right? Like below:
Server server = NettyServerBuilder.forAddress(addr); builder.bossEventLoopGroup(bossEventLoopGroup); builder.workerEventLoopGroup(workerEventLoopGroup); builder.channelType(EpollServerSocketChannel.class); builder.maxInboundMessageSize(agentConfig.grpcMaxMessageSizeBytes); builder.maxHeaderListSize(agentConfig.grpcMaxHeaderListSizeBytes); builder.addService(ServerInterceptors.intercept(service, interceptors)); builder.sslContext(**instance of DelegatingSSLContext**).build();

@sgaruda
Copy link

sgaruda commented Apr 10, 2020

@ejona86 I was little confused with the two approaches that were discussed above. Which one is better? KeyManager approach that you suggested or DelegatingSSLContext approach?

@ejona86
Copy link
Member

ejona86 commented Apr 10, 2020

@sgaruda, @sanjaypujare had mentioned to me in person that he had trouble with the KeyManager approach, that something was still cached. There are other avenues, like changing to a new alias name, but DelegatingSSLContext is probably easiest for you to do yourself.

@sgaruda
Copy link

sgaruda commented Apr 10, 2020

@ejona86 Thanks for the quick response. Sure I will try DelegatingSSLcontext. I need one clarification:
I am planning to implement sslReloading logic in the initHandler, something like this:

protected void initHandler(SslHandler handler)  {

        log.info("Trigger AutoReloadingSslContext.initHandler", handler);
        AssertUtil.checkNotNull(sslContext);
        // Schedule background automatic reload of keys and certs
        ListenableScheduledFuture future = scheduledExecutorService.scheduleAtFixedRate(() -> {
            try {
                reloadSslContext(); // this will update the SSLContext
            } catch (Exception e) {              
                log.error("Failed to reload SSLContext", e);
            }
            });
        // fail fast callback, which basically asserts on any failure
        Futures.addCallback(future, new FutureCallback<Object>() {
            @Override
            public void onSuccess(@Nullable Object result) {
            }
            @Override
            public void onFailure(Throwable t) {
                }
            }, scheduledExecutorService);
    }   

Let me know your thoughts. Also, I wanted to understand a little bit more on the initEngine. As of now i am not doing anything inside it. Is that fine?

@ejona86
Copy link
Member

ejona86 commented Apr 10, 2020

@sgaruda, newHandler() (and thus initHandler()) will be called for every connection, so you don't want to schedule at fixed rate within it. I'd scheduleAtFixedRate outside of the context, make SslContext ctx volatile, and instead of update() just have a setter for ctx. Then the scheduled task would just replace ctx when it changes.

The code above have update() as synchronized, but no other calls were synchronized. The idea is you would call update() with some schedule and mutate the builder, I think.

@sgaruda
Copy link

sgaruda commented Apr 11, 2020

@ejona86 thanks for nice suggestion. I moved scheduler out from initHandler(). One more question, is there anything i have to do initHandler() and initEngine()? My only requirement is to reset the sslContext whenenver there is a change in the certs.

@ejona86
Copy link
Member

ejona86 commented Apr 13, 2020

@sgaruda, I don't think so. The delegate ctx should do all the work necessary.

@sgaruda
Copy link

sgaruda commented Apr 16, 2020

@ejona86 Wht is the purpose of making ctx final in DelegatingSslContext abstract class? Though I change the ctx( after certs are changed) in the AutoReloadSSLContent which extends DelegatingSslContext, I see that certs are not getting updated for the service. Certs are getting updated only after I restart the service.

I am thinking of directly extending SSLContext and copy paste everything DelegatingSSLContext has and with my logic of reloading the certs. Do you see any issue with this?

@ejona86
Copy link
Member

ejona86 commented Apr 17, 2020

@sgaruda, ctx isn't final. And it shouldn't be final; it should be changed.

I don't see any issue with combining your class with DelegatingSSLContext. I don't expect it to help behaviorally, but maybe it will make it easier for you to debug.

@ejona86
Copy link
Member

ejona86 commented Apr 17, 2020

I should also mention that #6805 is adding a utility to make some things in this realm easier.

@ejona86
Copy link
Member

ejona86 commented Feb 11, 2021

In #7397 (comment) I figured out what was wrong with #5335 (comment) . You have to use KeyManagerFactory.getInstance("NewSunX509"). The new key manager factory will notice changes to the KeyStore...

@Hakky54
Copy link

Hakky54 commented Jan 19, 2022

I noticed that this issue is open and I wanted to share my solution for this kind of challenge. I have it already working here: Instant SSL Reloading With gRPC.

Basically what it does it is wrapping the actual trustmanager and keymanager with a custom one. This custom one delegates the method calls to the real keymanager and trustmanager. However it has also the capability of swapping the actual managers. Below is an example usage:

SSLFactory baseSslFactory = SSLFactory.builder()
    .withIdentityMaterial(Paths.get("/path/to/your/identity.jks"), "password".toCharArray())
    .withTrustMaterial(Paths.get("/path/to/your/truststore.jks"), "password".toCharArray())
    .withSwappableIdentityMaterial()
    .withSwappableTrustMaterial()
    .build();

SslContext sslContext = GrpcSslContexts.configure(NettySslUtils.forServer(baseSslFactory)).build();
          
Server server = NettyServerBuilder
    .forPort(8443)
    .executor(executorService)
    .addService(myService)
    .sslContext(sslContext)
    .build();

server.start();

After some time when you have the new keystore/truststore either from an endpoint, or changes on the file system or from somewhere else, you can execute the following snippet to update the server ssl without the need of restarting it:

// create a new sslFactory with the updated keystore/truststore
SSLFactory updatedSslFactory = SSLFactory.builder()
    .withIdentityMaterial(Paths.get("/path/to/your/identity.jks"), "password".toCharArray())
    .withTrustMaterial(Paths.get("/path/to/your/truststore.jks"), "password".toCharArray())
    .build();

SSLFactoryUtils.reload(baseSslFactory, updatedSslFactory);

See here for the documentation of this option: Swapping KeyManager and TrustManager at runtime

This option is available within the following library here: GitHub - SSLContext Kickstart which I made, hopefully you guys will like it and hopefully it will make it easier for you all to set it up.

@ejona86
Copy link
Member

ejona86 commented Aug 29, 2022

#5335 (comment) with NewSunX509 (as mentioned in #5335 (comment)) resolved this. But #8175 makes this even easier.

@ejona86 ejona86 closed this as completed Aug 29, 2022
@ejona86 ejona86 modified the milestones: Unscheduled, 1.41 Aug 29, 2022
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Nov 28, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests