Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Higher-level storage implementations #241

Open
emersion opened this issue Jun 27, 2023 · 20 comments
Open

Higher-level storage implementations #241

emersion opened this issue Jun 27, 2023 · 20 comments
Labels
feature request Request for new feature or functionality

Comments

@emersion
Copy link
Contributor

What would you like to have changed?

Currently Storage is a pretty low-level key-value store API. However, some storage backends may benefit from higher-level information. For instance, a storage implementation which uses k8s secrets to store private keys needs to know which items are TLS private keys and which items are just metadata.

Why is this feature a useful, necessary, and/or important addition to this project?

This would allow one to implement new Storage implementations, e.g. one based on k8s secrets.

What alternatives are there, or what are you doing in the meantime to work around the lack of this feature?

It's possible to parse the format of names passed to Storage. Are these guaranteed to be stable?

@emersion emersion added the feature request Request for new feature or functionality label Jun 27, 2023
@mholt
Copy link
Member

mholt commented Jun 28, 2023

Hi @emersion!

Just to make sure I understand the motivation for this:

For instance, a storage implementation which uses k8s secrets to store private keys needs to know which items are TLS private keys and which items are just metadata.

I don't know much about k8s. Is there a reason the metadata can't also be stored as a secret?

It's possible to parse the format of names passed to Storage. Are these guaranteed to be stable?

Not 100% guaranteed, but changing them is extremely tedious since we'd have to write code to move all the files automatically. We did that once years ago, hopefully never again.

Overall I'd say most characteristics of the names/keys are pretty stable.

Another possibility may be to peek the first few bytes of the contents. If it's "BEGIN PRIVATE KEY" etc then you can know it's a secret.

But generally, storage implementations treat them all the same. For example, the default file system implementation just treats all the files as sensitive (locked-down permissions, etc).

@emersion
Copy link
Contributor Author

Is there a reason the metadata can't also be stored as a secret?

It can, however TLS certificates are stored with a different secret type in k8s.

Another possibility may be to peek the first few bytes of the contents. If it's "BEGIN PRIVATE KEY" etc then you can know it's a secret.

I see, makes sense!

@mholt
Copy link
Member

mholt commented Jun 28, 2023

Is that a sufficient workaround then (peeking the bytes)?

@emersion
Copy link
Contributor Author

I'll try, and re-open if this proves insufficient.

@bitfehler
Copy link
Contributor

I hope you don't mind if I chime in here. Full disclosure: I had discussed this with @emersion and think I may have been the reason he opened this issue. It took me a few days of fiddling with certmagic to be able to have an at least somewhat reasonably informed opinion on this, but I think I can I now say: it's not quite as simple... 😇

First and foremost: my goal was to build an application that uses certmagic to manage certificates and makes those available to other, external applications by storing them in Kubernetes secrets. I recognize that this is not the primary use-case of certmagic, and you may well say this is not supported or desired and we'd be done and you can ignore the rest.

If you, however, consider this something that is in-scope for certmagic, I'll share my feedback and would be interested in your thoughts:

  1. There are some features that would not work in such a setup (OCSP stapling, on-demand), and that's fine.
  2. All of the below is based on the assumption that the storage interface should treat the keys used to store data as opaque (because the interface seems to be designed this way, and you suggest looking at data (i.e. values) rather than keys. If this assumption were wrong, so would be the conclusions that follow
  3. The most logical place to implement the stated goal is storage. The storage is well-managed by certmagic, updates are only performed when applicable (unlike e.g. a cache w/ potential evictions), and data-duplication would be avoided
  4. It requires some slight hackery, but is entirely possible to design a storage implementation that achieves that basic goal of using Kubernetes TLS secrets (plus some metadata that K8s will ignore). However, the storage interface has the following characteristics that prevent this from being usable:
    • Separate write of key and certificate - an application that uses a certificate from a k8s secret will likely have a watcher for changes in this secret - meaning, a write to e.g. the key would likely trigger a reload of a state where certificate and key may not match
    • Lack of context - sniffing the bytes let's you tell whether you are storing a certificate or a key, but you don't know what certificate is being handled (and while you could parse the cert, it's close to impossible to find out for the key), leaving you no choice but to store things in a secret whose name depends on the key used by the calls to storage (violating assumption 1. if you want other apps to use them), or...
    • Use some workaround, like one I came up with, where you can use a secret name based on the domain, but it requires trade-offs like in this case using a separate storage for each domain, which implies a separate account for each domain, which may or may not be an issue, but is kind of a silly constraint I guess?
  5. As an alternative, I have tried using the cache + events to get certificates and keys and write them someplace atomically, but this also has issues. Since I consider this to not be the right approach anyways I'll not list these here for "brevity", but I am happy to elaborate should you think that it would be indeed the more appropriate approach.

Again, just saying "please don't do this" would be entirely acceptable 💙

@bitfehler
Copy link
Contributor

And, sorry, as always a late follow-up - one thing I would also miss: if the app gets started with a config saying "manage certs x and y", and then stopped and restarted with a config saying "manage certs y and z", as I see it, there is currently no way to get all certs from storage to clean up those no longer needed (in this case x), except once it has expired?

@mholt
Copy link
Member

mholt commented Jul 3, 2023

Hi @bitfehler -- thanks for chiming in! That's really interesting.

Let me see what we can do about this... piece by piece, and as a whole.

Numbers 0-2, sounds good to me. Same page 💯

Separate write of key and certificate - an application that uses a certificate from a k8s secret will likely have a watcher for changes in this secret - meaning, a write to e.g. the key would likely trigger a reload of a state where certificate and key may not match

This reminds me a little of a previous issue where we were requested to write the private key before the cert, since writing the a certificate without a key would yield an error by the storage mechanism, even though the key would be written microseconds later.

What is the API like you have to work with, that a key and certificate can't be separate?

Lack of context - sniffing the bytes let's you tell whether you are storing a certificate or a key, but you don't know what certificate is being handled (and while you could parse the cert, it's close to impossible to find out for the key), leaving you no choice but to store things in a secret whose name depends on the key used by the calls to storage (violating assumption 1. if you want other apps to use them), or...

Do you mean that storing data according to their keys makes it impossible for other apps to use them? They're still in storage, why would it be impossible?

As an alternative, I have tried using the cache + events to get certificates and keys and write them someplace atomically, but this also has issues. Since I consider this to not be the right approach anyways I'll not list these here for "brevity", but I am happy to elaborate should you think that it would be indeed the more appropriate approach.

I don't think that's a bad approach necessarily. It does sound more complicated though.

one thing I would also miss: if the app gets started with a config saying "manage certs x and y", and then stopped and restarted with a config saying "manage certs y and z", as I see it, there is currently no way to get all certs from storage to clean up those no longer needed (in this case x), except once it has expired?

Deleting unexpired certificates is somewhat of an antipattern. Just because a config that's being used right now doesn't need it, doesn't mean it won't again soon. Sometimes you can't foresee that a cert is needed, too: for example, on-demand TLS.

We do use the Storage interface as-is, however, to scan for expired certificates and delete them: https://pkg.go.dev/github.com/caddyserver/certmagic#CleanStorage

In general I'd advise against deleting valid certificates.

All that said, I think we can find a way to make things work for you, but I'd like to know more about your constraints, the underlying API you have, etc.

@bitfehler
Copy link
Contributor

Thank you for your response and sorry for the delay!

Numbers 0-2, sounds good to me. Same page 100

👌

This reminds me a little of a previous issue where we were requested to write the private key before the cert, since writing the a certificate without a key would yield an error by the storage mechanism, even though the key would be written microseconds later.

What is the API like you have to work with, that a key and certificate can't be separate?

Yes, I would indeed say the issue is similar. As hinted at earlier, I want to store certificates in Kubernetes secrets, specifically of the type TLS. Consumers of those would likely have a watch on those secrets. Two distinct writes to the secret would cause two new resource versions. One of them would be short-lived, but a watcher receiving the stream of changes would still receive both. It would then have to figure out which one to act on and which one to ignore. It would probably not even be impossible, just very... messy? Like, I think you can configure to keep the key, I suppose it wouldn't get written in this case? Proper error handling, etc...

Do you mean that storing data according to their keys makes it impossible for other apps to use them? They're still in storage, why would it be impossible?

If an external application wants a cert for domain X from my implementation, I see two options:

  1. The application defines where to store the cert/key: not possible, because in the storage interface, I don't know if I am writing something for domain X or domain Y
  2. The fact where an application should look for a cert for domain Y becomes a defined interface of my implementation: not possible, because it would violate the assumption that the storage interface should not make assumptions about the keys used for storing values

Maybe I have overlooked something?

I don't think that's a bad approach necessarily. It does sound more complicated though.

For sure. I'll save that as a last resort discussion 🙂

Deleting unexpired certificates is somewhat of an antipattern. Just because a config that's being used right now doesn't need it, doesn't mean it won't again soon. Sometimes you can't foresee that a cert is needed, too: for example, on-demand TLS.

I can see how this is true for on-demand settings, but I will politely disagree with the "antipattern", especially in a setting where on-demand is off the table anyways. But it's also not too much of an issue, I can live with certs having to expire.

All that said, I think we can find a way to make things work for you, but I'd like to know more about your constraints, the underlying API you have, etc.

I'd be delighted, even happy to supply some patches if needed. But I also don't want to force anything that's against the basic design of this library. It just seemed very convenient, and it almost worked 😉

@bitfehler
Copy link
Contributor

Actually, sorry: I couldn't stop thinking about the other issue you linked. And I think the one question a higher-level storage interface might answer is: what would you do if I told you that my CDN needs the certificate first? 😉

@bitfehler
Copy link
Contributor

Come to think of it: maybe the storage interface is actually fine, and this is just the wrong interface to use for this. Maybe we'd be better off with something like an export interface?

@mholt mholt reopened this Jul 7, 2023
@mholt
Copy link
Member

mholt commented Jul 7, 2023

Interesting, ok. So one idea that comes to mind is an optional interface for a storage module to implement. If it does, then CertMagic can call that method for an "optimized experience" or something.

If we did this, what would be your ideal API? Maybe a function that takes the *x509.Certificate as input?

Of course, I'm not sure how Load() would work then.

Maybe we'd be better off with something like an export interface?

Like, export everything? Or a semantic export API?

@mholt
Copy link
Member

mholt commented Jul 11, 2023

I had a call with a company who has a similar request for a higher-level API.

So I'm thinking about an optional interface now. Would still like to get your thoughts on what you'd like that to be as a suggestion :)

@bitfehler
Copy link
Contributor

I had a call with a company who has a similar request for a higher-level API.

So I'm thinking about an optional interface now. Would still like to get your thoughts on what you'd like that to be as a suggestion :)

Ah, sorry for dropping the ball here. So, are you thinking export API or higher-level storage API? Storage still seems like it would be convenient, as it would also leverage the fact that certmagic already maintains a serialized version of the certificates...

Maybe the storage interface remains as is, but one can opt in to higher-level storage functions for certain types of data like account, certificates (with key), etc. Where opted in, the higher-level functions get called. Where not opted it, the opaque key-value store interface is used, because the user does not seem to care (or no high-level interface is even available).

In the higher-level functions, I'd imagine that - besides the actual object - some context would be helpful, e.g. for a certificate maybe the domain for which it was requested. This could also serve as a key for a load?

I assume that also e.g. the account would have to be part of a load/store of a certificate, because you could request the same domain from different accounts.

I do admit this does sound a bit complicated. Like, would the higher-level interface require list functionality? If the higher-level interface is being used, does the key-value store need to contain some special data indicating that that is the case? Can that become out-of-sync?

My use-case is unfortunately pretty specific, and I am not sure I can make a good generalized interface from that, especially since I don't know much about other use cases...

@mholt
Copy link
Member

mholt commented Jul 12, 2023

I'm envisioning an optional interface like this:

type CertificateStorage interface {
    StoreCertificateResource(cert *x509.Certificate, privKey crypto.PrivateKey, meta []byte) error
    LoadCertificateResource(???) ...
}

If a Storage module also has those two methods, CertMagic will prefer those when loading and storing cert data specifically.

@bitfehler
Copy link
Contributor

What would the meta bytes contain?

Maybe a key (as in key-value) could be used here as well, which has to be treated as opaque by the app, but can be used to identify a cert and key for loading?

type CertificateStorage interface {
    StoreCertificateResource(key string, cert *x509.Certificate, privKey crypto.PrivateKey, meta []byte) error
    LoadCertificateResource(key string) ...
}

However, looking at my current storage implementation, my main question would be: how would this play together with List()?

Maybe the requirement could be that whatever key is used (in my version of the interface) also has to be returned by List()?

@emersion
Copy link
Contributor Author

emersion commented Jul 13, 2023

Maybe a key (as in key-value) could be used here as well, which has to be treated as opaque by the app, but can be used to identify a cert and key for loading?

If this is a high-level interface, the opaque key could be replaced a hostname. Note, this would encode in the API that one certificate equals one hostname.

Alternatively, the storage implementation could look up cert.IPAddresses and cert.DNSNames (and Load can take a hostname). This adds more complexity if the implementation ever needs to handle certificates with multiple hostnames.

It sounds like List is only used to cleanup expired certificates at the moment.

@mholt
Copy link
Member

mholt commented Jul 13, 2023

Oh, yeah that's a good idea -- to include the key as well. That should solve the List problem, right? Because then List could return the key that was passed in.

It sounds like List is only used to cleanup expired certificates at the moment.

Also to export the entire storage so that it can be imported into another storage backend. We also use it to discover ACME accounts.

If this is a high-level interface, the opaque key could be replaced a hostname. Note, this would encode in the API that one certificate equals one hostname.

Good point; and I've kind of intentionally avoided that up to now because I wasn't sure I wanted to lock into that restriction. In general it's good practice to have 1 cert per hostname, so I don't see this changing, but I also can't see the future. It'd be nice if we didn't officially restrict to 1:1 just in case there's a future valid use case.

@francislavoie
Copy link
Member

In general it's good practice to have 1 cert per hostname, so I don't see this changing, but I also can't see the future. It'd be nice if we didn't officially restrict to 1:1 just in case there's a future valid use case.

We've had feature requests wanting separate certs for different key types, i.e. one RSA and one ECC, to more widely complete TLS handshakes for clients that don't support ECC for example. So that would be one reason to have multiple.

@bitfehler
Copy link
Contributor

If this is a high-level interface, the opaque key could be replaced a hostname. Note, this would encode in the API that one certificate equals one hostname.

If I understand correctly, this is already encoded in the API. However, not the other way around: currently, nothing prevents you from creating configs for two different accounts (e.g. staging and prod), use the same storage for both, and then request the same domain name from both. I think, at least? 😅

Using two accounts at the same time might sound far fetched, but one might (accidentally or not) re-use the same storage after testing with staging? Currently, the storage API namespaces by account, and I think something like that you might want to preserve?

@mholt
Copy link
Member

mholt commented Jul 22, 2023

@bitfehler Correct; the key does not encode the certificate configuration, only the issuer and account.

Perhaps that was not the most forward thinking of me. But it IS simple, which is nice.

So letsee, where does that bring us then?

Ah, right, the most recent API suggestions...

Maybe the requirement could be that whatever key is used (in my version of the interface) also has to be returned by List()?

Yeah. If we do provide a key parameter, then the key has to be able to be used for loading and listing. I think your suggestion makes the most sense so far:

type CertificateStorage interface {
    StoreCertificateResource(key string, cert *x509.Certificate, privKey crypto.PrivateKey, meta []byte) error
    LoadCertificateResource(key string) ...
}

Though maybe I'm just more OK with this since I don't have to implement this myself 🙃

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature request Request for new feature or functionality
Projects
None yet
Development

No branches or pull requests

4 participants