rfc: Introducing SECRETs for Secure Storage of Sensitive Information #86

tabVersion · 2024-03-18T14:56:58Z

as title

xxchan · 2024-03-19T02:09:50Z

rfcs/0086-create-secret.md

+
+This RFC proposes a new syntax and implementation for securely storing and referencing sensitive information,
+such as database passwords and API keys, in SQL statements using a SECRET catalog. 
+The goal is to enhance the security of production environments by encrypting sensitive data and applying access controls.


Could you elaborate the usage scenarios and explain how SECRET enhances security? And how is it access controlled?

previously we don't have any control over the credentials. All sensitive info is exposed in plain text. Having a dedicated catalog is an improvement.

And how is it access controlled?

For now, we do access control by enforcing the privileges set by meta, the same as other catalogs.

All sensitive info is exposed in plain text. Having a dedicated catalog is an improvement.

I mainly want to know what's the prior "exposed" cases that the new mechanism can solve. In other words, what's the threat model? (prevent who from accessing the secrets)

we do access control by enforcing the privileges set by meta, the same as other catalogs

e.g., if we just want to prevent user B from accessing secrets that owned by user A, CREATE SECRET seems not useful, as they already don't have any privileges over tables.

At the same time, we are also developing security features like redaction risingwavelabs/risingwave#14310, so I hope these can be compared.

IMO we don't need redaction if we support CREATE SECRET. I suppose 14310 is not continued.

After offline discussion, I now think it's sth like GHA secret:

cannot be exposed

can grant READ privilege to other users (i.e., use the secret without knowing the value)

So it makes more sense to me now.

(And yes, redaction is not needed with this (if carefully implemented))

BugenZhao

I believe there are several topics that need to be enriched.

Syntax

What about DROP, ALTER?
What's the behavior of DROP, ALTER?

Implementation

How will the secrets be stored? Is there any security improvement than other catalogs?
When is a secret resolved? What's the data flow and control flow for using a secret for some source/connection?
What happens if a secret is altered while being used? How do we propagate the changes?

neverchanje · 2024-03-19T04:06:32Z

rfcs/0086-create-secret.md

+
+### Security Considerations
+
+Storing SECRETs in meta nodes is unavoidable, but access controls and encryption will be applied to protect the sensitive information.


The access to secrets should also be taken into account.
In general, only admins should have the access to secrets. But to be more specifically, we can allow users with the CREATE privilege to execute SHOW SECRETS ON <source> to view secrets of a specific source they created.

Admins can execute SHOW SECRETS for all secrets in system.

I think the secret can not be displayed once it is created. It can be altered but cannot be shown in SQL or SQL command. We may leave a backdoor in risectl to let admin access the plain text directly in the instance.

Agree. Your approach sounds better.
Then things can be simplified - we only need SHOW SECRETS and users will not be able to see the actual secret data. They are only allowed to see the secret names.

rfcs/0086-create-secret.md

neverchanje · 2024-03-19T04:16:06Z

A step forward is to apply this secret framework to our mysql/PG connectors, which also expose secrets like username and password in plain texts. @StrikeW

StrikeW · 2024-03-19T04:40:09Z

A step forward is to apply this secret framework to our mysql/PG connectors, which also expose secrets like username and password in plain texts. @StrikeW

IIUC, the idea of SECRETs is not limited to a specific connector in the RFC, we can apply to cdc connectors indeed.

tabVersion · 2024-03-19T06:42:23Z

I believe several topics need to be enriched.

Syntax

What about DROP, ALTER?

What's the behavior of DROP, ALTER?

Yes, I did not mention this part because I do not want to allow altering credentials for now. It can be troublesome to apply changes, especially if we may need to distribute a cert file.
For Vault backend, we just need to trigger a refresh and rebuild the async stream inside the actor, and it shall be fine. But with the meta backend, it may require a mutation to all related actors. Users may have to wait a long time to wait for the barrier to get collected. I hope to make the process in a lazy way but there is no rpc request from CN to meta.

Implementation

How will the secrets be stored? Is there any security improvement than other catalogs?

I don't think encrypting the credentials within the cluster is worthwhile since access to the database is already private enough.
The MAIN purpose of this feature is to allow other team members to access some secret keys without knowing what the keys are. Of course, we can design a comprehensive secret control just like what aws does but I think it is an overkill.

When is a secret resolved? What's the data flow and control flow for using a secret for some source/connection?

It is stored in plaintext, yes, so no need to resolve it (or encrypt it with a self-hosted private key, but I do not think it can prevent leaking credentials if the meta already gets hacked). For the Vault backend, we access the credentials when building the actors, in the from_proto process.

What happens if a secret is altered while being used? How do we propagate the changes?

I have no plan supporting alter now.
And I am up with the idea of "propagating the changes" because changing credentials will influence all existing clients, not only newly created ones. But we still need a promising way to propagate the changes.

tabVersion · 2024-03-19T07:00:30Z

A step forward is to apply this secret framework to our mysql/PG connectors, which also expose secrets like username and password in plain texts. @StrikeW

IIUC, the idea of SECRETs is not limited to a specific connector in the RFC, we can apply to cdc connectors indeed.

Yes, I do not intend to make this a complicated design, just to make it a reference to a static string.

arkbriar · 2024-03-19T09:12:16Z

rfcs/0086-create-secret.md

+
+### Usage Scenarios
+
+There are two main scenarios for using SECRETs:


Shall we take source/sink credentials into consideration?

the design is for source/sink credentials.

rfcs/0086-create-secret.md

arkbriar · 2024-03-19T09:14:53Z

rfcs/0086-create-secret.md

+
+### Security Considerations
+
+Storing SECRETs in meta nodes is unavoidable, but access controls and encryption will be applied to protect the sensitive information.


Storing SECRETs in meta nodes is unavoidable,

In memory, yes. Persistent, no.

External vault is similar to what the proposal except the storage location.

Sorry for the confusion.

Both meta secret management and Vault (external credential management) are the backends. The RFC focus on the interface to define and ref a secret.

rfcs/0086-create-secret.md

fuyufjh · 2024-03-19T09:27:40Z

rfcs/0086-create-secret.md

+SECRETs are always stored as bytea type in SQL to ensure flexibility in adapting to different usage requirements.
+If a user needs to use a certificate, its contents are also stored as bytea type and distributed to Compute Nodes (CNs) as needed.
+
+To control storage consumption on meta nodes, the maximum length of a single SECRET is limited to 1MB.


I'd like to keep the flexibility of using a 3rd-party credential store (such as Vault) as an optional plugin in the future. This has been requested/asked by community users.

It will be implemented via a trait

trait GetCredentials { fn fetch_credential(...) -> Result<Vec<u8>> }

the solution is compatible with any credential backend as long as we can get secrets in bytes.

fuyufjh · 2024-03-19T09:32:11Z

rfcs/0086-create-secret.md

+## Implementation
+
+SECRETs are always stored as bytea type in SQL to ensure flexibility in adapting to different usage requirements.
+If a user needs to use a certificate, its contents are also stored as bytea type and distributed to Compute Nodes (CNs) as needed.


So you will just store them as plaintext on disk? If so, we might need to use some encryption function provided by the DB, otherwise, storing credential in plaintext seems unacceptable to me.

To prevent plaintext in data at rest, either way works for me:

Run a symmetrical encryption & decryption algorithm in RW's code.

Rely on the DB's encryption (just the outsourced version of 1 😄)

In both approaches, the encryption key will be managed by another system (such as K8s) and inject to RW/DB via config file/env var/etc. OR, just simply configured by users.

prefer 2. Let's ask @yezizp2012

Considering the need to be compatible with encryption implementations of different DB backends, I am more inclined prefer 1. 🤔

I prefer 1. I guess it'd be hard to implement encryption for every DB backends that we support. Think about sqlite.

It's okay to tell users data-at-rest encryption is not available for SQLite.

I also slightly tend to 1 for PG meta-store backend, actually, PG ships with some handy functions and we may just need to write SQL.

While, if Vault is in use, we can let Vault do it.

BugenZhao · 2024-03-21T06:16:41Z

I have no plan supporting alter now.
And I am up with the idea of "propagating the changes" because changing credentials will influence all existing clients, not only newly created ones. But we still need a promising way to propagate the changes.

Wow. IMO this should be considered from the day one of the design. Regardless of whether it is a goal for the initial stage of implementation, it will definitely be requested in the future.

arkbriar · 2024-03-21T07:41:41Z

I have no plan supporting alter now.
And I am up with the idea of "propagating the changes" because changing credentials will influence all existing clients, not only newly created ones. But we still need a promising way to propagate the changes.

Wow. IMO this should be considered from the day one of the design. Regardless of whether it is a goal for the initial stage of implementation, it will definitely be requested in the future.

Actually I agree with @tabVersion that we shouldn't allow altering. We may support some dynamic secret source/reference in the future to support some advanced use cases like password rotation, but altering a secret in our own storage doesn't sound good to me as it causes more trouble than convenience. Users will be sure that secrets are correct when they successfully create sources/sinks by referencing them. In that case, DROP must be supported. (I think it's a natural requirement 😄)

I'd rather separate altering secret and dynamic loading apart. And then I agree with @BugenZhao on the latter.

xxchan · 2024-03-21T08:07:42Z

Why using ALTER to do password rotation isn't reasonable? (like I can update a key in GHA secret)

Anyway, I agree with Bugen that either alter or rotation should be discussed early. Otherwise we might be in larger trouble in the future.

arkbriar · 2024-03-21T08:28:41Z

Why using ALTER to do password rotation isn't reasonable? (like I can update a key in GHA secret)

Altering a secret makes it implicit when it is updated in references. Dropping then recreating is more explicit though they are in fact equivalent. Users will be aware of what they are going to do and what the consequences would be like, as DROP hints that dependents will break, while ALTER doesn't.

In terms of external vaults that provides password rotation, although the way to support dynamic loading may be similar, I'd rather believe that they have enforced the users to think about what to do and what's the expected result in advance to use the password rotation feature provided, for example, making sure the old and new credentials both work during the rotation. That way we just need to do our best to load the new credentials in time.

tabVersion · 2024-03-21T09:52:09Z

I have no plan supporting alter now.
And I am up with the idea of "propagating the changes" because changing credentials will influence all existing clients, not only newly created ones. But we still need a promising way to propagate the changes.

Wow. IMO this should be considered from the day one of the design. Regardless of whether it is a goal for the initial stage of implementation, it will definitely be requested in the future.

Actually I agree with @tabVersion that we shouldn't allow altering. We may support some dynamic secret source/reference in the future to support some advanced use cases like password rotation, but altering a secret in our own storage doesn't sound good to me as it causes more trouble than convenience. Users will be sure that secrets are correct when they successfully create sources/sinks by referencing them. In that case, DROP must be supported. (I think it's a natural requirement 😄)

I'd rather separate altering secret and dynamic loading apart. And then I agree with @BugenZhao on the latter.

Yes, I have the same concern with @arkbriar about credential rotation. We've met the case requiring cert rotation in PoC, where I haven't come up with a compatible and graceful plan.

Details about the case: They have a disk mounted to all containers and change the cert periodically. They don't know the exact time to change the cert (another team does the change), but they know both cert versions gonna be valid for a while.
The solution is we keep waiting until the kafka SDK cannot pass the auth and drop the actor by recovery, letting the SDK load the new cert from the fixed dir again.

For the case above, I am not sure if we really want the meta to keep track of the file. So I'd leave it static for now and not enforce secret for SSL config in properties.

@xxchan Anyway, I agree with Bugen that either alter or rotation should be discussed early. Otherwise we might be in larger trouble in the future.

As I mentioned above, the main idea of the RFC (on the impl level) is to provide an interface to load credentials from "backend".
We are going to support meta-static backend and Vault backend at this stage. In this design, we always dispatch credentials with table fragments. I don't mean to say no to a more dynamic backend and the RFC does not block any approach to make the process dynamic for altering, because rebuilding the actor can fix the issue anyway.

In my point of view, we have no mature roadmap telling Rw will not encounter the case above or that we pivot to a cloud-only solution, keeping all things controllable. The open-source kernel should be compatible with as many cases as possible and it is ok to say "not yet" if we don't have a comprehensive enough sight for requests in this topic.

tabVersion · 2024-03-21T10:10:14Z

I have no plan supporting alter now.
And I am up with the idea of "propagating the changes" because changing credentials will influence all existing clients, not only newly created ones. But we still need a promising way to propagate the changes.

Wow. IMO this should be considered from the day one of the design. Regardless of whether it is a goal for the initial stage of implementation, it will definitely be requested in the future.

Yes, we all know the credentials can be changed in the future but no one can tell they want to change in which way atm.

The thing has a low priority because 1) most auth only use aksk, which belongs to a dedicated assume role and it is not gonna change once setup; 2) as mentioned above, a pause-resume config change process can serve as a cover-up plan, which can be easy to impl. No need to be spec to a new one.

xxchan · 2024-03-21T11:54:34Z

I also agree that a comprehensive implementation plan might not be needed too early, but some high level ideas might be helpful to demonstrate the flexibility of the current design.

But to clarify,

As I mentioned above, the main idea of the RFC (on the impl level) is to provide an interface to load credentials from "backend".

Do you mean you are only discussing this interface in the RFC?

trait GetCredentials {
    fn fetch_credential(...) -> Result<Vec<u8>>
}

I'm asking because according to

For Vault backend, we just need to trigger a refresh and rebuild the async stream inside the actor, and it shall be fine. But with the meta backend, it may require a mutation to all related actors.

It seems different secret backend needs different handling at the usage site? (i.e., at the actors) And that's not in the scope of the RFC?

because rebuilding the actor can fix the issue anyway.

I guess @fuyufjh is unhappy with that 😟

tabVersion · 2024-03-21T12:32:27Z

I also agree that a comprehensive implementation plan might not be needed too early, but some high level ideas might be helpful to demonstrate the flexibility of the current design.

You can give one in a new RFC.

But to clarify,

As I mentioned above, the main idea of the RFC (on the impl level) is to provide an interface to load credentials from "backend".

Do you mean you are only discussing this interface in the RFC?

The RFC mainly talks about how secret catalog interact with users, like you summarized before #86 (comment), and it is what the RFC for. It is a quite simple one.
I don't take the flexibility into account because #86 (comment), I don't think we've collected enough cases to make a wonderful design and I believe a static credential can meet most of the requirement of most users.
Furthermore, putting the credentials in the table fragment when creating actors won't block any future work as all stream job actors are built in this way.

I will mark further discussion on flexibility Off-Topic unless you give a concrete example we need to do it now or why config change cannot handle this situation.

I'm asking because according to

For Vault backend, we just need to trigger a refresh and rebuild the async stream inside the actor, and it shall be fine. But with the meta backend, it may require a mutation to all related actors.

It seems different secret backend needs different handling at the usage site? (i.e., at the actors) And that's not in the scope of the RFC?

I mention this part in #86 (comment), handling the backends in from_proto process as a hint to implementation.
The meta node dispatch either the credential itself or a token to fetch from Vault, actors do the work independently.
I don't think I need to write a detailed guide on how to put credentials in their place before passing to SDK. I am not talking about changing, building from proto is the common path for all actors.

To further address your question,

For Vault backend, we just need to trigger a refresh and rebuild the async stream inside the actor, and it shall be fine. But with the meta backend, it may require a mutation to all related actors.

What I mean here is we don't need to push a mutation when credentials in Vault change. (and we cannot do that because we don't know when it changes, "let is crash" is the only option) -- when we want to impl a dynamic credential

because rebuilding the actor can fix the issue anyway.

I guess @fuyufjh is unhappy with that 😟

As I mentioned above, it is a cover-up plan in case we have urgent requests for this. The happy path is still suggesting drop secret and re-create.

stdrc · 2024-03-22T05:28:50Z

I mention this part in #86 (comment), handling the backends in from_proto process as a hint to implementation.
The meta node dispatch either the credential itself or a token to fetch from Vault, actors do the work independently.

I'm not sure if I totally get it here. Do you mean that for secrets that are stored in our meta store, the secret will be resolved (by resolve I mean to map from secret name to secret content, i.e. the password/key itself) when generating plan node?

Then how do you track the dependency from streaming jobs to secrets? If later sometime we do plan to support drop/alter, how do you (or how do our users) know which streaming jobs will be affected?

What about provide a meta API to resolve a secret when need to use it?

I'm not suggesting that we must do it in this or that way, I'm just saying that "when to resolve the secret" seems to have profound impact on future extension and user experience, which need to be discussed in the RFC. We should analyze the pros and cons of different approach instead of being vague in such a serious and official design document.

neverchanje · 2024-03-22T06:13:39Z

rfcs/0086-create-secret.md

+
+```sql
+-- create secret
+create secret <secret-name> from <expr>


Suggested change

create secret <secret-name> from <expr>

create secret <secret-name> as <expr>

Why using from? AS might be more consistent with other create statements, such as CTAS.

fuyufjh · 2024-03-22T08:41:54Z

Just read the discussion above.

First, I do think alter secret must be considered, although it doesn't need to be implemented in the first place. However, I don't think we need to proactively trigger a "dynamic secret reloading", instead, just wait for the credential to be invalid, and then a recovery will be triggered.

Explain:

The major purpose of it is allowing users to rotate the credentials, which is a common practice for many use cases. Note that the rotation may be manual (for example, the old credential was possibly leaked or updated) or automatic (driven by TTL e.g. 180 days). In both cases, it doesn't make sense to let users recreate the source or table, but only the credential need to be updated. The most intuitive approach is obviously alter secret.

While for "dynamic secret reloading",

Secret rotation is a very rare operation, and doesn't worth us designing a new code path to handle.
It's a typical procedure to restart the dependent system when users update a credential. RisingWave just follows the convention.

In my mind, we may provide a command (risingwavelabs/risingwave#14700) to trigger recovery (better than restart). Alternatively, users can just do nothing, and RisingWave will do recovery and recreate all actors with new secrets upon the old credential expiration.

Thus, strong +1 for @stdrc

Do you mean that for secrets that are stored in our meta store, the secret will be resolved (by resolve I mean to map from secret name to secret content, i.e. the password/key itself) when generating plan node?

Then how do you track the dependency from streaming jobs to secrets? If later sometime we do plan to support drop/alter, how do you (or how do our users) know which streaming jobs will be affected?

Resolving secret on create table/source seems unacceptable for many reasons. Regarding of my idea here, we only need to make sure to resolve the secret on its actual use and not being cached, such as when the SourceExecutor initializes.

tabVersion added 2 commits March 18, 2024 22:56

create secret

f937f2e

rename

f675406

xxchan reviewed Mar 19, 2024

View reviewed changes

BugenZhao reviewed Mar 19, 2024

View reviewed changes

neverchanje reviewed Mar 19, 2024

View reviewed changes

neverchanje requested a review from StrikeW March 19, 2024 04:15

BugenZhao requested a review from fuyufjh March 19, 2024 09:04

tabVersion added 2 commits March 19, 2024 17:06

resolve

564815b

new syntax

52cbe4f

arkbriar reviewed Mar 19, 2024

View reviewed changes

fuyufjh approved these changes Mar 19, 2024

View reviewed changes

new syntax

2ab141d

neverchanje reviewed Mar 22, 2024

View reviewed changes

This was referenced Apr 8, 2024

feat: redact SqlOption in SHOW CREATE risingwavelabs/risingwave#14310

Closed

support redact SHOW CREATE on demand risingwavelabs/risingwave#14115

Closed


		### Security Considerations

		Storing SECRETs in meta nodes is unavoidable, but access controls and encryption will be applied to protect the sensitive information.


		### Usage Scenarios

		There are two main scenarios for using SECRETs:

	create secret <secret-name> from <expr>
	create secret <secret-name> as <expr>

rfc: Introducing SECRETs for Secure Storage of Sensitive Information #86

Are you sure you want to change the base?

rfc: Introducing SECRETs for Secure Storage of Sensitive Information #86

Conversation

tabVersion commented Mar 18, 2024

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

neverchanje Mar 19, 2024 • edited

Choose a reason for hiding this comment

Choose a reason for hiding this comment

BugenZhao left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

neverchanje commented Mar 19, 2024 • edited

StrikeW commented Mar 19, 2024

tabVersion commented Mar 19, 2024

tabVersion commented Mar 19, 2024

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

fuyufjh Mar 19, 2024 • edited

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

fuyufjh Mar 22, 2024 • edited

Choose a reason for hiding this comment

BugenZhao commented Mar 21, 2024

arkbriar commented Mar 21, 2024

xxchan commented Mar 21, 2024 • edited

arkbriar commented Mar 21, 2024

tabVersion commented Mar 21, 2024

tabVersion commented Mar 21, 2024

xxchan commented Mar 21, 2024

tabVersion commented Mar 21, 2024 • edited

stdrc commented Mar 22, 2024

Choose a reason for hiding this comment

fuyufjh commented Mar 22, 2024 • edited

neverchanje Mar 19, 2024 •

edited

neverchanje commented Mar 19, 2024 •

edited

fuyufjh Mar 19, 2024 •

edited

fuyufjh Mar 22, 2024 •

edited

xxchan commented Mar 21, 2024 •

edited

tabVersion commented Mar 21, 2024 •

edited

fuyufjh commented Mar 22, 2024 •

edited