Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

rfc: Introducing SECRETs for Secure Storage of Sensitive Information #86

Open
wants to merge 5 commits into
base: main
Choose a base branch
from

Conversation

tabVersion
Copy link
Contributor

as title


This RFC proposes a new syntax and implementation for securely storing and referencing sensitive information,
such as database passwords and API keys, in SQL statements using a SECRET catalog.
The goal is to enhance the security of production environments by encrypting sensitive data and applying access controls.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you elaborate the usage scenarios and explain how SECRET enhances security? And how is it access controlled?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

previously we don't have any control over the credentials. All sensitive info is exposed in plain text. Having a dedicated catalog is an improvement.

And how is it access controlled?

For now, we do access control by enforcing the privileges set by meta, the same as other catalogs.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All sensitive info is exposed in plain text. Having a dedicated catalog is an improvement.

I mainly want to know what's the prior "exposed" cases that the new mechanism can solve. In other words, what's the threat model? (prevent who from accessing the secrets)

we do access control by enforcing the privileges set by meta, the same as other catalogs

e.g., if we just want to prevent user B from accessing secrets that owned by user A, CREATE SECRET seems not useful, as they already don't have any privileges over tables.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At the same time, we are also developing security features like redaction risingwavelabs/risingwave#14310, so I hope these can be compared.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO we don't need redaction if we support CREATE SECRET. I suppose 14310 is not continued.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After offline discussion, I now think it's sth like GHA secret:

  • cannot be exposed
  • can grant READ privilege to other users (i.e., use the secret without knowing the value)

So it makes more sense to me now.

(And yes, redaction is not needed with this (if carefully implemented))

Copy link
Member

@BugenZhao BugenZhao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe there are several topics that need to be enriched.

Syntax

  • What about DROP, ALTER?
  • What's the behavior of DROP, ALTER?

Implementation

  • How will the secrets be stored? Is there any security improvement than other catalogs?
  • When is a secret resolved? What's the data flow and control flow for using a secret for some source/connection?
  • What happens if a secret is altered while being used? How do we propagate the changes?


### Security Considerations

Storing SECRETs in meta nodes is unavoidable, but access controls and encryption will be applied to protect the sensitive information.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The access to secrets should also be taken into account.
In general, only admins should have the access to secrets. But to be more specifically, we can allow users with the CREATE privilege to execute SHOW SECRETS ON <source> to view secrets of a specific source they created.

Admins can execute SHOW SECRETS for all secrets in system.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the secret can not be displayed once it is created. It can be altered but cannot be shown in SQL or SQL command. We may leave a backdoor in risectl to let admin access the plain text directly in the instance.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree. Your approach sounds better.
Then things can be simplified - we only need SHOW SECRETS and users will not be able to see the actual secret data. They are only allowed to see the secret names.

rfcs/0086-create-secret.md Outdated Show resolved Hide resolved
@neverchanje
Copy link

neverchanje commented Mar 19, 2024

A step forward is to apply this secret framework to our mysql/PG connectors, which also expose secrets like username and password in plain texts. @StrikeW

@StrikeW
Copy link
Contributor

StrikeW commented Mar 19, 2024

A step forward is to apply this secret framework to our mysql/PG connectors, which also expose secrets like username and password in plain texts. @StrikeW

IIUC, the idea of SECRETs is not limited to a specific connector in the RFC, we can apply to cdc connectors indeed.

@tabVersion
Copy link
Contributor Author

I believe several topics need to be enriched.

Syntax

  • What about DROP, ALTER?
  • What's the behavior of DROP, ALTER?

Yes, I did not mention this part because I do not want to allow altering credentials for now. It can be troublesome to apply changes, especially if we may need to distribute a cert file.
For Vault backend, we just need to trigger a refresh and rebuild the async stream inside the actor, and it shall be fine. But with the meta backend, it may require a mutation to all related actors. Users may have to wait a long time to wait for the barrier to get collected. I hope to make the process in a lazy way but there is no rpc request from CN to meta.

Implementation

  • How will the secrets be stored? Is there any security improvement than other catalogs?

I don't think encrypting the credentials within the cluster is worthwhile since access to the database is already private enough.
The MAIN purpose of this feature is to allow other team members to access some secret keys without knowing what the keys are. Of course, we can design a comprehensive secret control just like what aws does but I think it is an overkill.

  • When is a secret resolved? What's the data flow and control flow for using a secret for some source/connection?

It is stored in plaintext, yes, so no need to resolve it (or encrypt it with a self-hosted private key, but I do not think it can prevent leaking credentials if the meta already gets hacked). For the Vault backend, we access the credentials when building the actors, in the from_proto process.

  • What happens if a secret is altered while being used? How do we propagate the changes?

I have no plan supporting alter now.
And I am up with the idea of "propagating the changes" because changing credentials will influence all existing clients, not only newly created ones. But we still need a promising way to propagate the changes.

@tabVersion
Copy link
Contributor Author

A step forward is to apply this secret framework to our mysql/PG connectors, which also expose secrets like username and password in plain texts. @StrikeW

IIUC, the idea of SECRETs is not limited to a specific connector in the RFC, we can apply to cdc connectors indeed.

Yes, I do not intend to make this a complicated design, just to make it a reference to a static string.

@BugenZhao BugenZhao requested a review from fuyufjh March 19, 2024 09:04

### Usage Scenarios

There are two main scenarios for using SECRETs:

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shall we take source/sink credentials into consideration?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the design is for source/sink credentials.

rfcs/0086-create-secret.md Show resolved Hide resolved

### Security Considerations

Storing SECRETs in meta nodes is unavoidable, but access controls and encryption will be applied to protect the sensitive information.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Storing SECRETs in meta nodes is unavoidable,

In memory, yes. Persistent, no.

External vault is similar to what the proposal except the storage location.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the confusion.

Both meta secret management and Vault (external credential management) are the backends. The RFC focus on the interface to define and ref a secret.

rfcs/0086-create-secret.md Outdated Show resolved Hide resolved
SECRETs are always stored as bytea type in SQL to ensure flexibility in adapting to different usage requirements.
If a user needs to use a certificate, its contents are also stored as bytea type and distributed to Compute Nodes (CNs) as needed.

To control storage consumption on meta nodes, the maximum length of a single SECRET is limited to 1MB.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd like to keep the flexibility of using a 3rd-party credential store (such as Vault) as an optional plugin in the future. This has been requested/asked by community users.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It will be implemented via a trait

trait GetCredentials {
    fn fetch_credential(...) -> Result<Vec<u8>>
}

the solution is compatible with any credential backend as long as we can get secrets in bytes.

## Implementation

SECRETs are always stored as bytea type in SQL to ensure flexibility in adapting to different usage requirements.
If a user needs to use a certificate, its contents are also stored as bytea type and distributed to Compute Nodes (CNs) as needed.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So you will just store them as plaintext on disk? If so, we might need to use some encryption function provided by the DB, otherwise, storing credential in plaintext seems unacceptable to me.

Copy link
Contributor

@fuyufjh fuyufjh Mar 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To prevent plaintext in data at rest, either way works for me:

  1. Run a symmetrical encryption & decryption algorithm in RW's code.
  2. Rely on the DB's encryption (just the outsourced version of 1 😄)

In both approaches, the encryption key will be managed by another system (such as K8s) and inject to RW/DB via config file/env var/etc. OR, just simply configured by users.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

prefer 2. Let's ask @yezizp2012

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Considering the need to be compatible with encryption implementations of different DB backends, I am more inclined prefer 1. 🤔

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I prefer 1. I guess it'd be hard to implement encryption for every DB backends that we support. Think about sqlite.

Copy link
Contributor

@fuyufjh fuyufjh Mar 22, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's okay to tell users data-at-rest encryption is not available for SQLite.

I also slightly tend to 1 for PG meta-store backend, actually, PG ships with some handy functions and we may just need to write SQL.

While, if Vault is in use, we can let Vault do it.

@BugenZhao
Copy link
Member

I have no plan supporting alter now.
And I am up with the idea of "propagating the changes" because changing credentials will influence all existing clients, not only newly created ones. But we still need a promising way to propagate the changes.

Wow. IMO this should be considered from the day one of the design. Regardless of whether it is a goal for the initial stage of implementation, it will definitely be requested in the future.

@arkbriar
Copy link

I have no plan supporting alter now.
And I am up with the idea of "propagating the changes" because changing credentials will influence all existing clients, not only newly created ones. But we still need a promising way to propagate the changes.

Wow. IMO this should be considered from the day one of the design. Regardless of whether it is a goal for the initial stage of implementation, it will definitely be requested in the future.

Actually I agree with @tabVersion that we shouldn't allow altering. We may support some dynamic secret source/reference in the future to support some advanced use cases like password rotation, but altering a secret in our own storage doesn't sound good to me as it causes more trouble than convenience. Users will be sure that secrets are correct when they successfully create sources/sinks by referencing them. In that case, DROP must be supported. (I think it's a natural requirement 😄)

I'd rather separate altering secret and dynamic loading apart. And then I agree with @BugenZhao on the latter.

@xxchan
Copy link
Member

xxchan commented Mar 21, 2024

Why using ALTER to do password rotation isn't reasonable? (like I can update a key in GHA secret)

Anyway, I agree with Bugen that either alter or rotation should be discussed early. Otherwise we might be in larger trouble in the future.

@arkbriar
Copy link

Why using ALTER to do password rotation isn't reasonable? (like I can update a key in GHA secret)

Altering a secret makes it implicit when it is updated in references. Dropping then recreating is more explicit though they are in fact equivalent. Users will be aware of what they are going to do and what the consequences would be like, as DROP hints that dependents will break, while ALTER doesn't.

In terms of external vaults that provides password rotation, although the way to support dynamic loading may be similar, I'd rather believe that they have enforced the users to think about what to do and what's the expected result in advance to use the password rotation feature provided, for example, making sure the old and new credentials both work during the rotation. That way we just need to do our best to load the new credentials in time.

@tabVersion
Copy link
Contributor Author

I have no plan supporting alter now.
And I am up with the idea of "propagating the changes" because changing credentials will influence all existing clients, not only newly created ones. But we still need a promising way to propagate the changes.

Wow. IMO this should be considered from the day one of the design. Regardless of whether it is a goal for the initial stage of implementation, it will definitely be requested in the future.

Actually I agree with @tabVersion that we shouldn't allow altering. We may support some dynamic secret source/reference in the future to support some advanced use cases like password rotation, but altering a secret in our own storage doesn't sound good to me as it causes more trouble than convenience. Users will be sure that secrets are correct when they successfully create sources/sinks by referencing them. In that case, DROP must be supported. (I think it's a natural requirement 😄)

I'd rather separate altering secret and dynamic loading apart. And then I agree with @BugenZhao on the latter.

Yes, I have the same concern with @arkbriar about credential rotation. We've met the case requiring cert rotation in PoC, where I haven't come up with a compatible and graceful plan.

Details about the case: They have a disk mounted to all containers and change the cert periodically. They don't know the exact time to change the cert (another team does the change), but they know both cert versions gonna be valid for a while.
The solution is we keep waiting until the kafka SDK cannot pass the auth and drop the actor by recovery, letting the SDK load the new cert from the fixed dir again.

For the case above, I am not sure if we really want the meta to keep track of the file. So I'd leave it static for now and not enforce secret for SSL config in properties.

@xxchan Anyway, I agree with Bugen that either alter or rotation should be discussed early. Otherwise we might be in larger trouble in the future.

As I mentioned above, the main idea of the RFC (on the impl level) is to provide an interface to load credentials from "backend".
We are going to support meta-static backend and Vault backend at this stage. In this design, we always dispatch credentials with table fragments. I don't mean to say no to a more dynamic backend and the RFC does not block any approach to make the process dynamic for altering, because rebuilding the actor can fix the issue anyway.

In my point of view, we have no mature roadmap telling Rw will not encounter the case above or that we pivot to a cloud-only solution, keeping all things controllable. The open-source kernel should be compatible with as many cases as possible and it is ok to say "not yet" if we don't have a comprehensive enough sight for requests in this topic.

@tabVersion
Copy link
Contributor Author

I have no plan supporting alter now.
And I am up with the idea of "propagating the changes" because changing credentials will influence all existing clients, not only newly created ones. But we still need a promising way to propagate the changes.

Wow. IMO this should be considered from the day one of the design. Regardless of whether it is a goal for the initial stage of implementation, it will definitely be requested in the future.

Yes, we all know the credentials can be changed in the future but no one can tell they want to change in which way atm.

The thing has a low priority because 1) most auth only use aksk, which belongs to a dedicated assume role and it is not gonna change once setup; 2) as mentioned above, a pause-resume config change process can serve as a cover-up plan, which can be easy to impl. No need to be spec to a new one.

@xxchan
Copy link
Member

xxchan commented Mar 21, 2024

I also agree that a comprehensive implementation plan might not be needed too early, but some high level ideas might be helpful to demonstrate the flexibility of the current design.

But to clarify,

As I mentioned above, the main idea of the RFC (on the impl level) is to provide an interface to load credentials from "backend".

Do you mean you are only discussing this interface in the RFC?

trait GetCredentials {
    fn fetch_credential(...) -> Result<Vec<u8>>
}

I'm asking because according to

For Vault backend, we just need to trigger a refresh and rebuild the async stream inside the actor, and it shall be fine. But with the meta backend, it may require a mutation to all related actors.

It seems different secret backend needs different handling at the usage site? (i.e., at the actors) And that's not in the scope of the RFC?

because rebuilding the actor can fix the issue anyway.

I guess @fuyufjh is unhappy with that 😟

@tabVersion
Copy link
Contributor Author

tabVersion commented Mar 21, 2024

I also agree that a comprehensive implementation plan might not be needed too early, but some high level ideas might be helpful to demonstrate the flexibility of the current design.

You can give one in a new RFC.

But to clarify,

As I mentioned above, the main idea of the RFC (on the impl level) is to provide an interface to load credentials from "backend".

Do you mean you are only discussing this interface in the RFC?

The RFC mainly talks about how secret catalog interact with users, like you summarized before #86 (comment), and it is what the RFC for. It is a quite simple one.
I don't take the flexibility into account because #86 (comment), I don't think we've collected enough cases to make a wonderful design and I believe a static credential can meet most of the requirement of most users.
Furthermore, putting the credentials in the table fragment when creating actors won't block any future work as all stream job actors are built in this way.

I will mark further discussion on flexibility Off-Topic unless you give a concrete example we need to do it now or why config change cannot handle this situation.

I'm asking because according to

For Vault backend, we just need to trigger a refresh and rebuild the async stream inside the actor, and it shall be fine. But with the meta backend, it may require a mutation to all related actors.

It seems different secret backend needs different handling at the usage site? (i.e., at the actors) And that's not in the scope of the RFC?

I mention this part in #86 (comment), handling the backends in from_proto process as a hint to implementation.
The meta node dispatch either the credential itself or a token to fetch from Vault, actors do the work independently.
I don't think I need to write a detailed guide on how to put credentials in their place before passing to SDK. I am not talking about changing, building from proto is the common path for all actors.

To further address your question,

For Vault backend, we just need to trigger a refresh and rebuild the async stream inside the actor, and it shall be fine. But with the meta backend, it may require a mutation to all related actors.

What I mean here is we don't need to push a mutation when credentials in Vault change. (and we cannot do that because we don't know when it changes, "let is crash" is the only option) -- when we want to impl a dynamic credential

because rebuilding the actor can fix the issue anyway.

I guess @fuyufjh is unhappy with that 😟

As I mentioned above, it is a cover-up plan in case we have urgent requests for this. The happy path is still suggesting drop secret and re-create.

@stdrc
Copy link

stdrc commented Mar 22, 2024

I mention this part in #86 (comment), handling the backends in from_proto process as a hint to implementation.
The meta node dispatch either the credential itself or a token to fetch from Vault, actors do the work independently.

I'm not sure if I totally get it here. Do you mean that for secrets that are stored in our meta store, the secret will be resolved (by resolve I mean to map from secret name to secret content, i.e. the password/key itself) when generating plan node?

Then how do you track the dependency from streaming jobs to secrets? If later sometime we do plan to support drop/alter, how do you (or how do our users) know which streaming jobs will be affected?

What about provide a meta API to resolve a secret when need to use it?

I'm not suggesting that we must do it in this or that way, I'm just saying that "when to resolve the secret" seems to have profound impact on future extension and user experience, which need to be discussed in the RFC. We should analyze the pros and cons of different approach instead of being vague in such a serious and official design document.


```sql
-- create secret
create secret <secret-name> from <expr>

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
create secret <secret-name> from <expr>
create secret <secret-name> as <expr>

Why using from? AS might be more consistent with other create statements, such as CTAS.

@fuyufjh
Copy link
Contributor

fuyufjh commented Mar 22, 2024

Just read the discussion above.

First, I do think alter secret must be considered, although it doesn't need to be implemented in the first place. However, I don't think we need to proactively trigger a "dynamic secret reloading", instead, just wait for the credential to be invalid, and then a recovery will be triggered.

Explain:

The major purpose of it is allowing users to rotate the credentials, which is a common practice for many use cases. Note that the rotation may be manual (for example, the old credential was possibly leaked or updated) or automatic (driven by TTL e.g. 180 days). In both cases, it doesn't make sense to let users recreate the source or table, but only the credential need to be updated. The most intuitive approach is obviously alter secret.

While for "dynamic secret reloading",

  1. Secret rotation is a very rare operation, and doesn't worth us designing a new code path to handle.
  2. It's a typical procedure to restart the dependent system when users update a credential. RisingWave just follows the convention.

In my mind, we may provide a command (risingwavelabs/risingwave#14700) to trigger recovery (better than restart). Alternatively, users can just do nothing, and RisingWave will do recovery and recreate all actors with new secrets upon the old credential expiration.

Thus, strong +1 for @stdrc

Do you mean that for secrets that are stored in our meta store, the secret will be resolved (by resolve I mean to map from secret name to secret content, i.e. the password/key itself) when generating plan node?

Then how do you track the dependency from streaming jobs to secrets? If later sometime we do plan to support drop/alter, how do you (or how do our users) know which streaming jobs will be affected?

Resolving secret on create table/source seems unacceptable for many reasons. Regarding of my idea here, we only need to make sure to resolve the secret on its actual use and not being cached, such as when the SourceExecutor initializes.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

9 participants