
Proposal: Use GCP Datastream to ensure consistency of Redis entries #1633

Open
jalseth opened this issue Aug 13, 2023 · 3 comments
Labels
enhancement New feature or request

Comments

@jalseth
Contributor

jalseth commented Aug 13, 2023

Description

There have been cases where Rekor successfully writes a new entry to the transparency log but then fails to write the corresponding entry to Redis. Adding retry logic to the Rekor API would help, but there will always be edge cases where the API server is shut down and loses its in-memory retry queue before the Redis write succeeds.

Rather than relying on the API server to guarantee writes to Redis, we can treat the MySQL database as the source of truth. GCP Datastream is a serverless change data capture offering that integrates with databases and emits events as writes occur. Datastream does not currently support streaming events directly to GCP PubSub (this is a work in progress), but it does support writing to GCS, and GCS object-write notifications can trigger PubSub messages that are consumed by a new job. The new job would only Ack a PubSub message after the corresponding entry was successfully written to Redis.
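To make that flow concrete, here is a minimal sketch of what the consumer job might look like, assuming GCS object notifications are delivered to a PubSub subscription and that each object holds a Datastream change event. The project, subscription, and Redis key names are placeholders, and parsing the Datastream event into Rekor's search-index keys is elided behind a stub:

```go
package main

import (
	"context"
	"io"
	"log"

	"cloud.google.com/go/pubsub"
	"cloud.google.com/go/storage"
	"github.com/redis/go-redis/v9"
)

func main() {
	ctx := context.Background()

	psClient, err := pubsub.NewClient(ctx, "my-project")
	if err != nil {
		log.Fatalf("pubsub client: %v", err)
	}
	gcsClient, err := storage.NewClient(ctx)
	if err != nil {
		log.Fatalf("storage client: %v", err)
	}
	rdb := redis.NewClient(&redis.Options{Addr: "localhost:6379"})

	sub := psClient.Subscription("rekor-datastream-gcs-events")
	err = sub.Receive(ctx, func(ctx context.Context, m *pubsub.Message) {
		// GCS notifications carry the bucket and object name as message attributes.
		bucket, object := m.Attributes["bucketId"], m.Attributes["objectId"]

		r, err := gcsClient.Bucket(bucket).Object(object).NewReader(ctx)
		if err != nil {
			log.Printf("read %s/%s: %v", bucket, object, err)
			m.Nack() // redeliver later
			return
		}
		event, err := io.ReadAll(r)
		r.Close()
		if err != nil {
			m.Nack()
			return
		}

		// Only Ack once Redis is consistent; otherwise let PubSub redeliver.
		if err := writeIndex(ctx, rdb, event); err != nil {
			log.Printf("redis write failed: %v", err)
			m.Nack()
			return
		}
		m.Ack()
	})
	if err != nil {
		log.Fatalf("receive: %v", err)
	}
}

// writeIndex is a placeholder for mapping a Datastream change event to the
// actual Redis search-index writes.
func writeIndex(ctx context.Context, rdb *redis.Client, event []byte) error {
	return rdb.LPush(ctx, "placeholder-key", string(event)).Err()
}
```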

Open Questions:

  • Should the API server still attempt to write to Redis and this would only be used for reconciliation, or should the API server not write to Redis at all and rely on this flow?
  • How to prevent abuse of principals with the ability to write to the GCS bucket? Can we rely on IAM or should the PubSub consumer job validate the entry against Rekor's signing keys?
  • How to handle the lifecycle for the temporary GCS objects? Is deleting all objects older than N days sufficient?
  • Would the costs of Datastream + GCS + PubSub be too much?
@jalseth added the enhancement label Aug 13, 2023
@haydentherapper
Contributor

I very much like this idea to use the DB as the source of truth and rely on GCP to guarantee entry upload side effects occur.

Should the API server still attempt to write to Redis and this would only be used for reconciliation, or should the API server not write to Redis at all and rely on this flow?

Given this feature would be exclusive to GCP, supporting both would be ideal for those who only want to use Redis. I would disable writing directly to Redis when a flag indicates Datastream is in use.
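A minimal sketch of that gate, assuming a hypothetical --search_index.use_datastream flag (not an existing Rekor flag) and a placeholder Redis key scheme:

```go
package main

import (
	"context"
	"flag"
	"log"

	"github.com/redis/go-redis/v9"
)

var useDatastream = flag.Bool("search_index.use_datastream", false,
	"rely on the Datastream/PubSub reconciler instead of writing to Redis directly")

// indexEntry records a search-index mapping unless Datastream owns that job.
func indexEntry(ctx context.Context, rdb *redis.Client, key, uuid string) error {
	if *useDatastream {
		// The PubSub consumer fed by Datastream will populate Redis instead.
		return nil
	}
	return rdb.LPush(ctx, key, uuid).Err()
}

func main() {
	flag.Parse()
	rdb := redis.NewClient(&redis.Options{Addr: "localhost:6379"})
	if err := indexEntry(context.Background(), rdb, "email:user@example.com", "some-entry-uuid"); err != nil {
		log.Fatalf("index write: %v", err)
	}
}
```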

How to prevent abuse of principals with the ability to write to the GCS bucket? Can we rely on IAM or should the PubSub consumer job validate the entry against Rekor's signing keys?

I think IAM should be sufficient, though this should be fleshed out in a design.

How to handle the lifecycle for the temporary GCS objects? Is deleting all objects older than N days sufficient?

Should also consider multi-day outages. For example, if pub/sub is down for a few days, what happens if the temporary object has been deleted from GCS? Do we need a job to delete old entries from GCS?
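One option for the N-day cleanup is a GCS lifecycle rule on the staging bucket rather than a separate job. A minimal sketch with the Go storage client, where the bucket name and the 14-day retention are assumptions chosen to outlast a multi-day PubSub outage:

```go
package main

import (
	"context"
	"log"

	"cloud.google.com/go/storage"
)

func main() {
	ctx := context.Background()
	client, err := storage.NewClient(ctx)
	if err != nil {
		log.Fatalf("storage client: %v", err)
	}
	defer client.Close()

	// Delete temporary Datastream objects once they are old enough that any
	// outstanding PubSub messages should already have been processed.
	attrs := storage.BucketAttrsToUpdate{
		Lifecycle: &storage.Lifecycle{
			Rules: []storage.LifecycleRule{{
				Action:    storage.LifecycleAction{Type: storage.DeleteAction},
				Condition: storage.LifecycleCondition{AgeInDays: 14},
			}},
		},
	}
	if _, err := client.Bucket("rekor-datastream-staging").Update(ctx, attrs); err != nil {
		log.Fatalf("update lifecycle: %v", err)
	}
}
```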

@bobcallaway
Member

I like the idea as well.

This pattern could potentially also be supported with an OSS stack like Debezium when Rekor is run in other environments.

@jalseth
Contributor Author

jalseth commented Aug 17, 2023

Great! I didn't realize there was an OSS offering in this space.

I'll throw together a small design doc and we can discuss further, including the potential impact of compromised GCS.
