Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make GKE clusters ephemeral by default #730

Open
1 of 5 tasks
rainest opened this issue Jun 28, 2023 · 0 comments
Open
1 of 5 tasks

Make GKE clusters ephemeral by default #730

rainest opened this issue Jun 28, 2023 · 0 comments

Comments

@rainest
Copy link
Contributor

rainest commented Jun 28, 2023

Is there an existing issue for this?

  • I have searched the existing issues

Problem Statement

Use of KTF outside CI may leak GKE clusters due to humans forgetting to delete them after usage. To avoid unnecessary cloud spend, we'd like a means to have them default to ephemeral unless marked otherwise.

Proposed Solution

  • Add a GKE cluster label indicating a deletion date.
  • Default this label to three days after creation.
  • Provide a flag allowing users to override the default.
  • Add a scheduled task to delete clusters past their expiration.

Additional information

Elsewhere we run cleanup jobs as part of CI in projects that use KTF. There isn't really a great CI location for this scheduled deletion, since this is intended for clusters created outside CI. I don't think it makes sense to run this in the KTF repo itself.

GCP does provide https://cloud.google.com/scheduler with 3 free jobs a month ($0.10/job/month after). We should use it as a project-agnostic deletion method.

We may want a "never" option but it's simpler to just use dates as label values, and we should probably discourage indefinite lifetime clusters anyway. Setting the duration to 9999 or something similarly ridiculous or manually removing the label should be sufficient if you really, really want to avoid the cleanup job.

Acceptance Criteria

  • KTF's GKE cluster creation utility provides a CLI flag that takes a number of days as input and sets an expiration date label.
  • The CLI flag defaults to 3 days.
  • We provide an example GKE Cloud Scheduler configuration that deletes clusters past expiration.
  • We use the scheduler job on our GKE project.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant