馃[question] Open to updates to EKS deployment? #9355

Closed
2 tasks done
bryantbiggs opened this issue May 13, 2024 · 6 comments · Fixed by #9427

@bryantbiggs

Describe your question

Hello 👋🏽 - are you open to updates to the EKS deployment guidance https://docs.determined.ai/latest/setup-cluster/k8s/setup-eks-cluster.html ?

I work on a number of AWS-led projects for EKS such as https://github.com/aws-ia/terraform-aws-eks-blueprints and https://registry.terraform.io/modules/terraform-aws-modules/eks/aws/latest and would be happy to provide an update to the deployment using Terraform (since this is the tool we see used the most).

Checklist

  • Did you search the docs for a solution?
  • Did you search GitHub issues to see whether somebody has asked this question before?
@ioga
Contributor

ioga commented May 13, 2024

hello @bryantbiggs ,

thanks a lot for addressing terraform-aws-modules/terraform-aws-eks#3027 . you are right to guess that I've been investigating how we can modernize our EKS support and move from a manual setup to terraform. as an open-source product we'd be happy to take a PR for that.

reading between the lines, I assume you're looking to offer us your professional services. unfortunately we're not able to do that at this time.

@bryantbiggs
Author

I assume you're looking to offer us your professional services. unfortunately we're not able to do that at this time.

I do not know what you mean by this, perhaps you could clarify. To be clear, I am not looking for monetary compensation to add this update - it was more of raising it as a discussion first before going down the path of submitting a PR, and just overall align on the approach. If you are open to the idea of updating the EKS portion of the docs, I am offering my services to do so, free of charge 😬

@ioga
Contributor

ioga commented May 13, 2024

I do not know what you mean by this, perhaps you could clarify. To be clear, I am not looking for monetary compensation to add this update -

ah, I apologize for misunderstanding. I've noticed that terraform-aws-eks is sponsored by a consulting/professional services business, and so I assumed you're here on behalf of that business.

it was more of raising it as a discussion first before going down the path of submitting a PR, and just overall align on the approach. If you are open to the idea of updating the EKS portion of the docs, I am offering my services to do so, free of charge 😬

As background, today we have a det deploy tool which uses cloudformation on AWS and terraform on GCP to spin up determined clusters on raw EC2/GCP nodes. We also have a very raw solution for GKE which sets up a GKE cluster and deploys our helm chart to it.

What I'd like to have in the long term is det deploy eks, which creates an appropriate EKS cluster and deploys our helm chart to it. If I were to break it up into milestones:

  1. Terraform code to create/update/maintain an EKS cluster with autoscaling for two types of instances: GPU instances of configurable type and max count for ML loads, and cheap CPU instances (e.g. m5.xlarge) for lightweight jobs. On GKE it's literally a checkbox, but I really struggled to set this up on EKS before opening that ticket.
  2. Support for an RDS Postgres instance our helm chart will use for database needs.
  3. Support for an S3 bucket our helm chart will use for (ML model training) checkpoint storage.
  4. Support for a shared AWS EFS filesystem for users' home directories and so on.
  5. Put a helm chart on it.
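
To make milestone 1 concrete, here is a minimal Python sketch of how the two node groups might be expressed as input for Terraform. Everything named here is an assumption for illustration: the `NodeGroup` fields, the `node_groups` variable name, the CPU group bounds, and the idea of starting the GPU group at zero nodes are hypothetical and not taken from any existing Determined or Terraform module interface.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class NodeGroup:
    """One autoscaling node group. All field names are hypothetical."""
    instance_type: str
    min_size: int
    max_size: int


def render_tfvars(gpu_instance_type: str, gpu_max_count: int) -> str:
    """Return JSON text that could be written to a terraform.tfvars.json file."""
    node_groups = {
        # GPU group: type and max count are configurable, as in milestone 1.
        # Starting at zero nodes is an assumption of this sketch.
        "gpu": NodeGroup(gpu_instance_type, 0, gpu_max_count),
        # CPU group: m5.xlarge comes from the comment; the bounds are assumed.
        "cpu": NodeGroup("m5.xlarge", 1, 4),
    }
    payload = {
        "node_groups": {name: asdict(group) for name, group in node_groups.items()}
    }
    return json.dumps(payload, indent=2, sort_keys=True)


if __name__ == "__main__":
    # g4dn.xlarge is only an example GPU instance type.
    print(render_tfvars("g4dn.xlarge", 8))
```

The Terraform code itself would still need to declare a matching variable and map it onto whatever node group resources it uses; this sketch only covers the input side.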

@bryantbiggs
Author

thank you for sharing that information! I'll put it on my list to try putting together a pattern of running the Determined AI helm chart on EKS and then we can discuss how that fits into the documentation that is currently provided

One thing to keep in mind - most Terraform users are used to interacting with Terraform directly, and not through a wrapper CLI. So this is more along the lines of what we provide for folks to help them understand how to achieve a certain outcome. This gives them options for consumption - they can copy+paste it into their environment and deploy it, they can compare the code against their setup if trying to figure out what they may be missing, or they can simply use it as a frame of reference to guide their implementation.

@ioga
Contributor

ioga commented May 14, 2024

yep, I understand that's a typical approach for the terraform ecosystem. However, in our product we've historically been targeting ML engineers who do not have any experience with terraform, but want to push a button and get a cluster in a box deployed. At the end of the day, the CLI is just a thin wrapper on top of terraform code. Some users elect to bypass the wrapper and take the raw terraform code if they want to consume it that way.
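
As an illustration of the "thin wrapper" idea, a CLI of this kind can do little more than assemble `terraform` invocations and run them. The sketch below is not the actual det deploy implementation: the function names, the directory layout, and the `dry_run` switch are assumptions. It relies only on standard Terraform CLI options (`-chdir`, `-input=false`, `-auto-approve`, `-var-file`).

```python
import subprocess
from typing import List


def terraform_commands(workdir: str, var_file: str) -> List[List[str]]:
    """Build the argv lists a wrapper would run, in order."""
    base = ["terraform", f"-chdir={workdir}"]
    return [
        # Download providers and modules without prompting.
        base + ["init", "-input=false"],
        # Apply the configuration non-interactively with the given variables.
        base + ["apply", "-input=false", "-auto-approve", f"-var-file={var_file}"],
    ]


def deploy(workdir: str, var_file: str, dry_run: bool = True) -> None:
    """Print the commands, or execute them when dry_run is False."""
    for cmd in terraform_commands(workdir, var_file):
        if dry_run:
            print(" ".join(cmd))
        else:
            # check=True stops at the first failing terraform command.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Hypothetical paths, shown in dry-run mode so nothing is executed.
    deploy("/opt/det/eks", "/tmp/eks.tfvars.json")
```

Because the wrapper only shells out to `terraform`, users who prefer to work with the Terraform code directly can run the same two commands themselves.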

@ioga
Contributor

ioga commented May 16, 2024

@bryantbiggs can you please share what your plans and timelines are? I'd also like to work in that direction, but I don't want to repeat the same work you are doing.
