deploy-runner-on-gcp - does the cloud region matter in relation to overall region? #148

Open
gregoryfoster opened this issue Nov 25, 2023 · 7 comments
Labels
bug Something isn't working

Comments

@gregoryfoster
Contributor

Describe the Bug

In the deploy-runner-on-gcp job called from event-gather-pipeline.yml, the cml runner option for cloud-region is hardcoded to us-central1-f. In my (very brief!) experience, this resulted in failures when attempting to create the machine due to ZONE_RESOURCE_POOL_EXHAUSTED - which may be transient, but I saw it repeatedly enough to try a different cloud region that supports T4 GPUs.

As well, I specified a region of us-west1 for my GCP project as a whole, different from the default us-central1 region in CDP. That distinction (and the fact that the us-west1-b cloud region worked for me) made me wonder whether this setting needs to track the overall GCP region to ensure access to associated resources. I don't know enough about any of this to know whether that's true or whether this machine is standalone.

Expected Behavior

I expected the Event Gather action deploy-runner-on-gcp job to complete successfully.

Reproduction

Stand up a CDP instance situated in a region other than us-central1 and execute the Event Gather action.

@gregoryfoster gregoryfoster added the bug Something isn't working label Nov 25, 2023
@evamaxfield
Member

Ya this is an interesting one. You are the second person to report that us-central might be overloaded now. I tested a bunch of different GCP compute regions way back when we added that process and found that us-central was generally available; it was sometimes overloaded, but not nearly as often as the other regions I tested. If you want to change the region, feel free.

To my knowledge, there is no downside to using a different region for compute than the region for the project. The only "big difference" is maybe data download + upload from storage, which may cost a fraction more, but I prioritized "stability of compute" and went with central at the time.

All of this is to say... do what you would like? And maybe we should document this somewhere?

@gregoryfoster
Contributor Author

Based on your feedback, I suggest we change this issue to a feature request to make cloud-region a template variable that can be edited on project generation.
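
Editor's sketch of what this request amounts to, in Python for illustration only. The function name `build_runner_args` and the constant `DEFAULT_CLOUD_REGION` are hypothetical, and the `--cloud=gcp` flag is an assumption about how the workflow invokes the runner; only the `cloud-region` option and the `us-central1-f` default come from this thread. The actual workflow passes other options that are omitted here.

```python
from typing import List

# Value reported in this issue as hardcoded in the workflow.
DEFAULT_CLOUD_REGION = "us-central1-f"


def build_runner_args(cloud_region: str = DEFAULT_CLOUD_REGION) -> List[str]:
    """Build a runner command line with the region supplied by the caller.

    Hypothetical helper: the real workflow is a YAML file, and the flags
    other than --cloud-region are assumptions, not copied from it.
    """
    region = cloud_region.strip()
    if not region:
        raise ValueError("cloud_region must not be empty")
    return ["cml", "runner", "--cloud=gcp", f"--cloud-region={region}"]


if __name__ == "__main__":
    # A project generated with a different region only changes one argument.
    print(" ".join(build_runner_args("us-west1-b")))
```

The point of the sketch is that the region becomes a single input chosen at project generation time rather than a literal buried in the job definition.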

@evamaxfield
Member

Seems fair to me!

@evamaxfield
Member

I have switched to us-west1-b for now as I am also running into a lot of issues.

@evamaxfield
Member

Whoops, reopening as I think we still want this to be parametrizable.

@evamaxfield evamaxfield reopened this Dec 5, 2023
@dvdokkum

dvdokkum commented Apr 2, 2024

I'm also running into this issue on a new cookie cutter install... event gather runs are failing when trying to set up the runner: us-west1-b does not have enough resources available to fulfill the request. The instance is set up on the default central1 gcp region. Is there a workaround to get this working? It isn't clear to me how I would specify a different region...

If helpful, I haven't customized anything... I just followed the directions in the youtube tutorial using all default cookie cutter values.

Update:
I ended up just changing the specified region in the GH workflow back to us-central1-f and it worked!
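
Editor's sketch: since capacity appears to shift between us-central1-f and us-west1-b over time, one possible workaround is to try several zones in order and move on only when the failure looks like a capacity problem. This is a minimal Python illustration, not something the project does today. `launch_with_fallback`, `NoCapacityError`, and the `launch` callable are hypothetical names; the two error markers are the messages quoted in this thread.

```python
from typing import Callable, List, Sequence

# Error text quoted in this thread for exhausted zones.
CAPACITY_MARKERS = (
    "ZONE_RESOURCE_POOL_EXHAUSTED",
    "does not have enough resources available",
)


class NoCapacityError(RuntimeError):
    """Raised when every candidate zone reported a capacity failure."""


def launch_with_fallback(
    zones: Sequence[str], launch: Callable[[str], None]
) -> str:
    """Try each zone in order and return the first one that launches.

    `launch` is a hypothetical callable that starts the runner in the
    given zone and raises RuntimeError on failure.
    """
    exhausted: List[str] = []
    for zone in zones:
        try:
            launch(zone)
            return zone
        except RuntimeError as err:
            # Only fall through on capacity errors; re-raise anything else.
            if not any(marker in str(err) for marker in CAPACITY_MARKERS):
                raise
            exhausted.append(zone)
    raise NoCapacityError(f"no capacity in any of: {', '.join(exhausted)}")


if __name__ == "__main__":
    def fake_launch(zone: str) -> None:
        # Stand-in for the real launch step; pretends us-west1-b is full.
        if zone == "us-west1-b":
            raise RuntimeError(
                "us-west1-b does not have enough resources available"
            )

    print(launch_with_fallback(["us-west1-b", "us-central1-f"], fake_launch))
```

Errors that are not capacity related (bad credentials, quota, typos in the zone name) are re-raised immediately so they are not hidden by the fallback.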

@evamaxfield
Member

Ah yea sorry. All of the region stuff is entirely parameterizable. Whichever works best for you is great!
