Firestore index intermittent failure creating resources - 404 Job ID not found #528
Comments
If I have the time next week I may try to set up a GitHub Actions workflow in one of my repos that generates a stack and then tears it down every day, but installs Pulumi alphas / betas, so that I can catch bugs like this and pulumi/pulumi#9694 before they are released.
Thank you @JacksonMaxfield. I have a hard time thinking of a possible reason for this error that could have been introduced in the CLI 3.34.0... it feels like a provider problem. I expect there is a significant difference between your I'll go ahead and move this issue to google-native.
Yeah, this definitely may be a Google problem :/
Another replication: https://github.com/CouncilDataProject/cabq/runs/6834785215?check_suite_focus=true
Thank you for opening the issue. There is some ongoing work in this provider that will hopefully address this.
Still happening: https://github.com/CouncilDataProject/abq/runs/6885881433?check_suite_focus=true Any way I can help? More debug logs?
@JacksonMaxfield - seems like a number of the index resources are succeeding but one is not. I can only see the latest failure so can't correlate with prior runs. Some questions:
Sorry! I deleted the old repos because of how out of whack the stack was. Easier to start over than it is to fix the stack.
Those are succeeding every time. It's happening when trying to create new stacks / resources for the first time -- it loses track of the JobID.
run 1: ignorable because the repo didn't have Google creds yet
But it does also have some "JobID lost" errors on the second run too. So it seems like some of them get through on the first attempt and some don't?
Side note: I am wondering what Google changed recently, because we have ~9 of these stacks deployed and none have ever had any problems.
Update: running a stack create locally with the "depends on" suggestion works; it is simply taking a very long time, ~3 minutes per index.
Update: I can confirm that setting the indexes to be created one after another fixes the problem. Never happened before but 🤷 Normally infrastructure deploys take ~3 minutes with this stack configuration; this setup now takes ~2 hours. I am going to make the changes on the stack. Thanks for the idea. I would love to add docs to the Firestore index resource that say "recommended to make one index at a single time" -- how would I go about doing that? Should I add the same docstring contribution to all the Firestore index objects?
etc?
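For anyone landing here, the serial workaround described above can be sketched roughly as follows. This is a minimal sketch and not the code from the stack in question: `create_serially`, the `create` callback, and the `index-a`/`index-b`/`index-c` names are all hypothetical. In a real Pulumi Python program the callback would construct the Firestore index resource and pass `opts=pulumi.ResourceOptions(depends_on=deps)`; here a stand-in callback is used so the example runs without Pulumi installed.

```python
from typing import Callable, List, Sequence, TypeVar

Spec = TypeVar("Spec")
Resource = TypeVar("Resource")


def create_serially(
    specs: Sequence[Spec],
    create: Callable[[Spec, List[Resource]], Resource],
) -> List[Resource]:
    """Create one resource per spec, each depending on the one before it."""
    created: List[Resource] = []
    for spec in specs:
        # The first resource has no dependency; every later one waits on
        # the resource created immediately before it.
        deps = created[-1:]
        created.append(create(spec, deps))
    return created


# Hypothetical stand-in callback: it only records which earlier
# "resources" each one would wait on.
def fake_create(spec: str, deps: List[dict]) -> dict:
    return {"name": spec, "depends_on": [d["name"] for d in deps]}


chain = create_serially(["index-a", "index-b", "index-c"], fake_create)
for resource in chain:
    print(resource["name"], "waits on", resource["depends_on"])
```

The trade-off is the one reported above: a strict chain removes all parallelism, so total deploy time grows with the number of indexes.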
@JacksonMaxfield thanks for trying that out! Agree this is definitely painful. I will bring this up with our Google contacts. Another option is to try splitting the indexes up in groups (of 2 or 3) instead of trying them entirely serially. Alternatively you could remove the artificial ordering and try with
Those files are auto-generated from discovery docs published by Google so changes to those files will be overwritten. We can add a note with an overlay during our schema generation if necessary but I'd like to get a word back from Google first.
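The grouping idea mentioned above (2 or 3 indexes at a time rather than fully serial) could look something like this. Again a sketch under assumptions: `create_in_groups`, the `create` callback, and the spec names are hypothetical, and in a real Pulumi Python program the callback would pass `opts=pulumi.ResourceOptions(depends_on=deps)` when constructing each index.

```python
from typing import Callable, List, Sequence, TypeVar

Spec = TypeVar("Spec")
Resource = TypeVar("Resource")


def create_in_groups(
    specs: Sequence[Spec],
    create: Callable[[Spec, List[Resource]], Resource],
    group_size: int = 2,
) -> List[Resource]:
    """Create resources in groups; each group waits on the whole previous group."""
    if group_size < 1:
        raise ValueError("group_size must be at least 1")
    created: List[Resource] = []
    previous_group: List[Resource] = []
    for start in range(0, len(specs), group_size):
        # Members of one group share the same dependencies, so they are
        # free to be created in parallel with each other.
        group = [
            create(spec, list(previous_group))
            for spec in specs[start:start + group_size]
        ]
        created.extend(group)
        previous_group = group
    return created


# Hypothetical stand-in callback so the sketch runs without Pulumi.
def fake_create(spec: str, deps: List[dict]) -> dict:
    return {"name": spec, "depends_on": [d["name"] for d in deps]}


grouped = create_in_groups(["a", "b", "c", "d", "e"], fake_create, group_size=2)
for resource in grouped:
    print(resource["name"], "waits on", resource["depends_on"])
```

With `group_size=1` this degenerates to the fully serial chain; larger groups trade some of that safety for shorter deploys.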
Extra notes: I wondered if it is limited by the collection the index is being attached to, i.e. you could only create a single index per collection at a time, but I tried that and it didn't work. I have also seen it fail with parallelism = 2, so I am just going to leave it with this weird "depends on" tree for now.
Sounds good! Would love updates whenever you find out.
Unfortunately, even after the recent change and local testing with the "depends on" thing, we are still getting failures... https://github.com/CouncilDataProject/albuquerque/runs/6909101822?check_suite_focus=true
Pulling out relevant error message:
@JacksonMaxfield would you mind terribly rerunning the stack with
Will do this afternoon likely
Sent! Those logs are long. Let me know if you need anything else or want me to try anything. Thank you so much for investigating.
@viveklak any update on this? I just tried again and I am still experiencing this bug: https://github.com/CouncilDataProject/oakland/runs/7620442143?check_suite_focus=true
I am going to try rolling back the Pulumi version to something older, then try running the same script with Terraform. Last ditch, I don't know, I guess I will rip out the Firestore index creation?
Apologies for the radio silence. I am just about to cut a v0.23.0 release with a bug fix which might be helpful (not sure). Would you be able to try it once it's out before trying older releases?
No worries on silence! We are all busy and I get it. Sorry if my messages came off as pester-y, I am simply using this issue to track the state of our own build infra too. Will happily try a new release later today!
Attempt with 0.23.0 still fails: https://github.com/CouncilDataProject/oakland/runs/7815016330?check_suite_focus=true
I have this working without Pulumi, using just the Firebase CLI tools. Unfortunately I am going to head in that direction.
What happened?
Certain tasks are losing track of their Job IDs, it seems?
Steps to reproduce
I can try to make a reproduction tomorrow, but this same workflow was working on v3.33.2.
Full log of failing on v3.34.0: https://github.com/CouncilDataProject/abqnm/runs/6803735092?check_suite_focus=true
Full log of different instance but same stack specification working on v3.33.2: https://github.com/CouncilDataProject/boston/runs/6783350773?check_suite_focus=true
Expected Behavior
No lost Job IDs / Succeeds
Actual Behavior
Fails with errors that are too hard for me to debug unfortunately :(
Versions used
pulumi: 3.34.0 (and working in v3.33.2)
pulumi_gcp: 6.26.0 (same version used in the successful deployment on pulumi base v3.33.2)
pulumi_google_native: 0.20.0 (same version used in the successful deployment on pulumi base v3.33.2)
python: 3.9.13
Additional context
No response
Contributing
Vote on this issue by adding a 👍 reaction.
To contribute a fix for this issue, leave a comment (and link to your pull request, if you've opened one already).