Amplify push failure - CloudFormation stack rollback failed #2458

Open · pr0g opened this issue Apr 14, 2024 · 8 comments

Labels: pending-triage, question (Further information is requested), transferred

Comments

pr0g commented Apr 14, 2024

How did you install the Amplify CLI?

npm

If applicable, what version of Node.js are you using?

16.20.2

Amplify CLI Version

12.10.3

What operating system are you using?

macOS

Did you make any manual changes to the cloud resources managed by Amplify? Please describe the changes made.

Made one change to an AppSync resolver after seeing this post (#2157), which seemed related to the issue I was seeing. I only made this change after the stack was already in a bad state (unable to roll back).

Describe the bug

We recently upgraded from Amplify Transformer V1 to V2. A bug was detected a few days after this upgrade, and to verify the V1 -> V2 Transformer change was the cause, I synced back to the commit before the upgrade, did an amplify push, and confirmed things worked as expected.

After doing some digging I discovered the issue was down to a change in behavior from Amplify Transformer V1 to V2, where the owner field is no longer automatically populated when making GraphQL requests. I found this post (#37), which suggested updating amplify/cli.json to include "populateOwnerFieldForStaticGroupAuth": true. I made this change, along with a few other updates to amplify/cli.json (see cli.diff.txt, just remove the .txt extension to preview), synced back to our change after the upgrade to Transformer V2, and did an amplify push.
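
For reference, the flag lives in the feature flag section of amplify/cli.json; a minimal sketch (the surrounding flags shown here are illustrative and will differ per project) looks roughly like this:

    {
      "features": {
        "graphqltransformer": {
          "transformerversion": 2,
          "populateownerfieldforstaticgroupauth": true
        }
      }
    }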

After running this push, the CloudFormation stack for the application failed. It failed in the api<app>client nested stack. From the top level, its status is reported as UPDATE_COMPLETE, but when you click the Physical ID link, it shows as UPDATE_ROLLBACK_FAILED, with the reason "The following resource(s) failed to update: [ConnectionStack]." Looking at the ConnectionStack, I can see it is also in UPDATE_ROLLBACK_FAILED, because three resolvers failed to deploy (these correspond to tables in our GraphQL schema):

The following resource(s) failed to update: [GuestUserRegistrationssourceFileResolver, AppUserlicenseResolver, ViewSessionsourceFileResolver]. 
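
(For anyone digging into the same failure: the failed resources can be listed from the nested stack's events; the stack name below is a placeholder.)

    aws cloudformation describe-stack-events \
      --stack-name <connection-stack-name-or-arn> \
      --query "StackEvents[?contains(ResourceStatus, 'FAILED')].[LogicalResourceId, ResourceStatusReason]" \
      --output table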

If I attempt to continue the rollback, I go from one of the nested stacks to the root stack, but as the root stack is not reported as being in UPDATE_ROLLBACK_FAILED, I can't continue the rollback at all. If I try to run amplify push again I see:

Rolled back (2 of 1)
🛑 ["Index: 0 State: {\"deploy\":\"deploying\"} Message: Stack:arn:aws:cloudformation:eu-west-2:<account>:stack/amplify-<app>client-<environment>-115352-api<app>client-.../... is in UPDATE_ROLLBACK_FAILED state and can not be updated."]

At this stage I am not sure how to recover the stack. Is there something I can do to fix the resolvers outside Amplify? Any guidance/support/advice would be hugely appreciated.
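
(For reference, the continue-rollback operation above corresponds to CloudFormation's continue-update-rollback API; a rough CLI sketch is below, with a placeholder stack name and the failed resolvers skipped via --resources-to-skip. In this case it may not be accepted, since the parent stack reports UPDATE_COMPLETE.)

    aws cloudformation continue-update-rollback \
      --stack-name <api-nested-stack-name-or-arn> \
      --resources-to-skip \
        ConnectionStack.GuestUserRegistrationssourceFileResolver \
        ConnectionStack.AppUserlicenseResolver \
        ConnectionStack.ViewSessionsourceFileResolver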

Expected behavior

The CloudFormation stack can be rolled back and amplify push works as expected.

Reproduction steps

Not sure exactly if this will work, but this is roughly what I did (a rough command-level sketch follows the list):

  1. Upgrade project from Amplify Transformer V1 to V2 (without "populateownerfieldforstaticgroupauth": true set - see attachment for state of amplify/cli.json)
  2. amplify push
  3. Sync back in Git history to before the upgrade (Transformer V1)
  4. amplify push (restore earlier state)
  5. Sync back to latest Git commit (Transformer V2)
  6. Update amplify/cli.json with changes shown in diff
  7. amplify push
  8. See the push fail; CloudFormation enters an apparently unrecoverable state.
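
A rough command-level sketch of the above (commit refs are placeholders; the cli.json edit is the one from cli.diff.txt):

    git checkout <pre-upgrade-commit>   # Transformer V1 state
    amplify push                        # restore earlier state
    git checkout <latest-commit>        # Transformer V2 state
    # edit amplify/cli.json with the changes from cli.diff.txt
    amplify push                        # this is the push that fails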

Project Identifier

Attempting to run amplify diagnose --send-report shows:

⠏ Sending zip
DiagnoseReportUploadError
✖ Sending zip

Log output

See earlier description

Additional information

No response

Before submitting, please confirm:

  • I have done my best to include a minimal, self-contained set of instructions for consistently reproducing the issue.
  • I have removed any sensitive information from my code snippets and submission.

ykethan commented Apr 15, 2024

Hey👋 thanks for raising this! I'm going to transfer this over to our API repository for better assistance 🙂

ykethan transferred this issue from aws-amplify/amplify-cli on Apr 15, 2024

pr0g commented Apr 16, 2024

I haven't had a huge amount of help with my other issues on the API repo I'm afraid, @ykethan; if you have any ideas I'd love to know!

I have opened an AWS support case for this particular issue too, but it seems like something with Amplify caused it to fail and get into a bad state, hopefully support can help me recover it.

@AnilMaktala

Hey @pr0g, I apologize for the inconvenience you've experienced. Have you been contacted by a member of our support team regarding this issue?


pr0g commented Apr 17, 2024

Hi @AnilMaktala, thanks for getting back to me. That's okay, I'm speaking to someone from support who's contacted the CloudFormation team to help restore it. One of the nested stacks is reporting to its parent that it's in UPDATE_COMPLETE, but internally it's in UPDATE_ROLLBACK_FAILED, so the rollback can't be continued from the root stack. Apparently that's a symptom of Drift, but I don't know how that could have happened as I was using the Amplify CLI to perform all operations. I'll report back with an update hopefully when it's sorted. Thanks!
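
(For reference, drift can be checked from the CLI; a rough sketch, with a placeholder stack name:)

    aws cloudformation detect-stack-drift --stack-name <nested-stack-name>
    # the command returns a StackDriftDetectionId; check its status, then list per-resource drift:
    aws cloudformation describe-stack-drift-detection-status --stack-drift-detection-id <detection-id>
    aws cloudformation describe-stack-resource-drifts --stack-name <nested-stack-name>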

@AaronZyLee (Contributor)

From the description, the deployment failure is most likely caused by this:

I made this change, along with a few other updates to amplify/cli.json

As you mention, the error comes from the ConnectionStack, so it should be related to the feature flags governing connection behavior rather than the auth resolver ones. I examined the diff file and found that other flags were added apart from populateOwnerFieldForStaticGroupAuth: true.

As a workaround for the update rollback failure, another customer has already described steps (see #2157 (comment)) for adding dummy resolvers for the resources with errors (in your case, the ones in the connection stacks), which should help you resolve the rollback issue.

Once you roll back successfully, I suggest keeping only populateOwnerFieldForStaticGroupAuth: true and removing the other changes in the diff, which should prevent unintended changes/failures to the resolvers/resources.


pr0g commented Apr 17, 2024

Hi @AaronZyLee,

Thanks for your reply. Yes, in hindsight I should have been a good scientist and only changed one thing at a time (lesson learned again 🙈). The reason I updated these flags is that I'd been meaning to do so ever since @ykethan suggested it in this post. I realize I probably should have waited until afterwards though (less haste, more speed).

I did see the post you mentioned, but unfortunately I don't think it will work for me, because the root stack doesn't think it's in an UPDATE_ROLLBACK_FAILED state, only the nested stacks do, so I'm basically stuck without intervention from AWS support (I will follow up with them again tomorrow to see where things have got to).

Might it be possible to delete the ConnectionStack and have it get redeployed? I've shied away from doing that because I didn't want to make things worse, but maaaybe that might work?

Thanks for the feedback, and it's good to know for the future, but ideally right now I just need a way of recovering things and getting back to a good state.

AnilMaktala added the question (Further information is requested) label on Apr 18, 2024
@AnilMaktala

Hey @pr0g, are you still experiencing this issue?


pr0g commented May 20, 2024

Hi @AnilMaktala,

Thanks for following up. I was able to talk to AWS support and get my CloudFormation stack back to UPDATE_ROLLBACK_COMPLETE; unfortunately, when I try to do an amplify push, things are still failing. I've been talking with the AWS Amplify support team and have managed to narrow things down a bit.

I'm going to try syncing back to the point in our Git history where this problem occurred and doing an amplify push --force to see if I can get the deployment to succeed (after first deleting the deployment.json file in S3). I think that because the environment is now so out of sync with what's in Git, running amplify push is causing problems (the failure happens when trying to update a model/table that has been removed).
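
Roughly (the deployment bucket name and the commit ref are placeholders):

    aws s3 rm s3://<amplify-deployment-bucket>/deployment.json   # file name as referenced above; the exact key may differ
    git checkout <commit-where-the-failing-push-was-made>
    amplify push --force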

I'm going to try and get to this later this week and will leave an update if that works.

Thanks!
