Amplify push failure - CloudFormation stack rollback failed #2458

Open · pr0g opened this issue Apr 14, 2024 · 8 comments

Labels: pending-triage, question (Further information is requested), transferred

Comments

pr0g commented Apr 14, 2024

How did you install the Amplify CLI?

npm

If applicable, what version of Node.js are you using?

16.20.2

Amplify CLI Version

12.10.3

What operating system are you using?

macOS

Did you make any manual changes to the cloud resources managed by Amplify? Please describe the changes made.

Made one change to an AppSync resolver after seeing this post (#2157), which seemed related to the issue I was seeing. I only made this change after the stack was already in a bad state (unable to roll back).

Describe the bug

We recently upgraded from Amplify Transformer V1 to V2. A bug was detected a few days after this upgrade, and to verify the V1 -> V2 Transformer change was the cause, I synced back to the commit before the upgrade, did an amplify push, and confirmed things worked as expected.

After doing some digging I discovered the issue was down to a change in behavior from Amplify Transformer V1 to V2, where the owner field is no longer automatically populated when making GraphQL requests. I found this post (#37), which suggested updating amplify/cli.json to include "populateOwnerFieldForStaticGroupAuth": true. I made this change, along with a few other updates to amplify/cli.json (see cli.diff.txt, just remove the .txt extension to preview), synced back to our change after the upgrade to Transformer V2, and did an amplify push.
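
For reference, the flag lives in the feature flag section of amplify/cli.json; a minimal sketch (the surrounding flags shown here are illustrative and will differ per project) looks roughly like this:

    {
      "features": {
        "graphqltransformer": {
          "transformerversion": 2,
          "populateownerfieldforstaticgroupauth": true
        }
      }
    }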

After running this push, the CloudFormation stack for the application failed. It failed in the api<app>client nested stack. From the top level, its status is reported as UPDATE_COMPLETE, but when you click the Physical ID link, it shows as UPDATE_ROLLBACK_FAILED, with the reason "The following resource(s) failed to update: [ConnectionStack]." Looking at the ConnectionStack, I can see it is also in UPDATE_ROLLBACK_FAILED, because three resolvers failed to deploy (these correspond to tables in our GraphQL schema):

The following resource(s) failed to update: [GuestUserRegistrationssourceFileResolver, AppUserlicenseResolver, ViewSessionsourceFileResolver]. 
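
(For anyone digging into the same failure: the failed resources can be listed from the nested stack's events; the stack name below is a placeholder.)

    aws cloudformation describe-stack-events \
      --stack-name <connection-stack-name-or-arn> \
      --query "StackEvents[?contains(ResourceStatus, 'FAILED')].[LogicalResourceId, ResourceStatusReason]" \
      --output table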

If I attempt to continue the rollback, I go from one of the nested stacks to the root stack, but as the root stack is not reported as being in UPDATE_ROLLBACK_FAILED, I can't continue the rollback at all. If I try to run amplify push again I see:

Rolled back (2 of 1)
🛑 ["Index: 0 State: {\"deploy\":\"deploying\"} Message: Stack:arn:aws:cloudformation:eu-west-2:<account>:stack/amplify-<app>client-<environment>-115352-api<app>client-.../... is in UPDATE_ROLLBACK_FAILED state and can not be updated."]

At this stage I am not sure how to recover the stack. Is there something I can do to fix the resolvers outside Amplify? Any guidance/support/advice would be hugely appreciated.
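
(For reference, the continue-rollback operation above corresponds to CloudFormation's continue-update-rollback API; a rough CLI sketch is below, with a placeholder stack name and the failed resolvers skipped via --resources-to-skip. In this case it may not be accepted, since the parent stack reports UPDATE_COMPLETE.)

    aws cloudformation continue-update-rollback \
      --stack-name <api-nested-stack-name-or-arn> \
      --resources-to-skip \
        ConnectionStack.GuestUserRegistrationssourceFileResolver \
        ConnectionStack.AppUserlicenseResolver \
        ConnectionStack.ViewSessionsourceFileResolver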

Expected behavior

The CloudFormation stack can be rolled back and amplify push works as expected.

Reproduction steps

Not sure exactly if this will work, but this is roughly what I did (a rough command-level sketch follows the list):

  1. Upgrade project from Amplify Transformer V1 to V2 (without "populateownerfieldforstaticgroupauth": true set - see attachment for state of amplify/cli.json)
  2. amplify push
  3. Sync back in Git history to before the upgrade (Transformer V1)
  4. amplify push (restore earlier state)
  5. Sync back to latest Git commit (Transformer V2)
  6. Update amplify/cli.json with changes shown in diff
  7. amplify push
  8. See the push fail; CloudFormation enters an apparently unrecoverable state.
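
A rough command-level sketch of the above (commit refs are placeholders; the cli.json edit is the one from cli.diff.txt):

    git checkout <pre-upgrade-commit>   # Transformer V1 state
    amplify push                        # restore earlier state
    git checkout <latest-commit>        # Transformer V2 state
    # edit amplify/cli.json with the changes from cli.diff.txt
    amplify push                        # this is the push that fails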

Project Identifier

Attempting to run amplify diagnose --send-report shows:

⠏ Sending zip
DiagnoseReportUploadError
✖ Sending zip

Log output

See earlier description

Additional information

No response

Before submitting, please confirm:

  • I have done my best to include a minimal, self-contained set of instructions for consistently reproducing the issue.
  • I have removed any sensitive information from my code snippets and submission.

ykethan commented Apr 15, 2024

Hey👋 thanks for raising this! I'm going to transfer this over to our API repository for better assistance 🙂

ykethan transferred this issue from aws-amplify/amplify-cli on Apr 15, 2024

pr0g commented Apr 16, 2024

I haven't had a huge amount of help with my other issues on the API repo I'm afraid, @ykethan; if you have any ideas I'd love to know!

I have opened an AWS support case for this particular issue too, but it seems like something with Amplify caused it to fail and get into a bad state, hopefully support can help me recover it.

@AnilMaktala

Hey @pr0g, I apologize for the inconvenience you've experienced. Have you been contacted by a member of our support team regarding this issue?


pr0g commented Apr 17, 2024

Hi @AnilMaktala, thanks for getting back to me. That's okay, I'm speaking to someone from support who's contacted the CloudFormation team to help restore it. One of the nested stacks is reporting to its parent that it's in UPDATE_COMPLETE, but internally it's in UPDATE_ROLLBACK_FAILED, so the rollback can't be continued from the root stack. Apparently that's a symptom of Drift, but I don't know how that could have happened as I was using the Amplify CLI to perform all operations. I'll report back with an update hopefully when it's sorted. Thanks!
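
(For reference, drift can be checked from the CLI; a rough sketch, with a placeholder stack name:)

    aws cloudformation detect-stack-drift --stack-name <nested-stack-name>
    # the command returns a StackDriftDetectionId; check its status, then list per-resource drift:
    aws cloudformation describe-stack-drift-detection-status --stack-drift-detection-id <detection-id>
    aws cloudformation describe-stack-resource-drifts --stack-name <nested-stack-name>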

@AaronZyLee (Contributor)

From the description, the deployment failure is most likely caused by this:

I made this change, along with a few other updates to amplify/cli.json

As you mention, the error comes from the ConnectionStack, so it should be related to the feature flags governing connection behavior rather than the auth resolver ones. I examined the diff file and found that other flags were added apart from populateOwnerFieldForStaticGroupAuth: true.

As a workaround for the update rollback failure, another customer has already described steps (see #2157 (comment)) for adding dummy resolvers for the resources with errors (in your case, the ones in the connection stacks), which should help you resolve the rollback issue.

Once you roll back successfully, I suggest keeping only populateOwnerFieldForStaticGroupAuth: true and removing the other changes in the diff, which should prevent unintended changes/failures to the resolvers/resources.


pr0g commented Apr 17, 2024

Hi @AaronZyLee,

Thanks for your reply. Yes, in hindsight I should have been a good scientist and only changed one thing at a time (lesson learned again 🙈). The reason I updated these flags is that I'd been meaning to do so ever since @ykethan suggested it in this post. I realize I probably should have waited until afterwards though (less haste, more speed).

I did see the post you mentioned, but unfortunately I don't think it will work for me, because the root stack doesn't think it's in an UPDATE_ROLLBACK_FAILED state, only the nested stacks do, so I'm basically stuck without intervention from AWS support (I will follow up with them again tomorrow to see where things have got to).

Might it be possible to delete the ConnectionStack and have it get redeployed? I've shied away from doing that because I didn't want to make things worse, but maaaybe that might work?

Thanks for the feedback, and it's good to know for the future, but ideally right now I just need a way of recovering things and getting back to a good state.

AnilMaktala added the question (Further information is requested) label on Apr 18, 2024
@AnilMaktala

Hey @pr0g, are you still experiencing this issue?


pr0g commented May 20, 2024

Hi @AnilMaktala,

Thanks for following up. I was able to talk to AWS support and get my CloudFormation stack back to UPDATE_ROLLBACK_COMPLETE; unfortunately, when I try to do an amplify push, things are still failing. I've been talking with the AWS Amplify support team and have managed to narrow things down a bit.

I'm going to try syncing back to the point in our Git history where this problem occurred and doing an amplify push --force to see if I can get the deployment to succeed (after first deleting the deployment.json file in S3). I think that because the environment is now so out of sync with what's in Git, running amplify push is causing problems (the failure happens when trying to update a model/table that has been removed).
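
Roughly (the deployment bucket name and the commit ref are placeholders):

    aws s3 rm s3://<amplify-deployment-bucket>/deployment.json   # file name as referenced above; the exact key may differ
    git checkout <commit-where-the-failing-push-was-made>
    amplify push --force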

I'm going to try and get to this later this week and will leave an update if that works.

Thanks!
