
AppSync detached resolvers #146

Open
alextriaca opened this issue Jan 22, 2021 · 25 comments
Labels: bug Something isn't working

@alextriaca

Resolvers can become detached when an incorrect schema is deployed, and correcting the schema does not reattach them. In addition, the problem can ripple out and affect resolvers on other fields.

Reproduction Steps

The most localised reproduction uses a single-field schema and one resolver. Given the following schema and API:
Schema:

type Query {
    ping: String
}

API (In CDK):

from aws_cdk import core, aws_appsync


class GraphqlTestStack(core.Stack):
    def __init__(self, scope: core.Construct, **kwargs) -> None:
        super().__init__(scope, "GraphqlTestStack", **kwargs)

        api = aws_appsync.GraphqlApi(
            self,
            "test_api-",
            name="test_api",
            schema=aws_appsync.Schema.from_asset("resources/schema.graphql"),
        )

        api.add_none_data_source("ping").create_resolver(
            type_name="Query",
            field_name="ping",
            request_mapping_template=aws_appsync.MappingTemplate.from_string(
                '{"version": "2018-05-29"}'
            ),
            response_mapping_template=aws_appsync.MappingTemplate.from_string(
                '$util.toJson("pong")'
            ),
        )

If you deploy this stack, it deploys successfully and a query of ping returns pong. If you then deploy the following schema, you end up with a resolver that points to nothing:

type Query {
}

If you then realise your mistake, add the ping field back to the schema, and redeploy, the schema corrects itself and the resolver is still present, but it is no longer attached and a query to ping returns null.
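(A quick way to confirm the detached state is to query the field directly and watch pong turn into null. The sketch below assumes the default API-key authorization that the CDK GraphqlApi construct uses; the endpoint and key are placeholders.)

import json
import urllib.request

# Placeholder values -- substitute your API's GraphQL endpoint and API key.
APPSYNC_URL = "https://<api-id>.appsync-api.<region>.amazonaws.com/graphql"
API_KEY = "<api-key>"

request = urllib.request.Request(
    APPSYNC_URL,
    data=json.dumps({"query": "{ ping }"}).encode("utf-8"),
    headers={"Content-Type": "application/json", "x-api-key": API_KEY},
)
print(json.loads(urllib.request.urlopen(request).read()))
# Before the broken deploy:     {"data": {"ping": "pong"}}
# After the schema round-trip:  {"data": {"ping": null}}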

Reproduction Steps (a more sinister presentation)

We can extend the previous example to add three fields to our Query like so:

Schema:

type Query {
    ping1: String
    ping2: String
    ping3: String
}

API (In CDK):

from aws_cdk import core, aws_appsync


class GraphqlTestStack(core.Stack):
    def __init__(self, scope: core.Construct, **kwargs) -> None:
        super().__init__(scope, "GraphqlTestStack", **kwargs)

        api = aws_appsync.GraphqlApi(
            self,
            "test_api-",
            name="test_api",
            schema=aws_appsync.Schema.from_asset("resources/schema.graphql"),
        )

        api.add_none_data_source("ping1").create_resolver(
            type_name="Query",
            field_name="ping1",
            request_mapping_template=aws_appsync.MappingTemplate.from_string('{"version": "2018-05-29"}'),
            response_mapping_template=aws_appsync.MappingTemplate.from_string('$util.toJson("pong1")'),
        )
        api.add_none_data_source("ping2").create_resolver(
            type_name="Query",
            field_name="ping2",
            request_mapping_template=aws_appsync.MappingTemplate.from_string('{"version": "2018-05-29"}'),
            response_mapping_template=aws_appsync.MappingTemplate.from_string('$util.toJson("pong2")'),
        )
        api.add_none_data_source("ping3").create_resolver(
            type_name="Query",
            field_name="ping3",
            request_mapping_template=aws_appsync.MappingTemplate.from_string('{"version": "2018-05-29"}'),
            response_mapping_template=aws_appsync.MappingTemplate.from_string('$util.toJson("pong3")'),
        )

If you then deploy a broken version of the schema:

type Query {
    ping1: String
    ping2: String this_is_a_typo
    ping3: String
}

This deploys successfully, however the resulting schema ends up empty:

type Query {

}

If you then correct this mistake and deploy the original schema, the schema is corrected and all three resolvers are deployed, however all three are detached.
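(One way to spot this without clicking through the console is to list the resolvers AppSync actually has for the type and compare against the fields you expect. A sketch using boto3, assuming a detached resolver no longer shows up in ListResolvers; the API id is a placeholder.)

import boto3

client = boto3.client("appsync")
API_ID = "<api-id>"  # placeholder
expected_fields = {"ping1", "ping2", "ping3"}

# Fields that still have a resolver attached according to AppSync itself.
attached = {
    resolver["fieldName"]
    for resolver in client.list_resolvers(apiId=API_ID, typeName="Query")["resolvers"]
}
print("detached fields:", expected_fields - attached)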


This is a 🐛 Bug Report

@alextriaca
Author

I'm not sure I follow how this can be expected behaviour. There are two issues I see here:

  1. You cannot deploy a resolver for a non-existent field; you simply get a CloudFormation error. However, you can delete the schema out from underneath the resolver without the same error being raised (example one above). A resolver requires you to specify the exact type and field it connects to, so mapping between the two is always possible.
  2. It is possible to deploy an "incorrect" schema which passes the deployment validation but does not deploy correctly (example two above). This can then result in multiple resolvers detaching for no reason (and, as you stated, they will not reattach once the schema is corrected).

@ansman

ansman commented Feb 1, 2021

I agree with @alextriaca. This cannot be expected behavior as we have never manually attached the resolvers in the first place.

The flow is something like this:

  1. An initial schema is deployed
  2. CDK creates the resolvers AND attaches them for us
  3. A new, invalid, schema is deployed
  4. CDK detaches the resolvers
  5. A new, valid, schema is deployed
  6. CDK does NOT reattach the resolvers (which I would argue it should)

@alextriaca
Author

alextriaca commented Feb 11, 2021

I've found a better example showing that this is a subtle bug rather than intended functionality. The resolvers below each have a timestamp embedded in their ID. Each time this stack is deployed it deploys new resolvers, attaches them to the API and then deletes the old ones.

Schema:

type Query {
    ping1: String
    ping2: String
    ping3: String
}

API (In CDK):

from aws_cdk import core, aws_appsync
from datetime import datetime

class GraphqlTestStack(core.Stack):
    def __init__(self, scope: core.Construct, **kwargs) -> None:
        super().__init__(scope, "GraphqlTestStack", **kwargs)

        api = aws_appsync.GraphqlApi(
            self,
            "test_api-",
            name="test_api",
            schema=aws_appsync.Schema.from_asset("resources/schema.graphql"),
        )

        api.add_none_data_source(f"ping1_{int(datetime.now().timestamp())}").create_resolver(
            type_name="Query",
            field_name="ping1",
            request_mapping_template=aws_appsync.MappingTemplate.from_string('{"version": "2018-05-29"}'),
            response_mapping_template=aws_appsync.MappingTemplate.from_string('$util.toJson("pong1")'),
        )
        api.add_none_data_source("ping2_{int(datetime.now().timestamp())}").create_resolver(
            type_name="Query",
            field_name="ping2",
            request_mapping_template=aws_appsync.MappingTemplate.from_string('{"version": "2018-05-29"}'),
            response_mapping_template=aws_appsync.MappingTemplate.from_string('$util.toJson("pong2")'),
        )
        api.add_none_data_source("ping1_{int(datetime.now().timestamp())}").create_resolver(
            type_name="Query",
            field_name="ping3",
            request_mapping_template=aws_appsync.MappingTemplate.from_string('{"version": "2018-05-29"}'),
            response_mapping_template=aws_appsync.MappingTemplate.from_string('$util.toJson("pong3")'),
        )

Where this breaks: if you then deploy a broken schema (as below), you now get a deploy error (resolver 2 fails to attach), which triggers a rollback. After the rollback, none of the resolvers are attached. This is definitely unintended.

type Query {
    ping1: String
    ping2: String this_is_a_typo
    ping3: String
}

@ansman

ansman commented Mar 3, 2021

I would love to get some attention on this. All of our resolvers were just detached during a deployment, and deploying again did not reattach them. I had to comment out the resolvers, deploy, uncomment them, and deploy again to get them reattached.

@shaneyuan18

We identified a bug in the schema validation process and are working on a fix so that it is not possible to deploy an incorrect schema.

@ansman

ansman commented Mar 3, 2021

I don't think that will fix the root issue. I got this without deploying an invalid schema: I changed the names of some Lambdas and data sources, which resulted in all affected resolvers being detached.

@alextriaca
Author

I can also confirm I've seen the same behaviour as @ansman. The issue I had with reporting it was that I couldn't reliably reproduce it. It's exactly as @ansman says: when renaming a resolver/function/schema/field/something you can end up with a detached resolver. I've really struggled to reproduce this reliably though. I do, however, think it's more closely related to how the resolvers fail to reattach than to an invalid schema causing it in the first place.
@shaneyuan18, super excited to hear about that schema validation change though; it will definitely make failures surface faster.

@shaneyuan18

If I understand correctly, I think there are two different root causes for resolvers becoming detached.

Issue 1: as @alextriaca mentioned earlier in the thread with the invalid schema update. This should be fixed once we have the schema validation work ready.

Issue 2: @ansman mentioned that resolvers became detached due to a data source update. I think this is a different issue with a different root cause from the one above.

If the above statement is correct, we can open a new GitHub issue for Issue 2 and discuss reproduction steps and related items there. Thanks.

@alextriaca
Author

@shaneyuan18 sounds like a plan. There is definitely something more subtle here on issue 2 and a separate ticket to discuss it makes sense 👍

@akuma12

akuma12 commented Mar 23, 2021

I think we've experienced a similar issue to Issue 2 in @shaneyuan18's comment above. Due to an update to CDK's AppSync libraries, our resolver logical resource IDs were changed. On deployment, this caused the creation of new resolvers and the cleanup deletion of the old ones. I believe the new resolvers were created fine and attached to the fields, but when CloudFormation deleted the old resolvers, it also deleted the field->resolver mapping, leaving the fields without an attached resolver. We haven't reproduced it yet, but if we do I'll post the results here, or in a new ticket if it's warranted.
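(If the trigger really is logical resource IDs changing across CDK upgrades, one possible mitigation is to pin each resolver's logical ID with the CDK escape hatch so CloudFormation never goes through a create-new/delete-old cycle. A sketch against the CDK v1 Python API used earlier in the thread; the chosen logical ID is illustrative.)

from aws_cdk import core, aws_appsync


class GraphqlTestStack(core.Stack):
    def __init__(self, scope: core.Construct, **kwargs) -> None:
        super().__init__(scope, "GraphqlTestStack", **kwargs)

        api = aws_appsync.GraphqlApi(
            self,
            "test_api-",
            name="test_api",
            schema=aws_appsync.Schema.from_asset("resources/schema.graphql"),
        )

        resolver = api.add_none_data_source("ping").create_resolver(
            type_name="Query",
            field_name="ping",
            request_mapping_template=aws_appsync.MappingTemplate.from_string('{"version": "2018-05-29"}'),
            response_mapping_template=aws_appsync.MappingTemplate.from_string('$util.toJson("pong")'),
        )

        # Pin the generated CfnResolver to a fixed logical ID so a library upgrade
        # cannot rename it and force CloudFormation to replace the resolver.
        resolver.node.default_child.override_logical_id("QueryPingResolver")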

@twrichards

twrichards commented Apr 15, 2021

Somewhat inspired by @Dave-McCraw's comment on a related issue, I have also added a workaround to our CDK code which computes an MD5 checksum of the schema and includes it as a comment in the resolver templates. Our CD only runs the CloudFormation step if the CloudFormation has changed, so a timestamp would trigger it on every deploy and slow us down, when in reality we don't change the schema that often.

I can't share the PR at the moment but here's the implementation if it's useful...

  • Read the schema from file into an appsync.Schema (which we were doing anyway to use in the new appsync.GraphqlApi() call)
    const gqlSchema = appsync.Schema.fromAsset(
      join(__dirname, "../shared/graphql/schema.graphql")
    );
  • Produce an md5 checksum of the Schema content (given import crypto from "crypto";)...
    const gqlSchemaChecksum = crypto
      .createHash("md5")
      .update(gqlSchema.definition, "utf8")
      .digest("hex");
  • Re-usable function to wrap mapping templates to add the comment containing the checksum...
    // workaround for resolvers sometimes getting disconnected
    // see https://github.com/aws/aws-appsync-community/issues/146
    const resolverBugWorkaround = (mappingTemplate: appsync.MappingTemplate) =>
      appsync.MappingTemplate.fromString(
        `## schema checksum : ${gqlSchemaChecksum}\n${mappingTemplate.renderTemplate()}`
      );
  • Use the re-usable function on either the request or response mapping for each of your resolvers, for example...
    blahDataSource.createResolver({
      typeName: "Query",
      fieldName: "listBlah",
      responseMappingTemplate: resolverBugWorkaround(
        appsync.MappingTemplate.lambdaResult()
      ),
    });

... I only implemented it last night, but it seems to be working so far (it certainly reconnected all the resolvers when I deployed it to the test environment).
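(For the Python CDK examples earlier in this thread, a rough equivalent of this checksum workaround might look like the sketch below. It hashes the schema file directly rather than going through the Schema object, and simply prepends the checksum comment to the template string; untested, so treat it as a starting point.)

import hashlib

from aws_cdk import aws_appsync

SCHEMA_PATH = "resources/schema.graphql"

# MD5 checksum of the schema, so every resolver template changes whenever the schema does.
with open(SCHEMA_PATH, "rb") as schema_file:
    gql_schema_checksum = hashlib.md5(schema_file.read()).hexdigest()


def resolver_bug_workaround(template_string: str) -> aws_appsync.MappingTemplate:
    # Workaround for resolvers sometimes getting disconnected.
    # See https://github.com/aws/aws-appsync-community/issues/146
    return aws_appsync.MappingTemplate.from_string(
        f"## schema checksum : {gql_schema_checksum}\n{template_string}"
    )


# Usage, following the earlier examples:
#   response_mapping_template=resolver_bug_workaround('$util.toJson("pong")')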

@danielblignaut

I know this is not a productive comment, but the broader issue of resolvers becoming detached (irrespective of the cause) really makes AppSync unbearable on large schema deployments. Please can some time be invested at the CloudFormation or AppSync team level to resolve this. One would expect that on a valid CloudFormation template deployment, where you have a list of resolvers matching GraphQL types and fields, those resolvers end up attached, and that on an invalid / failed deployment the state of the AppSync app returns to where it was before.

I know there is complexity to solving this, but anything short of that is an unreliable product for the part of a tech stack that needs to be the most robust and failure-proof: your API. The fact that one can have a successful API-layer deployment, with resolvers specified in the CloudFormation template, only to perform an API query and find out that a resolver is not attached to the AppSync API, is a big problem.

This is a topic that's been brought up on:

  • AWS forums
  • AWS CDK github issues
  • serverless-appsync framework github issues
  • AWS amplify github issues
  • and here

@ahmadizm or any other team members, can we bring more attention to this? I'm happy to invest my time to help resolve it (whether that's providing our own use-case examples or suggestions on how to fix it).

@bigkraig

Unfortunately there are a lot of sharp edges with CloudFormation & AppSync, and they don't look likely to be resolved anytime soon.

We have renamed our Query/Mutation types to Query#/Mutation# and will increment # whenever we are likely to run into one of these problems. It causes a minor outage on the fields but it's a better workaround than having an unknown number of detached resolvers.
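(In GraphQL SDL that rename presumably looks something along these lines, with every resolver's type_name updated to match; the exact naming scheme is @bigkraig's, the snippet is just an illustration.)

schema {
    query: Query2
}

type Query2 {
    ping1: String
    ping2: String
    ping3: String
}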

Some other related issues:

aws-amplify/amplify-cli#682

aws/aws-cdk#13269

@jpignata jpignata added the bug Something isn't working label Jun 6, 2021
@jpignata
Contributor

Heya. AppSync team member here. I just wanted to check in on this issue. While the subject of this issue is still very much a sharp edge (i.e., the resolver just goes away, and your stack has no idea), we have worked to reduce some of the friction around managing resolvers. I wanted to check in with folks to understand what pain you're experiencing around this topic so we can focus our efforts. I'd appreciate it if you could post and re-articulate the friction or wishlist items here, or shoot me an email at pignataj at amazon.com. Thank you!

@elishevarom

I just had the same issue. I accidentally deployed an invalid schema and my resolver got detached. Redeploying with a valid schema did not reattach the resolver. It happened in the test environment so it doesn't really matter right now, but it would have been a problem in the prod environment. (It also took me a while to pinpoint the issue because there was no error, just a null response.) I commented out my resolver, deployed, and commented it back in. I agree that we shouldn't have to do the above steps and the resolver should be reattached automatically.

@bigkraig

@jpignata We have moved away from AppSync due to this and other performance related issues.

@jpignata
Contributor

Gotcha, @bigkraig. Thanks for letting me know. If you have any interest in chatting further, I'd love to hear more.

@elishevarom Makes sense, and I agree. Looks like some folks have workarounds where they version resolvers and the schema, and increment each time. We can do better than this.

@mille-printemps

I came across this issue. Is there any update?

@twrichards, can I ask whether your workaround, i.e. inserting a hash value into the response mapping templates as a comment, is still working? It seems like the most feasible and simplest way to work around this issue at this point, if it still works.

@twrichards

twrichards commented Feb 21, 2022

I came across this issue. Is there any update?

@twrichards, can I ask whether your workaround, i.e. inserting a hash value into the response mapping templates as a comment, is still working? It seems like the most feasible and simplest way to work around this issue at this point, if it still works.

@mille-printemps Yes, definitely still working. I'm not aware of whether the bug has been fixed though.

@mille-printemps

@twrichards, thank you for the prompt reply. I will use this approach to work around the issue.

I hope that this issue is fixed soon...

@jeremycare

@jpignata Any update on this? We are facing the issue right now.

@jpignata
Contributor

@jeremycare can you possibly shoot me an email? I’d like to look at your issue specifically with our team to see what’s up. pignataj at amazon dot com.

@trevorbennett

My team just encountered this issue, or some variant of it. We had three people pulling their hair out trying to solve it; the ultimate solution was to nuke everything underneath the GraphQL API itself (resolvers, data sources, and function configurations).

We commented out all those resources, deployed via SAM, uncommented them, redeployed, and it worked like a charm.

@jeremycare

We also ran into that yesterday and had to do the exact same thing. @jpignata, it might be a regression on your side.

@alharris-at

alharris-at commented Jul 27, 2023

One workaround I've seen work here is to explicitly model dependencies in CDK/CFN between all resolvers and the AppSync Schema object (not the API), as well as between all data sources and the Schema object.

At least, that mitigation worked when I was hitting issues where updating my schema and some resolvers left the untouched resolvers detached. Based on the comments here about needing to explicitly rename all resolvers, I suspect something similar may be happening in your use cases.
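(A sketch of what that explicit dependency modelling can look like with the CDK v1 Python API used earlier in the thread. The CfnGraphQLSchema lookup is illustrative; the dependency direction means CloudFormation finishes updating the schema before it touches the data source or resolver.)

from aws_cdk import core, aws_appsync


class GraphqlTestStack(core.Stack):
    def __init__(self, scope: core.Construct, **kwargs) -> None:
        super().__init__(scope, "GraphqlTestStack", **kwargs)

        api = aws_appsync.GraphqlApi(
            self,
            "test_api-",
            name="test_api",
            schema=aws_appsync.Schema.from_asset("resources/schema.graphql"),
        )

        data_source = api.add_none_data_source("ping")
        resolver = data_source.create_resolver(
            type_name="Query",
            field_name="ping",
            request_mapping_template=aws_appsync.MappingTemplate.from_string('{"version": "2018-05-29"}'),
            response_mapping_template=aws_appsync.MappingTemplate.from_string('$util.toJson("pong")'),
        )

        # Find the CfnGraphQLSchema that GraphqlApi created internally and make the
        # data source and resolver depend on it explicitly (rendered as DependsOn).
        cfn_schema = next(
            child for child in api.node.find_all()
            if isinstance(child, aws_appsync.CfnGraphQLSchema)
        )
        data_source.node.add_dependency(cfn_schema)
        resolver.node.add_dependency(cfn_schema)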
