Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support for multiple wasm query planner objects? #210

Closed
dkapadia opened this issue Oct 6, 2020 · 4 comments
Closed

Support for multiple wasm query planner objects? #210

dkapadia opened this issue Oct 6, 2020 · 4 comments

Comments

@dkapadia
Copy link

dkapadia commented Oct 6, 2020

In our application, we instantiate multiple different Apollo Gateway objects, each with slightly different schemas. We do this because we're currently migrating code from one service to another and we want to do side-by-side testing of the old and new implementations. Having two gateway objects allows us to compare the query plans that each produces and to make a side-by-side test whenever we notice a difference between those plans.

After the wasm query planner was introduced @apollo/gateway in v0.20.1, we found that we could no longer do this - it seemed that we always got query plans for whichever gateway was initialized last (see reproduction code below).

I am not a rust expert, but I suspect the issue is here in this file where there currently is just a global SCHEMA and DATA objects that are always reused whenever a gateway object updates its schema.

Given that these variables are setup right now as vectors rather than just global pointers, I'm hoping that y'all are planning to support multiple query planner objects in the future. Is that the case, or is that something you'd be open to accepting a PR for? We'd like to update to the latest query planner since it seems to fix some bugs we're experiencing, but this issue will block us from doing so.

To reproduce the issue here:
const { ApolloGateway } = require("@apollo/gateway");
const { parse } = require("graphql");

const gatewayA = new ApolloGateway({
  localServiceList: [
    { name: "accounts", url: "http://localhost:4001/graphql", typeDefs: parse("type Query { foo: String, bar: String}")},
    { name: "reviews", url: "http://localhost:4002/graphql", typeDefs: parse("type Query { baz: String }")},
  ],
});

const gatewayB = new ApolloGateway({
  localServiceList: [
    { name: "accounts", url: "http://localhost:4001/graphql", typeDefs: parse("type Query { foo: String }")},
    { name: "reviews", url: "http://localhost:4002/graphql", typeDefs: parse("type Query { baz: String, bar: String }")},
  ],
});

// The two schemas are different
console.log(JSON.stringify(gatewayA.schema) == JSON.stringify(gatewayB.schema)) // expected: 'false', actual: 'false'

// However the two gateway objects share a pointer to the same query
// planner wasm object, which means they will always return the same
// query plans (those of gatewayB)
console.log(gatewayA.queryPlannerPointer == gatewayB.queryPlannerPointer) // expected: 'false', actual: 'true'
@KoltesDigital
Copy link

KoltesDigital commented Oct 8, 2020

After upgrading dependencies I can't run my dev env anymore, in which I start multiple services and gateways in the same process. I think the problem may be the same: my system design is not compatible with a global shared schema. I'll try to run them in different processes to prevent this state sharing.

FYI here's the error:

{
  "error": {
    "errors": [
      {
        "message": "unreachable",
        "extensions": {
          "code": "INTERNAL_SERVER_ERROR",
          "exception": {
            "stacktrace": [
              "RuntimeError: unreachable",
              "    at <anonymous>:wasm-function[535]:0xa3fec",
              "    at <anonymous>:wasm-function[703]:0xa94fe",
              "    at <anonymous>:wasm-function[723]:0xa9961",
              "    at <anonymous>:wasm-function[657]:0xa8692",
              "    at <anonymous>:wasm-function[264]:0x8c14b",
              "    at <anonymous>:wasm-function[28]:0x464db",
              "    at <anonymous>:wasm-function[335]:0x94b7d",
              "    at <anonymous>:wasm-function[33]:0x4ab21",
              "    at <anonymous>:wasm-function[575]:0xa5e02",
              "    at getQueryPlan (<anonymous>:wasm-function[391]:0x9a112)",
              "    at Object.module.exports.getQueryPlan (...\\node_modules\\@apollo\\query-planner-wasm\\dist\\index.js:107:24)",
              "    at Object.buildQueryPlan (...\\node_modules\\@apollo\\gateway\\dist\\buildQueryPlan.js:7:33)",
              "    at Object.ApolloGateway.executor (...\\node_modules\\@apollo\\gateway\\dist\\index.js:90:46)",
              "    at processTicksAndRejections (internal/process/task_queues.js:93:5)"
            ]
          }
        }
      }
    ]
  }
}

There are several unreachable!() invocations in the source code, I can't tell which one panics. Although the root cause is the shared state, consider adding a message to each of the invocations in order to have meaningful error messages.

dkapadia added a commit to dkapadia/federation-demo that referenced this issue Oct 12, 2020
I've modified the federation demo code here to reproduce an issue we're
seeing in our monolith. Specifically we have the `accounts` service own
an interface (`Task`) and an implementing type (`TestSectionTask`) and
we have the `reviews` service extend both the interface and the
implemementing type.

The issue we're seeing shows up when we make a query like [1] below:

[1]```
{
  me {
    tasksFromReviews {
      reviewsField
      accountsField
    }
  }
}
```

This returns:
```
{
  "data": {
    "me": {
      "tasksFromReviews": [
        null
      ]
    }
  }
}
```

But we expect it to return:
```
{
  "data": {
    "me": {
      "tasksFromReviews": [
        {
          "reviewsField": "reviews",
          "accountsField": "accounts"
        }
      ]
    }
  }
}
```

which _is_ the data we get when we make a query like [2], which fetches
first from `accounts` and then gets data from `reviews`

[2]

```
{
  me {
    tasks {
      reviewsField
      accountsField
    }
  }
}
```

When we run query [1], I expect:
 1. The gateway to fetch a list of Tasks from `reviews`, including the
    reviewsField
 2. The gateway to send a list of representations to the `_entities`
    query to accounts, in order to fetch the `accountsField` for those
    entities
 3. The gateway to merge the data from the accounts with reviews and
    then return to the end user.

But, what we actually see is that step 2 fails - the gateway sends an
empty list of representations to the `accounts` service.

As best as I can tell, the issue is in the query planner (or in the
interaction between query planner and executor). For query [1],
step 1, it outputs a plan like:
```
  Flatten(path: "me") {
      Fetch(service: "reviews") {
        {
          ... on User {
            __typename
            id
          }
        } =>
        {
          ... on User {
            tasksFromReviews {
              __typename
              ... on TestSectionTask {
                reviewsField
                id
              }
            }
          }
        }
      },
    },
  Flatten(path: "me.tasksFromReviews.@") {
      Fetch(service: "accounts") {
        {
          ... on Task {
            __typename
          }
          ... on TestSectionTask {
            id
          }
        } =>
        {
          ... on Task {
            accountsField
          }
        }
      },
    },
```

The issue (i think) is confusingly the lack of `__typename` selected
from the TestSectionTask type - this [3] bit of execute query plan will
only merge in data if the typename matches.

More confusingly, the issue seems to go away when we upgrade to the
latest version of apollo/gateway, but since that version rewrote the
query planner in rust it's not immediately obvious to me what changed
between the JS and rust implementations. We also cannot upgrade to that
version until the issue described in [4] is fixed.

[3] https://github.com/apollographql/federation/blob/88cb9164341363d908c8a414f879ecb143017688/gateway-js/src/executeQueryPlan.ts#L423-L434
[4] apollographql/federation#210
jaredly added a commit to Khan/federation that referenced this issue Oct 12, 2020
Fixes apollographql#210

I considered using a HashMap, but it's a little more complicated, and I'm assuming we won't have people creating many hundreds of query plans 🙃 if that's a use case we want to support, we could move to HashMaps.
@Enrico2
Copy link
Contributor

Enrico2 commented Oct 13, 2020

@KoltesDigital If you're running multiple gateways in the same process, then yes, it's the same root cause. The issue brought up here is an interesting use case, and we're thinking about the right way to solve this. Stay tuned.

Enrico2 pushed a commit that referenced this issue Oct 28, 2020
This PR makes it so that getting a query planner creates a new planner every time and never overrides it.
We've added a new function named `updatePlannerSchema` that overrides a specific planner.

This hopefully fixes #210
This implementation is suggested instead of #226 to keep things simple.
@Enrico2
Copy link
Contributor

Enrico2 commented Nov 2, 2020

@dkapadia Can you see if #261 solves your issue? There are instructions in the PR for how to test it.

@abernix
Copy link
Member

abernix commented Apr 16, 2021

This should be fixed via #261 (comment), copy-pasted here for completeness:

This should no longer be necessary with the landing of #622 (specifically, the changing of the query planner back to TypeScript). Put another way, the code this changes is no longer present and the query planners are now scoped within the Gateways that generate and execute them.

@abernix abernix closed this as completed Apr 16, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
4 participants