Using IntrospectAndCompose with High Availability Micro Services #1784

Open
Borduhh opened this issue Apr 26, 2022 · 1 comment

Borduhh commented Apr 26, 2022

Expansion of issue #349 (comment)

We are trying to use Apollo Federation with AWS services (i.e., AppSync) and have the following constraints, which might apply to a lot of other companies.

IAM Support for Apollo Studio

We cannot use Apollo Studio because all of our services are created and authenticated using AWS IAM. It would be nice if we could give Apollo Studio an access key ID and secret from an IAM role that would be used to authenticate all of our requests. Right now we do that manually like so:

// Imports assumed for this snippet (AWS SDK v3 signing packages + Apollo Gateway):
import { Sha256 } from '@aws-crypto/sha256-js';
import { defaultProvider } from '@aws-sdk/credential-provider-node';
import { HttpRequest } from '@aws-sdk/protocol-http';
import { SignatureV4 } from '@aws-sdk/signature-v4';
import { GraphQLDataSourceProcessOptions, RemoteGraphQLDataSource } from '@apollo/gateway';
import { GraphQLRequest } from 'apollo-server-types';
import { OutgoingHttpHeader } from 'http';

export default class AuthenticatedDataSource extends RemoteGraphQLDataSource {
  /**
   * Adds the necessary IAM Authorization headers for AppSync requests
   * @param request The request to Authorize
   * @returns The headers to pass through to the request
   */
  private async getAWSCustomHeaders(request: GraphQLRequest): Promise<{
    [key: string]: OutgoingHttpHeader | undefined;
  }> {
    const { http, ...requestWithoutHttp } = request;

    if (!http) return {};

    const url = new URL(http.url);

    // If the graph service is not AppSync, we should not sign the request.
    if (!url.host.match(/appsync-api/)) return {};

    const httpRequest = new HttpRequest({
      hostname: url.hostname,
      path: url.pathname,
      method: 'POST',
      headers: {
        Host: url.host,
        'Content-Type': 'application/json',
      },
      body: JSON.stringify(requestWithoutHttp),
    });

    const signer = new SignatureV4({
      region: 'us-east-1',
      credentials: defaultProvider(),
      service: 'appsync',
      sha256: Sha256,
    });

    const signedRequest = await signer.sign(httpRequest);

    return signedRequest.headers || {};
  }

  /**
   * Customize the request to AppSync
   * @param options The options to send with the request
   */
  public async willSendRequest({ request }: GraphQLDataSourceProcessOptions) {
    const customHeaders = await this.getAWSCustomHeaders(request);

    if (customHeaders)
      Object.keys(customHeaders).forEach((h) => {
        request.http?.headers.set(h, customHeaders[h] as string);
      });
  }
}
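
For reference, here is how this data source gets wired into the gateway (a minimal sketch; the subgraph names and AppSync URLs below are placeholders):

import { ApolloGateway, IntrospectAndCompose } from '@apollo/gateway';
import AuthenticatedDataSource from './AuthenticatedDataSource';

const gateway = new ApolloGateway({
  supergraphSdl: new IntrospectAndCompose({
    subgraphs: [
      // Placeholder subgraph list; the real URLs point at our AppSync APIs.
      { name: 'products', url: 'https://example1.appsync-api.us-east-1.amazonaws.com/graphql' },
      { name: 'reviews', url: 'https://example2.appsync-api.us-east-1.amazonaws.com/graphql' },
    ],
  }),
  // Route every subgraph request through the signing data source above.
  buildService({ url }) {
    return new AuthenticatedDataSource({ url });
  },
});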

IntrospectAndCompose is all or nothing

Right now, the IntrospectAndCompose.initialize() method fails completely if even one service has a network timeout, which makes it almost impossible to use in production scenarios. Every service we add to our gateway increases the likelihood of a network error that cancels the entire process, inevitably causing downtime or CI/CD failures.

To solve this, loadServicesFromRemoteEndpoint() could handle schema fetching on a per-service basis, for example by wrapping dataSource.process() with a retry counter and retrying 5xx errors. That way the user can choose how many retries to allow before IntrospectAndCompose fails altogether and rolls back.

Right now we are manually adding retries around the entirety of IntrospectAndCompose, but as we add more services this becomes really inefficient (e.g., if we have 150 services and service 148 fails, we still need to re-fetch services 1 through 147 on the next attempt).
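
As a rough sketch of the per-service idea (the helper below is hypothetical and not part of @apollo/gateway; it assumes Node 18+ for the global fetch):

// Hypothetical per-subgraph SDL fetch with a retry budget for transient failures.
async function fetchSubgraphSdl(url: string, maxRetries = 5): Promise<string> {
  let lastError: unknown;

  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    let response: Response;
    try {
      response = await fetch(url, {
        method: 'POST',
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify({ query: '{ _service { sdl } }' }),
      });
    } catch (err) {
      // Network errors and timeouts count against the retry budget.
      lastError = err;
      continue;
    }

    if (response.ok) {
      const { data } = await response.json();
      return data._service.sdl;
    }

    if (response.status < 500) {
      // 4xx responses are not transient, so fail immediately.
      throw new Error(`Subgraph ${url} returned ${response.status}`);
    }

    // 5xx: record the error and retry.
    lastError = new Error(`Subgraph ${url} returned ${response.status}`);
  }

  throw lastError;
}

With something like this inside loadServicesFromRemoteEndpoint(), each subgraph retries independently, so one flaky service no longer forces a re-fetch of everything else.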

Central Caching Schema Files

This isn't something that necessarily needs to be done by Apollo, but it is something that is required for microservices. Our team currently uses S3 to cache a schema file, since in our case we can be relatively confident that it will not change without the services being redeployed. The first (and occasionally second) ECS container that comes online builds its own schema using IntrospectAndCompose and then stores the cached file with a unique per-deployment ID that other containers can use to fetch the cached schema when they scale up.
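
Roughly, the startup path looks like the sketch below. The bucket name, the DEPLOYMENT_ID environment variable, and the use of the gateway's onSchemaLoadOrUpdate hook to capture the composed schema are illustrative, not a drop-in for our actual setup:

import { ApolloGateway, IntrospectAndCompose, ServiceEndpointDefinition } from '@apollo/gateway';
import { GetObjectCommand, PutObjectCommand, S3Client } from '@aws-sdk/client-s3';
import AuthenticatedDataSource from './AuthenticatedDataSource';

const s3 = new S3Client({ region: 'us-east-1' });
const Bucket = 'our-schema-cache';                             // placeholder bucket
const Key = `supergraph/${process.env.DEPLOYMENT_ID}.graphql`; // unique per deployment

// Same subgraph list as the gateway example above (URLs are placeholders).
const subgraphs: ServiceEndpointDefinition[] = [
  { name: 'products', url: 'https://example1.appsync-api.us-east-1.amazonaws.com/graphql' },
];

const buildService = ({ url }: ServiceEndpointDefinition) => new AuthenticatedDataSource({ url });

async function getCachedSupergraph(): Promise<string | undefined> {
  try {
    const { Body } = await s3.send(new GetObjectCommand({ Bucket, Key }));
    return await Body?.transformToString();
  } catch {
    return undefined; // no cached schema yet: this is the first container for this deployment
  }
}

export async function buildGateway() {
  const cached = await getCachedSupergraph();

  const gateway = cached
    ? new ApolloGateway({ supergraphSdl: cached, buildService })
    : new ApolloGateway({ supergraphSdl: new IntrospectAndCompose({ subgraphs }), buildService });

  if (!cached) {
    // First container: persist the composed supergraph so later containers can skip composition.
    gateway.onSchemaLoadOrUpdate(({ coreSupergraphSdl }) => {
      s3.send(new PutObjectCommand({ Bucket, Key, Body: coreSupergraphSdl })).catch(console.error);
    });
  }

  return gateway;
}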


prasek commented Sep 26, 2022

Hi @Borduhh 👋 Re: IntrospectAndCompose and centrally caching schema files - we generally recommend shifting left on composition: get it out of each gateway's runtime in production and move it into your build pipeline to generate a single static supergraph schema that can be deployed to each gateway. This helps with a variety of things, as described below.

In general, once a given subgraph (fleet) is available to serve an updated schema, it's published to the schema registry using rover subgraph publish, which can accept the output of rover subgraph introspect.

This can be done with a form of rover subgraph introspect | rover subgraph publish from:

  • a CI/CD pipeline deployment script
  • a post-deploy analysis job - e.g., if using something like Argo Rollouts
  • manually, after your subgraph deployment is complete, if you're not using automated deployments

See the Federation docs for details

⚠️ We strongly recommend against using IntrospectAndCompose in production. For details, see Limitations of IntrospectAndCompose.

The IntrospectAndCompose option can sometimes be helpful for local development, but it's strongly discouraged for any other environment. Here are some reasons why:

  • Composition might fail. With IntrospectAndCompose, your gateway performs composition dynamically on startup, which requires network communication with each subgraph. If composition fails, your gateway throws errors and experiences unplanned downtime.
    With the static or dynamic supergraphSdl configuration, you instead provide a supergraph schema that has already been composed successfully. This prevents composition errors and enables faster startup.
  • Gateway instances might differ. If you deploy multiple instances of your gateway while deploying updates to your subgraphs, your gateway instances might fetch different schemas from the same subgraph. This can result in sporadic composition failures or inconsistent supergraph schemas between instances.
    When you deploy multiple instances with supergraphSdl, you provide the exact same static artifact to each instance, enabling more predictable behavior.
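
For illustration, assuming the supergraph schema is composed in CI (e.g. with rover supergraph compose) and shipped next to the gateway as ./supergraph.graphql, the static configuration is just:

import { readFileSync } from 'fs';
import { ApolloGateway } from '@apollo/gateway';

// Load the supergraph schema that was composed at build time.
const supergraphSdl = readFileSync('./supergraph.graphql', 'utf-8');

const gateway = new ApolloGateway({ supergraphSdl });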

What you have with the Gateway willSendRequest looks right, and you could do something similar in your pipeline deployment script, generating the AWS SigV4 headers and passing them to rover subgraph introspect --header.
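
As a sketch of that pipeline step: because the SigV4 signature has to cover the exact request body, one variation is to sign and send the { _service { sdl } } fetch yourself and pipe the result to rover subgraph publish --schema - instead of passing pre-generated headers to rover subgraph introspect. The graph ref, subgraph name, and region below are placeholders, and Node 18+ is assumed for the global fetch:

import { execFileSync } from 'child_process';
import { Sha256 } from '@aws-crypto/sha256-js';
import { defaultProvider } from '@aws-sdk/credential-provider-node';
import { HttpRequest } from '@aws-sdk/protocol-http';
import { SignatureV4 } from '@aws-sdk/signature-v4';

async function publishSubgraph(subgraphUrl: string) {
  const url = new URL(subgraphUrl);
  const body = JSON.stringify({ query: '{ _service { sdl } }' });

  // Sign the SDL fetch the same way the gateway data source above signs its requests.
  const signer = new SignatureV4({
    region: 'us-east-1',
    credentials: defaultProvider(),
    service: 'appsync',
    sha256: Sha256,
  });

  const signed = await signer.sign(
    new HttpRequest({
      hostname: url.hostname,
      path: url.pathname,
      method: 'POST',
      headers: { Host: url.host, 'Content-Type': 'application/json' },
      body,
    }),
  );

  const response = await fetch(subgraphUrl, {
    method: 'POST',
    headers: signed.headers,
    body,
  });
  const { data } = await response.json();

  // Publish the fetched SDL to the registry via stdin.
  // (A --routing-url flag may also be needed the first time a subgraph is published.)
  execFileSync(
    'rover',
    ['subgraph', 'publish', 'my-graph@current', '--name', 'products', '--schema', '-'],
    { input: data._service.sdl, stdio: ['pipe', 'inherit', 'inherit'] },
  );
}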

