[SQS performance] High CPU usage #4471

Open
dungvv-qsoft opened this issue Jul 27, 2023 · 0 comments
Labels: bug (this issue is a bug), investigating (issue has been looked at and needs deep dive work by OSDS), p2 (standard priority issue)

Describe the bug

When we enable the SQS consumers on our server, the process uses 100% of the available CPU, compared with around 60% when the queue is turned off (under the same traffic).

We ran a profiler:

On a local server, to generate flame graphs

With the queue enabled, the profile shows a much higher initial spike in resource usage:
image

With the queue disabled, the initial spike is noticeably lower than with the queue enabled:
image

Evidence: .cpuprofile and .text report files for both cases (queue on and queue off during the stress test):
https://github.com/dungvv-qsoft/cpu-profile

On our production server

Queue on (CPU/RAM usage graph, from 3:20 to 3:30):
Maximum average CPU usage is almost 100%
image

Queue off (CPU/RAM usage graph, from 4:15 to 4:30):
Maximum average CPU usage is around 60%
image

Environment
Node version: v16.20.1
AWS-SDK version: v2.1106.0
SQS-Consumer version: v5.7.0

Snippet to initialize SQS consumer

export const sqsProvider = {
  provide: SQS_PROVIDER,
  useFactory: (): SQS => {
    const sqs = new SQS(getOptionsForAWSService(EAWSService.SQS));
    return sqs;
  },
};

....

export const getOptionsForAWSService = (service: EAWSService) => {
  return {
    credentials: new ChainableTemporaryCredentials({
      params: {
        RoleArn: config.get<string>(`aws.${service}.roleArn`),
        RoleSessionName: `zaapi-chat-${service}-${uuidV4()}`,
      },
      masterCredentials: new Credentials({
        accessKeyId: config.get('aws.accessKeyId'),
        secretAccessKey: config.get('aws.secretAccessKey'),
      }),
    }),
    region: config.get<string>('aws.region'),
    httpOptions: {
      agent: new https.Agent({
        keepAlive: true,
        maxSockets: config.get<number>('aws.maxSockets'),
      }),
    },
  };
};
constructor(
    @Inject(SQS_PROVIDER)
    private readonly sqs: SQS,
    private readonly sagaConsumer: SagaConsumer,
  ) {
    const queueConfigMap: Record<EChatChannel | string, string> = {
      [EChatChannel.LINE]: 'aws.sqs.queues.lineWebhookQueueFifo.url',
      [EChatChannel.FACEBOOK]: 'aws.sqs.queues.fbWebhookQueueFifo.url',
      [EChatChannel.INSTAGRAM]: 'aws.sqs.queues.instaWebhookQueueFifo.url',
      [EChatChannel.SHOPEE]: 'aws.sqs.queues.shopeeWebhookQueueFifo.url',
      [EChatChannel.LAZADA]: 'aws.sqs.queues.lazadaWebhookQueueFifo.url',
      [INACTIVE_STORE_SQS_INSTANCE_NAME]:
        'aws.sqs.queues.inactiveStoreWebhookQueue.url',
    };
    Object.entries(queueConfigMap).forEach(([channel, configKey]) => {
      this.queueUrls.set(channel, config.get<string>(configKey));
    });

    const baseSqsConfig = {
      handleMessageBatch: async (messages: SQS.Message[]) => {
        const jobGroups = _.groupBy(
          messages,
          (msg) => msg.MessageAttributes?.['Name']?.StringValue || '',
        );
        await Promise.all(
          Object.entries(jobGroups).map(async ([jobName, events]) => {
            await this.sagaConsumer.consume(
              {
                name: WEBHOOK_EVENTS_RECEIVER_SAGA_NAME,
                jobName,
              },
              {
                events: events.map((event) => tryParseStringToJson(event.Body)),
              },
            );
          }),
        );
      },
      messageAttributeNames: ['Name'],
      batchSize: 10,
    };
    if (WebhookReceiverSqsDuplex.webhookReceiverQueueEnabled) {
      this.queueUrls.forEach((queueUrl, name) => {
        const sqsConsumer = this.createSqsConsumer(queueUrl, baseSqsConfig);
        this.sqsConsumers.set(name, sqsConsumer);
      });
    }
  }

  private createSqsConsumer(queueUrl: string, baseConfig: any): SQSConsumer {
    const sqsConsumer = SQSConsumer.create({
      queueUrl,
      sqs: this.sqs,
      ...baseConfig,
      pollingWaitTimeMs: config.get<number>('aws.sqs.pollingWaitTimeMs'),
    });

    sqsConsumer.on('error', (error) => {
      errorLog(error, { queueUrl }, `Error on Webhook Receiver SQS`);
    });

    return sqsConsumer;
  }

  onApplicationBootstrap() {
    this.sqsConsumers.forEach((sqsConsumer) => sqsConsumer.start());
  }

  beforeApplicationShutdown() {
    this.sqsConsumers.forEach((sqsConsumer) => sqsConsumer.stop());
  }
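
The grouping step inside handleMessageBatch can be isolated and exercised without SQS, which may help rule it in or out when reading the profiles. The following is only a minimal sketch: `QueueMessage` is a hypothetical, simplified stand-in for `SQS.Message`, and `groupByJobName` is a hypothetical helper name. It uses a plain object instead of lodash and puts messages without a `Name` attribute under the empty job name.

```typescript
// Hypothetical, simplified stand-in for SQS.Message (only the fields used here).
interface QueueMessage {
  Body?: string;
  MessageAttributes?: Record<string, { StringValue?: string } | undefined>;
}

// Groups a batch by the 'Name' message attribute, similar in spirit to the
// _.groupBy call in handleMessageBatch. Messages without the attribute are
// collected under the empty string.
function groupByJobName(
  messages: QueueMessage[],
): Record<string, QueueMessage[]> {
  // Prototype-less object so attribute values such as 'constructor' are safe keys.
  const groups: Record<string, QueueMessage[]> = Object.create(null);
  for (const msg of messages) {
    const jobName = msg.MessageAttributes?.['Name']?.StringValue || '';
    const bucket = groups[jobName] ?? (groups[jobName] = []);
    bucket.push(msg);
  }
  return groups;
}

// Illustrative batch only; the job names are made up for the example.
const batch: QueueMessage[] = [
  { Body: '{"id":1}', MessageAttributes: { Name: { StringValue: 'line' } } },
  { Body: '{"id":2}', MessageAttributes: { Name: { StringValue: 'facebook' } } },
  { Body: '{"id":3}', MessageAttributes: { Name: { StringValue: 'line' } } },
  { Body: '{"id":4}' }, // no attributes at all
  { Body: '{"id":5}', MessageAttributes: {} }, // attributes present, no Name
];
const grouped = groupByJobName(batch);
```

Running a large synthetic batch through a helper like this under the profiler gives a rough idea of how much of the CPU time belongs to grouping and parsing rather than to polling.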

Expected Behavior

With the queue enabled, CPU usage should not be significantly higher than with the queue turned off (under the same traffic).

Current Behavior

When we enable the SQS consumers on our server, the process uses 100% of the available CPU, compared with around 60% when the queue is turned off (under the same traffic).

Reproduction Steps

Run a stress test against an API that uses the sqs-consumer library. CPU usage increases significantly (it is not nearly as high when the queue is turned off).
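
To compare the two runs with the same yardstick, process-level CPU can be sampled from inside the service while the stress test is running. This is a minimal sketch, assuming a hypothetical helper `cpuPercent` that takes a CPU-time delta in microseconds (the unit Node.js reports from `process.cpuUsage()`) and the wall-clock interval in milliseconds. The sample numbers are illustrative only, not measurements from this issue.

```typescript
// Same shape as the object returned by Node's process.cpuUsage():
// CPU time in microseconds, split into user and system time.
interface CpuTimes {
  user: number;
  system: number;
}

// Hypothetical helper: CPU used over the interval, as a percentage of one core.
function cpuPercent(delta: CpuTimes, elapsedMs: number): number {
  if (elapsedMs <= 0) {
    throw new RangeError('elapsedMs must be positive');
  }
  const cpuMs = (delta.user + delta.system) / 1000; // microseconds -> milliseconds
  return (cpuMs / elapsedMs) * 100;
}

// Illustrative numbers only: 450 ms of user time plus 150 ms of system time
// spent during a 1000 ms window.
const sample = cpuPercent({ user: 450000, system: 150000 }, 1000);
console.log(sample.toFixed(1)); // prints "60.0"
```

In the running service the delta can come from `process.cpuUsage(previous)`, which returns the difference from an earlier reading, taken on a timer once with the queue on and once with it off. Values above 100 are possible when the process uses more than one core.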

Possible Solution

No response

Additional Information/Context

No response

SDK version used

v2.1106.0

Environment details (OS name and version, etc.)

Fargate

dungvv-qsoft added the bug and needs-triage labels on Jul 27, 2023
ajredniwja added the p2 label on Sep 25, 2023
ajredniwja added the investigating label and removed the needs-triage label on Oct 9, 2023