
Kafka rebalancing hangs when using Message pattern in microservice #12355

Open
3 of 15 tasks
smuschevich opened this issue Sep 6, 2023 · 2 comments


smuschevich commented Sep 6, 2023

Is there an existing issue for this?

  • I have searched the existing issues

Current behavior

After a rollout in k8s, one consumer is left without an assigned topic partition.

Precondition: the microservice with the Kafka client must run with 2+ replicas.

Minimum reproduction code

if (assignment[assignee][topic].length === 0) {

Steps to reproduce

  1. Create a Nest microservice with ClientKafka configured to communicate via MessagePattern.
  2. Create a topic with 2 partitions in Kafka.
  3. Subscribe to the topic via the subscribeToResponseOf() method.
  4. Run two instances of the microservice.
  5. Nest creates a “reply” topic for receiving responses (increase its number of partitions to 2 if the reply topic has only one).
  6. The consumer of the first microservice is assigned to partition 0, and the consumer of the second one to partition 1.
  7. Run a third instance. Nest leaves its consumer without an assigned partition (because there are only 2).
  8. Shut down the first microservice.
  9. Nest starts rebalancing using KafkaReplyPartitionAssigner.

Result: the consumer of the second microservice is assigned to two partitions, and the consumer of the third one to zero partitions. Because of that, rebalancing is triggered again and again, once every rebalanceTimeout period.

The cause of this behavior is the logic that tries to retain previous assignments. The previous assignment of the 3rd consumer arrives with the value null, and Nest re-assigns that null to the same consumer. Its partition list is therefore not empty, so the consumer never receives a real partition.
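To make the failure mode concrete, here is a simplified, hypothetical model of the retain-then-fill logic. It is not the actual KafkaReplyPartitionAssigner source: the function name assignReplyTopic, the member ids consumer-2/consumer-3, the topic name reply, and the round-robin handling of leftover partitions are all assumptions made for illustration. It feeds in the state after step 8 (consumer-2 previously on partition 1, consumer-3 reporting null).

```typescript
// Hypothetical model of the retain-then-fill assignment logic; NOT the Nest source.
type Assignment = Record<string, Record<string, (number | null)[]>>;

function assignReplyTopic(
  topic: string,
  partitions: number[],
  previous: Record<string, Record<string, number | null>>,
): Assignment {
  const memberIds = Object.keys(previous).sort();
  const unassigned = [...partitions];
  const assignment: Assignment = {};

  // Pass 1: retain each member's previous partition, if it reported one.
  for (const id of memberIds) {
    assignment[id] = { [topic]: [] };
    if (topic in previous[id]) {
      const prev = previous[id][topic];
      // A null previous value is kept as-is, so the list becomes [null].
      assignment[id][topic].push(prev);
      const idx = prev === null ? -1 : unassigned.indexOf(prev);
      if (idx !== -1) unassigned.splice(idx, 1);
    }
  }

  // Pass 2: the `length === 0` check; [null] has length 1 and is skipped.
  for (const id of memberIds) {
    if (assignment[id][topic].length === 0 && unassigned.length > 0) {
      assignment[id][topic].push(unassigned.shift()!);
    }
  }

  // Pass 3 (assumption for this model): leftovers are handed out round-robin.
  unassigned.forEach((p, i) => {
    assignment[memberIds[i % memberIds.length]][topic].push(p);
  });
  return assignment;
}

const result = assignReplyTopic("reply", [0, 1], {
  "consumer-2": { reply: 1 },
  "consumer-3": { reply: null },
});
console.log(JSON.stringify(result));
// {"consumer-2":{"reply":[1,0]},"consumer-3":{"reply":[null]}}
```

In this model, because consumer-3's list is [null] rather than [], the length === 0 pass skips it and the free partition 0 ends up with consumer-2, matching the result described above.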

As far as I understand, the condition on this line needs to be improved so that it also checks for null assignments:

if (assignment[assignee][topic].length === 0) {
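A minimal sketch of the stricter condition, assuming a member's list can contain null. The helper names hasNoRealPartition and fillMembersWithoutPartition are hypothetical, and this is not the actual NestJS patch; it only shows that treating a list of nulls like an empty list hands the free partition to the third consumer.

```typescript
// Hypothetical sketch of the stricter check; not the actual Nest patch.
type Assignment = Record<string, Record<string, (number | null)[]>>;

// Original condition: assignment[assignee][topic].length === 0
// Proposed: also treat a list holding only null entries as "no partition".
function hasNoRealPartition(partitions: (number | null)[] | undefined): boolean {
  // [].every(...) is true, so an empty list still qualifies, as before.
  return (partitions ?? []).every((p) => p === null);
}

// Give each member without a real partition one of the remaining free partitions.
function fillMembersWithoutPartition(
  assignment: Assignment,
  topic: string,
  unassigned: number[],
): void {
  for (const assignee of Object.keys(assignment).sort()) {
    if (hasNoRealPartition(assignment[assignee][topic]) && unassigned.length > 0) {
      // Replace the list so a leftover null does not sit next to the new partition.
      assignment[assignee][topic] = [unassigned.shift()!];
    }
  }
}

// State after retaining previous assignments in the scenario from this issue.
const state: Assignment = {
  "consumer-2": { reply: [1] },
  "consumer-3": { reply: [null] },
};
const free = [0];
fillMembersWithoutPartition(state, "reply", free);
console.log(JSON.stringify(state));
// {"consumer-2":{"reply":[1]},"consumer-3":{"reply":[0]}}
```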

Expected behavior

All consumers are assigned to at least one partition after the rebalancing.

Package

  • I don't know. Or some 3rd-party package
  • @nestjs/common
  • @nestjs/core
  • @nestjs/microservices
  • @nestjs/platform-express
  • @nestjs/platform-fastify
  • @nestjs/platform-socket.io
  • @nestjs/platform-ws
  • @nestjs/testing
  • @nestjs/websockets
  • Other (see below)

Other package

No response

NestJS version

9 (but I suspect it applies to 10 as well)

Packages versions

Node.js version

No response

In which operating systems have you tested?

  • macOS
  • Windows
  • Linux

Other

No response

smuschevich added the needs triage label (This issue has not been looked into) on Sep 6, 2023
kamilmysliwiec (Member) commented:

Would you like to create a PR for this issue?

micalevisk removed the needs triage label on Sep 18, 2023
jaime-amate commented:

Would the issue be fixed if null values were filtered out of the assignment[assignee][topic] list?
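A sketch of this filtering approach, under the same assumption that the third consumer's previous assignment decodes to [null]. The function name retainPreviousPartitions is hypothetical; the idea is to drop null entries at the point where previous partitions are retained, so that the existing length === 0 check works unchanged.

```typescript
// Hypothetical sketch of the filtering idea; not actual Nest code.
// Drop null entries while retaining a member's previous partitions so that the
// unchanged `length === 0` check recognizes the member as having no partition.
function retainPreviousPartitions(previous: (number | null)[] | undefined): number[] {
  return (previous ?? []).filter((p): p is number => p !== null);
}

const retainedForThirdConsumer = retainPreviousPartitions([null]);
console.log(retainedForThirdConsumer.length === 0); // true: the original check now fires
console.log(JSON.stringify(retainPreviousPartitions([1]))); // [1]
```

Compared with changing the condition itself, filtering in this sketch keeps null out of the retained list entirely, so later steps never see it.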

Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants