Handle partition reassignment #2

Merged
6 commits merged on Mar 8, 2022

Conversation

@Nevon Nevon commented Mar 7, 2022

Attempt at fixing tulios#1258 (comment)

src/consumer/worker.js (outdated, resolved)
src/consumer/fetchManager.js (outdated, resolved)
src/consumer/worker.spec.js (outdated, resolved)
})
continue
}
partitionAssignments.set(key, workerId)
Owner

A batch can be in the queue, while it's not being processed.
In that case, a fetcher can still add a duplicate batch to the queue.

I would suggest keeping the partition assignments in the fetch manager, but doing something similar in the fetcher instead: assign partitions from the fetch response to a nodeId(?), and for each fetch request, skip any partitions that are already in progress or in the queue.
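
To illustrate the point about queued batches, here is a minimal sketch of tracking a topic-partition from the moment its batch is enqueued rather than from the moment a worker picks it up. `InFlightPartitions`, `markQueued`, `markDone`, and `fetchable` are hypothetical names made up for this example and are not taken from the PR.

```javascript
// Hypothetical helper, not part of the PR: tracks every topic-partition that
// has a batch either waiting in the queue or currently being processed.
class InFlightPartitions {
  constructor() {
    this.keys = new Set()
  }

  static keyOf({ topic, partition }) {
    return `${topic}|${partition}`
  }

  // Call when a batch is enqueued, not when a worker starts processing it.
  markQueued(batch) {
    this.keys.add(InFlightPartitions.keyOf(batch))
  }

  // Call once the batch has been fully processed.
  markDone(batch) {
    this.keys.delete(InFlightPartitions.keyOf(batch))
  }

  // Keep only the partitions that are safe to include in the next fetch.
  fetchable(partitions) {
    return partitions.filter(p => !this.keys.has(InFlightPartitions.keyOf(p)))
  }
}

const inFlight = new InFlightPartitions()
inFlight.markQueued({ topic: 'orders', partition: 0 })

// Partition 0 is queued but not yet being processed; it is still excluded.
const next = inFlight.fetchable([
  { topic: 'orders', partition: 0 },
  { topic: 'orders', partition: 1 },
])
console.log(next)
```

Marking at enqueue time is what closes the gap described above: a batch that is waiting in the queue already blocks a second fetch for the same topic-partition.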

Author

Right, I'll give this a crack and see if I can make it work.

Author

Moved it up to the fetch manager/fetcher now, to make sure that only one fetcher can push batches for a topic-partition at any given time. I removed the filtering from the worker, as I don't believe it should be necessary anymore.

Nevon and others added 4 commits March 8, 2022 07:46
Co-authored-by: Priit Käärd <priitkaard123@gmail.com>
If a partition is reassigned to a different broker, it's
possible that a fetcher still has a batch for that
topic-partition in the worker queue when the fetcher for the
new broker fetches it again. This would cause double
processing. To avoid this, the fetch manager now keeps track
of which fetcher is currently handling each topic-partition,
and the fetchers filter out batches for any topic-partition
that is currently being handled by a different fetcher.

Removes batch filtering from the worker; the filtering now
happens in the fetch manager.
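
The ownership tracking described in the commit message could look roughly like the sketch below. This is not the code from the PR: `PartitionOwnership`, `claim`, and `release` are hypothetical names, and the sketch assumes each fetcher is identified by a broker node id and releases a topic-partition once its batch has been processed.

```javascript
// Hypothetical sketch, not the PR's implementation: the fetch manager owns a
// map from topic-partition to the id of the fetcher currently handling it.
class PartitionOwnership {
  constructor() {
    this.owners = new Map()
  }

  static keyOf({ topic, partition }) {
    return `${topic}:${partition}`
  }

  // Returns only the batches that `fetcherId` is allowed to push to the queue.
  // A batch is dropped when another fetcher already owns its topic-partition.
  claim(fetcherId, batches) {
    return batches.filter(batch => {
      const key = PartitionOwnership.keyOf(batch)
      const owner = this.owners.get(key)
      if (owner !== undefined && owner !== fetcherId) {
        return false
      }
      this.owners.set(key, fetcherId)
      return true
    })
  }

  // Gives up ownership once the batch is processed (assumed lifecycle).
  release(fetcherId, batch) {
    const key = PartitionOwnership.keyOf(batch)
    if (this.owners.get(key) === fetcherId) {
      this.owners.delete(key)
    }
  }
}

const ownership = new PartitionOwnership()
const batch = { topic: 'orders', partition: 0 }

// The fetcher for broker 1 claims the partition first.
const pushedByOld = ownership.claim(1, [batch])
// After reassignment, the fetcher for broker 2 fetches the same partition.
const pushedByNew = ownership.claim(2, [batch])
// Once broker 1's batch is processed, broker 2 may take over.
ownership.release(1, batch)
const pushedAfterRelease = ownership.claim(2, [batch])

console.log(pushedByOld.length, pushedByNew.length, pushedAfterRelease.length)
```

Keeping the map in one place (the fetch manager) means the fetchers never have to coordinate with each other directly; each one only asks which of its batches it may push.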
@priitkaard priitkaard merged commit 6ea295c into priitkaard:master Mar 8, 2022
@Nevon Nevon deleted the handle-partition-reassignment branch March 8, 2022 09:12
2 participants