Improve performance by replacing reduce with spread with forEach #1436

Merged
merged 1 commit into tulios:master on Aug 16, 2022

Conversation

@AlexBrohshtut (Contributor) commented on Aug 15, 2022

We are using KafkaJS with a cluster that has a lot of topics (thousands of them). We want to migrate to KafkaJS 2, but unfortunately there is a very heavy computation happening in the getActiveTopicPartitions function (see screenshot).

The issue is caused by using spread inside reduce, which gives the loop O(n^2) complexity. With 12k topics, the current implementation takes 40 seconds to compute the result of this function, while the proposed implementation takes only 20 milliseconds.

[screenshot: profiler trace showing the getActiveTopicPartitions hotspot]

See this article for a more thorough explanation.
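
To illustrate, here is a rough sketch of the two patterns (simplified sample data, not the actual KafkaJS code):

const activeTopics = [
  { topic: 'topic-a', partitions: [0, 1] },
  { topic: 'topic-b', partitions: [0] },
]

// Quadratic: the spread copies every key accumulated so far into a brand-new object on each iteration
const slow = activeTopics.reduce(
  (acc, { topic, partitions }) => ({ ...acc, [topic]: new Set(partitions) }),
  {}
)

// Linear: mutate a single result object instead of copying it every time
const fast = {}
activeTopics.forEach(({ topic, partitions }) => {
  fast[topic] = new Set(partitions)
})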

This is a showstopper for our upgrade.

Also, other places in the code have similar behavior. They are less critical for the flow we tested, but they could also be changed to forEach. WDYT?

Thank you.

@Nevon (Collaborator) commented on Aug 16, 2022

Thanks for the contribution!

Indeed, this pattern is very inefficient. I think there was probably a baked-in assumption that n would never be large enough for it to make a difference, but with 12k topics it certainly does. We've generally taken the approach of writing code in whatever way we find most expressive and then optimizing when something is actually shown to be a bottleneck, as you've done here.

Just to make sure I'm not missing something: the issue isn't really with the usage of reduce itself, but rather with iterating over the properties of the reducer callback's previousValue and copying them into a new object on every iteration? In other words, the following would be equivalent to your forEach-and-append solution from a performance perspective:

this.subscriptionState
  .active()
  .reduce((acc, { topic, partitions }) => {
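    // mutate and return the same accumulator object; no per-iteration copy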
    acc[topic] = new Set(partitions)
    return acc
  }, {})

EDIT: At least in Chrome, forEach and reduce without spreading seem to be identical from a performance standpoint. A for-loop seems to be a fair bit faster.

[screenshot: benchmark comparison of forEach, reduce without spread, and a for-loop]
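
For comparison, a plain for-loop version of the same accumulation would look roughly like this (just a sketch for illustration, not the change that was merged; the result variable name is made up):

const active = this.subscriptionState.active() // same input as the reduce version above
const activeTopicPartitions = {} // illustrative name
for (let i = 0; i < active.length; i++) {
  const { topic, partitions } = active[i]
  activeTopicPartitions[topic] = new Set(partitions)
}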

@Nevon merged commit 1ab72f2 into tulios:master on Aug 16, 2022
@Nevon (Collaborator) commented on Aug 16, 2022

> Also, other places in the code have similar behavior. They are less critical for the flow we tested, but they could also be changed to forEach. WDYT?

I'm open to it when the input size is not bounded. There's no point changing it when the input is fixed (for example iterating over the protocol API keys, where we know N is going to be around 30), but it can definitely be a good idea when we've assumed that N will be reasonably small and not considered that there might be cases where it won't be, like in this case.

One of my co-maintainers put it to me as something along the lines of: "It doesn't matter if you make a single cog in a machine spin really fast, it won't make the machine faster", but speeding up the slowest cog definitely speeds up the machine - so we just gotta find the slowest cogs.

@AlexBrohshtut (Contributor, Author) commented:

Thank you for your quick response.
Yes, it is object recreation that makes this reduce slow - not the reduce itself.
I will try to use the improved version and see if there are more bottlenecks.
When do you plan to release it?

@Nevon (Collaborator) commented on Aug 16, 2022

> When do you plan to release it?

As soon as #1437 is green

@AlexBrohshtut (Contributor, Author) commented:

Great. I see some other places that have performance issues for a similar reason (the number of topics), so I will have a look at those as well.
Thank you🙏
