[wip] Cumulative micro opts (1-2% perf. gain) #1092

ryoqun · 2024-03-08T07:28:23Z

Well, I REALLY want my other prs to be merged (or decided to be rejected, so that i can decide to fork crossbeam for my work....). All of them got stale for long time for some reason.

So, I'm wondering whether it could help by actively contributing to crossbeam or not, in order to indicate my willingness to spend my time in exchange of my prs got being reviewed. :)

here it is. Assorted of micro ops, i came up with. note that I haven't evaluated each in depth. This pr is to start triage them from more knowledgeable people. so that I can create more serious prs for each.

General theme of them is to reduce instructions.

(note to self):

inject to-be-dropped buffer from receiver to sender
new with at-the-end of index so that initial alloc can happen even with non-option buffer

taiki-e · 2024-05-02T19:02:17Z

So, I'm wondering whether it could help by actively contributing to crossbeam or not, in order to indicate my willingness to spend my time in exchange of my prs got being reviewed. :)

I appreciate the contribution, but a patch that mixes various different-purpose changes without any particular explanation of each change is also the kind of patch that takes time to review...

As for the changes:

Subword (8-bit and 16-bit) atomic can be very inefficient on platforms like riscv, and benchmarks that actually use it in the unbounded queue have shown that it doesn't perform that well even on x86_64 (Add benchmarks for unbounded queues smol-rs/concurrent-queue#13). I think atomic of the same size as c_int would probably be ok, but it needs a separate benchmark anyway.
Separation of slots to states and msgs may increase the chance of encountering false sharing (Improvement on cache invalidation #462). I think it is possible that it could be handled well in the way described in https://arxiv.org/pdf/1908.04511, but it needs a separate benchmark anyway.
At this point I realize that there is no information on how the 1-2% gain was actually measured. For example, if the result of all this change is as little as a 1% improvement in a single microbenchmark, the worth of the improvement relative to the complexity of the change is questionable.
assume_init_drop -> ptr::read + drop is expected to degrade performance when messages are relatively large due to the extra moves.
Most of the other changes seem to be code shuffling, but it was not clear from what little I saw what exactly you were trying to improve.

If this PR's purpose is to help merge your other PRs, I honestly think it would be better to add tests to complete them. It is not easy to review PRs that add new APIs but have no tests (and there is no possibility of such PRs being merged as is).

ryoqun · 2024-05-04T13:15:48Z

(thanks for various feedback; I'll appreciate very much; I'll reply to these later. focusing on #1040 for now)

[wip] Cumulative micro opts (1-2% perf. gain)

bf1ff3e

ryoqun force-pushed the micro-ops branch from dc34944 to bf1ff3e Compare March 8, 2024 07:32

Fix Token...

8727c37

ryoqun force-pushed the micro-ops branch from d9f64b0 to 8727c37 Compare March 8, 2024 13:05

taiki-e added the crossbeam-channel label May 2, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[wip] Cumulative micro opts (1-2% perf. gain) #1092

[wip] Cumulative micro opts (1-2% perf. gain) #1092

ryoqun commented Mar 8, 2024 •

edited

taiki-e commented May 2, 2024 •

edited

ryoqun commented May 4, 2024

[wip] Cumulative micro opts (1-2% perf. gain) #1092

Are you sure you want to change the base?

[wip] Cumulative micro opts (1-2% perf. gain) #1092

Conversation

ryoqun commented Mar 8, 2024 • edited

taiki-e commented May 2, 2024 • edited

ryoqun commented May 4, 2024

ryoqun commented Mar 8, 2024 •

edited

taiki-e commented May 2, 2024 •

edited