
Use native indexOf whenever possible #61

Merged (4 commits) on Oct 8, 2018

Conversation

@reconbot (Contributor) commented Oct 5, 2018

This uses a windowing approach to determine how much of a buffer can be searched with the native indexOf. It falls back to a JS scan when it hits a buffer boundary or when no native search window is available. I'm pretty sure a BufferList can't be searched drastically faster than this.

  • Actual performance needs to be measured.
  • Moved tests into their own file.
  • The search is now always converted to a buffer. This seems like an OK tradeoff; we could delay copying it and first check whether the search is larger than the BufferList, but the usual case is the search is larger than the buffer list. 🤷‍♂️
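The windowing idea can be sketched like this (a hypothetical illustration only, not the actual bl implementation; `windowedIndexOf` is an invented name, and the boundary handling here is a simple joined-slice check rather than the byte-by-byte JS fallback described above):

```javascript
// Each buffer is a "window" that V8's native Buffer#indexOf can scan;
// matches that straddle a boundary are caught by a small joined slice.
function windowedIndexOf (buffers, search) {
  const needle = Buffer.from(search)
  if (needle.length === 0) return 0
  let base = 0
  for (let i = 0; i < buffers.length; i++) {
    const buf = buffers[i]
    // fast path: native search within the current window
    const hit = buf.indexOf(needle)
    if (hit !== -1) return base + hit
    // slow path: a match may start near the end of this buffer and
    // continue into the following ones, so check a joined boundary slice
    if (i + 1 < buffers.length && needle.length > 1) {
      const tailStart = Math.max(0, buf.length - needle.length + 1)
      const boundary = Buffer.concat(
        [buf.slice(tailStart), ...buffers.slice(i + 1)]
      ).slice(0, 2 * (needle.length - 1))
      const bHit = boundary.indexOf(needle)
      // only count matches that actually start inside this buffer;
      // later starts are found by the following loop iterations
      if (bHit !== -1 && bHit < buf.length - tailStart) {
        return base + tailStart + bHit
      }
    }
    base += buf.length
  }
  return -1
}
```

For example, `windowedIndexOf([Buffer.from('hello wo'), Buffer.from('rld!')], 'world')` finds the straddling match at index 6. Concatenating every remaining buffer for the boundary check is wasteful for long lists; it keeps the sketch short.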

@reconbot (Contributor, Author) commented Oct 5, 2018

I can't believe it's not obviously faster 🤔 https://gist.github.com/reconbot/abde2da91a76b6fa94f028db9fd27eab

@mcollina (Collaborator) commented Oct 6, 2018

I think the bottleneck is something else. As you can see, the length of the buffers is not impacting the measurements at all. Have you tried generating a flamegraph?

You might also try varying the number of buffers in the list and see whether that impacts the benchmarks at all.
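A minimal way to run that experiment (a plain-Node timing sketch, not the benchmark.js suite from the gist; `bench` and `listSearch` are invented stand-ins, and `listSearch` does a naive per-buffer native search with no boundary handling):

```javascript
// tiny timing harness: run fn repeatedly and report ops/sec
function bench (label, fn, iterations = 200) {
  const start = process.hrtime.bigint()
  for (let i = 0; i < iterations; i++) fn()
  const seconds = Number(process.hrtime.bigint() - start) / 1e9
  const opsPerSec = iterations / seconds
  console.log(`${label} x ${opsPerSec.toFixed(0)} ops/sec`)
  return opsPerSec
}

// placeholder for BufferList#indexOf: native search over each buffer
function listSearch (buffers, needle) {
  let base = 0
  for (const buf of buffers) {
    const hit = buf.indexOf(needle)
    if (hit !== -1) return base + hit
    base += buf.length
  }
  return -1
}

// vary the buffer count while keeping total bytes fixed at 64 KiB
for (const count of [1, 16, 256]) {
  const buffers = Array.from(
    { length: count },
    () => Buffer.alloc((64 * 1024) / count, 'a')
  )
  bench(`${count} buffers`, () => listSearch(buffers, 'zzz'))
}
```

If ops/sec barely moves as the count grows, per-call overhead dominates; if it drops, the list walk itself is the cost.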

@reconbot (Contributor, Author) commented Oct 6, 2018

I think you're right.

Running the new approach today:

smallBuffers small search x 2,490,521 ops/sec ±3.04% (84 runs sampled)
smallBuffers medium search x 2,422,262 ops/sec ±2.88% (89 runs sampled)
smallBuffers large search x 2,600,668 ops/sec ±0.56% (94 runs sampled)
smallBuffersLong small search x 2,607,599 ops/sec ±0.59% (90 runs sampled)
smallBuffersLong medium search x 2,551,642 ops/sec ±2.46% (89 runs sampled)
smallBuffersLong large search x 2,588,665 ops/sec ±0.69% (91 runs sampled)
largeBuffers small search x 2,549,903 ops/sec ±2.64% (90 runs sampled)
largeBuffers medium search x 2,557,784 ops/sec ±1.33% (88 runs sampled)
largeBuffers large search x 2,581,822 ops/sec ±0.74% (88 runs sampled)
mixedBuffers small search x 2,470,664 ops/sec ±2.31% (91 runs sampled)
mixedBuffers medium search x 2,565,537 ops/sec ±0.62% (91 runs sampled)
mixedBuffers large search x 2,525,416 ops/sec ±2.39% (92 runs sampled)
veryLargeBuffers small search x 2,568,693 ops/sec ±0.78% (88 runs sampled)
veryLargeBuffers medium search x 2,557,850 ops/sec ±0.61% (89 runs sampled)
veryLargeBuffers large search x 2,487,267 ops/sec ±2.49% (90 runs sampled)

And now with the new search-init approach, which creates fewer BL objects:

smallBuffers small search x 28,823,176 ops/sec ±0.82% (94 runs sampled)
smallBuffers medium search x 28,685,310 ops/sec ±0.93% (95 runs sampled)
smallBuffers large search x 28,619,441 ops/sec ±0.85% (92 runs sampled)
smallBuffersLong small search x 28,845,733 ops/sec ±0.75% (94 runs sampled)
smallBuffersLong medium search x 28,998,154 ops/sec ±0.59% (92 runs sampled)
smallBuffersLong large search x 28,860,999 ops/sec ±0.79% (94 runs sampled)
largeBuffers small search x 28,934,405 ops/sec ±0.76% (90 runs sampled)
largeBuffers medium search x 28,965,175 ops/sec ±0.78% (93 runs sampled)
largeBuffers large search x 25,850,403 ops/sec ±1.59% (85 runs sampled)
mixedBuffers small search x 24,613,339 ops/sec ±0.92% (94 runs sampled)
mixedBuffers medium search x 28,834,310 ops/sec ±0.92% (91 runs sampled)
mixedBuffers large search x 28,900,673 ops/sec ±0.58% (92 runs sampled)
veryLargeBuffers small search x 28,583,304 ops/sec ±0.78% (91 runs sampled)
veryLargeBuffers medium search x 29,038,767 ops/sec ±0.72% (94 runs sampled)
veryLargeBuffers large search x 28,374,545 ops/sec ±1.06% (91 runs sampled)

So that's a lot better, but there's still not a great difference in time due to length.

The flame graph shows nothing obvious yet...

@reconbot (Contributor, Author) commented Oct 6, 2018

I screwed up: `new Array(n).map(() => 1)` returns an array of undefined no matter what, because `map` skips the holes in a sparse array.
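The pitfall in question, for the record (variable names are mine):

```javascript
// new Array(n) creates n holes, not n elements; Array.prototype.map
// skips holes, so the callback never runs and the holes survive
const sparse = new Array(3).map(() => 1)
console.log(sparse)     // [ <3 empty items> ]
console.log(sparse[0])  // undefined

// Array.from (or fill) visits every index, so the data is real
const filled = Array.from({ length: 3 }, () => 1)
console.log(filled)     // [ 1, 1, 1 ]
```

So the earlier benchmark fixtures were empty, which is why every case ran at the same speed.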

These results make a lot more sense; this change is objectively better.

// windowing approach

smallBuffers small search x 6,868 ops/sec ±1.12% (90 runs sampled)
smallBuffers medium search x 11,344 ops/sec ±1.09% (90 runs sampled)
smallBuffers large search x 18,718 ops/sec ±1.17% (89 runs sampled)
smallBuffersLong small search x 115 ops/sec ±1.32% (74 runs sampled)
smallBuffersLong medium search x 101 ops/sec ±0.80% (74 runs sampled)
smallBuffersLong large search x 108 ops/sec ±0.92% (79 runs sampled)
largeBuffers small search x 6,679 ops/sec ±1.11% (92 runs sampled)
largeBuffers medium search x 93.14 ops/sec ±0.80% (75 runs sampled)
largeBuffers large search x 46.27 ops/sec ±0.74% (61 runs sampled)
mixedBuffers small search x 6,891 ops/sec ±0.68% (89 runs sampled)
mixedBuffers medium search x 129 ops/sec ±1.10% (73 runs sampled)
mixedBuffers large search x 101 ops/sec ±0.87% (74 runs sampled)
veryLargeBuffers small search x 187 ops/sec ±2.08% (79 runs sampled)
veryLargeBuffers medium search x 59.67 ops/sec ±1.18% (63 runs sampled)
veryLargeBuffers large search x 35.32 ops/sec ±1.23% (62 runs sampled)

// old approach

smallBuffers small search x 5,248 ops/sec ±1.43% (89 runs sampled)
smallBuffers medium search x 5,210 ops/sec ±1.07% (89 runs sampled)
smallBuffers large search x 5,400 ops/sec ±1.08% (91 runs sampled)
smallBuffersLong small search x 159 ops/sec ±1.10% (81 runs sampled)
smallBuffersLong medium search x 158 ops/sec ±1.20% (80 runs sampled)
smallBuffersLong large search x 157 ops/sec ±1.03% (81 runs sampled)
largeBuffers small search x 42.98 ops/sec ±1.42% (57 runs sampled)
largeBuffers medium search x 43.01 ops/sec ±1.79% (57 runs sampled)
largeBuffers large search x 42.33 ops/sec ±1.03% (56 runs sampled)
mixedBuffers small search x 96.89 ops/sec ±1.20% (73 runs sampled)
mixedBuffers medium search x 98.41 ops/sec ±1.19% (72 runs sampled)
mixedBuffers large search x 97.30 ops/sec ±1.06% (72 runs sampled)
veryLargeBuffers small search x 0.04 ops/sec ±2.17% (5 runs sampled)

// the very large buffers took over 4 minutes and I got bored and killed it

The gist has the new perftest code.

@mcollina (Collaborator) commented Oct 6, 2018

I think there is a huge performance cliff somewhere. Consider these two:

largeBuffers small search x 6,679 ops/sec ±1.11% (92 runs sampled)
largeBuffers medium search x 93.14 ops/sec ±0.80% (75 runs sampled)

That's roughly a 70x drop. I think it's overall better than the previous approach anyway, but maybe this should be investigated.

Do you plan to work more on this, or can we land it?

@reconbot (Contributor, Author) commented Oct 6, 2018 via email

@reconbot (Contributor, Author) commented Oct 7, 2018

OK, one more patch: `.get()` can easily be made faster, and it's used when checking bytes across buffer boundaries.
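The shape of that speedup (a hypothetical sketch, not bl's actual `get` implementation): instead of slicing out a one-byte buffer per lookup, walk the list and read the byte in place.

```javascript
// read the byte at a logical index across a list of buffers,
// without allocating an intermediate slice
function get (buffers, index) {
  for (const buf of buffers) {
    if (index < buf.length) return buf[index]
    index -= buf.length
  }
  return undefined // out of range
}
```

For example, `get([Buffer.from('ab'), Buffer.from('cd')], 2)` reads the `c` (byte 99) directly from the second buffer. Skipping the per-call allocation is what the ~10x `get performance` jump below comes from.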

before

get performance x 12,057,870 ops/sec ±0.95% (90 runs sampled)
smallBuffers small search x 6,707 ops/sec ±0.70% (91 runs sampled)
smallBuffers medium search x 11,849 ops/sec ±1.01% (90 runs sampled)
smallBuffers large search x 18,889 ops/sec ±0.92% (88 runs sampled)
smallBuffersLong small search x 124 ops/sec ±0.93% (79 runs sampled)
smallBuffersLong medium search x 100 ops/sec ±0.98% (73 runs sampled)
smallBuffersLong large search x 108 ops/sec ±0.96% (79 runs sampled)
largeBuffers small search x 7,065 ops/sec ±0.83% (94 runs sampled)
largeBuffers medium search x 98.31 ops/sec ±0.86% (73 runs sampled)
largeBuffers large search x 49.31 ops/sec ±0.99% (64 runs sampled)
mixedBuffers small search x 7,373 ops/sec ±0.76% (94 runs sampled)
mixedBuffers medium search x 139 ops/sec ±0.82% (79 runs sampled)
mixedBuffers large search x 109 ops/sec ±1.29% (80 runs sampled)
veryLargeBuffers small search x 203 ops/sec ±1.48% (79 runs sampled)
veryLargeBuffers medium search x 64.49 ops/sec ±1.00% (67 runs sampled)
veryLargeBuffers large search x 38.00 ops/sec ±1.28% (51 runs sampled)

after

get performance x 116,746,182 ops/sec ±0.67% (92 runs sampled)
smallBuffers small search x 11,821 ops/sec ±0.96% (90 runs sampled)
smallBuffers medium search x 16,909 ops/sec ±0.99% (89 runs sampled)
smallBuffers large search x 19,734 ops/sec ±0.73% (93 runs sampled)
smallBuffersLong small search x 142 ops/sec ±0.91% (81 runs sampled)
smallBuffersLong medium search x 116 ops/sec ±0.96% (75 runs sampled)
smallBuffersLong large search x 124 ops/sec ±0.96% (79 runs sampled)
largeBuffers small search x 11,877 ops/sec ±0.99% (92 runs sampled)
largeBuffers medium search x 175 ops/sec ±0.98% (81 runs sampled)
largeBuffers large search x 87.49 ops/sec ±1.14% (75 runs sampled)
mixedBuffers small search x 12,191 ops/sec ±0.81% (94 runs sampled)
mixedBuffers medium search x 236 ops/sec ±0.88% (86 runs sampled)
mixedBuffers large search x 181 ops/sec ±1.61% (84 runs sampled)
veryLargeBuffers small search x 200 ops/sec ±1.27% (78 runs sampled)
veryLargeBuffers medium search x 93.03 ops/sec ±0.96% (79 runs sampled)
veryLargeBuffers large search x 58.33 ops/sec ±1.01% (61 runs sampled)
