Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

core/vm: Move interprepreter intrerruption check to jump instructions #24026

Merged
merged 3 commits into from Dec 3, 2021

Conversation

gumb0
Copy link
Member

@gumb0 gumb0 commented Dec 1, 2021

Fixes #23969

@gumb0
Copy link
Member Author

gumb0 commented Dec 1, 2021

On my machine all 3 variants (no check, check every jump, check every 64th jump) look to have the same performance.

No check vs check every jump:

name                                  old time/op  new time/op  delta
main/blake2b_shifts/2805nulls         13.2ms ± 2%  13.1ms ± 2%  -0.88%  (p=0.004 n=16+17)
main/blake2b_shifts/5610nulls         29.9ms ± 1%  29.7ms ± 1%  -0.51%  (p=0.003 n=19+20)
main/blake2b_shifts/8415nulls         44.0ms ± 0%  43.8ms ± 1%  -0.59%  (p=0.000 n=18+18)
main/weierstrudel/0                    470µs ± 1%   469µs ± 1%    ~     (p=0.513 n=20+19)
main/weierstrudel/1                    978µs ± 0%   984µs ± 1%  +0.68%  (p=0.000 n=20+18)
main/weierstrudel/3                   1.52ms ± 0%  1.54ms ± 0%  +1.27%  (p=0.000 n=18+20)
main/weierstrudel/9                   3.12ms ± 1%  3.16ms ± 1%  +1.58%  (p=0.000 n=20+20)
main/weierstrudel/14                  4.45ms ± 1%  4.51ms ± 0%  +1.38%  (p=0.000 n=18+17)
main/sha1_divs/empty                   194µs ± 1%   194µs ± 0%    ~     (p=0.723 n=20+19)
main/sha1_divs/1351                   3.79ms ± 0%  3.79ms ± 0%    ~     (p=0.773 n=19+19)
main/sha1_divs/2737                   7.38ms ± 1%  7.38ms ± 0%    ~     (p=0.987 n=16+19)
main/sha1_divs/5311                   14.6ms ± 4%  14.5ms ± 1%    ~     (p=0.515 n=18+20)
main/sha1_shifts/empty                 158µs ± 1%   152µs ± 1%  -3.27%  (p=0.000 n=18+19)
main/sha1_shifts/1351                 3.13ms ± 1%  3.03ms ± 1%  -3.24%  (p=0.000 n=20+19)
main/sha1_shifts/2737                 6.06ms ± 0%  5.87ms ± 1%  -3.11%  (p=0.000 n=18+18)
main/sha1_shifts/5311                 11.9ms ± 0%  11.5ms ± 1%  -3.21%  (p=0.000 n=16+18)
main/blake2b_huff/empty                102µs ± 0%   103µs ± 1%  +0.69%  (p=0.000 n=18+20)
main/blake2b_huff/2805nulls           1.94ms ± 2%  1.94ms ± 1%    ~     (p=0.457 n=20+20)
main/blake2b_huff/5610nulls           3.73ms ± 2%  3.73ms ± 1%    ~     (p=0.443 n=18+19)
main/blake2b_huff/8415nulls           5.48ms ± 1%  5.46ms ± 1%    ~     (p=0.331 n=20+18)
micro/JUMPDEST_n0/empty               2.49ms ± 1%  2.49ms ± 1%  +0.26%  (p=0.004 n=20+20)
micro/jump_around/empty                144µs ± 0%   144µs ± 1%    ~     (p=0.330 n=18+16)
micro/memory_grow_mstore/nogrow        440µs ± 1%   440µs ± 0%    ~     (p=0.781 n=19+16)
micro/memory_grow_mstore/by1           467µs ± 1%   466µs ± 1%    ~     (p=0.101 n=19+20)
micro/memory_grow_mstore/by16          576µs ± 1%   576µs ± 1%    ~     (p=0.627 n=16+18)
micro/memory_grow_mstore/by32          655µs ± 0%   656µs ± 1%    ~     (p=0.258 n=20+19)
micro/signextend/zero                  318µs ± 1%   322µs ± 2%  +1.50%  (p=0.000 n=19+20)
micro/signextend/one                   328µs ± 1%   331µs ± 1%  +0.95%  (p=0.000 n=20+20)
micro/loop_with_many_jumpdests/empty  1.84µs ± 3%  1.85µs ± 4%    ~     (p=0.583 n=20+19)
micro/memory_grow_mload/nogrow         214µs ± 1%   215µs ± 1%  +0.36%  (p=0.010 n=20+18)
micro/memory_grow_mload/by1            230µs ± 1%   231µs ± 1%  +0.56%  (p=0.001 n=17+19)
micro/memory_grow_mload/by16           309µs ± 5%   317µs ± 4%  +2.62%  (p=0.005 n=19+18)
micro/memory_grow_mload/by32           412µs ± 3%   410µs ± 2%    ~     (p=0.056 n=17+20)
[Geo mean]                            1.13ms       1.13ms       -0.14%

Check every jump vs every 64th jump:

name                                  old time/op  new time/op  delta
main/blake2b_shifts/2805nulls         13.2ms ± 2%  13.2ms ± 4%    ~     (p=0.765 n=16+20)
main/blake2b_shifts/5610nulls         29.9ms ± 1%  29.1ms ± 2%  -2.40%  (p=0.000 n=19+15)
main/blake2b_shifts/8415nulls         44.0ms ± 0%  43.0ms ± 1%  -2.29%  (p=0.000 n=18+19)
main/weierstrudel/0                    470µs ± 1%   460µs ± 1%  -2.12%  (p=0.000 n=20+19)
main/weierstrudel/1                    978µs ± 0%   947µs ± 0%  -3.16%  (p=0.000 n=20+18)
main/weierstrudel/3                   1.52ms ± 0%  1.47ms ± 0%  -3.21%  (p=0.000 n=18+19)
main/weierstrudel/9                   3.12ms ± 1%  3.03ms ± 0%  -2.83%  (p=0.000 n=20+17)
main/weierstrudel/14                  4.45ms ± 1%  4.32ms ± 1%  -2.95%  (p=0.000 n=18+20)
main/sha1_divs/empty                   194µs ± 1%   194µs ± 1%    ~     (p=0.119 n=20+18)
main/sha1_divs/1351                   3.79ms ± 0%  3.83ms ± 1%  +1.11%  (p=0.000 n=19+20)
main/sha1_divs/2737                   7.38ms ± 1%  7.46ms ± 1%  +1.14%  (p=0.000 n=16+18)
main/sha1_divs/5311                   14.6ms ± 4%  14.6ms ± 1%    ~     (p=0.361 n=18+20)
main/sha1_shifts/empty                 158µs ± 1%   155µs ± 1%  -1.87%  (p=0.000 n=18+20)
main/sha1_shifts/1351                 3.13ms ± 1%  3.07ms ± 1%  -1.78%  (p=0.000 n=20+20)
main/sha1_shifts/2737                 6.06ms ± 0%  5.95ms ± 1%  -1.76%  (p=0.000 n=18+17)
main/sha1_shifts/5311                 11.9ms ± 0%  11.7ms ± 1%  -2.08%  (p=0.000 n=16+20)
main/blake2b_huff/empty                102µs ± 0%   101µs ± 0%  -1.26%  (p=0.000 n=18+18)
main/blake2b_huff/2805nulls           1.94ms ± 2%  1.91ms ± 1%  -1.25%  (p=0.000 n=20+19)
main/blake2b_huff/5610nulls           3.73ms ± 2%  3.68ms ± 1%  -1.28%  (p=0.000 n=18+19)
main/blake2b_huff/8415nulls           5.48ms ± 1%  5.40ms ± 1%  -1.42%  (p=0.000 n=20+18)
micro/JUMPDEST_n0/empty               2.49ms ± 1%  2.45ms ± 1%  -1.48%  (p=0.000 n=20+20)
micro/jump_around/empty                144µs ± 0%   142µs ± 1%  -1.32%  (p=0.000 n=18+20)
micro/memory_grow_mstore/nogrow        440µs ± 1%   433µs ± 1%  -1.78%  (p=0.000 n=19+20)
micro/memory_grow_mstore/by1           467µs ± 1%   458µs ± 1%  -1.94%  (p=0.000 n=19+20)
micro/memory_grow_mstore/by16          576µs ± 1%   569µs ± 2%  -1.22%  (p=0.000 n=16+19)
micro/memory_grow_mstore/by32          655µs ± 0%   660µs ± 0%  +0.65%  (p=0.000 n=20+16)
micro/signextend/zero                  318µs ± 1%   344µs ± 1%  +8.20%  (p=0.000 n=19+18)
micro/signextend/one                   328µs ± 1%   355µs ± 1%  +8.04%  (p=0.000 n=20+19)
micro/loop_with_many_jumpdests/empty  1.84µs ± 3%  1.83µs ± 2%    ~     (p=0.129 n=20+20)
micro/memory_grow_mload/nogrow         214µs ± 1%   211µs ± 1%  -1.39%  (p=0.000 n=20+19)
micro/memory_grow_mload/by1            230µs ± 1%   226µs ± 1%  -1.43%  (p=0.000 n=17+19)
micro/memory_grow_mload/by16           309µs ± 5%   312µs ± 3%    ~     (p=0.066 n=19+17)
micro/memory_grow_mload/by32           412µs ± 3%   411µs ± 4%    ~     (p=0.832 n=17+18)
[Geo mean]                            1.13ms       1.13ms       -0.71%

@chfast
Copy link
Contributor

chfast commented Dec 1, 2021

I think we need some unit tests for this use case first: execute long running loop in one go-routine and abort it in second go-routine.

@holiman
Copy link
Contributor

holiman commented Dec 1, 2021

If it's indeed 'natively' atomic on amd64, IMO we could just check every jump, and not bother with the counter

@chfast
Copy link
Contributor

chfast commented Dec 1, 2021

If it's indeed 'natively' atomic on amd64, IMO we could just check every jump, and not bother with the counter

It is just MOV instruction because on amd64 every memory load is atomic by design. But this is not the case on arm64 so someone with raspberry pi or Apple M1 may want to benchmark this.
So with the counter we have 1 load + 1 store to just handle the counter. Without counter we just have 1 load.

However, the version with counter looks faster on average, but not in a benchmark purely focusing on jumps like "jump_around".

Interestingly, in the disasm view in the version without counter the errStopToken branch is considered unlikely and move to the end of the function (good decision). This does not happen in the version with the counter. I was not able to reproduce this compiler decision in isolation.

name                                  old time/op  new time/op  delta                                                                                                                                                             
micro/memory_grow_mload/nogrow         221µs ± 1%   220µs ± 0%    ~     (p=0.089 n=20+17)                                                                                                                                         
micro/memory_grow_mload/by1            230µs ± 1%   230µs ± 1%    ~     (p=0.624 n=19+19)                                                                                                                                         
micro/memory_grow_mload/by16           291µs ± 1%   292µs ± 1%    ~     (p=0.534 n=20+20)                                                                                                                                         
micro/memory_grow_mload/by32           366µs ± 1%   367µs ± 1%  +0.25%  (p=0.041 n=19+20)                                                                                                                                         
micro/signextend/zero                  407µs ± 0%   399µs ± 0%  -2.17%  (p=0.000 n=20+20)                                                                                                                                         
micro/signextend/one                   422µs ± 1%   409µs ± 1%  -3.11%  (p=0.000 n=20+20)                                                                                                                                         
micro/loop_with_many_jumpdests/empty  1.70µs ± 2%  1.69µs ± 2%    ~     (p=0.441 n=20+20)                                                                                                                                         
micro/jump_around/empty                146µs ± 1%   146µs ± 1%    ~     (p=0.141 n=20+19)                                                                                                                                         
micro/memory_grow_mstore/nogrow        454µs ± 1%   453µs ± 1%  -0.23%  (p=0.001 n=18+20)                                                                                                                                         
micro/memory_grow_mstore/by1           468µs ± 0%   468µs ± 1%    ~     (p=0.698 n=20+20)                                                                                                                                         
micro/memory_grow_mstore/by16          548µs ± 0%   549µs ± 1%  +0.30%  (p=0.000 n=18+20)                                                                                                                                         
micro/memory_grow_mstore/by32          612µs ± 0%   612µs ± 1%    ~     (p=0.620 n=20+20)                                                                                                                                         
micro/JUMPDEST_n0/empty               2.84ms ± 0%  2.85ms ± 0%  +0.07%  (p=0.028 n=16+17)                                                                                                                                         
main/sha1_divs/empty                   204µs ± 0%   197µs ± 0%  -3.60%  (p=0.000 n=17+20)
main/sha1_divs/1351                   4.14ms ± 0%  3.98ms ± 0%  -3.85%  (p=0.000 n=17+19)
main/sha1_divs/2737                   8.08ms ± 0%  7.76ms ± 0%  -3.97%  (p=0.000 n=18+19)
main/sha1_divs/5311                   15.8ms ± 0%  15.2ms ± 0%  -3.81%  (p=0.000 n=20+18)
main/weierstrudel/0                    468µs ± 0%   467µs ± 1%  -0.19%  (p=0.002 n=19+19)
main/weierstrudel/1                    978µs ± 1%   976µs ± 0%  -0.26%  (p=0.004 n=19+19)
main/weierstrudel/3                   1.54ms ± 1%  1.54ms ± 1%    ~     (p=0.184 n=19+20)
main/weierstrudel/9                   3.21ms ± 1%  3.19ms ± 1%  -0.67%  (p=0.000 n=20+19)
main/weierstrudel/14                  4.61ms ± 1%  4.56ms ± 1%  -0.96%  (p=0.000 n=20+19)
main/sha1_shifts/empty                 151µs ± 1%   148µs ± 1%  -1.55%  (p=0.000 n=20+19)
main/sha1_shifts/1351                 3.09ms ± 1%  3.02ms ± 1%  -2.41%  (p=0.000 n=19+19)
main/sha1_shifts/2737                 6.03ms ± 0%  5.88ms ± 0%  -2.37%  (p=0.000 n=17+18)
main/sha1_shifts/5311                 11.8ms ± 0%  11.5ms ± 0%  -2.26%  (p=0.000 n=20+16)
main/blake2b_huff/empty                102µs ± 1%   100µs ± 0%  -2.18%  (p=0.000 n=19+18)
main/blake2b_huff/2805nulls           1.96ms ± 0%  1.90ms ± 1%  -2.88%  (p=0.000 n=19+20)
main/blake2b_huff/5610nulls           3.80ms ± 1%  3.71ms ± 1%  -2.50%  (p=0.000 n=20+20)
main/blake2b_huff/8415nulls           5.57ms ± 1%  5.42ms ± 0%  -2.65%  (p=0.000 n=16+18)
main/blake2b_shifts/2805nulls         14.4ms ± 1%  14.3ms ± 1%  -0.60%  (p=0.000 n=20+18)
main/blake2b_shifts/5610nulls         28.7ms ± 1%  28.6ms ± 1%  -0.40%  (p=0.012 n=20+20)
main/blake2b_shifts/8415nulls         42.7ms ± 0%  42.7ms ± 2%    ~     (p=0.626 n=16+20)
[Geo mean]                            1.16ms       1.14ms       -1.29%

@chfast
Copy link
Contributor

chfast commented Dec 1, 2021

In both cases it seems the compiler predicts branch correctly:

  • core/vm/instructions.go:530:49: Branch prediction rule default < ret
  • core/vm/instructions.go:531:78: Branch prediction rule default < ret

but if you look closely you will see it only considers the second condition of interpreter.jumps%64 == 0 && atomic.LoadInt32(&interpreter.evm.abort) != 0. The first one is not analyzed at all and probably because of this the transformation does not happen.

@gumb0
Copy link
Member Author

gumb0 commented Dec 2, 2021

I think we need some unit tests for this use case first: execute long running loop in one go-routine and abort it in second go-routine.

Unit test added.

@gumb0 gumb0 force-pushed the interpreter-loop-interrupt branch 2 times, most recently from 956cce9 to 2a7e913 Compare December 2, 2021 16:56
@gumb0
Copy link
Member Author

gumb0 commented Dec 2, 2021

I removed the commit with checking every 64th jump and made the unit test end by timeout if evm.Cancel() doesn't work

@gumb0 gumb0 marked this pull request as ready for review December 2, 2021 20:54
Copy link
Contributor

@holiman holiman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@holiman holiman merged commit b02fe53 into ethereum:master Dec 3, 2021
@holiman holiman deleted the interpreter-loop-interrupt branch December 3, 2021 10:10
@holiman holiman added this to the 1.10.14 milestone Dec 3, 2021
JacekGlen pushed a commit to JacekGlen/go-ethereum that referenced this pull request May 26, 2022
…hereum#24026)

* core/vm: Remove interpreter loop interruption check

* core/vm: Unit test for interpreter loop interruption

* core/vm: Check for interpreter loop abort on every jump
gzliudan pushed a commit to gzliudan/XDPoSChain that referenced this pull request Feb 23, 2024
…hereum#24026)

* core/vm: Remove interpreter loop interruption check

* core/vm: Unit test for interpreter loop interruption

* core/vm: Check for interpreter loop abort on every jump
gzliudan pushed a commit to gzliudan/XDPoSChain that referenced this pull request Feb 23, 2024
…hereum#24026)

* core/vm: Remove interpreter loop interruption check

* core/vm: Unit test for interpreter loop interruption

* core/vm: Check for interpreter loop abort on every jump
gzliudan pushed a commit to gzliudan/XDPoSChain that referenced this pull request Feb 23, 2024
…hereum#24026)

* core/vm: Remove interpreter loop interruption check

* core/vm: Unit test for interpreter loop interruption

* core/vm: Check for interpreter loop abort on every jump
gzliudan pushed a commit to gzliudan/XDPoSChain that referenced this pull request Feb 23, 2024
…hereum#24026)

* core/vm: Remove interpreter loop interruption check

* core/vm: Unit test for interpreter loop interruption

* core/vm: Check for interpreter loop abort on every jump
gzliudan pushed a commit to gzliudan/XDPoSChain that referenced this pull request Feb 26, 2024
…hereum#24026)

* core/vm: Remove interpreter loop interruption check

* core/vm: Unit test for interpreter loop interruption

* core/vm: Check for interpreter loop abort on every jump
gzliudan pushed a commit to gzliudan/XDPoSChain that referenced this pull request Feb 27, 2024
…hereum#24026)

* core/vm: Remove interpreter loop interruption check

* core/vm: Unit test for interpreter loop interruption

* core/vm: Check for interpreter loop abort on every jump
gzliudan pushed a commit to gzliudan/XDPoSChain that referenced this pull request Feb 27, 2024
…hereum#24026)

* core/vm: Remove interpreter loop interruption check

* core/vm: Unit test for interpreter loop interruption

* core/vm: Check for interpreter loop abort on every jump
gzliudan pushed a commit to gzliudan/XDPoSChain that referenced this pull request Feb 27, 2024
…hereum#24026)

* core/vm: Remove interpreter loop interruption check

* core/vm: Unit test for interpreter loop interruption

* core/vm: Check for interpreter loop abort on every jump
gzliudan pushed a commit to gzliudan/XDPoSChain that referenced this pull request Mar 1, 2024
…hereum#24026)

* core/vm: Remove interpreter loop interruption check

* core/vm: Unit test for interpreter loop interruption

* core/vm: Check for interpreter loop abort on every jump
gzliudan pushed a commit to gzliudan/XDPoSChain that referenced this pull request Mar 1, 2024
…hereum#24026)

* core/vm: Remove interpreter loop interruption check

* core/vm: Unit test for interpreter loop interruption

* core/vm: Check for interpreter loop abort on every jump
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Redesign async EVM termination
3 participants