Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

EVM integration tests #1507

Open
wants to merge 85 commits into
base: master
Choose a base branch
from
Open

EVM integration tests #1507

wants to merge 85 commits into from

Conversation

aakoshh
Copy link
Contributor

@aakoshh aakoshh commented Jan 18, 2023

The PR is an effort to add integration tests for EVM opcodes through the ref-fvm, so the whole executor/callmanager/kernel stack is involved. The integration tests are expected to exercise more complex scenarios than the unit tests in the builtin-actors do, mixing different lifecycle events with various call types.

Initially I thought about using state machine tests, a.k.a fuzzing. The beginning of that is added in testing/integration/src/smt.rs, which expects to use arbitrary::Arbitrary and arbtest to generate commands and compare the state of the system under test to a reference model.

This effort was halted in favour of hand written test scenarios. To write scenarios, I thought we could use Gherkin syntax and the Cucumber test runner, so we the scenarios stay readable.

I started adding Solidity test contracts under testing/integration/tests/evm. The contracts are compiled to bytecode and a Rust facade is generated using the ethers tools. The output is under testing/integration/tests/evm/artifacts and testing/integration/tests/evm/src. This project also holds the contract behaviour specifications under testing/integration/tests/evm/features.

These contracts are used in testing/integration/tests/fevm_features, which is where the features files are parsed and executed. There is a common.rs here which can be used to capture shared idioms.

Example

cargo test --release --test fevm_features
   Compiling fvm_integration_tests v0.1.1-alpha.1 (/home/aakoshh/projects/consensuslab/ref-fvm/testing/integration)
    Finished release [optimized] target(s) in 25.32s
     Running tests/fevm.rs (/home/aakoshh/projects/consensuslab/ref-fvm/target/release/deps/fevm-ebc615f4af0751af)
Feature: SimpleCoin
  Rule: Owner has initial balance
    Scenario: When we deploy the contract, the owner gets 10000 coins
     ✔  Given 1 random account
     ✔  When account 1 creates a SimpleCoin contract
     ✔  Then the balance of account 1 is 10000 coins
...
[Summary]
1 feature
4 rules
9 scenarios (9 passed)
40 steps (40 passed)

Tips:

  • To run a single scenario tag it like @wip and run it with cargo test --release --test fevm_features -- -t @wip.
  • Do not forget to add --release or loading a contract will take ~30 seconds!

Questions

  • The tester allows creating two accounts with the same private key, but different actor IDs. Then, it mixes up their seqno. Can this happen in real life? We believe it's a bug in the tester.
  • When I create an actor, I save its ActorID in the CreateReturn and later use it to invoke the contract. However, when I make a recursive CALL, this is not the address I see in the sender, but probably the delegated address. It would be helpful to include more comments about which of the three addresses in CreateReturn should be used for what purpose.
  • When I create a contract, I expect the balance of the account that does it to go down by the amount of gas used, yet it stays the same 10,000 the tester created the account with. The reason is probably because the gas_fee_cap is not set on the Message and the system doesn't check that it's at least as high as the base_fee. Is that a check only Lotus is expected to do? The gas_fee_cap is set to 0 on the Message by the tester, so there's no charge, only miner penalty.
  • Would it be okay to reference the EVM actor code in the tests, so we can use the data structures, such as EthAddress? The above PR duplicates them to avoid a circular dependency in src, but in the tests we are already referencing the actor bundle, and the actual types have extra methods we can use, for example to convert from Address to EthAddress.

TODO:

  • Provide an example of parsing events. It looks like currently the fields are private, so there's no way to pass them to ethers. I made them public in this PR to get on with it.
  • Test a contract that performs recursive calls with CALL, DELEGATECALL and REVERT
  • Test STATICCALL
  • Test creating contracts within contracts
  • Provide example of calling a contract with constructor arguments
  • Test SELFDESTRUCT refund scenarios
  • Test metamorphic contracts
  • Test SSTORE and SLOAD
  • Finish the first self destruct scenario Then the execution fails with message '???': is it really expected to fail? It doesn't.

Why Cucumber

It's easy to find articles on the internet that warn against using Cucumber for BDD, however, they are more often than not saying that in the context of testing UIs, where Selenium and RSpec can do a much better job without the overhead of the english syntax for scenarios. They warn that in rapidly evolving applications like a UI, Cucumber and Gherkin tests soon become unmaintainable.

In our case, however, we are dealing with smart contracts that once deployed, might never be change. For these, having a few scenarios describe their expected behaviour is not difficult to maintain.

Another motivation to reach for these was the personal circumstances when I was tasked with testing certain EVM opcodes with complex lifecycle scenarios: I didn't know (i) how to trigger these opcodes in Solidity and (ii) what their expected behaviour was. A situation like this is more what Cucumber was intended for: for a developer to sit down with an analyst to hash out scenarios and examples of what we are expecting from the system.

For simpler cases like returning the code size of a contract, this is definitely not the right tool.

Last, but not least, it has music going for it 🥒

@aakoshh aakoshh marked this pull request as ready for review January 19, 2023 18:56
@codecov-commenter
Copy link

codecov-commenter commented Jan 30, 2023

Codecov Report

Merging #1507 (bb9d8d8) into master (cd2c767) will increase coverage by 0.15%.
The diff coverage is 0.00%.

Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #1507      +/-   ##
==========================================
+ Coverage   53.54%   53.69%   +0.15%     
==========================================
  Files         143      144       +1     
  Lines       13500    13464      -36     
==========================================
+ Hits         7229     7230       +1     
+ Misses       6271     6234      -37     
Impacted Files Coverage Δ
shared/src/event/mod.rs 0.00% <ø> (ø)
tools/fvm-bench/src/fevm.rs 0.00% <0.00%> (ø)
fvm/src/gas/price_list.rs 26.49% <0.00%> (-1.00%) ⬇️
fvm/src/kernel/default.rs 14.54% <0.00%> (-0.29%) ⬇️
fvm/src/machine/mod.rs 57.81% <0.00%> (ø)
fvm/src/machine/boxed.rs 0.00% <0.00%> (ø)
fvm/src/syscalls/event.rs 0.00% <0.00%> (ø)
fvm/src/executor/default.rs 5.45% <0.00%> (ø)
fvm/src/call_manager/default.rs 0.00% <0.00%> (ø)
fvm/src/blockstore/discard.rs 0.00% <0.00%> (ø)
... and 2 more

Copy link
Contributor

@vyzo vyzo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

well, i didnt look into the contracts all that deeply, but i think this is a tentative go.

My main concern is about framework creep here and the myriad of test deps, each with its own little dsl and mental model and so on.
Personally i think this is going to be quite hard to maintain, so i post an objection here.

@Stebalien i would like your opinion here; if you add a go to my tentative one, i'll resolve the conflicts and bring it to mergable state.

@aakoshh
Copy link
Contributor Author

aakoshh commented Mar 3, 2023

@vyzo thanks for taking a look.

I also read that Cucumber DSL is hard to maintain in the context of a rapidly changing website and an RSpec style test with before and after supported in the code is much better. I don't disagree with that.

I this case, however, I think these smart contract basically never change. You have the EVM spec, you have a Solidity file that demonstrates some behaviour, and in real life once that's deployed, it is immutable, so similarly the surface area of the tests will not change.

What changes is the underlying mapping of how the FVM supports deploying, executing and querying the contracts, but that is hidden by the mostly reusable DSL implementations.

I think this is a good example of how a Business Analyst could engage with the specification of a smart contract without having to wade through code.

But no, I don't really expect that you would personally enjoy adding to this test

@aakoshh
Copy link
Contributor Author

aakoshh commented Mar 6, 2023

Another comment on how hard something is to maintain: recently @raulk raised a point on the undocumented Gerbil scripts which were checked in along with the test files they generated, yet we added them to the repo after some comments were added. These tests are completely automated, so there's no risk of them rotting away, and I took care to leave comments and docs on every method I call.

You mention that tests have their own mental models and mini DSLs, and I agree, since they are individual Solidity contracts demonstrating some function or quirk of the EVM. However, I hope that this actually brings benefit to some developers, namely the ones who know Solidity, but not FEVM or the FVM. They can read the contract and the english description of what is expected to build up their mental model, and only then dig deeper into how things are achieved with FVM.

Honestly I spent half of my time doing just that, trying to figure out what combination of addresses and deserializers I have to use to achieve what the mental model of the contract wants me to demonstrate.

@maciejwitowski maciejwitowski removed this from the Pre-Mainnet Launch milestone Mar 31, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants