
wasmparser: Remove offset param in VisitOperator #804

Merged (4 commits) on Oct 27, 2022

Conversation

nagisa (Contributor) commented Oct 26, 2022

Implementing VisitOperator is complicated by having to pass the offset through everywhere. A much simpler alternative, code-wise, is to stash the offset away within the implementation of VisitOperator before invoking SomeReader::visit_operator.

This does raise complications for visitors that depend on offsets for functional correctness, though. This is the case for the FuncValidator, which uses offsets to verify that the very last end operator seen is at the same offset as the reader when the validator is finalized.
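To make the shape of the change concrete, here is a hedged sketch (not the actual wasmparser API; all names are illustrative) contrasting the new approach, where the visitor stashes the offset before dispatch, with threading an offset parameter through every visit method:

```rust
// Illustrative trait: after the change, visit methods take no offset parameter.
trait VisitOperator {
    fn visit_nop(&mut self);
}

struct OffsetTracker {
    // The offset is stashed here before any `visit_*` method is invoked.
    op_offset: usize,
    last_seen: Option<usize>,
}

impl VisitOperator for OffsetTracker {
    fn visit_nop(&mut self) {
        // The implementation reads the stashed offset instead of a parameter.
        self.last_seen = Some(self.op_offset);
    }
}

fn main() {
    let mut visitor = OffsetTracker { op_offset: 0, last_seen: None };
    // A reader would set `op_offset` before each dispatch.
    visitor.op_offset = 42;
    visitor.visit_nop();
    assert_eq!(visitor.last_seen, Some(42));
}
```

The caveat described next is visible in the sketch: nothing forces `op_offset` to be set before dispatch, which is exactly the hazard for offset-dependent visitors.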

self.validator.with_resources(&self.resources)
    .$visit(offset $($(,$arg)*)?)

fn $visit(&mut self $($(,$arg: $argty)*)?) -> Result<()> {
    // TODO: whatever happens if the caller does not set `op_offset` ahead of time?
nagisa (author):

The public VisitOperator implementation on this type is problematic. Validation today depends on offsets always being set up correctly for more than just the error messages. In particular, it sets self.end_which_emptied_control to the offset of the very last end for the function, which is then checked in the finish method.

On the other hand, it is possible for users to call reader.visit_operator(func_validator) while forgetting to set up op_offset. This is distinct from FuncValidator::op, which does the right thing.

I would say resolving this will require removing this implementation, or moving it to some other type that forces callers to set up the offset one way or another, much like how with_resources below does.

Maybe something like

fn visit_operator(&self, reader: BinaryReader) -> ... { }
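A hypothetical fleshing-out of this sketch, using simplified stand-in types rather than the real wasmparser BinaryReader and FuncValidator: the method takes the reader and records the offset itself, so callers cannot forget to set it up.

```rust
struct BinaryReader {
    pos: usize,
    ops: Vec<&'static str>, // pretend bytecode, one "byte" per operator
}

#[derive(Default)]
struct FuncValidator {
    op_offset: usize,
    end_which_emptied_control: Option<usize>,
}

impl FuncValidator {
    // The offset comes from the reader here, never from the caller.
    fn visit_operator(&mut self, reader: &mut BinaryReader) -> Result<(), String> {
        self.op_offset = reader.pos;
        let op = reader.ops.remove(0);
        reader.pos += 1;
        if op == "end" {
            self.end_which_emptied_control = Some(self.op_offset);
        }
        Ok(())
    }
}

fn main() {
    let mut reader = BinaryReader { pos: 0, ops: vec!["nop", "end"] };
    let mut validator = FuncValidator::default();
    while !reader.ops.is_empty() {
        validator.visit_operator(&mut reader).unwrap();
    }
    // The `end` was at offset 1, recorded without the caller touching offsets.
    assert_eq!(validator.end_which_emptied_control, Some(1));
}
```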

Member:

Hm yeah, that is indeed unfortunate. It's fine to implement the end_which_emptied_control bit entirely differently; that was just the first thing I reached for.

One possibility, though, could be a default-does-nothing trait method along the lines of "I'm about to start visiting this offset", which would be called at the start of visit_operator and ignored by almost everything except the validator?
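That suggestion might look something like the following sketch (trait and method names are hypothetical, not the real API): a defaulted hook that most visitors inherit as a no-op, and only the validator overrides.

```rust
trait VisitOperator {
    /// Called before each operator is dispatched; defaults to a no-op.
    fn visit_offset(&mut self, _offset: usize) {}
    fn visit_end(&mut self);
}

// Most visitors simply inherit the no-op hook.
struct CountEnds(u32);
impl VisitOperator for CountEnds {
    fn visit_end(&mut self) {
        self.0 += 1;
    }
}

// The validator overrides it to track offsets.
struct Validator {
    last_offset: usize,
}
impl VisitOperator for Validator {
    fn visit_offset(&mut self, offset: usize) {
        self.last_offset = offset;
    }
    fn visit_end(&mut self) {}
}

// What `visit_operator` on the reader would do per operator.
fn dispatch<V: VisitOperator>(visitor: &mut V, offset: usize) {
    visitor.visit_offset(offset);
    visitor.visit_end();
}

fn main() {
    let mut counter = CountEnds(0);
    let mut validator = Validator { last_offset: 0 };
    dispatch(&mut counter, 10);
    dispatch(&mut validator, 10);
    assert_eq!(counter.0, 1);
    assert_eq!(validator.last_offset, 10);
}
```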

nagisa (author):

Yeah, adding a visit_offset method or similar to VisitOperator is an option. It's an obvious way forward, but it isn't clear to me yet whether it's the best one.

I want to give it some thought and maybe experiment with them to see how things work out if done differently.

nagisa (author), Oct 27, 2022:

I ended up with what I think is a simpler approach for now: adding a visitor method to the FuncValidator which returns an opaque impl VisitOperator. This is basically what all of the methods were delegating to anyway.
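A rough sketch of that shape, with simplified stand-in types rather than the real wasmparser ones: the validator hands out an opaque visitor that is constructed with the current offset, so obtaining a visitor forces the offset to be set up.

```rust
trait VisitOperator {
    fn visit_end(&mut self);
}

#[derive(Default)]
struct FuncValidator {
    end_which_emptied_control: Option<usize>,
}

// The concrete type stays private behind `impl VisitOperator`.
struct VisitorWithOffset<'a> {
    offset: usize,
    validator: &'a mut FuncValidator,
}

impl VisitOperator for VisitorWithOffset<'_> {
    fn visit_end(&mut self) {
        self.validator.end_which_emptied_control = Some(self.offset);
    }
}

impl FuncValidator {
    // Callers must supply the offset to get a visitor at all.
    fn visitor(&mut self, offset: usize) -> impl VisitOperator + '_ {
        VisitorWithOffset { offset, validator: self }
    }
}

fn main() {
    let mut validator = FuncValidator::default();
    validator.visitor(5).visit_end();
    assert_eq!(validator.end_which_emptied_control, Some(5));
}
```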

Robbepop (Contributor) commented Oct 27, 2022

The main motivation behind the VisitOperator API was to provide a more efficient way to parse, validate, and transform Wasm bytecode. I can understand the motivation behind this PR but really hope that we won't regress the performance of validation or other uses of this API.
Originally I proposed the VisitOperator trait with an Input associated type that allowed for a generic first parameter instead of an offset: usize one, to model other usages. I see this design as an alternative.
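The alternative described here might be sketched as follows (illustrative only, not the actual wasmparser trait): an Input associated type generalizes the first parameter instead of hard-coding an offset.

```rust
trait VisitOperator {
    type Input;
    fn visit_nop(&mut self, input: Self::Input);
}

// One visitor consumes byte offsets...
struct OffsetLogger {
    seen: Vec<usize>,
}
impl VisitOperator for OffsetLogger {
    type Input = usize;
    fn visit_nop(&mut self, offset: usize) {
        self.seen.push(offset);
    }
}

// ...another needs no per-operator input at all.
struct NopCounter {
    count: u32,
}
impl VisitOperator for NopCounter {
    type Input = ();
    fn visit_nop(&mut self, _input: ()) {
        self.count += 1;
    }
}

fn main() {
    let mut logger = OffsetLogger { seen: Vec::new() };
    logger.visit_nop(3);
    let mut counter = NopCounter { count: 0 };
    counter.visit_nop(());
    assert_eq!(logger.seen, vec![3]);
    assert_eq!(counter.count, 1);
}
```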

The FuncValidator depends on `offset`s being passed in correctly; that's how it
validates whether the last `End` operator is actually the end of the function.

Prior to the removal of the `offset` parameters, they were correct by
definition, since the `reader` passed the offset in. With this responsibility
moved to user code, however, it became straightforward to invoke the visitor
methods in a way that violates the assumptions in the implementation of the
`FuncValidator`.

Instead, in order to obtain a visitor, users are now required to call a
method which also sets up the offsets accordingly.
nagisa commented Oct 27, 2022

> The main motivation behind the VisitOperator API was as a more efficient way to parse, validate and transform Wasm bytecode. I can understand the motivation behind this PR but really hope that we won't regress performance of validation or other means of using this API.

I'm definitely aware of that. I wouldn't have proposed this change if my intuition suggested that the different ways of passing the offset around would meaningfully change the performance characteristics of the code.


My dev machine isn't really set up for quality benchmarking, with a ton of other stuff running on it, but here are the results:

parse/tests             time:   [2.5424 ms 2.5431 ms 2.5438 ms]                         
                        change: [+1.2777% +2.1963% +2.9292%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 2 outliers among 100 measurements (2.00%)
  2 (2.00%) high mild

validate/tests          time:   [9.2935 ms 9.2974 ms 9.3014 ms]                           
                        change: [-1.5510% -1.1343% -0.8100%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high mild

validate/bz2            time:   [953.63 µs 954.91 µs 956.74 µs]                         
                        change: [-0.2498% -0.1441% -0.0084%] (p = 0.02 < 0.05)
                        Change within noise threshold.
Found 2 outliers among 100 measurements (2.00%)
  1 (1.00%) high mild
  1 (1.00%) high severe

parse/bz2               time:   [578.27 µs 578.31 µs 578.35 µs]                      
                        change: [+0.9696% +1.0978% +1.1982%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 7 outliers among 100 measurements (7.00%)
  6 (6.00%) high mild
  1 (1.00%) high severe

validate/intgemm-simd   time:   [1.9817 ms 1.9839 ms 1.9867 ms]                                   
                        change: [+0.6718% +0.7915% +0.9672%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high severe

Benchmarking parse/intgemm-simd: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 5.6s, enable flat sampling, or reduce sample count to 60.
parse/intgemm-simd      time:   [1.1027 ms 1.1030 ms 1.1035 ms]                                
                        change: [+1.8445% +1.9715% +2.0582%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 11 outliers among 100 measurements (11.00%)
  1 (1.00%) low mild
  3 (3.00%) high mild
  7 (7.00%) high severe

validate/pulldown-cmark time:   [2.2530 ms 2.2542 ms 2.2555 ms]                                     
                        change: [+0.8914% +0.9679% +1.0370%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high mild

Benchmarking parse/pulldown-cmark: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 6.7s, enable flat sampling, or reduce sample count to 60.
parse/pulldown-cmark    time:   [1.3279 ms 1.3281 ms 1.3285 ms]                                  
                        change: [+0.8563% +0.8996% +0.9471%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 16 outliers among 100 measurements (16.00%)
  1 (1.00%) low severe
  6 (6.00%) high mild
  9 (9.00%) high severe

Benchmarking validate/spidermonkey: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 5.7s, or reduce sample count to 80.
validate/spidermonkey   time:   [56.373 ms 56.653 ms 57.075 ms]                                  
                        change: [+0.9209% +1.4428% +2.2000%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 2 outliers among 100 measurements (2.00%)
  2 (2.00%) high severe

parse/spidermonkey      time:   [33.288 ms 33.293 ms 33.299 ms]                               
                        change: [+0.7947% +0.8221% +0.8500%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 9 outliers among 100 measurements (9.00%)
  8 (8.00%) high mild
  1 (1.00%) high severe

The benchmark runner is indeed reporting some regressions over the baseline. However, where there are regressions, they are still within run-to-run variance on my machine: rerunning the same benchmark will readily report an improvement of about as many percent.

alexcrichton (Member):
This seems reasonable to me and I like the way it's integrated with FuncValidator. Was there anything else you wanted to add before merging?

As for benchmarking, Criterion isn't useful for detecting very small changes, and if this change has any effect at all it'd be quite small. I would hope, though, that any slowdown could be recovered via means other than plumbing offset everywhere.

nagisa commented Oct 27, 2022

I think this is ready to merge at this point. I wasn’t thinking of making any significant additional changes in this PR, except for addressing any feedback.
