Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cranelift: don't hoist instructions whose results are only used in cold blocks #8197

Open
jameysharp opened this issue Mar 20, 2024 · 1 comment
Labels
bug Incorrect behavior in the current implementation that needs fixing cranelift Issues related to the Cranelift code generator

Comments

@jameysharp
Copy link
Contributor

.clif Test Case

Wasmtime currently translates this WebAssembly module:

(module
  (type $fn (func (result i32)))
  (table $fnptrs 2 2 funcref)
  (func (param i32)
        loop
          local.get 0
          call_indirect $fnptrs (type $fn)
          br 0
        end))

To this CLIF, before optimization:

function u0:0(i64 vmctx, i64, i32) fast {
    gv0 = vmctx
    gv1 = load.i64 notrap aligned readonly gv0+8
    gv2 = load.i64 notrap aligned gv1
    gv3 = vmctx
    gv4 = load.i64 notrap aligned gv3+72
    sig0 = (i64 vmctx, i64) -> i32 fast
    sig1 = (i64 vmctx, i32 uext, i32 uext) -> i64 system_v
    stack_limit = gv2

                                block0(v0: i64, v1: i64, v2: i32):
                                    v3 -> v2
                                    v28 -> v3
@0027                               jump block2

                                block2:
@002b                               v4 = iconst.i32 2
@002b                               v5 = icmp.i32 uge v3, v4  ; v4 = 2
@002b                               v6 = uextend.i64 v3
@002b                               v7 = global_value.i64 gv4
@002b                               v8 = ishl_imm v6, 3
@002b                               v9 = iadd v7, v8
@002b                               v10 = iconst.i64 0
@002b                               v11 = select_spectre_guard v5, v10, v9  ; v10 = 0
@002b                               v12 = load.i64 table_oob aligned table v11
@002b                               v13 = band_imm v12, -2
@002b                               brif v12, block5(v13), block4

                                block4 cold:
@002b                               v15 = iconst.i32 0
@002b                               v16 = global_value.i64 gv3
@002b                               v17 = load.i64 notrap aligned readonly v16+56
@002b                               v18 = load.i64 notrap aligned readonly v17+72
@002b                               v19 = call_indirect sig1, v18(v16, v15, v3)  ; v15 = 0
@002b                               jump block5(v19)

                                block5(v14: i64):
@002b                               v20 = global_value.i64 gv3
@002b                               v21 = load.i64 notrap aligned readonly v20+64
@002b                               v22 = load.i32 notrap aligned readonly v21
@002b                               v23 = load.i32 icall_null aligned readonly v14+24
@002b                               v24 = icmp eq v23, v22
@002b                               trapz v24, bad_sig
@002b                               v25 = load.i64 notrap aligned readonly v14+16
@002b                               v26 = load.i64 notrap aligned readonly v14+32
@002b                               v27 = call_indirect sig0, v25(v26, v0)
@002e                               jump block2
}

Steps to Reproduce

Optimize this CLIF with the egraph pass enabled using opt_level=speed.

Expected Results

I think the loads of v17 and v18 should stay in the cold block.

Actual Results

The loads of v17 and v18 are hoisted to the entry block, since they have the notrap and readonly flags set and depend only on the vmctx parameter, v0.

Versions and Environment

Cranelift version or commit: tested on afaf1c7

Extra Info

This is not a correctness bug but might be a performance problem. I haven't measured it, just noticed it while looking at the CLIF that Wasmtime currently generates for call_indirect.

cc: @cfallin

@jameysharp jameysharp added bug Incorrect behavior in the current implementation that needs fixing cranelift Issues related to the Cranelift code generator labels Mar 20, 2024
jameysharp added a commit to jameysharp/wasmtime that referenced this issue Mar 21, 2024
I found these tests useful in thinking about bytecodealliance#8195, bytecodealliance#8197, and bytecodealliance#8200, so
I thought we should track our codegen for these specific cases over
time.
github-merge-queue bot pushed a commit that referenced this issue Mar 21, 2024
I found these tests useful in thinking about #8195, #8197, and #8200, so
I thought we should track our codegen for these specific cases over
time.
@cfallin
Copy link
Member

cfallin commented Mar 23, 2024

I suppose we could add a(nother) special case to the block-placement logic, where LICM etc is implemented: if the original block is cold, keep the op in the same block. That's probably limited in scope enough that it's both (i) tractable and (ii) shouldn't have unforeseen impacts?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Incorrect behavior in the current implementation that needs fixing cranelift Issues related to the Cranelift code generator
Projects
None yet
Development

No branches or pull requests

2 participants