Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support for cold blocks #22

Open
Amanieu opened this issue Dec 26, 2021 · 2 comments
Open

Support for cold blocks #22

Amanieu opened this issue Dec 26, 2021 · 2 comments

Comments

@Amanieu
Copy link
Contributor

Amanieu commented Dec 26, 2021

It would be useful to be able to mark some blocks as "cold" which means that they are rarely taken cold paths. The register allocator should prefer placing spills and moves in cold blocks if possible.

It turns out that very little needs to be done if we take advantage of the block ordering by requiring all cold blocks to be after normal blocks in terms of block index and instruction indices. This has the following consequences:

  • Due to the sorting of blockparams_out, bundle merging will attempt to merge all branch parameters coming from normal blocks before attempting to merge ones coming from cold blocks.
  • If a requirement conflict exists within a bundle, the bundle is split at the first conflict point. Since ranges are iterated in order, this will place the split in a cold block if there are no conflicts within normal blocks.
  • If register allocation requires splitting a bundle, the first conflict point is used as the split point. Same as above.

My only concern is that the block order will no longer be in RPO which is the ordering recommended by the documentation. While regalloc2 will still function properly, I am less sure of the impact it may have on the heuristics.

Note that there are no requirements related to the ordering of blocks, and there is no requirement that the control flow be reducible. Some heuristics used by the allocator will perform better if the code is reducible and ordered in reverse postorder (RPO), however: in particular, (1) this interacts better with the contiguous-range-of-instruction-indices live range representation that we use, and (2) the "approximate loop depth" metric will actually be exact if both these conditions are met.

@bjorn3
Copy link
Contributor

bjorn3 commented Dec 26, 2021

FYI there is already an issue on the wasmtime side: bytecodealliance/wasmtime#2747

@cfallin
Copy link
Member

cfallin commented Jan 18, 2022

@Amanieu I think that placing cold blocks at the end of the function in the linear block order should basically just work, as you say.

The two issues in the design doc could potentially have an impact on compile time (first issue -- because we'll have longer, discontiguous live ranges) and code quality (second issue -- because an out-of-lined block from an inner loop, sunk to the end, could cause the approximate metric to treat the entire remainder of the function as a hot inner loop body).

For the second, I think a simple-enough answer is to stop the approximate-loop-depth scan before any code blocks (and treat them as zero depth). That would also have a side-effect of making the spill cost low in the cold paths, which is what we want.

So I can imagine adding a method to the Function trait something like fn first_cold_block(&self) -> Option<Block> and then use that to end this scan early, then that should be it. Does that seem reasonable to you?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants