Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Try manually assigning registers #68

Open
pelletier opened this issue Jan 11, 2022 · 3 comments
Open

Try manually assigning registers #68

pelletier opened this issue Jan 11, 2022 · 3 comments

Comments

@pelletier
Copy link
Contributor

#58 (comment)

@mmcloughlin
Copy link
Contributor

Could you elaborate on the problem you've seen with the avo allocator, and consider filing an issue on the avo side?

@pelletier
Copy link
Contributor Author

This issue comes from @chriso saying there may be dependencies being created between YMM registers that could be avoiding by manually picking registers: #58 (comment).

I haven't looked closely at what dependencies exist today though, so I don't know if there is an actual problem. I'd be happy to open an issue on avo if I have something concrete! Did a quick scan of the register allocation, which looks like:

Y0:  incompleteMaskY
Y1:  continuation4BytesY
Y2:  continuation3BytesY
Y3:  nibble1Y
Y4:  nibble2Y
Y5:  nibble3Y
Y6:  nibbleMaskY
Y7:  msbMaskY
Y8:  previousBlockY		vp		off3	continuationBitsY
Y9:  errorY
Y10: incompletePreviousBlockY	previousY	off2	lowPrev			highCurr	tmpY
Y11: currentBlockY		tmp2
Y12: zeroes			blend		highPrev
Y13: shuffle
Y14: tmp1
// Y15?

I'm curious why Y15 wasn't used. At first glance I'm a little surprised to have registers like Y10 that are used for many things, but Y13 and Y14 that are only used on a short section of the loop.

@mmcloughlin
Copy link
Contributor

Ah, okay, so it's not failing to find an allocation, but you think a different allocation would improve performance.

Regarding Y15, I think the algorithm as written will use the minimum number of registers it can, so that should be equal to the max number of live variables at any point. In your case that must be 15, so the highest used is Y14.

I can't remember the precise implementation, but it's picking an allocation from a pool of available registers. Perhaps changing this logic to LRU would approximate what you're looking for, and minimize dependencies. There's likely a ton of research on this that I should read up on. The original allocator was intended to be simple.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants