Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We鈥檒l occasionally send you account related emails.

Already on GitHub? Sign in to your account

add support for querying cpuid #1118

Merged
merged 1 commit into from Feb 11, 2023
Merged

Conversation

evacchi
Copy link
Contributor

@evacchi evacchi commented Feb 10, 2023

fixes #1111, adds setup for fixing #580

While I haven't tested this on a proper x86 machine (but I do have a dusty core duo laptop which may really be the thing I need, if I manage to turn it on :D) I have sketched a solution by putting together everyone's feedback (including @ncruces' 馃殌). In the meantime I have also discovered that GOARCH=amd64 has the ABM feature flag off on Rosetta2.

  • this is in a new cupid_amd64.{s,go} because I felt it would pollute too much impl and arch didn't feel like the right place (that's for runtime)
  • For now I have only added the constants that we use and/or plan to use (e.g. SSE3, 4.1, 4.2), and only the related bitmaps
  • you still have to carefully pick the right set of flags because there is no proper type checking in place (all are uint64). Suggestions welcome for better ergonomics/idioms

Values are kept in a struct that is local to the compiler instance. This is re-inited at for each amd64Compiler -- TBF it may be created once and then copied over when necessary. I wanted to keep it local to the compiler instance, instead of global, to make sure we can properly test it.

Speaking of testing: the compiler test right now is kind of silly because it checks for the length of the generated byte array. The one reason is that I couldn't figure out how to decode or inspect the list of opcodes the right way. This is the main reason why this is draft.

@evacchi evacchi force-pushed the cpuid-abm-query branch 2 times, most recently from d013459 to 98decae Compare February 10, 2023 15:45
@ncruces
Copy link
Collaborator

ncruces commented Feb 10, 2023

I tried this on the affected machine and I can report that it worked:

@evacchi
Copy link
Contributor Author

evacchi commented Feb 10, 2023

I am not 100% sure I am doing the right thing here 馃槵

in your PR you are and-ing 0x0400_0000, in this PR it's 0x20. For some reason on my (dusty old and excruciatingly slow) machine this still prints "100000000..." instead of "2.500..."; and it only works with 0x0400_0000.

However, according to the spec the 5th bit on ECX should be 0x20 馃お

@ncruces
Copy link
Collaborator

ncruces commented Feb 10, 2023

I am not 100% sure I am doing the right thing here 馃槵

in your PR you are and-ing 0x0400_0000, in this PR it's 0x20. For some reason on my (dusty old and excruciatingly slow) machine this still prints "100000000..." instead of "2.500..."; and it only works with 0x0400_0000.

However, according to the spec the 5th bit on ECX should be 0x20 馃お

OK, then I need to recheck. Did the tests pass for me because of caching, or because I have both 0x20 and 0x0400_0000?

I did notice that. From the docs, I assumed it's the 5th bit from the left: ((1<<31) >> 5) == 0x0400_0000, which did the right thing on my machine. You assumed the opposite (1 << 5) == 0x20.

I thing we need to figure it out from other implementations.

@ncruces
Copy link
Collaborator

ncruces commented Feb 10, 2023

I think, based on the Intel "sponsored" library (github.com/intel-go/cpuid, notice the redirect), the correct implementation is to first get leaf0x80000000 (which I didn't do, to make sure you can get leaf0x80000001) then get leaf0x80000001.
https://github.com/aregm/cpuid/blob/219e067757cbc0ddb71908b19826ddca4177987d/cpuid_amd64.go#L218-L229

Then compare extraFeatureFlags with 0x20.
This is HasExtraFeature: https://github.com/aregm/cpuid/blob/219e067757cbc0ddb71908b19826ddca4177987d/cpuid.go#L101-L104
This is ABM: https://github.com/aregm/cpuid/blob/219e067757cbc0ddb71908b19826ddca4177987d/cpuid.go#L461-L467

So it seems you're right, and I messed up, but maybe your excruciatingly slow computer needs to check leaf0x80000000 before doing leaf0x80000001?

Comment on lines 3 to 10
// lifted from github.com/intel-go/cpuid and src/internal/cpu/cpu_x86.s
// func cpuid(arg1, arg2 uint32) (eax, ebx, ecx, edx uint32)
TEXT 路cpuid(SB), NOSPLIT, $0-24
MOVL arg1+0(FP), AX
MOVL arg2+4(FP), CX
CPUID
MOVL AX, eax+8(FP)
MOVL BX, ebx+12(FP)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we need AX and BX register values? I think we could just delete them if you don't use.

Suggested change
// lifted from github.com/intel-go/cpuid and src/internal/cpu/cpu_x86.s
// func cpuid(arg1, arg2 uint32) (eax, ebx, ecx, edx uint32)
TEXT 路cpuid(SB), NOSPLIT, $0-24
MOVL arg1+0(FP), AX
MOVL arg2+4(FP), CX
CPUID
MOVL AX, eax+8(FP)
MOVL BX, ebx+12(FP)
TEXT 路cpuid(SB), NOSPLIT, $0-16
MOVL arg1+0(FP), AX
MOVL arg2+4(FP), CX
CPUID

Copy link
Collaborator

@ncruces ncruces Feb 11, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Other feature queries might use all 4 (e.g. BMI uses EBX). This makes it the only assembly you'll ever need to feature test amd64.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, in theory (maybe not in practice) you should call cpuid(0, 0) and inspect EAX to figure out if you can trust cpuid(1, 0), and similarly call cpuid(0x8000_0000, 0) before cpuid(0x8000_0001, 0).

https://www.amd.com/system/files/TechDocs/25481.pdf

The smallest function number of the standard function range is Fn0000_0000. The largest function number of the standard function range, for a particular implementation, is returned in CPUID
Fn0000_0000_EAX.
The smallest function number of the extended function range is Fn8000_0000. The largest function number of the extended function range, for a particular implementation, is returned in CPUID
Fn8000_0000_EAX.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cool, thanks for the info!

internal/engine/compiler/cpuid_amd64.go Outdated Show resolved Hide resolved
internal/engine/compiler/impl_amd64.go Outdated Show resolved Hide resolved
internal/engine/compiler/impl_amd64_test.go Outdated Show resolved Hide resolved
internal/engine/compiler/impl_amd64_test.go Outdated Show resolved Hide resolved
internal/engine/compiler/impl_amd64_test.go Outdated Show resolved Hide resolved
internal/engine/compiler/impl_amd64_test.go Show resolved Hide resolved
@evacchi evacchi marked this pull request as ready for review February 11, 2023 10:40
Copy link
Member

@mathetake mathetake left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good job!馃憦

Signed-off-by: Edoardo Vacchi <evacchi@users.noreply.github.com>
@mathetake mathetake merged commit 5895873 into tetratelabs:main Feb 11, 2023
@codefromthecrypt
Copy link
Collaborator

OSS collaboration like this makes my day, thanks @ncruces @evacchi @mathetake!

@evacchi evacchi deleted the cpuid-abm-query branch February 12, 2023 09:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

amd64 compiler bug
4 participants