Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

LTO should be enabled on architecture specific source files when using "native" optimizations #1112

Open
KungFuJesus opened this issue Jan 12, 2022 · 10 comments

Comments

@KungFuJesus
Copy link
Contributor

The motivation for turning this off was to prevent the compiler from generating invalid code that wasn't protected by runtime checks. However, if the user is using -march=native, they care nothing about portability and the chances of getting something that works on multiple targets on their ISA is basically 0 unless they are compiling for an ancient target. For that case, I'm sure the user would prefer link time optimizations, as the compiler is only going to compile code that the compiler detects is compatible with their CPU, anyway.

@mtl1979
Copy link
Collaborator

mtl1979 commented Jan 12, 2022

This pretty much requires test that both the compiler and linker recognize -flto, otherwise the build might fail.

If we enable LTO by default for native builds, there also needs to be option to not enable it, if something doesn't work correctly with specific compiler or linker version.

@gdevenyi
Copy link
Contributor

CMake can do this right

https://stackoverflow.com/a/47370726

@mtl1979
Copy link
Collaborator

mtl1979 commented Jan 12, 2022

We're currently requiring cmake version 3.5.1, so any module added after that is not available unless we guard any use with explicit version check.

@nmoinvaz
Copy link
Member

nmoinvaz commented Apr 4, 2022

Is it possible to split arch-specific code into its own static library? Then use LTO on the main zlib static library only - without regard to -mnative..?

@KungFuJesus
Copy link
Contributor Author

Static libraries are the stage just prior to LTO, I think. When you link the final library, that's when the IPOs and other clever things that happen that cause trouble.

@kloczek
Copy link

kloczek commented Apr 4, 2022

Please do not hardcode any optimisations.
LTO optimistation is possible to pass over $AR, $NM, $RANLIB, $CFLAGS, $LDFLAGS env variables witout touching source code of any project.

@KungFuJesus
Copy link
Contributor Author

KungFuJesus commented Apr 5, 2022

@kloczek that's not exactly what's happening here. They are specifically disabling LTO for the sake of producing a portable binary package. What's there to disable it now is a fairly weak '-fno-lto' for packages that are built with special vectorizations. This is meant to prevent an issue where AVX code leaks into the general code path, causing crashes due to interprocedural optimization deduplicating a code block for the multiple implementations and using the first thing it sees, even though it's an illegal instruction on some microarchitectures.

My argument here is that this "fno-lto" flag doesn't need to exist if you are compiling with native optimizations. If the user wants to create a super optimized binary that will run nowhere else but their machine (implicit with -march=native) they should be free to enable LTO as they please.

@nmoinvaz
Copy link
Member

nmoinvaz commented Mar 6, 2024

This should be possible now with -D WITH_RUNTIME_CPU_DETECTION=OFF correct?

@Dead2
Copy link
Member

Dead2 commented Mar 7, 2024

It should work, although I am unsure whether anyone has actually tested it. Might potentially be some CMake changes required still. Should perhaps also have changed one or two of the native CI builds to enable LTO.

@mtl1979
Copy link
Collaborator

mtl1979 commented Mar 9, 2024

-DWITH_RUNTIME_CPU_DETECTION=OFFis guaranteed to only work with x86/x86_64, for other architectures it might not work correctly with all compiler versions or processors. It definitely doesn't work when cross-compiling, so CI can only test LTO build when targeting x86_64 (aka AMD64).

When building locally without containers or emulator, it might be possible to manually disable some optimisations that depend on run-time detection if the current processor doesn't support such optimisations.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

6 participants