Emit endbr64 instructions on amd64 to support OpenBSD indirect branch target control flow enforcement. #13023

Open
wants to merge 12 commits into
base: trunk

Conversation

voutilad

A previous PR (#12918) set the linker arg nobtcfi when building on OpenBSD. This PR adds in usage of the endbr64 instruction to allow dropping that linker arg and supporting this type of control flow enforcement on OpenBSD and any other platforms that adopt it. A separate commit in the PR drops the linker arg.

If unaware, endbr64 ends up a 4 byte nop-equivalent on amd64 systems that don't support or enable it.

I've never hacked on the ocaml compiler before, so not sure if work needs to be done to tweak for i386. Currently, OpenBSD doesn't support indirect branch target enforcement on i386 and I'm not aware of any dev currently working on it. (If it was in use, we'd need an endbr32 instruction instead.)

I've only been able to test on amd64, but the resulting native binaries run and tests seem to pass:

Summary:
  1354 tests passed
    51 tests skipped
     0 tests failed
     0 tests not started (parent test skipped or failed)
     0 unexpected errors
  1405 tests considered
gmake[1]: Leaving directory '/home/dv/src/ocaml.git/testsuite'

Couple points to note:

  1. I haven't tested reducing the endbr64 usage I'm proposing to find the minimal change needed. Some of the insertions at labels may not be required, but it's hard to tell without experimentation. (At least from my very limited knowledge of the compiler.)
  2. My autoconf change restricts a check specifically against aarch64 for OpenBSD. Currently only amd64 and arm64 platforms enforce IBT on OpenBSD. If and when support for riscv64 arrives, the pattern will need to be changed... but I'm willing to bet ocaml will get the necessary tweaks to drop this linker flag for arm64 before then 😆

I've tested a port of this change to 5.1 and 4.14. The changes are similar, but need some slight tweaking.

OpenBSD added support for indirect branch target control flow
enforcement in 7.4. Previously, ocaml used a linker argument to
disable enforcement on OpenBSD. This commit inserts endbr64
instructions after function declarations and labels.

Disabling the use of the linker argument is committed separately.
@nojb
Contributor

nojb commented Mar 11, 2024

not sure if work needs to be done to tweak for i386.

The native-code compiler does not support i386 anymore, so nothing needs to be done.

Contributor

@dustanddreams dustanddreams left a comment

Note that there is nothing to do for i386 in OCaml 5, as there is no native code generator for 32-bit platforms. However OCaml 4 has a native i386 backend.

- remove some extraneous endbr64 usage around some globals
- remove endbr64 for cfi_startproc, move to function/funcdecl
@dustanddreams
Contributor

Also, there is the risk that, on some targets (such as macos...), as does not recognize endbr64, so the use of this instruction should be made conditional (at least target system-dependent) in asmcomp and runtime...

@voutilad
Author

Also, there is the risk that, on some targets (such as macos...), as does not recognize endbr64, so the use of this instruction should be made conditional (at least target system-dependent) in asmcomp and runtime...

Hmm good point. Working on a tweak to do this. Is it safe to say if the system isn't macOS or Windows, skip emitting endbr64? Or should I overcome my fear of autoconf and build in an actual check that the assembler available can handle an endbr64 instruction 😬

@shindere
Contributor

shindere commented Mar 11, 2024 via email

@dustanddreams
Contributor

Hmm good point. Working on a tweak to do this. Is it safe to say if the system isn't macOS or Windows, skip emitting endbr64? Or should I overcome my fear of autoconf and build in an actual check that the assembler available can handle an endbr64 instruction 😬

Unfortunately, given there are about MAXINT different Linux distros, each with a slightly different version of binutils in use, there won't be an easy way to skirt autotools...

@voutilad
Author

I dusted off the autotools portion of my brain. Got something working that tests an endbr64 instruction. Not sure if it's completely valid as it's not using the local assembler directly. (For some reason I cannot for the life of me get that to work as I'm not getting shell expansion of something like $AS.)

@shindere
Contributor

shindere commented Mar 11, 2024 via email

@voutilad
Author

@shindere ah, I missed your message. Yes that looks like the approach I want for the assembler test. I'll adapt my latest changes.

@voutilad
Author

@shindere My latest commit creates and uses an m4 macro. For the assembler, do I need to check the preprocessor assembler, too?

@dustanddreams
Contributor

This is starting to take shape.

In order to help cross-platform maintainability, I suggest the following instead:

  • in configure, check for <cet.h> and enable bti if this header exists and the compiler defines __CET__ with a non-zero value.
  • if this header is available, include it in runtime/amd64.S, otherwise #define _CET_ENDBR, and use _CET_ENDBR unconditionally in this file.

@shindere
Contributor

shindere commented Mar 13, 2024 via email

@voutilad
Author

The newer approach of AC_CHECK_HEADER simplifies things nicely.

There are some cases of extra endbr64 instructions being emitted. Might need some eyeballs from someone more familiar with asmcomp/amd64/emit.mlp.

@dustanddreams
Contributor

There are some cases of extra endbr64 instructions being emitted. Might need some eyeballs from someone more familiar with asmcomp/amd64/emit.mlp.

You can probably drop the endbr64 when translating Llabel. Those labels are only used in jumps, not calls.

@shindere
Contributor

shindere commented Mar 14, 2024 via email

@voutilad
Author

There are some cases of extra endbr64 instructions being emitted. Might need some eyeballs from someone more familiar with asmcomp/amd64/emit.mlp.

You can probably drop the endbr64 when translating Llabel. Those labels are only used in jumps, not calls.

Unfortunately, CET affects any indirect jmp in addition to the use of call. That's what makes it a little tricky. Removing the emit_endbr64 () from the Llabel lbl match arm breaks things currently.

@xavierleroy
Contributor

The impact of this change on code size and on execution speed needs to be assessed.

Until it is proved to be completely negligible, I would prefer a configure flag to select endbr64 instruction generation. This flag would be on by default on OpenBSD 7.4 and other platforms that mandate control-flow integrity checking, and off by default on all other platforms.

@shindere
Contributor

shindere commented Mar 15, 2024 via email

@voutilad
Author

The impact of this change on code size and on execution speed needs to be assessed.

Until it is proved to be completely negligible, I would prefer a configure flag to select endbr64 instruction generation. This flag would be on by default on OpenBSD 7.4 and other platforms that mandate control-flow integrity checking, and off by default on all other platforms.

I've just updated to tuck the detection and enablement behind a --enable-bti flag. The naming is obviously subject to input. I tend to describe this as "BTI" and not "CET" or "IBT". Tried to remain consistent for now in the PR.

@shindere
Contributor

shindere commented Mar 21, 2024 via email

@dustanddreams
Contributor

Am I correct that there are platforms where this is a requirement (i.e., if it's not enabled things simply don't work)? If that's correct, how about enabling it only when it turns out it's necessary, at least for the time being?

In the current state of things (without this PR), extra linker flags are used to ask for no control-flow-integrity enforcement.

The plan in this PR is to emit the needed endbr64 instructions so as to be able to remove these linker flags.

@voutilad
Author

Thinking through putting this behind a feature flag. How does opam gain new options for usage with opam switch? I think that might be required to trigger this configuration flag.

@avsm
Member

avsm commented Mar 30, 2024

Thanks for putting this patch together @voutilad; perfect timing to fix OCaml on my desktop! I've confirmed it passes all tests on my OpenBSD-current/x86_64 box which has hardware cfbti support (and that OCaml segfaults with SIGILLs without this patch).

I've pushed two patches to avsm@bd46cbf and avsm@5ac8012 that you may want to pull into your branch. They change the configure behaviour to that suggested by @xavierleroy above:

  • --enable-bti and --disable-bti will cause endbr64 instructions to be emitted or not if they are explicitly specified on the configure line. Instead of checking if cet.h is available, the runtime patches check for the configure flag instead. (I think we should error out if the user explicitly asks for bti but the compiler lacks support for it).
  • Not specifying an explicit flag will cause the autoconf scripts to activate it by default if OpenBSD 7.4/x86_64+ is detected, and leave it off otherwise. This seems like a good default for OpenBSD ports where we want to always harden the binary, but we could add a --disable-bti port flavor there if there is demand.
  • I've checked that passing --enable-bti works fine on Linux with a modern clang as well (which is already emitting some endbr64 instructions at the C level).
  • My configure scheme will also "just work" with the opam compiler packages on OpenBSD and Linux, as the detection logic is now sensible by default.
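The flag behaviour described in the bullets above can be summarised as a small decision function. This is only a sketch in OCaml for readability: the real logic lives in the autoconf scripts, and every name here (`request`, `target`, `decide`, `toolchain_ok`) is hypothetical. The error case follows the parenthetical suggestion above about an explicit request on a toolchain that lacks support.

```ocaml
(* Hypothetical model of the configure-time choice; none of these names
   exist in the OCaml build system. *)
type request = Enable | Disable | Unspecified

type target = { name : string; mandates_bti : bool }

type decision = Emit_endbr64 | No_endbr64 | Configure_error of string

let decide ~request ~target ~toolchain_ok =
  match request with
  (* --disable-bti always wins, even where the platform enforces BTI. *)
  | Disable -> No_endbr64
  (* --enable-bti is honoured when the toolchain can assemble endbr64. *)
  | Enable when toolchain_ok -> Emit_endbr64
  | Enable ->
      Configure_error
        (Printf.sprintf "--enable-bti requested on %s but unsupported"
           target.name)
  (* No flag: on by default only where the platform mandates it. *)
  | Unspecified ->
      if target.mandates_bti && toolchain_ok then Emit_endbr64
      else No_endbr64
```

With this model, an unflagged configure on a target with `mandates_bti = true` (for example OpenBSD 7.4/x86_64) yields `Emit_endbr64`, while the same invocation on Linux yields `No_endbr64`.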

There's about a 1.5%-2% code size increase with the endbr64 instructions emitted for ocamlopt.opt, so there's probably a performance hit here. However, I noticed that we're sometimes emitting double endbr64s for some OCaml function preludes; e.g. from ocamlopt.opt below. Is this intentional?

00000000007ea120 <camlStdlib__Map.add_seq_930>:
  7ea120:       f3 0f 1e fa             endbr64
  7ea124:       f3 0f 1e fa             endbr64
  7ea128:       48 89 c6                mov    %rax,%rsi
  7ea12b:       49 83 ef 28             sub    $0x28,%r15
  7ea12f:       4d 3b 3e                cmp    (%r14),%r15
  7ea132:       72 3f                   jb     7ea173 <camlStdlib__Map.add_seq_930+0x53>
  7ea134:       49 8d 47 08             lea    0x8(%r15),%rax
  7ea138:       48 c7 40 f8 f7 10 00    movq   $0x10f7,-0x8(%rax)
  7ea13f:       00
  7ea140:       48 8d 15 79 13 bf ff    lea    -0x40ec87(%rip),%rdx        # 3db4c0 <caml_curry2>
  7ea147:       48 89 10                mov    %rdx,(%rax)
  7ea14a:       48 ba 07 00 00 00 00    movabs $0x200000000000007,%rdx
  7ea151:       00 00 02
  7ea154:       48 89 50 08             mov    %rdx,0x8(%rax)
  7ea158:       48 8d 15 21 00 00 00    lea    0x21(%rip),%rdx        # 7ea180 <camlStdlib__Map.fun_1623>
  7ea15f:       48 89 50 10             mov    %rdx,0x10(%rax)
  7ea163:       48 8b 7f 18             mov    0x18(%rdi),%rdi
  7ea167:       48 89 78 18             mov    %rdi,0x18(%rax)
  7ea16b:       48 89 f7                mov    %rsi,%rdi
  7ea16e:       e9 4d 56 fd ff          jmp    7bf7c0 <camlStdlib__Seq.fold_left_323>
  7ea173:       e8 4c 7c 07 00          call   861dc4 <caml_call_gc>
  7ea178:       eb ba                   jmp    7ea134 <camlStdlib__Map.add_seq_930+0x14>
  7ea17a:       66 0f 1f 44 00 00       nopw   0x0(%rax,%rax,1)

00000000007ea180 <camlStdlib__Map.fun_1623>:
  7ea180:       f3 0f 1e fa             endbr64
  7ea184:       f3 0f 1e fa             endbr64
  7ea188:       48 89 c2                mov    %rax,%rdx
  7ea18b:       48 8b 4b 08             mov    0x8(%rbx),%rcx
  7ea18f:       48 8b 03                mov    (%rbx),%rax
  7ea192:       48 8b 77 18             mov    0x18(%rdi),%rsi
  7ea196:       48 89 cb                mov    %rcx,%rbx
  7ea199:       48 89 d7                mov    %rdx,%rdi
  7ea19c:       e9 ff d6 ff ff          jmp    7e78a0 <camlStdlib__Map.add_436>
  7ea1a1:       66 2e 0f 1f 84 00 00    cs nopw 0x0(%rax,%rax,1)
  7ea1a8:       00 00 00
  7ea1ab:       0f 1f 44 00 00          nopl   0x0(%rax,%rax,1)

edit: sorry, I missed the discussion of this duplication above with @dustanddreams! It's tempting to use NOTRACK isn't it ;-)
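The doubled prologue in the listing above is easy to spot mechanically, since endbr64 always encodes as the four bytes `f3 0f 1e fa`. A rough sketch of such a check follows; the function names are made up for illustration, and because it scans raw bytes instead of disassembling, it could in principle report a match in the middle of another instruction or in data.

```ocaml
(* The 4-byte encoding of endbr64, as shown in the objdump listing. *)
let endbr64 = "\xf3\x0f\x1e\xfa"

(* True if [code] contains the endbr64 encoding at byte offset [pos]. *)
let has_endbr64_at code pos =
  pos >= 0
  && pos + String.length endbr64 <= String.length code
  && String.sub code pos (String.length endbr64) = endbr64

(* Offsets at which two endbr64 encodings appear back to back. *)
let doubled_endbr64 code =
  let rec scan pos acc =
    if pos + 8 > String.length code then List.rev acc
    else if has_endbr64_at code pos && has_endbr64_at code (pos + 4) then
      scan (pos + 8) (pos :: acc)
    else scan (pos + 1) acc
  in
  scan 0 []
```

Applied to the first eleven bytes of `camlStdlib__Map.add_seq_930` as listed, the function reports a single doubled pair at offset 0.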

avsm and others added 3 commits April 1, 2024 10:56
This also obeys an explicit directive for the configure flag
if specified. So --enable-bti will always attempt to compile with
BTI, and --disable-bti will always disable it (even if required on
the platform).  Not specifying the flag will cause the configure
script to try and do the right thing.
Depending on the presence of cet.h will end up emitting them on
most modern compilers, so this specifically checks for the
--enable-bti flag being passed to configure
@voutilad
Author

voutilad commented Apr 1, 2024

However, I noticed that we're sometimes emitting double endbr64s for some OCaml function preludes; e.g. from ocamlopt.opt below. Is this intentional?

@avsm no, definitely not intentional. I noticed that as well and haven't been able to find the exact cause, but I'm thinking it's how functions and labels are emitted in OCaml. Since endbr64 needs to be at indirect jmp targets and there's no way to distinguish those at the moment (I could be missing it), it is probably emitting twice.
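If the duplication does come from a function symbol and a label landing on the same address, one way to avoid it would be to record that a landing pad is wanted and emit it once, just before the next real instruction. The sketch below illustrates that idea only; it is not the PR's emitter, and the `emitter` type and function names are invented for the example.

```ocaml
(* Hypothetical emitter: labels request a landing pad, and the request is
   satisfied once, before the next instruction that occupies bytes. *)
type emitter = { buf : Buffer.t; mutable pending_endbr64 : bool }

let create () = { buf = Buffer.create 64; pending_endbr64 = false }

let add_line e s =
  Buffer.add_string e.buf s;
  Buffer.add_char e.buf '\n'

(* Labels occupy no bytes, so several of them may share one address. *)
let emit_label e ~indirect_target name =
  add_line e (name ^ ":");
  if indirect_target then e.pending_endbr64 <- true

(* Flush at most one endbr64, then emit the instruction itself. *)
let emit_instr e text =
  if e.pending_endbr64 then begin
    add_line e "\tendbr64";
    e.pending_endbr64 <- false
  end;
  add_line e ("\t" ^ text)
```

Because the endbr64 is emitted after all labels at a given address, each of those labels still points at the landing pad.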

@voutilad
Author

voutilad commented Apr 1, 2024

@avsm cherry-picked those commits, btw, and regenerated configure.

@dustanddreams
Contributor

This is looking good. However, if one explicitly runs configure --enable-bti, you should nevertheless check for cet.h and only enable (bti=true) if found. Otherwise, naive ricers gentoo users will try to use this configure switch on systems lacking cfi support.

@gasche
Member

gasche commented Apr 3, 2024

As a naive non-configure expert, I would expect a hard failure at configure time if the user asks for --enable-bti but the system does not actually support it. (Rather than silently ignoring the user's demand.) Or maybe the command could be respected, but with a warning pointing out that this may not work, if the test is only a heuristic and could fail on systems where endbr64 is in fact supported.

@shindere
Contributor

shindere commented Apr 3, 2024 via email

@shindere
Contributor

shindere commented Apr 3, 2024 via email

@avsm
Member

avsm commented Apr 4, 2024

Checking for cet.h really seems unnecessary to me, since we also need to support arm64; does anyone have a machine with this activated? I tried to get OpenBSD/arm64 bti support to kick in by booting a qemu/hvt VM on my M2 Mac, but it didn't pass through enough ARMv8 instruction goodness to activate the enforcement...

@dustanddreams
Contributor

Checking for cet.h really seems unnecessary to me, since we also need to support arm64; does anyone have a machine with this activated? I tried to get OpenBSD/arm64 bti support to kick in by booting a qemu/hvt VM on my M2 Mac, but it didn't pass through enough ARMv8 instruction goodness to activate the enforcement...

Checking for cet.h makes sense on x86 since we want to use this file in runtime/amd64.S.

There is no similar file for arm (yet). I don't have access to arm64 systems with working control flow enforcement. Try installing a native OpenBSD on your M2 on an external USB disk (-:

@voutilad
Author

voutilad commented Apr 4, 2024

FWIW, my arm64 devices that run OpenBSD also don't support BTI 😞. I'm looking to get one soon, but at the moment have no way to work on that support as part of this PR.

@voutilad
Author

Well, I now have an OpenBSD arm64 machine with BTI...so if we can agree on a general path forward here for amd64 I can also draft a PR for arm64.

@kettenis

Checking for cet.h really seems unnecessary to me, since we also need to support arm64; does anyone have a machine with this activated? I tried to get OpenBSD/arm64 bti support to kick in by booting a qemu/hvt VM on my M2 Mac, but it didn't pass through enough ARMv8 instruction goodness to activate the enforcement...

Actually I think that BTI support will be active even if qemu doesn't pass through the processor feature bit. At least someone reported that this happens on an M3 Mac. I haven't decided whether I consider this a bug or a feature ;)

@@ -897,6 +897,7 @@ let fundecl fundecl =
   D.global (emit_symbol fundecl.fun_name);
   D.label (emit_symbol fundecl.fun_name);
   emit_debug_info fundecl.fun_dbg;
+  I.endbr64 ();

The endbr64 instruction should really be after .cfi_startproc

@voutilad
Author

@dustanddreams, @avsm how do you want me to proceed with the PR?

@dustanddreams
Contributor

Well I still would like the existence of <cet.h> to be checked on amd64, as mentioned in this comment.

@xavierleroy
Contributor

There's about a 1.5%-2% code size increase with the endbr64 instructions emitted for ocamlopt.opt, so there's probably a performance hit here.

I made some measurements on x86_64, see the table below. For ocamlopt.opt there's a 2.9% size increase for the code segment (as reported by size ocamlopt.opt). But what worries me more is that a number of standard library modules see code size increase by 10% or more. (I think ocamlopt.opt is a bit special in having a lot of straight-line initialization code, not containing labels.)

So, I'm going to insist that endbr64 instructions should be generated only for labels that can be the targets of indirect jumps. If I'm not mistaken, these are the labels mentioned in Lswitch and Lpushtrap Linear instructions.
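The restriction requested here amounts to a pre-pass that collects the labels referenced by `Lswitch` and `Lpushtrap`, followed by emitting endbr64 only at those labels (function entry points would still need their own). The sketch below uses a deliberately simplified, hypothetical instruction type; the real Linear constructors in asmcomp carry more data, and `indirect_targets` and `emit_label` are names made up for this example.

```ocaml
(* Simplified, hypothetical stand-in for the Linear IR. *)
type label = int

type instr =
  | Llabel of label
  | Lbranch of label          (* direct jump: target needs no landing pad *)
  | Lswitch of label array    (* jump table: reached by an indirect jmp *)
  | Lpushtrap of label        (* trap handler, per the comment above *)
  | Lop of string             (* any other instruction *)

module LabelSet = Set.Make (Int)

(* Collect every label that some Lswitch or Lpushtrap refers to. *)
let indirect_targets body =
  List.fold_left
    (fun acc instr ->
      match instr with
      | Lswitch table ->
          Array.fold_left (fun set l -> LabelSet.add l set) acc table
      | Lpushtrap handler -> LabelSet.add handler acc
      | Llabel _ | Lbranch _ | Lop _ -> acc)
    LabelSet.empty body

(* Assembly lines for a label: endbr64 only when it is an indirect target. *)
let emit_label targets l =
  let name = Printf.sprintf ".L%d:" l in
  if LabelSet.mem l targets then [ name; "\tendbr64" ] else [ name ]
```

A label reached only through `Lbranch` then costs no extra bytes, which is where most of the size increase reported in this comment would be recovered if the assumption about indirect targets holds.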

 before    after  increase  file
9517298  9795226    2.9%    ocamlopt.opt
  12948    13889    7.3%    stdlib.o
  14095    15028    6.6%    stdlib__Arg.o
  21883    23715    8.4%    stdlib__Array.o
   1793     1801    0.4%    stdlib__ArrayLabels.o
    846      926    9.5%    stdlib__Atomic.o
  11607    12375    6.6%    stdlib__Bigarray.o
    737      817   10.9%    stdlib__Bool.o
  10843    11639    7.3%    stdlib__Buffer.o
  23504    25392    8.0%    stdlib__Bytes.o
   3737     3745    0.2%    stdlib__BytesLabels.o
    337      373   10.7%    stdlib__Callback.o
   1227     1351   10.1%    stdlib__Char.o
   2773     2929    5.6%    stdlib__Complex.o
    385      425   10.4%    stdlib__Condition.o
   5833     6193    6.2%    stdlib__Digest.o
   5426     5767    6.3%    stdlib__Domain.o
  16728    18075    8.1%    stdlib__Dynarray.o
   3679     3922    6.6%    stdlib__Effect.o
   1635     1791    9.5%    stdlib__Either.o
  26126    27760    6.3%    stdlib__Ephemeron.o
  15272    16229    6.3%    stdlib__Filename.o
  22373    23748    6.1%    stdlib__Float.o
  44878    47943    6.8%    stdlib__Format.o
   1338     1422    6.3%    stdlib__Fun.o
   2928     3116    6.4%    stdlib__Gc.o
  19749    21043    6.6%    stdlib__Hashtbl.o
   5321     5730    7.7%    stdlib__In_channel.o
   1013     1105    9.1%    stdlib__Int.o
   2402     2590    7.8%    stdlib__Int32.o
   2415     2611    8.1%    stdlib__Int64.o
   4577     4739    3.5%    stdlib__Lazy.o
   4256     4595    8.0%    stdlib__Lexing.o
  29522    31360    6.2%    stdlib__List.o
   2953     2961    0.3%    stdlib__ListLabels.o
  17317    18568    7.2%    stdlib__Map.o
    982     1062    8.1%    stdlib__Marshal.o
    197      205    4.1%    stdlib__MoreLabels.o
    631      687    8.9%    stdlib__Mutex.o
   2457     2653    8.0%    stdlib__Nativeint.o
   3258     3494    7.2%    stdlib__Obj.o
    210      218    3.8%    stdlib__Oo.o
   1742     1938   11.3%    stdlib__Option.o
   2123     2231    5.1%    stdlib__Out_channel.o
   4348     4572    5.2%    stdlib__Parsing.o
   9518    10237    7.6%    stdlib__Printexc.o
   1762     1922    9.1%    stdlib__Printf.o
   3275     3527    7.7%    stdlib__Queue.o
  12312    12946    5.1%    stdlib__Random.o
   2246     2494   11.0%    stdlib__Result.o
  54761    58988    7.7%    stdlib__Scanf.o
   1770     1894    7.0%    stdlib__Semaphore.o
  23707    25288    6.7%    stdlib__Seq.o
  18215    19555    7.4%    stdlib__Set.o
   1906     2070    8.6%    stdlib__Stack.o
     86       94    9.3%    stdlib__StdLabels.o
   9676    10448    8.0%    stdlib__String.o
   2746     2754    0.3%    stdlib__StringLabels.o
   2740     2820    2.9%    stdlib__Sys.o
    539      579    7.4%    stdlib__Type.o
   2691     2947    9.5%    stdlib__Uchar.o
    273      305   11.7%    stdlib__Unit.o
  11086    11855    6.9%    stdlib__Weak.o
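For anyone re-running these measurements, the percentage column is the relative growth of the second size over the first. A minimal sketch of that arithmetic follows; `pct_increase` is a name chosen for this example, and the three rows are copied from the table above.

```ocaml
(* Relative size increase, in percent, between two byte counts. *)
let pct_increase ~before ~after =
  100. *. float_of_int (after - before) /. float_of_int before

let () =
  List.iter
    (fun (name, before, after) ->
      Printf.printf "%-24s %5.1f%%\n" name (pct_increase ~before ~after))
    [ ("ocamlopt.opt", 9517298, 9795226);
      ("stdlib__Bool.o", 737, 817);
      ("stdlib__StringLabels.o", 2746, 2754) ]
```

The sizes themselves would come from whatever tool produced the original numbers (the comment mentions `size ocamlopt.opt` for the executable).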

@gasche gasche added back-end enhancement submitter-action-needed This PR is waiting for an action of its submitter. portabilty Hardware and operating system support labels May 15, 2024