Spectre mitigations: add a mode that monitors branch mispredictions and dynamically turns on fences #8175
Comments
One slight tweak: those three
Is a single
I think so? At the very least, in the Linux implementation, the memory-map changes are made under one lock, and one IPI is performed to other cores if needed; it'd be neat to find something in the POSIX spec either way to cite though.
Even with low amounts of mispredicted branches it would be possible to (slowly) leak data, right?
The idea is that one would set the quota according to the desired leak probability (i.e., a bound on the leak bit-rate). I haven't thought too much about the control algorithm here, but perhaps one puts a module in "non-speculative mode" for the remaining duration of any individual instance alive at the time of the heightened branch-mispredict rate (one could implement this with epochs, labeling instances at startup and keeping a count of active instances in epoch N-1 and N). Or something like that. I should also note that this can be layered with existing mitigations: so e.g. any explicit bounds checks are protected already (cannot read others' heaps even in misspeculation) and this technique is mainly to address the "indirect branches can jump anywhere and find a read gadget" problem, which itself should have a lower effective bit-rate...
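The epoch idea in the comment above can be sketched roughly as follows. This is only an illustration of the bookkeeping, not anything from Wasmtime: the class name `ModuleSpeculationState`, its methods, and the per-window quota check are all hypothetical, and the sketch tracks a count per epoch rather than only epochs N-1 and N.

```python
from collections import defaultdict


class ModuleSpeculationState:
    """Hypothetical per-module bookkeeping: mispredict quota plus epoch labels."""

    def __init__(self, quota_per_window: int):
        self.quota = quota_per_window
        self.epoch = 0                  # epoch assigned to newly started instances
        self.live = defaultdict(int)    # epoch -> number of live instances
        self.fenced_through = None      # newest epoch that must drain, or None

    def start_instance(self) -> int:
        """Label a new instance with the current epoch and count it."""
        self.live[self.epoch] += 1
        return self.epoch

    def end_instance(self, epoch: int) -> None:
        """Drop an instance from its epoch's count."""
        self.live[epoch] -= 1
        if self.live[epoch] == 0:
            del self.live[epoch]

    def observe_window(self, mispredicts: int) -> None:
        """Feed in one time window's mispredict count for this module."""
        if mispredicts > self.quota:
            # Every instance alive right now has an epoch <= self.epoch.
            self.fenced_through = self.epoch
            self.epoch += 1             # later instances get a fresh epoch

    def should_fence(self) -> bool:
        """True while any instance that saw the over-quota window is alive."""
        if self.fenced_through is None:
            return False
        return any(e <= self.fenced_through for e in self.live)
```

A monitoring thread would call `observe_window` once per sampling period and switch the module to the fenced code whenever `should_fence()` is true, switching back once the old epochs have drained. How the quota maps to a leak bit-rate bound is left open here, as it is in the comment.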
I just experimented a bit with this idea by writing a little program that mmaps two assembly routines over the top of each other -- identical except for LFENCEs vs. 3-byte NOPs -- while running, and observing the effective timing difference. (The second thread can actually mmap back and forth with different duty cycles, and one can observe the runtime changing smoothly as that alters how much speculation occurs -- a very weird sort of PWM.) Here is the gist. Note that this doesn't verify the page-crossing behavior (the little snippet lives on one page); it just shows that the remap-it-live action does work.
In discussion today with @fitzgen, @jameysharp, @elliottt and @lpereira, we were considering the idea to dynamically monitor branch mispredictions and isolate execution of any Wasm instance that had used up a "misspeculation quota". I realized that actually what we could do is (effectively) turn off speculation -- you run out, you can't use it anymore! -- by dynamically inserting `lfence`s.

Specifically: the (one?) neat thing about fully coherent icaches on x86 is that we can switch out the code that's running, on the fly, even if other threads are in the middle of functions we're switching out, as long as we're very careful to do it atomically (state between any two stores is valid code).

Consider the case where we want an `lfence` before every indirect branch (say; or before every branch; orthogonal detail) and we have:

we can replace the three bytes of `nop` (`0x90, 0x90, 0x90`) with `lfence` (`0x0f`, `0xae`, `0xe8`) if we want to "turn off speculation" for this module for a little bit.

There are at least three ways to do that on an x86 machine (with coherent icaches):

`nop` to make this a 32-bit region we could overwrite with one 32-bit store.

The last one is pretty neat: `mmap` is atomic with respect to every other thread (appears as a single store in the total store order; it must, because if another thread had it mapped, it would receive an IPI, which is a synchronizing edge). So we basically "yank out the code ROM and replace it" in between instructions, and the new code doesn't speculate.

Using this, we can build a control loop in a separate thread that monitors mispredict counters, and can flip the switch at will for any module that has excessive counts. It doesn't have to be a one-way trapdoor: a module could have a "mispredict quota" per time unit, and could reset to the fast code (no `lfence`s) after a set period. There is no impact on other modules -- it only impacts the module with the mispredicts.

Finally, I suspect this will be a bit harder on non-coherent-icache architectures (aarch64, riscv64), but actually maybe the "mmap a new thing on top of running code" is enough of a jolt to yoink all other cores into coherent happiness again. Note that I haven't tested that!