Sporadic panic in async-io thread with esp-sys-svc async TLS example #84
Comments
To be more concrete you never saw it before applying this optimization? |
To be honest, it seems to depend on code layout. It might trigger in other conditions, I have not managed to narrow it down. It smells like a race condition of some sort. |
cc @ivmarkov Any idea why this would be happening? |
The issue actually appears to be in our locking implementation, specifically |
No idea. Will try to reproduce once I'm back. |
@ivmarkov have you had any luck reproducing this? Let me know if you need further input or testing on my part. |
Couldn't get to it yet. Sorry about that! Very likely I will try to do something related to async-io in the next couple of days. |
I am running into the same issue as described by @jothan. I have done some investigation and I believe the issue boils down to the pinning of event_listener. In this case the address generated is not 32-bit aligned (e.g. 0x3ffe860f; not sure if this matters) but, more importantly, it points back to an area of the stack where the first byte is not 0x0. Using GDB I can see that the value it points to is 0x3f (which is the MSB of an address used during the creation of the thread). Since the remaining bytes are 0, maybe 0xffe8610 would be the correct pinned address?
let event_listener = EventListener::new();
let _ = event_listener.is_listening();
pin!(event_listener);
With this code, the pinned address will be 32-bit aligned, pointing to a stack area where all bytes are 0, and the bug doesn't happen. For reference, here are the two assembly dumps of the function. Hope that helps @ivmarkov |
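For anyone who wants to check the alignment observation without GDB, here is a minimal sketch of how the address of a stack-pinned value can be printed and tested against the type's alignment. It uses only `std`; `Listener` is a hypothetical stand-in for `EventListener` (whose real layout is not reproduced here), and `pinned_addr` and `is_aligned_for` are helper names made up for illustration.

```rust
use std::mem::align_of;
use std::pin::{pin, Pin};

/// Hypothetical stand-in for `EventListener`; only its alignment matters here.
struct Listener {
    _state: usize,
    _flag: bool,
}

/// Returns the address of the value behind a pinned mutable reference.
fn pinned_addr<T>(p: &Pin<&mut T>) -> usize {
    let r: &T = &**p;
    r as *const T as usize
}

/// True if `addr` is a multiple of the alignment required by `T`.
fn is_aligned_for<T>(addr: usize) -> bool {
    addr % align_of::<T>() == 0
}

fn main() {
    // Pin the value on the stack, as the example in the thread does.
    let listener = pin!(Listener { _state: 0, _flag: false });
    let addr = pinned_addr(&listener);
    println!(
        "pinned at {addr:#x}, aligned for Listener: {}",
        is_aligned_for::<Listener>(addr)
    );

    // The address reported above is odd, so it cannot be 4-byte aligned.
    println!(
        "0x3ffe860f aligned for u32: {}",
        is_aligned_for::<u32>(0x3ffe860f)
    );
}
```

On a correctly compiled build the first check should always report an aligned address, since the compiler places the pinned value on the stack itself. An unaligned result on the target would point at the generated code rather than at the Rust source.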
@npmenard Thank you for the detailed analysis! I can reproduce the issue (though I haven't connected with GDB to examine the address), and I confirm that the dummy (Also, not sure if that matters, but I'm consistently reproducing the issue with opt-levels "s" and "z" (haven't tried with other opt levels yet, will do)). As for:
I assume by address 0x3ffe860f you mean the
Even more suspicious. This sounds a bit like a bug in the LLVM xtensa backend. @MabezDev - sorry for pulling you here, but do you think you and/or Andrei can help? There are assembly dumps attached to the previous comment. |
It's the address that would be printed when formatting event_listener with I have noticed something odd, but I am not sure if it will shed some light: whenever the bug occurs the value printed by Lastly, I am attaching a backtrace showing the code path that writes to the stack location that the Pin will ultimately point to. It happens when a WindowOverflow8 exception is triggered while setting a thread-local variable. Hopefully this helps a bit. Let me know if there is anything else I can do to help. |
With the new Rust 1.78 for ESP32 (in pre-release in the rust-build project), debug assertions for std are now enabled and seem to catch a similar alignment problem with an assertion in core::ptr::read::precondition_check. Full backtrace: backtrace.txt Partial backtrace:
|
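As background on what that assertion guards: `core::ptr::read` requires its pointer to be properly aligned for the pointee type, and with debug assertions enabled a misaligned pointer trips the precondition check instead of silently reading. The sketch below only illustrates that condition on a plain byte buffer. It is not the code path from the backtrace, and `read_u32_at` is a hypothetical helper.

```rust
use std::mem::align_of;
use std::ptr;

/// Reads a native-endian `u32` starting at `offset` in `buf`.
/// Uses an aligned read only when the address permits it, and
/// `read_unaligned` otherwise. Returns `None` if out of bounds.
fn read_u32_at(buf: &[u8], offset: usize) -> Option<u32> {
    let bytes = buf.get(offset..offset.checked_add(4)?)?;
    let p = bytes.as_ptr() as *const u32;
    let value = if (p as usize) % align_of::<u32>() == 0 {
        // SAFETY: `bytes` is 4 readable bytes and `p` is aligned for u32.
        unsafe { ptr::read(p) }
    } else {
        // SAFETY: `bytes` is 4 readable bytes; no alignment is required.
        unsafe { ptr::read_unaligned(p) }
    };
    Some(value)
}

fn main() {
    let buf = [0x11u8, 0x22, 0x33, 0x44, 0x55, 0x66, 0x77, 0x88];
    for offset in 0..=4 {
        // At most one in four consecutive addresses is 4-byte aligned;
        // the others take the `read_unaligned` branch.
        println!("offset {offset}: {:#010x}", read_u32_at(&buf, offset).unwrap());
    }
}
```

Calling `ptr::read` directly on a misaligned `*const u32` is undefined behaviour, which is what the new debug assertion reports. The open question in this thread is why a pointer that should be aligned ends up misaligned on Xtensa in the first place.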
Looking at the backtraces, it would seem that OnceCell does something weird that causes EventListener trouble down the line. |
Why would OnceCell be doing something weird and, moreover, why would this only happen with ESP-IDF + Xtensa and not on x86 and (likely) ARM? I still find it more likely that the root cause is a miscompilation bug in the LLVM Xtensa backend. Will open a bug at the Xtensa LLVM fork shortly, btw. And will test (again) with riscv32imc on the ESP-IDF, though I'm relatively sure the issue did not happen there. |
@ivmarkov Doing something weird in this case is triggering the miscompilation. I don't doubt your analysis. |
Other than dereferencing a raw pointer, I don't see anything else weird, unfortunately... |
I know there is little to no progress here, for which I apologize. In the meantime, what I found to work for me is setting the version of We'll resume the work on finding the root cause shortly. |
Code: https://github.com/esp-rs/esp-idf-svc/blob/4fff46bba1be66ae3c6945d3b8bda30a589f6f6b/examples/tls_async.rs
Backtrace:
backtrace.txt
Crate versions:
Cargo.lock
I am running the example on an ESPcam board, with an esp32 (revision v3.0) Xtensa module (Tinker AI ESP32-S, to be exact).
The example usually crashes once or twice after reset before working.
I've only managed to reproduce this with an optimized debug build with codegen-units=1.