`Lazy` panics under bare metal environment #154

Lorilandly · 2023-06-12T09:05:04Z

Hi,
I was trying to replace lazy_static! with Lazy. While it works fine under normal environment (tried with rust playground), when I tried running it in bare metal (qemu rv64gc no compression), code panics with message: "Lazy instance has previously been poisoned" spin-0.9.8/src/lazy.rs: 100:21.
Here is the code segment

use spin::{Lazy, Mutex};

static ST: Lazy<Mutex<St>> = Lazy::new(||{
    let s = St::new();
    s.init();
    Mutex::new(s)
});
struct St {}

impl St {
    fn new() -> Self {
        println!("new");
        Self{}
    }
    
    fn init(&self) {
        println!("init");
    }
    
    fn dos(&mut self) {
        println!("did something mutable");
    }
}

pub extern "C" fn main() -> ! {
    // do initialization stuff

    ST.lock().dos();
    
    loop{
        unsafe { wfi() };
    }
}

In my understanding, Lazy should have the same behavior with lazy_static, or are they actually different and shouldn't be mix used? I'm using it primarily for initializing peripherals like uart and stuff, if there are better alternatives, please tell me.
Thank you!

The text was updated successfully, but these errors were encountered:

zesterer · 2023-06-12T12:07:10Z

This seems to me like a bug. I'm going to investigate this today.

zesterer · 2023-06-12T12:20:39Z

I've gone of the implementation quite carefully. Is there anything unusual about the platform you're running on? Perhaps unusual or non-existent support for atomics, or a CPU that starts with SMP enabled by default?

Lorilandly · 2023-06-13T05:15:22Z

Here is my qemu command

qemu-system-riscv64 -M virt -smp 1 -m 128M \
            -display none -serial stdio \
            -bios none -kernel [PATH_TO_ELF]

The elf is compiled with the target riscv64gc-unknown-none-elf with the following rustflag

-Ctarget-feature=-c

I don't think there is there's anything unusual with the configuration. They are written to .cargo/config.toml file. I tried on macos and ubuntu, but the error is the same.

zesterer · 2023-06-13T09:10:28Z

Is it possible that you have something like an interrupt handler being invoked at an unexpected time?

Lorilandly · 2023-06-13T12:00:45Z

Just for debugging, I've changed the interrupt handler to

  .align 2
trap_entry:
  j 3f

3: // loop infinitely
  wfi
  j 3b

If the interrupt is invoked, it should end up in the dead-loop. Yet interrupt is not triggered, the code panics still.

taiki-e · 2023-06-14T07:31:02Z

(Disclaimer: I don't know what linker scripts are being used or what assembly is being generated, so I can only speak to some guesses.)

First, one of the differences between lazy_static and Lazy is that lazy_static does not store initialization code in static, while Lazy stores initialization code in static as a function pointer (as a field of Lazy). This gives the following differences:

Lazy at least is aligned to the same alignment as the function pointer. (AFAIK, except for AVR, the function pointer alignment is equal to its size, i.e., greater than 8-bit)
- This probably optimizes the alignment calculation in 8-bit CAS used in Once. ¹
In some environments, storing the function pointer in static will cause the compiler bug. However, AFAIK, this should not affect riscv*-unknown-none-elf.

Second, rustc does not automatically adjust sp alignment in riscv*-unknown-none-elf at startup. If your code or the library you are using does not adjust the alignment here, the result can be various bugs due to misalignment. (rust-lang/rust#86693) For example:

The compiler performs optimization based on the assumption that references are properly aligned, which can lead to incorrect calculations in the alignment calculation mentioned above.
In most architectures including RISC-V, one of the requirements for memory access to be atomic is that addresses be properly aligned.² Misalignment can cause silently loss of atomicity.

As mentioned at the outset, I'm not certain that misalignment is the cause of the problem, but the fact is that I could not reproduce this problem in an environment³ where sp is properly⁴ aligned.

-Ctarget-feature=-c

BTW, probably not related to this problem, but I guess that if you have not recompiled the standard library with this flag, the standard library compiled with the c target-feature will be linked, so the binary will have a mix of compressed parts and non-compressed parts.

RISC-V's {8,16}-bit CAS/RMW are internally emulated by 32-bit LR/SC, so alignment calculation is required ↩
The proper alignment here is usually the same as the size. Note that this may be larger than the alignment of the Rust integer type. ↩
The minimal setup (two linker scripts (1, 2) and one line of initialization code) to test riscv*-unknown-none-elf with QEMU used in portable-atomic/semihosting did not reproduce this problem. ↩
The proper alignment here is 16 bytes (riscv64) / 8 bytes (riscv32): https://github.com/riscv-non-isa/riscv-eabi-spec/blob/HEAD/EABI.adoc#eabi-stack-alignment ↩

Lorilandly · 2023-06-15T05:04:27Z

Thank you so much for this detailed explanation. The code was compiled with #![no_std] so it couldn't be the standard lib. After some troubleshooting I can confirm that the issue is caused by a bad alignment. However, it wasn't sp, but the alignment of .data section for whatever reason. I had:

.data : {
    . = ALIGN(4096);
    PROVIDE(_data_start = .);
    *(.sdata .sdata.*) *(.data .data.*)
    PROVIDE(_data_end = .);
} >RAM AT>RAM :data

When I changed it to

 .data : ALIGN(4096) {
    PROVIDE(_data_start = .);
    *(.sdata .sdata.*) *(.data .data.*)
    PROVIDE(_data_end = .);
} >RAM AT>RAM :data

the problem was solved. Some quirk with the linker!

Again, thank you so much for dealing with this bug that was caused by myself after all!

taiki-e · 2023-06-15T05:23:29Z

The code was compiled with #![no_std] so it couldn't be the standard lib.

To be clear: I think "standard library" in my comment was misleading (I also meant core and alloc), but even with no_std, core is always linked, and it is also common to link to alloc. In riscv64gc-unknown-none-elf, both are precompiled libraries by default.

Lorilandly · 2023-06-15T05:30:46Z

Oh ok great! That actually answers why I was having some strange issues with core libs when I tries to run the binary on the real chip. Again thanks for the clarification😄

zesterer · 2023-06-15T08:35:38Z

Woo, good to hear this is solved! I was worried this might have been a soundness issue in spin for a moment 😀

Thanks @taiki-e for providing your knowledge!

Lorilandly closed this as completed Jun 15, 2023

taiki-e mentioned this issue Oct 2, 2023

Is there a mechanism to emulate only CAS operations? taiki-e/portable-atomic#123

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

`Lazy` panics under bare metal environment #154

`Lazy` panics under bare metal environment #154

Lorilandly commented Jun 12, 2023

zesterer commented Jun 12, 2023

zesterer commented Jun 12, 2023

Lorilandly commented Jun 13, 2023

zesterer commented Jun 13, 2023

Lorilandly commented Jun 13, 2023

taiki-e commented Jun 14, 2023 •

edited

Lorilandly commented Jun 15, 2023

taiki-e commented Jun 15, 2023

Lorilandly commented Jun 15, 2023

zesterer commented Jun 15, 2023 •

edited

Lazy panics under bare metal environment #154

Lazy panics under bare metal environment #154

Comments

Lorilandly commented Jun 12, 2023

zesterer commented Jun 12, 2023

zesterer commented Jun 12, 2023

Lorilandly commented Jun 13, 2023

zesterer commented Jun 13, 2023

Lorilandly commented Jun 13, 2023

taiki-e commented Jun 14, 2023 • edited

Footnotes

Lorilandly commented Jun 15, 2023

taiki-e commented Jun 15, 2023

Lorilandly commented Jun 15, 2023

zesterer commented Jun 15, 2023 • edited

`Lazy` panics under bare metal environment #154

`Lazy` panics under bare metal environment #154

taiki-e commented Jun 14, 2023 •

edited

zesterer commented Jun 15, 2023 •

edited