Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Lazy panics under bare metal environment #154

Closed
Lorilandly opened this issue Jun 12, 2023 · 10 comments
Closed

Lazy panics under bare metal environment #154

Lorilandly opened this issue Jun 12, 2023 · 10 comments

Comments

@Lorilandly
Copy link

Hi,
I was trying to replace lazy_static! with Lazy. While it works fine under normal environment (tried with rust playground), when I tried running it in bare metal (qemu rv64gc no compression), code panics with message: "Lazy instance has previously been poisoned" spin-0.9.8/src/lazy.rs: 100:21.
Here is the code segment

use spin::{Lazy, Mutex};

static ST: Lazy<Mutex<St>> = Lazy::new(||{
    let s = St::new();
    s.init();
    Mutex::new(s)
});
struct St {}

impl St {
    fn new() -> Self {
        println!("new");
        Self{}
    }
    
    fn init(&self) {
        println!("init");
    }
    
    fn dos(&mut self) {
        println!("did something mutable");
    }
}

pub extern "C" fn main() -> ! {
    // do initialization stuff

    ST.lock().dos();
    
    loop{
        unsafe { wfi() };
    }
}

In my understanding, Lazy should have the same behavior with lazy_static, or are they actually different and shouldn't be mix used? I'm using it primarily for initializing peripherals like uart and stuff, if there are better alternatives, please tell me.
Thank you!

@zesterer
Copy link
Collaborator

This seems to me like a bug. I'm going to investigate this today.

@zesterer
Copy link
Collaborator

I've gone of the implementation quite carefully. Is there anything unusual about the platform you're running on? Perhaps unusual or non-existent support for atomics, or a CPU that starts with SMP enabled by default?

@Lorilandly
Copy link
Author

Here is my qemu command

qemu-system-riscv64 -M virt -smp 1 -m 128M \
            -display none -serial stdio \
            -bios none -kernel [PATH_TO_ELF]

The elf is compiled with the target riscv64gc-unknown-none-elf with the following rustflag

-Ctarget-feature=-c

I don't think there is there's anything unusual with the configuration. They are written to .cargo/config.toml file. I tried on macos and ubuntu, but the error is the same.

@zesterer
Copy link
Collaborator

Is it possible that you have something like an interrupt handler being invoked at an unexpected time?

@Lorilandly
Copy link
Author

Just for debugging, I've changed the interrupt handler to

  .align 2
trap_entry:
  j 3f

3: // loop infinitely
  wfi
  j 3b

If the interrupt is invoked, it should end up in the dead-loop. Yet interrupt is not triggered, the code panics still.

@taiki-e
Copy link
Contributor

taiki-e commented Jun 14, 2023

(Disclaimer: I don't know what linker scripts are being used or what assembly is being generated, so I can only speak to some guesses.)

First, one of the differences between lazy_static and Lazy is that lazy_static does not store initialization code in static, while Lazy stores initialization code in static as a function pointer (as a field of Lazy). This gives the following differences:

  • Lazy at least is aligned to the same alignment as the function pointer. (AFAIK, except for AVR, the function pointer alignment is equal to its size, i.e., greater than 8-bit)
    • This probably optimizes the alignment calculation in 8-bit CAS used in Once. 1
  • In some environments, storing the function pointer in static will cause the compiler bug. However, AFAIK, this should not affect riscv*-unknown-none-elf.

Second, rustc does not automatically adjust sp alignment in riscv*-unknown-none-elf at startup. If your code or the library you are using does not adjust the alignment here, the result can be various bugs due to misalignment. (rust-lang/rust#86693) For example:

  • The compiler performs optimization based on the assumption that references are properly aligned, which can lead to incorrect calculations in the alignment calculation mentioned above.
  • In most architectures including RISC-V, one of the requirements for memory access to be atomic is that addresses be properly aligned.2 Misalignment can cause silently loss of atomicity.

As mentioned at the outset, I'm not certain that misalignment is the cause of the problem, but the fact is that I could not reproduce this problem in an environment3 where sp is properly4 aligned.


-Ctarget-feature=-c

BTW, probably not related to this problem, but I guess that if you have not recompiled the standard library with this flag, the standard library compiled with the c target-feature will be linked, so the binary will have a mix of compressed parts and non-compressed parts.

Footnotes

  1. RISC-V's {8,16}-bit CAS/RMW are internally emulated by 32-bit LR/SC, so alignment calculation is required

  2. The proper alignment here is usually the same as the size. Note that this may be larger than the alignment of the Rust integer type.

  3. The minimal setup (two linker scripts (1, 2) and one line of initialization code) to test riscv*-unknown-none-elf with QEMU used in portable-atomic/semihosting did not reproduce this problem.

  4. The proper alignment here is 16 bytes (riscv64) / 8 bytes (riscv32): https://github.com/riscv-non-isa/riscv-eabi-spec/blob/HEAD/EABI.adoc#eabi-stack-alignment

@Lorilandly
Copy link
Author

Thank you so much for this detailed explanation. The code was compiled with #![no_std] so it couldn't be the standard lib. After some troubleshooting I can confirm that the issue is caused by a bad alignment. However, it wasn't sp, but the alignment of .data section for whatever reason. I had:

.data : {
    . = ALIGN(4096);
    PROVIDE(_data_start = .);
    *(.sdata .sdata.*) *(.data .data.*)
    PROVIDE(_data_end = .);
} >RAM AT>RAM :data

When I changed it to

 .data : ALIGN(4096) {
    PROVIDE(_data_start = .);
    *(.sdata .sdata.*) *(.data .data.*)
    PROVIDE(_data_end = .);
} >RAM AT>RAM :data

the problem was solved. Some quirk with the linker!

Again, thank you so much for dealing with this bug that was caused by myself after all!

@taiki-e
Copy link
Contributor

taiki-e commented Jun 15, 2023

The code was compiled with #![no_std] so it couldn't be the standard lib.

To be clear: I think "standard library" in my comment was misleading (I also meant core and alloc), but even with no_std, core is always linked, and it is also common to link to alloc. In riscv64gc-unknown-none-elf, both are precompiled libraries by default.

@Lorilandly
Copy link
Author

Oh ok great! That actually answers why I was having some strange issues with core libs when I tries to run the binary on the real chip. Again thanks for the clarification😄

@zesterer
Copy link
Collaborator

zesterer commented Jun 15, 2023

Woo, good to hear this is solved! I was worried this might have been a soundness issue in spin for a moment 😀

Thanks @taiki-e for providing your knowledge!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants