Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

"fat" LTO Miscompilation #146

Open
zRedShift opened this issue Nov 5, 2022 · 6 comments
Open

"fat" LTO Miscompilation #146

zRedShift opened this issue Nov 5, 2022 · 6 comments

Comments

@zRedShift
Copy link

I tried this code (repo: esp-spinlock-repro):

fn main() -> anyhow::Result<()> {
    esp_idf_sys::link_patches();
    esp_idf_hal::task::critical_section::link();
    esp_idf_svc::timer::embassy_time::driver::link();
    esp_idf_svc::timer::embassy_time::queue::link();
    esp_idf_svc::log::EspLogger::initialize_default();

    let executor = esp_idf_hal::task::executor::EspExecutor::<4, _>::new();
    let task = executor.spawn_local(async {
        let mut counter = 0.0f64;
        loop {
            log::info!("counter at {counter}");
            embassy_time::Timer::after(embassy_time::Duration::from_millis(10)).await;
            counter += 1.0;
        }
    })?;

    executor.run_tasks(|| true, [task]);

    Ok(())
}

When I'm running code I've compiled with "fat" LTO, on 1.64.0.0 or 1.65.0.0, with the xtensa-esp32s3-espidf target on my ESP32S3, regardless of the esp-idf branch, my code hangs indefinitely after 256 iteratiots:

I (314) esp_spinlock_repro: counter at 0
I (324) esp_spinlock_repro: counter at 1
I (334) esp_spinlock_repro: counter at 2
I (344) esp_spinlock_repro: counter at 3
...

I (2914) esp_spinlock_repro: counter at 252
I (2924) esp_spinlock_repro: counter at 253
I (2934) esp_spinlock_repro: counter at 254
I (2944) esp_spinlock_repro: counter at 255

If I disable LTO or move to "thin" LTO, the code keeps chugging along with no end in sight (what I expect to see).

Meta

rustc --version --verbose:

rustc 1.65.0-nightly (bf1b78e4a 2022-11-02)
binary: rustc
commit-hash: bf1b78e4a60871d6fed85f288274db7cdbd82912
commit-date: 2022-11-02
host: x86_64-unknown-linux-gnu
release: 1.65.0-nightly
LLVM version: 15.0.0
@N3xed
Copy link

N3xed commented Nov 5, 2022

Likely related to a scheduling issue I've had a while back.

Basically, the MPMC queue used for the scheduler is implemented as a ringbuffer using atomic operations, and that ringbuffer has 8 bit (i.e. 256) indices.
Now presumably through some atomics bug, wrapping arithmetic bug or a bug in the implementation, the ringbuffer never wraps around, leaving the queue in the full state indefinitely. So no more tasks can be enqueued and so the timer stops working.

Initially, @ivmarkov solved it by using the crossbeam_queue::ArrayQueue as the mpmc queue instead. But I think, more likely is that the issue just got delayed since crossbeams's ArrayQueue uses 32 bit indices and its implementation is based on the same source (i.e. Dmitry Vyukov's bounded MPMC queue). With 32 bit indices the mpmc queue's index would need 2^24 times longer to reach u32::MAX, though I've not tested it if the same wrap-around issue also happens with bigger indices. (Note also that the issue also "goes away" if heapless's feature mpmc_large is used).

Since heapless::mpmc::MpMcQueue is the mpmc queue used by default in edge-executor as its crossbeam-queue feature is disabled by default with default-features = false in esp-idf-hal, I think this is indeed the same issue.

My issue occurred regardless of LTO mode, and someone (I can't remember who) tested it on the esp32c3 where it worked fine. Also note, that this issue even occurred in wokwi for me, so unlikely to be a hardware bug.

@MabezDev
Copy link
Member

@N3xed Do you know where the original discussion for the MPMC issue exists? I can't seem to find it across the various esp-rs repos. Might be helpful in producing a smaller example, especially if it was happening without LTO.

@N3xed
Copy link

N3xed commented Nov 24, 2022

@N3xed Do you know where the original discussion for the MPMC issue exists? I can't seem to find it across the various esp-rs repos. Might be helpful in producing a smaller example, especially if it was happening without LTO.

Unfortunately, I've never created an issue for this, so this is the only place as far as I know. When I discovered this issue I worked around it, went on to other stuff, and never got around to it until @zRedShift brought it up in the matrix chat.

There is my discussion about it with @ivmarkov in the matrix chat though.

@zRedShift
Copy link
Author

@MabezDev @N3xed @ivmarkov
I significantly reduced the minimal reproduction and moved it to no_std. I also updated the README.md and added the disassembly diff (sort of).

#![no_std]
#![no_main]

use esp_backtrace as _;

#[xtensa_lx_rt::entry]
fn main() -> ! {
    let queue = heapless::mpmc::MpMcQueue::<(), 8>::new();
    let mut counter = 0;
    loop {
        esp_println::println!("counter at {counter}");
        queue.enqueue(()).unwrap();
        queue.dequeue().unwrap();
        counter += 1;
    }
}

Please look for more at: https://github.com/zRedShift/esp-spinlock-repro

@zRedShift
Copy link
Author

@MabezDev this is getting to the limit of what I can do, but compiling the following branch: https://github.com/zRedShift/esp-spinlock-repro/blob/without-heapless/src/main.rs (without heapless) with thin and fat LTO and comparing the disassembly (the decompiled C code seemingly is the exact same), we have only a minimal set of changes:

thin LTO
                             **************************************************************
                             *                          FUNCTION                          *
                             **************************************************************
                             uint __stdcall inner(uint param_1)
             uint              o2:4           <RETURN>
             uint              i2:4           param_1
                             _ZN18esp_spinlock_repro5inner17h372c0f243cfa6d  XREF[3]:     420003b4(*),
                             esp_spinlock_repro::esp_spinlock_repro::inner                __xtensa_lx_rt_main_trampoline:4
                                                                                          .debug_frame::00000d8c(*)
        420036f4 36 81 00        entry      a1,0x40
        420036f7 81 1f f3        l32r       a8,DAT_42000374                                  = 07060504h
        420036fa 89 31           s32i.n     a8,a1,0xc
        420036fc 81 1f f3        l32r       a8,DAT_42000378                                  = 03020100h
        420036ff 89 21           s32i.n     a8,a1,0x8
        42003701 0c 08           movi.n     a8,0x0
        42003703 82 51 08        s16i       a8,a1,0x10
        42003706 0c 78           movi.n     a8,0x7
        42003708 89 11           s32i.n     a8,a1,0x4
        4200370a 0c 39           movi.n     a9,0x3
        4200370c a2 a0 ff        movi       a10,0xff
        4200370f 7c fb           movi.n     a11,0xff
        42003711 46 03 00        j          LAB_42003722
                             LAB_42003714                                    XREF[1]:     420037ea(j)
        42003714 82 c1 08        addi       a8,a1,0x8
        42003717 d0 88 20        or         a8,a8,a13
        4200371a c0 20 00        memw
        4200371d 8b cc           addi.n     a12,a12,0x8
        4200371f c2 48 00        s8i        a12,a8,0x0
                             LAB_42003722                                    XREF[2]:     42003711(j), 42003805(j)
        42003722 d2 01 11        l8ui       a13,a1,0x11
        42003725 88 11           l32i.n     a8,a1,0x4
        42003727 80 ed 10        and        a14,a13,a8
        4200372a 1b cd           addi.n     a12,a13,0x1
                             LAB_4200372c                                    XREF[1]:     4200377d(j)
        4200372c f2 c1 08        addi       a15,a1,0x8
        4200372f e0 ff 20        or         a15,a15,a14
        42003732 f2 0f 00        l8ui       a15,a15,0x0
        42003735 c0 20 00        memw
        42003738 d7 1f 02        beq        a15,a13,LAB_4200373e
        4200373b 46 2f 00        j          LAB_420037fc
                             LAB_4200373e                                    XREF[1]:     42003738(j)
        4200373e f2 c1 08        addi       a15,a1,0x8
        42003741 9b ff           addi.n     a15,a15,0x9
        42003743 f0 69 10        and        a6,a9,a15
        42003746 60 7f c0        sub        a7,a15,a6
        42003749 d0 f6 11        slli       a15,a6,0x3
        4200374c 00 1f 40        ssl        a15
        4200374f 00 6a a1        sll        a6,a10
        42003752 b0 66 30        xor        a6,a6,a11
        42003755 58 07           l32i.n     a5,a7,0x0
        42003757 60 25 10        and        a2,a5,a6
        4200375a 00 5d a1        sll        a5,a13
        4200375d 00 4c a1        sll        a4,a12
                             LAB_42003760                                    XREF[1]:     42003771(j)
        42003760 8d 02           mov.n      a8,a2
        42003762 80 25 20        or         a2,a5,a8
        42003765 80 34 20        or         a3,a4,a8
        42003768 20 0c 13        wsr        a2,SCOMPARE1
        4200376b 32 e7 00        s32c1i     a3,a7,0x0
        4200376e 60 23 10        and        a2,a3,a6
        42003771 87 92 eb        bne        a2,a8,LAB_42003760
        42003774 00 0f 40        ssr        a15
        42003777 30 80 91        srl        a8,a3
        4200377a a0 88 10        and        a8,a8,a10
        4200377d d7 98 ab        bne        a8,a13,LAB_4200372c
        42003780 82 c1 08        addi       a8,a1,0x8
        42003783 e0 88 20        or         a8,a8,a14
        42003786 c0 20 00        memw
        42003789 c2 48 00        s8i        a12,a8,0x0
                             LAB_4200378c                                    XREF[1]:     420037f6(j)
        4200378c c2 01 10        l8ui       a12,a1,0x10
        4200378f 88 11           l32i.n     a8,a1,0x4
        42003791 80 dc 10        and        a13,a12,a8
        42003794 1b ec           addi.n     a14,a12,0x1
                             LAB_42003796                                    XREF[1]:     420037e7(j)
        42003796 a0 8e 10        and        a8,a14,a10
        42003799 f2 c1 08        addi       a15,a1,0x8
        4200379c d0 ff 20        or         a15,a15,a13
        4200379f 62 0f 00        l8ui       a6,a15,0x0
        420037a2 c0 20 00        memw
        420037a5 87 96 44        bne        a6,a8,LAB_420037ed
        420037a8 82 c1 08        addi       a8,a1,0x8
        420037ab 8b 88           addi.n     a8,a8,0x8
        420037ad 80 f9 10        and        a15,a9,a8
        420037b0 f0 78 c0        sub        a7,a8,a15
        420037b3 d0 ff 11        slli       a15,a15,0x3
        420037b6 00 1f 40        ssl        a15
        420037b9 00 8a a1        sll        a8,a10
        420037bc b0 58 30        xor        a5,a8,a11
        420037bf 88 07           l32i.n     a8,a7,0x0
        420037c1 50 28 10        and        a2,a8,a5
        420037c4 00 4c a1        sll        a4,a12
        420037c7 00 66 a1        sll        a6,a6
                             LAB_420037ca                                    XREF[1]:     420037db(j)
        420037ca 8d 02           mov.n      a8,a2
        420037cc 80 24 20        or         a2,a4,a8
        420037cf 80 36 20        or         a3,a6,a8
        420037d2 20 0c 13        wsr        a2,SCOMPARE1
        420037d5 32 e7 00        s32c1i     a3,a7,0x0
        420037d8 50 23 10        and        a2,a3,a5
        420037db 87 92 eb        bne        a2,a8,LAB_420037ca
        420037de 00 0f 40        ssr        a15
        420037e1 30 80 91        srl        a8,a3
        420037e4 a0 88 10        and        a8,a8,a10
        420037e7 c7 98 ab        bne        a8,a12,LAB_42003796
        420037ea 86 c9 ff        j          LAB_42003714
                             LAB_420037ed                                    XREF[1]:     420037a5(j)
        420037ed e0 86 c0        sub        a8,a6,a14
        420037f0 80 88 01        slli       a8,a8,0x18
        420037f3 80 88 31        srai       a8,a8,0x18
        420037f6 d6 28 f9        bgez       a8,LAB_4200378c
        420037f9 c6 02 00        j          LAB_42003808
                             LAB_420037fc                                    XREF[1]:     4200373b(j)
        420037fc d0 8f c0        sub        a8,a15,a13
        420037ff 80 88 01        slli       a8,a8,0x18
        42003802 80 88 31        srai       a8,a8,0x18
        42003805 d6 98 f1        bgez       a8,LAB_42003722
                             LAB_42003808                                    XREF[1]:     420037f9(j)
        42003808 1d f0           retw.n
        4200380a 00              ??         00h
        4200380b 00              ??         00h
fat LTO
                             **************************************************************
                             * esp_spinlock_repro::inner                                  *
                             **************************************************************
                             uint __stdcall inner(uint param_1)
             uint              o2:4           <RETURN>
             uint              i2:4           param_1
                             _ZN18esp_spinlock_repro5inner17h4a0b3ef61100c1  XREF[3]:     42000068(*), main:42000681(c),
                             esp_spinlock_repro::inner
                                                                                          .debug_frame::0000001c(*)
        4200045c 36 81 00        entry      a1,0x40
        4200045f 81 f0 fe        l32r       a8,_text_start                                   = 07060504h
        42000462 89 31           s32i.n     a8,a1,0xc
        42000464 81 f0 fe        l32r       a8,DAT_42000024                                  = 03020100h
        42000467 89 21           s32i.n     a8,a1,0x8
        42000469 0c 08           movi.n     a8,0x0
        4200046b 82 51 08        s16i       a8,a1,0x10
        4200046e 0c 78           movi.n     a8,0x7
        42000470 89 11           s32i.n     a8,a1,0x4
        42000472 0c 39           movi.n     a9,0x3
        42000474 a2 a0 ff        movi       a10,0xff
        42000477 7c fb           movi.n     a11,0xff
        42000479 46 03 00        j          LAB_4200048a
                             LAB_4200047c                                    XREF[1]:     42000552(j)
        4200047c 82 c1 08        addi       a8,a1,0x8
        4200047f d0 88 20        or         a8,a8,a13
        42000482 c0 20 00        memw
        42000485 8b cc           addi.n     a12,a12,0x8
        42000487 c2 48 00        s8i        a12,a8,0x0
                             LAB_4200048a                                    XREF[2]:     42000479(j), 4200056d(j)
        4200048a d2 01 11        l8ui       a13,a1,0x11
        4200048d 88 11           l32i.n     a8,a1,0x4
        4200048f 80 ed 10        and        a14,a13,a8
        42000492 1b cd           addi.n     a12,a13,0x1
                             LAB_42000494                                    XREF[1]:     420004e5(j)
        42000494 f2 c1 08        addi       a15,a1,0x8
        42000497 e0 ff 20        or         a15,a15,a14
        4200049a f2 0f 00        l8ui       a15,a15,0x0
        4200049d c0 20 00        memw
        420004a0 d7 1f 02        beq        a15,a13,LAB_420004a6
        420004a3 46 2f 00        j          LAB_42000564
                             LAB_420004a6                                    XREF[1]:     420004a0(j)
        420004a6 f2 c1 08        addi       a15,a1,0x8
        420004a9 9b ff           addi.n     a15,a15,0x9
        420004ab f0 69 10        and        a6,a9,a15
        420004ae 60 7f c0        sub        a7,a15,a6
        420004b1 d0 f6 11        slli       a15,a6,0x3
        420004b4 00 1f 40        ssl        a15
        420004b7 00 6a a1        sll        a6,a10
        420004ba b0 66 30        xor        a6,a6,a11
        420004bd 58 07           l32i.n     a5,a7,0x0
        420004bf 60 25 10        and        a2,a5,a6
        420004c2 00 5d a1        sll        a5,a13
        420004c5 00 4c a1        sll        a4,a12
                             LAB_420004c8                                    XREF[1]:     420004d9(j)
        420004c8 8d 02           mov.n      a8,a2
        420004ca 80 25 20        or         a2,a5,a8
        420004cd 80 34 20        or         a3,a4,a8
        420004d0 20 0c 13        wsr        a2,SCOMPARE1
        420004d3 32 e7 00        s32c1i     a3,a7,0x0
        420004d6 60 23 10        and        a2,a3,a6
        420004d9 87 92 eb        bne        a2,a8,LAB_420004c8
        420004dc 00 0f 40        ssr        a15
        420004df 30 80 91        srl        a8,a3
        420004e2 a0 88 10        and        a8,a8,a10
        420004e5 d7 98 ab        bne        a8,a13,LAB_42000494
        420004e8 82 c1 08        addi       a8,a1,0x8
        420004eb e0 88 20        or         a8,a8,a14
        420004ee c0 20 00        memw
        420004f1 c2 48 00        s8i        a12,a8,0x0
                             LAB_420004f4                                    XREF[1]:     4200055e(j)
        420004f4 c2 01 10        l8ui       a12,a1,0x10
        420004f7 88 11           l32i.n     a8,a1,0x4
        420004f9 80 dc 10        and        a13,a12,a8
        420004fc 1b ec           addi.n     a14,a12,0x1
                             LAB_420004fe                                    XREF[1]:     4200054f(j)
        420004fe a0 8e 10        and        a8,a14,a10
        42000501 f2 c1 08        addi       a15,a1,0x8
        42000504 d0 ff 20        or         a15,a15,a13
        42000507 f2 0f 00        l8ui       a15,a15,0x0
        4200050a c0 20 00        memw
        4200050d 87 9f 44        bne        a15,a8,LAB_42000555
        42000510 82 c1 08        addi       a8,a1,0x8
        42000513 8b 88           addi.n     a8,a8,0x8
        42000515 80 f9 10        and        a15,a9,a8
        42000518 f0 78 c0        sub        a7,a8,a15
        4200051b d0 ff 11        slli       a15,a15,0x3
        4200051e 00 1f 40        ssl        a15
        42000521 00 8a a1        sll        a8,a10
        42000524 b0 68 30        xor        a6,a8,a11
        42000527 88 07           l32i.n     a8,a7,0x0
        42000529 60 28 10        and        a2,a8,a6
        4200052c 00 5c a1        sll        a5,a12
        4200052f 00 4e a1        sll        a4,a14
                             LAB_42000532                                    XREF[1]:     42000543(j)
        42000532 8d 02           mov.n      a8,a2
        42000534 80 25 20        or         a2,a5,a8
        42000537 80 34 20        or         a3,a4,a8
        4200053a 20 0c 13        wsr        a2,SCOMPARE1
        4200053d 32 e7 00        s32c1i     a3,a7,0x0
        42000540 60 23 10        and        a2,a3,a6
        42000543 87 92 eb        bne        a2,a8,LAB_42000532
        42000546 00 0f 40        ssr        a15
        42000549 30 80 91        srl        a8,a3
        4200054c a0 88 10        and        a8,a8,a10
        4200054f c7 98 ab        bne        a8,a12,LAB_420004fe
        42000552 86 c9 ff        j          LAB_4200047c
                             LAB_42000555                                    XREF[1]:     4200050d(j)
        42000555 e0 8f c0        sub        a8,a15,a14
        42000558 80 88 01        slli       a8,a8,0x18
        4200055b 80 88 31        srai       a8,a8,0x18
        4200055e d6 28 f9        bgez       a8,LAB_420004f4
        42000561 c6 02 00        j          LAB_42000570
                             LAB_42000564                                    XREF[1]:     420004a3(j)
        42000564 d0 8f c0        sub        a8,a15,a13
        42000567 80 88 01        slli       a8,a8,0x18
        4200056a 80 88 31        srai       a8,a8,0x18
        4200056d d6 98 f1        bgez       a8,LAB_4200048a
                             LAB_42000570                                    XREF[1]:     42000561(j)
        42000570 1d f0           retw.n
        42000572 00              ??         00h
        42000573 00              ??         00h

Besides label, cross-references and debug-info, the only difference is the register allocation, which after equalizing the labels comes down to this diff:

79c79
<         4200379f 62 0f 00        l8ui       a6,a15,0x0
---
>         4200379f f2 0f 00        l8ui       a15,a15,0x0
81c81
<         420037a5 87 96 44        bne        a6,a8,LAB_420037ed
---
>         420037a5 87 9f 44        bne        a15,a8,LAB_420037ed
89c89
<         420037bc b0 58 30        xor        a5,a8,a11
---
>         420037bc b0 68 30        xor        a6,a8,a11
91,93c91,93
<         420037c1 50 28 10        and        a2,a8,a5
<         420037c4 00 4c a1        sll        a4,a12
<         420037c7 00 66 a1        sll        a6,a6
---
>         420037c1 60 28 10        and        a2,a8,a6
>         420037c4 00 5c a1        sll        a5,a12
>         420037c7 00 4e a1        sll        a4,a14
96,97c96,97
<         420037cc 80 24 20        or         a2,a4,a8
<         420037cf 80 36 20        or         a3,a6,a8
---
>         420037cc 80 25 20        or         a2,a5,a8
>         420037cf 80 34 20        or         a3,a4,a8
100c100
<         420037d8 50 23 10        and        a2,a3,a5
---
>         420037d8 60 23 10        and        a2,a3,a6
108c108
<         420037ed e0 86 c0        sub        a8,a6,a14
---
>         420037ed e0 8f c0        sub        a8,a15,a14

image

@zRedShift
Copy link
Author

Narrowed it down to 4200052f 00 4e a1 sll a4,a14 in the fat lto file
it should be 4200052f 00 4f a1 sll a4,a15, right before the CAS loop.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants