Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CI: MemorySanitizer is erroring (ASAN job) #2579

Open
tcharding opened this issue Mar 12, 2024 · 12 comments
Open

CI: MemorySanitizer is erroring (ASAN job) #2579

tcharding opened this issue Mar 12, 2024 · 12 comments

Comments

@tcharding
Copy link
Member

tcharding commented Mar 12, 2024

We are hitting an intermittent MemorySanitizer (MSAN) error, locally on master I can repro it every time but some CI runs are hitting it and some not:

Minimal Reproducer

Apply the following diff:

diff --git a/contrib/run_task.sh b/contrib/run_task.sh
index f7aa21ac..9b92399c 100755
--- a/contrib/run_task.sh
+++ b/contrib/run_task.sh
@@ -208,14 +208,9 @@ do_wasm() {
 
 do_asan() {
     cargo clean
-    CC='clang -fsanitize=address -fno-omit-frame-pointer'                                        \
-      RUSTFLAGS='-Zsanitizer=address -Clinker=clang -Cforce-frame-pointers=yes'                    \
-      ASAN_OPTIONS='detect_leaks=1 detect_invalid_pointer_pairs=1 detect_stack_use_after_return=1' \
-      cargo test --lib --no-default-features --features="$ASAN_FEATURES" -Zbuild-std --target x86_64-unknown-linux-gnu
-    cargo clean
     CC='clang -fsanitize=memory -fno-omit-frame-pointer'                                         \
       RUSTFLAGS='-Zsanitizer=memory -Zsanitizer-memory-track-origins -Cforce-frame-pointers=yes'   \
-      cargo test --lib --no-default-features --features="$ASAN_FEATURES" -Zbuild-std --target x86_64-unknown-linux-gnu
+      cargo test pubkey_read_write --lib --no-default-features --features="$ASAN_FEATURES" -Zbuild-std --target x86_64-unknown-linux-gnu
 }
 
 # Bench only works with a non-stable toolchain (nightly, beta).

And run RUSTUP_TOOLCHAIN=nightly contrib/run_task.sh bitcoin asan

@tcharding
Copy link
Member Author

I'm debugging now, posted the issue for any tips or pointers. I've never debugged memory sanitizer bugs before.

@tcharding
Copy link
Member Author

tcharding commented Mar 12, 2024

Putting llvm-symbolizer in my path gives better output (I just symlinked to llvm-symbolizer-15)

@tcharding
Copy link
Member Author

I was unable to reslove this today, here are my findings:

  • The call chain is bitcoin::PublicKey::from_slice -> secp256k1::PublicKey::from_slice -> ffi::secp256k1_ec_pubkey_parse
  • secp256k1_ec_pubkey_parse uses data.as_c_ptr() to get a pointer to the data array (unsigned char *)
  • Accessing the first element of the data array triggers the MSAN error
  • Right before the FFI call the data slice is non-empty
            // Sanity, just ensure we can access the first element here.
            if data[0] == 0xff {
                panic!("first element of data is 0xff")
            }
            let pointer = data.as_c_ptr();
            if pointer.is_null() {
                panic!("pointer is null");
            }

No further ideas ATM.

@tcharding
Copy link
Member Author

tcharding commented Mar 13, 2024

Had another go, I'm hesitant to say it but it might be a bug in the MemorySanitazer? Or else something odd is going on. I changed rust-secp256k1/secp256k1-sys/depend/secp256k/src/secp256k1.c to be:

int rustsecp256k1_v0_9_2_ec_pubkey_parse(const rustsecp256k1_v0_9_2_context* ctx, rustsecp256k1_v0_9_2_pubkey* _pubkey, const unsigned char *input, size_t inputlen) {
    rustsecp256k1_v0_9_2_ge Q;

    VERIFY_CHECK(ctx != NULL);

    rustsecp256k1_v0_9_2_pubkey pk;
    rustsecp256k1_v0_9_2_pubkey *pubkey = &pk;

    if (pubkey == NULL) {
        return 1;
    }

    ARG_CHECK(pubkey != NULL);
    memset(pubkey, 0, sizeof(*pubkey));
    ARG_CHECK(input != NULL);

    if (input == NULL) {
        return 20;
    }

    if (input[0] == 4) {
        return 0;
    }

    /*
      * Array access inside this function call is causing MSAN error but the array access above is ok
      * 
      * This makes me think the bug is not ours because there should be no problem passing 
      * an arg to another function within C code.
      * */
    if (!rustsecp256k1_v0_9_2_eckey_pubkey_parse(&Q, input, inputlen)) {
        return 0;
    }
    if (!rustsecp256k1_v0_9_2_ge_is_in_correct_subgroup(&Q)) {
        return 0;
    }
    rustsecp256k1_v0_9_2_pubkey_save(pubkey, &Q);
    rustsecp256k1_v0_9_2_ge_clear(&Q);
    return 1;
}

@tcharding tcharding changed the title MemorySanitizer is erroring CI: MemorySanitizer is erroring (ASAN job) Mar 14, 2024
@tcharding
Copy link
Member Author

@apoelstra if you get a chance can you give this ten minutes and give some pointers on which direction to look please?

@tcharding
Copy link
Member Author

ooo, it might not be the pub arg at all - debugging now. Posting incase you get here at the same time.

@tcharding
Copy link
Member Author

MSAN also errors for size_t arguments, even when I replace the usize on the otherside of the FFI boundry with a local variable.

Change key.rs lines 556 - 567 to be (ie, line 560 is the FFI function call)

        let mut ret = [0_u8; 66];
        let mut ret_len = 66;

        let res = unsafe {
            ffi::secp256k1_ec_pubkey_serialize(
                ffi::secp256k1_context_no_precomp,
                ret.as_mut_c_ptr(),
                &mut ret_len,
                self.as_c_ptr(),
                flag,
            )
        };

Still gives

==892789==WARNING: MemorySanitizer: use-of-uninitialized-value
    #0 0x5591f749a2b7 in rustsecp256k1_v0_9_2_ec_pubkey_serialize /home/tobin/build/github.com/tcharding/rust-secp256k1/master/secp256k1-sys/depend/secp256k1/src/secp256k1.c:279:5
    #1 0x5591f6881b09 in secp256k1::key::PublicKey::serialize_internal::hd1155a157cd04844 /home/tobin/build/github.com/tcharding/rust-secp256k1/master/src/key.rs:560:13
    #2 0x5591f6881b09 in secp256k1::key::PublicKey::serialize_uncompressed::hc4d3025786944379 /home/tobin/build/github.com/tcharding/rust-secp256k1/master/src/key.rs:550:9

@apoelstra
Copy link
Member

I cannot repro. When I try to run the test I get

FATAL: Code 0x60c539d809b0 is out of application range. Non-PIE build?
FATAL: MemorySanitizer can not mmap the shadow memory.
FATAL: Make sure to compile with -fPIE and to link with -pie.
FATAL: Disabling ASLR is known to cause this error.
FATAL: If running under GDB, try 'set disable-randomization off'.

@apoelstra
Copy link
Member

Running in gdb everything seems fine. If I run in valgrind it says ==1124717== Warning: set address range perms: large range [0x10000000000, 0x100000000000) (defined) then apperas to lock up.

apoelstra added a commit that referenced this issue Mar 15, 2024
097a00b CI: Disable MSAN job (Tobin C. Harding)

Pull request description:

  I believe there is currently a bug in the MemorySanitizer, when we pass various types across the FFI boundry MSAN gives a unititialized variable error:

  - `usize` passed as `size_t` (cannot be uninitialized)
  - byte slice passed as `const char *`

  In order to let other work continue disable the MSAN job.

  The issue is further described in #2579

ACKs for top commit:
  apoelstra:
    ACK 097a00b

Tree-SHA512: 0252ef0bd21afd55e878e495be1182d5b45b54e931dca9eb2e111731acd889cb5f0eb38b670b239cfb8511af5bf2145875b8853bd919d46bc278de12cda93414
@apoelstra
Copy link
Member

I wonder if we are hitting bitcoin-core/secp256k1#1506

@real-or-random
Copy link
Contributor

I wonder if we are hitting bitcoin-core/secp256k1#1506

Yes, this really looks like google/sanitizers#1614

@apoelstra
Copy link
Member

In the linked libsecp issue fanquake says that the upstream problem should be fixed (in github actions), so maybe we can re-enable msan.

But can maybe wait til after the release because futzing with CI is time-consuming and annoying.

tcharding added a commit to tcharding/rust-secp256k1 that referenced this issue Mar 26, 2024
There is a bug in the memory sanitizer at the moment, for more info see

- rust-bitcoin/rust-bitcoin#2600
- rust-bitcoin/rust-bitcoin#2579

As we did in `rust-bitcoin` temporarily disable MSAN in CI.
tcharding added a commit to tcharding/rust-secp256k1 that referenced this issue Mar 26, 2024
There is a bug in the memory sanitizer at the moment, for more info see

- rust-bitcoin/rust-bitcoin#2600
- rust-bitcoin/rust-bitcoin#2579

As we did in `rust-bitcoin` temporarily disable MSAN in CI.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants