[Bazel CI] library tests are failing with Bazel@HEAD #2188

Open
sgowroji opened this issue May 9, 2024 · 5 comments · May be fixed by #2202
@sgowroji
Contributor

sgowroji commented May 9, 2024

CI: https://buildkite.com/bazel/bazel-at-head-plus-downstream/builds/3841#018f5b83-19fe-48b2-b301-77acb6e1c285

Platform: Ubuntu

Logs:

FAILED: //tests/library-empty:library-empty (Summary)

FAILED: //tests/library-with-static-cc-dep:library-with-static-cc-dep-dynamic (Summary)

Culprit: bazelbuild/bazel@2482322

CC Greenteam @mai93 @Wyverald

@sgowroji
Contributor Author

CC @aherrmann @mboes

@aherrmann
Member

The error message is a bit puzzling:

==================== Test output for //tests/indirect-link:indirect-link-dynamic:
/var/lib/buildkite-agent/.cache/bazel/_bazel_buildkite-agent/2aebe93b1a8f9ac29a2a4c83872246a4/sandbox/linux-sandbox/1928/execroot/rules_haskell_tests/bazel-out/k8-fastbuild/bin/tests/indirect-link/indirect-link-dynamic.runfiles/rules_haskell_tests/tests/indirect-link/indirect-link-dynamic: error while loading shared libraries: libHSrts-1.0.2_thr-ghc9.4.6.so: cannot open shared object file: No such file or directory
================================================================================
==================== Test output for //tests/indirect-link:indirect-link-dynamic:
/var/lib/buildkite-agent/.cache/bazel/_bazel_buildkite-agent/2aebe93b1a8f9ac29a2a4c83872246a4/sandbox/linux-sandbox/1939/execroot/rules_haskell_tests/bazel-out/k8-fastbuild/bin/tests/indirect-link/indirect-link-dynamic.runfiles/rules_haskell_tests/tests/indirect-link/indirect-link-dynamic: error while loading shared libraries: libHSrts-1.0.2_thr-ghc9.4.6.so: cannot open shared object file: No such file or directory
================================================================================
==================== Test output for //tests/indirect-link:indirect-link-dynamic:
/var/lib/buildkite-agent/.cache/bazel/_bazel_buildkite-agent/2aebe93b1a8f9ac29a2a4c83872246a4/sandbox/linux-sandbox/1955/execroot/rules_haskell_tests/bazel-out/k8-fastbuild/bin/tests/indirect-link/indirect-link-dynamic.runfiles/rules_haskell_tests/tests/indirect-link/indirect-link-dynamic: error while loading shared libraries: libHSrts-1.0.2_thr-ghc9.4.6.so: cannot open shared object file: No such file or directory
================================================================================

IIUC the effective change of bazelbuild/bazel@2482322 is to turn

-lstdc++ -lm
...
-Wl,-no-as-needed -no-as-needed

into

-Wl,--push-state,-as-needed -lstdc++ -Wl,--pop-state
-Wl,--push-state,-as-needed -lm -Wl,--pop-state
... 
-Wl,-no-as-needed
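
To make the before/after shape concrete, here is a minimal Python sketch of the rewrite as described above. The helper name `wrap_as_needed` is hypothetical, and this is not Bazel's actual implementation; it only reproduces the flag layout quoted in this comment.

```python
def wrap_as_needed(lib_flags):
    """Wrap each -l flag in its own push-state / pop-state pair.

    Hypothetical helper: mirrors the flag shape quoted above, not Bazel's code.
    """
    wrapped = []
    for flag in lib_flags:
        # Save the linker's flag state and enable -as-needed for this library only.
        wrapped.append("-Wl,--push-state,-as-needed")
        wrapped.append(flag)
        # Restore the saved state so later libraries are not affected.
        wrapped.append("-Wl,--pop-state")
    return wrapped
```

Calling `wrap_as_needed(["-lstdc++", "-lm"])` yields the two wrapped groups shown above. With `-as-needed` in effect, GNU ld records a NEEDED entry for a shared library only if that library resolves a symbol the link actually uses, so a change like this affects which libraries end up in the dynamic section rather than where they are searched for at run time.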

@avdv is this something you could look into?

@avdv
Member

avdv commented May 15, 2024

I tried to reproduce it inside of a Docker container (gcr.io/bazel-public/ubuntu1804-java11:latest) but could not:

# export USE_BAZEL_VERSION=last_green
# bazel --version
2024/05/15 06:58:59 Using unreleased version at commit 88a230f4cf28deec1455cb2caed4dc9f81e108c9
2024/05/15 06:58:59 Downloading https://storage.googleapis.com/bazel-builds/artifacts/centos7/88a230f4cf28deec1455cb2caed4dc9f81e108c9/bazel...
Downloading: 70 MB out of 70 MB (100%) 
bazel no_version
# git show
git show
commit cbf57268dc222a5867fe2a578f0eed06875405ee (HEAD -> check_head_bazel)
Merge: 5e8a6bc2 b525e7bc
Author: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
Date:   Mon May 6 10:58:07 2024 +0000

    Merge pull request #2183 from tweag/cb/fix-default-ghc-snapshot
    
    rules_haskell_tests: Fix ghcide stack pointing to wrong snapshot file
# bazel build --show_progress_rate_limit=5 --curses=yes --color=yes --terminal_columns=143 --show_timestamps --verbose_failures --jobs=30 --announce_rc --experimental_repository_cache_hardlinks --disk_cache= --sandbox_tmpfs_path=/tmp  --config=ci-common --config=linux-bindist --build_tag_filters=-requires_nix,-requires_lz4,-requires_shellcheck,-requires_threaded_rts,-dont_test_with_bindist,-dont_test_on_bazelci,-integration --test_env=HOME --test_env=BAZELISK_USER_AGENT --test_env=USE_BAZEL_VERSION --lockfile_mode=off -- //tests/...
Extracting Bazel installation...
...
(08:08:31) INFO: Build completed successfully, 798 total actions

# bazel test --flaky_test_attempts=3 --build_tests_only --local_test_jobs=12 --show_progress_rate_limit=5 --curses=yes --color=yes --terminal_columns=143 --show_timestamps --verbose_failures --jobs=30 --announce_rc --experimental_repository_cache_hardlinks --disk_cache= --sandbox_tmpfs_path=/tmp --experimental_build_event_json_file_path_conversion=false   --config=ci-common --config=linux-bindist --test_tag_filters=-requires_nix,-requires_lz4,-requires_shellcheck,-requires_threaded_rts,-dont_test_with_bindist,-dont_test_on_bazelci,-integration --test_env=HOME --test_env=BAZELISK_USER_AGENT --test_env=USE_BAZEL_VERSION  --lockfile_mode=off -- //tests/...
...
Executed 138 out of 138 tests: 138 tests pass.

> Culprit: bazelbuild/bazel@2482322

@sgowroji Why do you think this is the problem here? I would expect to see some linker errors if the standard C++ lib / libm is missing, but we actually see this:

/var/lib/buildkite-agent/.cache/bazel/_bazel_buildkite-agent/2aebe93b1a8f9ac29a2a4c83872246a4/sandbox/linux-sandbox/1744/execroot/rules_haskell_tests/bazel-out/k8-fastbuild/bin/tests/binary-with-lib-dynamic/binary-with-lib-dynamic.runfiles/rules_haskell_tests/tests/binary-with-lib-dynamic/binary-with-lib-dynamic: error while loading shared libraries: libHSrts-1.0.2_thr-ghc9.4.6.so: cannot open shared object file: No such file or directory

/var/lib/buildkite-agent/.cache/bazel/_bazel_buildkite-agent/2aebe93b1a8f9ac29a2a4c83872246a4/sandbox/linux-sandbox/1773/execroot/rules_haskell_tests/bazel-out/k8-fastbuild/bin/tests/library-empty/library-empty.runfiles/rules_haskell_tests/tests/library-empty/library-empty: error while loading shared libraries: libHSrts-1.0.2_thr-ghc9.4.6.so: cannot open shared object file: No such file or directory

This file should be available in the rules_haskell_ghc_linux_amd64 repository:

# ldd bazel-ci-bin/tests/library-empty/library-empty
	linux-vdso.so.1 (0x00007ffe9b0a6000)
	libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f742006c000)
	libHSbase-4.17.2.0-ghc9.4.6.so => /rules_haskell/rules_haskell_tests/bazel-ci-bin/tests/library-empty/../../_solib_k8/external_Srules_Uhaskell_Ughc_Ulinux_Uamd64/libHSbase-4.17.2.0-ghc9.4.6.so (0x00007f741f5dd000)
	libHSghc-bignum-1.3-ghc9.4.6.so => /rules_haskell/rules_haskell_tests/bazel-ci-bin/tests/library-empty/../../_solib_k8/external_Srules_Uhaskell_Ughc_Ulinux_Uamd64/libHSghc-bignum-1.3-ghc9.4.6.so (0x00007f74205d4000)
	libHSghc-prim-0.9.1-ghc9.4.6.so => /rules_haskell/rules_haskell_tests/bazel-ci-bin/tests/library-empty/../../_solib_k8/external_Srules_Uhaskell_Ughc_Ulinux_Uamd64/libHSghc-prim-0.9.1-ghc9.4.6.so (0x00007f741f0e2000)
	libHSrts-1.0.2_thr-ghc9.4.6.so => /root/.cache/bazel/_bazel_root/16f1ee034ef5b66efb3b6cb7da500a21/execroot/rules_haskell_tests/external/rules_haskell_ghc_linux_amd64/bin/../lib/lib/../lib/x86_64-linux-ghc-9.4.6/libHSrts-1.0.2_thr-ghc9.4.6.so (0x00007f7420529000)
	libffi.so.8 => /rules_haskell/rules_haskell_tests/bazel-ci-bin/tests/library-empty/../../_solib_k8/external_Srules_Uhaskell_Ughc_Ulinux_Uamd64/libffi.so.8 (0x00007f741eed4000)
	libgmp.so.10 => /usr/lib/x86_64-linux-gnu/libgmp.so.10 (0x00007f741ec53000)
	libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f741e862000)
	librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007f741e65a000)
	libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007f741e456000)
	libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f741e237000)
	/lib64/ld-linux-x86-64.so.2 (0x00007f742040a000)
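
For anyone who wants to check this on their own machine, the `ldd` output can be inspected mechanically. The following is a rough Python sketch, assuming the usual `name => path (address)` line format; the function names and the `/execroot/` marker are illustrative assumptions, not part of rules_haskell or Bazel.

```python
import re

# Matches "libfoo.so.1 => /path/libfoo.so.1 (0x...)" and "libfoo.so.1 => not found".
LDD_LINE = re.compile(r"^\s*(\S+)\s+=>\s+(not found|\S+)")


def parse_ldd(output):
    """Map each library name in ldd output to its resolved path, or None if not found."""
    resolved = {}
    for line in output.splitlines():
        match = LDD_LINE.match(line)
        if match:
            name, target = match.groups()
            resolved[name] = None if target == "not found" else target
    return resolved


def resolved_via_marker(resolved, marker="/execroot/"):
    """List libraries whose resolved path contains `marker` (an assumed heuristic)."""
    return sorted(name for name, path in resolved.items() if path and marker in path)
```

Applied to the output above, the second function would single out `libHSrts-1.0.2_thr-ghc9.4.6.so`, the only entry resolved through an absolute path under the Bazel output base; the other GHC libraries resolve through the `_solib_k8` directory next to the binary.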

@Wyverald
Contributor

> Culprit: bazelbuild/bazel@2482322
>
> @sgowroji Why do you think this is the problem here?

Just FYI: the culprit commit is identified by an automatic bisect tool, and is just the first commit that started failing. So it's not always accurate (i.e. not necessarily the commit that caused the current failure).
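
As an illustration of why a bisect result can mislead, here is a generic binary-search sketch in Python. It is not the Bazel CI tool, whose internals are not described in this thread; `first_failing` and its arguments are hypothetical names.

```python
def first_failing(commits, fails):
    """Binary search for the first commit for which `fails(commit)` is true.

    Assumes the last commit fails and that results are monotonic, i.e. every
    commit after the first failing one also fails.
    """
    lo, hi = 0, len(commits) - 1
    while lo < hi:
        mid = (lo + hi) // 2
        if fails(commits[mid]):
            hi = mid  # first failure is at mid or earlier
        else:
            lo = mid + 1  # first failure is after mid
    return commits[lo]
```

The search is only as reliable as the monotonicity assumption. If a test fails for reasons unrelated to the commit under test, such as the machine or cache state it happens to run on, the reported commit can be an innocent one.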

@avdv
Member

avdv commented May 16, 2024

I debugged this here: https://buildkite.com/bazel/rules-haskell-haskell/builds/2005

Basically it works on ubuntu2004, but fails on ubuntu1804.

The binaries for the //tests/library-empty/library-empty target seem to be mostly identical:

# ubuntu 1804
$ readelf -d bazel-ci-bin/tests/library-empty/library-empty
Dynamic section at offset 0x1cc0 contains 38 entries:
  Tag        Type                         Name/Value
 0x0000000000000003 (PLTGOT)             0x2fd0
 0x0000000000000002 (PLTRELSZ)           72 (bytes)
 0x0000000000000017 (JMPREL)             0xc18
 0x0000000000000014 (PLTREL)             RELA
 0x0000000000000007 (RELA)               0x948
 0x0000000000000008 (RELASZ)             720 (bytes)
 0x0000000000000009 (RELAENT)            24 (bytes)
 0x000000006ffffff9 (RELACOUNT)          10
 0x0000000000000015 (DEBUG)              0x0
 0x0000000000000006 (SYMTAB)             0x298
 0x000000000000000b (SYMENT)             24 (bytes)
 0x0000000000000005 (STRTAB)             0x4f0
 0x000000000000000a (STRSZ)              964 (bytes)
 0x000000006ffffef5 (GNU_HASH)           0x8b8
 0x0000000000000001 (NEEDED)             Shared library: [libm.so.6]
 0x0000000000000001 (NEEDED)             Shared library: [libHSbase-4.17.2.0-ghc9.4.6.so]
 0x0000000000000001 (NEEDED)             Shared library: [libHSghc-bignum-1.3-ghc9.4.6.so]
 0x0000000000000001 (NEEDED)             Shared library: [libHSghc-prim-0.9.1-ghc9.4.6.so]
 0x0000000000000001 (NEEDED)             Shared library: [libHSrts-1.0.2_thr-ghc9.4.6.so]
 0x0000000000000001 (NEEDED)             Shared library: [libffi.so.8]
 0x0000000000000001 (NEEDED)             Shared library: [libgmp.so.10]
 0x0000000000000001 (NEEDED)             Shared library: [libc.so.6]
 0x0000000000000001 (NEEDED)             Shared library: [librt.so.1]
 0x0000000000000001 (NEEDED)             Shared library: [libdl.so.2]
 0x0000000000000001 (NEEDED)             Shared library: [libpthread.so.0]
 0x000000000000000c (INIT)               0xc60
 0x000000000000000d (FINI)               0x1014
 0x000000000000001a (FINI_ARRAY)         0x2cb0
 0x000000000000001c (FINI_ARRAYSZ)       8 (bytes)
 0x0000000000000019 (INIT_ARRAY)         0x2cb8
 0x000000000000001b (INIT_ARRAYSZ)       8 (bytes)
 0x000000000000001d (RUNPATH)            Library runpath: [$ORIGIN/../../_solib_k8/external_Srules_Uhaskell_Ughc_Ulinux_Uamd64:/var/lib/buildkite-agent/.cache/bazel/_bazel_buildkite-agent/2efebdb1cb14e111ec858014b0a163bd/execroot/rules_haskell_tests/external/rules_haskell_ghc_linux_amd64/bin/../lib/lib/../lib/x86_64-linux-ghc-9.4.6]
 0x000000000000001e (FLAGS)              BIND_NOW
 0x000000006ffffffb (FLAGS_1)            Flags: NOW
 0x000000006ffffff0 (VERSYM)             0x8f0
 0x000000006ffffffe (VERNEED)            0x924
 0x000000006fffffff (VERNEEDNUM)         1
 0x0000000000000000 (NULL)               0x0
# ubuntu 2004
$ readelf -d bazel-ci-bin/tests/library-empty/library-empty
Dynamic section at offset 0x1cc0 contains 38 entries:
  Tag        Type                         Name/Value
 0x0000000000000003 (PLTGOT)             0x2fd0
 0x0000000000000002 (PLTRELSZ)           72 (bytes)
 0x0000000000000017 (JMPREL)             0xbc8
 0x0000000000000014 (PLTREL)             RELA
 0x0000000000000007 (RELA)               0x8f8
 0x0000000000000008 (RELASZ)             720 (bytes)
 0x0000000000000009 (RELAENT)            24 (bytes)
 0x000000006ffffff9 (RELACOUNT)          10
 0x0000000000000015 (DEBUG)              0x0
 0x0000000000000006 (SYMTAB)             0x298
 0x000000000000000b (SYMENT)             24 (bytes)
 0x0000000000000005 (STRTAB)             0x4c0
 0x000000000000000a (STRSZ)              952 (bytes)
 0x000000006ffffef5 (GNU_HASH)           0x878
 0x0000000000000001 (NEEDED)             Shared library: [libm.so.6]
 0x0000000000000001 (NEEDED)             Shared library: [libHSbase-4.17.2.0-ghc9.4.6.so]
 0x0000000000000001 (NEEDED)             Shared library: [libHSghc-bignum-1.3-ghc9.4.6.so]
 0x0000000000000001 (NEEDED)             Shared library: [libHSghc-prim-0.9.1-ghc9.4.6.so]
 0x0000000000000001 (NEEDED)             Shared library: [libHSrts-1.0.2_thr-ghc9.4.6.so]
 0x0000000000000001 (NEEDED)             Shared library: [libffi.so.8]
 0x0000000000000001 (NEEDED)             Shared library: [libgmp.so.10]
 0x0000000000000001 (NEEDED)             Shared library: [libc.so.6]
 0x0000000000000001 (NEEDED)             Shared library: [librt.so.1]
 0x0000000000000001 (NEEDED)             Shared library: [libdl.so.2]
 0x0000000000000001 (NEEDED)             Shared library: [libpthread.so.0]
 0x000000000000000c (INIT)               0xc10
 0x000000000000000d (FINI)               0xfb8
 0x000000000000001a (FINI_ARRAY)         0x2cb0
 0x000000000000001c (FINI_ARRAYSZ)       8 (bytes)
 0x0000000000000019 (INIT_ARRAY)         0x2cb8
 0x000000000000001b (INIT_ARRAYSZ)       8 (bytes)
 0x000000000000001d (RUNPATH)            Library runpath: [$ORIGIN/../../_solib_k8/external_Srules_Uhaskell_Ughc_Ulinux_Uamd64:/var/lib/buildkite-agent/.cache/bazel/_bazel_buildkite-agent/b29c7eb0e49d1926dfb75bf93b64c07e/execroot/rules_haskell_tests/external/rules_haskell_ghc_linux_amd64/bin/../lib/lib/../lib/x86_64-linux-ghc-9.4.6]
 0x000000000000001e (FLAGS)              BIND_NOW
 0x000000006ffffffb (FLAGS_1)            Flags: NOW
 0x000000006ffffff0 (VERSYM)             0x8a8
 0x000000006ffffffe (VERNEED)            0x8d8
 0x000000006fffffff (VERNEEDNUM)         1
 0x0000000000000000 (NULL)               0x0
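
The comparison of the two dumps can be sketched in Python as follows, assuming the `readelf -d` text format shown above. The function names are illustrative only; the sketch ignores addresses and sizes and looks only at the NEEDED and RUNPATH entries.

```python
import re

# Captures the bracketed value of NEEDED and RUNPATH entries in `readelf -d` output.
ENTRY = re.compile(r"\((NEEDED|RUNPATH)\)\s+.*?\[(.*)\]")


def dynamic_summary(readelf_output):
    """Collect NEEDED library names and RUNPATH directories from `readelf -d` text."""
    needed, runpath = [], []
    for line in readelf_output.splitlines():
        match = ENTRY.search(line)
        if not match:
            continue
        tag, value = match.groups()
        if tag == "NEEDED":
            needed.append(value)
        else:
            runpath.extend(value.split(":"))
    return needed, runpath


def compare(dump_a, dump_b):
    """Report whether NEEDED lists match and which RUNPATH entries differ."""
    needed_a, runpath_a = dynamic_summary(dump_a)
    needed_b, runpath_b = dynamic_summary(dump_b)
    return {
        "needed_equal": needed_a == needed_b,
        "runpath_only_in_a": [p for p in runpath_a if p not in runpath_b],
        "runpath_only_in_b": [p for p in runpath_b if p not in runpath_a],
    }
```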

But the RUNPATH entries are different. Lo and behold, the path from the first run does not exist:

$ ls -lh $( readelf -d bazel-ci-bin/tests/library-empty/library-empty | sed -ne '/RUNPATH/s,.*[$]ORIGIN.*:\([^]]*\)\].*,\1,p' )
ls: cannot access '/var/lib/buildkite-agent/.cache/bazel/_bazel_buildkite-agent/2efebdb1cb14e111ec858014b0a163bd/execroot/rules_haskell_tests/external/rules_haskell_ghc_linux_amd64/bin/../lib/lib/../lib/x86_64-linux-ghc-9.4.6': No such file or directory

So that looks like a caching issue, right?
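
The `readelf | sed | ls` check above can also be written as a small Python sketch, which may be easier to run across many binaries. It assumes `readelf` is on the PATH and that the RUNPATH line has the format shown in the dumps; all function names are hypothetical.

```python
import os
import re
import subprocess

RUNPATH = re.compile(r"\(RUNPATH\)\s+Library runpath: \[(.*)\]")


def runpath_entries(readelf_output):
    """Return the RUNPATH directories from `readelf -d` text, or [] if there are none."""
    for line in readelf_output.splitlines():
        match = RUNPATH.search(line)
        if match:
            return match.group(1).split(":")
    return []


def missing_absolute_entries(entries):
    """Return absolute RUNPATH entries that are not directories on this machine.

    Entries starting with $ORIGIN are skipped because they are resolved
    relative to the binary's location at load time.
    """
    return [e for e in entries if e.startswith("/") and not os.path.isdir(e)]


def check_binary(path):
    """Run `readelf -d` on `path` and list its dangling absolute RUNPATH entries."""
    result = subprocess.run(
        ["readelf", "-d", path], capture_output=True, text=True, check=True
    )
    return missing_absolute_entries(runpath_entries(result.stdout))
```

A non-empty result from `check_binary` means the binary embeds an absolute search path that no longer exists, which is consistent with a binary being reused in a location other than the one it was linked in.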

avdv linked a pull request on May 30, 2024 that will close this issue
avdv self-assigned this on May 31, 2024