Incorrect node binary uploaded for remote execution #3444

Open
gregjacobs opened this issue May 6, 2022 · 6 comments

@gregjacobs

gregjacobs commented May 6, 2022

🐞 bug report

Affected Rule

Seemingly any rule that executes run_node

In my particular case, ts_project is causing this error, but I get the same error with other run_node-based rules (including ones that I have written).

Is this a regression?

Not sure

Description

I'm not sure if this is specifically a rules_nodejs problem, but it seems like it might be.

When running any run_node-based rule with --remote_executor, Bazel seems to upload the Node binary of the host platform rather than the binary for the execution platform. I'm running Bazel from a Mac (the host platform) and trying to execute on a remote Linux Buildbarn cluster.

Here is the error that I'm getting:

ERROR: /dev/bazel-poc/package-1/BUILD:5:16: Compiling TypeScript project //package-1:library_tsc [tsc -p package-1/tsconfig.json] failed: (Exit 126): tsc.sh failed: error executing command 
  (cd /private/var/tmp/_bazel_user/df7a4c614b8242298350c2cf3d5949ba/execroot/bazel_poc && \
  exec env - \
    BAZEL_NODE_MODULES_ROOTS='' \
    COMPILATION_MODE=fastbuild \
  bazel-out/darwin-opt-exec-2B5CBBC6/bin/external/npm/typescript/bin/tsc.sh --project package-1/tsconfig.json --outDir bazel-out/darwin-fastbuild/bin/package-1 --rootDir package-1/src --declarationDir bazel-out/darwin-fastbuild/bin/package-1 --tsBuildInfoFile bazel-out/darwin-fastbuild/bin/package-1/library_tsc.tsbuildinfo '--bazel_node_modules_manifest=bazel-out/darwin-fastbuild/bin/package-1/_library_tsc_TsProject.module_mappings.json')
# Configuration: 3202722b467eefd72f81b74e361edbcb387d5f20595fef5e6262727e3ce81321
# Execution platform: @local_config_platform//:host
Action details (uncached result): http://buildbarn.<internal-url>.com/uncached_action_result/rhel7/66ed02cf7dc475eeeb567965de44399c5fa000d64b2d1e691833b0304bdcc8b7/543/
bazel-out/darwin-opt-exec-2B5CBBC6/bin/external/npm/typescript/bin/tsc.sh: line 224: /worker/build/3476511f3cd3bb47/root/bazel-out/darwin-opt-exec-2B5CBBC6/bin/external/npm/typescript/bin/tsc.sh.runfiles/nodejs_darwin_amd64/bin/nodejs/bin/node: cannot execute binary file
Target //package-1:library failed to build

Note in the second-to-last line of output that the remote machine can't execute nodejs_darwin_amd64/bin/nodejs/bin/node: it is trying to run the macOS binary on a Linux machine.
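One quick way to confirm which platform a given node binary was built for is to read its first four bytes, since ELF (Linux) and Mach-O (macOS) executables start with different magic numbers. The script below is a minimal sketch of that check; the script name check_node.py in the usage comment is hypothetical.

```python
import sys

# Magic numbers found in the first four bytes of common executable formats.
# The Mach-O entries are the on-disk bytes of little-endian (x86_64/arm64)
# binaries; 0xCAFEBABE (universal binaries) is also the Java class-file magic.
MAGIC = {
    b"\x7fELF": "ELF (Linux and most other Unix systems)",
    b"\xcf\xfa\xed\xfe": "Mach-O 64-bit (macOS)",
    b"\xce\xfa\xed\xfe": "Mach-O 32-bit (macOS)",
    b"\xca\xfe\xba\xbe": "Mach-O universal (macOS)",
}


def classify_header(header: bytes) -> str:
    """Name the executable format whose magic number starts `header`."""
    for magic, name in MAGIC.items():
        if header.startswith(magic):
            return name
    return "unknown"


def classify_file(path: str) -> str:
    """Read the first four bytes of `path` and classify them."""
    with open(path, "rb") as f:
        return classify_header(f.read(4))


if __name__ == "__main__":
    # Usage: python check_node.py path/to/node [more paths...]
    for path in sys.argv[1:]:
        print(f"{path}: {classify_file(path)}")
```

A node binary reported as Mach-O will fail on a Linux worker in the way shown above; the standard `file` utility gives the same answer without any script.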

🔬 Minimal Reproduction

Unfortunately, I don't have a publicly facing remote build cluster to share a reproduction on, but the project simply runs a ts_project rule with --remote_executor. The ts_project rule works correctly when building locally.

I think the simplest reproduction is running a ts_project rule from a host machine whose platform differs from the remote executor's platform.

🔥 Exception or Error

bazel-out/darwin-opt-exec-2B5CBBC6/bin/external/npm/typescript/bin/tsc.sh: line 224: /worker/build/3476511f3cd3bb47/root/bazel-out/darwin-opt-exec-2B5CBBC6/bin/external/npm/typescript/bin/tsc.sh.runfiles/nodejs_darwin_amd64/bin/nodejs/bin/node: cannot execute binary file

^-- Bazel attempting to run a macOS binary on Linux

🌍 Your Environment

Operating System:

macOS 10.14.6 (Mojave)

Output of bazel version:

Build label: 5.1.1
Build target: bazel-out/darwin-opt/bin/src/main/java/com/google/devtools/build/lib/bazel/BazelServer_deploy.jar
Build time: Fri Apr 8 15:57:36 2022 (1649433456)
Build timestamp: 1649433456
Build timestamp as int: 1649433456

Rules_nodejs version:

5.4.2

Anything else relevant?

It looks like there are a few related issues. Not sure if the first one has been addressed even though it was closed.

@gregjacobs gregjacobs changed the title Incorrect node binary uploaded to remote execution Incorrect node binary uploaded for remote execution May 6, 2022
@gregjacobs
Author

Hey guys, just to ask: what would cause this? I can try modifying rules_nodejs within my corporate network (where I can test the builds on Buildbarn) and then contribute back any fixes, but I'm still learning Bazel and not quite sure where to start. Any pointers on where to look would be helpful.

Thanks,
Greg

@danigar

danigar commented Jun 2, 2022

Hi there! I think I'm hitting this as well. In my case it's with pkg_web and RBE. Executing assembler.js inside pkg_web fails with nodejs_linux_amd64/bin/nodejs/bin/node: cannot execute binary file.

This seems odd to me because, looking at the pkg_web code, the execution requirements for the assembler.js nodejs_binary contain no-remote and no-remote-exec, so that action should never run remotely.
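A possible explanation: no-remote and no-remote-exec only decide where an action runs, while the node path inside the action was already fixed during analysis by toolchain resolution. The sketch below is a simplified, hypothetical model of that split (not Bazel's actual spawn-strategy code), assuming the local machine is a Mac and a Linux node binary was resolved.

```python
from dataclasses import dataclass, field

# Keys that, in this simplified model, keep an action off the remote executor.
# Real Bazel also consults --spawn_strategy, --strategy and other flags.
REMOTE_EXEC_BLOCKERS = ("no-remote", "no-remote-exec", "local")


@dataclass
class Action:
    # Path to node, fixed at analysis time by toolchain resolution.
    node_path: str
    execution_requirements: dict = field(default_factory=dict)


def runs_remotely(action: Action, remote_executor_configured: bool) -> bool:
    """True if the action would be sent to the remote executor."""
    if not remote_executor_configured:
        return False
    return not any(key in action.execution_requirements for key in REMOTE_EXEC_BLOCKERS)


def binary_os(node_path: str) -> str:
    """Extract the OS from a toolchain repo name such as nodejs_linux_amd64."""
    repo = node_path.split("/")[0]  # e.g. "nodejs_linux_amd64"
    return repo.split("_")[1]       # e.g. "linux"


def platform_mismatch(action: Action, remote: bool, local_os: str, remote_os: str) -> bool:
    """True if the node binary does not match the OS the action actually runs on."""
    run_os = remote_os if runs_remotely(action, remote) else local_os
    return binary_os(action.node_path) != run_os


# Hypothetical model of the pkg_web assembler: Linux node was resolved, but
# the execution requirements pin the action to the (macOS) local machine.
assembler = Action(
    node_path="nodejs_linux_amd64/bin/nodejs/bin/node",
    execution_requirements={"no-remote": "1", "no-remote-exec": "1"},
)
```

In this model the assembler stays local (runs_remotely returns False), so the Linux binary ends up on the Mac, which is the mismatch reported here.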

@alexeagle alexeagle added this to the 6.0 milestone Jun 2, 2022
@alexeagle
Collaborator

FWIW, @gregmagolan used this toolchain (from a Mac host) with EngFlow RBE yesterday and got the correct nodejs interpreter for the exec platform, so I don't think this is totally broken. We might need a repro.

@danigar

danigar commented Jun 6, 2022

Hi @alexeagle @gregmagolan, I've created a repro. The failing target is on the pkg_web branch of the following repo: https://github.com/danigar/bazel-swc-lab/tree/pkg_web

Hope this helps =)

@yuriyhanysh

Hi! I've bumped into the same stuff with RBE. Are there any known workarounds?

@sharmilajesupaul

sharmilajesupaul commented Aug 20, 2022

I recently ran into a similar issue at work; we're also using EngFlow for RBE.

I was running a rule that uses the nodejs toolchain. I wanted to run a build on a local Mac first using nodejs & then hand the result off to RBE to run tests against it (the testing is also using nodejs).

My build always resolved nodejs_linux_amd64/bin/nodejs/bin/node both locally and remotely, and it failed in the same way: the Linux binary couldn't be executed on my Mac.

I traced the issue to our EngFlow cross-platform configuration in .bazelrc. It turns out that Bazel doesn't fully support this kind of cross-platform handoff, i.e., running some rules on one platform and others on a different platform. We handle the macOS <> Linux interaction by lying about the host platform in our configuration: whenever the RBE config is used, the host machine is always set to Linux.

I think this is the common approach with EngFlow. For example, the configs set --host_platform=//some-linux:platform in your .bazelrc, pointing at a platform target with Linux constraints (e.g., in bazel-swc-lab). If I understand all this correctly, that config makes all toolchains, including the one from rules_nodejs, resolve to binaries that run on Linux.
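A toy model of toolchain resolution shows why a Linux host platform fixes remote actions but breaks local ones. This is a simplified sketch assuming a single execution platform and first-match selection on exec constraints; the constraint labels and the toolchain list are illustrative, not copied from rules_nodejs.

```python
from dataclasses import dataclass

MACOS = "@platforms//os:macos"
LINUX = "@platforms//os:linux"
X86_64 = "@platforms//cpu:x86_64"


@dataclass(frozen=True)
class Platform:
    name: str
    constraints: frozenset


@dataclass(frozen=True)
class Toolchain:
    repo: str
    exec_compatible_with: frozenset


# Illustrative subset of the node toolchains; the real set is larger.
NODE_TOOLCHAINS = [
    Toolchain("nodejs_darwin_amd64", frozenset({MACOS, X86_64})),
    Toolchain("nodejs_linux_amd64", frozenset({LINUX, X86_64})),
]


def resolve(toolchains, exec_platform: Platform) -> str:
    """Pick the first toolchain whose exec constraints the platform satisfies."""
    for tc in toolchains:
        if tc.exec_compatible_with <= exec_platform.constraints:
            return tc.repo
    raise LookupError(f"no node toolchain for {exec_platform.name}")


# Default: the Mac host is the execution platform (as in the original log).
mac_host = Platform("@local_config_platform//:host", frozenset({MACOS, X86_64}))
# With --host_platform=//some-linux:platform the exec platform becomes Linux.
linux_host = Platform("//some-linux:platform", frozenset({LINUX, X86_64}))
```

With the defaults, resolve picks nodejs_darwin_amd64 even though the action then runs on a Linux worker, matching the original report. With the Linux override it picks nodejs_linux_amd64, which is right for remote actions but wrong for anything forced to run on the Mac.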

I'm not sure what the ideal solution is. We can't drop the platform overrides in many cases because we need to support RBE, and we need to be able to use EngFlow for both local and remote builds.

One terrible workaround we're considering for our custom toolchains is lying in the toolchain definition and saying that the macOS binaries are also compatible with Linux 🤦🏾‍♀️. That way, we can expose all the binaries and make an entry point for the toolchain that detects which platform we're actually on and uses the correct binary. When we run on macOS with the host platform set to Linux, the entry point can detect this and run the right binary.
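The entry point described above could look roughly like the sketch below. It is hypothetical: the NODE_BINARIES table assumes one toolchain exposes every binary using the nodejs_<os>_<arch> repository naming from this thread, and paths are assumed to be relative to the runfiles root (locating runfiles is omitted). The toolchain's launcher would call main(sys.argv[1:]).

```python
import os
import platform

# Hypothetical: every platform's node binary, keyed by (os, arch) using the
# naming of the nodejs_<os>_<arch> repositories. Paths are assumed to be
# relative to the runfiles root.
NODE_BINARIES = {
    ("darwin", "amd64"): "nodejs_darwin_amd64/bin/nodejs/bin/node",
    ("linux", "amd64"): "nodejs_linux_amd64/bin/nodejs/bin/node",
}

# Map what Python's platform module reports onto the repository naming.
_OS_NAMES = {"Darwin": "darwin", "Linux": "linux"}
_ARCH_NAMES = {"x86_64": "amd64", "AMD64": "amd64"}


def select_node(system: str, machine: str) -> str:
    """Return the node binary for the machine we are actually running on."""
    key = (_OS_NAMES.get(system), _ARCH_NAMES.get(machine))
    if key not in NODE_BINARIES:
        raise RuntimeError(f"no node binary for {system}/{machine}")
    return NODE_BINARIES[key]


def main(args: list) -> None:
    """Entry point: replace this process with the right node binary."""
    node = select_node(platform.system(), platform.machine())
    os.execv(node, [node, *args])
```

Because the choice happens at run time, it no longer depends on what toolchain resolution believed the platform was. The trade-off is that every platform's binary presumably has to be shipped as an input to each action.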
