Breaks on (at least some) self-hosted runners #98

Open
dhess opened this issue Oct 1, 2021 · 24 comments
Labels
bug Something isn't working

Comments

@dhess

dhess commented Oct 1, 2021

We recently started using https://github.com/philips-labs/terraform-aws-github-runner. It creates ephemeral self-hosted runners in EC2. Most of the time, the instances are freshly-created, but they will stick around for a bit after finishing a workflow in case another workflow quickly becomes available, in which case you might get the same instance again.

When this happens, and both the previous workflow and the next workflow use this action, we get the following failure:

...
2021-10-01T00:45:54.7702575Z ---- let's talk about sudo -----------------------------------------------------
2021-10-01T00:45:54.7714295Z This script is going to call sudo a lot. Normally, it would show you
2021-10-01T00:45:54.7716361Z exactly what commands it is running and why. However, the script is
2021-10-01T00:45:54.7718730Z run in a headless fashion, like this:
2021-10-01T00:45:54.7719651Z 
2021-10-01T00:45:54.7720930Z   $ curl -L https://nixos.org/nix/install | sh
2021-10-01T00:45:54.7721756Z 
2021-10-01T00:45:54.7723160Z or maybe in a CI pipeline. Because of that, we're going to skip the
2021-10-01T00:45:54.7724515Z verbose output in the interest of brevity.
2021-10-01T00:45:54.7725249Z 
2021-10-01T00:45:54.7725953Z If you would like to
2021-10-01T00:45:54.7726840Z see the output, try like this:
2021-10-01T00:45:54.7727745Z 
2021-10-01T00:45:54.7729161Z   $ curl -L -o install-nix https://nixos.org/nix/install
2021-10-01T00:45:54.7730595Z   $ sh ./install-nix
2021-10-01T00:45:54.7731151Z 
2021-10-01T00:45:54.7747047Z 
2021-10-01T00:45:54.7748571Z ---- oh no! --------------------------------------------------------------------
2021-10-01T00:45:54.7787295Z When this script runs, it backs up the current /etc/bashrc to
2021-10-01T00:45:54.7788936Z /etc/bashrc.backup-before-nix. This backup file already exists, though.
2021-10-01T00:45:54.7789763Z 
2021-10-01T00:45:54.7790547Z Please follow these instructions to clean up the old backup file:
2021-10-01T00:45:54.7791475Z 
2021-10-01T00:45:54.7792633Z 1. Copy /etc/bashrc and /etc/bashrc.backup-before-nix to another place, just
2021-10-01T00:45:54.7793594Z in case.
2021-10-01T00:45:54.7794042Z 
2021-10-01T00:45:54.7795175Z 2. Take care to make sure that /etc/bashrc.backup-before-nix doesn't look like
2021-10-01T00:45:54.7796625Z it has anything nix-related in it. If it does, something is probably
2021-10-01T00:45:54.7797738Z quite wrong. Please open an issue or get in touch immediately.
2021-10-01T00:45:54.7801951Z 
2021-10-01T00:45:54.7817828Z We'd love to help if you need it.
2021-10-01T00:45:54.7818492Z 
2021-10-01T00:45:54.7819575Z You can open an issue at https://github.com/nixos/nix/issues
2021-10-01T00:45:54.7820828Z 
2021-10-01T00:45:54.7821661Z Or feel free to contact the team:
2021-10-01T00:45:54.7822900Z  - Matrix: #nix:nixos.org
2021-10-01T00:45:54.7824170Z  - IRC: in #nixos on irc.libera.chat
2021-10-01T00:45:54.7825429Z  - twitter: @nixos_org
2021-10-01T00:45:54.7826903Z  - forum: https://discourse.nixos.org
2021-10-01T00:45:54.8703411Z child_process.js:642
2021-10-01T00:45:54.8708857Z     throw err;
2021-10-01T00:45:54.8709521Z     ^
2021-10-01T00:45:54.8709947Z 
2021-10-01T00:45:54.8716836Z Error: Command failed: /home/ec2-user/actions-runner/_work/_actions/cachix/install-nix-action/v14/lib/install-nix.sh
2021-10-01T00:45:54.8718487Z     at checkExecSyncError (child_process.js:621:11)
2021-10-01T00:45:54.8719881Z     at Object.execFileSync (child_process.js:639:15)
2021-10-01T00:45:54.8721521Z     at Object.<anonymous> (/home/ec2-user/actions-runner/_work/_actions/cachix/install-nix-action/v14/lib/main.js:4:17)
2021-10-01T00:45:54.8722812Z     at Module._compile (internal/modules/cjs/loader.js:959:30)
2021-10-01T00:45:54.8723869Z     at Object.Module._extensions..js (internal/modules/cjs/loader.js:995:10)
2021-10-01T00:45:54.8724925Z     at Module.load (internal/modules/cjs/loader.js:815:32)
2021-10-01T00:45:54.8725913Z     at Function.Module._load (internal/modules/cjs/loader.js:727:14)
2021-10-01T00:45:54.8727036Z     at Function.Module.runMain (internal/modules/cjs/loader.js:1047:10)
2021-10-01T00:45:54.8728020Z     at internal/main/run_main_module.js:17:11 {
2021-10-01T00:45:54.8728629Z   status: 1,
2021-10-01T00:45:54.8729126Z   signal: null,
2021-10-01T00:45:54.8729704Z   output: [ null, null, null ],
2021-10-01T00:45:54.8730237Z   pid: 16656,
2021-10-01T00:45:54.8730725Z   stdout: null,
2021-10-01T00:45:54.8731242Z   stderr: null
2021-10-01T00:45:54.8731693Z }

I would have thought that this action checks for an existing /nix/store, and it appears that it used to, but that's been removed: 82ce26d

@domenkozar
Member

We have two goals that kind of contradict each other:

a) make sure we don't install Nix if it's already installed
b) if the previous installation didn't succeed, try again

There's no definitive way to detect whether Nix was installed successfully; the best shot is to source the profile and check for a nix binary.
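
The "source the profile and check for a nix binary" idea could look roughly like the sketch below. It is not the action's actual code. The profile path is the multi-user (daemon) default that appears later in this thread, and `nix_is_usable` and `NIX_PROFILE_SCRIPT` are hypothetical names introduced only for illustration.

```bash
#!/usr/bin/env bash
# Sketch only. The default path assumes a multi-user (daemon) install;
# NIX_PROFILE_SCRIPT is a hypothetical override, not an existing setting.
NIX_PROFILE_SCRIPT="${NIX_PROFILE_SCRIPT:-/nix/var/nix/profiles/default/etc/profile.d/nix-daemon.sh}"

# Returns 0 if a nix binary is reachable and runs after sourcing the profile.
nix_is_usable() {
  (
    # Subshell: sourcing the profile must not change the caller's environment.
    if [ -r "$NIX_PROFILE_SCRIPT" ]; then
      . "$NIX_PROFILE_SCRIPT"
    fi
    command -v nix >/dev/null 2>&1 && nix --version >/dev/null 2>&1
  )
}

if nix_is_usable; then
  echo "Nix looks usable; skipping installation"
else
  echo "Nix missing or broken; installation would run here"
fi
```

Running `nix --version` in addition to finding the binary is a cheap way to tell a half-finished install from a working one, though it still cannot prove that the store or the daemon is healthy.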

@domenkozar
Member

I'll tackle this in a few days.

@dhess
Author

dhess commented Nov 3, 2021

I just wanted to add that we're no longer using this EC2 AWS GitHub runner project, so this is no longer a problem for us. It seems like it's worth keeping open as an issue, in any case, but I'll close it if you prefer.

@shajra

shajra commented Nov 5, 2021

Is this possibly the same as what I'm seeing here with the macos-latest image?

@domenkozar
Member

domenkozar commented Nov 5, 2021

@shajra that's fixed by install-nix-action v14.1 (sadly for some reason dependabot is not picking it up).

@dhess let's leave this open since I'd still like to fix it.

@domenkozar domenkozar added the bug Something isn't working label Nov 30, 2021
kquick added a commit to GaloisInc/macaw that referenced this issue Jan 12, 2022
There appears to be some unknowns regarding ensuring nix is installed,
but not overwritten (see
cachix/install-nix-action#98 and
https://github.com/cachix/install-nix-actions/pull/100/files).  It is
reported that version 14.1 does not have this issue.
kquick added a commit to GaloisInc/macaw that referenced this issue Jan 12, 2022
There appears to be some unknowns regarding ensuring nix is installed,
but not overwritten (see
cachix/install-nix-action#98 and
https://github.com/cachix/install-nix-actions/pull/100/files).

Ultimately however, this step is not needed for the current Galois
internal runners that are already nix-based configurations.
kquick added a commit to kquick/macaw that referenced this issue Jan 13, 2022
There appears to be some unknowns regarding ensuring nix is installed,
but not overwritten (see
cachix/install-nix-action#98 and
https://github.com/cachix/install-nix-actions/pull/100/files).
@domenkozar
Member

I'd appreciate it if someone could tell me whether they can reproduce this with v18.

@sandangel

@domenkozar I have the same issue with the latest master. It looks like the check for an existing Nix installation is bypassed, so it always tries to reinstall Nix (on non-ephemeral runners). This could be because the nix binary is not found in PATH for some reason.

if type -p nix &>/dev/null ; then
  echo "Aborting: Nix is already installed at $(type -p nix)"
  exit
fi

@domenkozar
Member

What kind of OS are you running and how was Nix installed? Can I get the same image to test?

@sandangel

@domenkozar We just use the Ubuntu image from AWS Image Builder.

@domenkozar
Member

I'll hopefully be able to look into this in a week or so.

@sandangel

image

Could this be the reason?
Bash is started with --norc, so the runner cannot pick up the nix binary in PATH. Bash with --norc is the default shell for many self-hosted runners. I think we should not rely on finding nix in PATH to decide whether to proceed with the installation.
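
One way to avoid depending on PATH would be to probe a few fixed locations directly. The following is a sketch under assumptions: the candidate paths are common installer defaults rather than anything confirmed in this thread, and `find_nix_binary` and `NIX_BIN_OVERRIDE` are hypothetical names.

```bash
#!/usr/bin/env bash
# Sketch only. The candidate locations are assumptions based on common
# defaults; NIX_BIN_OVERRIDE is a hypothetical escape hatch for other layouts.
find_nix_binary() {
  local candidate
  for candidate in \
    "${NIX_BIN_OVERRIDE:-}" \
    /nix/var/nix/profiles/default/bin/nix \
    "$HOME/.nix-profile/bin/nix" \
    /run/current-system/sw/bin/nix
  do
    # Skip empty entries and anything that is not an executable file.
    if [ -n "$candidate" ] && [ -x "$candidate" ]; then
      printf '%s\n' "$candidate"
      return 0
    fi
  done
  return 1
}

if nix_bin="$(find_nix_binary)"; then
  echo "Aborting: Nix is already installed at $nix_bin"
else
  echo "No Nix binary found at the known locations"
fi
```

Because the function never consults PATH, it behaves the same whether or not the shell read any rc or profile files.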

@domenkozar
Member

That's a good find! We'll need to look into how to load the Nix environment without relying on rc files.

@domenkozar
Member

@sandangel I'm happy to accept a PR to fix this one :)

@domenkozar
Member

Having set up self-hosted runners myself, I'd suggest not using install-nix-action at all and just making sure nix and cachix are installed before the runner goes into action.

Sometimes that might not be possible so I'll leave this open, but for the majority of the cases that should be the right way to do it.

@sandangel

For now I've stopped using install-nix-action and use the nixpkgs/devcontainer:latest Docker image instead. It supports actions/checkout@v3 out of the box.

@nebez

nebez commented Jan 3, 2023

Also running into this issue on my self-hosted runner.

uname -a
Linux [redacted] 5.15.0-1026-aws #30-Ubuntu SMP Wed Nov 23 17:01:09 UTC 2022 aarch64 aarch64 aarch64 GNU/Linux

@domenkozar
Member

domenkozar commented Jan 4, 2023

If someone can give me a https://tmate.io session I can help debug this one.

JeroenKnoops added a commit to JeroenKnoops/install-nix-action that referenced this issue Apr 11, 2023
For some reason the github.action_path is not set correctly.

Now using the environment variable.

Relates to: cachix#98
@dpc

dpc commented Apr 29, 2023

I'm setting up a self-hosted runner and my current workaround is basically:

 
+        - run: sudo rm -f /etc/bash.bashrc.backup-before-nix /etc/zshrc.backup-before-nix || true
+
         - uses: cachix/install-nix-action@v20
           with:
             nix_path: nixpkgs=channel:nixos-22.05
+
+        - run: sudo rm -f /etc/bash.bashrc.backup-before-nix /etc/zshrc.backup-before-nix || true
+

though I wonder whether installing Nix globally on every run is going to cause race conditions.

I need a workaround that works shared between hosted and non-hosted runners.

I noticed some oddities while looking at it (e.g. /nix/store was not there after the build succeeded the first time? Maybe I got confused, but I don't think so...). I wonder if the GitHub runner for macOS is using some overlay file system, or maybe just a chroot shared between all jobs?

The /etc/bash.bashrc.backup-before-nix is definitely left there, so it looks like jobs are sharing the FS. But the shell starts with a really bare PATH, and sourcing the init script manually did not seem to work between steps, so I'm out of better ideas.

Edit: It didn't take long to discover that two jobs running at the same time can indeed still choke on these backup files, and one will fail.
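
The race between concurrent jobs could in principle be narrowed by serializing the cleanup and install steps behind a lock. Below is a minimal sketch using an atomic `mkdir` as the lock, which needs nothing beyond a POSIX shell. The lock path and the names `acquire_lock`, `release_lock` and `NIX_INSTALL_LOCK_DIR` are hypothetical and not part of install-nix-action.

```bash
#!/usr/bin/env bash
# Sketch only. Any directory path that every job on the machine can create
# would do; NIX_INSTALL_LOCK_DIR is a hypothetical override.
LOCK_DIR="${NIX_INSTALL_LOCK_DIR:-/tmp/nix-install.lock}"

# Try to take the lock, making up to $1 attempts (default 60), one second apart.
# mkdir either creates the directory or fails, so only one job can win.
acquire_lock() {
  local tries="${1:-60}"
  while ! mkdir "$LOCK_DIR" 2>/dev/null; do
    tries=$((tries - 1))
    if [ "$tries" -le 0 ]; then
      return 1
    fi
    sleep 1
  done
}

# Release the lock; ignore the error if it was already gone.
release_lock() {
  rmdir "$LOCK_DIR" 2>/dev/null || true
}
```

A job would call `acquire_lock` before removing the backup files and installing, then `release_lock` afterwards. If a job dies while holding the lock, the directory stays behind and has to be removed by hand, so this is a stopgap and not a full fix.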

@dpc

dpc commented Apr 29, 2023

I don't get it.

So nix installer on MacOS (only?) installs the initialization scripts in /etc/bashrc. But if you look in:

https://www.gnu.org/software/bash/manual/html_node/Bash-Startup-Files.html

There's nothing about that file ever being sourced. ~/.bashrc is sometimes sourced ... through a custom line in ~/.bash_profile. So why is the Nix installer trying to use /etc/bashrc?!

@dpc

dpc commented Apr 29, 2023

So github is hardcoding that command and gives no way to set the defaults:

One can go around and change the shell: bash ... on every action and it will work, but that doesn't seem very practical...

Whole life working around idiotic problems in terrible software. Damn it.

@dpc

dpc commented Apr 29, 2023

So it's the installer not running successfully that breaks /nix/... on this machine. NixOS/nix#7106

This has got to be some bad joke...

@dpc

dpc commented Apr 29, 2023

OK, so here is how I make it work:

Start the runner via script:

#!/usr/bin/env bash

# Start the runner with the wrapper directory first on PATH.
cd "$HOME/actions-runner" || exit 1
env PATH="$HOME/actions-runner-hack/:$PATH" ./run.sh

As you can see, it adds an extra directory to the PATH. In this directory, we put a bash wrapper script. Since the actions runner invokes plain bash, it will pick up the first one found on the PATH.

And that hacked bash looks like this:

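#!/bin/bash

# Source the Nix profile once; the exported guard variable stops nested
# bash invocations from sourcing it again.
if [ -z "$SOURCED_NIX_ALREADY" ]; then
  export SOURCED_NIX_ALREADY=yes
  . '/nix/var/nix/profiles/default/etc/profile.d/nix-daemon.sh'
fi

# Hand over to the real bash with the original arguments.
exec /bin/bash "$@"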

So basically, it can do any initialization that we need (in this case sourcing Nix) and then call the real /bin/bash. If needed, we could do things like dropping --noprofile etc., but it seems unnecessary.

Now everything just works 🤞, without any modifications to github actions scripts, with all hacking done on the runner machine.
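
For anyone wanting to script this setup, a small helper could write the wrapper into place. This is a sketch under assumptions: `install_bash_wrapper` is a hypothetical name, and the generated wrapper is a slight variation on the one above in that it only sources the profile script if that file is readable.

```bash
#!/usr/bin/env bash
# Sketch only. Writes a bash wrapper into the directory given as $1.
install_bash_wrapper() {
  local dir="$1"
  mkdir -p "$dir" || return 1
  # Quoted heredoc delimiter: nothing is expanded while writing the file.
  cat > "$dir/bash" <<'EOF'
#!/bin/bash
nix_profile='/nix/var/nix/profiles/default/etc/profile.d/nix-daemon.sh'
if [ -z "$SOURCED_NIX_ALREADY" ] && [ -r "$nix_profile" ]; then
  export SOURCED_NIX_ALREADY=yes
  . "$nix_profile"
fi
exec /bin/bash "$@"
EOF
  chmod +x "$dir/bash"
}

# Usage (not run automatically):
#   install_bash_wrapper "$HOME/actions-runner-hack"
```

The readability check means the wrapper still starts a working shell on a machine where Nix has not been installed yet, at the cost of failing silently if the profile path is wrong.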
