multi g-w: Interactive username cltbld does not match task user task_171042065159733
#6952
Comments
This means, Generic Worker ran, and created user What is the reason for the I think the fix here is not to log into the machine as the cltbld user, or if it is needed, to perform some administrative activity that can't be done over ssh, then

Note, Generic Worker can't really decide to self-fix this issue: the mismatch demonstrates that something is wrong. A trusted environment has logged in as the wrong user and done something in conflict with Generic Worker. This is an error, so it panics. It created the user and rebooted the machine, so it expects that user to be logged in. If Generic Worker had an auto-fix that repaired the state and rebooted whenever that user was not logged in, you would never be able to log in as cltbld, because it would immediately reboot the machine.

I think the underlying problem is not Generic Worker behaviour; it is that something or someone is logging in as cltbld, which interferes with the worker workflow.

If this is for interactive tasks, is there a reason users can't use the taskcluster interactive feature directly? That is guarded by scopes, does not require that anyone share passwords with users, and has no administrative burden. It guards access to workers, and makes sure any changes they apply occur in an isolated environment. If they need to be able to do things as root, the task user can be granted privileges by adding
Note, the fact it is a reboot-loop is probably beyond the scope of Generic Worker. Generic Worker just exits with a particular exit code, presumably something else detects this, and then reboots the machine, causing it to boot-loop. If that thing is Worker Runner, that is another Worker Runner bug, which will be gone when #6229 lands.
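For anyone following along, here is a rough sketch of the behaviour described in the comments above. It is not Generic Worker's actual code (which is written in Go). The function names, the `UserMismatch` exception, and the exit code value are all hypothetical, chosen only for illustration. The error text mirrors the message in the issue title.

```python
# Hypothetical exit code; the real value Generic Worker uses is not stated in this thread.
EXIT_USER_MISMATCH = 1


class UserMismatch(Exception):
    """Raised when the logged-in user is not the task user the worker created."""


def check_interactive_user(interactive_user: str, task_user: str) -> None:
    # The worker created task_user and rebooted, so it expects that user
    # to be the one logged in after the reboot.
    if interactive_user != task_user:
        raise UserMismatch(
            f"Interactive username {interactive_user} "
            f"does not match task user {task_user}"
        )


def worker_exit_code(interactive_user: str, task_user: str) -> int:
    # The worker does not try to repair the state; it reports the error
    # and exits with a particular exit code.
    try:
        check_interactive_user(interactive_user, task_user)
    except UserMismatch as err:
        print(err)
        return EXIT_USER_MISMATCH
    return 0


def count_reboots(interactive_user: str, task_user: str, max_boots: int = 5) -> int:
    # Stand-in for whatever supervises the worker (assumed here to reboot
    # on the mismatch exit code). If the same wrong user logs in on every
    # boot, every boot ends in another reboot.
    reboots = 0
    for _ in range(max_boots):
        if worker_exit_code(interactive_user, task_user) == 0:
            break
        reboots += 1
    return reboots
```

With `cltbld` logged in on every boot, `count_reboots("cltbld", "task_171042065159733")` hits `max_boots`, which is the boot loop in this sketch. The loop only ends when the expected task user is the one logged in.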
Describe the bug
g-w multi workers reboot loop with the message from the issue title: Interactive username cltbld does not match task user task_171042065159733
To Reproduce
Steps to reproduce the behavior:
Unsure how they are getting into this state. Does the error mean that someone submitted a job trying to be interactive as the 'cltbld' user (but we're in multiuser, so that user isn't valid)? If that's what's going on, this shouldn't be fatal for the worker (definitely for the job).
Expected behavior
The worker would keep working.
We resolve this with a
sudo rm /opt/worker/*user.json
and a reboot. It seems like g-w could detect this state and do the same and avoid having to manually intervene.
Taskcluster version
generic-worker (multiuser engine) 60.3.4 [ revision: https://github.com/taskcluster/taskcluster/commits/943a6f2b0d14fa0270280bc6f23acc2945d0fe45 ]
Platform (please complete the following information):
Mac OS X
Additional context