Host Container Unable to Create Container Task #3970

Open
gitskamzn opened this issue May 17, 2024 · 6 comments
Labels
status/needs-info Further information is requested type/bug Something isn't working

Comments

@gitskamzn

gitskamzn commented May 17, 2024

Image I'm using:
Bottlerocket K8s 1.29
VERSION_ID: 1.19.4
Build_ID=4f0a078e

What I expected to happen:
I expect the container task to start on every iteration.

What actually happened:
The container task fails to start again after it encounters a failure.

How to reproduce the problem:

I can see the container task start when a new node comes up. During regular operation, it fails to start again and complains that the task already exists. This appears to happen when the previous deletion fails: the logs show the task deletion failing with a "context deadline exceeded" error.

level=error msg="failed to delete container task" error="failed to delete task: context deadline exceeded: unknown"
level=error msg="failed to cleanup container" error="cannot delete running task taskname: failed precondition"

Subsequent runs are unable to get the task started as it seems to exist already.

@gitskamzn gitskamzn added status/needs-triage Pending triage or re-evaluation type/bug Something isn't working labels May 17, 2024
@vigh-m
Contributor

vigh-m commented May 17, 2024

Hi @gitskamzn, thanks for reaching out with this issue.

  1. Was this working on previous versions of Bottlerocket?
  2. Can you share some details of the container and instance type you are launching?
  3. Any data about how you are launching your containers?

@vigh-m vigh-m added status/needs-info Further information is requested and removed status/needs-triage Pending triage or re-evaluation labels May 17, 2024
@gitskamzn
Author

Hi @vigh-m - This is an ongoing issue that I see on instances running Bottlerocket OS versions 1.19.2 through 1.19.4. One of the instance types I see this on is m6i.4xlarge. These instances are part of EKS clusters that are launched using IaC.
Containerd is enabled on these instances, and the host-containers settings are added as explained here: https://bottlerocket.dev/en/os/1.19.x/api/settings/host-containers/#container_source

I can see the 10-second timeout here:

cleanup, cancel := context.WithTimeout(context.Background(), 10*time.Second)

This seems to be reflected in the host container logs as well. The "failed to delete container task" error ("failed to delete task: context deadline exceeded: unknown") appears exactly 10 seconds after the "container task exited" log message.

@gitskamzn
Author

For scenarios like this one, would it be a good idea to add a grace period when the context deadline is exceeded, or to otherwise ensure that the task is force-cleaned?
Also, has parameterizing the retries and the 45-second delay been considered? It was probably looked into but not implemented; see #1430.

@vigh-m
Contributor

vigh-m commented May 20, 2024

Hi,
host-ctr was not intended to support complex orchestration strategies and parameters. We recommend using an orchestrator for those features. Can you share more detailed logs around the container tasks that you are seeing?
Also:

  1. What is the size of this container image?
  2. What is the expected time for the task being executed on this container?
  3. Do you see issues with the admin and control container provided by Bottlerocket?

@gitskamzn
Author

Please see below:

  1. What is the size of this container image? - 220 MB
    
  2. What is the expected time for the task being executed on this container? - It executes 3 apiclient commands: update check, get, and set version-lock. A successful run takes about 4 seconds. Failures start after it's unable to clean up the task and container in the allocated 10 seconds.
    
  3. Do you see issues with the admin and control container provided by Bottlerocket? - Host Container 
    

Update Method being used: https://bottlerocket.dev/en/os/1.19.x/update/methods/in-place/

@arnaldo2792
Contributor

Do you loop after you are done with your apiclient commands? Otherwise, the container will exit and systemd (which is what we use to execute host-ctr) will try to restart the process since it exited.

Is this something you want to keep running every time? Why not set all these configurations through user data, or even use a bootstrap container with mode = once?


3 participants