Stack deployed on wrong endpoint #11787

emagiz opened this issue May 7, 2024 · 0 comments · May be fixed by #11788
Problem Description

We've seen that during a call to the Portainer API to list containers, the containers of the wrong endpoint were reported. As a result, a subsequent call to PUT a stack ended up on the wrong endpoint as well. This should not happen. We have confirmed that we called the right endpoint ID, yet the request landed on a different endpoint ID.
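For reference, a minimal sketch of the two calls our automation performs, assuming Portainer's documented Docker-proxy and stack-update routes (`GET /api/endpoints/{id}/docker/containers/json` and `PUT /api/stacks/{stackId}?endpointId={id}`); the host, API key, IDs, and payload are placeholders:

```go
// Hypothetical reproduction of the two API calls involved. Host, token,
// and IDs are placeholders; the routes are Portainer's standard
// Docker-proxy and stack-update endpoints.
package main

import (
	"fmt"
	"io"
	"net/http"
	"strings"
)

const (
	host  = "https://portainer.example.com" // placeholder
	token = "ptr_xxx"                       // placeholder API key
)

func do(method, url string, body io.Reader) error {
	req, err := http.NewRequest(method, url, body)
	if err != nil {
		return err
	}
	req.Header.Set("X-API-Key", token)
	req.Header.Set("Content-Type", "application/json")
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	fmt.Println(method, url, "->", resp.Status)
	return nil
}

func main() {
	// 1. List containers on endpoint 5 via the Docker proxy.
	do("GET", host+"/api/endpoints/5/docker/containers/json", nil)

	// 2. Update stack 12 on the same endpoint; per this report, the
	//    request sometimes reached a different endpoint.
	do("PUT", host+"/api/stacks/12?endpointId=5",
		strings.NewReader(`{"stackFileContent":"version: \"3\"\n...","prune":false}`))
}
```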

Expected Behavior

When a container list or stack PUT call is made against a certain endpoint, it lands on that endpoint.

Actual Behavior

The containers of another endpoint are listed, and the stack ends up on the wrong endpoint.

Steps to Reproduce

Not reproducible; there is probably a very low chance of it happening.

Portainer logs or screenshots

No response

Portainer version

2.18.4

Portainer Edition

Community Edition (CE)

Platform and Version

Docker 24.0.2

OS and Architecture

Ubuntu 22.04.3 LTS

Browser

not applicable

What command did you use to deploy Portainer?

Ansible:

```yaml
- name: Create portainer
  community.docker.docker_container:
    name: "portainer"
    image: portainer/portainer-ce:2.18.4
    state: started
    restart_policy: always
    command: "--admin-password-file=/resources/password --sslcert /resources/certificate.crt --sslkey /resources/certificate.key"
    published_ports:
      - 8000:8000
      - 443:9443
    networks:
      - name: "containers"
    networks_cli_compatible: yes
    log_driver: local
    log_options:
      max-size: 100m
      max-file: "5"
    mounts:
      - source: /var/run/docker.sock
        target: /var/run/docker.sock
        type: bind
      - source: /opt/resources/portainer_data
        target: /data
        type: bind
      - source: /opt/resources
        target: /resources
        type: bind
```

Additional Information

A PR will be added with a suspicion of where the problem originates.
Slack link: https://portainer.slack.com/archives/C2AGKR5JB/p1714747552266709

Our suspicion is that the following happens (see the sketch below the lock reference):

- We call ListContainers, which connects agent A on port X. Our automation sees something is missing and starts the process of deploying a new stack file.
- Meanwhile, another ListContainers call happens on another endpoint, connecting agent B.
- For some reason, agent A has disconnected from port X, and when agent B wants to connect, it is assigned port X.
- The preparation is done, and the API call to deploy a stack is sent to Portainer.
- Portainer still thinks that agent A is connected on port X, and thus sends the request over port X.
- Meanwhile, agent B now receives the request and deploys the stack.
- This would explain why our call for endpoint A ends up on endpoint B.

We have not seen this happen before, so it would be a very uncommon case, but a very problematic one, as you can imagine. Can you confirm this is impossible? Are we 100% sure that if we call a deployment on endpoint A, it can never end up on endpoint B? Or is there a chance it gets sent to a different endpoint?

We suspect the cause is a missing lock in several places, for example: https://github.com/portainer/portainer/blob/develop/api/chisel/tunnel.go#L150
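To illustrate, here is a hypothetical, stripped-down model of the misrouting we suspect; this is not Portainer's actual code. Even if the port map itself is guarded, the window between resolving an endpoint's port and sending a request to it is not, so a disconnect-and-reuse in that window redirects the request:

```go
// Hypothetical model of the suspected race, not Portainer's actual code.
// A shared map ties a local tunnel port to the endpoint currently served
// on it. Individual operations are locked, but the check-then-send
// sequence in main is not, which is the gap we suspect.
package main

import (
	"fmt"
	"sync"
)

type TunnelManager struct {
	mu    sync.Mutex     // guards ports
	ports map[int]string // local port -> endpoint served on it
}

// Assign binds an endpoint's tunnel to a port (reusing released ones).
func (m *TunnelManager) Assign(port int, endpoint string) {
	m.mu.Lock()
	defer m.mu.Unlock()
	m.ports[port] = endpoint
}

// Release frees a port when an agent disconnects.
func (m *TunnelManager) Release(port int) {
	m.mu.Lock()
	defer m.mu.Unlock()
	delete(m.ports, port)
}

// Target resolves which endpoint a request sent to this port will reach.
func (m *TunnelManager) Target(port int) string {
	m.mu.Lock()
	defer m.mu.Unlock()
	return m.ports[port]
}

func main() {
	m := &TunnelManager{ports: map[int]string{}}
	m.Assign(8001, "endpoint-A") // agent A connects on port X

	// The scenario from this report: agent A drops and agent B takes
	// port X between our automation's lookup and the stack deployment.
	m.Release(8001)
	m.Assign(8001, "endpoint-B")

	// A request still aimed at "agent A's port" now reaches endpoint B.
	fmt.Println("port 8001 now serves:", m.Target(8001))
}
```

The point of the sketch is that per-operation locking is not enough: without a lock (or a connection check) spanning the whole resolve-and-send sequence, port reuse can silently retarget a request, which matches the behaviour we observed.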
