Remove services from the blue/green dance #417

Open
sampaccoud opened this issue Dec 31, 2019 · 2 comments

@sampaccoud
Contributor

sampaccoud commented Dec 31, 2019

Feature Request

Is your feature request related to a problem or unsupported use case? Please describe.
Arnold supports blue/green deployments by deploying complete stacks of services, pods, jobs, endpoints, etc. and switching the current route to the service of the new stack.

To do this, services include a deployment stamp in their name. This has caused several problems: a service name cannot be used in the settings of another app, because the name changes as soon as the service is redeployed.

Concrete cases when this is a problem:

  • to run FUN's dogwood.3-fun image, we use the elasticsearch cluster from the forum app. The problem was solved temporarily by dynamically retrieving the host name of elasticsearch:
    {% set
      forum_current_service =
        lookup(
          'k8s',
          namespace=project_name,
          kind='Route',
          label_selector='app=forum,route_prefix=current'
        ) | json_query('spec.to.name')
    %}
    
    {% set
      forum_current_deployment_stamp =
        lookup(
          'k8s',
          namespace=project_name,
          kind='Service',
          resource_name=forum_current_service
        ) | json_query('metadata.labels.deployment_stamp')
    %}
    
    This is very brittle: the connection breaks if we redeploy the forum, and it can only be repaired by redeploying edxapp afterwards, which means more than 30 minutes of downtime for the forum.
  • the edxapp application needs to connect to the forum application which is in a separate app, both being deployed with blue/green. In order to have a stable address for the forum, we had to expose the forum on a route. This would not be necessary if we could connect to the forum service directly. The above trick used to connect to elasticsearch could also be used here, but it would be brittle and could only be temporary.

Describe the solution you'd like
We should take services out of the blue/green dance.

Services would be created at project initialization, with a fixed name (e.g. edxapp-lms). Routes would be connected to their services once and for all.

The switch playbook would patch each service's selector instead of patching the target service in the route.
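To make the difference concrete, here is a minimal sketch of the two patch bodies involved. It assumes the pods carry the `deployment_stamp` label already used in the lookup above; the `<app>-<stamp>` service naming scheme and both function names are hypothetical, chosen only for illustration.

```python
def route_patch(app: str, stamp: str) -> dict:
    """Current approach: retarget the route at the stamped service of the new stack."""
    # Hypothetical naming scheme: the service name embeds the deployment stamp.
    return {"spec": {"to": {"name": f"{app}-{stamp}"}}}


def service_selector_patch(stamp: str) -> dict:
    """Proposed approach: the service keeps its fixed name, only its selector moves."""
    # Assumes pods of each stack are labelled with their deployment stamp.
    return {"spec": {"selector": {"deployment_stamp": stamp}}}
```

With the second form, the route and any setting that references the service by name stay untouched across deployments.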

Thanks to the selector, it should be possible to make the switch in 2 steps:

  1. add the next (green) stack to the selector. After this step, the traffic would go both to the current and to the next stacks.
  2. remove the current (blue) stack from the selector. After this step, all the traffic would go to the new stack.
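The two steps can be simulated with plain label matching. This is only a sketch under assumptions: a Service selector is a map of labels that must all match, so it cannot list two stamps at once. One possible reading of step 1 is therefore to drop the `deployment_stamp` key from the selector (matching every stack of the app), and of step 2 to set it to the next stamp. The pod names, stamp values and helper functions below are hypothetical.

```python
def matches(selector: dict, labels: dict) -> bool:
    """A pod is selected when every selector label equals the pod's label."""
    return all(labels.get(key) == value for key, value in selector.items())


def targeted(selector: dict, pods: dict) -> list:
    """Names of the pods that would receive traffic for this selector."""
    return sorted(name for name, labels in pods.items() if matches(selector, labels))


def widen(selector: dict) -> dict:
    """Step 1: stop filtering on the stamp, so both stacks receive traffic."""
    return {k: v for k, v in selector.items() if k != "deployment_stamp"}


def narrow(selector: dict, stamp: str) -> dict:
    """Step 2: filter on the next stamp only."""
    return {**selector, "deployment_stamp": stamp}


# Hypothetical pods for the two stacks of the forum app.
pods = {
    "forum-blue": {"app": "forum", "deployment_stamp": "blue"},
    "forum-green": {"app": "forum", "deployment_stamp": "green"},
}

current = {"app": "forum", "deployment_stamp": "blue"}
step1 = widen(current)
step2 = narrow(step1, "green")
```

Note that under this reading, step 1 would also match any older stack of the same app that is still running, so the previous stack would need to be excluded by some other label or removed beforehand.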

This two-step switch has two advantages:

  • zero downtime (note that switching the target service on the route was creating a small interruption)
  • possibility to do canary deployments
@jmaupetit
Contributor

Is the two-step switch required? Why not patch the service once and for all to target the new stack?

@sampaccoud
Contributor Author

@jmaupetit I proposed the two-step switch because I noticed that the route switch induces a short downtime. If that is not the case, then fine.
