Autoscaler WebHook fallback Policy #3686

Open
nrwiersma opened this issue Mar 8, 2024 · 8 comments
Labels
kind/feature New features for Agones

Comments

@nrwiersma
Contributor

Is your feature request related to a problem? Please describe.
While using the Autoscaler WebHook, it is possible, if not guaranteed, that at some point the service the WebHook points to will become unavailable for one reason or another. If the service does not recover within the autoscaling interval, the fleets will not scale at all.

Describe the solution you'd like
It would be useful to be able to configure both the WebHook and another Policy (e.g. BufferPolicy). If the WebHook fails to respond, the other policy would be used. If no other policy is specified, the current behaviour persists, in that no replica change is made. Perhaps there could be a WebHook failure allowance (number of allowed failures) before the other Policy is used.
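The behaviour described above could be sketched roughly as follows. This is only an illustration under assumptions: `policy`, `fallbackScaler`, `allowedFailures`, `brokenWebhook` and `fixedTarget` are hypothetical names made up for this sketch, not part of the Agones API or its implementation.

```go
package main

import (
	"errors"
	"fmt"
)

// policy is a hypothetical stand-in for an autoscaler policy: given the
// current replica count, it returns the desired replica count.
type policy func(current int32) (int32, error)

// fallbackScaler wraps a primary policy (for example a webhook call) and an
// optional fallback policy. All names are illustrative, not Agones API.
type fallbackScaler struct {
	primary         policy
	fallback        policy // nil means no fallback is configured
	allowedFailures int    // consecutive primary failures tolerated before falling back
	failures        int    // consecutive primary failures seen so far
}

// desired returns the replica count to apply for this autoscaling interval.
func (s *fallbackScaler) desired(current int32) int32 {
	replicas, err := s.primary(current)
	if err == nil {
		s.failures = 0 // a successful call resets the failure counter
		return replicas
	}
	s.failures++
	// No fallback, or still within the allowance: keep the current
	// behaviour and make no replica change.
	if s.fallback == nil || s.failures <= s.allowedFailures {
		return current
	}
	replicas, err = s.fallback(current)
	if err != nil {
		return current // the fallback failed too, so leave replicas untouched
	}
	return replicas
}

// brokenWebhook simulates a webhook service that is unavailable.
func brokenWebhook(int32) (int32, error) {
	return 0, errors.New("webhook unavailable")
}

// fixedTarget simulates a trivial fallback policy that always asks for 10 replicas.
func fixedTarget(int32) (int32, error) {
	return 10, nil
}

func main() {
	s := &fallbackScaler{primary: brokenWebhook, fallback: fixedTarget, allowedFailures: 2}
	for i := 0; i < 3; i++ {
		fmt.Println(s.desired(3))
	}
}
```

With an allowance of two failures, the first two intervals leave the replica count at 3 and the third switches to the fallback value of 10. Whether the counter should reset on the first success, as it does here, is a design choice the proposal leaves open.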

Describe alternatives you've considered
There are none. As it stands it is all or nothing with the WebHook.

Additional context
While it is obviously not desirable for the WebHook service to be unavailable for extended periods, anyone who has run a production system knows that this unfortunately happens. This feature would bring a measure of resilience when using a WebHook.

@nrwiersma nrwiersma added the kind/feature New features for Agones label Mar 8, 2024
@markmandel
Member

This sounds like a good thing 👍🏻

Any suggestions on what the configuration should look like?

@nrwiersma
Contributor Author

nrwiersma commented Mar 11, 2024

I would keep the config the same, but with the addition of something like Fallback: true on the WebHook config. As it stands, the autoscaler validation does not stop you from setting multiple policies; they just get validated in a specific order. By moving WebHook forward in that list and checking Fallback, it could continue down the list to find the next policy to validate, and the implementation would basically work the same.

Should allowing a number of failures before fallback be desired, that too could be added to the WebHook.

@nrwiersma
Contributor Author

Actually, that would not work: you would need a FallbackType to specify which type you want to fall back to.

@markmandel
Member

I was thinking some kind of nested config under webhook?

apiVersion: autoscaling.agones.dev/v1
kind: FleetAutoscaler
metadata:
  name: webhook-fleet-autoscaler
spec:
  fleetName: simple-game-server
  policy:
    # type of the policy - this example is Webhook
    type: Webhook
    # parameters for the webhook policy - this is a WebhookClientConfig, as per other K8s webhooks
    webhook:
      # use a service, or URL
      service:
        name: autoscaler-webhook-service
        namespace: default
        path: scale
      # fallback reuses the same policy config as the parent, so it could even be another webhook; only one level of fallback is allowed
      fallback:
        policy:
          type: Buffer
          buffer:
            bufferSize: 5
            minReplicas: 10
            maxReplicas: 20
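
One way the "only allow one level" rule from the config above might be enforced at validation time is sketched below. The types are a trimmed-down, hypothetical mirror of the YAML (they are not the real Agones API types), and `validateFallbackDepth` is a made-up helper name.

```go
package main

import (
	"errors"
	"fmt"
)

// Policy is a hypothetical, trimmed-down mirror of the YAML sketch above.
type Policy struct {
	Type    string
	Webhook *WebhookPolicy
	Buffer  *BufferPolicy
}

// WebhookPolicy holds the webhook target plus an optional fallback.
type WebhookPolicy struct {
	ServiceName string
	Fallback    *Fallback // nil means no fallback is configured
}

// Fallback wraps a full Policy, so the fallback shares the parent's config shape.
type Fallback struct {
	Policy Policy
}

// BufferPolicy mirrors the buffer fields used in the example.
type BufferPolicy struct {
	BufferSize  int32
	MinReplicas int32
	MaxReplicas int32
}

// validateFallbackDepth rejects a fallback that defines its own fallback,
// which limits the chain to a single level.
func validateFallbackDepth(p Policy) error {
	if p.Webhook == nil || p.Webhook.Fallback == nil {
		return nil // nothing to check
	}
	fb := p.Webhook.Fallback.Policy
	if fb.Webhook != nil && fb.Webhook.Fallback != nil {
		return errors.New("fallback policy must not define its own fallback")
	}
	return nil
}

func main() {
	// Same shape as the YAML: a webhook policy falling back to a buffer policy.
	p := Policy{
		Type: "Webhook",
		Webhook: &WebhookPolicy{
			ServiceName: "autoscaler-webhook-service",
			Fallback: &Fallback{Policy: Policy{
				Type:   "Buffer",
				Buffer: &BufferPolicy{BufferSize: 5, MinReplicas: 10, MaxReplicas: 20},
			}},
		},
	}
	fmt.Println("validation error:", validateFallbackDepth(p))
}
```

Because `Fallback` embeds a full `Policy`, a webhook falling back to another webhook passes this check, while a second nested `fallback` is rejected.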

@nrwiersma
Contributor Author

Works as well. I would personally not have allowed Webhook falling back to Webhook, because purely from a configuration PoV it looks like more than 1 level could work. There will always be a reason to extend it by one more level. But happy to defer to your experience here.

@markmandel
Member

> Works as well. I would personally not have allowed Webhook falling back to Webhook

Yeah, but also, if you really want to - should we stop you?

Also it makes our configuration parsing and management easier - the fallback is the same config as the parent - so everything becomes simple, rather than having to keep the fallback up to date with whatever newer config options we come up with down the line.

This also assumes that Helm lets us do some kind of include here that allows us to be at least a little self-referential 😄

@nrwiersma
Contributor Author

> Yeah, but also, if you really want to - should we stop you?

Fair point 😄

> This also assumes that Helm lets us do some kind of include here that allows us to be at least a little self-referential 😄

Helm playing ball is not an experience I have had too often, so it is not something I would bet on.

We would be happy to take this on, if it works for you.

@markmandel
Member

> Helm playing ball is not an experience I have had too often, so it is not something I would bet on.

Hah. Quite possibly. I'm thinking an include where you can turn off the fallback option, so it doesn't recurse forever.
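
The "include with the fallback turned off" idea can be illustrated outside Helm. The Go sketch below is an analogy only, not a Helm template: `policySchema` is a hypothetical helper that builds a simplified schema map, and the field names are assumptions taken from the YAML earlier in the thread.

```go
package main

import "fmt"

// policySchema returns a simplified, hypothetical schema for a policy.
// withFallback controls whether the webhook section gets a nested
// "fallback" entry. The nested call always passes false, so the
// recursion stops after exactly one level.
func policySchema(withFallback bool) map[string]any {
	webhook := map[string]any{
		"service": "object",
		"url":     "string",
	}
	if withFallback {
		webhook["fallback"] = map[string]any{
			"policy": policySchema(false), // the fallback's own fallback is switched off
		}
	}
	return map[string]any{
		"type":    "string",
		"buffer":  "object",
		"webhook": webhook,
	}
}

func main() {
	top := policySchema(true)
	webhook := top["webhook"].(map[string]any)
	_, hasFallback := webhook["fallback"]
	fmt.Println("top-level webhook has fallback:", hasFallback)
}
```

The same shape would apply to a template include that takes a flag: the parent renders the shared policy definition with the fallback enabled, and the fallback renders it with the flag off.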

> We would be happy to take this on, if it works for you.

No issue for me, have at it!


2 participants