fix: many API requests from Routes Controller #395

Open
apricote opened this issue Mar 15, 2023 · 7 comments
Labels: bug, pinned

Comments

@apricote
Member

We are sending a request to the Hetzner Cloud API every ten seconds if the routing functionality is active (the default when Networking is activated). This equals 1/10th of the default API rate limit on new accounts.

[Image: example metrics that demonstrate the issue described here]

This happens even though nothing has changed on the Kubernetes side, and there is no real need to reconcile this often. It should be enough to reconcile this on Node IP changes and when Nodes are added or removed.
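To illustrate the triggers named above, here is a minimal Go sketch that decides whether a route reconcile is needed by comparing two snapshots of the cluster's nodes. The `nodeAddrs` type, the `needsRouteReconcile` function, and the sample node names and IPs are hypothetical and not taken from hccm or k/cloud-provider.

```go
package main

import "fmt"

// nodeAddrs maps a node name to its IP address.
// Hypothetical type for this sketch, not part of hccm or k/cloud-provider.
type nodeAddrs map[string]string

// needsRouteReconcile reports whether routes could be out of date because
// a node was added, a node was removed, or a node changed its IP.
func needsRouteReconcile(prev, curr nodeAddrs) bool {
	if len(prev) != len(curr) {
		return true // at least one node was added or removed
	}
	for name, ip := range curr {
		oldIP, known := prev[name]
		if !known || oldIP != ip {
			return true // new node name, or same node with a different IP
		}
	}
	return false
}

func main() {
	prev := nodeAddrs{"node-a": "10.0.0.2", "node-b": "10.0.0.3"}

	unchanged := nodeAddrs{"node-a": "10.0.0.2", "node-b": "10.0.0.3"}
	ipChanged := nodeAddrs{"node-a": "10.0.0.2", "node-b": "10.0.0.9"}
	nodeAdded := nodeAddrs{"node-a": "10.0.0.2", "node-b": "10.0.0.3", "node-c": "10.0.0.4"}

	fmt.Println(needsRouteReconcile(prev, unchanged)) // false
	fmt.Println(needsRouteReconcile(prev, ipChanged)) // true
	fmt.Println(needsRouteReconcile(prev, nodeAdded)) // true
}
```

With a check like this, an unchanged cluster would produce no API requests at all between node events.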

This behaviour comes from k/cloud-provider, and there is an upstream issue for it: kubernetes/kubernetes#60646

@apricote added the bug label Mar 15, 2023
@samcday
Contributor

samcday commented Mar 15, 2023

The main issue here is that route reconciliation isn't triggered properly.

If the reconciliation logic were more internally coherent, we could safely increase the default 10s re-reconciliation interval to something like 1m. New servers would still get routes quickly, but routes that were modified out of band might take a minute or two to self-heal.

Longer term, we've been discussing the possibility and merits of implementing hccm's logic in a standard kubebuilder project. In that case we would have full control over things like this and could tune them much more easily, without resorting to forking upstream or pinging a 5-year-old ticket.

In the interim, I propose configuring the reconciliation period from 10s -> 30s. This means that new servers take a little longer to get set up, but the normal case is 3x less API traffic.

apricote added a commit that referenced this issue Mar 17, 2023
Reduce the number of API requests coming from the routes controller by reducing the reconciliation interval from 10s to 30s.

This is only a temporary fix until we can properly refactor the routes controller to only reconcile when necessary.

Related to #395
@apricote
Member Author

> In the interim, I propose configuring the reconciliation period from 10s -> 30s. This means that new servers take a little longer to get set up, but the normal case is 3x less API traffic.

Implemented in #403.

@github-actions
Contributor

This issue has been marked as stale because it has not had recent activity. The bot will close the issue if no further action occurs.

@github-actions bot added the stale label May 16, 2023
@apricote added the pinned label and removed the stale label May 16, 2023
@github-actions
Contributor

This issue has been marked as stale because it has not had recent activity. The bot will close the issue if no further action occurs.

@github-actions bot added the stale label Jul 16, 2023
@apricote removed the stale label Jul 17, 2023
@jooola
Member

jooola commented Jul 24, 2023

We recently fixed a bug related to the network API calls: 5461038

Is there a chance that this issue is related to the bug we fixed above? Do we have metrics that prove this isn't an issue anymore?

@apricote
Member Author

Unfortunately, this is not fixed yet. Reconciliation still happens on a 30s interval (or whatever interval people configure). It can only be fixed properly upstream in k/cloud-provider: kubernetes/kubernetes#60646

@github-actions
Contributor

This issue has been marked as stale because it has not had recent activity. The bot will close the issue if no further action occurs.

@github-actions bot added the stale label Sep 22, 2023
@apricote removed the stale label Sep 25, 2023

3 participants