Fix hashring for all possible failure scenarios #80
Signed-off-by: SriKrishna Paparaju <paparaju@gmail.com>
Reading the code, it appears that we have always conflated a pod in the Running phase with a container that is Ready to serve requests. It would be good to clean that up in this PR.
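To make the distinction concrete, here is a minimal sketch of the two checks. It uses simplified local stand-ins for the Kubernetes pod status types rather than the real `k8s.io/api/core/v1` package, so the type and function names are assumptions for illustration and not the controller's actual code.

```go
package main

import "fmt"

// PodCondition and PodStatus are simplified, hypothetical stand-ins for
// the Kubernetes core/v1 types; only the fields used here are modelled.
type PodCondition struct {
	Type   string // e.g. "Ready"
	Status string // "True", "False" or "Unknown"
}

type PodStatus struct {
	Phase      string // e.g. "Pending", "Running"
	Conditions []PodCondition
}

// isRunning only tells us the pod is scheduled and its containers have
// been created; it says nothing about readiness probes passing.
func isRunning(s PodStatus) bool {
	return s.Phase == "Running"
}

// isReady reports whether the pod's Ready condition is "True", which is
// the signal that it can serve requests.
func isReady(s PodStatus) bool {
	for _, c := range s.Conditions {
		if c.Type == "Ready" {
			return c.Status == "True"
		}
	}
	return false // no Ready condition reported yet
}

func main() {
	// A pod that has started but has not passed its readiness probe.
	starting := PodStatus{
		Phase:      "Running",
		Conditions: []PodCondition{{Type: "Ready", Status: "False"}},
	}
	fmt.Println(isRunning(starting), isReady(starting))
}
```

A pod like `starting` would pass a phase-only check while still being unable to accept traffic, which is the mix-up described above.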
Also - I'm wondering if there is a smarter way we can be informed of changes to the hashrings we are watching?
The newFilteredStatefulSetInformer function accepts a filtering function - perhaps we could parse out the pods that are suitable there?
edit: I think we could use ListOptions.FieldSelector to do this.
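A rough sketch of what such a tweak function might look like, assuming it is applied to a pod list/watch. `ListOptions` here is a local stand-in for `metav1.ListOptions`, and `runningPodsOnly` is a hypothetical name. One caveat: as far as I know, pod field selectors can match `status.phase` but not the Ready condition, so this alone would narrow the set to Running pods and a readiness check would still be needed afterwards.

```go
package main

import "fmt"

// ListOptions is a local stand-in for metav1.ListOptions, reduced to the
// single field this sketch needs.
type ListOptions struct {
	FieldSelector string
}

// runningPodsOnly is a hypothetical tweak function that restricts a pod
// list/watch to pods in the Running phase. Any selector already present
// is preserved; field selector requirements are comma-separated.
func runningPodsOnly(opts *ListOptions) {
	const sel = "status.phase=Running"
	if opts.FieldSelector == "" {
		opts.FieldSelector = sel
		return
	}
	opts.FieldSelector += "," + sel
}

func main() {
	// The namespace value is made up for illustration.
	opts := ListOptions{FieldSelector: "metadata.namespace=thanos"}
	runningPodsOnly(&opts)
	fmt.Println(opts.FieldSelector)
}
```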
pods get to |
@spaparaju and I chatted about this PR last Thursday. There are a couple of things to do before merging:
Really nice demos there @spaparaju, what an effective way to do code review :) LGTM
Looking good!
The CI seems to be stuck, probably for some time already 😞. We should take a look.
@bwplotka do you have perms to nuke this and start it again?
Not sure about this change, but it does not look significant. LGTM
}
time.Sleep(c.options.scaleTimeout) // Give some time for all replicas before they receive hundreds req/s
Why was this moved?
Unfortunately I don't have perms for this. It also looks like there is no link to Drone CI, so the connection did not even start. Maybe @squat has some ideas? If not, we should spend 30m to move to GitHub Actions quickly. Otherwise we would be merging this PR without any CI.
I think that the hosted version of Drone CI hasn't been working so well ever since it was acquired by Harness, or it has been entirely decommissioned :/ You can't even find https://cloud.drone.io on Google anymore. I just disabled and re-enabled the project in Drone and now webhooks are working again, but CI still doesn't run. I think it's time to finally move this project onto GitHub Actions, and also rename master -> main. This repo needs a bit of TLC.
I opened #89 to address the issue where non-ready replicas were being populated in the hashring configuration. My approach is similar to the one taken in this PR, however I filter only for |
Unless there are objections, I think we should close this in favor of #89. I believe the approach of changing the hashring only when scaling the StatefulSet is preferable here.
This PR adds a new flag, 'allow-only-ready-replicas', which restricts the hashring to Thanos Receive replicas that are in the 'Ready' status.
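As a rough illustration of what the flag gates, the sketch below filters a replica list before it is written to the hashring. Only the flag name comes from this PR. The `replica` type, the `hashringEndpoints` helper, the endpoint values, and the choice of a boolean flag defaulting to false are all assumptions made for the example, not the controller's actual implementation.

```go
package main

import (
	"flag"
	"fmt"
)

// replica is a hypothetical, simplified view of a Thanos Receive pod.
type replica struct {
	Endpoint string
	Ready    bool
}

// hashringEndpoints returns the endpoints to place in the hashring.
// When onlyReady is set, replicas that are not Ready are left out;
// otherwise every replica is included.
func hashringEndpoints(replicas []replica, onlyReady bool) []string {
	endpoints := make([]string, 0, len(replicas))
	for _, r := range replicas {
		if onlyReady && !r.Ready {
			continue
		}
		endpoints = append(endpoints, r.Endpoint)
	}
	return endpoints
}

func main() {
	// Assumed to be a boolean flag that is off by default.
	onlyReady := flag.Bool("allow-only-ready-replicas", false,
		"populate the hashring with Ready replicas only")
	flag.Parse()

	// Made-up replicas for illustration.
	replicas := []replica{
		{Endpoint: "receive-0:10901", Ready: true},
		{Endpoint: "receive-1:10901", Ready: false},
	}
	fmt.Println(hashringEndpoints(replicas, *onlyReady))
}
```

Keeping the filter behind a flag leaves the existing behaviour unchanged for deployments that do not opt in.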
Under this flag, this PR includes fixes for:
Signed-off-by: SriKrishna Paparaju paparaju@gmail.com