Better output from kops rolling-update cluster command #14122

Open
UncleEricB opened this issue Aug 12, 2022 · 8 comments
Labels
good first issue Denotes an issue ready for a new contributor, according to the "help wanted" guidelines. help wanted Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines. kind/feature Categorizes issue or PR as related to a new feature.

Comments

@UncleEricB

/kind feature

1. Describe IN DETAIL the feature/behavior/change you would like to see.
A k8s node can be in the NeedsUpdate state for several reasons. When kops rolling-update cluster is run, I would like it to state which trigger put the nodes of each InstanceGroup into NeedsUpdate, possibly at a verbosity around 4.

The reason for this request is that there are four possible triggers for a node being in the NeedsUpdate state, and the documentation doesn't clearly state how to check each of them. I guess "The instance was created with a specification that is older" refers to Launch Template versions? Maybe "The instance was detached" refers to a cordon Taint?

This will speed up debugging and improve uptime. It will also expand the pool of SREs capable of debugging as not everyone has the same level of kOps/k8s expertise.

2. Feel free to provide a design supporting your feature request.
Preferred Output
$ kops rolling-update cluster cactus-1-23.k8s.sproutsocial.com --state s3://infra-kops-state -v4 ~/sandbox/sprout_development_env/NeedsUpdateChecker
I0812 11:52:07.404391 4005 factory.go:68] state store s3://infra-kops-state
...snip...
I0812 11:52:10.825012 4005 aws_cloud.go:1551] Querying EC2 for all valid zones in region "us-east-1"
I0812 11:52:10.826233 4005 request_logger.go:45] AWS request: ec2/DescribeAvailabilityZones
I0812 11:52:11.322863 4005 aws_cloud.go:629] Listing all Autoscaling groups matching cluster tags
I0812 11:52:11.324043 4005 request_logger.go:45] AWS request: autoscaling/DescribeTags
I0812 11:52:11.841028 4005 request_logger.go:45] AWS request: autoscaling/DescribeAutoScalingGroups
I0812 11:52:12.022521 4005 aws_cloud.go:743] Launch Template Version Specified By ASG: $Latest
I0812 11:52:12.023747 4005 request_logger.go:45] AWS request: ec2/DescribeLaunchTemplates
I0812 11:52:12.141730 4005 aws_cloud.go:762] Launch Template Version used for compare: "3"
I0812 11:52:12.141732 4005 aws_cloud.go:764] InstanceGroup nodes-us-east-1a nodes Launch Template are behind!
I0812 11:52:14.051511 4005 aws_cloud.go:743] Launch Template Version Specified By ASG: $Latest
I0812 11:52:14.051654 4005 request_logger.go:45] AWS request: ec2/DescribeLaunchTemplates
I0812 11:52:14.178106 4005 aws_cloud.go:762] Launch Template Version used for compare: "4"
I0812 11:52:14.178108 4005 aws_cloud.go:765] InstanceGroup nodes-us-east-1b nodes have a Cordon Taint!
I0812 11:52:14.532158 4005 aws_cloud.go:743] Launch Template Version Specified By ASG: $Latest
I0812 11:52:14.532365 4005 request_logger.go:45] AWS request: ec2/DescribeLaunchTemplates
I0812 11:52:14.647179 4005 aws_cloud.go:762] Launch Template Version used for compare: "4"
I0812 11:52:14.647181 4005 aws_cloud.go:766] InstanceGroup nodes-us-east-1d nodes have needs-update annotation
...snip...

--or even--
NAME               STATUS       NEEDUPDATE  READY  MIN  TARGET  MAX  NODES  REASON
master-us-east-1a  Ready        0           1      1    1       1    1
master-us-east-1b  Ready        0           1      1    1       1    1
master-us-east-1d  Ready        0           1      1    1       1    1
nodes-us-east-1a   NeedsUpdate  2           0      2    2       2    2      Launch Template version
nodes-us-east-1b   NeedsUpdate  2           0      2    2       2    2      Cordon Taint
nodes-us-east-1d   NeedsUpdate  2           0      2    2       2    2      kops.k8s.io/needs-update

@k8s-ci-robot k8s-ci-robot added the kind/feature Categorizes issue or PR as related to a new feature. label Aug 12, 2022
@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Nov 10, 2022
@olemarkus
Member

I think these are good suggestions, but probably hard to prioritise for most of the maintainers. It should however be low-hanging fruit for new contributors.

@olemarkus olemarkus added help wanted Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines. good first issue Denotes an issue ready for a new contributor, according to the "help wanted" guidelines. labels Nov 20, 2022
@johngmyers
Member

The places that need this logging:

func (group *CloudInstanceGroup) AdjustNeedUpdate() {

func getCloudGroups(c GCECloud, cluster *kops.Cluster, instancegroups []*kops.InstanceGroup, warnUnmatched bool, nodes []v1.Node) (map[string]*cloudinstances.CloudInstanceGroup, error) {

func awsBuildCloudInstanceGroup(c AWSCloud, cluster *kops.Cluster, ig *kops.InstanceGroup, g *autoscaling.Group, nodeMap map[string]*v1.Node) (*cloudinstances.CloudInstanceGroup, error) {

and any place that assigns the value CloudInstanceStatusNeedsUpdate

@ShivamTyagi12345
Member

ShivamTyagi12345 commented Dec 3, 2022

/assign

I will take this issue, @olemarkus.

@olemarkus
Member

Thanks for that.

I suggest writing user-facing text directly to stdout rather than going through klog. The remaining klog lines could be logged at -v2.

@ShivamTyagi12345
Member

@olemarkus I have difficulty understanding what needs to be done to complete this task. Can you please break it down into steps?

@olemarkus
Member

The information that users should read should just be printed with fmt.Printf(). The less useful details should use e.g. klog.V(2).Infof().

@vaibhav2107
Member

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Sep 27, 2023

7 participants