add logs when pod failed to start when using WaitUntilPodAvailableE #1252

Open · lahabana wants to merge 2 commits into base: master
Conversation

@lahabana (Contributor) commented Mar 2, 2023

Description

Fixes #1251.

Here's an example of this change in kumahq/kuma:

   2023-03-02T12:32:27+01:00 retry.go:91: Wait for pod kuma-control-plane-5948bcb646-7wwn7 to be provisioned.
   2023-03-02T12:32:27+01:00 client.go:42: Configuring Kubernetes client using config file /Users/charly.molter@konghq.com/.kube/kind-kuma-1-config with context
   2023-03-02T12:32:27+01:00 retry.go:104: Wait for pod kuma-control-plane-5948bcb646-7wwn7 to be provisioned. returned an error: Pod kuma-control-plane-5948bcb646-7wwn7 is not available, reason: , message: , phase: Pending, conditions: [{Type:Initialized Status:True LastProbeTime:0001-01-01 00:00:00 +0000 UTC LastTransitionTime:2023-03-02 12:32:27 +0100 CET Reason: Message:} {Type:Ready Status:False LastProbeTime:0001-01-01 00:00:00 +0000 UTC LastTransitionTime:2023-03-02 12:32:27 +0100 CET Reason:ContainersNotReady Message:containers with unready status: [control-plane]} {Type:ContainersReady Status:False LastProbeTime:0001-01-01 00:00:00 +0000 UTC LastTransitionTime:2023-03-02 12:32:27 +0100 CET Reason:ContainersNotReady Message:containers with unready status: [control-plane]} {Type:PodScheduled Status:True LastProbeTime:0001-01-01 00:00:00 +0000 UTC LastTransitionTime:2023-03-02 12:32:27 +0100 CET Reason: Message:}]. Sleeping for 3s and will try again.
   2023-03-02T12:32:30+01:00 client.go:42: Configuring Kubernetes client using config file /Users/charly.molter@konghq.com/.kube/kind-kuma-1-config with context
   2023-03-02T12:32:30+01:00 retry.go:104: Wait for pod kuma-control-plane-5948bcb646-7wwn7 to be provisioned. returned an error: Pod kuma-control-plane-5948bcb646-7wwn7 is not available, reason: , message: , phase: Running, conditions: [{Type:Initialized Status:True LastProbeTime:0001-01-01 00:00:00 +0000 UTC LastTransitionTime:2023-03-02 12:32:27 +0100 CET Reason: Message:} {Type:Ready Status:False LastProbeTime:0001-01-01 00:00:00 +0000 UTC LastTransitionTime:2023-03-02 12:32:27 +0100 CET Reason:ContainersNotReady Message:containers with unready status: [control-plane]} {Type:ContainersReady Status:False LastProbeTime:0001-01-01 00:00:00 +0000 UTC LastTransitionTime:2023-03-02 12:32:27 +0100 CET Reason:ContainersNotReady Message:containers with unready status: [control-plane]} {Type:PodScheduled Status:True LastProbeTime:0001-01-01 00:00:00 +0000 UTC LastTransitionTime:2023-03-02 12:32:27 +0100 CET Reason: Message:}]. Sleeping for 3s and will try again.
   2023-03-02T12:32:33+01:00 client.go:42: Configuring Kubernetes client using config file /Users/charly.molter@konghq.com/.kube/kind-kuma-1-config with context
   2023-03-02T12:32:33+01:00 retry.go:104: Wait for pod kuma-control-plane-5948bcb646-7wwn7 to be provisioned. returned an error: Pod kuma-control-plane-5948bcb646-7wwn7 is not available, reason: , message: , phase: Running, conditions: [{Type:Initialized Status:True LastProbeTime:0001-01-01 00:00:00 +0000 UTC LastTransitionTime:2023-03-02 12:32:27 +0100 CET Reason: Message:} {Type:Ready Status:False LastProbeTime:0001-01-01 00:00:00 +0000 UTC LastTransitionTime:2023-03-02 12:32:27 +0100 CET Reason:ContainersNotReady Message:containers with unready status: [control-plane]} {Type:ContainersReady Status:False LastProbeTime:0001-01-01 00:00:00 +0000 UTC LastTransitionTime:2023-03-02 12:32:27 +0100 CET Reason:ContainersNotReady Message:containers with unready status: [control-plane]} {Type:PodScheduled Status:True LastProbeTime:0001-01-01 00:00:00 +0000 UTC LastTransitionTime:2023-03-02 12:32:27 +0100 CET Reason: Message:}]. Sleeping for 3s and will try again.
   2023-03-02T12:32:36+01:00 client.go:42: Configuring Kubernetes client using config file /Users/charly.molter@konghq.com/.kube/kind-kuma-1-config with context
   2023-03-02T12:32:36+01:00 retry.go:104: Wait for pod kuma-control-plane-5948bcb646-7wwn7 to be provisioned. returned an error: Pod kuma-control-plane-5948bcb646-7wwn7 is not available, reason: , message: , phase: Running, conditions: [{Type:Initialized Status:True LastProbeTime:0001-01-01 00:00:00 +0000 UTC LastTransitionTime:2023-03-02 12:32:27 +0100 CET Reason: Message:} {Type:Ready Status:False LastProbeTime:0001-01-01 00:00:00 +0000 UTC LastTransitionTime:2023-03-02 12:32:27 +0100 CET Reason:ContainersNotReady Message:containers with unready status: [control-plane]} {Type:ContainersReady Status:False LastProbeTime:0001-01-01 00:00:00 +0000 UTC LastTransitionTime:2023-03-02 12:32:27 +0100 CET Reason:ContainersNotReady Message:containers with unready status: [control-plane]} {Type:PodScheduled Status:True LastProbeTime:0001-01-01 00:00:00 +0000 UTC LastTransitionTime:2023-03-02 12:32:27 +0100 CET Reason: Message:}]. Sleeping for 3s and will try again.
   2023-03-02T12:32:39+01:00 client.go:42: Configuring Kubernetes client using config file /Users/charly.molter@konghq.com/.kube/kind-kuma-1-config with context
   2023-03-02T12:32:39+01:00 retry.go:104: Wait for pod kuma-control-plane-5948bcb646-7wwn7 to be provisioned. returned an error: Pod kuma-control-plane-5948bcb646-7wwn7 is not available, reason: , message: , phase: Running, conditions: [{Type:Initialized Status:True LastProbeTime:0001-01-01 00:00:00 +0000 UTC LastTransitionTime:2023-03-02 12:32:27 +0100 CET Reason: Message:} {Type:Ready Status:False LastProbeTime:0001-01-01 00:00:00 +0000 UTC LastTransitionTime:2023-03-02 12:32:27 +0100 CET Reason:ContainersNotReady Message:containers with unready status: [control-plane]} {Type:ContainersReady Status:False LastProbeTime:0001-01-01 00:00:00 +0000 UTC LastTransitionTime:2023-03-02 12:32:27 +0100 CET Reason:ContainersNotReady Message:containers with unready status: [control-plane]} {Type:PodScheduled Status:True LastProbeTime:0001-01-01 00:00:00 +0000 UTC LastTransitionTime:2023-03-02 12:32:27 +0100 CET Reason: Message:}]. Sleeping for 3s and will try again.
   2023-03-02T12:32:42+01:00 client.go:42: Configuring Kubernetes client using config file /Users/charly.molter@konghq.com/.kube/kind-kuma-1-config with context
   2023-03-02T12:32:42+01:00 retry.go:104: Wait for pod kuma-control-plane-5948bcb646-7wwn7 to be provisioned. returned an error: Pod kuma-control-plane-5948bcb646-7wwn7 is not available, reason: , message: , phase: Running, conditions: [{Type:Initialized Status:True LastProbeTime:0001-01-01 00:00:00 +0000 UTC LastTransitionTime:2023-03-02 12:32:27 +0100 CET Reason: Message:} {Type:Ready Status:False LastProbeTime:0001-01-01 00:00:00 +0000 UTC LastTransitionTime:2023-03-02 12:32:27 +0100 CET Reason:ContainersNotReady Message:containers with unready status: [control-plane]} {Type:ContainersReady Status:False LastProbeTime:0001-01-01 00:00:00 +0000 UTC LastTransitionTime:2023-03-02 12:32:27 +0100 CET Reason:ContainersNotReady Message:containers with unready status: [control-plane]} {Type:PodScheduled Status:True LastProbeTime:0001-01-01 00:00:00 +0000 UTC LastTransitionTime:2023-03-02 12:32:27 +0100 CET Reason: Message:}]. Sleeping for 3s and will try again.
   2023-03-02T12:32:45+01:00 pod.go:133: Timedout waiting for Pod to be provisioned 'Wait for pod kuma-control-plane-5948bcb646-7wwn7 to be provisioned.' unsuccessful after 5 retries
   2023-03-02T12:32:45+01:00 logger.go:66: Running command kubectl with args [--kubeconfig /Users/charly.molter@konghq.com/.kube/kind-kuma-1-config --namespace kuma-system logs kuma-control-plane-5948bcb646-7wwn7]
   2023-03-02T12:32:45+01:00 logger.go:66: 2023-03-02T11:32:36.281Z	INFO	plugin.runtime.gateway	registered gateway plugin
   2023-03-02T12:32:45+01:00 logger.go:66: I0302 11:32:37.322522       1 request.go:690] Waited for 1.032916s due to client-side throttling, not priority and fairness, request: GET:https://10.43.0.1:443/apis/admissionregistration.k8s.io/v1beta1?timeout=32s
   2023-03-02T12:32:45+01:00 logger.go:66: 2023-03-02T11:32:38.378Z	INFO	plugin.runtime.k8s	[WARNING] Experimental GatewayAPI feature is enabled, but CRDs are not registered. Disabling support
   2023-03-02T12:32:45+01:00 logger.go:66: 2023-03-02T11:32:38.383Z	INFO	controller-runtime.webhook	Registering webhook	{"path": "/validate-kuma-io-v1alpha1"}
   2023-03-02T12:32:45+01:00 logger.go:66: 2023-03-02T11:32:38.383Z	INFO	controller-runtime.webhook	Registering webhook	{"path": "/validate-v1-service"}
   2023-03-02T12:32:45+01:00 logger.go:66: 2023-03-02T11:32:38.383Z	INFO	controller-runtime.webhook	Registering webhook	{"path": "/validate-v1-secret"}

TODOs

Read the Gruntwork contribution guidelines.

  • Update the docs. -- N/A
  • Run the relevant tests successfully, including pre-commit checks.
  • Ensure any 3rd party code adheres with our license policy or delete this line if it's not applicable.
  • Include release notes. If this PR is backward incompatible, include a migration guide.

Release Notes (draft)

Added more information when a pod fails to start

Migration Guide

@lahabana (Contributor, Author) commented Mar 7, 2023

@denis256 we'd love to be able to use this change in kumahq/kuma; any chance of a review?

@@ -68,7 +68,7 @@ type PodNotAvailable struct {

 // Error is a simple function to return a formatted error message as a string
 func (err PodNotAvailable) Error() string {
-	return fmt.Sprintf("Pod %s is not available, reason: %s, message: %s", err.pod.Name, err.pod.Status.Reason, err.pod.Status.Message)
+	return fmt.Sprintf("Pod %s is not available, reason: %s, message: %s, phase: %s, conditions: %+v", err.pod.Name, err.pod.Status.Reason, err.pod.Status.Message, err.pod.Status.Phase, err.pod.Status.Conditions)
Member:

I was wondering if tests could be added to cover the generated errors.

Contributor (Author):

You mean unit tests or integration tests? Unit tests, for sure, I can add some.
For integration tests I'll need a bit of hand-holding (at least pointers to similar tests), as I don't quite see how terratest is e2e tested.

Contributor (Author):

Just added a bunch of tests. I've only reworked things a little to make the output less verbose and to detect more complex failures.

Contributor (Author):

@denis256 any input here?

Fix gruntwork-io#1251

Signed-off-by: Charly Molter <charly.molter@konghq.com>
return err
}

func TestUnknownImage(t *testing.T) {
Member:

For K8S tests in the test directory, the naming convention for test functions is TestKubernetes*; with the current names, the test functions weren't executed in CI/CD.

Examples:
https://github.com/gruntwork-io/terratest/blob/master/test/kubernetes_basic_example_test.go#L25

https://github.com/gruntwork-io/terratest/blob/master/test/kubernetes_basic_example_service_check_test.go#L28

break
}
logger.Logf(t, "Wait %s before checking if pod %s is still available", sleepBetweenRetries, podName)
time.Sleep(sleepBetweenRetries)
Member:

This logic looks to cover quite a specific case: the pod becomes ready and after a while changes status to not ready. I'm not sure it should be included in a PR for adding additional logs when a pod fails to start.

Contributor (Author):

Up to you: I can either do this as a separate issue or update the PR title. The reason this was added here is that, in the case of a failing startup, the pod may become ready and very quickly switch to not ready.

Member:

I think this should go as a separate feature

@denis256 (Member):

Noticed that some of the added Kubernetes tests are failing:

=== CONT  TestKubernetesUnknownImage
    kubernetes_failed_pod_test.go:83: 
        	Error Trace:	kubernetes_failed_pod_test.go:83
        	Error:      	"Pod test-app-65f4c8b78c-jzsdj is not available, phase:Pending, failingConditions:['Ready:ContainersNotReady->containers with unready status: [app]','ContainersReady:ContainersNotReady->containers with unready status: [app]'], nonRunningContainerState:['waiting(app):ErrImagePull->rpc error: code = Unknown desc = Error response from daemon: pull access denied for not-an-place/not-an-app, repository does not exist or may require 'docker login': denied: requested access to the resource is denied']" does not contain "failed to pull and unpack image \"docker.io/not-an-place/not-an-app:latest\""
        	Test:       	TestKubernetesUnknownImage
--- FAIL: TestKubernetesUnknownImage (7.05s)

TestKubernetesStopsAfter5s 2023-03-24T17:39:49Z logger.go:66: deployment.apps "test-app" deleted
TestKubernetesStopsAfter5s 2023-03-24T17:39:49Z client.go:42: Configuring Kubernetes client using config file /home/circleci/.kube/config with context 
=== CONT  TestKubernetesStopsAfter5s
    kubernetes_failed_pod_test.go:117: 
        	Error Trace:	kubernetes_failed_pod_test.go:117
        	Error:      	"Pod test-app-786fcc5c44-cptfz is not available, phase:Pending, failingConditions:['Ready:ContainersNotReady->containers with unready status: [app]','ContainersReady:ContainersNotReady->containers with unready status: [app]'], nonRunningContainerState:['waiting(app):ContainerCreating->']" does not contain "terminated(app):Completed"
        	Test:       	TestKubernetesStopsAfter5s
--- FAIL: TestKubernetesStopsAfter5s (8.74s)

Development

Successfully merging this pull request may close these issues.

Show pod logs when pod never become available