Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

intermediate progress reports for asynchronous assertions (Eventually, Consistently, ...) #574

Open
pohly opened this issue Aug 24, 2022 · 3 comments

Comments

@pohly
Copy link

pohly commented Aug 24, 2022

Suppose I have a complex Eventually assertion which can block for a long time. When running a test manually, I have to wait for the timeout before there is any indication of what went wrong. If nothing goes wrong, it's not obvious that the test was stuck in Eventually.

Example:

	gomega.Eventually(func(g gomega.Gomega) {
		var notRegistered []string
		for hostname, plugin := range d.Hosts {
			if !plugin.IsRegistered() {
				notRegistered = append(notRegistered, hostname)
			}
		}
		sort.Strings(notRegistered)
		g.Expect(notRegistered).To(gomega.BeEmpty(), "hosts where the plugin has not been registered yet")
	}).WithTimeout(time.Minute).WithPolling(time.Second).Should(gomega.Succeed())

One possible solution would be to wrap this in ginkgo.By and extend ginkgo.By so that it produces output at regular intervals, similar to the warning about slow tests:

     ginkgo.By("waiting for plugin registration", func() {
           gomega.Eventually...
     })

Possible output:

STEP: waiting for plugin registration <time>
STEP still running: waiting for plugin registration <time>
...

The interval for when to print the "still running" part would have to be configurable, similar to --slow-spec-threshold: --slow-step-threshold?

That may be a useful enhancement by itself, but doesn't address the "why does it not complete" part. For that, gomega itself would have to have a way to report the intermediate failure. For the special case above, a wrapper around the callback can dump failures at regular intervals, but that is not a generic solution.

@pohly
Copy link
Author

pohly commented Aug 24, 2022

See kubernetes/kubernetes#112013 for an implementation outside of ginkgo and gomega.

@onsi
Copy link
Owner

onsi commented Aug 24, 2022

hey @pohly

first off:

If nothing goes wrong, it's not obvious that the test was stuck in Eventually.

sounds like perhaps the timeout is too high? Is this the sort of case where "usually this passes in 3 seconds, but sometimes it takes 50+ seconds"? If so I'd honestly suggest dropping down the timeout (or making it configurable depending on location (e.g. longer on CI, shorter when running locally) or context (e.g. longer if I'm running lots of specs in parallel and straining a target cluster; shorter if I'm just focusing one spec). In general failing faster seems better to me than adding reporting verbosity.

With that said, I can imagine doing what you're asking for like this:

gomega.Eventually(func() {...}).WithTimeout(...).WithPolling(...).CallIfStillFailing(func(failure string) {}, 5*time.Second).Should(...)

You would then need to supply your own

func(failure string) {
    GinkgoWriter.Println(failure)
}

the reason being that Gomega doesn't know about Ginkgo/GinkgoWriter/By - so you would have to provide that bridge code yourself.

I hesitate to add this complexity, though. You could accomplish something similar by adding a GinkgoWriter.Println() in the polled function, though I appreciate it will be hard to get the error message that way.

@pohly
Copy link
Author

pohly commented Aug 24, 2022

sounds like perhaps the timeout is too high?

We have to have very high timeouts in Kubernetes E2E testing because sometimes an operation simply takes longer on a busy cluster. If we make the timeouts too close to what the duration normally is, then we get too many test flakes. I agree that it would be nicer to have better control over the timeouts, but we have too many developers who write tests and not enough reviewers who can check that the tests are written with such issues in mind.

I hesitate to add this complexity, though. You could accomplish something similar by adding a GinkgoWriter.Println() in the polled function, though I appreciate it will be hard to get the error message that way.

Putting it into the called function means writing extra code, which most developers won't do. We need something very simple.

What do you think of the proposal in kubernetes/kubernetes#112013?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants