e2e test for KafkaTopic webhook #1031

Draft
wants to merge 9 commits into master

mihaialexandrescu
Contributor

@mihaialexandrescu commented Aug 3, 2023

Description

This PR aims to provide an e2e test for the KafkaTopic validating webhook (for create and update operations).

I chose to implement this by using kubectl's --dry-run=server mode as much as possible, because it triggers all the API-server-side logic (including the admission webhooks) without persisting the objects to storage (etcd), which is exactly what we want in most cases during such tests. This way we have fewer objects to clean up and to account/wait for.
This means I had to provide a local, shortened replica of Terratest's KubectlApplyFromStringE, because the function chain behind our regular applyK8sResourceFromTemplate() call does not accept ...args early enough for me to inject the --dry-run=server flag I wanted.
The chain in question is: applyK8sResourceFromTemplate -> applyK8sResourceManifestFromString -> now moving into terratest's k8s pkg -> KubectlApplyFromStringE -> KubectlApplyE (inserts "apply -f") -> RunKubectlE -> RunKubectlAndGetOutputE -> shell.RunCommandAndGetOutputE.

ℹ️ Update operation tests look similar to Create ones, but they are identical neither in intent nor in output (some cases and errors are specific to Update). That is why I did not group them further into a common function.

ℹ️ The reason for assertions like Expect(len(strings.Split(output, "\n"))).To(Equal(1)) is that each case is meant to trigger one particular error, so I also want to check that little else happens outside of the targeted validation error.

⚠️ This PR will be followed by some refactoring after #1020, because that PR, via its kubeconfig injection logic, will inadvertently fix an issue we have today which makes functions like requireDeployKafkaTopic() unusable outside of their initial context.
The core issue at play here is that the pattern present in many tests like this one is incorrect.

	return When("Internally produce and consume message to/from Kafka cluster", func() {
		var kubectlOptions k8s.KubectlOptions
		var err error

		It("Acquiring K8s config and context", func() {
			// Runs during the Run Phase, after the calls below have already been made.
			kubectlOptions, err = kubectlOptionsForCurrentContext()
			Expect(err).NotTo(HaveOccurred())
		})

		// Everything below runs during the Tree Construction Phase: kubectlOptions
		// has not been populated by the It above yet when it is copied here.
		kubectlOptions.Namespace = koperatorLocalHelmDescriptor.Namespace

		requireDeployingKcatPod(kubectlOptions, kcatPodName, "")
		requireDeployingKafkaTopic(kubectlOptions, testInternalTopicName)
		// ...
	})

Ginkgo runs in phases, which means that at runtime the code above does not execute in the order in which it is written.
Doc reference: https://onsi.github.io/ginkgo/#mental-model-how-ginkgo-traverses-the-spec-hierarchy
To be precise:

When Ginkgo runs a suite it does so in two phases. The Tree Construction Phase followed by the Run Phase.

During the Tree Construction Phase Ginkgo enters all container nodes by invoking their closures to construct the spec tree. During this phase Ginkgo is capturing and saving off the various setup and subject node closures it encounters in the tree without running them. Only container node closures run during this phase and Ginkgo does not expect to encounter any assertions as no specs are running yet.
...
Once the spec tree is constructed Ginkgo walks the tree to generate a flattened list of specs.
...
During the Run Phase Ginkgo runs through each spec in the spec list sequentially. When running a spec Ginkgo invokes the setup and subject nodes closures in the correct order and tracks any failed assertions. Note that container node closures are never invoked during the run phase.

What this means is that, in the current pattern, the kubectlOptions value passed (by copy) to those functions is essentially empty (only Namespace is set): the copy is made during the Tree Construction Phase (in a When), but the useful values are only populated in kubectlOptions much later, during the Run Phase (in an It). This can easily be demonstrated with debug/print statements.
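The effect can be reproduced without Ginkgo at all. The sketch below is a toy model, not Ginkgo's implementation: a hypothetical `It` only records closures (standing in for the Tree Construction Phase), and a loop runs them afterwards (the Run Phase). `Options` is a simplified stand-in for `k8s.KubectlOptions`, and "kind-e2e" is a made-up context name. A helper that receives the options by value keeps the copy made at registration time, while one that receives a pointer sees the value populated during the Run Phase.

```go
package main

import "fmt"

// Options is a simplified, hypothetical stand-in for k8s.KubectlOptions.
type Options struct {
	ContextName string
}

// specs holds saved closures, mimicking how It bodies are recorded
// during tree construction without being run.
var specs []func()

// It only records the body; nothing runs yet.
func It(body func()) { specs = append(specs, body) }

// requireByValue receives a copy of opts at registration time.
func requireByValue(opts Options, seen *string) {
	It(func() { *seen = opts.ContextName })
}

// requireByPointer dereferences opts only when the spec actually runs.
func requireByPointer(opts *Options, seen *string) {
	It(func() { *seen = opts.ContextName })
}

// simulate runs both phases and reports what each helper observed.
func simulate() (byValue, byPointer string) {
	specs = nil

	// "Tree Construction Phase": the container body runs immediately.
	var opts Options
	It(func() { opts.ContextName = "kind-e2e" }) // populated only in the "Run Phase"
	requireByValue(opts, &byValue)              // copies the still-empty opts
	requireByPointer(&opts, &byPointer)

	// "Run Phase": saved closures execute in registration order.
	for _, spec := range specs {
		spec()
	}
	return byValue, byPointer
}

func main() {
	byValue, byPointer := simulate()
	fmt.Printf("by value: %q, by pointer: %q\n", byValue, byPointer)
	// by value: "", by pointer: "kind-e2e"
}
```

Passing a pointer (or moving the helper calls into the spec bodies) is one way around the problem; it is not necessarily the refactor planned after #1020.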

For this issue, during this PR, I only provided a temporary local fix so that the new tests can run and reuse the necessary functions. To some extent I'm asking you to overlook that part during review. It will get refactored after #1020.

ℹ️ I indicated some of my refactoring plans with comments including about moving some functions to other files. The reason for keeping as much as possible within a single file during this PR (but not forever!) is that #1020 is large and complicated and in case my PR gets merged first (on account of being far less complex), it will be significantly simpler to resolve conflicts or rebase #1020 if my PR mainly adds one noteworthy file (that I will break up later as indicated in the comments).

Type of Change

  • Other (please describe): e2e test

Checklist

  • I have read the contributing guidelines
  • I have verified this change is not present in other open pull requests
  • All code style checks pass

@mihaialexandrescu mihaialexandrescu marked this pull request as ready for review August 3, 2023 11:57
@mihaialexandrescu mihaialexandrescu requested a review from a team as a code owner August 3, 2023 11:57
)

// TODO(mihalexa): move to k8s.go
// applyK8sResourceFromTemplateWithDryRun is copy of applyK8sResourceFromTemplate which calls a "--dry-run=<strategy>" kubectl command
Contributor

I feel like this could be solved by just adding an option for dry-run to the original command.

Contributor Author

Such an option would have to become an input parameter, and modifying the original function seemed like overkill to me: it is plenty good the way it is, and our use of it does not normally (90+% of the time) need --dry-run.
Outside of what I deemed to be an optimized workflow for webhook testing (aka making use of the dry-run option), in our e2e tests we always want to fully create objects (aka have them written to etcd) because that's in the nature of the tests: we create objects and we check their functionality.

The initial function is also part of a chain - it is not really standalone. Modifying its signature means modifying the signatures of all functions that relate to it in any way.

I preferred to add it as an optional/alternative function.

I could be convinced of the opposite but right now I don't feel like it is a 50-50 choice.

Contributor

Yeah, I get that, but at the same time I'd like to avoid major code duplication if we can, and in my head that is more important than what you've written here. I won't hold up the PR on this if everyone else is fine with this reasoning; it's a personal preference thing for me.

Member

Yeah, that's a difficult topic.
I definitely wouldn't add a dryRunStrategy string parameter to the original function.

But maybe we should pay the one time cost and add an extraArgs map[string][]string parameter to that one which is passed down to the TerraTest call through the other helpers.
I know it's a big refactor and I can live with postponing it, but that looks like the desirable ideal solution to me.

Contributor Author

The chain in question is: applyK8sResourceFromTemplate -> applyK8sResourceManifestFromString -> now moving into terratest's k8s pkg -> KubectlApplyFromStringE -> KubectlApplyE (inserts "apply -f") -> RunKubectlE -> RunKubectlAndGetOutputE -> shell.RunCommandAndGetOutputE.

Part of the issue with this chain is that terratest's KubectlApply<...> functions, which we would like to keep using, don't have ...args in their signature at all. That's the root of the problem; otherwise it would probably have been an easy refactor. The first point in the chain that accepts ...args is RunKubectlE.
Examples of function signatures relevant to us in the current implementation (also check comments I added at the end of some lines):

func KubectlApplyFromStringE(t testing.TestingT, options *KubectlOptions, configData string) error {
	// ...
	return KubectlApplyE(t, options, tmpfile)
}

func KubectlApplyE(t testing.TestingT, options *KubectlOptions, configPath string) error {
	// First ...args passed in a call: this is where "apply -f" gets added.
	return RunKubectlE(t, options, "apply", "-f", configPath)
}

// First ...args in a signature.
func RunKubectlE(t testing.TestingT, options *KubectlOptions, args ...string) error {
	_, err := RunKubectlAndGetOutputE(t, options, args...)
	return err
}

It is from the implementations behind that chain of functions that I pieced together a direct path to my goal.

Contributor Author

Do we want to take Marton's suggestion from #1031 (comment) (which I am guessing was meant for this thread) and refactor applyK8sResourceManifestFromString (and our own functions one level upwards) to approximate the KubectlApply... functions from terratest instead of using them directly? (And in the process eliminate the need for a new dry-run function?)

Member

Do we want to take Marton's suggestion from #1031 (comment) (which I am guessing was meant for this thread) and refactor applyK8sResourceManifestFromString (and our own functions one level upwards) to approximate the KubectlApply... functions from terratest instead of using them directly? (And in the process eliminate the need for a new dry-run function?)

Yeah, I think that was my point as well, or at least I meant it somewhat that way. I see why my idea was simpler but would not work, but the end result seems similar to me.

Comment on lines 78 to 42
requireCreatingKafkaCluster(kubectlOptions, "../../config/samples/simplekafkacluster.yaml")
testWebhookCreateKafkaTopic(kubectlOptions)
Contributor

In the current implementation of the e2e tests, shouldn't this already be done by this point? Not that it should cause problems in that case; it just might make things slower.

Contributor Author

I am not 100% sure about what you meant in the question. I will try to answer what I understood from it but please follow up if I didn't get it.

I favored having the webhook testing not rely on other previous tests and the flexibility this gives us.

There are a few things I considered:

  1. our resources have different prerequisites (e.g. KafkaTopic requires that the indicated KafkaCluster exists) and we need a clear place to aggregate those
    • this PR also aimed to provide a structure where other webhook e2e tests (and their prerequisites) could easily slot in;
      • e.g. kafkatopics may only require a/any kafkacluster but other things may require more intricate or specific setups
  2. our KafkaCluster-related testing may not always be done on the most minimal setup possible, while for the purposes of KafkaTopic webhook testing we don't care about the KafkaCluster resource (so minimal is best), but there is no way around having a KafkaCluster.
    • this structure allowed me to decouple what kafkaclusters we test when we test kafkaclusters from what kafkaclusters we use when we test webhooks

Contributor

@Kuvesz Aug 9, 2023

Okay, I'm fine with this reasoning, though I'd like to see how this looks when Marton's PR is merged.

Update: I checked back on that, and I'd suggest modifying this to deal only with the webhook tests themselves, not with installing a Kafka cluster and such, since any real test case that includes the webhook tests will already install a Kafka cluster, as it does now.

Member

Yeah, I think we either have to do everything here, starting from deploying the components, or only do the webhooks and let the rest be built up somewhere else. We are in the middle here, which is not good, because ensuring the Kafka cluster is not the webhook test's responsibility.

Contributor Author

We can create an explicit or implicit dependency on the KafkaCluster CR name, remove the creation and cleanup steps and reorder the function calls in the suite file so that webhook testing comes right after one of the KafkaCluster creations we do.

One worry I voiced is about how such code scales for other webhooks and their CRs and their prerequisites and dependencies.
For example, explicit dependency injection (via function parameters like a KafkaClusterName) might mean we would keep adding params to function definitions, and that's not something we want either.
Or it could be "implicit" and we just "assume" certain constants are used for names - which we already do "here and there".

I'll explain my initial stance: I wanted to separate the setup steps for each webhook because I worried the prerequisites could vary enough between webhooks to make any other type of implementation very messy. That's why the sequence I am/was proposing was: testWebhooks() -> testWebhookKafkaTopic(kubectlOptions) -> -> testCreate + testUpdate -> teardown.

I can be convinced to, for instance, not have the setup phase as part of testWebhookKafkaTopic().
Later edit: as in, not to have it at all and to depend on resources installed in the main test suite function.

Member

We can create an explicit or implicit dependency on the KafkaCluster CR name, remove the creation and cleanup steps and reorder the function calls in the suite file so that webhook testing comes right after one of the KafkaCluster creations we do.

Explicit rather, but otherwise this was my thought, yes.

I can be convinced to, for instance, not have the setup phase as part of testWebhookKafkaTopic().

Yeah, with this naming, IMO the setup/teardown of the components or clusters shouldn't be part of the function. Either we name it testKafkaClusterCreateAndWebhookKafkaTopic() or we need something else to do the setup.

tests/e2e/kafkatopic_webhook.go (outdated, resolved)
Kuvesz previously approved these changes Aug 9, 2023

Member

@panyuenlau left a comment

Found a couple of typos and left a question for my own understanding, no major issues. Good job!

})
}

func testWebhookKafkTopic(kubectlOptions k8s.KubectlOptions) {
Member

Typo

Suggested change
func testWebhookKafkTopic(kubectlOptions k8s.KubectlOptions) {
func testWebhookKafkaTopic(kubectlOptions k8s.KubectlOptions) {

Contributor Author

Thank you for catching this. I'll have to update it in the function definition and at the call site.

Done in 10b2248.

Member

If you are using VSCode/VSCodium, the CodeSpellChecker extension (streetsidesoftware.code-spell-checker) can catch this at editing time.

tests/e2e/kafkatopic_webhook.go (outdated, resolved)
tests/e2e/koperator_suite_test.go (outdated, resolved)
Kuvesz previously approved these changes Aug 10, 2023
panyuenlau previously approved these changes Aug 10, 2023

Member

@pregnor left a comment

LGTM, couple comments, overall looks really good, thanks.

@@ -64,6 +64,7 @@ var _ = When("Testing e2e test altogether", Ordered, func() {
testInstallKafkaCluster("../../config/samples/simplekafkacluster_ssl.yaml")
testProduceConsumeInternalSSL(defaultTLSSecretName)
testUninstallKafkaCluster()
testWebhooks()
Member

Question: Shouldn't this be at line 62? This sounds like something which is both easier to test and required sooner than what comes from line 62.
I know you discussed this somewhat with Dávid; I'm coming from a slightly different angle.

Contributor Author

With the current implementation where webhook prerequisites are decoupled, the logical trouble I saw was that webhook testing would end up doing the install testing for the big CRs before those dedicated tests happen.

I'll give the example we have regarding KafkaTopic: in order to test anything about KafkaTopic we need a KafkaCluster to be up, which means we would end up testing KafkaCluster installation (without meaning to) before the dedicated KafkaCluster installation and functional tests.

I saw that as undesirable, so I decided to place webhook testing towards the end as a category of its own, which can then rely on the dedicated install and functional tests working (and having been addressed in their own sections beforehand).

I can be convinced otherwise, especially within the context of this conversation: #1031 (comment).

tests/e2e/kafkatopic_webhook.go (outdated, resolved)

dryRunStrategyServer,
)
Expect(err).To(HaveOccurred())
// Example error: The KafkaTopic "topic-test-internal" is invalid: spec.clusterRef.name: Invalid value: "kafkaNOT": kafkaCluster 'kafkaNOT' in the namespace 'kafka' does not exist
Member

Question: Is this really meaningful with line 108 being present? I see it as somewhat noisy, but I could be wrong.

Contributor Author

The idea was to show the messages in full and then match on noteworthy parts of them.
Also, in future refactors, someone may have a better matching logic idea if they see the "real" and full expected message/output.

Also, on the point of consistency: if the "example error" comment is deemed useful in most other situations, then I'd rather see it even where it can look redundant at first glance, at the very least as an example of sticking to a developer "communication" practice.

Member

In such cases my practice was to build the message explicitly, with at most one level of indirection if variables are defined; but in end-to-end tests I mostly avoid reusable variables for clarity, so string literals are the interface we are using.
I'm biased towards my experience but I can try and convince myself about this approach as well.

@bartam1
Contributor

bartam1 commented Aug 16, 2023

LGTM!
I agree with pregnor that it would be better to go deeper and propagate upwards the extraArgs map[string][]string from here:

func applyK8sResourceManifestFromString(kubectlOptions k8s.KubectlOptions, manifest string) error {
So we would go deeper at this point and use something like this:

	tmpfile, err := k8s.StoreConfigToTempFileE(GinkgoT(), manifest)
	if err != nil {
		return err
	}
	defer os.Remove(tmpfile)

	// extraArgs: the extra kubectl flags (e.g. "--dry-run=server"),
	// assumed here to be already flattened into a []string.
	args := []string{"apply", "-f", tmpfile}
	args = append(args, extraArgs...)

	_, err = k8s.RunKubectlAndGetOutputE(
		GinkgoT(),
		&kubectlOptions,
		args...,
	)
	return err

like here: func deleteK8sResource(
This would not be a big refactor, and we would gain consistency.

@mihaialexandrescu mihaialexandrescu force-pushed the e2etest/webhook_kafkatopic branch 2 times, most recently from 94f0a6e to d76b478 Compare August 22, 2023 08:06
applyK8sResourceManifest(kubectlOptions, tempPath)
err = applyK8sResourceManifest(kubectlOptions, tempPath)
if err != nil {
return errors.WrapIfWithDetails(err, "applying CRD failed", "crd", string(crd))
Contributor

@bartam1 Aug 23, 2023

Note:
When we use the WithDetails feature of emperror.dev/errors, IMHO the error message will not contain the details, because they are stored in a separate struct field rather than in the error message itself.

I think we should figure out something to solve this problem.
I favor standard Go error handling with fmt.Errorf, although I know the stack trace will not be there (in the emperror case it is also stored in a separate struct field).

I know we use that in a lot of places in the e2e codebase.

tests/e2e/k8s.go (outdated, resolved)
tests/e2e/kafkatopic_webhook.go (outdated, resolved)
@mihaialexandrescu mihaialexandrescu marked this pull request as draft August 25, 2023 11:03
@mihaialexandrescu mihaialexandrescu force-pushed the e2etest/webhook_kafkatopic branch 4 times, most recently from 38e9f64 to e0282d3 Compare August 31, 2023 08:00
@mihaialexandrescu mihaialexandrescu force-pushed the e2etest/webhook_kafkatopic branch 3 times, most recently from b66f9e3 to 274bc65 Compare September 5, 2023 14:43
5 participants