Code review comments from @ialidzhikov #5

Open · 23 of 29 tasks

ialidzhikov opened this issue Feb 20, 2024 · 0 comments

ialidzhikov commented Feb 20, 2024

First, you could address the findings from Oliver's review against the main branch by creating a PR that resolves his comments.


Mid

Minor

  • Drop the .docforge dir and switch to the central manifest. After Use central and new manifest format documentation#431, the repos no longer need to define a .docforge dir and the manifests are maintained centrally. See the linked issue for more details. Additionally, as a consequence
    .PHONY: check-docforge
    check-docforge: $(DOCFORGE)
    	@$(REPO_ROOT)/hack/gardener-util/check-docforge.sh $(REPO_ROOT) $(REPO_ROOT)/.docforge/manifest.yaml ".docforge/;docs/" $(NAME) false
    has to be dropped. Also drop the make check-docforge target; it should no longer be needed.: Fix make verify #11
  • make check is reporting golangci-lint findings. You could fix them.: Fix make verify #11
  • make format is failing because there is no test/ dir.: Fix make verify #11
  • make generate is not implemented (
    .PHONY: generate
    generate: $(CONTROLLER_GEN) $(GEN_CRD_API_REFERENCE_DOCS) $(HELM) $(YQ)
    	echo "Code generation is currently not implemented"
    	# @$(REPO_ROOT)/hack/gardener-util/generate.sh ./cmd/... ./pkg/... ./test/...
    	# $(MAKE) format
    ): Does the project need code generation at all? If not, let's remove it.
  • ############# base image
    # TODO: Andrey: P1: Move to distroless
    • +1, let's use distroless instead of alpine. It is also part of the component checklist - the component should not run as a root user, if possible.: Fix make verify #11
  • The pkg/version pkg - we usually don't define such a pkg in other repos and rather reuse the k8s.io/component-base/version/verflag pkg. You should already be familiar with it, as in Add support for a --version command line flag gardener-extension-runtime-gvisor#38 you used this pkg and eliminated a custom version pkg in the runtime-gvisor extension (a minimal verflag sketch follows after this list): Move from GCR to artifact registry #10
  • All source files are missing a REUSE license header: Switch to use REUSE license format  #12
  • testIsolation metricsClientTestIsolation // Provides indirections necessary to isolate the unit during tests
    : Instead of having metricsClientTestIsolation you could directly have a field that is rest.HTTPClient. When you instantiate in non-test code, you pass the real client to a constructor func such as NewMetricsClient(httpClient). When you instantiate in test code, you pass a fake/mock client. (See the constructor-injection sketch after this list.)
  • You could move the copied code from gardener/gardener from ./pkg/util/gardener to /third_party/. Example: gardener/gardener@a1eb2fb: Drop vendoring #13
  • log.V(app.VerbosityInfo).Info("Creating client set")
    if _, err := k8sclient.GetClientSet(appOptions.RestOptions.Kubeconfig); err != nil {
    	return &log, nil, nil, fmt.Errorf("create client set: %w", err)
    }
    : A ClientSet is created but then it is not used. What is the rationale behind it? I assume we can drop it.
  • func Wrap(prefixMessage string, err error, varargs ...any) error {
    	if err == nil {
    		return nil
    	}
    	return fmt.Errorf(prefixMessage+": %w", append(varargs, err)...)
    }
    : Why do we need such a utility func, and why is it not possible to use fmt.Errorf natively, as in every other place in the gardener code-base? (A plain fmt.Errorf sketch follows after this list.)
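For the pkg/version item above, a minimal sketch of what reusing k8s.io/component-base/version/verflag could look like. The command name "gardener-custom-metrics" and the use of cobra are assumptions for illustration, not the component's actual wiring; only the verflag calls are from component-base:

    package main

    import (
    	"github.com/spf13/cobra"
    	"k8s.io/component-base/version/verflag"
    )

    func main() {
    	cmd := &cobra.Command{
    		Use: "gardener-custom-metrics",
    		RunE: func(cmd *cobra.Command, args []string) error {
    			// Print the version and exit if --version was passed.
    			verflag.PrintAndExitIfRequested()
    			// ... start the component ...
    			return nil
    		},
    	}
    	// Registers the --version flag on the command's flag set.
    	verflag.AddFlags(cmd.Flags())
    	_ = cmd.Execute()
    }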
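For the metricsClientTestIsolation item, a sketch of the suggested constructor injection; the metricsClient and NewMetricsClient names are assumptions, only rest.HTTPClient is taken from client-go:

    package metrics

    import (
    	"net/http"

    	"k8s.io/client-go/rest"
    )

    type metricsClient struct {
    	// The HTTP client is the only test seam needed; no separate
    	// metricsClientTestIsolation struct is required.
    	httpClient rest.HTTPClient
    }

    // Production code passes a real *http.Client (it satisfies rest.HTTPClient);
    // tests pass a fake or mock implementation.
    func NewMetricsClient(httpClient rest.HTTPClient) *metricsClient {
    	return &metricsClient{httpClient: httpClient}
    }

    var _ rest.HTTPClient = (*http.Client)(nil)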
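And for the Wrap helper, a sketch of the plain fmt.Errorf wrapping used elsewhere in the gardener code-base; the surrounding function is illustrative, not taken from the repo:

    package metrics

    import (
    	"fmt"
    	"net/http"
    )

    func doRequest(client *http.Client, req *http.Request) (*http.Response, error) {
    	resp, err := client.Do(req)
    	if err != nil {
    		// The same "%w" wrapping that Wrap produces, without the extra indirection.
    		return nil, fmt.Errorf("making http request: %w", err)
    	}
    	return resp, nil
    }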

Nits (really, really, really minor)

Questions:

  • Why is the repo private?: The repo has been made public.
  • metricsUrl := fmt.Sprintf("https://%s/metrics", pod.Status.PodIP)
    : What happens when pod.Status.PodIP is empty? According to the doc string of the field, pod.Status.PodIP will be empty if not yet allocated.
    • [andrerun-new]: See the log entry below. It's a bit ugly - a project outsider may have a hard time figuring out what's going on. @ialidzhikov, a pod gets stuck without an IP address every now and then, right? It's not an extremely rare event? If so, I think I should add special handling for this case and log a nicer message (one possible shape is sketched after these questions). [under-discussion]
      ERROR	gardener-custom-metrics.input.scraper	Kapi metrics retrieval failed	{"op": "scrape", "namespace": "shoot--local--local", "pod": "kube-apiserver-5588c58789-crm72", "error": "metrics client: making http request: Get \"https:///metrics\": http: no Host in request URL"}
      
  • MetricsUrl string // The URL where metrics for the pod can be scraped
    : Why do we store the whole MetricsUrl? Instead, you could only store the Pod IP and construct the metrics URL when fetching the metrics (see the sketch after these questions).
  • // information necessary to scrape such metrics.
    • [andrerun-new]: The design reason is that I want to keep the decision of 'where to scrape' outside of the scraper. There's also a minor runtime concern - I prefer less object creation/GC churn.

3: Storing the same Pod labels would waste a lot of memory. I see that you need the Pod labels to allow selecting metrics by object labelSelector. Maybe the whole model has to be adapted. We can, for example, accept that Pod labels are immutable and store them only once, not for every new metric value. [under-discussion]

  • : IIUC, the benefit of running 2 replicas is only that the 2nd Pod waits in "stand by" mode and, on issues with the leader replica, the "stand by" can take over faster. By faster - we don't have to wait for a new Pod to be scheduled and started. Updating the Endpoint manually to influence the traffic to go to the leader replica looks hacky. We were running metrics-server for Shoots and ManagedSeeds for years with a single replica and I don't recall us having issues related to it. https://github.com/kubernetes-sigs/metrics-server/tree/master?tab=readme-ov-file#high-availability: metrics-server seems to have a real HA mode where 2 of the replicas are serving (?). We can check what they do and how. And I agree with Proposed #3 (comment) - this approach is quite error-prone.
    • [andrerun]: The main benefit I see in the second replica is that it ties up compute resources in another AZ, so it guards against an AZ resource shortage disrupting failover. Overall, I have my reservations regarding the need for a second replica, considering the intended use of the component, but that was a hard requirement introduced by the GEP review process. I'll elaborate offline. [under-discussion]
  • I didn't manage to test the component in a local setup at all (due to missing docs/instructions), but I wanted to ask how it behaves on restarts and whether the HPA acting on the custom metric is fine with it. I assume that on Pod restart the leader will change and the newly elected replica won't report any metrics (or will report 0-ed metric values). Is HPA able to deal with unavailability of the gardener-custom-metrics component?
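A sketch covering the two metrics-URL questions above (storing only the Pod IP and handling an empty pod.Status.PodIP with a clearer error); the podMetricsURL and ErrPodIPNotAllocated names are illustrative assumptions, not the component's actual API:

    package scraper

    import (
    	"errors"
    	"fmt"

    	corev1 "k8s.io/api/core/v1"
    )

    // ErrPodIPNotAllocated lets the scraper log a clearer message than the current
    // `Get "https:///metrics": http: no Host in request URL` while the IP is not yet allocated.
    var ErrPodIPNotAllocated = errors.New("pod IP not yet allocated")

    // podMetricsURL builds the scrape URL at fetch time from the Pod IP,
    // instead of keeping a precomputed MetricsUrl field.
    func podMetricsURL(pod *corev1.Pod) (string, error) {
    	if pod.Status.PodIP == "" {
    		return "", fmt.Errorf("%w: %s/%s", ErrPodIPNotAllocated, pod.Namespace, pod.Name)
    	}
    	return fmt.Sprintf("https://%s/metrics", pod.Status.PodIP), nil
    }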

Final notes: I didn't do a deep dive into non-trivial packages like ./pkg/input/metrics_scraper.
