
feat: idea to provide robot nodes without robot credentials #603

Closed

Conversation


@pmdroid pmdroid commented Jan 10, 2024

This is not a complete implementation, just an MVP and an idea.

With the mock server, a customer who does not want the cluster to know their Robot credentials can still use the cloud controller.
All the information the cloud controller needs is already assigned to the node when it is started.

If you consider this or a similar idea, please let me know; I can easily adjust the implementation and add docs for this use case.

@pmdroid pmdroid requested a review from a team as a code owner January 10, 2024 16:11
@apricote
Member

Hey @pmdroid, thank you for your proposal!

You are right that with the labels you provided, the Node can be successfully initialized without talking to the Robot API. There are three things that currently require ongoing access to the Robot API:

  • Node Shutdown status (your PR returns false -> Running); this can be used to automatically reschedule Pods to other Nodes
  • Node Exists (your PR returns true -> Exists); this is used to automatically delete Nodes when they are deleted in the Cloud Provider API, though this probably does not happen automatically for Robot servers anyway
  • For reconciling Load Balancer targets we need to know which Robot servers exist and what IPs they have.

The info that is provided through labels in your proposal would be used to initialize the Node and set some fields on it.
The cloud-provider initialization process looks like this:

  1. Operator sets --cloud-provider=external on the Kubelet
  2. When kubelet registers the Node object with the control-plane it includes a taint node.cloudprovider.kubernetes.io/uninitialized: "NoSchedule"
  3. k/cloud-provider (used in HCCM) sees the "new" (tainted) Node
  4. It asks our code for the metadata, we try to match the Node against Cloud/Robot APIs and if we find a match return the Metadata
  5. k/cloud-provider verifies that the response is plausible (i.e. does not conflict with the existing status.addresses)
  6. k/cloud-provider patches the Node object as follows:
 metadata:
   annotations:
     # Stable Annotations
+    node.kubernetes.io/instance-type: {{ .Metadata.InstanceType }}
+    topology.kubernetes.io/region: {{ .Metadata.Region }}
+    topology.kubernetes.io/zone: {{ .Metadata.Zone }}
     # Beta Annotations
+    beta.kubernetes.io/instance-type: {{ .Metadata.InstanceType }}
+    failure-domain.beta.kubernetes.io/region: {{ .Metadata.Region }}
+    failure-domain.beta.kubernetes.io/zone: {{ .Metadata.Zone }}
 spec:
+  providerID: {{ .Metadata.ProviderID }}
   taints:
-    - Key: "node.cloudprovider.kubernetes.io/uninitialized"
-      Effect: "NoSchedule"
-      Value: "true"
 status:
+  addresses: {{ .Metadata.Addresses }}

Instance (cloud/instances.go)

Instead of moving this data through HCCM, what do you think about making the changes to the Node object directly? The taint could be removed by just not setting --cloud-provider=external in Kubelet.

If no Robot Credentials are supplied, InstanceExists and InstanceShutdown both return errors for the node, which would be logged but no other action would be taken.

Load Balancer targets (internal/hcops.ReconcileHCLBTargets)

This leaves the issue of this API call:

	if l.Cfg.Robot.Enabled {
		dedicatedServers, err := l.RobotClient.ServerGetList()
		if err != nil {
			return changed, fmt.Errorf("%s: failed to get list of dedicated servers: %w", op, err)
		}

		for _, s := range dedicatedServers {
			robotIPsToIDs[s.ServerIP] = s.ServerNumber
			robotIDToIPv4[s.ServerNumber] = s.ServerIP
		}
	}

At a quick glance I believe we can replace this by parsing the required information from the Kubernetes Nodes (.status.addresses, .spec.providerID).
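As a rough sketch of what I mean (the `node` struct below is a minimal stand-in for `corev1.Node`, and the `hrobot://` providerID prefix is an assumption on my side), the two maps could be rebuilt from the Node objects like this:

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// node is a minimal stand-in for corev1.Node with only the fields the
// reconciler would need.
type node struct {
	ProviderID string   // from .spec.providerID, e.g. "hrobot://321" (assumed prefix)
	Addresses  []string // IPv4 addresses taken from .status.addresses
}

// robotTargetsFromNodes rebuilds the IP<->ID maps that ServerGetList
// currently provides, using only data already present on the Nodes.
func robotTargetsFromNodes(nodes []node) (map[string]int, map[int]string) {
	robotIPsToIDs := make(map[string]int)
	robotIDToIPv4 := make(map[int]string)
	for _, n := range nodes {
		idStr, ok := strings.CutPrefix(n.ProviderID, "hrobot://")
		if !ok {
			continue // cloud server or foreign node, skip
		}
		id, err := strconv.Atoi(idStr)
		if err != nil {
			continue // malformed providerID, skip
		}
		for _, ip := range n.Addresses {
			robotIPsToIDs[ip] = id
		}
		if len(n.Addresses) > 0 {
			robotIDToIPv4[id] = n.Addresses[0]
		}
	}
	return robotIPsToIDs, robotIDToIPv4
}

func main() {
	ips, ids := robotTargetsFromNodes([]node{
		{ProviderID: "hrobot://321", Addresses: []string{"198.51.100.7"}},
		{ProviderID: "hcloud://42", Addresses: []string{"203.0.113.9"}},
	})
	fmt.Println(ips["198.51.100.7"], ids[321]) // 321 198.51.100.7
}
```

One caveat: this only sees Robot servers that are already cluster Nodes, whereas ServerGetList also returns servers outside the cluster.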

@apricote
Member

While talking to someone from the Robot Team I learned that you can also configure a special "Webservice User" which only has access to the Robot Webservice, not your full Hetzner Account. Perhaps this is also a good alternative for you. (See #608)

@pmdroid
Author

pmdroid commented Jan 16, 2024

hey @apricote, thanks for the answer!

I implemented this to prevent the HCCM from removing a node from the cluster that is Ready but simply not managed by the cloud provider. This happens because the HCCM cannot find any server that matches the Node's name.

It might be nice to have something similar to this that makes it possible to use Cloud Nodes and other Nodes together, while keeping the other Nodes away from the Cloud Provider, maybe with a flag like "instance.hetzner.cloud/self-managed"?
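To illustrate (the label key and its semantics are only my suggestion here, not anything HCCM supports today), such an opt-out check could be as simple as:

```go
package main

import "fmt"

// selfManagedLabel is the hypothetical opt-out label suggested above.
const selfManagedLabel = "instance.hetzner.cloud/self-managed"

// isSelfManaged reports whether HCCM should leave a Node alone entirely.
func isSelfManaged(nodeLabels map[string]string) bool {
	return nodeLabels[selfManagedLabel] == "true"
}

func main() {
	fmt.Println(isSelfManaged(map[string]string{selfManagedLabel: "true"})) // true
	fmt.Println(isSelfManaged(map[string]string{}))                         // false
}
```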

Thanks for the information about the "Webservice User", but even that is still too broad when it comes to permissions, and the same applies to the Cloud Token. It would be nice if it were possible to create tokens limited to specific functions, for example to read metadata and to create and delete nodes (autoscaling).

Contributor

This PR has been marked as stale because it has not had recent activity. The bot will close the PR if no further action occurs.

@github-actions github-actions bot added the stale label Apr 16, 2024
@github-actions github-actions bot closed this May 16, 2024