Get a working build for the etcd operator #1

neoaggelos · 2022-02-25T07:57:34Z

Summary

Update dependencies and upgrade to Go 1.17

Logging

RBAC rules

xrl · 2022-03-18T15:51:21Z

I have code for an "out-of-cluster" config: it lets me run the operator from my computer as a regular process, respecting my ~/.kube/config. It has made development very straightforward. Should I open a PR against this branch?
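For readers unfamiliar with the pattern, here is a minimal standard-library sketch of how an operator might choose between in-cluster and out-of-cluster configuration. It is not xrl's code. It assumes the conventions client-go follows: in-cluster mode is detected through the KUBERNETES_SERVICE_HOST and KUBERNETES_SERVICE_PORT environment variables, and out-of-cluster mode uses $KUBECONFIG or falls back to ~/.kube/config. The helper name configSource is hypothetical. A real operator would pass the result to client-go (rest.InClusterConfig or clientcmd.BuildConfigFromFlags). This covers only API-server access, not reaching the etcd members, which is the limitation xrl reports later in the thread.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// configSource (hypothetical helper) decides how the operator should reach the
// API server. It returns "in-cluster" when the API server environment
// variables that Kubernetes sets in pods are present, otherwise "kubeconfig"
// plus the kubeconfig path to load.
func configSource(getenv func(string) string, home string) (mode, path string) {
	if getenv("KUBERNETES_SERVICE_HOST") != "" && getenv("KUBERNETES_SERVICE_PORT") != "" {
		return "in-cluster", ""
	}
	// KUBECONFIG may hold several paths; this sketch only uses the first one.
	if list := filepath.SplitList(getenv("KUBECONFIG")); len(list) > 0 && list[0] != "" {
		return "kubeconfig", list[0]
	}
	return "kubeconfig", filepath.Join(home, ".kube", "config")
}

func main() {
	home, _ := os.UserHomeDir()
	mode, path := configSource(os.Getenv, home)
	fmt.Println(mode, path)
}
```

Passing getenv as a function keeps the decision testable without touching the real environment.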

neoaggelos · 2022-03-18T16:36:06Z


@xrl Feel free, thanks!

xrl · 2022-03-23T16:48:43Z

It turns out the architecture is not compatible with running in local mode: the operator wants to use an etcd client against the replicas, and I can't easily do that from an out-of-cluster process. So I'm just going to forget about it.

xrl · 2022-03-28T01:24:49Z

The Helm chart isn't accessible as currently configured, because GitHub's raw HTTP server won't follow the latest symlink:

works: https://raw.githubusercontent.com/neoaggelos/etcd-operator/revive/chart/v0.1.0/etcd-operator-0.1.0.tgz
does not work (404): https://raw.githubusercontent.com/neoaggelos/etcd-operator/revive/chart/latest/etcd-operator-0.1.0.tgz

Another issue: index.yaml lists the 0.1.0 version twice, at https://github.com/neoaggelos/etcd-operator/blob/revive/chart/index.yaml#L23 and https://github.com/neoaggelos/etcd-operator/blob/revive/chart/index.yaml#L13. I bet removing the latest flavor would fix it. As it is, the Helm repo isn't usable.

My hack for testing:

helm repo add etcd-operator https://raw.githubusercontent.com/neoaggelos/etcd-operator/revive/chart/
helm repo update
helm upgrade --install internal etcd-operator/etcd-operator -f values/internal.yaml --kube-context=ents-us-dev --namespace=kv --version=0.1.0 --debug

history.go:56: [debug] getting history for release internal
Error: failed to fetch https://raw.githubusercontent.com/neoaggelos/etcd-operator/revive/chart/latest/etcd-operator-0.1.0.tgz : 404 Not Found
helm.go:84: [debug] failed to fetch https://raw.githubusercontent.com/neoaggelos/etcd-operator/revive/chart/latest/etcd-operator-0.1.0.tgz : 404 Not Found
helm.sh/helm/v3/pkg/getter.(*HTTPGetter).get
[[[ SNIP ]]]
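Both symptoms can be checked offline before publishing a chart. The sketch below assumes a simplified in-memory view of index.yaml (the chartEntry type is hypothetical, not Helm's own). It resolves chart URLs against the repository URL, roughly the way Helm treats relative URLs in an index, and reports any version listed more than once. The two entries in main are illustrative; the real index.yaml may use absolute URLs.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// chartEntry is a hypothetical, simplified view of one entry under
// entries.<chart> in a Helm repository index.yaml.
type chartEntry struct {
	Version string
	URL     string // may be relative to the repository URL
}

// resolveChartURL resolves a (possibly relative) chart URL against the repo
// URL. The base gets a trailing slash so that "chart/" is kept as a directory.
func resolveChartURL(repoURL, ref string) (string, error) {
	base, err := url.Parse(strings.TrimSuffix(repoURL, "/") + "/")
	if err != nil {
		return "", err
	}
	r, err := url.Parse(ref)
	if err != nil {
		return "", err
	}
	return base.ResolveReference(r).String(), nil
}

// duplicateVersions returns every version that appears more than once.
func duplicateVersions(entries []chartEntry) []string {
	seen := map[string]int{}
	var dups []string
	for _, e := range entries {
		seen[e.Version]++
		if seen[e.Version] == 2 {
			dups = append(dups, e.Version)
		}
	}
	return dups
}

func main() {
	repo := "https://raw.githubusercontent.com/neoaggelos/etcd-operator/revive/chart/"
	entries := []chartEntry{ // illustrative entries only
		{Version: "0.1.0", URL: "latest/etcd-operator-0.1.0.tgz"},
		{Version: "0.1.0", URL: "v0.1.0/etcd-operator-0.1.0.tgz"},
	}
	for _, e := range entries {
		u, _ := resolveChartURL(repo, e.URL)
		fmt.Println(e.Version, u)
	}
	fmt.Println("duplicated versions:", duplicateVersions(entries))
}
```

Once the symlink is gone, only the v0.1.0 URL should be listed for 0.1.0, and duplicateVersions should come back empty.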

xrl · 2022-03-28T02:33:24Z

I am running this in a Rancher cluster with some security constraints in place, so I had to add this to deployment.yaml:

      securityContext:
        runAsUser: 1000
        runAsGroup: 1000

It might be good to make this configurable through the Helm values, something like:

      securityContext:
        {{- toYaml .Values.securityContext | nindent 12 }}

What do you think?
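To see how a `toYaml ... | nindent` line like the one proposed renders, here is a standard-library sketch using Go's text/template. toYaml and nindent are Helm/Sprig template functions. The versions below are simplified stand-ins that only handle a flat map of integers. nindent follows Sprig's behaviour (a leading newline, then every line indented by n spaces), and toYaml, like Helm's, emits no trailing newline. The template is a trimmed-down, hypothetical fragment of deployment.yaml. It uses nindent 8 so that the children sit two spaces under a key at column 6; the proposed nindent 12 also yields valid YAML, since any deeper indentation works.

```go
package main

import (
	"fmt"
	"os"
	"sort"
	"strings"
	"text/template"
)

// toYaml is a simplified stand-in for Helm's toYaml: it only renders a flat
// map[string]int, with sorted keys and no trailing newline.
func toYaml(m map[string]int) string {
	keys := make([]string, 0, len(m))
	for k := range m {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	lines := make([]string, len(keys))
	for i, k := range keys {
		lines[i] = fmt.Sprintf("%s: %d", k, m[k])
	}
	return strings.Join(lines, "\n")
}

// nindent mirrors Sprig's nindent: prepend a newline and indent every line.
func nindent(spaces int, s string) string {
	pad := strings.Repeat(" ", spaces)
	return "\n" + pad + strings.ReplaceAll(s, "\n", "\n"+pad)
}

// deploymentSnippet is a hypothetical fragment; "{{-" trims the newline and
// spaces before the action, so nindent supplies the line break itself.
const deploymentSnippet = `      securityContext:
        {{- toYaml .Values.securityContext | nindent 8 }}
`

func render(values map[string]interface{}) (string, error) {
	tmpl, err := template.New("deployment").
		Funcs(template.FuncMap{"toYaml": toYaml, "nindent": nindent}).
		Parse(deploymentSnippet)
	if err != nil {
		return "", err
	}
	var sb strings.Builder
	err = tmpl.Execute(&sb, map[string]interface{}{"Values": values})
	return sb.String(), err
}

func main() {
	out, err := render(map[string]interface{}{
		"securityContext": map[string]int{"runAsUser": 1000, "runAsGroup": 1000},
	})
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Print(out)
}
```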

neoaggelos · 2022-03-30T21:46:17Z

@xrl Thank you for the catch.

Fixed the issue with latest; it seems the latest symlink is not even required.

Also added the securityContext option under operator.securityContext.

I have been keeping this PR open for a while in case issues crop up, but I am considering merging it soon.

xrl · 2022-04-11T04:35:55Z

FYI, etcd 3.5.2 is no longer considered "production ready".

neoaggelos · 2022-04-11T17:47:20Z

Thanks for the heads up

xrl · 2022-04-18T15:50:03Z

I have hit a problem with the TLS example. I ran the example gen-cert.sh and uploaded the certs/keys as Kubernetes secrets, but the example TLS cluster was marked as failed:

time="2022-04-18T15:46:59Z" level=error msg="cluster failed to setup: stat /tmp: no such file or directory" cluster-name=example cluster-namespace=kv pkg=cluster
time="2022-04-18T15:46:59Z" level=warning msg="fail to handle event: ignore failed cluster (example). Please delete its CR" pkg=controller

I'll try to sort it out now. I wonder why there is no /tmp folder.

xrl · 2022-04-21T15:42:23Z

I recommend changing DefaultEtcdVersion in pkg/apis/etcd/v1beta3/cluster.go from "3.2.13" to "3.5.3". 3.5.3 is out now, by the way.

neoaggelos · 2022-04-21T17:10:41Z

Awesome, thanks for the heads up!

Have you had any luck/progress with the TLS issues?

neoaggelos · 2022-04-21T17:12:15Z

I think I will wait for etcd-io/etcd#13948 to be resolved, and skip to 3.5.4 directly.
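Since the thread settles on which etcd release is safe to use as a default, a small guard could reject anything older. This is a sketch that assumes plain MAJOR.MINOR.PATCH version strings. minimumEtcdVersion and atLeast are hypothetical names, and 3.5.4 is used only because it is the release chosen here, not a value from the operator code.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// minimumEtcdVersion is a hypothetical floor matching the release chosen in
// this thread; it is not a value taken from the operator code.
const minimumEtcdVersion = "3.5.4"

// parseVersion splits "MAJOR.MINOR.PATCH" (optionally prefixed with "v")
// into three integers.
func parseVersion(v string) ([3]int, error) {
	var out [3]int
	parts := strings.Split(strings.TrimPrefix(v, "v"), ".")
	if len(parts) != 3 {
		return out, fmt.Errorf("unexpected version %q", v)
	}
	for i, p := range parts {
		n, err := strconv.Atoi(p)
		if err != nil {
			return out, fmt.Errorf("unexpected version %q: %w", v, err)
		}
		out[i] = n
	}
	return out, nil
}

// atLeast reports whether version v is greater than or equal to floor,
// comparing components numerically so that 3.10.0 > 3.5.4.
func atLeast(v, floor string) (bool, error) {
	a, err := parseVersion(v)
	if err != nil {
		return false, err
	}
	b, err := parseVersion(floor)
	if err != nil {
		return false, err
	}
	for i := range a {
		if a[i] != b[i] {
			return a[i] > b[i], nil
		}
	}
	return true, nil
}

func main() {
	for _, v := range []string{"3.2.13", "3.5.2", "3.5.4"} {
		ok, err := atLeast(v, minimumEtcdVersion)
		fmt.Println(v, ok, err)
	}
}
```

Numeric comparison matters here: a plain string comparison would rank "3.10.0" below "3.5.4".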

xrl · 2022-04-23T18:00:43Z

I did get TLS working! I ended up switching the operator image to Ubuntu so that it has a /tmp folder. My colleague mentioned there's a trick for creating a /tmp with the right permissions inside a scratch image, but I couldn't get it working. Do you think it's important to keep the operator on a scratch image?

The next problem is how to automate managing the CA and generating the peer/server/client certs, but that's out of scope for this repo. Here's the script for creating an appropriate set of secrets for example/tls/example-tls-cluster.yaml:

kubectl create secret generic etcd-peer-tls --from-file=peer.crt --from-file=peer.key --from-file=peer-ca.crt
kubectl create secret generic etcd-client-tls --from-file=etcd-client.crt --from-file=etcd-client.key --from-file=etcd-client-ca.crt
kubectl create secret generic etcd-server-tls --from-file=server.crt --from-file=server.key --from-file=server-ca.crt

I would submit PRs, but I have an internal branch that includes some Drone CI tooling. Hopefully these breadcrumbs aren't too tiresome.

Also, waiting for 3.5.4 makes sense. A colleague mentioned an SRV DNS issue and I didn't know what they were talking about; thanks for the link.

neoaggelos · 2022-04-23T18:50:09Z

The fact that /tmp is implicitly required is intriguing to say the least. I wonder if one can get away with just making /tmp an emptyDir: {} volume on the operator deployment.

neoaggelos · 2022-04-23T19:07:27Z

I went ahead and tried it, seems to be working OK. Would you mind trying it out?

xrl · 2022-04-24T17:02:46Z

Yeah, that does work, but I'm not sure it's the deployment's responsibility to fix a flaw in the base image. The Dockerfile could be modified to include a /tmp directory, but that would require ADD tmp.tar /tmp or similar with a prepared tmp folder, and we would have to rely on the untar step preserving the permissions from the archive.

xrl · 2022-04-24T17:06:37Z

But I agree, it is curious that the code errors about a missing tmp folder. Reading over the code, it looks like the TLS config object is built from the secret and kept entirely in memory. I should break my cluster again and reproduce that exact error. Sorry for the wild goose chase!
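To illustrate what building the TLS config in memory from a secret can look like, here is a sketch that uses only crypto/tls and crypto/x509 and never writes to disk. It is not the operator's actual code. The map stands in for the Secret's data field, the key names mirror the --from-file arguments used for etcd-peer-tls, and peerTLSConfig is a hypothetical helper. Using one CA pool for both RootCAs and ClientCAs is an assumption that fits peer traffic, where each member both dials and accepts connections.

```go
package main

import (
	"crypto/tls"
	"crypto/x509"
	"errors"
	"fmt"
)

// peerTLSConfig (hypothetical) builds an in-memory TLS config from secret
// data. Keys follow the kubectl --from-file names used for etcd-peer-tls.
func peerTLSConfig(data map[string][]byte) (*tls.Config, error) {
	certPEM, keyPEM, caPEM := data["peer.crt"], data["peer.key"], data["peer-ca.crt"]
	if len(certPEM) == 0 || len(keyPEM) == 0 || len(caPEM) == 0 {
		return nil, errors.New("secret must contain peer.crt, peer.key and peer-ca.crt")
	}
	cert, err := tls.X509KeyPair(certPEM, keyPEM)
	if err != nil {
		return nil, fmt.Errorf("loading key pair: %w", err)
	}
	pool := x509.NewCertPool()
	if !pool.AppendCertsFromPEM(caPEM) {
		return nil, errors.New("peer-ca.crt contains no usable PEM certificates")
	}
	// Peers both dial and accept connections, so the same CA verifies
	// servers (RootCAs) and clients (ClientCAs).
	return &tls.Config{
		Certificates: []tls.Certificate{cert},
		RootCAs:      pool,
		ClientCAs:    pool,
		ClientAuth:   tls.RequireAndVerifyClientCert,
		MinVersion:   tls.VersionTLS12,
	}, nil
}

func main() {
	// With real data this map would come from the etcd-peer-tls Secret.
	_, err := peerTLSConfig(map[string][]byte{})
	fmt.Println(err)
}
```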

neoaggelos · 2022-04-24T18:52:15Z

Yes, indeed. Perhaps an Alpine image could be a good middle ground. This would also give the operator image a basic sh and some busybox utils for simple debugging.

neoaggelos added 22 commits February 25, 2022 09:28

Switch to go modules

18d23b8

Update etcd, kubernetes, prometheus dependencies

ffa5241

Update CRD to v1

f175969

Define allow-all schema

bb0b0a2

Pretty print etcd clusters

bbb708a

Pretty print CRD

59d2e7e

Take the CRD definition out of the operator

e9749be

Update API to v1beta3

fd034d4

Print latest condition with etcdcluster

8dbc438

add phase to printer columns

2e414cb

Remove obsolete Gopkg.toml file

375aee6

Support hostPath volumes

4331623

Set HostPath volume type to DirectoryOrCreate

745991f

Add manifests for deploying mayastor etcd

67aa8fb

Support custom restart policy for etcd pods

587e44f

Add LimitSizeToMaxReadyNodes field

b5471ef

fixup! Add LimitSizeToMaxReadyNodes field

7edf598

fixup! Add LimitSizeToMaxReadyNodes field

4a582d6

Fix deployment and RBAC yaml files

0d6c1c1

fixup! Add LimitSizeToMaxReadyNodes field

4f99d15

Update ChangeLog

e47644a

Update example etcd cluster to 3 nodes

8248d3e

neoaggelos force-pushed the revive branch from 1b866d6 to 8248d3e February 27, 2022 11:23

neoaggelos added 2 commits February 27, 2022 15:13

Add a working image to the example deployment

568fe11

Add etcd-operator helm chart

3ddc64b

neoaggelos force-pushed the revive branch from 6077e0a to 3ddc64b March 11, 2022 09:40

neoaggelos added 4 commits March 14, 2022 18:35

Change group to etcd.database.canonical.com

f51090f

Bump version to 0.10.0+git

7ce294c

Add Dockerfile

eeeb733

Update image in etcd-operator helm chart

cc372ea

fix empty pod policy spec breaking operator

61fd2ff

neoaggelos added 2 commits March 31, 2022 00:42

Remove symlink for latest chart

1a3008f

Make securityContext configurable

6325456

neoaggelos added 4 commits March 31, 2022 01:02

Merge branch 'fix-helm' into revive

4108d90

Enable PublishNotReadyAddresses on the etcd peer service

2c19720

Update operator image tag to include latest fixes

216f3fa

Fix helm chart repository URL

10aabff

Fix clusters crashing when pod spec is empty

f242df9

Add a /tmp emptyDir to the etcd operator deployment

29fb1ab

neoaggelos added 3 commits May 11, 2022 16:29

Use etcd 3.5.4 by default

5adfa00

Mention etcd version change in changelog

df04d3e

Add makefile for multi-arch images

39401b6
