
Upgrade k8s-openapi to 0.16 #1008

Merged · clux merged 21 commits into master from openapi016 on Sep 21, 2022

Conversation

@clux (Member) commented Sep 15, 2022

TL;DR: the upstream bug is bad and breaking; we need to do something to support v1_25, but we cannot detect it, so it needs to be user opt-in. Have left the evar-based fix in a branch. See comments below.

@clux clux linked an issue Sep 15, 2022 that may be closed by this pull request
@clux clux added this to the 0.75.0 milestone Sep 15, 2022
@clux clux added the changelog-change and dependencies labels Sep 15, 2022
@clux clux changed the title from "cargo upgrade k8s-openapi --workspace" to "Upgrade k8s-openapi to 0.16" Sep 15, 2022
clux added a commit to clux/dotfiles that referenced this pull request Sep 16, 2022
@codecov-commenter commented Sep 16, 2022

Codecov Report

Merging #1008 (67517b5) into master (cc2a2b9) will increase coverage by 0.03%.
The diff coverage is 95.00%.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #1008      +/-   ##
==========================================
+ Coverage   71.89%   71.93%   +0.03%     
==========================================
  Files          65       65              
  Lines        4637     4647      +10     
==========================================
+ Hits         3334     3343       +9     
- Misses       1303     1304       +1     
Impacted Files Coverage Δ
kube-core/src/request.rs 95.74% <83.33%> (-0.33%) ⬇️
kube-core/src/params.rs 78.47% <100.00%> (+0.77%) ⬆️
kube/src/lib.rs 87.85% <100.00%> (+0.26%) ⬆️

@clux (Member Author) commented Sep 16, 2022

Integration test failure in derived_resource_queriable_and_has_subresources:
https://github.com/kube-rs/kube-rs/blob/master/kube/src/lib.rs#L234-L303

with

Error: Api(ErrorResponse { status: "Failure", message: "Object 'Kind' is missing in '{}'", reason: "", code: 500 })

looks veeery similar to the original bug, but it's actually happening on get_status...

EDIT: get_status is a red herring; it happens because cleanup does not happen.

@clux (Member Author) commented Sep 16, 2022

Error: Api(ErrorResponse { status: "Failure", message: "Object 'Kind' is missing in '{}'", reason: "", code: 500 })

happens exactly on the

foos.delete_collection(&DeleteParams::default(), &Default::default()).await?

line. So we can reproduce Arnav's issue locally on k3d with the flag --image=rancher/k3s:v1.25.0-k3s1-amd64.
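
For reference, a minimal reproduction sketch using a dynamically typed Api (the clux.dev/v1 Foo CRD from the integration tests is assumed to be installed; any namespaced CRD would do, and the anyhow/tokio scaffolding is just for the example):

```rust
use kube::api::{Api, ApiResource, DeleteParams, DynamicObject, GroupVersionKind, ListParams};
use kube::Client;

#[tokio::main]
async fn main() -> anyhow::Result<()> {
    let client = Client::try_default().await?;
    // Assumed test resource; substitute your own group/version/kind.
    let gvk = GroupVersionKind::gvk("clux.dev", "v1", "Foo");
    let foos: Api<DynamicObject> = Api::default_namespaced_with(client, &ApiResource::from_gvk(&gvk));
    // On a v1.25 apiserver this fails with:
    //   Api(ErrorResponse { message: "Object 'Kind' is missing in '{}'", code: 500 })
    foos.delete_collection(&DeleteParams::default(), &ListParams::default()).await?;
    Ok(())
}
```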

@clux (Member Author) commented Sep 16, 2022

Ok, I think the only thing we NEED to do here is to force in the kind and apiVersion of DeleteParams:

apiVersion: meta.k8s.io/v1
kind: DeleteOptions

to the output. Never done this before, but how hard can it be.
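
A minimal sketch of the idea (not necessarily the exact change that lands here): serialize the options to a serde_json::Value first, then force the two TypeMeta keys in before writing the request body.

```rust
use serde_json::{json, Value};

// Sketch: `opts` stands in for kube's DeleteParams (any Serialize works).
// The two injected keys are exactly the TypeMeta the 1.25 apiserver wants.
fn delete_options_body<T: serde::Serialize>(opts: &T) -> Result<Vec<u8>, serde_json::Error> {
    let mut body: Value = serde_json::to_value(opts)?;
    body["apiVersion"] = json!("meta.k8s.io/v1");
    body["kind"] = json!("DeleteOptions");
    serde_json::to_vec(&body)
}
```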

@clux (Member Author) commented Sep 16, 2022

Well, great. Serializing with kind/api_version fails in Kubernetes v1.20 (our MK8SV) with:

Error: Api(ErrorResponse { status: "Failure", message: "no kind \"DeleteOptions\" is registered for version \"meta.k8s.io/v1\" in scheme \"k8s.io/apiextensions-apiserver/pkg/apiserver/apiserver.go:52\"", reason: "", code: 500 })

@clux (Member Author) commented Sep 16, 2022

Guess it could be wrapped in a k8s_openapi::k8s_if_ge_1_25! check... Will look at it more tomorrow.
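
For example, a sketch gating the injection from the previous snippet (k8s_if_ge_1_25! only emits its body when the selected k8s-openapi version feature is v1_25 or newer):

```rust
fn delete_options_body<T: serde::Serialize>(opts: &T) -> Result<Vec<u8>, serde_json::Error> {
    let mut body = serde_json::to_value(opts)?;
    // Only inject TypeMeta when compiled against v1_25+; older apiservers
    // reject it with the "no kind \"DeleteOptions\" is registered" error above.
    k8s_openapi::k8s_if_ge_1_25! {
        body["apiVersion"] = serde_json::json!("meta.k8s.io/v1");
        body["kind"] = serde_json::json!("DeleteOptions");
    }
    serde_json::to_vec(&body)
}
```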

@clux (Member Author) commented Sep 17, 2022

Well, this is really quite bad. It's not even trivial for users to deal with, because it effectively forces them into zero version skew (against our versioning policies).

We have two error cases:

  1. Serializing kind/apiVersion breaks on Kubernetes < 1.25; api.delete_collection returns:

     Error: Api(ErrorResponse { status: "Failure", message: "no kind \"DeleteOptions\" is registered for version \"meta.k8s.io/v1\" in scheme \"k8s.io/apimachinery@v1.24.1-k3s1/pkg/runtime/scheme.go:100\"", reason: "", code: 500 })

  2. Not serializing kind/apiVersion breaks on Kubernetes >= 1.25; api.delete_collection returns:

     Error: Api(ErrorResponse { status: "Failure", message: "Object 'Kind' is missing in '{}'", reason: "", code: 500 })

So the user has to pick the v1_25 feature at exactly the moment they upgrade to it...
We would also need to propagate k8s-openapi features into the integration tests just for this.

I don't see any good solutions for this at the moment. Comments welcome.

@clux (Member Author) commented Sep 17, 2022

Have reported findings upstream in kubernetes/kubernetes#111985.

In the meantime, the options as I see them are:

  1. release with the caveat that v1_25 is likely broken for delete_collection calls
  2. release with zero-version-skew requirement and do a (currently untestable) cfg(feature = "v1_25") in the middle of core
  3. hold off releasing until upstream is dealt with

I don't like 3, because historically it is always very confusing for users to have to pick an older k8s-openapi.
I don't like 2, as it's super awkward and gives the impression that all is good, when it really breaks the version-skew requirements (which people definitely do not expect to break on them during cluster upgrades).

So leaning towards 1, with a caveated release, and hoping pressure can help upstream.

@clux (Member Author) commented Sep 18, 2022

Ok, have added an unstable evar to track this because there's no good way to support v1_25 without the user making a choice here.

Why?

  • if we detect the Kubernetes version via k8s-openapi features, it's still broken if the user is running a newer cluster than the feature picked
  • if we detect the Kubernetes version via an api call, it might break when/if Kubernetes actually resolves this

Thus, we need to make this an opt-in thing, either via an unstable evar or via an unstable feature.
Went with an evar rather than a feature because it's less taxing on our setup, and it doesn't break when/if upstream fixes the bug and we have to remove it.

Planning on removing the hacky evar in api.delete_collection when upstream fixes the bug.
Also planning on communicating that you do need to set the KUBE_UNSTABLE_V1_25_DELETE evar when using kube with Kubernetes 1.25, because upstream has seriously fucked up here.
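
A sketch of the gate (checking mere presence of the evar is an assumption here; the exact lookup in kube-core may differ):

```rust
// Sketch: users opt in to the 1.25 delete_collection workaround explicitly.
fn v1_25_delete_workaround_enabled() -> bool {
    std::env::var_os("KUBE_UNSTABLE_V1_25_DELETE").is_some()
}

// Applied while building the delete_collection request body:
fn maybe_inject_typemeta(body: &mut serde_json::Value) {
    if v1_25_delete_workaround_enabled() {
        body["apiVersion"] = serde_json::json!("meta.k8s.io/v1");
        body["kind"] = serde_json::json!("DeleteOptions");
    }
}
```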

@clux clux marked this pull request as ready for review September 18, 2022 17:28
@clux clux requested a review from a team September 18, 2022 17:29
Basically, special-case the empty-params case with an internal is_default
(cannot derive Eq/PartialEq because upstream types do not have it); a
sketch of the check follows below.

This does mean delete_collection with default delete params keeps
working on 1.25 (but then immediately breaks if they change from the
default without the evar).
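
A sketch of that check (field names as in kube-core's DeleteParams; written out by hand precisely because the upstream Preconditions type has no PartialEq):

```rust
use kube::api::DeleteParams;

// Sketch: params count as "default" when nothing was customised.
fn is_default(dp: &DeleteParams) -> bool {
    !dp.dry_run
        && dp.grace_period_seconds.is_none()
        && dp.propagation_policy.is_none()
        && dp.preconditions.is_none()
}
```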

@nightkr (Member) commented Sep 20, 2022

Honestly I'd just say upstream K8s is broken here and let them fix it (option 1), especially since it looks like they're already working on a fix, and I'd wager that there isn't a single K8s cluster in the world that only runs kube-rs clients.

@clux (Member Author) commented Sep 20, 2022

> Honestly I'd just say upstream K8s is broken here and let them fix it (option 1), especially since it looks like they're already working on a fix, and I'd wager that there isn't a single K8s cluster in the world that only runs kube-rs clients.

It's not too different from what we are doing now, tbh. We don't have to announce the evar as a workaround (given it is going away if they fix it), if that is better, but we should really have the workaround there for us, to at least keep CI happy.

@kazk (Member) commented Sep 20, 2022

> we should really have the workaround there for us, to at least keep CI happy.

Does it still fail with f440415?

I think integration tests are all using the default.

@clux (Member Author) commented Sep 20, 2022

> Does it still fail with f440415?
>
> I think integration tests are all using the default.

I think it will fail the e2e test, which is using background deletion. EDIT: wrong.

@kazk (Member) commented Sep 20, 2022

@clux (Member Author) commented Sep 20, 2022

Ah, sorry, I am wrong.

  • delete is not affected (just tested)
  • e2e does not fail on this in 1.25 (because it's using delete and not delete_collection)
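
A sketch of the unaffected path, for contrast (same dynamically typed Foo Api assumed as in the reproduction above; the object name is hypothetical):

```rust
use kube::api::{Api, DeleteParams, DynamicObject};

// Single-object delete serializes fine on 1.25; only delete_collection
// trips over the DeleteOptions TypeMeta requirement.
async fn cleanup_one(api: &Api<DynamicObject>) -> kube::Result<()> {
    api.delete("foo-sample", &DeleteParams::default()).await?;
    Ok(())
}
```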

@clux (Member Author) commented Sep 20, 2022

So we can technically remove all this evar gunk for ci 🤔

Personally, I feel like having a valid workaround in an actual release is useful here. Otherwise users have no way around this if they are testing with v1_25 (save for recreating their cluster with an older version), and we can take the hack out when they fix it upstream without much ceremony. But it is an awkward evar lookup, on the other hand.

Will remove the workaround if both of you feel the same.

@clux (Member Author) commented Sep 21, 2022

Anyone for an approval for this?

@kazk (Member) commented Sep 21, 2022

I'm still debating whether this workaround should be included or not.

Options:

  1. Include this workaround and let users test against 1.25 with evar
  2. Don't include this workaround, but mention it in the release notes, so users can create a temporary fork with it to test against 1.25
  3. Don't include this workaround, but create a branch that users can use to test against 1.25

1 is most convenient for users, but I'd prefer not to include this hack, especially because most users won't be using 1.25.
2 may be annoying for users, but I don't think many will need 1.25 now.
3 seems like a nice middle ground.

I'm leaning towards not including it (option 3).

@clux (Member Author) commented Sep 21, 2022

I really don't think it is a big deal either way; I just don't want us to say we don't support a fully released Kubernetes version because we don't think it's needed yet. That sends a bad signal.

Happy to move the workaround out to a branch though; that is workable for keen users. And since both of you have reservations about including it, it'll at least be less controversial :-)

@clux clux merged commit a65c249 into master Sep 21, 2022
@clux clux deleted the openapi016 branch September 21, 2022 19:45
clux added a commit that referenced this pull request Sep 21, 2022
@clux clux mentioned this pull request Sep 21, 2022
@clux (Member Author) commented Sep 21, 2022

Released in 0.75.0.

Release notes are currently stashed in the draft release for 0.75.0. Will publish that tomorrow (with updated code links), at some point after the docs.rs build completes. Tried to highlight the user actions needed, and made a fourth point for all the improvements.

Labels: changelog-change (changelog change category for prs), dependencies (upgrades to dependencies)

Successfully merging this pull request may close these issues: v1_25 support

5 participants