Watch not triggering reconciliation #1316

Closed
zlepper opened this issue Oct 21, 2023 · 9 comments
Labels
question Direction unclear; possibly a bug, possibly could be improved. runtime controller runtime related

Comments

@zlepper

zlepper commented Oct 21, 2023

Current and expected behavior

Run this example: https://github.com/zlepper/kube/blob/main/examples/crd_watcher.rs

At some point it stalls without making any progress.
The example sets up and tears down a controller many times to simulate a process being restarted (it was the only easy way I could reproduce the issue without having to manually restart a process over and over).
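
Roughly, the example has this shape (a simplified sketch, not the linked crd_watcher code; ConfigMap stands in for the actual custom resource just to keep the sketch self-contained, and the timeout-driven teardown is only illustrative):

// Simplified sketch of the repro shape: repeatedly start a Controller,
// drive it briefly, then drop it to simulate the process being restarted.
use std::{sync::Arc, time::Duration};
use futures::StreamExt;
use k8s_openapi::api::core::v1::ConfigMap;
use kube::{
    runtime::{
        controller::{Action, Controller},
        watcher,
    },
    Api, Client,
};

async fn reconcile(_obj: Arc<ConfigMap>, _ctx: Arc<()>) -> Result<Action, std::io::Error> {
    Ok(Action::await_change())
}

fn error_policy(_obj: Arc<ConfigMap>, _err: &std::io::Error, _ctx: Arc<()>) -> Action {
    Action::requeue(Duration::from_secs(5))
}

#[tokio::main]
async fn main() -> anyhow::Result<()> {
    let client = Client::try_default().await?;
    for i in 0..100 {
        let api: Api<ConfigMap> = Api::default_namespaced(client.clone());
        let run = Controller::new(api, watcher::Config::default())
            .run(reconcile, error_policy, Arc::new(()))
            .for_each(|res| async move {
                if let Err(e) = res {
                    eprintln!("reconcile failed: {e:?}");
                }
            });
        // Tear the controller down after a few seconds by dropping its stream.
        let _ = tokio::time::timeout(Duration::from_secs(5), run).await;
        println!("controller iteration {i} torn down");
    }
    Ok(())
}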

Possible solution

No response

Additional context

I originally wrote about this on Discord, so that thread might contain some additional context from Clux: https://discord.com/channels/500028886025895936/1164148573571661926

Environment

Local Docker Desktop kubernetes.
Azure kubernetes (AKS).

Configuration and features

No response

Affected crates

No response

Would you like to work on fixing this bug?

no

@zlepper zlepper added the bug Something isn't working label Oct 21, 2023
@clux
Member

clux commented Oct 21, 2023

This appears to be a lock-up somewhere in the streaming code. It seems to trigger consistently on this branch with a simple example run, in both --release and debug, with both openssl and rustls.

I instrumented the example by forking it, adding console-subscriber = "0.2.0" to the examples' Cargo.toml, and adding the console layer via:

-    tracing_subscriber::fmt::init();
+    use tracing_subscriber::{prelude::*, Registry};
+    let logger = tracing_subscriber::fmt::layer().compact();
+    let console = console_subscriber::ConsoleLayer::builder()
+        .retention(std::time::Duration::from_secs(60))
+        .spawn();
+    let collector = Registry::default().with(console).with(logger);
+    tracing::subscriber::set_global_default(collector).unwrap();

running (while having tokio-console installed and running separately):

RUSTFLAGS="--cfg tokio_unstable" RUST_LOG=trace,runtime=trace,tokio=trace cargo run --example crd_watcher --release

which shows these tasks stuck:

[screenshot: tokio-console task list showing the stuck tasks]

which I guess come from these lines:

That doesn't help me much yet.

The task screenshot has a lot of similarities with this old hyper issue, though: hyperium/hyper#2312

@clux
Member

clux commented Oct 21, 2023

I tried one of the older workarounds (hyperium/hyper#2312 (comment)) with:

diff --git a/kube-client/src/client/builder.rs b/kube-client/src/client/builder.rs
index bfaa945..2073c99 100644
--- a/kube-client/src/client/builder.rs
+++ b/kube-client/src/client/builder.rs
@@ -100,7 +100,10 @@ impl TryFrom<Config> for ClientBuilder<BoxService<Request<hyper::Body>, Response
             connector.set_read_timeout(config.read_timeout);
             connector.set_write_timeout(config.write_timeout);
 
-            hyper::Client::builder().build(connector)
+            hyper::Client::builder()
+                .pool_idle_timeout(std::time::Duration::from_millis(0))
+                .pool_max_idle_per_host(0)
+                .build(connector)
         };
 
         let stack = ServiceBuilder::new().layer(config.base_uri_layer()).into_inner();

which does seem to make most of these tasks disappear from the console list, but the example does not actually resume, so in practice it has no real effect; the example is still locked up.

@clux
Member

clux commented Oct 21, 2023

OK, using an aggressive combination of timeouts:

  • pool idle timeout of 1s (hyper client builder above)
  • no max idle setting (removed from the hyper client builder above)
  • bringing in Set a default write timeout #1314 and setting the CONN, READ, and WRITE timeouts to 10s, 30s, and 10s respectively (see the config sketch at the end of this comment)

the example actually does seem to slowly move along:

2023-10-21T15:48:56.250526Z DEBUG hyper::proto::h1::conn: incoming body is chunked encoding
2023-10-21T15:48:56.250523Z TRACE kube_client::client: headers: {"audit-id": "65f89e1f-788c-4b58-8f17-9a91e972f8e3", "cache-control": "no-cache, private", "content-type": "application/json", "x-kubernetes-pf-flowschema-uid": "a0d31a0b-f5eb-4791-a329-a0a398dbedf8", "x-kubernetes-pf-prioritylevel-uid": "9545e2a2-9141-487d-bfde-afd024e7d35c", "date": "Sat, 21 Oct 2023 15:48:56 GMT", "transfer-encoding": "chunked"}
2023-10-21T15:48:56.250546Z TRACE kube_client::client: headers: {"audit-id": "335638b4-6092-4439-9ca8-f9668818b9f7", "cache-control": "no-cache, private", "content-type": "application/json", "x-kubernetes-pf-flowschema-uid": "a0d31a0b-f5eb-4791-a329-a0a398dbedf8", "x-kubernetes-pf-prioritylevel-uid": "9545e2a2-9141-487d-bfde-afd024e7d35c", "date": "Sat, 21 Oct 2023 15:48:56 GMT", "transfer-encoding": "chunked"}
2023-10-21T15:48:56.250624Z DEBUG hyper::proto::h1::decode: incoming chunked header: 0x5C4 (1476 bytes)
2023-10-21T15:48:56.250679Z TRACE kube_runtime::utils::stream_backoff: Non-error received, resetting backoff
2023-10-21T15:48:56.250693Z TRACE kube_runtime::utils::stream_backoff: Non-error received, resetting backoff
2023-10-21T15:49:26.251406Z DEBUG hyper::proto::h1::conn: incoming body decode error: timed out
2023-10-21T15:49:26.251421Z DEBUG hyper::proto::h1::conn: incoming body decode error: timed out
2023-10-21T15:49:26.251431Z DEBUG rustls::common_state: Sending warning alert CloseNotify
2023-10-21T15:49:26.251433Z DEBUG rustls::common_state: Sending warning alert CloseNotify
2023-10-21T15:49:26.251459Z DEBUG kube_runtime::watcher: watcher error: ReadEvents(Custom { kind: Other, error: hyper::Error(Body, hyper::Error(Body, Kind(TimedOut))) })
2023-10-21T15:49:26.251473Z DEBUG kube_runtime::utils::stream_backoff: Error received, backing off deadline=Instant { tv_sec: 33588, tv_nsec: 216695282 } duration=7.718359015s
2023-10-21T15:49:26.251483Z  INFO crd_watcher: Handled updated reconciliation
2023-10-21T15:49:26.251513Z DEBUG tower::buffer::worker: service.ready=true processing request
2023-10-21T15:49:26.251531Z DEBUG HTTP{http.method=PATCH http.url=https://0.0.0.0:41217/apis/clux.dev/v1/namespaces/default/refererthings/my-referer? otel.name="patch" otel.kind="client"}: kube_client::client::builder: requesting
2023-10-21T15:49:26.251561Z DEBUG HTTP{http.method=PATCH http.url=https://0.0.0.0:41217/apis/clux.dev/v1/namespaces/default/refererthings/my-referer? otel.name="patch" otel.kind="client"}: hyper::client::connect::http: connecting to 0.0.0.0:41217
2023-10-21T15:49:26.251629Z DEBUG HTTP{http.method=PATCH http.url=https://0.0.0.0:41217/apis/clux.dev/v1/namespaces/default/refererthings/my-referer? otel.name="patch" otel.kind="client"}: hyper::client::connect::http: connected to 0.0.0.0:41217
2023-10-21T15:49:26.251651Z DEBUG HTTP{http.method=PATCH http.url=https://0.0.0.0:41217/apis/clux.dev/v1/namespaces/default/refererthings/my-referer? otel.name="patch" otel.kind="client"}: rustls::client::hs: Resuming session
2023-10-21T15:49:26.251989Z DEBUG HTTP{http.method=PATCH http.url=https://0.0.0.0:41217/apis/clux.dev/v1/namespaces/default/refererthings/my-referer? otel.name="patch" otel.kind="client"}: rustls::client::hs: Using ciphersuite TLS13_AES_128_GCM_SHA256
2023-10-21T15:49:26.251999Z DEBUG HTTP{http.method=PATCH http.url=https://0.0.0.0:41217/apis/clux.dev/v1/namespaces/default/refererthings/my-referer? otel.name="patch" otel.kind="client"}: rustls::client::tls13: Resuming using PSK
2023-10-21T15:49:26.252038Z DEBUG HTTP{http.method=PATCH http.url=https://0.0.0.0:41217/apis/clux.dev/v1/namespaces/default/refererthings/my-referer? otel.name="patch" otel.kind="client"}: rustls::client::tls13: TLS1.3 encrypted extensions: []
2023-10-21T15:49:26.252042Z DEBUG HTTP{http.method=PATCH http.url=https://0.0.0.0:41217/apis/clux.dev/v1/namespaces/default/refererthings/my-referer? otel.name="patch" otel.kind="client"}: rustls::client::hs: ALPN protocol is None
2023-10-21T15:49:26.252112Z DEBUG hyper::proto::h1::io: flushed 212 bytes
2023-10-21T15:49:26.294921Z DEBUG hyper::proto::h1::io: parsed 7 headers
2023-10-21T15:49:26.294927Z DEBUG hyper::proto::h1::conn: incoming body is content-length (709 bytes)
2023-10-21T15:49:26.294935Z DEBUG hyper::proto::h1::conn: incoming body completed
2023-10-21T15:49:26.294943Z DEBUG HTTP{http.method=PATCH http.url=https://0.0.0.0:41217/apis/clux.dev/v1/namespaces/default/refererthings/my-referer? otel.name="patch" otel.kind="client"}: hyper::client::pool: pooling idle connection for ("https", 0.0.0.0:41217)
2023-10-21T15:49:26.294986Z  INFO crd_watcher: updated referer resource 8 in 2
2023-10-21T15:49:26.294992Z TRACE kube_runtime::utils::stream_backoff: Still waiting for backoff sleep to complete deadline=Instant { tv_sec: 33588, tv_nsec: 216695282 } remaining_duration=7.674839479s
2023-10-21T15:49:26.294998Z TRACE kube_runtime::utils::stream_backoff: Still waiting for backoff sleep to complete deadline=Instant { tv_sec: 33588, tv_nsec: 216695282 } remaining_duration=7.674833107s
2023-10-21T15:49:28.077914Z DEBUG rustls::common_state: Sending warning alert CloseNotify
2023-10-21T15:49:33.972368Z DEBUG kube_runtime::utils::stream_backoff: Backoff complete, waking up deadline=Instant { tv_sec: 33588, tv_nsec: 216695282 }
2023-10-21T15:49:33.972400Z DEBUG kube_runtime::watcher: watcher error: ReadEvents(Custom { kind: Other, error: hyper::Error(Body, hyper::Error(Body, Kind(TimedOut))) })
2023-10-21T15:49:33.972409Z DEBUG kube_runtime::utils::stream_backoff: Error received, backing off deadline=Instant { tv_sec: 33607, tv_nsec: 129996104 } duration=18.910723232s

i.e. ReadEvents(Custom { kind: Other, error: hyper::Error(Body, hyper::Error(Body, Kind(TimedOut))) }),
which seems to kick in 30s after it appears to lock up (indicating it's the read timeout that hits here?)

And the example moves along at a snail's pace (about 1 iteration a minute, as it hits the lockup very frequently).

I also tried setting the three timeouts to (5s, 5s, 5s) just to check whether that helps much, and yes, a bit, but not a whole lot; we are then at the mercy of the watcher backoff. Running the example without the console took over 9 minutes to complete all iterations in release mode against a local cluster, and that's pretty bad considering how much of that time is idle:

RUST_LOG=debug,kube=trace cargo run --example crd_watcher --release 41.61s user 1.52s system 7% cpu 9:22.68 total
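
For reference, the aggressive timeout combination above corresponds to roughly the following on the client Config (a sketch only; it assumes the connect_timeout/read_timeout fields plus the write_timeout field from #1314):

// Sketch: build a Client with aggressive connect/read/write timeouts.
use std::time::Duration;
use kube::{Client, Config};

async fn client_with_timeouts() -> anyhow::Result<Client> {
    let mut config = Config::infer().await?;
    config.connect_timeout = Some(Duration::from_secs(10));
    config.read_timeout = Some(Duration::from_secs(30));
    config.write_timeout = Some(Duration::from_secs(10));
    Ok(Client::try_from(config)?)
}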

@clux
Member

clux commented Oct 23, 2023

Ok, I've been playing around with this a bit more, and I'm starting to question the validity of the repro here, to be honest.

Simpler Examples

The above crd_watcher example was meant to simulate a startup condition (from the conversation on Discord), but I cannot reproduce the issue with a "more normal" Controller boot setup, which makes me think this is not a ubiquitous problem.

I have tried several other examples:

  1. The one from discord; crd_watcher_single

This one avoids the teardown + recreate (a single controller), focusing just on the messages, but it still does a single startup and teardown, so we re-run it a bunch of times.
I ran it under hyperfine with 100 iterations; it never halts (1.5s std dev):

hyperfine -w 0 -r 100 'RUST_LOG=trace cargo run --example=crd_watcher_single --release'
Benchmark 1: RUST_LOG=trace cargo run --example=crd_watcher_single --release
  Time (mean ± σ):     17.034 s ±  1.486 s    [User: 0.365 s, System: 0.113 s]
  Range (min … max):   14.714 s … 20.305 s    100 runs
  2. A faster version of the above; crd_watcher_mono

This one only starts the controller, runs a single iteration with just two patches + reconciles, and shuts down. It takes about 1s to run, and it explicitly just tries to start the controller from scratch to see if the problem could happen on startup alone.
I ran 1000 iterations (since I had more time), and it never locks up:

hyperfine -w 0 -r 1000 'RUST_LOG=trace cargo run --example=crd_watcher_mono --release'
Benchmark 1: RUST_LOG=trace cargo run --example=crd_watcher_mono --release
  Time (mean ± σ):      1.132 s ±  0.004 s    [User: 0.074 s, System: 0.039 s]
  Range (min … max):    1.123 s …  1.153 s    1000 runs

So it's something specific to this particular trashing of controllers and associated objects.

(If that's all that's broken, then I can live with it for the moment; it's not great, but it's not a normal use case either.)

Suggestions

Some things that could be worth looking into:

  1. Are we exhausting hyper's client pool?

That could explain why the idle-timeout and max-idle-connections settings help it recover (albeit very slowly). If so, that could indicate a bug in hyper, and it could be worth building against a more instrumented version of hyper to figure this out.

  2. Is the example's uneven polling causing problems?

We are using now_or_never combined with patching, without awaiting anything until after the post is done, so the stream is not being polled for a large part of the time. This is not a supported model (see the sketch after this list).

As a side note, you are probably not even draining the stream: expecting one event per patch is not necessarily correct, since you could also receive bookmark events.
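
To illustrate the difference (a minimal sketch, not the example's actual code; ConfigMap is a placeholder for the CRD type):

// Minimal sketch contrasting the unsupported now_or_never polling pattern
// with continuously driving the watcher stream.
use futures::{FutureExt, StreamExt, TryStreamExt};
use k8s_openapi::api::core::v1::ConfigMap;
use kube::{runtime::watcher, Api, Client};

async fn run() -> anyhow::Result<()> {
    let client = Client::try_default().await?;
    let api: Api<ConfigMap> = Api::default_namespaced(client);
    let mut stream = watcher::watcher(api, watcher::Config::default()).boxed();

    // Unsupported: poll the stream only when it happens to be convenient,
    // then go off and patch without driving it; the watch connection sits
    // unpolled for long stretches.
    let _maybe_event = stream.next().now_or_never();

    // Supported: drive the stream continuously so every event (including
    // bookmarks) is drained as it arrives.
    while let Some(event) = stream.try_next().await? {
        println!("event: {event:?}");
    }
    Ok(())
}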


Anyway, it's possible this does represent a real problem, and it's possible that this should work, but it's also a very strange way of doing things.

I'm going to step away from this for now, until more information surfaces. I'll leave the issue open in case anyone else has input.

@clux clux added question Direction unclear; possibly a bug, possibly could be improved. and removed bug Something isn't working labels Oct 23, 2023
@louismrose

louismrose commented Nov 20, 2023

Hey @zlepper,

I may be way off here, but... do you still see this issue on kube 0.87.1?

I ask because we run a decent-sized kube project, and we hit a similar problem after upgrading to kube 0.86. Our issue was a little tricky to reproduce, but we did so via a simulation script that every 100ms takes one of the following actions:

  • Creates a new instance of our custom resource
  • Deletes an instance of our custom resource
  • Rollout restarts the deployment running our kube controller

On kube 0.86, this simulation script will eventually restart the controller, attempt a deletion, and then hang forever. On kube 0.87.1, this no longer happens. In fact, I've narrowed the bug fix down to this commit (e78d488), which is present in kube 0.87.1 but not in kube 0.86. This was a surprise to us, as #1324 and #1322 seem to indicate that the fix in e78d488 relates only to controllers running with bounded concurrency, but we're actually using the default (unbounded) and still saw the bug.

Anyhow, perhaps try kube 0.87.1 and see if that resolves your issue?

@clux clux added runtime controller runtime related help wanted Not immediately prioritised, please help! and removed help wanted Not immediately prioritised, please help! labels Nov 20, 2023
@zlepper
Author

zlepper commented Nov 22, 2023


Having tested it for a couple of days, it does indeed seem to have solved the issue, so great :D And thank you so much for the heads-up!

@zlepper zlepper closed this as completed Nov 22, 2023
@louismrose
Copy link

@clux Is there perhaps a case to be made for yanking v0.86? The bug caused a prod incident for us, and if @zlepper's codebase has hit it as well, I wonder whether other folks might inadvertently upgrade to this version and run into the same problems. It seems that v0.87.1 no longer exhibits the issue, so anybody on v0.86 should have a fairly straightforward upgrade path?

@clux
Member

clux commented Nov 23, 2023

Yeah, I think that's probably a good idea. Particularly since kube is mostly used in applications where lockfiles are present, yanking only prevents new downloads, and with them any further damage.

@clux
Member

clux commented Nov 23, 2023

kube@0.86.0 (and its individual workspace crates) are now yanked as a result of this: https://crates.io/crates/kube/versions
