
Outbound requests fail intermittently after the proxy reported "panicked at 'cancel sender lost'" #6086

Closed
Wenliang-CHEN opened this issue Apr 30, 2021 · 24 comments · Fixed by linkerd/linkerd2-proxy#1758


@Wenliang-CHEN
Contributor

Wenliang-CHEN commented Apr 30, 2021

Bug Report

What is the issue?

The outbound requests of a meshed pod fail intermittently after its linkerd-proxy reported "panicked at 'cancel sender lost'".

We are not sure what triggers the issue. From the logs, we can tell that the linkerd-proxy first emits the following:

thread 'main' panicked at 'cancel sender lost', /usr/local/cargo/registry/src/github.com-1ecc6299db9ec823/tower-0.4.6/src/ready_cache/cache.rs:397:13

Then around 50% of the outbound requests start failing intermittently with the message:

[ 19948.510484s]  WARN ThreadId(01) server{orig_dst=172.20.207.194:80}: linkerd_app_core::errors: Failed to proxy request: buffer's worker closed unexpectedly client.addr=10.250.162.208:59692
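
Our rough understanding of how a single panic turns into ongoing failures (illustrated with a stand-alone sketch, not actual linkerd code): the proxy's load balancer appears to run behind a Tower buffer, and once the buffer's worker task panics, every later request dispatched through that buffer fails, which matches the errors above.

// Stand-alone sketch (not linkerd code): a panic inside a tower Buffer's
// worker task kills the worker, so all later calls through the buffer fail,
// analogous to the "buffer's worker closed unexpectedly" error above.
// Names and the simulated panic message are illustrative only.
use std::convert::Infallible;
use tower::{buffer::Buffer, service_fn, Service, ServiceExt};

#[tokio::main]
async fn main() {
    // Inner service that panics on the first request it sees.
    let inner = service_fn(|n: u32| {
        if n == 0 {
            panic!("cancel sender lost (simulated)");
        }
        async move { Ok::<u32, Infallible>(n) }
    });

    // The Buffer drives `inner` on a single spawned worker task.
    let mut svc = Buffer::new(inner, 16);

    // The first call panics inside the worker task; the caller only sees an error.
    let first = svc.ready().await.unwrap().call(0).await;
    println!("first call error: {}", first.unwrap_err());

    // The worker is gone now, so subsequent calls also fail.
    match svc.ready().await {
        Ok(s) => println!("second call: {:?}", s.call(1).await.map(|_| ())),
        Err(e) => println!("second call error: {e}"),
    }
}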

Additional context

The outbound destination is also a meshed service.

The linkerd-init container exited with "Completed" status in the pod.

Before and during the incident, there was no restart in either the application container or the proxy container.

Once we restarted the pod manually, the outbound traffic went back to 100% success.

linkerd check output

kubernetes-api
--------------
√ can initialize the client
√ can query the Kubernetes API
kubernetes-version
------------------
√ is running the minimum Kubernetes API version
√ is running the minimum kubectl version
linkerd-existence
-----------------
√ 'linkerd-config' config map exists
√ heartbeat ServiceAccount exist
√ control plane replica sets are ready
√ no unschedulable pods
√ controller pod is running
linkerd-config
--------------
√ control plane Namespace exists
√ control plane ClusterRoles exist
√ control plane ClusterRoleBindings exist
√ control plane ServiceAccounts exist
√ control plane CustomResourceDefinitions exist
√ control plane MutatingWebhookConfigurations exist
√ control plane ValidatingWebhookConfigurations exist
√ control plane PodSecurityPolicies exist
linkerd-identity
----------------
√ certificate config is valid
√ trust anchors are using supported crypto algorithm
√ trust anchors are within their validity period
√ trust anchors are valid for at least 60 days
√ issuer cert is using supported crypto algorithm
√ issuer cert is within its validity period
‼ issuer cert is valid for at least 60 days
    issuer certificate will expire on 2021-06-06T08:57:23Z
    see https://linkerd.io/checks/#l5d-identity-issuer-cert-not-expiring-soon for hints
√ issuer cert is issued by the trust anchor
linkerd-webhooks-and-apisvc-tls
-------------------------------
√ proxy-injector webhook has valid cert
√ proxy-injector cert is valid for at least 60 days
√ sp-validator webhook has valid cert
√ sp-validator cert is valid for at least 60 days
linkerd-api
-----------
√ control plane pods are ready
√ can initialize the client
√ can query the control plane API
linkerd-version
---------------
√ can determine the latest version
‼ cli is up-to-date
    is running version 2.10.0 but the latest stable version is 2.10.1
    see https://linkerd.io/checks/#l5d-version-cli for hints
control-plane-version
---------------------
‼ control plane is up-to-date
    is running version 2.10.0 but the latest stable version is 2.10.1
    see https://linkerd.io/checks/#l5d-version-control for hints
√ control plane and cli versions match
linkerd-ha-checks
-----------------
√ pod injection disabled on kube-system
Status check results are √
Linkerd extensions checks
=========================
linkerd-viz
-----------
√ linkerd-viz Namespace exists
√ linkerd-viz ClusterRoles exist
√ linkerd-viz ClusterRoleBindings exist
√ tap API server has valid cert
√ tap API server cert is valid for at least 60 days
√ tap API service is running
‼ linkerd-viz pods are injected
    could not find proxy container for prometheus-7b5758b6ff-xlqv4 pod
    see https://linkerd.io/checks/#l5d-viz-pods-injection for hints
√ viz extension pods are running
√ prometheus is installed and configured correctly
√ can initialize the client
√ viz extension self-check
Status check results are √

Environment

  • Kubernetes Version: v1.18.9-eks-d1db3c
  • Cluster Environment: EKS
  • Linkerd version:
    control plane: v2.10.0
    linkerd-proxy: the issue occurred with both v2.139 and v2.142
    linkerd-init: cr.l5d.io/linkerd/proxy-init:v1.3.9
@olix0r
Member

olix0r commented Apr 30, 2021

Thanks for letting us know.

I've deleted the v2.142.0 proxy release -- it hit some other issues during integration tests and isn't yet ready for public consumption. In general, I'd only recommend using proxy versions that have been released on an edge release. Is there a specific reason you picked up v2.142.0? Are you using a patched version of the proxy?

For what it's worth, we plan on releasing a v2.10.2 release that uses proxy version v2.141.1.

@Wenliang-CHEN
Contributor Author

Hey @olix0r thanks for the reply. So...

Is there a specific reason you picked up v2.142.0? Are you using a patched version of the proxy?
Yes and yes: we need this commit, linkerd/linkerd2-proxy#965, to fix the ingress problem. But it seems the commit is already in v2.141.1, so it is all good.

Just wanted to point out that the issue existed before we upgraded the proxy to v2.142; we saw the first occurrence while still running v2.139. I will update the issue description.

@olix0r
Member

olix0r commented May 7, 2021

@Wenliang-CHEN We've been trying to reproduce this in library tests but haven't been able to get a solid lead on what's going on. Next week, we'll put together a branch that increases diagnostic logging in the ready-cache & balancer and ask you to test that out, if that works for you.

@Wenliang-CHEN
Contributor Author

Wenliang-CHEN commented May 10, 2021

Hey @olix0r thanks for the update. We will upgrade to it once it is ready.

Also an update from our side:

We have not been able to reproduce the issue on demand either. But when it happens, we observe a high request rate for outbound traffic.

During an incident, it seems to follow this pattern:

  • the problematic pod sends around 100 rps to dest A: all traffic succeeds
  • at the same time, the problematic pod sends requests to dest B: we saw failing requests in the proxy logs
  • once the high-rate traffic to dest A finishes, everything goes back to normal

It seemed to be a load-related issue. (This turned out not to be true.)

@hawkw
Member

hawkw commented May 11, 2021

Hi @Wenliang-CHEN, I've published a linkerd proxy image mycoliza/l2-proxy:ready-cache-debug which contains additional debug logging in the ready-cache code. If you can test out this proxy image and set the proxy log level to

linkerd=debug,tower::balance=trace,tower::ready_cache=trace

that would be extremely helpful.

Thanks!

@olix0r
Member

olix0r commented May 12, 2021

Specifically, you'll want to set these workload annotations:

config.linkerd.io/proxy-image: docker.io/mycoliza/l2-proxy
config.linkerd.io/proxy-version: ready-cache-debug
config.linkerd.io/proxy-log-level: linkerd=debug,tower::balance=trace,tower::ready_cache=trace,warn

@Wenliang-CHEN
Contributor Author

Hi @hawkw @olix0r thanks for the effort! We are going to try out the proxy for the service.

We will let you know once we find anything interesting.

@Wenliang-CHEN
Contributor Author

Hello again. We were able to reproduce the issue with the debug proxy. We observed the following pattern during the incident:

First, we saw a large number of logs like the following:

DEBUG ThreadId(01) outbound:accept{client.addr=10.250.187.39:35960}:server{orig_dst=172.20.207.194:80}:profile:http{v=1.x}:logical{dst=service-name.prod.svc.cluster.local:80}:concrete{addr=service-name-primary.prod.svc.cluster.local:80}: tower::ready_cache::cache: endpoint canceled

Then we saw the panic log:

thread 'main' panicked at 'cancel sender lost', /usr/local/cargo/registry/src/github.com-1ecc6299db9ec823/tower-0.4.7/src/ready_cache/cache.rs:397:13

Then the following:

[ 25455.324175s]  WARN ThreadId(01) outbound:accept{client.addr=10.250.187.39:35960}:server{orig_dst=172.20.207.194:80}:profile:http{v=1.x}: linkerd_app_core::errors: Failed to proxy request: buffered service failed: panic client.addr=10.250.187.39:35960

[ 25455.324282s] DEBUG ThreadId(01) outbound:accept{client.addr=10.250.187.39:35960}: linkerd_app_core::serve: Connection closed

[ 25455.324205s] DEBUG ThreadId(01) outbound:accept{client.addr=10.250.187.39:35960}:server{orig_dst=172.20.207.194:80}:profile:http{v=1.x}: linkerd_app_core::errors: Handling error with HTTP response status=502 Bad Gateway version=HTTP/1.1

[ 25455.324199s] DEBUG ThreadId(01) outbound:accept{client.addr=10.250.187.39:35960}:server{orig_dst=172.20.207.194:80}:profile:http{v=1.x}: linkerd_app_core::errors: Closing server-side connection

[ 25455.324251s] DEBUG ThreadId(01) outbound:accept{client.addr=10.250.187.39:35960}:server{orig_dst=172.20.207.194:80}:profile:http{v=1.x}: linkerd_proxy_http::server: The stack is tearing down the connection

[ 25455.318532s] DEBUG ThreadId(01) outbound:accept{client.addr=10.250.187.39:35960}:server{orig_dst=172.20.207.194:80}:profile:http{v=1.x}:logical{dst=service-name.prod.svc.cluster.local:80}:concrete{addr=service-name-primary.prod.svc.cluster.local:80}: tower::ready_cache::cache: endpoint canceled

Afterwards, the full connection lifecycle looks like this:

[ 25455.329021s] DEBUG ThreadId(01) outbound:accept{client.addr=10.250.187.39:36226}:server{orig_dst=172.20.207.194:80}:profile:http{v=1.x}: linkerd_app_core::errors: Handling error with HTTP response status=502 Bad Gateway version=HTTP/1.1

[ 25455.329114s] DEBUG ThreadId(01) outbound:accept{client.addr=10.250.187.39:36226}: linkerd_app_core::serve: Connection closed

[ 25455.329072s] DEBUG ThreadId(01) outbound:accept{client.addr=10.250.187.39:36226}:server{orig_dst=172.20.207.194:80}:profile:http{v=1.x}: linkerd_proxy_http::server: The stack is tearing down the connection

[ 25455.328793s] DEBUG ThreadId(01) outbound:accept{client.addr=10.250.187.39:36226}:server{orig_dst=172.20.207.194:80}:profile: linkerd_detect: DetectResult protocol=Some(Http1) elapsed=17.323µs

[ 25455.328895s] DEBUG ThreadId(01) outbound:accept{client.addr=10.250.187.39:36226}:server{orig_dst=172.20.207.194:80}:profile:http{v=1.x}: linkerd_proxy_http::server: Handling as HTTP version=Http1

[ 25455.329003s]  WARN ThreadId(01) outbound:accept{client.addr=10.250.187.39:36226}:server{orig_dst=172.20.207.194:80}:profile:http{v=1.x}: linkerd_app_core::errors: Failed to proxy request: buffered service failed: buffered service failed: panic client.addr=10.250.187.39:36226

[ 25455.329015s] DEBUG ThreadId(01) outbound:accept{client.addr=10.250.187.39:36226}:server{orig_dst=172.20.207.194:80}:profile:http{v=1.x}: linkerd_app_core::errors: Closing server-side connection

[ 25455.328821s] DEBUG ThreadId(01) outbound:accept{client.addr=10.250.187.39:36226}:server{orig_dst=172.20.207.194:80}:profile:http{v=1.x}: linkerd_proxy_http::server: Creating HTTP service version=Http1

[ 25455.328971s] DEBUG ThreadId(01) outbound:accept{client.addr=10.250.187.39:36226}:server{orig_dst=172.20.207.194:80}:profile:http{v=1.x}:logical{dst=service-name.prod.svc.cluster.local:80}: linkerd_service_profiles::http::route_request: Updating HTTP routes routes=0

What's worth mentioning:

  • The destination also runs a Linkerd proxy + nginx + FPM setup
  • The target endpoint at the destination is slow; it hits its 60s execution timeout, which I think is where the "502 Bad Gateway" comes from
  • We did not paste the "keep-alive" logs here as we think those are irrelevant. Please let us know if you need them too.

Thanks!

@olix0r
Member

olix0r commented May 19, 2021 via email

@hawkw
Member

hawkw commented May 19, 2021

First we saw large number of logs like the following:

DEBUG ThreadId(01) outbound:accept{client.addr=10.250.187.39:35960}:server{orig_dst=172.20.207.194:80}:profile:http{v=1.x}:logical{dst=service-name.prod.svc.cluster.local:80}:concrete{addr=service-name-primary.prod.svc.cluster.local:80}: tower::ready_cache::cache: endpoint canceled

Then we saw the panic log

thread 'main' panicked at 'cancel sender lost', /usr/local/cargo/registry/src/github.com-1ecc6299db9ec823/tower-0.4.7/src/ready_cache/cache.rs:397:13

Then the following

Hi @Wenliang-CHEN, is it possible to get more complete logs starting from before the first "endpoint canceled" message was logged? Thank you!

@Wenliang-CHEN
Contributor Author

Wenliang-CHEN commented May 20, 2021

proxy-logs.csv

Hey @hawkw sure, please find the logs attached.

You will find the first occurrence of "endpoint canceled" at line 12.

And the "cancel sender lost" panic at line 185.

hawkw added a commit to linkerd/linkerd2-proxy that referenced this issue Jun 1, 2021
This branch updates the `futures` crate to v0.3.15. This includes a fix
for task starvation with `FuturesUnordered` (added in 0.3.13). This may
or may not be related to issues that have been reported in the proxy
involving the load balancer (linkerd/linkerd2#6086), but we should
update to the fixed version regardless. This may also improve
performance in some cases, since we may now have to do fewer poll-wakeup
cycles when a load balancer has a large number of pending endpoints.
@olix0r
Member

olix0r commented Jun 2, 2021

We've been stress testing the tower ready-cache dependency and have been able to trigger some unexpected behavior, though not the exact problems you seem to have captured. There's at least one fix we picked up from futures (rust-lang/futures-rs#2333) that should eliminate some pathological behavior when there are many endpoints in a balancer.
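
For reference, the pattern that fix touches is a balancer driving all of its not-yet-ready endpoints as a single stream of pending futures; a minimal stand-alone illustration (not proxy or tower code) looks like this:

// Minimal illustration (not proxy or tower code) of driving many pending
// endpoint futures through one FuturesUnordered stream -- the pattern the
// futures 0.3.13 starvation fix affects when the set is large.
use futures::stream::{FuturesUnordered, StreamExt};
use std::time::Duration;

#[tokio::main]
async fn main() {
    let mut pending = FuturesUnordered::new();
    for endpoint in 0..200u64 {
        // Stand-in for "waiting for an endpoint to become ready".
        pending.push(async move {
            tokio::time::sleep(Duration::from_millis(endpoint % 10)).await;
            endpoint
        });
    }

    // Each completion corresponds to an endpoint moving from pending to ready.
    while let Some(endpoint) = pending.next().await {
        println!("endpoint {endpoint} became ready");
    }
}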

In your case roughly how many endpoints should exist in the target service? Are endpoints churning (being deleted/created) frequently? Or are they relatively static?

@Wenliang-CHEN
Contributor Author

Thanks for the update @olix0r. The target service is a monolith that has around 200 endpoints. I would say 90% of them are static. And the service that makes the outbound call uses 8 of them, all static.

@olix0r
Member

olix0r commented Jun 2, 2021

@Wenliang-CHEN Sorry, I should have been clearer: how many pods of the service are running? Are these panics at all correlated with deployments/restarts of the target service?

@Wenliang-CHEN
Contributor Author

@olix0r there are 6 pods running. We did observe the panics coinciding with deployments of the target service, but it does not always happen. When a panic happens, it does not affect all the pods; it is mostly 1 or 2 pods.

@olix0r
Member

olix0r commented Jun 2, 2021

@Wenliang-CHEN Thanks, this is helpful. I doubt that the futures change will help this issue. I suspect that there's a race condition around updating the balancer with new endpoints where we enter an illegal state. We'll focus more on stress testing the update path.

olix0r pushed a commit to linkerd/drain-rs that referenced this issue Jun 3, 2021
This branch updates the `futures` crate to v0.3.15. This includes a fix
for task starvation with `FuturesUnordered` (added in 0.3.13). This may
or may not be related to issues that have been reported in the proxy
involving the load balancer (linkerd/linkerd2#6086), but we should
update to the fixed version regardless. This may also improve
performance in some cases, since we may now have to do fewer poll-wakeup
cycles when a load balancer has a large number of pending endpoints.
olix0r added a commit to olix0r/tower that referenced this issue Jun 16, 2021
linkerd/linkerd2#6086 describes an issue that sounds closely related to
tower-rs#415: There's some sort of consistency issue between the
ready-cache's pending stream and its set of cancelations. Where the
latter issue describes triggering a panic in the stream receiver, the
former describes triggering a panic in the stream implementation.

There's no logical reason why we can't continue to operate in this
scenario, though it does indicate a real correctness issue.

So, this change prevents panicking in this scenario when not building
with debugging. Instead, we now emit WARN-level logs so that we have a
clearer signal they're occurring.

Finally, this change also adds `Debug` constraints to the cache's key
types (and hence the balancer's key types) so that we can more
reasonably debug this behavior.
@olix0r
Member

olix0r commented Jun 16, 2021

We've had a lot of trouble reproducing this in tests, but I think this is very likely a manifestation of the same problem described in tower-rs/tower#415. I'm especially suspicious of tokio::sync::oneshot, but we need to do a better job of eliminating the application logic before investigating such a low-level primitive.

I've put together a tower branch that makes a few changes:

  • We log in more situations and now include the cache key, so we can track how individual entries move through the cache;
  • We no longer panic in this situation; instead, we emit WARN-level logs;
  • We now emit WARN-level logs (rather than DEBUG) for the issue described in "Panicked at 'missing cancelation' in tower-ready-cache" (tower-rs/tower#415);
  • The above situation is now handled more gracefully by creating new cancelations rather than dropping the service.

I recommend setting the following annotations on your pod template:

config.linkerd.io/proxy-image: ghcr.io/olix0r/l2-proxy
config.linkerd.io/proxy-version: tower-ready-debug.ab6c68ee
config.linkerd.io/proxy-log-level: linkerd=info,tower::ready_cache=debug,warn

Your application should no longer panic. If you see WARN-level logs, it would be great if we could capture the preceding logs to get a better sense of the access pattern that may be triggering this.
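
Roughly, the non-panicking behavior looks like the following (a simplified sketch of the idea, not the actual tower diff): when a pending endpoint's cancelation handle has gone missing, we warn and mint a new cancelation pair instead of asserting.

// Simplified sketch of the recovery path (illustrative only; the real change
// lives in tower's ready_cache internals and differs in detail).
use std::collections::HashMap;
use std::fmt::Debug;
use std::hash::Hash;
use tokio::sync::oneshot;
use tracing::warn;

fn recover_cancelation<K: Hash + Eq + Debug>(
    cancel_txs: &mut HashMap<K, oneshot::Sender<()>>,
    key: &K,
    rx: oneshot::Receiver<()>,
) -> (oneshot::Sender<()>, oneshot::Receiver<()>) {
    match cancel_txs.remove(key) {
        // Normal case: the sender half is still tracked for this endpoint.
        Some(tx) => (tx, rx),
        // Previously this branch was effectively a panic ("missing cancelation"
        // / "cancel sender lost"); now we log at WARN and rebuild the pair so
        // the endpoint keeps working.
        None => {
            warn!(?key, "cancelation lost; creating a new one");
            oneshot::channel()
        }
    }
}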

@Wenliang-CHEN
Contributor Author

Hey @olix0r thanks for the update. I will get the service running with the new proxy and will report here once I find anything interesting.

@Wenliang-CHEN
Contributor Author

Hello @olix0r, so... we let the service run for 2 days with the new proxy. The panic does not happen anymore, and we do notice some WARN messages in the logs.

Please find the samples attached. Please let me know if anything is missing or incomplete.
error_111.txt
error_113.txt
maybe_during_destination_deloyment.txt
no_route_to_host.txt

@olix0r
Member

olix0r commented Jun 18, 2021

Thanks! Interestingly, it looks like we haven't triggered the scenario we've seen previously: in those logs most of the warnings appear to come from the reconnect module (which is more-or-less expected if the target endpoint isn't available); but none of those warnings should really impact the balancer/ready_cache.

The logs I'm most interested in catching are Pending service lost its cancelation or Ready service had no associated cancelation.

If you want to filter out the reconnect log messages you could run with the log level linkerd=info,tower::ready_cache=debug,linkerd_reconnect=off,warn -- but it would be great if you could continue running with this proxy version so that we can hopefully hit one of these two cases.

@Wenliang-CHEN
Contributor Author

Cool, I will keep it running. Maybe we just need a bit more time until the scenario gets triggered.

@olix0r
Member

olix0r commented Jun 25, 2021

@Wenliang-CHEN have you seen any of these warnings over the past week?

@Wenliang-CHEN
Contributor Author

Hello @olix0r, we have been running the service with the new proxy and the same log level for a week. There are no logs matching "Pending service lost..." or "Ready service had no associated cancelation", and we have observed no connection issues from that service anymore.

Is it possible that the library upgrade somehow fixes or suppresses the issue?

@stale

stale bot commented Sep 26, 2021

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 14 days if no further activity occurs. Thank you for your contributions.

stale bot added the wontfix label Sep 26, 2021
stale bot closed this as completed Oct 10, 2021
github-actions bot locked as resolved and limited conversation to collaborators Nov 10, 2021
hawkw added a commit to linkerd/linkerd2-proxy that referenced this issue Jun 17, 2022
Tower [v0.4.13] includes a fix for a bug in the `tower::ready_cache`
module, tower-rs/tower#415. The `ready_cache` module is used internally
in Tower's load balancer. This bug resulted in panics in the proxy
(linkerd/linkerd2#8666, linkerd/linkerd2#6086) in cases where the
Destination service sends a very large number of service discovery
updates (see linkerd/linkerd2#8677).

This commit updates the proxy's dependency on `tower` to 0.4.13, to
ensure that this bugfix is picked up.

Fixes linkerd/linkerd2#8666
Fixes linkerd/linkerd2#6086

[v0.4.13]: https://github.com/tower-rs/tower/releases/tag/tower-0.4.13