Investigation: grpc WARN logs "addrConn.createTransport failed to connect to..." #17842

Open
kisunji opened this issue Jun 22, 2023 · 3 comments
Labels
theme/internals Serf, Raft, SWIM, Lifeguard, Anti-Entropy, locking topics type/bug Feature does not function as expected

kisunji commented Jun 22, 2023

Overview of the Issue

#10603 has been addressed by #15701, but users still report seeing WARN logs such as:

2023-02-06T05:50:55.844+0100 [WARN]  agent: [core]grpc: addrConn.createTransport failed to connect to {dc1-10.192.0.5:8300 consul-3 <nil> 0 <nil>}. Err: connection error: desc = "transport: Error while dialing dial tcp <nil>->10.192.0.5:8300: operation was canceled". Reconnecting...
2023-02-06T05:57:43.781+0100 [WARN]  agent: [core]grpc: addrConn.createTransport failed to connect to {dc1-10.192.0.3:8300 consul-1.dc1 <nil> 0 <nil>}. Err: connection error: desc = "transport: Error while dialing dial tcp <nil>->10.192.0.3:8300: operation was canceled". Reconnecting...
2023-02-06T05:58:10.466+0100 [WARN]  agent: [core]grpc: addrConn.createTransport failed to connect to {dc1-10.192.0.4:8300 consul-2.dc1 <nil> 0 <nil>}. Err: connection error: desc = "transport: Error while dialing dial tcp <nil>->10.192.0.4:8300: operation was canceled". Reconnecting...
2023-02-06T06:50:09.474+0100 [WARN]  agent: [core]grpc: addrConn.createTransport failed to connect to {dc1-10.192.0.4:8300 consul-2.dc1 <nil> 0 <nil>}. Err: connection error: desc = "transport: Error while dialing dial tcp <nil>->10.192.0.4:8300: operation was canceled". Reconnecting...
2023-02-06T07:13:16.970+0100 [WARN]  agent: [core]grpc: addrConn.createTransport failed to connect to {dc1-10.192.0.4:8300 consul-2.dc1 <nil> 0 <nil>}. Err: connection error: desc = "transport: Error while dialing dial tcp <nil>->10.192.0.4:8300: operation was canceled". Reconnecting...

Going by the reports so far, these logs appear less regularly than those in #10603 and likely require a separate investigation.
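To judge how regular the warnings are in a given log file, one option is to measure the gaps between consecutive occurrences. The following is a minimal sketch, not part of Consul: the function name `warning_gaps`, the timestamp pattern, and the marker string are assumptions derived from the sample lines above, and other Consul or grpc-go versions may format these lines differently.

```python
import re
from datetime import datetime

# Timestamp format assumed from the sample lines above,
# e.g. 2023-02-06T05:50:55.844+0100 followed by the [WARN] level.
TS_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d{3}[+-]\d{4})\s+\[WARN\]"
)

# Substring that identifies the warning discussed in this issue.
MARKER = "addrConn.createTransport failed to connect to"


def warning_gaps(lines):
    """Return the gaps, in seconds, between consecutive matching WARN lines."""
    stamps = []
    for line in lines:
        match = TS_RE.match(line)
        if match and MARKER in line:
            stamps.append(
                datetime.strptime(match.group(1), "%Y-%m-%dT%H:%M:%S.%f%z")
            )
    # Pair each timestamp with its successor and take the difference.
    return [(b - a).total_seconds() for a, b in zip(stamps, stamps[1:])]
```

Applied to the five sample lines above, the gaps range from under a minute to almost an hour, which fits the impression that these warnings do not follow a fixed interval.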

It is worth noting that #15892 has completely replaced Consul's grpc balancer internals. Based on previous investigation, the WARN logs containing "operation was canceled" are likely caused by a subconnection being updated in quick succession, which closes a subconnection that is still in the CONNECTING state.

These WARN logs should have no impact on Consul; they are a side-effect of periodic server-shuffling and error-handling. It is still inconvenient for operators to deal with WARN-level logs which cannot be easily suppressed.
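Until the logs can be suppressed at the source, operators who post-process agent logs could filter these specific lines out. The following is a rough sketch under stated assumptions: `drop_benign` and `BENIGN_RE` are hypothetical names, the pattern is derived only from the sample lines in this issue, and it deliberately matches only the "operation was canceled" variant so that other connection failures remain visible.

```python
import re

# Assumed pattern, built from the sample lines in this issue. It matches the
# createTransport warning only when the cause is "operation was canceled".
BENIGN_RE = re.compile(
    r"\[WARN\]\s+agent: \[core\]grpc: "
    r"addrConn\.createTransport failed to connect to "
    r".*operation was canceled"
)


def drop_benign(lines):
    """Yield every line except WARN lines that match the benign pattern."""
    for line in lines:
        if not BENIGN_RE.search(line):
            yield line
```

One possible use is as a stream filter, for example `sys.stdout.writelines(drop_benign(sys.stdin))` after importing `sys`. This only hides the lines from the filtered output; it does not change what the agent logs.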

Reproduction Steps

If you see these WARN logs frequently, please provide DEBUG logs from a single server agent (logs aggregated from several agents are difficult to parse).

@kisunji kisunji added type/bug Feature does not function as expected theme/internals Serf, Raft, SWIM, Lifeguard, Anti-Entropy, locking topics labels Jun 22, 2023

rsommer commented Jul 8, 2023

We discovered the same behaviour a while ago, and it finally went away after upgrading all Consul agents (servers and clients) to 1.16. All nodes with older agents showed the described warnings.

@shilpakarthik

@rsommer: Can you please confirm the exact Consul version on which you saw the above warnings? We are seeing similar warnings in our production environment. Also, according to the comment above, they do not seem to affect Consul's functionality. Did you run into any issues because of them?

rsommer commented Dec 7, 2023

@shilpakarthik The last active version before upgrading to 1.16.0 was 1.15.2. I don't recall any real problems besides having these messages filling up the logs.
