What operating system (Linux, Windows, …) and version?
➜ ~ uname -a
Linux wenchang 5.19.0-1-amd64 #1 SMP PREEMPT_DYNAMIC Debian 5.19.6-1 (2022-09-01) x86_64 GNU/Linux
What did you do?
The issue surfaced in etcd-io/etcd#14487, which tries to verify the gRPC reconnect behavior for the etcd lease client's cache.
I created a repo, https://github.com/fuweid/etcd-issue-14487, that reproduces the issue in a form that is easier to understand.
In the test case, a bridge server acts as a proxy between the gRPC client and the gRPC server.
While the gRPC client is issuing a unaryCall, another goroutine keeps dropping the connection in a loop until the unaryCall succeeds.
The gRPC client connects to the server only if the subConn is idle. If transport.NewClientTransport returns a nil error, the preface is received, and the connection is then closed, (ac *addrConn) createTransport has a chance to reach the select branch case <-connClosed.Done().
// https://github.com/grpc/grpc-go/blob/v1.47.0/clientconn.go#L1323
// createTransport creates a connection to addr. It returns an error if the
// address was not successfully connected, or updates ac appropriately with the
// new transport.
func (ac *addrConn) createTransport(addr resolver.Address, copts transport.ConnectOptions, connectDeadline time.Time) error {
	...
	case <-connClosed.Done():
		// The transport has already closed. If we received the preface, too,
		// this is not an error.
		select {
		case <-prefaceReceived.Done():
			return nil
		default:
			return errors.New("connection closed before server preface received")
		}
	}
}
In that case, the subConn stays stuck in CONNECTING. If it is the only subConn, the gRPC client is stuck in the pick state.
This is a matter of timing. To make it easier to reproduce, I added a time.Sleep after transport.NewClientTransport in fuweid/etcd-issue-14487@a8c3e6f.
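The timing dependence comes from how Go's select behaves when more than one case is ready: it picks one of them pseudo-randomly. The following is a minimal sketch, not gRPC code. Two plain channels stand in for the preface-received and connection-closed events, and the helper name countBranches is made up for this illustration. Both channels are closed before the select runs, which mirrors the situation where the preface arrives and the connection is dropped before createTransport reaches its select.

```go
package main

import "fmt"

// countBranches runs n trials. In each trial both channels are already
// closed when select runs, so both cases are ready and the Go runtime
// chooses between them pseudo-randomly.
// (Hypothetical helper; the channels only stand in for the two events.)
func countBranches(n int) (preface, closed int) {
	for i := 0; i < n; i++ {
		prefaceReceived := make(chan struct{})
		connClosed := make(chan struct{})
		close(prefaceReceived)
		close(connClosed)

		select {
		case <-prefaceReceived:
			preface++
		case <-connClosed:
			closed++
		}
	}
	return preface, closed
}

func main() {
	p, c := countBranches(10000)
	fmt.Printf("prefaceReceived branch: %d, connClosed branch: %d\n", p, c)
}
```

Over many trials both counters end up non-zero, so any code path that is only correct in one of the two branches will misbehave some fraction of the time.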
//
// Notice: This API is EXPERIMENTAL and may be changed or removed in a
// later release.
@@ -1276,6 +1276,8 @@ func (ac *addrConn) createTransport(addr resolver.Address, copts transport.Conne
return err
}
+ time.Sleep(10 * time.Millisecond)
+
select {
case <-connectCtx.Done():
// We didn't get the preface in time.
@@ -1325,6 +1327,7 @@ func (ac *addrConn) createTransport(addr resolver.Address, copts transport.Conne
// this is not an error.
select {
case <-prefaceReceived.Done():
+ channelz.Warningf(logger, ac.channelzID, "grpc: addrConn.createTransport connClosed and prefaceReceived: oops")
return nil
default:
return errors.New("connection closed before server preface received")
The steps to reproduce are:
cd /tmp
git clone https://github.com/fuweid/etcd-issue-14487.git
cd etcd-issue-14487
go test --cpu=4 -p=2 -v -count=1 --timeout=3m --race=false ./
I think the select branch case <-connClosed.Done() should behave like the case <-prefaceReceived.Done(): branch, which moves the state to IDLE if connClosed.HasFired().
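A toy model of that suggestion is sketched below. It is not the grpc-go implementation: the types connState, the helper fired, and the function settle are all hypothetical, and plain channels stand in for the two events. It only illustrates the idea that whichever branch select happens to pick, "preface received and connection closed" should end in IDLE so the subConn can reconnect.

```go
package main

import (
	"errors"
	"fmt"
)

// connState is a hypothetical stand-in for the connectivity state.
type connState string

const (
	stateConnecting connState = "CONNECTING"
	stateReady      connState = "READY"
	stateIdle       connState = "IDLE"
)

// fired reports whether ch has been closed, without blocking.
func fired(ch <-chan struct{}) bool {
	select {
	case <-ch:
		return true
	default:
		return false
	}
}

// settle models the proposed behavior: both branches agree on IDLE when
// the preface was received and the connection has already been closed.
func settle(prefaceReceived, connClosed <-chan struct{}) (connState, error) {
	select {
	case <-prefaceReceived:
		if fired(connClosed) {
			return stateIdle, nil
		}
		return stateReady, nil
	case <-connClosed:
		if fired(prefaceReceived) {
			// Proposed: treat this like the branch above and go idle
			// instead of returning nil while still in CONNECTING.
			return stateIdle, nil
		}
		// State is left unchanged here; the caller handles the error.
		return stateConnecting, errors.New("connection closed before server preface received")
	}
}

func main() {
	preface := make(chan struct{})
	closed := make(chan struct{})
	close(preface)
	close(closed)
	st, err := settle(preface, closed)
	fmt.Println(st, err)
}
```

With this shape the result no longer depends on which ready branch select picks, which is the property the issue asks for.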
The select branch will be selected randomly, if there are several ready
branches. If the preface has been received and then connection is
closed, the `createTransport` might hit the `connClosed.Done()` branch.
Ideally, the subConn should go idle and reconnect.
Fixes: grpc#5688
Signed-off-by: Wei Fu <fuweid89@gmail.com>
fuweid added a commit to fuweid/grpc-go that referenced this issue on Oct 14, 2022
What version of gRPC are you using?
v1.47.0
What version of Go are you using (go version)?
go1.19.1
What did you expect to see?
The gRPC client should handle reconnect well.
What did you see instead?
The test will show the following log
And timeout