Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Don't assume that c-ares won't retry failed writes in grpc_core::GrpcPolledFdWindows::SendVUDP #25726

Merged
merged 1 commit into from
Mar 16, 2021

Conversation

apolcyn
Copy link
Contributor

@apolcyn apolcyn commented Mar 16, 2021

Per conversation in #22555

cc @okravets

Looking at the trace provided in #22555 (comment), they interesting part is what happens to c-ares socket: 868. Filtering for that socket in the logs:

D0315 12:36:45.918000000  5828 grpc_ares_ev_driver_windows.cc:811] (c-ares resolver) fd:|c-ares socket: 868| created with params af:2 type:2 protocol:0
D0315 12:36:45.919000000  5828 grpc_ares_ev_driver_windows.cc:513] (c-ares resolver) fd:c-ares socket: 868 ConnectUDP
D0315 12:36:45.922000000  5828 grpc_ares_ev_driver_windows.cc:524] (c-ares resolver) fd:c-ares socket: 868 WSAConnect error code:|0| msg:|The operation completed successfully.
D0315 12:36:45.923000000  5828 grpc_ares_ev_driver_windows.cc:353] (c-ares resolver) fd:|c-ares socket: 868| SendV called connect_done_:1 wsa_connect_error_:0
D0315 12:36:45.924000000  5828 grpc_ares_ev_driver_windows.cc:377] (c-ares resolver) fd:|c-ares socket: 868| SendVUDP called
D0315 12:36:45.925000000  5828 grpc_ares_ev_driver_windows.cc:345] (c-ares resolver) fd:|c-ares socket: 868| SendWriteBuf WSASend buf.len:46 *bytes_sent_ptr:46 overlapped:0000000000000000 return:0 *wsa_error_code:0
D0315 12:36:45.929000000  5828 grpc_ares_ev_driver_windows.cc:353] (c-ares resolver) fd:|c-ares socket: 868| SendV called connect_done_:1 wsa_connect_error_:0
D0315 12:36:45.930000000  5828 grpc_ares_ev_driver_windows.cc:377] (c-ares resolver) fd:|c-ares socket: 868| SendVUDP called
D0315 12:36:45.931000000  5828 grpc_ares_ev_driver_windows.cc:345] (c-ares resolver) fd:|c-ares socket: 868| SendWriteBuf WSASend buf.len:46 *bytes_sent_ptr:46 overlapped:0000000000000000 return:0 *wsa_error_code:0
D0315 12:36:45.932000000  5828 grpc_ares_ev_driver.cc:392] (c-ares resolver) request:00000231EA8E7280 new fd: c-ares socket: 868
D0315 12:36:45.934000000  5828 grpc_ares_ev_driver.cc:407] (c-ares resolver) request:00000231EA8E7280 notify read on: c-ares socket: 868
D0315 12:36:45.942000000  5828 grpc_ares_ev_driver_windows.cc:168] (c-ares resolver) fd:|c-ares socket: 868| InnerContinueRegisterForOnReadableLocked wsa_connect_error_:0
D0315 12:36:45.944000000  5828 grpc_ares_ev_driver_windows.cc:189] (c-ares resolver) fd:|c-ares socket: 868| RegisterForOnReadableLocked WSARecvFrom error code:|997| msg:|Overlapped I/O operation is in progress.
D0315 12:36:46.879000000  4804 grpc_ares_ev_driver.cc:284] (c-ares resolver) request:00000231EA8E7280 ev_driver=00000231E892C070 on_ares_backup_poll_alarm_locked; ares_process_fd. fd=c-ares socket: 868
D0315 12:36:46.881000000  4804 grpc_ares_ev_driver_windows.cc:291] (c-ares resolver) fd:|c-ares socket: 868| RecvFrom called read_buf_has_data:0 Current read buf length:|4192|
D0315 12:36:47.880000000  1292 grpc_ares_ev_driver.cc:284] (c-ares resolver) request:00000231EA8E7280 ev_driver=00000231E892C070 on_ares_backup_poll_alarm_locked; ares_process_fd. fd=c-ares socket: 868
D0315 12:36:47.882000000  1292 grpc_ares_ev_driver_windows.cc:291] (c-ares resolver) fd:|c-ares socket: 868| RecvFrom called read_buf_has_data:0 Current read buf length:|4192|
D0315 12:36:48.878000000  4804 grpc_ares_ev_driver.cc:284] (c-ares resolver) request:00000231EA8E7280 ev_driver=00000231E892C070 on_ares_backup_poll_alarm_locked; ares_process_fd. fd=c-ares socket: 868
D0315 12:36:48.879000000  4804 grpc_ares_ev_driver_windows.cc:291] (c-ares resolver) fd:|c-ares socket: 868| RecvFrom called read_buf_has_data:0 Current read buf length:|4192|
D0315 12:36:49.878000000  1292 grpc_ares_ev_driver.cc:284] (c-ares resolver) request:00000231EA8E7280 ev_driver=00000231E892C070 on_ares_backup_poll_alarm_locked; ares_process_fd. fd=c-ares socket: 868
D0315 12:36:49.880000000  1292 grpc_ares_ev_driver_windows.cc:291] (c-ares resolver) fd:|c-ares socket: 868| RecvFrom called read_buf_has_data:0 Current read buf length:|4192|
D0315 12:36:50.867000000  4804 grpc_ares_ev_driver_windows.cc:651] (c-ares resolver) fd:|c-ares socket: 868| OnIocpReadable finishing. read buf length now:|96|
D0315 12:36:50.869000000  4804 grpc_ares_ev_driver.cc:313] (c-ares resolver) request:00000231EA8E7280 readable on c-ares socket: 868
D0315 12:36:50.870000000  4804 grpc_ares_ev_driver_windows.cc:291] (c-ares resolver) fd:|c-ares socket: 868| RecvFrom called read_buf_has_data:1 Current read buf length:|96|
D0315 12:36:50.872000000  4804 grpc_ares_ev_driver_windows.cc:353] (c-ares resolver) fd:|c-ares socket: 868| SendV called connect_done_:1 wsa_connect_error_:0
D0315 12:36:50.873000000  4804 grpc_ares_ev_driver_windows.cc:377] (c-ares resolver) fd:|c-ares socket: 868| SendVUDP called
D0315 12:36:50.875000000  4804 grpc_ares_ev_driver_windows.cc:345] (c-ares resolver) fd:|c-ares socket: 868| SendWriteBuf WSASend buf.len:46 *bytes_sent_ptr:0 overlapped:0000000000000000 return:-1 *wsa_error_code:10022
D0315 12:36:50.876000000  4804 grpc_ares_ev_driver_windows.cc:388] (c-ares resolver) fd:|c-ares socket: 868| SendVUDP SendWriteBuf error code:10022 msg:|An invalid argument was supplied.
D0315 12:36:50.897000000  4804 grpc_ares_ev_driver_windows.cc:291] (c-ares resolver) fd:|c-ares socket: 868| RecvFrom called read_buf_has_data:0 Current read buf length:|0|
D0315 12:36:50.900000000  4804 grpc_ares_ev_driver.cc:407] (c-ares resolver) request:00000231EA8E7280 notify read on: c-ares socket: 868
D0315 12:36:50.920000000  4804 grpc_ares_ev_driver.cc:284] (c-ares resolver) request:00000231EA8E7280 ev_driver=00000231E892C070 on_ares_backup_poll_alarm_locked; ares_process_fd. fd=c-ares socket: 868
D0315 12:36:50.922000000  4804 grpc_ares_ev_driver_windows.cc:291] (c-ares resolver) fd:|c-ares socket: 868| RecvFrom called read_buf_has_data:0 Current read buf length:|4192|
D0315 12:36:50.927000000  4804 grpc_ares_ev_driver_windows.cc:168] (c-ares resolver) fd:|c-ares socket: 868| InnerContinueRegisterForOnReadableLocked wsa_connect_error_:0
D0315 12:36:51.931000000  1292 grpc_ares_ev_driver.cc:284] (c-ares resolver) request:00000231EA8E7280 ev_driver=00000231E892C070 on_ares_backup_poll_alarm_locked; ares_process_fd. fd=c-ares socket: 868
D0315 12:36:51.933000000  1292 grpc_ares_ev_driver_windows.cc:291] (c-ares resolver) fd:|c-ares socket: 868| RecvFrom called read_buf_has_data:0 Current read buf length:|4192|
D0315 12:36:52.941000000 22148 grpc_ares_ev_driver.cc:284] (c-ares resolver) request:00000231EA8E7280 ev_driver=00000231E892C070 on_ares_backup_poll_alarm_locked; ares_process_fd. fd=c-ares socket: 868
D0315 12:36:52.942000000 22148 grpc_ares_ev_driver_windows.cc:291] (c-ares resolver) fd:|c-ares socket: 868| RecvFrom called read_buf_has_data:0 Current read buf length:|4192|
D0315 12:36:53.941000000  4804 grpc_ares_ev_driver.cc:284] (c-ares resolver) request:00000231EA8E7280 ev_driver=00000231E892C070 on_ares_backup_poll_alarm_locked; ares_process_fd. fd=c-ares socket: 868
D0315 12:36:53.943000000  4804 grpc_ares_ev_driver_windows.cc:291] (c-ares resolver) fd:|c-ares socket: 868| RecvFrom called read_buf_has_data:0 Current read buf length:|4192|
D0315 12:36:54.945000000  1292 grpc_ares_ev_driver.cc:284] (c-ares resolver) request:00000231EA8E7280 ev_driver=00000231E892C070 on_ares_backup_poll_alarm_locked; ares_process_fd. fd=c-ares socket: 868
D0315 12:36:54.947000000  1292 grpc_ares_ev_driver_windows.cc:291] (c-ares resolver) fd:|c-ares socket: 868| RecvFrom called read_buf_has_data:0 Current read buf length:|4192|
D0315 12:36:55.924000000 22148 grpc_ares_ev_driver_windows.cc:651] (c-ares resolver) fd:|c-ares socket: 868| OnIocpReadable finishing. read buf length now:|96|
D0315 12:36:55.925000000 22148 grpc_ares_ev_driver.cc:313] (c-ares resolver) request:00000231EA8E7280 readable on c-ares socket: 868
D0315 12:36:55.926000000 22148 grpc_ares_ev_driver_windows.cc:291] (c-ares resolver) fd:|c-ares socket: 868| RecvFrom called read_buf_has_data:1 Current read buf length:|96|
D0315 12:36:55.928000000 22148 grpc_ares_ev_driver_windows.cc:353] (c-ares resolver) fd:|c-ares socket: 868| SendV called connect_done_:1 wsa_connect_error_:0
D0315 12:36:55.929000000 22148 grpc_ares_ev_driver_windows.cc:377] (c-ares resolver) fd:|c-ares socket: 868| SendVUDP called
E0315 12:36:55.930000000 22148 grpc_ares_ev_driver_windows.cc:378] assertion failed: GRPC_SLICE_LENGTH(write_buf_) == 0

The main thing to note is that the c-ares library attempted a write on this UDP socket and got an "Invalid argument" error on that attempt (which can apparently happen on network issues). Then, later, the c-ares library attempted to write on the same socket, again.

The c-ares windows UDP socket handler currently assumes that if a UDP socket returns an error, that the c-ares library won't try to re-send data on that socket. However, looking closely at this codepath, c-ares won't necessarily do that. Thus, we are vulnerable to hitting the assertion error in here.

So the patch listed in #22555 (comment) looks like the right fix.

@apolcyn apolcyn added lang/core release notes: yes Indicates if PR needs to be in release notes labels Mar 16, 2021
@apolcyn apolcyn changed the title Attempt to fix https://github.com/grpc/grpc/issues/22555 Don't assume that no previous writes failed in grpc_core::GrpcPolledFdWindows::SendVUDP Mar 16, 2021
@apolcyn apolcyn changed the title Don't assume that no previous writes failed in grpc_core::GrpcPolledFdWindows::SendVUDP Don't assume that we won't retry failed writes in grpc_core::GrpcPolledFdWindows::SendVUDP Mar 16, 2021
@apolcyn apolcyn changed the title Don't assume that we won't retry failed writes in grpc_core::GrpcPolledFdWindows::SendVUDP Don't assume that c-ares won't retry failed writes in grpc_core::GrpcPolledFdWindows::SendVUDP Mar 16, 2021
@apolcyn apolcyn marked this pull request as ready for review March 16, 2021 19:21
@apolcyn apolcyn requested a review from markdroth as a code owner March 16, 2021 19:21
@apolcyn
Copy link
Contributor Author

apolcyn commented Mar 16, 2021

basic tests objc ios: #21858
interop cloud to cloud and cloud to prod: #25737
ios binary size diff: #25431

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
lang/core release notes: yes Indicates if PR needs to be in release notes
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants