[Web] Support Grpc configuration in Web Client SDK #8048

Open
delaneyb opened this issue Feb 29, 2024 · 5 comments

@delaneyb

Operating System

Debian GNU/Linux 11 (bullseye) Linux 6.1.21-v8+

Browser Version

Node 16.20.0

Firebase SDK Version

10.8.1

Firebase SDK Product:

Auth, Firestore

Describe your project's tooling

esbuild ^0.19.5

Describe the problem

The Firebase JavaScript SDK, when running in a Node.js environment (other environments may also be affected), takes in excess of 15 minutes to detect a silently broken network connection (e.g. a NAT entry being erased or traffic blocking being applied). Until the break is detected, Firestore listeners on the device receive no updates and writes do not reach the server.

history.txt
grpc-logs.txt

Steps and code to reproduce issue

Recommended: Set environment flags GRPC_VERBOSITY=debug GRPC_TRACE=all for better observation of gRPC activity.

Launch a Node program with at least one Firestore listener.

Block internet traffic at the router, or apply firewall rules that block traffic to and from your device (or only the specific connection to the Firestore backend), to simulate a silently killed network connection.

Observe how long it takes for a message such as the following to appear:

[2024-02-29T09:50:17.531Z]  @firebase/firestore: Firestore (10.3.1): GrpcConnection RPC 'Listen' stream 0x6ba5ef16 error. Code: 14 Message: 14 UNAVAILABLE: read ETIMEDOUT
[2024-02-29T09:50:17.531Z]  @firebase/firestore: Firestore (10.3.1): GrpcConnection RPC 'Listen' stream 0x40aaf17c error. Code: 14 Message: 14 UNAVAILABLE: read ECONNRESET

Or from gRPC:
D 2024-02-29T10:50:17.497Z | subchannel_call | [7] Node error event: message=read ECONNRESET code=ECONNRESET errno=Unknown system error -104 syscall=read
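
To put a number on the delay, the timestamp of the first stream error can be compared with the time the block was applied. The following is a minimal sketch that assumes the Firestore log format shown above; `parseStreamError` and `detectionDelayMs` are hypothetical helper names, and the moment the block was applied has to be recorded by hand.

```typescript
// Hypothetical helpers for measuring how long the SDK took to report a
// broken connection. The pattern assumes the "@firebase/firestore" log
// format quoted above: "[<ISO timestamp>]  @firebase/firestore: ... Code: <n> Message: <text>".
const STREAM_ERROR_PATTERN =
  /^\[([^\]]+)\]\s+@firebase\/firestore: .*Code: (\d+) Message: (.*)$/;

interface StreamError {
  timestamp: Date;
  code: number;
  message: string;
}

// Returns the parsed error, or null if the line is not a stream error line.
function parseStreamError(line: string): StreamError | null {
  const match = STREAM_ERROR_PATTERN.exec(line);
  if (match === null) {
    return null;
  }
  return {
    timestamp: new Date(match[1]),
    code: Number(match[2]),
    message: match[3],
  };
}

// Milliseconds between applying the block (recorded manually) and the
// first error line, or null if the line could not be parsed.
function detectionDelayMs(blockedAt: Date, line: string): number | null {
  const parsed = parseStreamError(line);
  if (parsed === null) {
    return null;
  }
  return parsed.timestamp.getTime() - blockedAt.getTime();
}
```

Feeding each log line through `detectionDelayMs` and taking the first non-null result gives the detection delay for a given run.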

It appears, based on tcpdump, that the Firestore client doesn't even send keepalives to the backend server. The server sends keepalives every 45 seconds, but the client is either unable to or not configured to monitor for a certain number of missed keepalives before trying to reconnect.
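
The behaviour I would expect from the client is something like the watchdog below. This is only a sketch of the idea, not how the SDK or gRPC works internally: `KeepaliveWatchdog` is a hypothetical class, the 45-second interval is the server keepalive period observed in tcpdump, and the threshold of three missed keepalives is an arbitrary assumption.

```typescript
// Hypothetical watchdog: remembers when the last server keepalive was seen
// and reports the connection as stale once `maxMissed` keepalive intervals
// have elapsed without one. Time is passed in explicitly so the logic can
// be exercised without real timers.
class KeepaliveWatchdog {
  private readonly intervalMs: number;
  private readonly maxMissed: number;
  private lastSeenMs: number;

  constructor(intervalMs: number, maxMissed: number, nowMs: number) {
    this.intervalMs = intervalMs;
    this.maxMissed = maxMissed;
    this.lastSeenMs = nowMs;
  }

  // Call whenever a keepalive (or any other traffic) arrives from the server.
  recordKeepalive(nowMs: number): void {
    this.lastSeenMs = nowMs;
  }

  // Number of whole keepalive intervals that have passed in silence.
  missedCount(nowMs: number): number {
    const silentMs = nowMs - this.lastSeenMs;
    return Math.max(0, Math.floor(silentMs / this.intervalMs));
  }

  // True once enough keepalives have been missed to warrant a reconnect.
  isStale(nowMs: number): boolean {
    return this.missedCount(nowMs) >= this.maxMissed;
  }
}

// Assumed values: 45 s server keepalive period, reconnect after 3 misses.
const watchdog = new KeepaliveWatchdog(45000, 3, 0);
```

With these assumed values, a dead connection would be flagged after 135 seconds of silence rather than after 15 to 32 minutes.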

In my test case, it took 32 minutes before Node raised the ECONNRESET, which is what finally led gRPC to start trying to reconnect.

Once the block was removed and the device was allowed to reconnect, my Firestore listeners never started working again, even after the gRPC keepalives had started flowing again.

delaneyb added the new and question labels Feb 29, 2024
jbalidiong added the api: firestore and needs-attention labels and removed the api: auth and new labels Feb 29, 2024
cherylEnkidu self-assigned this Feb 29, 2024
@cherylEnkidu
Contributor

Hi @delaneyb ,

If I understand correctly, there are two issues you described in the ticket:

  1. When the network connection is broken, it takes too long for the SDK to show error messages like the following:
@firebase/firestore: Firestore (10.3.1): GrpcConnection RPC 'Listen' stream 0x6ba5ef16 error. Code: 14 Message: 14 UNAVAILABLE: read ETIMEDOUT
  2. When the network is back, Firestore listeners (onSnapshot) do not receive the latest snapshot.

Please let me know if I have mis-summarized anything.

@delaneyb
Author

delaneyb commented Mar 1, 2024

Hi @cherylEnkidu,

That is correct.

Regarding 2, note that the onError and onCompletion callbacks of the onSnapshot listener are never called, so it is reasonable to expect the listener to recover on its own and continue working once the network connection is reestablished.

Also, it is important to distinguish between scenarios where the socket is closed and the application is notified of this, and the connection just silently going dead due to some intervening infrastructure or software:

  • If I use gdb to close the connection as if it had been closed from the other end, the SDK logs @firebase/firestore: Firestore (10.8.1): GrpcConnection RPC 'Listen' stream 0x66551d64 error. Code: 1 Message: 1 CANCELLED: Call cancelled, and the SDK or gRPC seems to immediately reestablish a new connection, after which everything continues functioning as normal. To reproduce: run sudo gdb -p $(pgrep node), then use call (int)shutdown(46, 0) followed by c to resume the program, replacing 46 with the fd of the connection to the Firestore backend, which can be found via sudo lsof -nP -iTCP:443 -a -c node.
  • If we instead add a firewall rule on the router that drops packets to and from the Firestore backend while the socket remains open and, as far as the SDK is concerned, functional, this is where the problems occur. In this scenario I observed it taking 32 minutes just for Node to fire the ECONNRESET. We are also dealing with a situation, which we believe is caused by the client's firewall, where the code we get is ETIMEDOUT instead; there we have confirmed instances of the error appearing as long as 15 minutes after the connection actually stopped working.

@cherylEnkidu
Contributor

Hi @delaneyb ,

I consulted our gRPC team; their suggestions are as follows:

Sometimes the client doesn't see connections drop, for whatever reason. The gRPC keepalive functionality can help here.
It can be configured using the client construction options grpc.keepalive_time_ms, grpc.keepalive_timeout_ms, grpc.keepalive_permit_without_calls.

When configured, the client will wait an amount of time equal to the grpc.keepalive_time_ms parameter, then send a ping. If it doesn't get a response within the grpc.keepalive_timeout_ms, it will consider the connection closed. If grpc.keepalive_permit_without_calls is set to 1, it will do this even if there are no streams active.
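
As a rough illustration of how these three options interact, here is a minimal sketch. The option names are the ones quoted above; the values are illustrative assumptions rather than recommendations, and `worstCaseDetectionMs` is a hypothetical helper that ignores scheduling and network latency.

```typescript
// The three channel options named above, typed as a plain object.
interface KeepaliveOptions {
  "grpc.keepalive_time_ms": number;
  "grpc.keepalive_timeout_ms": number;
  "grpc.keepalive_permit_without_calls": 0 | 1;
}

// Assumed example values (not recommended settings): ping after 30 s,
// wait up to 10 s for the response, and ping even with no active streams.
const keepaliveOptions: KeepaliveOptions = {
  "grpc.keepalive_time_ms": 30000,
  "grpc.keepalive_timeout_ms": 10000,
  "grpc.keepalive_permit_without_calls": 1,
};

// Hypothetical helper: approximate upper bound on how long a silently dead
// connection can go unnoticed. The client waits keepalive_time_ms before
// sending a ping, then keepalive_timeout_ms for the response.
function worstCaseDetectionMs(options: KeepaliveOptions): number {
  return (
    options["grpc.keepalive_time_ms"] + options["grpc.keepalive_timeout_ms"]
  );
}
```

With the assumed values, a dead connection would be treated as closed after roughly 40 seconds.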

@delaneyb
Author

delaneyb commented Mar 9, 2024

Hi @cherylEnkidu,

I have found related issues googleapis/nodejs-firestore#791 and googleapis/nodejs-firestore#1057 in nodejs-firestore, however firebase/firebase-js-sdk does not seem to expose a new Firestore() constructor where we can pass in grpc settings.

Using @google-cloud/firestore is not viable because it requires IAM/admin service accounts, which we do not want on devices running the Node.js program.

@cherylEnkidu
Contributor

Hi @delaneyb ,

Unfortunately, the Web Client SDK doesn't yet have a way to configure gRPC settings via Firestore. I will mark this ticket as a feature request and track it internally (b/329681553). Thank you again for reporting this!

cherylEnkidu changed the title from "Excessive time to detect firestore connectivity loss & failure to reestablish listeners once reconnected" to "[Web] Support Grpc configuration in Web Client SDK" Mar 14, 2024