Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

intermittent downtimes after preferRest enabled #1823

Closed
omar-dev-amrk opened this issue Feb 3, 2023 · 8 comments
Closed

intermittent downtimes after preferRest enabled #1823

omar-dev-amrk opened this issue Feb 3, 2023 · 8 comments
Assignees
Labels
api: firestore Issues related to the googleapis/nodejs-firestore API. needs more info This issue needs more information from the customer to proceed. priority: p2 Moderately-important priority. Fix may not be included in next release. type: bug Error or flaw in code with unintended results or allowing sub-optimal usage patterns.

Comments

@omar-dev-amrk
Copy link

omar-dev-amrk commented Feb 3, 2023

  1. Is this a client library issue or a product issue?
    Client

  2. Did someone already solve this?
    No

  3. Do you have a support contract?
    No

Environment details

  • OS: Ubuntu 20.04
  • Node.js version: 14.20.1
  • npm version: 8.15.0
  • @google-cloud/firestore version: 6.4.0 (through firebase-admin@11.5.0)

Steps to reproduce

Same product was working fine for over a year, until we decided to update firebase dependencies to enable the preferRest option. Since yesterday, functions kept crashing with errors like:

request to https://firestore.googleapis.com/v1/projects/PROJECT/databases/(default)/documents:commit? failed, reason: Client network socket disconnected before secure TLS connection was established at /workspace/app/orders/makeOrderDone.function.js:126:25

and

TypeError: callback is not a function
at /workspace/node_modules/google-gax/build/src/fallbackServiceStub.js:119:33
Function execution took 564 ms, finished with status: 'connection error'
Process exited with code 16 at process.<anonymous> (/layers/google.nodejs.functions-framework/functions-framework/node_modules/@google-cloud/functions-framework/build/src/invoker.js:92:22) at process.emit (events.js:400:28) at process.emit (domain.js:475:12) at process.exit (internal/process/per_thread.js:179:15) at sendCrashResponse (/layers/google.nodejs.functions-framework/functions-framework/node_modules/@google-cloud/functions-framework/build/src/logger.js:44:9) at process.<anonymous> (/layers/google.nodejs.functions-framework/functions-framework/node_modules/@google-cloud/functions-framework/build/src/invoker.js:88:44) at process.emit (events.js:400:28) at process.emit (domain.js:475:12) at processPromiseRejections (internal/process/promises.js:245:33) at processTicksAndRejections (internal/process/task_queues.js:96:32)
 Function execution took 228 ms, finished with status: 'crash'
Error: A backoff operation is already in progress. at QueryWatch.<anonymous> (/workspace/app/orders/makeOrderDone.function.js:179:27) at QueryWatch.closeStream (/workspace/node_modules/@google-cloud/firestore/build/src/watch.js:226:18) at /workspace/node_modules/@google-cloud/firestore/build/src/watch.js:335:18 at runMicrotasks (<anonymous>) at processTicksAndRejections (internal/process/task_queues.js:95:5) at runNextTicks (internal/process/task_queues.js:64:3) at listOnTimeout (internal/timers.js:526:9) at processTimers (internal/timers.js:500:7)

For each function, after a while of errors, things become stable and actually the preferRest option starts working providing the speed we have always lacked. But then later all of a sudden, errors come back again.
Please note that we have a minimum instances setting set to 2 for each of these functions.

Of course for a production app this is unacceptable, so we had to disable preferRest and see if things stabilize. And they did.
But today I see different errors like:

Exception from a finished function: Error: Internal errors with Error: Error 13: An internal error occurred.
Error: A backoff operation is already in progress. at QueryWatch.<anonymous> (/workspace/app/orders/makeAllOrderDone.function.js:221:35) at QueryWatch.closeStream (/workspace/node_modules/@google-cloud/firestore/build/src/watch.js:226:18) at /workspace/node_modules/@google-cloud/firestore/build/src/watch.js:335:18 at runMicrotasks (<anonymous>) at processTicksAndRejections (internal/process/task_queues.js:95:5)
Error: Exceeded maximum number of retries allowed. at QueryWatch.<anonymous> (/workspace/app/orders/makeOrderDone.function.js:179:27) at QueryWatch.closeStream (/workspace/node_modules/@google-cloud/firestore/build/src/watch.js:226:18) at /workspace/node_modules/@google-cloud/firestore/build/src/watch.js:335:18 at runMicrotasks (<anonymous>) at processTicksAndRejections (internal/process/task_queues.js:95:5)

Error: Process exited with code 16 at process.<anonymous> (/layers/google.nodejs.functions-framework/functions-framework/node_modules/@google-cloud/functions-framework/build/src/invoker.js:92:22) at process.emit (events.js:400:28) at process.emit (domain.js:475:12) at process.exit (internal/process/per_thread.js:179:15) at sendCrashResponse (/layers/google.nodejs.functions-framework/functions-framework/node_modules/@google-cloud/functions-framework/build/src/logger.js:44:9) at process.<anonymous> (/layers/google.nodejs.functions-framework/functions-framework/node_modules/@google-cloud/functions-framework/build/src/invoker.js:88:44) at process.emit (events.js:400:28) at process.emit (domain.js:475:12) at processPromiseRejections (internal/process/promises.js:245:33) at processTicksAndRejections (internal/process/task_queues.js:96:32)

Other details

firebase/firebase-admin-node#1901 (comment)

@omar-dev-amrk omar-dev-amrk added priority: p2 Moderately-important priority. Fix may not be included in next release. type: bug Error or flaw in code with unintended results or allowing sub-optimal usage patterns. labels Feb 3, 2023
@product-auto-label product-auto-label bot added the api: firestore Issues related to the googleapis/nodejs-firestore API. label Feb 3, 2023
@cherylEnkidu cherylEnkidu self-assigned this Feb 6, 2023
@MarkDuckworth MarkDuckworth self-assigned this Feb 6, 2023
@MarkDuckworth
Copy link
Contributor

Hi @omar-dev-amrk, I have a few questions for you that will help us resolve this.

  1. Are you using snapshot listeners? I.e., are you using onSnapshot(...) in your code?
  2. Are you using a proxy or vpn? If so, I found a similar error report on another googleapis SDK that may have a solution: https://stackoverflow.com/questions/53593182/client-network-socket-disconnected-before-secure-tls-connection-was-established
  3. You stated "so we had to disable preferRest and see if things stabilize. And they did. But today I see different errors like:..." Can you clarify, the last 4 errors you reported are occurring after setting preferRest = false?

Thanks!
Mark

@MarkDuckworth MarkDuckworth added the needs more info This issue needs more information from the customer to proceed. label Feb 6, 2023
@omar-dev-amrk
Copy link
Author

Hey @MarkDuckworth

(1+3) I have deployed the preferRest: true two times, each time it was a different set of functions. First set had onSnapshot and caused all errors above. The second set didn't have any "onSnapshot" at all(I literally checked to make sure yesterday), but still caused the 3 errors aside from the first one.

2- Not at all. I checked with the team now.

This error was logged yesterday(no onSnapshot ):

FetchError: request to https://firestore.googleapis.com/v1/projects/PROJECT/databases/(default)/documents:commit? failed, reason: socket hang up at /workspace/app/orders/addOrder.function.js:847:41 at runNextTicks (internal/process/task_queues.js:60:5) at processImmediate (internal/timers.js:437:9) at process.topLevelDomainCallback (domain.js:152:15) at process.callbackTrampoline (internal/async_hooks.js:128:24)

Let me know if you need anything else.

@MarkDuckworth
Copy link
Contributor

Thanks for the answers. Unfortunately, we may still need additional information to help us understand the issue. This can be a big ask, but are you able to create a minimal reproduce-able example demonstrating the issue? This is the cleanest way for us to understand how your code is configured, deployed, and what it's doing to produce this error.

@omar-dev-amrk
Copy link
Author

@MarkDuckworth I assume this means a new project, firestore, function with preferRest enabled and the least amount of code possible to re-produce the issue, correct?
It's not possible for me these two days, so if not Thursday, maybe next Sunday(Friday and Saturday holiday). Would that work?

@iAMkVIN-S
Copy link

We recently had to start using preferRest after having issues #1814 without it where firebase-admin would crash very often.

We made the changes on February 1st and have not crashed since. However, tonight we received reports of crashing happening again, and upon looking through the logs it seems like we have the same error as posted by @omar-dev-amrk :

Mar 08 22:38:35 /app/node_modules/google-gax/build/src/fallbackServiceStub.js:119
Mar 08 22:38:35 .catch((err) => callback(err));
Mar 08 22:38:35 ^
Mar 08 22:38:35
Mar 08 22:38:35 TypeError: callback is not a function
Mar 08 22:38:35 at /app/node_modules/google-gax/build/src/fallbackServiceStub.js:119:33
Mar 08 22:38:35 at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
Mar 08 22:38:35
Mar 08 22:38:35 Node.js v18.14.2
Mar 08 22:38:37 running on port 3002

This is super crucial and crashing often, please advise ASAP

@MarkDuckworth
Copy link
Contributor

@iAMkVIN-S, are you also seeing the "Client network socket disconnected" and "socket hang up" reported by @omar-dev-amrk? What frequency is this issue occurring in your application now? If you have any code that may help us reproduce the issue, that may speed up a resolution here.

@iAMkVIN-S
Copy link

@MarkDuckworth It looks like google-gax has released a fix around the issue I reported yesterday in this issue

@MarkDuckworth
Copy link
Contributor

@iAMkVIN-S and @omar-dev-amrk, please let us know if the update to google-gax 3.5.8 resolves the issue for you. If you continue to have connectivity issues after updating, can you send us SDK log output? SDK logging can be enabled with setLogFunction(logger).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api: firestore Issues related to the googleapis/nodejs-firestore API. needs more info This issue needs more information from the customer to proceed. priority: p2 Moderately-important priority. Fix may not be included in next release. type: bug Error or flaw in code with unintended results or allowing sub-optimal usage patterns.
Projects
None yet
Development

No branches or pull requests

4 participants