Critical grpc failure when doing large operation #1805

Closed
samborambo305 opened this issue Dec 15, 2022 · 4 comments
Assignees
Labels
api: firestore Issues related to the googleapis/nodejs-firestore API. priority: p2 Moderately-important priority. Fix may not be included in next release. type: bug Error or flaw in code with unintended results or allowing sub-optimal usage patterns.

Comments

@samborambo305

Environment details

  • OS: Whatever OS is used in the node docker image
  • Node.js version: Whatever version is used in the node docker image
  • npm version: 8.19.1
  • @google-cloud/firestore version: 6.4.1

Steps to reproduce

Seems to be related to this: grpc/grpc-node#1769

On my production server, I am running high-throughput code, and my Kubernetes pod seems to die at random. There is no error or other indication that anything occurred. When I run the same code locally, I get the error shown below. I tried to see whether any high-throughput operation would reproduce the error and found that the sample code below did.

const { docs } = await firestore.collection('InventoryItems')
  .where('UserId', '==', userId)
  .select().get();
await Promise.all(docs.map(async (doc) => firestore.doc(`InventoryItems/${doc.id}`).delete()));

However, this code produced an error that I was able to catch. On production, the error seems to be silent, and doesn't get caught in try/catch blocks.

Error: 13 INTERNAL: Received RST_STREAM with code 2 triggered by internal client error: 80BA10EC01000000:error:0A0003FC:SSL routines:ssl3_read_bytes:sslv3 alert bad record mac:../deps/openssl/openssl/ssl/record/rec_layer_s3.c:1584:SSL alert number 20

    at Object.callErrorFromStatus (/Users/jakeleventhal/Code/Ecominate/node_modules/@grpc/grpc-js/src/call.ts:81:17)
    at Object.onReceiveStatus (/Users/jakeleventhal/Code/Ecominate/node_modules/@grpc/grpc-js/src/client.ts:358:36)
    at Object.onReceiveStatus (/Users/jakeleventhal/Code/Ecominate/node_modules/@grpc/grpc-js/src/client-interceptors.ts:462:34)
    at Object.onReceiveStatus (/Users/jakeleventhal/Code/Ecominate/node_modules/@grpc/grpc-js/src/client-interceptors.ts:424:48)
    at /Users/jakeleventhal/Code/Ecominate/node_modules/@grpc/grpc-js/src/call-stream.ts:330:24
    at processTicksAndRejections (node:internal/process/task_queues:77:11)
for call at
    at ServiceClientImpl.makeUnaryRequest (/Users/jakeleventhal/Code/Ecominate/node_modules/@grpc/grpc-js/src/client.ts:328:30)
    at ServiceClientImpl.<anonymous> (/Users/jakeleventhal/Code/Ecominate/node_modules/@grpc/grpc-js/src/make-client.ts:189:15)
    at /Users/jakeleventhal/Code/Ecominate/node_modules/@google-cloud/firestore/build/src/v1/firestore_client.js:225:29
    at /Users/jakeleventhal/Code/Ecominate/node_modules/google-gax/src/normalCalls/timeout.ts:54:13
    at repeat (/Users/jakeleventhal/Code/Ecominate/node_modules/google-gax/src/normalCalls/retries.ts:104:19)
    at /Users/jakeleventhal/Code/Ecominate/node_modules/google-gax/src/normalCalls/retries.ts:144:7
    at OngoingCallPromise.call (/Users/jakeleventhal/Code/Ecominate/node_modules/google-gax/src/call.ts:81:23)
    at NormalApiCaller.call (/Users/jakeleventhal/Code/Ecominate/node_modules/google-gax/src/normalCalls/normalApiCaller.ts:43:15)
    at /Users/jakeleventhal/Code/Ecominate/node_modules/google-gax/src/createApiCall.ts:118:26
    at processTicksAndRejections (node:internal/process/task_queues:95:5)
Caused by: Error: 
    at WriteBatch.commit (/Users/jakeleventhal/Code/Ecominate/node_modules/@google-cloud/firestore/build/src/write-batch.js:433:23)
    at DocumentReference.delete (/Users/jakeleventhal/Code/Ecominate/node_modules/@google-cloud/firestore/build/src/reference.js:351:14)
    at /Users/jakeleventhal/Code/Ecominate/apps/onboarder/src/onboarder.ts:46:97
    at step (/Users/jakeleventhal/Code/Ecominate/apps/onboarder/src/onboarder.ts:33:23)
    at Object.next (/Users/jakeleventhal/Code/Ecominate/apps/onboarder/src/onboarder.ts:14:53)
    at /Users/jakeleventhal/Code/Ecominate/apps/onboarder/src/onboarder.ts:8:71
    at new Promise (<anonymous>)
    at __awaiter (/Users/jakeleventhal/Code/Ecominate/apps/onboarder/src/onboarder.ts:4:12)
    at /Users/jakeleventhal/Code/Ecominate/apps/onboarder/src/onboarder.ts:46:44
    at Array.map (<anonymous>)
    at /Users/jakeleventhal/Code/Ecominate/apps/onboarder/src/onboarder.ts:46:30
    at step (/Users/jakeleventhal/Code/Ecominate/apps/onboarder/src/onboarder.ts:33:23)
    at Object.next (/Users/jakeleventhal/Code/Ecominate/apps/onboarder/src/onboarder.ts:14:53)
    at fulfilled (/Users/jakeleventhal/Code/Ecominate/apps/onboarder/src/onboarder.ts:5:58)
    at processTicksAndRejections (node:internal/process/task_queues:95:5) {
  code: 13,
  details: 'Received RST_STREAM with code 2 triggered by internal client error: 80BA10EC01000000:error:0A0003FC:SSL routines:ssl3_read_bytes:sslv3 alert bad record mac:../deps/openssl/openssl/ssl/record/rec_layer_s3.c:1584:SSL alert number 20\n',
  metadata: Metadata { internalRepr: Map(0) {}, options: {} },
  note: 'Exception occurred in retry method that was not classified as transient'
}
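
The error above carries `code: 13` (gRPC INTERNAL) and a note that the retry layer did not classify it as transient. As a possible stopgap, and not a fix for the underlying TLS failure, a caller could retry such failures itself. The sketch below assumes the operation is safe to repeat (a document delete is); `retryOnInternal`, `maxAttempts`, and `baseDelayMs` are hypothetical names, not part of `@google-cloud/firestore`.

```javascript
// gRPC status code for INTERNAL, matching `code: 13` in the error above.
const GRPC_INTERNAL = 13;

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

/**
 * Runs `operation` and retries it with exponential backoff when it rejects
 * with gRPC code 13. Any other error is rethrown immediately.
 * `maxAttempts` and `baseDelayMs` are illustrative defaults, not library settings.
 */
async function retryOnInternal(operation, { maxAttempts = 5, baseDelayMs = 100 } = {}) {
  for (let attempt = 1; ; attempt += 1) {
    try {
      return await operation();
    } catch (err) {
      const retryable = Boolean(err) && err.code === GRPC_INTERNAL;
      if (!retryable || attempt >= maxAttempts) throw err;
      // Wait baseDelayMs, then 2x, then 4x, ... before the next attempt.
      await sleep(baseDelayMs * 2 ** (attempt - 1));
    }
  }
}
```

With the reproduction snippet, each delete could be wrapped as `retryOnInternal(() => firestore.doc(path).delete())`, where `path` is the document path built in that snippet.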
@samborambo305 samborambo305 added priority: p2 Moderately-important priority. Fix may not be included in next release. type: bug Error or flaw in code with unintended results or allowing sub-optimal usage patterns. labels Dec 15, 2022
@product-auto-label product-auto-label bot added the api: firestore Issues related to the googleapis/nodejs-firestore API. label Dec 15, 2022
@ehsannas ehsannas self-assigned this Dec 19, 2022
@ehsannas
Contributor

Hi @samborambo305 , Thanks for reporting.

There are a few things to consider:

Updates to a single document
As you design your app, consider how quickly your app updates single documents. The best way to characterize your workload's performance is to perform load testing. The exact maximum rate at which an app can update a single document depends highly on the workload. Factors include the write rate, contention among requests, and the number of affected indexes.
A document write operation updates the document and any associated indexes, and Firestore synchronously applies the write operation across a quorum of replicas. At high enough write rates, the database will start to encounter contention, higher latency, or other errors.

This issue is known as hotspotting, and your application can experience hotspotting if it does any of the following:

  • Creates new documents at a very high rate and allocates its own monotonically increasing IDs. (Firestore allocates document IDs using a scatter algorithm. You should not encounter hotspotting on writes if you create new documents using automatic document IDs.)
  • Creates new documents at a high rate in a collection with few documents.
  • Creates new documents with a monotonically increasing field, like a timestamp, at a very high rate.
  • Deletes documents in a collection at a high rate.
  • Writes to the database at a very high rate without gradually increasing traffic.

You should also consider making a batch (each batch can perform up to 500 operations). Something like:
batch.delete(doc1);
batch.delete(doc2);
// ...
batch.delete(doc100);
await batch.commit();

See: https://firebase.google.com/docs/firestore/manage-data/transactions#batched-writes
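
For more documents than fit in one batch, the batching suggestion above could be applied in groups of at most 500. This is a minimal sketch: `chunk` and `deleteInBatches` are hypothetical helper names, `firestore` is assumed to be an initialized Firestore instance, and `refs` an array of document references (for example `docs.map((doc) => doc.ref)`).

```javascript
// Maximum operations per batch, per the limit quoted above.
const MAX_BATCH_SIZE = 500;

/** Splits `items` into consecutive arrays of at most `size` elements. */
function chunk(items, size) {
  const chunks = [];
  for (let i = 0; i < items.length; i += size) {
    chunks.push(items.slice(i, i + size));
  }
  return chunks;
}

/**
 * Deletes every reference in `refs`, one batch at a time.
 * Returns the number of batches that were committed.
 */
async function deleteInBatches(firestore, refs, batchSize = MAX_BATCH_SIZE) {
  let batchesCommitted = 0;
  for (const group of chunk(refs, batchSize)) {
    const batch = firestore.batch();
    group.forEach((ref) => batch.delete(ref));
    // Commit sequentially so that only one batch is in flight at a time.
    await batch.commit();
    batchesCommitted += 1;
  }
  return batchesCommitted;
}
```

Committing the batches one after another trades speed for a steadier request rate. Each batch is atomic on its own, but the operation as a whole is not.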

@samborambo305
Author

samborambo305 commented Dec 20, 2022

Because Firestore doesn’t support arbitrarily large atomic updates, I use the class in the attached file. Essentially, I am only using batch updates, which should address all the concerns about hotspotting, throttling, etc.

This is how I use the class in the code that experiences the issue:

const batchUpdates = {};

LARGE_LIST.forEach((item) => {
  // preprocessing
  

  BatchUpdates.set(`InventoryItems/${item.id}`, item);
})

await BatchUpdates.commit();

I also made sure that I am on the latest @google-cloud/firestore and @grpc/grpc-js.

BatchUpdates.ts.zip
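
For comparison with a hand-rolled batching class, the library also provides `BulkWriter`, which throttles and retries writes for non-atomic bulk operations. The sketch below is an assumption about how the `set` calls above might map onto it, not the contents of the attached file. `bulkSet`, `collectionPath`, and `MAX_RETRY_ATTEMPTS` are hypothetical names, and `firestore` is assumed to be an initialized Firestore instance.

```javascript
// Illustrative retry cap; not a library default.
const MAX_RETRY_ATTEMPTS = 3;

/**
 * Writes each item to `${collectionPath}/${item.id}` through a BulkWriter.
 * Returns the errors of the writes that ultimately failed (empty on success).
 */
async function bulkSet(firestore, collectionPath, items) {
  const writer = firestore.bulkWriter();
  // Returning true asks the BulkWriter to retry the failed write.
  writer.onWriteError((error) => error.failedAttempts < MAX_RETRY_ATTEMPTS);

  const writes = items.map((item) =>
    writer.set(firestore.doc(`${collectionPath}/${item.id}`), item));
  // Attach handlers before awaiting so no rejection goes unhandled.
  const settled = Promise.allSettled(writes);

  // close() flushes the queued writes and resolves once none are pending.
  await writer.close();
  const results = await settled;
  return results
    .filter((result) => result.status === 'rejected')
    .map((result) => result.reason);
}
```

Unlike a batch, these writes are not atomic: some may succeed while others fail, which is why the sketch returns the individual failures.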

@samborambo305
Author

I will be forced to move away from Firestore because of this issue.

@ehsannas
Contributor

Closing this issue as it's a duplicate of #1023 and raising the issue internally again.
