Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Various Firebase functions started reporting crash: Maximum call stack size exceeded #1539

Closed
bhr opened this issue Mar 16, 2024 · 10 comments
Closed

Comments

@bhr
Copy link

bhr commented Mar 16, 2024

Various Firebase functions started reporting the crash: Maximum call stack size exceeded.
The issue started occurring after we deployed Firebase cloud functions to the production environment on March 9 2024 at ~1:30 PM GMT. The last deployment before the issue started appearing was Mar 1 2024 at 12:00 GMT.

Affected Firebase functions are triggered by Cloud Firestore or PubSub, not https.

Related issues

#1527 , but related to https functions

[REQUIRED] Version info

node:
20

firebase-functions:
firebase-functions@npm:4.6.0
reproducing on firebase-functions@npm:4.8.0 too

firebase-tools:
13.4.1

firebase-admin:
firebase-admin@npm:12.0.0

[REQUIRED] Test case

A simple cloud function that sets a value on the document that's changed. The issue appears on many cloud functions that perform Firestore update operations.

import { DocumentSnapshot, Timestamp } from 'firebase-admin/firestore';
import { Change, logger } from 'firebase-functions/v1';

export const processChangedDoc = (change: Change<DocumentSnapshot>) => {
  const afterSnapshot: DocumentSnapshot<AnyDocData> = change.after;
  const { updateTime } = afterSnapshot;
  if (!afterSnapshot?.exists || !updateTime) {
    // We have a delete event
    logger.debug('doc has no update time');
    return Promise.resolve();
  }

  const afterDocData = afterSnapshot.data();
  const beforeSnapshot: DocumentSnapshot<AnyDocData> = change.before;
  const beforeDocData = beforeSnapshot.data();

  const lastDateUpdated = afterDocData ? afterDocData.xDocDateUpdated : undefined;

  // We skip setting dateUpdated only if the last update was less than one hour ago
  const skipUpdate = !!lastDateUpdated && updateTime.seconds < lastDateUpdated.seconds + ONE_HOUR_SECONDS;

  if (skipUpdate) {
    logger.debug(`Skip update: lastUpdate: ${lastDateUpdated.toDate().toISOString()}, updateTime: ${updateTime.toDate().toISOString()}`);
    return Promise.resolve();
  }

  const docChange = {} as any;
  docChange[DocDateUpdatedKey] = updateTime;
  logger.debug(`doc update: before ${beforeDocData?.xDocDateUpdated?.toDate().toISOString()}, after ${updateTime.toDate().toISOString()},`);
  return afterSnapshot.ref.update(docChange).catch((err) => console.error(err));
};

export default region('europe-west1')
  .firestore.document('{collection}/{docId}')
  .onWrite((change) => {
    return processChangedDoc(change);
  });

[REQUIRED] Steps to reproduce

Deploy FCF. After a few hours, errors start appearing in Google Error Reporting. This only happens in the production environment where there's more activity (e.g. 600k invocations per 24hrs).

[REQUIRED] Expected behavior

FCF should not crash.

[REQUIRED] Actual behavior

Error: RangeError: Maximum call stack size exceeded
    at write (/workspace/node_modules/firebase-functions/lib/logger/index.js:66:74)
    at Object.debug (/workspace/node_modules/firebase-functions/lib/logger/index.js:76:5)
    at processChangedDoc (/workspace/lib/services/processChangedDocument.js:38:17)
    at /workspace/lib/api/db/all/levelOne.js:8:59
    at /workspace/node_modules/firebase-functions/lib/common/onInit.js:33:16
    at /workspace/node_modules/firebase-functions/lib/common/onInit.js:33:16
    at /workspace/node_modules/firebase-functions/lib/common/onInit.js:33:16
    at /workspace/node_modules/firebase-functions/lib/common/onInit.js:33:16
Error: RangeError: Maximum call stack size exceeded
    at get parent [as parent] (/workspace/node_modules/@google-cloud/firestore/build/src/reference.js:216:15)
    at processChangedDoc (/workspace/lib/services/processChangedDocument.js:17:73)
    at /workspace/lib/api/db/all/levelOne.js:8:59
    at /workspace/node_modules/firebase-functions/lib/common/onInit.js:33:16
    at /workspace/node_modules/firebase-functions/lib/common/onInit.js:33:16
    at /workspace/node_modules/firebase-functions/lib/common/onInit.js:33:16
    at /workspace/node_modules/firebase-functions/lib/common/onInit.js:33:16
    at /workspace/node_modules/firebase-functions/lib/common/onInit.js:33:16
    at /workspace/node_modules/firebase-functions/lib/common/onInit.js:33:16
    at /workspace/node_modules/firebase-functions/lib/common/onInit.js:33:16
    at entryFromArgs (/workspace/node_modules/firebase-functions/lib/logger/index.js:130:19)
    at Object.error (/workspace/node_modules/firebase-functions/lib/logger/index.js:116:11)
    at console.error (/workspace/lib/services/logging.js:15:28)
    at sendCrashResponse (/layers/google.nodejs.functions-framework/functions-framework/node_modules/@google-cloud/functions-framework/build/src/logger.js:27:17)
    at sendResponse (/layers/google.nodejs.functions-framework/functions-framework/node_modules/@google-cloud/functions-framework/build/src/invoker.js:37:40)
    at /layers/google.nodejs.functions-framework/functions-framework/node_modules/@google-cloud/functions-framework/build/src/function_wrappers.js:34:40
    at bound (node:domain:432:15)
    at runBound (node:domain:443:12)
    at /layers/google.nodejs.functions-framework/functions-framework/node_modules/@google-cloud/functions-framework/build/src/function_wrappers.js:142:60
    at process.processTicksAndRejections (node:internal/process/task_queues:95:5) 

Were you able to successfully deploy your functions?

Yes

No

@google-oss-bot
Copy link
Collaborator

I couldn't figure out how to label this issue, so I've labeled it for a human to triage. Hang tight.

@MattSkala
Copy link

We started experiencing the same errors today across our Firestore triggered functions, both with functions v4.7.0 and 4.8.0.

@jttaynton-chattr
Copy link

We started experiencing these issues intermittently as well on production after a deployment this past Friday afternoon. The changes that were deployed were fairly minor and we've had a tough time trying to isolate the cause. The errors are occurring in multiple trigger functions with no obvious reasoning. The errors don't seem to line up with actions that are actually occurring within the function thus the confusion in attempting to debug.

@inlined
Copy link
Member

inlined commented Mar 19, 2024

This seems related to a new feature released about a week ago. Downgrading the Firebase Functions SDK can give you a workaround. To help me diagnose:

  1. Are you actually using the feature (the new onInit callback)?
  2. Multiple users are talking about Firestore functions. Are they all Firestore 1st gen functions?

@jttaynton-chattr
Copy link

jttaynton-chattr commented Mar 19, 2024

@inlined We are not using onInit callback. We're using firebase functions 4.6.0 right now. We thought we had a fix on our side but the errors are still occurring and the places they are being thrown from don't seem to make much sense. Our errors are occurring from event trigger v1 functions that had been running fine for 2+ years.

@jttaynton-chattr
Copy link

jttaynton-chattr commented Mar 19, 2024

@inlined Is there a particular version of firebase functions that is safe from this error/issue?

I'm curious as to why we are seeing these errors yet we have version 4.6 in our package.json

@bhr
Copy link
Author

bhr commented Mar 19, 2024

This seems related to a new feature released about a week ago. Downgrading the Firebase Functions SDK can give you a workaround. To help me diagnose:

  1. Are you actually using the feature (the new onInit callback)?
  2. Multiple users are talking about Firestore functions. Are they all Firestore 1st gen functions?

@inlined thanks for looking into this!

  1. Not using onInit callback.
  2. All FCFs are 1st gen.

@jttaynton-chattr
Copy link

@bhr We installed the latest 4.8.1 just released and deployed one of our problem functions with that version. Short sample size but no new errors in about 35 minutes and the execution time of the trigger function has dropped back to where it was prior to issues and is staying there.

@bhr
Copy link
Author

bhr commented Mar 19, 2024

@jttaynton-chattr Thanks for the hint! Upgrading to 4.8.1 resolved the issue indeed. Execution time went down to level where it was supposed to be. No error's been thrown in the past 30 minutes

Screenshot 2024-03-19 at 9 56 57 PM

@inlined
Copy link
Member

inlined commented Mar 20, 2024

Thanks for your patience. We added a new onInit lifecycle callback and internally call withInit to wrap your handler with a decorator that ensures the init callback has been called. The 1st gen method for doing this accidentally called handler = withInit(handler) inside a lamda that is called as boilerplate; this lambda was capturing the wrapped handler and wrapping it again repeatedly. Moving the withInit call outside the lambda ensured that it only gets called once.

We're very sorry for this issue and how stressful it may have been. This error slipped through our testing since it only becomes a problem at notable scale. The good news is that unless you manually turned off retries for your functions, everything would have been successfully retried.

@inlined inlined closed this as completed Mar 20, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants