🐛 [firestore-bigquery-export] backfilling less than 300k docs took days and cost ~$200 USD #2003

Open
jjaklitsch opened this issue Mar 26, 2024 · 7 comments
Labels
extension: firestore-bigquery-export Related to firestore-bigquery-export extension type: bug Something isn't working

Comments

@jjaklitsch

[READ] Step 1: Are you in the right place?

Issues filed here should be about bugs for a specific extension in this repository.
If you have a general question, need help debugging, or fall into some
other category use one of these other channels:

  • For general technical questions, post a question on StackOverflow
    with the firebase tag.
  • For general Firebase discussion, use the firebase-talk
    google group.
  • To file a bug against the Firebase Extensions platform, or for an issue affecting multiple extensions, please reach out to
    Firebase support directly.

[REQUIRED] Step 2: Describe your configuration

  • Extension name: firestore-bigquery-export
  • Extension version: 0.1.46
  • Configuration values (redact info where appropriate):
    Cloud Functions location: redacted
    BigQuery Dataset location: redacted
    BigQuery Project ID: redacted
    Database ID: (default)
    Collection path: occasions
    Enable Wildcard Column field with Parent Firestore Document IDs (Optional): false
    Dataset ID: firestore_raw_export
    Table ID: occasions_v2
    BigQuery SQL table Time Partitioning option type (Optional): none
    BigQuery Time Partitioning column name (Optional): createdAt
    Firestore Document field name for BigQuery SQL Time Partitioning field option (Optional): createdAt
    BigQuery SQL Time Partitioning table schema field(column) type (Optional): TIMESTAMP
    BigQuery SQL table clustering (Optional): Parameter not set
    Maximum number of synced documents per second (Optional): 100
    Backup Collection Name (Optional): Parameter not set
    Transform function URL (Optional): Parameter not set
    Use new query syntax for snapshots: no
    Exclude old data payloads (Optional): no
    Import existing Firestore documents into BigQuery?: yes
    Existing Documents Collection (Optional): occasions
    Use Collection Group query (Optional): no
    Docs per backfill: 200
    Cloud KMS key name (Optional): Parameter not set

[REQUIRED] Step 3: Describe the problem

Steps to reproduce:

We installed the extension and enabled the option to import existing documents. The Firestore database we imported from had <250K records. While importing, we saw a massive spike in Firestore reads, up to 45 million per hour. Our typical read volume is <10K per hour. We incurred a cost of ~$200 just from running this import.

Expected result

The BigQuery database is created with minimal impact on Firestore read volumes.

Actual result

45 million Firestore reads per hour. 120 million reads total in a few hours.

@jjaklitsch jjaklitsch added the type: bug Something isn't working label Mar 26, 2024
@jjaklitsch
Author

Linking to the same bug someone else reported: #2000

@cabljac
Contributor

cabljac commented Mar 26, 2024

Hey, looking into this now. Do you have any relevant Cloud Functions logs or errors?

@cabljac
Contributor

cabljac commented Mar 26, 2024

I believe this issue is caused by us using offset to paginate. I am working on an alternative approach.
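For anyone wondering why offset pagination would be this expensive, here is a rough back-of-the-envelope model. It assumes Firestore bills one read for every document skipped by an offset in addition to every document returned, and it uses the numbers from the report (at most 250K documents, 200 docs per backfill batch). The function names are made up for illustration and are not part of the extension.

```typescript
// Rough read-count model. Assumption: a query with an offset is billed
// one read per skipped document plus one read per returned document.
function offsetPaginationReads(totalDocs: number, pageSize: number): number {
  let reads = 0;
  for (let offset = 0; offset < totalDocs; offset += pageSize) {
    const returned = Math.min(pageSize, totalDocs - offset);
    reads += offset + returned; // skipped documents + returned documents
  }
  return reads;
}

// With a cursor, each document is read exactly once.
function cursorPaginationReads(totalDocs: number): number {
  return totalDocs;
}

const totalDocs = 250000; // upper bound taken from the report
const pageSize = 200; // "Docs per backfill" from the configuration

console.log(offsetPaginationReads(totalDocs, pageSize)); // 156375000
console.log(cursorPaginationReads(totalDocs)); // 250000
```

Under these assumptions the offset approach grows quadratically with collection size and lands at roughly 156 million reads, the same order of magnitude as the 120 million reported, while a cursor-based approach would stay at about 250K.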

@jjaklitsch
Author

Yes, see attached for the logs.
firestore-export-logs.docx

When do you think you'll have a fix in?
Also, what's the process for requesting a credit?

@jjaklitsch
Author

jjaklitsch commented Apr 2, 2024 via email

@pr-Mais
Member

pr-Mais commented Apr 2, 2024

You can still use the extension for streaming; this issue only affects backfilling, which we have disabled for now. Another way to backfill your existing data is to use the import script, which you can run locally.

You can reach out to Firebase support on this link.

@huangjeff5
Collaborator

Hi, software engineer from Firebase here.

Just wanted to chime in on this issue: we have turned off backfill, so if you use the latest version you won't run into it. As Mais explained above, the import script is the temporary workaround.

That being said, we are actively reworking the backfill implementation so that offset isn't used. We will follow up when that is pushed out.
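As a sketch of what "not using offset" could look like, the snippet below pages through a collection with a cursor: each page starts strictly after the last document ID of the previous one. This is not the extension's actual implementation. `FakeCollection`, `pageAfter`, and `backfill` are hypothetical names, and the in-memory class only stands in for Firestore so the read count can be observed. With the real SDK, the equivalent would be an ordered query combined with `startAfter` and `limit`.

```typescript
// Hypothetical in-memory stand-in for a collection ordered by document ID.
class FakeCollection {
  reads = 0;
  private ids: string[];

  constructor(ids: string[]) {
    this.ids = [...ids].sort();
  }

  // Returns up to `limit` IDs strictly after `cursor`.
  // Only the returned documents are counted as reads.
  pageAfter(cursor: string | null, limit: number): string[] {
    const start =
      cursor === null ? 0 : this.ids.findIndex((id) => id > cursor);
    const page = start === -1 ? [] : this.ids.slice(start, start + limit);
    this.reads += page.length;
    return page;
  }
}

// Walks the whole collection one page at a time using a cursor.
function backfill(collection: FakeCollection, pageSize: number): string[] {
  const exported: string[] = [];
  let cursor: string | null = null;
  while (true) {
    const page = collection.pageAfter(cursor, pageSize);
    if (page.length === 0) break;
    exported.push(...page);
    cursor = page[page.length - 1]; // resume after the last document seen
  }
  return exported;
}

const ids = Array.from({ length: 1000 }, (_, i) => `doc${100000 + i}`);
const collection = new FakeCollection(ids);
const exported = backfill(collection, 200);

console.log(exported.length, collection.reads); // 1000 1000
```

In this simulation the number of reads equals the number of documents, regardless of page size, which is the property the offset-based backfill lacked.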

@pr-Mais pr-Mais added the extension: firestore-bigquery-export Related to firestore-bigquery-export extension label May 30, 2024