firestore-bigquery-export: typed arrays for schema views #293

iislucas · 2020-04-28T21:30:38Z

[REQUIRED] Step 2: Extension name

This feature request is for extension: firestore-bigquery-export, and in particular for the schema-view generator documented in GENERATE_SCHEMA_VIEWS.md.

What feature would you like to see?

Support for typed arrays, i.e. arrays whose elements have a declared type. For example, some of the arrays in our project contain maps. A natural way to handle this would be to create a value field for each key in the map. Even now, it's a bit strange that the array type doesn't specify whether its elements are strings, numbers, booleans, etc.

How would you use it?

To import Firestore documents that contain arrays of nested maps into BigQuery and query them there.

russellwheatley · 2020-04-29T14:35:44Z

Fair enough, could be a Feature Request candidate. Thanks for the suggestion.

i14h · 2020-09-28T22:03:09Z

@fredzqm @IanWyszynski since you wrote the original version of this extension, what do you think about this feature request?

IchordeDionysos · 2020-11-20T01:27:00Z

Also, arrays transform values in a slightly odd way:

A string field looks like this in the database: name
A string inside an array looks like this in the database: "name"
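A likely explanation, assuming the generated view extracts array elements with JSON_EXTRACT_ARRAY: that function returns each element as JSON text, so string elements keep their surrounding quotes, whereas JSON_EXTRACT_SCALAR returns an unquoted value. The sketch below uses a hypothetical inline row instead of a real changelog table, and `tags` is an illustrative field name.

```sql
-- Hypothetical inline row standing in for one `data` value of a raw changelog table.
WITH raw AS (
  SELECT '{"name": "name", "tags": ["name", "other"]}' AS data
)
SELECT
  -- Scalar string field: JSON_EXTRACT_SCALAR strips the JSON quotes.
  JSON_EXTRACT_SCALAR(data, '$.name') AS name,
  -- Array elements are returned as JSON text, so strings keep their quotes.
  JSON_EXTRACT_ARRAY(data, '$.tags') AS tags_as_json,
  -- Returns the string elements without quotes.
  JSON_EXTRACT_STRING_ARRAY(data, '$.tags') AS tags_as_strings,
  -- Alternative: unquote element by element after UNNEST.
  ARRAY(
    SELECT JSON_EXTRACT_SCALAR(tag, '$')
    FROM UNNEST(JSON_EXTRACT_ARRAY(data, '$.tags')) AS tag
  ) AS tags_unquoted
FROM raw;
```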

fredzqm · 2020-11-20T19:06:32Z

This would be a very cool addition.
The only reason it isn't built yet is that we haven't had time.

As a reminder, the schema generator just creates views with type-safe queries for you. For any individual use case, you can also craft that query yourself to fit your needs.
It is intended to cover enough common cases to help developers get started, but it is not intended to cover all use cases. Supporting common asks like "schema within an array" would be a great addition.

See the SQL fixtures for examples of generated queries.
This is the entry point of the schema SQL builder if anyone wants to contribute.

gregfenton · 2020-11-30T05:12:00Z

Does anyone have an example of a schema (or view query) for an array of a simple map? Something along the lines of:

{
    "orderName": "test order",
    "cartItems": [
      { "productName": "crayon", "quantity": 23, "isGift": false},
      { "productName": "glue", "quantity": 1, "isGift": true}
    ]
}

nwparker · 2020-11-30T05:17:20Z

You may be interested in:
#298 (comment)
And possibly:
#292 (comment)

gregfenton · 2021-01-03T22:26:27Z

Thank you @nwparker! I've turned to a combination of unnest() and json_extract_array() to extract the values from my data. I suspect I'll simply add a hand-coded query to BQ rather than expect @firebaseextensions/fs-bq-schema-views to generate one for me.

I got some good direction from this question I posted on Stack Overflow.
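For reference, here is a minimal hand-coded sketch of the unnest() + json_extract_array() approach for the cartItems example above, producing one typed row per cart item. It assumes the raw changelog stores each document as a JSON string in a `data` column, as in the generated queries later in this thread; the inline `raw` CTE and the `order-1` ID are hypothetical stand-ins for that table. SAFE_CAST turns malformed values into NULL instead of failing the query.

```sql
-- Hypothetical inline row standing in for a `*_raw_changelog` table.
WITH raw AS (
  SELECT
    'order-1' AS document_id,
    '{"orderName": "test order", "cartItems": [{"productName": "crayon", "quantity": 23, "isGift": false}, {"productName": "glue", "quantity": 1, "isGift": true}]}' AS data
)
SELECT
  raw.document_id,
  JSON_EXTRACT_SCALAR(raw.data, '$.orderName') AS orderName,
  item_index,
  JSON_EXTRACT_SCALAR(item, '$.productName') AS productName,
  -- Cast the extracted strings to the intended column types.
  SAFE_CAST(JSON_EXTRACT_SCALAR(item, '$.quantity') AS INT64) AS quantity,
  SAFE_CAST(JSON_EXTRACT_SCALAR(item, '$.isGift') AS BOOL) AS isGift
FROM raw
  -- LEFT JOIN keeps documents whose cartItems array is empty or missing;
  -- they appear once, with NULL item columns.
  LEFT JOIN UNNEST(JSON_EXTRACT_ARRAY(raw.data, '$.cartItems')) AS item
    WITH OFFSET AS item_index;
```

Using LEFT JOIN rather than a comma (inner) join is the design choice that matters here: an inner join would drop documents whose array is empty or absent.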

ludvigaldrin · 2021-11-22T05:57:19Z

Did you solve this? My case is a bit simpler than the example: I have an array of map objects. I tried adding an array field first and putting the map inside the []. The array itself works great, but the result is a single field containing the whole map object as JSON. I'm not at my schema JSON right now, so I'll post it later. But it sounds like this could be solved?

AhmetAydemir1 · 2022-11-25T20:10:29Z

Did you solve this?

dackers86 · 2022-12-07T09:52:37Z

@AhmetAydemir1 We are re-investigating this.

We are also looking for sample schema datasets to test and develop against. If any users have a particular use case they would like to put forward, it would be extremely helpful!

gregfenton · 2022-12-07T16:17:47Z

@dackers86 how about the Firestore schema I outlined in my post above? If not, what more are you looking for?

dackers86 · 2022-12-13T14:58:05Z

Thanks @gregfenton, that looks like a good starting point. I'll use it as a baseline and maybe develop it further. For example:

{
  "fields": [
    {
      "name": "name",
      "type": "string"
    },
    {
      "name": "date",
      "type": "string"
    },
    {
      "name": "total",
      "type": "string"
    },
    {
      "name": "cartItems",
      "type": "map", // new property?
      "fields": [
        {
          "name": "productName",
          "type": "string"
        },
        {
          "name": "quantity",
          "type": "string"
        },
        {
          "name": "isGift",
          "type": "string"
        }
      ]
    }
  ]
}

dackers86 · 2022-12-15T15:48:52Z

Initial PR added. For discussion, initial results have produced something similar to the following:

SELECT
  *
FROM
  (
    SELECT
      document_name,
      document_id,
      timestamp,
      operation,
      JSON_EXTRACT_SCALAR(data, '$.name') AS name,
      JSON_EXTRACT_SCALAR(data, '$.date') AS date,
      JSON_EXTRACT_SCALAR(data, '$.total') AS total,
      JSON_EXTRACT_SCALAR(cartItems, '$.productName') AS productName,
      JSON_EXTRACT_SCALAR(cartItems, '$.quantity') AS quantity,
      JSON_EXTRACT_SCALAR(cartItems, '$.isGift') AS isGift
    FROM
      `dev-extensions-testing.da_testing4.da_testing4_raw_changelog` da_testing4_raw_changelog
      LEFT JOIN UNNEST(
        json_extract_array(da_testing4_raw_changelog.data, '$.cartItems')
      ) cartItems WITH OFFSET _cartItems
  )

And for the latest view:

-- Given a user-defined schema over a raw JSON changelog, returns the
-- schema elements of the latest set of live documents in the collection.
--   timestamp: The Firestore timestamp at which the event took place.
--   operation: One of INSERT, UPDATE, DELETE, IMPORT.
--   event_id: The event that wrote this row.
--   <schema-fields>: This can be one, many, or no typed-columns
--                    corresponding to fields defined in the schema.
SELECT
  document_name,
  document_id,
  timestamp,
  operation,
  name,
  date,
  total,
  productName,
  quantity,
  isGift
FROM
  (
    SELECT
      document_name,
      document_id,
      FIRST_VALUE(timestamp) OVER(
        PARTITION BY document_name
        ORDER BY
          timestamp DESC
      ) AS timestamp,
      FIRST_VALUE(operation) OVER(
        PARTITION BY document_name
        ORDER BY
          timestamp DESC
      ) AS operation,
      FIRST_VALUE(operation) OVER(
        PARTITION BY document_name
        ORDER BY
          timestamp DESC
      ) = "DELETE" AS is_deleted,
      FIRST_VALUE(JSON_EXTRACT_SCALAR(data, '$.name')) OVER(
        PARTITION BY document_name
        ORDER BY
          timestamp DESC
      ) AS name,
      FIRST_VALUE(JSON_EXTRACT_SCALAR(data, '$.date')) OVER(
        PARTITION BY document_name
        ORDER BY
          timestamp DESC
      ) AS date,
      FIRST_VALUE(JSON_EXTRACT_SCALAR(data, '$.total')) OVER(
        PARTITION BY document_name
        ORDER BY
          timestamp DESC
      ) AS total,
      JSON_EXTRACT_SCALAR(cartItems, '$.productName') AS productName,
      JSON_EXTRACT_SCALAR(cartItems, '$.quantity') AS quantity,
      JSON_EXTRACT_SCALAR(cartItems, '$.isGift') AS isGift
    FROM
      `dev-extensions-testing.da_testing4.da_testing4_raw_latest`
      LEFT JOIN unnest(
        json_extract_array(
          `dev-extensions-testing.da_testing4.da_testing4_raw_latest`.data,
          '$.cartItems'
        )
      ) cartItems WITH OFFSET _cartItems
  )
WHERE
  NOT is_deleted
GROUP BY
  document_name,
  document_id,
  timestamp,
  operation,
  name,
  date,
  total,
  productName,
  quantity,
  isGift

The sample results contain one row per array item, i.e. multiple rows per document.

Questions

Would developers find it easier if array items were split into separate columns, rather than into multiple rows?
The latest view may have some performance issues; the latest BigQuery updates include a much more performant script. For readability and ease of upgrading, should this be included in this update or in a separate PR?
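To make the first question concrete, a column-per-item layout could be sketched with indexed JSONPath expressions. This is a hypothetical illustration, not what the PR generates. It has to assume a fixed maximum number of items (two here, chosen arbitrarily), so documents with fewer items get NULL columns and any items beyond the maximum are silently dropped.

```sql
-- Hypothetical inline row standing in for a raw changelog table.
WITH raw AS (
  SELECT '{"cartItems": [{"productName": "crayon", "quantity": 23}, {"productName": "glue", "quantity": 1}]}' AS data
)
SELECT
  -- One set of columns per array position, up to a hard-coded maximum.
  JSON_EXTRACT_SCALAR(data, '$.cartItems[0].productName') AS cartItems_0_productName,
  SAFE_CAST(JSON_EXTRACT_SCALAR(data, '$.cartItems[0].quantity') AS INT64) AS cartItems_0_quantity,
  JSON_EXTRACT_SCALAR(data, '$.cartItems[1].productName') AS cartItems_1_productName,
  SAFE_CAST(JSON_EXTRACT_SCALAR(data, '$.cartItems[1].quantity') AS INT64) AS cartItems_1_quantity
FROM raw;
```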

gregfenton · 2022-12-15T21:35:40Z

Hmmm... I hadn't even considered using FIRST_VALUE. Cool!

Answers

I'm not sure how the columns approach would work. The number of columns per Firestore document (e.g. items in the shopping cart) would vary, so how could BQ handle that? To me, the multiple-rows approach works, though it definitely makes "traditional SQL" folks uneasy 😃
Personally, I'd be happy if there were a script/tool that generated the SQL for me rather than directly updating the BQ configuration with it (as a view, right?). I'll likely want to tweak it in some ways, so having the tool give me the initial SQL would be a huge help.

dackers86 · 2022-12-20T10:35:24Z

BigQuery has a maximum of 10,000 columns per table.

I believe the official cloud tool follows the same method for auto-generating columns.

Also, agreed on the traditional SQL point.

As for creating the views: since the gen-schema tool already outputs the SQL, we could provide a flag that ensures the views are not created automatically?

russellwheatley added in-review Awaiting review by FE team. type: feature request New feature or request labels Apr 29, 2020

iislucas mentioned this issue May 19, 2020

Array fields in from GENERATE_SCHEMA_VIEWS remove the row from bigquery when empty or not present. #298

Closed

russellwheatley mentioned this issue Jul 20, 2020

JSON schema definion for type record / mode repeated does not work #389

Closed

jhuleatt added this to Under consideration in Extension Update Tracker via automation Sep 21, 2020

i14h mentioned this issue Sep 28, 2020

Firestore Export to BigQuery add map as JSON string to schema #408

Closed

jhuleatt moved this from Under consideration to Blocked in Extension Update Tracker Oct 5, 2020

dackers86 removed the in-review Awaiting review by FE team. label Aug 3, 2021

cabljac added the extension: firestore-bigquery-export Related to firestore-bigquery-export extension label Oct 26, 2022

dackers86 mentioned this issue Dec 15, 2022

feat(firestore-bigquery-export): updating gen-schema-view to include typed schema arrays #1366

Merged

dackers86 mentioned this issue Oct 25, 2023

feat(gen-schema-view): added option to generate schema files to the local directory #1780

Open

dackers86 closed this as completed in #1366 Oct 27, 2023

Extension Update Tracker automation moved this from Blocked to Closed Oct 27, 2023
