Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

firestore-bigquery-export: typed arrays for schema views #293

Closed
iislucas opened this issue Apr 28, 2020 · 15 comments · Fixed by #1366 · May be fixed by #1780
Closed

firestore-bigquery-export: typed arrays for schema views #293

iislucas opened this issue Apr 28, 2020 · 15 comments · Fixed by #1366 · May be fixed by #1780
Labels
extension: firestore-bigquery-export Related to firestore-bigquery-export extension type: feature request New feature or request

Comments

@iislucas
Copy link

[REQUIRED] Step 2: Extension name

This feature request is for extension: firestore-bigquery-export, and in particular for the GENERATE_SCHEMA_VIEWS.md

What feature would you like to see?

Support for arrays of different types. For example, some of the arrays in our project have maps inside them. A natural way to handle this would be to create a value field for each of the keys in the map. Even now, it's a bit strange that the array type doesn't specify if it's a string, number, boolean etc.

How would you use it?

To import and use bigquery on firestore objects that have arrays of nested maps.

@russellwheatley
Copy link
Member

Fair enough, could be a Feature Request candidate. Thanks for the suggestion.

@i14h
Copy link
Member

i14h commented Sep 28, 2020

@fredzqm @IanWyszynski since you wrote the original version of this extension, what do you think about this feature request?

@IchordeDionysos
Copy link
Contributor

Also, arrays are a bit weird how they transform values...

Strings will look like this in the database: name
Strings in Arrays will look like this in the database: "name"

@fredzqm
Copy link
Contributor

fredzqm commented Nov 20, 2020

This would be a very cool addition.
There is no reason this isn't built other than we didn't have time to.

As a reminder, the schema generator just create view with type-safe queries for us. For each individual use case, you can craft that query as well to fit your need.
It is intended to cover enough common cases to help developers get started but does not intend to be covering all use cases. Supporting common asks like "schema within an array" would be a great addition.

See SQL fixtures for example of query generated.
This is the entry point of the schema SQL builder if anyone wants to contribute.

@gregfenton
Copy link
Contributor

gregfenton commented Nov 30, 2020

Does anyone have an example of a schema (or view query) for an array of a simple map? Something along the lines of:

{
    "orderName": "test order",
    "cartItems": [
      { "productName": "crayon", "quantity": 23, "isGift": false},
      { "productName": "glue", "quantity": 1, "isGift": true}
    ]
}

@nwparker
Copy link

Does anyone have an example of a schema (or view query) for an array of a simple map? Something along the lines of:

{
    "orderName": "test order",
    "cartItems": [
      { "productName": "crayon", "quantity": 23, "isGift": false},
      { "productName": "glue", "quantity": 1, "isGift": true}
    ]
}

You may be interested in:
#298 (comment)
And possibly:
#292 (comment)

@gregfenton
Copy link
Contributor

Thank you @nwparker ! I have turned to looking at a combination of unnest() and json_extract_array() to extract the values from my data. I suspect I'll be simply adding a hand-coded query to BQ rather than expecting @firebaseextensions/fs-bq-schema-views to generate one for me.

I got some good direction from this question I posted on Stackoverflow.

@dackers86 dackers86 removed the in-review Awaiting review by FE team. label Aug 3, 2021
@ludvigaldrin
Copy link

Din you guys solve this? I have a bit more simple than the example that I have an array with map objects. I have tried to add first a array field and inside the [] I added the map. But the output is that the array works great but it’s just one field with the whole map object as json. I’m not by my schema json so will post later. But it sounds like this could be solved?

@cabljac cabljac added the extension: firestore-bigquery-export Related to firestore-bigquery-export extension label Oct 26, 2022
@AhmetAydemir1
Copy link

did you guys solve?

@dackers86
Copy link
Member

@AhmetAydemir1 We are re-investigating this.

We are also looking for sample schema datasets to test and develop against. If any users have a particular use case scenario they would like to put forward. They would be extremely helpful !

@gregfenton
Copy link
Contributor

@dackers86 how about the Firestore schema I outlined in my post above ? Or, if not, what more are you looking for?

@dackers86
Copy link
Member

dackers86 commented Dec 13, 2022

Thanks @gregfenton That looks a good starter, i'll use that as a baseline and maybe develop it further. For example:

{
  "fields": [
    {
      "name": "name",
      "type": "string"
    },
    {
      "name": "date",
      "type": "string"
    },
    {
      "name": "total",
      "type": "string"
    },
    {
      "name": "cartItems",
      "type": "map", // new property?
      "fields": [
        {
          "name": "productName",
          "type": "string"
        },
        {
          "name": "quantity",
          "type": "string"
        },
        {
          "name": "isGift",
          "type": "string"
        }
      ]
    }
  ]
}

@dackers86
Copy link
Member

Initial PR added. For discussion, initial results have produced something similar to the following:

SELECT
  *
FROM
  (
    SELECT
      document_name,
      document_id,
      timestamp,
      operation,
      JSON_EXTRACT_SCALAR(data, '$.name') AS name,
      JSON_EXTRACT_SCALAR(data, '$.date') AS date,
      JSON_EXTRACT_SCALAR(data, '$.total') AS total,
      JSON_EXTRACT_SCALAR(cartItems, '$.productName') AS productName,
      JSON_EXTRACT_SCALAR(cartItems, '$.quantity') AS quantity,
      JSON_EXTRACT_SCALAR(cartItems, '$.isGift') AS isGift
    FROM
      `dev-extensions-testing.da_testing4.da_testing4_raw_changelog` da_testing4_raw_changelog
      LEFT JOIN UNNEST(
        json_extract_array(da_testing4_raw_changelog.data, '$.cartItems')
      ) cartItems WITH OFFSET _cartItems
  )

And for the latest view

-- Given a user-defined schema over a raw JSON changelog, returns the
-- schema elements of the latest set of live documents in the collection.
--   timestamp: The Firestore timestamp at which the event took place.
--   operation: One of INSERT, UPDATE, DELETE, IMPORT.
--   event_id: The event that wrote this row.
--   <schema-fields>: This can be one, many, or no typed-columns
--                    corresponding to fields defined in the schema.
SELECT
  document_name,
  document_id,
  timestamp,
  operation,
  name,
  date,
  total,
  productName,
  quantity,
  isGift
FROM
  (
    SELECT
      document_name,
      document_id,
      FIRST_VALUE(timestamp) OVER(
        PARTITION BY document_name
        ORDER BY
          timestamp DESC
      ) AS timestamp,
      FIRST_VALUE(operation) OVER(
        PARTITION BY document_name
        ORDER BY
          timestamp DESC
      ) AS operation,
      FIRST_VALUE(operation) OVER(
        PARTITION BY document_name
        ORDER BY
          timestamp DESC
      ) = "DELETE" AS is_deleted,
      FIRST_VALUE(JSON_EXTRACT_SCALAR(data, '$.name')) OVER(
        PARTITION BY document_name
        ORDER BY
          timestamp DESC
      ) AS name,
      FIRST_VALUE(JSON_EXTRACT_SCALAR(data, '$.date')) OVER(
        PARTITION BY document_name
        ORDER BY
          timestamp DESC
      ) AS date,
      FIRST_VALUE(JSON_EXTRACT_SCALAR(data, '$.total')) OVER(
        PARTITION BY document_name
        ORDER BY
          timestamp DESC
      ) AS total,
      JSON_EXTRACT_SCALAR(cartItems, '$.productName') AS productName,
      JSON_EXTRACT_SCALAR(cartItems, '$.quantity') AS quantity,
      JSON_EXTRACT_SCALAR(cartItems, '$.isGift') AS isGift
    FROM
      `dev-extensions-testing.da_testing4.da_testing4_raw_latest`
      LEFT JOIN unnest(
        json_extract_array(
          `dev-extensions-testing.da_testing4.da_testing4_raw_latest`.data,
          '$.cartItems'
        )
      ) cartItems WITH OFFSET _cartItems
  )
WHERE
  NOT is_deleted
GROUP BY
  document_name,
  document_id,
  timestamp,
  operation,
  name,
  date,
  total,
  productName,
  quantity,
  isGift

Samples results lead to multiple rows per array item.

image

Questions

  1. Would developers find this easier, if the array columns are separated into different columns, as opposed to adding multiple rows.
  2. The latest view may have some performance issues, the latest BQ updates includes a much more performant script. For readability and ease upgrade. Should this be included in the update or as a separate PR

@gregfenton
Copy link
Contributor

Hmmm....I don't think I even considered using FIRST_VALUE. Cool!

Answers

  1. I'm not sure how the columns approach would work? The number of columns per FS document (e.g. items in the shopping cart) would vary. So how could BQ handle that? To me, the multiple rows approach works though it definitely makes "traditional SQL" folks uneasy 😃
  2. Personally I'd be cool if there was a script/tool provided that could generate the SQL for me rather than directly update the BQ configuration with it (as view, right?). I likely will end up wanting to tweak it in some ways. Having the tool to give me the initial SQL would be a huge help.

@dackers86
Copy link
Member

BigQuery has a max columns of 10,000.

I believe the official cloud tool follows the same method for auto-generating columns.

Also agreed the traditional SQL.

For creating the views, we could a silent flag, as the gen-schema tool provides the output, we could provide a flag that ensures the views are not created automatically?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
extension: firestore-bigquery-export Related to firestore-bigquery-export extension type: feature request New feature or request
Projects