firestore-bigquery-export: typed arrays for schema views #293
Comments
Fair enough, could be a Feature Request candidate. Thanks for the suggestion.
@fredzqm @IanWyszynski since you wrote the original version of this extension, what do you think about this feature request?
Also, arrays are a bit weird in how they transform values; strings end up looking odd in the database.
This would be a very cool addition. As a reminder, the schema generator just creates views with type-safe queries for us. For each individual use case, you can also craft that query yourself to fit your needs. See the SQL fixtures for examples of generated queries.
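To make that concrete, here is a minimal sketch of the kind of type-safe view query the generator produces for a single top-level string field (modeled on the queries later in this thread; the project, dataset, and table names are illustrative):

-- Sketch of a generated schema view for one string field, `name`.
-- Project, dataset, and table names are illustrative.
SELECT
  document_name,
  document_id,
  timestamp,
  operation,
  JSON_EXTRACT_SCALAR(data, '$.name') AS name
FROM
  `my-project.my_dataset.my_collection_raw_changelog`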
Does anyone have an example of a schema (or view query) for an array of a simple map?
You may be interested in:
Thank you @nwparker! I have since turned to looking at a combination of other approaches; I got some good direction from this question I posted on Stack Overflow.
Did you guys solve this? My case is a bit simpler than the example: I have an array with map objects. I first tried adding an array field, and inside the [] I added the map. The array itself works great, but the output is just one field with the whole map object as JSON. I'm not at my schema JSON right now, so I will post it later. But it sounds like this could be solved?
Did you guys solve this?
@AhmetAydemir1 We are re-investigating this. We are also looking for sample schema datasets to test and develop against; if any users have a particular use-case scenario they would like to put forward, that would be extremely helpful!
@dackers86 how about the Firestore schema I outlined in my post above? Or, if not, what more are you looking for?
Thanks @gregfenton! That looks like a good starter; I'll use it as a baseline and maybe develop it further. For example:
{
"fields": [
{
"name": "name",
"type": "string"
},
{
"name": "date",
"type": "string"
},
{
"name": "total",
"type": "string"
},
{
"name": "cartItems",
"type": "map", // new property?
"fields": [
{
"name": "productName",
"type": "string"
},
{
"name": "quantity",
"type": "string"
},
{
"name": "isGift",
"type": "string"
}
]
}
]
}
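For reference, a Firestore document matching this schema would land in the raw changelog's data column as a JSON string. The following self-contained sketch (runnable in BigQuery with an inlined sample payload, so no real table is needed; all values are illustrative) shows how JSON_EXTRACT_ARRAY and UNNEST split cartItems into one row per element:

-- Self-contained sketch: a sample `data` payload matching the schema above,
-- unnested into one row per cart item. The payload values are illustrative.
WITH changelog AS (
  SELECT
    '{"name":"Jane","date":"2021-06-01","total":"42.00",'
    || '"cartItems":[{"productName":"Widget","quantity":"2","isGift":"false"},'
    || '{"productName":"Gadget","quantity":"1","isGift":"true"}]}' AS data
)
SELECT
  JSON_EXTRACT_SCALAR(item, '$.productName') AS productName,
  JSON_EXTRACT_SCALAR(item, '$.quantity') AS quantity,
  JSON_EXTRACT_SCALAR(item, '$.isGift') AS isGift
FROM
  changelog
LEFT JOIN
  UNNEST(JSON_EXTRACT_ARRAY(data, '$.cartItems')) AS item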
Initial PR added. For discussion, initial results have produced something similar to the following:

SELECT
*
FROM
(
SELECT
document_name,
document_id,
timestamp,
operation,
JSON_EXTRACT_SCALAR(data, '$.name') AS name,
JSON_EXTRACT_SCALAR(data, '$.date') AS date,
JSON_EXTRACT_SCALAR(data, '$.total') AS total,
JSON_EXTRACT_SCALAR(cartItems, '$.productName') AS productName,
JSON_EXTRACT_SCALAR(cartItems, '$.quantity') AS quantity,
JSON_EXTRACT_SCALAR(cartItems, '$.isGift') AS isGift
FROM
`dev-extensions-testing.da_testing4.da_testing4_raw_changelog` da_testing4_raw_changelog
LEFT JOIN UNNEST(
json_extract_array(da_testing4_raw_changelog.data, '$.cartItems')
) cartItems WITH OFFSET _cartItems
)

And for the latest view:

-- Given a user-defined schema over a raw JSON changelog, returns the
-- schema elements of the latest set of live documents in the collection.
-- timestamp: The Firestore timestamp at which the event took place.
-- operation: One of INSERT, UPDATE, DELETE, IMPORT.
-- event_id: The event that wrote this row.
-- <schema-fields>: This can be one, many, or no typed-columns
-- corresponding to fields defined in the schema.
SELECT
document_name,
document_id,
timestamp,
operation,
name,
date,
total,
productName,
quantity,
isGift
FROM
(
SELECT
document_name,
document_id,
FIRST_VALUE(timestamp) OVER(
PARTITION BY document_name
ORDER BY
timestamp DESC
) AS timestamp,
FIRST_VALUE(operation) OVER(
PARTITION BY document_name
ORDER BY
timestamp DESC
) AS operation,
FIRST_VALUE(operation) OVER(
PARTITION BY document_name
ORDER BY
timestamp DESC
) = "DELETE" AS is_deleted,
FIRST_VALUE(JSON_EXTRACT_SCALAR(data, '$.name')) OVER(
PARTITION BY document_name
ORDER BY
timestamp DESC
) AS name,
FIRST_VALUE(JSON_EXTRACT_SCALAR(data, '$.date')) OVER(
PARTITION BY document_name
ORDER BY
timestamp DESC
) AS date,
FIRST_VALUE(JSON_EXTRACT_SCALAR(data, '$.total')) OVER(
PARTITION BY document_name
ORDER BY
timestamp DESC
) AS total,
JSON_EXTRACT_SCALAR(cartItems, '$.productName') AS productName,
JSON_EXTRACT_SCALAR(cartItems, '$.quantity') AS quantity,
JSON_EXTRACT_SCALAR(cartItems, '$.isGift') AS isGift
FROM
`dev-extensions-testing.da_testing4.da_testing4_raw_latest`
LEFT JOIN unnest(
json_extract_array(
`dev-extensions-testing.da_testing4.da_testing4_raw_latest`.data,
'$.cartItems'
)
) cartItems WITH OFFSET _cartItems
)
WHERE
NOT is_deleted
GROUP BY
document_name,
document_id,
timestamp,
operation,
name,
date,
total,
productName,
quantity,
isGift

Sample results lead to multiple rows per array item.
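If a single row per document is preferred, one possible way to handle the multiple-rows behaviour is to re-aggregate the unnested items. This is a hedged sketch, not part of the PR; the view name `cart_schema_changelog` is hypothetical, standing in for a saved version of the query above:

-- Sketch: collapse the per-array-item rows back to one row per document.
-- `cart_schema_changelog` is a hypothetical saved view of the query above.
SELECT
  document_id,
  ANY_VALUE(name) AS name,
  ANY_VALUE(total) AS total,
  ARRAY_AGG(STRUCT(productName, quantity, isGift)) AS cartItems
FROM
  cart_schema_changelog
GROUP BY
  document_id
-- Note: documents with no cartItems (NULLs from the LEFT JOIN) will yield
-- a single struct of NULL fields, which may need filtering.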
Questions
Hmmm... I don't think I even considered using that.
Answers
BigQuery has a maximum number of columns per table. I believe the official cloud tool follows the same method for auto-generating columns. Also agreed. For creating the views, we could take a similar approach.
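As a rough sketch of persisting such a query, standard BigQuery DDL can save it as a view. This is an assumption about one way to do it, not necessarily what the extension's script emits, and the names are illustrative:

-- Hypothetical: persist a generated changelog query as a BigQuery view.
CREATE OR REPLACE VIEW `my-project.my_dataset.my_collection_schema_changelog` AS
SELECT
  document_name,
  document_id,
  timestamp,
  operation,
  JSON_EXTRACT_SCALAR(data, '$.name') AS name
FROM
  `my-project.my_dataset.my_collection_raw_changelog`;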
[REQUIRED] Step 2: Extension name
This feature request is for extension:
firestore-bigquery-export
, and in particular for GENERATE_SCHEMA_VIEWS.md.
What feature would you like to see?
Support for arrays of different types. For example, some of the arrays in our project have maps inside them. A natural way to handle this would be to create a value field for each of the keys in the map. Even now, it's a bit strange that the array type doesn't specify whether it holds strings, numbers, booleans, etc.
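For illustration, typed extraction of each map key could look like the following (a hedged sketch: the field names come from the cartItems example above, the table name is illustrative, and the CASTs are standard BigQuery):

-- Sketch: extract each key of an unnested array element with an explicit type.
-- `item` is one element of JSON_EXTRACT_ARRAY(data, '$.cartItems').
SELECT
  JSON_EXTRACT_SCALAR(item, '$.productName') AS productName,           -- string
  CAST(JSON_EXTRACT_SCALAR(item, '$.quantity') AS INT64) AS quantity,  -- number
  CAST(JSON_EXTRACT_SCALAR(item, '$.isGift') AS BOOL) AS isGift        -- boolean
FROM
  `my-project.my_dataset.my_collection_raw_changelog`
LEFT JOIN
  UNNEST(JSON_EXTRACT_ARRAY(data, '$.cartItems')) AS item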
How would you use it?
To import and use BigQuery on Firestore objects that have arrays of nested maps.