feat: ignore time zone info when importing from external files #1341
I hereby agree to the terms of the GreptimeDB CLA
What's changed and what's your intention?
This PR allows users to import Parquet files whose schema carries explicit time zone info by ignoring that time zone info and treating the `i64` value as the time elapsed since the UNIX epoch.
Why is ignoring the time zone in an Arrow array correct?
According to "Timezone Aware Timestamp Parsing" in the arrow-rs documentation, the time zone info in the schema is an indicator of the desired time zone, but the underlying `i64` value is always adjusted to UTC.
That is to say, if we have a timestamp in seconds with value 28800 (8 * 60 * 60) and time zone "+08:00", what UTC time point does it represent? Is it "1970-01-01 00:00:00"? Actually no: it is "1970-01-01 08:00:00" in UTC and "1970-01-01 16:00:00" in CST (UTC+08:00).
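The arithmetic in the example above can be checked with a small sketch. It uses Python's standard `datetime` module only as a stand-in for illustration; it is not the Rust code in this PR, and the helper name `render` is hypothetical.

```python
from datetime import datetime, timedelta, timezone


def render(epoch_seconds: int, offset_hours: int = 0) -> str:
    """Format seconds since the UNIX epoch as wall-clock time at a fixed UTC offset.

    Hypothetical helper for illustration only.
    """
    tz = timezone(timedelta(hours=offset_hours))
    return datetime.fromtimestamp(epoch_seconds, tz).strftime("%Y-%m-%d %H:%M:%S")


# The stored value is always relative to the UNIX epoch in UTC.
value = 8 * 60 * 60  # 28800 seconds

utc_text = render(value)      # the instant rendered in UTC
cst_text = render(value, 8)   # the same instant rendered at +08:00

print(utc_text)  # 1970-01-01 08:00:00
print(cst_text)  # 1970-01-01 16:00:00
```

The offset changes only how the instant is rendered; the stored integer stays the same, which is why dropping the time zone from the schema does not alter the value being imported.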
Why is ignoring the time zone in the Parquet schema OK?
Parquet has an "isAdjustedToUTC" flag that indicates whether the INT96 value in Parquet is already adjusted to UTC, but arrow/parquet does not provide a method to read that flag. So we can only ignore the flag and treat all INT96 values in Parquet timestamp columns as timestamps relative to UTC, just like
Checklist
Refer to a related PR or issue link (optional)
#1319