Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Supporting time zone #1319

Closed
3 of 5 tasks
v0y4g3r opened this issue Apr 4, 2023 · 6 comments
Closed
3 of 5 tasks

Supporting time zone #1319

v0y4g3r opened this issue Apr 4, 2023 · 6 comments
Assignees
Labels
C-documentation Improvements or additions to documentation IN THIS REPO C-feature Category Features
Milestone

Comments

@v0y4g3r
Copy link
Contributor

v0y4g3r commented Apr 4, 2023

What problem does the new feature solve?

Currently GreptimeDB supports parsing timestamps with time zone, but does not support time zone in data types, nor interpreting UTC timestamp values into user's local time.

arrow-schema's DataType::Timestamp has an additional time zone field

What does the feature do?

Implementation challenges

The underlying representation of GreptimeDB's vectors are arrow's arrays. Unfortunately in arrow-array, Timestamp[...]Array's data type does not contain any time zone info, instead their time zones are hard coded to None, although timestamp types with different time zones are considered as different types, which will leads to "mismatch schema error" as in this thread.

@v0y4g3r v0y4g3r added the C-feature Category Features label Apr 4, 2023
@v0y4g3r v0y4g3r self-assigned this Apr 4, 2023
@v0y4g3r
Copy link
Contributor Author

v0y4g3r commented Apr 4, 2023

Another issue would be how we treat the timestamps in parquet which contain a int64 value along with a timezone, should we adjust the int64 value into UTC time zone?

According to Parquet's doc, timestamp values are either adjusted to UTC or not adjusted to UTC.

  • If the value is adjusted to UTC, then the int64 value represents physical time in UTC time zone;
  • If not, the timestamp is local time, we need to mannualy convert it to UTC according to the time zone info.

We may first support adjuested timestamps, since converting non-adjusted timestamps requires identifying the time zone info with numerous aliases.

@waynexia
Copy link
Member

waynexia commented Apr 4, 2023

Another issue would be how we treat the timestamps in parquet which contain a int64 value along with a timezone, should we adjust the int64 value into UTC time zone?

I think we can adjust all timestamp-related types to UTC, and stores the time zone as extra infos in metadata in the very beginning. Then all our storage and computing operations only need to care about one standard time zone. User provided time zone only works on the outermost user-facing level, like displaying the result.

@v0y4g3r
Copy link
Contributor Author

v0y4g3r commented Apr 4, 2023

Another issue would be how we treat the timestamps in parquet which contain a int64 value along with a timezone, should we adjust the int64 value into UTC time zone?

I think we can adjust all timestamp-related types to UTC, and stores the time zone as extra infos in metadata in the very beginning. Then all our storage and computing operations only need to care about one standard time zone. User provided time zone only works on the outermost user-facing level, like displaying the result.

Did a simple survery on how MySQL, PostgreSQL and InfluxDB handles time zones:

  • MySQL converts TIMESTAMP values from the current time zone to UTC for storage, and back from UTC to the current time zone for retrieval.
  • [PostgreSQL] For timestamp with time zone, the internally stored value is always in UTC (GMT).
  • By default, InfluxDB stores and returns timestamps in UTC.

Looks like storing timestamp in UTC and rendering to local time zone format according to system/session time zone is a general approach, but there is still a problem: a timestamp from the future may point to different UTC time when daylight saving time changes. Currently this may not be an issue though.

@v0y4g3r
Copy link
Contributor Author

v0y4g3r commented Apr 4, 2023

As per reading/writing parquet files, we can assume all timestamps in parquet files are adjusted to UTC time, thus we don't need to further converting the values. This should be documented in our docs.

@v0y4g3r v0y4g3r added the C-documentation Improvements or additions to documentation IN THIS REPO label Apr 4, 2023
@ArslanYM
Copy link

ArslanYM commented Apr 7, 2023

Is this issue fixed?

@v0y4g3r
Copy link
Contributor Author

v0y4g3r commented Apr 7, 2023

Is this issue fixed?

When #1341 is merged, we will be able to insert/import timestamps with time zones. The only caveat will be displaying timestamps in local time zone and parsing timestamps without time zone info in the local (system) time zone.

@fengjiachun fengjiachun added this to the v0.2 milestone Apr 12, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-documentation Improvements or additions to documentation IN THIS REPO C-feature Category Features
Projects
No open projects
Status: Done
Development

No branches or pull requests

4 participants