Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Timezone Aware Timestamp Parsing #3794

Closed
tustvold opened this issue Mar 2, 2023 · 5 comments · Fixed by #3795
Closed

Timezone Aware Timestamp Parsing #3794

tustvold opened this issue Mar 2, 2023 · 5 comments · Fixed by #3795
Assignees
Labels
arrow Changes to the arrow crate enhancement Any new improvement worthy of a entry in the changelog

Comments

@tustvold
Copy link
Contributor

tustvold commented Mar 2, 2023

Is your feature request related to a problem or challenge? Please describe what you are trying to do.

string_to_timestamp_nanos contains logic to parse timestamp-like string to nanoseconds since UTC epoch.

The semantics for this are well defined for timestamps including a timezone, e.g. 1997-01-31 09:26:56.123Z or 1997-01-31T09:26:56.123-05:00. However, the semantics get confused for timestamps of the form 1997-01-31T09:26:56.123

As pointed out by @MachaelLee on #3787 prior to #2814 timestamp string without a timezone would be interpreted as being in the system's local timezone, and this continues to be what the function docs state happens. This was changed in #2814 by @waitingkuo to instead be parsed in the UTC timezone.

There are at least three "correct" behaviours when parsing strings without an embedded timezone 1997-01-31T09:26:56.123 depending on the context

  • When parsing a user-provided timestamp, use the system Local timezone and convert back to UTC what it used to do
  • When parsing a string to a datatype without a timezone assume UTC what it currently does
  • When parsing a string to a Timestamp column with a timezone, should assume the timestamp is in the given Tz and convert back to UTC I think this is correct??

Describe the solution you'd like

Provide a function with the signature

pub fn string_to_datetime<T: Timezone>(t: &T, s: &str) -> Result<DateTime<T>>

This could then be used with Local, Utc, or Tz as appropriate.

We could then update string_to_timestamp_nanos to be something like

pub fn string_to_timestamp_nanos(s: &str) -> Result<i64, ArrowError> {
    to_timestamp_nanos(string_to_datetime(Utc, s)?.naive_utc())
}

And possibly deprecate it as it has rather confusing semantics

Describe alternatives you've considered

Additional context

@tustvold tustvold added the enhancement Any new improvement worthy of a entry in the changelog label Mar 2, 2023
@alamb
Copy link
Contributor

alamb commented Mar 2, 2023

pub fn string_to_datetime<T: Timezone>(t: &T, s: &str) -> Result<DateTime>

Makes sense to me

@waitingkuo what do you think?

@stuartcarnie
Copy link
Contributor

Am I correct this is an improvement to the API, and the semantics of interpreting a timestamp without an offset is still to interpret as UTC?

@tustvold
Copy link
Contributor Author

tustvold commented Mar 2, 2023

the semantics of interpreting a timestamp without an offset is still to interpret as UTC

This does not alter the current behaviour, however, #1936 tracks using this functionality to interpret such timestamps with respect to the datatype's timezone, if any

@alamb
Copy link
Contributor

alamb commented Mar 3, 2023

Proposed implementation in #3795

@tustvold tustvold self-assigned this Mar 4, 2023
tustvold added a commit that referenced this issue Mar 4, 2023
* Timezone aware timestamp parsing (#3794)

* Add further test

* Update arrow-cast/src/parse.rs

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>

---------

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
@tustvold tustvold added the arrow Changes to the arrow crate label Mar 10, 2023
@tustvold
Copy link
Contributor Author

label_issue.py automatically added labels {'arrow'} from #3795

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
arrow Changes to the arrow crate enhancement Any new improvement worthy of a entry in the changelog
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants