A TZ string can be ambiguous #1153

Open
pitdicker opened this issue Jun 23, 2023 · 13 comments

@pitdicker
Collaborator

Example problem

Consider this (extended) POSIX TZ string: CRAZY5SHORT,M12.5.0/50,0/2

  • The name of this timezone is CRAZY.
  • The offset from UTC is -05:00
  • The name of the timezone during daylight saving time is SHORT.
  • The offset during DST is missing, and is implied to be 1 hour more: -04:00
  • The base transition date from standard time to daylight saving time is the last (5) Sunday (0) of December (12).
  • The actual transition is 50 hours after 00:00 on that date, so two days later at 02:00.
  • The transition from daylight saving time to standard time is day 0 of the year, January 1st. The time is 02:00.

The transition dates work out to:

year   dst to std         std to dst
2022   2022-01-01 2:00    2022-12-27 2:00
2023   2023-01-01 2:00    2024-01-02 2:00
2024   2024-01-01 2:00    2024-12-31 2:00
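
The transition dates for this TZ string can be reproduced with a few lines of calendar arithmetic. This is a minimal sketch using only the standard library, not chrono's implementation. The helper names (`days_from_civil`, `civil_from_days`, `resolve`, `std_to_dst`, `dst_to_std`) are made up for illustration, and the two rules are hard-coded rather than parsed.

```rust
/// Days since 1970-01-01 for a proleptic Gregorian date.
fn days_from_civil(y: i64, m: i64, d: i64) -> i64 {
    let y = if m <= 2 { y - 1 } else { y };
    let (era, yoe) = (y.div_euclid(400), y.rem_euclid(400));
    let doy = (153 * ((m + 9) % 12) + 2) / 5 + d - 1; // day of year, counted from March 1st
    era * 146_097 + yoe * 365 + yoe / 4 - yoe / 100 + doy - 719_468
}

/// Inverse of `days_from_civil`: (year, month, day) for a day count.
fn civil_from_days(z: i64) -> (i64, i64, i64) {
    let z = z + 719_468;
    let (era, doe) = (z.div_euclid(146_097), z.rem_euclid(146_097));
    let yoe = (doe - doe / 1460 + doe / 36_524 - doe / 146_096) / 365;
    let doy = doe - (365 * yoe + yoe / 4 - yoe / 100);
    let mp = (5 * doy + 2) / 153;
    let d = doy - (153 * mp + 2) / 5 + 1;
    let m = if mp < 10 { mp + 3 } else { mp - 9 };
    (yoe + era * 400 + if m <= 2 { 1 } else { 0 }, m, d)
}

/// Applies an hour offset (possibly larger than 24) to a day count and
/// returns the resulting local (year, month, day, hour).
fn resolve(days: i64, hour: i64) -> (i64, i64, i64, i64) {
    let total_hours = days * 24 + hour;
    let (y, m, d) = civil_from_days(total_hours.div_euclid(24));
    (y, m, d, total_hours.rem_euclid(24))
}

/// Rule `M12.5.0/50`: last Sunday of December, plus 50 hours.
fn std_to_dst(year: i64) -> (i64, i64, i64, i64) {
    let dec31 = days_from_civil(year, 12, 31);
    let weekday = (dec31 + 4).rem_euclid(7); // 0 = Sunday; 1970-01-01 was a Thursday
    resolve(dec31 - weekday, 50)
}

/// Rule `0/2`: zero-based day 0 (January 1st) at 02:00.
fn dst_to_std(year: i64) -> (i64, i64, i64, i64) {
    resolve(days_from_civil(year, 1, 1), 2)
}
```

Calling `std_to_dst(2023)` yields a date in January 2024, which is the case that leaves 2023 without a transition to daylight saving time.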

Daylight saving time would start on December 27th, 2022.
The transition to standard time is on January 1st, 2023.
The first transition after that is also from daylight saving time to standard time on January 1st, 2024!

What is the offset from UTC during most of 2023?
→ The TZ string is ambiguous, we can't tell.

Why can the time of day be more than 24 hours?

POSIX allows times from 00:00:00 to 24:59:59.
RFC 8536 allows a TZ string in a TZif file to have times from -167:59:59 to 167:59:59. So up to a week before and after midnight of the transition date.

An example of when this can be needed to represent real-world transition rules comes from a man page:

Palestinian civil time, from 2012 onwards: EET-2EEST,M3.5.4/24,M9.3.6/145

2 hours ahead of UT in winter and 3 hours ahead in summer.
Changes at the end (24:00 local time) of the last Thursday in March and 01:00 local time on the Friday following the third Saturday in September (that is, the Friday falling between September 21 and September 27 inclusive).
The extended time-of-day "145", meaning 01:00 of the day six days after the nominal day, is only valid in the tzfile(5) variant of the System V syntax.
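
To make the extended time-of-day concrete: an hour value outside [0, 24) can be split into a whole-day shift and an ordinary hour of day with Euclidean division. The function names below are hypothetical, and the ranges only look at the hour field (minutes and seconds are ignored), so treat this as a sketch of the arithmetic rather than a parser.

```rust
/// Splits an extended transition hour into (day shift, hour of day).
/// Euclidean division keeps the hour of day in 0..24 for negative input too.
fn split_extended_hour(hour: i32) -> (i32, i32) {
    (hour.div_euclid(24), hour.rem_euclid(24))
}

/// Hour field accepted by plain POSIX (00 through 24).
fn in_posix_hour_range(hour: i32) -> bool {
    (0..=24).contains(&hour)
}

/// Hour field accepted by the RFC 8536 extension (-167 through 167).
fn in_rfc8536_hour_range(hour: i32) -> bool {
    (-167..=167).contains(&hour)
}
```

With this, `split_extended_hour(145)` gives a shift of six days and 01:00, matching the man page's description of `M9.3.6/145`.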

A year can end up with more than two transition dates

It is possible to write a date rule in 3 ways:

  • Jn: Ordinal which skips February 29th (a 'Julian day').
  • n: Ordinal in the range [0, 365] (a 'zero-based Julian day').
  • Mm.n.d: Month, day of the week, and n'th occurrence of that day in the month (where n = 5 means the last day of the week that is in that month).

Two cases where it depends on the year whether a date falls in the current year or the next:

  • An ordinal of 365 maps to December 31st in leap years, and in non-leap years to January 1st of the next year.
  • M12.5.d will result in a date in the last week of December, potentially on the last day. Combining it with a time >= 24 hours (allowed by POSIX) may push the date into the next year.
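
The difference between the two ordinal forms, and the way a zero-based ordinal of 365 rolls over, can be sketched as follows. The function names are illustrative only, and both functions return a zero-based offset from January 1st instead of a full date to keep the example short.

```rust
fn is_leap(year: i64) -> bool {
    (year % 4 == 0 && year % 100 != 0) || year % 400 == 0
}

/// `Jn` with n in [1, 365]: February 29th is never counted, so in leap
/// years every day from March 1st (n = 60) onwards shifts by one.
/// Returns the zero-based offset from January 1st of `year`.
fn julian_day_offset(year: i64, n: i64) -> i64 {
    if is_leap(year) && n >= 60 { n } else { n - 1 }
}

/// `n` with n in [0, 365]: leap days are counted. Returns the year the
/// date falls in and the zero-based offset from January 1st of that year.
fn zero_based_day(year: i64, n: i64) -> (i64, i64) {
    let days_in_year = if is_leap(year) { 366 } else { 365 };
    if n >= days_in_year { (year + 1, n - days_in_year) } else { (year, n) }
}
```

For example, `zero_based_day(2023, 365)` lands on January 1st of 2024, while `zero_based_day(2024, 365)` stays on December 31st of 2024.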

Problem 1: Our functions to map a local time to UTC don't expect to encounter more than two transition dates per year. But UTC to local time can handle it (I think?)

When can the transition dates switch order?

If the two dates (including time) are close together, within 1 week of each other.

  • Suppose one of the dates is defined with Mm.n.d. It can resolve to 7 different dates. If the other date is a fixed ordinal, it is easy to make a TZ string where the dates switch order depending on the year.
  • The same is possible when combining a date with the 4th occurrence in the month and a date with the last occurrence in the month.
  • Mixing an ordinal that ignores a leap day with a regular ordinal.

That the time of the transition can be negative or more than 24 hours makes detecting a TZ string with this ambiguity extra difficult.
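
A small sketch of the first case, assuming a hypothetical pair of rules: `M3.2.0` (second Sunday of March, which falls somewhere between March 8th and 14th) and `J70` (always March 11th). The helper names are invented for this example, and `nth_weekday_of_month` only handles n from 1 to 4, not the "last occurrence" case n = 5.

```rust
/// Days since 1970-01-01 for a proleptic Gregorian date.
fn days_from_civil(y: i64, m: i64, d: i64) -> i64 {
    let y = if m <= 2 { y - 1 } else { y };
    let (era, yoe) = (y.div_euclid(400), y.rem_euclid(400));
    let doy = (153 * ((m + 9) % 12) + 2) / 5 + d - 1; // day of year, counted from March 1st
    era * 146_097 + yoe * 365 + yoe / 4 - yoe / 100 + doy - 719_468
}

/// Day of the month of the n'th (1 to 4) occurrence of `weekday`
/// (0 = Sunday) in the given month.
fn nth_weekday_of_month(year: i64, month: i64, n: i64, weekday: i64) -> i64 {
    let first = days_from_civil(year, month, 1);
    let first_weekday = (first + 4).rem_euclid(7); // 1970-01-01 was a Thursday
    1 + (weekday - first_weekday).rem_euclid(7) + 7 * (n - 1)
}

/// True if the `M3.2.0` date comes before the `J70` date (March 11th).
fn m_rule_comes_first(year: i64) -> bool {
    nth_weekday_of_month(year, 3, 2, 0) < 11
}
```

In 2023 the second Sunday of March is the 12th, after March 11th; in 2024 it is the 10th, before March 11th. The two transition dates therefore swap order between those years.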

Problem 2: we assume a TZ string never causes ambiguous cases.

Possible solutions

Option 1: 'never look beyond the current year'.

This does not make all that much sense to me. What makes the year boundary so special?
But as we are dealing with a nonsensical timezone specification, it is okay-ish to give bogus answers.

Option 2: detect weird TZ strings and return an error.

This is the approach in #789. The validation in that PR is quite involved.
An optimization there might be to resolve the two dates for some random year and test whether they are within a week of each other. Or more than 358 days apart (year boundary stuff 😞).
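
The shortcut described above might look roughly like this. It is only a heuristic sketch, not the validation from #789: the function name is hypothetical, the inputs are the two transition dates resolved for one sample year and expressed as zero-based days of that year, and the exact thresholds are an assumption based on the wording above.

```rust
/// Flags a pair of transition dates as suspicious when they are within a
/// week of each other, either directly or across the year boundary.
fn looks_suspicious(day_a: i64, day_b: i64) -> bool {
    let gap = (day_a - day_b).abs();
    gap <= 7 || gap > 358
}
```

A pair such as January 1st (day 0) and December 27th (day 360) is flagged, while a typical March/November pair is not.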

Option 3: detect ambiguous cases during conversion.

During the conversion to/from local time we already have two transition dates for the current year. It is fast to check if they are less than a week apart, and then calculate the transition dates for the preceding or following year. Only if the datetime falls in an ambiguous period would this return LocalResult::Ambiguous.

This could work for local-to-utc, but we assume utc-to-local is never ambiguous. So not a solution.

Is this worth fixing?

At the moment this is just a dark, unspecified corner of chrono 😄.
It is used if the TZ environment variable sets a TZ string, or when it is included in the TZif file of the current timezone.

I am playing with the idea of exposing this functionality in a public type DstRule (or something like that). It would be a fourth choice besides Utc, Local and FixedOffset.
I think it would be very useful for library users when writing unit tests to detect DST transition problems (and for us for the same reason).
And having an easy way to specify timezones that are often good enough seems useful, especially on platforms that don't have Local or a timezone database.

Whatever we do, it should be consistent and mentioned in the documentation.

@pitdicker
Collaborator Author

#789 implements option 2, but in my opinion the description there doesn't do it justice.

cc @x-hgg-x You probably put a lot of thought into this already.

@pitdicker
Collaborator Author

pitdicker commented Jun 23, 2023

Option 4: detect ambiguous cases during conversion, return standard offset.

The description of a TZ string describes a 'standard timezone' with offset, and an optional 'alternative timezone' with offset (which is used during daylight saving time).

I propose to detect ambiguous cases during conversion like option 3, and in the rare ambiguous cases to assume the 'standard timezone'.

To phrase it more clearly: 'when transitions cause the period in between them to be ambiguous, assume that period to be in standard time'.
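
One possible reading of this rule, sketched under assumptions: the transitions of the previous, current and next year are collected into one list sorted by UTC time, and a period counts as ambiguous when the transitions on both sides of it point the same way. The `Kind` enum and `is_dst` function are hypothetical and do not exist in chrono.

```rust
#[derive(Clone, Copy, Debug, PartialEq)]
enum Kind {
    ToStd,
    ToDst,
}

/// `transitions` holds (UTC timestamp, kind) pairs sorted by timestamp.
/// Returns true only when `t` lies between a ToDst transition and the
/// following ToStd transition. A period enclosed by two transitions of
/// the same kind is treated as ambiguous and assumed to be standard time.
fn is_dst(transitions: &[(i64, Kind)], t: i64) -> bool {
    // Index of the first transition strictly after `t`.
    let idx = transitions.partition_point(|&(at, _)| at <= t);
    let before = idx.checked_sub(1).map(|i| transitions[i].1);
    let after = transitions.get(idx).map(|&(_, kind)| kind);
    match (before, after) {
        (Some(Kind::ToDst), Some(Kind::ToStd)) | (Some(Kind::ToDst), None) => true,
        _ => false,
    }
}
```

Applied to the CRAZY5SHORT example, the long stretch of 2023 between the two January 1st transitions to standard time would be reported as standard time.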

@x-hgg-x

x-hgg-x commented Jun 23, 2023

Problem 1 is ok if the transition dates never switch order, since we also check previous and next year transitions when converting UTC to local time:

// Check DST start/end Unix times for previous/current/next years to support for transition day times outside of [0h, 24h] range

In #789, I have implemented an exhaustive validation check for the extra rule of a timezone, so that the assumptions I made in the other parts of the code are upheld (see https://github.com/chronotope/chrono/pull/789/files#diff-92d44e11f46c889256447f824f3c9fb4964ba2f3092727e8b0a4251a4e236ff5R196-R198 for example).

@x-hgg-x

x-hgg-x commented Jun 23, 2023

I think it is better to check the timezone once when loading it, rather than doing the check each time we need to do a UTC-to-local conversion.

@pitdicker
Collaborator Author

@x-hgg-x Thank you for replying this quickly!

Problem 1 is ok if the transition dates never switch order, since we also check previous and next year transitions when converting UTC to local time:

But we don't check it yet when converting from local time to UTC.

@pitdicker
Collaborator Author

pitdicker commented Jun 24, 2023

Problem 3: a transition date falls in a gap created by another transition date.

A third way to make a mess with transition dates 😇 :
Transition date 1 creates a gap in local time, for example by jumping the offset from UTC from +2:00 to +3:00.
Transition date 2 has the same date, and a time right in the gap.
In theory transition date 2 doesn't exist.

@x-hgg-x

x-hgg-x commented Jun 24, 2023

But we don't check it yet when converting from local time to UTC.

This differs between chrono and tz-rs.

When converting local time to UTC, since chrono doesn't keep the offset associated to the local time in the NaiveDateTime structure, it must scan the whole timezone to retrieve the UTC time, and so the resulting time can be ambiguous.

This is done in the TimeZoneRef::find_local_time_type_from_local() method, which corresponds to the find_date_time function in tz-rs. Since this method was written independently and was not taken from tz-rs like the other code in the tz_info module, it doesn't have any tests (unlike tz-rs), so I cannot guarantee that the implementation is correct.
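
To illustrate why a local time without an offset can map to zero, one or two UTC instants, here is a sketch for a single transition. It is not the code of `find_local_time_type_from_local()`; the `Resolved` enum and `local_to_utc` function are hypothetical, and all values are plain seconds.

```rust
#[derive(Debug, PartialEq)]
enum Resolved {
    None,
    Single(i64),
    Ambiguous(i64, i64),
}

/// Maps a local timestamp to UTC around one transition.
/// `offset_before` and `offset_after` are the UTC offsets in seconds in
/// effect before and from `transition_utc` onwards.
fn local_to_utc(local: i64, transition_utc: i64, offset_before: i64, offset_after: i64) -> Resolved {
    // Candidate instants under each of the two offsets.
    let with_before = local - offset_before;
    let with_after = local - offset_after;
    // A candidate is valid only if it lies on the matching side of the transition.
    let before_ok = with_before < transition_utc;
    let after_ok = with_after >= transition_utc;
    match (before_ok, after_ok) {
        (true, true) => Resolved::Ambiguous(with_before, with_after),
        (true, false) => Resolved::Single(with_before),
        (false, true) => Resolved::Single(with_after),
        (false, false) => Resolved::None,
    }
}
```

A jump from +2:00 to +3:00 leaves a gap in which a local time resolves to `Resolved::None`; the reverse jump produces an overlap in which it resolves to `Resolved::Ambiguous`.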

@x-hgg-x

x-hgg-x commented Jun 24, 2023

A third way to make a mess with transition dates 😇: Transition date 1 creates a gap in local time, for example by jumping the offset from UTC from +2:00 to +3:00. Transition date 2 has the same date, and a time right in the gap. In theory transition date 2 doesn't exist.

Yes there are many cases where the TZ string doesn't make any sense. This is why I chose to invalidate the timezone in these cases in tz-rs.

Note that we can still have this situation with normal transitions in a valid timezone, but since the transitions are specified with UTC timestamps, the corresponding UTC offset is never ambiguous if we know the time since epoch.

@pitdicker
Collaborator Author

Yes there are many cases where the TZ string doesn't make any sense. This is why I chose to invalidate the timezone in these cases in tz-rs.

I am interested, do you know more cases?

@x-hgg-x

x-hgg-x commented Jun 24, 2023

I am interested, do you know more cases?

The invalid TZ strings are those that cause the transition dates to switch order for a particular year (your problem 2). Conversely, if we can exclude them, all remaining TZ strings are valid.

This is why my validation check in #789 is complex, because it guarantees that a valid TZ string cannot switch transition dates order for any year.

@pitdicker
Collaborator Author

The solution to problem 3 turns out to be simple: it goes away when you switch the two transition dates.
And if you convert the transition dates to UTC before sorting them there is no problem at all.

I'll make this a test case at some point.
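
A sketch of that idea, with made-up numbers: each transition is given as a local time in seconds together with the UTC offset in effect just before it, converted to UTC, and only then sorted. The function name `utc_order` is hypothetical.

```rust
/// Takes (local time, UTC offset in effect before the transition) pairs,
/// both in seconds, and returns the transition instants in UTC, sorted.
fn utc_order(transitions: &[(i64, i64)]) -> Vec<i64> {
    let mut utc: Vec<i64> = transitions
        .iter()
        .map(|&(local, offset_before)| local - offset_before)
        .collect();
    utc.sort();
    utc
}
```

With transition 1 at 02:00 local under +2:00 and transition 2 at 02:30 local under +3:00, the UTC instants are 00:00 and 23:30 of the previous day. Transition 2 therefore happens first, and no transition falls into a gap.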

@pitdicker
Collaborator Author

pitdicker commented Jun 25, 2023

RFC 8536 has an interesting TZ string: EST5EDT,0/0,J365/25:

  • DST is considered to be in effect all year if it starts January 1 at 00:00 and ends December 31 at 24:00 plus the difference between daylight saving and standard time, leaving no room for standard time in the calendar.

Example: EST5EDT,0/0,J365/25
This represents a time zone that observes daylight saving time all year. It is 4 hours west of UT and is abbreviated "EDT".

If a country is in daylight saving time the whole year, how do you specify that?
Like this, with the start of daylight saving time January 1 and the end December 31 at 24:00.

The time at the end date of the example is wrong however.
POSIX says "Each time field describes when, in current local time, the change to the other time is made." So the time should not be "24:00 plus the difference between daylight saving and standard time", but just 24 hours.

In this example if your implementation is of the type 'never look beyond the current year', the whole year is in DST.
If you take transition dates in adjacent years into account, almost the whole year would be in standard time: daylight saving time would start January 1 at 00:00, and end January 1 at 01:00, 25 hours after December 31 at 00:00 of the previous year. The rest of the current year would be in standard time.

@x-hgg-x

x-hgg-x commented Jun 26, 2023

The time at the end date of the example is wrong however. POSIX says "Each time field describes when, in current local time, the change to the other time is made." So the time should not be "24:00 plus the difference between daylight saving and standard time", but just 24 hours.

No, the example is correct. The "current local time, when the change to the other time is made" corresponds to the time before the transition.

Here are the transitions described in the example TZ string EST5EDT,0/0,J365/25 for two consecutive years:

  • Year N:
    • A switch from EST UTC-5 to EDT UTC-4 at Year N, January 1 00:00, EST UTC-5.
    • A switch from EDT UTC-4 to EST UTC-5 at Year N, December 31 25:00, EDT UTC-4, corresponding to Year N+1, January 1 01:00, EDT UTC-4 or Year N+1, January 1 00:00, EST UTC-5.
  • Year N+1:
    • A switch from EST UTC-5 to EDT UTC-4 at Year N+1, January 1 00:00, EST UTC-5.
    • A switch from EDT UTC-4 to EST UTC-5 at Year N+1, December 31 25:00, EDT UTC-4, corresponding to Year N+2, January 1 01:00, EDT UTC-4 or Year N+2, January 1 00:00, EST UTC-5.

We can see we spend no time in the EST UTC-5 timezone between the second transition of the year N and the first transition of the year N+1, so we have the EDT UTC-4 timezone all year.
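
The same conclusion can be checked numerically by converting both transitions to UTC. This sketch hard-codes the two rules of `EST5EDT,0/0,J365/25` and uses hypothetical helper names; it works in hours since 1970-01-01 00:00 UTC.

```rust
/// Days since 1970-01-01 for a proleptic Gregorian date.
fn days_from_civil(y: i64, m: i64, d: i64) -> i64 {
    let y = if m <= 2 { y - 1 } else { y };
    let (era, yoe) = (y.div_euclid(400), y.rem_euclid(400));
    let doy = (153 * ((m + 9) % 12) + 2) / 5 + d - 1; // day of year, counted from March 1st
    era * 146_097 + yoe * 365 + yoe / 4 - yoe / 100 + doy - 719_468
}

/// Rule `0/0`: January 1st at 00:00, read in EST (UTC-5), so add 5 hours.
fn dst_start_utc_hours(year: i64) -> i64 {
    days_from_civil(year, 1, 1) * 24 + 0 + 5
}

/// Rule `J365/25`: December 31st at 25:00, read in EDT (UTC-4), so add 4 hours.
fn dst_end_utc_hours(year: i64) -> i64 {
    days_from_civil(year, 12, 31) * 24 + 25 + 4
}
```

The end of DST in year N and the start of DST in year N+1 resolve to the same UTC hour, so the standard-time period between them has zero length.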

This case works in chrono because we check for the previous and next year transitions:

// Check DST start/end Unix times for previous/current/next years to support for transition day times outside of [0h, 24h] range
