Normative: specify time zone ID requirements to reduce divergence between engines #877

Open: wants to merge 3 commits into main

@justingrant (Contributor) commented Mar 31, 2024

This proposed change resolves #825 by adding normative spec text to clarify how ECMA-402 implementations should decide which IANA time zone IDs should be primary vs. non-primary. This will enable more consistency between ECMAScript implementations and prevent future divergence.

This PR also accommodates web reality by aligning ECMA-402 with CLDR and ICU, which should make it easier for all ECMAScript engines to comply with the spec while still using ICU.

This PR is stacked on #876, so please ignore the first commit when reviewing this PR.

Note that the problem of out-of-date primary IDs for renamed Zones (like Asia/Calcutta) is out of scope for this PR, because the spec already requires current IDs and there's already a plan to fix it that requires no spec changes.

Summary of proposed changes

This PR implements "Option C" in #825 by deterministically defining ECMAScript's exceptions from the IANA Time Zone Database's defaults, and then pointing implementers at ICU as a convenient implementation of those exceptions.

We'll start with a baseline of IANA's Zone and Link names and specify a few exceptions:

  1. Existing special-cases for Etc/UTC, Etc/GMT, and GMT will be retained.
  2. Primary and non-primary IDs must share the same ISO 3166-1 alpha-2 country code; time zone merges between countries will not be allowed. Note that CLDR and therefore ICU already meet this requirement.
  3. IANA's merges of time zones within a country (e.g. Asia/Chongqing=>Asia/Shanghai) will be allowed. Like (2) above, CLDR and ICU already meet this requirement.
  4. Legacy POSIX identifiers, which are mostly non-primary identifiers already, will be fully deprecated into non-primary identifiers. The 9 remaining POSIX identifiers that are currently primary will be mapped to the most similar non-POSIX time zones e.g. PST8PDT=>America/Los_Angeles or CET=>Europe/Berlin.
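A rough sketch of how exceptions 1-4 could layer on top of IANA's own Zone/Link resolution. Everything here is illustrative: primaryIdentifier is a made-up name, the tables are tiny hypothetical excerpts, only the two POSIX mappings quoted above are included, mapping "Etc/UTC", "Etc/GMT", and "GMT" to "UTC" is assumed from the existing ECMA-402 special case, and keeping a cross-country Link as its own primary is a simplification.

```javascript
// Hypothetical excerpts; a real implementation would load full TZDB data.
// IANA Link name -> Zone name.
const ianaLinks = new Map([
  ["Asia/Chongqing", "Asia/Shanghai"],
  ["America/Blanc-Sablon", "America/Puerto_Rico"],
  ["GMT", "Etc/GMT"],
]);

// Exception 1 (assumed to match the existing ECMA-402 special case).
const utcAliases = new Set(["Etc/UTC", "Etc/GMT", "GMT"]);

// Exception 4: only the two mappings quoted in this PR; the other 7 are omitted.
const posixReplacements = new Map([
  ["PST8PDT", "America/Los_Angeles"],
  ["CET", "Europe/Berlin"],
]);

// Hypothetical ISO 3166-1 alpha-2 lookup used for exceptions 2 and 3.
const countryOf = new Map([
  ["Asia/Chongqing", "CN"],
  ["Asia/Shanghai", "CN"],
  ["America/Blanc-Sablon", "CA"],
  ["America/Puerto_Rico", "PR"],
]);

function resolveIana(id) {
  return ianaLinks.get(id) ?? id; // Zone names resolve to themselves
}

function primaryIdentifier(id) {
  if (utcAliases.has(id)) return "UTC"; // exception 1
  if (posixReplacements.has(id)) return posixReplacements.get(id); // exception 4
  const zone = resolveIana(id);
  if (utcAliases.has(zone)) return "UTC";
  // Exception 2: no merges across countries (simplified: keep the Link primary).
  if (countryOf.get(id) !== countryOf.get(zone)) return id;
  // Exception 3: merges within one country follow IANA.
  return zone;
}
```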

This PR also makes two other smaller normative text changes that we expect to have zero impact on current engines:

  • Changes the recommendations in #876 (Editorial: align time zone identifier text and AOs with ECMA-262) into requirements that limit observable dynamic updating of TZDB info. AFAIK, no existing ECMAScript engine updates TZDB (observably or otherwise!) during the lifetime of the surrounding agent, so this is really a future-compatibility change.
  • Adds text requiring a two-year waiting period before a newly-renamed ID becomes primary. These are rare (the last was 2022's Europe/Kiev=>Europe/Kyiv) and when the next rename happens, we'll try to convince CLDR to implement this requirement for us. There are no pending renames, so this is also a future-compatibility change.
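One way to express the proposed two-year waiting period. The helper names are hypothetical, the rename date used below is a placeholder rather than the actual TZDB release date, and measuring the period as exactly two calendar years in UTC is an assumption.

```javascript
// Hypothetical: a renamed ID becomes primary only two years after the rename.
function renamedIdIsPrimary(renameDate, today) {
  const threshold = new Date(renameDate.getTime());
  threshold.setUTCFullYear(threshold.getUTCFullYear() + 2);
  return today >= threshold; // Dates compare by their time values
}

// Returns whichever of the old/new IDs should currently be primary.
function primaryForRename(oldId, newId, renameDate, today) {
  return renamedIdIsPrimary(renameDate, today) ? newId : oldId;
}

const placeholderRenameDate = new Date(Date.UTC(2022, 7, 10)); // placeholder date
console.log(primaryForRename("Europe/Kiev", "Europe/Kyiv", placeholderRenameDate, new Date()));
```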

Per-engine changes required

Implementing the changes in this PR will impact JS engines differently, given the current divergence between engines:

  • For V8 (cc @FrankYFTang) and JSC (cc @Constellation), requirements 1-3 above already match how these engines behave, and (4) should be simple to implement. This PR doesn't affect the existing plan to fix out-of-date canonicalizations like Asia/Calcutta and Europe/Kiev, which remains unchanged: as part of shipping Temporal for Stage 4, switch to ICU's new icu::TimeZone::getIanaID(), which returns the latest IANA IDs instead of out-of-date canonical IDs like Asia/Calcutta.

  • For SpiderMonkey (cc @anba), more changes are needed because SpiderMonkey currently conforms to the spec, which requires using TZDB's backward file to determine canonicalization. SM could use icu::TimeZone::getIanaID() to implement (2) and (3) above, or could implement the same behavior by reading CLDR or IANA data directly. This PR will also reduce SM's differences from V8/JSC in Intl.supportedValuesOf('timeZone').
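To see the current divergence directly, a small probe like the following could be run in each engine. It only uses standard Intl APIs (Intl.supportedValuesOf and Intl.DateTimeFormat's resolvedOptions()); probeCanonicalization is a hypothetical name, the ID list is just a selection from this PR, and results vary by engine and ICU/TZDB version, so no particular output is claimed.

```javascript
// Report how the running engine treats a few IDs discussed in this PR.
function probeCanonicalization(ids) {
  const supported = new Set(Intl.supportedValuesOf("timeZone"));
  return ids.map((id) => {
    let resolved;
    try {
      resolved = new Intl.DateTimeFormat("en", { timeZone: id }).resolvedOptions().timeZone;
    } catch (e) {
      resolved = null; // the engine rejected the ID (RangeError)
    }
    return { id, resolved, listedAsSupported: supported.has(id) };
  });
}

console.table(
  probeCanonicalization(["Asia/Calcutta", "Asia/Kolkata", "Europe/Kiev", "PST8PDT", "Asia/Chongqing"])
);
```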

Testing

Test262 changes will be needed to validate these normative changes, but I'm not sure how we can run those tests except using the Temporal polyfill. @ptomato I'll be looking for your advice (and perhaps help writing tests!) on this point.

Feedback requested

Feedback is welcome on any part of this proposal, but I'm most interested in making sure that the spec text actually accomplishes what the summary above claims that it does.

@sffc (Contributor) commented Apr 2, 2024

Thanks for putting this together @justingrant! We can discuss this at the next TG2 meeting. In the meantime, I encourage the listed reviewers to take a look.

@ptomato (Contributor) commented Apr 3, 2024

> Test262 changes will be needed to validate these normative changes, but I'm not sure how we can run those tests except using the Temporal polyfill.

It's fine to submit tests to test262 that no engine can pass yet, as long as they are correct according to the current snapshot of ECMA-262 or 402 or a Stage 3 proposal.

@ptomato (Contributor) left a comment

I don't feel qualified to review the details — I mostly haven't followed those discussions. At a general level, this all looks very reasonable.

spec/locales-currencies-tz.html: 2 review threads (outdated, resolved)
@justingrant force-pushed the time-zone-id-requirements branch 4 times, most recently from c78de1d to 3bb6dee (April 3, 2024 01:48)
spec/locales-currencies-tz.html: 2 review threads (outdated, resolved)
@gibson042 (Contributor) left a comment

Mostly editorial comments, but I am not comfortable with making the rename waiting period mandatory.

spec/locales-currencies-tz.html: 10 review threads (outdated, resolved)
@FrankYFTang (Contributor) commented:

> 4. e.g. PST7PDT=>America/Los_Angeles

I assume you meant to say "e.g. PST8PDT=>America/Los_Angeles", not "PST7PDT=>America/Los_Angeles", right?
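For reference on the PST7PDT/PST8PDT question: in the POSIX TZ convention, the number in these legacy names is the standard-time offset in hours west of UTC, so PST8PDT means UTC-8, while a "7" would describe Mountain time. A minimal illustrative parser, handling only names of this simple shape (not full POSIX TZ strings); the function name is hypothetical.

```javascript
// Parses names like "PST8PDT" or "MST7MDT" and returns the standard-time
// UTC offset in hours. POSIX counts hours west of UTC, hence the negation.
function posixStandardOffsetHours(name) {
  const match = /^([A-Z]{3,})(\d{1,2})([A-Z]{3,})?$/.exec(name);
  if (match === null) throw new RangeError(`unsupported name: ${name}`);
  return -Number(match[2]);
}

console.log(posixStandardOffsetHours("PST8PDT"));
```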

@sffc moved this from Priority Issues to Previously Discussed in ECMA-402 Meeting Topics on Apr 25, 2024
This PR resolves tc39#825 by adding spec text that defines how ECMA-402
implementations should decide which IANA time zone IDs should be
primary vs. non-primary.

This PR implements "Option C" in tc39#825 by deterministically defining
ECMAScript's exceptions from the IANA Time Zone Database's defaults,
and then pointing implementers at ICU as a convenient implementation
of those exceptions.

This PR also accommodates web reality by aligning the 402 spec text
with the existing behavior of ICU.

This PR is stacked on top of tc39#876.
@justingrant force-pushed the time-zone-id-requirements branch 4 times, most recently from b4a630c to 28d8a5a (May 22, 2024 06:11)
@justingrant (Contributor, Author) commented:

I just pushed a new commit that includes what I think resolves all review feedback. @gibson042 @sffc (and anyone else who's interested) do you want to re-review?

@gibson042 (Contributor) left a comment

10 editorial suggestions and one request for better specification.

spec/locales-currencies-tz.html: 7 review threads (6 outdated; all resolved)
Comment on lines +339 to +340
1. Let _countryCodeLine_ be the line in file <code>zone.tab</code> of the IANA Time Zone Database where the "country-code" column is _identifierCountryCode_.
1. Set _primary_ to the contents of the "TZ" column of _countryCodeLine_.
Contributor commented:

Suggested change (replacing the two lines above):
1. Let _countryCodeLine_ be the line in file <code>zone.tab</code> of the IANA Time Zone Database where the country-code column is _identifierCountryCode_.
1. Set _primary_ to the contents of the “TZ” column of _countryCodeLine_.

Contributor (Author) replied:

Specs are required to use smart quote characters?

1. If _identifierCountryCode_ is _zoneCountryCode_, then
1. Set _primary_ to _zone_.
1. Else,
1. Let _countryCodeLine_ be the line in file <code>zone.tab</code> of the IANA Time Zone Database where the "country-code" column is _identifierCountryCode_.
Contributor commented:

As of right now, 31 codes have more than one line in zone.tab:

$ curl -sSL 'https://github.com/eggert/tz/raw/main/zone.tab' \
  | awk '{ if (match($1, "^[A-Z][A-Z]$")) print $1; }' \
  | sort \
  | uniq -c \
  | awk '$1 > 1 { print; i++; } END { print i }'
     10 AQ
     12 AR
     12 AU
     16 BR
     23 CA
      2 CD
      3 CL
      2 CN
      2 CY
      2 DE
      2 EC
      3 ES
      3 FM
      4 GL
      4 ID
      3 KI
      7 KZ
      2 MH
      3 MN
     12 MX
      2 MY
      2 NZ
      3 PF
      2 PG
      2 PS
      3 PT
     26 RU
      2 UA
      2 UM
     29 US
      2 UZ
31

For example, America/Blanc-Sablon (CA) is a Link to America/Puerto_Rico (PR), but CA has a total of 23 lines in zone.tab (including one that is specifically for America/Blanc-Sablon), and similar arrangements affect regions as populous as Africa/Kinshasa and Asia/Kuala_Lumpur. So this algorithm is underspecified. What is the actual intent?
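The same count can be reproduced outside the shell, which also shows why "the line in zone.tab" needs a uniqueness check. This is a sketch: zoneTabSample is an abbreviated, hypothetical excerpt in zone.tab's tab-separated layout (country-code, coordinates, TZ) with dummy coordinates, and the helper names are made up.

```javascript
// Abbreviated, hypothetical zone.tab excerpt; coordinates are dummies.
const zoneTabSample = [
  "# comment lines start with #",
  "CA\t+0000+00000\tAmerica/Toronto",
  "CA\t+0000+00000\tAmerica/Blanc-Sablon",
  "CA\t+0000+00000\tAmerica/Vancouver",
  "SJ\t+0000+00000\tArctic/Longyearbyen",
].join("\n");

// Group TZ names by country code, skipping comments like the awk filter does.
function linesByCountry(zoneTabText) {
  const byCode = new Map();
  for (const line of zoneTabText.split("\n")) {
    if (line.startsWith("#") || line.trim() === "") continue;
    const [code, , tz] = line.split("\t");
    if (!/^[A-Z]{2}$/.test(code)) continue;
    if (!byCode.has(code)) byCode.set(code, []);
    byCode.get(code).push(tz);
  }
  return byCode;
}

// "The line where country-code is X" is only well defined for one row.
function uniqueZoneTabEntry(byCode, code) {
  const tzs = byCode.get(code) ?? [];
  if (tzs.length !== 1) {
    throw new Error(`${code} has ${tzs.length} zone.tab lines; "the line" is ambiguous`);
  }
  return tzs[0];
}
```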

Contributor (Author) replied:

Ahh, shoot. Yeah, we'll need a better solution, thanks for catching this.

The problem to solve here is multi-hop Links like A => B => C, where A and B are in the same country but C is in a different country. This happens when an earlier TZDB release created the A => B Link and a later release folded B into C, so the Link that used to be A => B was changed to A => C (along with a new Link B => C).

There are 15 cases like this, conveniently split into three groups of 5 each.

Case 1: A is in zone.tab

  • America/Kralendijk ⇨ America/Curacao ⇨ America/Puerto_Rico
  • America/Lower_Princes ⇨ America/Curacao ⇨ America/Puerto_Rico
  • America/Marigot ⇨ America/Port_of_Spain ⇨ America/Puerto_Rico
  • America/St_Barthelemy ⇨ America/Port_of_Spain ⇨ America/Puerto_Rico
  • Arctic/Longyearbyen ⇨ Europe/Oslo ⇨ Europe/Berlin

Case 2: A's country code has only one line in zone.tab

  • America/Virgin ⇨ America/St_Thomas ⇨ America/Puerto_Rico
  • Atlantic/Jan_Mayen ⇨ Europe/Oslo ⇨ Europe/Berlin
  • Iceland ⇨ Atlantic/Reykjavik ⇨ Africa/Abidjan
  • Africa/Asmera ⇨ Africa/Asmara ⇨ Africa/Nairobi
  • Africa/Timbuktu ⇨ Africa/Bamako ⇨ Africa/Abidjan

Case 3: A's country code has multiple lines in zone.tab (broken case in current PR's spec text)

  • (FM) Pacific/Ponape ⇨ Pacific/Pohnpei ⇨ Pacific/Guadalcanal
  • (FM) Pacific/Truk ⇨ Pacific/Chuuk ⇨ Pacific/Port_Moresby
  • (FM) Pacific/Yap ⇨ Pacific/Chuuk ⇨ Pacific/Port_Moresby
  • (CA) America/Coral_Harbour ⇨ America/Atikokan ⇨ America/Panama
  • (AQ) Antarctica/South_Pole ⇨ Antarctica/McMurdo ⇨ Pacific/Auckland

For that last case, the only way to know what B was is to look in the backzone file in TZDB. The old Links are listed there.

I'll draft some spec text to handle that case and will let you know once I push it, because I'm sure it will require a round or two of polishing. I suspect the cleanest solution will be to just use backzone for both the second and third cases above, and to remove the complex spec text that handles the second case using zone.tab.

Thanks again for catching this.
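A sketch of the Case 3 idea, assuming an implementation has both the current Links and backzone's older Links available: start from A and follow backzone hops only while they stay in A's country. The maps contain just two chains from the lists above, the country codes are taken from those lists, and sameCountryPrimary is a hypothetical name.

```javascript
// Older Links as recorded in backzone (A => B), for two chains from this thread.
const backzoneLinks = new Map([
  ["Pacific/Truk", "Pacific/Chuuk"],
  ["Antarctica/South_Pole", "Antarctica/McMurdo"],
]);
// Current Links (A => C and B => C).
const currentLinks = new Map([
  ["Pacific/Truk", "Pacific/Port_Moresby"],
  ["Pacific/Chuuk", "Pacific/Port_Moresby"],
  ["Antarctica/South_Pole", "Pacific/Auckland"],
  ["Antarctica/McMurdo", "Pacific/Auckland"],
]);
const countryCode = new Map([
  ["Pacific/Truk", "FM"],
  ["Pacific/Chuuk", "FM"],
  ["Pacific/Port_Moresby", "PG"],
  ["Antarctica/South_Pole", "AQ"],
  ["Antarctica/McMurdo", "AQ"],
  ["Pacific/Auckland", "NZ"],
]);

function sameCountryPrimary(id) {
  const zone = currentLinks.get(id) ?? id;
  if (countryCode.get(zone) === countryCode.get(id)) return zone; // no cross-country hop
  // Walk the historical chain, keeping the last hop still in id's country.
  let candidate = id;
  let next = backzoneLinks.get(candidate);
  while (next !== undefined && countryCode.get(next) === countryCode.get(id)) {
    candidate = next;
    next = backzoneLinks.get(candidate);
  }
  return candidate;
}
```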

Contributor (Author) commented:

@gibson042 I just pushed a solution to the problem you found, as well as fixing (I think) all the other review feedback you had except the quote characters. Want to review the latest?

For the cases above, it turned out that there wasn't a single algorithmic solution possible, so I used the existing PR's algorithm for Case 2 above, and used backzone for Case 3.

Sadly, we can't use backzone for both cases 2 and 3 because a single A => B => C chain in backzone requires the zone.tab algorithm. The problematic identifier is Atlantic/Jan_Mayen. This ID is linked to Europe/Oslo (country code "NO") in backzone, but it should be linked to Arctic/Longyearbyen which has the same country code ("SJ") as Atlantic/Jan_Mayen.

Let me know what you think of the latest commit.
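The split described in this comment (zone.tab for a country with a single row, backzone otherwise) could be sketched as follows, including why backzone alone fails for Atlantic/Jan_Mayen. The data is limited to two examples from this thread, FM's three zone.tab rows are assumed for illustration, and the function name is hypothetical.

```javascript
// Toy data: backzone's historical Links and the country of each ID.
const backzoneLinks = new Map([
  ["Atlantic/Jan_Mayen", "Europe/Oslo"],
  ["Pacific/Truk", "Pacific/Chuuk"],
]);
const countryCodes = new Map([
  ["Atlantic/Jan_Mayen", "SJ"],
  ["Europe/Oslo", "NO"],
  ["Pacific/Truk", "FM"],
  ["Pacific/Chuuk", "FM"],
]);
// Assumed zone.tab rows per country (FM's rows are illustrative).
const zoneTabRows = new Map([
  ["SJ", ["Arctic/Longyearbyen"]],
  ["FM", ["Pacific/Chuuk", "Pacific/Pohnpei", "Pacific/Kosrae"]],
]);

// Intended only for IDs whose current Link target is in another country.
function primaryForCrossCountryLink(id) {
  const rows = zoneTabRows.get(countryCodes.get(id)) ?? [];
  if (rows.length === 1) return rows[0]; // Case 2: unique zone.tab row
  return backzoneLinks.get(id); // Case 3: historical Link from backzone
}

// Backzone alone would pick Europe/Oslo (NO), outside Jan_Mayen's country (SJ).
console.log(countryCodes.get(backzoneLinks.get("Atlantic/Jan_Mayen")));
```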

Comment on lines 330 to 340
1. Let _zone_ be the Zone name that _identifier_ resolves to, according to the rules for resolving Link names in the IANA Time Zone Database.
1. If _zone_ starts with *"Etc/"*, then
1. Set _primary_ to _zone_.
1. Else,
1. Let _identifierCountryCode_ be the <a href="https://www.iso.org/glossary-for-iso-3166.html">ISO 3166-1 Alpha-2</a> country code whose territory contains the geographical area corresponding to _identifier_.
1. Let _zoneCountryCode_ be the ISO 3166-1 Alpha-2 country code whose territory contains the geographical area corresponding to _zone_.
1. If _identifierCountryCode_ is _zoneCountryCode_, then
1. Set _primary_ to _zone_.
1. Else,
1. Let _countryCodeLine_ be the line in file <code>zone.tab</code> of the IANA Time Zone Database where the "country-code" column is _identifierCountryCode_.
1. Set _primary_ to the contents of the "TZ" column of _countryCodeLine_.
Contributor commented:

This whole block can be simplified a bit.

Suggested change (replacing the quoted lines above):
1. Set _primary_ to the Zone name that _identifier_ resolves to, according to the rules for resolving Link names in the IANA Time Zone Database.
1. If _primary_ does not start with *"Etc/"*, then
1. Let _identifierCountryCode_ be the <a href="https://www.iso.org/glossary-for-iso-3166.html">ISO 3166-1 Alpha-2</a> country code whose territory contains the geographical area corresponding to _identifier_.
1. Let _zoneCountryCode_ be the ISO 3166-1 Alpha-2 country code whose territory contains the geographical area corresponding to _primary_.
1. If _identifierCountryCode_ is not _zoneCountryCode_, then
1. Let _countryCodeLine_ be the line in file <code>zone.tab</code> of the IANA Time Zone Database where the “country-code” column is _identifierCountryCode_.
1. Set _primary_ to the contents of the “TZ” column of _countryCodeLine_.
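A literal JavaScript transliteration of the suggested steps, which can help when checking them against real data. The three data sources are passed in as hypothetical callbacks, the toy data is abbreviated, and the function deliberately throws where the zone.tab step would be ambiguous, the issue raised in the earlier thread.

```javascript
// Transliteration of the suggested spec steps; data sources are injected.
function primaryTimeZoneIdentifier(identifier, { resolveLink, countryCodeOf, zoneTabRowsFor }) {
  // Set primary to the Zone name that identifier resolves to.
  let primary = resolveLink(identifier);
  // Zones starting with "Etc/" are used as-is.
  if (!primary.startsWith("Etc/")) {
    const identifierCountryCode = countryCodeOf(identifier);
    const zoneCountryCode = countryCodeOf(primary);
    if (identifierCountryCode !== zoneCountryCode) {
      // "The line in zone.tab" is only well defined if there is exactly one.
      const rows = zoneTabRowsFor(identifierCountryCode);
      if (rows.length !== 1) {
        throw new Error(`${rows.length} zone.tab rows for ${identifierCountryCode}; step is ambiguous`);
      }
      primary = rows[0];
    }
  }
  return primary;
}

// Toy data for two identifiers from this thread; CA's rows are abbreviated.
const toyData = {
  resolveLink: (id) =>
    ({ "America/Blanc-Sablon": "America/Puerto_Rico", "Arctic/Longyearbyen": "Europe/Berlin" })[id] ?? id,
  countryCodeOf: (id) =>
    ({ "America/Blanc-Sablon": "CA", "America/Puerto_Rico": "PR", "Arctic/Longyearbyen": "SJ", "Europe/Berlin": "DE" })[id],
  zoneTabRowsFor: (code) =>
    ({ SJ: ["Arctic/Longyearbyen"], CA: ["America/Toronto", "America/Blanc-Sablon"] })[code] ?? [],
};

console.log(primaryTimeZoneIdentifier("Arctic/Longyearbyen", toyData));
```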

@justingrant force-pushed the time-zone-id-requirements branch 6 times, most recently from 48ecaec to 0810433 (May 23, 2024 06:14)
Projects: ECMA-402 Meeting Topics (Previously Discussed)

Development: successfully merging this pull request may close this issue:
Should ECMA-402 spec text for time zone canonicalization refer to CLDR or to IANA as authoritative?

8 participants