Editorial: Canonicalization operation for calendar IDs #889

Open: 1 commit to be merged into base branch main

@ptomato (Contributor) commented May 8, 2024

AvailableCalendars should return all possible aliases, so that other places in the spec (e.g. in the future, validating a string calendar ID in Temporal) can use them to determine whether a given input value is valid. This input value can subsequently be canonicalized by another abstract operation, CanonicalizeCalendar.

In Intl.supportedValuesOf(), on the other hand, we should not return all possible aliases, so we filter them out using CanonicalizeCalendar before returning the list of AvailableCalendars codes as an array to the caller.
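
To illustrate the split described above, here is a minimal sketch, not spec text. The alias table and the helper names `canonicalizeCalendar`, `availableCalendars`, `supportedCalendars`, and `isValidCalendar` are hypothetical stand-ins for the abstract operations. Only the "islamicc" to "islamic-civil" mapping appears in this PR; the other alias entry and the calendar list are assumptions chosen for illustration.

```typescript
// Hypothetical alias table (alias -> canonical calendar type).
// Only "islamicc" -> "islamic-civil" is taken from this PR; the other entry is an assumption.
const CALENDAR_ALIASES: { [alias: string]: string } = {
  "islamicc": "islamic-civil",
  "ethiopic-amete-alem": "ethioaa",
};

// ASCII-only lowercasing, so non-ASCII characters are left untouched.
function asciiLowercase(s: string): string {
  return s.replace(/[A-Z]/g, (c) => c.toLowerCase());
}

// Stand-in for CanonicalizeCalendar: lowercase, then resolve a known alias.
function canonicalizeCalendar(id: string): string {
  const lower = asciiLowercase(id);
  return Object.prototype.hasOwnProperty.call(CALENDAR_ALIASES, lower)
    ? CALENDAR_ALIASES[lower]
    : lower;
}

// Stand-in for AvailableCalendars: every accepted identifier, aliases included.
// The default sort() for strings compares UTF-16 code units.
function availableCalendars(): string[] {
  return [
    "iso8601",
    "gregory",
    "islamic-civil",
    "islamicc",
    "ethioaa",
    "ethiopic-amete-alem",
  ].sort();
}

// Stand-in for the calendar branch of Intl.supportedValuesOf():
// keep only identifiers that are already in canonical form.
function supportedCalendars(): string[] {
  return availableCalendars().filter((id) => id === canonicalizeCalendar(id));
}

// Validation (e.g. a future Temporal caller) accepts aliases as well.
function isValidCalendar(input: string): boolean {
  return availableCalendars().indexOf(asciiLowercase(input)) !== -1;
}
```

In this sketch an alias such as "islamicc" passes validation but does not appear in the enumerated list, which is the behavior the two paragraphs above describe.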

See tc39/proposal-intl-enumeration#49. This is the part of that PR that I consider relevant for the future integration of Temporal. The time zone parts were already done as part of #876. If desired, I could implement the rest of that PR, adding CanonicalizeCollation, CanonicalizeCurrency, CanonicalizeNumberingSystem, and CanonicalizeUnit as well.

Closes: #726

@ben-allen added the editorial label (Involves an editorial fix) on May 16, 2024
Comment on lines +133 to +135
1. Let _canonical_ be CanonicalizeCalendar(_identifier_).
1. If _identifier_ is _canonical_, then
  1. Append _canonical_ to _list_.

Rather than introducing a calendar-specific operation, I'd like one that covers all "-u-…" keywords and is also used in ResolveLocale step 12.g.iii.

Suggested change
- 1. Let _canonical_ be CanonicalizeCalendar(_identifier_).
- 1. If _identifier_ is _canonical_, then
-   1. Append _canonical_ to _list_.
+ 1. Let _canonical_ be CanonicalizeUValue(*"ca"*, _identifier_).
+ 1. If _identifier_ is _canonical_, then
+   1. Append _identifier_ to _list_.
CanonicalizeUValue (
  _ukey_: a Unicode locale extension sequence key defined in <a href="https://unicode.org/reports/tr35/#Key_And_Type_Definitions_">Unicode Technical Standard #35 Part 1 Core, Section 3.6.1 Key and Type Definitions</a>,
  _uvalue_: a String,
): a String
1. Let _lowerValue_ be the ASCII-lowercase of _uvalue_.
1. Let _canonicalized_ be the String value resulting from canonicalizing _lowerValue_ as a value of key _ukey_ per <a href="https://unicode.org/reports/tr35/#processing-localeids">Unicode Technical Standard #35 Part 1 Core, Annex C LocaleId Canonicalization Section 5 Canonicalizing Syntax, Processing LocaleIds</a>.
1. NOTE: It is recommended that implementations use the 'u' extension data in <code>common/bcp47</code> provided by the Common Locale Data Repository (available at <a href="https://cldr.unicode.org/">https://cldr.unicode.org/</a>).
1. Return _canonicalized_.
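
The steps above could be sketched roughly as follows. This is an illustration under stated assumptions, not the proposed spec text: `canonicalizeUValue` and `U_VALUE_ALIASES` are hypothetical names, the table is a tiny hand-written excerpt rather than real CLDR `common/bcp47` data, and the sketch covers only case regularization and alias replacement, not the rest of UTS #35 canonicalization. The "ca" entry comes from this PR; the "ks" entry is an assumed example.

```typescript
// Hypothetical excerpt of per-key alias data: ukey -> (alias -> canonical value).
// The "ca" entry comes from this PR; the "ks" entry is an assumed example.
const U_VALUE_ALIASES: { [ukey: string]: { [alias: string]: string } } = {
  ca: { islamicc: "islamic-civil" },
  ks: { primary: "level1" },
};

function canonicalizeUValue(ukey: string, uvalue: string): string {
  // Step 1: ASCII-lowercase the value.
  const lowerValue = uvalue.replace(/[A-Z]/g, (c) => c.toLowerCase());

  // Step 2: replace an alias with its canonical form, looked up per key.
  const has = Object.prototype.hasOwnProperty;
  if (has.call(U_VALUE_ALIASES, ukey) && has.call(U_VALUE_ALIASES[ukey], lowerValue)) {
    return U_VALUE_ALIASES[ukey][lowerValue];
  }

  // Values with no alias entry are returned in lowercase, otherwise unchanged.
  return lowerValue;
}
```

Because the lookup is keyed by `ukey` first, a value that is an alias under one key is left alone under another, which is why a single operation can serve every "-u-…" keyword.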

): a List of Strings
</h1>
<dl class="header">
<dt>description</dt>
<dd>The returned List is sorted according to lexicographic code unit order, and contains unique canonical calendar types identifying the calendars for which the implementation provides the functionality of Intl.DateTimeFormat objects. The list must include *"iso8601"*.</dd>
<dd>The returned List is sorted according to lexicographic code unit order, and contains unique calendar types identifying the calendars for which the implementation provides the functionality of Intl.DateTimeFormat objects. The list must include *"iso8601"*.</dd>

Suggested change
- <dd>The returned List is sorted according to lexicographic code unit order, and contains unique calendar types identifying the calendars for which the implementation provides the functionality of Intl.DateTimeFormat objects. The list must include *"iso8601"*.</dd>
+ <dd>The returned List is sorted according to lexicographic code unit order, and contains unique calendar types in canonical form (<emu-xref href="#sec-calendar-types"></emu-xref>) identifying the calendars for which the implementation provides the functionality of Intl.DateTimeFormat objects, including their aliases (e.g., either both or neither of *"islamicc"* and *"islamic-civil"*). The List must include *"iso8601"*.</dd>
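
The constraints in the suggested description (code unit order, uniqueness, presence of "iso8601", and "either both or neither" for an alias pair) could be checked with something like the sketch below. `checkAvailableCalendars` and `ALIAS_PAIRS` are hypothetical names introduced for illustration, and the pair list contains only the one pair named in the suggestion.

```typescript
// Hypothetical alias pairs as [alias, canonical]; only this pair is named in the PR.
const ALIAS_PAIRS: Array<[string, string]> = [["islamicc", "islamic-civil"]];

// Returns a list of human-readable problems; an empty array means the list conforms.
function checkAvailableCalendars(list: string[]): string[] {
  const problems: string[] = [];

  // Strictly ascending order implies both sortedness and uniqueness.
  // Relational comparison of strings in ECMAScript compares UTF-16 code units.
  for (let i = 1; i < list.length; i++) {
    if (list[i - 1] >= list[i]) {
      problems.push("not strictly ascending at index " + i);
    }
  }

  if (list.indexOf("iso8601") === -1) {
    problems.push('missing "iso8601"');
  }

  // An alias and its canonical form must be present together or absent together.
  for (let i = 0; i < ALIAS_PAIRS.length; i++) {
    const hasAlias = list.indexOf(ALIAS_PAIRS[i][0]) !== -1;
    const hasCanonical = list.indexOf(ALIAS_PAIRS[i][1]) !== -1;
    if (hasAlias !== hasCanonical) {
      problems.push("only one of " + ALIAS_PAIRS[i][0] + " / " + ALIAS_PAIRS[i][1] + " is present");
    }
  }

  return problems;
}
```

For example, a list containing "islamic-civil" without "islamicc" would be reported, while a list containing both, or neither, would pass that check.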

Comment on lines +465 to +482

<emu-clause id="sec-canonicalizecalendar" type="abstract operation">
<h1>
CanonicalizeCalendar (
_id_: a String that is a calendar type,
): a String that is a calendar type
</h1>
<dl class="header">
<dt>description</dt>
<dd>
The returned String is the canonical and case-regularized form of _id_.
</dd>
</dl>
<emu-alg>
1. Return the string _id_ after performing the algorithm steps to replace Unicode extension values with their canonical form per <a href="https://unicode.org/reports/tr35/#Canonical_Unicode_Locale_Identifiers">Unicode Technical Standard #35 LDML § 3.2.1 Canonical Unicode Locale Identifiers</a>, treating _id_ as an `uvalue` production.
1. NOTE: For example, if _id_ is *"ISLAMICC"*, return *"islamic-civil"*.
</emu-alg>
</emu-clause>

Suggested change
- <emu-clause id="sec-canonicalizecalendar" type="abstract operation">
- <h1>
- CanonicalizeCalendar (
- _id_: a String that is a calendar type,
- ): a String that is a calendar type
- </h1>
- <dl class="header">
- <dt>description</dt>
- <dd>
- The returned String is the canonical and case-regularized form of _id_.
- </dd>
- </dl>
- <emu-alg>
- 1. Return the string _id_ after performing the algorithm steps to replace Unicode extension values with their canonical form per <a href="https://unicode.org/reports/tr35/#Canonical_Unicode_Locale_Identifiers">Unicode Technical Standard #35 LDML § 3.2.1 Canonical Unicode Locale Identifiers</a>, treating _id_ as an `uvalue` production.
- 1. NOTE: For example, if _id_ is *"ISLAMICC"*, return *"islamic-civil"*.
- </emu-alg>
- </emu-clause>

Labels: editorial (Involves an editorial fix)

Successfully merging this pull request may close these issues:
Specify canonicalization algorithms for Intl enumeration

3 participants