Editorial: Canonicalization operation for calendar IDs #889
base: main
AvailableCalendars should return all possible aliases, so that other places in the spec (e.g. in the future, validating a string calendar ID in Temporal) can use them to determine whether a given input value is valid. This input value can subsequently be canonicalized by another abstract operation, CanonicalizeCalendar.

In Intl.supportedValuesOf(), on the other hand, we should not return all possible aliases, so we filter them out using CanonicalizeCalendar before returning the list of AvailableCalendars codes as an array to the caller.

See tc39/proposal-intl-enumeration#49. This is the part of that PR that I consider relevant for the future integration of Temporal. The time zone parts were already done as part of tc39#876. If desired, I could implement the rest of that PR, adding CanonicalizeCollation, CanonicalizeCurrency, CanonicalizeNumberingSystem, and CanonicalizeUnit as well.

Closes: tc39#726
1. Let _canonical_ be CanonicalizeCalendar(_identifier_).
1. If _identifier_ is _canonical_, then
  1. Append _canonical_ to _list_.
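The filtering steps above can be sketched in JavaScript as follows. This is only an illustration under stated assumptions: `canonicalizeCalendar`, `availableCalendars`, and `supportedCalendarValues` are hypothetical stand-ins for the abstract operations, and the alias table is a small hand-written sample rather than real CLDR data.

```javascript
// Illustrative alias data only (assumption); an implementation would derive
// this from CLDR's common/bcp47 data instead of hard-coding it.
const CALENDAR_ALIASES = new Map([
  ["islamicc", "islamic-civil"],
  ["gregorian", "gregory"],
]);

// Hypothetical stand-in for the CanonicalizeCalendar abstract operation.
function canonicalizeCalendar(id) {
  // ASCII-lowercase only, so non-ASCII characters are left untouched.
  const lower = id.replace(/[A-Z]/g, (ch) => ch.toLowerCase());
  return CALENDAR_ALIASES.get(lower) ?? lower;
}

// Hypothetical stand-in for AvailableCalendars: the list includes aliases.
// The default sort compares strings by UTF-16 code units.
function availableCalendars() {
  return ["gregory", "gregorian", "islamic-civil", "islamicc", "iso8601"].sort();
}

// Mirrors the three steps: keep an identifier only if it is already canonical.
function supportedCalendarValues() {
  const list = [];
  for (const identifier of availableCalendars()) {
    const canonical = canonicalizeCalendar(identifier);
    if (identifier === canonical) {
      list.push(canonical);
    }
  }
  return list;
}

console.log(supportedCalendarValues());
// → ["gregory", "islamic-civil", "iso8601"]
```

Because the input list is already sorted and the filter only drops entries, the result stays sorted without a second sort.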
Rather than introducing a calendar-specific operation, I'd like one that covers all "-u-…" keywords and is also used in ResolveLocale step 12.g.iii.
- 1. Let _canonical_ be CanonicalizeCalendar(_identifier_).
- 1. If _identifier_ is _canonical_, then
-   1. Append _canonical_ to _list_.
+ 1. Let _canonical_ be CanonicalizeUValue(*"ca"*, _identifier_).
+ 1. If _identifier_ is _canonical_, then
+   1. Append _identifier_ to _list_.
CanonicalizeUValue (
_ukey_: a Unicode locale extension sequence key defined in <a href="https://unicode.org/reports/tr35/#Key_And_Type_Definitions_">Unicode Technical Standard #35 Part 1 Core, Section 3.6.1 Key and Type Definitions</a>,
_uvalue_: a String,
): a String
1. Let _lowerValue_ be the ASCII-lowercase of _uvalue_.
1. Let _canonicalized_ be the String value resulting from canonicalizing _lowerValue_ as a value of key _ukey_ per <a href="https://unicode.org/reports/tr35/#processing-localeids">Unicode Technical Standard #35 Part 1 Core, Annex C LocaleId Canonicalization Section 5 Canonicalizing Syntax, Processing LocaleIds</a>.
1. NOTE: It is recommended that implementations use the 'u' extension data in <code>common/bcp47</code> provided by the Common Locale Data Repository (available at <a href="https://cldr.unicode.org/">https://cldr.unicode.org/</a>).
1. Return _canonicalized_.
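As a rough sketch of how the proposed CanonicalizeUValue differs from a calendar-specific operation, the JavaScript below keys the alias lookup on the extension key. The function name `canonicalizeUValue` and the table `U_VALUE_ALIASES` are hypothetical, and the entries are a few illustrative samples, not a substitute for the CLDR `common/bcp47` data the NOTE recommends.

```javascript
// Illustrative sample entries (assumption), grouped by Unicode extension key.
// Real implementations would load this from CLDR common/bcp47.
const U_VALUE_ALIASES = {
  ca: new Map([["islamicc", "islamic-civil"]]),
  co: new Map([["phonebook", "phonebk"]]),
};

// Hypothetical stand-in for CanonicalizeUValue(ukey, uvalue).
function canonicalizeUValue(ukey, uvalue) {
  // Step 1: ASCII-lowercase the value.
  const lowerValue = uvalue.replace(/[A-Z]/g, (ch) => ch.toLowerCase());
  // Step 2: canonicalize as a value of the given key; unknown keys and
  // values without an alias entry pass through unchanged.
  const aliases = U_VALUE_ALIASES[ukey];
  const canonicalized = aliases?.get(lowerValue) ?? lowerValue;
  // Step 4: return the canonical form.
  return canonicalized;
}

console.log(canonicalizeUValue("ca", "ISLAMICC")); // → "islamic-civil"
console.log(canonicalizeUValue("ca", "iso8601")); // → "iso8601"
```

Keying on `ukey` matters because the same string could in principle be an alias under one key and a canonical value under another, which is why a single operation for all "-u-…" keywords takes the key as a parameter.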
): a List of Strings
</h1>
<dl class="header">
<dt>description</dt>
- <dd>The returned List is sorted according to lexicographic code unit order, and contains unique canonical calendar types identifying the calendars for which the implementation provides the functionality of Intl.DateTimeFormat objects. The list must include *"iso8601"*.</dd>
+ <dd>The returned List is sorted according to lexicographic code unit order, and contains unique calendar types identifying the calendars for which the implementation provides the functionality of Intl.DateTimeFormat objects. The list must include *"iso8601"*.</dd>
- <dd>The returned List is sorted according to lexicographic code unit order, and contains unique calendar types identifying the calendars for which the implementation provides the functionality of Intl.DateTimeFormat objects. The list must include *"iso8601"*.</dd>
+ <dd>The returned List is sorted according to lexicographic code unit order, and contains unique calendar types in canonical form (<emu-xref href="#sec-calendar-types"></emu-xref>) identifying the calendars for which the implementation provides the functionality of Intl.DateTimeFormat objects, including their aliases (e.g., either both or neither of *"islamicc"* and *"islamic-civil"*). The List must include *"iso8601"*.</dd>
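The constraints in the suggested description (sorted in code unit order, unique, includes *"iso8601"*, and aliases present together or not at all) could be checked with something like the sketch below. The helper names and the single alias pair are assumptions made for illustration; a real check would cover every alias pair in the implementation's data.

```javascript
// Illustrative alias pair taken from the example in the description;
// a complete check would enumerate all pairs (assumption).
const CALENDAR_ALIAS_PAIRS = [["islamicc", "islamic-civil"]];

// Strictly ascending by UTF-16 code units implies both sorted and unique.
function isSortedAndUnique(list) {
  return list.every((item, i) => i === 0 || list[i - 1] < item);
}

// "Either both or neither": an alias is present exactly when its
// canonical counterpart is present.
function hasConsistentAliases(list) {
  const present = new Set(list);
  return CALENDAR_ALIAS_PAIRS.every(
    ([alias, canonical]) => present.has(alias) === present.has(canonical)
  );
}

// Hypothetical validator for a candidate AvailableCalendars result.
function isValidAvailableCalendars(list) {
  return (
    isSortedAndUnique(list) &&
    hasConsistentAliases(list) &&
    list.includes("iso8601")
  );
}

console.log(isValidAvailableCalendars(["islamic-civil", "islamicc", "iso8601"])); // → true
console.log(isValidAvailableCalendars(["islamicc", "iso8601"])); // → false
```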
<emu-clause id="sec-canonicalizecalendar" type="abstract operation">
  <h1>
    CanonicalizeCalendar (
      _id_: a String that is a calendar type,
    ): a String that is a calendar type
  </h1>
  <dl class="header">
    <dt>description</dt>
    <dd>
      The returned String is the canonical and case-regularized form of _id_.
    </dd>
  </dl>
  <emu-alg>
    1. Return the string _id_ after performing the algorithm steps to replace Unicode extension values with their canonical form per <a href="https://unicode.org/reports/tr35/#Canonical_Unicode_Locale_Identifiers">Unicode Technical Standard #35 LDML § 3.2.1 Canonical Unicode Locale Identifiers</a>, treating _id_ as an `uvalue` production.
    1. NOTE: For example, if _id_ is *"ISLAMICC"*, return *"islamic-civil"*.
  </emu-alg>
</emu-clause>
- <emu-clause id="sec-canonicalizecalendar" type="abstract operation">
-   <h1>
-     CanonicalizeCalendar (
-       _id_: a String that is a calendar type,
-     ): a String that is a calendar type
-   </h1>
-   <dl class="header">
-     <dt>description</dt>
-     <dd>
-       The returned String is the canonical and case-regularized form of _id_.
-     </dd>
-   </dl>
-   <emu-alg>
-     1. Return the string _id_ after performing the algorithm steps to replace Unicode extension values with their canonical form per <a href="https://unicode.org/reports/tr35/#Canonical_Unicode_Locale_Identifiers">Unicode Technical Standard #35 LDML § 3.2.1 Canonical Unicode Locale Identifiers</a>, treating _id_ as an `uvalue` production.
-     1. NOTE: For example, if _id_ is *"ISLAMICC"*, return *"islamic-civil"*.
-   </emu-alg>
- </emu-clause>