Year of week-of-year support #1206

Merged Dec 4, 2021 (10 commits)

Contributor

@mildgravitas mildgravitas commented Oct 25, 2021

Implements year of week-of-year support for #488 and:

  • Adds a dedicated Year enum to datetime::options::components::Bag.
  • Loosens get_best_available_format_pattern() & adjust_pattern_field_lengths() to match on FieldSymbol enum variants but not their inner data. This is necessary for components::Year matching to work & also improves h12<->h23 coerced time patterns, which previously dropped the time zone.
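
To illustrate what "match on the enum variant but not its data" can look like, here is a minimal sketch using std::mem::discriminant. The enums below are simplified, hypothetical stand-ins and not the crate's actual FieldSymbol definition; same_field_kind is likewise an illustrative name.

```rust
use std::mem::discriminant;

// Hypothetical, simplified stand-ins for the crate's field symbol types.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Year {
    Calendar, // 'y'
    WeekOf,   // 'Y'
}

#[derive(Debug, Clone, Copy, PartialEq)]
pub enum TimeZone {
    LowerZ, // 'z'
    LowerV, // 'v'
}

#[derive(Debug, Clone, Copy, PartialEq)]
pub enum FieldSymbol {
    Year(Year),
    TimeZone(TimeZone),
    Minute,
}

/// Returns true when both symbols are the same kind of field,
/// ignoring the data carried inside the variant.
pub fn same_field_kind(a: &FieldSymbol, b: &FieldSymbol) -> bool {
    discriminant(a) == discriminant(b)
}
```

With these definitions, FieldSymbol::Year(Year::Calendar) and FieldSymbol::Year(Year::WeekOf) are unequal under == but count as the same kind, which is the property the loosened matching relies on.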

@zbraniecki
Member

@mildgravitas this PR will have to wait for #1198, which will bitrot it. Please wait for that to land and rebase on top.

@zbraniecki zbraniecki marked this pull request as draft October 25, 2021 18:38
@Manishearth
Member

(CI is failing due to a bug in CI that is fixed on main, feel free to ignore it)

@gregtatum
Member

I'm requested for review on here, and the PR is marked as draft. Are you looking for early feedback @mildgravitas or full review?

@Manishearth
Member

@gregtatum I think GitHub auto-requested review here? Usually when it says "as code owners" it's because GitHub picked reviewers automatically.

@jira-pull-request-webhook

Notice: the branch changed across the force-push!

  • components/datetime/src/fields/symbols.rs is different
  • components/datetime/src/format/datetime.rs is different
  • components/datetime/src/options/components.rs is different
  • components/datetime/src/skeleton/helpers.rs is different
  • components/datetime/src/skeleton/mod.rs is different
  • provider/testdata/data/json/datetime/gregory_lengths@1/fr.json is different
  • provider/testdata/data/json/datetime/gregory_lengths@1/ru.json is different
  • provider/testdata/data/json/datetime/gregory_lengths@1/th.json is different
  • provider/testdata/data/testdata.postcard is different

View Diff Across Force-Push

~ Your Friendly Jira-GitHub PR Checker Bot

@mildgravitas
Contributor Author

#1198 was merged & I've rebased onto 6fba6e5 -> switch back from draft.

@mildgravitas mildgravitas marked this pull request as ready for review October 29, 2021 15:40
Member

@zbraniecki zbraniecki left a comment

Left some comments, nothing major but would like to re-review based on the responses.

I'd also like @gregtatum to review the discriminant_idx and its impact on skeleton selection.

#[cfg_attr(feature = "serde", derive(Serialize, Deserialize))]
pub enum Year {
/// The numeric value of the year, such as "2021".
#[cfg_attr(feature = "serde", serde(rename = "numeric"))]
Member

question: why are you renaming it here?

Contributor Author

For compatibility with existing usage in test fixtures. I can remove the renames & update the latter if preferable.

Member

please do. ICU4X data shouldn't use renames to conform to anything external. And test fixtures should be updated when the values change.

Contributor Author

Done

/// The numeric value of the year in "week-of-year", such as "2021".
NumericWeekOf,
/// The two-digit value of the year in "week-of-year", such as "21".
TwoDigitWeekOf,
Member

issue: the TwoDigit and TwoDigitWeekOf and Numeric and NumericWeekOf comments are exactly the same. They don't help understand what's the difference and why we need both.

Contributor Author

They're not equivalent but the difference is subtle. I've added examples to make it more salient.

Member

@zbraniecki zbraniecki Nov 2, 2021

nit: Thanks, that's helpful, but the reader still has to derive an implicit information about the actual value. Could you extend the comments for NumericWeekOf and TwoDigitWeekOf to:

    /// The numeric value of the year in "week-of-year", such as "2019" of the "week 01 of 2019" for the
    /// week of 2018-12-31 according to the ISO calendar.
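
The distinction in that doc comment can be checked with a small, dependency-free sketch that computes the ISO 8601 week-year for a Gregorian date. It assumes pure ISO rules (weeks start on Monday and the week belongs to the year containing its Thursday); the function names are hypothetical and this is not how the crate itself computes weeks, which may also depend on locale data.

```rust
fn is_leap(y: i32) -> bool {
    (y % 4 == 0 && y % 100 != 0) || y % 400 == 0
}

/// 1-based ordinal day within the year.
fn day_of_year(y: i32, m: u32, d: u32) -> i32 {
    const CUM: [i32; 12] = [0, 31, 59, 90, 120, 151, 181, 212, 243, 273, 304, 334];
    let mut doy = CUM[(m - 1) as usize] + d as i32;
    if m > 2 && is_leap(y) {
        doy += 1;
    }
    doy
}

/// ISO weekday: Monday = 1 ... Sunday = 7 (Sakamoto's method).
fn iso_weekday(y: i32, m: u32, d: u32) -> i32 {
    const T: [i32; 12] = [0, 3, 2, 5, 0, 3, 5, 1, 4, 6, 2, 4];
    let y = if m < 3 { y - 1 } else { y };
    let w = (y + y / 4 - y / 100 + y / 400 + T[(m - 1) as usize] + d as i32) % 7;
    if w == 0 { 7 } else { w }
}

/// Returns (week-year, week number) under ISO 8601 rules.
pub fn iso_week_year(y: i32, m: u32, d: u32) -> (i32, i32) {
    // Ordinal day of the Thursday in the same ISO week; it may fall
    // before day 1 or after the last day of the calendar year.
    let thursday = day_of_year(y, m, d) + 4 - iso_weekday(y, m, d);
    let len = if is_leap(y) { 366 } else { 365 };
    if thursday < 1 {
        let prev_len = if is_leap(y - 1) { 366 } else { 365 };
        (y - 1, (thursday + prev_len - 1) / 7 + 1)
    } else if thursday > len {
        (y + 1, (thursday - len - 1) / 7 + 1)
    } else {
        (y, (thursday - 1) / 7 + 1)
    }
}
```

For 2018-12-31 (a Monday) this yields week-year 2019 and week 1, while the calendar year is still 2018, which is why a week-of-year variant of the year field is needed alongside the plain one.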

Contributor Author

Done

"full": "HH:mm:ss",
"long": "HH:mm:ss",
"full": "HH:mm:ss v",
"long": "HH:mm:ss v",
Member

question: why are you changing test data JSON files?

Contributor Author

Ah, that's a consequence of the discriminant_cmp() changes to skeleton matching: the changed patterns are 12<->24 hour translations of CLDR's full or long time patterns. The algorithm (from components/datetime/src/pattern/hour_cycle.rs) here is:

  1. Take the original pattern & swap 12<->24 hour symbols. e.g. 'h:mm:ss a zzzz' -> 'H:mm:ss a zzzz'.
  2. Turn the pattern into a skeleton. e.g. 'H:mm:ss a zzzz' -> 'Hmsv'.
  3. Call skeleton::create_best_pattern_for_fields() to find a matching pattern.

Previously 'v' & 'z' were considered distinct so create_best_pattern_for_fields() would settle for a partial match of 'Hmsv' against 'Hms'. Now it finds an exact match.

It's possible to cancel this change by altering discriminant_cmp() to compare the inner data for TimeZone, but given that this is arguably more correct I left it in.
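
A toy model of the matching step may help show why the result changes. The sketch below is an assumption-laden simplification, not the crate's create_best_pattern_for_fields(): skeletons are plain strings, best_skeleton and kind are hypothetical names, and the scoring simply prefers the longest available skeleton that introduces no field the request lacks.

```rust
/// Maps a skeleton symbol to a coarse kind. With `loose` set, the
/// time zone symbols 'z' and 'v' collapse into a single kind.
fn kind(symbol: char, loose: bool) -> char {
    match symbol {
        'z' | 'v' if loose => 'z',
        other => other,
    }
}

/// Picks the available skeleton that covers the most requested symbols
/// without introducing any symbol kind the request lacks.
pub fn best_skeleton<'a>(
    requested: &str,
    available: &[&'a str],
    loose: bool,
) -> Option<&'a str> {
    let wanted: Vec<char> = requested.chars().map(|c| kind(c, loose)).collect();
    available
        .iter()
        .copied()
        .filter(|s| s.chars().all(|c| wanted.contains(&kind(c, loose))))
        .max_by_key(|s| s.chars().count())
}
```

In this model, requesting "Hmsz" against the available skeletons "Hm", "Hms" and "Hmsv" settles for "Hms" when 'z' and 'v' are distinct, and finds "Hmsv" once they are treated as the same kind, so the time zone is no longer dropped.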

Member

Thanks for the fix!

@jira-pull-request-webhook

Notice: the branch changed across the force-push!

  • provider/testdata/data/json/datetime/gregory_lengths@1/es-AR.json is different
  • provider/testdata/data/json/datetime/gregory_lengths@1/es.json is different
  • provider/testdata/data/json/datetime/gregory_lengths@1/sr-Cyrl.json is different
  • provider/testdata/data/json/datetime/gregory_lengths@1/sr-Latn.json is different
  • provider/testdata/data/json/datetime/gregory_lengths@1/sr.json is different
  • provider/testdata/data/testdata.postcard is different

View Diff Across Force-Push

~ Your Friendly Jira-GitHub PR Checker Bot

@mildgravitas
Contributor Author

Rebased onto dc5ff31

zbraniecki previously approved these changes Nov 2, 2021
@zbraniecki
Member

This looks good to me - with the changes I requested.

I'd like @nordzilla or @gregtatum to review the skeleton's changes in particular before we merge this.

@jira-pull-request-webhook

Notice: the branch changed across the force-push!

  • provider/testdata/data/testdata.postcard is different

View Diff Across Force-Push

~ Your Friendly Jira-GitHub PR Checker Bot

@mildgravitas
Contributor Author

Rebased onto 0c855b1

Member

@nordzilla nordzilla left a comment

Looks good to me overall, though @gregtatum is certainly more of an authority on the skeletons and patterns here, so I'd still like to wait for his review.

I had a couple things I'd like to see changed with regard to style.

components/datetime/src/skeleton/helpers.rs (outdated, resolved)
components/datetime/benches/fixtures/tests/components.json (outdated, resolved)
components/datetime/src/options/components.rs (outdated, resolved)
@zbraniecki
Member

@mildgravitas - that looks good to me! Can you please move all components to use kebab-case? including the ones that are already there now with camel case (like Numeric, Text, Month, Week, TimeZoneName)?

zbraniecki previously approved these changes Nov 4, 2021
Member

@zbraniecki zbraniecki left a comment

lgtm! thank you!

@nordzilla nordzilla self-requested a review November 5, 2021 18:59
nordzilla previously approved these changes Nov 5, 2021
Member

@nordzilla nordzilla left a comment

LGTM, thank you!

@sffc sffc requested a review from gregtatum November 8, 2021 15:42
@jira-pull-request-webhook

Notice: the branch changed across the force-push!

  • components/datetime/src/format/datetime.rs is different
  • components/datetime/tests/datetime.rs is different
  • provider/testdata/data/testdata.postcard is different

View Diff Across Force-Push

~ Your Friendly Jira-GitHub PR Checker Bot

@jira-pull-request-webhook

Notice: the branch changed across the force-push!

  • components/datetime/src/format/datetime.rs is different
  • components/datetime/src/provider/date_time.rs is different
  • components/datetime/src/skeleton/mod.rs is different
  • components/datetime/tests/datetime.rs is different
  • provider/testdata/data/testdata.postcard is different

View Diff Across Force-Push

~ Your Friendly Jira-GitHub PR Checker Bot

@jira-pull-request-webhook

Notice: the branch changed across the force-push!

  • components/datetime/src/format/datetime.rs is different
  • components/datetime/src/provider/date_time.rs is different
  • components/datetime/src/skeleton/helpers.rs is different
  • components/datetime/src/skeleton/mod.rs is different
  • components/datetime/tests/datetime.rs is different
  • provider/testdata/data/json/datetime/gregory_lengths@1/ar-EG.json is no longer changed in the branch
  • provider/testdata/data/json/datetime/gregory_lengths@1/ar.json is no longer changed in the branch
  • provider/testdata/data/json/datetime/gregory_lengths@1/bn.json is no longer changed in the branch
  • provider/testdata/data/json/datetime/gregory_lengths@1/ccp.json is no longer changed in the branch
  • provider/testdata/data/json/datetime/gregory_lengths@1/en-001.json is no longer changed in the branch
  • provider/testdata/data/json/datetime/gregory_lengths@1/en-ZA.json is no longer changed in the branch
  • provider/testdata/data/json/datetime/gregory_lengths@1/en.json is no longer changed in the branch
  • provider/testdata/data/json/datetime/gregory_lengths@1/es-AR.json is no longer changed in the branch
  • provider/testdata/data/json/datetime/gregory_lengths@1/es.json is no longer changed in the branch
  • provider/testdata/data/json/datetime/gregory_lengths@1/fil.json is no longer changed in the branch
  • provider/testdata/data/json/datetime/gregory_lengths@1/fr.json is no longer changed in the branch
  • provider/testdata/data/json/datetime/gregory_lengths@1/ja.json is no longer changed in the branch
  • provider/testdata/data/json/datetime/gregory_lengths@1/ru.json is no longer changed in the branch
  • provider/testdata/data/json/datetime/gregory_lengths@1/sr-Cyrl.json is no longer changed in the branch
  • provider/testdata/data/json/datetime/gregory_lengths@1/sr-Latn.json is no longer changed in the branch
  • provider/testdata/data/json/datetime/gregory_lengths@1/sr.json is no longer changed in the branch
  • provider/testdata/data/json/datetime/gregory_lengths@1/th.json is no longer changed in the branch
  • provider/testdata/data/json/datetime/gregory_lengths@1/tr.json is no longer changed in the branch
  • provider/testdata/data/json/datetime/gregory_lengths@1/und.json is no longer changed in the branch
  • provider/testdata/data/json/datetime/lengths@1/gregory/ar-EG.json is now changed in the branch
  • provider/testdata/data/json/datetime/lengths@1/gregory/ar.json is now changed in the branch
  • provider/testdata/data/json/datetime/lengths@1/gregory/bn.json is now changed in the branch
  • provider/testdata/data/json/datetime/lengths@1/gregory/ccp.json is now changed in the branch
  • provider/testdata/data/json/datetime/lengths@1/gregory/en-001.json is now changed in the branch
  • provider/testdata/data/json/datetime/lengths@1/gregory/en-ZA.json is now changed in the branch
  • provider/testdata/data/json/datetime/lengths@1/gregory/en.json is now changed in the branch
  • provider/testdata/data/json/datetime/lengths@1/gregory/es-AR.json is now changed in the branch
  • provider/testdata/data/json/datetime/lengths@1/gregory/es.json is now changed in the branch
  • provider/testdata/data/json/datetime/lengths@1/gregory/fil.json is now changed in the branch
  • provider/testdata/data/json/datetime/lengths@1/gregory/fr.json is now changed in the branch
  • provider/testdata/data/json/datetime/lengths@1/gregory/ja.json is now changed in the branch
  • provider/testdata/data/json/datetime/lengths@1/gregory/ru.json is now changed in the branch
  • provider/testdata/data/json/datetime/lengths@1/gregory/sr-Cyrl.json is now changed in the branch
  • provider/testdata/data/json/datetime/lengths@1/gregory/sr-Latn.json is now changed in the branch
  • provider/testdata/data/json/datetime/lengths@1/gregory/sr.json is now changed in the branch
  • provider/testdata/data/json/datetime/lengths@1/gregory/th.json is now changed in the branch
  • provider/testdata/data/json/datetime/lengths@1/gregory/tr.json is now changed in the branch
  • provider/testdata/data/json/datetime/lengths@1/gregory/und.json is now changed in the branch
  • provider/testdata/data/testdata.postcard is different

View Diff Across Force-Push

~ Your Friendly Jira-GitHub PR Checker Bot

@jira-pull-request-webhook

Notice: the branch changed across the force-push!

  • provider/testdata/data/testdata.postcard is different

View Diff Across Force-Push

~ Your Friendly Jira-GitHub PR Checker Bot

…th adjustments for it.

To do this I've loosened get_best_available_format_pattern() to match on
FieldSymbol enums but not their data. From the function's greater/lesser
matching this is apparently what the function tried to do all along. Without
this Year::NumericWeekOf wouldn't match, as CLDR skeletons use 'y' even for
patterns with 'Y'.

As a side effect this improves full & long time_h11_h12/time_h23_h24
patterns: the h11_h12/h23_h24 coercion logic matches adjusted patterns
against skeletons & previously 'z' was not matched against 'v', leading
to the time zone being dropped.

If we don't care to expose the week-of-year variants in components::Bag
& don't care about coerced time patterns then only
adjust_pattern_field_lengths() needs to be adjusted.
@jira-pull-request-webhook

Notice: the branch changed across the force-push!

  • components/datetime/src/error.rs is different
  • components/datetime/src/fields/symbols.rs is different
  • components/datetime/src/format/datetime.rs is different
  • components/datetime/src/options/components.rs is different
  • components/datetime/src/provider/date_time.rs is different
  • components/datetime/src/skeleton/helpers.rs is different
  • components/datetime/src/skeleton/mod.rs is different
  • components/datetime/tests/datetime.rs is different
  • components/datetime/tests/fixtures/tests/components-width-differences.json is different
  • provider/testdata/data/json/datetime/lengths@1/gregory/th.json is different
  • provider/testdata/data/testdata.postcard is different

View Diff Across Force-Push

~ Your Friendly Jira-GitHub PR Checker Bot

@Manishearth
Member

@zbraniecki @gregtatum are there any leftover things to do to land this PR? I think we're just missing a final review.

@Manishearth
Member

@mildgravitas btw, we changed how the data is stored, so if you rebase again you probably will want to cargo make download-testdata and then cargo make testdata to generate the new data

@mildgravitas
Contributor Author

@Manishearth: Thanks for the heads up. Seems like I'm already top-of-tree from yesterday's rebase onto dc414a8. Running testdata-download && testdata produces no diffs so all good for the time being it seems.

@Manishearth
Member

Ah! I was looking at an older force-push-webhook comment which listed gregory_lengths etc. Yep, your latest push should be fine!

Member

@gregtatum gregtatum left a comment

It's looking good to me! Thanks for addressing my comments.

@Manishearth
Member

Going to hit merge since Zibi already gave an "r+ except for minor issues" review above and they seem to have been addressed; I'd rather not let this bitrot again 😄

@Manishearth Manishearth merged commit a0f78c5 into unicode-org:main Dec 4, 2021
@mildgravitas mildgravitas deleted the year_week_of branch December 7, 2021 16:22
5 participants