Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add CalendarDuration type #1290

Open
wants to merge 6 commits into
base: main
Choose a base branch
from

Conversation

pitdicker
Copy link
Collaborator

See #1282.

As a first PR this adds nothing but the type, methods to set and get the individual components, and a Display implementation.

We have 5 components: months, days, minutes, seconds and nanoseconds.
Example of ways to initialize a CalendarDuration:

use chrono::CalendarDuration;
// Duration in calendar months and days.
CalendarDuration::new().months(15).days(5);
// Duration in calendar days, and 3 hours.
CalendarDuration::new().days(5).hms(3, 0, 0).unwrap();
// Duration that will precisely count the number of seconds, including leap seconds.
CalendarDuration::new().seconds(23_456).millis(789).unwrap();

As described in #1282 (comment) we squeeze both the minutes and seconds components into one u32.
The idea is that it is too strange and niche to have two large components that only differ in how they count leap seconds.
So we either have:

  • a large number of minutes, and sub-minute seconds (which works almost like ignoring leap-seconds exist)
  • a large number of minutes (which accurately count leap seconds)

The Debug implementation will format the type as if it is a struct with fields for 5 components.

@pitdicker
Copy link
Collaborator Author

pitdicker commented Sep 14, 2023

For initialization the methods work much like the builder pattern, except that we don't need a seperate builder type.
We have methods to initialize the individual components as months(), days(), hms(), seconds() and nanos().
For convenience I added methods that initialize with other units, but with a name that hopefully helps to make clear on which component they act: years_and_months(), weeks_and_days(), micros() and millis().

I have intentionally added only methods that set a field, not methods that add some value to a field. And also not Add implementations. It is quite easy to create a CalendarDuration with a combination of components that will be near-impossible to explain. In my opinion the type should not be treated as black boxes that you can just add together. You should be aware of the existing components when changing anything.

To get the value of the individual components, we have not five but four methods (because minutes and seconds are closely related): months_component(), days_component(), mins_and_secs_component(), and nanos_component(). No convenience methods here.

@codecov
Copy link

codecov bot commented Sep 14, 2023

Codecov Report

Attention: Patch coverage is 92.24806% with 20 lines in your changes are missing coverage. Please review.

Project coverage is 91.81%. Comparing base (f8cecbe) to head (3739e2f).

Files Patch % Lines
src/calendar_duration.rs 92.24% 20 Missing ⚠️
Additional details and impacted files
@@           Coverage Diff            @@
##             main    #1290    +/-   ##
========================================
  Coverage   91.80%   91.81%            
========================================
  Files          37       38     +1     
  Lines       18151    18409   +258     
========================================
+ Hits        16664    16902   +238     
- Misses       1487     1507    +20     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link
Contributor

@djc djc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good start!

src/calendar_duration.rs Outdated Show resolved Hide resolved
// `seconds` can either encode `minutes << 6 | seconds`, or just seconds.
seconds: u32,
// `nanos` encodes `nanoseconds << 2 | has_minutes << 1 | 1`.
nanos: NonZeroU32,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd prefer to implement the boring version here and optimize memory usage later if we decide that is necessary. Since these are public fields anyway, there should be no need to prematurely optimize.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a little premature optimization going on. But my goal is more about having two clear modes of operation. My thought process went like this:

  1. Include a flag for 'I care about leap seconds'/'leap seconds don't exist'.
    However that is not really something we can serialize, and also not enough to make CalendarDuration at least in theory capable of working with leap seconds correctly.
  2. Have explicit minutes and seconds fields.
  3. Suppose we pre-multiply the minutes field with 60. This turns the fields into seconds_in_utc and seconds_in_tai.
    This is becoming a mess. Having two pretty much the same fields that only differ in how they count leap seconds within one duration seems unreasonable.
  4. Introduce two sane modes of operation:
    • arbitrary large minutes and only sub-minute seconds
    • arbitrary large seconds

To encode that with rust types:

pub struct CalendarDuration {
    // Components with a nominal duration
    months: u32,
    days: u32,
    // Components with an accurate duration
    accurate: AccurateDuration,
}

enum AccurateDuration {
    MinutesAndSubMinuteSecondsAndNanos(u32, u8, u32),
    SecondsAndNanos(u32, u32);
}

We can simplify that a bit if we manually uphold the relation between minutes and seconds fields to:

pub struct CalendarDuration {
    // Components with a nominal duration
    months: u32,
    days: u32,
    // Components with an accurate duration
    minutes: u32,
    seconds: u32,
    nanos: u32,
}

The idea is that if you have an existing CalendarDuration and change it with with_seconds, it should switch to the SecondsAndNanos mode. Previous minutes and seconds should be forgotten. And if minutes is non-zero, seconds must be less than 61.

If we already have to uphold a relation between minutes and seconds anyway, why not encode them in a single u32 and get the smaller type size for free? And if we don't do it now but want to keep the option for the optimization open for later, we should already include the range checks.

I have to admit stuffing the discriminant inside the nanos field, and using the other bit for niche optimization, is a premature optimization. Can we please keep it?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO that's way too much magic. I'd rather have the actual enum if we're going to do something like this.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added an MinutesAndSeconds type to encode this in.

And I switched to accurate part of a duration to a 64-bit type.
With a 32-bit value we could only store an accurate duration of ca. 136 years. With a 64-bit type CalendarDuration can cover all uses over std::time::Duration and TimeDelta (forgetting the negative part). With this type we also never have to make methods fallible that construct a duration from the difference between two dates; the result always fits.

}

/// Set the months component of this duration to `months`.
pub const fn months(mut self, months: u32) -> Self {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should call this with_months() and allow months() to be used for the getter (same for the other ones).

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Renamed. I was going for having short names for initialization, but there is something to say for making the getters have the short names.

src/calendar_duration.rs Outdated Show resolved Hide resolved
src/calendar_duration.rs Outdated Show resolved Hide resolved
src/calendar_duration.rs Outdated Show resolved Hide resolved
src/calendar_duration.rs Show resolved Hide resolved
src/calendar_duration.rs Outdated Show resolved Hide resolved
src/calendar_duration.rs Outdated Show resolved Hide resolved
src/calendar_duration.rs Outdated Show resolved Hide resolved
@djc
Copy link
Contributor

djc commented Sep 15, 2023

(Please avoid adding more stuff to this PR -- other than per review feedback.)

@pitdicker
Copy link
Collaborator Author

Setting to draft for now. I have most of the parser ready, but need to clean up things a bit more before it is ready a second look.

@pitdicker pitdicker force-pushed the calendar_duration branch 4 times, most recently from ceee9b7 to 1916234 Compare September 18, 2023 09:41
@pitdicker pitdicker marked this pull request as ready for review September 18, 2023 11:16
@pitdicker
Copy link
Collaborator Author

(Please avoid adding more stuff to this PR -- other than per review feedback.)

Sorry. I added two more things, but minor I think: #[must_use] to a few methods, and a test to check the setters correctly work with our-of-range values.

@pitdicker
Copy link
Collaborator Author

Started working on this again.
@djc are you sure you don't want to have the parsing code split out from this PR so we can merge a minimal part first?

@pitdicker pitdicker force-pushed the calendar_duration branch 5 times, most recently from 9bc4c86 to a6790d2 Compare March 24, 2024 11:39
@pitdicker
Copy link
Collaborator Author

Put a couple of hours into self-review and improving the initial documentation.

Copy link
Contributor

@djc djc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's split this into smaller chunks, first the CalendarDuration type itself (up to and including the Default commit, except squashing that) but removing some of the extra setters.

src/calendar_duration.rs Show resolved Hide resolved
src/calendar_duration.rs Outdated Show resolved Hide resolved
src/calendar_duration.rs Outdated Show resolved Hide resolved
src/calendar_duration.rs Outdated Show resolved Hide resolved
///
/// # Encoding
///
/// - Seconds: `seconds << 2 | 0b10`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How much memory is this stuff saving? It's not obvious to me that this is worth it.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My main motivation is to encode the either seconds or minutes and seconds cases.

Still I prefer to make the type not larger than necessary, especially if one of the goals is to replace the Days and Months types. An enum will add 8 bytes to a type of currently 20 bytes; 40%.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I spent some time thinking this over and would still prefer that we start with a "real" enum -- that doesn't rely on bit-shifting tricks. If we get an issue that complains about the size we can decide whether it's worth optimizing this. But it seems unlikely to me at this point that people will want to hold a ton of these in an Vec or similar.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With an enum we would still have most of the complexity.

  • In my opinion we want to keep the range checks in MinutesAndSeconds::from_seconds and from_minutes_and_seconds.
    The restricted range for minutes gives as the guarantee minutes * 60 + seconds does not overflow.
    The checks would be needed to allow for this 'optimization' in the future.
  • We still need the special case where we encode a value created with from_minutes_and_seconds(0, s) the same as from_seconds(s) so that they compare equal.
  • We still need a match to get both values in mins_and_secs().

Switching to an enum is not going to save many lines or maybe even none.

I just don't consider a couple of bitwise operations much of a problem and well worth it for the much nicer type size. Really don't want to drop them 😞.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How are we going to proceed here?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still think this is a textbook case of premature optimization and think we should drop the bitshifting for now.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have a difference of opinion here. If chrono gets a CalendarDuration as new functionality I want it to be as good as I can make it. Do you want to block the functionality on this detail?

src/calendar_duration.rs Outdated Show resolved Hide resolved
src/calendar_duration.rs Outdated Show resolved Hide resolved
src/calendar_duration.rs Show resolved Hide resolved
@pitdicker
Copy link
Collaborator Author

@djc You gave an approval, but want to have a final look?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants