Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Consider new helper methods for LocalResult #716

Open
botahamec opened this issue Jun 21, 2022 · 15 comments
Open

Consider new helper methods for LocalResult #716

botahamec opened this issue Jun 21, 2022 · 15 comments
Labels
API-incompatible Tracking changes that need incompatible API revisions
Milestone

Comments

@botahamec
Copy link
Contributor

From #687

Furthermore, while the LocalResult is of course correct, I think it would be nice to have some kind of "lossy" output. I would assume that most of the time there is only one result (as the DST transitions are mostly in the night anyway, where they have the least impact). Currently the user has to handle this situation every time, while most of the time the conversion will be correct. Anyway, I think it would be good for a time library to have some sensible default how to work with this, as not every user might know what to do in this case.

I checked on the Java time API, and they handle it like this (https://docs.oracle.com/javase/8/docs/api/java/time/LocalDateTime.html#atZone-java.time.ZoneId-):

In most cases, there is only one valid offset for a local date-time. In the case of an overlap, where clocks are set back, there are two valid offsets. This method uses the earlier offset typically corresponding to "summer".

In the case of a gap, where clocks jump forward, there is no valid offset. Instead, the local date-time is adjusted to be later by the length of the gap. For a typical one hour daylight savings change, the local date-time will be moved one hour later into the offset typically corresponding to "summer".

To obtain the later offset during an overlap, call ZonedDateTime.withLaterOffsetAtOverlap() on the result of this method. To throw an exception when there is a gap or overlap, use ZonedDateTime.ofStrict(LocalDateTime, ZoneOffset, ZoneId).

Why not create an additional method on LocalResult which returns something like the Java handling (not quite sure how other time libraries handle this situation)?

I like what we have already (the earliest and latest methods might just be as far as we want to go), but I wanted to see what people think about this. I don't think there's anything we could do without a breaking change. One idea is to have the implementor of the TimeZone interface provide a later or earlier time that would work.

@esheppa
Copy link
Collaborator

esheppa commented Jun 22, 2022

Thanks for adding this @botahamec - and thanks @gerritsangel for the original comments - I think this is a great idea.

As you mention this will be tricky without a breaking change, so its probably a 0.5 change, one option is we could change the LocalResult to a struct that keeps the original naive timestamp, and internally stores something like the current LocalResult. From here we could build multiple different ways to extract a DateTime potentially including the methodology listed above.

pub struct LocalResult<T> {
    naive_local: NaiveDateTime, 
    zoned: PotentiallyAmbiguous<T>,
}

enum PotentiallyAmbiguous<T> {
    None,
    Single(T),
    Ambiguous(T, T),
}

@djc
Copy link
Contributor

djc commented Jun 23, 2022

Why is this tricky without a breaking change? Why couldn't we add methods to today's definition of LocalResult?

@botahamec
Copy link
Contributor Author

@djc I think we would need to modify the TimeZone interface in order to support getting earlier and later valid times for the timezone. Something like

  1. Try to use 2:30 in some United States standard timezone on a specific day
  2. Returns LocalResult::None, because 2:30 is skipped that day (it just so happens you chose a date where DST changes)
  3. I want the next valid time
  4. The TimeZone interface figures out when that would be

@esheppa
Copy link
Collaborator

esheppa commented Jun 25, 2022

@djc - because in the None case of the current LocalResult we have lost the original local timestamp, which we need to complete a lossy conversion to a DateTime<Tz>

@botahamec - This is an interesting approach, we could leave LocalResult as it is and do the work by adding more functions to the TimeZone trait, essentially have a from_local_datetime_lossy which returns a DateTime<Tz> rather than a LocalResult<DateTime<Tz>>. If we can give this a default implementation then it also isn't a breaking change, so this is a pretty good reason to do this on the TimeZone trait. We could actually build this with what we have now just by using some heuristics like adding and subtrating a few hours to get the offsets either side of the given NaiveDateTime in the LocalResult::None case (figuring out a good choice of hours here would be important to capture unusual transitions)

@botahamec
Copy link
Contributor Author

@esheppa Interesting. I think we should try to do two methods though. One to find the next valid time, and one to find the previous valid time. I think to find we'd try to add an hour until we find a valid time, and then do a funky binary search to figure out the exact time we need. For most time zones, that should be fine because we're just worried about DST. But sometimes timezones will skip ahead by entire months, or someone will write a dummy time zone which always returns None (we'll also need to handle the case where the next valid time is ambiguous, for some reason). The binary search also doesn't work for a time zone where only the 10th and 30th seconds of each hour are valid.

Maybe if the next hour doesn't work, we try the next day, then the next month, then the next year, then 100 years, then we do the maximum possible date time. That won't work for a time zone where the first second of every hour is skipped.

The more I think about it, the harder it is for me to imagine that we could find a good default implementation.

@kevincox
Copy link
Contributor

This is biting me too. I want to schedule something at time X in a given day. However it needs to be in the users time zone and I need to convert to UTC for the actual scheduling. It works fine in most cases when there is no ambiguity everything is simple, when the time occurs twice I just pick the first one (good enough for my use case). However right now if the "correct" time occurs during a jump I can't find any way to get the right time. For my purposes I would want to schedule at the jump, for example if the cock jumps from 01:59 -> 03:00 I want to run at 03:00. However I can't find any way to get this timestamp with chrono. Right now the best solution I have found is just adding a small amount of time to the date and keep trying until I get a result but this isn't great and doesn't give the best result.

@botahamec
Copy link
Contributor Author

@kevincox For what it's worth, I think in that specific use case, you should probably create an error message saying that time doesn't exist on that day

Now that I think about it, I'm not quite sure what I would consider to be a good use case for this API

@kevincox
Copy link
Contributor

For my use case this is a repeating event and users would rather that it occurs on this day at the closest correct time instead of erroring out and missing a day.

@botahamec
Copy link
Contributor Author

@kevincox Ah, that makes sense. I can imagine that causing problems too, but that's probably a fair compromise.

@kevincox
Copy link
Contributor

Yeah, it isn't perfect for every use case but it works very well for my current problem.

@esheppa
Copy link
Collaborator

esheppa commented Jul 30, 2022

@kevincox - if you are using chrono-tz then can solve most cases by using user_tz.offset().base_utc_offset() and user_tz.offset().dst_offset() to get the possible offsets to more easily calculate the closest valid local time, this is not perfect as it won't capture less common timezone changes but you can fall back to your existing method in cases when this doesn't work.

That being said, in 0.5 we will try to find a more ergonomic way to to this, as well as some helper functions for the most common closest-valid-time calculations.

@esheppa
Copy link
Collaborator

esheppa commented Aug 1, 2022

Hi @kevincox - on a further review of the code I think my previous answer may be wrong. It looks like dst_offset returns 0 when not in DST and the increment between the base_utc_offset and the total DST offset when in DST. You could try instead skipping forward to the next whole hour. This should work in many cases, however AFAIK there are some timezones which regularly skip forwards more than one hour, so those could be solved by continuing to skip forward by whole hours until you reach a valid time.

@esheppa esheppa added this to the 0.5 milestone Aug 2, 2022
@esheppa esheppa added the API-incompatible Tracking changes that need incompatible API revisions label Aug 2, 2022
@msdrigg
Copy link
Contributor

msdrigg commented Aug 16, 2022

This is something I was just about to create an issue for. Lossy conversion is good enough for me, but there isn't really a way to do it at all now.

@kevincox
Copy link
Contributor

kevincox commented Mar 11, 2023

I'm glad that this is considered a blocking issue for v0.5! This has caused some small trouble recently as daylight savings is approaching. To share some more thoughts and maybe spark some discussion:

Cases

Forward Jumps

  • For forward jumps you could argue that the time immediately before, after and all times during the jump are actually the same instant. They are the same when converted to UTC at least.
  • I don't think it is possible to represent the "instant before the jump" right now as it would be something like 02:59.999…. So the best time in general will be the time after the jump. However it may be useful to give a time before the jump for doing math operations (such as figuring out how far the jump is by doing local_result.after_jump() - local_result.before_jump().
  • I think it still makes sense to be explicit about this case (unlike the Java example above) but it should be easy to bypass.

In summary I see the following information as being useful:

  1. How "long" is the jump. It's a weird question but it is basically how much time was missed.
  2. What is the instant after the jump.

Backward Jumps

  • The is conceptually simpler. There are just two possible times, return them both.

Single time

  • Trivial.

Current API

Data

pub enum LocalResult<T> {
    None,
    Single(T),
    Ambiguous(T, T),
}

I think the obvious flaw here is that in the case of forward jumps we have no information. Something like None{instant_after: T, second_before: T} may work. Although I'm not sure if it is better to store the second before or some other form of it like the jump duration in logical time. (Would usually be +/- 1h for DST). Or maybe the duration isn't required as IIUC you can subtract one second from the instant after time and the zone aware datetime will do the jump, you can then covert to UTC and compare. In that case we would just need None(T) in which case it is sort of the same as Single but warning that it doesn't exactly "match" the time you asked for (but is sort of the same instant). Also should we rename it to something like InGap, EquivelentTime, Unrepresentable, Lossy or something? I can't think of a name that I like more than None even if I don't love None.

Helper Modifications

To integrate this new data into the helper set I took a look at what we have and what could be changed.

Existing earliest and latest

I think that these should probably stay unchanged. Two thoughts are below but I think it is best to stay explicit and avoid changing the behaviour of existing code.

These could return the instant after the jump on the basis that it is an equal time. This would make these methods infallible But I think it is best to be explicit.

Alternatively earliest could return the time before a forward jump as well but since that exact instant can't be represented it is better to handle that in a different method.

Existing single

I think this method should stay unchanged. It is is for unwrapping the simple case. It could arguably return the instant after a forward jump but probably shouldn't for the reasons mentioned above.

New lossy_earliest and lossy_latest

Similar to earliest() and latest() except in the case of a forward jump they return the instant after the jump. This means that they can't fail, they always return a T.

I don't love the name but can't think of anything better right now. They are "lossy" since you can't go back to the original (invalid) local time. Going back will return the end of the jump. Other names I thought of were earliest_valid as it returns the nearest time that is valid.

New lossy_single

Same as single() except in the case of a forward jump it returns the instant after the jump. Still returns None in the case of Ambiguous

New collapse

I don't think this is the best interface but it is an interesting idea. Instead of adding lossy_* we can just add one method collapse that returns a type without None. Then ValidLocalResult can have earliest and latest which are infallible.

fn collapse(self) -> ValidLocalResult {
  match self {
    LocalResult::None(t) => ValidLocalResult::Single(t),
    LocalResult::Single(t) => ValidLocalResult::Single(t),
    LocalResult::Ambiguous(e, l) => ValidLocalResult::Ambiguous(e, l),
  }
}

Overall I don't think it makes sense to have a separate type for this, just to avoid lossy_earliest and lossy_latest. But it is an interesting idea.

Summary

My current thought on the desired changes are below. But I'm not a datetime expert so would appreciate input. Notably I am assuming that the only reason that the time can be invalid is a forward jump. Are there other cases such as when near the ends of representable time? In that case we would need an extra OutOfRange variant possibly with a nearest time parameter.

  • Update None variant to include the instant after forward jumps.
  • Add lossy_* methods which are equivalent to existing methods but treat None(T) the same as Single(T).

@sciyoshi
Copy link

sciyoshi commented Mar 20, 2024

My suggestion is to mimic the behavior Python takes in PEP 495, which is to change None to instead be Gap(T, T) or Invalid(T, T), where the earliest and latest times are the ones where the pre-transition and post-transition rules are in effect respectively. I see that this discussion has been picked up in #1448 instead.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
API-incompatible Tracking changes that need incompatible API revisions
Projects
None yet
Development

No branches or pull requests

6 participants