Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Issues with odd months when restricting to "day" precision (precisedelta) #14

Open
ddellspe opened this issue May 3, 2022 · 10 comments · May be fixed by #19 or #39
Open

Issues with odd months when restricting to "day" precision (precisedelta) #14

ddellspe opened this issue May 3, 2022 · 10 comments · May be fixed by #19 or #39

Comments

@ddellspe
Copy link

ddellspe commented May 3, 2022

What did you do?

I'm trying to provide more precision than "months" when the timedelta is more than 1 month, but I'm finding some issues using precisedelta set to days with a format of %d as seen below. This all seems to stem from the use of 30.5 as a divisor for mod in precisedelta and how it and ngettext interact with floats vs. ints when it comes to pluralize and the display of the days itself. For what it's worth, the same thing happens at second precision with microseconds present, etc. It seems to be at the minimum_unit where this logic takes place.

What did you expect to happen?

>>> import humanize
>>> import datetime
>>> humanize.precisedelta(datetime.timedelta(days=31), minimum_unit="days", format="%d")
'1 month'
>>> 
>>> import humanize
>>> import datetime
>>> humanize.precisedelta(datetime.timedelta(days=92), minimum_unit="days", format="%d")
'3 months'
>>> 
>>> import humanize
>>> import datetime
>>> humanize.precisedelta(datetime.timedelta(days=32), minimum_unit="days", format="%d")
'1 month and 1 day'
>>>

What actually happened?

>>> import humanize
>>> import datetime
>>> humanize.precisedelta(datetime.timedelta(days=31), minimum_unit="days", format="%d")
'1 month and 0 days'
>>> 
>>> import humanize
>>> import datetime
>>> humanize.precisedelta(datetime.timedelta(days=92), minimum_unit="days", format="%d")
'3 month and 0 days'
>>> 
>>> import humanize
>>> import datetime
>>> humanize.precisedelta(datetime.timedelta(days=32), minimum_unit="days", format="%d")
'1 month and 1 days'
>>>

What versions are you using?

  • OS: Windows, Ubuntu 20.04
  • Python: 3.8.10
  • Humanize: 4.1.0

Please include code that reproduces the issue (included in expected, actually happened above)

@hugovk
Copy link
Member

hugovk commented May 3, 2022

@eldipa Hi! Any thoughts on this?

@eldipa
Copy link
Contributor

eldipa commented May 16, 2022

Consider the following examples, similar to the ones that @ddellspe gave but without forcing a format (the following example will use the default format='%0.2f'):

>>> import humanize
>>> import datetime

>>> humanize.precisedelta(datetime.timedelta(days=31))
'1 month and 0 days'

>>> humanize.precisedelta(datetime.timedelta(days=62))
'2 months and 1 day'

>>> humanize.precisedelta(datetime.timedelta(days=92))
'3 months and 0 days'

>>> humanize.precisedelta(datetime.timedelta(days=32))
'1 month and 1 days'

For even months the delta seems ok but for odd months it looks weird. humanize defines a month as 30.5 days: when you have an even month you have multiples of 61 days (30.5 * 2). This removes the fractional part and the final delta looks reasonable (2 months and 1 day)

When the month is odd we get a fractional number of days. These fraction was not properly propagated to the next unit, hours, and remained in the days unit. The rest of the logic of precisedelta assumed that if there is a fractional part it is at the minimum unit which by defaults is seconds.

Therefore the fractional days was formatted with the default format for any integer unit: %d.

This explains why:

  • we get '1 month and 0 days' instead of '1 month and 0.5 days' (format was %d)
  • we don't have hours (the fractional part of the days was not propagated)

The fix is to do the propagation correctly.

With this, we will get the following results:

>>> import humanize
>>> import datetime

>>> humanize.precisedelta(datetime.timedelta(days=31))
'1 month and 12 hours'

>>> humanize.precisedelta(datetime.timedelta(days=62))
'2 months and 1 day'

>>> humanize.precisedelta(datetime.timedelta(days=92))
'3 months and 12 hours'

>>> humanize.precisedelta(datetime.timedelta(days=32))
'1 month, 1 day and 12 hours'

Notice how the fractional part of the days was propagated to the hours so now we get '1 month and 12 hours'.

If we want to use days as the minimum unit, we get:

>>> import humanize
>>> import datetime

>>> humanize.precisedelta(datetime.timedelta(days=31), minimum_unit='days')
'1 month and 0.50 days'

>>> humanize.precisedelta(datetime.timedelta(days=62), minimum_unit='days')
'2 months and 1 day'

>>> humanize.precisedelta(datetime.timedelta(days=92), minimum_unit='days')
'3 months and 0.50 days'

>>> humanize.precisedelta(datetime.timedelta(days=32), minimum_unit='days')
'1 month and 1.50 days'

Notice also how the fractional part didn't disappear as the original issue stated. Instead was correctly formatted with the default %0.2f because indeed this time days is the minimum unit to use.

Requiring an integer day number is just a matter of saying that:

>>> import humanize
>>> import datetime

>>> humanize.precisedelta(datetime.timedelta(days=31), minimum_unit='days', format='%d')
'1 month and 0 days'

>>> humanize.precisedelta(datetime.timedelta(days=62), minimum_unit='days', format='%d')
'2 months and 1 day'

>>> humanize.precisedelta(datetime.timedelta(days=92), minimum_unit='days', format='%d')
'3 months and 0 days'

>>> humanize.precisedelta(datetime.timedelta(days=32), minimum_unit='days', format='%d')
'1 month and 1 days'

There are some gotchas like '0 days' and '1 days'. The issue here is at the formatting level: using %d converts a non-integer number in integer during the formatting, much after we calculated which units are shown or not.

For this we need another fix: the possibility to truncate the fractional part before doing the formatting and if a 0 value is obtained, increase the minimum unit (go from hours to days).

These would be the results:

>>> import humanize
>>> import datetime

>>> humanize.precisedelta(datetime.timedelta(days=31), minimum_unit='days', format='%d', truncate=True)
'1 month'

>>> humanize.precisedelta(datetime.timedelta(days=62), minimum_unit='days', format='%d', truncate=True)
'2 months and 1 day'

>>> humanize.precisedelta(datetime.timedelta(days=92), minimum_unit='days', format='%d', truncate=True)
'3 months'

>>> humanize.precisedelta(datetime.timedelta(days=32), minimum_unit='days', format='%d', truncate=True)
'1 month and 1 days'

Notice that truncate=True does not imply format='%d' so it must be set explicitly.

This is what you get if you use the default format but also truncate=True:

>>> import humanize
>>> import datetime

>>> humanize.precisedelta(datetime.timedelta(days=31), minimum_unit='days', truncate=True)
'1 month'

>>> humanize.precisedelta(datetime.timedelta(days=62), minimum_unit='days', truncate=True)
'2 months and 1 day'

>>> humanize.precisedelta(datetime.timedelta(days=92), minimum_unit='days', truncate=True)
'3 months'

>>> humanize.precisedelta(datetime.timedelta(days=32), minimum_unit='days', truncate=True)
'1 month and 1.50 days'

eldipa added a commit to eldipa/humanize that referenced this issue May 16, 2022
@eldipa eldipa linked a pull request May 16, 2022 that will close this issue
@hugovk
Copy link
Member

hugovk commented May 19, 2022

Thanks for the writeup and PR!

I think we always want truncate=True, at least I don't think we'd ever want an extra "0 units" at the end.

Are there any cases where we might want to use truncate=False?

@eldipa
Copy link
Contributor

eldipa commented May 19, 2022

Actually truncate=False is what you want most of the time. Consider the following example:

>>> humanize.precisedelta(datetime.timedelta(days=31), minimum_unit='days')
'1 month and 0.50 days'

precisedelta tries to not loose any precision so it returns 0.50 (see doc

Now, if the user wants to use a interger-like formatting for the fractional part like "%d", only then it makes sense to use truncate=True because otherwise you will get a 0 units:

>>> humanize.precisedelta(datetime.timedelta(days=31), minimum_unit='days', format='%d')
'1 month and 0 days'

>>> humanize.precisedelta(datetime.timedelta(days=31), minimum_unit='days', format='%d', truncate=True)
'1 month'

See doc and doc

Also, not very sure if truncate should be public.... it feels a hack: the algorithm is okay but how to enable it is what it looks hacky. For example, assuming format != '%d', is any useful case to set truncate=True. I don't think so.

In an ideal world truncate should be automatically set from format but not sure how to do it without doing if format == '%d' ... (I'm not fully convinced to that, it is fragile). For example, for format = '%i' you have to truncate=True also.

nuztalgia added a commit to nuztalgia/forked-humanize that referenced this issue Jul 2, 2022
@nuztalgia
Copy link
Contributor

nuztalgia commented Jul 2, 2022

I think I stumbled upon a fix for this issue in #39 (originally meant to fix #20). With the changes I made to precisedelta in that PR, it now seems to match the "expected" behavior described in the OP:

What did you expect to happen?

>>> import humanize
>>> import datetime
>>> humanize.precisedelta(datetime.timedelta(days=31), minimum_unit="days", format="%d")
'1 month'
>>> 
>>> import humanize
>>> import datetime
>>> humanize.precisedelta(datetime.timedelta(days=92), minimum_unit="days", format="%d")
'3 months'
>>> 
>>> import humanize
>>> import datetime
>>> humanize.precisedelta(datetime.timedelta(days=32), minimum_unit="days", format="%d")
'1 month and 1 day'
>>>

These are the test cases I added to that PR to cover this issue, along with their results before and after my changes:

>>> precisedelta(dt.timedelta(days=31), "days", format="%d")
BEFORE:  "1 month and 0 days"
AFTER:   "1 month"
>>> precisedelta(dt.timedelta(days=31.01), "days", format="%d")
BEFORE:  "1 month and 0 days"
AFTER:   "1 month and 1 day"
>>> precisedelta(dt.timedelta(days=92), "days", format="%d")
BEFORE:  "3 months and 0 days"
AFTER:   "3 months"

The fixes for both issues are contained in this commit. My solution differs from the one proposed here in that it doesn't make any changes/additions to the public API for precisedelta. Instead, it just tries to "do the right thing" - which I suppose could be good or bad, although hopefully the tests will catch all of "the bad". 😄 Let me know what you think!

@nuztalgia
Copy link
Contributor

As a side note, (possibly?) related to this issue only happening with odd months:

One way to mitigate rounding bias when rounding values in a dataset is to round ties to the nearest even number at the desired precision... [this] is the strategy used by Python’s built-in round() function:

>>> round(4.5)
4
>>> round(3.5)
4
>>> round(1.75, 1)
1.8
>>> round(1.65, 1)
1.6

(Source: https://realpython.com/python-rounding/#rounding-half-to-even)

I learned this just now while putting together my PR and didn't see any mention of Python's rounding strategy in this thread, so figured I'd share it here too just in case it's useful. 😊

@eldipa
Copy link
Contributor

eldipa commented Jul 4, 2022

@nuztalgia I took a look to your PR, nice contribution!

I tested a few cases that it looks weird. The following is a comparison between the expected and the obtained output of your PR:

$ byexample -l python issue-14-days-hours-boundary-in-precisedelta.md
**********************************************************************
File "issue-14-days-hours-boundary-in-precisedelta.md", line 10
Failed example:
    humanize.precisedelta(datetime.timedelta(days=31))
Expected:
'1 month and 12 hours'
Got:
'1 month and 0 days'

**********************************************************************                                                                                                                         
File "issue-14-days-hours-boundary-in-precisedelta.md", line 16
Failed example:
    humanize.precisedelta(datetime.timedelta(days=92))
Expected:
'3 months and 12 hours'
Got:
'3 months and 0 days'
                                                                                                                                                                                               
**********************************************************************                                                                                                                         
File "issue-14-days-hours-boundary-in-precisedelta.md", line 19
Failed example:
    humanize.precisedelta(datetime.timedelta(days=32))
Expected:
'1 month, 1 day and 12 hours'
Got:
'1 month and 1 days'

While you PR fixes the "apparent" rounding issue, it is not the "real" issue: the odd months issue.

I left a description of the issue above and my PR fixes that in this commit

Now, what my PR never does is to define when or when not truncate the value when the user request an interger-like format like %d. This is when the "rounding" issue comes to play and here is where your regex makes sense.

Instead of letting the user to decide if truncate=True or truncate=False, we could use your regex to do internally truncate=re.fullmatch(r"%\d*(d|(\.0*f))", format) (so the truncate parameter is not longer public)

Having said that, the regex may not be complete. Python supports %i which it is also for intergers and it should be considered too.

And I would suggest to use a more strict regex with anchors like re.match(r"^%\d*(d|(\.0*f))$", format)

Does make this sense to you? Rebase your PR in terms of mine to have the proper fix plus the detection of when truncate or not automatically?

nuztalgia added a commit to nuztalgia/forked-humanize that referenced this issue Jul 11, 2022
@nuztalgia
Copy link
Contributor

That does make sense, thank you for the detailed explanation @eldipa!

I added the test cases you mentioned (as well as a few more that you outlined in your PR) to mine, and you're right that my changes didn't completely fix this issue. But cherry-picking your commit did the trick to make all the test cases pass! 😄

As for the regex, I believe the snippets we wrote are equivalent - mine uses re.fullmatch without the ^ and $ anchors, which I think is equivalent to re.match with those anchors. I'm happy to change it if I'm wrong or if your version is considered more readable, though!

With regards to supporting %i - You're totally right about this one too, and it's an easy enough change! While I was looking into it though, I noticed that the Python docs list a bunch of other formats that we could also support by expanding the regex. Do you think we should consider some more of those, or just limit it to %d and %i (and \.0*f) for now?

@eldipa
Copy link
Contributor

eldipa commented Jul 12, 2022

There are a lot of formats indeed!

The user that writes format="%d" has the intention to truncate the last unit. Similar for format="%i". In the issue #30 the user wrote format="%0.0f" to round the decimal part.

Now, and let me think this from another angle. What if, in addition to a string, we accept a function for format? The user could then write format=int and format=round to do a truncate/round the decimal part.

From code's perspective checking format is a function is easier than the regex and it would allow the user to choose which kind of rounding he/she wants.

Something like:

def precisedelta(....):
  # ...
  if iscallable(format):
     round_fn = format
     format = "%s"
  else:
     round_fn = None

   # ...
       if round_fn:   # (1)
            fmt_value = round_fn(fmt_value)  

Where (1) is here.

From users' perspective calling precisedelta(..., format=round) may be more readable than precisedelta(...., format="%0.0f")

What do you think?

@shazib-summar
Copy link

@eldipa the truncate parameter seems to be removed from the precisedelta func. Any ideas how I can truncate the output without it?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
5 participants