naturaltime: wrong translation with delta >= 2 years #21

Mathieu-Ghaleb · 2022-06-16T23:53:47Z

What did you do?

Ran the following code:

import humanize
import datetime as dt

humanize.i18n.activate("fr_FR")
output = humanize.naturaltime(dt.datetime(year=2010, month=1, day=1))
print(output)

What did you expect to happen?

Output to be:

il y a 12 ans

What actually happened?

Output was:

il y a 12 years

What versions are you using?

OS: Mac OS
Python: 3.8.6
Humanize: 4.0.1


hugovk · 2022-06-17T12:57:36Z

Here's a more direct reproducer. The first three asserts are fine, but the last fails because it gets "12 years", not "12 ans":

import humanize

output = humanize.naturaldelta(1 * 365 * 24 * 60 * 60)
print(output)
assert output == "a year"

output = humanize.naturaldelta(12 * 365 * 24 * 60 * 60)
print(output)
assert output == "12 years"


humanize.i18n.activate("fr_FR")

output = humanize.naturaldelta(1 * 365 * 24 * 60 * 60)
print(output)
assert output == "un an"

output = humanize.naturaldelta(12 * 365 * 24 * 60 * 60)
print(output)  # "12 years"
assert output == "12 ans"

This was introduced in d1faf1c from PR jmoiron/humanize#246:

Instead of formatting the result of the _ngettext localisation function with an integer, we're formatting it with the result of intcomma(years), which is a string:

-        return _ngettext("%d year", "%d years", years) % years
+        return _ngettext("%s year", "%s years", years) % intcomma(years)

And also I think it can't find "%s year", "%s years" values because they use %d in the translation files:

humanize/src/humanize/locale/fr_FR/LC_MESSAGES/humanize.po

Lines 339 to 344 in 7688f20

#: src/humanize/time.py:187
#, python-format
msgid "%d year"
msgid_plural "%d years"
msgstr[0] "%d an"
msgstr[1] "%d ans"
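To illustrate why the lookup misses, here is a minimal sketch that does not use humanize itself. `DictTranslations` is a hypothetical stand-in for the compiled fr_FR catalog, built on the standard library's `gettext.NullTranslations`, and it assumes the French plural rule (plural when n > 1). The catalog is keyed on the exact msgid, so asking for "%s year" falls back to the untranslated English string.

```python
import gettext


class DictTranslations(gettext.NullTranslations):
    """Hypothetical stand-in for the compiled fr_FR catalog (not humanize code)."""

    # Keys are (msgid, plural form index), mirroring the .po entry above.
    CATALOG = {
        ("%d year", 0): "%d an",
        ("%d year", 1): "%d ans",
    }

    def ngettext(self, msgid1, msgid2, n):
        index = 0 if n <= 1 else 1  # assumed French plural rule: plural when n > 1
        try:
            return self.CATALOG[(msgid1, index)]
        except KeyError:
            # Unknown msgid: NullTranslations returns the untranslated string.
            return super().ngettext(msgid1, msgid2, n)


fr = DictTranslations()

# The msgid matches the catalog, so the French plural form is returned.
found = fr.ngettext("%d year", "%d years", 12) % 12

# "%s year" is not in the catalog, so the English fallback is used.
missed = fr.ngettext("%s year", "%s years", 12) % "12"

print(found)
print(missed)
```

The second lookup reproduces the "12 years" output from the report: the number is formatted correctly, but the surrounding text is never translated.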

Ping @carterbox, please could you have a look at this?

carterbox · 2022-06-20T00:39:33Z

I checked out version 4.2.0, and I cannot run this reproducer.

Traceback (most recent call last):
  File "/home/dching/Documents/humanize/comma.py", line 15, in <module>
    humanize.i18n.activate("fr_FR")
  File "/home/dching/Documents/humanize/src/humanize/i18n.py", line 62, in activate
    translation = gettext_module.translation("humanize", path, [locale])
  File "/home/dching/miniconda3/envs/humanize/lib/python3.10/gettext.py", line 592, in translation
    raise FileNotFoundError(ENOENT,
FileNotFoundError: [Errno 2] No translation file found for domain: 'humanize'

What am I doing wrong?

carterbox · 2022-06-20T00:47:50Z

The docs say that this "gettext.translation" function only looks for ".mo" files, but the source only contains ".po" files. I assume there is some autoconversion supposed to be happening.
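A minimal sketch of that behaviour, using only the standard library: the temporary directory layout below is an assumption made for illustration, not the humanize source tree. It places only a ".po" file where the catalog would live and shows that `gettext.translation` still reports that no translation file was found.

```python
import gettext
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as localedir:
    # Illustrative layout: <localedir>/fr_FR/LC_MESSAGES/humanize.po, with no .mo file.
    messages = Path(localedir) / "fr_FR" / "LC_MESSAGES"
    messages.mkdir(parents=True)
    (messages / "humanize.po").write_text('msgid ""\nmsgstr ""\n', encoding="utf-8")

    try:
        gettext.translation("humanize", localedir, ["fr_FR"])
    except FileNotFoundError:
        # gettext only searches for compiled humanize.mo files.
        found_catalog = False
    else:
        found_catalog = True

print(found_catalog)
```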

carterbox · 2022-06-20T00:53:38Z

OK. I grabbed the compiled translation files from the conda tar.

carterbox · 2022-06-20T01:28:03Z

The general translation workflow seems to be:

1. Generate possible format strings.
2. Translate the unevaluated format strings.
3. Evaluate the format strings.

When I added the intcomma feature for years in the time module, I didn't properly test step 2 (translate the format string). Since the translation works directly on the format string and the translation phrases include the formatters (probably because in some locales the order is changed), switching from %d to %s broke the translation step. This means the translation is broken for all translated languages.

Notably, this translation error does not occur in precisedelta() because the translation and format evaluation occur on multiple lines, so the %d are swapped for %s after translation.

humanize/src/humanize/time.py

Lines 564 to 567 in 29d37fb

elif unit == YEARS:
    fmt_txt = fmt_txt.replace("%d", "%s")
    texts.append(fmt_txt % intcomma(fmt_value))
    continue

So the solution would be to match naturaldelta() with precisedelta() and swap the formatter after the translation.
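A rough sketch of that ordering, not the actual patch: `fake_fr_ngettext` and `translate_years` are hypothetical helpers, the French plural rule (plural when n > 1) is assumed, and `format(years, ",")` stands in for `intcomma`. The lookup uses the "%d" msgid that the translation files contain, and the formatter is swapped to "%s" only afterwards.

```python
def fake_fr_ngettext(singular, plural, n):
    """Hypothetical French catalog lookup, keyed on the %d msgid."""
    catalog = {"%d year": ("%d an", "%d ans")}
    forms = catalog.get(singular)
    if forms is None:
        # Unknown msgid: fall back to the untranslated English string.
        return singular if n == 1 else plural
    return forms[0 if n <= 1 else 1]  # assumed French plural rule: plural when n > 1


def translate_years(ngettext, years):
    # Step 1: translate using the msgid that exists in the .po files.
    fmt = ngettext("%d year", "%d years", years)
    # Step 2: swap the formatter after translation so a string can be inserted.
    fmt = fmt.replace("%d", "%s")
    # Step 3: evaluate with a comma-separated string (stand-in for intcomma).
    return fmt % format(years, ",")


short = translate_years(fake_fr_ngettext, 12)
long = translate_years(fake_fr_ngettext, 1200)
print(short)
print(long)
```

Because the swap happens on the already translated string, the catalog entries can keep "%d" and no translation file needs to change.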

I don't want to change the translation files or swap the formatters in other places because using %d instead of %s may be used to do rounding or truncation of floating point numbers.

This patch fixes a bug introduced in 3.14.0, where the format string was changed from %d to %s to add separators to the year. However, this needs to happen after translation because the translator uses the format strings as part of the translation. Closes python-humanize#21

hugovk · 2022-06-20T08:13:34Z

For future reference, there's a script to generate .mo files. Mentioned in the release checklist:

# Generate translation binaries
scripts/generate-translation-binaries.sh

hugovk added the bug label Jun 17, 2022

carterbox mentioned this issue Jun 20, 2022

Use %d for year translations, convert to string for intcomma after #23

Merged

hugovk closed this as completed in #23 Jun 22, 2022
