Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add 'Sept' for 'September' #1071

Merged
merged 2 commits into from Nov 11, 2022
Merged

Add 'Sept' for 'September' #1071

merged 2 commits into from Nov 11, 2022

Conversation

wulmer
Copy link
Contributor

@wulmer wulmer commented Aug 25, 2022

Due to an error with my Google Sheets table, I found that Google (and DIN 5008) use Sept as abbreviation for September, but dateparser only understood Sep. It looks like both Sept and Sep are valid.

This PR adds Sept as variant of September.

@serhii73
Copy link
Collaborator

serhii73 commented Nov 7, 2022

Please, see https://dateparser.readthedocs.io/en/latest/contributing.html#guidelines-for-editing-translation-data

wulmer and others added 2 commits November 7, 2022 23:39
Due to an error with my Google Sheets table, I found that Google (and [DIN 5008](https://www.din-5008-richtlinien.de/startseite/datum/)) use `Sept` as abbreviation for `September`, but `dateparser` only understood `Sep`. It looks like both `Sept` and `Sep` are valid.

This PR adds `Sept` as variant of `September`.
@wulmer
Copy link
Contributor Author

wulmer commented Nov 7, 2022

@serhii73 : thanks a lot for your response. I have updated the JSON data now and also included a test.

@Gallaecio
Copy link
Member

I believe the CLDR data JSON file is not meant to be updated manually, that you should be updating the YAML file instead.

@wulmer
Copy link
Contributor Author

wulmer commented Nov 8, 2022

I believe the CLDR data JSON file is not meant to be updated manually, that you should be updating the YAML file instead.

I don't think so: the script in dateparser_scripts/write_complete_data.py (lines 49-54) first reads all data from the two directories (with the JSON and YAML files) and then combines the data before writing it to the Python files:

    if language in cldr_languages:
        with open(cldr_date_directory + language + '.json') as f:
            cldr_data = json.load(f, object_pairs_hook=OrderedDict)
    if language in supplementary_languages:
        with open(supplementary_date_directory + language + '.yaml') as g:
            supplementary_data = OrderedDict(RoundTripLoader(g).get_data())
    complete_data = combine_dicts(cldr_data, supplementary_data)

At least, when running the script, the Python files get updated based on the changes in the JSON or YAML files.

@Gallaecio
Copy link
Member

Yes, and the CLDR file is meant to be kept in line with the upstream CLDR file, without customizations of any kind. Customizations are managed on the YAML files. Otherwise, your changes get removed the next time we run https://github.com/scrapinghub/dateparser/blob/master/dateparser_scripts/get_cldr_data.py.

@wulmer
Copy link
Contributor Author

wulmer commented Nov 10, 2022

@Gallaecio : the upstream files in the given repos will not change as the repos are currently archived, see the referenced repos from get_raw_data at https://github.com/unicode-cldr/cldr-dates-full, https://github.com/unicode-cldr/cldr-core, and https://github.com/unicode-cldr/cldr-rbnf.

There is however a newer still maintained repo at https://github.com/unicode-org/cldr-json/ , and they seem to have Sept as abbreviation for September.

How should I proceed here now? Next year, in September, my scripts will crash again... :-(

serhii73 added a commit that referenced this pull request Nov 11, 2022
Copy link
Collaborator

@serhii73 serhii73 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a fix in the last commit, and the PR looks good.
Thank you very much.

@wRAR wRAR closed this Nov 11, 2022
@wRAR wRAR reopened this Nov 11, 2022
@codecov
Copy link

codecov bot commented Nov 11, 2022

Codecov Report

Base: 98.23% // Head: 98.23% // No change to project coverage 👍

Coverage data is based on head (f5f6fca) compared to base (6cdd6b5).
Patch has no changes to coverable lines.

Additional details and impacted files
@@           Coverage Diff           @@
##           master    #1071   +/-   ##
=======================================
  Coverage   98.23%   98.23%           
=======================================
  Files         232      232           
  Lines        2605     2605           
=======================================
  Hits         2559     2559           
  Misses         46       46           
Impacted Files Coverage Δ
dateparser/data/date_translation_data/de.py 100.00% <ø> (ø)

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

☔ View full report at Codecov.
📢 Do you have feedback about the report comment? Let us know in this issue.

@serhii73 serhii73 merged commit e398627 into scrapinghub:master Nov 11, 2022
@Gallaecio
Copy link
Member

@wulmer Abotu CLDR data being outdated, that is it seems a known issue, and definitely one we would like to address.

@wulmer wulmer deleted the patch-1 branch November 11, 2022 11:22
@RadhiFadlillah
Copy link
Contributor

Apologize for necroing this PR, but I just want to notify that it seems the fix by @serhii73 is not merged in this PR.

I've checked the master branch, and the sept keyword is still applied directly in de.json file instead of in de.yaml.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants