Fix spaces in langs with no word spacing #917

Open: 2 commits to be merged into base: master


noviluni
Collaborator

Some regexes come from the CLDR corpus and contain spaces that should be optional. I modified the write_complete_data.py script to make those spaces optional in languages that do not require word spacing (those flagged with no_word_spacing).
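As a rough illustration of the idea (a minimal sketch, not the actual change to write_complete_data.py), literal spaces in a pattern can be rewritten as optional whitespace when the language is flagged. The function names, the data layout (a boolean no_word_spacing plus a "relative-type-regex" mapping of keys to lists of patterns) and the sample pattern are assumptions made up for this example, not copied from the script or from CLDR.

```python
import re


def make_spaces_optional(pattern: str) -> str:
    # Replace each run of literal spaces with "\s*" so the space may be absent.
    # A function is used as the replacement so the backslash is inserted verbatim.
    return re.sub(r" +", lambda _match: r"\s*", pattern)


def relax_patterns(language_data: dict) -> dict:
    # Hypothetical layout: only languages flagged with no_word_spacing are touched.
    if not language_data.get("no_word_spacing"):
        return language_data
    relaxed = dict(language_data)
    relaxed["relative-type-regex"] = {
        key: [make_spaces_optional(p) for p in patterns]
        for key, patterns in language_data.get("relative-type-regex", {}).items()
    }
    return relaxed


# Illustrative data only, not taken from CLDR.
sample = {
    "no_word_spacing": True,
    "relative-type-regex": {r"\1 day ago": [r"(\d+) 日 前"]},
}
pattern = relax_patterns(sample)["relative-type-regex"][r"\1 day ago"][0]

# The relaxed pattern accepts the text with or without spaces.
assert re.fullmatch(pattern, "3日前")
assert re.fullmatch(pattern, "3 日 前")
```

One caveat of this simple approach: a space inside a character class would also be rewritten, so real patterns may need more careful handling.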

I also added the no_word_spacing flag to Thai, where spaces are optional, along with a new Thai word for "ago".

Finally, I added some tests from the examples provided in the issues below.

noviluni requested a review from Gallaecio on April 29, 2021 08:57
Member

Gallaecio left a comment

The change looks good to me, but the test error seems legit. I wonder if it has anything to do with no_word_spacing.

dateparser_scripts/write_complete_data.py (review thread resolved)
@noviluni
Collaborator Author

Yes... the Thai test failing in search_dates seems legit... I will check it.

2 participants