Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] add find_date function and tests: finds the first date #931

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

lopuhin
Copy link
Member

@lopuhin lopuhin commented May 25, 2021

This function is similar to search_dates, but gets only the first date, and more suitable to shorts strings. It uses a brute-force approach, but has more predictable performance (at least on shorter strings - we limit the length to 100), and better quality in our tests, although still not perfect.

Not sure if it should be exposed in current form, but may be useful for future development.

NOTE: this PR is not worked on currently and free to pick up.

TODO:

  • gather feedback on whether the API (finding the first match) makes sense? Maybe instead we could have an API similar to search_dates and make this a generator, allowing the user to stop at the first match. Although that would likely make the implementation a bit harder.
  • Also what's about new vs old search dates? There are likely cases when old search dates works better in terms of performance or quality, and we won't maintain full backwards compatibility if we replace.
  • clean up the API regarding languages (not sure if auto-adding 'en' and auto-detection is what we want) - that would require some test changes
  • clean up the date cleanup code - it might be invalid for some languages?
  • take care of the docs

similar to search_dates, but gets only the first date,
and more suitable to shorts strings. It uses a brute-force approach,
but has more predictable performance
(at least on shorter strings - we limit the length to 100),
and better quality in our tests, although still not perfect.

Not sure if it should be exposed in current form, but may be useful
for future development.
@lopuhin lopuhin changed the title [WIP] add find_date function: finds the first date [WIP] add find_date function and tests: finds the first date May 25, 2021
@codecov
Copy link

codecov bot commented May 25, 2021

Codecov Report

Base: 98.26% // Head: 98.21% // Decreases project coverage by -0.05% ⚠️

Coverage data is based on head (2af6489) compared to base (255c421).
Patch coverage: 94.44% of modified lines in pull request are covered.

Additional details and impacted files
@@            Coverage Diff             @@
##           master     #931      +/-   ##
==========================================
- Coverage   98.26%   98.21%   -0.06%     
==========================================
  Files         231      232       +1     
  Lines        2597     2633      +36     
==========================================
+ Hits         2552     2586      +34     
- Misses         45       47       +2     
Impacted Files Coverage Δ
dateparser/find_date.py 94.44% <94.44%> (ø)

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

☔ View full report at Codecov.
📢 Do you have feedback about the report comment? Let us know in this issue.

@gavishpoddar
Copy link
Contributor

gavishpoddar commented May 27, 2021

Many of the issues associated with the search_dates can be solved if this code is used to improve the current search_dates.

Like #930, #918, #921, #894 will be resolved.

@gavishpoddar
Copy link
Contributor

Tests seem to be failing with Python 3.5 because was it introduced in Python 3.6 which provides a guarantee that keyword arguments are passed in the same order they appear in the code (from left to right)

PEP 468 -- Preserving the order of **kwargs in a function.

A possible solution for the issue,
We may need to pass the input as a sequence of tuples to preserve ordering.

@serhii73 serhii73 closed this Dec 1, 2022
@serhii73 serhii73 reopened this Dec 1, 2022
@serhii73
Copy link
Collaborator

serhii73 commented Dec 1, 2022

We can close this PR after merging #945

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants