Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WIP: Reimplementing search_dates #945

Open
wants to merge 44 commits into
base: master
Choose a base branch
from

Conversation

gavishpoddar
Copy link
Contributor

@gavishpoddar gavishpoddar commented Jul 16, 2021

Reimplementing and simplifying search_dates

A reimplemented and simplified search_dates which more directly uses dateparser.parse, improves accuracy and fixes many bugs

New Feature:

  • search_first_date - searches and returns the first date from the given text.

NOTE: This PR is inspired by the previous implementation of search_dates and #931.

TODO

  • Gather feedback on the API and it's functions.
  • Write docs.
  • Write tests.
  • Chack search translation joints.
  • Fix DATE_ORDER

@gavishpoddar
Copy link
Contributor Author

gavishpoddar commented Jul 16, 2021

This PR will also solve many issues with the search_dates.

Somewhat address.

Some other issue fixed:

2021-08-04T14:21:37+05:30

would parse 05:30 only after this PR

2021-08-04T14:21:37 and 05:30

NOTE: THIS LIST IS NO LONGER ACCURATE ANYMORE

@gavishpoddar
Copy link
Contributor Author

Hi, I need a suggestion should I use translated chunks or the original chunks to further parse the data objects. I have currently used translated chunks instead of original chunks as this increased accuracy in some basic tests.

Thanks, and please suggest.

@gavishpoddar gavishpoddar changed the title [WIP] Reimplementing search_dates & fixing search translation [WIP] Reimplementing search_dates & fixing search translation Jul 17, 2021
@noviluni
Copy link
Collaborator

noviluni commented Jul 19, 2021

Hi, I need a suggestion should I use translated chunks or the original chunks to further parse the data objects. I have currently used translated chunks instead of original chunks as this increased accuracy in some basic tests.

Thanks, and please suggest.

Hi @gavishpoddar, using translated chunks makes sense. 👍

@codecov
Copy link

codecov bot commented Jul 19, 2021

Codecov Report

Base: 98.23% // Head: 98.10% // Decreases project coverage by -0.12% ⚠️

Coverage data is based on head (99e66c6) compared to base (0f5d9c5).
Patch coverage: 96.12% of modified lines in pull request are covered.

❗ Current head 99e66c6 differs from pull request most recent head 2935aae. Consider uploading reports for the commit 2935aae to get more accurate results

Additional details and impacted files
@@            Coverage Diff             @@
##           master     #945      +/-   ##
==========================================
- Coverage   98.23%   98.10%   -0.13%     
==========================================
  Files         232      235       +3     
  Lines        2604     2692      +88     
==========================================
+ Hits         2558     2641      +83     
- Misses         46       51       +5     
Impacted Files Coverage Δ
dateparser/search/search.py 94.73% <94.33%> (-4.64%) ⬇️
dateparser/search/__init__.py 100.00% <100.00%> (ø)
dateparser/search/languages.py 100.00% <100.00%> (ø)
dateparser/date_parser.py 93.75% <0.00%> (-0.19%) ⬇️
dateparser/date.py 99.24% <0.00%> (-0.04%) ⬇️
dateparser/parser.py 99.01% <0.00%> (-0.01%) ⬇️
dateparser/utils/__init__.py 95.27% <0.00%> (ø)
dateparser/languages/locale.py 98.69% <0.00%> (ø)
dateparser/conf.py 100.00% <0.00%> (ø)
dateparser/__init__.py 100.00% <0.00%> (ø)

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

☔ View full report at Codecov.
📢 Do you have feedback about the report comment? Let us know in this issue.

@gavishpoddar
Copy link
Contributor Author

Hi @noviluni, I have made the changes for tests to work. Can you please approve the workflow approval?

@lopuhin
Copy link
Member

lopuhin commented Jul 22, 2021

@gavishpoddar workflows approved 👍

@gavishpoddar
Copy link
Contributor Author

Hi @noviluni, @lopuhin, @kishan3
The new search_dates implementation is nearing its completion.

Code reviews would be great to receive.

@gavishpoddar
Copy link
Contributor Author

I need help with one test

StaticTzInfo 'UTC' is expected.
UTC is received from new search_dates

@gavishpoddar
Copy link
Contributor Author

gavishpoddar commented Aug 6, 2021

Hi after some changes the codes are very much compatible with the old search_dates. Some tests are modified as the working of the search_dates has been changed.

Currently, this PR is left with some docs changes (just replacing the current with new docs and search_first_date).

@gavishpoddar gavishpoddar changed the title [WIP] Reimplementing search_dates & fixing search translation Reimplementing search_dates & fixing search translation Aug 6, 2021
@gavishpoddar
Copy link
Contributor Author

Replaced previous search_dates with the new implimentaion

Copy link
Member

@lopuhin lopuhin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @gavishpoddar , I left some comments regarding tests. Please tell if you need advice with how to implement xfail.

dateparser/search/__init__.py Outdated Show resolved Hide resolved
tests/test_search.py Outdated Show resolved Hide resolved
tests/test_search.py Show resolved Hide resolved
tests/test_search.py Outdated Show resolved Hide resolved
tests/test_search.py Outdated Show resolved Hide resolved
Copy link
Member

@lopuhin lopuhin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @gavishpoddar a few docs suggestions

dateparser/search/__init__.py Outdated Show resolved Hide resolved
dateparser/search/__init__.py Outdated Show resolved Hide resolved
@gavishpoddar
Copy link
Contributor Author

Hey, this PR is updated with the #932

@Eckii24
Copy link

Eckii24 commented Aug 6, 2022

Hi @gavishpoddar,

I found your PR to improve the behavior of search_dates.
There has been no further progress for almost a year. What is the current status of this PR?

Is there anything I can do to help get this PR merged?

def __init__(self):
self.loader = LocaleDataLoader()
self.available_language_map = self.loader.get_locale_map()
self.search = _ExactLanguageSearch(self.loader)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The removal of DateSearchWithDetection.search is backward-incompatible.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can create a shortcut the make DateSearchWithDetection.search and add a deprecation warning or simply rename.

Please suggest a preferred action.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think keeping the old objects around with their old names, logging a warning when used (i.e. by exposing them through a property whose getter logs a warning), would be ideal.

@gavishpoddar gavishpoddar changed the title Reimplementing search_dates WIP: Reimplementing search_dates Dec 7, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants