[Discussion] Performance in search dates #853

Open
surkova opened this issue Dec 3, 2020 · 1 comment

Comments

@surkova
Contributor

surkova commented Dec 3, 2020

I'm working on a Flask app that does some markup parsing. Among other things, it parses strings like "Arriving tomorrow by 9pm" or "Delivered on Friday". All of the strings are short and in English. Today I bumped dateparser from 0.7.6 to 1.0.0, and this is what I saw in the distribution metrics (p50, p95, p99) of the function calling search_dates (function abridged):

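import re
from dateparser.search import search_dates

STATUS_TEXT_DELIVERED = re.compile(r"delivered", re.IGNORECASE)
# `text` is the status string passed to the (abridged) function.
# "Delivered ..." statuses refer to the past; anything else is treated as upcoming.
settings = {
    "PREFER_DATES_FROM": "past"
    if STATUS_TEXT_DELIVERED.search(text)
    else "future",
}
search_results = search_dates(text, languages=["en"], settings=settings)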

[Screenshot, 2020-12-03 21:50:07: p50, p95, and p99 metrics of the function across three deploys]
What strikes me most is the huge latency spike when the app is restarted on deploy, and how long it takes to settle down afterwards. This function is currently called around 20 times per minute, but we expect that to grow to at least 400 rpm. The screenshot shows three deploys (red stripes).

I have very limited insight into what performance instrumentation you've been using, but what would be the easiest way to pinpoint what search_dates is doing right after the process starts from scratch? And why does it take so long to reach a steady state?
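
One way to look into the question above might be to run the first call under the standard-library profiler in a fresh process and compare it with a later call. This is only a sketch: `profile_call` and `profile_search_dates` are hypothetical helper names, not part of dateparser, and the second function assumes dateparser is installed.

```python
import cProfile
import io
import pstats


def profile_call(func, *args, top=15, **kwargs):
    """Run func once under cProfile; return (result, text report)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args, **kwargs)
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    # Sort by cumulative time so expensive setup work shows up near the top.
    stats.sort_stats("cumulative").print_stats(top)
    return result, buffer.getvalue()


def profile_search_dates(text="Arriving tomorrow by 9pm"):
    """Hypothetical helper: profile a cold and a warm search_dates call.

    Assumes dateparser is installed. Call it once per fresh process so the
    first profile really is a cold start.
    """
    from dateparser.search import search_dates

    reports = {}
    for label in ("cold", "warm"):
        _, reports[label] = profile_call(
            search_dates,
            text,
            languages=["en"],
            settings={"PREFER_DATES_FROM": "future"},
        )
    return reports
```

Comparing the "cold" and "warm" reports side by side should show which functions account for the extra time on the first call, if the spike reproduces outside the app at all.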

@noviluni
Collaborator

Hi @surkova, thanks for opening this issue!

I have no idea why this is happening. Some things are initialized when the library is imported, but that shouldn't produce this effect. I will take a look when I have time. If you find any other clue or an easy way to reproduce it, please let me know.
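
For anyone trying to reproduce this outside the app, a minimal timing sketch along these lines might help. It measures per-call wall-clock time in a fresh process and separates the first call from the rest. The helper names (`time_calls`, `summarize`, `run_dateparser_repro`) are made up for illustration, and the last function assumes dateparser is installed.

```python
import statistics
import time


def time_calls(func, inputs):
    """Return per-call wall-clock durations in milliseconds, in call order."""
    durations = []
    for item in inputs:
        start = time.perf_counter()
        func(item)
        durations.append((time.perf_counter() - start) * 1000.0)
    return durations


def summarize(durations, warmup=1):
    """Compare the first `warmup` calls against the remaining ones."""
    cold, warm = durations[:warmup], durations[warmup:]
    return {
        "cold_max_ms": max(cold) if cold else None,
        "warm_median_ms": statistics.median(warm) if warm else None,
    }


def run_dateparser_repro(repeats=50):
    """Hypothetical repro; not called automatically, needs dateparser."""
    from dateparser.search import search_dates

    texts = ["Arriving tomorrow by 9pm", "Delivered on Friday"] * repeats
    durations = time_calls(lambda t: search_dates(t, languages=["en"]), texts)
    return summarize(durations)
```

Running `run_dateparser_repro()` under both 0.7.6 and 1.0.0, each in a new interpreter, would show whether the cold-start cost differs between versions or whether the spike only appears under the app's own load.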

Thanks!

P.S.: I see that you are using this library in a Flask app. I just wanted to point out that we have some open issues related to concurrency and multithreading, which we don't fully support yet: #441, #276, #834
