max_connections is not respected and connections pool grows until connections are closed by idle timeout #648

DmitryKakurin · 2020-08-14T23:23:52Z

Versions:

Hackney: 1.16.0
HTTPoison: 1.7.0
Elixir/Erlang:

Erlang/OTP 21 [erts-10.3] [64-bit] [smp:8:8] [ds:8:8:10] [async-threads:1]
Elixir 1.10.3 (compiled with Erlang/OTP 21)

Simplest Repro:

Set max pool size to 1 and hit 2 different servers.

Interactive Elixir (1.10.3) - press Ctrl+C to exit (type h() ENTER for help)
iex(1)> :hackney_pool.start_pool(:default, max_connections: 1)
:ok

iex(2)> HTTPoison.get("example.com")
{:ok,  %HTTPoison.Response{ …

iex(3)> HTTPoison.get("example.net")
{:ok,  %HTTPoison.Response{ …

iex(4)> :hackney_pool.get_stats(:default)
[name: :default, max: 1, in_use_count: 0, free_count: 2, queue_count: 0]

free_count is 2, but it should not be greater than max, which is 1.

Suspected bug location

I suspect the bug was introduced in this checkin: ec5b90c
Note how the check if PoolSize =< MaxConn was not carried over from handle_call({checkin,... into deliver_socket function for the empty case.

The new check dict:size(Clients) >= MaxConn in handle_call({checkout... ensures that no more than MaxConn clients will be waiting for connection at the same time, but is not sufficient to limit the total pool size.
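
To make the suspected gap concrete, here is a toy model in Elixir. It is not hackney's code: ToyPool, checkin_unbounded and checkin_bounded are hypothetical names, and the pool is just a map holding a max and a list of free sockets. It only shows why a size check at check-in time is needed to keep the free list within max.

```elixir
defmodule ToyPool do
  # Hypothetical model for illustration only, not hackney's implementation.

  # Check-in without a size check: every returned socket is kept.
  def checkin_unbounded(pool, socket) do
    %{pool | free: [socket | pool.free]}
  end

  # Check-in with a size check: the socket is kept only while the
  # free list is still below the configured maximum.
  def checkin_bounded(pool, socket) do
    if length(pool.free) < pool.max do
      %{pool | free: [socket | pool.free]}
    else
      # A real pool would close the socket here instead of keeping it.
      pool
    end
  end
end

pool = %{max: 1, free: []}
sockets = [:example_com, :example_net]

unbounded = Enum.reduce(sockets, pool, fn s, acc -> ToyPool.checkin_unbounded(acc, s) end)
bounded = Enum.reduce(sockets, pool, fn s, acc -> ToyPool.checkin_bounded(acc, s) end)

IO.inspect(length(unbounded.free)) # prints 2, which exceeds max
IO.inspect(length(bounded.free))   # prints 1
```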


benoitc · 2020-09-11T14:39:47Z

Do the requests return to the pool? What did the Response thing return? Does it contain the body?

DmitryKakurin · 2020-09-11T20:04:17Z

@benoitc the code repro above is literally runnable (example.com and .net are real existing domains). Both requests return 200.
Given free_count: 2 I assume yes, connections are returned to the pool.

jdppettit · 2020-10-22T16:37:19Z

I'm running into this problem as well - any updates on where things are on this?

benoitc · 2020-10-22T16:52:03Z

There will be a release this weekend including a fix for it.


sorliem · 2020-12-16T15:54:15Z

I was bit pretty severely by this bug a couple days ago. Had to disable pooling entirely. Is there a release coming out that contains the fix?

benoitc · 2020-12-16T23:10:17Z

> I was bit pretty severely by this bug a couple days ago. Had to disable pooling entirely. Is there a release coming out that contains the fix?

What happened? How did you reproduce the issue?

There is a release landing today that changes the way connections are handled, but the problem above shouldn't trigger anything severe if connections are released correctly to the pool.

sorliem · 2020-12-17T21:49:11Z

I have code in my system using hackney that calls 3rd party services to get data. I saw the :checkout_timeout error explode, added some metrics to monitor the in_use value returned from :hackney_pool.get_stats(:default) and redeployed.

The following screenshot is a new k8s deploy of 10 pods each with 1 default hackney pool running:

The lines going steadily up are each pod reporting its in_use connections, and all are rising to the 200 max connections I configured. Shortly after this time the checkout_timeout errors started showing up.

benoitc · 2020-12-17T21:51:48Z

> I have code in my system using hackney that calls 3rd party services to get data. I saw the :checkout_timeout error explode, added some metrics to monitor the in_use value returned from :hackney_pool.get_stats(:default) and redeployed.

> The following screenshot is that new k8s deploy of 10 pods each with 1 default hackney pool running:

> The lines going steadily up is each pod reporting the in_use connections and all are rising to the 200 max connections I configured. Shortly after this time the connect_timeout errors started showing up.

Does your code always make sure to read the body? That said, I am working on replacing the pool. The code will be ready for consumption tomorrow. It took more time than expected...

dbhobbs · 2020-12-17T22:13:32Z

Is reading the body required? I don't see that mentioned in any of the documentation. 🤔

benoitc · 2020-12-17T23:10:43Z

> Is reading the body required? I don't see that mentioned in any of the documentation. 🤔

Yes, to release the socket you need to either read the body (which is always done when using the with_body option in the request) or skip it using the skip_body function or similar. Closing the request also works. If a response is not read completely and the process is still active, hackney currently considers it an active session.
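
For readers calling hackney directly from Elixir, the three options above might look roughly like this. It is a sketch assuming the hackney 1.x API (:hackney.request/5, :hackney.skip_body/1, :hackney.close/1); the module and function names are hypothetical, and return shapes should be checked against the hackney version in use.

```elixir
defmodule ReleaseSketch do
  # Hypothetical helper module; only the :hackney calls belong to the library.

  # 1. with_body: hackney reads the whole body before returning.
  def fetch_with_body(url) do
    case :hackney.request(:get, url, [], "", with_body: true) do
      {:ok, status, _headers, body} -> {:ok, status, body}
      {:error, reason} -> {:error, reason}
    end
  end

  # 2. Without with_body, a client reference comes back instead of the body.
  #    Skipping the body discards it without returning it to the caller.
  def status_skipping_body(url) do
    case :hackney.request(:get, url, [], "", []) do
      {:ok, status, _headers, ref} ->
        _ = :hackney.skip_body(ref)
        {:ok, status}

      {:error, reason} ->
        {:error, reason}
    end
  end

  # 3. Closing the request ends it explicitly.
  def status_then_close(url) do
    case :hackney.request(:get, url, [], "", []) do
      {:ok, status, _headers, ref} ->
        _ = :hackney.close(ref)
        {:ok, status}

      {:error, reason} ->
        {:error, reason}
    end
  end
end
```

The risky pattern is the second or third function without the skip_body or close call: the status is used, the reference is dropped, and the connection stays counted as in use for as long as the calling process lives.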

benoitc · 2020-12-19T11:26:48Z

That should indeed be mentioned in the docs...

sorliem · 2020-12-19T17:39:58Z

Oh, ok. We may have some locations where we are not reading the body. Would you be able to point me to the right location in the documentation that says that?

benoitc · 2020-12-26T21:42:18Z

> Oh, ok. We may have some locations where we are not reading the body. Would you be able to point me to the right location in the documentation that says that?

That's not written explicitly in the docs, I think. A connection is removed from the pool automatically if the process is closed before releasing it or if the whole body has been read; otherwise (logically, I would say) there is no way to know if the connection has to be released.

alexgleason · 2021-04-30T16:33:30Z

https://github.com/benoitc/hackney/blob/master/NEWS.md

1.17.0

fix memory leak in connection pool

Is this fixed now?

We downgraded hackney to 1.15.2 and are wondering if it's safe to upgrade again: https://git.pleroma.social/pleroma/pleroma/-/issues/2101

benoitc · 2021-04-30T16:39:08Z

Watch for the release this weekend. Hackney will be bumped to 2.0. This will also be announced next week :)


kuznetskikh · 2021-11-20T06:46:52Z

Hello @benoitc!
Any update on this issue?

I'm facing it on httpoison 1.8.0 and hackney 1.18.0.
Hackney still doesn't respect max_connections as in @DmitryKakurin's example.

benoitc · 2021-11-22T09:33:50Z

@kuznetskikh does httpoison read the body? The body needs to be read or skipped to release the connection.

kuznetskikh · 2021-12-09T07:34:27Z

Hello @benoitc. Sorry for the late response, it's working fine for me now.
I had simply misconfigured the hackney pool at application start. Now all is fine, and max_connections takes effect using:

:hackney_pool.child_spec(:main_api_pool, timeout: 15_000, max_connections: 1)

And only 1 connection is available.
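
For anyone hitting the same misconfiguration, a minimal sketch of where that child spec could live. MyApp.Application and MyApp.Supervisor are hypothetical names; only the pool name and options come from the line above.

```elixir
defmodule MyApp.Application do
  # Hypothetical application module, shown only to place the child spec.
  use Application

  @impl true
  def start(_type, _args) do
    children = [
      # Starts the named pool under the application's supervisor.
      :hackney_pool.child_spec(:main_api_pool, timeout: 15_000, max_connections: 1)
    ]

    Supervisor.start_link(children, strategy: :one_for_one, name: MyApp.Supervisor)
  end
end

# Requests have to name the pool to use it. As far as I know, requests
# that do not pass a pool option go through hackney's :default pool:
#
#   HTTPoison.get("https://example.com", [], hackney: [pool: :main_api_pool])
```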

Thank you!

benoitc assigned benoitc and DmitryKakurin Sep 11, 2020

benoitc added the need fedback label Sep 11, 2020

benoitc added working on it and removed need fedback labels Oct 23, 2020

losvedir mentioned this issue Mar 19, 2021

feat: Engine.HealthCheck tests network connectivity mbta/realtime_signs#431

Merged

4 tasks

benoitc added not confirmed and removed working on it labels May 20, 2021
