DeflatePerMessage VS memory consumption #1900
Replies: 18 comments
-
@M1ha-Shvn I still don't have a clear view on how to solve the issue with the …
-
Also, do you have an MRE for me to confirm the issue?
-
What I suggest is one of the following:
…
-
I haven't tested this PR; I made it quickly in order to show the general idea.
-
@aaugustin I'm sorry to bother you, but when you have time, would you mind giving me your input here, so we can tell whether we are going in the right direction? 🙏
-
If you're using context takeover (i.e. you don't set …), then each connection has to keep its own compression context, so per-connection memory usage is expected. If you aren't using context takeover, then you don't have a memory usage problem.
-
Apart from that, is there a knob to configure max_window_bits? With a very high number of connections, you would probably benefit from lowering it. See https://websockets.readthedocs.io/en/stable/topics/compression.html#compression-settings for defaults.
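For context, the compression-settings page linked above documents a memory-optimized server configuration along these lines (a sketch; the exact parameter values are illustrative, not a recommendation):

```python
from websockets.extensions import permessage_deflate

# Lower max_window_bits and memLevel to shrink the zlib state kept per
# connection. Values below trade compression ratio for memory.
extensions = [
    permessage_deflate.ServerPerMessageDeflateFactory(
        server_max_window_bits=11,  # 2 KB compression window
        client_max_window_bits=11,
        compress_settings={"memLevel": 4},
    )
]

# The factory list would then be passed to the server, e.g.:
# websockets.serve(handler, host, port, extensions=extensions)
```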
-
Hi, thanks for your answer.
First of all, I understand that I've changed this behaviour. But what I don't understand is: why is a deflate object created per connection? Why can't the deflate context be shared between all connections? From my point of view, in a typical websocket server you have lots of connections with clients and send very similar (particularly JSON) messages to all clients, i.e. to all connections. So it would be worth using the same deflate object with a single context and increasing its capacity, so that it could compress messages better. Of course, this depends on the connection settings set by request headers, but there is only a small number of combinations of these parameters, so only a limited number of objects would need to be created and used.
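To illustrate what a retained (context-takeover) deflate context buys over per-message contexts, here is a minimal zlib sketch; the message content is made up:

```python
import zlib

# A repetitive JSON-like payload; typical websocket traffic looks similar.
MSG = b'{"type": "update", "payload": {"count": 42, "status": "ok"}}' * 4

def compress_message(compressor, data):
    # permessage-deflate emits each message up to a Z_SYNC_FLUSH boundary.
    return compressor.compress(data) + compressor.flush(zlib.Z_SYNC_FLUSH)

# Context takeover: one compressor per connection, reused across messages.
shared = zlib.compressobj(wbits=-12)  # raw deflate, 4 KB window
first = compress_message(shared, MSG)
second = compress_message(shared, MSG)

# No context takeover: a fresh compressor (and window) for every message.
fresh = compress_message(zlib.compressobj(wbits=-12), MSG)

# The second message on the shared context back-references the first,
# so it compresses far better than a cold start.
assert len(second) < len(fresh)
```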
I'm not so sure about that. Creating an object in Python has its memory cost. Even if I don't use context takeover, PerMessageDeflate objects would still be created and use memory for each connection.
Yes, adding deflate tuning settings to uvicorn's settings was one of my proposals. Though it would lessen the problem for me, it would not solve it: I'd just have a higher connection limit while still consuming lots of memory per connection.
-
Kludex asked for my input; I gave it. If you don't trust me, run your own experiments and reach your own conclusions. In your experiments, don't stop at opening connections; exchange a significant number of different messages on each connection, in both directions, and make sure they make it through the compress/decompress cycle correctly.
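A minimal sketch of such a round-trip check at the zlib level (real permessage-deflate framing details are omitted):

```python
import zlib

def roundtrip(messages, wbits=-12):
    """Push each message through a shared compress/decompress pair,
    as a single connection with context takeover would."""
    compressor = zlib.compressobj(wbits=wbits)
    decompressor = zlib.decompressobj(wbits=wbits)
    out = []
    for message in messages:
        frame = compressor.compress(message) + compressor.flush(zlib.Z_SYNC_FLUSH)
        out.append(decompressor.decompress(frame))
    return out

# Many different messages, all of which must survive the cycle intact.
messages = [b'{"n": %d}' % n for n in range(100)]
assert roundtrip(messages) == messages
```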
-
In case it helps:
…
-
Thanks @aaugustin! I really appreciate it. 🙏
-
Did you try this, @M1ha-Shvn?
-
But it's not really possible to directly disable context takeover from uvicorn, correct? I'm also hitting this issue.
-
@Kludex @aaugustin In my use case, most of my request and response data is small JSON, but sporadically there are large chunks, which makes compression beneficial. However, while those connections are long running, the activity pattern per connection is bursty, so the current settings are not memory efficient.

**1. Context takeover**

The first step would be to expose …. It would be best to keep the default at …. Then we can expose those parameters by:
…
There would only be one toggle to set both the client and server context, because if your server is memory starved from connection quantity, you wouldn't care to enable context takeover for just one direction. This would allow both current implementations to discard their zlib ….

**2. Max window bits**

The next step would be to expose the …. Websockets says they set theirs by default at 12 bits (https://websockets.readthedocs.io/en/stable/topics/compression.html#for-servers), so a 4 kB max window, while wsproto sets theirs to 15 bits (https://github.com/python-hyper/wsproto/blob/main/src/wsproto/extensions.py#L65), so a 32 kB max window, which is quite large when you have thousands of connections. We should be able to set both the client and server max window bits, because each server connection requires them for decompressing input and compressing output. I would suggest the …. While both websockets and wsproto use zlib's ….

**3. CPU performance**

Currently, having deflate on forces context takeover. There is a permanent …. This is probably not good: 93 kB allocated and deallocated per message received, before window and strategy lookup dictionaries are filled. It would be good if websockets and wsproto used the …. The same cannot be done with decompression, however: wsproto simply calls …. Unfortunately, the objectless ….

**4. zlib note**

Unrelated to websockets and uvicorn, but it would be nice to have the …
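To make the max-window-bits trade-off discussed above concrete, here is a small zlib sketch (raw deflate, made-up data) showing that a small window cannot back-reference data that a large window can:

```python
import random
import zlib

# An incompressible 4 KB chunk repeated 12 KB apart: only a window large
# enough to reach back to the first copy can encode the second as a match.
chunk = random.Random(42).randbytes(4096)
data = chunk + b"x" * 12288 + chunk

def compressed_size(wbits):
    compressor = zlib.compressobj(wbits=wbits)  # negative wbits = raw deflate
    return len(compressor.compress(data) + compressor.flush())

small_window = compressed_size(-9)   # 512 B window
large_window = compressed_size(-15)  # 32 KB window

# The 32 KB window turns the second chunk into back-references,
# so its output is roughly one chunk smaller.
assert large_window < small_window
```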
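On the decompression point in section 3: with context takeover, each received frame ends at a Z_SYNC_FLUSH boundary rather than a finished deflate stream, so the one-shot zlib.decompress cannot replace a persistent decompressobj. A sketch:

```python
import zlib

message = b'{"status": "ok"}' * 8
compressor = zlib.compressobj(wbits=-15)
frame = compressor.compress(message) + compressor.flush(zlib.Z_SYNC_FLUSH)

# A streaming decompressor handles the sync-flushed frame fine...
decompressor = zlib.decompressobj(wbits=-15)
assert decompressor.decompress(frame) == message

# ...but the one-shot function requires a finished stream and
# rejects the frame as incomplete.
try:
    zlib.decompress(frame, -15)
except zlib.error:
    pass
else:
    raise AssertionError("expected zlib.error for an unterminated stream")
```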
-
Hi @reportingissue @Kludex @aaugustin, I ran into a similar memory leak issue that may be resolved by disabling per-message deflate or enabling no-context-takeover. However, do you have a recommended near-term fix to cleanly enable no-context-takeover?
-
Discussed in #1850
Originally posted by M1ha-Shvn January 27, 2023
Hi.
I'm developing a server serving websocket connections using FastAPI.
I've noticed that creating several thousand simultaneous websocket connections leads to high memory usage (4 GB per 5-8k websocket connections in my case). I started debugging it with tracemalloc and found out that the largest amount of this memory is consumed by the websockets deflate extension in this line.
After that, I dug into the websockets deflate mechanics and found out that it can be tuned in order to achieve lower memory consumption using a custom `ServerPerMessageDeflateFactory`. I tried searching for it in FastAPI => Starlette => Uvicorn code and it led me here.

What is the source of the memory leak:

- A `PerMessageDeflate` instance is created for each websocket connection (using `ServerPerMessageDeflateFactory`). From my point of view, this is a disadvantageous behaviour: it would be much better if it were created not for each websocket connection, but for each combination of connection parameters (like the Singleton pattern, but one instance per combination of parameters; something like `lru_cache`).
- The only exposed setting is `--ws-per-message-deflate`. It is not flexible for different cases.
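A minimal sketch of the kind of tracemalloc measurement described above (the allocation measured here is a generic stand-in, not the actual websockets line):

```python
import tracemalloc

tracemalloc.start()

# Stand-in for opening many connections, each holding buffered state.
connections = [bytearray(64 * 1024) for _ in range(100)]

snapshot = tracemalloc.take_snapshot()
top = snapshot.statistics("lineno")[0]
print(top)  # the bytearray line above dominates the snapshot (~6.4 MB)
```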