
Duplicate of 812. Memory increase after upgrade to 3.4.9 #846

Closed
rdc-Green opened this issue Jun 22, 2022 · 11 comments

Comments

@rdc-Green

I have the same issue. 3.4.9 memory consumption grows to about 12GB over 6-7 days and then crashes.

It is on my server:
Linux 5.13.0-1025-azure on x86_64
Intel(R) Xeon(R) CPU E5-2673 v4 @ 2.30GHz, 4 cores
Virtualmin version 7.1-1
Real Memory 15GB

java -version
openjdk version "11.0.15" 2022-04-19
OpenJDK Runtime Environment (build 11.0.15+10-Ubuntu-0ubuntu0.20.04.1)
OpenJDK 64-Bit Server VM (build 11.0.15+10-Ubuntu-0ubuntu0.20.04.1, mixed mode, sharing)

I have about 20 charge points running, mostly Alfen with one Phihong.

After downgrading to an earlier version of SteVe (3.4.6), it is now working properly (average memory 650 MB).

@goekay
Member

goekay commented Jun 25, 2022

#812

@rdc-Green
Author

Hi @goekay, I appreciate this is a duplicate, but I didn't think #812 had been resolved. Since it is closed, I assumed the comment I added there wouldn't be responded to, so I started a new ticket.

@goekay
Member

goekay commented Jun 27, 2022

It's all good. I just referenced it for the sake of completeness.

In terms of the problem at hand, I don't have anything to add. I cannot reproduce it (as I showed in my load tests), and unfortunately I don't know how to help.

@rdc-Green
Author

OK, thanks for clarifying.

I'll see if we can find anything on our side and report back.

@shadsnz

shadsnz commented Jul 20, 2022

@rdc-Green I was wondering if you happened to make any changes to your database instance just prior to seeing this behaviour, particularly to session timeouts?

@rdc-Green
Author

Hi @shadsnz we haven't made any changes to the database instance.

@jpires
Contributor

jpires commented Feb 3, 2023

Hi, I think I have found the issue.

I started to see this issue after updating from 3.4.3 to 3.5.0.
It was crashing in the same situation, within a few minutes of starting up.
It's an AWS instance with 0.5 vCPU and 1 GB of memory.

Initially I thought it would be because of the database (the MeterValues table is around 78 GB), but it wasn't.

Then I started to look at the number of messages per second.
In my setup, MeterValues are configured to be sent every second.
And when there were around 20 transactions running at the same time, the instance would crash.

Also, the zig-zag pattern on the heap happens much more frequently on the new versions than on 3.4.3, even with only one message per second.
The heap would grow by 50 MB in a few seconds, and then the GC would release it.
On 3.4.3 the same pattern happens every 5 minutes or so.

With this, I started to bisect the git history to find where the new behaviour was introduced, which turned out to be commit 7a41a3f.
Essentially, the migration to Jetty 10 was the cause.

With a bit more digging, I noticed that the buffer pools on the JVM grew by 8 MB every time a message was received.
So it seems that the new Jetty version allocates 8 MB of memory for each received message.
This memory is then released a few seconds later when the GC is triggered.
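
The growth is easy to see without a full profiler. Below is a minimal standalone sketch (the class name and the one-second polling interval are arbitrary, not part of SteVe) that reads the standard java.lang.management MXBeans; running something like this in the same JVM, or watching the same beans via JMX/VisualVM, shows both the heap sawtooth and the direct buffer growth:

import java.lang.management.BufferPoolMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.util.List;

public class BufferPoolWatcher {
    public static void main(String[] args) throws InterruptedException {
        MemoryMXBean heap = ManagementFactory.getMemoryMXBean();
        List<BufferPoolMXBean> pools =
                ManagementFactory.getPlatformMXBeans(BufferPoolMXBean.class);

        while (true) {
            long heapUsedMb = heap.getHeapMemoryUsage().getUsed() / (1024 * 1024);
            StringBuilder line = new StringBuilder("heap used: " + heapUsedMb + " MB");
            for (BufferPoolMXBean pool : pools) {
                // the "direct" pool is the one that grew by ~8 MB per received message
                line.append(String.format(" | %s: %d buffers, %d MB",
                        pool.getName(), pool.getCount(),
                        pool.getMemoryUsed() / (1024 * 1024)));
            }
            System.out.println(line);
            Thread.sleep(1000L);
        }
    }
}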

These two pictures show this behaviour in action:
[screenshot: heap / CPU with max size configured]
[screenshot: direct buffers with max size configured]

After some tries, I applied this patch on top of 3.5.0 (MAX_MSG_SIZE is 8 MB):

diff --git a/src/main/java/de/rwth/idsg/steve/SteveAppContext.java b/src/main/java/de/rwth/idsg/steve/SteveAppContext.java
index 6c86c215..bedb9cf4 100644
--- a/src/main/java/de/rwth/idsg/steve/SteveAppContext.java
+++ b/src/main/java/de/rwth/idsg/steve/SteveAppContext.java
@@ -80,9 +80,9 @@ public class SteveAppContext {
      */
     public void configureWebSocket() {
         JettyWebSocketServerContainer container = JettyWebSocketServerContainer.getContainer(webAppContext.getServletContext());
-        container.setInputBufferSize(MAX_MSG_SIZE);
-        container.setOutputBufferSize(MAX_MSG_SIZE);
-        container.setMaxTextMessageSize(MAX_MSG_SIZE);
+        //container.setInputBufferSize(MAX_MSG_SIZE);
+        //container.setOutputBufferSize(MAX_MSG_SIZE);
+        //container.setMaxTextMessageSize(MAX_MSG_SIZE);
         container.setIdleTimeout(IDLE_TIMEOUT);
     }

This seems to bring back the old behaviour.
But I am not familiar with Jetty, and from the little reading I did today, I don't fully understand those settings or why Jetty is not reusing the allocated memory.
I will do a bit more reading on Monday, but if you have any ideas, let me know.

P.S. I am currently running a test with 120 simulated EVSEs, reporting MeterValues every second.
It will run the whole weekend to see how it goes.
But so far, it seems good.
The same two graphs as before, for the last 15 minutes:
[screenshot: heap / CPU]
[screenshot: direct buffers]

@jpires
Contributor

jpires commented Feb 6, 2023

The weekend test went well, and the memory was stable for the whole weekend.
It seems to me that it's behaving in the same way as before.

After reading a bit more about this: the default value for MaxTextMessageSize is 65k, and if a message is bigger than that, it will be rejected.
Off the top of my head, this doesn't seem to be a big issue, as OCPP messages tend to be quite small.
I think MeterValues.req and DataTransfer are the only ones that can have arbitrary sizes.
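
As a rough illustration (the payload below is a made-up OCPP-J 1.6 MeterValues.req with a single sampled value, not captured from a real station), a typical frame is only a few hundred bytes, so 65k leaves plenty of headroom unless a station packs a very large number of sampled values into one call:

import java.nio.charset.StandardCharsets;

public class MeterValuesSizeCheck {
    public static void main(String[] args) {
        // OCPP-J call frame: [MessageTypeId, UniqueId, Action, Payload]
        String meterValuesReq = "[2,\"27f9\",\"MeterValues\","
                + "{\"connectorId\":1,\"transactionId\":12345,"
                + "\"meterValue\":[{\"timestamp\":\"2023-02-06T10:00:00Z\","
                + "\"sampledValue\":[{\"value\":\"1534.20\","
                + "\"measurand\":\"Energy.Active.Import.Register\",\"unit\":\"Wh\"}]}]}]";

        int bytes = meterValuesReq.getBytes(StandardCharsets.UTF_8).length;
        System.out.println("MeterValues.req frame size: " + bytes + " bytes");
    }
}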

Does anyone know of a situation where this 65k limit would be a problem?

P.S. Keeping the current MaxTextMessageSize doesn't seem to affect the allocation, and the memory is still stable.

@goekay
Member

goekay commented Feb 11, 2023

Hey @jpires, this is amazing detective work! Thanks for your diagnosis!

Does anyone know of a situation where this 65k limit would be a problem?

I remember having issues with one of the messages you pointed out (with one particular station), which is why we increased the default.

--

It is weird that we had this config for a long time without it ever causing issues, only for it to become a problem after the Jetty 10 migration. My changes in 7a41a3f just use the new API, so they are purely syntactical in nature. The defaults back then were 65K as well, so this alone should not have changed the behaviour.

I assume that during the refactoring of the websocket code (which is what happened in Jetty 10) some behaviour changed or a regression was introduced. The regression must have been flying under the radar with the small size (65K), while a greater value (8 MB) accentuates it. There are a couple of Jetty issues that hint at similar memory problems, for example jetty/jetty.project#6696 and jetty/jetty.project#6328.

I get a "rabbit hole" vibe from these Jetty discussions and would therefore rather be pragmatic: remove our special settings and fall back to the Jetty defaults (as your git diff already does). I would like your valuable work to become part of this project as an official contribution under your name. Can you please make a PR? I will merge it ASAP.

@jpires
Contributor

jpires commented Feb 15, 2023

I have created a pull request with the changes.
I have only removed the calls to setInputBufferSize and setOutputBufferSize.
This seems to solve the issue, while we are still able to receive messages bigger than 64k.
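
For reference, this is roughly what configureWebSocket in SteveAppContext looks like with only those two calls removed (a sketch based on the diff above, assuming the rest of the method is unchanged):

public void configureWebSocket() {
    JettyWebSocketServerContainer container =
            JettyWebSocketServerContainer.getContainer(webAppContext.getServletContext());
    // input/output buffer sizes now fall back to Jetty's defaults;
    // the explicit text message limit and idle timeout are kept
    container.setMaxTextMessageSize(MAX_MSG_SIZE);
    container.setIdleTimeout(IDLE_TIMEOUT);
}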

It is weird that we had this config for a long time without it ever causing issues, only for it to become a problem after the Jetty 10 migration. My changes in 7a41a3f just use the new API, so they are purely syntactical in nature. The defaults back then were 65K as well, so this alone should not have changed the behaviour.

It seems to me that before that commit, setInputBufferSize and setOutputBufferSize were never called, so these two buffers were still at the 4K default.
Looking at the changes in that commit, they don't show up in the deleted lines either.

I did some (very brief) digging in the Jetty code, and those two settings control the size of the buffer used to perform the operations on the "socket": Jetty allocates the buffer, reads from the socket, copies the read data into a different buffer, and then deallocates the first one.
Anyway, the size of this buffer doesn't really matter, because the read operation continues until there is no more data on the socket; if a message is bigger than 4K, the read is simply repeated.
I guess increasing the size of these buffers would help if there were a lot of messages bigger than 4K, but I don't think that is the case with OCPP.

I would say we can probably close this ticket after the pull request is merged.
But it would be great to hear from the other people who experienced this issue.

goekay added a commit that referenced this issue Feb 16, 2023
Use the default values for input/output buffers #846
@goekay
Member

goekay commented Feb 16, 2023

Fixed by #1058.
