High cpu load #896

Open
Adeptius opened this issue May 24, 2019 · 15 comments · May be fixed by #1253

Comments

@Adeptius
Contributor

Adeptius commented May 24, 2019

From time to time, the CPU load sharply rises to 100% of one core, and sometimes it drops back to 3% for no apparent reason. JProfiler shows the load in selector.select().
If I pause the thread in the debugger on the line shown in the screenshot, the load drops to 3%.

To Reproduce
Steps to reproduce the behavior:
I really don't know how to reproduce it. Sometimes it rises to 100% with 50 connections, but at other times it stays idle with 300.
It doesn't even go down after closing all connections in the debugger... (last screenshot)

Example application to reproduce the issue
I can't share my project :(

Debug log
Is it really needed? There are so many logs :)

Environment (please complete the following information):

  • Version used: 1.4.0
  • Java version:
    openjdk version "1.8.0_151"
    OpenJDK Runtime Environment (build 1.8.0_151-b12)
    OpenJDK 64-Bit Server VM (build 25.151-b12, mixed mode)
  • Operating System and version: CentOS

Additional context
Add any other context about the problem here.
[Screenshots: 2019-05-24_20-38-14, 2019-05-24_20-39-00, 2019-05-24_20-40-50, 2019-05-24_20-45-34, and one untitled image]

@Adeptius
Contributor Author

[Screenshot]

@marci4
Collaborator

marci4 commented Jun 11, 2019

Anyone is more than welcome to investigate this issue.
Not sure what is happening and why. And without any source code it is even harder to reproduce.

Best regards,
Marcel

@Adeptius
Contributor Author

Thanks for the answer. I understand. The issue is hard to reproduce and may even be a bug in the selector itself. However, I found a minor workaround: I put Thread.sleep(1) before selector.select() to prevent millions of loop iterations per second. The load is now even lower than usual, and a few additional milliseconds of latency do not matter for me.
That's not really a proper fix, but it is good enough in my case :)
I can close this ticket, but it's your decision.
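A minimal sketch of the Thread.sleep(1) workaround described above, assuming a plain java.nio Selector loop; ThrottledSelectLoop and its parameters are hypothetical and not part of Java-WebSocket. To make the spin reproducible without a network, the demo calls Selector.wakeup() before each select(), which makes select() return immediately with zero keys, as a spinning selector would.

```java
import java.io.IOException;
import java.nio.channels.Selector;

final class ThrottledSelectLoop {
    // Runs `passes` iterations of a select loop with a short sleep before each select(),
    // as in the workaround above. Returns how many passes selected zero keys.
    static int run(Selector selector, int passes, long pauseMillis)
            throws IOException, InterruptedException {
        int emptySelects = 0;
        for (int pass = 0; pass < passes; pass++) {
            // The workaround: with pauseMillis = 1 a spinning loop is capped at roughly
            // 1000 iterations per second, at the cost of up to pauseMillis extra latency.
            Thread.sleep(pauseMillis);
            // Demo only: wakeup() makes the next select() return immediately,
            // imitating a selector that keeps returning without any ready keys.
            selector.wakeup();
            if (selector.select() == 0) {
                emptySelects++;
            }
            // A real server would iterate and clear selector.selectedKeys() here.
        }
        return emptySelects;
    }
}
```

With pauseMillis = 1, 20 passes take at least about 20 ms instead of a few microseconds. That throttling is the whole effect of the workaround; it does not explain why select() keeps returning.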

@forseth11

I'd just like to say that I have run into exactly the same problem. I cannot tell what is causing it in my case at all. I also see 100% CPU usage in the WebSocketServer.

@doyledavidson
Contributor

We are also seeing rare but prolonged periods of 100% CPU load.

I have not used NIO selectors myself, but all of the examples I have seen remove each key from the selected-key set while iterating over it.

Example: http://tutorials.jenkov.com/java-nio/selectors.html
More info: https://www.oreilly.com/library/view/java-nio/0596002882/ch04.html (see example 4-1)

Perhaps this is the root of the problem?
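A minimal sketch of the rule those tutorials rely on: the selector adds ready keys to selectedKeys() but never removes them; only remove() on the set or on its iterator does. SelectedKeysDemo is a hypothetical name, and a Pipe stands in for a socket so the example runs without a network. It shows that an unremoved key stays in the set; whether that is the cause of #896 is not confirmed.

```java
import java.io.IOException;
import java.nio.channels.Pipe;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.util.Iterator;

final class SelectedKeysDemo {
    // Performs one non-blocking selection on a pipe's sink registered for OP_WRITE,
    // optionally removing keys while iterating, and returns how many keys remain selected.
    static int keysLeftAfterSelect(boolean removeKeys) throws IOException {
        Pipe pipe = Pipe.open();
        try (Selector selector = Selector.open()) {
            pipe.sink().configureBlocking(false);
            // An empty pipe's sink is writable, so its key is ready on every selection.
            pipe.sink().register(selector, SelectionKey.OP_WRITE);
            selector.selectNow();
            Iterator<SelectionKey> it = selector.selectedKeys().iterator();
            while (it.hasNext()) {
                SelectionKey key = it.next();
                if (removeKeys) {
                    it.remove(); // the selector never removes keys from this set itself
                }
                if (key.isWritable()) {
                    // a real server would write pending data here
                }
            }
            return selector.selectedKeys().size();
        } finally {
            pipe.sink().close();
            pipe.source().close();
        }
    }
}
```

keysLeftAfterSelect(false) leaves the key in the set, while keysLeftAfterSelect(true) empties it.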

@doyledavidson
Contributor

Investigating this area further: in WebSocketServer, why not always remove the key immediately after calling i.next()?

while ( i.hasNext() ) {
    key = i.next();
    i.remove(); // NEW LINE - always remove the key from the set via the iterator
    ...
}

Then stop passing the iterator into doAccept() and doRead(), which currently call "i.remove()" themselves.

Also notice that there is no "i.remove()" in the key.isWritable() case. Could that be the problem, i.e. the loop gets stuck on isWritable()?
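A sketch of the restructuring proposed above, assuming a single-threaded selector loop. The Handler interface and its onAccept/onRead/onWrite methods are hypothetical stand-ins, not the actual WebSocketServer methods. Every key is removed right after next(), and validity is re-checked because a read handler may close the channel.

```java
import java.io.IOException;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.util.Iterator;

final class RemoveFirstDispatcher {
    // Hypothetical callbacks for this sketch; not Java-WebSocket's doAccept()/doRead() signatures.
    interface Handler {
        void onAccept(SelectionKey key) throws IOException;
        void onRead(SelectionKey key) throws IOException;
        void onWrite(SelectionKey key) throws IOException;
    }

    // Handles every key in the selected-key set once and returns how many were valid.
    static int dispatchOnce(Selector selector, Handler handler) throws IOException {
        int handled = 0;
        Iterator<SelectionKey> i = selector.selectedKeys().iterator();
        while (i.hasNext()) {
            SelectionKey key = i.next();
            i.remove(); // always remove first, so no branch below can forget to
            if (!key.isValid()) {
                continue; // key was cancelled or its channel closed after selection
            }
            if (key.isAcceptable()) {
                handler.onAccept(key);
            } else {
                if (key.isReadable()) {
                    handler.onRead(key);
                }
                // onRead may close the channel, so re-check validity before the write path
                if (key.isValid() && key.isWritable()) {
                    handler.onWrite(key);
                }
            }
            handled++;
        }
        return handled;
    }
}
```

Because removal happens in one place, handlers no longer need the iterator, and the writable-only path cannot leave a key behind in the selected-key set.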

If we learn more, I will add info here.

@fbasar

fbasar commented Sep 11, 2021

This error happens every night. While researching, I found that it comes from the WebSocket server: when I stop it, CPU usage returns to normal. I only found the problem today; I will download the source code and examine it.

@Adeptius
Copy link
Contributor Author

The problem is gone for me on Java 16.

@fbasar

fbasar commented Sep 11, 2021

My problem started with SSL. It happens when a client connects to the SSL endpoint but no data is received and the connection is never established.

@sahnjeok

sahnjeok commented Nov 24, 2021

Anyone is more than welcome to investigate this issue. Not sure what is happening and why. And without any source code it is even harder to reproduce.

Best regards, Marcel

Dear Marcel

I'm facing the same CPU overload in Selector.select() within WebSocketServer.run(). It may happen after a network failure.
I found a blog post describing the same overload issue in the Apache MINA library, caused by a JVM bug, and how to avoid it.
Below is a link describing MINA's workaround, which re-registers the channels with a new selector. It's in Korean, but you can follow the code.

Can you check this blog?

https://knight76.tistory.com/entry/Apache-Mina-%EC%82%AC%EB%A1%80%EC%97%90%EC%84%9C-%EB%B3%B8-Selectorselect-%EC%9D%B4%EC%8A%88-cpu-100-%ED%8A%80%EB%8A%94-%ED%98%84%EC%83%81

Best regards, JungGi
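For reference, a sketch of the selector-rebuild technique used by MINA (the workaround in the blog post) and by Netty for the JDK epoll bug where select() keeps returning 0 keys immediately. SelectorRebuilder, recordSelect and rebuild are hypothetical names. The threshold of 512 mirrors Netty's default and is an assumption here, as is the idea that the server loop would use select(timeout) so early returns can be detected.

```java
import java.io.IOException;
import java.nio.channels.SelectableChannel;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.util.concurrent.TimeUnit;

final class SelectorRebuilder {
    // Assumed threshold; 512 mirrors Netty's default rebuild threshold.
    static final int SPIN_THRESHOLD = 512;

    private int prematureReturns = 0;

    // Call after each select(timeoutMillis). Returns true (and resets the counter) once
    // select() has returned zero keys before its timeout SPIN_THRESHOLD times in a row.
    boolean recordSelect(int readyKeys, long elapsedNanos, long timeoutMillis) {
        boolean premature = readyKeys == 0
                && elapsedNanos < TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        prematureReturns = premature ? prematureReturns + 1 : 0;
        if (prematureReturns >= SPIN_THRESHOLD) {
            prematureReturns = 0;
            return true;
        }
        return false;
    }

    // Re-registers every valid key on a fresh selector, keeping interest ops and
    // attachments, then closes the old selector. Must run on the selector thread.
    static Selector rebuild(Selector oldSelector) throws IOException {
        Selector newSelector = Selector.open();
        for (SelectionKey key : oldSelector.keys()) {
            if (!key.isValid()) {
                continue;
            }
            SelectableChannel channel = key.channel();
            int ops = key.interestOps();
            Object attachment = key.attachment();
            channel.register(newSelector, ops, attachment);
            key.cancel();
        }
        oldSelector.close();
        return newSelector;
    }
}
```

A caller would time each select(timeoutMillis), pass the result to recordSelect, and on true replace its selector with rebuild(selector). Explicit wakeup() calls also produce early empty returns, so a real implementation would need to discount those.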

@marci4
Collaborator

marci4 commented Nov 24, 2021

@sahnjeok feel free to open a PR

@sahnjeok

@sahnjeok feel free to open a PR

What do you mean by 'PR'? ^^;;

@marci4
Collaborator

marci4 commented Nov 25, 2021

Pull request

@sahnjeok

sahnjeok commented Dec 7, 2021

Pull request

Dear Marcel

I found that the selected-keys iterator sometimes does not remove a key.
In my case, this CPU overload happened after a network issue.
Maybe some keys remain in the selected-key set forever after a network problem.

I think the key should always be removed right after the .next() call.

Best regards, JungGi

sahnjeok added a commit to sahnjeok/Java-WebSocket that referenced this issue Jun 17, 2022
'handshakeStartTime' long variable is added and the isHandShakeComplete() function is updated for TooTallNate#896.

If the wss handshake is not completed within 10 s, close the channel to prevent CPU overload or unexpected channel errors. See TooTallNate#896.
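A sketch of the idea in the commit message above: remember when a connection was accepted and close it if the handshake has not completed within 10 s, which would also cover the case fbasar describes of an SSL client that connects but never sends data. HandshakeTimeout, markHandshakeComplete and shouldClose are hypothetical names, not the PR's code; time is passed in as System.nanoTime() values so the logic can be tested without waiting.

```java
import java.util.concurrent.TimeUnit;

final class HandshakeTimeout {
    // 10 s limit taken from the commit message; the field name handshakeStartTime echoes it,
    // but this class and its methods are hypothetical, not the PR's actual code.
    static final long LIMIT_NANOS = TimeUnit.SECONDS.toNanos(10);

    private final long handshakeStartTime; // System.nanoTime() when the connection was accepted
    private boolean handshakeComplete = false;

    HandshakeTimeout(long handshakeStartTime) {
        this.handshakeStartTime = handshakeStartTime;
    }

    void markHandshakeComplete() {
        handshakeComplete = true;
    }

    // True when the caller should close the channel: the handshake is still pending
    // after the limit, e.g. a client that connected but never sent any data.
    boolean shouldClose(long nowNanos) {
        return !handshakeComplete && nowNanos - handshakeStartTime >= LIMIT_NANOS;
    }
}
```

A server would check shouldClose(System.nanoTime()) for each pending connection from its selector loop or a timer, and close the channel when it returns true.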
@sahnjeok sahnjeok linked a pull request Jun 17, 2022 that will close this issue
@bergice

bergice commented Jan 7, 2023

This issue made SSL unusable in production: it froze the server randomly several times per day, and the server could not recover by itself.

I tried several of the suggested fixes, but in the end the solution that worked was simply to use Stunnel.


7 participants