stuck tomcat http threads #1328
Comments
Tomcat probably needs one more thread to run the task that would unblock a blocked connection, but no threads are available, so it stays blocked. Jetty does not suffer from this problem; we encountered it in the past and fixed it a long time ago. |
What would be a good workaround for this? Somehow reserve one extra thread? |
You can use Jetty. Configuring the thread pool with more threads works until you have more connections, at which point you are locked up again. |
Do you mean using a Jetty server instead of a Tomcat server? This might be hard, as my organization is pretty set on using Tomcat. Is there any other option? |
Yes.
None that I know of that gets rid of the problem, if it is what I suspect. You may reduce the probability of occurrence by tuning the thread pool, but you can always end up with this problem: imagine that all connections send a message at exactly the same time; then all threads will be blocked in the send, leaving no thread to wake them up. Have you asked the Tomcat project? |
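To make the scenario above concrete, here is a minimal, container-free sketch of the same starvation pattern. It does not use Tomcat or CometD code; the class name `PoolStarvationDemo`, the method `countStuck` and the timeout are made up for illustration, and a plain fixed-size `ExecutorService` stands in for the HTTP thread pool. Each simulated handler waits for a completion that can only be delivered by another task on the same pool.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class PoolStarvationDemo {

    /** Returns how many simulated handlers did not see their completion within the timeout. */
    public static int countStuck(int poolSize, int connections, long timeoutMillis) throws Exception {
        // Stand-in for the container's HTTP thread pool (an assumption, not Tomcat's executor).
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        CountDownLatch allSubmitted = new CountDownLatch(1);
        List<Future<Boolean>> handlers = new ArrayList<>();
        try {
            for (int i = 0; i < connections; i++) {
                handlers.add(pool.submit(() -> {
                    // Wait until every handler has been submitted, so they all "send at the same time".
                    allSubmitted.await();
                    CompletableFuture<Void> written = new CompletableFuture<>();
                    // The completion needs a pool thread to run, like the write callback in the thread above.
                    pool.execute(() -> written.complete(null));
                    try {
                        // Blocking wait on a pool thread; a timeout is used so the demo terminates.
                        written.get(timeoutMillis, TimeUnit.MILLISECONDS);
                        return true;
                    } catch (TimeoutException e) {
                        return false;
                    }
                }));
            }
            allSubmitted.countDown();
            int stuck = 0;
            for (Future<Boolean> handler : handlers) {
                if (!handler.get()) {
                    stuck++;
                }
            }
            return stuck;
        } finally {
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("4 threads, 4 connections, stuck: " + countStuck(4, 4, 300));
        System.out.println("4 threads, 3 connections, stuck: " + countStuck(4, 3, 300));
    }
}
```

With as many simulated connections as threads, every handler should time out, because the completion tasks sit in the queue behind the blocked handlers. With one connection fewer, the spare thread runs the completions and nothing gets stuck. In the real endpoint there is no timeout, so the wait would never end.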
I will ask the Tomcat project as the next step then. Thank you so much for your help! |
Do you have a pointer to the Jetty fix that you had to make? Do you know if this is an issue specific to CometD 5, or was it present in CometD 3? |
No, it was a series of changes to the Jetty threading model that occurred over time.
CometD 3 was buggy in this respect, since it was exiting the |
I debugged more and now I understand what you meant. The culprit is this piece of code:
The result callback will never get called if all Tomcat threads are busy processing incoming messages, i.e. waiting on promise.get(). So I don't understand how Jetty addresses this problem. The only possible solution is to have sendText call the lambda synchronously, i.e. before returning. Otherwise, it will require a thread to call the lambda. In a situation where all threads are busy waiting on promise.get(), there are no threads available. A possible solution might be one that never calls Promise.get(). Instead, the code should implement an async endpoint interface. Having said that, I don't know if such an interface exists. I'm thinking about doing a private fix by replacing getAsyncRemote with getBasicRemote. Do you think it would work? |
The analysis is correct, but it is not the only cause.
The Jetty threading model never ends up with no threads available to perform critical operations such as unblocking a write or similar.
It will not work. |
I opened a bug for Tomcat on this: https://bz.apache.org/bugzilla/show_bug.cgi?id=66531 |
I've been thinking more about this. What are the semantics of holding onMessage until all tasks are completed? Why should we let the client sit around while we're fanning out 10000 responses? Why do they care if we wrote bytes into 10000 sockets and a few of them failed? Tomcat is a servlet container and we should release Tomcat threads as soon as possible. Is anything bad going to happen if we remove the .get() call from onMessage()? |
The server must apply backpressure to the client because otherwise the client can easily flood the server with messages and the server will not be able to process them. The fundamental problem is that of ordering: releasing backpressure too early will cause message processing on the server to be completely out of order, and that is really bad. Other schemes that maintain order but not backpressure are subject to queueing, and it would be easy for any client to cause an enormous accumulation of messages that will blow up the server. |
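The ordering argument can be sketched without any CometD code. In the hypothetical example below (the class `BackpressureOrderDemo` and its methods are invented for illustration), messages are processed asynchronously on a small worker pool with uneven delays. The first variant waits for each message to complete before accepting the next, which is the backpressure described above. The second variant accepts everything immediately, so all pending work piles up in the executor's queue and the completion order is no longer guaranteed.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class BackpressureOrderDemo {

    // Simulated asynchronous processing of one message; the delay varies per message.
    private static CompletableFuture<Void> processAsync(int msg, ExecutorService workers,
                                                        List<Integer> processed) {
        return CompletableFuture.runAsync(() -> {
            try {
                Thread.sleep((msg % 3) * 5L);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            processed.add(msg);
        }, workers);
    }

    /** Accepts the next message only after the previous one has been fully processed. */
    public static List<Integer> withBackpressure(int messages) throws Exception {
        ExecutorService workers = Executors.newFixedThreadPool(4);
        List<Integer> processed = Collections.synchronizedList(new ArrayList<>());
        try {
            for (int msg = 0; msg < messages; msg++) {
                // Blocking here is the backpressure: the "client" cannot get ahead of the server.
                processAsync(msg, workers, processed).get();
            }
            return processed;
        } finally {
            workers.shutdown();
        }
    }

    /** Accepts all messages at once; work queues up and may complete in any order. */
    public static List<Integer> withoutBackpressure(int messages) {
        ExecutorService workers = Executors.newFixedThreadPool(4);
        List<Integer> processed = Collections.synchronizedList(new ArrayList<>());
        List<CompletableFuture<Void>> pending = new ArrayList<>();
        try {
            for (int msg = 0; msg < messages; msg++) {
                pending.add(processAsync(msg, workers, processed));
            }
            CompletableFuture.allOf(pending.toArray(new CompletableFuture[0])).join();
            return processed;
        } finally {
            workers.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("with backpressure:    " + withBackpressure(12));
        System.out.println("without backpressure: " + withoutBackpressure(12));
    }
}
```

The first list always comes back in send order. The second contains the same messages, but its order depends on scheduling, and its pending work is bounded only by how fast the sender can produce messages.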
This makes sense. Having message processing on the server out of order might not be good. Did CometD 3 have this issue, and was it fixed in CometD 4? My team recently upgraded from v3 to v5 and we discovered this issue. Convincing the entire company to move from Tomcat to Jetty is off the table for now. We have 2 options:
How hard would it be to go back to the v3 behavior? Does removing promise.get() achieve this? BTW, we reproduced the issue using a Spring app. It's available here if you're interested in the repro: https://github.com/maykov/spring-cometd-websocket-test |
Possibly yes. CometD 3's APIs were synchronous (e.g. In CometD 4, the CometD APIs were made asynchronous, so processing order became an issue, and we took the chance to fix both processing order and infinite queueing. Other CometD users use Tomcat in general but Jetty for CometD, so a mixed setup would not be strange. Another option would be to guarantee that the thread pool's threads are never all consumed. |
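As a rough illustration of the last option, the sketch below keeps one pool thread in reserve by gating blocking handlers with a semaphore that has `poolSize - 1` permits. Everything here is hypothetical: `ReservedThreadDemo` and `countStuck` are invented names, and a plain `ExecutorService` again stands in for the container's pool. How to place such a gate in front of a real Tomcat thread pool is not covered here and would need to be checked against the Tomcat documentation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class ReservedThreadDemo {

    /** Returns how many simulated handlers did not see their completion within the timeout. */
    public static int countStuck(int poolSize, int messages, long timeoutMillis) throws Exception {
        if (poolSize < 2) {
            throw new IllegalArgumentException("need at least 2 threads to keep one in reserve");
        }
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        // At most poolSize - 1 handlers may be in the pool at once, so one thread stays free.
        Semaphore blockingPermits = new Semaphore(poolSize - 1);
        List<Future<Boolean>> handlers = new ArrayList<>();
        try {
            for (int i = 0; i < messages; i++) {
                // The dispatcher (this thread, outside the pool) waits for a permit before submitting.
                blockingPermits.acquire();
                handlers.add(pool.submit(() -> {
                    try {
                        CompletableFuture<Void> written = new CompletableFuture<>();
                        // The completion still needs a pool thread, but one is always available now.
                        pool.execute(() -> written.complete(null));
                        written.get(timeoutMillis, TimeUnit.MILLISECONDS);
                        return true;
                    } catch (TimeoutException e) {
                        return false;
                    } finally {
                        blockingPermits.release();
                    }
                }));
            }
            int stuck = 0;
            for (Future<Boolean> handler : handlers) {
                if (!handler.get()) {
                    stuck++;
                }
            }
            return stuck;
        } finally {
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("4 threads, 20 messages, stuck: " + countStuck(4, 20, 500));
    }
}
```

The key design point is that the permit is acquired outside the pool. If a handler acquired it while already running on a pool thread, the waiting handler would itself occupy a thread and the reserve would be lost.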
@maykov - does this Tomcat issue seem similar to your problem? I've had that issue for a very, very long time with Tomcat + CometD. However, with the last couple of Tomcat versions (9.0.70+ let's say) it seems to have disappeared. Which Tomcat version are you using? P.S. I opened a CometD issue before opening one for Tomcat. |
@boris-petrov yes, this looks very similar. My issue is connected to Tomcat running out of threads. If I were able to increase the number of threads to some large number (1000), this would not be a problem. In your case, what is the number of Tomcat threads? Are you able to increase it? |
I haven't changed the default. When I saw the issue, I just downgraded CometD to version 3 and used that for a while. Now, with the problem seemingly gone, I'm using CometD v6. With the latest version of Tomcat 9.0.74 there is a new issue though - hopefully resolved - I'm waiting on 9.0.75 to come out to try it. |
CometD version(s)
5.0.14
Java version & vendor
(use: java -version)
openjdk version "11.0.15" 2022-04-19
OpenJDK Runtime Environment Homebrew (build 11.0.15+0)
OpenJDK 64-Bit Server VM Homebrew (build 11.0.15+0, mixed mode)
Question
Hello!
When I run the same number of WebSocket connections as I have Tomcat threads, all threads get stuck in WebSocketEndPoint::onMessage in completable.get(). Any ideas on what I could be doing wrong? When I run fewer simultaneous connections, the threads do not get stuck.