Sweep mechanism can stop working abruptly #1627

Open
nagarjun-reddy opened this issue Feb 5, 2024 · 8 comments

@nagarjun-reddy

CometD version(s)
5.0.14

Java version & vendor
openjdk version "11.0.21"

Description
We encountered a scenario where the sweep stopped working and did not remove any sessions until the application was restarted. There are no logs suggesting what happened to the sweep or why it was not rescheduled. This looks similar to the earlier bug #960, which was filed when the sweep was not yet asynchronous.

We are on 5.0.14 and wanted to check whether the async sweep needs similar exception handling, or whether there are scenarios where the sweep can exit without rescheduling.

Will add more details as we find them.

@sbordet
Member

sbordet commented Feb 5, 2024

CometD 5 is at End of Community Support (#1179).
You should upgrade to CometD 7.

Issue #960 was inexplicable, as an exception apparently slipped out of a catch(Throwable), which should be impossible.

Please take a JVM thread dump if you can reproduce the issue.

I am not aware of reasons for which the async sweep would stop running. In case of any exception the sweep is re-scheduled, provided that CompletableFuture.whenComplete() is called.
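The rescheduling behavior described above can be sketched as follows. This is a minimal illustration of the general pattern, not CometD's actual implementation: the class `ReschedulingSweeper`, its fields, and the `sweepTask` supplier are all hypothetical names introduced for this example. It shows why the reschedule depends on the `whenComplete()` callback running: a future that is never completed would leave the sweep silently stopped.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

// Hypothetical illustration of a self-rescheduling async task.
// This is NOT the code of CometD's BayeuxServerImpl.
public class ReschedulingSweeper {
    private final ScheduledExecutorService scheduler;
    private final Supplier<CompletableFuture<Void>> sweepTask;
    private final long periodMillis;
    private final AtomicInteger completedRuns = new AtomicInteger();
    private volatile boolean running = true;

    public ReschedulingSweeper(ScheduledExecutorService scheduler,
                               Supplier<CompletableFuture<Void>> sweepTask,
                               long periodMillis) {
        this.scheduler = scheduler;
        this.sweepTask = sweepTask;
        this.periodMillis = periodMillis;
    }

    public void start() {
        schedule();
    }

    public void stop() {
        running = false;
    }

    public int completedRuns() {
        return completedRuns.get();
    }

    private void schedule() {
        if (running) {
            scheduler.schedule(this::runOnce, periodMillis, TimeUnit.MILLISECONDS);
        }
    }

    private void runOnce() {
        CompletableFuture<Void> sweep;
        try {
            sweep = sweepTask.get();
        } catch (Throwable x) {
            // Turn a synchronous failure into a failed future, so that
            // the whenComplete() callback below still runs.
            sweep = CompletableFuture.failedFuture(x);
        }
        // Reschedule on both success and failure. If the future is never
        // completed, this callback never runs and no further sweep happens.
        sweep.whenComplete((result, failure) -> {
            completedRuns.incrementAndGet();
            schedule();
        });
    }

    public static void main(String[] args) throws InterruptedException {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        AtomicInteger calls = new AtomicInteger();
        ReschedulingSweeper sweeper = new ReschedulingSweeper(scheduler, () -> {
            // Every second sweep fails, to show that failures do not stop the loop.
            if (calls.incrementAndGet() % 2 == 0) {
                throw new IllegalStateException("simulated sweep failure");
            }
            return CompletableFuture.completedFuture(null);
        }, 10);
        sweeper.start();
        Thread.sleep(200);
        sweeper.stop();
        scheduler.shutdownNow();
        System.out.println("completed runs: " + sweeper.completedRuns());
    }
}
```

In this sketch the only way for the loop to stall while the scheduler thread stays alive is a sweep future that never completes, which is consistent with a thread dump showing the scheduler thread still doing other work.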

Let us know if you have more details.

@nagarjun-reddy
Author

Thanks for the reply.
I haven't been able to reproduce it. We have a thread dump from the time the issue happened. The scheduler thread in the thread dump is still running and responding to /meta/connect messages. Based on the logs we have, the sweeper is not sweeping any sessions, and sessions that should have been removed have piled up.

Is there anything in the heap dump that would help us understand what happened to the sweeper?

@sbordet
Member

sbordet commented Feb 8, 2024

Please post the thread dump.

It would also be useful if you could take a BayeuxServer dump by calling BayeuxServer.dump() via JMX.
This dumps the internal state of the BayeuxServer, which can help diagnose the issue.
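For readers who prefer to script this rather than use a JMX console, invoking a no-argument operation through the standard `javax.management` API can be sketched as below. This is only an illustration under stated assumptions: the `Diagnostics` MBean, its `dump()` return value, and the `ObjectName` `example.sketch:type=Diagnostics` are hypothetical stand-ins registered locally so the example runs on its own. The real ObjectName under which a BayeuxServer is exported depends on how JMX is configured in your application and is not shown here.

```java
import java.lang.management.ManagementFactory;
import javax.management.MBeanServer;
import javax.management.ObjectName;
import javax.management.StandardMBean;

public class JmxInvokeSketch {
    // Hypothetical management interface standing in for a real managed object.
    public interface DiagnosticsMBean {
        String dump();
    }

    // Hypothetical implementation; the returned text is made up for the example.
    public static class Diagnostics implements DiagnosticsMBean {
        @Override
        public String dump() {
            return "sessions=0";
        }
    }

    // Invokes a no-argument operation named "dump" on the given MBean.
    public static String invokeDump(MBeanServer server, ObjectName name) throws Exception {
        // Empty arrays: the operation takes no parameters.
        return (String) server.invoke(name, "dump", new Object[0], new String[0]);
    }

    public static void main(String[] args) throws Exception {
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        // Hypothetical ObjectName, used only by this sketch.
        ObjectName name = new ObjectName("example.sketch:type=Diagnostics");
        server.registerMBean(new StandardMBean(new Diagnostics(), DiagnosticsMBean.class), name);
        try {
            System.out.println(invokeDump(server, name));
        } finally {
            server.unregisterMBean(name);
        }
    }
}
```

Against a remote JVM, the same `invoke` call would be made on an `MBeanServerConnection` obtained from a JMX connector instead of the platform `MBeanServer`.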

Are you using the HTTP transport or WebSocket?

@nagarjun-reddy
Author

We are on WebSocket. The server has been restarted since and doesn't exhibit the same behavior. I will try to get the BayeuxServer dump if it happens again. Below is a sample thread dump:

BayeuxServerImpl@2204d1-Scheduler-1  Runnable Thread ID: 98
  org.cometd.bayeux.server.ServerSession$Extension.outgoing(ServerSession.java:478)
  org.cometd.server.ServerSessionImpl.lambda$extendOutgoing$6(ServerSessionImpl.java:312)
  org.cometd.server.ServerSessionImpl$$Lambda$6167.apply()
  org.cometd.common.AsyncFoldLeft$AbstractLoop.run(AsyncFoldLeft.java:208)
  org.cometd.common.AsyncFoldLeft.run(AsyncFoldLeft.java:106)
  org.cometd.common.AsyncFoldLeft.reverseRun(AsyncFoldLeft.java:122)
  org.cometd.server.ServerSessionImpl.extendOutgoing(ServerSessionImpl.java:310)
  org.cometd.server.BayeuxServerImpl.lambda$extendReply$31(BayeuxServerImpl.java:1107)
  org.cometd.server.BayeuxServerImpl$$Lambda$6152.accept()
  org.cometd.bayeux.Promise$2.succeed(Promise.java:103)
  org.cometd.common.AsyncFoldLeft$AbstractLoop.run(AsyncFoldLeft.java:232)
  org.cometd.common.AsyncFoldLeft.run(AsyncFoldLeft.java:106)
  org.cometd.common.AsyncFoldLeft.reverseRun(AsyncFoldLeft.java:122)
  org.cometd.server.BayeuxServerImpl.extendOutgoing(BayeuxServerImpl.java:1083)
  org.cometd.server.BayeuxServerImpl.extendReply(BayeuxServerImpl.java:1104)
  org.cometd.server.AbstractServerTransport.processReply(AbstractServerTransport.java:247)
  org.cometd.server.websocket.common.AbstractWebSocketEndPoint.resume(AbstractWebSocketEndPoint.java:284)
  org.cometd.server.websocket.common.AbstractWebSocketEndPoint$WebSocketScheduler.run(AbstractWebSocketEndPoint.java:421)
  java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
  java.util.concurrent.FutureTask.run(FutureTask.java:264)
  java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304)
  java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
  java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
  java.lang.Thread.run(Thread.java:829)

BayeuxServerImpl@2204d1-Executor-1511183  Parked Thread ID: 1511183
  jdk.internal.misc.Unsafe.park(Unsafe.java)
  java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:234)
  java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2123)
  org.eclipse.jetty.util.BlockingArrayQueue.poll(BlockingArrayQueue.java:382)
  org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.idleJobPoll(QueuedThreadPool.java:974)
  org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:1018)
  java.lang.Thread.run(Thread.java:829)

BayeuxServerImpl@2204d1-Executor-1507906  Runnable Thread ID: 1507906
  org.cometd.server.ServerSessionImpl.calculateInterval(ServerSessionImpl.java:927)
  org.cometd.server.ServerSessionImpl.scheduleExpiration(ServerSessionImpl.java:641)
  org.cometd.server.AbstractServerTransport.scheduleExpiration(AbstractServerTransport.java:268)
  org.cometd.server.websocket.common.AbstractWebSocketEndPoint$Entry.scheduleExpiration(AbstractWebSocketEndPoint.java:637)
  org.cometd.server.websocket.common.AbstractWebSocketEndPoint$Flusher.process(AbstractWebSocketEndPoint.java:552)
  org.eclipse.jetty.util.IteratingCallback.processing(IteratingCallback.java:241)
  org.eclipse.jetty.util.IteratingCallback.iterate(IteratingCallback.java:223)
  org.cometd.server.websocket.common.AbstractWebSocketEndPoint.flush(AbstractWebSocketEndPoint.java:314)
  org.cometd.server.websocket.common.AbstractWebSocketEndPoint$WebSocketScheduler.lambda$executeFlush$1(AbstractWebSocketEndPoint.java:393)
  org.cometd.server.websocket.common.AbstractWebSocketEndPoint$WebSocketScheduler$$Lambda$7028.run()
  org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:883)
  org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:1034)
  java.lang.Thread.run(Thread.java:829)

@sbordet
Member

sbordet commented Feb 10, 2024

If it happens again, please perform the BayeuxServer dump as explained above, then call BayeuxServer.sweep() via JMX and check whether the sweep actually worked.

Also, consider doing what was done in #960:

subclassing BayeuxServerImpl and overriding sweep to try/catch and log any exception
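
The override suggested above can be sketched as follows. To keep the example self-contained, `SweepingServer` is a hypothetical stand-in for the server class; with the real library the subclass would extend BayeuxServerImpl instead, and the exact signature and visibility of sweep() in your version should be checked. `GuardedServer` and `lastFailure()` are likewise names invented for this sketch.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

public class GuardedSweepSketch {
    // Hypothetical stand-in for the server class whose sweep() is overridden.
    public static class SweepingServer {
        private final Runnable sweepWork;

        public SweepingServer(Runnable sweepWork) {
            this.sweepWork = sweepWork;
        }

        public void sweep() {
            sweepWork.run();
        }
    }

    // Subclass that wraps sweep() so that any failure is logged and recorded.
    public static class GuardedServer extends SweepingServer {
        private static final Logger LOGGER = Logger.getLogger(GuardedServer.class.getName());
        private volatile Throwable lastFailure;

        public GuardedServer(Runnable sweepWork) {
            super(sweepWork);
        }

        @Override
        public void sweep() {
            try {
                super.sweep();
            } catch (Throwable x) {
                // Log and remember the failure instead of letting it escape.
                lastFailure = x;
                LOGGER.log(Level.WARNING, "Sweep failed", x);
            }
        }

        public Throwable lastFailure() {
            return lastFailure;
        }
    }

    public static void main(String[] args) {
        GuardedServer server = new GuardedServer(() -> {
            throw new IllegalStateException("simulated sweep failure");
        });
        server.sweep(); // The exception is logged, not propagated.
        System.out.println("last failure: " + server.lastFailure());
    }
}
```

Note that a try/catch like this only sees exceptions thrown synchronously by the overridden method; a failure that is delivered later through a future would need to be observed on that future.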

Let us know how it goes.

@sbordet
Member

sbordet commented Apr 24, 2024

@nagarjun-reddy we have just fixed #1716, which this issue likely duplicates.

Please upgrade to the latest CometD version, and report back if the issue has been fixed.
Thanks!

@nagarjun-reddy
Author

Thank you @sbordet. It will take some time for us to upgrade; can this be cherry picked on top of 5.0.14, or are there other dependencies? Also, the symptoms described in issue #1132 seem related to this fix. Would this fix also resolve those symptoms without needing to configure the maxProcessing parameter?

@sbordet
Member

sbordet commented Apr 29, 2024

can this be cherry picked on top of 5.0.14

No, CometD 5.0.x is at End of Community Support, see #1179.

The fix for #1716 would remove the need to configure the maxProcessing parameter.
