
regression tests getting stuck with jetty upgrade of v9.4.33 #5922

Closed
4devwithgit opened this issue Jan 27, 2021 · 28 comments
Labels: More Info Required · Question · Stale (for auto-closed stale issues and pull requests)

Comments

@4devwithgit commented Jan 27, 2021

Jetty version: 9.4.33

Java version: IBM JDK 8 SR6 FP20

After we upgraded Jetty in our product from v9.4.26 to 9.4.33/9.4.35, the regression test cases consistently get stuck after running for a couple of hours, at different test cases each time. But if we downgrade Jetty back to 9.4.26, the tests run as usual.
We do have some stack traces where Jetty classes run into exceptions, but we have yet to find a test case that reproduces the problem consistently.
We have also reviewed thread dumps taken while the test cases were hung, but they don't indicate any issue. Neither the VM nor the DB connectivity was found to be problematic, so we are at a loss to understand the issue.

So, any ideas to diagnose or resolve this issue are highly appreciated. We believe this behavior is related to the Jetty upgrade, but we don't have clear log errors to file a bug with, which is where we need help.

@gregw (Contributor) commented Jan 27, 2021

Not much there for us to go on! Perhaps share your thread dumps and stack traces?
Also, can you get a Jetty server dump as well?
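
For reference, if running `jstack <pid>` (or triggering a javacore with `kill -3` on IBM JDKs) from outside is awkward in a CI environment, a thread dump can also be captured from inside the JVM. A minimal sketch; the class name is illustrative and not part of any Jetty API:

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;

// Prints a thread dump of the current JVM to stderr, as an in-process
// alternative to running `jstack <pid>` from outside.
public class ThreadDump {
    public static void dump() {
        ThreadInfo[] threads = ManagementFactory.getThreadMXBean().dumpAllThreads(true, true);
        for (ThreadInfo info : threads) {
            System.err.printf("\"%s\" state=%s%n", info.getThreadName(), info.getThreadState());
            for (StackTraceElement frame : info.getStackTrace()) {
                System.err.println("\tat " + frame);
            }
        }
    }

    public static void main(String[] args) {
        dump();
    }
}
```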

@gregw (Contributor) commented Jan 27, 2021

Also, tell us about your app. Is it using async servlets? Async I/O? WebSocket? JDBC?

@4devwithgit (Author)

Here is the thread dump, though we don't see much related to the Jetty threads:
swathi_regression_jvm_dumps.zip

@4devwithgit (Author)

We are using Jetty as the HTTP server for the product Sterling B2B Integrator. We don't use async servlets, but the product does use JDBC and WebSocket.
I believe you mentioned a Jetty server dump. But since we don't really see errors in the Jetty threads, would a Jetty server dump really be useful here?

@4devwithgit (Author)

Any update or findings on this issue?

@4devwithgit (Author)

We have 30k test cases, so is the Jetty server dump going to help? The run gets stuck after 13.5k tests; we're just wondering if the dump will fill up the machine without generating the key data point we need.

@janbartel (Contributor)

The thread dumps don't even mention Jetty, and I don't see any Jetty classes listed in the Java classpath: it looks like all that is listed is just CruiseControl, not what CruiseControl is running. I would try running these tests outside of CruiseControl; maybe that will give you better thread dumps.

When you say the tests "hang", what does that mean? Is a garbage collection in progress? Are there enough server resources (file descriptors, memory, etc.)? Is CruiseControl itself experiencing a problem?

BTW, the suggestion of doing a server dump was so that we could see what your Jetty configuration and deployment look like.
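
For reference, a server dump can be produced from embedded code roughly like this (a sketch; it assumes you hold a reference to the `Server` instance):

```java
import org.eclipse.jetty.server.Server;

public class ServerDumpExample {
    public static void main(String[] args) throws Exception {
        Server server = new Server(8080);

        // Option 1: print the full component tree automatically once started.
        server.setDumpAfterStart(true);

        server.start();

        // Option 2: dump on demand, e.g. from a watchdog when a test hangs.
        System.err.println(server.dump());

        server.stop();
    }
}
```

The dump shows the component tree, including connectors, thread pool state, and deployed contexts, which is what "configuration and deployment" refers to above.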

@4devwithgit (Author)

Do you have any suggestions? If our tests are hung, how can we troubleshoot them with respect to Jetty?

@4devwithgit (Author)

The VM where the CI tests are running is at 75% CPU.

```
$ free -g
              total  used  free  shared  buff/cache  available
Mem:             45     8     1       0          36         36
Swap:            12     3     8
```
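
As a side note on the resource question above, open file descriptor counts can also be checked from inside the JVM on JDKs that expose com.sun.management (this may not be present on every IBM JDK build, hence the instanceof guard); a minimal sketch:

```java
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;

import com.sun.management.UnixOperatingSystemMXBean;

public class FdCheck {
    public static void main(String[] args) {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        if (os instanceof UnixOperatingSystemMXBean) {
            UnixOperatingSystemMXBean unix = (UnixOperatingSystemMXBean) os;
            // Compare current usage against the process limit (ulimit -n).
            System.out.printf("open fds: %d / max fds: %d%n",
                    unix.getOpenFileDescriptorCount(),
                    unix.getMaxFileDescriptorCount());
        } else {
            System.out.println("UnixOperatingSystemMXBean not available on this JVM");
        }
    }
}
```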

@4devwithgit (Author)

We are using JUnit 4.4.

@4devwithgit (Author)

@janbartel do you have any response based on my previous messages?
This is a critical issue for us and for all the customers of Sterling B2B Integrator, and we can't really upgrade from 9.4.26 to a more secure version (9.4.33, which has the PSIRT fix) given that our regression tests are not completing.

@jmcc0nn3ll (Contributor)

If time is an issue, you may want to consider support through webtide.com, since open source support is on an as-available basis, especially if you are hesitant to share information. This sort of triage is a normal aspect of that support, and is typically isolated or specific enough to a situation like yours as to be ill-suited for support in this project forum. It would be different if you could point to a specific commit or issue that is causing your problem, but asking for triage is a nebulous ask.

@gregw (Contributor) commented Feb 3, 2021

@4devwithgit sorry, but we just don't have enough information. We don't even know what "getting stuck" means in your context. Is it Jetty not responding? Or just a test that doesn't complete?

Of your 30k tests, you say the run is getting "stuck" after 13.5k of them. Can you identify the individual test it gets stuck on? Can you run just that test by itself? Does it pass? If you remove that test, do the remaining 29,999 tests pass, or do you just get stuck at test 13,501?

Ultimately we need to see something that is actually stuck, with a description of what it is stuck waiting for, ideally with a thread dump and a server dump to match. If you can provide us some of this information here, then we can assist in the open source project. But if you can't provide any more information publicly and this is time critical, then please do consider commercial support.

@joakime @lachlan-roberts Can you think of any websocket changes since 9.4.26 that could cause an app to become stuck?

@joakime (Contributor) commented Feb 3, 2021

Changes in websocket since 9.4.26

And lots of new tests, javadoc updates, and documentation updates.

@joakime (Contributor) commented Feb 3, 2021

@gregw if their code is using InputStream or Reader as a message delivery option with the javax.websocket API, and the OP has implemented some kind of workaround for message delivery (or message order) because of how the threading works with the streaming delivery options in 9.4.26, then those workarounds are likely the cause of the issues they are experiencing now.

Keep in mind that the InputStream and Reader options in the API are not designed for delivery of lots of messages on the websocket connection; they are designed for users that need a single, long-term stream of data over the connection. Think video transfer, audio transfer, games, etc. Those that use them to deliver many messages are often surprised by the need to dispatch each and every message to a new thread (per the API spec). Historically, this has resulted in users of the javax.websocket API not understanding that, because of the dispatch nature of the streaming API, messages can appear to arrive out of order to the application even though they actually arrived in order on the connection. 9.4.26 had this behavior (and many projects aware of it worked around it in their own code). 9.4.36 no longer does: we changed it to not read/parse the next message until the active onMessage(InputStream) call (or equivalent) has exited. This change was made for two reasons: to make things easier for users of the API, and to alleviate the thread usage spikes that occur when applications receive lots of small ("small" in this context is under 40MB) messages on the connection.
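
For context, the delivery mode being described looks roughly like this (a hypothetical endpoint; the path and class name are illustrative):

```java
import java.io.IOException;
import java.io.InputStream;

import javax.websocket.OnMessage;
import javax.websocket.server.ServerEndpoint;

// Streaming delivery of binary websocket messages via the javax.websocket API.
// In 9.4.26 each message was dispatched to its own thread, so onMessage calls
// could appear to run out of order; after the change described above, the next
// message is not read/parsed until this method returns.
@ServerEndpoint("/stream")
public class StreamingEndpoint {

    @OnMessage
    public void onMessage(InputStream in) throws IOException {
        byte[] buffer = new byte[8192];
        while (in.read(buffer) != -1) {
            // Consume the message body. Blocking here now also blocks
            // delivery of subsequent messages on the same connection.
        }
    }
}
```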

Finally, for this specific issue, we have no details on what the "stuck" is, what it means, or how it manifests.

@4devwithgit (Author) commented Feb 4, 2021

Thanks for the above explanation, @joakime @gregw @jmcc0nn3ll.

Just to clarify further:
9.4.26 - No issue seen; this is what the product is using right now.
9.4.33 - This has the PSIRT fix, so we wanted to upgrade to at least this version. But we see the issue here.
9.4.35 - We see the same issue here as well.
9.4.36 - We have not tested it yet.
Latest tests - 9.4.27 completes all the tests.

So, do the above explanations match the behavior seen with the versions used in our product?
Is there a workaround possible for the hung state, like some flag or code change, which we can try out in our product to overcome this issue?

We are not really hesitant to give more information on the issue, but we really don't have any concrete information:

  1. Logs - we can't enable Jetty logs; with 30k tests, the machine will run out of space before giving us a relevant data point.
  2. Thread dumps - already shared. But as you noted too, we really don't have anything pointing to the Jetty threads.
  3. JUnit test case - the issue is not specific to one test case. If we remove the specific test, it gets stuck on some other test after crossing the 13k or 14k mark.
  4. Please also note that the hung behavior is seen when the run moves from one test suite to another. It is not seen WHILE running a certain test case, but when it transitions between test suites; probably it hangs while loading the new test suite. So, when it is hung, the current suite will show 0 test cases run, while the previous one will have all its test cases executed.
  5. I will try to share the Jetty server dumps as soon as I can.
  6. To narrow down the issue with Jetty versions, we are trying out different versions between 9.4.26 and 9.4.36; I will update here as soon as I have the results.

@gregw (Contributor) commented Feb 4, 2021

So it is hanging between tests. Potentially Jetty is leaking something or filling something up? But it is still hard to say without your test framework.

Is this possible: use your test framework to start a Jetty server the way you currently start it, and deploy the simplest webapp possible. Then have a really simple test that you somehow duplicate 20k times. This might tickle the same problem and hang the test framework after 13k to 14k tests, in which case you can give us the whole thing, as it will not have your application in it.
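
A minimal sketch of such a reproducer loop with embedded Jetty (the handler, the trivial request, and the iteration count are illustrative assumptions, not the OP's actual setup):

```java
import java.net.HttpURLConnection;
import java.net.URL;

import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

import org.eclipse.jetty.server.Request;
import org.eclipse.jetty.server.Server;
import org.eclipse.jetty.server.ServerConnector;
import org.eclipse.jetty.server.handler.AbstractHandler;

// Start a trivial server, send one request, stop it; repeat 20k times.
// If this hangs after ~13k iterations, it can be shared as-is, since it
// contains no application code.
public class StartStopReproducer {
    public static void main(String[] args) throws Exception {
        for (int i = 0; i < 20_000; i++) {
            Server server = new Server(0); // port 0 = pick a free port
            server.setHandler(new AbstractHandler() {
                @Override
                public void handle(String target, Request baseRequest,
                                   HttpServletRequest request, HttpServletResponse response) {
                    response.setStatus(HttpServletResponse.SC_OK);
                    baseRequest.setHandled(true);
                }
            });
            server.start();

            int port = ((ServerConnector) server.getConnectors()[0]).getLocalPort();
            HttpURLConnection conn =
                    (HttpURLConnection) new URL("http://localhost:" + port + "/").openConnection();
            conn.getResponseCode(); // one trivial request per iteration
            conn.disconnect();

            server.stop();
            if (i % 1_000 == 0) {
                System.err.println("completed iteration " + i);
            }
        }
    }
}
```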

Even if that is impossible, if you can give us something that shows how you start Jetty, deploy webapps, send test requests, and stop the server after the test, then we can try the same.

Do you use the websocket client at all?

@gregw (Contributor) commented Feb 4, 2021

The other thing to do is to take your application and start/stop it 15k times in a similar environment to your test setup and see what happens.

@4devwithgit (Author)

Thanks for the suggestions, @gregw. I will see if I can generate data using the approaches you suggested.

Meanwhile, I ran the tests using 9.4.27 and 9.4.29, and both passed, i.e. no hung state for the tests.

We are now testing with these versions:
9.4.30
9.4.36

Does 9.4.36 have any known issues that we need to be aware of?

@4devwithgit (Author)

I see the regression tests are completing with 9.4.30 and 9.4.36. So, most likely the issue was introduced in Jetty v9.4.31/32.

We will evaluate whether we can upgrade to 9.4.36; since it's quite new, we need to review it.

Thanks
Dev

@joakime (Contributor) commented Feb 24, 2021

9.4.37.v20210219 has been released.

9.4.38 is in progress as well.

The OP still has not provided any actionable information about the reported regression.
No other users of these features (and we have some exceedingly aggressive users of the websocket features) have reported a regression.

@4devwithgit (Author)

With 9.4.36, our 6.0.3.4 release is working fine.
However, our next release, 6.1.0.2, shows the same behavior of tests getting stuck. We are planning to run some performance tests, and if we see anything related to Jetty, I will keep it posted here.

Is it possible to port the fix from the 9.4.33 version back to 9.4.26?
CVEs (details as of the time of ADV creation):
CVE ID: CVE-2020-27216
Description: Eclipse Jetty could allow a local authenticated attacker to gain elevated privileges on the system, caused by a race condition in the creation of the temporary subdirectory. By sending a specially-crafted request, an authenticated attacker could exploit this vulnerability to gain elevated privileges.
CVSS Base Score: 7.8
CVSS Temporal Score: see https://exchange.xforce.ibmcloud.com/vulnerabilities/190474 for more information
CVSS Vector: CVSS:3.0/AV:L/AC:L/PR:L/UI:N/S:U/C:H/I:H/A:H

Also, is it possible to reveal the details of the fix?

Thanks

@gregw (Contributor) commented Feb 25, 2021

@4devwithgit backporting fixes to specific versions is a service that we provide for our commercial support clients. We can't do that on an open source basis, or else we'd end up with infinite versions to support.

The details of the fix are in #5452, so you can build your own version.

Alternatively, use one of the workarounds and wait until a recent release is mature enough for you.

@4devwithgit (Author)

"Alternately, use one of the work arounds and wait until a recent release is mature enough for you." what is the work around you are referring to?

@joakime (Contributor) commented Feb 25, 2021

@joakime (Contributor) commented Feb 25, 2021

Note: there are two follow-up PRs that address issues within Multipart and PutFilter that are also impacted by the CVE you listed.
See PRs #5453 and #5458 as well.
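
For readers looking for the workaround referenced above: the mitigation commonly documented for CVE-2020-27216 is to give each webapp an explicit, pre-created work directory instead of relying on the shared, world-writable java.io.tmpdir default. A minimal embedded sketch (the WAR and directory paths are illustrative):

```java
import java.io.File;

import org.eclipse.jetty.server.Server;
import org.eclipse.jetty.webapp.WebAppContext;

public class ExplicitTempDir {
    public static void main(String[] args) throws Exception {
        Server server = new Server(8080);

        WebAppContext webapp = new WebAppContext();
        webapp.setWar("/path/to/app.war"); // illustrative path

        // Pre-create a private work directory so Jetty never races to
        // create a temporary subdirectory under java.io.tmpdir.
        File workDir = new File(System.getProperty("user.home"), "jetty-work");
        workDir.mkdirs();
        webapp.setTempDirectory(workDir);

        server.setHandler(webapp);
        server.start();
        server.join();
    }
}
```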

@github-actions

This issue has been automatically marked as stale because it has been a full year without activity. It will be closed if no further activity occurs. Thank you for your contributions.

@github-actions bot added the Stale label on Feb 26, 2022
@github-actions

This issue has been closed due to it having no activity.
