[🐛 Bug]: Random GOAWAY frame received from Selenium Grid #2195

Open

cezzarez opened this issue Apr 7, 2024 · 4 comments
@cezzarez

cezzarez commented Apr 7, 2024

What happened?

We have a Selenium Grid setup using KEDA and the Selenium Grid helm chart: KEDA v2.13.0 and Selenium Grid helm chart v0.28.3.
When we are testing some of the pages, we randomly receive:

Caused by: java.io.IOException: /10.251.7.8:35426: GOAWAY received
From Selenium Grid.

The settings we have for the Edge node:
SE_NODE_SESSION_TIMEOUT=600 (at first we saw the session timeout being reached, but 600 is much more than our tests need)
SE_SESSION_REQUEST_TIMEOUT=450 (many values tested here; it does not seem to change anything)
SE_JAVA_OPTS=-Dwebdriver.http.factory=jdk-http-client -Djdk.httpclient.websocket.intermediateBufferSize=3000000 (different buffer values tested as well, with no visible change)
shm size = 4Gi (also increased and decreased, with no change from this parameter either)
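
Just to note, these are all node-side settings; the Java client also has its own HTTP connect/read timeouts. A minimal sketch of setting them explicitly with the Selenium 4 Java client's ClientConfig (the hub URL and page below are placeholders, not our real values):

import java.net.URI;
import java.time.Duration;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.edge.EdgeOptions;
import org.openqa.selenium.remote.RemoteWebDriver;
import org.openqa.selenium.remote.http.ClientConfig;

public class ClientTimeoutSketch {
    public static void main(String[] args) {
        // Client-side HTTP timeouts, independent of SE_NODE_SESSION_TIMEOUT / SE_SESSION_REQUEST_TIMEOUT
        ClientConfig config = ClientConfig.defaultConfig()
                .baseUri(URI.create("http://selenium-hub:4444"))   // placeholder Grid URL
                .connectionTimeout(Duration.ofSeconds(30))
                .readTimeout(Duration.ofMinutes(10));

        WebDriver driver = RemoteWebDriver.builder()
                .oneOf(new EdgeOptions())
                .config(config)
                .build();
        try {
            driver.get("https://example.com");                     // placeholder page
        } finally {
            driver.quit();
        }
    }
}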

In the HUB I even changed the healthcheck interval from 120 to 300, because I thought it might be caused by a timeout.

I raised the log level to ALL and, to be honest, still observed nothing special on the EDGE or HUB hosts.

In the EDGE logs I can see:
09:27:16.633 DEBUG [PlainHttpConnection.close] - [HttpClient-1-SelectorManager] [186s 134ms] PlainHttpConnection(SocketTube(1)) Closing channel: channel registered with selector, key.interestOps=1, sa.interestOps=1
09:27:16.634 DEBUG [SocketTube$InternalReadPublisher$InternalReadSubscription.signalError] - [HttpClient-1-SelectorManager] [186s 134ms] SocketTube(1) got read error: java.io.IOException: connection closed locally
09:27:16.634 DEBUG [SocketTube.debugState] - [HttpClient-1-SelectorManager] [186s 134ms] SocketTube(1) leaving read() loop after EOF: Reading: [ops=0, demand=0, stopped=true], Writing: [ops=0, demand=1]
09:27:16.634 DEBUG [SocketTube$InternalReadPublisher$InternalReadSubscription.read] - [HttpClient-1-SelectorManager] [186s 134ms] SocketTube(1) Read scheduler stopped
09:27:16.634 DEBUG [HttpClientImpl$SelectorAttachment.resetInterestOps] - [HttpClient-1-SelectorManager] [186s 134ms] SelectorAttachment key cancelled for java.nio.channels.SocketChannel[closed]
09:27:16.635 DEBUG [SocketTube$SocketFlowEvent.abort] - [HttpClient-1-SelectorManager] [186s 135ms] SocketTube(1) abort: java.io.IOException: java.nio.channels.CancelledKeyException
09:27:16.635 DEBUG [HttpClientImpl$SelectorManager.run] - [HttpClient-1-SelectorManager] [186s 135ms] HttpClientImpl(1) next timeout: 0
09:27:16.635 DEBUG [HttpClientImpl$SelectorManager.run] - [HttpClient-1-SelectorManager] [186s 135ms] HttpClientImpl(1) next expired: 0
09:27:16.635 DEBUG [HttpClientImpl$SelectorManager.run] - [HttpClient-1-SelectorManager] [186s 135ms] HttpClientImpl(1) Next deadline is 3000
09:27:16.643 DEBUG [UrlChecker.lambda$waitUntilUnavailable$2] - Polling http://localhost:31879/shutdown
09:27:16.643 FINEST [HttpURLConnection.plainConnect0] - ProxySelector Request for http://localhost:31879/shutdown

But I cannot find anything interesting about this part of the logs in the issues here or on Stack Overflow...

In the HUB logs (ALL level) there is nothing that suggests a cause.

So what happened...?
No idea. We run the tests and see the pod being created, but sometimes (at a very random moment) the GOAWAY frame is received.
It suggests that Selenium was shut down, but in the logs there is no information about what exactly the problem is.
By now I am sure it is not caused by memory, timeouts or anything like that, but maybe you as experts know what it could be?

I just want to add that sometimes the tests go well and finish with a SUCCESS state.

Command used to start Selenium Grid with Docker (or Kubernetes)

It would just be a helm install of the Selenium Grid helm chart on k8s.
You also have to set autoscaling to true so that KEDA is installed.

Relevant log output

The EDGE log excerpt is shown above in the description. In addition:

From Jenkins, where we run the tests, I randomly see:
Caused by: java.io.UncheckedIOException: java.io.IOException: /10.251.7.8:35426: GOAWAY received

Operating System

Kubernetes (EKS)

Docker Selenium version (image tag)

4.18.1-20240224

Selenium Grid chart version (chart version)

0.28.3


github-actions bot commented Apr 7, 2024

@cezzarez, thank you for creating this issue. We will troubleshoot it as soon as we can.


Info for maintainers

Triage this issue by using labels.

If information is missing, add a helpful comment and then the I-issue-template label.

If the issue is a question, add the I-question label.

If the issue is valid but there is no time to troubleshoot it, consider adding the help wanted label.

If the issue requires changes or fixes from an external project (e.g., ChromeDriver, GeckoDriver, MSEdgeDriver, W3C), add the applicable G-* label, and it will provide the correct link and auto-close the issue.

After troubleshooting the issue, please add the R-awaiting answer label.

Thank you!

@diemol
Member

diemol commented Apr 9, 2024

Why do you need these options?

SE_JAVA_OPTS=-Dwebdriver.http.factory=jdk-http-client -Djdk.httpclient.websocket.intermediateBufferSize=3000000

-Dwebdriver.http.factory=jdk-http-client is the default one since 4.14.

What test script can we use to replicate the issue?

@cezzarez
Author

cezzarez commented Apr 9, 2024

Hello @diemol !
Thank you for the reply.
About the options: I was using them in previous versions because of an internal requirement, and I probably missed the note about the default settings. I have just removed them.

About the replication, I am not sure it will be possible to do on your side...
Our tests use internal pages and the source code cannot be shared.

The tests are also pretty complex (they can take some time), but as I mentioned it is definitely not a timeout problem, because the GOAWAY frame is very random.
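
If it helps, the general shape is just plain RemoteWebDriver usage against the Grid; a stripped-down sketch with placeholder URLs (not our real flow) would be:

import java.net.URL;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.edge.EdgeOptions;
import org.openqa.selenium.remote.RemoteWebDriver;

public class GridRunSketch {
    public static void main(String[] args) throws Exception {
        URL grid = new URL("http://selenium-hub:4444");          // placeholder Grid URL
        WebDriver driver = new RemoteWebDriver(grid, new EdgeOptions());
        try {
            for (int i = 0; i < 50; i++) {
                driver.get("https://example.com/page-" + i);     // placeholder for the internal pages
                Thread.sleep(2_000);                             // the GOAWAY appears at a random step
            }
        } finally {
            driver.quit();
        }
    }
}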

Haven't you seen similar cases before? Maybe you have some hints about what the reason could be?

@diemol
Member

diemol commented Apr 9, 2024

I've never seen this reported before, so I would not know where to start without a way to reproduce it. Could it be something in your infrastructure?
