Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We鈥檒l occasionally send you account related emails.

Already on GitHub? Sign in to your account

[馃悰 Bug]: Tests are just getting queued up though there are available sessions. #2169

Open
tv2022 opened this issue Mar 14, 2024 · 9 comments

Comments

@tv2022
Copy link

tv2022 commented Mar 14, 2024

What happened?

Suddenly when we run some automation jobs the grid hub queue size is growing and not allocating sessions though we have enough sessions. We had to restart the grid to overcome this issue. This happens very frequently. We are using podman for configuring our grid. This seems relevant to this issue #1865. Tests will run for some time, and suddenly this issue is popping up.

image

Capabilities:


browserName: 'chrome',
    acceptInsecureCerts: true,
    'goog:chromeOptions': {
      prefs: {
        intl: {accept_languages: "en-US"},
      },
      w3c: true,
      args: [
        'incognito',
        '--disable-browser-side-navigation',
        '--ignore-certificate-errors',
        '--no-sandbox', '--start-fullscreen',
        '--disable-infobars',
        '--disable-dev-shm-usage'
      ]
    }

Command used to start Selenium Grid with Docker (or Kubernetes)

podman run -d --name selenium-hub \
 --network myprivate \
 --cpus 2 \
 -p 4442:4442 \
 -p 4443:4443 \
 -p 4444:4444 \
 --restart always \
 --memory 20000M \
 --memory-reservation 19000M \
 -e GRID_BROWSER_TIMEOUT=300 \
 -e GRID_TIMEOUT=300 \
 -e GRID_MAX_SESSION=60 \
 -e GRID_MAX_INSTANCES=3 \
 -e GRID_CLEAN_UP_CYCLE=5000 \
 -e GRID_UNREGISTER_IF_STILL_DOWN_AFTER=180000 \
 -e GRID_NEW_SESSION_WAIT_TIMEOUT=60 \
 docker.io/selenium/hub:4.14.1-20231025



# Set the number of nodes
NUM_NODES=150

NETWORK_NAME=private

MEMORY_LIMIT=4000M
MEMORY_RESERVATION=3900M

for ((i=1; i<=$NUM_NODES; i++)); do
 NODE_NAME="selenium-chrome-node-$i"

 podman run -d --name $NODE_NAME \
  --network $NETWORK_NAME \
  --cpus 1.5 \

Relevant log output

NA

Operating System

Red Hat Enterprise Linux release 8.9 (Ootpa)

Docker Selenium version (image tag)

4.14.1-20231025

Selenium Grid chart version (chart version)

NA

Copy link

@tv2022, thank you for creating this issue. We will troubleshoot it as soon as we can.


Info for maintainers

Triage this issue by using labels.

If information is missing, add a helpful comment and then I-issue-template label.

If the issue is a question, add the I-question label.

If the issue is valid but there is no time to troubleshoot it, consider adding the help wanted label.

If the issue requires changes or fixes from an external project (e.g., ChromeDriver, GeckoDriver, MSEdgeDriver, W3C), add the applicable G-* label, and it will provide the correct link and auto-close the issue.

After troubleshooting the issue, please add the R-awaiting answer label.

Thank you!

@tv2022 tv2022 changed the title [馃悰 Bug]: Grid hub queue size growing though there are available browser sessions. [馃悰 Bug]: Tests are just getting queued up though there are available sessions. Mar 14, 2024
@VietND96
Copy link
Member

Hi @tv2022, in your tests, is there any setCapability cause the mismatch?
Or is there options that enable remote download? If yes, Node should set env var SE_NODE_ENABLE_MANAGED_DOWNLOADS=true

@VietND96
Copy link
Member

Ok, I missed this statement We had to restart the grid to overcome this issue.
Can you enable the debug logs in Hub by setting SE_LOG_LEVEL=FINE. Once the issue happens again, before restarting the Hub, capture the Hub container logs. With logs, it will help us can understand what is the issue.

@VietND96
Copy link
Member

Also, I saw in the steps, docker.io/selenium/hub:4.14.1-20231025 image was used. Can you also have a try with the recent versions of Hub to see if the issue is reproducible?

@tv2022
Copy link
Author

tv2022 commented Mar 19, 2024

@VietND96 Thanks for the quick response, don't have any download option enabled in capabilities but I still tried adding it and after few hours I saw this issue again. I also changed GRID_MAX_SESSION=60 to GRID_MAX_SESSION=150 as we may trigger multiple jobs at any point, and the concurrent session may go up beyond 60 but still I was able to reproduce the issue.

Sure I will add this SE_LOG_LEVEL=FINE and capture logs with the latest image. Will get back once I am able to reproduce the issue again with this env variable.

Thanks!

@tv2022
Copy link
Author

tv2022 commented Mar 26, 2024

@VietND96 Thanks for the quick response, don't have any download option enabled in capabilities but I still tried adding it and after few hours I saw this issue again. I also changed GRID_MAX_SESSION=60 to GRID_MAX_SESSION=150 as we may trigger multiple jobs at any point, and the concurrent session may go up beyond 60 but still I was able to reproduce the issue.

Sure I will add this SE_LOG_LEVEL=FINE and capture logs with the latest image. Will get back once I am able to reproduce the issue again with this env variable.

Thanks!

@VietND96 I can see the issue again with the latest version 4.18.1-20240224 and SE_LOG_LEVEL=FINE. The log file is huge (8GB) so cannot upload the entire file, I am still trying to analyze the log. Do you have any specific inputs or exception to look for related to this issue?

I see lot of below exception messages in log.

18:25:14.088 DEBUG [ConnectionPool.cleanup] - [HttpClient-445-SelectorManager] [6219s 989ms] ConnectionPool(445) SocketTube(764) : ConnectionPool.cleanup(java.io.IOException: connection closed locally)
18:25:14.091 DEBUG [SocketTube$InternalReadPublisher$InternalReadSubscription.signalError] - [HttpClient-445-SelectorManager] [6219s 992ms] SocketTube(759) got read error: java.io.IOException: connection closed locally
18:25:14.091 DEBUG [SocketTube$InternalReadPublisher$InternalReadSubscription.read] - [HttpClient-445-SelectorManager] [6219s 992ms] SocketTube(759) Sending error java.io.IOException: connection closed locally to subscriber CleanupTrigger(SocketTube(759))
18:25:14.092 DEBUG [SocketTube$InternalReadPublisher$ReadSubscription.signalCompletion] - [HttpClient-445-SelectorManager] [6219s 992ms] SocketTube(759) forwarding error to subscriber: java.io.IOException: connection closed locally
18:25:14.092 DEBUG [ConnectionPool.cleanup] - [HttpClient-445-SelectorManager] [6219s 992ms] ConnectionPool(445) SocketTube(759) : ConnectionPool.cleanup(java.io.IOException: connection closed locally)
18:25:14.521 DEBUG [SocketTube$InternalReadPublisher$InternalReadSubscription.signalError] - [HttpClient-466-SelectorManager] [6220s 422ms] SocketTube(812) got read error: java.io.IOException: connection closed locally
18:25:14.521 DEBUG [SocketTube$InternalReadPublisher$InternalReadSubscription.signalError] - [HttpClient-468-SelectorManager] [6220s 422ms] SocketTube(815) got read error: java.io.IOException: connection closed locally
18:25:14.523 DEBUG [SocketTube$InternalReadPublisher$InternalReadSubscription.signalError] - [HttpClient-463-SelectorManager] [6220s 424ms] SocketTube(805) got read error: java.io.IOException: connection closed locally
18:25:14.523 DEBUG [SocketTube$InternalReadPublisher$InternalReadSubscription.read] - [HttpClient-463-SelectorManager] [6220s 424ms] SocketTube(805) Sending error java.io.IOException: connection closed locally to subscriber CleanupTrigger(SocketTube(805))
18:25:14.521 DEBUG [SocketTube$InternalReadPublisher$InternalReadSubscription.read] - [HttpClient-468-SelectorManager] [6220s 422ms] SocketTube(815) Sending error java.io.IOException: connection closed locally to subscriber CleanupTrigger(SocketTube(815))
18:25:14.523 DEBUG [SocketTube$InternalReadPublisher$ReadSubscription.signalCompletion] - [HttpClient-468-SelectorManager] [6220s 424ms] SocketTube(815) forwarding error to subscriber: java.io.IOException: connection closed locally
18:25:14.523 DEBUG [SocketTube$InternalReadPublisher$InternalReadSubscription.read] - [HttpClient-466-SelectorManager] [6220s 424ms] SocketTube(812) Sending error java.io.IOException: connection closed locally to subscriber CleanupTrigger(SocketTube(812))
18:25:14.523 DEBUG [SocketTube$InternalReadPublisher$ReadSubscription.signalCompletion] - [HttpClient-466-SelectorManager] [6220s 424ms] SocketTube(812) forwarding error to subscriber: java.io.IOException: connection closed locally
18:25:14.525 DEBUG [ConnectionPool.cleanup] - [HttpClient-466-SelectorManager] [6220s 426ms] ConnectionPool(466) SocketTube(812) : ConnectionPool.cleanup(java.io.IOException: connection closed locally)
18:25:14.523 DEBUG [ConnectionPool.cleanup] - [HttpClient-468-SelectorManager] [6220s 424ms] ConnectionPool(468) SocketTube(815) : ConnectionPool.cleanup(java.io.IOException: connection closed locally)
18:25:14.523 DEBUG [SocketTube$InternalReadPublisher$ReadSubscription.signalCompletion] - [HttpClient-463-SelectorManager] [6220s 424ms] SocketTube(805) forwarding error to subscriber: java.io.IOException: connection closed locally
18:25:14.525 DEBUG [ConnectionPool.cleanup] - [HttpClient-463-SelectorManager] [6220s 426ms] ConnectionPool(463) SocketTube(805) : ConnectionPool.cleanup(java.io.IOException: connection closed locally)
18:25:14.525 DEBUG [SocketTube$InternalReadPublisher$InternalReadSubscription.signalError] - [HttpClient-463-SelectorManager] [6220s 426ms] SocketTube(804) got read error: java.io.IOException: connection closed locally
18:25:14.527 DEBUG [SocketTube$InternalReadPublisher$InternalReadSubscription.signalError] - [HttpClient-468-SelectorManager] [6220s 428ms] SocketTube(814) got read error: java.io.IOException: connection closed locally
18:25:14.527 DEBUG [SocketTube$InternalReadPublisher$InternalReadSubscription.read] - [HttpClient-468-SelectorManager] [6220s 428ms] SocketTube(814) Sending error java.io.IOException: connection closed locally to subscriber CleanupTrigger(SocketTube(814))
18:25:14.527 DEBUG [SocketTube$InternalReadPublisher$InternalReadSubscription.signalError] - [HttpClient-466-SelectorManager] [6220s 428ms] SocketTube(811) got read error: java.io.IOException: connection closed locally
18:25:14.527 DEBUG [SocketTube$InternalReadPublisher$InternalReadSubscription.read] - [HttpClient-466-SelectorManager] [6220s 428ms] SocketTube(811) Sending error java.io.IOException: connection closed locally to subscriber CleanupTrigger(SocketTube(811))
18:25:14.527 DEBUG [SocketTube$InternalReadPublisher$ReadSubscription.signalCompletion] - [HttpClient-466-SelectorManager] [6220s 428ms] SocketTube(811) forwarding error to subscriber: java.io.IOException: connection closed locally
18:25:14.527 DEBUG [ConnectionPool.cleanup] - [HttpClient-466-SelectorManager] [6220s 428ms] ConnectionPool(466) SocketTube(811) : ConnectionPool.cleanup(java.io.IOException: connection closed locally)
18:25:14.527 DEBUG [SocketTube$InternalReadPublisher$ReadSubscription.signalCompletion] - [HttpClient-468-SelectorManager] [6220s 428ms] SocketTube(814) forwarding error to subscriber: java.io.IOException: connection closed locally
18:25:14.527 DEBUG [ConnectionPool.cleanup] - [HttpClient-468-SelectorManager] [6220s 428ms] ConnectionPool(468) SocketTube(814) : ConnectionPool.cleanup(java.io.IOException: connection closed locally)
18:25:14.528 DEBUG [SocketTube$InternalReadPublisher$InternalReadSubscription.read] - [HttpClient-463-SelectorManager] [6220s 428ms] SocketTube(804) Sending error java.io.IOException: connection closed locally to subscriber CleanupTrigger(SocketTube(804))
18:25:14.528 DEBUG [SocketTube$InternalReadPublisher$ReadSubscription.signalCompletion] - [HttpClient-463-SelectorManager] [6220s 428ms] SocketTube(804) forwarding error to subscriber: java.io.IOException: connection closed locally
18:25:14.528 DEBUG [ConnectionPool.cleanup] - [HttpClient-463-SelectorManager] [6220s 428ms] ConnectionPool(463) SocketTube(804) : ConnectionPool.cleanup(java.io.IOException: connection closed locally)
18:25:14.558 DEBUG [SocketTube$InternalReadPublisher$InternalReadSubscription.signalError] - [HttpClient-441-SelectorManager] [6220s 459ms] SocketTube(751) got read error: java.io.IOException: connection closed locally
18:25:14.621 DEBUG [SocketTube$InternalReadPublisher$InternalReadSubscription.read] - [HttpClient-441-SelectorManager] [6220s 521ms] SocketTube(751) Sending error java.io.IOException: connection closed locally to subscriber CleanupTrigger(SocketTube(751))
18:25:14.621 DEBUG [SocketTube$InternalReadPublisher$ReadSubscription.signalCompletion] - [HttpClient-441-SelectorManager] [6220s 522ms] SocketTube(751) forwarding error to subscriber: java.io.IOException: connection closed locally
18:25:14.538 DEBUG [SocketTube$InternalReadPublisher$InternalReadSubscription.signalError] - [HttpClient-416-SelectorManager] [6220s 439ms] SocketTube(706) got read error: java.io.IOException: connection closed locally
18:25:14.626 DEBUG [SocketTube$InternalReadPublisher$ReadSubscription.signalCompletion] - [HttpClient-463-SelectorManager] [6220s 527ms] SocketTube(816) forwarding error to subscriber: java.io.IOException: connection closed locally

@VietND96
Copy link
Member

Can you check if there are any logs related to the below string that appeared consecutively in Hub?

selenium.common.exceptions.SessionNotCreatedException
Could not start a new session. Could not start a new session. Unable to create new session
LocalDistributor.reserveSlot
LocalDistributor.newSession
Unable to find a free slot for request
Ending request (POST)
LocalNode.stopTimedOutSession
Deleted session from local Session Map

@tv2022
Copy link
Author

tv2022 commented Mar 27, 2024

@VietND96 No, i don't find any exception starting with either first 2 lines.

selenium.common.exceptions.SessionNotCreatedException
Could not start a new session. Could not start a new session. Unable to create new session

Also, no log was found with this exception LocalNode.stopTimedOutSession

Found the below log related to a few sessions.

18:31:57.053 DEBUG [LocalDistributor.reserveSlot] - No slots found for request 035bc19e-125c-4555-bb32-3202d2aa6a56 and capabilities Capabilities {acceptInsecureCerts: true, browserName: chrome, goog:chromeOptions: {args: [incognito, --disable-browser-side-navi..., --ignore-certificate-errors, --no-sandbox, --start-fullscreen, --disable-infobars, --disable-dev-shm-usage], prefs: {intl: {accept_languages: en-US}}, w3c: true}}
18:31:57.056 INFO [LocalDistributor.newSession] - Unable to find a free slot for request 035bc19e-125c-4555-bb32-3202d2aa6a56. 
18:31:57.074 DEBUG [SeleniumSpanExporter$1.lambda$export$4] - SpanData{spanContext=ImmutableSpanContext{traceId=0052ee5b995e046800c34be87f386338, spanId=04b42b799c8c9faf, traceFlags=01, traceState=ArrayBasedTraceState{entries=[]}, remote=false, valid=true}, parentSpanContext=ImmutableSpanContext{traceId=0052ee5b995e046800c34be87f386338, spanId=a42873d2fbbd5b30, traceFlags=01, traceState=ArrayBasedTraceState{entries=[]}, remote=false, valid=true}, resource=Resource{schemaUrl=null, attributes={service.name="unknown_service:java", telemetry.sdk.language="java", telemetry.sdk.name="opentelemetry", telemetry.sdk.version="1.35.0"}}, instrumentationScopeInfo=InstrumentationScopeInfo{name=default, version=null, schemaUrl=null, attributes={}}, name=distributor.new_session, kind=INTERNAL, startEpochNanos=1711477917035085033, endEpochNanos=1711477917073164451, attributes=AttributesMap{data={error=true}, capacity=128, totalAddedValues=1}, totalAttributeCount=1, events=[ImmutableEventData{name=Session request received by the Distributor, attributes={logger="org.openqa.selenium.grid.distributor.local.LocalDistributor", request.payload="[Capabilities {acceptInsecureCerts: true, browserName: chrome, goog:chromeOptions: {args: [incognito, --disable-browser-side-navi..., --ignore-certificate-errors, --no-sandbox, --start-fullscreen, --disable-infobars, --disable-dev-shm-usage], prefs: {intl: {accept_languages: en-US}}, w3c: true}}]"}, epochNanos=1711477917035106807, totalAttributeCount=2}, ImmutableEventData{name=exception, attributes={exception.message="Will retry session 035bc19e-125c-4555-bb32-3202d2aa6a56", logger="org.openqa.selenium.grid.distributor.local.LocalDistributor", request.payload="[Capabilities {acceptInsecureCerts: true, browserName: chrome, goog:chromeOptions: {args: [incognito, --disable-browser-side-navi..., --ignore-certificate-errors, --no-sandbox, --start-fullscreen, --disable-infobars, --disable-dev-shm-usage], prefs: {intl: {accept_languages: en-US}}, w3c: true}}]"}, epochNanos=1711477917073161864, totalAttributeCount=3}], totalRecordedEvents=2, links=[], totalRecordedLinks=0, status=ImmutableStatusData{statusCode=ERROR, description=Kind: ABORTED Description:}, hasEnded=true}
18:31:57.075 WARN [SeleniumSpanExporter$1.lambda$export$1] - Will retry session 035bc19e-125c-4555-bb32-3202d2aa6a56
18:31:57.094 WARN [SeleniumSpanExporter$1.lambda$export$3] - {"traceId": "0052ee5b995e046800c34be87f386338","eventTime": 1711477917073161864,"eventName": "exception","attributes": {"exception.message": "Will retry session 035bc19e-125c-4555-bb32-3202d2aa6a56","logger": "org.openqa.selenium.grid.distributor.local.LocalDistributor","request.payload": "[Capabilities {acceptInsecureCerts: true, browserName: chrome, goog:chromeOptions: {args: [incognito, --disable-browser-side-navi..., --ignore-certificate-errors, --no-sandbox, --start-fullscreen, --disable-infobars, --disable-dev-shm-usage], prefs: {intl: {accept_languages: en-US}}, w3c: true}}]"}}
18:31:57.114 DEBUG [SeleniumSpanExporter$1.lambda$export$4] - SpanData{spanContext=ImmutableSpanContext{traceId=0052ee5b995e046800c34be87f386338, spanId=b8e96ce838da9ebb, traceFlags=01, traceState=ArrayBasedTraceState{entries=[]}, remote=false, valid=true}, parentSpanContext=ImmutableSpanContext{traceId=0052ee5b995e046800c34be87f386338, spanId=a42873d2fbbd5b30, traceFlags=01, traceState=ArrayBasedTraceState{entries=[]}, remote=false, valid=true}, resource=Resource{schemaUrl=null, attributes={service.name="unknown_service:java", telemetry.sdk.language="java", telemetry.sdk.name="opentelemetry", telemetry.sdk.version="1.35.0"}}, instrumentationScopeInfo=InstrumentationScopeInfo{name=default, version=null, schemaUrl=null, attributes={}}, name=distributor.retry, kind=INTERNAL, startEpochNanos=1711477917095127649, endEpochNanos=1711477917113470452, attributes={}, totalAttributeCount=0, events=[ImmutableEventData{name=Retry adding to front of queue. No slot available., attributes={logger="org.openqa.selenium.grid.distributor.local.LocalDistributor$NewSessionRunnable", request="SessionRequest[requestId=035bc19e-125c-4555-bb32-3202d2aa6a56, desiredCapabilities=[Capabilities {acceptInsecureCerts: true, browserName: chrome, goog:chromeOptions: {args: [incognito, --disable-browser-side-navi..., --ignore-certificate-errors, --no-sandbox, --start-fullscreen, --disable-infobars, --disable-dev-shm-usage], prefs: {intl: {accept_languages: en-US}}, w3c: true}}], downstreamDialects=[W3C], metadata={desiredCapabilities={browserName=chrome, acceptInsecureCerts=true, goog:chromeOptions={prefs={intl={accept_languages=en-US}}, w3c=true, args=[incognito, --disable-browser-side-navigation, --ignore-certificate-errors, --no-sandbox, --start-fullscreen, --disable-infobars, --disable-dev-shm-usage]}}}, traceHeaders={traceparent=00-0052ee5b995e046800c34be87f386338-6ea589eaa1851fd2-01}]", request.id="035bc19e-125c-4555-bb32-3202d2aa6a56", request.retry_add=true}, epochNanos=1711477917113468586, totalAttributeCount=4}], totalRecordedEvents=1, links=[], totalRecordedLinks=0, status=ImmutableStatusData{statusCode=UNSET, description=}, hasEnded=true}
18:31:57.115 DEBUG [SeleniumSpanExporter$1.lambda$export$3] - {"traceId": "0052ee5b995e046800c34be87f386338","eventTime": 1711477917113468586,"eventName": "**Retry adding to front of queue**. No slot available.","attributes": {"logger": "org.openqa.selenium.grid.distributor.local.LocalDistributor$NewSessionRunnable","request": "SessionRequest[requestId=035bc19e-125c-4555-bb32-3202d2aa6a56, desiredCapabilities=[Capabilities {acceptInsecureCerts: true, browserName: chrome, goog:chromeOptions: {args: [incognito, --disable-browser-side-navi..., --ignore-certificate-errors, --no-sandbox, --start-fullscreen, --disable-infobars, --disable-dev-shm-usage], prefs: {intl: {accept_languages: en-US}}, w3c: true}}], downstreamDialects=[W3C], metadata={desiredCapabilities={browserName=chrome, acceptInsecureCerts=true, goog:chromeOptions={prefs={intl={accept_languages=en-US}}, w3c=true, args=[incognito, --disable-browser-side-navigation, --ignore-certificate-errors, --no-sandbox, --start-fullscreen, --disable-infobars, --disable-dev-shm-usage]}}}, traceHeaders={traceparent=00-0052ee5b995e046800c34be87f386338-6ea589eaa1851fd2-01}]","request.id": "035bc19e-125c-4555-bb32-3202d2aa6a56","request.retry_add": true}}
18:31:57.115 DEBUG [SeleniumSpanExporter$1.lambda$export$4] - SpanData{spanContext=ImmutableSpanContext{traceId=0052ee5b995e046800c34be87f386338, spanId=a42873d2fbbd5b30, traceFlags=01, traceState=ArrayBasedTraceState{entries=[]}, remote=false, valid=true}, parentSpanContext=ImmutableSpanContext{traceId=0052ee5b995e046800c34be87f386338, spanId=6ea589eaa1851fd2, traceFlags=01, traceState=ArrayBasedTraceState{entries=[]}, remote=true, valid=true}, resource=Resource{schemaUrl=null, attributes={service.name="unknown_service:java", telemetry.sdk.language="java", telemetry.sdk.name="opentelemetry", telemetry.sdk.version="1.35.0"}}, instrumentationScopeInfo=InstrumentationScopeInfo{name=default, version=null, schemaUrl=null, attributes={}}, name=distributor.poll_queue, kind=INTERNAL, startEpochNanos=1711477917035000000, endEpochNanos=1711477917114692381, attributes=AttributesMap{data={request.id=035bc19e-125c-4555-bb32-3202d2aa6a56}, capacity=128, totalAddedValues=1}, totalAttributeCount=1, events=[], totalRecordedEvents=0, links=[], totalRecordedLinks=0, status=ImmutableStatusData{statusCode=UNSET, description=}, hasEnded=true}
18:32:01.320 DEBUG [SeleniumSpanExporter$1.lambda$export$4] - SpanData{spanContext=ImmutableSpanContext{traceId=0052ee5b995e046800c34be87f386338, spanId=36d1570e9bd5af94, traceFlags=01, traceState=ArrayBasedTraceState{entries=[]}, remote=false, valid=true}, parentSpanContext=ImmutableSpanContext{traceId=0052ee5b995e046800c34be87f386338, spanId=6ea589eaa1851fd2, traceFlags=01, traceState=ArrayBasedTraceState{entries=[]}, remote=true, valid=true}, resource=Resource{schemaUrl=null, attributes={service.name="unknown_service:java", telemetry.sdk.language="java", telemetry.sdk.name="opentelemetry", telemetry.sdk.version="1.35.0"}}, instrumentationScopeInfo=InstrumentationScopeInfo{name=default, version=null, schemaUrl=null, attributes={}}, name=distributor.poll_queue, kind=INTERNAL, startEpochNanos=1711477920290000000, endEpochNanos=1711477921320458446, attributes=AttributesMap{data={request.id=035bc19e-125c-4555-bb32-3202d2aa6a56}, capacity=128, totalAddedValues=1}, totalAttributeCount=1, events=[], totalRecordedEvents=0, links=[], totalRecordedLinks=0, status=ImmutableStatusData{statusCode=UNSET, description=}, hasEnded=true}

@adam-chaters
Copy link

adam-chaters commented May 1, 2024

I don't currently have time to add much detail, but I did want to chime in that I'm seeing the same got read error: java.io.IOException: connection closed locally issues, where the Hub pod is online and seems to be staying that way, but it is struggling to communicate with the nodes as they register.

Edit:
Resolved my inter-node errors by ensuring the chart dependencies were rebuilt and redeploying the SE 4.20 images to a clean environment. My pods are still dying off quickly after start, but this appears to be related to the startup probe; turning it off kept my containers running as expected.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants