
Weird behavior of TcpServerConnection #311

Open
tessob opened this issue Apr 8, 2021 · 13 comments
tessob commented Apr 8, 2021

Hello guys!

I'm looking for a clarification rather than help. I'm currently trying to run the application in K8s and everything is fine so far, but the logs look a bit odd:

2021-04-08 13:51:59.890  INFO 1 --- [.IO.thread-in-0] c.h.i.server.tcp.TcpServerConnection     : [10.154.24.57]:5701 [WORKFLOW] [4.2] Connection[id=10, /10.154.24.57:5701->/10.154.1.219:15106, qualifier=null, endpoint=null, alive=false, connectionType=NONE, planeIndex=-1] closed. Reason: Connection closed by the other side
2021-04-08 13:51:59.890  INFO 1 --- [.IO.thread-in-1] c.h.i.server.tcp.TcpServerConnection     : [10.154.24.57]:5701 [WORKFLOW] [4.2] Connection[id=11, /10.154.24.57:5701->/10.154.47.197:32686, qualifier=null, endpoint=null, alive=false, connectionType=NONE, planeIndex=-1] closed. Reason: Connection closed by the other side
2021-04-08 13:51:59.890  INFO 1 --- [.IO.thread-in-2] c.h.i.server.tcp.TcpServerConnection     : [10.154.24.57]:5701 [WORKFLOW] [4.2] Connection[id=12, /10.154.24.57:5701->/10.154.5.157:10454, qualifier=null, endpoint=null, alive=false, connectionType=NONE, planeIndex=-1] closed. Reason: Connection closed by the other side
2021-04-08 13:52:00.911  INFO 1 --- [ration.thread-0] c.h.internal.cluster.ClusterService      : [10.154.24.57]:5701 [WORKFLOW] [4.2] 

Members {size:2, ver:2} [
	Member [10.154.24.57]:5701 - 94f6d56b-a72d-4d0a-82ee-a33e47053f78 this
	Member [10.154.92.185]:5701 - 1f66b061-f995-46c0-b886-dd0ee1a49dc3
]

2021-04-08 13:52:04.335  INFO 1 --- [.IO.thread-in-0] c.h.i.server.tcp.TcpServerConnection     : [10.154.24.57]:5701 [WORKFLOW] [4.2] Connection[id=13, /10.154.24.57:5701->/10.154.6.102:16924, qualifier=null, endpoint=null, alive=false, connectionType=NONE, planeIndex=-1] closed. Reason: Connection closed by the other side
2021-04-08 13:52:04.335  INFO 1 --- [.IO.thread-in-1] c.h.i.server.tcp.TcpServerConnection     : [10.154.24.57]:5701 [WORKFLOW] [4.2] Connection[id=14, /10.154.24.57:5701->/10.154.19.176:5548, qualifier=null, endpoint=null, alive=false, connectionType=NONE, planeIndex=-1] closed. Reason: Connection closed by the other side
2021-04-08 13:52:04.336  INFO 1 --- [.IO.thread-in-2] c.h.i.server.tcp.TcpServerConnection     : [10.154.24.57]:5701 [WORKFLOW] [4.2] Connection[id=15, /10.154.24.57:5701->/10.154.5.124:47016, qualifier=null, endpoint=null, alive=false, connectionType=NONE, planeIndex=-1] closed. Reason: Connection closed by the other side
2021-04-08 13:52:04.336  INFO 1 --- [.IO.thread-in-0] c.h.i.server.tcp.TcpServerConnection     : [10.154.24.57]:5701 [WORKFLOW] [4.2] Connection[id=16, /10.154.24.57:5701->/10.154.1.219:52814, qualifier=null, endpoint=null, alive=false, connectionType=NONE, planeIndex=-1] closed. Reason: Connection closed by the other side
2021-04-08 13:52:09.889  INFO 1 --- [.IO.thread-in-2] c.h.i.server.tcp.TcpServerConnection     : [10.154.24.57]:5701 [WORKFLOW] [4.2] Connection[id=18, /10.154.24.57:5701->/10.154.19.176:32638, qualifier=null, endpoint=null, alive=false, connectionType=NONE, planeIndex=-1] closed. Reason: Connection closed by the other side
2021-04-08 13:52:09.889  INFO 1 --- [.IO.thread-in-1] c.h.i.server.tcp.TcpServerConnection     : [10.154.24.57]:5701 [WORKFLOW] [4.2] Connection[id=17, /10.154.24.57:5701->/10.154.29.156:1518, qualifier=null, endpoint=null, alive=false, connectionType=NONE, planeIndex=-1] closed. Reason: Connection closed by the other side
2021-04-08 13:52:09.889  INFO 1 --- [.IO.thread-in-0] c.h.i.server.tcp.TcpServerConnection     : [10.154.24.57]:5701 [WORKFLOW] [4.2] Connection[id=19, /10.154.24.57:5701->/10.154.47.197:32712, qualifier=null, endpoint=null, alive=false, connectionType=NONE, planeIndex=-1] closed. Reason: Connection closed by the other side

It seems like the current node is constantly trying to establish connections with other pods "using all sorts of irrelevant ports". Is this normal behavior or not? Is there a way to restrict it somehow? (See the sketch after my config below for the kind of restriction I mean.)

My configs:

  @Bean
  public Config hazelcastConfig() {
    Config cfg = new Config();
    cfg.setClusterName("WORKFLOW");
    cfg.getNetworkConfig().setPort(5701).setPortAutoIncrement(false);
    cfg.setSerializationConfig(serializationConfig());
    cfg.setMapConfigs(getMapConfigs());
    cfg.setProperty("hazelcast.logging.type", "slf4j");

    if (isApplicationInsideKube()) {
      JoinConfig joinConfig = cfg.getNetworkConfig().getJoin();
      joinConfig.getAutoDetectionConfig().setEnabled(false);
      joinConfig.getTcpIpConfig().setEnabled(false);
      joinConfig.getMulticastConfig().setEnabled(false);
      joinConfig.getKubernetesConfig().setEnabled(true)
          .setProperty("service-name", "workflow-service")
          .setProperty("service-port", "5701");
      log.info(joinConfig.toString());
    }

    log.info(cfg.toString());
    return cfg;
  }
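For what I mean by "restrict": something along the lines of the interfaces / outbound-ports settings on NetworkConfig, if those would even apply here. A minimal sketch in the same style as the config above (the interface pattern and port range are placeholders, not our real values):

    // Illustrative sketch only: bind the member to a specific interface pattern
    // and limit the local ports it uses for outgoing connections.
    cfg.getNetworkConfig()
        .getInterfaces()
        .setEnabled(true)
        .addInterface("10.154.*.*");               // placeholder pod-network pattern
    cfg.getNetworkConfig()
        .addOutboundPortDefinition("35000-35100"); // placeholder outbound port range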

Best regards & Thank you in advance,

@alparslanavci

@tessob thanks for submitting the issue. Could you please tell us more about your k8s environment? Is there any proxy/service mesh setup?


tessob commented Apr 8, 2021

@alparslanavci no, we have no proxy/service mesh. Our Service manifest is similar to this one. There are just two pods in a separate namespace right now; however, the cluster itself is bigger.

Our JoinConfig on application start looks like this:

JoinConfig{multicastConfig=MulticastConfig [enabled=false, multicastGroup=224.2.2.3, multicastPort=54327, multicastTimeToLive=32, multicastTimeoutSeconds=2, trustedInterfaces=[], loopbackModeEnabled=true], tcpIpConfig=TcpIpConfig [enabled=false, connectionTimeoutSeconds=5, members=[], requiredMember=null], awsConfig=AliasedDiscoveryConfig{tag='aws', enabled=false, usePublicIp=false, properties={}}, gcpConfig=AliasedDiscoveryConfig{tag='gcp', enabled=false, usePublicIp=false, properties={}}, azureConfig=AliasedDiscoveryConfig{tag='azure', enabled=false, usePublicIp=false, properties={}}, kubernetesConfig=AliasedDiscoveryConfig{tag='kubernetes', enabled=true, usePublicIp=false, properties={service-name=workflow-service, service-port=5701}}, eurekaConfig=AliasedDiscoveryConfig{tag='eureka', enabled=false, usePublicIp=false, properties={}}, discoveryConfig=DiscoveryConfig{discoveryStrategyConfigs=[], discoveryServiceProvider=null, nodeFilter=null, nodeFilterClass='null'}, autoDetectionConfig=AutoDetectionConfig{enabled=false}}

and Discovery properties like this:

Kubernetes Discovery properties: { service-dns: null, service-dns-timeout: 5, service-name: workflow-service, service-port: 5701, service-label: null, service-label-value: true, namespace: XXXXX, pod-label: null, pod-label-value: null, resolve-not-ready-addresses: true, use-node-name-as-external-address: false, kubernetes-api-retries: 3, kubernetes-master: https://kubernetes.default.svc}

and there are a few additional messages that I don't really know how to interpret:

2021-04-08 13:51:46.207  INFO 1 --- [           main] c.h.s.d.integration.DiscoveryService     : [10.154.24.57]:5701 [WORKFLOW] [4.2] Kubernetes Discovery activated with mode: KUBERNETES_API
2021-04-08 13:51:46.499  INFO 1 --- [           main] com.hazelcast.instance.impl.Node         : [10.154.24.57]:5701 [WORKFLOW] [4.2] Using Discovery SPI
2021-04-08 13:51:47.783  INFO 1 --- [           main] com.hazelcast.core.LifecycleService      : [10.154.24.57]:5701 [WORKFLOW] [4.2] [10.154.24.57]:5701 is STARTING
2021-04-08 13:51:48.603  INFO 1 --- [           main] c.h.s.d.integration.DiscoveryService     : [10.154.24.57]:5701 [WORKFLOW] [4.2] Kubernetes plugin discovered availability zone: eu-XXXXXX
2021-04-08 13:51:48.771  INFO 1 --- [           main] c.h.s.d.integration.DiscoveryService     : [10.154.24.57]:5701 [WORKFLOW] [4.2] Kubernetes plugin discovered node name: ip-XXXXXXX.compute.internal
2021-04-08 13:51:48.990  WARN 1 --- [           main] c.hazelcast.kubernetes.KubernetesClient  : Cannot fetch public IPs of Hazelcast Member PODs, you won't be able to use Hazelcast Smart Client from outside of the Kubernetes network

I hope this is helpful.
BR,


tessob commented Apr 9, 2021

A short update:

I have checked this morning's logs, which contain about 20K attempts to find other nodes, and then looked at the source code to understand where the client gets this list of irrelevant endpoints from.

I reproduced this API call manually with the following curl query:

kubectl proxy --port=0 &
curl http://localhost:36891/api/v1/namespaces/XXXXXXXXX/endpoints/workflow-service

and got this output:

{
  "kind": "Endpoints",
  "apiVersion": "v1",
  "metadata": {
    "name": "workflow-service",
    "namespace": "XXXXXXXXX",
    "selfLink": "/api/v1/namespaces/XXXXXXXXX/endpoints/workflow-service",
    "uid": "64ffb585-e68a-41bc-97bf-fb54e8f3fdc1",
    "resourceVersion": "113176905",
    "creationTimestamp": "2021-04-08T11:01:06Z",
    "labels": {
      "fluxcd.io/sync-gc-mark": "sha256.VJun3ud720pxL2mM3fjAbo3OoiEfk8bQDeZA8wZQx7c"
    }
  },
  "subsets": [
    {
      "addresses": [
        {
          "ip": "10.154.24.57",
          "nodeName": "XXXXXXXXX.compute.internal",
          "targetRef": {
            "kind": "Pod",
            "namespace": "XXXXXXXXX",
            "name": "workflow-tracker-service-5769dc77cf-9rnhp",
            "uid": "7c84a0c5-7d9b-4cb9-b1d8-d6bb0d08979e",
            "resourceVersion": "113176843"
          }
        },
        {
          "ip": "10.154.92.185",
          "nodeName": "XXXXXXXXX.compute.internal",
          "targetRef": {
            "kind": "Pod",
            "namespace": "XXXXXXXXX",
            "name": "workflow-tracker-service-5769dc77cf-zf9pd",
            "uid": "29737169-ff5e-49ba-902f-1756dc4e33db",
            "resourceVersion": "113176895"
          }
        }
      ],
      "ports": [
        {
          "name": "workflow-s",
          "port": 8080,
          "protocol": "TCP"
        },
        {
          "name": "hazelcast",
          "port": 5701,
          "protocol": "TCP"
        }
      ]
    }
  ]
}

So, as I said, I have just 2 pods relevant to the Hazelcast cluster. Hence I don't really understand where the source IPs that TcpServerConnection is trying to connect to are coming from. Looks like a bug.

Best regards,

@leszko
Copy link

leszko commented Apr 9, 2021

It's hard to tell where Hazelcast tries to connect to. Some more questions:

  1. If you follow exactly the code in this guide, do you experience the same?
  2. If you deploy your application on some widely-used Kubernetes cluster (like Docker Desktop, or GKE, or EKS), do you experience the same?


tessob commented Apr 9, 2021

Hi @leszko,

  1. We have tested exactly as in the guide, in the same k8s cluster and namespace, and everything is OK. Logs:
########################################
# JAVA_OPTS=-Djava.net.preferIPv4Stack=true -Dhazelcast.logging.type=log4j2 -Dlog4j.configurationFile=/opt/hazelcast/log4j2.properties -XX:MaxRAMPercentage=80.0 -XX:MaxGCPauseMillis=5 --add-modules java.se --add-exports java.base/jdk.internal.ref=ALL-UNNAMED --add-opens java.base/java.lang=ALL-UNNAMED --add-opens java.base/java.nio=ALL-UNNAMED --add-opens java.base/sun.nio.ch=ALL-UNNAMED --add-opens java.management/sun.management=ALL-UNNAMED --add-opens jdk.management/com.sun.management.internal=ALL-UNNAMED -Dhazelcast.config=/data/hazelcast/hazelcast.yaml -DserviceName=hazelcast -Dnamespace=XXXXXXXXX -Dhazelcast.shutdownhook.policy=GRACEFUL -Dhazelcast.shutdownhook.enabled=true -Dhazelcast.graceful.shutdown.max.wait=600 
# CLASSPATH=/opt/hazelcast/*:/opt/hazelcast/lib/*
# starting now....
########################################
+ exec java -server -Djava.net.preferIPv4Stack=true -Dhazelcast.logging.type=log4j2 -Dlog4j.configurationFile=/opt/hazelcast/log4j2.properties -XX:MaxRAMPercentage=80.0 -XX:MaxGCPauseMillis=5 --add-modules java.se --add-exports java.base/jdk.internal.ref=ALL-UNNAMED --add-opens java.base/java.lang=ALL-UNNAMED --add-opens java.base/java.nio=ALL-UNNAMED --add-opens java.base/sun.nio.ch=ALL-UNNAMED --add-opens java.management/sun.management=ALL-UNNAMED --add-opens jdk.management/com.sun.management.internal=ALL-UNNAMED -Dhazelcast.config=/data/hazelcast/hazelcast.yaml -DserviceName=hazelcast -Dnamespace=XXXXXXXXX -Dhazelcast.shutdownhook.policy=GRACEFUL -Dhazelcast.shutdownhook.enabled=true -Dhazelcast.graceful.shutdown.max.wait=600 com.hazelcast.core.server.HazelcastMemberStarter
2021-04-09 10:39:31,214 [ INFO] [main] [c.h.i.c.AbstractConfigLocator]: Loading configuration '/data/hazelcast/hazelcast.yaml' from System property 'hazelcast.config'
2021-04-09 10:39:31,219 [ INFO] [main] [c.h.i.c.AbstractConfigLocator]: Using configuration file at /data/hazelcast/hazelcast.yaml
2021-04-09 10:39:31,360 [ INFO] [main] [c.h.system]: [10.154.60.11]:5701 [dev] [4.2] Hazelcast 4.2 (20210324 - 405cfd1) starting at [10.154.60.11]:5701
2021-04-09 10:39:31,749 [ INFO] [main] [c.h.s.d.i.DiscoveryService]: [10.154.60.11]:5701 [dev] [4.2] Kubernetes Discovery properties: { service-dns: null, service-dns-timeout: 5, service-name: hazelcast, service-port: 0, service-label: null, service-label-value: true, namespace: XXXXXXXXX, pod-label: null, pod-label-value: null, resolve-not-ready-addresses: true, use-node-name-as-external-address: false, kubernetes-api-retries: 3, kubernetes-master: https://kubernetes.default.svc}
2021-04-09 10:39:31,752 [ INFO] [main] [c.h.s.d.i.DiscoveryService]: [10.154.60.11]:5701 [dev] [4.2] Kubernetes Discovery activated with mode: KUBERNETES_API
2021-04-09 10:39:31,807 [ INFO] [main] [c.h.i.i.Node]: [10.154.60.11]:5701 [dev] [4.2] Using Discovery SPI
2021-04-09 10:39:31,810 [ WARN] [main] [c.h.c.CPSubsystem]: [10.154.60.11]:5701 [dev] [4.2] CP Subsystem is not enabled. CP data structures will operate in UNSAFE mode! Please note that UNSAFE mode will not provide strong consistency guarantees.
2021-04-09 10:39:32,024 [ INFO] [main] [c.h.i.d.Diagnostics]: [10.154.60.11]:5701 [dev] [4.2] Diagnostics disabled. To enable add -Dhazelcast.diagnostics.enabled=true to the JVM arguments.
2021-04-09 10:39:32,035 [ INFO] [main] [c.h.c.LifecycleService]: [10.154.60.11]:5701 [dev] [4.2] [10.154.60.11]:5701 is STARTING
2021-04-09 10:39:32,448 [ INFO] [main] [c.h.s.d.i.DiscoveryService]: [10.154.60.11]:5701 [dev] [4.2] Kubernetes plugin discovered availability zone: XXXXXXXXX
2021-04-09 10:39:32,500 [ INFO] [main] [c.h.s.d.i.DiscoveryService]: [10.154.60.11]:5701 [dev] [4.2] Kubernetes plugin discovered node name: XXXXXXXXX.compute.internal
2021-04-09 10:39:32,616 [ WARN] [main] [c.h.k.KubernetesClient]: Cannot fetch public IPs of Hazelcast Member PODs, you won't be able to use Hazelcast Smart Client from outside of the Kubernetes network
2021-04-09 10:39:37,600 [ INFO] [main] [c.h.i.c.ClusterService]: [10.154.60.11]:5701 [dev] [4.2]

Members {size:1, ver:1} [
	Member [10.154.60.11]:5701 - 6e43ac32-e6bf-4cfb-81e0-a865d6f986b1 this
]

2021-04-09 10:39:37,611 [ INFO] [main] [c.h.c.LifecycleService]: [10.154.60.11]:5701 [dev] [4.2] [10.154.60.11]:5701 is STARTED
2021-04-09 10:40:22,454 [ INFO] [hz.boring_ganguly.IO.thread-in-1] [c.h.i.s.t.TcpServerConnection]: [10.154.60.11]:5701 [dev] [4.2] Initialized new cluster connection between /10.154.60.11:5701 and /10.154.107.219:57097
2021-04-09 10:40:28,455 [ INFO] [hz.boring_ganguly.priority-generic-operation.thread-0] [c.h.i.c.ClusterService]: [10.154.60.11]:5701 [dev] [4.2]

Members {size:2, ver:2} [
	Member [10.154.60.11]:5701 - 6e43ac32-e6bf-4cfb-81e0-a865d6f986b1 this
	Member [10.154.107.219]:5701 - 3140ed16-8832-4e69-a875-d8da79b091eb
]

2021-04-09 10:40:56,850 [ INFO] [hz.boring_ganguly.IO.thread-in-0] [c.h.i.s.t.TcpServerConnection]: [10.154.60.11]:5701 [dev] [4.2] Initialized new cluster connection between /10.154.60.11:5701 and /10.154.101.83:44681
2021-04-09 10:41:02,834 [ INFO] [hz.boring_ganguly.priority-generic-operation.thread-0] [c.h.i.c.ClusterService]: [10.154.60.11]:5701 [dev] [4.2]

Members {size:3, ver:3} [
	Member [10.154.60.11]:5701 - 6e43ac32-e6bf-4cfb-81e0-a865d6f986b1 this
	Member [10.154.107.219]:5701 - 3140ed16-8832-4e69-a875-d8da79b091eb
	Member [10.154.101.83]:5701 - 21669cbd-71fc-4a70-8727-b6538ddd3d23
]
  2. We are on AWS EKS.

In my dependencies, I have only these two that relate to Hazelcast:

    <dependency>
      <artifactId>spring-boot-starter-cache</artifactId>
      <groupId>org.springframework.boot</groupId>
    </dependency>
    <!-- https://mvnrepository.com/artifact/com.hazelcast/hazelcast-all -->
    <dependency>
      <groupId>com.hazelcast</groupId>
      <artifactId>hazelcast-all</artifactId>
      <version>4.2</version>
    </dependency>

My config is in the very first message.

Thanks,


tessob commented Apr 10, 2021

A short update:

I have tested my application locally in minikube and everything is OK. So the root cause of this behavior is most likely in the K8s cluster and/or deployment manifests. However, it is still unclear to me why Hazelcast is trying to establish connections with nodes that are irrelevant to the service-name.


leszko commented Apr 14, 2021

It may be the phone home call. Could you try disabling phone home and check if it helps?
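
For reference, phone home is controlled by the hazelcast.phone.home.enabled property; a minimal sketch of disabling it in the embedded-config style used above (it can equally be passed as a JVM system property):

    // Sketch: turn off Hazelcast phone-home reporting.
    cfg.setProperty("hazelcast.phone.home.enabled", "false");
    // Equivalent JVM flag: -Dhazelcast.phone.home.enabled=false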


tessob commented Apr 15, 2021

After disabling it, I still see the same behavior in the logs, so there must be another cause.

@fallthough

Hi @tessob, were you able to find out the cause? I'm currently experiencing the same issue. I'm also on AWS EKS.

@fatih-iver

I'm also experiencing the same problem, and also on AWS EKS.


tessob commented Dec 25, 2022

> Hi @tessob, were you able to find out the cause? I'm currently experiencing the same issue. I'm also on AWS EKS.

Sorry for the late reply ))
I have migrated to Infinispan.


AshishWat commented Dec 13, 2023

Hey folks,

I am facing the same issue.
We have a Java Spring app with embedded Hazelcast and run it in Kubernetes.
For member lookup we use the DNS Lookup discovery mode. Member creation looks good, but Hazelcast keeps trying to connect to some random IPs and gives the warnings below.

java.lang.IllegalStateException: Unknown protocol: OPT
java.lang.IllegalStateException: TLS handshake header detected, but plain protocol header was expected.
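
For context, a DNS-lookup join config of the kind mentioned above typically looks roughly like this (a sketch only; the headless-service DNS name below is a placeholder, not our real one):

    // Sketch: Kubernetes discovery in DNS-lookup mode, driven by the service-dns
    // property of a headless service. The DNS name is a placeholder.
    Config cfg = new Config();
    JoinConfig join = cfg.getNetworkConfig().getJoin();
    join.getMulticastConfig().setEnabled(false);
    join.getKubernetesConfig().setEnabled(true)
        .setProperty("service-dns", "my-hazelcast.my-namespace.svc.cluster.local");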

@tessob Did you find the root cause of this? My issue is similar to what you have posted.


tessob commented Jan 23, 2024

@AshishWat No, I just gave up on Hazelcast after looking through its code and moved to Infinispan. So far so good.
