Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Hazelcast csharp client heartbeat problem #170

Closed
simpleusr opened this issue Jun 18, 2018 · 4 comments · Fixed by #181
Closed

Hazelcast csharp client heartbeat problem #170

simpleusr opened this issue Jun 18, 2018 · 4 comments · Fixed by #181
Assignees
Labels
Priority: Critical Critical Priority. Priority: High High Priority.
Milestone

Comments

@simpleusr
Copy link

Hi,

This question was previously asked at:

https://groups.google.com/forum/#!topic/hazelcast/rImP_5SRnYk

Since we do not have any feedback I am repeating the same question..

We are trying to use hazelcast messaging system via topic.

We have client - server mode usage . Our hazelcast verison is 3.9. Producer application is java based and we have c# client topic listeners.

Our throughput is moderately high about 20 publishes per second.

The problem we are facing is that after a certain time c# client application starts, cluster member closes the connection to client due to heartbeat failure :

2018-06-13 07:57:22,674 : [INFO ] [hz._hzInstance_1_dev.async.thread-24][AuthenticationMessageTask] [hzlhost]:9000 [dev] [3.9] Received auth from Connection[id=26392, /hzlhostip:9000->/clientip:50004, endpoint=null, alive
=true, type=CSHARP_CLIENT], successfully authenticated, principal: ClientPrincipal{uuid='3c924baf-6e0c-49fe-8151-e541e835704b', ownerUuid='9f253cd0-8115-4a0d-90d8-f3c752090612'}, owner connection: true, client version: 3.9

2018-06-13 08:30:38,530 : [INFO ] [hz._hzInstance_1_dev.cached.thread-2][TcpIpConnection] [hzlhost]:9000 [dev] [3.9] Connection[id=26392, /hzlhostip:9000->clientip/clientip:50004, endpoint=[clientip]:50004, aliv
e=false, type=CSHARP_CLIENT] closed. Reason: Client heartbeat is timed out, closing connection to Connection[id=26392, /hzlhostip:9000->clientip/clientip:50004, endpoint=[clientip]:50004, alive=true, type=CSHARP_CLI
ENT]. Now: 2018-06-13 08:30:38.530. LastTimePacketReceived: 2018-06-13 08:25:31.445

2018-06-13 08:30:38,531 : [INFO ] [hz._hzInstance_1_dev.event-5][ClientEndpointManager] [hzlhost]:9000 [dev] [3.9] Destroying ClientEndpoint{connection=Connection[id=26392, 0.0.0.0/0.0.0.0:9000->clientip/clientip:50004,
endpoint=[clientip]:50004, alive=false, type=CSHARP_CLIENT], principal='ClientPrincipal{uuid='3c924baf-6e0c-49fe-8151-e541e835704b', ownerUuid='9f253cd0-8115-4a0d-90d8-f3c752090612'}, firstConnection=true, authenticated=true, c
lientVersion=3.9, creationTime=1528865842673, latest statistics=null}

Please note that we are using default values for hazelcast.client.heartbeat.interval, hazelcast.client.heartbeat.timeout configuration.

What is very strange is that, when I switch to java client from same client host this problem does NOT occur...

We see the initial connection and this does not get closed afterwards like c# client..

2018-06-13 08:42:12,548 : [INFO ] [hz._hzInstance_1_dev.async.thread-15][AuthenticationMessageTask] [hzlhost]:9000 [dev] [3.9] Received auth from Connection[id=26509, /hzlhostip:9000->/clientip:51391, endpoint=null, alive
=true, type=JAVA_CLIENT], successfully authenticated, principal: ClientPrincipal{uuid='5d3b8b5e-c77d-47d9-a6dc-3ba4f722c66a', ownerUuid='9f253cd0-8115-4a0d-90d8-f3c752090612'}, owner connection: true, client version: 3.9

The client code is very simple. We just log the incoming message. We could not find out why we have problem in c# client but not in java client from the same host..

As far as I understand , heartbeat management from c# client is done in this class:

https://github.com/hazelcast/hazelcast-csharp-client/blob/master/Hazelcast.Net/Hazelcast.Client.Connection/ClientConnectionManager.cs

We implemented hazelcast logging interface and do not see any warnings like "Heartbeat failed to connection" and etc.

Any help is appreciated...

@asimarslan
Copy link
Contributor

Hi @simpleusr ,

Is there a simple unit test or demo app that I can reproduce the bug locally.

@simpleusr
Copy link
Author

Hi @asimarslan ,

Thanks for the interest.

I will also try to reproduce in local server but the config is simple. We have two node hazelcast cluster running on ibm aix server . Hazelcast version is 3.9.

We have .net client configured as follows:

using Hazelcast.Client;
using Hazelcast.Config;
using System;
using System.Collections.Generic;
using System.Configuration;
using System.Linq;
using System.Text;
using System.Threading.Tasks;

namespace DotNetClient
{
    public class Helper
    {
        private static readonly log4net.ILog log =
            log4net.LogManager.GetLogger(System.Reflection.MethodBase.GetCurrentMethod().DeclaringType);

        public static void Configure()
        {
            try
            {
                log.Info("CONFIGURATION STARTED ");

                var clientConfig = new ClientConfig();

                var groupName = ConfigurationManager.AppSettings["groupName"];

                var groupPassword = ConfigurationManager.AppSettings["groupPassword"];

                log.Info("Using groupName: " + groupName + " groupPassword: " + groupPassword);

                clientConfig.SetGroupConfig(new GroupConfig(groupName, groupPassword));

                string memberAddresses = ConfigurationManager.AppSettings["memberAddresses"];

                string[] adresses = memberAddresses.Split(',');

                foreach (var adress in adresses)
                {
                    log.Info("Adding address " + adress);
                    clientConfig.GetNetworkConfig().AddAddress(adress);
                }

                //SocketOptions options = clientConfig.GetNetworkConfig().GetSocketOptions();

                //options.SetBufferSize(128);
                //options.SetKeepAlive(true);
                //options.SetLingerSeconds(3);
                //options.SetReuseAddress(true);
                //options.SetTcpNoDelay(true);


                //log.Info("options " + options);


                clientConfig.GetSerializationConfig()
                  .AddPortableFactory(MyPortableFactory.FactoryId, new MyPortableFactory());



                var client = HazelcastClient.NewHazelcastClient(clientConfig);

                var topicName = ConfigurationManager.AppSettings["topicName"];

                log.Info("Using topicName: " + topicName );

                var topic = client.GetTopic<PriceNotification>(topicName);

                topic.AddMessageListener(new TopicListener());
            }
            catch (Exception e)
            {
                log.Error("Error in HazelcastConfig", e);
                throw;
            }
        }
    }
}

The listener class is as follows:

using Hazelcast.Core;
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;
using System.Text;
using System.Threading.Tasks;

namespace DotNetClient
{
    public class TopicListener : IMessageListener<PriceNotification>
    {
        private static readonly log4net.ILog log =
          log4net.LogManager.GetLogger(System.Reflection.MethodBase.GetCurrentMethod().DeclaringType);


        public void OnMessage(Message<PriceNotification> message)
        {
            //var obj = message.GetMessageObject();
            //log.Info("recevied at : " + GetFormattedCurrentTimeStamp() + " obj " + obj.ToString());

            var consumer = Task.Factory.StartNew(() =>
            {

                var obj = message.GetMessageObject();
                log.Info("recevied at : " + GetFormattedCurrentTimeStamp() + " obj " + obj.ToString());
            });

        }

        private static readonly DateTime Jan1st1970 = new DateTime(1970, 1, 1, 0, 0, 0, DateTimeKind.Utc);

        public static long CurrentTimeMillis()
        {
            return (long)(DateTime.UtcNow - Jan1st1970).TotalMilliseconds;
        }

        public static String GetFormattedCurrentTimeStamp()
        {
            return DateTime.Now.ToString("HH:mm:ss:fff");
        }
    }
}

As I said I will also try to locally reproduce the issue. But please note once again that we do not have this problem in the very similar java client...

@simpleusr
Copy link
Author

Hi @asimarslan ,

I managed to create a reproducer for the problem. You can find the repo at

https://github.com/simpleusr/hazelcastcsharpclientproblem

Here are the steps to reproduce:

  1. Start member1 (clusterandjavaproducer\hazelcastproblemdotnet\member\com.mycompany.member.MemberAppMain) with argument hazelcast-dev1.xml .

  2. Start member2 (clusterandjavaproducer\hazelcastproblemdotnet\member\com.mycompany.member.MemberAppMain) with argument hazelcast-dev2.xml .

At this point you have a cluster in local host with two members.

  1. Start java producer app (clusterandjavaproducer\hazelcastproblemdotnet\client\com.mycompany.client.ClientAppMain)

This app continuously publishes TopicData to hazelcastSampleTopic ..

  1. Start c# client app

(dotnetclienttopiclistener\HazelcastDotNetProblem\Program)

This is the sample client app listening TopicData messages from hazelcastSampleTopic .

I reproduced the scenario several times and here is what is happening.

Cluster Member logs:

2018-06-19 13:42:26,470 : [INFO ] [hz._hzInstance_1_dev.IO.thread-Acceptor][TcpIpAcceptor] [localhost]:9001 [dev] [3.9] Accepting socket connection from /127.0.0.1:54182
2018-06-19 13:42:26,470 : [INFO ] [hz._hzInstance_1_dev.cached.thread-6][TcpIpConnectionManager] [localhost]:9001 [dev] [3.9] Established socket connection between /127.0.0.1:9001 and /127.0.0.1:54182
2018-06-19 13:42:26,485 : [INFO ] [hz._hzInstance_1_dev.async.thread-3][AuthenticationMessageTask] [localhost]:9001 [dev] [3.9] Received auth from Connection[id=3, /127.0.0.1:9001->/127.0.0.1:54182, endpoint=null, alive=true, type=CSHARP_CLIENT], successfully authenticated, principal: ClientPrincipal{uuid='d8ee1de7-d95c-4373-a1f9-479f47e46895', ownerUuid='4df2aee7-2256-453b-b82e-6fe037f72186'}, owner connection: true, client version: 3.9
2018-06-19 13:47:40,348 : [INFO ] [hz._hzInstance_1_dev.cached.thread-1][TcpIpConnection] [localhost]:9001 [dev] [3.9] Connection[id=3, /127.0.0.1:9001->/127.0.0.1:54182, endpoint=[127.0.0.1]:54182, alive=false, type=CSHARP_CLIENT] closed. Reason: Client heartbeat is timed out, closing connection to Connection[id=3, /127.0.0.1:9001->/127.0.0.1:54182, endpoint=[127.0.0.1]:54182, alive=true, type=CSHARP_CLIENT]. Now: 2018-06-19 13:47:40.347. LastTimePacketReceived: 2018-06-19 13:42:36.526
2018-06-19 13:47:40,351 : [INFO ] [hz._hzInstance_1_dev.event-3][ClientEndpointManager] [localhost]:9001 [dev] [3.9] Destroying ClientEndpoint{connection=Connection[id=3, 0.0.0.0/0.0.0.0:9001->/127.0.0.1:54182, endpoint=[127.0.0.1]:54182, alive=false, type=CSHARP_CLIENT], principal='ClientPrincipal{uuid='d8ee1de7-d95c-4373-a1f9-479f47e46895', ownerUuid='4df2aee7-2256-453b-b82e-6fe037f72186'}, firstConnection=true, authenticated=true, clientVersion=3.9, creationTime=1529404946482, latest statistics=null}
2018-06-19 13:47:40,382 : [INFO ] [hz._hzInstance_1_dev.IO.thread-Acceptor][TcpIpAcceptor] [localhost]:9000 [dev] [3.9] Accepting socket connection from /127.0.0.1:54312
2018-06-19 13:47:40,383 : [INFO ] [hz._hzInstance_1_dev.cached.thread-3][TcpIpConnectionManager] [localhost]:9000 [dev] [3.9] Established socket connection between /127.0.0.1:9000 and /127.0.0.1:54312
2018-06-19 13:47:40,386 : [INFO ] [hz._hzInstance_1_dev.async.thread-2][AuthenticationMessageTask] [localhost]:9000 [dev] [3.9] Received auth from Connection[id=3, /127.0.0.1:9000->/127.0.0.1:54312, endpoint=null, alive=true, type=CSHARP_CLIENT], successfully authenticated, principal: ClientPrincipal{uuid='d8ee1de7-d95c-4373-a1f9-479f47e46895', ownerUuid='453d12c8-b4d7-4064-afef-c5405d974860'}, owner connection: true, client version: 3.9

C# client logs:

2018-06-19 13:42:26,508 [hz-striped-scheduler-2] INFO HazelcastDotNetProblem.CustomLogFactory+CustomLogger -

Members [2] {
IMember [localhost]:9000
IMember [localhost]:9001
}
..........
2018-06-19 13:42:26,510 [8] INFO HazelcastDotNetProblem.Program - Using topicName: hazelcastSampleTopic
2018-06-19 13:42:26,511 [18] INFO HazelcastDotNetProblem.CustomLogFactory+CustomLogger - Updating partition list.
2018-06-19 13:42:26,517 [18] INFO HazelcastDotNetProblem.CustomLogFactory+CustomLogger - Partition list updated
2018-06-19 13:47:40,285 [hz-striped-scheduler-1] INFO HazelcastDotNetProblem.TopicListener - recevied at : 13:47:40:285 obj TopicData [Data=data_1529404921559]
2018-06-19 13:47:40,350 [29] INFO HazelcastDotNetProblem.CustomLogFactory+CustomLogger - Got exception in connection Connection[0][IMember [localhost]:9001 -> 127.0.0.1:54182]:
Hazelcast.Client.Spi.TargetDisconnectedException: Target[Address[localhost]:9001] disconnected, Socket was closed.
2018-06-19 13:47:40,352 [29] INFO HazelcastDotNetProblem.CustomLogFactory+CustomLogger - Destroying connection Connection[0][IMember [localhost]:9001 -> 127.0.0.1:54182] due to Target[Address[localhost]:9001] disconnected, Socket was closed.
2018-06-19 13:47:40,352 [29] INFO HazelcastDotNetProblem.CustomLogFactory+CustomLogger - Closing socket, address: Address[localhost]:9001 id: 0
2018-06-19 13:47:40,359 [29] INFO HazelcastDotNetProblem.CustomLogFactory+CustomLogger - Cleaning up connection resources for 0
2018-06-19 13:47:40,368 [29] INFO HazelcastDotNetProblem.CustomLogFactory+CustomLogger - Re-registering listener for ClientMessage{length=47, correlationId=6, messageType=402, partitionId=196, isComplete=True, isRetryable=False, isEvent=False, writeOffset=0}
2018-06-19 13:47:40,368 [27] INFO HazelcastDotNetProblem.CustomLogFactory+CustomLogger - HazelcastClient[hz.client_0_dev] is ClientDisconnected
2018-06-19 13:47:40,381 [27] INFO HazelcastDotNetProblem.CustomLogFactory+CustomLogger - Trying to connect to Address[127.0.0.1]:9000

As you can see,

Around 13:42:26 client successfully connects to cluster. Later on around 13:47:40 cluster closes socket due to heartbeat faılure and client reconnects again.

What is worse is that, we sometimes see that topic listening completely stops.

I think this is the similar issue pointed in:

#153

Please let me know if you have any problems in reproducing..

Regards

@asimarslan asimarslan added this to the 3.10 milestone Jun 19, 2018
@asimarslan asimarslan added this to To do in 3.10 via automation Jun 19, 2018
@asimarslan asimarslan added the Type: Defect Issue is a Defect ie a Bug that needs to be fixed. label Jun 19, 2018
@asimarslan asimarslan self-assigned this Jun 19, 2018
@asimarslan asimarslan added Priority: Critical Critical Priority. Estimation: M Priority: High High Priority. and removed Type: Defect Issue is a Defect ie a Bug that needs to be fixed. labels Jun 19, 2018
@asimarslan
Copy link
Contributor

Same as this java client bug: hazelcast/hazelcast#13576

@asimarslan asimarslan changed the title Hazelcast csharp client heartbeat problem (no problem in java client) Hazelcast csharp client heartbeat problem Aug 14, 2018
@asimarslan asimarslan moved this from To do to In progress in 3.10 Aug 31, 2018
3.10 automation moved this from In progress to Done Sep 4, 2018
@asimarslan asimarslan modified the milestones: 3.10, 3.9.4 Sep 25, 2018
@asimarslan asimarslan added this to IN PROGRESS in 3.9.4 Sep 25, 2018
@asimarslan asimarslan moved this from IN PROGRESS to DONE in 3.9.4 Sep 25, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Priority: Critical Critical Priority. Priority: High High Priority.
Projects
No open projects
3.10
  
Done
3.9.4
DONE
Development

Successfully merging a pull request may close this issue.

2 participants