After intentional shutdown of a node of the cluster, the other nodes are still attempting to reconnect to the shutdown node #26319

federicasuriano opened this issue Apr 24, 2024 · 0 comments

Bug
I encountered this issue with a particular behavior in a Hazelcast cluster setup. After intentionally shutting down one of the cluster nodes, I noticed that the remaining nodes logged the following warnings:

WARN […] com.hazelcast.internal.server.tcp.TcpServerConnectionErrorHandler - Removing connection to endpoint […] Cause => java.io.IOException {Connection refused to address /[…]}, Error-Count: 5
WARN […] com.hazelcast.internal.cluster.impl.MembershipManager - […] Member […] is suspected to be dead for reason: No connection

The remaining nodes correctly detect that the shut-down node is gone. However, despite the shutdown being intentional, they keep attempting to reconnect to it.
I tested this on Hazelcast versions 5.3.2, 5.0.2, and 4.0.3, and all of them produce the same logs.
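The `Error-Count: 5` in the first warning suggests a per-endpoint failure counter that eventually flags the connection as faulty. A minimal sketch of such a counter, purely illustrative (this class and its threshold are assumptions, not Hazelcast's actual implementation):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Illustrative only: counts consecutive connection failures per endpoint and
// flags the endpoint as suspect once a threshold is reached.
public class ConnectionErrorCounter {

    private final int maxFaults;
    private final Map<String, AtomicInteger> errorCounts = new ConcurrentHashMap<>();

    public ConnectionErrorCounter(int maxFaults) {
        this.maxFaults = maxFaults;
    }

    /** Records a failure; returns true once the endpoint should be suspected. */
    public boolean onError(String endpoint) {
        int count = errorCounts
                .computeIfAbsent(endpoint, e -> new AtomicInteger())
                .incrementAndGet();
        return count >= maxFaults;
    }

    /** A successful connection resets the counter for that endpoint. */
    public void onSuccess(String endpoint) {
        errorCounts.remove(endpoint);
    }

    public static void main(String[] args) {
        ConnectionErrorCounter counter = new ConnectionErrorCounter(5);
        String endpoint = "127.0.0.1:5701";
        for (int i = 1; i <= 4; i++) {
            System.out.println(counter.onError(endpoint)); // false for the first 4 errors
        }
        System.out.println(counter.onError(endpoint)); // true at the 5th error
    }
}
```

The open question in this issue is why such a counter keeps being exercised at all after a *graceful* shutdown, rather than the endpoint being dropped from the retry set.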

Expected behavior
I expect that when a node is intentionally shut down, the other nodes do not attempt to reconnect to it.

How to reproduce
I created a test that reproduces the behavior.

package com.nm.test.hazelcast.shutdown;

import static org.junit.jupiter.api.Assertions.assertEquals;
import static org.junit.jupiter.api.Assertions.assertTrue;

import com.hazelcast.config.Config;
import com.hazelcast.config.TcpIpConfig;
import com.hazelcast.core.Hazelcast;
import com.hazelcast.core.HazelcastInstance;
import com.hazelcast.internal.cluster.impl.MembershipManager;
import com.hazelcast.spi.properties.ClusterProperty;
import com.nm.test.hazelcast.utils.StoreLoggedEventsAppender;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.TimeUnit;
import org.apache.log4j.Logger;
import org.junit.jupiter.api.AfterEach;
import org.junit.jupiter.api.BeforeEach;
import org.junit.jupiter.api.Disabled;
import org.junit.jupiter.api.Test;

// Test handling intentional node shutdown.
public class TestShutDown6 {

	private List<HazelcastInstance> instances;

	private final String[] targetLoggers = { "com.hazelcast.internal.server.tcp.TcpServerConnectionErrorHandler", MembershipManager.class.getName() };

	private StoreLoggedEventsAppender tcpServerConnectionErrorHandlerAppender;
	private StoreLoggedEventsAppender membershipManagerAppender;

	private Logger tcpServerConnectionErrorHandlerLogger;
	private Logger membershipManagerLogger;

	@BeforeEach
	public void setUp() throws Exception {

		// create individual appenders for target loggers
		tcpServerConnectionErrorHandlerAppender = new StoreLoggedEventsAppender();
		membershipManagerAppender = new StoreLoggedEventsAppender();

		// add appenders to the respective loggers
		tcpServerConnectionErrorHandlerLogger = Logger.getLogger(targetLoggers[0]);
		tcpServerConnectionErrorHandlerLogger.setAdditivity(true);
		tcpServerConnectionErrorHandlerLogger.addAppender(tcpServerConnectionErrorHandlerAppender);

		membershipManagerLogger = Logger.getLogger(targetLoggers[1]);
		membershipManagerLogger.setAdditivity(true);
		membershipManagerLogger.addAppender(membershipManagerAppender);

		instances = new ArrayList<>();
	}

	@AfterEach
	public void tearDown() {

		// remove the appenders
		tcpServerConnectionErrorHandlerLogger.removeAppender(tcpServerConnectionErrorHandlerAppender);
		membershipManagerLogger.removeAppender(membershipManagerAppender);

		// shutdown all Hazelcast instances
		for (HazelcastInstance instance : instances) {
			instance.getLifecycleService().terminate();
		}
	}

	@Test
	public void testNoReconnectAfterNode1Shutdown() throws InterruptedException {

		// create config and start a 2 node cluster
		configAndStart2NodeCluster();

		HazelcastInstance hcInstance1 = instances.get(0);
		HazelcastInstance hcInstance2 = instances.get(1);

		// shut down hcInstance1 intentionally
		hcInstance1.getLifecycleService().shutdown();

		// wait for some time to ensure any reconnection attempts would have happened
		TimeUnit.SECONDS.sleep(10);

		// ensure only one member remains in the cluster after shutting down hcInstance1
		assertEquals(1, hcInstance2.getCluster().getMembers().size());

		// ensure no reconnection attempts were made by hcInstance2:
		// assert that there were no WARN messages from the target loggers
		assertTrue(tcpServerConnectionErrorHandlerAppender.getWarnLogs().isEmpty());
		assertTrue(membershipManagerAppender.getWarnLogs().isEmpty());
	}

	private void configAndStart2NodeCluster() {

		// create config
		Config config = new Config();

		// route Hazelcast logging through Log4j 1.x so the test's appenders can capture it
		config.setProperty(ClusterProperty.LOGGING_TYPE.getName(), "log4j");

		// enable TCP-IP config
		TcpIpConfig tcpIpConfig = config.getNetworkConfig().getJoin().getTcpIpConfig();
		tcpIpConfig.setEnabled(true);
		tcpIpConfig.setMembers(List.of("127.0.0.1"));

		HazelcastInstance hcInstance1 = Hazelcast.newHazelcastInstance(config);
		HazelcastInstance hcInstance2 = Hazelcast.newHazelcastInstance(config);

		instances.add(hcInstance1);
		instances.add(hcInstance2);
	}
}
The StoreLoggedEventsAppender utility used by the test:

package com.nm.test.hazelcast.utils;

import java.util.ArrayList;
import java.util.List;
import org.apache.log4j.AppenderSkeleton;
import org.apache.log4j.Level;
import org.apache.log4j.spi.LoggingEvent;

public class StoreLoggedEventsAppender extends AppenderSkeleton {

	private List<String> debugLogs = new ArrayList<>();

	private List<String> infoLogs = new ArrayList<>();

	private List<String> warnLogs = new ArrayList<>();

	private List<String> errorLogs = new ArrayList<>();

	@Override
	protected void append(LoggingEvent loggingEvent) {

		if (Level.DEBUG.equals(loggingEvent.getLevel())) {
			debugLogs.add(loggingEvent.getRenderedMessage());
		} else if (Level.INFO.equals(loggingEvent.getLevel())) {
			infoLogs.add(loggingEvent.getRenderedMessage());
		} else if (Level.WARN.equals(loggingEvent.getLevel())) {
			warnLogs.add(loggingEvent.getRenderedMessage());
		} else if (Level.ERROR.equals(loggingEvent.getLevel())) {
			errorLogs.add(loggingEvent.getRenderedMessage());
		}
	}

	@Override
	public void close() {
	}

	@Override
	public boolean requiresLayout() {
		return false;
	}

	public List<String> getDebugLogs() {
		return debugLogs;
	}

	public List<String> getInfoLogs() {
		return infoLogs;
	}

	public List<String> getWarnLogs() {
		return warnLogs;
	}

	public List<String> getErrorLogs() {
		return errorLogs;
	}
}
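If the goal is only to reduce the retry noise rather than eliminate it, Hazelcast's connection-monitor system properties can be tuned. A hedged sketch only: the property names come from Hazelcast's documented system properties, but whether they shorten the post-shutdown reconnect window in this scenario is an assumption to verify against your version.

```java
Config config = new Config();
// Assumption: a lower fault threshold and shorter monitor interval may cause a
// dead endpoint's connection to be closed sooner; verify against your version.
config.setProperty("hazelcast.connection.monitor.interval", "100"); // ms between checks
config.setProperty("hazelcast.connection.monitor.max.faults", "3"); // faults before the connection is closed
```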