[FORWARD-PORT 4.0.z] Introduce NODE_AWARE partitioning group type #17913

Merged
PartitionGroupConfig.java
@@ -77,20 +77,37 @@
* <p>
* You can define as many <code>member-group</code>s as you want. Hazelcast will always store backups in a different
* member-group to the primary partition.
*
* <h1>Zone Aware Partition Groups</h1>
* In this scheme, groups are allocated according to the metadata provided by the Discovery SPI.
* This metadata includes the availability zone, rack and host. The backups of the partitions are not
* placed in the same group, so this is very useful for ensuring partitions are placed in
* different availability zones without providing the IP addresses in the config ahead of time.
* <code>
* <pre>
* &lt;partition-group enabled="true" group-type="ZONE_AWARE"/&gt;
* </pre>
* </code>
*
* <h1>Node Aware Partition Groups</h1>
* In this scheme, groups are allocated according to the node name metadata provided by the Discovery SPI.
* For container orchestration tools like Kubernetes and Docker Swarm, node is the term used to refer to the
* machine that containers/pods run on. A node may be a virtual or physical machine.
* The backups of the partitions are not placed in the same group, so this is very useful for ensuring partitions
* are placed on different nodes without providing the IP addresses in the config ahead of time.
*
* <code>
* <pre>
* &lt;partition-group enabled="true" group-type="SPI"/&gt;
* &lt;partition-group enabled="true" group-type="NODE_AWARE"/&gt;
* </pre>
* </code>
*
* <h1>SPI Aware Partition Groups</h1>
* In this scheme, groups are allocated according to the implementation provided by Discovery SPI.
* <code>
* <pre>
* &lt;partition-group enabled="true" group-type="SPI"/&gt;
* </pre>
* </code>
*
* <h2>Overlapping Groups</h2>
* Care should be taken when selecting overlapping groups, e.g.
@@ -139,6 +156,11 @@ public enum MemberGroupType {
* If only one zone is available, backups will be created in the same zone.
*/
ZONE_AWARE,
/**
* Node Aware. Backups will be created on other nodes.
* If only one node is available, backups will be created on the same node.
*/
NODE_AWARE,
/**
* MemberGroup implementation will be provided by the user via Discovery SPI.
*/
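For reference, a minimal sketch of the programmatic equivalent of the <partition-group/> snippets above (the example class and instance setup are illustrative, not part of this change; the XML and YAML forms appear in the tests further down):

import com.hazelcast.config.Config;
import com.hazelcast.config.PartitionGroupConfig.MemberGroupType;
import com.hazelcast.core.Hazelcast;
import com.hazelcast.core.HazelcastInstance;

public class NodeAwarePartitionGroupExample {
    public static void main(String[] args) {
        Config config = new Config();
        // Equivalent of <partition-group enabled="true" group-type="NODE_AWARE"/>
        config.getPartitionGroupConfig()
                .setEnabled(true)
                .setGroupType(MemberGroupType.NODE_AWARE);
        // Members must expose node name metadata via the Discovery SPI,
        // otherwise member group creation fails (see NodeAwareMemberGroupFactory below).
        HazelcastInstance hz = Hazelcast.newHazelcastInstance(config);
    }
}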
@@ -43,6 +43,8 @@ public static MemberGroupFactory newMemberGroupFactory(PartitionGroupConfig part
return new SingleMemberGroupFactory();
case ZONE_AWARE:
return new ZoneAwareMemberGroupFactory();
case NODE_AWARE:
return new NodeAwareMemberGroupFactory();
case SPI:
return new SPIAwareMemberGroupFactory(discoveryService);
default:
NodeAwareMemberGroupFactory.java (new file)
@@ -0,0 +1,59 @@
/*
* Copyright (c) 2008-2020, Hazelcast, Inc. All Rights Reserved.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/

package com.hazelcast.internal.partition.membergroup;

import com.hazelcast.cluster.Member;
import com.hazelcast.spi.discovery.DiscoveryStrategy;
import com.hazelcast.spi.partitiongroup.MemberGroup;
import com.hazelcast.spi.partitiongroup.PartitionGroupMetaData;

import java.util.Collection;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

import static com.hazelcast.internal.util.MapUtil.createHashMap;

/**
* NodeAwareMemberGroupFactory is responsible for MemberGroup
* creation according to the node name metadata. For container orchestration
* tools like Kubernetes and Docker Swarm, node is the term used to refer to the
* machine that containers/pods run on. A node may be a virtual or physical machine.
* The node name metadata is provided by
* {@link DiscoveryStrategy#discoverLocalMetadata()}.
*/
public class NodeAwareMemberGroupFactory extends BackupSafeMemberGroupFactory implements MemberGroupFactory {

@Override
protected Set<MemberGroup> createInternalMemberGroups(Collection<? extends Member> allMembers) {
Map<String, MemberGroup> groups = createHashMap(allMembers.size());
for (Member member : allMembers) {
final String nodeInfo = member.getAttribute(PartitionGroupMetaData.PARTITION_GROUP_NODE);
if (nodeInfo == null) {
throw new IllegalArgumentException("Not enough metadata information is provided. "
+ "Node name information must be provided with NODE_AWARE partition group.");
}
MemberGroup group = groups.get(nodeInfo);
if (group == null) {
group = new DefaultMemberGroup();
groups.put(nodeInfo, group);
}
group.addMember(member);
}
return new HashSet<>(groups.values());
}
}
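For context, the node name this factory consumes is expected to come from a discovery strategy. Below is a sketch of a strategy publishing the required key; the class name and the MY_NODE_NAME environment variable are assumptions (e.g. a value injected via the Kubernetes downward API), not part of this PR — cloud discovery plugins provide this metadata themselves:

import com.hazelcast.logging.ILogger;
import com.hazelcast.spi.discovery.AbstractDiscoveryStrategy;
import com.hazelcast.spi.discovery.DiscoveryNode;
import com.hazelcast.spi.partitiongroup.PartitionGroupMetaData;

import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical strategy shown for illustration only.
public class NodeNamePublishingDiscoveryStrategy extends AbstractDiscoveryStrategy {

    public NodeNamePublishingDiscoveryStrategy(ILogger logger, Map<String, Comparable> properties) {
        super(logger, properties);
    }

    @Override
    public Iterable<DiscoveryNode> discoverNodes() {
        // Member discovery itself is out of scope for this sketch.
        return Collections.emptyList();
    }

    @Override
    public Map<String, String> discoverLocalMetadata() {
        Map<String, String> metadata = new HashMap<>();
        // Orchestrator-provided node name; if this resolves to null,
        // NodeAwareMemberGroupFactory will reject the member (see above).
        metadata.put(PartitionGroupMetaData.PARTITION_GROUP_NODE, System.getenv("MY_NODE_NAME"));
        return metadata;
    }
}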
ZoneAwareMemberGroupFactory.java
@@ -22,11 +22,12 @@
import com.hazelcast.spi.partitiongroup.PartitionGroupMetaData;

import java.util.Collection;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

import static com.hazelcast.internal.util.MapUtil.createHashMap;

/**
* ZoneAwareMemberGroupFactory is responsible for MemberGroup
* creation according to the host metadata provided by
@@ -37,7 +38,7 @@ public class ZoneAwareMemberGroupFactory extends BackupSafeMemberGroupFactory im

@Override
protected Set<MemberGroup> createInternalMemberGroups(Collection<? extends Member> allMembers) {
Map<String, MemberGroup> groups = createHashMap(allMembers.size());
for (Member member : allMembers) {

final String zoneInfo = member.getAttribute(PartitionGroupMetaData.PARTITION_GROUP_ZONE);
@@ -77,6 +78,6 @@ protected Set<MemberGroup> createInternalMemberGroups(Collection<? extends Membe
}
}
}
return new HashSet<>(groups.values());
}
}
DiscoveryStrategy.java
@@ -78,8 +78,8 @@ public interface DiscoveryStrategy {

/**
* Returns a map with discovered metadata provided by the runtime environment. This information
* may include, but is not limited to, location information like datacenter, rack, host,
* node name or additional tags to be used for custom purposes.
* <p>
* Information discovered from this method is shaded into the {@link Member}s
* attributes. Existing attributes will not be overridden; that way, local attribute configuration
PartitionGroupMetaData.java
@@ -18,13 +18,19 @@

/**
* This class contains the definition of known Discovery SPI metadata to support automatic
* generation of zone aware and node aware backup strategies.
*
* Zone aware backup strategies are based on information provided by cloud or service discovery.
* This information is split into three different levels of granularity:
* <ul>
* <li><b>Zone:</b> A low-latency link between (virtual) data centers in the same area</li>
* <li><b>Rack:</b> A low-latency link inside the same data center but for different racks</li>
* <li><b>Host:</b> A low-latency link on a shared physical node, in case of virtualization being used</li>
* </ul>
*
* The node aware backup strategy is based on the name of the node, which is provided by a container
* orchestration tool like Kubernetes, Docker Swarm or ECS. A node is the term used to refer to the
* machine that containers/pods run on. A node may be a virtual or physical machine.
*/
public enum PartitionGroupMetaData {
;
@@ -43,4 +49,10 @@ public enum PartitionGroupMetaData {
* Metadata key definition for a low-latency link on a shared physical node, in case of virtualization being used
*/
public static final String PARTITION_GROUP_HOST = "hazelcast.partition.group.host";

/**
* Metadata key definition for the node that containers/pods run on,
* in case of container orchestration tools being used.
*/
public static final String PARTITION_GROUP_NODE = "hazelcast.partition.group.node";
}
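Like the zone/rack/host keys, the discovered value surfaces as a member attribute under this key. A quick illustrative check (assuming a running member whose discovery strategy published the key; the class is a sketch, not part of this PR):

import com.hazelcast.core.Hazelcast;
import com.hazelcast.core.HazelcastInstance;
import com.hazelcast.spi.partitiongroup.PartitionGroupMetaData;

public class NodeMetadataCheck {
    public static void main(String[] args) {
        HazelcastInstance hz = Hazelcast.newHazelcastInstance();
        // Null unless the configured discovery strategy published the key.
        String nodeName = hz.getCluster().getLocalMember()
                .getAttribute(PartitionGroupMetaData.PARTITION_GROUP_NODE);
        System.out.println("partition group node: " + nodeName);
    }
}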
PartitionGroupStrategy.java
@@ -21,16 +21,16 @@
/**
* <p>A <code>PartitionGroupStrategy</code> implementation defines a strategy
* for how backup groups are designed. Backup groups are units containing
* one or more Hazelcast nodes that share the same physical host/node, rack or
* zone, and backups are stored on nodes being part of a different
* backup group. This behavior builds an additional layer of data
* reliability by making sure that, in case of two zones, if zone A
* fails, zone B will still have all the backups and is guaranteed
* to still provide all data. The same is true for nodes or physical hosts.</p>
* <p>Custom implementations of the PartitionGroupStrategy may add specific
* or additional behavior based on the provided environment and can
* be injected into Hazelcast by overriding
* {@link AbstractDiscoveryStrategy#getPartitionGroupStrategy()}.</p>
*/
@FunctionalInterface
public interface PartitionGroupStrategy {
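Since the interface is functional, a custom strategy can be expressed as a lambda (assuming its single abstract method is Iterable<MemberGroup> getMemberGroups()). A minimal sketch, reusing DefaultMemberGroup from the internal package shown earlier; the grouping logic is elided and the names are illustrative:

import com.hazelcast.internal.partition.membergroup.DefaultMemberGroup;
import com.hazelcast.spi.partitiongroup.MemberGroup;
import com.hazelcast.spi.partitiongroup.PartitionGroupStrategy;

import java.util.Collections;

public class CustomStrategyExample {
    public static void main(String[] args) {
        // A strategy returning a single group; a real implementation would
        // build groups from environment-specific knowledge and be returned
        // from AbstractDiscoveryStrategy#getPartitionGroupStrategy().
        PartitionGroupStrategy strategy = () -> {
            MemberGroup group = new DefaultMemberGroup();
            return Collections.singletonList(group);
        };
        System.out.println(strategy.getMemberGroups());
    }
}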
1 change: 1 addition & 0 deletions hazelcast/src/main/resources/hazelcast-config-4.0.xsd
@@ -2549,6 +2549,7 @@
<xs:enumeration value="CUSTOM"/>
<xs:enumeration value="PER_MEMBER"/>
<xs:enumeration value="ZONE_AWARE"/>
<xs:enumeration value="NODE_AWARE"/>
<xs:enumeration value="SPI"/>
</xs:restriction>
</xs:simpleType>
AbstractConfigBuilderTest.java
@@ -184,6 +184,9 @@ public abstract class AbstractConfigBuilderTest extends HazelcastTestSupport {
@Test
public abstract void testPartitionGroupZoneAware();

@Test
public abstract void testPartitionGroupNodeAware();

@Test
public abstract void testPartitionGroupSPI();

@@ -1114,6 +1114,19 @@ public void testPartitionGroupZoneAware() {
assertEquals(PartitionGroupConfig.MemberGroupType.ZONE_AWARE, partitionGroupConfig.getGroupType());
}

@Override
@Test
public void testPartitionGroupNodeAware() {
String xml = HAZELCAST_START_TAG
+ "<partition-group enabled=\"true\" group-type=\"NODE_AWARE\" />"
+ HAZELCAST_END_TAG;

Config config = buildConfig(xml);
PartitionGroupConfig partitionGroupConfig = config.getPartitionGroupConfig();
assertTrue(partitionGroupConfig.isEnabled());
assertEquals(PartitionGroupConfig.MemberGroupType.NODE_AWARE, partitionGroupConfig.getGroupType());
}

@Override
@Test
public void testPartitionGroupSPI() {
@@ -1122,6 +1122,21 @@ public void testPartitionGroupZoneAware() {
assertEquals(PartitionGroupConfig.MemberGroupType.ZONE_AWARE, partitionGroupConfig.getGroupType());
}

@Override
@Test
public void testPartitionGroupNodeAware() {
String yaml = ""
+ "hazelcast:\n"
+ " partition-group:\n"
+ " enabled: true\n"
+ " group-type: NODE_AWARE\n";

Config config = buildConfig(yaml);
PartitionGroupConfig partitionGroupConfig = config.getPartitionGroupConfig();
assertTrue(partitionGroupConfig.isEnabled());
assertEquals(PartitionGroupConfig.MemberGroupType.NODE_AWARE, partitionGroupConfig.getGroupType());
}

@Override
@Test
public void testPartitionGroupSPI() {
@@ -169,6 +169,47 @@ private Collection<Member> createMembersWithHostAwareMetadata() {
return members;
}

@Test
public void testNodeMetadataAwareMemberGroupFactoryCreateMemberGroups() {
MemberGroupFactory groupFactory = new NodeAwareMemberGroupFactory();
Collection<Member> members = createMembersWithNodeAwareMetadata();
Collection<MemberGroup> memberGroups = groupFactory.createMemberGroups(members);

assertEquals("Member Groups: " + String.valueOf(memberGroups), 3, memberGroups.size());
for (MemberGroup memberGroup : memberGroups) {
assertEquals("Member Group: " + String.valueOf(memberGroup), 1, memberGroup.size());
}
}

private Collection<Member> createMembersWithNodeAwareMetadata() {
Collection<Member> members = new HashSet<Member>();
MemberImpl member1 = new MemberImpl(new Address("192.192.0.1", fakeAddress, 5701), VERSION, true);
member1.setAttribute(PartitionGroupMetaData.PARTITION_GROUP_NODE, "kubernetes-node-f0bbd602-f7cw");

MemberImpl member2 = new MemberImpl(new Address("192.192.0.2", fakeAddress, 5701), VERSION, true);
member2.setAttribute(PartitionGroupMetaData.PARTITION_GROUP_NODE, "kubernetes-node-f0bbd602-hgdl");

MemberImpl member3 = new MemberImpl(new Address("192.192.0.3", fakeAddress, 5701), VERSION, true);
member3.setAttribute(PartitionGroupMetaData.PARTITION_GROUP_NODE, "kubernetes-node-f0bbd602-0zjs");

members.add(member1);
members.add(member2);
members.add(member3);
return members;
}

@Test(expected = IllegalArgumentException.class)
public void testNodeAwareMemberGroupFactoryThrowsIllegalArgumentExceptionWhenNoMetadataIsProvided() {
MemberGroupFactory groupFactory = new NodeAwareMemberGroupFactory();
Collection<Member> members = createMembersWithNoMetadata();
// Expected to throw here, since no PARTITION_GROUP_NODE attribute is set;
// the assertions below are never reached.
Collection<MemberGroup> memberGroups = groupFactory.createMemberGroups(members);

assertEquals("Member Groups: " + String.valueOf(memberGroups), 3, memberGroups.size());
for (MemberGroup memberGroup : memberGroups) {
assertEquals("Member Group: " + String.valueOf(memberGroup), 1, memberGroup.size());
}
}

/**
* When there is a matching {@link MemberGroupConfig} for a {@link Member}, it will be assigned to a {@link MemberGroup}.
* <p>