
Introduce NODE_AWARE partitioning group type #17889

Merged

Conversation

@hasancelik (Contributor) commented Nov 24, 2020

For Kubernetes-based environments, users might want to store their backups on another Kubernetes node. They can easily decide which pod will run on which node by defining affinity or a node selector. To satisfy these kinds of specific requirements, the NODE_AWARE partition group type is introduced with this PR. The newly created NodeAwareMemberGroupFactory class is essentially a simplified version of ZoneAwareMemberGroupFactory. Here is the related PRD.

These changes will be forward-ported to the 4.0.z, 4.1.z, and master branches.
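For illustration, enabling the new group type programmatically might look like the sketch below. This is a minimal example of mine against the public Hazelcast Config API, not code taken from this PR:

import com.hazelcast.config.Config;
import com.hazelcast.config.PartitionGroupConfig;

// Minimal sketch: enable NODE_AWARE grouping so that backups are placed on
// members running on a different node than the primary partition.
Config config = new Config();
config.getPartitionGroupConfig()
        .setEnabled(true)
        .setGroupType(PartitionGroupConfig.MemberGroupType.NODE_AWARE);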

@leszko left a comment

Looks good. Added some (minor) comments, mainly about limiting this feature to Kubernetes. Other than that it looks good, but please also add someone from the Core team to the reviewers.

@@ -83,24 +83,34 @@
* <p>
* You can define as many <code>member-group</code>s as you want. Hazelcast will always store backups in a different
* member-group to the primary partition.
*
* <h1>Zone Aware Partition Groups</h1>
* In this scheme, groups are allocated according to the metadata provided by Discovery SPI Partitions are not

Regarding "groups are allocated according to the metadata provided by Discovery SPI Partitions are not written to the same group.": could you rewrite this sentence? It looks like some words are missing.

* <h1>Zone Aware Partition Groups</h1>
* In this scheme, groups are allocated according to the metadata provided by Discovery SPI Partitions are not
* written to the same group. This is very useful for ensuring partitions are written to availability
* zones or different racks without providing the IP addresses to the config ahead.

racks?

@hasancelik (Contributor, Author):

Actually, I did not write this part, but as far as I understand, rack is a jclouds-specific term; the rack metadata is provided by jclouds and can be passed like this. Giving rack as a sample usage is not reasonable, so I will remove it and use only zone.

* In this scheme, groups are allocated according to the metadata provided by Discovery SPI Partitions are not
* written to the same group. This is very useful for ensuring partitions are written to availability
* zones or different racks without providing the IP addresses to the config ahead.
* written to the same group. This is very useful for ensuring partitions are written to different Kubernetes nodes

I don't think we should mention Kubernetes here? The mechanism is generic and it may be applied to some other orchestration solutions, like Docker Swarm.

@hasancelik (Contributor, Author):

Actually, I thought so at first, but I couldn't quite decide whether we should talk about other tools. On second thought, you are right; we should mention other tools like Docker Swarm and ECS 👍

@@ -151,6 +161,11 @@
* If only one zone is available, backups will be created in the same zone.
*/
ZONE_AWARE,
/**
* Node Aware. Backups will be created in other Kubernetes nodes/physical machines.

The same, I don't think this feature is related only to Kubernetes.

* NodeAwareMemberGroupFactory is responsible for MemberGroups
* creation according to the Kubernetes Node metadata provided by
* {@link DiscoveryStrategy#discoverLocalMetadata()}
* @since 3.7

I think we can just remove this @since. If you want to keep it, please add the correct version.

@hasancelik (Contributor, Author):

copy-paste error :) fixed 👍

final String nodeInfo = member.getStringAttribute(PartitionGroupMetaData.PARTITION_GROUP_NODE);
if (nodeInfo == null) {
throw new IllegalArgumentException("Not enough metadata information is provided. "
+ "Kubernetes node name information must be provided with NODE_AWARE partition group.");

The same, we'd better not limit it to Kubernetes

}
group.addMember(member);
}
return new HashSet<MemberGroup>(groups.values());

Suggested change:
- return new HashSet<MemberGroup>(groups.values());
+ return new HashSet<>(groups.values());

@hasancelik (Contributor, Author):

I converted this line to the diamond operator, but the PR builder fails with the error below, so I reverted it:

error: diamond operator is not supported in -source 1.6


Ahh, right, I forgot that this is a PR against 3.12.x and we need to support Java 1.6. Then it's fine; leave it as it is.
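For context, the full grouping logic in NodeAwareMemberGroupFactory presumably looks roughly like the sketch below, reconstructed from the fragments quoted in this review and the existing ZoneAwareMemberGroupFactory (with the exception message de-Kubernetes'd per the feedback above); it is not a verbatim copy of the PR:

protected Set<MemberGroup> createInternalMemberGroups(Collection<? extends Member> allMembers) {
    // Java 1.6-compatible: no diamond operator, as discussed above.
    Map<String, MemberGroup> groups = new HashMap<String, MemberGroup>();
    for (Member member : allMembers) {
        // Node metadata is supplied by the discovery strategy via discoverLocalMetadata().
        final String nodeInfo = member.getStringAttribute(PartitionGroupMetaData.PARTITION_GROUP_NODE);
        if (nodeInfo == null) {
            throw new IllegalArgumentException("Not enough metadata information is provided. "
                    + "Node name information must be provided with NODE_AWARE partition group.");
        }
        // Members reporting the same node name share one member group, so
        // backups are forced onto a different node than the primary.
        MemberGroup group = groups.get(nodeInfo);
        if (group == null) {
            group = new DefaultMemberGroup();
            groups.put(nodeInfo, group);
        }
        group.addMember(member);
    }
    return new HashSet<MemberGroup>(groups.values());
}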

* may include, but are not limited, to location information like datacenter, rack, host or additional
* tags to be used for custom purpose.
* may include, but are not limited, to location information like datacenter, rack, host,
* Kubernetes node name or additional tags to be used for custom purpose.

The same, I'd avoid limiting this feature to Kubernetes.

* may include, but are not limited, to location information like datacenter, rack, host or additional
* tags to be used for custom purpose.
* may include, but are not limited, to location information like datacenter, rack, host,
* Kubernetes node name or additional tags to be used for custom purpose.

The same, I'd avoid limiting this feature to Kubernetes

@@ -18,13 +18,17 @@

/**
* This class contains the definition of known Discovery SPI metadata to support automatic
* generation of zone aware backup strategies based on cloud or service discovery provided
* information. These information are split into three different levels of granularity:
* generation of zone aware and Kubernetes node aware backup strategies.

the same

@blazember self-requested a review November 26, 2020 11:02
@@ -18,13 +18,19 @@

/**
* This class contains the definition of known Discovery SPI metadata to support automatic
* generation of zone aware backup strategies based on cloud or service discovery provided
* information. These information are split into three different levels of granularity:
* generation of zone aware and Kubernetes node aware backup strategies.

Suggested change:
- * generation of zone aware and Kubernetes node aware backup strategies.
+ * generation of zone aware and node aware backup strategies.

@@ -22,7 +22,7 @@
/**
* <p>A <tt>PartitionGroupStrategy</tt> implementation defines a strategy
* how backup groups are designed. Backup groups are units containing
* one or more Hazelcast nodes to share the same physical host, rack or
* one or more Hazelcast nodes to share the same physical host/Kubernetes node, rack or

Suggested change:
- * one or more Hazelcast nodes to share the same physical host/Kubernetes node, rack or
+ * one or more Hazelcast nodes to share the same physical host/node, rack or

@leszko left a comment

Added a few super minor comments. Other than that, LGTM 👍

@blazember (Contributor) left a comment

LGTM, with a few minors 👍

*
* <h1>Zone Aware Partition Groups</h1>
* In this scheme, groups are allocated according to the metadata provided by Discovery SPI
* These metadata are availability zone, rack and host. Partitions are not written to
@blazember (Contributor):

Maybe something like "The backups of the partitions are not placed on..."?

* In this scheme, groups are allocated according to node name metadata provided by Discovery SPI.
* For container orchestration tools like Kubernetes and Docker Swarm, node is the term used to refer
* machine that containers/pods run on. A node may be a virtual or physical machine.
* Partitions are not written to the same group so this is very useful for ensuring partitions
@blazember (Contributor):

Same.

groups.put(nodeInfo, group);
}
group.addMember(member);
}
@blazember (Contributor):

I wonder if logging the group:member mappings at debug level would be useful.

@hasancelik (Contributor, Author):

For zone and node aware, I can say that the user (should) already know which member will be running on which node or zone via the external deployment config. On the other hand, I agree with you that it would be beneficial for debugging purposes. I have added it to my todo list and will prepare another PR for all group factories.
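As a hypothetical illustration of the logging suggested here (assuming the factory holds an ILogger field named logger; this is not part of the PR):

// Hypothetical debug output of the computed node -> member-group mappings.
if (logger.isFineEnabled()) {
    for (Map.Entry<String, MemberGroup> entry : groups.entrySet()) {
        logger.fine("Partition group for node '" + entry.getKey() + "': " + entry.getValue());
    }
}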

@blazember (Contributor):

Don't forget about the documentation update (6.10.1) if there is no such PR in progress.
