Hazelcast Cluster Manager (Vertx side) does not update nodeId of member when Hazelcast resets it #164

Closed
javastack59 opened this issue Dec 18, 2022 · 2 comments · Fixed by #170

@javastack59

Version

4.3.2

Context

We frequently encounter an event bus reply timeout exception that looks suspicious: the UUID mentioned in the exception stack trace is not even part of the cluster member list in the Hazelcast logs.

While debugging, I found that when a member is too busy for a brief period, Hazelcast resets the member's UUID.
Reference here:
https://github.com/hazelcast/hazelcast/blob/master/hazelcast/src/main/java/com/hazelcast/internal/cluster/impl/ClusterServiceImpl.java#L342

Currently, the cluster manager does not react to this event, which leaves the EventBus subscribers list with dangling references to UUIDs that no longer exist. The logs below show the reset, followed by a republish that still uses the old UUID:

jvm 1 | Members {size:1, ver:1} [
jvm 1 | Member [10.244.237.60]:5701 - 202d7784-d695-4d46-83ba-24541b6a2943 this
jvm 1 | ]
jvm 1 |
jvm 1 | 2022-12-16 12:12:18.265+0000 [] [vert.x-worker-thread-0] INFO com.hazelcast.core.LifecycleService - [10.244.237.60]:5701 [nimbus-v3] [4.2.4] [10.244.237.60]:5701 is STARTED
jvm 1 | 2022-12-16 12:12:18.265+0000 [] [vert.x-worker-thread-0] INFO i.v.s.c.h.HazelcastClusterManager - Local Member : 10.244.237.60
jvm 1 | 2022-12-16 12:12:18.265+0000 [] [vert.x-worker-thread-0] INFO i.v.s.c.h.HazelcastClusterManager - Local Member Node Id: 202d7784-d695-4d46-83ba-24541b6a2943
jvm 1 | 2022-12-16 12:12:18.295+0000 [] [vert.x-worker-thread-0] INFO i.v.s.c.h.HazelcastClusterManager - Join complete

jvm 1 | 2022-12-16 12:12:18.356+0000 [] [vert.x-eventloop-thread-0] INFO i.v.c.e.i.c.ClusteredEventBus - Starting EventBus with Cluster Address : 10.244.237.60 and nodeId :202d7784-d695-4d46-83ba-24541b6a2943

jvm 1 | 2022-12-16 12:17:13.071+0000 [] [hz.happy_borg.cached.thread-8] WARN c.h.internal.cluster.ClusterService - [10.244.237.60]:5701 [nimbus-v3] [4.2.4] Resetting local member UUID. Previous: 202d7784-d695-4d46-83ba-24541b6a2943, new: e0a23b31-e32c-4ad1-9acc-d43d76c0dd01

jvm 1 | 2022-12-16 12:17:20.227+0000 [] [vert.x-worker-thread-2] INFO i.v.s.c.hazelcast.impl.SubsMapHelper - republishOwnSubs::
jvm 1 | 2022-12-16 12:17:20.227+0000 [] [vert.x-worker-thread-2] INFO i.v.s.c.hazelcast.impl.SubsMapHelper - republish for adress: 202d7784-d695-4d46-83ba-24541b6a2943
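
To make the failure mode in these logs concrete, here is a minimal, self-contained sketch. It is only a model: `SubsRegistry`, `Member`, `allSubscribersReachable` and the `orders` address are hypothetical names invented for illustration, not Vert.x or Hazelcast APIs. It assumes the nodeId is captured once at join time and reused on republish, which is what the "Local Member Node Id" and "republish for adress" log lines suggest.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.UUID;

// Hypothetical model of the failure mode; none of these types exist in Vert.x or Hazelcast.
class DanglingNodeIdDemo {

  // Stand-in for the clustered subscription map: event bus address -> subscriber nodeIds.
  static final class SubsRegistry {
    private final Map<String, Set<String>> subs = new HashMap<>();

    void register(String address, String nodeId) {
      subs.computeIfAbsent(address, k -> new HashSet<>()).add(nodeId);
    }

    Set<String> subscribers(String address) {
      return subs.getOrDefault(address, Collections.emptySet());
    }
  }

  // Stand-in for a cluster member whose UUID may be reset by the grid.
  static final class Member {
    private String uuid = UUID.randomUUID().toString();

    String uuid() {
      return uuid;
    }

    void resetUuid() {
      uuid = UUID.randomUUID().toString();
    }
  }

  // True if every subscriber registered for the address is a currently known member UUID.
  static boolean allSubscribersReachable(SubsRegistry registry, String address, Set<String> liveUuids) {
    return liveUuids.containsAll(registry.subscribers(address));
  }

  public static void main(String[] args) {
    Member member = new Member();
    SubsRegistry registry = new SubsRegistry();

    // The nodeId is captured once, at join time.
    String nodeId = member.uuid();
    registry.register("orders", nodeId);
    System.out.println("reachable before reset: "
      + allSubscribersReachable(registry, "orders", Collections.singleton(member.uuid())));

    // The grid resets the member UUID; the captured nodeId is now stale.
    member.resetUuid();
    registry.register("orders", nodeId); // republish still uses the old id
    System.out.println("reachable after reset: "
      + allSubscribersReachable(registry, "orders", Collections.singleton(member.uuid())));
  }
}
```

In this model, the second check fails because the registry still points at an identifier that is no longer in the member list, which matches the symptom of a reply timeout naming an unknown UUID.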

@javastack59
Author

I have submitted a pull request that is a possible fix for the above problem:
#163

@tsegismont
Contributor

Thank you, I will review it

@tsegismont tsegismont added this to the 4.4.0 milestone Dec 21, 2022
@tsegismont tsegismont self-assigned this Dec 21, 2022
tsegismont added a commit to tsegismont/vertx-hazelcast that referenced this issue Feb 7, 2023
Fixes vert-x3#164

Hazelcast may change the member uuid if it goes out of the grid for a moment (e.g. if suspected to be unhealthy).
We can't use this identifier as Vert.x nodeId.

Instead, we create one and store it as a member attribute.
This has to happen before starting the member because in recent versions of Hazelcast, member attributes can no longer be updated.

As a consequence, users who rely on an existing Hazelcast instance will have to configure the attribute manually (breaking change, documented).

Signed-off-by: Thomas Segismont <tsegismont@gmail.com>
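
The approach in this commit message can be modelled in plain Java as follows. This is a sketch under stated assumptions, not the code from the commit: the `Member` class, the `startMember` and `nodeIdOf` helpers, and the attribute key `example.nodeId` are all hypothetical (the thread does not state the real attribute name), and failing with an exception when the attribute is missing is a choice made for this sketch.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

// Hypothetical model of "generate a nodeId once and store it as a member attribute".
class StableNodeIdSketch {

  // Hypothetical attribute key; the real key is not given in this thread.
  static final String NODE_ID_ATTRIBUTE = "example.nodeId";

  // Stand-in for a grid member: attributes are frozen at start, the UUID can change later.
  static final class Member {
    private final Map<String, String> attributes;
    private String uuid = UUID.randomUUID().toString();

    Member(Map<String, String> configuredAttributes) {
      // Copy and freeze, mirroring "member attributes can no longer be updated".
      this.attributes = Collections.unmodifiableMap(new HashMap<>(configuredAttributes));
    }

    String uuid() {
      return uuid;
    }

    void resetUuid() {
      uuid = UUID.randomUUID().toString();
    }

    String attribute(String key) {
      return attributes.get(key);
    }
  }

  // Generates the nodeId and sets the attribute before the member is started.
  static Member startMember() {
    Map<String, String> attributes = new HashMap<>();
    attributes.put(NODE_ID_ATTRIBUTE, UUID.randomUUID().toString());
    return new Member(attributes);
  }

  // Reads the nodeId from the attribute instead of the member UUID.
  static String nodeIdOf(Member member) {
    String nodeId = member.attribute(NODE_ID_ATTRIBUTE);
    if (nodeId == null) {
      // Sketch-only behavior: an externally created member must be configured manually.
      throw new IllegalStateException("Member has no " + NODE_ID_ATTRIBUTE + " attribute");
    }
    return nodeId;
  }

  public static void main(String[] args) {
    Member member = startMember();
    String before = nodeIdOf(member);
    member.resetUuid();
    System.out.println("nodeId unchanged after UUID reset: " + before.equals(nodeIdOf(member)));
  }
}
```

The last branch of `nodeIdOf` reflects the breaking change mentioned above: a member created outside the cluster manager carries no generated identifier, so the attribute has to be set in its configuration before it starts.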
tsegismont added a commit that referenced this issue Feb 20, 2023
* Use fixed nodeId from HZ attributes config

Fixes #164

Signed-off-by: Thomas Segismont <tsegismont@gmail.com>

* Filter data-only members out of the nodes list

Signed-off-by: Thomas Segismont <tsegismont@gmail.com>

---------

Signed-off-by: Thomas Segismont <tsegismont@gmail.com>
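
The second bullet of this commit ("Filter data-only members out of the nodes list") can be pictured with the sketch below. It rests on an assumption that is not confirmed in this thread: that a data-only member is one that carries no Vert.x nodeId attribute. The `vertxNodes` helper and the `example.nodeId` key are hypothetical names used only for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.stream.Collectors;

// Hypothetical model: each member is represented only by its attribute map.
class NodeListFilterSketch {

  // Hypothetical attribute key; the real key is not given in this thread.
  static final String NODE_ID_ATTRIBUTE = "example.nodeId";

  // Keeps the nodeId of members that have one; members without the attribute are skipped.
  static List<String> vertxNodes(List<Map<String, String>> memberAttributes) {
    return memberAttributes.stream()
      .map(attributes -> attributes.get(NODE_ID_ATTRIBUTE))
      .filter(Objects::nonNull)
      .collect(Collectors.toList());
  }

  public static void main(String[] args) {
    List<Map<String, String>> members = new ArrayList<>();
    members.add(Collections.singletonMap(NODE_ID_ATTRIBUTE, "node-a"));
    members.add(Collections.<String, String>emptyMap()); // assumed data-only member
    members.add(Collections.singletonMap(NODE_ID_ATTRIBUTE, "node-b"));

    System.out.println(vertxNodes(members));
  }
}
```

Under that assumption, only members that can actually host event bus subscribers end up in the nodes list.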