
ARTEMIS-2716 Implements pluggable Quorum Vote #3555

Closed · wants to merge 7 commits

Conversation

franz1981
Contributor

@franz1981 franz1981 marked this pull request as draft April 26, 2021 08:25
@@ -183,6 +183,73 @@
<artifactId>tomcat-servlet-api</artifactId>
</dependency>

<!-- for artemis-quorum-ri -->


Where are the dependencies for the ZK alternative option?

Contributor Author


Eagle eyes :D Adding them now, together with the features.

Contributor


I still don't spot them. Unresolving. I could be being blind.


@franz1981 franz1981 force-pushed the pluggable_quorum_voting branch 16 times, most recently from e208c31 to b231f78 Compare May 4, 2021 05:06
@@ -106,6 +110,27 @@
<include>com.sun.xml.bind:jaxb-core</include>
<include>jakarta.activation:jakarta.activation-api</include>
<include>jakarta.security.auth.message:jakarta.security.auth.message-api</include>
<!-- quorum -->
Contributor


Where are the ZK deps here too?

Contributor Author



<include>org.apache.curator:curator-recipes</include>
<include>org.apache.curator:curator-client</include>
<include>org.apache.curator:curator-framework</include>
Contributor Author


@michaelandrepearce these are the ZK deps

Contributor Author


Now out of office, eh; I still need to rebase against master, include other deps for logging, and force the 3.6.1 ZK client version.

<artifactId>classgraph</artifactId>
<version>4.2.3</version>
</dependency>
<!-- Curator Zookeeper RI -->
Contributor Author


@michaelandrepearce here! I still lack the logging deps and config.

assert activeMQServer.getNodeManager().getNodeId() != null;
final String liveID = activeMQServer.getNodeManager().getNodeId().toString();
final int voteRetries = policy.getVoteRetries();
final long maxAttempts = voteRetries >= 0 ? (voteRetries + 1) : -1;
Contributor Author

@franz1981 franz1981 May 4, 2021


@michaelandrepearce
@jbertram
It seems that vote-retries can be set to -1 (unbounded), and vote-retries == 0 was possible, but it prevented any vote from happening (!).
I've decided to change the semantics to better match the name: "retries", it seems to me (I'm not a native English speaker, so please correct me if I'm wrong!), are "additional" to the first mandatory "try".


So, in retry logic:

-1 normally means to me "unlimited", e.g. never stop, keep retrying forever.
0 means don't do any retry, e.g. fail/error on the first error, and do not re-attempt.
1 means retry once (and so on, for any positive value).


In regards to replication, I would typically expect:

-1 normally means wait until all members have acked.
0 means do not wait for any replica acks.
1 would wait until 1 member has acked.

Contributor Author

@franz1981 franz1981 May 4, 2021


I see this doc related to vote-retries:

vote-retries
This property controls how many times the backup broker retries the quorum vote in order to receive a majority result that allows the backup broker to start up.

The code in SharedNothingBackupQuorum::isLiveDown:

      final int size = quorumSize == -1 ? quorumManager.getMaxClusterSize() : quorumSize;

      synchronized (voteGuard) {
         for (int voteAttempts = 0; voteAttempts < voteRetries && !stopped; voteAttempts++) {
            if (voteAttempts > 0) {
               try {
                  voteGuard.wait(voteRetryWait);
               } catch (InterruptedException e) {
                  //nothing to do here
               }
            }
            if (!quorumManager.hasLive(targetServerID, size, quorumVoteWait, TimeUnit.SECONDS)) {
               return true;
            }
         }
      }

It doesn't seem correct relative to the retry semantics we've described above, but I would like to understand what it means.
QuorumManager::hasLive:

  • returns true if the majority of nodes decide that a live with NodeID == targetServerID is still around, or if quorumVoteWait expires -> the vote will be re-attempted vote-retries times
  • returns false if the majority of nodes decide that a live with NodeID == targetServerID isn't around

Hence SharedNothingBackupQuorum::isLiveDown is going to retry vote-retries times until it gets QuorumManager::hasLive == false, retrying if there is still a live around OR there's no majority agreement within quorumVoteWait seconds.

The new implementation does the same, but if vote-retries == 0 it still tries a first time before giving up, which seems more correct to me.
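
To make the new attempt counting concrete, here is a minimal, hypothetical sketch; it is not the PR's actual code (the real loop also waits voteRetryWait between attempts and honors a stopped flag), but the maxAttempts expression mirrors the diff quoted earlier in this PR:

   import java.util.function.BooleanSupplier;

   final class VoteRetrySketch {
      // vote-retries counts attempts *in addition to* the first mandatory try;
      // -1 means unbounded, i.e. maxAttempts = voteRetries >= 0 ? (voteRetries + 1) : -1.
      static boolean isLiveDown(int voteRetries, BooleanSupplier hasLive) {
         final long maxAttempts = voteRetries >= 0 ? (voteRetries + 1L) : -1;
         for (long attempt = 0; maxAttempts == -1 || attempt < maxAttempts; attempt++) {
            if (!hasLive.getAsBoolean()) {
               return true;  // the majority agrees the live is gone: safe to fail over
            }
            // a live is still around (or no majority within quorumVoteWait): try again
         }
         return false;       // attempts exhausted without a "live is down" verdict
      }
   }

With vote-retries == 0 this performs exactly one vote before giving up; with 2 it performs up to three; with -1 it never gives up.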

@franz1981
Contributor Author

Sadly, canAcquireLockOnMajorityRestart can still get some failures because of https://issues.apache.org/jira/browse/CURATOR-594; I'm waiting for the PR I've sent, apache/curator#383, to be merged.

@franz1981 franz1981 force-pushed the pluggable_quorum_voting branch 5 times, most recently from 362ef2b to a9b7c4d Compare May 20, 2021 14:21
@michaelandrepearce
Contributor

michaelandrepearce commented Jun 25, 2021

I'm confused here, because I thought we agreed on the call that a versioned journal was needed, and the assumption I took from that was that it would then need to be implemented before we move forward.

I don't see a need or rush to merge to master until all the bits from that community call are done.

@franz1981
Contributor Author

franz1981 commented Jun 25, 2021

During the call we agreed that I would raise the JIRAs for the missing features, to make this more robust against journal misalignment scenarios when compared to classic replication.

The value already delivered here:

  • support for a single pair too
  • many fixes in some dark corners of the original state diagram
  • an easy migration path (configuration-wise)
  • a pluggable quorum service

Users can already use what we have here, getting the same or better value than classic replication.

In the meantime we can:

  • collect feedback on it as it is
  • design journal versioning as an enhancement
  • make any other improvements we need

I agree about not being in a hurry, but given the magnitude of this feature I believe delivering it in chunks/iteratively is the way to go (assuming each chunk has its own value).

I was sure I had explained this during the call.

@gtully any comment?

@michaelandrepearce
Contributor

michaelandrepearce commented Jun 25, 2021

So I thought we had a data-state issue, and thus a solution was a must before we went ahead; the point of the call in the first place was to agree on that solution, hence journal versioning as the solution agreed on the call to address it. I'm very cautious about us delivering this to main and to users without an automated and safe solution to the data-state issue, as it's asking for production issues and people shooting themselves in the foot.

Further enhancements like multi-slave etc. were the stuff we could implement after.

But the automated data integrity was a must.

@franz1981 franz1981 force-pushed the pluggable_quorum_voting branch 4 times, most recently from 09f5a22 to ed32cce Compare June 28, 2021 09:32
@gtully
Contributor

gtully commented Jun 28, 2021

I think what we have here is a very good line in the sand. The problem that a lock version solves for a computer can more easily be solved by a capable human. What we have here is a clear definition of the problem. This is a nice incremental improvement.

@michaelandrepearce
Contributor

And here lies the biggest issue: the human. This is what, and why, this was something to be solved: because more often than not, the human cannot make the right decision, accidentally starts the wrong node, and so forth and so on.
And that's why we had the call to find the solution. Now, the agreed solution from the call was the journal solution proposed by yourself, but we do need something. If we don't think the journal solution will work now, then we should probably have a follow-up call, but I suspect that's not the case here.

@gtully
Contributor

gtully commented Jun 28, 2021

This PR is just a core part of the solution; https://issues.apache.org/jira/browse/ARTEMIS-3340 can layer the versioning on top.

@michaelandrepearce
Contributor

michaelandrepearce commented Jun 28, 2021

Indeed, but I think we need both parts before we go to main. I'd be concerned about this going into main before that, as we could then accidentally release just one part, without both parts being done and protecting people from themselves, and people could get themselves into an issue in prod. E.g. even myself running in prod, I would not be confident in always making the right call, and indeed would want automatic protection from running the wrong one.

I see no harm in keeping a branch with this stuff until both are done. If we want to have a formal feature branch in Apache git, then let's make that, and then we can pull to main from there once it's all done.

@gtully
Contributor

gtully commented Jun 28, 2021

@michaelandrepearce I hear you. There is more work to do for sure, but it won't be in any way easier to shoot oneself in the foot with these changes.
It is all about collaboration, and it is difficult to get folks to commit to code on a PR or even a feature branch.

The reality is that if it is not on main, it does not exist.

This feature is ripe for collaboration because it pulls in a distributed lock service; getting more users of that would be great to shape the API. The journal exclusive lock is just one use case (albeit an important one).

@michaelandrepearce
Contributor

michaelandrepearce commented Jun 28, 2021

Right now, in testing this (yes, I have been building and running it), I have started the wrong node after taking it all down, and thus ended up running the older journal. Right now, with leader/follower (master, slave) setups, because of the leader (master) concept you always know which node should be taken down last and brought back up first. And on restart, if you do accidentally start the slave, you're safe, because it won't start until the master is there.

If it goes to main, we're saying it's ready for users; without the second part, I strongly disagree, I don't think it is ready for release. We can have a feature branch for collaboration, and we can even produce non-official binaries, but I don't think it's wise to officially release it and have users give feedback from real-life deployments where they don't fully understand what they're buying into (users != active developers). I think at this stage it's suitable for developers, yes, to be running it, but they can do that from the branch / overnight non-release binaries.

@michaelandrepearce
Contributor

michaelandrepearce commented Jun 28, 2021

If it does go to main, then it should have docs written with very explicit statements (a big red/orange banner, say) that this is experimental, not for production, and a work in progress.

@franz1981
Contributor Author

franz1981 commented Jun 28, 2021

If it does go to main, then it should have docs written with very explicit statements (a big red/orange banner, say) that this is experimental, not for production, and a work in progress.

+100, agree.
It would ease the pain of using a separate branch chasing main, and would ease collaboration IMO (it's just personal).
I would like to add a comment about it in the doc + a couple of other minor changes, i.e. I would prefix all ZK live locks with namespace/locks/ to prepare for the journal version change (which will likely use prefixes like namespace/longs/).

@gtully
I believe we should document somewhere the data misalignment issue of classic replication too (not just the JIRA with the fix using a journal version) and what the expected mitigation actions are: the pluggable version is going to see a fix for this, but classic probably won't, hence it's worth helping users be aware of it IMO.

@michaelandrepearce
Contributor

Definitely want to be able to give Artemis ZK a base in the tree (configurable); after all, many places will share a ZK with other stuff, or use one ZK for multiple brokers.

@franz1981
Contributor Author

Yep, that's already implemented using the "namespace" parameter of the manager, but I would like to already separate locks from "others" (journal timestamps or any other primitives we will need in the future).

@franz1981
Contributor Author

franz1981 commented Jun 29, 2021

@michaelandrepearce @gtully
I have both added the ZK default locks path (but I'm playing with ZK to see if it satisfies me) and the warning for users mentioning the upstream JIRA that explains the data misalignment issue.
I'm going to add a doc JIRA to produce some meaningful mitigation actions for replication in general, given that classic replication is affected by this but it's very unlikely that it's going to be fixed.

I used this manager configuration:

               <manager>
                  <properties>
                     <property key="connect-string" value="localhost:2181"/>
                     <property key="namespace" value="artemis_test"/>
                  </properties>
               </manager>

And using the ZK CLI:

[zk: localhost:2181(CONNECTED) 16] ls /artemis_test/locks/f2ed3b55-d8ac-11eb-9420-8cc68169c75b/leases 
[_c_d3b7e156-7599-4125-b891-0ebca91d3e63-lease-0000000000]

Where f2ed3b55-d8ac-11eb-9420-8cc68169c75b is a broker NodeID: it seems OK to me.
If in the follow-up PRs we're going to expose counters/mutable longs holding the journal/data versioning/timestamp with the same NodeID, it would look like

/artemis_test/mutable_longs/f2ed3b55-d8ac-11eb-9420-8cc68169c75b

That looks OK to me (the name mutable_longs is temporary).
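
For illustration, here is a minimal Curator sketch of how such a namespaced live lock could be acquired. This is an assumption, not the PR's actual code: the ZkLiveLockSketch class, the choice of recipe (InterProcessSemaphoreV2), and the retry policy values are all hypothetical, picked because a single-lease semaphore produces exactly the kind of .../leases/_c_...-lease-0000000000 child seen in the CLI output above:

   import org.apache.curator.framework.CuratorFramework;
   import org.apache.curator.framework.CuratorFrameworkFactory;
   import org.apache.curator.framework.recipes.locks.InterProcessSemaphoreV2;
   import org.apache.curator.framework.recipes.locks.Lease;
   import org.apache.curator.retry.ExponentialBackoffRetry;

   public final class ZkLiveLockSketch {
      public static void main(String[] args) throws Exception {
         // Mirrors the <manager> properties above
         CuratorFramework client = CuratorFrameworkFactory.builder()
            .connectString("localhost:2181")                 // "connect-string"
            .namespace("artemis_test")                       // "namespace": base of the tree
            .retryPolicy(new ExponentialBackoffRetry(1000, 3))
            .build();
         client.start();
         String nodeId = "f2ed3b55-d8ac-11eb-9420-8cc68169c75b"; // broker NodeID
         // A max-1-lease semaphore at /locks/<NodeID>; its leases children match
         // the _c_...-lease-0000000000 node listed by the ZK CLI above.
         InterProcessSemaphoreV2 liveLock =
            new InterProcessSemaphoreV2(client, "/locks/" + nodeId, 1);
         Lease lease = liveLock.acquire(); // blocks until the live lock is held
         try {
            // ... act as live while holding the lease ...
         } finally {
            liveLock.returnLease(lease);
            client.close();
         }
      }
   }

Keeping locks under /locks/ leaves room for sibling subtrees (like the temporary mutable_longs above) under the same configurable namespace.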

@gtully
Contributor

gtully commented Jul 15, 2021

I have pulled all of the above into a new PR and added in the activation sequencing. I think we can move forward on that.

#3646

@franz1981
Contributor Author

Closing as superseded by #3646.

@franz1981 franz1981 closed this Jul 27, 2021
@franz1981 franz1981 deleted the pluggable_quorum_voting branch August 6, 2021 07:43