Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[CURATOR-535] TestServer random port selection has a race condition #406

Closed
wants to merge 1 commit into from

Conversation

paul8263
Copy link

This PR is to mitigate the race condition issue. The random available port is now assigned right before starting Zookeeper testing server.

Copy link
Member

@tisonkun tisonkun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems that with this patch it's still possible that a resolved port be in used by other process before it's used by the caller.

@paul8263
Copy link
Author

It seems that with this patch it's still possible that a resolved port be in used by other process before it's used by the caller.

Yes. currently this PR updates InstanceSpec so that the random port will only be allocated when it is actually used.
But There are still some more steps before the port is actually used: wrap the configuration into QuorumPeerConfig and start the server. There could be a race condition in these steps.

QuorumPeerConfig config = configBuilder.buildConfig(thisInstanceIndex);
main.runFromConfig(config);

I suggest we could introduce a file lock in order to alleviate the race condition among processes. But the implementation might be cumbersome.

@paul8263 paul8263 requested a review from tisonkun May 23, 2022 02:51
@tisonkun tisonkun removed their request for review May 30, 2022 10:05
@tisonkun
Copy link
Member

Since this patch doesn't resolve the problem, I tend not to merge such change.

Perhaps @eolivelli @Randgalt have different ideas.

@madrob
Copy link
Contributor

madrob commented May 30, 2022

apache/zookeeper#1868 May be relevant

@paul8263
Copy link
Author

Hi @tisonkun @eolivelli and @Randgalt ,
Thank you for your reply. This problem is unusual. I got this problem when running unit tests in other project which relies on Zookeeper. The unit tests are running parallelly so that TestServer creating process might get a race condition when allocating unused ports. Currently my solution is the steps below:

  1. Get a random unused port.
  2. Implementing a file lock.
  3. Allocating the port to TestServer.
  4. Check if TestServer starts properly. If it starts successfully, release the file lock.
    I would like to move those steps inside TestServer creation and start process. However I worry that introducing a file lock might not be the best solution as it only solves an unusual problem at the cost of performance degradation.
    Have you got any better ideas? I think using the file lock should be considered as the last resort. Correct me if I am wrong.

@tisonkun
Copy link
Member

Should be fixed by 7e7c207 already.

@tisonkun tisonkun closed this Mar 13, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants