Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Race Conditions when closing server, port is still bound #879

Open
Jonahss opened this issue Apr 12, 2019 · 27 comments
Open

Race Conditions when closing server, port is still bound #879

Jonahss opened this issue Apr 12, 2019 · 27 comments

Comments

@Jonahss
Copy link

Jonahss commented Apr 12, 2019

Describe the bug

Hello! I'm not a Java expert, but was happy to find this easy websocket implementation.
I've run into a race condition with closing a server, and immediately opening a new server on the same port.
This came up as part of my unit tests. Each test passes in isolation, but when they run quickly one after another, well, they started to throw java.net.BindException: Address already in use exceptions.

I've built a really great repro case for you! I was really embarrassed to add a Thread.sleep() call after my server.close() call, so I created a repro.

The following code will run just fine. This is the naiive case of starting a server, closing it, and starting another server. The issue comes up in the particular way that my client is spamming the server to create new connections.

import org.java_websocket.WebSocket;
import org.java_websocket.handshake.ClientHandshake;
import org.java_websocket.server.WebSocketServer;
import org.junit.Test;

import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.URISyntaxException;
import java.nio.ByteBuffer;

public class WebsocketServerTest {

    @Test
    public void QuickStopTest() throws IOException, InterruptedException, URISyntaxException {
        int port = 8765;

        class SimpleServer extends WebSocketServer {

            public SimpleServer(InetSocketAddress address) {
                super(address);
            }

            @Override
            public void onOpen(WebSocket conn, ClientHandshake handshake) {
                conn.send("Welcome to the server!"); //This method sends a message to the new client
                broadcast( "new connection: " + handshake.getResourceDescriptor() ); //This method sends a message to all clients connected
                System.out.println("new connection to " + conn.getRemoteSocketAddress());
            }

            @Override
            public void onClose(WebSocket conn, int code, String reason, boolean remote) {
                System.out.println("closed " + conn.getRemoteSocketAddress() + " with exit code " + code + " additional info: " + reason);
            }

            @Override
            public void onMessage(WebSocket conn, String message) {
                System.out.println("received message from "	+ conn.getRemoteSocketAddress() + ": " + message);
            }

            @Override
            public void onMessage( WebSocket conn, ByteBuffer message ) {
                System.out.println("received ByteBuffer from "	+ conn.getRemoteSocketAddress());
            }

            @Override
            public void onError(WebSocket conn, Exception ex) {
                System.err.println("an error occured on connection " + conn.getRemoteSocketAddress()  + ":" + ex);
            }

            @Override
            public void onStart() {
                System.out.println("server started successfully");
            }
        }
        SimpleServer serverA = new SimpleServer(new InetSocketAddress("localhost", port));
        SimpleServer serverB = new SimpleServer(new InetSocketAddress("localhost", port));

        serverA.start();

        Thread.sleep(3000);

        serverA.stop();
        serverB.start();
    }
}

So that code passes.

BUT. When I have a server spamming connections, the code above will sometimes pass, sometimes fail with my "Address already in use error" and sometimes fail with a ClosedSelectorException.

Here's the code which spams the connection. I did my best to help here, but I had to pull the code out of a Python script which was running, I couldn't repro with a java client.

(use python3 and install the websockets module)

import asyncio
import websockets

# requirements.txt:
# websockets==6.0.0

async def websocket_loop():
    """
    Processes messages from self.queue until mitmproxy shuts us down.
    """
    while True:
        try:
            async with websockets.connect('ws://localhost:8765', max_size = None) as websocket:
                while True:
                    # Make sure connection is still live.
                    await websocket.ping()

        except websockets.exceptions.ConnectionClosed:
            # disconnected from server
            pass
        except BrokenPipeError:
            # Connect failed
            pass
        except IOError:
            # disconnected from server mis-transfer
            pass
        except:
            print("[mitmproxy-node plugin] Unexpected error:", sys.exc_info())
            traceback.print_exc(file=sys.stdout)

worker_event_loop = asyncio.new_event_loop()
worker_event_loop.run_until_complete(websocket_loop())

Expected behavior
Well, my unit tests should be able to start servers on the same port as a previously stopped server without adding a 500ms sleep in between.

Debug log
The the debug log (set the log level to TRACE) to help explain your problem.
^I could do this if you walk me through how.

Environment(please complete the following information):

  • Version used: 2.0-m02
  • Java version: 1.8
  • Operating System and version: MacOS 10.14.4 (18E226)
  • Endpoint Name and version: N/A (self hosted in repro case)
  • Link to your project: coming soon...
@PhilipRoman
Copy link
Collaborator

PhilipRoman commented Apr 12, 2019

Hello!
This could be caused by TIME_WAIT state after closing the server. Have you tried using WebSocketServer.setReuseAddr(true)?

@Jonahss
Copy link
Author

Jonahss commented Apr 12, 2019

Thanks. I gave it a try. Same result.

@NoahAndrews
Copy link
Contributor

NoahAndrews commented Apr 13, 2019

I saw a maybe-related issue here: #877 (comment)

@NoahAndrews
Copy link
Contributor

Maybe this test is related as well: #877 (comment)

@marci4
Copy link
Collaborator

marci4 commented Apr 14, 2019

Maybe this test is related as well: #877 (comment)

Not really related, the way I did the test back then sucked and broke when I updated to use SLF4J.

@marci4
Copy link
Collaborator

marci4 commented May 24, 2019

Was not able to reproduce with the websocket client...

@Jonahss
Copy link
Author

Jonahss commented May 24, 2019

so the code in the issue didn't reproduce for you? darn.

@Jonahss Jonahss closed this as completed May 24, 2019
@marci4
Copy link
Collaborator

marci4 commented May 27, 2019

@Jonahss I was simple not able to reproduce it with the java websocket client when I tried to reproduce it quickly. No need to close it yet ;)

@marci4 marci4 reopened this May 27, 2019
@Jonahss
Copy link
Author

Jonahss commented May 28, 2019

Oh okay. it's not a terrible issue for me. Most people must not be encountering it. I was able to solve for my use case by adding a short wait. Although inelegant, it doesn't really harm the code, as it's a small project with little use so far.

@chenshun00
Copy link

chenshun00 commented May 30, 2019

port is still bound, i have to wait for dozens of seconds. :)

marci4 added a commit to marci4/Java-WebSocket-Dev that referenced this issue Jun 11, 2019
Fix ConcurrentModificationException found during the investigation of TooTallNate#879
@marci4
Copy link
Collaborator

marci4 commented Jun 11, 2019

I tried again with a new pc and differend jre version (12) and still was not able to reproduce it.
I however found a ConcurrentModificationException which I fixed with #902.

Best regards,
Marcel

@mneundorfer
Copy link

Maybe my observation can help on this:
I encounter the same problem as @Jonahss (at least I think so :-) ): If I shut down my application while there are still open connections to the server, the issue occurs. So

  1. Start app
  2. Connect client(s)
  3. Stop app
  4. Restart app --> java.net.BindException: Address already in use. Only works after waiting dozens of seconds of sometimes even minutes.

However,

  1. Start app
  2. Connect client(s)
  3. Disconnect client(s)
  4. Stop app
  5. Restart app --> works just fine

I am using the native JavaScript WebSocket client

@marci4
Copy link
Collaborator

marci4 commented Jun 12, 2019

Would WebSocketServer.stop(timeout) be a better option for you ?
What interval are we talking between stop app and restart? (Do you have used WebSocketServer.setReuseAddr(true) before starting the server?)

Do be honest, only a good project where I can reproduce it can help...

@maiph
Copy link
Contributor

maiph commented Jun 12, 2019

I wrote this test but didn't get it to fail.

public class Issue879Test {

    private static final int NUM_SERVERS_CREATED = 2;
    private static final int NUM_CLIENTS = 10;
    private static final AtomicInteger SERVERS_STARTED_COUNT = new AtomicInteger(0);

    @Test()
    public void testIssue() throws Exception {
        int port = SocketUtil.getAvailablePort();

        for (int i = 0; i < NUM_SERVERS_CREATED; i++) {
            WebSocketServer server = new Issue879Server(new InetSocketAddress(port));
            server.start();

            for (int j = 0; j < NUM_CLIENTS; j++) {
                WebSocketClient client = new Issue879Client(new URI("ws://localhost:" + port));
                client.connectBlocking();
                assert client.isOpen();
            }

            server.stop();
        }

        assert SERVERS_STARTED_COUNT.get() == NUM_SERVERS_CREATED;
    }

    private static class Issue879Server extends WebSocketServer {
        Issue879Server(InetSocketAddress address) {
            super(address);
        }

        @Override
        public void onOpen(WebSocket conn, ClientHandshake handshake) {

        }

        @Override
        public void onClose(WebSocket conn, int code, String reason, boolean remote) {

        }

        @Override
        public void onMessage(WebSocket conn, String message) {

        }

        @Override
        public void onError(WebSocket conn, Exception ex) {
            assert false;
        }

        @Override
        public void onStart() {
            SERVERS_STARTED_COUNT.incrementAndGet();
        }
    }

    private static class Issue879Client extends WebSocketClient {

        Issue879Client(URI serverUri) {
            super(serverUri);
        }

        @Override
        public void onOpen(ServerHandshake handshakedata) {

        }

        @Override
        public void onMessage(String message) {

        }

        @Override
        public void onClose(int code, String reason, boolean remote) {

        }

        @Override
        public void onError(Exception ex) {

        }
    }
}

@mneundorfer
Copy link

@marci4 Indeed, I have not. Also, it solves my issue.

To anyone stumbling accross this in the future: Here is the according wiki section which I did not see before... Sorry about that!

@pcdv
Copy link

pcdv commented Nov 25, 2019

I recently upgraded to 1.4.0 (from 1.3.4) and also witnessed that the websocket port remained in TIME_WAIT after a close(), preventing to reopen the port immediately.

I added a call to setReuseAddr(true) and that fixed my issue.

NB: the problem was occurring on Linux but not on Windows

@jimirocks
Copy link

Seems like I have the same issue, reproducible using the client and server from the same (this) library. See this test: https://github.com/Smarteon/loxone-java/blob/master/src/test/groovy/cz/smarteon/loxone/LoxoneWebSocketIT.groovy#L114

I tried also adding reuseAddr but no luck. The test behaves kind of different on different linuxes and JVMs - ie. on OpenJDK11 ubuntu 19.10 it fails 90% times.

@marci4
Copy link
Collaborator

marci4 commented Dec 5, 2019

@jimirocks so it is os and Java version dependant? Sounds like an issue in the JVM?

@jimirocks
Copy link

@marci4 it happens on all linuxes and JVMs I use (raspbian, ubuntu, fedora, jvm8-11), what differs is the rate. So my guess it's not necessarily the JVM issue.

@marci4
Copy link
Collaborator

marci4 commented Dec 5, 2019

@jimirocks was never able to reproduce it on windows.
Could you invest some more time and try to understand the issue? Thx

@jimirocks
Copy link

@marci4 well, my reproducer is 100% successful using JDK11 on ubuntu. netstat doesn't show the port is in use, however until the JVM ends it's not possible to bind it again even from different process. That means the port remains bound while unused until the original JVM finishes. I haven't found such issue in JDK11 neither some similar complains on it. And if this was the JDK11 BUG I would guess somebody found it sooner than here. Debugging the JDK is beyond my knowledge...

@victordiaz
Copy link

I updated to 1.5.1 and got the same problem in Android.
I went back to 1.3.x where problems disappears for me and don't have a big need to update for my needs.

Thanks for the library and your effort

@pawankgupta-se
Copy link

@marci4 I am also encountered this problem and I used setReuseAddr(true) to solve this issue by as it mentioned in this thread. Is this correct solution for the problem ?

@dustContributor
Copy link

Just a heads up. I had the same issue across JVM restarts: Create a WebSocketServer, connect from a browser client (localhost, Firefox), close the WebSocketServer and the whole JVM instance, and then starting it again said the address was already bound.

Using WebSocketServer::setReuseAddr(true) seems to fix it. Thanks!

@chenshun00
Copy link

SO_REUSEADDR allows your server to bind to an address which is in a
TIME_WAIT state. It does not allow more than one server to bind to
the same address.

http://www.unixguide.net/network/socketfaq/4.11.shtml

@xstremasoftware
Copy link

I had the same issue with a server running on Android.
setReuseAddr(true) solved the problem

@rawnsley
Copy link

rawnsley commented Jul 1, 2021

I can confirm this worked for me on Android. setReuseAddr(true) appears to prevent the socket entering the problematic TIME_WAIT state if the app process is killed, which happens as a matter of routine in Android.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests