Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Bug] Connections not closed on inherited net.Conn #74

Open
securez opened this issue Sep 18, 2022 · 2 comments
Open

[Bug] Connections not closed on inherited net.Conn #74

securez opened this issue Sep 18, 2022 · 2 comments

Comments

@securez
Copy link

securez commented Sep 18, 2022

After deep review of this great library, I come with a problem on inherited connections, let's me explain with a example.

I figure that in golang, to access the file descriptor of a net.Conn or viceversa a fd dup is in place, and has it's explanation, This library has 2 scenarios:

  1. Inherit a listener: This is done via Fds.Listen and creates a duplicate file descriptor on the process, one gets used in the listener, another one is added to be passed to new process. This seems ok, no really large number of listeners in one process in real life.
  2. Inherit a connection: This is done via Fds.Conn and if the connection is not present, not one is created, but when inherited the FD is duplicated, one used in the net.Conn and another added to used map, and here is the problem, when the inherited connection is closed the socket remain open, in the example below that only work with one connection, the conn get hang after update.

The problems seems to be fds.go:300 replace f.used[key] = file by file.Close(), but PacketConn seems to have the same problem. For large number of connections, the duplication of unused file descriptors, can be a problem too.

The sample code, that listen on 8080, and writes to the socket every second, and closes it after 30s. If the connection is fresh the connection is closed and client receives TCP close, but if inherited the connection will hang.

package main

import (
	"flag"
	"fmt"
	"log"
	"net"
	"os"
	"os/signal"
	"syscall"
	"time"

	"github.com/cloudflare/tableflip"
)

var stop = make(chan bool)
var done = make(chan bool)

func handleConn(conn net.Conn, upg *tableflip.Upgrader) {
	ticker := time.NewTicker(time.Second)
	timer := time.NewTimer(30 * time.Second)

	for {
		select {
		case <-stop:
			log.Printf("Updating...")
			ticker.Stop()
			timer.Stop()
			c := conn.(tableflip.Conn)
			upg.Fds.AddConn("tcp", "0", c)
			conn.Close()
			log.Printf("Done...")
			done <- true
			return

		case t := <-ticker.C:
			log.Printf("Tick: %+v", t)
			conn.SetDeadline(time.Now().Add(time.Second))
			conn.Write([]byte(fmt.Sprintf("It is not a mistake to think you can solve any major problems just with potatoes. [%d]\n", os.Getpid())))

		case t := <-timer.C:
			log.Printf("Clossing: %+v", t)
			ticker.Stop()
			timer.Stop()
			conn.Close()
			log.Printf("Closed conn")
			return
		}
	}

}

func main() {
	var (
		listenAddr = flag.String("listen", "localhost:8080", "`Address` to listen on")
		pidFile    = flag.String("pid-file", "", "`Path` to pid file")
	)

	flag.Parse()
	log.SetPrefix(fmt.Sprintf("%d ", os.Getpid()))

	upg, err := tableflip.New(tableflip.Options{
		PIDFile: *pidFile,
	})
	if err != nil {
		panic(err)
	}
	defer upg.Stop()

	// Do an upgrade on SIGHUP
	go func() {
		sig := make(chan os.Signal, 1)
		signal.Notify(sig, syscall.SIGHUP)
		for range sig {
			stop <- true
			log.Println("stopping service")
			<-done
			err := upg.Upgrade()
			if err != nil {
				log.Println("upgrade failed:", err)
			}
		}
	}()

	conn, err := upg.Fds.Conn("tcp", "0")
	if err != nil {
		log.Fatalln("Can't get conn:", err)
	}
	if conn != nil {
		log.Printf("Inherited conn: %+v", conn.RemoteAddr())
		go handleConn(conn, upg)
	}

	ln, err := upg.Fds.Listen("tcp", *listenAddr)
	if err != nil {
		log.Fatalln("Can't listen:", err)
	}

	go func() {
		defer ln.Close()

		log.Printf("listening on %s", ln.Addr())

		for {
			c, err := ln.Accept()
			if err != nil {
				log.Printf("Error on Accept: %+v", err)
				return
			}

			go handleConn(c, upg)
		}
	}()

	log.Printf("ready")
	if err := upg.Ready(); err != nil {
		panic(err)
	}
	<-upg.Exit()
	log.Printf("exiting, done, :)")
}
@lmb
Copy link
Contributor

lmb commented Sep 29, 2022

Tableflip is not meant to preserve what established connections, only listeners (TCP) or unconnected sockets (UDP). The reason is that it's rarely useful to have an established connection without some associated state to go with it and that sharing a connection like this requires additional care. In your example you only ever write to conn, but in most scenarios you also want to read from it. How can you do that if another process might still be reading from the same socket concurrently? How do you figure out at what point in a protocol a socket is in?

If you really want to share established connections you will have to build your own mechanism to do so. You can use AddConn to build this mechanism by using a unix domain socket or similar and passing state + fds via SCM_RIGHTS.

Regarding the lib: this is not a bug. The best it could do is refuse to accept connected sockets in AddConn, to avoid this mistake in the first place.

@20k-ultra
Copy link

I have a scenario where I want to perform hitless reloads but I have too many connections that are long lived (such as WebSocket). tableflip helps with the listener and I can write the code to pass connection FDs via unix sockets with SCM_RIGHTS but I need to actually pass existing connections too.

The one part that's actually tricky is serializing the net.Conn struct so that I can pass it via RPCs on unix socket or shared memory space then deserialize to reconstruct the struct and pass it to net.Conn request handlers. My only choice is to write my own H1/H2 lib isn't it ? One that I can pass a fd and config struct to say it was an existing connection and some data like keepalive headers (h1), stream states (h2), flow controls (h2), etc.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants