Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

close file double time sometimes cause copy problem wh #572

Open
lindmin opened this issue Jan 11, 2024 · 10 comments
Open

close file double time sometimes cause copy problem wh #572

lindmin opened this issue Jan 11, 2024 · 10 comments

Comments

@lindmin
Copy link

lindmin commented Jan 11, 2024

Hello,

version of go : 1.20.5
version of lib sftp : v1.13.6
version of sftp server: OpenSSH_8.9p1 Ubuntu-3ubuntu0.6

I sometime experiment a bug when my code close 2 times the same sftp.File ref in concurrent situation

here the code to reproduce :

main.go

package main

import (
	"bytes"
	"fmt"
	"net"
	"os"
	"sync"

	"github.com/pkg/sftp"
	"golang.org/x/crypto/ssh"
)

func main() {
	connSSH, errSSH := ssh.Dial("tcp", net.JoinHostPort("localhost", "22"), &ssh.ClientConfig{
		HostKeyCallback: ssh.InsecureIgnoreHostKey(),
		User:            "USER",
		Auth: []ssh.AuthMethod{
			ssh.Password("PASSWORD"),
		},
	})
	if errSSH != nil {
		panic(errSSH)
	}
	client, errSftp := sftp.NewClient(connSSH, sftp.UseConcurrentReads(true), sftp.UseConcurrentWrites(true), sftp.UseFstat(true))
	if errSftp != nil {
		panic((errSftp))
	}
	if errSftp != nil {
		panic(errSftp.Error())
	}
	input := make(chan int, 1000)

	for i := 1; i < 1000; i++ {
		input <- i
	}

	wait := &sync.WaitGroup{}
	for i := 1; i <= 1000; i++ {
		wait.Add(1)
		go func(wait *sync.WaitGroup) {
			defer wait.Done()
			for {
				idx, ok := <-input
				if !ok {
					return
				}
				filename := fmt.Sprintf("/tmp/file-%d.dat", idx)
				err := copy(client, filename, "small content")
				if err != nil {
					fmt.Println(err.Error())
				}
			}
		}(wait)
	}
	close(input)
	wait.Wait()

}

func copy(client *sftp.Client, filename string, content string) error {
	file, errCreate := client.OpenFile(filename, os.O_WRONLY|os.O_CREATE|os.O_TRUNC)
	if errCreate != nil {
		return errCreate
	}
	_, errCopy := file.ReadFrom(bytes.NewBufferString(content))
	if errCopy != nil {					
               fmt.Printf("💣💣💣💣💣 COPY: %s", err.Error())
		//clean it
		client.Remove(filename)
		return errCopy
	}
	defer file.Close()
	//force double close
	file.Close()
	return nil
}

go.mod

module golang_sftp_fd_null

go 1.20

require (
	github.com/pkg/sftp v1.13.6
	golang.org/x/crypto v0.9.0
)

require github.com/stretchr/testify v1.8.1 // indirect

require (
	github.com/kr/fs v0.1.0 // indirect
	golang.org/x/sys v0.8.0 // indirect
)

When i launched this code, sometime i got 1 or 2 SSH_FX_FAILURE on the ReadFrom method .

output of my code :

💣💣💣💣💣 COPY: sftp: "Failure" (SSH_FX_FAILURE)

On server side, I can see that type of logs:

Jan 11 15:27:04 BVK-book sftp-server[185806]: debug1: request 2162: write "(null)" (handle -1) off 0 len 14
Jan 11 15:27:04 BVK-book sftp-server[185806]: debug3: request 2162: sent status 4
Jan 11 15:27:04 BVK-book sftp-server[185806]: sent status Failure

Now, if I replace

	defer file.Close()
	//force double close
	file.Close()

with

        alreadyClosed:=false
	defer func(){
       if !alreadyClosed{
           file.Close()        
        }
        }
	//force double close
	file.Close()

I cant reproduce the error.

In the Close method documentation, it is mention that Close closes the File, rendering it unusable for I/O..
Is-it really the case ?

Does a PR that add a flag check "alreadyClose" in the Close method before send a close request to sftp server is a great idea ?

thank you.

@puellanivis
Copy link
Collaborator

puellanivis commented Jan 11, 2024

It is a well-known situation that calling Close more than once on something may result in erroneous behavior. Consider: https://go.dev/play/p/K9uveTKRNus

Which returns:

panic: close of closed channel

goroutine 1 [running]:
main.main()
	/tmp/sandbox2717924249/prog.go:8 +0x30

Also, note that attempting to closing an os.File twice will return an error file already closed.

Now, all of that said, we could support this a little better, by locking the sftp.File mutex during a Close call and then marking it as already closed, then returning fs.ErrClosed when we expect that double close has occurred. Rather than trying to send a known broken SSH_FXP_CLOSE to the server for it to respond back with an SSH_FX_FAILURE.

P.S.: Note that your code was erroneous even if using os.File, but the error produced would be simply dropped on the floor, as you’re not actually trapping it in the defer.

@lindmin
Copy link
Author

lindmin commented Jan 15, 2024

Hello,

I knew double close is not a really clean code, I was fool by other module implementation of Close where Close can be called safely multiple time.

I create the issue it because the ReadFrom error happend randomly, and it take me some time to identify that was the second close() on a already closed file reference that impacting another file reference copy process, it seems to me very sneaky.

With your solution, it will be more explicit and preventive 👍

@puellanivis
Copy link
Collaborator

puellanivis commented Jan 15, 2024

where Close can be called safely multiple time.

This may not be strictly true. Since you’re throwing away any errors that are happening, you cannot know whether these multiple closes are actually safe or not.

Hm, this is still kind of weird though, because the SSH_FXP_CLOSE shouldn’t be impacting any other open file handles. 🤔

And even with UseConcurrentWrites(true) the length of data being written, isn’t enough to trigger concurrent transfers.

Actually:

debug1: request 2162: write "(null)" (handle -1) off 0 len 14

This is the sftp server related what the request it got was. Which appears to be:

&sshFxpWritePacket{
	ID:     2162,
	Handle: "",
	Offset: 0,
	Length: 14,
	Data:   []byte{ /* 14-bytes of data here */ },
}

And that’s just all sorts of wrong. There shouldn’t really be a way that we can even send that.

@puellanivis
Copy link
Collaborator

I think what might help is more of the server logs preceding one of these errors. 🤔 While the logs are, I’m sure, relatively free of sensitive information, you should probably attach it as a file, rather than embed it. There’s almost certainly a lot of noise going on in there.

@lindmin
Copy link
Author

lindmin commented Jan 16, 2024

here the log of one run with 2 copy errors :

bapt@BVK-book:~$ ./golang_sftp_fd_null 
💣💣💣💣💣 COPY: sftp: "Failure" (SSH_FX_FAILURE)
💣💣💣💣💣 COPY: sftp: "Failure" (SSH_FX_FAILURE)

double_close_bug.log

@puellanivis
Copy link
Collaborator

Interesting, found two corresponding close requests via no bytes being written, all others have written 14:

Jan 16 12:24:26 BVK-book sftp-server[33548]: close "/tmp/file-696.dat" bytes read 0 written 0
Jan 16 12:24:26 BVK-book sftp-server[33548]: close "/tmp/file-948.dat" bytes read 0 written 0

From here, we can then extract the entries for file-696.dat, and mix in where the status Failure occurs:

Jan 16 12:24:26 BVK-book sftp-server[33548]: debug3: request 701: open flags 26
Jan 16 12:24:26 BVK-book sftp-server[33548]: open "/tmp/file-696.dat" flags WRITE,CREATE,TRUNCATE mode 0666
Jan 16 12:24:26 BVK-book sftp-server[33548]: debug1: request 701: sent handle handle 4
--
Jan 16 12:24:26 BVK-book sftp-server[33548]: debug1: request 2150: write "(null)" (handle -1) off 0 len 14
Jan 16 12:24:26 BVK-book sftp-server[33548]: debug3: request 2150: sent status 4
Jan 16 12:24:26 BVK-book sftp-server[33548]: sent status Failure
--
Jan 16 12:24:26 BVK-book sftp-server[33548]: debug3: request 2142: close handle 4
Jan 16 12:24:26 BVK-book sftp-server[33548]: close "/tmp/file-696.dat" bytes read 0 written 0
Jan 16 12:24:26 BVK-book sftp-server[33548]: debug3: request 2142: sent status 0
Jan 16 12:24:26 BVK-book sftp-server[33548]: sent status Success
--
Jan 16 12:24:26 BVK-book sftp-server[33548]: debug3: request 3146: remove
Jan 16 12:24:26 BVK-book sftp-server[33548]: remove name "/tmp/file-696.dat"
Jan 16 12:24:26 BVK-book sftp-server[33548]: debug3: request 3146: sent status 0
Jan 16 12:24:26 BVK-book sftp-server[33548]: sent status Success

Wait a second, handle handle 4? But that’s not right. We already used that handle. 😮 oh no. What about the open for 948?

Jan 16 12:24:26 BVK-book sftp-server[33548]: debug3: request 954: open flags 26
Jan 16 12:24:26 BVK-book sftp-server[33548]: open "/tmp/file-948.dat" flags WRITE,CREATE,TRUNCATE mode 0666
Jan 16 12:24:26 BVK-book sftp-server[33548]: debug1: request 954: sent handle handle 5

handle handle 5. Again, an attempt to reuse an existing handle. In fact, these got shunted all the way to the back of the line. Almost all of the opens happen first, before much if any writes happen. But these two “bugs” in the open requests both happen deep into the writes. Looking at the lowest number handles:

Jan 16 12:24:26 BVK-book sftp-server[33548]: debug1: request 1: sent handle handle 0
Jan 16 12:24:26 BVK-book sftp-server[33548]: debug1: request 2: sent handle handle 1
Jan 16 12:24:26 BVK-book sftp-server[33548]: debug1: request 4: sent handle handle 2
Jan 16 12:24:26 BVK-book sftp-server[33548]: debug1: request 3: sent handle handle 3
Jan 16 12:24:26 BVK-book sftp-server[33548]: debug1: request 5: sent handle handle 4
Jan 16 12:24:26 BVK-book sftp-server[33548]: debug1: request 6: sent handle handle 5
Jan 16 12:24:26 BVK-book sftp-server[33548]: debug1: request 7: sent handle handle 6
Jan 16 12:24:26 BVK-book sftp-server[33548]: debug1: request 8: sent handle handle 7
Jan 16 12:24:26 BVK-book sftp-server[33548]: debug1: request 13: sent handle handle 8
Jan 16 12:24:26 BVK-book sftp-server[33548]: debug1: request 9: sent handle handle 9
Jan 16 12:24:26 BVK-book sftp-server[33548]: debug3: request 1105: close handle 1
Jan 16 12:24:26 BVK-book sftp-server[33548]: debug3: request 1124: close handle 2
Jan 16 12:24:26 BVK-book sftp-server[33548]: debug3: request 1138: close handle 4
Jan 16 12:24:26 BVK-book sftp-server[33548]: debug1: request 701: sent handle handle 4
Jan 16 12:24:26 BVK-book sftp-server[33548]: debug3: request 1192: close handle 5
Jan 16 12:24:26 BVK-book sftp-server[33548]: debug1: request 954: sent handle handle 5
Jan 16 12:24:26 BVK-book sftp-server[33548]: debug3: request 1309: close handle 3
Jan 16 12:24:26 BVK-book sftp-server[33548]: debug3: request 1866: close handle 0
Jan 16 12:24:26 BVK-book sftp-server[33548]: debug3: request 2003: close handle 6
Jan 16 12:24:26 BVK-book sftp-server[33548]: debug3: request 2005: close handle 8
Jan 16 12:24:26 BVK-book sftp-server[33548]: debug3: request 2006: close handle 7
Jan 16 12:24:26 BVK-book sftp-server[33548]: debug3: request 2013: close handle 9
Jan 16 12:24:26 BVK-book sftp-server[33548]: debug3: request 2142: close handle 4
Jan 16 12:24:26 BVK-book sftp-server[33548]: debug3: request 2193: close handle 5

Hm. The ones reusing a handle occur immediately after the close for that handle.

Something is very sketchy here, and it’s not immediately apparent that it would be our code.

@puellanivis
Copy link
Collaborator

Congratulations! You have managed to stumble across a race condition in openssh!

There is actually absolutely nothing wrong with our code. It’s operating correctly, and openssh is the one which is misbehaving.

I’ll be isolating it, preparing an issue report to their repo, then will link to the issue here. (And maybe a PR if I can work out how to address it at the same time.)

@lindmin
Copy link
Author

lindmin commented Jan 17, 2024

Oh ok, thank you for your analysis on that ! 👍

@puellanivis
Copy link
Collaborator

Hm… ordering them by line numbers, I see something different. No bug or race condition in openssh (well, small bug, but only in the logging, but since it’s not multithreaded, this was not making sense.)

Instead, it’s (obviously) a race condition in the code which is multithreaded:
Thread A: open /tmp/file-5.dat; Response: handle 4
Thread A: write handle 4 with 14 bytes; Response: OK
Thread A: close handle 4; Response: OK
Thread B: open /tmp/file-40.dat; Response: handle 4 (reused)
Thread A: close handle 4; Response: OK
Thread B: write /tmp/file-40.dat; Response: Failure
Thread B: close handle 4; Response: File not found
Thread B: close handle 4; Response: File not found

So, this is a bug in sftp, and we’ll need to prevent the potential of a reused file handler being closed by a delayed second close on another thread of requests.

@puellanivis
Copy link
Collaborator

If you can try out the new PR, and verify that you don’t seem to hit the error anymore. I know we cannot for sure say it doesn’t happen anymore, but some amount of confidence would be great.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants