Fix Channel.close() race condition in pipe.py #2274

Open
wants to merge 11 commits into
base: main

Conversation

pettitpeon

`self._closed = True` was moved to the top. Built-in assignment in Python is atomic, so there is no need to protect it with a lock.

If there is a `Bad file descriptor` error (errno == 9), we can safely disregard it if `._closed` is set.
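
A minimal sketch of the two changes described above, assuming a simplified stand-in class (`SketchPipe` is hypothetical, not paramiko's actual `PosixPipe`) and using the symbolic `errno.EBADF` rather than the literal 9:

```python
import errno
import os


class SketchPipe:
    """Illustrative stand-in, not paramiko's actual PosixPipe."""

    def __init__(self):
        self._rfd, self._wfd = os.pipe()
        self._set = False
        self._closed = False

    def close(self):
        # Flag first, so a concurrent set() can tell why its write failed.
        self._closed = True
        os.close(self._rfd)
        os.close(self._wfd)

    def set(self):
        if self._set or self._closed:
            return
        self._set = True
        try:
            os.write(self._wfd, b"*")
        except OSError as e:
            if e.errno == errno.EBADF and self._closed:
                # close() won the race; nothing is left to signal.
                return
            raise
```

The ordering in `close()` is the point of the sketch: by the time a descriptor can be invalid, the flag is already set, so the `except` branch can distinguish "closed on purpose" from any other write failure.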
@pettitpeon
Author

Fixes bug #2271.

Contributor

@bskinn bskinn left a comment

Couple of questions about the implementation. Thanks for the PR!

paramiko/pipe.py Outdated
Comment on lines 66 to 72

        try:
            os.write(self._wfd, b"*")
        except OSError as e:
            if e.errno == 9 and self._closed:
                # The pipe was closed, no need to do anything
                return
            raise e

Contributor

Do we not also need something analogous to protect the os.read(self._rfd, 1) call above in clear()?

(We might not; again, just raising the question.)

Author

@pettitpeon pettitpeon Aug 2, 2023

From the pipe's perspective, yes, and it would be wise to protect it. However, given the way the pipe is used, this does not happen. The same thread

  1. opens,
  2. clears, and
  3. closes

the pipe, so there is no race condition between .clear() and .close().
Only .set() is called from a second thread.
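
If clear() were ever called from a different thread than close(), the same guard could be applied to the read. A minimal sketch, assuming a hypothetical helper `guarded_read` and a zero-argument `is_closed` callable standing in for the `._closed` flag (neither exists in paramiko):

```python
import errno
import os


def guarded_read(rfd, is_closed):
    """Read one byte from rfd, ignoring EBADF when the pipe is known closed.

    Hypothetical helper. Returns the byte read, or b"" if the read was
    skipped because the descriptor had already been closed on purpose.
    """
    try:
        return os.read(rfd, 1)
    except OSError as e:
        if e.errno == errno.EBADF and is_closed():
            return b""
        raise
```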

Contributor

Same request, please add a comment here explaining why a similar race catch is not (currently) required here.

Author

done

paramiko/pipe.py Outdated
Comment on lines 50 to 51

        os.close(self._rfd)
        os.close(self._wfd)

Contributor

@pettitpeon, is there any chance of either of these two .close() calls failing, making self._closed == True a possibly inaccurate representation of the pipe state?

(I would think that a failure of either of these two calls would signal an error state for the pipe such that self._closed == True is a better state than == False... but I thought I'd raise the question.)

Author

> is there any chance of either of these two .close() calls failing?

Yes, it could happen. If either .close() fails, it raises an OSError. In that case some FD might stay open, but the pipe is not usable anyway: there is no use case for it after .close(). In the worst case, we leak resources until the application exits. The most probable scenario is that the close() exception is not handled and the program terminates anyway.
A small improvement would be to wrap each .close() in its own try. Then, if the first one fails and leaks, we might still close the second one correctly.
In any case, I see .close() errors as unhandleable: we close() file descriptors as a "best effort" to release the resource, but cannot really do anything if it fails.

> making self._closed == True a possibly inaccurate representation of the pipe state?

Not really. After close() has been called, the object is disposed of and no one should use it anyway. The "broken state" is meaningless.

Last, the pipe is most probably a child of Channel, and when the channel gets destroyed it tries to close the pipe again. That fails if the pipe was previously closed, but the failure is disregarded by a catch-all try. This is OK, and aligns with the idea that it is an unhandleable error.

def __del__(self):
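
The re-close failure mentioned above can be seen in isolation. A minimal sketch using a bare `os.pipe()` rather than paramiko's classes; the variable names are illustrative:

```python
import errno
import os

rfd, wfd = os.pipe()
os.close(rfd)
os.close(wfd)

second_close_errno = None
try:
    os.close(rfd)  # closing an already-closed descriptor
except OSError as e:
    # This is the error a catch-all try in a destructor would swallow.
    second_close_errno = e.errno
```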

paramiko/pipe.py Outdated

        try:
            os.write(self._wfd, b"*")
        except OSError as e:
            if e.errno == 9 and self._closed:

Contributor

Is it possible for this `if` condition to ever be True?

We've already trapped for self._closed == True with the first if in the method, so it seems like we'd only ever hit this point with self._closed == False.

Author

Yes, the following race condition can happen:

  1. The write thread sees ._closed == False and continues.
  2. The close thread closes the descriptors before the write thread writes on line 67.
  3. The write thread's write() fails because the FD has been closed.
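
The interleaving above can be replayed deterministically in a single thread. A minimal sketch, assuming a bare `os.pipe()` and a local `closed` flag standing in for `._closed` (neither is paramiko's code):

```python
import errno
import os

rfd, wfd = os.pipe()
closed = False

# Step 1: the writer checks the flag and sees the pipe as open.
writer_saw_open = not closed

# Step 2: the closer runs to completion before the writer resumes.
closed = True
os.close(rfd)
os.close(wfd)

# Step 3: the writer resumes and writes to a descriptor that is gone.
caught = None
if writer_saw_open:
    try:
        os.write(wfd, b"*")
    except OSError as e:
        caught = e.errno
```

At step 3 the flag is already set, which is why checking `self._closed` inside the `except` branch can succeed even though the check at the top of the method did not.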

Contributor

Aha, this is the crux of the race condition, got it.

For the benefit of future readers of the code, please add a comment here describing the race condition as you just described it.

Otherwise we risk someone thinking the same thing I did and removing this.

Author

done

@bskinn
Contributor

bskinn commented Aug 2, 2023

I would think it would be challenging to do, but: is there any good way to write a test for this?

I would figure that any such test would require setting up a test harness with a server to connect to, and then run the open/close cycle you describe in the issue for some period of time long enough to trigger the race condition 90%+ of the time... but that would make for a flaky test, and that'd be not great.

Interested in any thoughts you have on it, though.

@bskinn
Contributor

bskinn commented Aug 2, 2023

Oh, also -- I added Needs changelog/docs because this would need an entry in /sites/www/changelog.rst.

@pettitpeon
Author

> I would think it would be challenging to do, but: is there any good way to write a test for this?

I will look into this. It might be possible by tunneling/forwarding an SSH connection and sending SSH echoes in a loop.

@pettitpeon
Author

> I would think it would be challenging to do, but: is there any good way to write a test for this?
>
> I would figure that any such test would require setting up a test harness with a server to connect to, and then run the open/close cycle you describe in the issue for some period of time long enough to trigger the race condition 90%+ of the time... but that would make for a flaky test, and that'd be not great.
>
> Interested in any thoughts you have on it, though.

Unfortunately, I could not think of a way of easily reproducing the bug. I tried tunneling SSH connections through a local forward (demo/forward.py), but SSH is too smart and rejects connections from the server side if it is being spammed (DoS attack mitigation). So I cannot open and close connections fast enough to run into the race condition.
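
One possible alternative to timing-based reproduction is to force the interleaving in a unit test by patching `os.write` so that the close happens right before the write. A sketch under stated assumptions: `SketchPipe` and `run_forced_race` are hypothetical stand-ins, and this has not been tried against paramiko's own test suite.

```python
import errno
import os
from unittest import mock


class SketchPipe:
    """Illustrative stand-in for the patched pipe, not paramiko's class."""

    def __init__(self):
        self._rfd, self._wfd = os.pipe()
        self._closed = False

    def close(self):
        self._closed = True
        os.close(self._rfd)
        os.close(self._wfd)

    def set(self):
        if self._closed:
            return
        try:
            os.write(self._wfd, b"*")
        except OSError as e:
            if e.errno == errno.EBADF and self._closed:
                return
            raise


def run_forced_race():
    """Force close() to happen between set()'s flag check and its write."""
    pipe = SketchPipe()
    real_write = os.write

    def write_after_close(fd, data):
        pipe.close()  # the "other thread" wins the race here
        return real_write(fd, data)  # now fails with EBADF

    with mock.patch("os.write", side_effect=write_after_close):
        pipe.set()  # raises OSError if the guard is missing
    return pipe._closed
```

Because the ordering is injected rather than raced, a test built this way would not need a server or repeated open/close cycles.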

@pettitpeon
Author

> /sites/www/changelog.rst

I fixed the review issues. I do not know what to do about the changelog.

@pettitpeon
Author

@bskinn @jun66j5 What else shall be done to merge? I fixed all the issues

@bskinn
Contributor

bskinn commented Aug 17, 2023

> @bskinn @jun66j5 What else shall be done to merge? I fixed all the issues

Minor tweak to the PR, and then I think it's ready to flag for review by bitprophet.

It still needs a CHANGELOG entry - you said you're not sure what to do about that... take a look at /sites/www/changelog.rst to see the syntax & template of entries added for other bugfixes, and adapt to provide a description of this bugfix.

pettitpeon and others added 2 commits August 30, 2023 10:32
Co-authored-by: Brian Skinn <brian.skinn@gmail.com>
Co-authored-by: Brian Skinn <brian.skinn@gmail.com>
@pettitpeon
Author

pettitpeon commented Aug 30, 2023

> > @bskinn @jun66j5 What else shall be done to merge? I fixed all the issues
>
> Minor tweak to the PR, and then I think it's ready to flag for review by bitprophet.
>
> It still needs a CHANGELOG entry - you said you're not sure what to do about that... take a look at /sites/www/changelog.rst to see the syntax & template of entries added for other bugfixes, and adapt to provide a description of this bugfix.

Gonna give it a try today.
Update: done. Thanks for the orientation. Is there anything else left to do?

@pettitpeon
Author

Dear @bskinn @jun66j5 @bitprophet, What else is needed for the merge?

thanks!

@pettitpeon
Author

Hi again, all! Please review the PR when possible.


        # "best effort" approach
        try:
            os.close(self._rfd)
        except Exception:

Contributor

Why is OSError not used rather than Exception?

Author

In this case we want to be as broad as possible in the try/except. If any exception is raised while closing the read file descriptor, we still want to guarantee that the write file descriptor gets closed. Catching only one exception type could be too narrow and leave the pipe in an invalid state (one descriptor open and the other one closed). Note that I won't be able to test my changes on Windows.

Contributor

> In this case we want to be as broad as possible in the try/except.

I'm not sure I follow your comment. What I'm saying is:

        try:
            os.close(self._rfd)
        except Exception:
            pass
        try:
            os.close(self._wfd)
        except Exception:
            pass

should be

        try:
            os.close(self._rfd)
        except OSError:
            pass
        try:
            os.close(self._wfd)
        except OSError:
            pass

Author

Yes. The problem with catching only OSError is that if os.close() raises a different exception for whatever unknown reason, the pipe could go into an invalid state and potentially leak open file descriptors.

Author

        try:
            os.close(self._rfd)  # imagine this throws RuntimeError
        except OSError:
            pass
        try:
            os.close(self._wfd)  # in that case, _wfd would not be closed
        except OSError:
            pass
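
The difference between the two `except` clauses can be shown with an injected close function. A minimal sketch: `close_both`, `flaky_close`, and the fake descriptor numbers are hypothetical, and the `RuntimeError` is simulated to match the scenario in the comment above rather than observed from `os.close()`.

```python
FAKE_RFD, FAKE_WFD = 3, 4  # labels only, never passed to the OS


def close_both(close, rfd, wfd, catch):
    """Try to close both descriptors, swallowing only `catch` exceptions.

    `close` is injected so that a failure can be simulated. Returns the
    descriptors whose close call completed without raising.
    """
    closed = []
    for fd in (rfd, wfd):
        try:
            close(fd)
            closed.append(fd)
        except catch:
            pass
    return closed


def flaky_close(fd):
    # Simulated failure: a non-OSError raised for the read end only.
    if fd == FAKE_RFD:
        raise RuntimeError("simulated")
```

With `catch=Exception` the write end is still closed after the read end fails. With `catch=OSError` the `RuntimeError` propagates out of `close_both` before the write end is attempted.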

Contributor

@bskinn bskinn Oct 17, 2023

I'll leave these excepts as a decision for bitprophet.

@jun66j5
Contributor

jun66j5 commented Oct 11, 2023

This PR is to fix issues in PosixPipe. I think WindowsPipe has the same issue. Any plans to fix it?

@pettitpeon
Author

> This PR is to fix issues in PosixPipe. I think WindowsPipe has the same issue. Any plans to fix it?

TBH, I did not look into the Windows pipe, since I use paramiko only on Linux. If desired, I can look into it and replicate the changes in WindowsPipe. Still, I'd like to merge this fix ASAP so I can remove my local patches.

@bskinn
Contributor

bskinn commented Oct 17, 2023

I figure this is up to bitprophet whether he wants to merge with just the PosixPipe fix and leave the WindowsPipe fix for a later PR, or roll them all up in one.

Contributor

@bskinn bskinn left a comment

Formatting nit.

sites/www/changelog.rst Outdated
@bskinn
Contributor

bskinn commented Oct 17, 2023

It looks like the two failing CircleCI checks were due to an error on CircleCI's side. Likely all is ok with the test suite here.

Co-authored-by: Brian Skinn <brian.skinn@gmail.com>
@pettitpeon
Author

> It looks like the two failing CircleCI checks were due to an error on CircleCI's side. Likely all is ok with the test suite here.

Thanks, I applied your formatting suggestion. I'll await the final review. Cheers!

@bskinn
Contributor

bskinn commented Oct 17, 2023

Great, thanks! And, thanks for your patience here. bitprophet is in the middle of a dry spell in terms of his open source bandwidth, which has slowed everything down across his projects.

3 participants