--bisect deadlocks when reporting results #2637

Closed
DavidS opened this issue Jun 13, 2019 · 20 comments · Fixed by #2669
@DavidS

DavidS commented Jun 13, 2019

--bisect deadlocks when reporting results

david@davids:~/git/puppet-resource_api$ bundle exec rspec --seed 40589 --bisect=verbose --pattern spec/\*\*\{,/\*/\*\*\}/\*_spec.rb  --exclude-pattern 'spec/{fixtures/**/*.rb,fixtures/modules/*/**/*.rb}' 
Bisect started using options: "--seed 40589 --pattern spec/**{,/*/**}/*_spec.rb --exclude-pattern spec/{fixtures/**/*.rb,fixtures/modules/*/**/*.rb}" and bisect runner: :fork
Running suite to find failures...^C

Bisect aborted!

The most minimal reproduction command discovered so far is:
  (Not yet enough information to provide any repro command)


Bisect aborted!

The most minimal reproduction command discovered so far is:
  (Not yet enough information to provide any repro command)

david@davids:~/git/puppet-resource_api$

At the point of Ctrl+C, the process had already been sitting far longer than the test suite takes to run.

Running strace on the processes shows the following situation:

david@davids:~$ strace -s 10000 -p 6090 -p 6096
strace: Process 6090 attached
strace: Process 6096 attached
[pid  6090] wait4(6096,  <unfinished ...>
[pid  6096] ppoll([{fd=10, events=POLLOUT}], 1, NULL, NULL, 8^Cstrace: Process 6090 detached
strace: Process 6096 detached
 <detached ...>

david@davids:~$ 

where pid 6090 is the --bisect process and pid 6096 is the child rspec process. From other traces I've been running, I understand that the ppoll is waiting on an IO event after/while writing the results to fd 10. Meanwhile the main process is hanging in waitpid.

A likely reason this does not show up in testing is that the result report in the tests is smaller than the underlying OS's buffer size. In that case the runner process exits after writing to the buffer, and the parent happily continues reading from the buffer. In my case the test suite results are ~93kB and the processes deadlock.
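
To get a feel for the buffer size in question, the following sketch counts how many bytes a fresh pipe accepts before a non-blocking write would block. It is an illustration only: `pipe_capacity` is a hypothetical helper, not part of rspec-core, and the number it reports depends on the OS.

```ruby
# Hypothetical helper (not part of rspec-core): counts how many bytes fit
# into a fresh pipe when nobody reads from the other end.
def pipe_capacity(chunk_size = 1024)
  reader, writer = IO.pipe
  chunk = "x" * chunk_size
  total = 0
  loop do
    # With exception: false, write_nonblock returns :wait_writable instead
    # of raising once the kernel buffer is full.
    written = writer.write_nonblock(chunk, exception: false)
    break if written == :wait_writable
    total += written
  end
  total
ensure
  reader.close if reader
  writer.close if writer
end

capacity = pipe_capacity
puts "pipe accepted #{capacity} bytes before a write would block"
```

A blocking write of anything larger than the reported size cannot complete until the other end of the pipe is read.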

Your environment

  • Ruby version: ruby 2.5.5p157 (2019-03-15 revision 67260) [x86_64-linux-gnu]
  • rspec-core version: 3.8.0

Steps to reproduce

david@davids:~/git/puppet-resource_api$ bundle exec rspec --bisect=verbose --seed 40589 --pattern 'spec/**{,/*/**}/*_spec.rb' --exclude-pattern 'spec/{fixtures/**/*.rb,fixtures/modules/*/**/*.rb}'
Bisect started using options: "--seed 40589 --pattern spec/**{,/*/**}/*_spec.rb --exclude-pattern spec/{fixtures/**/*.rb,fixtures/modules/*/**/*.rb}" and bisect runner: :fork
Running suite to find failures...

whereas this produces results:

david@davids:~/git/puppet-resource_api$ bundle exec rspec --bisect=verbose --require shell --seed 40589 --pattern 'spec/**{,/*/**}/*_spec.rb' --exclude-pattern 'spec/{fixtures/**/*.rb,fixtures/modules/*/**/*.rb}'
Bisect started using options: "--require shell --seed 40589 --pattern spec/**{,/*/**}/*_spec.rb --exclude-pattern spec/{fixtures/**/*.rb,fixtures/modules/*/**/*.rb}" and bisect runner: :shell
Running suite to find failures... (2 minutes 48.2 seconds)
 - Failing examples (1):
    - ./spec/puppet/resource_api_spec.rb[1:16:2:1]
 - Non-failing examples (1398):
    - ./spec/acceptance/array_spec.rb[1:1:1]
[...]
@benoittgt
Member

Thanks a lot for this detailed answer and the clues.

I would love to take a deeper look at this one. In the meantime, did you see the discussion in rspec/rspec-rails#1353?

@JonRowe
Member

JonRowe commented Jun 23, 2019

@DavidS I'd love it if you could provide us with an isolated reproduction of this. My time is quite limited at the moment and I won't be able to get to a reproduction containing complex code like puppet. An isolated reproduction would also exclude the possibility of puppet itself causing the deadlock.

If it is just a buffer issue as you suggest, it should be possible to trigger it with only "RSpec" code, right?

Alternatively I'm open to suggestions for detecting deadlocks and preventing them within RSpec itself?

@benoittgt
Member

Alternatively I'm open to suggestions for detecting deadlocks and preventing them within RSpec itself?

Definitely 💚

@DavidS
Author

DavidS commented Jun 24, 2019

require 'rspec'

RSpec.describe "a bunch of nothing" do
  (0...3000).each do |t|
    it { expect(t).to eq t }
  end
end

is an example that immediately deadlocks for me when running under rspec --bisect:

david@davids:~/tmp/rspec-deadlock-example$ bundle exec rspec spec/nil_spec.rb --bisect 
Bisect started using options: "spec/nil_spec.rb"
Running suite to find failures...^C

Bisect aborted!

The most minimal reproduction command discovered so far is:
  (Not yet enough information to provide any repro command)


Bisect aborted!

The most minimal reproduction command discovered so far is:
  (Not yet enough information to provide any repro command)

david@davids:~/tmp/rspec-deadlock-example$ cat spec/nil_spec.rb 
root@davids:~# strace -p 27326 -p 27329
strace: Process 27326 attached
strace: Process 27329 attached
[pid 27326] wait4(27329,  <unfinished ...>
[pid 27329] ppoll([{fd=8, events=POLLOUT}], 1, NULL, NULL, 8^Cstrace: Process 27326 detached
strace: Process 27329 detached
 <detached ...>

root@davids:~# 

By changing the 3000, you can make the result set arbitrarily large if your OS has a larger default buffer size.

@JonRowe
Member

JonRowe commented Jun 24, 2019

Thanks, yes, that triggers the issue for me. As a workaround the shell runner of course works, but then I guess that's why you have #2638 open 😂

@DavidS
Author

DavidS commented Jun 24, 2019

Exactly :-D

Thanks a lot for the time and work y'all put into rspec.

@benoittgt
Member

Thanks. I was able to reproduce it on my mac with a very basic example like the one you mentioned.

https://github.com/benoittgt/rspec_repro_bisect_deadlock

@benoittgt
Member

A very interesting answer has been posted by @palkan here: benoittgt/rspec_repro_bisect_deadlock#1

I also had success removing Process.waitpid, but I was not sure it was a good idea. I didn't understand its usage here.

@DavidS
Author

DavidS commented Jul 5, 2019

@palkan comes to the same conclusion that is in my original analysis. Process.waitpid does some OS-level cleanup that is sometimes necessary for the overall health of the OS. In this case of a single child process, we can be lazy and not do it at all, or defer it until all output is read and processed.
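
For context, the cleanup in question is reaping: until the parent collects the exit status, a finished child stays in the process table as a zombie. One way to get reaping without a blocking wait on the main thread is Ruby's `Process.detach`, sketched below. This is an alternative that nobody in this thread proposed or tested against rspec-core; it is shown only to illustrate the concept.

```ruby
pid = fork { exit!(0) }   # child exits immediately

# Process.detach starts a thread whose only job is to wait for the child.
watcher = Process.detach(pid)
status = watcher.value    # Process::Status once the child has been reaped

# A second wait on the same pid fails because the child is already reaped.
already_reaped =
  begin
    Process.waitpid(pid)
    false
  rescue Errno::ECHILD
    true
  end

puts "exit ok: #{status.success?}, already reaped: #{already_reaped}"
```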

@benoittgt
Member

benoittgt commented Jul 5, 2019

Thanks a lot @DavidS for this additional information.

What do you think about @palkan's proposal?

Use IO#write_nonblock should work. That would require changes in both send and receive methods.

I tried this in utilities.rb:

        def send(message)
          packet = Marshal.dump(message)
-         @write_io.write("#{packet.bytesize}\n#{packet}")
+         @write_io.write_nonblock("#{packet.bytesize}\n#{packet}")
        end

        # rubocop:disable Security/MarshalLoad
        def receive
          packet_size = Integer(@read_io.gets)
-         Marshal.load(@read_io.read(packet_size))
+         Marshal.load(@read_io.read_nonblock(packet_size))

but

Traceback (most recent call last):
	18: from /Users/benoit.tigeot/.rvm/gems/ruby-2.5.1/bin/rspec:23:in `<main>'
	17: from /Users/benoit.tigeot/.rvm/gems/ruby-2.5.1/bin/rspec:23:in `load'
	16: from /Users/benoit.tigeot/.rvm/gems/ruby-2.5.1/gems/rspec-core-3.8.1/exe/rspec:4:in `<top (required)>'
	15: from /Users/benoit.tigeot/.rvm/gems/ruby-2.5.1/gems/rspec-core-3.8.1/lib/rspec/core/runner.rb:45:in `invoke'
	14: from /Users/benoit.tigeot/.rvm/gems/ruby-2.5.1/gems/rspec-core-3.8.1/lib/rspec/core/runner.rb:69:in `run'
	13: from /Users/benoit.tigeot/.rvm/gems/ruby-2.5.1/gems/rspec-core-3.8.1/lib/rspec/core/invocations.rb:36:in `call'
	12: from /Users/benoit.tigeot/.rvm/gems/ruby-2.5.1/gems/rspec-core-3.8.1/lib/rspec/core/bisect/coordinator.rb:17:in `bisect_with'
	11: from /Users/benoit.tigeot/.rvm/gems/ruby-2.5.1/gems/rspec-core-3.8.1/lib/rspec/core/bisect/coordinator.rb:27:in `bisect'
	10: from /Users/benoit.tigeot/.rvm/gems/ruby-2.5.1/gems/rspec-core-3.8.1/lib/rspec/core/bisect/coordinator.rb:49:in `start_bisect_runner'
	 9: from /Users/benoit.tigeot/.rvm/gems/ruby-2.5.1/gems/rspec-core-3.8.1/lib/rspec/core/bisect/fork_runner.rb:38:in `start'
	 8: from /Users/benoit.tigeot/.rvm/gems/ruby-2.5.1/gems/rspec-core-3.8.1/lib/rspec/core/bisect/coordinator.rb:31:in `block in bisect'
	 7: from /Users/benoit.tigeot/.rvm/gems/ruby-2.5.1/gems/rspec-core-3.8.1/lib/rspec/core/bisect/example_minimizer.rb:20:in `find_minimal_repro'
	 6: from /Users/benoit.tigeot/.rvm/gems/ruby-2.5.1/gems/rspec-core-3.8.1/lib/rspec/core/bisect/example_minimizer.rb:117:in `prep'
	 5: from /Users/benoit.tigeot/.rvm/gems/ruby-2.5.1/gems/rspec-core-3.8.1/lib/rspec/core/bisect/example_minimizer.rb:155:in `track_duration'
	 4: from /Users/benoit.tigeot/.rvm/gems/ruby-2.5.1/gems/rspec-core-3.8.1/lib/rspec/core/bisect/example_minimizer.rb:118:in `block in prep'
	 3: from /Users/benoit.tigeot/.rvm/gems/ruby-2.5.1/gems/rspec-core-3.8.1/lib/rspec/core/bisect/fork_runner.rb:59:in `original_results'
	 2: from /Users/benoit.tigeot/.rvm/gems/ruby-2.5.1/gems/rspec-core-3.8.1/lib/rspec/core/bisect/fork_runner.rb:71:in `dispatch_run'
	 1: from /Users/benoit.tigeot/.rvm/gems/ruby-2.5.1/gems/rspec-core-3.8.1/lib/rspec/core/bisect/utilities.rb:47:in `receive'
/Users/benoit.tigeot/.rvm/gems/ruby-2.5.1/gems/rspec-core-3.8.1/lib/rspec/core/bisect/utilities.rb:47:in `load': marshal data too short (ArgumentError)

I am OK with removing the waitpid.

@DavidS
Author

DavidS commented Jul 5, 2019

Reading through the docs of write_nonblock (https://apidock.com/ruby/IO/write_nonblock), I also think that in the original error case you would just get an Errno::EWOULDBLOCK instead of writing the data out.

@palkan

palkan commented Jul 5, 2019

the original error case you would just get an Errno::EWOULDBLOCK

I tried this: it fills the buffer and returns 65536. Not sure when Errno::EWOULDBLOCK could happen.
So, that would be a partial write (and that's why we see marshal data too short).
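
The partial write and the resulting error can be reproduced outside RSpec. The sketch below uses the same length-prefixed framing as `Channel#send`, but the payload size and variable names are assumptions made for illustration, and the writer deliberately never retries the unwritten remainder.

```ruby
reader, writer = IO.pipe

# Same framing as Channel#send: "<bytesize>\n<marshalled payload>".
packet = Marshal.dump("x" * 4_000_000)   # far larger than a typical pipe buffer
message = "#{packet.bytesize}\n#{packet}"

# On an empty pipe this writes only as much as the kernel buffer holds.
written = writer.write_nonblock(message)
writer.close   # no retry: the rest of the message is never sent

declared_size = Integer(reader.gets)
received = reader.read(declared_size)   # returns fewer bytes at end-of-file
reader.close

error =
  begin
    Marshal.load(received)
    nil
  rescue ArgumentError => e
    e
  end

puts "wrote #{written} of #{message.bytesize} bytes"
puts "received #{received.bytesize} of #{declared_size} bytes: #{error && error.message}"
```

The receiver trusts the declared size, so a truncated packet only surfaces once `Marshal.load` runs out of data.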

My initial suggestion that we can leverage write_nonblock was not correct: we need to initiate the read before calling waitpid. That's the main problem.

Probably, the following refactoring could be a bit better than just dropping Process.waitpid:

def dispatch_run(run_descriptor)
- @run_dispatcher.dispatch_specs(run_descriptor)
+ pid = @run_dispatcher.dispatch_specs(run_descriptor)
  @channel.receive.tap do |result|
    # ...
    Process.waitpid(pid)
  end
end
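
The ordering described here, read first and reap afterwards, can be sketched outside RSpec as follows. `run_in_child` is a hypothetical stand-in for the fork runner, not the actual rspec-core code, and the 200,000-byte payload is an arbitrary size chosen to exceed a typical 64 KiB pipe buffer.

```ruby
# Hypothetical stand-in for the bisect fork runner: the child sends one
# length-prefixed, marshalled message; the parent reads it and then reaps.
def run_in_child(payload)
  reader, writer = IO.pipe

  pid = fork do
    reader.close
    packet = Marshal.dump(payload)
    # Blocks once the pipe buffer is full, until the parent starts reading.
    writer.write("#{packet.bytesize}\n#{packet}")
    writer.close
    exit!(0)   # skip at_exit hooks inherited from the parent
  end
  writer.close

  # Read first: this drains the pipe so the child's write can finish.
  packet_size = Integer(reader.gets)
  result = Marshal.load(reader.read(packet_size))

  # Reap afterwards: by now the child has nothing left to write.
  _, status = Process.wait2(pid)
  reader.close
  [result, status]
end

result, status = run_in_child("x" * 200_000)
puts "received #{result.bytesize} bytes, child exited cleanly: #{status.success?}"
```

Swapping the two steps, so that the wait comes before the read, recreates the hang reported at the top of this issue whenever the message exceeds the pipe buffer.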

@JonRowe
Member

JonRowe commented Jul 8, 2019

@palkan do you feel like working up a patch with something like that?

@palkan

palkan commented Jul 8, 2019

@JonRowe Not in the next couple of weeks. I can try to find another evil martian to help with this)

@benoittgt
Member

benoittgt commented Jul 10, 2019

It works with @palkan's proposal:

@@ -1,9 +1,10 @@
         def dispatch_run(run_descriptor)
-          @run_dispatcher.dispatch_specs(run_descriptor)
+          pid = @run_dispatcher.dispatch_specs(run_descriptor)
           @channel.receive.tap do |result|
             if result.is_a?(String)
               raise BisectFailedError.for_failed_spec_run(result)
             end
+            Process.waitpid(pid)
           end
         end

@@ -23,6 +24,5 @@
           end

           def dispatch_specs(run_descriptor)
-            pid = fork { run_specs(run_descriptor) }
-            Process.waitpid(pid)
+            fork { puts run_specs(run_descriptor) }
           end

But I haven't yet verified what David mentioned:

Process.waitpid does some OS-level cleanup that is sometimes necessary for the overall health of the OS

I dug into the MRI source code, but at the moment I have no clue what can happen when it does not "clean up". I don't see it.

Another proposal would be to use WNOHANG. From the waitpid doc:

If WNOHANG was specified in options and there were no children in a waitable state, then waitid() returns 0 immediately (...)

--- a/lib/rspec/core/bisect/fork_runner.rb
+++ b/lib/rspec/core/bisect/fork_runner.rb
@@ -92,7 +92,7 @@ module RSpec

           def dispatch_specs(run_descriptor)
             pid = fork { run_specs(run_descriptor) }
-            Process.waitpid(pid)
+            Process.waitpid(pid, Process::WNOHANG)

This works too.
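
To make the effect of the flag concrete: in Ruby, `Process.waitpid(pid, Process::WNOHANG)` returns `nil` when the child has not exited yet, and the child is not reaped by that call. The sketch below is a standalone illustration with made-up variable names, not rspec-core code; the child is kept alive on purpose so the outcome does not depend on timing.

```ruby
reader, writer = IO.pipe

pid = fork do
  writer.close
  reader.read   # block until the parent closes its write end
  exit!(0)
end
reader.close

# The child is still blocked in read, so the non-blocking wait returns nil.
still_running = Process.waitpid(pid, Process::WNOHANG)

writer.close                   # let the child finish
reaped = Process.waitpid(pid)  # blocking wait: returns the pid once reaped

puts "WNOHANG result: #{still_running.inspect}, reaped pid matches: #{reaped == pid}"
```

If the child is still busy writing when the `WNOHANG` call is made, the call returns immediately and the parent is free to start reading.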

@benoittgt benoittgt self-assigned this Sep 2, 2019
@benoittgt
Member

I recently used my last proposal, the one with Process::WNOHANG, in a project where bisect was stuck. It gave us a result instead of a locked bisect.

Don't you think a PR with that change would be a good idea? I'm not sure how we can test this at the moment. I have to think about it.

benoittgt added a commit that referenced this issue Sep 19, 2019
From Palkan in benoittgt/rspec_repro_bisect_deadlock#1

First, I've tried to play with the number of specs which led to the
interesting conclusion: **the process hangs only at 1548+ specs**.

```diff
 RSpec.describe "a bunch of nothing" do
   (0...(ENV.fetch('N', 3000).to_i)).each do |t|
     it { expect(t).to eq t }
   end
 end
```

Try to run with `N=1547` and `N=1548`.

Seems suspicious, right?

Let's add `pry-byebug` to the equation (or Gemfile).

In order for it to work, we need to tweak our runner code a bit:

```diff
- $stdout = $stderr = @spec_output
+ # $stdout = $stderr = @spec_output
```

After a bit of `puts` debugging I localized the problem:
[`@channel.send`](https://github.com/rspec/rspec-core/blob/7b6b9c3f2e2878213f97d6fc9e9eb23c323cfe1c/lib/rspec/core/bisect/fork_runner.rb#L122).

 `Channel#send` calls `IO#write` here
 https://github.com/rspec/rspec-core/blob/7b6b9c3f2e2878213f97d6fc9e9eb23c323cfe1c/lib/rspec/core/bisect/utilities.rb#L41:

 ```ruby
 def send(message)
   packet = Marshal.dump(message)
   @write_io.write("#{packet.bytesize}\n#{packet}")
 end
 ```

 Do you know what the `packet.bytesize` is for `N=1548`? It's **65548**.
 This number is very important: the pipe size is only **65536** on macOS
 (see docs for [`IO#write_nonblock`](https://ruby-doc.org/core-2.6.3/IO.html#method-i-write_nonblock)
 for more).

 That makes `@write_io.write` hang forever, because no one reads the
 buffer: we call `Channel#receive` only after `Process.waitpid(pid)`,
 thus waiting for the write operation to complete.

 -----------

 A basic proposal is to use WNOHANG. From waitpid doc:

 > If WNOHANG was specified in options and there were no children
 > in a waitable state, then waitid() returns 0 immediately (...)

 Related:
 - #2637
@hayesr

hayesr commented Oct 14, 2019

@benoittgt's Process::WNOHANG solution also worked for me.

benoittgt added a commit that referenced this issue Nov 25, 2019
 A basic proposal is to use WNOHANG. From waitpid doc:

 > If WNOHANG was specified in options and there were no children
 > in a waitable state, then waitid() returns 0 immediately (...)

 To validate this proposal on OSX, we run the following just before running bisect:
 `lsof -n -P -r1 -c ruby | grep -e 'PIP' -e '===' -e 'COMMAND'`

 This prints, in a loop, the pipe sizes of Ruby processes. Without our
 patch we quickly hit 65536 bytes on two pipes; with the patch
 the pipes stay at the right size.

```
COMMAND PID    USER     FD   TYPE DEVICE                SIZE/OFF  NODE NAME
ruby    40134  benoit    3   PIPE 0xf3b025a6a6cd6005    16384     ->0xf3b025a6a6cd5045
ruby    40134  benoit    4   PIPE 0xf3b025a6a6cd5045    16384     ->0xf3b025a6a6cd6005
ruby    40134  benoit    5   PIPE 0xf3b025a6a6cd7805    16384     ->0xf3b025a6a6cd7145
ruby    40134  benoit    7   PIPE 0xf3b025a6a6cd7145    16384     ->0xf3b025a6a6cd7805
ruby    40134  benoit   10   PIPE 0xf3b025a6a6cd6fc5    16384     ->0xf3b025a6a6cd5a05
ruby    40134  benoit   11   PIPE 0xf3b025a6a6cd5a05    16384     ->0xf3b025a6a6cd6fc5
ruby    40144  benoit    3   PIPE 0xf3b025a6a6cd5d05    16384     ->0xf3b025a6a6cd5c45
ruby    40144  benoit    4   PIPE 0xf3b025a6a6cd5c45    16384     ->0xf3b025a6a6cd5d05
ruby    40144  benoit    5   PIPE 0xf3b025a6a6cd7085    16384     ->0xf3b025a6a6cd6785
ruby    40144  benoit    7   PIPE 0xf3b025a6a6cd6785    16384     ->0xf3b025a6a6cd7085
ruby    40144  benoit   10   PIPE 0xf3b025a6a6cd6fc5    16384     ->0xf3b025a6a6cd5a05
ruby    40144  benoit   11   PIPE 0xf3b025a6a6cd5a05    16384     ->0xf3b025a6a6cd6fc5
```

Improvements:
The bisect command requests lots of RAM. The next step should be to
reduce that consumption.

Related:
 - #2637
benoittgt added a commit that referenced this issue Nov 26, 2019
But if we read the doc carefully, we can go even further.

> If status information is immediately available on an appropriate child process,
waitpid() returns this information. Otherwise, waitpid() returns immediately with
an error code indicating that the information was not available. In other words,
WNOHANG checks child processes without causing the caller to be suspended.

and as pirj mentions: "With this in mind, do we really need to check that information
that waitpid returns? We don't seem to use it."

Removing "waitpid" produces the same behavior as with `WNOHANG`.

Improvements:
The bisect command requests lots of RAM. The next step should be to
reduce that consumption.

Related:
 - fix: #2637
 - PR discussion: #2669
benoittgt added a commit that referenced this issue Nov 26, 2019
benoittgt added a commit that referenced this issue Dec 6, 2019
benoittgt added a commit that referenced this issue Dec 6, 2019
From Palkan in benoittgt/rspec_repro_bisect_deadlock#1

First, I've tried to play with the number of specs which led to the
interesting conclusion: **the process hangs only at 1548+ specs**.

```diff
 RSpec.describe "a bunch of nothing" do
   (0...(ENV.fetch('N', 3000).to_i)).each do |t|
     it { expect(t).to eq t }
   end
 end
```

Try to run with `N=1547` and `N=1548`.

Seems suspicious, right?

Let's add `pry-byebug` to the equation (or Gemfile).

In order it to work we need to tweak our runner code a bit:

```diff
- $stdout = $stderr = @spec_output
+ # $stdout = $stderr = @spec_output
```

After a bit of `puts` debugging I localized the problem:
[`@channel.send`](/lib/rspec/core/bisect/fork_runner.rb@7b6b9c3#L122).

 `Channel#send` calls `IO#write` here
 /lib/rspec/core/bisect/utilities.rb@7b6b9c3#L41:

 ```ruby
 def send(message)
   packet = Marshal.dump(message)
   @write_io.write("#{packet.bytesize}\n#{packet}")
 end
 ```

 Do you know, what is the `packet.bytesize` for `N=1548`? It's **65548**.
 This number is very important: the pipe size is only **65536** on MacOS
 (see docs for [`IO#write_nonblock`](ruby-doc.org/core-2.6.3/IO.html#method-i-write_nonblock)
 for more).

 That makes `@write_io.write` hangs forever, because no one reads the
 buffer: we call `Channel#receive` only after `Process.waitpid(pid)`,
 thus waiting for the write operation to complete.

 -----------

 A basic proposal will be to use WNOHANG. From waitpid doc:

 > If WNOHANG was specified in options and there were no children
 > in a waitable state, then waitid() returns 0 immediately (...)

 To validate this proposal on OSX we run just before running bisect:
 `lsof -n -P -r1 -c ruby | grep -e 'PIP' -e '===' -e 'COMMAND'`

 This will give us in loop the PIPE sizes of Ruby processes. Without our
 patch we see that quickly we hit 65536 bytes on two pipes, with the patch
 we keep pipes at the right size.

```
COMMAND PID    USER     FD   TYPE DEVICE                SIZE/OFF  NODE NAME
ruby    40134  benoit    3   PIPE 0xf3b025a6a6cd6005    16384     ->0xf3b025a6a6cd5045
ruby    40134  benoit    4   PIPE 0xf3b025a6a6cd5045    16384     ->0xf3b025a6a6cd6005
ruby    40134  benoit    5   PIPE 0xf3b025a6a6cd7805    16384     ->0xf3b025a6a6cd7145
ruby    40134  benoit    7   PIPE 0xf3b025a6a6cd7145    16384     ->0xf3b025a6a6cd7805
ruby    40134  benoit   10   PIPE 0xf3b025a6a6cd6fc5    16384     ->0xf3b025a6a6cd5a05
ruby    40134  benoit   11   PIPE 0xf3b025a6a6cd5a05    16384     ->0xf3b025a6a6cd6fc5
ruby    40144  benoit    3   PIPE 0xf3b025a6a6cd5d05    16384     ->0xf3b025a6a6cd5c45
ruby    40144  benoit    4   PIPE 0xf3b025a6a6cd5c45    16384     ->0xf3b025a6a6cd5d05
ruby    40144  benoit    5   PIPE 0xf3b025a6a6cd7085    16384     ->0xf3b025a6a6cd6785
ruby    40144  benoit    7   PIPE 0xf3b025a6a6cd6785    16384     ->0xf3b025a6a6cd7085
ruby    40144  benoit   10   PIPE 0xf3b025a6a6cd6fc5    16384     ->0xf3b025a6a6cd5a05
ruby    40144  benoit   11   PIPE 0xf3b025a6a6cd5a05    16384     ->0xf3b025a6a6cd6fc5
```

But if we read the doc carefully, we can go even further.

> If status information is immediately available on an appropriate child process,
waitpid() returns this information. Otherwise, waitpid() returns immediately with
an error code indicating that the information was not available. In other words,
WNOHANG checks child processes without causing the caller to be suspended.

And as pirj mentions: "With this in mind, do we really need to check that information
that waitpid returns? We don't seem to use it."

Removing "waitpid" produces the same behavior as with `WNOHANG`.
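
The ordering issue can be sketched outside of RSpec. The example below is an illustration, not the actual fork runner: `run_child_and_collect` is a hypothetical helper that reuses the length-prefixed framing from `Channel#send`, and the parent-side reading code is an assumption about how such a packet can be decoded. The parent drains the pipe before it waits for the child, so a payload larger than the pipe buffer does not block.

```ruby
# Hypothetical helper: the child sends one large packet, the parent reads it
# before reaping the child.
def run_child_and_collect(payload_size)
  reader, writer = IO.pipe
  pid = fork do
    reader.close
    packet = Marshal.dump("x" * payload_size)
    # Same framing as Channel#send: byte size, newline, then the payload.
    writer.write("#{packet.bytesize}\n#{packet}")
    writer.close
  end
  writer.close

  # Drain the pipe first so the child's blocking write can complete...
  size = Integer(reader.gets)
  message = Marshal.load(reader.read(size))
  reader.close

  # ...and only then reap the child.
  Process.waitpid(pid)
  message
end

result = run_child_and_collect(200_000)
puts result.bytesize
```

Swapping the two steps (waiting first, reading second) brings back the hang for any payload that does not fit in the pipe buffer.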

Improvements:
The bisect command requests lots of RAM. The next step should be to
reduce that consumption.

Related:
 - fix: #2637
 - PR discussion: #2669
benoittgt added a commit that referenced this issue Dec 14, 2019
benoittgt added a commit that referenced this issue Dec 14, 2019
@agibralter
Contributor

I just ran into this—I'm on rspec-core 3.9.2 on OS X and still seem to get the hanging behavior right before it outputs the failing and non-failing examples. Using shell runner works... Does anyone have suggestions on how to debug what is going on?

@benoittgt
Member

Hello @agibralter

There are many commands in the PR that may help you understand what is happening.

If you manage to find an easy way to reproduce it, I would be very happy to patch it.

MatheusRich pushed a commit to MatheusRich/rspec-core that referenced this issue Oct 30, 2020
@mikejarema
Contributor

mikejarema commented Jan 21, 2021

I've contributed a PR which addresses a specific situation in which rspec --bisect hangs.

Specifically, it happens when the encoding is set to UTF-8 (e.g. by Rails) and the forked child process tries to send BINARY/ASCII-8BIT encoded data back to the parent, but the communication channel (Bisect::Channel) silently rejects it because of the encoding mismatch.

The parent process, whose output you're seeing, waits indefinitely for the child process which has since errored out.
