
Add support for new thread scheduler of Ruby-3.0 #799

Open · wants to merge 1 commit into base: master
Conversation

@larskanis (Member) commented Jul 8, 2020

The scheduler feature is described in: https://bugs.ruby-lang.org/issues/16786

To avoid blocking the current ruby thread while calling into C, calls can be executed in a dedicated pthread. This happens when the current thread has a scheduler assigned via Thread.current.scheduler= and the function is marked as blocking: true.

A pipe is used to signal the end of a call and the scheduler is invoked in order to wait for readability of the pipe. This way the scheduler can yield to another fiber or do other work instead of blocking the thread until the C call finishes.
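The pipe-based completion signal can be sketched in plain Ruby, without FFI; here a Ruby worker thread stands in for the pthread running the C call (the computation is just a placeholder):

```ruby
r, w = IO.pipe
result = nil

worker = Thread.new do
  result = 6 * 7   # stands in for the blocking C call running off-thread
  w.write("x")     # one byte on the pipe signals completion
end

IO.select([r])     # a scheduler hook could yield to other fibers here
r.read(1)          # consume the completion byte
worker.join
```

While `IO.select` blocks here, a scheduler would instead register the read end with its event loop and resume other fibers until the pipe becomes readable.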

The current implementation does not yield any callbacks back to the calling thread. Instead all callbacks invoked in this way are handled as asynchronous callbacks. This means that each callback is executed in a dedicated ruby thread.

cc: @ioquatix

@larskanis (Member, Author) commented:

The feature can be tested by something like this:

require "ffi"

class Scheduler
  def for_fd(fd)
    ::IO.for_fd(fd, autoclose: false)
  end
  def wait_readable_fd(fd)
    wait_readable(for_fd(fd))
  end

  def wait_readable(io)
    p wait_readable_start: io
    IO.select([io])
    p wait_readable_end: io
  end

  def enter_blocking_region
    puts "Enter blocking region: #{caller.first}"
  end

  def exit_blocking_region
    puts "Exit blocking region: #{caller.first}"
  end

  def fiber(&block)
    fiber = Fiber.new(blocking: false, &block)
    fiber.resume
    return fiber
  end
end

Thread.current.scheduler = Scheduler.new

module Native
  extend FFI::Library
  ffi_lib :c

  attach_function :sleep, [:uint], :uint, blocking: true

  callback :qsort_cmp, [ :pointer, :pointer ], :int
  attach_function :qsort, [ :pointer, :int, :int, :qsort_cmp ], :void, blocking: true  # qsort() returns void
end

Fiber do
  p native_sleep: :start
  r = Native.sleep 1
  p native_sleep: r
end

Fiber do
  arr = [2, 1, 3]
  pa = FFI::MemoryPointer.new(:int, arr.size)
  pa.write_array_of_int32(arr)
  Native.qsort(pa, arr.size, FFI.find_type(:int).size) do |p1, p2|
    p Thread.current
    p1.read_int <=> p2.read_int
  end
  p pa.read_array_of_int32(arr.size)
end

It prints:

{:native_sleep=>:start}
{:wait_readable_start=>#<IO:fd 5>}
{:wait_readable_end=>#<IO:fd 5>}
{:native_sleep=>0}

{:wait_readable_start=>#<IO:fd 5>}
#<Thread:0x00005583304cf938 run>
#<Thread:0x00005583304cf5c8 run>
#<Thread:0x00005583304cf460 run>
{:wait_readable_end=>#<IO:fd 5>}
[1, 2, 3]

@ioquatix (Contributor) commented Jul 8, 2020

This is very cool.

@larskanis (Member, Author) commented Jul 8, 2020

Currently, callback blocks are executed in a dedicated ruby thread if they use this PR's feature. I don't think this is desired for fiber-based event loops. So I think it makes sense to pass callbacks back to the same thread that made the C call which invoked the callback pointer.

So given this script:

p Thread.current
Native.qsort(pa, arr.size, FFI.find_type(:int).size) do |p1, p2|
  p Thread.current
  p1.read_int <=> p2.read_int
end

It should use only one thread and the output should be something like:

#<Thread:0x00005583304cf938 run>
{:wait_readable_start=>#<IO:fd 5>}
#<Thread:0x00005583304cf938 run>
#<Thread:0x00005583304cf938 run>
#<Thread:0x00005583304cf938 run>
{:wait_readable_end=>#<IO:fd 5>}

This could be achieved by using the call frame that ruby-ffi manages for each thread and each call into C. This way we can trace back from the callback to the originating ruby thread and deliver the callback by passing this information through the same pipe that the originating ruby thread is waiting on. The scheduler is then notified about the pending callback and resumes the related fiber, which executes the callback block (instead of returning from the C call). A very similar mechanism is used in Eventbox.

If there's no call frame, or it doesn't have a pipe to signal this callback, it would be executed in a dedicated thread as is currently done. In that case the C library invoked the callback from a non-ruby thread or from a ruby thread without a scheduler, and we have no chance to deliver it to a related thread/fiber.
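The routing idea can be sketched in plain Ruby (a Queue stands in for the signalling pipe, and all names here are made up for the sketch): the worker sends either a callback request or a completion message, and the calling thread services callbacks until the call finishes.

```ruby
channel = Queue.new          # stands in for the per-call signalling pipe
cb = ->(x) { x * 2 }         # the Ruby callback block

worker = Thread.new do
  # Stands in for the C call invoking the callback pointer: ask the
  # caller to run the callback, wait for its result, then finish.
  reply = Queue.new
  channel.push([:callback, 21, reply])
  channel.push([:done, reply.pop])
end

result = loop do
  tag, payload, reply = channel.pop
  case tag
  when :callback then reply.push(cb.call(payload))  # run on the calling thread
  when :done     then break payload                 # C call returned
  end
end
worker.join
```

With a scheduler in place, the `channel.pop` loop would instead be driven by the pipe-readability notification, resuming the fiber that issued the C call.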

@ioquatix What do you think about routing callbacks back to the causing C call? Or is that useless?

@ioquatix (Contributor) commented Jul 9, 2020

I think it's better it runs in the same thread.

There has been some discussion about how to send events from different threads into a scheduler. We don't have a firm plan yet but at least it's being considered.

Having some use cases like this can help immensely with firming up a specific interface and implementation, so when I circle back to the scheduler interface (hopefully before the end of this month) I'll try to consider how this should work.

Regarding your specific implementation, I feel very strongly that you have a good opinion about how this should work, so I welcome your feedback and direction w.r.t. this functionality.

@eregon (Collaborator) commented Jul 16, 2020

Looks like a nice prototype, I have a few questions:

  • How to handle it if a native call depends on the specific thread it's called on (e.g., it uses pthread_key_t)?
  • How to guarantee that multiple calls from the same Ruby Thread will execute on the same pthread for FFI, or how to avoid races with native code which expects FFI calls from a Ruby Thread to be sequential (i.e., to see all effects of the previous FFI calls from that Ruby Thread without data races)?
  • How to handle callbacks? They can't run on a raw pthread. And yet they should also have no data races with the FFI call invoking the callback. If executed on the original Ruby Thread, then we just need to "publish" the changes made so far on the pthread to the Ruby Thread before entering the callback.
  • Spawning a pthread for every native call (as in this prototype) is very expensive; we should reuse the pthread, I think per Ruby Thread, to guarantee the points above. It would be interesting to compare the performance of FFI calls with and without a scheduler.

@larskanis (Member, Author) commented Jul 16, 2020

  • How to handle it if a native call depends on the specific thread it's called on (e.g., it uses pthread_key_t)?

It's not possible to call C functions in a non-blocking fashion from a fiber/scheduler based thread. These are conflicting computation models. So there are two alternative options:

  1. Call the C function with blocking: false. This obviously blocks all fibers managed by the thread scheduler.
  2. Use a dedicated ruby thread without a scheduler, which allows you to explicitly control which C functions are executed within which thread. This could be a kind of worker thread which gets its parameters via a Queue, calls the function with blocking: true (to release the GVL) and passes return values back through another Queue to the fiber/scheduler based thread. Some discussions about connecting Queue, Mutex, etc. to the new scheduler are here: https://bugs.ruby-lang.org/issues/16792
     Maybe this kind of feature could be added as an additional class to FFI, but plain attach_function calls shouldn't implement it.
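Option 2 can be sketched in plain Ruby; summing an array stands in for the blocking C call, and the Queue names are made up for the sketch:

```ruby
requests = Queue.new
results  = Queue.new

# Dedicated worker thread without a scheduler: it receives arguments via
# one Queue and would call the C function with blocking: true (releasing
# the GVL); summing the array stands in for the native call here.
worker = Thread.new do
  while (args = requests.pop)
    results.push(args.sum)
  end
end

requests.push([1, 2, 3])
answer = results.pop   # the fiber/scheduler thread blocks only on the Queue
requests.push(nil)     # shut the worker down
worker.join
```

This keeps every native call on one well-known thread, which also addresses the thread-affinity concern above.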
  • How to guarantee that multiple calls from the same Ruby Thread will execute on the same pthread for FFI, or how to avoid races with native code which expects FFI calls from a Ruby Thread to be sequential (i.e., to see all effects of the previous FFI calls from that Ruby Thread without data races)?

I don't think it's FFI's task to avoid data races in the C library or to sequentialize calls to C functions. Different libraries have different requirements, and FFI should be flexible enough to allow all of them. If a ruby library calls C functions concurrently, it's the developer's task to verify that the C library allows this. So there is no such guarantee, and I don't think we should enforce one.

  • How to handle callbacks? They can't run on a raw pthread. And yet they should also have no data races with the FFI call invoking the callback. If executed on the original Ruby Thread, then we just need to "publish" the changes made so far on the pthread to the Ruby Thread before entering the callback.

All callbacks invoked by the mechanism of this PR are handled as asynchronous callbacks, because they are identified as coming from a native pthread (and not a ruby thread). That means that each callback is executed in a dedicated ruby thread. To avoid data races on the ruby side, I posted my ideas in the comment above.

  • Spawning a pthread for every native call (as in this prototype) is very expensive; we should reuse the pthread, I think per Ruby Thread, to guarantee the points above. It would be interesting to compare the performance of FFI calls with and without a scheduler.

I experimented with thread pooling in Eventbox. My result was that, due to the management overhead, a thread pool is not significantly faster than dedicated threads. On the downside, it can lead to hard-to-reproduce deadlocks if the thread pool is limited in size. For now I would like to keep it at one pthread per C call. A thread pool is something we could implement and benchmark in the future.

@eregon (Collaborator) commented Jul 21, 2020

Re 1: Right, blocking: true is a way to mark such calls. I meant we should reuse the same "blocking native calls" thread, so that thread-local state (e.g. via pthread_key_t) doesn't "disappear" between two calls from the same Ruby Thread. As an example, calling pthread_mutex_lock and pthread_mutex_unlock from the same Fiber via FFI only works if both execute on the same native thread.
Re 2: I meant a single Ruby Thread with multiple Fibers should probably not schedule native calls concurrently, or should it? If there is one "blocking native calls" thread, then it's not an issue.
Re 3: Executing callbacks on the same Ruby thread seems best, especially on implementations without a GVL. That way e.g. Fiber-local state is preserved as before.
Re 4: I think we should benchmark it. IIRC the cost of spawning a thread is fairly high.
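The Re 1 point can be illustrated with a plain-Ruby analogy, using Ruby thread-locals to stand in for pthread_key_t (the method names are made up for the sketch):

```ruby
# Plain-Ruby analogy for pthread_key_t: state is keyed by the executing
# thread, so it is lost when a later call runs on a different thread.
def set_native_state
  Thread.current[:native_state] = :locked   # e.g. pthread_mutex_lock's effect
end

def read_native_state
  Thread.current[:native_state]             # e.g. what pthread_mutex_unlock needs
end

set_native_state
kept = read_native_state                         # same thread: still :locked
lost = Thread.new { read_native_state }.value    # different thread: nil
```

If successive FFI calls from one Ruby Thread can land on different pthreads, the second call observes `nil` where the first call left state behind, which is exactly the breakage a pinned "blocking native calls" thread would avoid.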

BTW, what's the effect of blocking: true in FFI? Just releasing the GVL? Or also changing the Thread#status?

@eregon (Collaborator) commented Oct 8, 2021

Reading about this again, I think it would be safer to have a new option (not just `blocking: true`) to opt in to executing blocking calls on a separate thread.
Otherwise some calls will likely break due to depending on pthread-local state, data races, etc.
