Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

EventedFileUpdateChecker boots once per process #25302

Merged

Conversation

schneems
Copy link
Member

@schneems schneems commented Jun 6, 2016

We need one file checker booted per process as talked about in #24990. Before we do a check to see if any updates have been registered by the listener we first check to make sure that the current process has booted a listener.

We are intentionally not starting a listener when the checker is created. This way we can avoid #25259 in which puma warns of multiple threads created before fork. As written the listener for each process will be invoked by the ActionDispatch::Executor middleware when the updated? method is called. This is the first middleware on the stack and will be invoked before application code is read into memory.

The downside of this approach is that the API is a little less obvious. I.e. that you have to call updated? to get the listener to start is not intuitive. We could make boot! not private if we want to make the API a little nicer. Alternatively we could boot when the checker is initialized however this reintroduces the puma threads warning, and also means that in cases of rails server or when using preload! that we have extra threads notifying of changes on a process that we don't care about.

[close #24990] [close #25259]

We need one file checker booted per process as talked about in rails#24990. Before we do a check to see if any updates have been registered by the listener we first check to make sure that the current process has booted a listener.

We are intentionally not starting a listener when the checker is created. This way we can avoid rails#25259 in which puma warns of multiple threads created before fork. As written the listener for each process will be invoked by the `ActionDispatch::Executor` middleware when the `updated?` method is called. This is the first middleware on the stack and will be invoked before application code is read into memory.

The downside of this approach is that the API is a little less obvious. I.e. that you have to call `updated?` to get the listener to start is not intuitive. We could make `boot!` not private if we want to make the API a little nicer. Alternatively we could boot when the checker is initialized however this reintroduces the puma threads warning, and also means that in cases of `rails server` or when using `preload!` that we have extra threads notifying of changes on a process that we don't care about.

[close rails#24990] [close rails#25259]
@schneems
Copy link
Member Author

schneems commented Jun 6, 2016

r? @matthewd

@schneems schneems changed the title Schneems/evented file boot at check time master EventedFileUpdateChecker boots once per process Jun 6, 2016
@schneems schneems added this to the 5.0.0 milestone Jun 6, 2016
Some files like routes.rb may be very large and vary between the initialization of the app and the first request. In these scenarios if we are using a forked process we cannot rely on the files to be unchanged between when the code is booted and the listener is started. 

For that reason we start a listener on the main process immediately, when we detect that a process does not have a listener started we force the updated state to be true, so we are guaranteed to catch any changes made between the code initialization and the fork.
@schneems
Copy link
Member Author

schneems commented Jun 6, 2016

Updated with logic to ensure we're never in a situation where a process has stale data.

end

def updated?
boot! unless @pid_hash[Process.pid]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this need to be atomic? e.g.

pid = Process.pid
@pid_hash.compute_if_absent(pid) { boot! }
@pid_hash[pid] = true

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In practice it's fine if we accidentally boot more than one listener per process, we want to make sure we boot at least once though. Concurrent::Hash doesn't have a compute_if_absent locking method

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Concurrent::Hash doesn't have a compute_if_absent locking method

The method is from Concurrent::Map (Conc::Map is a Hash-like data structure with a few concurrency methods thrown in, Concurrecy::Hash is just a synchronized carbon copy of Ruby Hash).

Do we even need hash for this? Can't we just have @listen_started_on_pid and check Process.pid == @listen_started_on_pid, with @listen_started_on_pid = Process.pid in boot!?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should still use an atomic even if it's not a hash. While it's technically fine to have more than one listener, it's unnecessary and easy to avoid.

@sgrif
Copy link
Contributor

sgrif commented Jun 7, 2016

I definitely think we need the API to be more explicit here. The fact that updated? is what causes the checker to boot makes the code path that causes this to work to become really nonlocal. (I was actually convinced that this didn't fix the issue at first, took about 20 minutes to see why it does).

It'd be good to add an integration test for this to the railties test suite as well.

@schneems
Copy link
Member Author

schneems commented Jun 8, 2016

We need some hook that we know will be called in each process, in this case I know updated? will be executed in the middleware. If we had some kind of on_fork callback this logic would all be much simpler, but nothing like that exists and I don't feel like monkey patching fork is a great idea.

It'd be good to add an integration test for this to the railties test suite as well.

I can look into that.

Pretty proud of this. We are testing distributed processes synchronized via pipes which makes it deterministic. Pretty cool.

We boot a listener in the parent process we then fork. Before we touch the file we verify the fork is booted using pipes. Then the parent process will touch the file while the fork waits on a pipe. Once the parent process signals that the file has been touched we continue inside of the fork.
@schneems
Copy link
Member Author

schneems commented Jun 8, 2016

I added a test, i'm pretty pleased with the result. Using pipes to synchronize the two processes.

@schneems
Copy link
Member Author

schneems commented Jun 9, 2016

Updated to have a threadsafe boot! process, also using a instance variable instead of a Concurrent::Hash

@schneems schneems force-pushed the schneems/evented-file-boot-at-check-time-master branch from 60c11b5 to 844af9f Compare June 13, 2016 14:30
schneems added a commit to schneems/rails that referenced this pull request Jun 16, 2016
With rails#25302 we have to assume that the files on disk have changed in a fork. This means that the first call to `updated? must always be true. Instead we can use 3 different IO.pipe to synchronize communications between between any number of forked processes.

We use 1 pipe for the primary communication, another pipe is used by children to notify the parent that a new fork has booted. Once a child is detected as being booted a value is written to the primary pipe either 'T' if the files have changed or 'F' if they haven't. Once the value is read the same pipe responds with on the final pipe with an ACK. This lets the parent know that 1 of the processes has received a message and it can begin waiting for the next process to boot.

There exists a scenario where a race condition is possible. Parent PID boots, and there's no changes on disk. Child process boots and asks the parent if there are any changes. The parent PID may send a "F" to the child but between the time that the "F" is sent and it is read the file may change. We can eliminate this race condition if we boot the listener on the child process before we tell the parent that the child has booted. That's exactly what this PR does, is hooks into the `before_listen` and uses that to boot a worker before we even tell the parent process we've booted.

As is this is a WIP and more of a proof of concept. I think it's an interesting solution that may possibly be viable. It seems more technically complete than rails#25302 however that completeness comes at additional complexity. cc @e2
@sgrif
Copy link
Contributor

sgrif commented Jun 17, 2016

Going to go forward with this one. I think my previous concerns were rooted in thinking that the changed callback did more than it actually does. I think we should eventually consider an explicit Rails.application.forked! call which needs to occur if you're using a forking web server, but that is out of scope for 5.0.0 at this point.

@sgrif sgrif merged commit 30dd8b2 into rails:master Jun 17, 2016
sgrif added a commit that referenced this pull request Jun 17, 2016
…-check-time-master

EventedFileUpdateChecker boots once per process
sgrif added a commit that referenced this pull request Jun 17, 2016
…-check-time-master

EventedFileUpdateChecker boots once per process
@schneems
Copy link
Member Author

Yay! Thanks! Just saw this notification in my email.

My only concern is that it's kinda weird that update? returns true on the first call. It makes sense for our application of the checker but it's an odd implementation artifact that maybe shouldn't live in this class.

As an alternative we could implement a EventedFileChecker#forked? that takes a block and use that in the middleware. Something like:

evented_file_checker.forked? {|checker|
  checker.updated.make_true
  checker.boot!
end

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
5 participants