Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Stop checking for Process.pid and trust the fork decorators #41850

Closed
wants to merge 1 commit into from

Conversation

casperisfine
Copy link
Contributor

Process.pid is not cached on Linux and is surprisingly expensive. These checks are in hotspots, so the cost is not negligible.

In #37296 (comment) we established that it's pretty much impossible to bypass Process.fork / Kernel#fork.

We still added the checks to protect against user defined after_fork hooks that would use the connection. But I don't think the cost is worth this extra safety.

At the same time I'll try to push for Ruby to expose a native after_fork callback, but of course even if it works it might take a while before Rails can use it.

@matthewd What do you think? Do you still feel strongly about this extra check?

@matthewd
Copy link
Member

matthewd commented Apr 6, 2021

(I assume you're proposing removing all the check! calls introduced in the original PR, not just this one, which is definitely not a hotspot.)

I guess I don't feel super strongly about it, but given the failure mode is a possible segfault, I am still a bit cautious -- the pre-after_fork window could be arbitrarily large if someone else has decorated fork, for example.

I feel like I've had this thought before somewhere, but it apparently wasn't on the previous PR: instead of checking the pid, we could use a sacrificial no-op thread: if Thread.new { some_lock.synchronize { nil } } is dead, we've forked. In theory we could do that inside the fork tracker [where it can safely be a singleton, so we're not burning lots of threads], or we could do it inside the connection pool [which already has a convenient always-active thread in the form of the reaper].

Or, yeah, we just remove it, and trust it's probably fine. I am still feeling nervous that it'll eventually bite someone -- probably on a more medium-sized app: enough traffic to give it a lot of chances to fail, but still a low enough amount that they can afford a fork in the middle of request processing. (I considered suggesting you monkey-patch check! to alert you if it hits The Bad Case, and see how it fares for a little while in your app.... but in practice I think you're probably safe because you've eliminated any potentially-risky operations just because they'd be slow.)

Alternatively: add a config option, so that people with very-high-traffic sites can avoid the expensive-at-scale check -- and thereby opt in to a nastier debugging process if things do go wrong -- while leaving most people to spend a few extra syscalls to be definitively safe.

I really don't know which way I think we should go on this, sorry. 😕

All three of the above get the Process.pid call out of your request handlers, so it's perhaps a bit of an academic question, but I'm still curious: how surprisingly expensive are we talking here.. does it show up on a whole-request benchmark?

`Process.pid` is not cached on Linux and is surprisingly
expensive. These checks are in hotspots, the the cost is not
negligible.

In rails#37296 (comment)
we established that it's pretty much impossible to bypass
`Process.fork` / `Kernel#fork`.

We still added the checks to protect against user defined
`after_fork` hooks that would use the connection. But
I don't think the cost is worth this extra safety.
@casperisfine
Copy link
Contributor Author

I assume you're proposing removing all the check! calls introduced in the original PR, not just this one, which is definitely not a hotspot.

Absolutely, sometimes my editor plays tricks on me... It's corrected now. It's mostly the one in pool I'd like to get rid of.

I feel like I've had this thought before somewhere, but it apparently wasn't on the previous PR

You might have missed it because GitHub paginated the comments: #37296 (comment)

instead of checking the pid, we could use a sacrificial no-op thread: if Thread.new { some_lock.synchronize { nil } } is dead, we've forked.

Hum, I'll have to bench, but not sure if Thread#alive? is that much better than Process.pid. Ideally I'd like us to assume the fork decorator wasn't bypassed. If it was then the users must have done something really involved, and should be capable and responsible to debug it.

but in practice I think you're probably safe because you've eliminated any potentially-risky operations just because they'd be slow.

Yes, we never fork in the middle of a request. We fork after preloading the app, and that's it.

I'm still curious: how surprisingly expensive are we talking here.. does it show up on a whole-request benchmark?

So of course it varies from request to request based on how many connection checkouts you do. In most case it hover between 3 and 5ms spent in ForkTracker.check! for us, out of 20-30ms spent in ConnectionHandlers#retrieve_connection.

Part of the problem might be that our app likely goes through connection checkout a bit more than a regular one because of our sharding, as well our caching layer that need to check for open transactions.

But in general Active Record calls Base.connection quite a lot. I just checked in a "run-of-the-mill" Rails 6.1 app, I've put a print in ConnectionHandler#pool, a simple Shipit::Stack.first call check! 3 times!, so it's not just us.

There's probably some refactoring to do to do less connection checkouts, but I think making it a bit faster would be good.

@rails-bot
Copy link

rails-bot bot commented Jul 5, 2021

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.
Thank you for your contributions.

@rails-bot rails-bot bot added the stale label Jul 5, 2021
@zzak
Copy link
Member

zzak commented Jul 8, 2021

@casperisfine Is this still something you want to work on? Can you identify any blockers?

@rails-bot rails-bot bot removed the stale label Jul 8, 2021
@casperisfine
Copy link
Contributor Author

It's not as important as previously thought. Turns out Ruby causes sampling profilers to have a bias toward methods implemented in C (some context here: tmm1/stackprof#150).

But Process.pid still definitely cause a syscall on every call on mordern Linux, so I still think it would be preferable to avoid this. But I failed to get some traction about this here as well as upstream (https://bugs.ruby-lang.org/issues/17795), so I don't really see what more I could do.

@rails-bot
Copy link

rails-bot bot commented Oct 6, 2021

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.
Thank you for your contributions.

@rails-bot rails-bot bot added the stale label Oct 6, 2021
@rails-bot rails-bot bot closed this Oct 13, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants