Introduce support for fiber-per-request. #3101
base: master
Conversation
Very interesting! I assume we don't need a Ruby version gate here - people just won't use …
@nateberkopec There are two parts to this PR. One might be considered a feature and one might be considered a bug. Between requests, fiber locals are not reset, so even without supporting … The other part of this is correctly supporting …
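The "fiber locals are not reset" part can be demonstrated in isolation (a sketch, not Puma's code): `Thread.current[]` is fiber-local in Ruby, so values persist across requests handled by the same pooled thread unless each request runs inside its own fiber.

```ruby
# Simulate a pooled worker thread handling several "requests" in sequence.
# With wrap_in_fiber: false, fiber-locals set by one request are visible to
# the next; with wrap_in_fiber: true, each request starts with a clean slate.
def handle_request(id, wrap_in_fiber:)
  work = lambda do
    leaked = Thread.current[:request_state] # state from a previous request?
    Thread.current[:request_state] = id     # fiber-local in Ruby
    leaked
  end
  if wrap_in_fiber
    Fiber.new { work.call }.resume # fresh fiber => fresh fiber-locals
  else
    work.call
  end
end

results = Thread.new do
  [handle_request(1, wrap_in_fiber: false),
   handle_request(2, wrap_in_fiber: false), # sees request 1's state
   handle_request(3, wrap_in_fiber: true),  # fresh fiber => nil
   handle_request(4, wrap_in_fiber: true)]
end.value
# results == [nil, 1, nil, nil]
```

Only the second bare call observes leaked state; both fiber-wrapped calls see `nil`.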
@nateberkopec I've just added tests. I think you will find them interesting to review because they show the current behaviour vs what I'm proposing (just comment out the first commit to see the behavioural differences). Also, 100% we should test for performance regressions. Is there documentation you can point me at for the benchmark test suite?
Right, thanks for calling that out - since …
It's not good or bad, it's just Behavior That People Inevitably Depend On 😆
Not really, consider this just an action item on the maintainer's part.
We also have failures on main, yikes.
I think leaking state between requests is probably a bad thing. I don't disagree that people might be depending on it, but it sounds flaky at best if so (order dependent and thread dependent).
I might need to fix the exception handling when I'm back home.
force-pushed from f8fc513 to e980d4f
I observed Windows segfaulting: https://github.com/puma/puma/actions/runs/4445249381/jobs/7804200284
@ioquatix Thanks for working on this. I ran these commits on top of revisions to the test system; Windows 2.5 failed, as did all the JRuby jobs. Everything else was ok. Some performance data using WSL2/Ubuntu 22.04 is below. It does seem to slow things down...
Above was run with …
I will check the performance too; that seems a little too costly to me. Maybe we can improve on it, or maybe there is something else going on. In any case, there are a couple of options to consider: …
Interested in helping with that. Right now, the Fiber is wrapping nothing more than …

Or, we've got three main 'processes':

A. Receive and process the request
…

Current code is just wrapping B. I suspect that should be expanded?
Feel free to modify this PR as you see fit. I will try to look at it a bit later tonight or tomorrow.
Regarding your question, I'm only concerned about the app code being exposed to prior state. Well, we can wrap more, but I don't think there is a huge advantage... oh but streaming responses probably should be in the same context so... yeah we might want to expand it for that reason and add some more tests.
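Keeping the streaming body in the same context could look roughly like this (a hypothetical sketch, not Puma's implementation): the request fiber yields each body chunk back to the server loop, so fiber-local state set while handling the request is still visible while streaming.

```ruby
# Run app.call AND body iteration inside one fiber; the server loop pulls
# chunks out via resume/yield. `run_request` is an invented name.
def run_request(app, env)
  fiber = Fiber.new do
    _status, _headers, body = app.call(env)
    body.each { |chunk| Fiber.yield(chunk) } # stream from inside the fiber
    :done
  end
  chunks = []
  while (chunk = fiber.resume) != :done
    chunks << chunk
  end
  chunks
end

app = lambda do |env|
  Thread.current[:req] = env[:id] # fiber-local state set during the request
  body = Enumerator.new do |y|
    y << "id=#{Thread.current[:req]}" # still visible while streaming
  end
  [200, {}, body]
end

chunks = run_request(app, { id: 42 })
# chunks == ["id=42"], and Thread.current[:req] is untouched out here
```

Because the body is iterated in the same fiber that ran `app.call`, fiber-locals behave consistently for the whole response; iterating it from a different fiber would not.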
Re …
Ah, nice example. I guess ideally with the newer "fiber per request" style interface, we shouldn't expect users to do this (one of the authors of request state was actually very excited by this). The semantics at the language level should be clear.
Hmm... when you put it this way, it reminds me of the old "thread_safe" config setting in Rails. Aaron eventually removed it because who would ever say, "give me the non-thread-safe version, please!" A 12% performance hit is rough, but we're still doing a response in far less than a tenth of a millisecond. I think the trade could be worth it.
With a little bit of finagling, we might be able to introduce a more streamlined interface. For example, if you only care about Fiber storage, it's possible to assign to it directly. Let me do a bit more analysis on the performance cost. It seems 1/10th of a ms is quite a bit, more than I expected.
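The raw per-request fiber cost can be sanity-checked with a micro-benchmark along these lines (illustrative only: it measures bare `Fiber.new`/`resume`, not Puma's full request path, and absolute numbers are machine-dependent):

```ruby
require "benchmark"

# Compare an empty loop against allocating and resuming one fiber per
# iteration, approximating the extra work of one fiber per request.
n = 50_000
baseline = Benchmark.realtime { n.times { nil } }
fibers   = Benchmark.realtime { n.times { Fiber.new { nil }.resume } }

per_request_us = ((fibers - baseline) / n) * 1_000_000
puts format("~%.2f microseconds per fiber-wrapped iteration", per_request_us)
```

On CRuby this typically lands in the low microseconds per iteration, which suggests bare fiber allocation alone does not explain a 1/10th-of-a-ms difference.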
I just noticed there is some provision for …
I wouldn't call this particularly comprehensive (I was seeing a margin of error of like ±5%) but here are the results from my laptop:

Normal: …
Fiber: …
(Also, as an aside, the benchmark by default uses too many wrk threads/connections and will over-saturate Puma, giving less than robust results, since a whole ton of …)
The Linux results are also a bit weird.

Normal: …
Fiber: …
I tried it several times. It looks like on Linux, in some situations, this can be a performance advantage. But I don't know why. Maybe it's a bug in the benchmark.
@ioquatix Thanks for looking into a bunch of things. I've been putting off getting a native Ubuntu host system (time, money, etc), so I use WSL2/Ubuntu, which runs on Windows. It is somewhat inconsistent, and often slower than a native system. So, I tried to get the benchmark defaults to hit Puma hard, but not 'hammered', or something like that. If you find different defaults would work better, we can change them.
@MSP-Greg do you accept GitHub Sponsors?
While I think - if this is a Good Idea and It Works - we should just make this the default, it is a breaking change and likely to Break Something Somewhere. I wonder if, in the meantime, before Puma 7, we could ship this as a Rack middleware to give people time to try it and see what happens? The middleware would just be a temporary thing intended to test this out, not a permanent fixture.
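The middleware idea could look roughly like this (a hypothetical sketch, not anything Puma ships; `FiberPerRequestMiddleware` is an invented name, and as the thread notes, a middleware cannot cover streaming bodies):

```ruby
# Run each request's app call inside a fresh fiber so fiber-local state
# cannot leak into the next request handled by the same pooled thread.
class FiberPerRequestMiddleware
  def initialize(app)
    @app = app
  end

  def call(env)
    response = nil
    Fiber.new { response = @app.call(env) }.resume
    response
  end
end

app = lambda do |env|
  Thread.current[:request_id] = env["request_id"] # fiber-local in Ruby
  [200, { "content-type" => "text/plain" }, [env["request_id"]]]
end

wrapped = FiberPerRequestMiddleware.new(app)
status, _headers, body = wrapped.call({ "request_id" => "abc" })
# status == 200, body == ["abc"], and the fiber-local does not escape:
# Thread.current[:request_id] is still nil here.
```

Note the limitation: only `@app.call` runs inside the fiber, so a streaming body iterated later by the server would execute outside it, which is exactly why a middleware can only approximate the real feature.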
Here are my thoughts, in no particular order.

The performance overhead will be more complex on TruffleRuby and JRuby until they adopt Loom/Virtual Threads.

It's possible in the single process/thread case that users are expecting the same thread locals to work as a kind of global state. Since connection keep-alive is bound to a worker, there is a kind of natural affinity that comes with every request being processed by the same thread. I could imagine some code depending on that (it's going to be janky but 🤷🏼). Additionally, if they are using thread-local caches, performance/locality might be affected.

This feature can't quite be implemented as a middleware, as it doesn't correctly cover streaming responses. However, there is NO specification that states that a streaming response should be handled on the same fiber as the request, but I imagine some poorly designed code that tries to take a lock during …

I believe that the performance overhead can and should be reduced to almost zero, but it will require more investigation on my part. I will definitely consider this use case closely, but I can't make any guarantees before Ruby 3.3. It's possible I'll have time and motivation to investigate it. An overhead of 2-3% should be attainable or better IMHO.

Maybe better to start off as a feature of Puma that is disabled by default in v6. It could be controlled by the existing option: …
force-pushed from 78ec81c to e7b2ac4
It might be good to address #3360 before merging this.
force-pushed from e7b2ac4 to 6cc6b48
@MSP-Greg do you have any idea why https://github.com/puma/puma/actions/runs/8539240372/job/23393556676?pr=3101 is failing?
Still need a bit more coffee. I'll test it locally soon (today). I've got to switch out OpenSSL 3.2 in my MSYS2... BTW, thanks for your work on this.
Looked at the Windows Ruby 2.5 issue, and I could verify it locally. Ruby 2.6 works fine; didn't check 2.4. I thought I'd try to write some code without Puma to see if I can get a better idea what's happening. It's just stopping, with no indication why...
If you add print statements to every line, can we see how far it executed the test?
@MSP-Greg do you have some time now to show me?
We can 'whereby' if you'd like. Sorry for the delay...
@MSP-Greg I have some time now if that suits you.
Knocking …
We came to the conclusion that Fibers on Windows, Ruby 2.5, do not work correctly.
This enables the correct scope of `Fiber.storage` per request.

- Unify `clean_thread_locals` and `fiber_per_request` configuration options.
force-pushed from c728fef to 8594344
I've rebased this PR so that it includes the fixes for …

This PR introduces a new configuration option, …

While I have not measured it recently, CRuby does cache state relating to thread allocations and fiber allocations, so the overhead of doing this in Ruby/Puma may not make sense. In other words, a thread pool (reusing threads) constructed in Ruby-land may not be an advantage over just writing …

Regarding the naming, I am okay with it, but don't have a strong opinion. It seems to me to clearly represent the behaviour, and that seems good enough to me.

There is one environment variable I was using for ease of testing, but I'm not sure if we want to keep that in the default configuration.

In terms of performance, I think the impact will depend on what you are doing, but it's minimal (somewhere between "no impact" and "< 10% of a no-op web request"). In addition, the feature is opt-in. Whether it remains that way in the future, …

I'm convinced that there are potentially a lot of bugs relating to this - i.e. sharing state between requests. We have known use cases for this at my company, for example, and so we desire to enable this feature for additional safety.

@MSP-Greg any thoughts on what is required to get this merged?
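The "Ruby-land thread pool vs just spawning" point can be sanity-checked with a rough micro-benchmark (a sketch only; absolute numbers are machine-dependent, and this ignores everything a real pool does besides thread reuse):

```ruby
require "benchmark"

# Compare: one long-lived worker thread consuming jobs from a queue,
# versus spawning a brand-new thread per job.
jobs = 1_000
job  = -> { 1 + 1 } # trivial work so thread overhead dominates

pooled = Benchmark.realtime do
  queue  = Queue.new
  worker = Thread.new do
    while (j = queue.pop) != :stop
      j.call
    end
  end
  jobs.times { queue << job }
  queue << :stop
  worker.join
end

spawned = Benchmark.realtime do
  jobs.times { Thread.new(&job).join }
end

puts format("pooled: %.4fs, spawned: %.4fs", pooled, spawned)
```

Whether the pooled variant actually wins depends on the Ruby implementation and how much of the spawn cost is cached by the VM, which is exactly the point being made above.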
I'm AFK for the rest of the evening. I'll have a look tomorrow. Thanks.
Approved. Maybe a day or two for others to comment.
Interesting. Wonder how JRuby and TruffleRuby would be affected...
I don't want to make assertions without looking at the code, but my general feeling is, a lot of these platforms are internally caching the allocations where possible. It's not always possible or semantically correct, so sometimes things don't get reused, but I can attest to the fact that fibers are heavily reused, and in a server like Puma, the implementation above will only allocate one "internal" fiber per thread and they will be reused over and over again - the …

As a consequence, I think in languages like Ruby, we should focus on correctness and simplicity, and push performance issues further down the stack. Of course, Puma has a well defined execution model: workers/threads/etc. and that's a part of the contract, but rather than using thread pools and keeping a list of threads, it would be much better if Puma could just do …

Thanks for the info. Given that Puma has …

Aside from the things you mentioned, using it might make it easier to deal with a timeout 'waiting for an app response'. Not sure.
I've changed …
force-pushed from 3851710 to e25c964
This enables the correct scope of `Fiber.storage` per request.

`Fiber[]` and `Fiber[]=` are recently introduced features in Ruby 3.2. They are designed to provide per-operation or per-request state handling.

Using a thread pool, `Fiber.storage` is retained beyond a single request, which can leak information from one request into another. The simplest way to avoid this is to wrap each request itself in a fiber. The overhead should be minimal, as these fibers are cached and reused, so once warm, the overhead is a single Ruby VALUE allocation per request.

The benefit is that code which uses `Fiber[]=` will be scoped correctly to a single request.

Your checklist for this pull request
- … `[ci skip]` to the title of the PR.
- … "`#issue`" to the PR description or my commit messages.