
Support pluggable EventLoop task queue #9105

Closed
belliottsmith opened this issue Apr 29, 2019 · 23 comments

@belliottsmith
Contributor

The default task queue used by the Epoll and NIO EventLoop implementations is provided by JCTools, and while it is efficient it has some undesirable properties. Specifically, it is not non-blocking, and can lead to the EventLoop busy-spinning while waiting for a task that may not be provided promptly, because the offering thread's execution has been suspended by the operating system. This is a rare scenario, but it will happen, and some application maintainers might like to avoid it.

Unfortunately, there is no simple fix in the use of the queue; the relaxedPoll() method appears to be a candidate, but does not provide volatile visibility guarantees to a preceding offer(), meaning a wakeup could be missed (although we could write to a dummy volatile variable after offering to provide this). A queue identical to the JCTools one but using a putVolatile to set the item in the queue's backing array would suffice to avoid the busy-spin blocking of the EventLoop, but the issue of future tasks being unreachable until the suspended producer completes its offer would remain.
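
For illustration, a minimal sketch of the "dummy volatile variable" idea (the wrapper class and names are hypothetical, not Netty code; whether this ordering is sufficient depends on the consumer-side sleep check, as discussed further down the thread):

import org.jctools.queues.MessagePassingQueue;

// Hypothetical sketch: a dummy volatile store after every offer, so that a
// consumer's later volatile read cannot miss an offer that completed before it.
final class GuardedOffer<T> {
    private final MessagePassingQueue<T> queue;
    private volatile boolean publishGuard; // dummy volatile, written after every offer

    GuardedOffer(MessagePassingQueue<T> queue) {
        this.queue = queue;
    }

    boolean offer(T task) {
        boolean offered = queue.offer(task);
        publishGuard = true; // orders the offer before any subsequent wake-up negotiation
        return offered;
    }

    // The consumer would read the dummy volatile before its final "any work?" check,
    // pairing with the store above.
    boolean readGuard() {
        return publishGuard;
    }
}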

Ideally, the queue used by these event loops would be pluggable, so that users can choose their ideal point on the spectrum of tradeoffs.

A simple replacement that guarantees progress is the JDK ConcurrentLinkedQueue, but at the cost of slightly more expensive offer() and poll(); the above-mentioned modification to the JCTools offer() could be mixed with relaxedPoll() to simply avoid busy-spin blocking, and a custom MPSC queue implementation could guarantee forward progress with only marginally higher cost. Permitting the application maintainer to pick their poison would be tremendously helpful.
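
To illustrate the request, a rough sketch of what a pluggable hook could look like (the interface name and shape are illustrative only, not the actual API that was eventually added):

import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

// Illustrative only: a hypothetical factory letting the application choose the
// task queue implementation, and therefore its point on the trade-off spectrum.
interface TaskQueueFactory {
    Queue<Runnable> newTaskQueue(int maxCapacity);
}

final class TaskQueueChoices {
    // Example choice: truly non-blocking progress, at the cost of allocation and
    // slightly more expensive offer()/poll().
    static final TaskQueueFactory CLQ = maxCapacity -> new ConcurrentLinkedQueue<>();
}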

@normanmaurer
Member

@nitsanw FYI...

@franz1981
Contributor

AFAIK the only misbehaviours that could happen would come from using offer() on a single-producer JCTools queue, or from using relaxedPoll() instead of poll().
The first doesn't seem to be the case here, while on the consumer side, mixing relaxedPoll()/drain() with poll()/isEmpty() is enough to guarantee strict empty recognition à la CLQ (see the sketch below).
IMO just using poll() could be a simple fix that would force a JCTools queue to use the same semantics as CLQ::poll.
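
For clarity, a minimal sketch of that consumer-side mix, assuming a JCTools MessagePassingQueue (the idle/select step is only indicated by comments):

import org.jctools.queues.MessagePassingQueue;

// Illustrative sketch: drain with the relaxed call while busy, but confirm
// emptiness with the strict view before deciding to go idle.
final class ConsumerLoopSketch {
    static void runOnce(MessagePassingQueue<Runnable> tasks) {
        Runnable task;
        // Fast path: relaxedPoll() may return null even while an offer is in flight.
        while ((task = tasks.relaxedPoll()) != null) {
            task.run();
        }
        if (tasks.isEmpty()) {
            // strict empty recognition: safe to go idle / enter select() here
        } else {
            // an offer is in flight; loop again (or use poll()) rather than sleeping
        }
    }
}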

@belliottsmith
Contributor Author

@franz1981 I'm afraid I don't quite follow what you're saying, but we currently do use poll() and this has the problems I describe.

However, I have just taken another look at wakeup(), and since we always perform a compareAndSet, we actually could use relaxedPoll() to solve the busy-spin issue. That said, we probably shouldn't be doing an unconditional compareAndSet, since it is costly; it should ordinarily be guarded by a cheap test of its current value.

@normanmaurer
Member

@belliottsmith when you say we should guard it, are you thinking of doing something like this:

if (!inEventLoop && !wakenUp.get() && wakenUp.compareAndSet(false, true)) {
    selector.wakeup();
}

?

@franz1981
Contributor

franz1981 commented Apr 29, 2019

@belliottsmith

poll() always returns null if there is no pending offer, and the only (internal) spinning I'm aware of in JCTools is while awaiting the message slot to be filled by the producer (because a JCTools offer happens in two stages: producer sequence increment AND message slot fill).
If the event loop logic falls into a spin, I don't see how it can be related to JCTools unless:

  1. the queue used is single-producer
  2. the event loop logic that performs the sleep/awake is not correct (assuming it is using poll() as you've suggested)

I think the wake-up logic could use a different strategy too, one that would avoid using compareAndSwap on the offer side: https://github.com/JCTools/JCTools/blob/master/jctools-experimental/src/main/java/org/jctools/queues/blocking/ScParkTakeStrategy.java#L19
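
A much-simplified sketch of that kind of strategy (this is not the linked JCTools class; it only illustrates how the producer side can get away with a volatile read plus unpark, with no CAS; note that it parks the consumer thread, which may not fit an event loop that sleeps in select()):

import java.util.Queue;
import java.util.concurrent.locks.LockSupport;

// Simplified illustration of a park/unpark take strategy without a CAS on offer.
final class ParkTakeSketch {
    private volatile Thread waiter; // the consumer registers itself before parking

    // Producer side, called after a successful offer: volatile read + unpark only.
    void signalAfterOffer() {
        Thread w = waiter;
        if (w != null) {
            LockSupport.unpark(w); // cheap no-op if the consumer is not parked
        }
    }

    // Consumer side: publish the intent to sleep, re-check, then park.
    Runnable awaitTask(Queue<Runnable> tasks) {
        for (;;) {
            Runnable task = tasks.poll();
            if (task != null) {
                return task;
            }
            waiter = Thread.currentThread(); // volatile write: publish intent to sleep
            task = tasks.poll();             // re-check to avoid a lost wake-up
            if (task != null) {
                waiter = null;
                return task;
            }
            LockSupport.park(this);
            waiter = null;
        }
    }
}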

@belliottsmith
Contributor Author

@normanmaurer right

@belliottsmith
Contributor Author

belliottsmith commented Apr 29, 2019

@franz1981 the two stages are not atomic, and a thread can be suspended by the operating system in between these two steps

also, the EventLoop should definitely not be parking; it should only be voluntarily suspending its execution when entering select()

@franz1981
Contributor

franz1981 commented Apr 29, 2019

@belliottsmith Correct: that means you would likely see a spin inside the queue::poll call. Is that the issue you're seeing? Because I cannot see how using a putVolatile while setting the message slot could make any difference... @nitsanw can confirm/reject it

@normanmaurer
Member

@belliottsmith I think I would even argue that we may not need the CAS at all and could just use a volatile. At worst we would call wakeup multiple times. That said, maybe just adding a get() before it is good enough, as a failed CAS should be at least cheaper than an extra selector wakeup. WDYT?

@belliottsmith
Contributor Author

@normanmaurer I think the get + CAS is probably safest for now, the problem being the negotiation between the consumer resetting it to false before going to sleep, and a producer seeing the true before this happens and failing to prevent it going to sleep. It would be possible to avoid a CAS here, but it would require a bit more complex surgery and thought to modify all of the places we actually go to sleep. On modern chips a CAS isn't so costly anyway, and probably less costly than actually triggering the wakeup (though I don't know how costly that actually is; the code comments suggest it is costly). It might be something to consider for 5.0 as part of wider refactors, but it doesn't intuitively feel like it fits the risk/reward tradeoff for 4.x to me.
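
For context, a simplified sketch of the negotiation being described (not the exact Netty code; the hasTasks supplier and timeout are placeholders): the consumer clears wakenUp before blocking and must re-check for work after the reset, while only the producer that wins the CAS pays for the selector wakeup:

import java.io.IOException;
import java.nio.channels.Selector;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.BooleanSupplier;

// Simplified sketch of the sleep/wake negotiation (not the exact Netty code).
final class WakeupNegotiationSketch {
    private final AtomicBoolean wakenUp = new AtomicBoolean();
    private final Selector selector;

    WakeupNegotiationSketch(Selector selector) {
        this.selector = selector;
    }

    // Consumer side, just before blocking: reset the flag, then re-check for work
    // so a producer that set wakenUp to true before the reset is not lost.
    void consumerAboutToBlock(BooleanSupplier hasTasks, long timeoutMillis) throws IOException {
        wakenUp.set(false);
        if (hasTasks.getAsBoolean()) {      // re-check AFTER the reset
            selector.selectNow();           // work already arrived; don't block
        } else {
            selector.select(timeoutMillis); // block; a later producer will wake us
        }
    }

    // Producer side: a cheap get() guards the CAS, and only the winner calls wakeup().
    void producerAfterOffer(boolean inEventLoop) {
        if (!inEventLoop && !wakenUp.get() && wakenUp.compareAndSet(false, true)) {
            selector.wakeup();
        }
    }
}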

@njhill
Member

njhill commented Jun 26, 2019

@belliottsmith @normanmaurer I think this can be closed now that #9247 is merged?

@normanmaurer
Member

@njhill yes!

@normanmaurer normanmaurer added this to the 4.1.37.Final milestone Jun 27, 2019
@nitsanw
Contributor

nitsanw commented Aug 19, 2019

@belliottsmith "Specifically, it is not non-blocking, and can lead to the EventLoop busy-spinning waiting for a task that may not be provided promptly because the offering thread's execution has been suspended by the operating system." - I assume you are referring to the following:
https://github.com/JCTools/JCTools/blob/58a08a14928ca966d55543d3e4f6f40a44a65518/jctools-core/src/main/java/org/jctools/queues/BaseMpscLinkedArrayQueue.java#L265

            if (casProducerIndex(pIndex, pIndex + 2))
            {
                break;
            }
        }
        // INDEX visible before ELEMENT
/* Producer thread is suspended here */
        final long offset = modifiedCalcElementOffset(pIndex, mask);
        soElement(buffer, offset, e); // release element e
        return true;

If this happens the consumer thread will spin-wait for the element to appear in poll():

        Object e = lvElement(buffer, offset);// LoadLoad
        if (e == null)
        {
            if (index != lvProducerIndex())
            {
/* index is visible, but element is not visible until the producer is able to make progress */
                do
                {
                    e = lvElement(buffer, offset);
                }
                while (e == null);
            }
            else
            {
                return null;
            }
        }

This makes the queue blocking. What is worse, successful offer calls from other producers cannot overtake the "stuck producer", and the consumer cannot attend to them (I've always thought of this as a bubble in the queue). The other producers can make progress, but the consumer can't skip ahead and dequeue their items until the "stuck producer" releases its element.

If this is the issue under discussion, I'm not sure what the following suggestion means: "A queue identical to the JCTools one but using a putVolatile to set the item in the queue's backing array would suffice to avoid the busy-spin blocking of the EventLoop"

The problem is that the index is visible before the element. Making the element "more visible" (by using a stronger barrier) will not make a difference. The index and the element are not atomically published (as @belliottsmith points out later). Using CLQ will provide better progress guarantees here AFAIK (IIRC the node is only published when fully initialized, so the "bubble" in the queue cannot happen) and I think the feature request is sensible.

@nitsanw
Contributor

nitsanw commented Aug 19, 2019

@belliottsmith @franz1981 reading through the discussion and the java doc I realize JCTools should do a better job documenting the progress guarantees on this and other queues. I've filed a bug: JCTools/JCTools#259

@belliottsmith
Contributor Author

If this is the issue under discussion, I'm not sure what the following suggestion means: "A queue identical to the JCTools one but using a putVolatile to set the item in the queue's backing array would suffice to avoid the busy-spin blocking of the EventLoop"

It means the eventLoop (by using relaxedPoll()) would at least be able to proceed with other work such as reading/writing to a socket, or could stop if it had no other work to do, freeing up a thread for the producer to complete its work.

Obviously a preferable solution is to use a truly non-blocking queue, but blocking the eventLoop is worse than just blocking the external work queue.
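
To make that concrete, an illustrative sketch (not Netty code) of how a null from relaxedPoll() hands control back to the loop instead of spinning inside poll():

import org.jctools.queues.MessagePassingQueue;

// Illustrative only: a "not yet visible" element simply yields control back to
// the event loop instead of spin-blocking inside poll().
final class RelaxedRunSketch {
    void runOnce(MessagePassingQueue<Runnable> tasks) {
        Runnable task = tasks.relaxedPoll();
        if (task != null) {
            task.run();            // execute a task when one is visible
        } else {
            processSelectedKeys(); // otherwise proceed with socket read/write work,
                                   // or block in select() if there is truly nothing
                                   // to do, freeing a core for the suspended producer
        }
    }

    private void processSelectedKeys() {
        // placeholder for the I/O work the event loop can still perform
    }
}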

@nitsanw
Contributor

nitsanw commented Aug 19, 2019

I was referring in particular to: "using a putVolatile to set the item in the queue's backing array would suffice to avoid the busy-spin blocking of the EventLoop".
I agree that using a relaxedPoll will return control to the EventLoop, giving you more options in how to utilize time until an element is visible. I don't see how putVolatile makes a difference here.
Thanks.

@belliottsmith
Contributor Author

Well, in combination with #9109 you might appreciate my reasoning. Long story short, it depends on the behaviour of the code that wraps the queue to provide blocking behaviour. putVolatile instead of putOrdered would permit negotiating whether a wake-up is necessary with only LoadLoad barriers. However, today Netty imposes a full CAS on each side of that transaction, so it wouldn't actually be necessary.

(It's been a while since I had the context to think about this, so I may bow out of further discussion for now, as I don't really have time to re-research my answers/thoughts)

@nitsanw
Contributor

nitsanw commented Aug 19, 2019

Reading through the discussion, I understand you have some reservations regarding the semantics/guarantees of JCTools queues, but I can't see how they add up to anything actionable.

If you have a concern with a particular detail please file a bug with JCTools and I'll be happy to address it.

Making the element write a volatile write makes no difference to either ticket AFAICT. It imposes further ordering constraints on the offer, which are irrelevant to the offer and do not help the poll code, where these tickets seem to focus. The offer method as a whole already has volatile write semantics (in this implementation) due to the successful CAS operation.

@belliottsmith
Contributor Author

belliottsmith commented Aug 19, 2019

I have no concerns about the semantics; that would imply I thought they were overall problematic. Every concurrent structure's semantics matter only in the context in which they're used.

IMO, this project should be using a truly non-blocking queue, but that doesn't mean the algorithm employed by JCTools is of general concern.

(I will concede that I consider the default spin-lock blocking behaviour to be a bug, for any real-time application, but that's a personal stance on the matter of unbounded user-space spin locks and priority inversion)

@nitsanw
Contributor

nitsanw commented Aug 19, 2019

"IMO, this project should be using a truly non-blocking queue" - The trade offs are between higher per-element costs (allocation + barriers) and the JCTools option which is lock-less but still risks blocking. The configurable queue impl solves this issue and offers choice, so I think this is solved.

"I consider the default spin-lock blocking behaviour to be a bug" - Would you prefer to put a yield in the loop instead? If you have suggestion I'm happy to hear it.

@franz1981
Contributor

Given that a relaxed poll can return null either due to emptiness or due to a not-yet-published element, I believe it could be improved by making these two cases explicit and letting the user choose what to do while awaiting element publication, but that would mean an additional xxxPoll method (see the sketch below). It could be useful, but most JCTools users love and use the library because the queues are "kinda" a replacement for j.u.c. queues, and I'm not sure Netty would love to be bound to the MessagePassingQueue API instead of the more general java Queue one.
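
For illustration only, such an additional poll variant might look something like this (a purely hypothetical API sketch, not something JCTools provides):

// Hypothetical API sketch: a poll that distinguishes "queue is empty" from
// "an offer is in flight but its element is not yet visible".
enum PollOutcome { ELEMENT, EMPTY, NOT_YET_PUBLISHED }

interface DiagnosingQueue<T> {
    // Delivers an element to the callback when one is visible; otherwise reports
    // why nothing was returned, letting the caller decide to spin, do other work,
    // or go to sleep.
    PollOutcome diagnosingPoll(java.util.function.Consumer<T> onElement);
}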

@belliottsmith
Contributor Author

I've been down enough of these rabbit holes to know any answer will only lead to another, so I hope you don't mind if I bow out for real for now. I only provided that qualifier as a post-script because I felt I had been dishonest by disclaiming any concerns about JCTools in response to your claim that I had expressed general concerns (which I do not feel I had; I was only discussing their semantics in the context of this project). JCTools is a great library that you should be proud of; it would be surprising if nobody had any concerns or criticisms. Perhaps we will have some time to discuss them in person one day.

@nitsanw
Contributor

nitsanw commented Aug 19, 2019

"JCTools is a great library, that you should be proud of" - Thanks :-)

"it would be surprising if nobody had any concerns or criticisms." - Indeed, and I am keen to address these concerns where I can. So please, if possible, make time to submit issues if appropriate. Even if they remain unresolved they will be informing other users of the risks/rewards/tradeoffs/decision making etc.

Thanks for your time, I hope we have an opportunity to discuss in person in the near future.
