Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[8.x] Prevent double throwing chained exception on sync queue #42950

Merged
merged 2 commits into from
Jul 7, 2022
Merged

[8.x] Prevent double throwing chained exception on sync queue #42950

merged 2 commits into from
Jul 7, 2022

Conversation

rodrigopedra
Copy link
Contributor

@rodrigopedra rodrigopedra commented Jun 25, 2022

Closes #42883

As noted by issue #42883, when a chain of jobs is dispatched and a closure is provided to the chain's catch method, if an exception occur, the callback is called multiple times.

For example, consider this code:

Bus::chain([
    new TestJob(1),
    new TestJob(2),
    new TestJob(3),
    new TestJob(4, exception: true),
    new TestJob(5),
    new TestJob(6),
    new TestJob(7),
])->catch(function (Throwable $e): void {
    dump('A job within the chain has failed...');
})->onConnection('sync')->dispatch();

When new TestJob(4, exception: true) throws an exception, the call back will be called 4 times, one for each of these job instances:

  • new TestJob(4, exception: true)
  • new TestJob(3)
  • new TestJob(2)
  • new TestJob(1)

If new TestJob(4, exception: true) was defined further down on the chain, the callback would be called more times, and vice-versa.

This happens as each job in the sync queue is executed in the same PHP process, so when the next job is dispatched it is actually "called" within the same execution call stack.

Thus when an exception is thrown the exception triggers the SyncQueue exception handler multiple times, one for each job awaiting the execution call stack to complete.

This PR

  • Adds an instance property to SyncQueue to track how many jobs were already pushed within its execution call stack
  • Adds a guard to SyncQueue@handleException method, to ensure the handler is called only once for that execution call stack
  • Adds a test case which fails without this PR's code

@taylorotwell
Copy link
Member

So does jobCount ever get decremented in any way? What about Octane?

@rodrigopedra
Copy link
Contributor Author

I missed that one, sent a new commit which decreases the job count as jobs are processed. Thanks!

@taylorotwell
Copy link
Member

Can you explain the fix a bit? The "guarded" logic... can you describe in English how it works, etc.?

@rodrigopedra
Copy link
Contributor Author

Sure, I will try

In a sync queue a job is executed by the same PHP program from where it was dispatched.

So when dispatching a job batch, every subsequent dispatach works like a recursive call,
as when calling Queueable@dispatchNextJobInChain, it will wait for the next job to be executed,
before leaving the method call.

public function dispatchNextJobInChain()
{
if (! empty($this->chained)) {
dispatch(tap(unserialize(array_shift($this->chained)), function ($next) {
$next->chained = $this->chained;
$next->onConnection($next->connection ?: $this->chainConnection);
$next->onQueue($next->queue ?: $this->chainQueue);
$next->chainConnection = $this->chainConnection;
$next->chainQueue = $this->chainQueue;
$next->chainCatchCallbacks = $this->chainCatchCallbacks;
}));
}
}

In an asynchronous queue this doesn't happen as the next job will be stored into the queue's store, so it would
be picked up later by a queue worker.

For every job in the batch, SyncQueue@push is called. This wraps the call to the SyncJob@fire into a try..catch,
and in the catch block it delegates the exception handling to the SyncQueue@handleException method, where the "guard" was added.

Currently the SyncQueue@handleException does some processing but ultimately re-throws the exception.

Remember the previous jobs in the batch were waiting for the Queueable@dispatchNextJobInChain method to end execution?
Much like a recursive call? Due to everything being processed by the same PHP program?

The exception will then "bubble up" to the previous awaiting job in the call stack, then its
SyncQueue@push's try..catch block will delegate the exception handling to
its instance's SyncQueue@handleException, which will re-throw the exception, and so on... up to the first job.

And that recursive "handle exception/re-throw exception" is what makes the error be reported several times.
One for each previous jobs before the actual exception happens on a sync queue.

The "guard" mechanism works on:

  • it allows first execution and set the guarded flag to true indicating it is "enabled"
  • On the second execution from the immediate previous job from the one which raised the exception,
    it will just "unguard", without actually processing the exception, as it was already handled by
    the "next" job (next as in the batch order)
  • The exception then is not double reported, nor re-thrown, thus stopping bubbling it up and being reported several times

Maybe "guarded" is not the best name. Alternatives could be sentinel, lock, flag, semaphore, gate, the idea
is to allow the first execution and then "guard" against a next recursive call.

Hope it is clearer now =)

@taylorotwell let me know if you need any further clarification.

@taylorotwell taylorotwell merged commit 41300c6 into laravel:8.x Jul 7, 2022
@taylorotwell
Copy link
Member

Thanks

@rodrigopedra rodrigopedra deleted the 8.x branch July 8, 2022 10:59
@bramdevries
Copy link
Contributor

This change seems to have caused a regression where, when using dispatchSync with a job that uses the Illuminate\Bus\Queueable trait and that results in other jobs being dispatched on the sync queue will no longer have their exceptions bubble up.

This took quite some time to debug in our application (we noticed it when a test would start to fail in which we expected an exception to be thrown, the test that ran before dispatched a job that dispatched other jobs synchronously).

I've attempted to create a case that reproduces this bug:

class Job implements \Illuminate\Contracts\Queue\ShouldQueue
{
   use Queuable;

    private int $jobId;

    public function __construct(int $jobId)
    {
        $this->jobId = $jobId;
    }

    public function handle()
    {
        if ($this->jobId % 2 === 0) {
            throw new InvalidArgumentException('Exception in job '.$this->jobId);
        }

        event('job-completed', $this->jobId);
    }
}

class NestedJob implements \Illuminate\Contracts\Queue\ShouldQueue
{
    use Queuable;

    private int $jobId;

    public function __construct(int $jobId)
    {
        $this->jobId = $jobId;
    }

    public function handle()
    {
        if ($this->jobId % 2 === 0) {
            throw new InvalidArgumentException('Exception in nested job '.$this->jobId);
        }
    }
}

I have two different types of jobs: Job and NestedJob. The jobs will throw an exception when a number that is divisible by 2 is passed, if otherwise it will emit a job-completed event.

In a separate event listener I'm then dispatching a NestedJob when the jobId is 3:

Event::listen('job-completed', function (int $id) {
    if ($id !== 3) {
        return;
    }

    Bus::dispatchSync(new NestedJob(4));
});

When dispatching a series of jobs, the following will happen:

try {
    Bus::dispatchSync(new Job(2));
} catch (Throwable $exception) {
    dump($exception->getMessage()); // Error in job 2
}

try {
    Bus::dispatchSync(new Job(3));
} catch (Throwable $exception) {
    dump($exception->getMessage()); 
   // No exception is thrown here, while I would expect "Error in nested job 4" to reach this.
}

try {
    Bus::dispatchSync(new Job(6));
} catch (Throwable $exception) {
    dump($exception->getMessage()); // Error in job 6
}

However, if I were to add the try...catch statement to the event listener the following would occur:

Event::listen('job-completed', function (int $id) {
    if ($id !== 3) {
        return;
    }

    try {
       Bus::dispatchSync(new NestedJob(4));
    } catch (Exception $exception) {
       dump($exception->getMessage()); // Error in nested job 4.
     }
});

try {
    Bus::dispatchSync(new Job(2));
} catch (Throwable $exception) {
    dump($exception->getMessage()); // Error in job 2
}

try {
    Bus::dispatchSync(new Job(3));
} catch (Throwable $exception) {
    dump($exception->getMessage()); 
   // No exception is thrown here as it is caught within the event handler.
}

try {
    Bus::dispatchSync(new Job(6));
} catch (Throwable $exception) {
    dump($exception->getMessage()); // no exception is thrown here as the `SyncQueue` is still `guarded`.
}

The workaround seems to be to not use the Queueable trait on jobs that dispatched as part of another job as this seems to avoid the SyncQueue all together. However, I do still think there is a regression here as this breaks the old behaviour of dispatching sync jobs within other jobs.

@driesvints
Copy link
Member

@rodrigopedra @gerardroche any thoughts on the above?

@gerardroche
Copy link

gerardroche commented Jul 21, 2022

Yes I think the fact that exceptions no longer bubble up may be a backwards compatibility break.

I think it's important to note that the sync queue connection has some very different behaviour to other queue connections like redis, sqs, etc. For example, exceptions don't bubble up on the other connections. This can be a gotcha because the sync queue is generally used during testing so some tests may be lying about what is really happening in production.

Of course if the sync is used directly and exceptions are expected to bubble up, I think that's going to be valid bc break.

@rodrigopedra
Copy link
Contributor Author

@driesvints I agree with @gerardroche it seems a breaking change.

I also think this is still unexpected behavior when chaining jobs, but out of my head I can't think of a better solution without adding convoluted if clauses, which I know is not the best way to solve this.

Maybe it is better to revert this PR and reopen issue #42883 until someone come up with a better solution.

@driesvints
Copy link
Member

Thanks all. We're reverting this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants