Guzzle Pool promise finishes before all requests are complete #2534
Comments
To add, in case it helps, this is the case if I pass a generator or an array. Also, I don't believe "leaking" is actually correct looking into it a little further and I don't believe this is conflicting with Laravel's error handling, it seems the unwrapping on the |
I am failing to understand what the problem is. From the failure log I see that everything reached the failure callback and the exception is thrown as it is supposed to be. What am I missing? |
I think the problem is that it is only "intermittent"? |
After digging into the promise code some more, as far as I can tell the final exception leaking is ‘working as intended’? However, I could try/catch that and it would not be a problem - it doesn’t always bubble though. My core issue is that these exceptions are bubbled before all requests are complete. In the example error above I have not trimmed the log - that should be 100 failures not 19. Sometimes this executes all 100 requests, sometimes just a handful and occasionally just a single request is made before the exception bubbles from the wait call. I’m attempting to make requests for thousands of URLs via a generator, however intermittently this is failing and stopping further execution. |
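A minimal sketch of one way to detect this kind of early termination: count the callback invocations and compare the total with the number of requests yielded. It assumes Guzzle is installed via Composer; the `MockHandler`, the `example.test` URLs and the 95/5 response split are placeholders so the snippet runs offline. Because the mock bypasses curl, it illustrates the counting pattern only and is not expected to reproduce the failure.

```php
<?php
require __DIR__ . '/vendor/autoload.php'; // assumes a Composer install of guzzlehttp/guzzle

use GuzzleHttp\Client;
use GuzzleHttp\Handler\MockHandler;
use GuzzleHttp\HandlerStack;
use GuzzleHttp\Pool;
use GuzzleHttp\Psr7\Request;
use GuzzleHttp\Psr7\Response;

$total = 100;

// Placeholder responses: 95 x 200 and 5 x 404, served in order by the mock.
$mock = new MockHandler(array_merge(
    array_fill(0, 95, new Response(200)),
    array_fill(0, 5, new Response(404))
));
$client = new Client(['handler' => HandlerStack::create($mock)]);

// Generator of requests, as in the thread; the host is hypothetical.
$requests = function (int $total) {
    for ($i = 0; $i < $total; $i++) {
        yield new Request('GET', 'https://example.test/' . $i);
    }
};

$fulfilled = 0;
$rejected = 0;

$pool = new Pool($client, $requests($total), [
    'concurrency' => 5,
    'fulfilled' => function ($response, $index) use (&$fulfilled) {
        $fulfilled++;
    },
    'rejected' => function ($reason, $index) use (&$rejected) {
        $rejected++;
    },
]);

try {
    $pool->promise()->wait();
} catch (\Throwable $e) {
    // Anything caught here escaped the pool instead of reaching 'rejected'.
    echo 'wait() threw: ', get_class($e), ': ', $e->getMessage(), PHP_EOL;
}

// If the pool stopped early, fewer callbacks ran than requests were yielded.
$seen = $fulfilled + $rejected;
if ($seen !== $total) {
    echo "Pool ended early: {$seen} of {$total} requests reached a callback", PHP_EOL;
}
```

Against a real endpoint the client would be built without the `handler` option, and a mismatch between `$seen` and `$total` would point at requests that never reached either callback.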
I've edited the issue name/description slightly to be less misleading with the messaging - this doesn't appear to be an exception "leak" (as this is exactly how the |
Were you able to reproduce this @GrahamCampbell / @gmponos or do I have an issue with the implementation/my stack? |
Hi, sorry to jump in, but I found this issue while trying to debug this one: Which depends on this exact issue. And thanks for this awesome library ;) |
I've spent a couple hours this evening attempting to debug this. In I've added a (super simple) debug output, for when the script hits this mark, and rather than throwing the exception I've returned.
This then successfully outputs (see on line 4, this would have thrown an exception rather than reject the promise!):
I'm also seeing a lot of "Cannot change a rejected promise to fulfilled" which I believe is the underlying issue here and is causing the parent exception to leak. |
After a little more digging around, it seems that this line is attempting to fulfil an already rejected promise: This appears to be because the If I remove those assertions my code works as intended every time:
In short, I'm not sure on the resolution of this issue, but I now believe I understand the root cause:
Note that this problem is also worse than I first thought: as soon as any promise fulfils and then rejects, all further promises reject. Rolling back my changes to the base causes ~5 of the 95 200 OK responses to reject, which is a serious issue for anyone using the pool in cases where they may run into these assertions in the PSR7 library! |
Final message before I stop debugging for the evening. This boils down to the build of the Response object, called within the closure here: guzzle/src/Handler/CurlFactory.php Line 569 in 1cdd69a
As this build fails, it's causing some kind of catastrophic failure within Curl Multi, probably because an exception is being raised but this isn't actually halting execution, due to the way that promises (usually) stop exceptions from bubbling up. As this exception occurs while the headers are being processed, I feel it should be caught and surfaced as an "error handling headers" exception, but that's not entirely user friendly. One option, which I'm not sure I'm happy with, would be to do as follows: Change EasyHandle to pre-validate the status code and give it a number that is not a valid status code, but within the assertion range:
Add a try/catch for any other errors within the closure which creates the response:
Anyone else have any bright ideas? This is seriously blocking for any sort of web crawling where you cannot expect a valid status code in advance and EasyHandle is not (ironically) easy to override. Alternatively, don't use the PSR7 Response object within Guzzle - again, not ideal but I'm struggling to find an alternative. |
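The try/catch idea can be sketched in plain PHP without touching Guzzle. This is an illustration only, not the code from the library or from any pull request: `assertStatusCodeRange()` is a stand-in that mimics a range check (the 100 to 599 bounds and the message wording are assumptions), and `buildResponseSafely()` is a hypothetical helper that turns the assertion failure into a returned error instead of letting it escape the callback.

```php
<?php
declare(strict_types=1);

// Stand-in for the status-code assertion discussed above. It mimics a range
// check only; the bounds and message are assumptions, not the library's code.
function assertStatusCodeRange(int $statusCode): void
{
    if ($statusCode < 100 || $statusCode >= 600) {
        throw new InvalidArgumentException(
            'Status code must be an integer value between 1xx and 5xx.'
        );
    }
}

// Hypothetical helper: build a response-like array and convert an assertion
// failure into a returned error rather than an uncaught exception.
function buildResponseSafely(int $statusCode): array
{
    try {
        assertStatusCodeRange($statusCode);
        return ['ok' => true, 'status' => $statusCode, 'error' => null];
    } catch (InvalidArgumentException $e) {
        return ['ok' => false, 'status' => $statusCode, 'error' => $e->getMessage()];
    }
}

// 999 is the status code mentioned in the thread; 200 and 404 pass the check.
foreach ([200, 404, 999] as $code) {
    $result = buildResponseSafely($code);
    echo $code, ': ', $result['ok'] ? 'built' : 'rejected (' . $result['error'] . ')', PHP_EOL;
}
```

The point of the pattern is that a failure while building one response becomes an ordinary rejection for that request, so the remaining transfers are unaffected.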
Just chiming in. I suspect we may be experiencing this issue with our Guzzle pool, but it's difficult to debug. We have "missing" requests, which I can only trace to the promise finishing before the responses have been passed to the callback functions. |
Sadly, I have this problem as well |
Has there been any traction on this? Any sort of crawling I do is currently completely broken with Guzzle, mostly due to LinkedIn responses, which return a 999 status code when crawling unless the UA and IP address are whitelisted. |
I know it’s not ideal but the referenced pull request could be used instead, I’m keeping that branch open so if you alias your composer file to it, it’ll temporarily fix the bugs you’re experiencing :-) #2591 |
Composer-based patch: spatie/crawler#271 (comment) |
PR has been merged - to be released in 7.2 release. |
Hello guys and @williamjulianvicary, I still have the same error:
request.CRITICAL: Uncaught PHP Exception GuzzleHttp\Promise\RejectionException: "The promise was rejected with reason: Invoking the wait callback did not resolve the promise" at /var/www/vendor/guzzlehttp/promises/src/Create.php line 62 {"exception":"[object] (GuzzleHttp\\Promise\\RejectionException(code: 0): The promise was rejected with reason: Invoking the wait callback did not resolve the promise at /var/www/vendor/guzzlehttp/promises/src/Create.php:62)"} []
Part of stacktrace:
GuzzleHttp\Promise\RejectionException:
The promise was rejected with reason: Invoking the wait callback did not resolve the promise
    at vendor/guzzlehttp/promises/src/Create.php:62
    at GuzzleHttp\Promise\Create::exceptionFor('Invoking the wait callback did not resolve the promise')
       (vendor/guzzlehttp/promises/src/Promise.php:72)
    at GuzzleHttp\Promise\Promise->wait()
Using version:
root@7a917597f9c1:/var/www# composer show | grep guzzle
guzzlehttp/guzzle 7.2.0 Guzzle is a PHP HTTP client library
guzzlehttp/promises 1.4.0 Guzzle promises library
guzzlehttp/psr7 1.7.0 PSR-7 message implementation that also provides common utility methods
kevinrob/guzzle-cache-middleware v3.3.1 A HTTP/1.1 Cache for Guzzle 6. It's a simple Middleware to be added in the HandlerStack. (RFC 7234)
spatie/guzzle-rate-limiter-middleware 2.0.0 A rate limiter for Guzzle
My code where I have the issue:
public function isValidProductIds(array $productIdsToCheck): array
{
    $notValidProductIds = array();
    // Get http headers (api token)
    $httpHeaders = $this->getHttpHeaders();
    // Create request function
    $requests = function ($productIdsToCheck, $httpHeaders)
    {
        foreach ($productIdsToCheck as $productId)
        {
            $url = sprintf('article/%d', $productId);
            yield new Request(
                'GET',
                $url,
                $httpHeaders
            );
        }
    };
    // Create the pool
    $pool = new Pool($this->httpClient, $requests($productIdsToCheck, $httpHeaders), [
        'concurrency' => 5,
        'fulfilled' => function (Response $response, $index)
        {
            // this is delivered each successful response
            $this->logger->debug('OK for index '.$index.' : HTTP CODE '.$response->getStatusCode());
        },
        'rejected' => function (RequestException $reason, $index) use (&$notValidProductIds, $productIdsToCheck)
        {
            $notValidProductIds[] = $productIdsToCheck[$index];
            $this->logger->info('GET index '.$index.' to check productId failed: '.$reason->getMessage());
        },
    ]);
    // Initiate the transfers and create a promise
    $promise = $pool->promise();
    // Force the pool of requests to complete.
    $promise->wait();
    return $notValidProductIds;
}
Note: I do not know whether this helps the investigation, but the error seems to appear only when there are a lot of 'rejected' responses (i.e. 404 Not Found). Tell me if you need more info. Regards |
I think this is different, unless they aren't 404 errors? The issue that was resolved with the pull request was related to the header function throwing an exception that was not caught (which threw CURL into an odd state). A 404 status response would not hit that workflow so I think is unrelated to the fix that was deployed. I'm happy to look into this if you can simplify the code into a reproducible example with an endpoint that is publicly accessible (such as with https://httpbin.org/ as URLs?) |
Update: I tested different types of cache storage (flysystem, doctrinecache and psr6) with GreedyCacheStrategy. Same issue for all of them. I cannot test with PrivateCacheStrategy because in my use case responses are not cached correctly with that strategy. But it seems to confirm that the problem appears only when the cache is used. @williamjulianvicary, you can reproduce the error by following the steps below. I hope this helps you troubleshoot. For info, I am using Symfony 5.1.8.
Step 1:
private function test(){
    $productIdsToCheck = new ArrayCollection([1,2,3,4,100,6,7,8,9,99]);
    // Create request function
    $requests = function ($productIdsToCheck)
    {
        foreach ($productIdsToCheck as $key => $productId)
        {
            yield new Request(
                'GET',
                'https://httpbin.org/'.$productId
            );
        }
    };
    // Create the pool using $request function
    $pool = new Pool($this->httpClient, $requests($productIdsToCheck), [
        'concurrency' => 5,
        'fulfilled' => function (Response $response, $index)
        {
            // this is delivered each successful response
        },
        'rejected' => function (RequestException $reason, $index)
        {
        },
    ]);
    // Initiate the transfers and create a promise
    $promise = $pool->promise();
    // Force the pool of requests to complete.
    $promise->wait();
}
Everything is ok the first time.
Step 2:
private function test(){
    $productIdsToCheck = new ArrayCollection([1,2,3,4,10,6,7,8,9,99]);
    // Create request function
    $requests = function ($productIdsToCheck)
    {
        foreach ($productIdsToCheck as $key => $productId)
        {
            yield new Request(
                'GET',
                'https://httpbin.org/'.$productId
            );
        }
    };
    // Create the pool using $request function
    $pool = new Pool($this->httpClient, $requests($productIdsToCheck), [
        'concurrency' => 5,
        'fulfilled' => function (Response $response, $index)
        {
            // this is delivered each successful response
        },
        'rejected' => function (RequestException $reason, $index)
        {
        },
    ]);
    // Initiate the transfers and create a promise
    $promise = $pool->promise();
    // Force the pool of requests to complete.
    $promise->wait();
}
And the error message appears: The promise was rejected with reason: Invoking the wait callback did not resolve the promise
Important:
# services.yaml
services:
    _defaults:
        autowire: true      # Automatically injects dependencies in your services.
        autoconfigure: true # Automatically registers your services as commands, event subscribers, etc.
        bind:
            $httpClient: '@http_api_client'
    Kevinrob\GuzzleCache\Storage\Psr6CacheStorage:
        class: Kevinrob\GuzzleCache\Storage\Psr6CacheStorage
        arguments: [ '@cache.app' ]
    Kevinrob\GuzzleCache\Strategy\GreedyCacheStrategy:
        class: Kevinrob\GuzzleCache\Strategy\GreedyCacheStrategy
        arguments: [ '@Kevinrob\GuzzleCache\Storage\Psr6CacheStorage', '3600' ]
    Kevinrob\GuzzleCache\CacheMiddleware:
        class: Kevinrob\GuzzleCache\CacheMiddleware
        arguments: [ '@Kevinrob\GuzzleCache\Strategy\GreedyCacheStrategy' ]
    GuzzleHttp\HandlerStack:
        class: GuzzleHttp\HandlerStack
        factory: [ 'GuzzleHttp\HandlerStack', create ]
        calls:
            - [ push, [ '@log_middleware', 'log' ] ]
            - [ push, [ '@Kevinrob\GuzzleCache\CacheMiddleware', 'cache' ] ]
            - [ push, [ '@Spatie\GuzzleRateLimiterMiddleware\RateLimiterMiddleware', 'rate-limit' ] ]
    http_api_client:
        class: GuzzleHttp\Client
        arguments:
            - handler: '@GuzzleHttp\HandlerStack'
Important 2: Tell me if you think it is better to open a new issue. |
Issue created as it seems not linked to the current issue : #2822 |
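One way to narrow down a middleware-related failure like the one reported above is to run the same pool with one named middleware removed at a time. A minimal sketch, assuming Guzzle is installed via Composer: the pass-through middleware, the `example.test` URLs and the `MockHandler` responses are placeholders so the snippet runs offline, and `countCallbacks()` is a hypothetical helper. It does not use the reporter's actual cache or rate-limit middleware.

```php
<?php
require __DIR__ . '/vendor/autoload.php'; // assumes a Composer install of guzzlehttp/guzzle

use GuzzleHttp\Client;
use GuzzleHttp\Handler\MockHandler;
use GuzzleHttp\HandlerStack;
use GuzzleHttp\Pool;
use GuzzleHttp\Psr7\Request;
use GuzzleHttp\Psr7\Response;

// Placeholder middleware that forwards every request unchanged. It stands in
// for the real entry registered under the name 'cache'.
$passThrough = function (callable $handler): callable {
    return function ($request, array $options) use ($handler) {
        return $handler($request, $options);
    };
};

// Hypothetical helper: run a pool of $total mocked requests through $stack and
// return how many of them reached either callback.
function countCallbacks(HandlerStack $stack, int $total): int
{
    $client = new Client(['handler' => $stack]);
    $requests = (function () use ($total) {
        for ($i = 0; $i < $total; $i++) {
            yield new Request('GET', 'https://example.test/' . $i);
        }
    })();

    $seen = 0;
    $count = function () use (&$seen) {
        $seen++;
    };

    (new Pool($client, $requests, [
        'concurrency' => 5,
        'fulfilled' => $count,
        'rejected' => $count,
    ]))->promise()->wait();

    return $seen;
}

$total = 10;
foreach ([true, false] as $withCache) {
    // A fresh mock queue per run, since each request consumes one response.
    $stack = HandlerStack::create(new MockHandler(array_fill(0, $total, new Response(200))));
    $stack->push($passThrough, 'cache');
    if (!$withCache) {
        $stack->remove('cache'); // drop the middleware by the name it was pushed under
    }
    $seen = countCallbacks($stack, $total);
    echo ($withCache ? 'with' : 'without'), " 'cache': {$seen} of {$total}", PHP_EOL;
}
```

With the real stack, a run that only fails while a given middleware is present suggests the problem lies in that middleware's interaction with the pool rather than in the pool itself.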
Guzzle version(s) affected: 6.5.2
PHP version: 7.3.9-1+ubuntu18.04.1+deb.sury.org+1
cURL version: 7.58.0
Description
I'm seeing an intermittent issue (frequent enough, though the code below may need to be run a handful of times to see the problem) where it appears that the Pool is finishing prematurely and the ->wait() method call is unwrapping the most recent exception, even though there may be many other requests waiting to be processed.
I've pasted a sample simple Laravel command below, which requests 100 URLs that all return a 999 response code (something that fails the assertions in the Response generation). This sometimes runs with no issue, but often (say 1 in 4 times) the call to ->wait() bubbles an exception before all other requests are finished.
How to reproduce
Additional context
Sometimes this does not happen and the full Pool works properly, however I am still seeing some odd failure reasons:
An example failure log: