Status code must be an integer value between 1xx and 5xx #271

Closed
ClaudiuTodosia opened this issue Nov 18, 2019 · 19 comments

@ClaudiuTodosia

Hello,
I receive this answer when I'm using the crawler like this:
[Two screenshots were attached to the original issue; they are not preserved here.]

@Redominus
Contributor

Redominus commented Nov 18, 2019

It looks like some code somewhere is trying to create a PSR-7 response with a status code outside that range. It is probably a status code 0 returned by curl when there are connection problems (DNS, proxy, SSL, IP, etc.). If you are using guzzlehttp as the PSR-7 implementation there should be no problems. Some implementations of the PSR-7 ResponseInterface (Symfony, for example) only allow values in that range.
EDIT:
Something is trying to create a response with a value outside that range. In that case it should create a RequestException.

@ClaudiuTodosia
Author

OK, any idea how I can fix this problem? If I lower the maximum crawl count it works, but I need all the pages to be crawled... And if I remove the maximum crawl count, then I get the same error.

@Redominus
Contributor

Guzzle has recently added a check to enforce those numbers. 😞
I would suggest an Xdebug session with an automatic breakpoint on InvalidArgumentException.

@bengower

bengower commented Jan 3, 2020

I reckon it could be related to this (bot blocking):
https://stackoverflow.com/questions/46214017/public-linkedin-profile-url-returns-server-status-code-999/49856601

The code below throws on status code 999 because it is outside the accepted range. It should really raise a dedicated error for blocked bots or code 999.

guzzlehttp/psr7/src/Response.php

    private function assertStatusCodeRange($statusCode)
    {
        if ($statusCode < 100 || $statusCode >= 600) {
            throw new \InvalidArgumentException('Status code must be an integer value between 1xx and 5xx.');
        }
    }
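
For anyone reading along, here is a minimal, self-contained sketch of how a range check like the one quoted above treats a few status codes. The function below is a standalone copy written for illustration, and `isAcceptedStatus()` is a hypothetical helper; neither is Guzzle's actual class or API.

```php
<?php
declare(strict_types=1);

// Standalone copy of the quoted range check, for illustration only.
function assertStatusCodeRange(int $statusCode): void
{
    if ($statusCode < 100 || $statusCode >= 600) {
        throw new \InvalidArgumentException(
            'Status code must be an integer value between 1xx and 5xx.'
        );
    }
}

// Hypothetical helper: true when the code passes the check, false when it is rejected.
function isAcceptedStatus(int $statusCode): bool
{
    try {
        assertStatusCodeRange($statusCode);
        return true;
    } catch (\InvalidArgumentException $e) {
        return false;
    }
}

// 0 is what curl may report on connection problems; 999 is a bot-blocking code.
foreach ([200, 404, 599, 0, 999] as $code) {
    printf("%d => %s\n", $code, isAcceptedStatus($code) ? 'accepted' : 'rejected');
}
```

Under these assumptions, both 0 and 999 fall outside the accepted range, which matches the two causes suggested in this thread.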

@Skullbock

Same here.
Basically, if any site responds with a status code that Guzzle does not expect (in my case, 999), Guzzle's response class throws an InvalidArgumentException.

The issue is that all the crawling is done in a queue, and this exception is not intercepted by the GuzzleHttp Pool class (it only intercepts RequestException), so it bubbles up and can't be handled through the crawlFailed() method in this package.

Unfortunately, I think this can only be solved by Guzzle itself (by catching any exception and passing it on through another callback, I think).

@freekmurze any thoughts on this?
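
To make the failure mode concrete, here is a rough simulation of a pool that forwards only one exception type to its failure callback. `FakeRequestException` and `runPool()` are hypothetical stand-ins invented for this sketch; they are not Guzzle's Pool or this package's API.

```php
<?php
declare(strict_types=1);

// Hypothetical stand-in for the one exception type the pool knows how to handle.
class FakeRequestException extends \RuntimeException {}

/**
 * Hypothetical mini "pool": runs each task and forwards only
 * FakeRequestException to the $rejected callback. Anything else escapes.
 */
function runPool(array $tasks, callable $rejected): void
{
    foreach ($tasks as $name => $task) {
        try {
            $task();
        } catch (FakeRequestException $e) {
            $rejected($name, $e);
        }
    }
}

$failed = [];
$escaped = null;

try {
    runPool(
        [
            'timeout' => function (): void {
                throw new FakeRequestException('connection timed out');
            },
            'status-999' => function (): void {
                throw new \InvalidArgumentException(
                    'Status code must be an integer value between 1xx and 5xx.'
                );
            },
            'never-run' => function (): void {},
        ],
        function (string $name, \Throwable $e) use (&$failed): void {
            $failed[] = $name;
        }
    );
} catch (\InvalidArgumentException $e) {
    // The range-check exception bypasses the callback and aborts the whole run.
    $escaped = $e->getMessage();
}

echo 'Reported through callback: ' . implode(', ', $failed) . "\n";
echo 'Escaped the pool: ' . ($escaped ?? 'nothing') . "\n";
```

In this sketch only the timeout reaches the callback; the out-of-range status aborts the loop, so the remaining task never runs.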

@freekmurze
Member

I'm thinking this should be changed at Guzzle's end indeed.

@Skullbock

Brought this up to the guzzle team with a possible solution here: guzzle/guzzle#2534

@williamjulianvicary

Just to jump in, as I've been debugging this this evening (my comments are in guzzle/guzzle#2534): in my opinion this is a core bug caused by the PSR-7 update to the HTTP Response object, which now validates status codes.

Essentially, these exceptions are thrown inside the Curl Multi header callback that generates the response. They should be caught and treated as an error during header collection. Currently they aren't, and the exception causes no end of trouble. It goes beyond occasionally stalling execution: in my tests it actually blocks requests that would otherwise complete, marking them as failed when they are not (presumably because curl handles aren't being regenerated, or similar).

There is no simple (read: not a bodge) solution to this problem as far as I can tell. Curl relies on an appropriate return value (-1 on error, not an exception) to handle the execution flow. These exceptions are breaking something within the core, and Curl Multi has always been a little rogue within PHP too. Couple that with some nice race conditions on the promise state and you've got a headache!

Several fixes are doable: rolling back the PSR-7 changes to the Response object and adding a try/catch to pick up any other exceptions (see my notes in the other thread); avoiding the PSR-7 Response altogether (unlikely); or catching the assertion exceptions, or pre-validating the status code, and handling these gracefully. Which one depends on what Guzzle wants in the core.

Anyway, there is no immediate solution, unfortunately. This is blocking some internal tech, though, so I'm on board with finding a fix.
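
As a rough illustration of the "catch it in the header callback and return -1" idea described above, here is a self-contained sketch. `StrictResponse` and `onHeaderLine()` are hypothetical names made up for this example; this is not Guzzle's code and not the patch from the linked pull request.

```php
<?php
declare(strict_types=1);

// Hypothetical stand-in for a response object that validates its status code.
final class StrictResponse
{
    /** @var int */
    public $status;

    public function __construct(int $status)
    {
        if ($status < 100 || $status >= 600) {
            throw new \InvalidArgumentException(
                'Status code must be an integer value between 1xx and 5xx.'
            );
        }
        $this->status = $status;
    }
}

/**
 * Hypothetical header callback: returns the number of bytes handled on
 * success and -1 on failure, instead of letting an exception escape.
 */
function onHeaderLine(string $line, ?StrictResponse &$response, ?\Throwable &$error): int
{
    if (preg_match('#^HTTP/\d(?:\.\d)?\s+(\d{1,3})#', $line, $m) === 1) {
        try {
            $response = new StrictResponse((int) $m[1]);
        } catch (\InvalidArgumentException $e) {
            $error = $e; // remember the cause so the request can be failed cleanly
            return -1;   // signal an error to the transfer loop
        }
    }

    return strlen($line);
}

$okResponse = null;
$okError = null;
$okResult = onHeaderLine("HTTP/1.1 200 OK\r\n", $okResponse, $okError);

$badResponse = null;
$badError = null;
$badResult = onHeaderLine("HTTP/1.1 999 Request denied\r\n", $badResponse, $badError);

var_dump($okResult, $badResult, $badError instanceof \InvalidArgumentException);
```

The design choice here is that the callback owns the exception: the caller only ever sees a return value, and the stored error can later be turned into a normal request failure.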

@Skullbock

Hey @williamjulianvicary, thanks for taking the time to answer here too.
I'll be following both issues, so I'll be notified if I can help in any way.
Thanks again!

@runningnet

For me, running composer require guzzlehttp/guzzle:~6.0 fixed the problem in my project.

@williamjulianvicary

This is an intermittent problem that you’ll hit on the 6.5.2 branch (latest branch for v6). I’ve submitted a pull request in the link I submitted above that solves this issue - I’m maintaining a fork if you want to alias to that meanwhile!

@bobemoe

bobemoe commented Feb 23, 2020

thanks @williamjulianvicary, just confirming your patch appears to have fixed this issue for me :)

@lilessam

Just in case someone is still looking for a detailed solution:

composer require cweagans/composer-patches

Then add the following under "extra" in your composer.json:

"patches": {
    "guzzlehttp/guzzle": {
        "Status code must be an integer value between 1xx and 5xx": "https://patch-diff.githubusercontent.com/raw/guzzle/guzzle/pull/2591.patch"
    }
}

Then run composer install again and you should be good.

@strarsis

strarsis commented Sep 19, 2020

Thanks for the patch!
This is indeed a sore point in Guzzle, as the current code offers no way around atypical HTTP status codes,
not even with a middleware that manipulates the response status code, because this error occurs before the HandlerStack is processed.

@strarsis

strarsis commented Sep 19, 2020

@lilessam: I added the patch to a generated OpenAPI PHP client (which uses Guzzle as a Composer dependency).
However, the patch doesn't seem to be applied: the same error still occurs after installing the package with composer-patches.
Relevant parts of the composer.json of that library:

{
    // ...
    "require": {
        "php": ">=7.1",
        "ext-curl": "*",
        "ext-json": "*",
        "ext-mbstring": "*",
        "guzzlehttp/guzzle": "^6.2",
        "cweagans/composer-patches": "^1.6"
    },
    "require-dev": {
        "phpunit/phpunit": "^7.4",
        "squizlabs/php_codesniffer": "~2.6",
        "friendsofphp/php-cs-fixer": "~2.12"
    },
    "autoload": {
        "psr-4": { "OpenAPI\\Client\\" : "lib/" }
    },
    "autoload-dev": {
        "psr-4": { "OpenAPI\\Client\\" : "test/" }
    },
    "extra": {
        "patches": {
            "guzzlehttp/guzzle": {
                "Status code must be an integer value between 1xx and 5xx": "https://patch-diff.githubusercontent.com/raw/guzzle/guzzle/pull/2591.patch"
            }
        }
    }
}

Edit: I had to set "extra": { "enable-patching": true } in the root project's composer.json, too.
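
For reference, a minimal sketch of what that root-level setting might look like. It assumes cweagans/composer-patches 1.x, where `enable-patching` lets the root project apply patches declared by its dependencies; every other key of the root composer.json is omitted here.

```json
{
    "extra": {
        "enable-patching": true
    }
}
```

Note that the Composer key is `extra` (singular). After changing it, reinstalling the patched package is likely needed before the patch is applied.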

@williamjulianvicary

For anyone tracking this, my PR was merged and will be released with Guzzle 7.2! :-)

@freekmurze
Member

Fantastic, thanks for letting us know @williamjulianvicary

@Skullbock

@williamjulianvicary very nice work indeed! thanks!

@ClaudiuTodosia
Author

@williamjulianvicary thank you!
