Concurrent requests not really parallel #1625

Closed
NeoLegends opened this issue Oct 21, 2016 · 13 comments

Comments

@NeoLegends

NeoLegends commented Oct 21, 2016

We're using Guzzle to query Google's PageSpeed API concurrently, as described in the docs: we create multiple asynchronous requests with $client->getAsync(...) and then wait for the results via Promise\settle(...).

However, when firing lots of requests (30+), the batch takes extremely long (about 5 minutes) to complete, while the same number of requests fired locally from Node takes only about 20s. 20s is what I would expect when all the requests are sent in parallel, since that is roughly how long the PageSpeed API needs for a full analysis of a reasonably large site. 5 minutes, however, is far too much.

This is the code that creates the requests:

// Sorry for the verbosity, but I'm afraid there might be hidden caveats / blocking code
// in here I don't know about.

$requests = [];
// This block basically creates all the requests
foreach ($urls as $url) {
    foreach ($strategies as $strategy) { // Strategy is PageSpeed Strategy, either mobile or desktop
        $requests[$url][$strategy] = $this->client->getAsync('runPagespeed', [
            'query' => [
                'url' => $url,
                'locale' => $locale,
                'strategy' => $strategy
            ]
        ]);
    }
}

// This block could be omitted if Promise\settle supported nested arrays.
// It processes the requests for each URL and strategy and decodes the response JSON.

$promises = [];
foreach ($requests as $url => $reqs) {
    // Could it be b/c of the recursive settle here? I guess / hope not.
    $promises[$url] = Promise\settle($reqs)->then(function ($results) {
        $finalResults = [];

        foreach ($results as $strategy => $result) {
            // Do some processing with the result, like figure out whether the request succeeded and get the body, in either case
            $res = new \stdClass();
            $res->success = $result['state'] == Promise\PromiseInterface::FULFILLED;
            /** @var ResponseInterface $response */
            $response = $res->success ? $result['value'] : $result['reason']->getResponse();
            $res->data = json_decode($response->getBody()->getContents(), false);

            $finalResults[$strategy] = $res;
        }

        return $finalResults;
    });
}

Promise\settle($promises)->then(function ($results) {
    // We have the results
})->wait();

My guess is that either cURL multi handles aren't being used the way I expect, or there is some hidden blocking in this code that I don't know about. Any idea why this takes so long?
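A minimal sketch of one way to sanity-check this (not part of the original report): hand the requests to GuzzleHttp\Pool with an explicit concurrency limit, which also takes the nested settle() out of the picture. It assumes the same $this->client, $urls, $strategies and $locale as the code above, and that the client was created with a base_uri, as the relative 'runPagespeed' path implies.

use GuzzleHttp\Pool;
use GuzzleHttp\Psr7\Request;
use GuzzleHttp\Exception\RequestException;
use Psr\Http\Message\ResponseInterface;

// Build plain PSR-7 requests so the Pool can dispatch them itself.
$requestGenerator = function () use ($urls, $strategies, $locale) {
    foreach ($urls as $url) {
        foreach ($strategies as $strategy) {
            $query = http_build_query([
                'url' => $url,
                'locale' => $locale,
                'strategy' => $strategy,
            ]);
            yield "$url|$strategy" => new Request('GET', 'runPagespeed?' . $query);
        }
    }
};

$pool = new Pool($this->client, $requestGenerator(), [
    'concurrency' => 10, // explicit upper bound on simultaneous transfers
    'fulfilled' => function (ResponseInterface $response, $key) {
        // $key is "url|strategy"; decode and store the body here
    },
    'rejected' => function (RequestException $reason, $key) {
        // transport errors and non-2xx responses end up here
    },
]);

$start = microtime(true);
$pool->promise()->wait();
printf("Batch finished in %.1fs\n", microtime(true) - $start);

If this finishes in roughly the time the Node comparison needs, the promise layer is fine and the slowdown is somewhere in the environment.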

@NeoLegends changed the title from "Concurrent request not really parallel" to "Concurrent requests not really parallel" on Oct 21, 2016
@sagikazarmark
Member

Since this is not the first performance-related issue we've received lately, I will try to reproduce it. Will keep you posted.

@NeoLegends
Author

Thanks! The code is from the duplexmedia/parallel-pagespeed composer package.

@sagikazarmark
Member

Here are my tests: https://github.com/reproduce/guzzle/tree/issue1625

Using the example code above I couldn't reproduce the issue. There are some higher load-time results, but my internet connection is pretty slow at the moment, so you might want to run the tests yourself as well.

You might want to install Blackfire on your server to see whether something in the environment is blocking your requests, although you need the Premium version to see HTTP response times separately.
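As a lighter-weight alternative (a sketch, not something suggested in this thread): Guzzle's on_stats request option reports per-request transfer times without an external profiler, so it shows whether the individual transfers are slow or whether they are effectively serialized. It would slot into the request-building loop from the original code:

use GuzzleHttp\TransferStats;

$timings = [];

$requests[$url][$strategy] = $this->client->getAsync('runPagespeed', [
    'query' => [
        'url' => $url,
        'locale' => $locale,
        'strategy' => $strategy,
    ],
    // Called once the transfer for this request completes (success or failure).
    'on_stats' => function (TransferStats $stats) use (&$timings, $url, $strategy) {
        $timings["$url|$strategy"] = $stats->getTransferTime();
    },
]);

After settling, compare the sum of the individual transfer times with the wall-clock time of the whole batch: a wall-clock time close to the sum points at serialized requests, while one close to the slowest single transfer points at real parallelism.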

@NeoLegends
Author

Thank you very much for the extensive testing! I will look into the results and see if I can make more sense out of them.

@NeoLegends
Author

NeoLegends commented Oct 28, 2016

Previously my code ran in a local Vagrant dev machine, so I figured that might be causing the issues. Today I ran the requests on a dedicated web server and got the same results: 2:30 min for some 25 parallel requests to Google PageSpeed.

EDIT: Never mind, I'm stupid. Vagrant was indeed causing the issues.

@sagikazarmark
Member

Is the dedicated server in the same network as your first tests? I've just tried it in another network with about the same results. Can you please try your code in a different network? Can you also try to spin up my reproduction code with Blackfire? Maybe we can catch the issue there.

@NeoLegends
Author

NeoLegends commented Oct 28, 2016

Ah man, results get better running on a dedicated server, but not by that much. I'll try to spin up that benchmarking code. :)

No, the first tests I made came from my local machine through the company network (which is pretty good); the results I'm posting now come from a data center somewhere in Germany. So it's a different network.

@NeoLegends
Author

The tests also show that cURL itself seems to be responsible, since curl_multi_select is where most of the time is spent (even in the benchmarks that take long to execute).
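One way to isolate Guzzle from cURL here (a sketch, not part of the thread): replay the same absolute URLs through a bare curl_multi loop; if that also crawls, the problem lies with the cURL build or the environment rather than with Guzzle. $fullUrls is assumed to be a flat list of complete PageSpeed URLs, query string included.

$mh = curl_multi_init();
$handles = [];

foreach ($fullUrls as $i => $fullUrl) {
    $ch = curl_init($fullUrl);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_multi_add_handle($mh, $ch);
    $handles[$i] = $ch;
}

$start = microtime(true);
do {
    $status = curl_multi_exec($mh, $active);
    if ($active) {
        // Wait (up to 1s) for activity on any of the transfers.
        curl_multi_select($mh, 1.0);
    }
} while ($active && $status === CURLM_OK);
printf("Batch finished in %.1fs\n", microtime(true) - $start);

foreach ($handles as $ch) {
    curl_multi_remove_handle($mh, $ch);
    curl_close($ch);
}
curl_multi_close($mh);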

@sagikazarmark
Member

sagikazarmark commented Oct 28, 2016

Actually, that only shows how long cURL ran, which includes the network traffic as well. My example also includes around 25 URLs, and 23s (which is what you measured) is not that bad a result IMO, given the kind of service you're calling.

Can you do some network profiling to see whether the client is actually responsible for the high load times in your application (not in the reproduction code)?

Also, you could try my example with your URLs to see if there is a bottleneck there.

@sagikazarmark
Member

Any news @NeoLegends ?

As far as I can see, it's either a cURL issue or, for some reason, running insights for your URLs simply takes too long.

@NeoLegends
Author

Yeah, it seems like a problem with cURL or with the sites I tested. We worked around it by issuing the requests from the browser.

@sagikazarmark
Member

All right, closing then.
