Add curl handler streaming support #3115

Open
mvorisek opened this issue Apr 8, 2023 · 16 comments
@mvorisek

mvorisek commented Apr 8, 2023

Feature request

Based on the discussion below, the main bottleneck seems to be caching the whole response through disk. This issue is therefore a feature request to add streaming support to the curl handler.

Guzzle version(s) affected: any (6.5, 7.5)
PHP version: any
cURL version: any

Description

PHP's internal webserver can serve data at >4 GB/s. The biggest bottleneck is the Guzzle client, which seems to be capped around 500 MB/s.

How to reproduce

Server side:

server.php

<?php

$sizeMb = 1 * 1024;
$sizeBytes = $sizeMb * 1024 * 1024;

header('Content-Type: application/octet-stream');
header('Content-Length: ' . $sizeBytes);
header('Content-Disposition: attachment; filename="test.bin"');

$string = str_repeat(bin2hex(random_bytes(128)), 256); // 64 KiB
for ($i = $sizeBytes / strlen($string); $i > 0; $i--) { // was "$i >= 0", which emits one chunk more than Content-Length declares
    echo $string;
}

cmd:

php -S 127.0.0.1:9687 -t dir_path

Client side:

    public function testHugeOutputStream(): void
    {
        $client = new \GuzzleHttp\Client();
        $sizeMb = 1 * 1024;
        $sizeBytes = $sizeMb * 1024 * 1024;
        $response = $client->request('GET', 'http://127.0.0.1:9687/demos/_unit-test/stream.php');
        static::assertSame(200, $response->getStatusCode());
        static::assertSame('application/octet-stream', $response->getHeaderLine('Content-Type'));
        static::assertSame((string) $sizeBytes, $response->getHeaderLine('Content-Length'));

        $stream = $response->getBody();
        $length = 0;
        while (!$stream->eof()) {
            $length += strlen($stream->read(16 * 1024));
        }
        static::assertSame($sizeBytes, $length);
    }

The test passes, but a typical run takes around 2.2 seconds on both Windows and Linux. I would expect >2 GB/s.

Possible Solution
Perhaps the response is internally read into small receive buffers or similar; I don't know.

@GrahamCampbell
Member

GrahamCampbell commented Apr 8, 2023

Can you replicate this in Guzzle 7? Guzzle 6 is EOL.

@GrahamCampbell
Member

Possibly the slow part here is just your choice of read size: $stream->read(16 * 1024)? Did you try setting that much higher?

@GrahamCampbell
Member

GrahamCampbell commented Apr 8, 2023

How long are the DNS lookup and connection setup taking? That may be responsible for up to ~100 ms of the timing there.

@mvorisek
Author

mvorisek commented Apr 8, 2023

I tried 10 GB and 100 GB. With 10 GB, the speed is about the same. With 100 GB I hit the PHP time limit, but the speed would probably be the same too.

With Guzzle v7.5.0 (latest), I do not see any significant performance improvement. The same result (no improvement) with a 100x larger read chunk size.

@GrahamCampbell
Member

Are you able to provide a flame graph so we can determine if this is just PHP being slow, or if we have some bottleneck on our side? Are you using the curl adapter, or the PHP streams adapter?

@mvorisek
Author

mvorisek commented Apr 8, 2023

I am on Windows, so I cannot provide a flame graph. On Linux, I tested the performance using PHPUnit and GitHub Actions, so getting one would probably not be that easy...

I benchmarked the server side with wget and can get >2 GB/s consistently, so the bottleneck has to be on the client side.

Are you using the curl adapter, or the PHP streams adapter?

I have allow_url_fopen disabled, so probably the curl one (which is enabled). The Guzzle client is created like: new \GuzzleHttp\Client().

@mvorisek
Author

Are you able to reproduce it on your side? Is there anything that can be improved in this lib/php-src curl/streams?

@GrahamCampbell
Member

I have not had any time to look into this. If you could get a flamegraph going, that would be helpful.

@TimWolla
Contributor

The default curl handler eagerly retrieves the full response body into a php://temp stream before returning the response. php://temp spills over into a disk-backed file after a certain size (2 MiB by default). Thus you read the entire response body from the network, write it into a file, and then read the entire file back. You are likely bottlenecked by your disk there.

By setting 'stream' => true to use the StreamHandler instead of the CurlHandler, I achieve 654 MB/s against a local nginx without TLS. This is bottlenecked by PHP's default chunk size of 8192 bytes, resulting in quite a bit of syscall overhead. By calling stream_set_chunk_size($resource, 1024 * 1024) and raising the ->read() size accordingly, I was able to achieve 760 MB/s on a 2.4 GHz CPU.
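The php://temp spill-over behavior described above can be observed directly. This is an illustrative standalone sketch (not from the thread), using the maxmemory suffix to make the threshold explicit:

```php
<?php
// php://temp buffers in memory up to a threshold (2 MiB by default) and then
// transparently spills to a temporary file on disk. The maxmemory suffix
// sets the threshold explicitly (1 MiB here, so the write below spills).
$fh = fopen('php://temp/maxmemory:' . (1024 * 1024), 'r+');
fwrite($fh, str_repeat('x', 2 * 1024 * 1024)); // 2 MiB: exceeds the threshold
rewind($fh);
$length = strlen(stream_get_contents($fh));
fclose($fh);
echo $length, "\n"; // 2097152
```

The spill is transparent to the reader, which is exactly why the extra disk round-trip is easy to miss when profiling.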

@mvorisek
Author

mvorisek commented May 22, 2023

In atk4/ui project we have huge data streaming test: https://github.com/atk4/ui/blob/9ba705b3ced0b99b31b8378b8fdd33774d0647ca/tests/DemosTest.php#L347

I have modified the class to use http by modifying these methods:

    protected function getClient(): Client
    {
        return new Client(['base_uri' => 'http://localhost/vendor/atk4/ui/']);
    }

    protected function getResponseFromRequest(string $path, array $options = []): ResponseInterface
    {
        $options['stream'] = true;
        ...


    public function testHugeOutputStream(): void
    {
        $sizeMb = 5_000; // 5 GB
        ...

With $options['stream'] = true; it seems something is still transferred via the disk, as disk writes spike to >200 MB/s. And the test fails :(.

@TimWolla can you please test this with the atk4/ui test?

Also as allow_url_fopen is very often disabled, can streaming support be added to curl?

@TimWolla
Contributor

can you please test this with the atk4/ui test?

I'm not going to debug your non-trivial code-base for you. I've provided the necessary pointers, the rest is up to you.

@mvorisek
Author

By setting 'stream' => true to use the StreamHandler, instead of the CurlHandler

The different output surprised me.

Thank you. As I wrote above, I need to use curl anyway, as allow_url_fopen cannot be enabled. So this is a feature request to add streaming support to the curl handler. This library is quite popular, and there are many use cases where streaming support is needed, or at least helpful, as caching through disk is definitely not wanted for repeated large data transfers.

@mvorisek mvorisek changed the title HTTP download speed is limited to about 500 MB/s Add curl handler streaming support May 23, 2023
@mvorisek
Author

With CURLOPT_WRITEFUNCTION it should be possible - example usage: https://stackoverflow.com/questions/2294344/what-for-do-we-use-curlopt-writefunction-in-phps-curl

@mvorisek
Author

And the test fails :(

Ignore this comment - I was using a test server with a limited response body size.

With CURLOPT_WRITEFUNCTION it should be possible - example usage: https://stackoverflow.com/questions/2294344/what-for-do-we-use-curlopt-writefunction-in-phps-curl

The current implementation already uses it - https://github.com/guzzle/guzzle/blob/7.7.0/src/Handler/CurlFactory.php#L413 - so my question is: what is blocking the curl implementation from streaming without an intermediate file?
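For reference, a minimal sketch of direct streaming with raw ext-curl via CURLOPT_WRITEFUNCTION (this is not Guzzle's code; the curlStreamTo helper name is made up for illustration):

```php
<?php
// Sketch: CURLOPT_WRITEFUNCTION receives each body chunk as it arrives over
// the wire, so data can be forwarded to any destination stream without an
// intermediate temp file.
function curlStreamTo(string $url, $sink): int
{
    $ch = curl_init($url);
    $total = 0;
    curl_setopt($ch, CURLOPT_WRITEFUNCTION, function ($ch, string $chunk) use ($sink, &$total): int {
        $written = fwrite($sink, $chunk);
        $total += $written;

        return $written; // returning less than strlen($chunk) aborts the transfer
    });
    curl_exec($ch);
    curl_close($ch);

    return $total;
}
```

Usage would be something like curlStreamTo('http://127.0.0.1:9687/server.php', fopen('php://output', 'w')), with the URL being illustrative.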

@Xaelp

Xaelp commented Aug 24, 2023

In case using cURL is not mandatory for you, here is an alternative:

The stream option doesn't seem to be supported by Guzzle's default cURL handler.

I was able to prevent caching responses as temporary files by configuring it as follows:

use GuzzleHttp\Handler\StreamHandler;
use GuzzleHttp\RequestOptions;
use GuzzleHttp\HandlerStack;
use GuzzleHttp\Client;
// ...

$streamHandler = new StreamHandler(); // note: StreamHandler, not the default CurlHandler
$stack = HandlerStack::create($streamHandler);

$client = new Client([
    // ... (here you can add other options)
    RequestOptions::STREAM => true,
    'handler' => $stack,
]);
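With such a client, the body can then be consumed incrementally. A hedged usage sketch, assuming a server like the server.php above is running (the URL and chunk size are illustrative, and this needs a live server plus Guzzle's autoloader):

```php
<?php
require __DIR__ . '/vendor/autoload.php';

use GuzzleHttp\Client;
use GuzzleHttp\RequestOptions;

// Assumes the server.php from above is being served at this address.
$client = new Client([RequestOptions::STREAM => true]);
$response = $client->request('GET', 'http://127.0.0.1:9687/server.php');

$body = $response->getBody();
while (!$body->eof()) {
    $chunk = $body->read(256 * 1024);
    // ... process $chunk here without buffering the whole response
}
```

Each read() returns data as it arrives from the socket, so memory usage stays bounded by the chunk size rather than the response size.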


stale bot commented Mar 17, 2024

This issue has been automatically marked as stale because it has not had recent activity. It will be closed after 2 weeks if no further activity occurs. Thank you for your contributions.

@stale stale bot added lifecycle/stale No activity for a long time and removed lifecycle/stale No activity for a long time labels Mar 17, 2024