
read larger chunks from sftp in stream wrapper #1452

Open · wants to merge 3 commits into base: master
Conversation

@icewind1991

This speeds up reading from the stream wrapper by about 5x in my local testing; increasing the chunk size further doesn't seem to make a difference.

@bantu (Member) commented Feb 22, 2020

@icewind1991 Hi Robin. I've not been active in this project for quite some time, so please bear with me.

My understanding is that, essentially, you're introducing a buffer between (external) SFTP requests and (internal) PHP stream requests, correct? That seems to make sense to me.
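For readers following along, the buffering idea can be sketched in isolation. This is a toy class with hypothetical names, not the PR's actual code: the wrapper fetches one large chunk from the backend and then serves PHP's smaller 8 KiB read requests from memory, so ten stream reads cost one SFTP round trip instead of ten.

```php
<?php
// Minimal sketch of the read-ahead pattern under discussion (hypothetical
// names, not phpseclib code). $fetch stands in for the backend request,
// e.g. an SFTP partial get at a given offset.
class ReadAheadBuffer
{
    private $buffer = '';
    private $pos = 0;
    private $readAhead;
    private $fetch; // callable(int $offset, int $length): string

    public function __construct(callable $fetch, $readAhead = 81920)
    {
        $this->fetch = $fetch;
        $this->readAhead = $readAhead;
    }

    public function read($count)
    {
        // refill with one large backend request only when the buffer runs low
        if (strlen($this->buffer) < $count) {
            $this->buffer .= call_user_func($this->fetch, $this->pos + strlen($this->buffer), $this->readAhead);
        }
        $result = substr($this->buffer, 0, $count);
        $this->buffer = substr($this->buffer, strlen($result));
        $this->pos += strlen($result);
        return $result;
    }
}

// usage: count backend requests to see the effect
$data = str_repeat('x', 100000);
$calls = 0;
$reader = new ReadAheadBuffer(function ($offset, $length) use ($data, &$calls) {
    $calls++;
    return substr($data, $offset, $length);
});
$first = $reader->read(8192); // one 80 KiB fetch now covers ten 8 KiB reads
```

With an 80 KiB read-ahead, the first 8 KiB read triggers a single fetch and the next nine reads are served entirely from the buffer.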

if (strlen($this->buffer) < $count) {
// read 80k chunks from sftp even though php will only request 8k chunk
// this allows us to minimize the overhead of making sftp requests
$chunk = $this->sftp->get($this->path, false, $this->pos, 81920);

Can we make the 81920 a parameter? I think there is a mechanism for that on stream construction.


I think what bantu is talking about is this:

<?php
require __DIR__ . '/vendor/autoload.php';

use phpseclib3\Net\SFTP\Stream;
use phpseclib3\Net\SFTP;

Stream::register();

$sftp = new SFTP('website.com');
$sftp->login('username', 'password');

$context = [
    'sftp' => ['sftp' => $sftp]
];
$context = stream_context_create($context);

$fp = fopen('sftp://dummy.com/home/vagrant/1mb', 'r', false, $context);
echo fread($fp, 10);

Other options are discussed at https://www.php.net/manual/en/wrappers.ssh2.php


Yes, "stream context" is what I was referring to.

@terrafrost (Member)

For the most common use case this does indeed appear to speed things up a good amount. A much less common use case, however, would be slowed down by this, e.g. randomly seeking / reading 1kb chunks of a file.

That said, I noticed you targeted the master branch with this PR. Maybe 3.0 might not be a bad idea? I say that because long term I do want to do something like #655 (comment) , which could very well eliminate any speedups your PR provides.

}

$result = substr($this->buffer, 0, $count);
$this->buffer = substr($this->buffer, $count);

This is nitpicky but... you could do $result = \phpseclib3\Common\Functions\Strings::shift($this->buffer, $count) instead. Also, I'd do $this->buffer.= $chunk instead of $this->buffer .= $chunk.
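For context, Strings::shift removes and returns the leading bytes of a string passed by reference, which collapses the two substr calls above into one helper. A minimal stand-in (a sketch of the semantics, not the library's exact source) looks like:

```php
<?php
// Sketch of the behavior of \phpseclib3\Common\Functions\Strings::shift:
// remove and return the first $length bytes of $string, mutating it in place.
function shift(&$string, $length = 1)
{
    $substr = substr($string, 0, $length);
    $string = substr($string, $length);
    return $substr;
}

$buffer = 'abcdef';
$result = shift($buffer, 4); // $result is "abcd", $buffer is now "ef"
```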

@bantu (Member) commented Feb 23, 2020

What if

  • multiple chunks are loaded into the buffer
  • the first one is read from the stream (i.e. buffer), modified, and written back to the underlying file of the stream at chunk 2

Does the stream still behave in accordance with its contract? Pre-patch, it would probably return the newly written data at chunk 2; post-patch, it would return stale data from the buffer.
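The hazard can be made concrete with a toy model (plain strings standing in for the remote file and the buffer; not phpseclib code):

```php
<?php
// Toy model of the staleness hazard: the read-ahead buffer caches chunk 2,
// then the underlying "file" is modified at chunk 2, so the buffer is stale.
$chunkSize = 4;
$file = 'AAAA' . 'BBBB' . 'CCCC'; // chunks 1, 2, 3

// read-ahead: chunks 1 and 2 are fetched into the buffer in one request
$buffer = substr($file, 0, 2 * $chunkSize); // "AAAABBBB"

// consume chunk 1 from the buffer, modify it, write it back at chunk 2
$chunk1 = substr($buffer, 0, $chunkSize);
$buffer = substr($buffer, $chunkSize);
$file = substr_replace($file, strtolower($chunk1), $chunkSize, $chunkSize);

// pre-patch behavior: a fresh read at chunk 2 sees the new data
$fresh = substr($file, $chunkSize, $chunkSize);  // "aaaa"
// post-patch behavior: the buffer still serves the old chunk 2
$stale = substr($buffer, 0, $chunkSize);         // "BBBB"
```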

@terrafrost (Member)

@bantu - good points. Maybe it'd just be easier if I implemented the ideas discussed in #655 (comment) . I could do that in the 1.0 / 2.0 / 3.0 / master branches. That's not a BC breaking change, after all.

@bantu (Member) commented Feb 27, 2020

@terrafrost Maybe call the parameter something like "readahead length" and make it optional. This way Robin can use it for his use case. If the parameter is not supplied, it should behave as before.

@icewind1991 What do you think?
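Concretely, the opt-in might look something like this from the caller's side. The 'readahead' option name is made up for illustration (only the 'sftp' => $sftp entry exists today), and the SFTP object is faked so the sketch is self-contained:

```php
<?php
// Sketch of bantu's suggestion: an optional read-ahead length supplied via
// the stream context. The 'readahead' key is hypothetical, not an existing
// phpseclib option; when absent, the wrapper keeps the old behavior.
$sftp = new stdClass(); // stands in for a logged-in \phpseclib3\Net\SFTP object

$context = stream_context_create([
    'sftp' => [
        'sftp'      => $sftp,
        'readahead' => 81920, // opt in to 80 KiB read-ahead
    ],
]);

// inside the wrapper, the option could be read back like this:
$options = stream_context_get_options($context);
$readAhead = isset($options['sftp']['readahead'])
    ? $options['sftp']['readahead']
    : 0; // 0 => one backend request per read, exactly as before
```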

@icewind1991 (Author)

Made the readahead length configurable and added cache-invalidation logic to prevent reading stale chunks.

@bantu (Member) commented Mar 2, 2020

added cache invalidation logic to prevent reading stale chunks

In the buffering problem I described in #1452 (comment), I was (implicitly) considering two independent instances accessing the file. Your additional changes do not work in that case. Hence my suggestion to leave the behavior unchanged when the readahead buffer length is not supplied.

Comment on lines +123 to +125
* @access public
*/
private $buffer = '';


Should this say @access private?

@terrafrost (Member)

Here's my idea for an alternate way to deal with this:

https://github.com/terrafrost/phpseclib/tree/1.0-sftp-file-handling

I'm thinking that this - and making nlist and rawlist return an ArrayObject instead of an actual array (see #1418) - could go into new versions of phpseclib: 1.1.0, 2.1.0 and 3.1.0.

What I've done can be used as a starting point. If no one else wants to take this task on I can do so in the fullness of time.

4 participants