
readTextFile will only read 16KB of data per tick of event loop #9149

Closed
DavidDeSimone opened this issue Jan 18, 2021 · 2 comments

@DavidDeSimone (Contributor) commented Jan 18, 2021

There is an outstanding issue in Tokio (see tokio-rs/tokio#1976) where a call to read an AsyncFile into a buffer will read at most 16KB per call, no matter how much data is available or how large the buffer passed into the function is. Based on my initial testing, this can cause inferior performance compared to reading up to the capacity of a buffer using the logic in buffer.js:

async readFrom(r) {
  let n = 0;
  const tmp = new Uint8Array(MIN_READ);
  while (true) {
    const shouldGrow = this.capacity - this.length < MIN_READ;
    // read into tmp buffer if there's not enough room
    // otherwise read directly into the internal buffer
    const buf = shouldGrow
      ? tmp
      : new Uint8Array(this.#buf.buffer, this.length);
    const nread = await r.read(buf);
    if (nread === null) {
      return n;
    }
    // write will grow if needed
    if (shouldGrow) this.writeSync(buf.subarray(0, nread));
    else this.#reslice(this.length + nread);
    n += nread;
  }
}

If you log nread per loop iteration for a large file, you will see it is always 16KB. If you log nread in the synchronous version of this function for the same large file, you will see that nread grows as the resize logic increases the buffer size.
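The effect described above can be sketched with a simplified synchronous model. `CappedReader` below is a hypothetical mock standing in for the Tokio-backed file resource, not Deno's actual implementation; it only reproduces the 16KB-per-call cap:

```javascript
// Hypothetical mock of a reader that never returns more than 16 KB per call,
// regardless of how large the destination buffer is.
const CHUNK_LIMIT = 16 * 1024;

class CappedReader {
  constructor(totalBytes) {
    this.remaining = totalBytes;
  }
  // Mirrors the read contract: returns bytes read, or null at EOF.
  read(buf) {
    if (this.remaining === 0) return null;
    const n = Math.min(buf.length, CHUNK_LIMIT, this.remaining);
    this.remaining -= n;
    return n;
  }
}

// Even with a 1 MB destination buffer, each read caps out at 16 KB,
// so draining 1 MB takes 64 round trips through the read loop.
const reader = new CappedReader(1024 * 1024);
const buf = new Uint8Array(1024 * 1024);
const reads = [];
let nread;
while ((nread = reader.read(buf)) !== null) reads.push(nread);

console.log(reads.length); // 64
console.log(Math.max(...reads)); // 16384
```

For a 110MB file this works out to roughly 7,000 awaited reads, which is where the overhead comes from.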

My proposed solution is to modify the read logic in io.rs to explicitly keep reading until we have filled the buffer or reached the end of the stream. Below is a diff for testing:

diff --git a/runtime/ops/io.rs b/runtime/ops/io.rs
index 607da74c..75f3a699 100644
--- a/runtime/ops/io.rs
+++ b/runtime/ops/io.rs
@@ -498,6 +498,15 @@ pub fn op_read(
           stream.read(&mut zero_copy).await?
         } else if let Some(stream) = resource.downcast_rc::<StreamResource>() {
-          stream.clone().read(&mut zero_copy).await?
+          let mut total_bytes_read = 0;
+          let len = zero_copy.len();
+          let mut slice = &mut zero_copy[0..len];
+          while total_bytes_read < len {
+            let read = stream.clone().read(&mut slice).await?;
+            if read == 0 { break; }
+            total_bytes_read += read;
+            slice = &mut zero_copy[total_bytes_read..len];
+          }
+          total_bytes_read
         } else {
           return Err(bad_resource_id());
         };
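The loop the patch adds is a standard read-until-full pattern. Here is the same logic as a simplified synchronous JS model (the real change is in Rust; `readCapped` is a hypothetical stand-in for a stream read that returns at most 16KB per call):

```javascript
// Read repeatedly into the unfilled tail of `dest` until it is full
// or the source reports EOF -- the shape of the io.rs patch above.
function readFull(readCapped, dest) {
  let total = 0;
  while (total < dest.length) {
    const n = readCapped(dest.subarray(total)); // advance past filled bytes
    if (n === null || n === 0) break; // EOF: stop early
    total += n;
  }
  return total;
}

// Mock source holding 100 KB, capped at 16 KB per call.
let remaining = 100 * 1024;
const readCapped = (buf) => {
  const n = Math.min(buf.length, 16 * 1024, remaining);
  remaining -= n;
  return n === 0 ? null : n;
};

const dest = new Uint8Array(64 * 1024);
const got = readFull(readCapped, dest);
console.log(got); // 65536: the whole 64 KB buffer filled in one op
```

The key detail (and the bug fixed in the updated patch) is re-slicing from `total` on each iteration, so later chunks land after earlier ones instead of overwriting them.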

Using the following mini-benchmark, I am able to see a performance improvement for reading large files multiple times. Output is in milliseconds.

user@ubuntu:~/proj/ng-async-files$ deno run --allow-read benchmark.js
Starting...
3046
user@ubuntu:~/proj/ng-async-files$ deno run --allow-read benchmark.js
Starting...
2899
user@ubuntu:~/proj/ng-async-files$ deno run --allow-read benchmark.js
Starting...
2980
user@ubuntu:~/proj/ng-async-files$ ../deno/target/release/deno run --allow-read benchmark.js      
Starting...
2462
user@ubuntu:~/proj/ng-async-files$ ../deno/target/release/deno run --allow-read benchmark.js
Starting...
2444
user@ubuntu:~/proj/ng-async-files$ ../deno/target/release/deno run --allow-read benchmark.js
Starting...
2458
user@ubuntu:~/proj/ng-async-files$ 

deno is the latest release, downloaded today, and ../deno/target/release/deno is a build with this patch applied. The benchmark code is as follows; large-file is a 110MB plain-text file that I generated at random prior to starting the test:

let promises = [];
const now = Date.now();
for (let i = 0; i < 8; ++i) {
    promises.push(Deno.readTextFile("./large-file"));
}

Promise.all(promises)
    .then(() => console.log(Date.now() - now));

If the core maintainers are interested in me submitting this patch as a PR, I would be more than happy to. If there are additional tests or benchmarks they would like me to run, I would be happy to do that as well.

@DavidDeSimone (Contributor, Author) commented:

I found a flaw in my initial patch: I wasn't incrementing the index of the buffer that the data is inserted into. I've updated the patch and benchmarks in the initial issue.

@DavidDeSimone (Contributor, Author) commented:

After discussion on chat with some contributors, I am going to close this issue. While it's possible to improve performance via this patch, it would be a breaking change to existing behavior for anything that implements a stream in which a read can return less data than the size of your buffer (think a slow UDP stream). Even limiting the change to files, it would alter behavior for things like reading a single character from stdin when your buffer is larger than one character (see raw_mode.ts).
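The breakage can be sketched with a mock message-oriented source (`readMessage` below is hypothetical, standing in for a datagram socket or raw-mode stdin). Callers of such sources rely on read returning as soon as some data arrives, even when the buffer has room:

```javascript
// Two one-byte "messages": a keypress "q", then a newline.
const messages = [Uint8Array.of(113), Uint8Array.of(10)];
let i = 0;
const readMessage = (buf) => {
  if (i >= messages.length) return null; // EOF
  const msg = messages[i++];
  buf.set(msg);
  return msg.length;
};

const inBuf = new Uint8Array(1024);

// Current behavior: one read returns one message -- the caller
// sees the "q" keypress immediately.
const first = readMessage(inBuf);

// Patched behavior: the fill loop keeps reading until the buffer is
// full or EOF, silently coalescing both messages into one result.
let total = first;
let n;
while (total < inBuf.length && (n = readMessage(inBuf.subarray(total))) !== null) {
  total += n;
}

console.log(first); // 1: one keypress per read
console.log(total); // 2: both messages merged, boundaries lost
```

The per-message boundary is exactly what the fill loop destroys, which is why the optimization cannot be applied to the generic read op.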
