fs: Data loss while retrying `File::flush` when disk is full #6325

Xuanwo · 2024-02-04T10:06:37Z

Version

:) cargo tree | grep tokio
│   │   └── tokio v1.36.0
│   │       └── tokio-macros v2.2.0 (proc-macro)
│   │   │   │   ├── tokio v1.36.0 (*)
│   │   │   │   ├── tokio-util v0.7.10
│   │   │   │   │   ├── tokio v1.36.0 (*)
│   │   │   │   ├── tokio v1.36.0 (*)
│   │   │   │   ├── tokio v1.36.0 (*)
│   │   │   │   └── tokio-rustls v0.24.1
│   │   │   │       └── tokio v1.36.0 (*)
│   │   │   ├── tokio v1.36.0 (*)
│   │   │   ├── tokio-rustls v0.24.1 (*)
│   │   │   ├── tokio-util v0.7.10 (*)
│   ├── tokio v1.36.0 (*)
├── tokio v1.36.0 (*)

Platform

Linux xuanwo-work 6.7.3-zen1-1-zen #1 ZEN SMP PREEMPT_DYNAMIC Thu, 01 Feb 2024 10:30:25 +0000 x86_64 GNU/Linux

Description

While addressing apache/opendal#4058, we discovered that retrying File::flush while the disk is full can result in data loss.

To reproduce:

Set up a small fs:

fallocate -l 512K disk.img
mkfs disk.img

mkdir /tmp/test_dir
sudo mount -o loop disk.img /tmp/test_dir

sudo chmod a+wr /tmp/test_dir

Now we have a fs that only has 512K.

Run the following code:

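use std::env;
// `Result<()>` below needs a one-parameter alias; std::io::Result is assumed here.
use std::io::Result;

use rand::prelude::*;
use tokio::io::AsyncWriteExt;
use tracing_subscriber;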

#[tokio::main]
async fn main() -> Result<()> {
    tracing_subscriber::fmt::init();

    let path = &env::var("OPENDAL_FS_ROOT").expect("root must be set for this test");

    let mut f = tokio::fs::OpenOptions::new()
        .create(true)
        .write(true)
        .open(format!("{path}/test"))
        .await
        .unwrap();

    let size = thread_rng().gen_range(512 * 1024 + 1..4 * 1024 * 1024);
    let mut bs = vec![0; size];
    thread_rng().fill_bytes(&mut bs);

    f.write(&bs).await.unwrap();

    let res = f.flush().await;
    dbg!(&res);

    // After some operations, we retry the file flush.
    let res = f.flush().await;
    dbg!(&res);

    Ok(())
}

The full code example can be found at apache/opendal#4141. I removed the OpenDAL-related code to make this example more readable.

The output is:

    Finished dev [unoptimized + debuginfo] target(s) in 0.79s
     Running `/home/xuanwo/Code/apache/opendal/core/target/debug/edge_file_close_with_retry_on_full_disk`
[edge/file_close_with_retry_on_full_disk/src/main.rs:48] &res = Err(
    Os {
        code: 28,
        kind: StorageFull,
        message: "No space left on device",
    },
)
[edge/file_close_with_retry_on_full_disk/src/main.rs:52] &res = Ok(
    (),
)

The first time, flush returns StorageFull, which is expected. But the second time, the same flush call returns Ok.

I expected to see a StorageFull error instead.

The key problem here is:

Users can't recover from this error, even if they try removing other files. The previous write operation returned Ok.
Retrying the flush operation is permitted but risky. Data written may be lost forever once flush returns Ok.

Based on the code here:

tokio/tokio/src/fs/file.rs

Lines 887 to 906 in 63caced

    
    fn poll_flush(&mut self, cx: &mut Context<'_>) -> Poll<Result<(), io::Error>> {
        if let Some(e) = self.last_write_err.take() {
            return Poll::Ready(Err(e.into()));
        }

        let (op, buf) = match self.state {
            State::Idle(_) => return Poll::Ready(Ok(())),
            State::Busy(ref mut rx) => ready!(Pin::new(rx).poll(cx))?,
        };

        // The buffer is not used here
        self.state = State::Idle(Some(buf));

        match op {
            Operation::Read(_) => Poll::Ready(Ok(())),
            Operation::Write(res) => Poll::Ready(res),
            Operation::Seek(_) => Poll::Ready(Ok(())),
        }
    }
}

Maybe we should:

Perform the flush (write) again if we still have buf?
Update the last_write_err if the write operation failed?

I'm willing to work on a fix.

carllerche · 2024-02-06T19:15:42Z

I'm a bit confused. I'm less familiar w/ the details of FS ops.

The key problem here is:

Users can't recover from this error, even if they try removing other files. The previous write operation returned Ok.

Retrying the flush operation is permitted but risky. Data written may be lost forever once flush returns Ok.

Is the key problem w/ Tokio's impl, or w/ getting an err when calling flush in general?

Maybe you could explain how to handle StorageFull and flush w/ blocking std calls and where converting that blocking code to Tokio's fs api fails.

Xuanwo · 2024-02-07T03:12:22Z

I believe it's more like an issue of Tokio's implementation, which doesn't pass the write error on to the user.

Maybe you could explain how to handle StorageFull and flush w/ blocking std calls and where converting that blocking code to Tokio's fs api fails.

There is no direct mapping from std::fs::File::flush to tokio::fs::File::flush. On Linux, std::fs::File::flush is a no-op, while tokio::fs::File::flush drives its internal buffer logic.

I prepared a full repro here: https://github.com/Xuanwo/tokio-issue-6325-storage-full

let n = f.write(&bs).await?;
dbg!(&n);
assert_eq!(n, size, "tokio file always write data into buffer first");

When we call write on a file, tokio stores the data in its internal buf directly. After flush returns the write error, the buffer has already been cleared, so the data is never written again.
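To make the mechanism concrete, here is a minimal sketch in plain std Rust, without tokio. `FullDisk` and `BufferedFile` are hypothetical names made up for this illustration, and the "full disk" is simulated with a fixed capacity and a generic error rather than a real ENOSPC. It only models the behavior described above (buffer first, clear the buffer after a failed flush) and is not tokio's actual code.

```rust
use std::io::{self, Write};

/// Hypothetical stand-in for a full disk: accepts `capacity` bytes, then fails.
struct FullDisk {
    capacity: usize,
    stored: Vec<u8>,
}

impl Write for FullDisk {
    fn write(&mut self, data: &[u8]) -> io::Result<usize> {
        let free = self.capacity - self.stored.len();
        if free == 0 && !data.is_empty() {
            // Simulated "No space left on device".
            return Err(io::Error::new(io::ErrorKind::Other, "No space left on device"));
        }
        let n = free.min(data.len());
        self.stored.extend_from_slice(&data[..n]);
        Ok(n)
    }

    fn flush(&mut self) -> io::Result<()> {
        Ok(())
    }
}

/// Toy model of a file that buffers on write and only touches the disk on flush.
struct BufferedFile {
    inner: FullDisk,
    buf: Vec<u8>,
}

impl BufferedFile {
    /// Always reports the full length: the data only reaches the in-memory buffer.
    fn write(&mut self, data: &[u8]) -> usize {
        self.buf.extend_from_slice(data);
        data.len()
    }

    /// Writes the buffer out and clears it whether or not the write succeeded.
    fn flush(&mut self) -> io::Result<()> {
        let res = self.inner.write_all(&self.buf);
        self.buf.clear();
        res
    }
}

fn main() {
    let disk = FullDisk { capacity: 4, stored: Vec::new() };
    let mut f = BufferedFile { inner: disk, buf: Vec::new() };

    let accepted = f.write(b"0123456789");
    let first = f.flush(); // fails: only 4 of 10 bytes fit
    let second = f.flush(); // succeeds: the buffer is already empty

    println!("accepted = {accepted}");
    println!("first flush = {first:?}");
    println!("second flush = {second:?}");
    println!("bytes on disk = {}", f.inner.stored.len());
}
```

In this model the second flush has nothing left to write, so it reports success even though 6 of the 10 accepted bytes never reached the simulated disk.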

The same repro doesn't work with std::fs, since std::fs returns the actual number of bytes written from f.write(). The user gets the error when trying to write more data.
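For comparison, a rough sketch of the blocking pattern, again against a simulated disk rather than a real filesystem. `FullDisk` and `write_until_full` are hypothetical helpers for illustration only. The point is that with direct writes the caller sees the short write count and the error at the write call, and still owns the unwritten tail.

```rust
use std::io::{self, Write};

/// Hypothetical stand-in for a full disk: accepts `free` more bytes, then fails.
struct FullDisk {
    free: usize,
}

impl Write for FullDisk {
    fn write(&mut self, data: &[u8]) -> io::Result<usize> {
        if self.free == 0 && !data.is_empty() {
            // Simulated "No space left on device".
            return Err(io::Error::new(io::ErrorKind::Other, "No space left on device"));
        }
        let n = self.free.min(data.len());
        self.free -= n;
        Ok(n)
    }

    fn flush(&mut self) -> io::Result<()> {
        Ok(())
    }
}

/// Writes as much of `data` as possible and reports how far it got,
/// together with the error that stopped it (if any).
fn write_until_full<W: Write>(w: &mut W, data: &[u8]) -> (usize, Option<io::Error>) {
    let mut written = 0;
    while written < data.len() {
        match w.write(&data[written..]) {
            Ok(0) => break,
            Ok(n) => written += n,
            Err(e) if e.kind() == io::ErrorKind::Interrupted => continue,
            Err(e) => return (written, Some(e)),
        }
    }
    (written, None)
}

fn main() {
    let mut disk = FullDisk { free: 4 };
    let data = b"0123456789";

    let (written, err) = write_until_full(&mut disk, data);
    println!("written = {written}, error = {err:?}");

    // The caller still holds `data[written..]` and can retry it after freeing space.
    println!("bytes still pending = {}", data.len() - written);
}
```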

Xuanwo · 2024-02-07T05:44:22Z

While implementing #6330, I found that tokio clears the buffer when an error happens during write.

tokio/tokio/src/io/blocking.rs

Lines 258 to 265 in 28d88cc

    
    pub(crate) fn write_to<T: Write>(&mut self, wr: &mut T) -> io::Result<()> {
        assert_eq!(self.pos, 0);

        // `write_all` already ignores interrupts
        let res = wr.write_all(&self.buf);
        self.buf.clear();
        res
    }

I'm guessing we need to maintain the internal state here instead of dropping all the data?
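One possible shape for that, as a rough sketch only: track how many bytes have been written and keep the remainder when a write fails, so a later retry can resume. `RetainingBuf`, its `data` and `written` fields, and `FullDisk` are hypothetical names for this illustration. This is not the code from #6330 and it ignores how the real buffer is shared with reads.

```rust
use std::io::{self, Write};

/// Hypothetical stand-in for a full disk: accepts `capacity` bytes, then fails.
struct FullDisk {
    capacity: usize,
    stored: Vec<u8>,
}

impl Write for FullDisk {
    fn write(&mut self, data: &[u8]) -> io::Result<usize> {
        let free = self.capacity - self.stored.len();
        if free == 0 && !data.is_empty() {
            // Simulated "No space left on device".
            return Err(io::Error::new(io::ErrorKind::Other, "No space left on device"));
        }
        let n = free.min(data.len());
        self.stored.extend_from_slice(&data[..n]);
        Ok(n)
    }

    fn flush(&mut self) -> io::Result<()> {
        Ok(())
    }
}

/// Hypothetical buffer that remembers its write progress across failures.
struct RetainingBuf {
    data: Vec<u8>,
    written: usize,
}

impl RetainingBuf {
    /// Writes the pending bytes; on error, `data` and `written` are left
    /// untouched past the last successful write so the call can be retried.
    fn write_to<T: Write>(&mut self, wr: &mut T) -> io::Result<()> {
        while self.written < self.data.len() {
            match wr.write(&self.data[self.written..]) {
                Ok(0) => {
                    return Err(io::Error::new(
                        io::ErrorKind::WriteZero,
                        "failed to write whole buffer",
                    ))
                }
                Ok(n) => self.written += n,
                Err(e) if e.kind() == io::ErrorKind::Interrupted => {}
                Err(e) => return Err(e),
            }
        }
        // Everything reached the writer: only now is it safe to drop the data.
        self.data.clear();
        self.written = 0;
        Ok(())
    }
}

fn main() {
    let mut disk = FullDisk { capacity: 4, stored: Vec::new() };
    let mut buf = RetainingBuf { data: b"0123456789".to_vec(), written: 0 };

    println!("first attempt = {:?}", buf.write_to(&mut disk));
    println!("retry while full = {:?}", buf.write_to(&mut disk));

    disk.capacity = 16; // simulate the user freeing some space
    println!("retry after cleanup = {:?}", buf.write_to(&mut disk));
    println!("bytes on disk = {}", disk.stored.len());
}
```

With this shape, a retry while the disk is still full keeps failing instead of returning Ok, and a retry after space is freed writes only the remaining bytes.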

Xuanwo added A-tokio Area: The main tokio crate C-bug Category: This is a bug. labels Feb 4, 2024

Xuanwo mentioned this issue Feb 4, 2024

fix(services/fs,hdfs): fix poll_close when retry apache/opendal#4141

Closed

Darksonn added the M-fs Module: tokio/fs label Feb 4, 2024

Xuanwo linked a pull request Feb 7, 2024 that will close this issue

fs: Spawn new write task while buf is not empty #6330

Open

Xuanwo added a commit to Xuanwo/tokio-tcpstream-debug that referenced this issue Feb 7, 2024

fs: Spawn new write task while buf is not empty

bc27f6c

Fixes: tokio-rs#6325 Signed-off-by: Xuanwo <github@xuanwo.io>

Xuanwo added a commit to Xuanwo/tokio-tcpstream-debug that referenced this issue Feb 7, 2024

fs: Spawn new write task while buf is not empty

28d88cc

Fixes: tokio-rs#6325 Signed-off-by: Xuanwo <github@xuanwo.io>

Xuanwo mentioned this issue Feb 7, 2024

compio's flush is not error safe compio-rs/compio#209

Closed
