Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add a way to use Builder in append mode, that would not write empty blocks on the end of the archive #227

Open
synek317 opened this issue May 25, 2020 · 5 comments · May be fixed by #234
Open

Comments

@synek317
Copy link

I've very recently used this library to stream tar.gz output using Rocket. While I was able to do this pretty easily, I had to give up using tar::Builder only because it forces me to call finish in all destructors (e.g.drop or into_inner).

Imagine, that I have StreamedBody struct that implements Read. In read, it loads some resources (for example from the DB or file system) and archives them using tar. Since it loads only data chunk, it also has to produce a chunk of tar = header + data. There is great tar::Builder::append_data method but

  1. Builder has to live only inside Read::read, otherwise I run into borrowing issues (Builder holds a reference to a buffer. In my example, both Builder and buffer would be fields of StreamedBody which is not easily achievable)
  2. because of (1) and because Builder writes empty blocks in drop, every time a chunk is archived, the end of archive mark is written.

I think we could add tar::Builder::continuous(&mut self) that would set a flag and then in tar::Builder::finish we could only write empty blocks if continuous == false.

@alexcrichton
Copy link
Owner

Thanks for the report! I'm not really sure I fully understand what's going on here, though? Can you perhaps give an example of where this might help things?

@synek317
Copy link
Author

Sorry, it seems I really wasn't clear in the initial report. Maybe this code can explain it better:

struct StreamGzResponse {
    // some iterator over files or cursor in DB or smthg similar
}

impl std::io::Read for StreamGzResponse {
    fn read(&mut self, mut buf: &mut [u8]) -> Result<usize, std::io::Error> {
        if self.finished { return Ok(0); }

        if let Some(doc) = self.get_next_document() {
            /* append 1 tar entry to buf, but don't finish tar */
            /* it would be easies using `tar::Builder` but if I create it here,
               it will write [0;1024], which is something that will make
               the next entry not visible for tar -x for example */
            return Ok(doc_bytes.length())
        }
        
        /* write [0;1024] to buf */
        self.finished = true;
    }
}

Of course it is simplified, e.g. it assumes that buf is big enough to handle one document or even [0;1024] at once, but I hope it shows the intention.

The 'problem' is quite easy to work around but still having opt-out from finish in tar::Builder would be nicer I think.

@alexcrichton
Copy link
Owner

Hm I'm still not sure I entirely understand. Is the idea that you want to use tar::Builder but when it's destroyed you don't want to finish, but you'd rather keep writing it later? Is there no way to have tar::Builder extend its lifetime to cover the entire write operation?

@synek317
Copy link
Author

Is the idea that you want to use tar::Builder but when it's destroyed you don't want to finish, but you'd rather keep writing it later?

Yes, exactly

Is there no way to have tar::Builder extend its lifetime to cover the entire write operation?

Unless I miss something - no, at least without additional memory. tar::Builder takes underlying writer when it is created, while place, where I want to write, exists only inside std::io::Read::read.

I would need to store tar::Builder with its own buffer inside StreamGzResponse, then in read function write to that buffer to create an archive entry and finally move memory from this intermediate buffer to the target buf.

It sounds unnecessary IMO, especially that all underlying operations are exposed by the crate so I can do it efficiently already, just in a more verbose, less convenient way.

@alexcrichton
Copy link
Owner

Ah ok makes sense. Given that the tar format is suitable for simply appending anything at any time, it seems reasonable to me to add a function like this to say "don't write the trailing bytes, I'll do that later"

@synek317 synek317 linked a pull request Jun 17, 2020 that will close this issue
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants