Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(archive): An implementation of Tar as streams #4548

Open
wants to merge 33 commits into
base: main
Choose a base branch
from

Conversation

BlackAsLight
Copy link
Contributor

Following on from this pull request, #4538, I didn't know how to revert the changes on that branch so just made a new one since the idea is that I'm creating a new API and not replacing one.

Usage Examples

import { TarStream } from '@std/archive/tar'

await ReadableStream.from([
  {
    pathname: 'potato/'
  },
  {
    pathname: 'deno.json',
    size: (await Deno.stat('deno.json')).size,
    iterable: (await Deno.open('deno.json')).readable
  },
  {
    pathname: 'deno.lock',
    size: (await Deno.stat('deno.lock')).size,
    iterable: (await Deno.open('deno.lock')).readable
  }
])
  .pipeThrough(new TarStream())
  .pipeThrough(new CompressionStream('gzip'))
  .pipeTo((await Deno.create('./out.tar.gz)).writable)
import { UnTarStream } from '@std/archive/tar'

for await (
  const entry of (await Deno.open('./out.tar.gz'))
    .readable
    .pipeThrough(new DecompressionStream('gzip'))
    .pipeThrough(new UnTarStream())
) {
  console.log(entry.pathname)
  entry
    .readable?
    .pipeTo((await Deno.create(entry.pathname)).writable)
}

@iuioiua iuioiua requested a review from crowlKats April 4, 2024 21:58
@iuioiua
Copy link
Collaborator

iuioiua commented Apr 8, 2024

From a quick skim, this PR shows promise, but there are opportunities to simplify and make the implementation more straightforward to understand. @crowlKats, can you see this as a viable alternative to #1985? If so, I'll go ahead and help polish this up.

Copy link
Member

@crowlKats crowlKats left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason why this PR is wroking and mine is not is because this one does not use byte streams. Not sure if this should be a blocking action or not, but its definitively a downgrade when comparing to the non-stream API and doesnt line up with other streams we do in CLI that do file or io handling.

That aside, as I discussed with @kt3k, we are a bit concerned that since this PR implements tar from scratch, this will require a lot more exhaustive & thorough tests

archive/tar_stream.ts Outdated Show resolved Hide resolved
archive/tar_stream.ts Outdated Show resolved Hide resolved
archive/tar_stream.ts Outdated Show resolved Hide resolved
archive/tar_stream.ts Show resolved Hide resolved
archive/untar_stream.ts Outdated Show resolved Hide resolved
archive/untar_stream.ts Outdated Show resolved Hide resolved
archive/untar_stream.ts Show resolved Hide resolved
@BlackAsLight
Copy link
Contributor Author

... when comparing to the non-stream API and doesnt line up with other streams we do in CLI that do file or io handling.

With regards to this comment, I'd have assumed there was an intention to remove all Deno references, making it compatible with other runtimes and therefore not offer file or io handling.

Although that was 2y ago now so things may have changed.

@crowlKats
Copy link
Member

that has nothing to do with what I have said. I am talking about streams behaviour being consistent across similar purpouses, in this case being byte streams instead of normal streams

@iuioiua
Copy link
Collaborator

iuioiua commented Apr 10, 2024

@BlackAsLight, we're still determining whether to pursue this PR or #1985. We'll come back to you once a decision has been made.

Copy link

codecov bot commented Apr 12, 2024

Codecov Report

Attention: Patch coverage is 63.54916% with 152 lines in your changes are missing coverage. Please review.

Project coverage is 91.13%. Comparing base (8c87b7f) to head (dacaea5).

Files Patch % Lines
archive/tar_stream.ts 54.59% 83 Missing and 1 partial ⚠️
archive/untar_stream.ts 70.68% 68 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #4548      +/-   ##
==========================================
- Coverage   91.44%   91.13%   -0.31%     
==========================================
  Files         480      482       +2     
  Lines       37324    37741     +417     
  Branches     5320     5391      +71     
==========================================
+ Hits        34132    34397     +265     
- Misses       3136     3287     +151     
- Partials       56       57       +1     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@BlackAsLight
Copy link
Contributor Author

Looking into what bytes streams are and how they work, I do think I could implement them into both streams. At least for when they're serving content out. I don't think I'd run into the same problem you're facing in implementing byte streams with it closing early or something. Although I am yet to actually test it.

On a side note the byte streams examples on MDN make use of both the byteRequest and enqueue for pushing, but my implementation only works on a pulling method so it would only ever be doing one or the other based off what the user provided in the options.

@BlackAsLight
Copy link
Contributor Author

I don't really understand why, but for some reason when I added byte support for TarStream it consumed the text variable from the tests instead of copying them. Even when { mode: 'byob' } wasn't provided.

Adding byte support for the UnTarStream does seem doable from my understanding of it, but will be more complex.

Even though you haven't decided if you want to pursue my implementation or not. I'm still going to try and make the improvements in the mean time as I have the time.

@kt3k kt3k changed the title refactor(archive): An implementation of Tar as streams feat(archive): An implementation of Tar as streams Apr 22, 2024
@BlackAsLight
Copy link
Contributor Author

After some playing around with this code, it seems to be occasionally and inconsistently creating bad tarballs with the same information provided. The UnTarStream class can correctly decode the bad tarball, but the tar implementation on my computer seems to sometimes fail to expand it or extract rubbish from it.

@BlackAsLight
Copy link
Contributor Author

BlackAsLight commented Apr 23, 2024

Anyone know why the Ubuntu environment produces a different result? I suspect its from one of these positions when a byobRequest is made but there is no more information to send.

Seems it was a problem with Github Action's Ubuntu environment.

Added parsePathname function abstracting the logic out of TarStream allowing the developer to validate pathnames before providing them to TarStream hoping it doesn't throw an error and require the archive creation to start all over again.
When the appending file was exactly divisible by 512 bytes, an extra 512 bytes was being appending instead of zero to fill in the gap, causing the next header to be read at the wrong place.
Instead of using enqueue, the leftover bytes are saved for later for the next buffer provided.
@BlackAsLight
Copy link
Contributor Author

Looking at the original issue, #1658, they suggested the names TarEncoderStream and TarDecoderStream, would these be better names than TarStream and UnTarStream?

@iuioiua
Copy link
Collaborator

iuioiua commented May 13, 2024

I think TarStream and UntarStream (not UnTarStream) are best.

@codecov-commenter
Copy link

codecov-commenter commented May 24, 2024

Codecov Report

Attention: Patch coverage is 68.02575% with 149 lines in your changes are missing coverage. Please review.

Project coverage is 91.86%. Comparing base (f94f8f3) to head (b04dd13).

Files Patch % Lines
archive/tar_stream.ts 62.55% 72 Missing and 10 partials ⚠️
archive/untar_stream.ts 72.87% 67 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #4548      +/-   ##
==========================================
- Coverage   92.15%   91.86%   -0.29%     
==========================================
  Files         488      490       +2     
  Lines       38873    39339     +466     
  Branches     5405     5486      +81     
==========================================
+ Hits        35823    36140     +317     
- Misses       2994     3133     +139     
- Partials       56       66      +10     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants