TarInputStream.GetNextEntry hangs on corrupt archive #762

Closed
MichaelMalony opened this issue Jul 22, 2022 · 7 comments

Comments

@MichaelMalony

Steps to reproduce

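using System.Net.Http;
using System.Text;
using ICSharpCode.SharpZipLib.GZip;
using ICSharpCode.SharpZipLib.Tar;

// Download the affected archive and wrap it in gzip and tar readers.
var httpClient = new HttpClient();
var stream = await httpClient.GetStreamAsync("https://files.pythonhosted.org/packages/04/b5/fea02ac9306b5d56a5019133f4edda6c9e3cfbd1b5ce663158486ba77eea/diffpy.srfit-1.3.tar.gz");
var gzipStream = new GZipInputStream(stream);
var tarStream = new TarInputStream(gzipStream, Encoding.ASCII);

// Walk the entries until the end of the archive.
while (tarStream.CanRead)
{
  var entry = tarStream.GetNextEntry(); // this call hangs

  if (entry == null)
  {
    break;
  }
}

tarStream.Close();
gzipStream.Close();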

Expected behavior

Exception should be thrown if archive cannot be processed.

Actual behavior

GetNextEntry() call hangs.

Version of SharpZipLib

1.3.3

Obtained from

  • Package installed using NuGet

Other notes

The archive is an older Python package hosted on PyPI.

Inspecting the archive with 7-Zip shows "Unexpected end of data" for many files.

I also tried reading the HTTP response stream fully into a byte array and then reading from a memory stream, with the same result.

@piksel
Member

piksel commented Jul 23, 2022

Interesting... the archive untars correctly using gunzip + tar x, so I am not sure what the issue is.

Running it with archivediag, it looks like the end blocks are not in the expected format. I think it gets stuck because it finds blocks at the end whose specified length is zero, and keeps reading the same blocks over and over again. If that's the case, it is definitely a bug.

https://pub.p1k.se/sharpziplib/archivediag/diffpy.srfit-1.3.tar.html
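
To illustrate the kind of loop that cannot get stuck this way, here is a minimal sketch that scans a stream in 512-byte blocks, always consumes a full block per iteration, and stops at end of stream. It is not SharpZipLib's implementation; the class and method names (`TarBlockScan`, `CountNonZeroBlocks`) are hypothetical, and it only counts non-zero blocks rather than parsing headers.

```csharp
using System;
using System.IO;

static class TarBlockScan
{
    const int BlockSize = 512; // tar block size mentioned in this thread

    // Fills `block` completely; returns false if the stream ends first.
    static bool TryReadBlock(Stream input, byte[] block)
    {
        int filled = 0;
        while (filled < block.Length)
        {
            int n = input.Read(block, filled, block.Length - filled);
            if (n == 0) return false; // end of stream: stop instead of retrying
            filled += n;
        }
        return true;
    }

    static bool IsAllZero(byte[] block)
    {
        foreach (byte b in block)
        {
            if (b != 0) return false;
        }
        return true;
    }

    // Counts blocks that contain at least one non-zero byte.
    // Every iteration advances the stream by one block, so zero-filled
    // blocks cannot cause the loop to revisit the same position.
    public static int CountNonZeroBlocks(Stream input)
    {
        var block = new byte[BlockSize];
        int count = 0;
        while (TryReadBlock(input, block))
        {
            if (!IsAllZero(block)) count++;
        }
        return count;
    }
}
```

The point of the sketch is the termination condition: progress is tied to bytes actually consumed, not to a length field read from the data.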

@piksel
Member

piksel commented Jul 23, 2022

I added some code to just skip all null bytes between entries, and it correctly reads the archive. I still don't understand why there are this many null bytes in the file, especially since the counts don't align with the block size (512 bytes):

dotnet run -- -t ~/Downloads/diffpy.srfit-1.3.tar | grep Skipped
  Skipped 0 null byte(s)
  Skipped 0 null byte(s)
  Skipped 382 null byte(s)
  Skipped 473 null byte(s)
  Skipped 162 null byte(s)
  Skipped 360 null byte(s)
  Skipped 82 null byte(s)
  Skipped 434 null byte(s)
  Skipped 474 null byte(s)
  Skipped 152 null byte(s)
  Skipped 0 null byte(s)
  Skipped 0 null byte(s)
  Skipped 296 null byte(s)
  Skipped 0 null byte(s)
  Skipped 335 null byte(s)
  Skipped 0 null byte(s)
  Skipped 437 null byte(s)
  Skipped 269 null byte(s)
  Skipped 452 null byte(s)
  Skipped 0 null byte(s)
  Skipped 421 null byte(s)
  Skipped 347 null byte(s)
  Skipped 371 null byte(s)
  Skipped 8 null byte(s)
  Skipped 453 null byte(s)
  Skipped 0 null byte(s)
  Skipped 327 null byte(s)
  Skipped 333 null byte(s)
  Skipped 370 null byte(s)
  Skipped 365 null byte(s)
  Skipped 195 null byte(s)
  Skipped 302 null byte(s)
  Skipped 268 null byte(s)
  Skipped 0 null byte(s)
  Skipped 71 null byte(s)
  Skipped 247 null byte(s)
  Skipped 435 null byte(s)
  Skipped 191 null byte(s)
  Skipped 210 null byte(s)
  Skipped 213 null byte(s)
  Skipped 189 null byte(s)
  Skipped 0 null byte(s)
  Skipped 37 null byte(s)
  Skipped 386 null byte(s)
  Skipped 102 null byte(s)
  Skipped 57 null byte(s)
  Skipped 19 null byte(s)
  Skipped 31 null byte(s)
  Skipped 334 null byte(s)
  Skipped 3 null byte(s)
  Skipped 444 null byte(s)
  Skipped 413 null byte(s)
  Skipped 0 null byte(s)
  Skipped 434 null byte(s)
  Skipped 295 null byte(s)
  Skipped 0 null byte(s)
  Skipped 119 null byte(s)
  Skipped 43 null byte(s)
  Skipped 207 null byte(s)
  Skipped 278 null byte(s)
  Skipped 338 null byte(s)
  Skipped 248 null byte(s)
  Skipped 61 null byte(s)
  Skipped 0 null byte(s)
  Skipped 172 null byte(s)
  Skipped 273 null byte(s)
  Skipped 423 null byte(s)
  Skipped 344 null byte(s)
  Skipped 90 null byte(s)
  Skipped 137 null byte(s)
  Skipped 453 null byte(s)
  Skipped 0 null byte(s)
  Skipped 128 null byte(s)
  Skipped 301 null byte(s)
  Skipped 312 null byte(s)
  Skipped 397 null byte(s)
  Skipped 181 null byte(s)
  Skipped 311 null byte(s)
  Skipped 76 null byte(s)
  Skipped 307 null byte(s)
  Skipped 0 null byte(s)
  Skipped 36 null byte(s)
  Skipped 127 null byte(s)
  Skipped 194 null byte(s)
  Skipped 70 null byte(s)
  Skipped 128 null byte(s)
  Skipped 75 null byte(s)
  Skipped 133 null byte(s)
  Skipped 12 null byte(s)
  Skipped 0 null byte(s)
  Skipped 499 null byte(s)
  Skipped 269 null byte(s)
  Skipped 355 null byte(s)
  Skipped 323 null byte(s)
  Skipped 254 null byte(s)
  Skipped 83 null byte(s)
  Skipped 246 null byte(s)
  Skipped 49 null byte(s)
  Skipped 320 null byte(s)
  Skipped 466 null byte(s)
  Skipped 341 null byte(s)
  Skipped 333 null byte(s)
  Skipped 303 null byte(s)
  Skipped 301 null byte(s)
  Skipped 186 null byte(s)
  Skipped 332 null byte(s)
  Skipped 374 null byte(s)
  Skipped 302 null byte(s)
  Skipped 480 null byte(s)
  Skipped 471 null byte(s)
  Skipped 337 null byte(s)
  Skipped 215 null byte(s)
  Skipped 449 null byte(s)
  Skipped 175 null byte(s)
  Skipped 259 null byte(s)
  Skipped 308 null byte(s)
  Skipped 0 null byte(s)
  Skipped 5 null byte(s)
  Skipped 14 null byte(s)
  Skipped 118 null byte(s)
  Skipped 369 null byte(s)
  Skipped 432 null byte(s)
  Skipped 260 null byte(s)
  Skipped 397 null byte(s)
  Skipped 381 null byte(s)
  Skipped 362 null byte(s)
  Skipped 0 null byte(s)
  Skipped 511 null byte(s)
  Skipped 511 null byte(s)
  Skipped 82 null byte(s)
  Skipped 230 null byte(s)
  Skipped 505 null byte(s)
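
For reference, the tar format pads each entry's data with zero bytes up to the next 512-byte boundary. The sketch below shows only that arithmetic; the names `TarPadding` and `PaddingFor` are hypothetical, and whether this padding accounts for the counts listed above is not established in this thread.

```csharp
using System;

static class TarPadding
{
    const int BlockSize = 512; // tar block size

    // Number of zero bytes that follow `size` bytes of entry data
    // so that the next header starts on a 512-byte boundary.
    public static long PaddingFor(long size)
    {
        if (size < 0) throw new ArgumentOutOfRangeException(nameof(size));
        return (BlockSize - size % BlockSize) % BlockSize;
    }
}
```

Under this rule, a 476-byte entry would be followed by 36 zero bytes, and an entry whose size is an exact multiple of 512 would be followed by none.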

@mikaeleiman

mikaeleiman commented Nov 4, 2022

I'm seeing what is likely the same thing in 1.4.0 running on .NET 6. This is the code we're using to extract a .tar.gz:

public static void Extract(string sTarballPath, string sTargetPath)
{
	using (FileStream pInStream = File.OpenRead(sTarballPath))
	{
		using (GZipInputStream pGzipStream = new GZipInputStream(pInStream))
		{
			using (TarArchive pTarArchive = TarArchive.CreateInputTarArchive(pGzipStream, null))
			{
				pTarArchive.ExtractContents(sTargetPath);
			}
		}
	}
}

It hangs somewhere in ExtractContents(). The same file can be extracted using other extractors, so it is likely not corrupted (at least not beyond all help). The tar was created using tar (GNU tar) 1.32.

Extracting using the code in this gist works on the same file:

https://gist.github.com/mikaeleiman/f9510716621b2a6343252df35c4259a1

@piksel
Member

piksel commented Nov 4, 2022

@mikaeleiman This might be due to #789 if there are no end blocks at the end of the archive. In fact, this issue should be revisited with the new tar fixes...

@piksel
Member

piksel commented Nov 4, 2022

Indeed. The file in the original issue unpacks perfectly fine in master.

@mikaeleiman

Example file:

example-762.tar.gz

I looked into our code, and it seems that SharpZipLib 1.3.3 handles the file, but 1.4.0 does not. Not sure why we bumped to 1.4.0, but I suspect it was for the dotnet6 support. Seems like 1.3.3 is working fine for our purposes, though.

@piksel
Member

piksel commented Nov 4, 2022

@mikaeleiman Yeah, then it probably is #788. It's fixed in master and should be released with 1.4.1 pretty soon.
The example file you provided extracts using master as well:

❯ dotnet run -- -xvfz ~/Downloads/example-762.tar.gz
drwx------ user/None        0 2022-11-04 14:31:27 ./debuglog/
-rw------- user/None      476 2022-11-04 14:31:27 ./debuglog/2178.txt
-rw------- user/None     1434 2022-11-04 14:31:27 ./debuglog/2423.txt
-rw------- user/None    37932 2022-11-04 14:31:27 ./debuglog/3966.txt
-rw------- user/None  1127093 2022-11-04 14:31:27 ./debuglog/4324.txt
-rw------- user/None   371028 2022-11-04 14:31:27 ./debuglog/4330.txt
-rw------- user/None   362221 2022-11-04 14:31:27 ./debuglog/4335.txt
-rw------- user/None   360946 2022-11-04 14:31:27 ./debuglog/4341.txt
-rw------- user/None  1286788 2022-11-04 14:31:27 ./debuglog/6000.txt
-rw------- user/None    72234 2022-11-04 14:31:27 ./statistics.txt

@piksel piksel closed this as completed Dec 17, 2022
3 participants