
GZip and Tar Samples

Tom PoLáKoSz edited this page Aug 8, 2018 · 2 revisions


How to use SharpZipLib to work with GZip and Tar files

GZip and Tar files are commonly encountered together. These samples cover handling them both individually and combined.

Table of Contents on this page

Extract the file within a GZip
Simple full extract from a Tar archive
Simple full extract from a TGZ (.tar.gz)
Extract from a Tar with full control
Create a TGZ (.tar.gz)
Create a TAR or TGZ with control over filenames and data source
Updating files within a .tgz

Extract the file within a GZip

Create a new instance of GZipInputStream, passing in a stream (of any kind) that contains the compressed data, then read from it until the end of the stream is reached. This straightforward example extracts the contents of a gzip file and writes them to a disk file in the nominated directory.

using System;
using System.IO;
using ICSharpCode.SharpZipLib.Core;
using ICSharpCode.SharpZipLib.GZip;

/// <summary>
/// Extracts the file contained within a GZip to the target dir.
/// A GZip holds a single compressed file, which by convention is named the same as the GZip
/// without the extension.
/// </summary>
public void ExtractGZipSample(string gzipFileName, string targetDir)
{
    // A 4 KB copy buffer is adequate here.
    byte[] dataBuffer = new byte[4096];

    using (System.IO.Stream fs = new FileStream(gzipFileName, FileMode.Open, FileAccess.Read))
    {
        using (GZipInputStream gzipStream = new GZipInputStream(fs))
        {
            // Change this to your needs
            string fnOut = Path.Combine(targetDir, Path.GetFileNameWithoutExtension(gzipFileName));

            using (FileStream fsOut = File.Create(fnOut))
            {
                StreamUtils.Copy(gzipStream, fsOut, dataBuffer);
            }
        }
    }
}

VB

    Imports System
    Imports System.IO
    Imports ICSharpCode.SharpZipLib.Core
    Imports ICSharpCode.SharpZipLib.GZip
    
    ' Extracts the file contained within a GZip to the target dir.
    ' A GZip can contain only one file, which by default is named the same as the GZip except
    ' without the extension.
    '
    Public Sub ExtractGZipSample(gzipFileName As String, targetDir As String)
    
        ' A 4 KB copy buffer is adequate here.
        Dim dataBuffer As Byte() = New Byte(4095) {}
    
    	Using fs As System.IO.Stream = New FileStream(gzipFileName, FileMode.Open, FileAccess.Read)
    		Using gzipStream As New GZipInputStream(fs)
    
    			' Change this to your needs
    			Dim fnOut As String = Path.Combine(targetDir, Path.GetFileNameWithoutExtension(gzipFileName))
    
    			Using fsOut As FileStream = File.Create(fnOut)
    				StreamUtils.Copy(gzipStream, fsOut, dataBuffer)
    			End Using
    		End Using
    	End Using
    End Sub
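The sample above derives the output name from the archive name. The gzip format (RFC 1952) can also record the original file name in the optional FNAME field of its header, although compressors are not required to store it. The following is a minimal sketch, using only the .NET base class library rather than SharpZipLib, that reads that field from the start of a gzip stream. `GzipHeaderInfo.ReadOriginalName` is a hypothetical helper, not part of SharpZipLib.

```csharp
using System;
using System.IO;
using System.Text;

public static class GzipHeaderInfo
{
    // Flag bits in the FLG byte of a gzip member header (RFC 1952, section 2.3.1)
    private const byte FEXTRA = 0x04, FNAME = 0x08;

    // Hypothetical helper: returns the original file name stored in the gzip header,
    // or null if the FNAME flag is not set. Throws if the data is not gzip.
    public static string ReadOriginalName(Stream gz)
    {
        byte[] fixedPart = ReadExactly(gz, 10);   // ID1 ID2 CM FLG MTIME(4) XFL OS
        if (fixedPart[0] != 0x1F || fixedPart[1] != 0x8B)
            throw new InvalidDataException("Not a gzip stream");

        byte flags = fixedPart[3];
        if ((flags & FEXTRA) != 0)
        {
            // XLEN is a 2-byte little-endian length, followed by XLEN bytes of extra data
            byte[] xlen = ReadExactly(gz, 2);
            ReadExactly(gz, xlen[0] | (xlen[1] << 8));
        }
        if ((flags & FNAME) == 0)
            return null;

        // FNAME is zero-terminated ISO-8859-1 text
        var nameBytes = new MemoryStream();
        int b;
        while ((b = gz.ReadByte()) > 0)
            nameBytes.WriteByte((byte)b);
        if (b < 0)
            throw new EndOfStreamException("Unterminated FNAME field");
        return Encoding.GetEncoding("ISO-8859-1").GetString(nameBytes.ToArray());
    }

    private static byte[] ReadExactly(Stream s, int count)
    {
        byte[] buffer = new byte[count];
        int offset = 0;
        while (offset < count)
        {
            int n = s.Read(buffer, offset, count - offset);
            if (n <= 0) throw new EndOfStreamException();
            offset += n;
        }
        return buffer;
    }
}
```

When the result is non-null, a caller might use it in place of Path.GetFileNameWithoutExtension(gzipFileName). The stored name is untrusted input, so reducing it to a bare name with Path.GetFileName before combining it with targetDir is a sensible precaution.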

Simple full extract from a Tar archive

A tar archive is essentially a concatenation of multiple files, each preceded by a header block that records its name, size and other metadata. If you only need to extract the entire contents of the tar to a folder, with no conditions or name transformations, this short example may be all you need.

using System;
using System.IO;
using ICSharpCode.SharpZipLib.Tar;

public void ExtractTar(string tarFileName, string destFolder)
{
    using (Stream inStream = File.OpenRead(tarFileName))
    using (TarArchive tarArchive = TarArchive.CreateInputTarArchive(inStream))
    {
        // Extracts every entry below destFolder, preserving the archive's folder structure
        tarArchive.ExtractContents(destFolder);
    }
}

VB

Imports System
Imports System.IO
Imports ICSharpCode.SharpZipLib.Tar
    
Public Sub ExtractTar(tarFileName As String, destFolder As String)

	Dim inStream As Stream = File.OpenRead(tarFileName)

	Dim tarArchive As TarArchive = TarArchive.CreateInputTarArchive(inStream)
	tarArchive.ExtractContents(destFolder)
	tarArchive.Close()

	inStream.Close()
End Sub
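To make "concatenation of multiple files" concrete: in the ustar tar format each member is a 512-byte header block (name at offset 0, 100 bytes; size at offset 124, 12 bytes of octal ASCII) followed by the file's data padded to a multiple of 512 bytes, and the archive ends with zero-filled blocks. The following minimal sketch lists entry names and sizes by walking those headers, using only the .NET base class library. `TarWalker.ListEntries` is hypothetical, and it deliberately ignores checksums, the ustar prefix field, GNU/PAX long names and base-256 sizes, all of which a real tar reader such as SharpZipLib's has to handle.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

public static class TarWalker
{
    private const int BlockSize = 512;

    // Hypothetical helper: returns (name, size) for each member of an uncompressed tar stream.
    public static List<(string Name, long Size)> ListEntries(Stream tar)
    {
        var entries = new List<(string Name, long Size)>();
        byte[] header = new byte[BlockSize];
        while (ReadBlock(tar, header))
        {
            // An all-zero header block marks the end of the archive
            if (Array.TrueForAll(header, b => b == 0))
                break;

            string name = ReadString(header, 0, 100);             // name: offset 0, 100 bytes
            string sizeText = ReadString(header, 124, 12).Trim(); // size: offset 124, octal
            long size = sizeText.Length == 0 ? 0 : Convert.ToInt64(sizeText, 8);
            entries.Add((name, size));

            // Skip the data, which is padded up to a whole number of 512-byte blocks
            long padded = (size + BlockSize - 1) / BlockSize * BlockSize;
            for (long skipped = 0; skipped < padded; skipped += BlockSize)
                if (!ReadBlock(tar, header))
                    throw new EndOfStreamException("Truncated tar data");
        }
        return entries;
    }

    // Reads a NUL-terminated ASCII field
    private static string ReadString(byte[] block, int offset, int length)
    {
        int end = Array.IndexOf(block, (byte)0, offset, length);
        if (end < 0) end = offset + length;
        return Encoding.ASCII.GetString(block, offset, end - offset);
    }

    // Reads exactly one block; returns false on a clean end of stream
    private static bool ReadBlock(Stream s, byte[] block)
    {
        int offset = 0;
        while (offset < block.Length)
        {
            int n = s.Read(block, offset, block.Length - offset);
            if (n <= 0)
            {
                if (offset == 0) return false;
                throw new EndOfStreamException("Partial tar block");
            }
            offset += n;
        }
        return true;
    }
}
```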

Simple full extract from a TGZ (.tar.gz)

A TGZ file, common on Unix, combines concatenation of multiple files (tar) with compression (gzip). This sample illustrates the library's automatic extraction capabilities: the folder structure of the tar archive is recreated within the nominated target directory.

using System.IO;
using ICSharpCode.SharpZipLib.GZip;
using ICSharpCode.SharpZipLib.Tar;

// example: ExtractTGZ(@"c:\temp\test.tar.gz", @"C:\DestinationFolder")
public void ExtractTGZ(string gzArchiveName, string destFolder)
{
    using (Stream inStream = File.OpenRead(gzArchiveName))
    using (Stream gzipStream = new GZipInputStream(inStream))
    using (TarArchive tarArchive = TarArchive.CreateInputTarArchive(gzipStream))
    {
        // The tar data is decompressed on the fly as it is extracted
        tarArchive.ExtractContents(destFolder);
    }
}

VB

    Imports System.IO
    Imports ICSharpCode.SharpZipLib.GZip
    Imports ICSharpCode.SharpZipLib.Tar
    
    ' for example: 	ExtractTGZ("c:\temp\test.tar.gz", "C:\DestinationFolder")
    
    Public Sub ExtractTGZ(ByVal gzArchiveName As String, ByVal destFolder As String)
    
    	Dim inStream As Stream = File.OpenRead(gzArchiveName)
    	Dim gzipStream As Stream = New GZipInputStream(inStream)
    
    	Dim tarArchive As TarArchive = TarArchive.CreateInputTarArchive(gzipStream)
    	tarArchive.ExtractContents(destFolder)
    	tarArchive.Close()
    
    	gzipStream.Close()
    	inStream.Close()
    End Sub
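Because a TGZ is simply a tar stream inside gzip, the only difference from the plain Tar sample is the extra decompression layer. If the same code has to accept both .tar and .tar.gz input, one option is to check for the gzip magic bytes 0x1F 0x8B (RFC 1952) before choosing the layers. This minimal sketch uses the base class library's GZipStream in place of SharpZipLib's GZipInputStream; `TarInput.OpenMaybeGzipped` is hypothetical and assumes a seekable input such as a FileStream.

```csharp
using System.IO;
using System.IO.Compression;

public static class TarInput
{
    // Hypothetical helper: returns a stream of raw tar bytes, adding a gzip
    // decompression layer only when the input starts with the gzip magic bytes.
    // Assumes 'input' is seekable; the returned stream owns 'input'.
    public static Stream OpenMaybeGzipped(Stream input)
    {
        long start = input.Position;
        int id1 = input.ReadByte();
        int id2 = input.ReadByte();
        input.Position = start;   // rewind so the chosen reader sees the whole stream

        bool isGzip = id1 == 0x1F && id2 == 0x8B;
        return isGzip ? new GZipStream(input, CompressionMode.Decompress) : input;
    }
}
```

The returned stream could then be handed to TarArchive.CreateInputTarArchive exactly as in the samples above.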

Extract from a Tar with full control

In contrast to the samples above, this sample traverses the tar one entry at a time, extracting the contents to the nominated folder and allowing individual entries to be skipped or renamed. It can also translate line endings in text files (LF to CRLF), strips a leading root such as "\" from entry names, and sets each extracted file's modification date/time.

using System;
using System.IO;
using ICSharpCode.SharpZipLib.Tar;

/// <summary>
/// Iterates through each file entry within the supplied tar,
/// extracting them to the nominated folder.
/// </summary>
public void ExtractTarByEntry(string tarFileName, string targetDir, bool asciiTranslate)
{
    using (FileStream fsIn = new FileStream(tarFileName, FileMode.Open, FileAccess.Read))
    {
        TarInputStream tarIn = new TarInputStream(fsIn);
        TarEntry tarEntry;
        while ((tarEntry = tarIn.GetNextEntry()) != null)
        {
            if (tarEntry.IsDirectory)
                continue;

            // Converts the tar forward slashes to the platform's directory separator (backslash on Windows)
            string name = tarEntry.Name.Replace('/', Path.DirectorySeparatorChar);

            // Remove any root e.g. '\' because a PathRooted filename defeats Path.Combine
            if (Path.IsPathRooted(name))
                name = name.Substring(Path.GetPathRoot(name).Length);

            // Apply further name transformations here as necessary
            string outName = Path.Combine(targetDir, name);

            string directoryName = Path.GetDirectoryName(outName);

            // Does nothing if directory exists
            Directory.CreateDirectory(directoryName);

            using (FileStream outStr = new FileStream(outName, FileMode.Create))
            {
                if (asciiTranslate)
                    CopyWithAsciiTranslate(tarIn, outStr);
                else
                    tarIn.CopyEntryContents(outStr);
            }

            // Tar stores modification times in UTC; marking the value as UTC stops
            // File.SetLastWriteTime from treating it as local time.
            DateTime myDt = DateTime.SpecifyKind(tarEntry.ModTime, DateTimeKind.Utc);
            File.SetLastWriteTime(outName, myDt);
        }

        tarIn.Close();
    }
}

private void CopyWithAsciiTranslate(TarInputStream tarIn, Stream outStream)
{
    byte[] buffer = new byte[4096];
    bool isAscii = true;
    bool cr = false;

    int numRead = tarIn.Read(buffer, 0, buffer.Length);
    int maxCheck = Math.Min(200, numRead);
    for (int i = 0; i < maxCheck; i++)
    {
        byte b = buffer[i];
        if (b < 8 || (b > 13 && b < 32) || b == 255)
        {
            isAscii = false;
            break;
        }
    }

    while (numRead > 0)
    {
        if (isAscii)
        {
            // Convert LF without CR to CRLF. Handle CRLF split over buffers.
            for (int i = 0; i < numRead; i++)
            {
                byte b = buffer[i];     // assuming plain Ascii and not UTF-16
                if (b == 10 && !cr)     // LF without CR
                    outStream.WriteByte(13);
                cr = (b == 13);

                outStream.WriteByte(b);
            }
        }
        else
            outStream.Write(buffer, 0, numRead);

        numRead = tarIn.Read(buffer, 0, buffer.Length);
    }
}

VB

    Imports System
    Imports System.IO
    Imports ICSharpCode.SharpZipLib.Tar
    
    ' Iterates through each file entry within the supplied tar,
    ' extracting them to the nominated folder.
    '
    Public Sub ExtractTarByEntry(tarFileName As String, targetDir As String, asciiTranslate As Boolean)
    
    	Using fsIn As New FileStream(tarFileName, FileMode.Open, FileAccess.Read)
    
    		' The TarInputStream reads a UNIX tar archive as an InputStream.
    		'
    		Dim tarIn As New TarInputStream(fsIn)
    
    		Do
    			Dim tarEntry As TarEntry = tarIn.GetNextEntry()
    			If tarEntry Is Nothing Then
    				Exit Do
    			End If
    
    			If tarEntry.IsDirectory Then
    				Continue Do
    			End If
    			' Converts the tar forward slashes to the platform's directory separator (backslash on Windows)
    			'
    			Dim name As String = tarEntry.Name.Replace("/"C, Path.DirectorySeparatorChar)
    
    			' Remove any root e.g. "\" because a rooted filename defeats Path.Combine
    			If Path.IsPathRooted(name) Then
    				name = name.Substring(Path.GetPathRoot(name).Length)
    			End If
    
    			' Apply further name transformations here as necessary
    			Dim outName As String = Path.Combine(targetDir, name)
    
    			Dim directoryName As String = Path.GetDirectoryName(outName)
    			Directory.CreateDirectory(directoryName)
    
    			Using outStr As New FileStream(outName, FileMode.Create)
    				If asciiTranslate Then
    					CopyWithAsciiTranslate(tarIn, outStr)
    				Else
    					tarIn.CopyEntryContents(outStr)
    				End If
    			End Using
    
    			' Tar stores modification times in UTC; marking the value as UTC stops
    			' File.SetLastWriteTime from treating it as local time.
    			Dim myDt As DateTime = DateTime.SpecifyKind(tarEntry.ModTime, DateTimeKind.Utc)
    			File.SetLastWriteTime(outName, myDt)
    		Loop
    		tarIn.Close()
    	End Using
    End Sub
    
    
    Private Sub CopyWithAsciiTranslate(tarIn As TarInputStream, outStream As Stream)
    	Dim buffer As Byte() = New Byte(4095) {}
    	Dim isAscii As Boolean = True
    	Dim cr As Boolean = False
    
    	Dim numRead As Integer = tarIn.Read(buffer, 0, buffer.Length)
    	Dim maxCheck As Integer = Math.Min(200, numRead)
    	For i As Integer = 0 To maxCheck - 1
    		Dim b As Byte = buffer(i)
    		If b < 8 OrElse (b > 13 AndAlso b < 32) OrElse b = 255 Then
    			isAscii = False
    			Exit For
    		End If
    	Next
    	While numRead > 0
    		If isAscii Then
    			' Convert LF without CR to CRLF. Handle CRLF split over buffers.
    			For i As Integer = 0 To numRead - 1
    				Dim b As Byte = buffer(i)	' assuming plain Ascii and not UTF-16
    				If b = 10 AndAlso Not cr Then	' LF without CR
    					outStream.WriteByte(13)
    				End If
    				cr = (b = 13)
    
    				outStream.WriteByte(b)
    			Next
    		Else
    			outStream.Write(buffer, 0, numRead)
    		End If
    		numRead = tarIn.Read(buffer, 0, buffer.Length)
    	End While
    End Sub
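The sample strips a leading root from entry names, but a name containing ".." segments (for example "../../outside.txt") can still resolve to a path outside targetDir, a problem often called "Zip Slip". One way to guard against this is to resolve the full output path and check that it stays under the target directory before writing. This is a minimal sketch; `SafeExtractPath.Resolve` is a hypothetical helper that could replace the Path.Combine step, and its ordinal (case-sensitive) comparison is deliberately conservative on Windows.

```csharp
using System;
using System.IO;

public static class SafeExtractPath
{
    // Hypothetical helper: maps a tar entry name to a full path under targetDir,
    // or returns null if the name would escape targetDir (e.g. via "..").
    public static string Resolve(string targetDir, string entryName)
    {
        string root = Path.GetFullPath(targetDir);
        if (!root.EndsWith(Path.DirectorySeparatorChar.ToString()))
            root += Path.DirectorySeparatorChar;

        // Tar names use '/', so convert before combining (as the sample does)
        string name = entryName.Replace('/', Path.DirectorySeparatorChar);
        if (Path.IsPathRooted(name))
            name = name.Substring(Path.GetPathRoot(name).Length);

        // GetFullPath collapses "." and ".." segments
        string full = Path.GetFullPath(Path.Combine(root, name));
        return full.StartsWith(root, StringComparison.Ordinal) ? full : null;
    }
}
```

In ExtractTarByEntry, a null result would simply mean "skip this entry" (or report it), instead of writing outside the nominated folder.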

Create a TGZ (.tar.gz)

This sample creates a tar archive and gzip-compresses it at the same time, recursing down a directory structure to add all the files.

For more advanced options giving control over filenames and data source, see the next example.

using System;
using System.IO;
using ICSharpCode.SharpZipLib.GZip;
using ICSharpCode.SharpZipLib.Tar;

//  example: CreateTarGZ(@"c:\temp\gzip-test.tar.gz", @"c:\data");
private void CreateTarGZ(string tgzFilename, string sourceDirectory)
{
    Stream outStream = File.Create(tgzFilename);
    Stream gzoStream = new GZipOutputStream(outStream);
    TarArchive tarArchive = TarArchive.CreateOutputTarArchive(gzoStream);

    // Note: RootPath is case sensitive, must use forward slashes (e.g. "c:/temp") and must not
    // end with a slash; otherwise the first character of each entry name is cut off.
    // (When this sample was written, a fix was scheduled for the next release.)
    tarArchive.RootPath = sourceDirectory.Replace('\\', '/');
    if (tarArchive.RootPath.EndsWith("/"))
        tarArchive.RootPath = tarArchive.RootPath.Remove(tarArchive.RootPath.Length - 1);

    AddDirectoryFilesToTar(tarArchive, sourceDirectory, true);

    tarArchive.Close();
}

private void AddDirectoryFilesToTar(TarArchive tarArchive, string sourceDirectory, bool recurse)
{
    // Optionally, write an entry for the directory itself.
    // Specify false for recursion here if we will add the directory's files individually.
    TarEntry tarEntry = TarEntry.CreateEntryFromFile(sourceDirectory);
    tarArchive.WriteEntry(tarEntry, false);

    // Write each file to the tar.
    string[] filenames = Directory.GetFiles(sourceDirectory);
    foreach (string filename in filenames)
    {
        tarEntry = TarEntry.CreateEntryFromFile(filename);
        tarArchive.WriteEntry(tarEntry, true);
    }

    if (recurse)
    {
        string[] directories = Directory.GetDirectories(sourceDirectory);
        foreach (string directory in directories)
            AddDirectoryFilesToTar(tarArchive, directory, recurse);
    }
}

VB

    Imports System
    Imports System.IO
    Imports ICSharpCode.SharpZipLib.GZip
    Imports ICSharpCode.SharpZipLib.Tar
    
    ' Calling example:
    '	CreateTarGZ("c:\temp\gzip-test.tar.gz", "c:\data")
    
    
    Private Sub CreateTarGZ(tgzFilename As String, sourceDirectory As String)
    	Dim outStream As Stream = File.Create(tgzFilename)
    	Dim gzoStream As Stream = New GZipOutputStream(outStream)
    	Dim tarArchive__1 As TarArchive = TarArchive.CreateOutputTarArchive(gzoStream)
    
    	' Note: RootPath is case sensitive, must use forward slashes (e.g. "c:/temp") and must not
    	' end with a slash; otherwise the first character of each entry name is cut off.
    	' (When this sample was written, a fix was scheduled for the next release.)
    	tarArchive__1.RootPath = sourceDirectory.Replace("\"C, "/"C)
    	If tarArchive__1.RootPath.EndsWith("/") Then
    		tarArchive__1.RootPath = tarArchive__1.RootPath.Remove(tarArchive__1.RootPath.Length - 1)
    	End If
    
    	AddDirectoryFilesToTar(tarArchive__1, sourceDirectory, True)
    
    	tarArchive__1.Close()
    End Sub
    Private Sub AddDirectoryFilesToTar(tarArchive As TarArchive, sourceDirectory As String, recurse As Boolean)
    
    	' Optionally, write an entry for the directory itself.
    	' Specify false for recursion here if we will add the directory's files individually.
    	'
    	Dim tarEntry__1 As TarEntry = TarEntry.CreateEntryFromFile(sourceDirectory)
    	tarArchive.WriteEntry(tarEntry__1, False)
    
    	' Write each file to the tar.
    	'
    	Dim filenames As String() = Directory.GetFiles(sourceDirectory)
    	For Each filename As String In filenames
    		tarEntry__1 = TarEntry.CreateEntryFromFile(filename)
    		tarArchive.WriteEntry(tarEntry__1, True)
    	Next
    
    	If recurse Then
    		Dim directories As String() = Directory.GetDirectories(sourceDirectory)
    		For Each directory__2 As String In directories
    			AddDirectoryFilesToTar(tarArchive, directory__2, recurse)
    		Next
    	End If
    End Sub
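In this sample the gzip layer is finished only when tarArchive.Close() closes the underlying streams, and that step matters: a gzip file ends with an 8-byte trailer holding a CRC-32 of the uncompressed data and its length modulo 2^32 (ISIZE, RFC 1952), which is written only when the compressor is finished. The minimal sketch below shows this with the base class library's GZipStream rather than SharpZipLib's GZipOutputStream; `GzipTrailer` and its methods are hypothetical helpers.

```csharp
using System;
using System.IO;
using System.IO.Compression;

public static class GzipTrailer
{
    // Hypothetical helper: gzip-compresses 'data' into a byte array.
    public static byte[] Compress(byte[] data)
    {
        var output = new MemoryStream();
        // Disposing the GZipStream flushes the final deflate block and writes the
        // trailer; leaveOpen keeps 'output' usable afterwards.
        using (var gz = new GZipStream(output, CompressionLevel.Optimal, leaveOpen: true))
            gz.Write(data, 0, data.Length);
        return output.ToArray();
    }

    // ISIZE: the last 4 bytes, little-endian, = uncompressed length mod 2^32
    public static uint ReadIsize(byte[] gzip)
    {
        int i = gzip.Length - 4;
        return (uint)(gzip[i] | gzip[i + 1] << 8 | gzip[i + 2] << 16 | gzip[i + 3] << 24);
    }
}
```

For a finished single-member .tar.gz under 4 GiB, ReadIsize therefore reports the size of the uncompressed tar; a file whose streams were never closed has no valid trailer at all.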

Create a TAR or TGZ with control over filenames and data source

This sample creates a TAR or TAR.GZ archive by creating each entry manually and copying its data to the output. It processes the files in a directory and recurses down the directory structure.

To illustrate how to create TAR entries from any stream data, in this example we use the following construct: (Note that the type is the abstract Stream class.)

Stream inputStream = File.OpenRead(filename);

You can replace this with a Stream sourced in any other way - for example a MemoryStream (it does not have to be a File stream).

using System;
using System.IO;
using ICSharpCode.SharpZipLib.Tar;

public void TarCreateFromStream()
{
    // Create an output stream. Does not have to be disk, could be MemoryStream etc.
    string tarOutFn = @"c:\temp\test.tar";
    Stream outStream = File.Create(tarOutFn);

    // If you wish to create a .tar.gz (.tgz):
    // - set the filename above to a ".tar.gz",
    // - create a GZipOutputStream here (requires "using ICSharpCode.SharpZipLib.GZip;"),
    // - change "new TarOutputStream(outStream)" to "new TarOutputStream(gzoStream)"
    // GZipOutputStream gzoStream = new GZipOutputStream(outStream);
    // gzoStream.SetLevel(3); // 1 - 9, 1 is best speed, 9 is best compression

    TarOutputStream tarOutputStream = new TarOutputStream(outStream);

    CreateTarManually(tarOutputStream, @"c:\temp\debug");

    // Closing the archive also closes the underlying stream.
    // If you don't want this (e.g. when writing to a MemoryStream), set tarOutputStream.IsStreamOwner = false before closing.
    tarOutputStream.Close();
}

private void CreateTarManually(TarOutputStream tarOutputStream, string sourceDirectory)
{
    // Optionally, write an entry for the directory itself.
    TarEntry tarEntry = TarEntry.CreateEntryFromFile(sourceDirectory);
    tarOutputStream.PutNextEntry(tarEntry);

    // Write each file to the tar.
    string[] filenames = Directory.GetFiles(sourceDirectory);

    foreach (string filename in filenames)
    {
        // You might replace these 3 lines with your own stream code

        using (Stream inputStream = File.OpenRead(filename))
        {
            string tarName = filename.Substring(3).Replace('\\', '/'); // strip off "C:\" and use tar's forward slashes

            long fileSize = inputStream.Length;

            // Create a tar entry named as appropriate. You can set the name to anything,
            // but avoid names starting with drive or UNC.
            TarEntry entry = TarEntry.CreateTarEntry(tarName);

            // Must set the size before writing: TarOutputStream fails if the data written does not match the entry's declared Size.
            entry.Size = fileSize;

            // Add the entry to the tar stream, before writing the data.
            tarOutputStream.PutNextEntry(entry);

            // this is copied from TarArchive.WriteEntryCore
            byte[] localBuffer = new byte[32 * 1024];
            while (true)
            {
                int numRead = inputStream.Read(localBuffer, 0, localBuffer.Length);
                if (numRead <= 0)
                    break;

                tarOutputStream.Write(localBuffer, 0, numRead);
            }
        }
        tarOutputStream.CloseEntry();
    }

    // Recurse. Delete this if unwanted.

    string[] directories = Directory.GetDirectories(sourceDirectory);
    foreach (string directory in directories)
        CreateTarManually(tarOutputStream, directory);
}
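The sample derives entry names with filename.Substring(3), which assumes every path starts with a drive such as "C:\" and stores the whole path in the archive. Tar entry names are conventionally relative and use forward slashes. This is a minimal sketch of an alternative; `TarNames.Relative` is a hypothetical helper, and Path.GetRelativePath requires .NET Core 2.0 / .NET Standard 2.1 or later.

```csharp
using System.IO;

public static class TarNames
{
    // Hypothetical helper: builds a tar entry name for 'filePath' relative to
    // 'rootDir', using '/' separators as tar readers expect.
    public static string Relative(string rootDir, string filePath)
    {
        string relative = Path.GetRelativePath(rootDir, filePath);
        return relative.Replace(Path.DirectorySeparatorChar, '/');
    }
}
```

Because CreateTarManually recurses with a changing sourceDirectory, the original root would need to be passed down as an extra parameter so that every call computes names against the same root. Files outside the root would produce "../" names, which is usually a sign of a wrong root.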

Updating files within a .tgz (.tar.gz)

The Unix .tgz or .tar.gz format is broadly comparable to a Zip archive on Windows, but unlike Zip it does not allow adding or replacing individual files in place. This is because all the files are first concatenated into a single tar stream, which is then compressed as a unit; a Zip archive, by contrast, compresses each file separately.

Updating an item therefore requires decompressing to the original tar, writing a new tar from the old entries plus your changes, and recompressing the result.
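The decompress, rebuild and recompress steps can be done in a single streaming pass, without a temporary .tar on disk, by reading entries from the old archive and writing them to a new one. The following is a minimal sketch using the SharpZipLib classes shown earlier on this page (GZipInputStream, TarInputStream, GZipOutputStream, TarOutputStream). `TgzUpdater.ReplaceOrAdd` is hypothetical: it writes a new file rather than modifying the original, compares entry names exactly (a "./" prefix would not match), and copied entries keep their original headers.

```csharp
using System.IO;
using ICSharpCode.SharpZipLib.GZip;
using ICSharpCode.SharpZipLib.Tar;

public static class TgzUpdater
{
    // Hypothetical helper: copies every entry of sourceTgz to destTgz except
    // 'entryName', then appends 'entryName' with 'newContent' (replace or add).
    // destTgz must be a different file from sourceTgz.
    public static void ReplaceOrAdd(string sourceTgz, string destTgz, string entryName, byte[] newContent)
    {
        using (var tarIn = new TarInputStream(new GZipInputStream(File.OpenRead(sourceTgz))))
        using (var tarOut = new TarOutputStream(new GZipOutputStream(File.Create(destTgz))))
        {
            TarEntry entry;
            while ((entry = tarIn.GetNextEntry()) != null)
            {
                if (entry.Name == entryName)
                    continue;                      // drop the old version; GetNextEntry skips its data

                tarOut.PutNextEntry(entry);        // reuse the original header (name, size, times)
                if (!entry.IsDirectory)
                    tarIn.CopyEntryContents(tarOut);
                tarOut.CloseEntry();
            }

            TarEntry replacement = TarEntry.CreateTarEntry(entryName);
            replacement.Size = newContent.Length;  // size must be set before writing data
            tarOut.PutNextEntry(replacement);
            tarOut.Write(newContent, 0, newContent.Length);
            tarOut.CloseEntry();
        }
        // Disposing tarOut writes the tar end blocks and the gzip trailer, then closes the file
    }
}
```

If the original file name must be kept, the new archive can be written to a temporary name and moved over the old one once ReplaceOrAdd returns.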
