Wednesday, 24 March 2010

GZipInputStream, GZipStream, GZipUtil, C#.NET UNIX .gz files - decompression ends prematurely - Workaround

I have been trying for days now to decompress a 10MB gzip file using
  • ICSharpCode.SharpZipLib.GZip.GZipInputStream,
  • Ionic.Zlib.GZipStream,
  • and System.IO.Compression.GZipStream
using both my own code and other people’s code from all over the Internet. I have analyzed the file to make sure it’s valid, and it is. The only things I know about this file are that it was compressed on a UNIX-based system with the “DEFLATE” method, downloaded over HTTP, and can be successfully decompressed by 7-Zip, WinZip and WinRAR. But when it comes to decompressing the file within C# with these libraries, it’s just not happening: decompression always ends prematurely, so I get a truncated file.
In the end, I decided to wrap a call to gzip.exe, which can be downloaded from http://www.gzip.org/ (or the direct Windows download is here).
Add gzip.exe to your project folder in Solution Explorer and ensure “Copy to Output Directory” is set to “Copy always”. The code for calling the exe is below.
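using System;
using System.IO;

/// <summary>
/// Decompresses a gzip file in place by calling gzip.exe.
/// </summary>
/// <param name="gzipFilename">Path to the .gz file; gzip -d replaces it with the decompressed file.</param>
/// <returns>Path to the decompressed file (the input path without its ".gz" suffix).</returns>
public static string UngzipFile(string gzipFilename)
{
    const string GZIP_EXE_NAME = "gzip.exe";
    // gzip.exe is expected next to the application binaries ("Copy always").
    string gzipExePath = Path.Combine(AppDomain.CurrentDomain.BaseDirectory, GZIP_EXE_NAME);
    string args = string.Format("-d \"{0}\"", gzipFilename);
    using (var p = System.Diagnostics.Process.Start(gzipExePath, args))
    {
        p.WaitForExit();
        // gzip exits with 1 on error (2 is only a warning).
        if (p.ExitCode == 1)
            throw new InvalidOperationException("gzip.exe failed to decompress " + gzipFilename);
    }
    // gzip -d drops the ".gz" suffix; this assumes the input name ends in ".gz".
    return gzipFilename.Remove(gzipFilename.Length - 3, 3);
}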


UPDATE: OK, the files I’m trying to decompress look corrupt: the ISIZE and CRC32 fields in the GZip file trailer are both zero, which can’t be good. I wish there were a way to force decompression, as the bandwidth usage for the uncompressed data is massive.
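
For anyone who wants to check the trailer themselves, here is a minimal sketch of reading it. Per the gzip format, the last 8 bytes of a gzip member are CRC32 followed by ISIZE, both little-endian. The class and method names (GzipTrailerSketch, ReadTrailer) are made up for this example, and it assumes a seekable stream.

```csharp
using System;
using System.IO;

public static class GzipTrailerSketch
{
    // Reads the CRC32 and ISIZE fields stored in the last 8 bytes of a gzip stream.
    // Hypothetical helper for illustration; requires a seekable stream.
    public static void ReadTrailer(Stream gzip, out uint crc32, out uint isize)
    {
        // 10-byte fixed header + 8-byte trailer is the bare minimum.
        if (gzip.Length < 18)
            throw new InvalidDataException("Too short to be a gzip file.");

        gzip.Seek(-8, SeekOrigin.End);
        var t = new byte[8];
        int read = 0;
        while (read < t.Length)
        {
            int n = gzip.Read(t, read, t.Length - read);
            if (n == 0) throw new EndOfStreamException();
            read += n;
        }

        // Both fields are little-endian 32-bit unsigned integers.
        crc32 = (uint)t[0] | ((uint)t[1] << 8) | ((uint)t[2] << 16) | ((uint)t[3] << 24);
        isize = (uint)t[4] | ((uint)t[5] << 8) | ((uint)t[6] << 16) | ((uint)t[7] << 24);
    }
}
```

Open the file with File.OpenRead and pass the stream in. Note that this only inspects the trailer of the last member in the file; if the file ends with an empty member, both fields will legitimately be zero.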

UPDATE MAY 2012: It turns out the GZip file is a multi-part BGZF ("Blocked GNU Zip Format") file. Basically, it's a file made of multiple GZip files concatenated together. To support BGZF with GZipStream, you must first pre-process the file: split it at the GZip member boundaries, strip each member's header and footer, decompress each compressed payload on its own, and concatenate the decompressed output. (Concatenating the compressed payloads directly is not enough, because each payload is a complete DEFLATE stream that ends with a block marked as final.) See:
  • http://www.onicos.com/staff/iz/formats/gzip.html
  • http://www.gzip.org/zlib/rfc-gzip.html#file-format
  • http://blastedbio.blogspot.co.uk/2011_11_01_archive.html
  • http://dotnetzip.codeplex.com/discussions/51017
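
A rough sketch of that pre-processing, under some assumptions: it relies on the BGZF block layout as I understand it (a 12-byte gzip header with FLG=4, an extra field containing a "BC" subfield whose 16-bit BSIZE value is the total block size minus 1, then the DEFLATE payload and the 8-byte CRC32/ISIZE trailer). Rather than concatenating compressed payloads, it inflates each payload separately with System.IO.Compression.DeflateStream and appends the output. The names BgzfSketch, Decompress, FindBsize and ReadFully are made up for this example, and the CRC32 and ISIZE values are not verified.

```csharp
using System;
using System.IO;
using System.IO.Compression;

public static class BgzfSketch
{
    // Inflates every BGZF block found in 'input' and appends the result to 'output'.
    public static void Decompress(Stream input, Stream output)
    {
        var header = new byte[12];
        while (true)
        {
            int got = ReadFully(input, header, header.Length);
            if (got == 0) break; // clean end of input

            // ID1=0x1f, ID2=0x8b, CM=8 (DEFLATE); BGZF sets FLG=4 (extra field only).
            if (got < header.Length || header[0] != 0x1f || header[1] != 0x8b
                || header[2] != 8 || header[3] != 4)
                throw new InvalidDataException("Not a BGZF block header.");

            int xlen = header[10] | (header[11] << 8); // little-endian extra-field length
            var extra = new byte[xlen];
            if (ReadFully(input, extra, xlen) != xlen)
                throw new InvalidDataException("Truncated extra field.");

            // BSIZE = total block size - 1, so payload = BSIZE - XLEN - 19
            // (12-byte header + 8-byte trailer - 1 = 19).
            int cdataLength = FindBsize(extra) - xlen - 19;
            if (cdataLength < 0)
                throw new InvalidDataException("Inconsistent BSIZE.");

            // Payload followed by the 8-byte CRC32/ISIZE trailer (not verified here).
            var rest = new byte[cdataLength + 8];
            if (ReadFully(input, rest, rest.Length) != rest.Length)
                throw new InvalidDataException("Truncated block.");

            using (var cdata = new MemoryStream(rest, 0, cdataLength))
            using (var inflater = new DeflateStream(cdata, CompressionMode.Decompress))
            {
                inflater.CopyTo(output);
            }
        }
    }

    // Walks the extra-field subfields (SI1, SI2, 16-bit SLEN, data) looking for "BC".
    static int FindBsize(byte[] extra)
    {
        int pos = 0;
        while (pos + 4 <= extra.Length)
        {
            int slen = extra[pos + 2] | (extra[pos + 3] << 8);
            if (extra[pos] == (byte)'B' && extra[pos + 1] == (byte)'C'
                && slen == 2 && pos + 6 <= extra.Length)
                return extra[pos + 4] | (extra[pos + 5] << 8);
            pos += 4 + slen;
        }
        throw new InvalidDataException("No BC subfield, so this is not a BGZF block.");
    }

    // Reads up to 'count' bytes, returning fewer only at end of stream.
    static int ReadFully(Stream s, byte[] buffer, int count)
    {
        int total = 0;
        while (total < count)
        {
            int n = s.Read(buffer, total, count - total);
            if (n == 0) break;
            total += n;
        }
        return total;
    }
}
```

To use it, open the downloaded file with File.OpenRead, create the target with File.Create, and pass both streams to BgzfSketch.Decompress. Each block is small (BSIZE is a 16-bit value), so buffering one block at a time in memory should be cheap. A plain multi-member gzip file without the "BC" subfield would need a different way of finding member boundaries.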