Compression in Dot Net (Zip format).
Like many of you out there, I have also used #ZipLib (pronounced SharpZipLib) to do compression programmatically. For those who do not know it, #ZipLib is an open source compression library written in C# that handles the Zip format (among others). The code is available at their website.
I kept wondering how Microsoft could have shipped the Dot Net Framework without any compression APIs. After googling for a while, I found out that there is a way to do it in Dot Net after all. But first, let me give you some historical background.
Along with formats such as tar.gz and tar.bz2, Zip is one of the most popular compression formats. Besides compressing well, it owes its popularity to the fact that the Zip data format is open and not subject to patents or other legal issues.
Developers are free to create applications that manipulate Zip files and to use the low-level Zip compression algorithms. The authors of the Zip compression code, Jean-loup Gailly (compression) and Mark Adler (decompression), made the compression and decompression algorithms available to developers in a library named ZLib. This library was adopted by the Java platform in version 1.1 of the Java Development Kit (JDK) to form the basis of the Java Archive (JAR) file format. You can find these classes in the java.util.zip package.
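To make the zlib connection concrete, here is a minimal sketch in plain Java (not J#) that round-trips a byte array through the java.util.zip `Deflater` and `Inflater` classes, which produce and consume zlib-format data. The class name `ZlibRoundTrip`, the 1 KB buffer and the sample string are my own illustrative choices, and it was written with a current JDK in mind.

```java
import java.io.ByteArrayOutputStream;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

public class ZlibRoundTrip {

    // Compress raw bytes into zlib format (deflate data plus zlib header and checksum).
    static byte[] compress(byte[] input) {
        Deflater deflater = new Deflater(Deflater.BEST_COMPRESSION);
        deflater.setInput(input);
        deflater.finish(); // no more input will follow
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[1024];
        while (!deflater.finished()) {
            int n = deflater.deflate(buffer);
            out.write(buffer, 0, n);
        }
        deflater.end(); // release native zlib resources
        return out.toByteArray();
    }

    // Decompress zlib-format bytes back into the original data.
    static byte[] decompress(byte[] compressed) throws DataFormatException {
        Inflater inflater = new Inflater();
        inflater.setInput(compressed);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[1024];
        while (!inflater.finished()) {
            int n = inflater.inflate(buffer);
            if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                throw new DataFormatException("truncated or unsupported input");
            }
            out.write(buffer, 0, n);
        }
        inflater.end();
        return out.toByteArray();
    }

    public static void main(String[] args) throws Exception {
        byte[] original = "hello hello hello hello hello".getBytes("UTF-8");
        byte[] packed = compress(original);
        byte[] unpacked = decompress(packed);
        System.out.println(original.length + " -> " + packed.length + " bytes");
        System.out.println(new String(unpacked, "UTF-8"));
    }
}
```

`Deflater.BEST_COMPRESSION` trades speed for size; `Deflater.DEFAULT_COMPRESSION` is the usual middle ground.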
Now here comes Dot Net. When Microsoft ported the Java class libraries to J#, the Java zip API came along with them. So the zip API is not in the Dot Net Framework class library itself, but it is present in the J# libraries. Just reference the assembly and start zipping.
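A minimal sketch of "start zipping", again written as plain Java against java.util.zip: it writes one entry into a Zip archive and reads it back. The class name `ZipExample`, the helpers `writeZip`/`readEntry` and the file names `example.zip` and `readme.txt` are hypothetical. Since J# exposes the same java.util.zip and java.io classes through vjslib.dll, I would expect code along these lines to carry over to J# (or to C# with a reference to vjslib.dll) with little change, but treat that as an assumption, not something verified here. The code sticks to older-style Java (no generics, no try-with-resources).

```java
import java.io.ByteArrayOutputStream;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

public class ZipExample {

    // Write a single named entry (deflated by default) into a Zip archive on the stream.
    static void writeZip(OutputStream target, String entryName, byte[] data) throws IOException {
        ZipOutputStream zip = new ZipOutputStream(target);
        zip.putNextEntry(new ZipEntry(entryName));
        zip.write(data, 0, data.length);
        zip.closeEntry();
        zip.close(); // writes the central directory and closes the target stream
    }

    // Return the contents of the entry with the given name, or null if it is absent.
    static byte[] readEntry(InputStream source, String entryName) throws IOException {
        ZipInputStream zip = new ZipInputStream(source);
        try {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if (entry.getName().equals(entryName)) {
                    ByteArrayOutputStream out = new ByteArrayOutputStream();
                    byte[] buffer = new byte[1024];
                    int n;
                    while ((n = zip.read(buffer, 0, buffer.length)) != -1) {
                        out.write(buffer, 0, n);
                    }
                    return out.toByteArray();
                }
            }
            return null;
        } finally {
            zip.close();
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] text = "Zipped with the java.util.zip API".getBytes("UTF-8");
        // Hypothetical archive and entry names, created in the current directory.
        writeZip(new FileOutputStream("example.zip"), "readme.txt", text);
        byte[] back = readEntry(new FileInputStream("example.zip"), "readme.txt");
        System.out.println(new String(back, "UTF-8"));
    }
}
```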
The java.util.zip package is implemented in the vjslib.dll assembly, which can be found in the C:\WINNT\Microsoft.NET\Framework\v1.0.4205\ directory (replace WINNT with your actual Windows directory).
Check out this article, which describes all of this in detail.

Actually, I was just reading about different compression algorithms, such as Huffman coding, run-length encoding, arithmetic coding and LZ77 encoding, when a question suddenly came to mind: which algorithms does the Zip format use? The deflate algorithm used by gzip (and also by zip and zlib) is a variation of LZ77 compression combined with Huffman coding. One can find more details at their website.
A day away from database-oriented applications, a day for just algorithms. :)