<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" media="screen" href="/~d/styles/atom10full.xsl"?><?xml-stylesheet type="text/css" media="screen" href="http://feeds.feedburner.com/~d/styles/itemcontent.css"?><feed xmlns="http://www.w3.org/2005/Atom" xmlns:openSearch="http://a9.com/-/spec/opensearch/1.1/" xmlns:georss="http://www.georss.org/georss" xmlns:gd="http://schemas.google.com/g/2005" xmlns:thr="http://purl.org/syndication/thread/1.0" xmlns:feedburner="http://rssnamespace.org/feedburner/ext/1.0" gd:etag="W/&quot;DEAMQn8-eyp7ImA9WhRRFE4.&quot;"><id>tag:blogger.com,1999:blog-5230787973393818940</id><updated>2011-11-27T15:33:03.153-08:00</updated><category term="compression" /><category term="java" /><title>Elegant Engineering</title><subtitle type="html" /><link rel="http://schemas.google.com/g/2005#feed" type="application/atom+xml" href="http://elegantengineering.blogspot.com/feeds/posts/default" /><link rel="alternate" type="text/html" href="http://elegantengineering.blogspot.com/" /><author><name>Adam Thomas</name><uri>http://www.blogger.com/profile/10972094452234627438</uri><email>noreply@blogger.com</email><gd:image rel="http://schemas.google.com/g/2005#thumbnail" width="32" height="32" src="http://4.bp.blogspot.com/_114OSgk1PdE/S3lsy_VFDbI/AAAAAAAAAVU/Vio3pirzpZw/S220/adamthomas.jpg" /></author><generator version="7.00" uri="http://www.blogger.com">Blogger</generator><openSearch:totalResults>2</openSearch:totalResults><openSearch:startIndex>1</openSearch:startIndex><openSearch:itemsPerPage>25</openSearch:itemsPerPage><atom10:link xmlns:atom10="http://www.w3.org/2005/Atom" rel="self" type="application/atom+xml" href="http://feeds.feedburner.com/ElegantEngineering" /><feedburner:info uri="elegantengineering" /><atom10:link xmlns:atom10="http://www.w3.org/2005/Atom" rel="hub" href="http://pubsubhubbub.appspot.com/" /><entry 
gd:etag="W/&quot;D04NRng-eyp7ImA9WhZXEEU.&quot;"><id>tag:blogger.com,1999:blog-5230787973393818940.post-1067324043852905601</id><published>2008-12-11T10:51:00.001-08:00</published><updated>2011-04-29T07:06:37.653-07:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2011-04-29T07:06:37.653-07:00</app:edited><category scheme="http://www.blogger.com/atom/ns#" term="java" /><category scheme="http://www.blogger.com/atom/ns#" term="compression" /><title>Data Compression Using Java - Part 2</title><content type="html">&lt;div class="separator" style="clear: right; float: right; margin-bottom: 1em; margin-left: 1em; text-align: center;"&gt;&lt;img border="0" height="200" src="http://1.bp.blogspot.com/-5HdvZ7UgQqA/Tbo5PUi7H2I/AAAAAAAAAgA/_BiB-3YQ75s/s200/zip.png" width="186" /&gt;&lt;/div&gt;In &lt;a href="http://elegantengineering.blogspot.com/2008/12/data-compression-using-java-part-1.html"&gt;Part 1&lt;/a&gt; of this series I detailed the types of compression, their formats, and when you might want to use each one. This portion of the discussion will begin detailing how to use the Java compression implementations contained within the java.util.zip package.&lt;br /&gt;
&lt;br /&gt;
&lt;b&gt;Methods of Compression&lt;/b&gt;&lt;br /&gt;
&lt;br /&gt;
If you read the &lt;a href="http://java.sun.com/j2se/1.4.2/docs/api/java/util/zip/package-summary.html"&gt;JavaDocs&lt;/a&gt; for the java.util.zip package you will quickly notice that there are many more classes than the ones I will be discussing. Some of the java.util.zip classes perform compression, some wrap their functionality with InputStream and OutputStream, and others support checksum functionality. The compression classes I demonstrate in this posting should address the majority of your needs so don’t feel overwhelmed by the choices available.&lt;br /&gt;
&lt;br /&gt;
The three Java compression methods that I will be discussing are:&lt;br /&gt;
&lt;ul&gt;&lt;li&gt;Zip Compression (using ZipOutputStream and ZipInputStream)&lt;/li&gt;
&lt;li&gt;GZip Compression (using GZIPOutputStream and GZIPInputStream)&lt;/li&gt;
&lt;li&gt;DEFLATE Compression (using DeflaterOutputStream and InflaterInputStream)&lt;/li&gt;
&lt;/ul&gt;The first question we need to address is when to use each of these compression types. All three compression types (zip, gzip, and deflate) use the exact same DEFLATE algorithm internally to compress data. Multiple formats exist because each one has unique features that make it desirable under certain circumstances.&lt;br /&gt;
&lt;br /&gt;
We will be discussing the compression types in order of most overhead to least overhead. The examples provided will demonstrate compressing a single file so similarities and differences between the compression methods can be better illustrated.&lt;br /&gt;
&lt;br /&gt;
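To make the overhead ordering concrete, here is a minimal sketch that compresses the same in-memory buffer with all three stream classes and prints the resulting sizes. The class name FormatOverhead, the entry name "data", and the all-zero test buffer are assumptions made for illustration; they are not part of the examples that follow.&lt;br /&gt;
&lt;br /&gt;

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.GZIPOutputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

// Hypothetical helper: compresses one buffer three ways to compare sizes.
class FormatOverhead {

    static byte[] zip(byte[] data) throws IOException {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        ZipOutputStream zos = new ZipOutputStream(baos);
        zos.putNextEntry(new ZipEntry("data")); // entry name is arbitrary
        zos.write(data);
        zos.closeEntry();
        zos.close(); // writes the central directory
        return baos.toByteArray();
    }

    static byte[] gzip(byte[] data) throws IOException {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        GZIPOutputStream gos = new GZIPOutputStream(baos);
        gos.write(data);
        gos.close(); // writes the CRC-32 trailer
        return baos.toByteArray();
    }

    static byte[] deflate(byte[] data) throws IOException {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        DeflaterOutputStream dos = new DeflaterOutputStream(baos);
        dos.write(data);
        dos.close(); // finishes the compressed stream
        return baos.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] data = new byte[4096]; // all zeros, so highly compressible
        System.out.println("raw     " + data.length);
        System.out.println("zip     " + zip(data).length);
        System.out.println("gzip    " + gzip(data).length);
        System.out.println("deflate " + deflate(data).length);
    }
}
```

Because all three streams use the same DEFLATE implementation at its default level, the size differences come from each format's headers and trailers rather than from the compressed payload.&lt;br /&gt;
&lt;br /&gt;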
&lt;b&gt;Zip&lt;/b&gt;&lt;br /&gt;
&lt;br /&gt;
The zip compression format is desirable for compressing multiple directories and files. As explained in &lt;a href="http://elegantengineering.blogspot.com/2008/12/data-compression-using-java-part-1.html"&gt;Part 1&lt;/a&gt; of this series, zip is both an archiver and a compressor, so a single zip file can store many directories and files, each compressed individually.&lt;br /&gt;
&lt;br /&gt;
Zip compression is easily implemented by using the ZipOutputStream and ZipInputStream classes. The following example will only zip a single file to better illustrate code similarities between zip, gzip, and DEFLATE.&lt;br /&gt;
&lt;br /&gt;
Compressing and uncompressing an individual file using zip:&lt;br /&gt;
&lt;br /&gt;
&lt;pre class="brush: java" name="code"&gt;package compression;

import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.zip.CRC32;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

public class ZipCompression {

    private final static String FILE_NAME = "data";

    public static void main(String[] args) throws IOException {
        File dataFile = new File(FILE_NAME);

        // read and compress file data in memory using zip format
        byte[] zippedData = zipData(dataFile);

        // persist zipped data to disk
        File zippedFile = new File(FILE_NAME + ".zip");
        writeData(zippedData, zippedFile);

        // read and uncompress zipped file data in memory
        byte[] unzippedData = unzipData(zippedFile);

        // persist unzipped data to disk
        File unzippedFile = new File(FILE_NAME + "-unzipped");
        writeData(unzippedData, unzippedFile);

        // verify our methods are not corrupting data (for debug only)
        ByteArrayOutputStream baos = new ByteArrayOutputStream();

        readData(dataFile, baos);
        byte[] originalFileData = baos.toByteArray();

        baos.reset();

        readData(unzippedFile, baos);
        byte[] unzippedFileData = baos.toByteArray();

        boolean match = checksumsMatch(originalFileData, unzippedFileData);
        System.out.println("Checksums match: " + match);
    }

    private static byte[] zipData(File f) throws IOException {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        ZipOutputStream zipos = new ZipOutputStream(baos);
        ZipEntry entry = new ZipEntry(f.getName());
        zipos.putNextEntry(entry);
        readData(f, zipos);
        return baos.toByteArray();
    }

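    private static byte[] unzipData(File f) throws IOException {
        ZipInputStream zipis = null;
        try {
            InputStream is = new BufferedInputStream(new FileInputStream(f));
            zipis = new ZipInputStream(is);
            // position the stream at the first (and only) entry
            zipis.getNextEntry();
            ByteArrayOutputStream baos = new ByteArrayOutputStream();
            byte[] buf = new byte[1024];
            int len = 0;
            // read() returns -1 once the entry has been fully read
            while ((len = zipis.read(buf)) != -1) {
                baos.write(buf, 0, len);
            }
            return baos.toByteArray();
        } finally {
            // release the file handle even if reading fails
            if (zipis != null) {
                zipis.close();
            }
        }
    }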

    private static void readData(File f, OutputStream os) throws IOException {
        byte[] buf = new byte[1024];
        int len = 0;
        InputStream is = null;
        try {
            is = new BufferedInputStream(new FileInputStream(f));
            while ((len = is.read(buf)) &amp;gt; 0) {
                os.write(buf, 0, len);
            }
            if (os instanceof DeflaterOutputStream) {
                ((DeflaterOutputStream) os).finish();
            }
        } finally {
            if (is != null) {
                is.close();
            }
        }
    }

    private static void writeData(byte[] data, File f) throws IOException {
        OutputStream os = null;
        try {
            os = new BufferedOutputStream(new FileOutputStream(f));
            os.write(data);
        } finally {
            if (os != null) {
                os.close();
            }
        }
    }

    private static boolean checksumsMatch(byte[] d1, byte[] d2) {
        CRC32 crc = new CRC32();
        crc.update(d1);
        long checksum1 = crc.getValue();
        crc.reset();
        crc.update(d2);
        long checksum2 = crc.getValue();
        return (checksum1 == checksum2);
    }
}
&lt;/pre&gt;&lt;br /&gt;
The program does the following:&lt;br /&gt;
&lt;ul&gt;&lt;li&gt;Reads a single data file&lt;/li&gt;
&lt;li&gt;Compresses the file using zip&lt;/li&gt;
&lt;li&gt;Writes the zipped file to the hard disk&lt;/li&gt;
&lt;li&gt;Reads the zipped file from hard disk&lt;/li&gt;
&lt;li&gt;Unzips the file data&lt;/li&gt;
&lt;li&gt;Writes the unzipped file to the hard disk&lt;/li&gt;
&lt;/ul&gt;When the program finishes you will have three files: the “data” (original) file, the “data.zip” (compressed) file, and the “data-unzipped” (uncompressed) file. The original “data” file and the uncompressed “data-unzipped” file should be identical.&lt;br /&gt;
&lt;br /&gt;
You will notice in the code above that I have included additional functionality to compute the CRC32 checksum of the input “data” file and the uncompressed “data-unzipped” file. These checks verify that no corruption occurred during compression, decompression, or file input/output. When using zip and gzip you should not need these manual checksum tests, as both formats store a CRC-32 checksum of the uncompressed data that is verified on decompression. I included the checksum code for illustration purposes only.&lt;br /&gt;
&lt;br /&gt;
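The example above zips a single file, but the same ZipOutputStream calls extend naturally to several entries. The following is a minimal in-memory sketch, assuming hypothetical entry names and contents; the class name MultiEntryZip and its two helper methods are my own and are not part of the JDK.&lt;br /&gt;
&lt;br /&gt;

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical helper: writes and counts multiple zip entries in memory.
class MultiEntryZip {

    // Writes one zip entry per name; names[i] is paired with contents[i].
    static byte[] zipAll(String[] names, byte[][] contents) throws IOException {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        ZipOutputStream zos = new ZipOutputStream(baos);
        try {
            for (int i = 0; i != names.length; i++) {
                // a forward slash in the name places the entry in a directory
                zos.putNextEntry(new ZipEntry(names[i]));
                zos.write(contents[i]);
                zos.closeEntry();
            }
        } finally {
            zos.close(); // writes the central directory
        }
        return baos.toByteArray();
    }

    // Walks the archive and counts the entries it contains.
    static int countEntries(byte[] zipped) throws IOException {
        ZipInputStream zis = new ZipInputStream(new ByteArrayInputStream(zipped));
        int count = 0;
        try {
            while (zis.getNextEntry() != null) {
                count++;
            }
        } finally {
            zis.close();
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        String[] names = { "readme.txt", "docs/notes.txt" };
        byte[][] contents = {
            "first file".getBytes("UTF-8"),
            "second file".getBytes("UTF-8")
        };
        byte[] zipped = zipAll(names, contents);
        System.out.println("entries: " + countEntries(zipped));
    }
}
```

Each call to putNextEntry starts a new entry that is compressed independently of the others, which is what lets zip act as an archiver as well as a compressor.&lt;br /&gt;
&lt;br /&gt;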
&lt;b&gt;GZip&lt;/b&gt;&lt;br /&gt;
&lt;br /&gt;
The gzip compression format is desirable for compressing individual files or data streams. The gzip format has the ability to compress a single file or single data stream without incurring the overhead that zip incurs due to directory and file metadata storage.&lt;br /&gt;
&lt;br /&gt;
Compressing and uncompressing an individual file using gzip:&lt;br /&gt;
&lt;br /&gt;
&lt;pre class="brush: java" name="code"&gt;package compression;

import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.zip.CRC32;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class GzipCompression {

    private final static String FILE_NAME = "data";

    public static void main(String[] args) throws IOException {
        File dataFile = new File(FILE_NAME);

        // read and compress file data in memory using gzip format
        byte[] gzippedData = gzipData(dataFile);

        // persist gzipped data to disk
        File gzippedFile = new File(FILE_NAME + ".gz");
        writeData(gzippedData, gzippedFile);

        // read and uncompress gzipped file data in memory
        byte[] ungzippedData = ungzipData(gzippedFile);

        // persist ungzipped data to disk
        File ungzippedFile = new File(FILE_NAME + "-ungzipped");
        writeData(ungzippedData, ungzippedFile);

        // verify our methods are not corrupting data (for debug only)
        ByteArrayOutputStream baos = new ByteArrayOutputStream();

        readData(dataFile, baos);
        byte[] originalFileData = baos.toByteArray();

        baos.reset();

        readData(ungzippedFile, baos);
        byte[] ungzippedFileData = baos.toByteArray();

        boolean match = checksumsMatch(originalFileData, ungzippedFileData);
        System.out.println("Checksums match: " + match);
    }

    private static byte[] gzipData(File f) throws IOException {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        GZIPOutputStream gzipos = new GZIPOutputStream(baos);
        readData(f, gzipos);
        return baos.toByteArray();
    }

    private static byte[] ungzipData(File f) throws IOException {
        GZIPInputStream gzipis = null;
        try {
            InputStream is = new BufferedInputStream(new FileInputStream(f));
            gzipis = new GZIPInputStream(is);
            ByteArrayOutputStream baos = new ByteArrayOutputStream();
            byte[] buf = new byte[1024];
            int len = 0;
            // read() returns -1 once the stream has been fully read
            while ((len = gzipis.read(buf)) != -1) {
                baos.write(buf, 0, len);
            }
            return baos.toByteArray();
        } finally {
            // release the file handle even if reading fails
            if (gzipis != null) {
                gzipis.close();
            }
        }
    }

    private static void readData(File f, OutputStream os) throws IOException {
        byte[] buf = new byte[1024];
        int len = 0;
        InputStream is = null;
        try {
            is = new BufferedInputStream(new FileInputStream(f));
            while ((len = is.read(buf)) &amp;gt; 0) {
                os.write(buf, 0, len);
            }
            if (os instanceof DeflaterOutputStream) {
                ((DeflaterOutputStream) os).finish();
            }
        } finally {
            if (is != null) {
                is.close();
            }
        }
    }

    private static void writeData(byte[] data, File f) throws IOException {
        OutputStream os = null;
        try {
            os = new BufferedOutputStream(new FileOutputStream(f));
            os.write(data);
        } finally {
            if (os != null) {
                os.close();
            }
        }
    }

    private static boolean checksumsMatch(byte[] d1, byte[] d2) {
        CRC32 crc = new CRC32();
        crc.update(d1);
        long checksum1 = crc.getValue();
        crc.reset();
        crc.update(d2);
        long checksum2 = crc.getValue();
        return (checksum1 == checksum2);
    }
}
&lt;/pre&gt;&lt;br /&gt;
The program does the following:&lt;br /&gt;
&lt;ul&gt;&lt;li&gt;Reads a single data file&lt;/li&gt;
&lt;li&gt;Compresses the file using gzip&lt;/li&gt;
&lt;li&gt;Writes the gzipped file to the hard disk&lt;/li&gt;
&lt;li&gt;Reads the gzipped file from hard disk&lt;/li&gt;
&lt;li&gt;Ungzips the file data&lt;/li&gt;
&lt;li&gt;Writes the ungzipped file to the hard disk&lt;/li&gt;
&lt;/ul&gt;The CRC checksum is included in this example for illustration purposes only.&lt;br /&gt;
&lt;br /&gt;
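The manual CRC check is redundant for gzip because the format stores a CRC-32 of the uncompressed data in its trailer, and GZIPInputStream checks it when the end of the stream is reached. The sketch below flips one bit of that stored checksum and shows that decompression is then rejected. The class name GzipIntegrity and its helper methods are assumptions made for illustration.&lt;br /&gt;
&lt;br /&gt;

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper: demonstrates the integrity check built into gzip.
class GzipIntegrity {

    static byte[] gzip(byte[] data) throws IOException {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        GZIPOutputStream gos = new GZIPOutputStream(baos);
        gos.write(data);
        gos.close(); // writes the trailer: CRC-32 followed by the input size
        return baos.toByteArray();
    }

    static byte[] gunzip(byte[] gz) throws IOException {
        GZIPInputStream in = new GZIPInputStream(new ByteArrayInputStream(gz));
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[1024];
        int len;
        try {
            // the trailer is verified when read() reaches the end of the data
            while ((len = in.read(buf)) != -1) {
                out.write(buf, 0, len);
            }
        } finally {
            in.close();
        }
        return out.toByteArray();
    }

    // Returns true if the stream decodes cleanly, false if it is rejected.
    static boolean isIntact(byte[] gz) {
        try {
            gunzip(gz);
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] gz = gzip("some sample text".getBytes("UTF-8"));
        byte[] damaged = gz.clone();
        // the last 8 bytes are the trailer; flip one bit of the stored CRC-32
        damaged[damaged.length - 8] ^= 0x01;
        System.out.println("original intact: " + isIntact(gz));
        System.out.println("damaged intact:  " + isIntact(damaged));
    }
}
```

The failure surfaces as an IOException from read(), so code that already handles I/O errors gets corruption detection without any extra checksum logic.&lt;br /&gt;
&lt;br /&gt;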
&lt;b&gt;DEFLATE&lt;/b&gt;&lt;br /&gt;
&lt;br /&gt;
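The DEFLATE compression format is lightweight and desirable for individual files or data streams that do not require error checking. Both zip and gzip use DEFLATE to compress data, but each adds overhead of its own, such as headers and CRC checksums; zip additionally stores directory metadata. Note that the JDK's Deflater wraps its output in the zlib format by default (a 2-byte header and an Adler-32 checksum); constructing it with nowrap set to true produces raw DEFLATE data with no checksum at all.&lt;br /&gt;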
&lt;br /&gt;
When using DEFLATE directly you can control the compression strategy and level. A compression level from 0 (no compression) to 9 (best compression) may be passed to the Deflater constructor, and the strategy is selected with Deflater.setStrategy().&lt;br /&gt;
&lt;br /&gt;
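Before the full file example, here is a small sketch of how the level, strategy, and nowrap settings can be passed to a Deflater. The class name DeflateLevels, the deflate helper, and the repeated sample sentence are assumptions made for illustration.&lt;br /&gt;
&lt;br /&gt;

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.zip.Deflater;
import java.util.zip.DeflaterOutputStream;

// Hypothetical helper: compresses a buffer with explicit Deflater settings.
class DeflateLevels {

    // level: 0-9; strategy: e.g. Deflater.DEFAULT_STRATEGY;
    // nowrap: true omits the zlib header and Adler-32 checksum.
    static byte[] deflate(byte[] data, int level, int strategy, boolean nowrap)
            throws IOException {
        Deflater def = new Deflater(level, nowrap);
        def.setStrategy(strategy);
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        DeflaterOutputStream dos = new DeflaterOutputStream(baos, def);
        try {
            dos.write(data);
        } finally {
            try {
                dos.close(); // finishes the compressed stream
            } finally {
                def.end(); // a caller-supplied Deflater must be released manually
            }
        }
        return baos.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i != 200; i++) {
            sb.append("the quick brown fox jumps over the lazy dog ");
        }
        byte[] data = sb.toString().getBytes("UTF-8");
        int strategy = Deflater.DEFAULT_STRATEGY;

        System.out.println("raw input:       " + data.length);
        System.out.println("level 0, zlib:   "
                + deflate(data, Deflater.NO_COMPRESSION, strategy, false).length);
        System.out.println("level 1, zlib:   "
                + deflate(data, Deflater.BEST_SPEED, strategy, false).length);
        System.out.println("level 9, zlib:   "
                + deflate(data, Deflater.BEST_COMPRESSION, strategy, false).length);
        System.out.println("level 9, nowrap: "
                + deflate(data, Deflater.BEST_COMPRESSION, strategy, true).length);
    }
}
```

Data written with nowrap set to true must be read back with an Inflater that was also created with nowrap set to true, so both sides of a stream need to agree on the setting.&lt;br /&gt;
&lt;br /&gt;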
Compressing and decompressing an individual file using DEFLATE:&lt;br /&gt;
&lt;br /&gt;
&lt;pre class="brush: java" name="code"&gt;package compression;

import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.zip.CRC32;
import java.util.zip.Deflater;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.InflaterInputStream;

public class DeflateCompression {

    private final static String FILE_NAME = "data";

    public static void main(String[] args) throws IOException {
        File dataFile = new File(FILE_NAME);

        // read and compress file data in memory using deflate format
        byte[] deflatedData = deflateData(dataFile);

        // persist deflated data to disk
        File deflatedFile = new File(FILE_NAME + ".def");
        writeData(deflatedData, deflatedFile);

        // read and uncompress deflated file data in memory
        byte[] inflatedData = inflateData(deflatedFile);

        // persist inflated data to disk
        File inflatedFile = new File(FILE_NAME + "-inflated");
        writeData(inflatedData, inflatedFile);

        // verify our methods are not corrupting data (for debug only)
        ByteArrayOutputStream baos = new ByteArrayOutputStream();

        readData(dataFile, baos);
        byte[] originalFileData = baos.toByteArray();

        baos.reset();

        readData(inflatedFile, baos);
        byte[] inflatedFileData = baos.toByteArray();

        boolean match = checksumsMatch(originalFileData, inflatedFileData);
        System.out.println("Checksums match: " + match);
    }

    private static byte[] deflateData(File f) throws IOException {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        Deflater def = new Deflater(Deflater.BEST_COMPRESSION);
        DeflaterOutputStream deflaterOS = new DeflaterOutputStream(baos, def);
        readData(f, deflaterOS);
        return baos.toByteArray();
    }

    private static byte[] inflateData(File f) throws IOException {
        InflaterInputStream inflateris = null;
        try {
            InputStream is = new BufferedInputStream(new FileInputStream(f));
            inflateris = new InflaterInputStream(is);
            ByteArrayOutputStream baos = new ByteArrayOutputStream();
            byte[] buf = new byte[1024];
            int len = 0;
            // read() returns -1 once the stream has been fully read
            while ((len = inflateris.read(buf)) != -1) {
                baos.write(buf, 0, len);
            }
            return baos.toByteArray();
        } finally {
            // release the file handle even if reading fails
            if (inflateris != null) {
                inflateris.close();
            }
        }
    }

    private static void readData(File f, OutputStream os) throws IOException {
        byte[] buf = new byte[1024];
        int len = 0;
        InputStream is = null;
        try {
            is = new BufferedInputStream(new FileInputStream(f));
            while ((len = is.read(buf)) &amp;gt; 0) {
                os.write(buf, 0, len);
            }
            if (os instanceof DeflaterOutputStream) {
                ((DeflaterOutputStream) os).finish();
            }
        } finally {
            if (is != null) {
                is.close();
            }
        }
    }

    private static void writeData(byte[] data, File f) throws IOException {
        OutputStream os = null;
        try {
            os = new BufferedOutputStream(new FileOutputStream(f));
            os.write(data);
        } finally {
            if (os != null) {
                os.close();
            }
        }
    }

    private static boolean checksumsMatch(byte[] d1, byte[] d2) {
        CRC32 crc = new CRC32();
        crc.update(d1);
        long checksum1 = crc.getValue();
        crc.reset();
        crc.update(d2);
        long checksum2 = crc.getValue();
        return (checksum1 == checksum2);
    }
}
&lt;/pre&gt;&lt;br /&gt;
The program does the following:&lt;br /&gt;
&lt;ul&gt;&lt;li&gt;Reads a single data file&lt;/li&gt;
&lt;li&gt;Compresses the file using DEFLATE&lt;/li&gt;
&lt;li&gt;Writes the deflated file to the hard disk&lt;/li&gt;
&lt;li&gt;Reads the deflated file from hard disk&lt;/li&gt;
&lt;li&gt;Inflates the file data&lt;/li&gt;
&lt;li&gt;Writes the inflated file to the hard disk&lt;/li&gt;
&lt;/ul&gt;The CRC checksum is included in this example for illustration purposes only.&lt;br /&gt;
&lt;br /&gt;
&lt;b&gt;Compressing Objects&lt;/b&gt;&lt;br /&gt;
&lt;br /&gt;
What if you need to compress objects instead of files? The main difference is that the ObjectOutputStream and ObjectInputStream classes must wrap the DeflaterOutputStream and InflaterInputStream classes to deflate a serialized object. The JDK uses the Decorator Pattern to make this possible: wrapping one InputStream or OutputStream in another adds functionality.&lt;br /&gt;
&lt;br /&gt;
Compressing and decompressing serialized objects using DEFLATE:&lt;br /&gt;
&lt;br /&gt;
&lt;pre class="brush: java" name="code"&gt;package compression;

import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.zip.Deflater;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.InflaterInputStream;

public class ObjectCompression {

    public static class Person implements Serializable {

        private String firstName;
        private String lastName;
        private String address;

        public Person(String fn, String ln, String addr) {
            firstName = fn;
            lastName = ln;
            address = addr;
        }

        @Override
        public String toString() {
            String data = String.format("data={%s,%s,%s}", firstName, lastName,
                    address);
            return data;
        }
    }

    public static void main(String[] args) throws Exception {
        Person person = new Person("John", "Smith", "100 Memory Lane");

        // serialize Person object without compression (determine raw size)
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        ObjectOutputStream oos = new ObjectOutputStream(baos);
        oos.writeObject(person);
        oos.flush();
        oos.close();
        byte[] serializedPerson = baos.toByteArray();

        // serialize and compress Person object
        byte[] deflatedPerson = deflateObject(person);

        // decompress and deserialize Person object
        Person inflatedPerson = (Person) inflateObject(deflatedPerson);

        // print results
        int uncompressedSize = serializedPerson.length;
        System.out.print("Before Compression: (" + uncompressedSize
                 + " bytes) ");
        System.out.println(person);

        int compressedSize = deflatedPerson.length;
        System.out.print("After Compression: (" + compressedSize + " bytes) ");
        System.out.println(inflatedPerson);
    }

    private static byte[] deflateObject(Object o) throws IOException {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        Deflater def = new Deflater(Deflater.BEST_COMPRESSION);
        DeflaterOutputStream deflaterOS = new DeflaterOutputStream(baos, def);
        ObjectOutputStream oos = new ObjectOutputStream(deflaterOS);
        oos.writeObject(o);
        oos.flush();
        oos.close();
        return baos.toByteArray();
    }

    private static Object inflateObject(byte[] d) throws Exception {
        ByteArrayInputStream data = new ByteArrayInputStream(d);
        InflaterInputStream inflaterIS = new InflaterInputStream(data);
        ObjectInputStream ois = new ObjectInputStream(inflaterIS);
        Object obj = ois.readObject();
        ois.close();
        return obj;
    }
}
&lt;/pre&gt;&lt;br /&gt;
The following output was generated by running this code:&lt;br /&gt;
&lt;pre&gt;Before Compression: (154 bytes) data={John,Smith,100 Memory Lane}
After Compression: (143 bytes) data={John,Smith,100 Memory Lane}
&lt;/pre&gt;A savings of 11 bytes was achieved by using compression on this single Person object. The savings may not seem impressive by itself, but compressing a List&amp;lt;Person&amp;gt; object containing 1000 Person objects may yield substantial savings.&lt;br /&gt;
&lt;br /&gt;
&lt;b&gt;Summary&lt;/b&gt;&lt;br /&gt;
&lt;br /&gt;
The compression capabilities of the JDK are very robust and should be sufficient for most situations. Three compression types exist in the java.util.zip package: zip, gzip, and DEFLATE. Each of the compression types offers unique capabilities.&lt;br /&gt;
&lt;br /&gt;
Generally zip is used for compressing multiple directories and files. Gzip is suited for compressing single files or streams of data. The DEFLATE compression type is lightweight and is a great choice for compressing streams of data that do not require error checking.&lt;br /&gt;
&lt;br /&gt;
&lt;b&gt;References&lt;/b&gt;&lt;br /&gt;
&lt;br /&gt;
&lt;a href="http://en.wikipedia.org/wiki/ZIP_%28file_format%29#The_format_in_detail"&gt;http://en.wikipedia.org/wiki/ZIP_(file_format)#The_format_in_detail&lt;/a&gt;&lt;br /&gt;
&lt;a href="http://en.wikipedia.org/wiki/Gzip#File_format"&gt;http://en.wikipedia.org/wiki/Gzip#File_format&lt;/a&gt;&lt;br /&gt;
&lt;a href="http://java.sun.com/developer/technicalArticles/Programming/compression/"&gt;http://java.sun.com/developer/technicalArticles/Programming/compression/&lt;/a&gt;&lt;br /&gt;
&lt;a href="http://tools.ietf.org/html/rfc1951"&gt;http://tools.ietf.org/html/rfc1951&lt;/a&gt;&lt;br /&gt;
&lt;a href="http://en.wikipedia.org/wiki/DEFLATE"&gt;http://en.wikipedia.org/wiki/DEFLATE&lt;/a&gt;
</content><link rel="replies" type="application/atom+xml" href="http://elegantengineering.blogspot.com/feeds/1067324043852905601/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://www.blogger.com/comment.g?blogID=5230787973393818940&amp;postID=1067324043852905601" title="0 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/5230787973393818940/posts/default/1067324043852905601?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/5230787973393818940/posts/default/1067324043852905601?v=2" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/ElegantEngineering/~3/dGgqPsvoNnQ/data-compression-using-java-part-2.html" title="Data Compression Using Java - Part 2" /><author><name>Adam Thomas</name><uri>http://www.blogger.com/profile/10972094452234627438</uri><email>noreply@blogger.com</email><gd:image rel="http://schemas.google.com/g/2005#thumbnail" width="32" height="32" src="http://4.bp.blogspot.com/_114OSgk1PdE/S3lsy_VFDbI/AAAAAAAAAVU/Vio3pirzpZw/S220/adamthomas.jpg" /></author><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="http://1.bp.blogspot.com/-5HdvZ7UgQqA/Tbo5PUi7H2I/AAAAAAAAAgA/_BiB-3YQ75s/s72-c/zip.png" height="72" width="72" /><thr:total>0</thr:total><feedburner:origLink>http://elegantengineering.blogspot.com/2008/12/data-compression-using-java-part-2.html</feedburner:origLink></entry><entry 
gd:etag="W/&quot;DkEMRXo8cCp7ImA9WhZXEEU.&quot;"><id>tag:blogger.com,1999:blog-5230787973393818940.post-7546404938175602723</id><published>2008-12-03T13:34:00.001-08:00</published><updated>2011-04-29T06:44:44.478-07:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2011-04-29T06:44:44.478-07:00</app:edited><category scheme="http://www.blogger.com/atom/ns#" term="java" /><category scheme="http://www.blogger.com/atom/ns#" term="compression" /><title>Data Compression Using Java - Part 1</title><content type="html">&lt;div class="separator" style="clear: right; float: right; margin-bottom: 1em; margin-left: 1em; text-align: center;"&gt;&lt;img border="0" height="200" src="http://1.bp.blogspot.com/-5HdvZ7UgQqA/Tbo5PUi7H2I/AAAAAAAAAgA/_BiB-3YQ75s/s200/zip.png" width="185" /&gt;&lt;/div&gt;Understanding compression is beneficial whether you are building a file archiving utility or simply trying to get better throughput with your web services. The JDK offers robust compression formats such as zip and gzip to handle most situations you are likely to encounter. A basic understanding of compression is required before diving into Java-specific implementations and their usages.&lt;br /&gt;
&lt;br /&gt;
&lt;b&gt;DEFLATE&lt;/b&gt;&lt;br /&gt;
&lt;br /&gt;
At their core both zip and gzip use the &lt;a href="http://en.wikipedia.org/wiki/DEFLATE"&gt;DEFLATE&lt;/a&gt; algorithm. DEFLATE is a lossless data compression algorithm that uses a combination of the &lt;a href="http://en.wikipedia.org/wiki/LZ77_and_LZ78" title="LZ77 and LZ78"&gt;LZ77&lt;/a&gt; algorithm and &lt;a href="http://en.wikipedia.org/wiki/Huffman_coding" title="Huffman coding"&gt;Huffman coding&lt;/a&gt;. Keep in mind that DEFLATE is only the means by which zip and gzip compress data and their individual features and formats are quite different.&lt;br /&gt;
&lt;br /&gt;
It is possible to use DEFLATE directly without incurring the additional overhead that zip and gzip incur. You lose some features, such as &lt;a href="http://en.wikipedia.org/wiki/Cyclic_redundancy_check"&gt;cyclic redundancy check (CRC)&lt;/a&gt; checksumming to detect data corruption; however, the overhead reduction can be substantial in some cases. This approach may be appealing when compressing streams of data that are not persisted to hard disk.&lt;br /&gt;
&lt;br /&gt;
&lt;b&gt;Zip&lt;/b&gt;&lt;br /&gt;
&lt;br /&gt;
The zip archive format consists of individual file entries and a central directory located at the end of the zip file that contains the offsets for the individual file entries. Compression is applied to the individual file entries within the zip archive, not the archive file as a whole. This method of compression is not without its drawbacks.&lt;br /&gt;
&lt;blockquote&gt;The ZIP format can hold collections of files without an external archiver, but is less compact than compressed tarballs holding the same data, because it compresses files individually and cannot take advantage of redundancy between files (&lt;a href="http://en.wikipedia.org/wiki/Gzip#File_format" target="_blank"&gt;Wikipedia&lt;/a&gt;)&lt;/blockquote&gt;The alternative to using zip is to use tar and gzip to archive and compress directories in a two-step process. First, the tar utility creates an archive similar to a zip archive, except that the tar archive is not compressed. Next, gzip compression is applied to the tar archive to create a .tar.gz or .tgz file, sometimes referred to as a tarball.&lt;br /&gt;
&lt;br /&gt;
One benefit of using tar and gzip over zip is that tarring and gzipping can take advantage of &lt;a href="http://en.wikipedia.org/wiki/Solid_compression"&gt;solid compression&lt;/a&gt;. Solid compression concatenates the data of all files contained within the archive and then treats that data as a single block, which allows the compression algorithm to better compress the redundant file data.&lt;br /&gt;
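&lt;br /&gt;
The effect of solid compression can be sketched in memory. The example below compares a zip archive of several similar files against a single gzip stream over their concatenated contents. The concatenation is only a stand-in for tar, since java.util.zip has no tar support and real tar archives also carry per-file headers. The class name, method names, and generated file contents are hypothetical.&lt;br /&gt;

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.GZIPOutputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

class SolidCompressionSketch {

    // Zip: every entry is deflated independently of the others.
    static int zipSize(byte[][] files) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(out)) {
            int index = 0;
            for (byte[] file : files) {
                zip.putNextEntry(new ZipEntry("file" + index + ".txt"));
                zip.write(file);
                zip.closeEntry();
                index++;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.size();
    }

    // Stand-in for tar + gzip: concatenate all files, then compress once.
    static int solidGzipSize(byte[][] files) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPOutputStream gzip = new GZIPOutputStream(out)) {
            for (byte[] file : files) {
                gzip.write(file);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.size();
    }

    // Builds hypothetical files that share most of their content.
    static byte[][] similarFiles(int count) {
        String shared = "level=INFO message=request handled successfully by worker\n";
        byte[][] files = new byte[count][];
        Arrays.setAll(files,
                i -> ("request=" + i + " " + shared).getBytes(StandardCharsets.UTF_8));
        return files;
    }

    public static void main(String[] args) {
        byte[][] files = similarFiles(20);
        System.out.printf("zip\t%d%nsolid gzip\t%d%n", zipSize(files), solidGzipSize(files));
    }
}
```

With many small, similar files the single gzip stream should come out noticeably smaller, because matches can reach back into earlier files and the per-entry zip metadata is avoided.&lt;br /&gt;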
&lt;br /&gt;
&lt;div align="center"&gt;&lt;script charset="utf-8" src="http://ws.amazon.com/widgets/q?rt=tf_mfw&amp;amp;ServiceVersion=20070822&amp;amp;MarketPlace=US&amp;amp;ID=V20070822/US/computsecurih-20/8001/5e8253cd-a6a7-4a08-98fc-dd35a0b31677" type="text/javascript"&gt;
 
&lt;/script&gt; &lt;noscript&gt;&lt;a href="http://ws.amazon.com/widgets/q?rt=tf_mfw&amp;amp;ServiceVersion=20070822&amp;amp;MarketPlace=US&amp;amp;ID=V20070822%2FUS%2Fcomputsecurih-20%2F8001%2F5e8253cd-a6a7-4a08-98fc-dd35a0b31677&amp;amp;Operation=NoScript"&gt;Amazon.com Widgets&lt;/a&gt;&lt;/noscript&gt;&lt;/div&gt;&lt;br /&gt;
&lt;b&gt;GZip&lt;/b&gt;&lt;br /&gt;
&lt;br /&gt;
The gzip format is designed to compress a single file or stream of data. Gzip does not handle compressing multiple files or directories the way zip can. If you want to gzip a directory, you must first tar it, then gzip the resulting archive.&lt;br /&gt;
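&lt;br /&gt;
A minimal sketch of gzip as a single-stream compressor, using GZIPOutputStream and GZIPInputStream from java.util.zip. The class and method names are illustrative assumptions.&lt;br /&gt;

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

class GzipRoundTripSketch {

    // Compresses one stream of bytes; there is no notion of entries or file names.
    static byte[] gzip(byte[] data) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(out)) {
            gz.write(data);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    // Decompresses a gzip stream back into the original bytes.
    static byte[] gunzip(byte[] compressed) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPInputStream gz = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            byte[] buf = new byte[1024];
            int len;
            while ((len = gz.read(buf)) != -1) {
                out.write(buf, 0, len);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] original = "gzip compresses one stream at a time".getBytes(StandardCharsets.UTF_8);
        byte[] restored = gunzip(gzip(original));
        System.out.println("round trip ok: " + Arrays.equals(original, restored));
    }
}
```

GZIPInputStream verifies the CRC-32 stored in the gzip trailer while reading, so a corrupted stream surfaces as an exception rather than as silently wrong data.&lt;br /&gt;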
&lt;br /&gt;
&lt;b&gt;File Compression Overhead&lt;/b&gt;&lt;br /&gt;
&lt;br /&gt;
Each type of compression incurs overhead. The overhead generated by zip is greater than that of gzip due to zip storing directory metadata for the archive. Remember, gzip is a compressor only, not an archiver and compressor.&lt;br /&gt;
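&lt;br /&gt;
One way to see the overhead in isolation is to compare each format against a raw DEFLATE stream of the same input. The sketch below does this in memory. The class name, method names, and sample payload are assumptions made for illustration, and the entry name "data" is simply borrowed from the test code in this post.&lt;br /&gt;

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Collections;
import java.util.zip.Deflater;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.GZIPOutputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

class ContainerOverheadSketch {

    // Size of the bare DEFLATE stream, with no header, trailer, or checksum.
    static int rawDeflateSize(byte[] data) {
        Deflater deflater = new Deflater(Deflater.DEFAULT_COMPRESSION, true);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (DeflaterOutputStream raw = new DeflaterOutputStream(out, deflater)) {
            raw.write(data);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } finally {
            deflater.end(); // a caller-supplied Deflater is not ended by the stream
        }
        return out.size();
    }

    // Size of a gzip stream: DEFLATE data plus the gzip header and trailer.
    static int gzipSize(byte[] data) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPOutputStream gzip = new GZIPOutputStream(out)) {
            gzip.write(data);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.size();
    }

    // Size of a zip archive with one entry: DEFLATE data plus entry and directory records.
    static int zipSize(byte[] data, String entryName) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(out)) {
            zip.putNextEntry(new ZipEntry(entryName));
            zip.write(data);
            zip.closeEntry();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.size();
    }

    public static void main(String[] args) {
        // Hypothetical payload; any byte array works here.
        byte[] data = String.join("", Collections.nCopies(100, "overhead test line\n"))
                .getBytes(StandardCharsets.UTF_8);
        int raw = rawDeflateSize(data);
        System.out.printf("deflate\t%d%n", raw);
        System.out.printf("gzip\t%d (overhead %d)%n", gzipSize(data), gzipSize(data) - raw);
        System.out.printf("zip\t%d (overhead %d)%n",
                zipSize(data, "data"), zipSize(data, "data") - raw);
    }
}
```

The zip overhead grows with the length of the entry name, because the name is stored both next to the entry and in the central directory.&lt;br /&gt;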
&lt;br /&gt;
To illustrate compression overhead, I compressed a single 4608-byte binary file using both zip and gzip. Your mileage may vary.&lt;br /&gt;
&lt;br /&gt;
Compression Results:&lt;br /&gt;
&lt;pre&gt;raw    4608
zip    1903
gzip   1799&lt;/pre&gt;&lt;br /&gt;
The code I used to test compression overhead:&lt;br /&gt;
&lt;pre class="brush: java" name="code"&gt;package compression;

import java.io.BufferedInputStream;
import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.GZIPOutputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

public class FileCompressionOverhead {

    private final static String FILE_NAME = "data";

    public static void main(String[] args) throws IOException {
        File binFile = new File(FILE_NAME);
        long rawSize = binFile.length();

        byte[] zip = zipData(binFile);
        int zipSize = zip.length;

        byte[] gzip = gzipData(binFile);
        int gzipSize = gzip.length;

        String output = String.format("raw\t%d\nzip\t%d\ngzip\t%d", rawSize,
                zipSize, gzipSize);
        System.out.println(output);
    }

    private static byte[] zipData(File f) throws IOException {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        ZipOutputStream zipos = new ZipOutputStream(baos);
        ZipEntry entry = new ZipEntry(f.getName());
        zipos.putNextEntry(entry);
        readData(f, zipos);
        return baos.toByteArray();
    }

    private static byte[] gzipData(File f) throws IOException {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        GZIPOutputStream gzipos = new GZIPOutputStream(baos);
        readData(f, gzipos);
        return baos.toByteArray();
    }

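    // Copies the file into the compressing stream, then finishes the stream so
    // that the trailing data (gzip trailer or zip central directory) is written.
    private static void readData(File f, DeflaterOutputStream os) throws IOException {
        byte[] buf = new byte[1024];
        int len;
        InputStream is = null;
        try {
            is = new BufferedInputStream(new FileInputStream(f));
            // read() returns -1 at the end of the stream
            while ((len = is.read(buf)) != -1) {
                os.write(buf, 0, len);
            }
            // finish() writes the remaining compressed data without closing
            // the underlying ByteArrayOutputStream
            os.finish();
        } finally {
            if (is != null) {
                is.close();
            }
        }
    }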
}
&lt;/pre&gt;&lt;b&gt;Compressible Data&lt;/b&gt;&lt;br /&gt;
&lt;br /&gt;
Not all data compresses the same way. The achievable compression ratio depends on how much redundancy the data contains: the more random the data, the less there is to remove. We can generate a pseudo-random binary file and attempt to compress it with both zip and gzip.&lt;br /&gt;
&lt;br /&gt;
Random data will not compress well. It contains almost no repeated sequences for LZ77 to replace, and its byte values are spread too evenly for Huffman coding to shorten. In fact, this example actually produces compressed files that are larger than the original, because the format overhead is added on top of data that could not be reduced.&lt;br /&gt;
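&lt;br /&gt;
To make the contrast concrete, the sketch below gzips two in-memory buffers of the same size, one filled with pseudo-random bytes and one filled with zeros. The class name, method name, and fixed seed are assumptions chosen for illustration.&lt;br /&gt;

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Random;
import java.util.zip.GZIPOutputStream;

class RandomVsRepetitiveSketch {

    // Returns the size of the gzip stream produced for the given bytes.
    static int gzipSize(byte[] data) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPOutputStream gzip = new GZIPOutputStream(out)) {
            gzip.write(data);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.size();
    }

    public static void main(String[] args) {
        byte[] random = new byte[2048];
        new Random(42).nextBytes(random);   // fixed seed only to make runs repeatable

        byte[] repetitive = new byte[2048]; // all zero bytes, maximally redundant

        System.out.printf("raw\t\t%d%n", random.length);
        System.out.printf("gzip random\t%d%n", gzipSize(random));
        System.out.printf("gzip zeros\t%d%n", gzipSize(repetitive));
    }
}
```

The random buffer should come out slightly larger than its input, while the buffer of zeros should shrink to a small fraction of its original size.&lt;br /&gt;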
&lt;br /&gt;
Compression Results:&lt;br /&gt;
&lt;pre&gt;raw    2048
zip    2187
gzip   2071&lt;/pre&gt;The code I used to test compressible data:&lt;br /&gt;
&lt;pre class="brush: java" name="code"&gt;package compression;

import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.Random;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.GZIPOutputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

public class CompressibleData {

    private final static String FILE_NAME = "randomdata";

    private final static int FILE_SIZE = 2048;

    public static void main(String[] args) throws IOException {
        createRandomByteFile(FILE_SIZE);

        File binFile = new File(FILE_NAME);
        long rawSize = binFile.length();

        byte[] zip = zipData(binFile);
        int zipSize = zip.length;

        byte[] gzip = gzipData(binFile);
        int gzipSize = gzip.length;

        String output = String.format("raw\t%d\nzip\t%d\ngzip\t%d", rawSize,
                zipSize, gzipSize);
        System.out.println(output);
    }

    private static void createRandomByteFile(int size) throws IOException {
        byte[] data = new byte[size];
        new Random().nextBytes(data);
        File f = new File(FILE_NAME);
        OutputStream os = null;
        try {
            os = new BufferedOutputStream(new FileOutputStream(f));
            os.write(data);
        } finally {
            if (os != null) {
                os.close();
            }
        }
    }

    private static byte[] zipData(File f) throws IOException {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        ZipOutputStream zipos = new ZipOutputStream(baos);
        ZipEntry entry = new ZipEntry(f.getName());
        zipos.putNextEntry(entry);
        readData(f, zipos);
        return baos.toByteArray();
    }

    private static byte[] gzipData(File f) throws IOException {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        GZIPOutputStream gzipos = new GZIPOutputStream(baos);
        readData(f, gzipos);
        return baos.toByteArray();
    }

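    // Copies the file into the compressing stream, then finishes the stream so
    // that the trailing data (gzip trailer or zip central directory) is written.
    private static void readData(File f, DeflaterOutputStream os) throws IOException {
        byte[] buf = new byte[1024];
        int len;
        InputStream is = null;
        try {
            is = new BufferedInputStream(new FileInputStream(f));
            // read() returns -1 at the end of the stream
            while ((len = is.read(buf)) != -1) {
                os.write(buf, 0, len);
            }
            // finish() writes the remaining compressed data without closing
            // the underlying ByteArrayOutputStream
            os.finish();
        } finally {
            if (is != null) {
                is.close();
            }
        }
    }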
}
&lt;/pre&gt;&lt;b&gt;Summary&lt;/b&gt;&lt;br /&gt;
&lt;br /&gt;
From a programming perspective, you will typically use zip for compressing files and directories, and gzip for compressing single files or streams of data. Also remember that the compression ratio depends on the data itself, so you cannot know in advance what ratio you will get across varying file types. In &lt;a href="http://elegantengineering.blogspot.com/2008/12/data-compression-using-java-part-2.html"&gt;Part 2&lt;/a&gt; of this series I will discuss how to use the Java implementations of the compression technologies mentioned in this post.&lt;br /&gt;
&lt;br /&gt;
&lt;b&gt;References&lt;/b&gt;&lt;br /&gt;
&lt;br /&gt;
&lt;a href="http://en.wikipedia.org/wiki/ZIP_%28file_format%29#The_format_in_detail"&gt;http://en.wikipedia.org/wiki/ZIP_(file_format)#The_format_in_detail&lt;/a&gt;&lt;br /&gt;
&lt;a href="http://en.wikipedia.org/wiki/Gzip#File_format"&gt;http://en.wikipedia.org/wiki/Gzip#File_format&lt;/a&gt;&lt;br /&gt;
&lt;a href="http://java.sun.com/developer/technicalArticles/Programming/compression/"&gt;http://java.sun.com/developer/technicalArticles/Programming/compression/&lt;/a&gt;&lt;br /&gt;
&lt;a href="http://tools.ietf.org/html/rfc1951"&gt;http://tools.ietf.org/html/rfc1951&lt;/a&gt;&lt;br /&gt;
&lt;a href="http://en.wikipedia.org/wiki/DEFLATE"&gt;http://en.wikipedia.org/wiki/DEFLATE&lt;/a&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5230787973393818940-7546404938175602723?l=elegantengineering.blogspot.com' alt='' /&gt;&lt;/div&gt;
&lt;p&gt;&lt;a href="http://feedads.g.doubleclick.net/~a/R5_DRw8UlOW5P2OglvFRwtPcDkk/0/da"&gt;&lt;img src="http://feedads.g.doubleclick.net/~a/R5_DRw8UlOW5P2OglvFRwtPcDkk/0/di" border="0" ismap="true"&gt;&lt;/img&gt;&lt;/a&gt;&lt;br/&gt;
&lt;a href="http://feedads.g.doubleclick.net/~a/R5_DRw8UlOW5P2OglvFRwtPcDkk/1/da"&gt;&lt;img src="http://feedads.g.doubleclick.net/~a/R5_DRw8UlOW5P2OglvFRwtPcDkk/1/di" border="0" ismap="true"&gt;&lt;/img&gt;&lt;/a&gt;&lt;/p&gt;&lt;img src="http://feeds.feedburner.com/~r/ElegantEngineering/~4/Aao13tSDlqA" height="1" width="1"/&gt;</content><link rel="replies" type="application/atom+xml" href="http://elegantengineering.blogspot.com/feeds/7546404938175602723/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://www.blogger.com/comment.g?blogID=5230787973393818940&amp;postID=7546404938175602723" title="0 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/5230787973393818940/posts/default/7546404938175602723?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/5230787973393818940/posts/default/7546404938175602723?v=2" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/ElegantEngineering/~3/Aao13tSDlqA/data-compression-using-java-part-1.html" title="Data Compression Using Java - Part 1" /><author><name>Adam Thomas</name><uri>http://www.blogger.com/profile/10972094452234627438</uri><email>noreply@blogger.com</email><gd:image rel="http://schemas.google.com/g/2005#thumbnail" width="32" height="32" src="http://4.bp.blogspot.com/_114OSgk1PdE/S3lsy_VFDbI/AAAAAAAAAVU/Vio3pirzpZw/S220/adamthomas.jpg" /></author><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="http://1.bp.blogspot.com/-5HdvZ7UgQqA/Tbo5PUi7H2I/AAAAAAAAAgA/_BiB-3YQ75s/s72-c/zip.png" height="72" width="72" /><thr:total>0</thr:total><feedburner:origLink>http://elegantengineering.blogspot.com/2008/12/data-compression-using-java-part-1.html</feedburner:origLink></entry></feed>

