File Compression

Modules - File Streaming, (potentially) Data Structures

 

Teacher’s Notes:

========================================================================

This technique is interesting and students usually find it to be a neat look at simple file compression.

You can make the assignment harder by removing the spaces that separate the zeros and ones.  If you do this, the students will need to input the data as a String and then parse out the individual digits.

An expanded idea would be for the sequence to represent an ASCII encoded method with each letter represented in binary.  Students could then be instructed to write a program that would translate each binary sequence into its ASCII letter and find the secret message.
 

In the solutions section, I have included the code to generate a new random file filled with ones and zeros.  You can use this to make new sample data files if you want.  You can use the compression solution to change it into a compressed file for your students to decompress.

binary1.txt, and compression2.txt are needed for the assignment.  binary2.txt and compression1.txt are the solutions.  All files can be found at my github repository at:

https://github.com/TaoOfChow/YouTubePrograms/

In the file opening process, I sometimes create the file as a separate, persistent object and then refer to that object for the Scanner or PrintWriter.  I refer to this as the two line technique.  These two lines can be combined into one line since we don't need the File object to persist for the whole program.  I write it both ways to show the students different ways that it can be done.
 

 

Assignment:

========================================================================

Sometimes, a file can contain a lot of data and thus, takes up a lot of storage space.  Compression algorithms are used to reduce the amount of data that is needed to store the data.  A very simple form of compression for binary data (1’s and 0’s) is to just write a file that shows how many 1’s and 0’s are in a given sequence.  For instance,

1 1 1 1 1 1 1 1 1 1 0 0 0 1 0 1 0 0 0 1 1 1 1 1 0 0 0 0 0 0 

Could be compressed to:

9 1 3 0 1 1 1 0 1 1 3 0 5 1 6 0

This would mean that there are 9 x ones followed by 3 x zeros, 1 x one, 1 x zero, 1 x one, 3 x zeros, 5 x ones, and 6 x zeros.

As you can see, the new data is shorter than the original data and the original data could be reconstructed from the compressed data.  

Consider that; while it is necessary to state the quantity of each digit, it is not necessary to state if the digit is a zero or a one.  Since the digits will alternate between zero and one, we only need to know which digit was the first digit.  Because of this, it is not necessary to say we have 9 x ones and then 3 x zeros, 1 x ones, etc.  We can just say we started with a “one” and then the quantities are 9,3,1,1,1,3,5,6 respectively.  Because of this, we can simplify the compression to read as follows:

1 9 3 1 1 1 3 5 6

This sequence would mean: we start with a "one" and we have 9 of them.  Then we have 3 x zeros (since we must alternate from one to zero) and then 1 x ones (since the zero must alternate back), etc.

Because of this, the original 30 character sequence can be compressed to 9 characters - a reduction of 70%!

Part 1:  Write a program that will open the provided file called “binary1.txt”.  This file is filled with random 1’s and 0’s to simulate a binary sequence in a file.  Use the described technique to compress it and save it as a new file called “compressed1.txt”.  Check the file size.  How much smaller is the compressed file?

Part 2: Write a program that will open the provided file called “compressed2.txt”.  This file is filled with numbers representing data that was compressed using the technique described above.  Your program should then use the described technique to decompress it and save the decompressed data in a file called “binary2.txt”.  Check the file size.  How much bigger is the decompressed file?

  

Decompression Solution (JAVA):

========================================================================

 
import java.io.File;
import java.io.PrintWriter;
import java.util.Scanner;
 
public class Decompression {
    public static void main(String[] args)throws Exception {
        
        //DECOMPRESSION SOLUTION:
        
        //opening the file for reading using the one line technique
        Scanner fileReader = new Scanner(new File("compressed2.txt"));
        
        //opening the file for reading using a two line technique
        File binaryData = new File("binary2.txt");
        PrintWriter pw = new PrintWriter(binaryData);
        
        //finding out what the first digit is
        int currentDigit = -1, count = 0;
        if(fileReader.hasNext()) {
            currentDigit = fileReader.nextInt();
        }
        
        while(fileReader.hasNext()) {//as long as there is more data to read
            
            count = fileReader.nextInt();//read how many of the current digit there are
            for(int x = 0; x < count; x++)
                pw.print(currentDigit + " ");//using a for loop to print that many of the given digit
            if(currentDigit == 1)//if the digit was a 1, now it is a 0
                currentDigit = 0;
            else 
                currentDigit = 1;//if the digit was a 0, now it is a 1
        }//go back and repeat for the next digit
        
        pw.close();//closing the PrintWriter

    }

}

Compression Solution (JAVA):

========================================================================

import java.io.File;
import java.io.PrintWriter;
import java.util.Scanner;
 
public class Compression {
    public static void main(String[] args)throws Exception {
 
//COMPRESSION SOLUTION:
        /*
        //Opening the raw data file for reading.
        //I am doing this on two lines to show the individual steps
        File rawData = new File("binary1.txt");
        Scanner fileReader = new Scanner(rawData);
        
        //Normally, I would open the file on one line like with
        //the PrintWriter below
        PrintWriter pw = new PrintWriter(new File("compressed1.txt"));
        
        int currentNumber = -1, count = 1, nextNumber = -1;
        //This is just reading the file to see if the first digit 
        //is a one or a zero.  This could be combined into a single
        //line command but is kept separate for clarity.
        if(fileReader.hasNext()){
            currentNumber = fileReader.nextInt();
            pw.print(currentNumber + " ");
        }
        
        
        while(fileReader.hasNext()) {//as long as there is data in the file...
            nextNumber = fileReader.nextInt();//read the data
            
            if(nextNumber == currentNumber)//if it's the same as the last number...
                count++; //add one to the count
            else {//if the number has changed...
                pw.print(count + " ");//print the count for the previous number to the file
                count = 1;//reset the count for the new number
                currentNumber = nextNumber;//record which number we are currently counting
            }
        }
        
        pw.close();//close the PrintWriter.
      
     }
}
 

Code For Making a Random Ones and Zeros File (JAVA):

========================================================================

 

        File rawData = new File("binary##.txt");//swap## with whatever number you want
        PrintWriter pw = new PrintWriter(rawData);
        
        for(int x = 0; x < 1000; x++)
        {
            if(Math.random() <= 0.5)
                pw.print(1 + " ");
            else
                pw.print(0 + " ");
        }
        pw.close();