This program requires you to write a program to decode strings encoded with Huffman codes. A Huffman code represents characters in variable length bit strings. Frequently occurring characters are represented with short bit strings; characters that occur rarely are represented with long bit strings. You are required to develop a program with appropriate class definitions that converts a coded string into it's uncoded representation.
The input to your program will be two files. One file containing the code and one file containing a coded string to be decoded.
The assignment will give you experience in the use of binary trees,
exposure to data compression techniques, and experience in solving
a problem from a specification.
Character | Bit Code |
A | 0 |
T | 100 |
C | 101 |
G | 11 |
Using this code, the string ATACG would be represented by the coded bit string 0100010111, where each character is replaced by its bit code.
In this assignment you are to write a program that decodes Huffman coded bit strings. Your program should take as input a file that specifies a Huffman code and a file that contains a bit string encoded with the given Huffman code. You can assume the file containing the Huffman contains one line for each character in the alphabet. Each line will be begin with the character followed by a single space and then the character's bit code. The file containing the coded string will consist of one line containing a sequence of 0's and 1's. Example files are available in the directory /group/class/c030/data.
To solve the decoding problem, you must first create a Huffman tree
from the table of Huffman codes. A Huffman tree is an extended binary
tree in which the leaf nodes contain the characters of the alphabet.
The tree is structured such that the path to a leaf node is determined
by the bit code, where 0 is interpreted as left and 1 is interpreted as
right. Thus, the code above defines the following Huffman tree:
Note that interior nodes contain no data. You must devise a method
to create a Huffman tree from a Huffman table. You should not assume that
the aphabet contains any particular symbols or is of any particular size.
The second part of your assignment is to decode a bit string. To do this you should read bits from the bits sequence and descend the tree in the appropriate direction until a leaf node is found. Each successive character visited should be printed to an output file. This process must be repeated until the bit sequence is exhausted.
Huffman codes are optimal in that sense that they produce minimal length encodings given the frequency with which characters occur. A more detailed explanation of Huffman Coding with illustrations is available here along with descriptions of some other coding methods.