Huffman Coding Algorithm
Huffman coding is a lossless data compression technique: it reduces the size of data without
losing any information. It was developed by David Huffman and is named after him.
Huffman coding works best on data in which some characters occur much more frequently
than others.
It is a variable-length encoding, which means that each character in the given stream of data
is assigned a binary code whose length depends on how often the character occurs. Characters
with higher frequency get shorter codes, and characters with lower frequency get longer codes.
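To make the benefit of variable-length codes concrete, the short Python sketch below encodes a small string with a hand-written code table and compares the result with a fixed-length encoding. The table `codes` and the string `text` are hypothetical values chosen for illustration only; they are not produced by the algorithm described in this document.

```python
# Hypothetical code table, written by hand for illustration only:
# the most frequent symbol "a" gets the shortest code.
codes = {"a": "0", "b": "10", "c": "11"}
text = "aaaabaac"

# Variable-length encoding: concatenate the code of every character.
encoded = "".join(codes[ch] for ch in text)
variable_bits = len(encoded)

# Fixed-length encoding: 3 distinct symbols need 2 bits per character.
fixed_bits = 2 * len(text)

print(encoded)
print("fixed:", fixed_bits, "bits, variable:", variable_bits, "bits")
```

Because "a" dominates the input, giving it a 1-bit code more than pays for the 2-bit codes of the rarer symbols.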
Step I - Build a Huffman tree from the input set of symbols and the weight/frequency of
each symbol.
A priority queue is used to build the Huffman tree, with the nodes of lowest
frequency given the highest priority. A min heap data structure can be used to
implement the priority queue.
Initially, all nodes are leaf nodes, each containing a character along with the weight/
frequency of that character.
Internal nodes, on the other hand, contain a weight and links to two child nodes.
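The priority-queue behaviour described above can be sketched with Python's standard `heapq` module, which maintains a list as a min heap. Representing a leaf as a `(frequency, character)` tuple is an assumption made for this sketch; the frequencies are those of the example given later in this document.

```python
import heapq

# Leaf nodes as (frequency, character) tuples; tuples compare by frequency first.
leaves = [(12, "A"), (15, "B"), (7, "C"), (13, "D"), (9, "E")]

heap = list(leaves)
heapq.heapify(heap)            # rearrange the list into a min heap in place

# The node with the lowest frequency has the highest priority,
# so it is the first one to be extracted.
first = heapq.heappop(heap)
second = heapq.heappop(heap)
print(first, second)
```

The two extracted nodes are the ones with the smallest frequencies, which is exactly what the tree-building step needs at every iteration.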
Step II - Assign a binary code to each symbol by traversing the Huffman tree.
Generally, bit ‘0’ represents the left child and bit ‘1’ represents the right child, so the
code of a symbol is the sequence of bits on the path from the root to its leaf.
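The traversal in Step II can be sketched as a short recursive function. The tree representation used here (a leaf is a string, an internal node is a `(left, right)` tuple), the function name `assign_codes` and the hand-built `tree` are all assumptions made for illustration.

```python
def assign_codes(node, prefix="", table=None):
    """Collect {symbol: code} by walking the tree from the root.

    A leaf is a string; an internal node is a (left, right) tuple.
    """
    if table is None:
        table = {}
    if isinstance(node, str):
        # Reached a leaf: the bits collected on the way down are its code.
        # A tree consisting of a single leaf gets the 1-bit code "0".
        table[node] = prefix or "0"
    else:
        left, right = node
        assign_codes(left, prefix + "0", table)   # left edge contributes a 0
        assign_codes(right, prefix + "1", table)  # right edge contributes a 1
    return table


# Hand-built tree, assumed for illustration: "z" sits directly under the root.
tree = (("x", "y"), "z")
codes = assign_codes(tree)
print(codes)
```

Symbols closer to the root end up with shorter codes, which is why the tree is built so that frequent symbols sit near the top.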
Algorithm for creating the Huffman tree:
Step 1 - Create a leaf node for each character and build a min heap from all the nodes.
(The frequency value is used to compare two nodes in the min heap.)
Step 2 - Repeat Steps 3 to 5 while the heap has more than one node.
Step 3 - Extract the two nodes with minimum frequency, say x and y, from the heap.
Step 4 - Create a new internal node z with x as its left child and y as its right child,
and set frequency(z) = frequency(x) + frequency(y).
Step 5 - Add z to the min heap.
Step 6 - The single node remaining in the heap is the root of the Huffman tree.
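The six steps above can be sketched in Python as follows. This is a minimal sketch rather than a definitive implementation: the function name `build_huffman_tree`, the tuple-based node representation and the insertion-order tie-breaker are assumptions, and the input is assumed to contain at least one symbol.

```python
import heapq
from itertools import count


def build_huffman_tree(freqs):
    """Return (total_weight, root) for a non-empty {symbol: frequency} dict.

    A leaf is the symbol itself; an internal node is a (left, right) tuple.
    """
    # Tie-breaker (an assumption of this sketch): nodes with equal frequency
    # are ordered by creation time, so the nodes themselves are never compared.
    tie = count()

    # Step 1: one leaf per character, all placed in a min heap.
    heap = [(f, next(tie), sym) for sym, f in freqs.items()]
    heapq.heapify(heap)

    # Step 2: repeat while the heap has more than one node.
    while len(heap) > 1:
        # Step 3: extract the two nodes with minimum frequency.
        fx, _, x = heapq.heappop(heap)
        fy, _, y = heapq.heappop(heap)
        # Step 4: new internal node z with x as left child and y as right child.
        z = (x, y)
        # Step 5: add z back with frequency(x) + frequency(y).
        heapq.heappush(heap, (fx + fy, next(tie), z))

    # Step 6: the remaining node is the root.
    weight, _, root = heap[0]
    return weight, root


weight, root = build_huffman_tree({"A": 12, "B": 15, "C": 7, "D": 13, "E": 9})
print(weight, root)
```

With n symbols the loop runs n - 1 times and each heap operation costs O(log n), so the tree is built in O(n log n) time.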
Example:
Suppose a data file contains the following characters with the given frequencies.
Character    Frequency
A            12
B            15
C            7
D            13
E            9
Solution:
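Initially, create a leaf node for each character and place all the nodes in a min heap.
In order of frequency the nodes are: C(7), E(9), A(12), D(13), B(15)
Step-2: Extract C(7) and E(9) and merge them into an internal node of weight 7 + 9 = 16.
Nodes in the heap: A(12), D(13), B(15), CE(16)
Step-3: Extract A(12) and D(13) and merge them into an internal node of weight 12 + 13 = 25.
Nodes in the heap: B(15), CE(16), AD(25)
Step-4: Extract B(15) and CE(16) and merge them into an internal node of weight 15 + 16 = 31.
Nodes in the heap: AD(25), BCE(31)
Step-5: Extract AD(25) and BCE(31) and merge them into a node of weight 25 + 31 = 56.
This node is the only one left in the heap, so it is the root of the Huffman tree.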
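The codes that result from this example can be checked with a compact Python sketch. Instead of building explicit tree nodes, each heap entry here carries a partial code table, and every merge prefixes ‘0’ to the codes of the first extracted node (the left child) and ‘1’ to those of the second (the right child). This table-merging formulation and the variable names are assumptions chosen for brevity, not the only way to implement the algorithm.

```python
import heapq

freqs = {"A": 12, "B": 15, "C": 7, "D": 13, "E": 9}

# Each heap entry is (weight, tie_breaker, {symbol: code_so_far}).
# The unique tie_breaker ensures the dicts themselves are never compared.
heap = [(f, i, {sym: ""}) for i, (sym, f) in enumerate(freqs.items())]
heapq.heapify(heap)
next_id = len(heap)

while len(heap) > 1:
    wx, _, x = heapq.heappop(heap)   # smallest weight: becomes the left child
    wy, _, y = heapq.heappop(heap)   # next smallest: becomes the right child
    merged = {s: "0" + c for s, c in x.items()}
    merged.update({s: "1" + c for s, c in y.items()})
    heapq.heappush(heap, (wx + wy, next_id, merged))
    next_id += 1

codes = heap[0][2]
total_bits = sum(freqs[s] * len(c) for s, c in codes.items())
fixed_bits = 3 * sum(freqs.values())   # 5 symbols need 3 bits each at fixed length

for sym in sorted(codes):
    print(sym, codes[sym])
print("Huffman:", total_bits, "bits, fixed-length:", fixed_bits, "bits")
```

For these frequencies the sketch yields A = 00, D = 01, B = 10, C = 110 and E = 111, for a total of 128 bits against 168 bits with a 3-bit fixed-length code.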