
Huffman Coding Algorithm

HUFFMAN CODING

Huffman Coding is a technique for compressing data to reduce its size without losing any
information (lossless compression). It was developed by David Huffman and is named after him.

Huffman Coding is generally used to compress data in which some characters occur much more
frequently than others.

Huffman Coding is a well-known Greedy algorithm. It is greedy because, at every step, it
merges the two nodes with the lowest frequency, making the locally best choice without
reconsidering earlier merges. As a result, the length of the code assigned to a character
depends on the frequency of that character.

A character with a higher frequency gets a shorter code, and a character with a lower
frequency gets a longer one. Huffman Coding is therefore a variable-length encoding: the
codes assigned to the characters in the given stream of data do not all have the same length.

Encoding, in computers, can be defined as the process of representing a sequence of
characters as bits so that it can be transmitted or stored efficiently. Fixed-length and
variable-length are two types of encoding schemes, explained as follows-

Fixed-Length encoding - Every character is assigned a binary code using the same number of
bits.

Variable-Length encoding - As opposed to Fixed-length encoding, this scheme uses a
variable number of bits for encoding the characters, depending on their frequency in the given
text.
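
To make the difference concrete, the following minimal sketch compares the two schemes on the character frequencies from the worked example in this document. The variable names are illustrative assumptions, and the variable-length code lengths are simply taken from the Huffman codes listed in that example.

```python
import math

# Frequencies from the worked example in this document.
freqs = {"A": 12, "B": 15, "C": 7, "D": 13, "E": 9}

# Fixed-length: every symbol needs ceil(log2(number of distinct symbols)) bits.
bits_per_symbol = math.ceil(math.log2(len(freqs)))
fixed_total = bits_per_symbol * sum(freqs.values())

# Variable-length: code lengths taken from the Huffman codes in the example
# (A: 00, B: 10, C: 110, D: 01, E: 111).
code_lengths = {"A": 2, "B": 2, "C": 3, "D": 2, "E": 3}
variable_total = sum(freqs[ch] * code_lengths[ch] for ch in freqs)

print(bits_per_symbol, fixed_total, variable_total)  # 3 168 128
```

With five distinct characters, a fixed-length code needs 3 bits per character (168 bits for 56 characters), while the variable-length code needs 128 bits.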

The major steps involved in Huffman coding are-

Step I - Building a Huffman tree using the input set of symbols and the weight/frequency of
each symbol

• A Huffman tree, which is a binary tree, needs to be created having n leaf nodes and
n-1 internal nodes, where n is the number of distinct symbols

• A Priority Queue is used for building the Huffman tree such that nodes with the lowest
frequency have the highest priority. A Min Heap data structure can be used to
implement the functionality of a priority queue.

• Initially, all nodes are leaf nodes containing the character itself along with the
weight/frequency of that character

• Internal nodes, on the other hand, contain a weight and links to two child nodes
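
One possible way to represent these nodes is sketched below. The `Node` class and its field names are assumptions made for illustration, and Python's standard `heapq` module stands in for the min heap. The frequencies come from the worked example in this document.

```python
import heapq
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    weight: int                      # frequency of the symbol or subtree
    symbol: Optional[str] = None     # set only for leaf nodes
    left: Optional["Node"] = None    # set only for internal nodes
    right: Optional["Node"] = None   # set only for internal nodes

    def __lt__(self, other: "Node") -> bool:
        # heapq compares items with "<"; the lowest weight has the highest priority.
        return self.weight < other.weight


# Initially every node is a leaf holding a character and its frequency.
heap = [Node(12, "A"), Node(15, "B"), Node(7, "C"), Node(13, "D"), Node(9, "E")]
heapq.heapify(heap)

# Popping repeatedly yields the nodes in order of increasing frequency.
pop_order = [heapq.heappop(heap).symbol for _ in range(5)]
print(pop_order)  # ['C', 'E', 'A', 'D', 'B']
```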

Step II - Assigning the binary codes to each symbol by traversing the Huffman tree

• Generally, bit ‘0’ represents the left child and bit ‘1’ represents the right child

Algorithm for creating the Huffman Tree-
• Step 1- Create a leaf node for each character and build a min heap using all the nodes
(the frequency value is used to compare two nodes in the min heap)
• Step 2- Repeat Steps 3 to 5 while the heap has more than one node
• Step 3- Extract the two nodes with minimum frequency, say x and y, from the heap
• Step 4- Create a new internal node z with x as its left child and y as its right child,
where frequency(z) = frequency(x) + frequency(y)
• Step 5- Add z to the min heap
• Step 6- The single node remaining in the heap is the root of the Huffman tree
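
The steps above can be sketched in Python as follows. This is a minimal illustration, not a definitive implementation. The function name `build_huffman_tree` is hypothetical, leaves are represented as plain strings and internal nodes as `(left, right)` tuples, and a running counter is added as a tie-breaker so the heap never has to compare two nodes directly. The input is assumed to be non-empty.

```python
import heapq
from itertools import count


def build_huffman_tree(freqs):
    """Return (total_weight, tree) for a non-empty {symbol: frequency} dict."""
    tie = count()  # tie-breaker so equal frequencies never compare the nodes themselves
    # Step 1: one leaf per character, all placed in a min heap keyed on frequency.
    heap = [(f, next(tie), sym) for sym, f in freqs.items()]
    heapq.heapify(heap)
    # Step 2: repeat while the heap has more than one node.
    while len(heap) > 1:
        # Step 3: extract the two nodes with minimum frequency.
        fx, _, x = heapq.heappop(heap)
        fy, _, y = heapq.heappop(heap)
        # Steps 4 and 5: new internal node z = (x, y) with the summed frequency.
        heapq.heappush(heap, (fx + fy, next(tie), (x, y)))
    # Step 6: the remaining node is the root.
    weight, _, root = heap[0]
    return weight, root


weight, tree = build_huffman_tree({"A": 12, "B": 15, "C": 7, "D": 13, "E": 9})
print(weight)  # 56
print(tree)    # (('A', 'D'), ('B', ('C', 'E')))
```

Each heap operation costs O(log n), and there are n-1 merges, so building the tree takes O(n log n) time for n distinct characters.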

Example:
Suppose a data file contains the following characters with the given frequencies.

Character    Frequency
A            12
B            15
C            7
D            13
E            9

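Solution:
Step-1: Initially, create a leaf node for each character and place all the nodes in the min heap:
C (7), E (9), A (12), D (13), B (15)

Step-2: Extract the two minimum nodes, C (7) and E (9), and merge them into an internal node
of weight 16, with C as the left child and E as the right child.

Step-3: Extract A (12) and D (13) and merge them into an internal node of weight 25, with A
as the left child and D as the right child.

Step-4: Extract B (15) and the node of weight 16 and merge them into an internal node of
weight 31, with B as the left child.

Step-5: Extract the nodes of weight 25 and 31 and merge them into the root of weight 56, with
the node of weight 25 as the left child.

The resulting tree is the Huffman Tree.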


Now, label the edges of the tree.
Assign “0” to all left edges and “1” to all right edges.
The code of a character is the sequence of edge labels on the path from the root to its leaf.

Huffman Code of each character:
A: 00
B: 10
C: 110
D: 01
E: 111
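
As a rough check, the codes can be reproduced by traversing the tree and appending "0" for each left edge and "1" for each right edge. In this sketch the tree is hard-coded as nested `(left, right)` tuples with strings as leaves. That representation and the function name `assign_codes` are assumptions made for illustration.

```python
def assign_codes(node, prefix="", codes=None):
    """Walk the tree, adding '0' for a left edge and '1' for a right edge."""
    if codes is None:
        codes = {}
    if isinstance(node, str):
        # Leaf: the path from the root is the code. A tree consisting of a
        # single leaf has an empty path, so fall back to "0" in that case.
        codes[node] = prefix or "0"
    else:
        left, right = node
        assign_codes(left, prefix + "0", codes)
        assign_codes(right, prefix + "1", codes)
    return codes


# Huffman tree for the example: root -> (A, D) on the left, (B, (C, E)) on the right.
tree = (("A", "D"), ("B", ("C", "E")))
codes = assign_codes(tree)
print(codes)  # {'A': '00', 'D': '01', 'B': '10', 'C': '110', 'E': '111'}
```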
Average code length per character = ∑ (frequency_i x code length_i) / ∑ frequency_i
= {(12 x 2) + (13 x 2) + (15 x 2) + (7 x 3) + (9 x 3)} / (12 + 13 + 15 + 7 + 9)
= (24 + 26 + 30 + 21 + 27) / 56
= 128 / 56
≈ 2.29
Average code length per character ≈ 2.29 bits

Total number of bits in Huffman encoded message
= Total number of characters in the message x Average code length per character
= 56 x (128 / 56)
= 128 bits
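
The arithmetic can be checked with a few lines of Python. This is a small sketch using the frequencies and codes from the example, with illustrative variable names. Summing frequency x code length directly gives the exact bit count, with no rounding of the average involved.

```python
freqs = {"A": 12, "B": 15, "C": 7, "D": 13, "E": 9}
codes = {"A": "00", "B": "10", "C": "110", "D": "01", "E": "111"}

# Total bits = sum over characters of frequency x code length.
total_bits = sum(freqs[ch] * len(codes[ch]) for ch in freqs)
total_chars = sum(freqs.values())
average = total_bits / total_chars

print(total_bits, total_chars, round(average, 2))  # 128 56 2.29
```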
