
Data Compression and Huffman Algorithm

Data compression algorithms aim to reduce file sizes by eliminating redundant data. The Huffman algorithm assigns variable-length codes to characters based on their frequency, allowing more common characters to be encoded with fewer bits. Run-length encoding replaces repeated characters with a code indicating the character and number of repeats. Lossy techniques like JPEG discard insignificant data to achieve higher compression ratios, while lossless methods like LZW and run-length encoding allow exact reconstruction. Data compression is important for reducing storage and transmission requirements.
Copyright
© Attribution Non-Commercial (BY-NC)

DATA COMPRESSION AND HUFFMAN ALGORITHM

Technical Seminar Paper Submitted by

Presented by
Vineet Agarwala IT200118155

Technical Seminar Under the guidance of Anisur Rahman

NATIONAL INSTITUTE OF SCIENCE & TECHNOLOGY

DATA COMPRESSION
Virtually all forms of data (text, numerical, image, video) contain redundant elements. Data can be compressed by eliminating the redundant elements: a code is substituted for each eliminated element, where the code is shorter than the element it replaces. When compressed data is retrieved from storage or received over a communications link, it is expanded back to its original form, based on the code.

Compression is used:
to save storage space
to reduce communications transmission requirements

The art or science of compactly representing information
Digital realm: using fewer bits to represent information
Data + Compression = information redundancy

REDUNDANCY
Most types of computer files are fairly redundant -- they have the same information listed over and over again. File-compression programs simply get rid of the redundancy.

"Ask not what your country can do for you -- ask what you can do for your country."
Ignoring the difference between capital and lower-case letters, roughly half of the phrase is redundant. Nine words (ask, not, what, your, country, can, do, for, you) give us almost everything we need for the entire quote.
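The nine-word observation can be sketched as a tiny dictionary coder. This is only an illustration under stated assumptions: the function name `dictionary_encode` is hypothetical, punctuation is dropped, and case is ignored as the slide suggests.

```python
import re

QUOTE = ("Ask not what your country can do for you -- "
         "ask what you can do for your country.")

def dictionary_encode(text):
    """Return (dictionary, indices) for the words of text, ignoring case."""
    words = re.findall(r"[a-z]+", text.lower())
    dictionary = []   # unique words, in order of first appearance
    positions = {}    # word -> index into dictionary
    indices = []      # the text, rewritten as dictionary indices
    for word in words:
        if word not in positions:
            positions[word] = len(dictionary)
            dictionary.append(word)
        indices.append(positions[word])
    return dictionary, indices

dictionary, indices = dictionary_encode(QUOTE)
print(len(dictionary), "unique words for", len(indices), "words of text")
print(dictionary)
print(indices)
```

The quote has 17 words but only 9 distinct ones, so the second half is stored purely as references to words already seen.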

Compression Techniques
Lossless
Data can be completely recovered after decompression
Recovered data is identical to original
Exploits redundancy in data

Lossy
Data cannot be completely recovered after decompression
Some information is lost forever
Gives more compression than lossless
Discards insignificant data components

IMAGE COMPRESSION
Image compression can be lossy or lossless.
Methods for lossless image compression are:
Run-length encoding
Entropy coding
Adaptive dictionary algorithms such as LZW

Methods for lossy compression are:
Reducing the color space to the most common colors in the image. The selected colors are specified in the color palette in the header of the compressed image. Each pixel just references the index of a color in the color palette. This method can be combined with dithering to blur the color borders.
Transform coding. This is the most commonly used method. A Fourier-related transform such as the DCT or the wavelet transform is applied, followed by quantization and entropy coding.
Fractal compression.
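The transform-then-quantize step of transform coding can be sketched in one dimension. This is a minimal illustration, not how any particular image codec is implemented: it assumes an orthonormal DCT-II on a short list of samples, a single uniform quantization step, and hypothetical function names; the entropy coding stage is omitted.

```python
import math

def dct(signal):
    """Orthonormal 1-D DCT-II of a list of samples."""
    n = len(signal)
    out = []
    for k in range(n):
        scale = math.sqrt((1 if k == 0 else 2) / n)
        out.append(scale * sum(
            x * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
            for i, x in enumerate(signal)))
    return out

def idct(coeffs):
    """Inverse of dct() above (the transpose of the orthonormal DCT-II)."""
    n = len(coeffs)
    out = []
    for i in range(n):
        out.append(sum(
            math.sqrt((1 if k == 0 else 2) / n) * c
            * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
            for k, c in enumerate(coeffs)))
    return out

def quantize(coeffs, step):
    """Lossy step: keep only the nearest multiple of `step` per coefficient."""
    return [round(c / step) for c in coeffs]

def dequantize(levels, step):
    return [q * step for q in levels]

samples = [52, 55, 61, 66, 70, 61, 64, 73]   # hypothetical row of pixel values
levels = quantize(dct(samples), step=10)     # small integers, cheap to entropy-code
restored = idct(dequantize(levels, step=10)) # close to, but not equal to, samples
print(levels)
print([round(v) for v in restored])
```

A larger `step` discards more detail and yields more zero coefficients, which is where the extra compression of lossy methods comes from.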

JPEG (TRANSFORM COMPRESSION)


JPEG is named after its origin, the Joint Photographic Experts Group.
This involves reducing the number of bits per sample or discarding some of the samples entirely.

MULTIMEDIA COMPRESSION
Multimedia compression is a general term referring to the compression of any type of multimedia, most notably graphics, audio, and video.
MPEG (Moving Picture Experts Group)
The future of this technology is to encode the compression and decompression algorithms directly into integrated circuits.
The approach used by MPEG can be divided into two types of compression: within-the-frame and between-frame.

DATA COMPRESSION ALGORITHMS


LOSSLESS COMPRESSION
Run-Length Encoding
Huffman Coding
Delta
LZW

LOSSY COMPRESSION
CS&Q
JPEG
MPEG

RUN-LENGTH ENCODING
Data files frequently contain the same character repeated many times in a row.

Example of run-length encoding. Each run of zeros is replaced by two characters in the compressed file: a zero to indicate that compression is occurring, followed by the number of zeros in the run.
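The zero-run scheme in the caption can be sketched as follows. This is an illustration under assumptions not stated on the slide: data is handled as bytes, a run is capped at 255 so the count fits in one byte, and the function names and sample values are hypothetical.

```python
def encode_zero_runs(data: bytes) -> bytes:
    """Replace each run of zero bytes with the pair (0, run length)."""
    out = bytearray()
    i = 0
    while i < len(data):
        if data[i] == 0:
            run = 1
            # Assumption: cap runs at 255 so the count fits in a single byte.
            while i + run < len(data) and data[i + run] == 0 and run < 255:
                run += 1
            out += bytes([0, run])
            i += run
        else:
            out.append(data[i])
            i += 1
    return bytes(out)

def decode_zero_runs(data: bytes) -> bytes:
    """Expand every (0, count) pair back into `count` zero bytes."""
    out = bytearray()
    i = 0
    while i < len(data):
        if data[i] == 0:
            out += b"\x00" * data[i + 1]
            i += 2
        else:
            out.append(data[i])
            i += 1
    return bytes(out)

sample = bytes([17, 8, 54, 0, 0, 0, 97, 5, 16, 0, 45, 23, 0, 0, 0, 0, 0, 3, 67, 0, 0, 8])
packed = encode_zero_runs(sample)
print(list(packed), len(sample), "->", len(packed))
```

Note that an isolated zero grows from one byte to two, so the scheme only pays off when zeros tend to come in runs.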

HUFFMAN ENCODING
This method is named after D.A. Huffman, who developed the procedure in the 1950s. More than 96% of this file consists of only 31 characters out of 127. Huffman encoding exploits this skew by assigning shorter codes to the more common characters.

HUFFMAN ENCODING EXAMPLE


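Character frequencies
A: 20% (.20)  B: 9% (.09)  C: 15% (.15)  D: 11% (.11)  E: 40% (.40)  F: 5% (.05)

First step: the two least frequent characters, B (.09) and F (.05), are merged into a node BF (.14), with B on branch 0 and F on branch 1. The remaining nodes are E .40, A .20, C .15, BF .14 and D .11.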

HUFFMAN ENCODING EXAMPLE (CONTD.)

Codes
A: 010  B: 0000  C: 011  D: 001  E: 1  F: 0001

ABCDEF 1.0
  0: ABCDF .60
    0: BFD .25
      0: BF .14
        0: B .09
        1: F .05
      1: D .11
    1: AC .35
      0: A .20
      1: C .15
  1: E .40
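The tree construction in this example can be reproduced with a short sketch. It assumes the frequencies are given as whole percentages and uses a hypothetical helper `huffman_codes`; which branch gets 0 and which gets 1 is arbitrary, so the exact bit patterns may differ from the slide while the code lengths agree.

```python
import heapq
from itertools import count

FREQS = {"A": 20, "B": 9, "C": 15, "D": 11, "E": 40, "F": 5}  # percentages

def huffman_codes(freqs):
    """Return {symbol: bit string} built by repeatedly merging the two
    lowest-weight nodes, as in the example above."""
    if len(freqs) == 1:
        return {sym: "0" for sym in freqs}   # degenerate one-symbol input
    tie = count()  # tie-breaker so heapq never has to compare the dicts
    heap = [(w, next(tie), {sym: ""}) for sym, w in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        w0, _, low = heapq.heappop(heap)     # lightest node -> branch 0
        w1, _, high = heapq.heappop(heap)    # next lightest -> branch 1
        merged = {s: "0" + c for s, c in low.items()}
        merged.update({s: "1" + c for s, c in high.items()})
        heapq.heappush(heap, (w0 + w1, next(tie), merged))
    return heap[0][2]

codes = huffman_codes(FREQS)
for sym in sorted(codes):
    print(sym, codes[sym])
average = sum(FREQS[s] * len(c) for s, c in codes.items()) / 100
print("average bits per character:", average)
```

With these frequencies the average comes to 2.34 bits per character, against 3 bits for a fixed-length code over six characters.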

Run Length Encoding


Original (38 symbols):
CTAAAAAGGGTCGTTTTTTGCCCGGGGGCCTCCCCCCC

Run-length encoded (21 symbols):
CT5A3GTCG6TG3C5GCCT7C
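The encoding above appears to replace only runs of three or more symbols with a count followed by the symbol, leaving shorter runs as they are. A minimal sketch under that assumption (the names `rle_encode`, `rle_decode` and `min_run` are hypothetical, and the input is assumed to contain no digits):

```python
from itertools import groupby

def rle_encode(text, min_run=3):
    """Write runs of at least `min_run` symbols as <count><symbol>."""
    out = []
    for symbol, group in groupby(text):
        n = len(list(group))
        out.append(f"{n}{symbol}" if n >= min_run else symbol * n)
    return "".join(out)

def rle_decode(encoded):
    """Invert rle_encode; digits are always read as a run count."""
    out, digits = [], ""
    for ch in encoded:
        if ch.isdigit():
            digits += ch
        else:
            out.append(ch * int(digits or "1"))
            digits = ""
    return "".join(out)

dna = "CTAAAAAGGGTCGTTTTTTGCCCGGGGGCCTCCCCCCC"
packed = rle_encode(dna)
print(packed, len(dna), "->", len(packed))
```

Leaving runs of one or two symbols untouched avoids making the output longer than the input for data with few repeats.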

Run Length Encoding (Contd.)


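Input:
WWWBWWWWWBWWWBWWWWBWWWWWBWWWBWW
WWWBWWBWWWWWWBBBWWWWWWWBWBWWWWW
WWBWWBBWWWWWBWWWWBWWWWBWWWWB

Start of the input: WWWBWWWWWBWWWBWWWWB...
Run-length encoded: 3WB5WB3WB4WB...
Possible optimization: since W and B alternate, store only the run lengths: 3151314...
But the optimization requires an escape character: #W3151314...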
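The counts-only idea for two-symbol data can be sketched as below. This is an illustration only: instead of an in-band escape character it returns the first symbol separately, the run lengths are kept as a list of integers so multi-digit counts stay unambiguous, and the function names are hypothetical.

```python
from itertools import groupby

def encode_bilevel(pixels):
    """Return (first symbol, run lengths) for a non-empty two-symbol string."""
    runs = [len(list(group)) for _, group in groupby(pixels)]
    return pixels[0], runs

def decode_bilevel(first, runs, symbols="WB"):
    """Rebuild the string by alternating symbols, starting with `first`."""
    other = symbols[1] if first == symbols[0] else symbols[0]
    return "".join((first if i % 2 == 0 else other) * n
                   for i, n in enumerate(runs))

first, runs = encode_bilevel("WWWBWWWWWBWWWBWWWWB")
print(first, runs)
```

Because the symbols strictly alternate, only the starting symbol and the lengths need to be stored.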

Run Length Encoding (Contd.)


Is run length encoding practical for images?
No: chances of three or more identical consecutive pixels are low for most real images, especially images with large color depth.

Yes: some images do have long runs of identical consecutive pixels, especially images with low color depth. RLE is used for fax machines, and by BMP, TIFF and PCX files.

LZW Compression
LZW compression is named after its developers, A. Lempel and J. Ziv, with later modifications by Terry A. Welch. It is the foremost technique for general-purpose data compression due to its simplicity and versatility.

LZW Compression (contd.)


LZW compression flowchart. The variable, CHAR, is a single byte. The variable, STRING, is a variable length sequence of bytes. Data are read from the input file (box 1 & 2) as single bytes, and written to the compressed file (box 4) as 12 bit codes.
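The loop described by the flowchart caption can be sketched as follows, keeping the caption's variable names STRING and CHAR in lower case. This is a simplified illustration, not the flowchart itself: codes are returned as a list of integers below 4096 rather than packed into 12-bit fields, the table simply stops growing when full, and the function names are hypothetical.

```python
def lzw_compress(data: bytes, max_codes: int = 4096) -> list:
    """Return a list of LZW codes, each small enough for a 12-bit field."""
    table = {bytes([i]): i for i in range(256)}   # codes 0-255: single bytes
    string = b""
    codes = []
    for byte in data:
        char = bytes([byte])
        if string + char in table:
            string += char                        # keep extending the match
        else:
            codes.append(table[string])           # emit code for longest match
            if len(table) < max_codes:
                table[string + char] = len(table) # learn the new sequence
            string = char
    if string:
        codes.append(table[string])
    return codes

def lzw_decompress(codes, max_codes: int = 4096) -> bytes:
    """Rebuild the original bytes, regrowing the same table as the encoder."""
    if not codes:
        return b""
    table = {i: bytes([i]) for i in range(256)}
    prev = table[codes[0]]
    out = bytearray(prev)
    for code in codes[1:]:
        # A code not yet in the table can only be the entry being defined now.
        entry = table[code] if code in table else prev + prev[:1]
        out += entry
        if len(table) < max_codes:
            table[len(table)] = prev + entry[:1]
        prev = entry
    return bytes(out)

codes = lzw_compress(b"abababab")
print(codes)
```

The decoder never receives the table; it rebuilds it from the codes themselves, which is why no dictionary has to be stored in the compressed file.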

CONCLUSION
Is it possible to create a data compression algorithm that will always compress data?
Is there an optimal data compression algorithm?
Lossless: No, compression rates depend on the data.
Lossy: No, the quality of compression is subjective.

Is data compression really that important?
