Data Structures & Algorithms (Struktur Data & Algoritme)
Denny, Suryana Setiawan
Faculty of Computer Science (Fakultas Ilmu Komputer), Universitas Indonesia, Even Semester 2004/2005
Version 2.0 - Internal Use Only
Tree: application
SDA/HUFF/V2.0/2
Objectives
Understand one file compression technique: Huffman coding.
Outline
Compression
Huffman compression
Compression
Process:
Encoding: raw → compressed
Decoding: compressed → raw
Types of compression
Lossy: MPEG, JPEG
Lossless
Compression algorithms:
RLE: Run-Length Encoding
Lempel-Ziv
Huffman encoding
Compression performance depends on the file type.
Huffman Compression
If a woodchuck could chuck wood!
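32 characters × 8 bits = 256 bits
13 distinct characters → 4 bits per character
Compressed with a fixed-length code: 32 × 4 = 128 bits
Variable-length bit strings improve compression further.
This requires a prefix code: no codeword is a prefix of another codeword.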
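The arithmetic on this slide can be checked with a short script. This is a minimal sketch rather than part of the original slides; the function name `fixed_length_cost` is an assumption made for illustration, and 8-bit characters are assumed as on the slide.

```python
from collections import Counter

def fixed_length_cost(text):
    """Return (raw_bits, distinct_symbols, bits_per_symbol, fixed_bits)."""
    raw_bits = len(text) * 8                  # 8 bits per raw character
    distinct = len(Counter(text))             # number of different characters
    # Smallest k with 2**k >= distinct, using at least 1 bit per symbol.
    bits_per_symbol = max(1, (distinct - 1).bit_length())
    return raw_bits, distinct, bits_per_symbol, len(text) * bits_per_symbol

print(fixed_length_cost("If a woodchuck could chuck wood!"))
# (256, 13, 4, 128)
```

Four bits allow 16 codewords, so three of them stay unused with 13 characters; a variable-length code avoids that waste.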
Huffman Compression
Frequently occurring letters get short representations.
Infrequent letters get long representations.
Huffman Encoding: comparison
Frequencies: a:8, i:5, u:3, e:3
Fixed-length code (2 bits per letter):
a = 00 → 16 bits
i = 01 → 10 bits
u = 10 → 6 bits
e = 11 → 6 bits
Total: 38 bits
Huffman Encoding: comparison
Frequencies: a:8, i:5, u:3, e:3
[Figure: code tree with internal node weights 6, 11, and 19 at the root]
Variable-length code:
a = 0 → 8 bits
i = 10 → 10 bits
u = 110 → 9 bits
e = 111 → 9 bits
Total: 36 bits
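The comparison between the two codes comes down to summing frequency × codeword length for each letter. A minimal sketch of that calculation follows; the names `FREQ`, `FIXED`, `VARIABLE`, and `total_bits` are illustrative and do not come from the slides. For the fixed-length code the per-letter counts add up to 16 + 10 + 6 + 6 = 38 bits.

```python
FREQ = {"a": 8, "i": 5, "u": 3, "e": 3}
FIXED = {"a": "00", "i": "01", "u": "10", "e": "11"}
VARIABLE = {"a": "0", "i": "10", "u": "110", "e": "111"}

def total_bits(freq, code):
    """Sum of frequency x codeword length over all symbols."""
    return sum(n * len(code[sym]) for sym, n in freq.items())

print(total_bits(FREQ, FIXED))     # 38
print(total_bits(FREQ, VARIABLE))  # 36
```

The saving is small here because the four frequencies are fairly close to one another; the more skewed the distribution, the more a variable-length code gains.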
Huffman Encoding
[Figure: code tree for the example sentence. Each edge is labeled 0 or 1; the leaves, from left to right, are !, a, l, u, d, k, w, I, f, h, c, space, o.]
Huffman Encoding
[Figure: the code tree annotated with weights. Leaves: !:1, a:1, l:1, u:3, d:3, k:2, w:2, I:1, f:1, h:2, c:5, space:5, o:5. Internal nodes: 2, 2, 3, 4, 4, 6, 7, 9, 10, 13, 19, and 32 at the root.]
Huffman Encoding (freq)
Code for each character, with its frequency in parentheses:
! = 0000 (1)
a = 00010 (1)
l = 00011 (1)
u = 001 (3)
d = 010 (3)
k = 0110 (2)
w = 0111 (2)
I = 10000 (1)
f = 10001 (1)
h = 1001 (2)
c = 101 (5)
space = 110 (5)
o = 111 (5)
Cost: ∑ d_i × f_i = 111 bits ≈ 43% of 256 bits, where d_i is the code length and f_i the frequency of character i.
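The code table can be exercised directly. The sketch below is an illustration, not the course's implementation: it encodes the example sentence with the table from this slide, confirms the 111-bit cost, and decodes it again by reading bits until a complete codeword appears. The helper names `encode`, `decode`, and `is_prefix_free` are assumptions introduced here.

```python
CODE = {
    "!": "0000", "a": "00010", "l": "00011", "u": "001", "d": "010",
    "k": "0110", "w": "0111", "I": "10000", "f": "10001", "h": "1001",
    "c": "101", " ": "110", "o": "111",
}
TEXT = "If a woodchuck could chuck wood!"

def encode(text, code):
    """Concatenate the codeword of every character."""
    return "".join(code[ch] for ch in text)

def decode(bits, code):
    """Read bits left to right, emitting a symbol whenever a codeword completes."""
    inverse = {word: sym for sym, word in code.items()}
    out, buf = [], ""
    for bit in bits:
        buf += bit
        if buf in inverse:
            out.append(inverse[buf])
            buf = ""
    if buf:
        raise ValueError("trailing bits do not form a complete codeword")
    return "".join(out)

def is_prefix_free(code):
    """In sorted order, any prefix relation shows up between neighbours."""
    words = sorted(code.values())
    return all(not b.startswith(a) for a, b in zip(words, words[1:]))

bits = encode(TEXT, CODE)
print(len(bits))                   # 111
print(decode(bits, CODE) == TEXT)  # True
print(is_prefix_free(CODE))        # True
```

Decoding needs no separators between codewords precisely because the code is prefix-free: the first codeword that matches is the only one that can match.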
Huffman Encoding: steps
Start: one single-node tree per character, weighted by frequency.
Forest: c:5, o:5, space:5, u:3, d:3, w:2, k:2, h:2, a:1, l:1, f:1, I:1, !:1
Merge a:1 + l:1 → 2
Huffman Encoding: steps
Forest: c:5, o:5, space:5, u:3, d:3, w:2, k:2, h:2, f:1, I:1, !:1, (a l):2
Merge f:1 + !:1 → 2
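Each step in this walkthrough joins the two lightest trees in the forest under a new parent whose weight is their sum. A minimal sketch that tracks only the weights, using Python's `heapq` as the priority queue (the function name `merge_weights` is an assumption for illustration):

```python
import heapq

def merge_weights(frequencies):
    """Return the weight of each new internal node, in merge order."""
    heap = list(frequencies)
    heapq.heapify(heap)
    merged = []
    while len(heap) > 1:
        a = heapq.heappop(heap)   # lightest tree
        b = heapq.heappop(heap)   # second-lightest tree
        merged.append(a + b)
        heapq.heappush(heap, a + b)
    return merged

# Frequencies of c, o, space, u, d, w, k, h, a, l, f, I, !
weights = merge_weights([5, 5, 5, 3, 3, 2, 2, 2, 1, 1, 1, 1, 1])
print(weights)       # [2, 2, 3, 4, 4, 6, 7, 9, 10, 13, 19, 32]
print(sum(weights))  # 111
```

The sum of the internal node weights equals the total encoded length, because a character at depth d contributes its frequency to exactly d internal nodes.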
Huffman Encoding: steps
Forest: c:5, o:5, space:5, u:3, d:3, w:2, k:2, h:2, I:1, (a l):2, (f !):2
Merge I:1 + (f !):2 → 3
Huffman Encoding: steps
Forest: c:5, o:5, space:5, u:3, d:3, w:2, k:2, h:2, (a l):2, (I f !):3
Merge w:2 + h:2 → 4
Huffman Encoding: steps
Forest: c:5, o:5, space:5, u:3, d:3, k:2, (a l):2, (I f !):3, (w h):4
Merge k:2 + (a l):2 → 4
Huffman Encoding: steps
Forest: c:5, o:5, space:5, u:3, d:3, (I f !):3, (w h):4, (k a l):4
Merge u:3 + d:3 → 6
Huffman Encoding: steps
Forest: c:5, o:5, space:5, (I f !):3, (w h):4, (k a l):4, (u d):6
Huffman Encoding: steps
Forest: c:5, o:5, space:5, (I f !):3, (w h):4, (k a l):4, (u d):6
Merge (I f !):3 + (k a l):4 → 7
Huffman Encoding: steps
Forest: c:5, o:5, space:5, (w h):4, (u d):6, (I f ! k a l):7
Merge c:5 + (w h):4 → 9
Huffman Encoding: steps
Forest: o:5, space:5, (u d):6, (I f ! k a l):7, (c w h):9
Merge o:5 + space:5 → 10
Huffman Encoding: steps
Forest: (u d):6, (I f ! k a l):7, (c w h):9, (o space):10
Merge (u d):6 + (I f ! k a l):7 → 13
Huffman Encoding: steps
Forest: (c w h):9, (o space):10, (u d I f ! k a l):13
Merge (c w h):9 + (o space):10 → 19
Huffman Encoding: steps
Forest: (u d I f ! k a l):13, (c w h o space):19
Merge 13 + 19 → 32 (the root)
Huffman Encoding: steps
[Figure: the finished tree containing all 13 characters, with root weight 32]
Total: 111 bits
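The whole procedure, from counting frequencies to reading codewords off the finished tree, fits in a short function. The sketch below is one possible implementation under stated assumptions, not the one used in the course: leaves are single characters, internal nodes are `(left, right)` tuples, left edges are labeled 0 and right edges 1, and a counter breaks ties between equal weights. The name `huffman_code` is illustrative.

```python
import heapq
from collections import Counter
from itertools import count

def huffman_code(text):
    """Build a prefix code (character -> bit string) from character frequencies."""
    freq = Counter(text)
    if not freq:
        return {}
    if len(freq) == 1:                       # degenerate case: a single symbol
        return {next(iter(freq)): "0"}
    order = count()                          # unique tie-breaker for heap entries
    heap = [(f, next(order), sym) for sym, f in freq.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)    # two lightest trees
        f2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, next(order), (left, right)))
    code = {}

    def walk(node, prefix):
        if isinstance(node, tuple):          # internal node: (left, right)
            walk(node[0], prefix + "0")
            walk(node[1], prefix + "1")
        else:                                # leaf: a single character
            code[node] = prefix

    walk(heap[0][2], "")
    return code

text = "If a woodchuck could chuck wood!"
code = huffman_code(text)
print(sum(len(code[ch]) for ch in text))     # 111
```

Because several trees share the same weight at various steps, the individual codewords produced here may differ from the table on the earlier slide; every valid tie-breaking choice still yields the same total of 111 bits.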
Summary
Huffman encoding uses character frequency information to compress a file.
The more frequent a character, the shorter its prefix code; infrequent characters get longer codes.
Further Reading
Chapter 12
What’s Next
Graph