Ragesh Jaiswal CSE, IIT Delhi
CSL 356: Analysis and Design of Algorithms
Greedy Algorithms: Huffman Coding
Problem: Given an alphabet ∑ containing n symbols a1,…,an and their frequencies of occurrence (t(a1),t(a2),…,t(an)), find a binary tree T with n leaves (each leaf labeled with a distinct symbol) that minimizes:
OT = d(a1)*t(a1) + d(a2)*t(a2) + … + d(an)*t(an)
• d(ai) above is the depth of the leaf labeled with symbol ai
What are the properties of the optimal tree T?
1. Claim: T is a complete binary tree.
Complete binary tree (as used here): Every non-leaf node has exactly two children. This property is often called a full binary tree.
2. Claim: Consider the two symbols x, y with the least frequency.
Then x and y have maximum depth in any optimal T, and there is an optimal T where x and y are siblings.
Let Ω be a new symbol not present in ∑. Consider the following (smaller) problem:
∑’ = (∑ \ {x,y}) ∪ {Ω}
For all z in ∑’\{Ω}, t’(z) = t(z)
t’(Ω) = t(x) + t(y)
Find the optimal binary tree for the new alphabet ∑’ and the new frequencies given by t’.
Let T’ be the optimal binary tree for the above problem.
Consider the leaf v labeled with Ω in T’. Consider a new tree T where v has two children that are leaves and are labeled with x and y.
Claim: T is an optimal tree for the original problem.
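One way to see why the claim is plausible is to compare the costs of T and T’. The following is a sketch of that calculation. The subscripted names d_T, d_{T’}, O_T and O_{T’} (depths and costs in the two trees) are notation introduced here for illustration.

```latex
\begin{align*}
O_T &= \sum_{z \in \Sigma \setminus \{x,y\}} d_T(z)\,t(z) + d_T(x)\,t(x) + d_T(y)\,t(y) \\
    &= \sum_{z \in \Sigma' \setminus \{\Omega\}} d_{T'}(z)\,t'(z)
       + \bigl(d_{T'}(\Omega) + 1\bigr)\bigl(t(x) + t(y)\bigr) \\
    &= O_{T'} + t(x) + t(y).
\end{align*}
```

The two costs differ by the constant t(x) + t(y). Combined with the earlier claim that some optimal tree has x and y as siblings, this is the key step in the optimality argument.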
Running time?
Huffman(∑)
- Let v1,…,vn be nodes, each node denoting an alphabet
- S = {v1,…,vn}
- While (|S| > 1)
  - Pick two nodes x and y with the least value of t(x) and t(y)
  - Create a new node z and set t(z) = t(x) + t(y)
  - Set x as the left child of z and y as the right child
  - Remove x and y from S and add z to S
- When |S| = 1, return the only node in S as the root node of the Huffman Tree
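The procedure above can be sketched in Python as follows. This is a minimal illustration rather than a reference implementation: the function name `huffman_codes`, the dictionary input, and the choice of a binary heap for S are assumptions made here. With a heap, each of the n - 1 iterations costs O(log n), so the sketch runs in O(n log n) time.

```python
import heapq
from itertools import count


def huffman_codes(freq):
    """Return {symbol: bitstring} for a dict mapping symbols to frequencies.

    Assumes symbols are not tuples, since tuples mark internal nodes here.
    """
    tie = count()  # tie-breaker so the heap never has to compare trees
    # Heap entries are (frequency, tie_breaker, tree); a tree is either a
    # symbol (leaf) or a (left, right) pair (internal node).
    heap = [(f, next(tie), sym) for sym, f in freq.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        fx, _, x = heapq.heappop(heap)  # least frequency
        fy, _, y = heapq.heappop(heap)  # second least frequency
        heapq.heappush(heap, (fx + fy, next(tie), (x, y)))
    _, _, root = heap[0]

    codes = {}

    def walk(node, prefix):
        if isinstance(node, tuple):
            walk(node[0], prefix + "0")  # left child
            walk(node[1], prefix + "1")  # right child
        else:
            # A one-symbol alphabet still needs one bit per symbol.
            codes[node] = prefix or "0"

    walk(root, "")
    return codes


# Illustrative input: four symbols with frequencies given in percent.
example_codes = huffman_codes({"A": 30, "C": 20, "T": 10, "G": 40})
```

The exact bit patterns depend on how ties and left/right children are chosen, but the code lengths, and hence the cost OT, do not.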
A DNA sequence has four characters A,C,T,G and these
characters appear with frequency 30%, 20%, 10%, and 40%
respectively.
We have to encode a sequence of length 1 million (10^6) in bits.
If we use two bits for each character, then the size of the encoding will be 2 million bits.
Huffman coding:
f(A) = 10, f(C) = 110, f(T) = 111, f(G) = 0
We will need 1.9 million bits.
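The bit count can be checked with a few lines of Python. This sketch uses the code f and the frequencies from the example; the helper names `encode` and `expected_bits`, and the sample string "GATTACA", are assumptions introduced here.

```python
# Frequencies in percent and the prefix code from the example above.
freq_percent = {"A": 30, "C": 20, "T": 10, "G": 40}
code = {"A": "10", "C": "110", "T": "111", "G": "0"}


def encode(sequence):
    """Concatenate the codewords of the characters in the sequence."""
    return "".join(code[ch] for ch in sequence)


def expected_bits(length):
    """Total bits for a sequence of the given length with these frequencies."""
    # Integer arithmetic avoids floating-point rounding in the percentages.
    weighted = sum(freq_percent[s] * len(code[s]) for s in code)
    return length * weighted // 100


total = expected_bits(1_000_000)   # 1,900,000 bits
sample = encode("GATTACA")         # 16 bits instead of 14 with 2 bits each
```

The short sample is longer than its fixed-length encoding because its character mix does not match the stated frequencies; the saving only appears on average.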
Greedy Algorithms
For some problems, the greedy strategy does not give an optimal solution, but it may still give a solution that is provably close to the optimal solution.
Greedy Approximation: Examples
Let S be a set containing n elements. A set of subsets
{S1,…,Sm} of S is called a covering set if each element in S is present in at least one of the subsets S1,…,Sm.
Problem (Set Cover): Given a set S containing n elements and m subsets S1,…,Sm of S, find a covering set of S of minimum cardinality.
Example:
S = {a, b, c, d, e, f}
S1={a, b}, S2={a,c}, S3={a,c}, S4={d,e,f}, S5={e, f}
{S1,S2,S3,S4} is a covering set.
{S1,S2,S4} is a covering set of minimum size.
Application: There are n villages, and the government is trying to decide in which villages to open schools so that the number of schools is minimized. The constraint is that no child should have to walk more than 3 miles to get to a school.
Greedy strategy: Give preference to the subset that covers the largest number of still-uncovered elements.
GreedySetCover(S, S1, …, Sm)
- T = {}; R = S
- While R is not empty
  - Pick a subset Si that covers the maximum number of elements in R
  - T = T U {Si}; R = R - Si
- Return T
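Counterexample (greedy is not always optimal):
S={a,b,c,d,e,f,g,h}, S1={a, b, c, d, e}, S2={a, b, c, f}, S3={d, e, g, h}
Greedy picks S1 first (5 elements), then S3, then S2, using 3 subsets, whereas {S2, S3} already covers S.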
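The greedy strategy can be sketched in Python and run on the counterexample. The function name `greedy_set_cover` and the dictionary representation of the subsets are assumptions made for this illustration.

```python
def greedy_set_cover(universe, subsets):
    """Return the names of the chosen subsets, in the order greedy picks them.

    `subsets` maps a name to a set of elements.
    """
    remaining = set(universe)  # R in the pseudocode
    chosen = []                # T in the pseudocode
    while remaining:
        # Pick the subset that covers the most still-uncovered elements.
        best = max(subsets, key=lambda name: len(subsets[name] & remaining))
        if not subsets[best] & remaining:
            raise ValueError("the subsets do not cover the universe")
        chosen.append(best)
        remaining -= subsets[best]
    return chosen


universe = set("abcdefgh")
subsets = {"S1": set("abcde"), "S2": set("abcf"), "S3": set("degh")}
picked = greedy_set_cover(universe, subsets)  # ['S1', 'S3', 'S2']
```

Greedy uses three subsets here, while S2 and S3 together already cover all eight elements.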
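Claim: Let k be the optimal cardinality of the covering set. Then the greedy algorithm outputs a covering set with cardinality at most k*ln(n).
Proof: Let N_t be the number of uncovered elements after t iterations of the loop (so N_0 = n).
Claim: N_t ≤ (1 – 1/k)*N_{t-1}
The N_{t-1} uncovered elements are covered by the k subsets of an optimal covering set, so some subset covers at least N_{t-1}/k of them, and the greedy choice covers at least as many.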
Claim: N_{k*ln(n)} < 1
This uses the fact that (1 - x) ≤ e^{-x}, with equality only for x = 0.
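The two claims combine as follows. This is a sketch of the standard calculation, assuming N_0 = n (all elements are uncovered before the first iteration).

```latex
\begin{align*}
N_t &\le \left(1 - \tfrac{1}{k}\right)^{t} N_0
      = \left(1 - \tfrac{1}{k}\right)^{t} n
      < e^{-t/k}\, n && \text{for } t \ge 1, \\
N_{k \ln n} &< e^{-\ln n}\, n = 1 .
\end{align*}
```

Since the number of uncovered elements is a non-negative integer, fewer than 1 means 0, so the loop stops within k*ln(n) iterations (rounded up when k*ln(n) is not an integer).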
Problem (Minimum Makespan): You have m identical machines and n jobs. For each job i, you are given its duration d(i), the time required by any machine to perform this job. Assign these n jobs to the m machines such that the maximum finishing time is minimized.
Example: jobs with durations 10, 40, 5, 30, 60, 35.
Greedy Strategy: Assign the next job to a machine with least load
[Figure: the greedy strategy applied step by step to jobs with durations 10, 40, 5, 30, 60, 35.]
• Is this the optimal solution?
GreedyMakespan
- While some job is still unassigned
  - Assign the next job to a machine with least load
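GreedyMakespan can be sketched in Python with a heap that keeps the least-loaded machine on top. The function name `greedy_makespan` is introduced here, and the number of machines m = 3 is an assumption, since the example does not state it; the job durations are the ones from the example.

```python
import heapq


def greedy_makespan(durations, m):
    """Assign jobs in the given order; return (makespan, jobs per machine)."""
    # Heap of (current load, machine index); the least-loaded machine is on top.
    heap = [(0, j) for j in range(m)]
    assignment = [[] for _ in range(m)]
    for d in durations:
        load, j = heapq.heappop(heap)        # a machine with least load
        assignment[j].append(d)
        heapq.heappush(heap, (load + d, j))
    makespan = max(load for load, _ in heap)
    return makespan, assignment


jobs = [10, 40, 5, 30, 60, 35]
makespan, assignment = greedy_makespan(jobs, m=3)  # m = 3 is an assumption
# makespan == 70, assignment == [[10, 60], [40], [5, 30, 35]]
```

With these assumed inputs the greedy schedule finishes at time 70, while the schedule {60}, {35, 30}, {40, 10, 5} finishes at time 65. Greedy is therefore not optimal here, but 70 is well within twice the optimal value.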
Let OPT be the optimal value.
Let G denote the maximum finishing time of a machine as per the greedy assignment.
Claim: G ≤ 2 * OPT
Proof:
Claim 1: OPT ≥ (d(1) + … + d(n))/m
Claim 2: For any job t, OPT ≥ d(t)
Let the jth machine finish last. Let i be the last job assigned to machine j. Let s be the start time of job i on machine j.
Claim 3: s ≤ (d(1) + … + d(n))/m
(When job i was assigned, machine j had the least load, so every machine had load at least s, and hence m*s ≤ d(1) + … + d(n).)
So, G = s + d(i)
This implies G ≤ (d(1) + … + d(n))/m + d(i) (from Claim 3)
This implies G ≤ OPT + d(i) (from Claim 1)
This implies G ≤ OPT + OPT = 2 * OPT (from Claim 2)
End
Problems to think about:
1. Consider the following algorithm for the minimum makespan problem:
• Sort the jobs in decreasing order of duration. Let L be the sorted list of jobs.
• While some job is still unassigned
  • Assign the next job in L to a machine with least load
Let G be the maximum finishing time as per the greedy algorithm above and let OPT be the maximum finishing time as per the optimal schedule. Show that G ≤ (4/3)*OPT.
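To experiment with this problem, the sorted variant can be compared against a brute-force optimum on small inputs. The sketch below is only an experiment, not a proof; the function names, the job list and m = 3 are assumptions chosen for illustration.

```python
import heapq
from itertools import product


def sorted_greedy_makespan(durations, m):
    """Greedy assignment after sorting jobs in decreasing order of duration."""
    loads = [0] * m  # a list of equal values is already a valid heap
    for d in sorted(durations, reverse=True):
        least = heapq.heappop(loads)          # a machine with least load
        heapq.heappush(loads, least + d)
    return max(loads)


def optimal_makespan(durations, m):
    """Try every assignment of jobs to machines (feasible only for tiny n)."""
    best = None
    for choice in product(range(m), repeat=len(durations)):
        loads = [0] * m
        for d, j in zip(durations, choice):
            loads[j] += d
        worst = max(loads)
        best = worst if best is None else min(best, worst)
    return best


jobs = [10, 40, 5, 30, 60, 35]   # illustrative durations
g = sorted_greedy_makespan(jobs, m=3)
opt = optimal_makespan(jobs, m=3)
within_bound = 3 * g <= 4 * opt  # G <= (4/3) * OPT, in integer arithmetic
```

On this instance the sorted greedy schedule matches the optimum, so the bound holds with room to spare. Trying other small job lists is a quick way to look for instances that come close to the 4/3 ratio.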