Greedy Technique
Greedy Technique - Definition
The greedy method is a general algorithm design paradigm, built on the following elements:
configurations: different choices, collections, or values to find
objective function: a score assigned to configurations, which we want to either maximize or minimize
It works best when applied to problems with the
Greedy-choice property
Optimal substructure
Greedy Choice Property
Make whatever choice seems best at the moment and then solve the sub-problems arising after the choice is made.
The choice made by a greedy algorithm may depend on the choices made so far, but it cannot depend on any future choices; the algorithm progresses one greedy choice after another, iteratively reducing each given problem to a smaller one.
A greedy algorithm makes its decisions early and never reconsiders them. It may not produce an optimal solution for some problems.
Optimal Substructure
A problem exhibits optimal substructure if an optimal solution to the problem contains within it optimal solutions to subproblems.
This property is used to determine the usefulness of dynamic programming and greedy algorithms in a problem.
Greedy Technique - Making Change
Problem: A dollar amount to reach and a collection of coin amounts to use to get there.
Objective function: Minimize number of coins returned.
Greedy solution: Always return the largest coin you can.
Example 1: Coins are valued $.32, $.08, $.01
Has the greedy-choice property, since no amount over $.32 can be made with a minimum number of coins by omitting a $.32 coin (similarly for amounts over $.08, but under $.32).
Example 2: Coins are valued $.30, $.20, $.05, $.01
Does not have the greedy-choice property, since $.40 is best made with two $.20’s, but the greedy solution picks three coins ($.30 + $.05 + $.05).
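A minimal Python sketch of the greedy rule (make_change is my name, not from the slides; amounts are in cents to avoid floating point). Running it on both coin systems shows the difference:

def make_change(amount, coins):
    # greedy rule: always take the largest coin that still fits
    used = []
    for coin in sorted(coins, reverse=True):
        while amount >= coin:
            amount -= coin
            used.append(coin)
    return used

print(make_change(40, [32, 8, 1]))      # [32, 8]: optimal, 2 coins
print(make_change(40, [30, 20, 5, 1]))  # [30, 5, 5]: 3 coins, but [20, 20] needs only 2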
Task Scheduling or Activity Selection
Given: a set T of n tasks, each having:
A start time, si
A finish time, fi (where si < fi)
Goal: Perform all the tasks using a minimum number of “machines.”
[Figure: tasks scheduled on Machines 1–3 along a timeline from 1 to 9.]
Task Scheduling or Activity Selection
Brute force:
Try all subsets of activities.
Choose the largest subset which is feasible.
The running time for listing all subsets of n activities would be Θ(2^n)
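For comparison, a brute-force sketch in Python (names and helper are mine): it enumerates all 2^n subsets and keeps the largest pairwise-compatible one, using the six tasks from the table that follows.

from itertools import combinations

# the six tasks (si, fi) from the table below
tasks = [(1, 4), (3, 5), (3, 8), (5, 9), (0, 6), (5, 7)]

def feasible(subset):
    # compatible if, sorted by start time, each task starts no earlier than the previous finish
    subset = sorted(subset)
    return all(subset[i][1] <= subset[i + 1][0] for i in range(len(subset) - 1))

best = max((c for r in range(len(tasks) + 1)
            for c in combinations(tasks, r) if feasible(c)), key=len)
print(best)   # a largest mutually compatible subset, e.g. ((1, 4), (5, 9))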
Task Scheduling or Activity Selection
i :   1   2   3   4   5   6
si:   1   3   3   5   0   5
fi:   4   5   8   9   6   7

[Figure: the six tasks drawn as intervals on a timeline from 0 to 10.]
Task Scheduling or Activity Selection
Sorted by finish times:

i :   1   2   5   6   3   4
si:   1   3   0   5   3   5
fi:   4   5   6   7   8   9

[Figure: the same tasks, redrawn in increasing order of finish time.]
Task Scheduling or Activity Selection
Algorithm Greedy_Activity(s[1…n], f[1…n])
  // assumes activities are sorted by finish time: f[1] ≤ f[2] ≤ … ≤ f[n]
  A ← {1}
  j ← 1
  for i ← 2 to n do
    if s[i] ≥ f[j]
      A ← A + {i}
      j ← i
  return A
Excluding the sorting time, this algorithm’s running time is T(n) = Θ(n).
Including the sorting time, the total running time is T(n) = Θ(n log n), since the sort dominates the linear scan.
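The same greedy scan as runnable Python (a sketch; names are mine). This version sorts by finish time itself, so its total running time is Θ(n log n):

def greedy_activity(s, f):
    # sort activity indices by finish time, then scan once
    order = sorted(range(len(s)), key=lambda i: f[i])
    A = [order[0]]              # always take the first-finishing activity
    j = order[0]                # last selected activity
    for i in order[1:]:
        if s[i] >= f[j]:        # compatible with the last selected activity
            A.append(i)
            j = i
    return A

# the six tasks from the table above (0-based indices, printed as 1-based ids)
s = [1, 3, 3, 5, 0, 5]
f = [4, 5, 8, 9, 6, 7]
print([i + 1 for i in greedy_activity(s, f)])   # [1, 6]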
Proving Optimality
Proof of the greedy-choice property
Purpose: show that activity 1 (the greedy choice) is in some optimal solution.
Let S = {1, 2, …, n} be the set of activities, ordered by finish time; this means activity 1 has the earliest finish time.
Suppose A ⊆ S is an optimal solution, and let the activities in A be ordered by finish time. Suppose the first activity in A is k.
If k = 1, then A begins with the greedy choice and we are done (to be precise, there is nothing to prove here).
If k ≠ 1, there is another optimal solution B that begins with the greedy choice, activity 1:
let B = (A − {k}) ∪ {1}. Because f1 ≤ fk, the activities in B are disjoint, and since B has the same number of activities as A, i.e., |A| = |B|, B is also optimal.
Proving Optimality
Proof of optimal substructure
Purpose: show that an optimal solution to the problem contains within it optimal solutions to subproblems.
Once the greedy choice is made, the problem reduces to finding an optimal solution for the remaining subproblem. If A is an optimal solution to the original problem S, then A′ = A − {1} is an optimal solution to the activity-selection problem S′ = {i ∈ S : si ≥ f1}.
Why? Because if we could find a solution B′ to S′ with more activities than A′, adding activity 1 to B′ would yield a solution B to S with more activities than A, thereby contradicting the optimality of A.
The Fractional Knapsack Problem
Given: A set S of n items, with each item i having
bi - a positive benefit
wi - a positive weight
Goal: Choose items with maximum total benefit but with weight at most W.
If we are allowed to take fractional amounts, then this is the fractional knapsack problem.
In this case, we let xi denote the amount we take of item i
Objective: maximize ∑i∈S bi (xi / wi)
Constraint: ∑i∈S xi ≤ W
Example
Given: A set S of n items, with each item i having
bi - a positive benefit
wi - a positive weight
Goal: Choose items with maximum total benefit but with weight at most W.
Items:    1      2      3      4      5
Weight:   4 ml   8 ml   2 ml   6 ml   1 ml
Benefit:  $12    $32    $40    $30    $50

“Knapsack” capacity: 10 ml

Solution:
• 1 ml of item 5
• 2 ml of item 3
• 6 ml of item 4
• 1 ml of item 2
The Fractional Knapsack Algorithm
Algorithm fractionalKnapsack(S, W)
  Input: set S of items, each with benefit bi and weight wi; maximum total weight W
  Output: amount xi of each item i, maximizing total benefit with total weight at most W
  for each item i in S
    xi ← 0
    vi ← bi / wi   // value index
  w ← 0   // total weight
  while w < W
    remove from S the item i with the highest vi
    xi ← min{wi, W − w}
    w ← w + min{wi, W − w}
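A Python sketch of the same algorithm (fractional_knapsack is my name), run on the 5-item example above:

def fractional_knapsack(benefits, weights, W):
    # visit items in decreasing order of value index v_i = b_i / w_i
    items = sorted(range(len(weights)),
                   key=lambda i: benefits[i] / weights[i], reverse=True)
    x = [0.0] * len(weights)    # amount taken of each item
    w = 0.0                     # total weight taken so far
    for i in items:
        if w >= W:
            break
        x[i] = min(weights[i], W - w)   # take as much of item i as fits
        w += x[i]
    return x

benefits = [12, 32, 40, 30, 50]   # items 1..5 from the example
weights  = [4, 8, 2, 6, 1]        # in ml
x = fractional_knapsack(benefits, weights, 10)
print(x)   # 1 ml of item 5, 2 ml of item 3, 6 ml of item 4, 1 ml of item 2
print(sum(b * xi / wi for b, wi, xi in zip(benefits, weights, x)))   # 124.0 dollars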
Proving Optimality
Assume the items are indexed so that v1/w1 ≥ v2/w2 ≥ … ≥ vn/wn.
Let x be the greedy solution vector. It has the form x = (1, …, 1, xk, 0, …, 0): items 1 through k−1 are taken whole (xi = 1), item k may be fractional (xk < 1), and items k+1 through n are not taken (xi = 0). Here xi = 1 means “select the whole item,” xi = 0 means “do not select it at all,” and xi denotes the fraction of item i taken.
Let y be any feasible solution vector. We want to show that

∑i=1..n (xi − yi) vi ≥ 0

Split the sum at item k:

∑i=1..n (xi − yi) vi = ∑i=1..k−1 (xi − yi)(vi/wi)wi + (xk − yk)(vk/wk)wk + ∑i=k+1..n (xi − yi)(vi/wi)wi

For i < k: xi = 1 ≥ yi, so xi − yi ≥ 0, and vi/wi ≥ vk/wk; hence (xi − yi)(vi/wi)wi ≥ (xi − yi)(vk/wk)wi.
For i > k: xi = 0 ≤ yi, so xi − yi ≤ 0, and vi/wi ≤ vk/wk; hence (xi − yi)(vi/wi)wi ≥ (xi − yi)(vk/wk)wi.

Therefore

∑i=1..n (xi − yi) vi ≥ (vk/wk) ∑i=1..n (xi − yi) wi ≥ 0,

since the greedy solution fills the knapsack (∑i xi wi = W) while any feasible y satisfies ∑i yi wi ≤ W. Hence no feasible solution y has greater benefit than the greedy solution x.
Huffman Code
Huffman code is a technique for compressing data.
Huffman's greedy algorithm looks at the frequency of each character and represents each character as a binary string in an optimal way.
Suppose we have a file of 100,000 characters that we want to compress. The characters occur with the following frequencies.
Consider the problem of designing a "binary character code" in which each character is represented by a unique binary string (its codeword).
Character:   a        b        c        d        e       f
Frequency:   45,000   13,000   12,000   16,000   9,000   5,000
Fixed-Length Code
With a fixed-length code, we need only 3 bits to represent the six characters.
The total number of characters is 45,000 + 13,000 + 12,000 + 16,000 + 9,000 + 5,000 = 100,000.
Since each character is assigned a 3-bit codeword, encoding the file takes 3 × 100,000 = 300,000 bits.
Character:   a        b        c        d        e       f
Frequency:   45,000   13,000   12,000   16,000   9,000   5,000
Codeword:    000      001      010      011      100     101
Variable-Length Code
A variable-length code gives frequent characters short codewords and infrequent characters long codewords (a codeword is a sequence of bits), using a prefix code.
In a prefix code, no codeword is a prefix of any other codeword. Prefix codes are desirable because they simplify both encoding (compression) and decoding.
The variable-length code below requires only 224,000 bits.
Character:   a        b        c        d        e       f
Frequency:   45,000   13,000   12,000   16,000   9,000   5,000
Codeword:    0        100      101      111      1101    1100
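Because no codeword is a prefix of another, a single left-to-right scan decodes unambiguously. A small Python sketch using the codewords from the table above (decode is my name):

# the prefix code from the table above
code = {'a': '0', 'b': '100', 'c': '101', 'd': '111', 'e': '1101', 'f': '1100'}
decode_table = {v: k for k, v in code.items()}

def decode(bits):
    out, buf = [], ''
    for bit in bits:
        buf += bit
        if buf in decode_table:   # prefix property: the first match is the only match
            out.append(decode_table[buf])
            buf = ''
    return ''.join(out)

encoded = ''.join(code[ch] for ch in 'face')
print(encoded)           # 110001011101
print(decode(encoded))   # face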
Huffman Code – Binary Tree
[Figure: the binary tree for the variable-length code above. Each left edge is labeled 0 and each right edge 1; the characters a–f sit at the leaves, so each codeword is read off the edge labels along the root-to-leaf path.]
Constructing Huffman Code
[Figure: Huffman tree construction. Repeatedly merge the two nodes of least frequency: f(5) + e(9) = 14; c(12) + b(13) = 25; 14 + d(16) = 30; 25 + 30 = 55; 55 + a(45) = 100.]
Huffman Code – Binary Tree
Given a tree T corresponding to the prefix code, we can compute the number of bits required to encode a file.
Let f(c) be the frequency of character c and let dT(c) denote the depth of c's leaf in T. Note that dT(c) is also the length of c's codeword. The number of bits to encode the file is
B(T) = ∑ f(c) dT(c)
     = 45×1 + 13×3 + 12×3 + 16×3 + 9×4 + 5×4 = 224 (frequencies in thousands)
     = 224 × 1,000 = 224,000 bits
Huffman Code – Algorithm

Algorithm Huffman(C, n)
  Q ← BuildHeap(C)
  for i ← 1 to n − 1 do
    z ← Allocate-Node()
    z.left ← Extract_Min(Q)
    z.right ← Extract_Min(Q)
    z.freq ← z.left.freq + z.right.freq
    Insert(Q, z)
  return Extract_Min(Q)

BuildHeap runs in O(n) time. The for loop executes n − 1 times, and each heap operation requires O(log n) time, so the loop contributes O(n log n). Huffman's algorithm therefore runs in O(n log n) time.
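The same procedure as runnable Python, using heapq as the min-priority queue (a sketch; the tie-breaking counter and the codewords helper are my own additions):

import heapq

def huffman(freq):
    # freq: dict mapping character -> frequency
    # heap entries are (frequency, tiebreaker, tree); a tree is a char or a (left, right) pair
    heap = [(f, i, ch) for i, (ch, f) in enumerate(sorted(freq.items()))]
    heapq.heapify(heap)                       # BuildHeap: O(n)
    count = len(heap)
    for _ in range(len(freq) - 1):            # n - 1 merges
        f1, _, left = heapq.heappop(heap)     # Extract_Min: O(log n)
        f2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, count, (left, right)))
        count += 1
    return heap[0][2]                         # root of the Huffman tree

def codewords(tree, prefix=''):
    if isinstance(tree, str):                 # leaf: emit the accumulated bits
        return {tree: prefix or '0'}
    left, right = tree
    return {**codewords(left, prefix + '0'), **codewords(right, prefix + '1')}

freq = {'a': 45, 'b': 13, 'c': 12, 'd': 16, 'e': 9, 'f': 5}
codes = codewords(huffman(freq))
print(codes)   # an optimal prefix code; ties may be broken differently than the table above
print(sum(freq[ch] * len(codes[ch]) for ch in freq))   # B(T) = 224 (x 1,000 bits)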
Optimal Substructure
Let T be a tree for the code in which x and y are sibling leaves, and let T* be the tree obtained by replacing x, y, and their parent with a single leaf c of frequency f(c) = f(x) + f(y).

[Figure: tree T with sibling leaves x and y; tree T* with the merged leaf c in their place.]

Since dT(x) = dT(y) = dT*(c) + 1,
f(x)dT(x) + f(y)dT(y) = (f(x) + f(y))(dT*(c) + 1) = f(c)dT*(c) + f(x) + f(y)
and therefore
B(T) = B(T*) + f(x)dT(x) + f(y)dT(y) − f(c)dT*(c)
     = B(T*) + f(x) + f(y)
Greedy Choice Property
If x and y are the two characters with the least frequencies, then there exists an optimal tree (representing a prefix code) in which x and y are sibling leaves at the deepest level.
Let T be an optimal tree, and let b and c be sibling leaves at maximum depth in T, with f(x) ≤ f(b) and f(y) ≤ f(c). Swap x with b to obtain T*, then swap y with c to obtain T**.

[Figure: tree T with b and c at maximum depth and x, y higher up; T* after swapping x and b; T** after also swapping y and c.]

Swapping x and b does not increase the cost:
∑ f(c)dT(c) − ∑ f(c)dT*(c)   // before swap − after swap
= f(x)dT(x) + f(b)dT(b) − f(x)dT*(x) − f(b)dT*(b)
= f(x)dT(x) + f(b)dT(b) − f(x)dT(b) − f(b)dT(x)
= (f(b) − f(x))(dT(b) − dT(x)) ≥ 0,
since f(x) ≤ f(b) and dT(x) ≤ dT(b). The same argument applies to swapping y and c, so T** is optimal as well.
Single Source Shortest Path (Dijkstra’s Algorithm)
Given a vertex called the source in a weighted connected graph, find shortest paths to all its other vertices. (This is not the TSP.)
The distance of a vertex v from a vertex s is the length of a shortest path between s and v
Dijkstra’s algorithm computes the distances of all the vertices from a given start vertex s
Assumptions:
the graph is connected
the edges are undirected
the edge weights are nonnegative
Single Source Shortest Path (Dijkstra’s Algorithm)
Grow a “cloud” of vertices, beginning with s and eventually covering all the vertices
Store with each vertex v a label d(v) representing the distance of v from s in the sub-graph consisting of the cloud and its adjacent vertices
At each step
Add to the cloud the vertex u outside the cloud with the smallest distance label, d(u)
Update the labels of the vertices adjacent to u
Edge Relaxation
Consider an edge e = (u,z) such that
u is the vertex most recently added to the cloud
z is not in the cloud
The relaxation of edge e updates distance d(z) as follows:
d(z) ← min{d(z),d(u) + weight(e)}
[Figure: relaxation of edge e = (u,z) with weight(e) = 10. Before: d(u) = 50, d(z) = 75. After: d(z) = min{75, 50 + 10} = 60.]
Example
[Figure: six snapshots of Dijkstra's algorithm on a sample graph with vertices A–F. The source starts with label 0 and all other labels at ∞; at each step the vertex with the smallest label joins the cloud, and the labels of its neighbors are relaxed.]
Pseudocode

Algorithm Dijkstra(G, w, s)
  for each vertex v in V[G]   // initialization
    d[v] ← infinity
    previous[v] ← undefined
  d[s] ← 0
  S ← empty set
  Q ← V[G]   // build Q holding all vertices
  while Q is not an empty set
    u ← Extract_Min(Q)
    S ← S + {u}
    for each edge (u,v) outgoing from u
      if d[u] + w(u,v) < d[v]   // relax edge (u,v)
        d[v] ← d[u] + w(u,v)
        previous[v] ← u
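The same algorithm as runnable Python (a sketch, not the slides' code): heapq serves as the priority queue, stale entries are skipped instead of maintaining an explicit Q of all vertices, and the sample graph is made up for illustration.

import heapq

def dijkstra(graph, s):
    # graph: dict mapping vertex -> list of (neighbor, weight) pairs
    d = {v: float('inf') for v in graph}
    previous = {v: None for v in graph}
    d[s] = 0
    pq = [(0, s)]                    # (distance label, vertex)
    cloud = set()                    # finalized vertices (the set S)
    while pq:
        du, u = heapq.heappop(pq)    # Extract_Min
        if u in cloud:
            continue                 # stale entry: u was already finalized
        cloud.add(u)
        for v, weight in graph[u]:
            if du + weight < d[v]:   # relax edge (u, v)
                d[v] = du + weight
                previous[v] = u
                heapq.heappush(pq, (d[v], v))
    return d, previous

graph = {'A': [('B', 8), ('C', 2), ('D', 4)],
         'B': [('A', 8), ('C', 7)],
         'C': [('A', 2), ('B', 7), ('D', 1)],
         'D': [('A', 4), ('C', 1)]}
print(dijkstra(graph, 'A')[0])   # {'A': 0, 'B': 8, 'C': 2, 'D': 3}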
Analysis
The time efficiency of Dijkstra's algorithm depends on the data structures used to implement the priority queue and to represent the input graph.
If we store the graph as an ordinary linked list or array and represent Q the same way, Extract_Min(Q) is a linear search through all vertices in Q, and the running time is O(V^2).
If we store the graph as adjacency lists and use a binary heap as the priority queue (to implement Extract_Min), the algorithm requires O((E + V) log V) time; recall that ∑ deg(v) = 2E.
Since the graph is connected, E ≥ V − 1, so the running time can also be expressed as O(E log V).
Why Dijkstra’s Algorithm Works
Dijkstra’s algorithm is based on the greedy method. It adds vertices by increasing distance.

[Figure: snapshot of the example graph with the cloud grown around the source; D is inside the cloud and F just outside it.]
Suppose it didn’t find all shortest distances. Let F be the first wrong vertex the algorithm processed.
When the previous node, D, on the true shortest path was considered, its distance was correct.
But the edge (D,F) was relaxed at that time! Thus, as long as d(F) ≥ d(D), F’s distance cannot be wrong. That is, there is no wrong vertex.
Why It Doesn’t Work for Negative-Weight Edges
If a node with a negative incident edge were to be added late to the cloud, it could mess up distances for vertices already in the cloud.

[Figure: counterexample graph containing an edge of weight −8 into C.]

C’s true distance is 1, but it is already in the cloud with a larger label, and the greedy method never reconsiders it.
Acknowledgement
http://www.personal.kent.edu/~rmuhamma/Algorithms/algorithm.html
http://ww3.algorithmdesign.net/
http://en.wikipedia.org/wiki/