
DIVIDE AND CONQUER ALGORITHMS


DivideAndConquer( data, N, solution )
   data      the input values
   N         the number of input values
   solution  the solution to this problem

   if (N ≤ SizeLimit) then
      DirectSolution( data, N, solution )
   else
      DivideInput( data, N, smallerSets, smallerSizes, numberSmaller )
      for i = 1 to numberSmaller do
         DivideAndConquer( smallerSets[i], smallerSizes[i], smallSolution[i] )
      end for
      CombineSolutions( smallSolution, numberSmaller, solution )
   end if

This algorithm will first check to see if the problem size is small enough to determine a solution by some simple nonrecursive algorithm (called DirectSolution above) and, if so, will do that. If the problem is too large, it will first call the routine DivideInput, which will partition the input in some fashion into a number (numberSmaller) of smaller sets of input values.

These smaller sets may be all of the same size or they may have radically different sizes. The elements in the original input set will all be put into at least one of the smaller sets, but values can be put in more than one. Each of these smaller sets will have fewer elements than the original input set. The DivideAndConquer algorithm is then called recursively for each of these smaller input sets, and the results from those calls are put together by the CombineSolutions function.
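The generic template can be made concrete with a small example of my own choosing (a sketch in Python, not an algorithm from the text): finding the largest value in a list, where the direct solution handles a single element, the input division splits the list in half, and the combination step compares the two partial answers.

```python
def divide_and_conquer_max(data):
    """Largest value in a nonempty list, phrased as divide and conquer."""
    n = len(data)
    if n <= 1:                     # SizeLimit = 1, so the
        return data[0]             # direct solution is trivial
    mid = n // 2                   # DivideInput: two halves
    left = divide_and_conquer_max(data[:mid])    # recursive calls on the
    right = divide_and_conquer_max(data[mid:])   # two smaller input sets
    return left if left >= right else right      # CombineSolutions
```

Here numberSmaller is 2 and the two smaller sets partition the input, though the template also allows overlapping pieces.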

The factorial of a number can easily be calculated by a loop, but for the purpose of this example, we consider a recursive version. You can see that the factorial of the number N is just the number N times the factorial of the number N – 1. This leads to the following algorithm:

Factorial( N )
   N is the number we want the factorial for
   Factorial returns an integer

   if (N = 1) then
      return 1
   else
      smaller = N - 1
      answer = Factorial( smaller )
      return (N * answer)
   end if
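The pseudocode translates directly to Python (a minimal sketch; the function name is mine, and the comments map each line to the steps of the generic algorithm):

```python
def factorial(n):
    """Recursive factorial mirroring the pseudocode above."""
    if n == 1:
        return 1                 # direct solution: no arithmetic needed
    smaller = n - 1              # "divide the input": one subtraction
    answer = factorial(smaller)  # one recursive call on a smaller problem
    return n * answer            # "combine solutions": one multiplication
```

For example, `factorial(5)` returns 120.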

This algorithm is written in simple detailed steps so that we can match things up with the standard algorithm above. Even though earlier in this chapter we discussed how multiplications are more complex than additions and are, therefore, counted separately, to simplify this example we are going to count them together.

In matching up the two algorithms, we see that the size limit in this case is 1, and our direct solution is to return the answer of 1, which takes no mathematical operations. In all other cases, we use the else clause. The first step in the general algorithm is to “divide the input” into smaller sizes, and in the factorial function that is the calculation of smaller, which takes one subtraction.

The next step in the general algorithm is to make the recursive calls with these smaller problems, and in the factorial function there is one recursive call with a problem size that is 1 smaller than the original. The last step in the general algorithm is to combine the solutions, and in the factorial function that is the multiplication in the last return statement.

Recursive Algorithm Efficiency

How efficient is a recursive algorithm? Would it make it any easier if you knew that the direct solution is quadratic, the division of the input is logarithmic, and the combination of the solutions is linear,1 all with respect to the size of the input, and that the input set is broken up into eight pieces all one-quarter of the original? This is probably not a problem for which you can quickly find an answer or for that matter are even sure where to start. It turns out, however, that the process of analyzing any divide and conquer algorithm is very straightforward if you can map the steps of your algorithm into the four steps shown in the generic algorithm above: a direct solution, division of the input, number of recursive calls, and combination of the solutions. Once you know how each piece relates to the others, and you know how complex each piece is, you can use the following formula to determine the complexity of the divide and conquer algorithm:

DAC(N) = DIR(N)                                                        for N ≤ SizeLimit
DAC(N) = DIV(N) + Σ(i = 1 to numberSmaller) DAC(smallerSizes[i]) + COM(N)   for N > SizeLimit

where DAC is the complexity of DivideAndConquer
      DIR is the complexity of DirectSolution
      DIV is the complexity of DivideInput
      COM is the complexity of CombineSolutions

1 To say that an algorithm is linear is the same as saying its complexity is in O(N). If it’s quadratic, it is in O(N²), and logarithmic is in O(lg N).
Now that we have this generic formula, the answer to the question posed at the start of the last paragraph is quite easy. All we need to do is plug the complexities given for each piece into the general equation above. This gives the following result:

DAC(N) = N²                                    for N ≤ SizeLimit
DAC(N) = lg N + Σ(i = 1 to 8) DAC(N/4) + N     for N > SizeLimit

or a bit more simply, because all smaller sets are the same size,

DAC(N) = N²                          for N ≤ SizeLimit
DAC(N) = lg N + 8 * DAC(N/4) + N     for N > SizeLimit

This form of equation is called a recurrence relation because the value of the function is based on itself. We prefer to have our equations in a form that is dependent only on N and not other function calls. The process that is used to remove the recursion in this equation will be covered in Section 1.6, which covers recurrence relations.

Let’s return to our factorial example. We identified all of the elements in the factorial algorithm relative to the generic DivideAndConquer. We now use that identification to decide what values get put into the general equation above. For the Factorial function, we said that the direct solution does no calculations, the input division and result combination steps do one calculation each, and the recursive call works with a problem size that is one smaller than the original. This results in the following recurrence relation for the number of calculations in the Factorial function:

Calc(N) = 0                        for N = 1
Calc(N) = 1 + Calc(N – 1) + 1      for N > 1

1.5.1 Tournament Method

The tournament method is based on recursion and can be used to solve a number of different problems where information from a first pass through the data can help to make later passes more efficient. If we use it to find the largest value, this method involves building a binary tree with all of the elements in the leaves. At each level, two elements are paired and the larger of the two gets copied into the parent node. This process continues until the root node is reached. Figure 1.3 shows a complete tournament tree for a given set of data.

In Exercise 5 of Section 1.1.3, it was mentioned that we would develop an algorithm to find the second largest element in a list using about N comparisons. The tournament method helps us do this. Every comparison produces one “winner” and one “loser.” The losers are eliminated and only the winners move up in the tree. Each element, except for the largest, must “lose” one comparison. Therefore, building the tournament tree will take N – 1 comparisons.

The second largest element could only have lost to the largest element. We go down the tree and get the set of elements that lost to the largest one. We know that there can be at most lg N of these elements because of our tree formulas in Section 1.3.2. There will be lg N comparisons to find these elements in the tree and lg N – 1 comparisons to find the largest in this collection. The entire process takes N + 2 lg N – 2 comparisons, which is O(N).
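The argument above can be sketched in Python (names here are mine, and the values are assumed distinct so each value can key the list of elements it beat):

```python
def second_largest(values):
    """Second-largest of a list of distinct values, by the tournament method.

    Builds the tournament level by level (N - 1 comparisons), then takes
    the maximum of the elements that lost directly to the champion.
    """
    losers_to = {}            # winner -> values it has beaten so far
    level = list(values)
    comparisons = 0
    while len(level) > 1:
        nxt = []
        for i in range(0, len(level) - 1, 2):   # pair up adjacent elements
            a, b = level[i], level[i + 1]
            comparisons += 1
            winner, loser = (a, b) if a >= b else (b, a)
            losers_to.setdefault(winner, []).append(loser)
            nxt.append(winner)
        if len(level) % 2 == 1:                 # odd one out advances free
            nxt.append(level[-1])
        level = nxt
    champion = level[0]
    # the second largest must be among the champion's direct victims
    return max(losers_to[champion]), comparisons
```

On the eight values of Figure 1.3 this builds the tree with 7 comparisons, then scans the 3 elements that lost to the champion.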

The tournament method could also be used to sort a list of values. In Chapter 3, we will see a method called heapsort that is based on the tournament method.

                  8
          6               8
      6       3       8       5
    4   6   3   2   8   7   1   5

FIGURE 1.3 Tournament tree for a set of eight values

1.5.2 Lower Bounds

An algorithm is optimal when there is no algorithm that will work more quickly. How do we know when we have found an optimal algorithm, or when an algorithm is not optimal but still good enough? To answer these questions, we need to know the absolute smallest number of operations needed to solve a particular problem. This must be determined by looking at the problem itself and not at any particular algorithm to solve it. This lower bound tells us the amount of work that is necessary to solve the problem and shows that any algorithm that claims to be able to solve the problem more quickly must fail in some cases.

We can again use a binary tree to help us analyze the process of sorting a list of three numbers. We can construct a binary tree for the sorting process by labeling each internal node with the two elements of the list that would be compared. The leaves of the tree hold the orderings of the elements that are consistent with the comparisons made on the path from the root down to that leaf. The tree for a list of three elements is shown in Fig. 1.4. Trees of this form are called decision trees.

FIGURE 1.4 The decision tree for sorting a three-element list (each internal node compares two of the elements, x1:x2, x1:x3, or x2:x3, and each of the six leaves holds one of the six possible orderings of x1, x2, and x3)

Each sort algorithm produces a different decision tree based on the elements that it compares. Within a decision tree, the longest path from the root to a

leaf represents the worst case. The best case is the shortest path. The average case is the total number of edges in the decision tree divided by the number of leaves in the tree. As simple as it would seem to be able to determine these numbers by drawing decision trees and counting, think about what the deci- sion tree would look like for a sort of 10 numbers. As was said before, there are 3,628,800 different ways these can be ordered. A decision tree would need at least 3,628,800 different leaves, because there may be more than one way to get to the same order. This tree would then need at least 22 levels.
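The counts quoted for a ten-element sort are easy to verify (a quick Python check):

```python
import math

perms = math.factorial(10)            # distinct orderings of 10 values
levels = math.ceil(math.log2(perms))  # levels below the root needed
                                      # for at least that many leaves
print(perms, levels)                  # prints: 3628800 22
```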

So, how can decision trees be used to give us an idea of the bounds on an algorithm? We know that a correct sorting algorithm must properly order all of the elements no matter what order in which they begin. There must be at least one leaf for every possible permutation of input values, which means that there must be at least N! leaves in the decision tree. A truly efficient algorithm would have each permutation appear only once. How many levels does a tree with N! leaves have? We have already seen that each new level of the tree will have twice as many nodes as the previous level. Because there are 2^(K–1) nodes on level K, our decision tree will have L levels, where L is the smallest integer with N! ≤ 2^(L–1). Applying algebraic transformations to this formula, we get

lg N! ≤ L – 1

Because we are trying to find the smallest value for L, is there any way to simplify this equation further to get rid of the factorial? Let’s see what we can observe about the factorial of a number. Consider the following:

lg N! = lg(N * (N – 1) * (N – 2) * … * 1)

By Equation 1.5, we get

lg(N * (N – 1) * (N – 2) * … * 1) = lg N + lg(N – 1) + lg(N – 2) + … + lg 1 = Σ(i = 1 to N) lg i

By Equation 1.21, we get

Σ(i = 1 to N) lg i ≈ N lg N

and, therefore,

lg N! ≈ N lg N
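The approximation can be checked numerically; the sketch below sums lg i directly, and the 0.8 lower factor is a loose sanity bound chosen here, not a constant from the text.

```python
import math

n = 1000
lg_factorial = sum(math.log2(i) for i in range(1, n + 1))  # lg N! as a sum of logs
n_lg_n = n * math.log2(n)

# lg N! is bounded above by N lg N and grows proportionally to it
assert lg_factorial <= n_lg_n
assert lg_factorial >= 0.8 * n_lg_n
```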

This means that L, the minimum depth of the decision tree for sorting problems, is of order O(N lg N). We now know that any sorting algorithm that does on the order of N lg N comparisons in the worst case is the best we will be able to do, and it can be considered optimal. We also know that any comparison-based sort algorithm that claims to run faster than O(N lg N) cannot work correctly for all inputs.

This analysis for the lower bound for a sorting algorithm assumed that it does its work through the comparison of pairs of values from the list. In Chapter 3, we will see a sorting algorithm (radix sort) that will run in linear time. That algorithm doesn’t compare key values but rather separates them into “buckets” to accomplish its work.
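As a preview sketch of that idea (a minimal version of my own, not the Chapter 3 algorithm): nonnegative integers can be ordered by repeatedly distributing them into ten buckets, one decimal digit at a time, with no key comparisons at all.

```python
def radix_sort(values, digits=4):
    """Sort nonnegative integers (< 10**digits) without comparing keys:
    distribute into ten buckets by one decimal digit per pass,
    least significant digit first."""
    for d in range(digits):
        buckets = [[] for _ in range(10)]
        for v in values:
            buckets[(v // 10 ** d) % 10].append(v)   # stable distribution
        values = [v for bucket in buckets for v in bucket]
    return values
```

Each pass is linear in the number of values, so the whole sort is linear for a fixed number of digits.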

1.5.3 EXERCISES

1. Fibonacci numbers can be calculated with the algorithm that follows. What is the recurrence relation for the number of “additions” done by this algo- rithm? Be sure to clearly indicate in your answer what you think the direct solution, division of the input, and combination of the solutions are.

int Fibonacci( N )
   N  the Nth Fibonacci number should be returned

   if (N = 1) or (N = 2) then
      return 1
   else
      return Fibonacci( N-1 ) + Fibonacci( N-2 )
   end if

2. The greatest common divisor (GCD) of two integers M and N is the largest integer that divides evenly into both M and N. For example, the GCD of 9 and 15 is 3, and the GCD of 51 and 34 is 17. The following algorithm will calculate the greatest common divisor of two numbers:

GCD(M, N)
   M, N  are the two integers of interest
   GCD returns the integer greatest common divisor

   if ( M < N ) then
      swap M and N
   end if
   if ( N = 0 ) then
      return M
   else
      quotient = M / N        // NOTE: integer division
      remainder = M mod N
      return GCD( N, remainder )
   end if
Give the recurrence relation for the number of multiplications (in this case, the division and mod) that are done in this function.

3. We have a problem that can be solved by a direct (nonrecursive) algorithm that operates in N2 time. We also have a recursive algorithm for this prob- lem that takes N lg N operations to divide the input into two equal pieces and lg N operations to combine the two solutions together. Show whether the direct or recursive version is more efficient.

4. Draw the tournament tree for the following values: 13, 1, 7, 3, 9, 5, 2, 11, 10, 8, 6, 4, 12. What values would be looked at in stage 2 of finding the second largest value in the list?

5. What is the lower bound on the number of comparisons needed to do a search through a list with N elements? Think about what the decision tree might look like for this problem in developing your answer. (Hint: The nodes would be labeled with the location where the key is found.) If you packed nodes into this tree as tightly as possible, what does that tell you about the number of comparisons needed to search?