■ 2.1.3 EXERCISES
1. Sequential search can also be used for a sorted list. Write an algorithm called SortedSequentialSearch that will return the same results as the algorithm above but will run more quickly because it can stop with a failure as soon as it finds that the target is smaller than the current list value. When you write your algorithm, use the Compare(x,y) function defined as

$$\text{Compare}(x, y) = \begin{cases} -1 & \text{if } x < y \\ 0 & \text{if } x = y \\ 1 & \text{if } x > y \end{cases}$$

The Compare function should be counted as one comparison and can best be used in a switch. Do an analysis of the worst case, the average case with the target found, and the average case with the target not found. (Note: This last analysis has many possibilities because of all of the additional early exits when the target is smaller than the current value.)
2. What is the average complexity of sequential search if there is a 0.25 chance that the target will not be found in the list and there is a 0.75 chance that when the target is in the list, it will be found in the first half of the list?
what is left of the list with each comparison. This results in the following algorithm:²
BinarySearch( list, target, N )
list      the elements to be searched
target    the value being searched for
N         the number of elements in the list

start = 1
end = N
while start ≤ end do
   middle = (start + end) / 2
   select (Compare(list[middle], target)) from
      case -1:  start = middle + 1
      case 0:   return middle
      case 1:   end = middle - 1
   end select
end while
return 0
In this algorithm, start gets reset to 1 larger than the middle when we know that the target is larger than the element at the middle location. End gets reset to 1 smaller than the middle when we know that the target is smaller than the element at the middle location. These are shifted by 1 because we know by the three-way comparison that the middle value is not equal and so can be eliminated from consideration.
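The pseudocode above can be sketched in Python as follows. This is a minimal illustration, not the book's own code: the names `compare` and `binary_search` are chosen here, and the only adjustment is the shift from the 1-based positions used in the text to Python's 0-based list indexing.

```python
def compare(x, y):
    """Three-way comparison: -1 if x < y, 0 if x == y, 1 if x > y."""
    return (x > y) - (x < y)


def binary_search(items, target):
    """Return the 1-based position of target in the sorted list items, or 0."""
    start, end = 1, len(items)
    while start <= end:
        middle = (start + end) // 2                   # integer division
        result = compare(items[middle - 1], target)   # shift to 0-based indexing
        if result == -1:      # list[middle] < target: discard the lower part
            start = middle + 1
        elif result == 0:     # match found
            return middle
        else:                 # list[middle] > target: discard the upper part
            end = middle - 1
    return 0                  # 0 signals a failed search


values = [3, 8, 15, 21, 34, 55, 89]
print(binary_search(values, 21))  # 4
print(binary_search(values, 4))   # 0
```

Returning 0 for a failure works only because positions are numbered from 1, as they are in the text.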
Does this loop always stop? If we find the target, the answer is obviously Yes, because of the return. If we don’t find a match, each pass through the loop will either increase the value of start or decrease the value of end. This means that they will continue to get closer to each other. Eventually, they will become equal to each other, and the loop will be done one more time, with start = end = middle. After this pass (assuming that this is not the element we are looking for), either start will be 1 greater than middle and end, or end will be 1 less than middle and start. In both of these cases, the while loop’s conditional will become false, and the loop will stop. Therefore, the loop does always stop.

² The function Compare(x,y) is defined in Exercise 1 of Section 2.1.3. As was mentioned in that exercise, this function will return -1, 0, or 1, depending on whether x is less than, equal to, or greater than y, respectively. When analyzing an algorithm that uses Compare, it is counted as just one comparison.
Does this algorithm return the correct answer? If we find the target, the answer is obviously Yes because of the return. If the middle element doesn’t match, each pass through the loop eliminates from consideration one-half of the remaining elements because they are all either too large or too small. As was discussed in the previous paragraph, we will eventually get down to just one element that must be examined.³ If this is the key we are looking for, the value of middle will be returned. If it is not the key we are looking for, start will become greater than end or end will become less than start. This means that if the target was in the list, it would be above or below the middle value, respectively. But, based on the values of start and end, we know that previous comparisons eliminated all of the other values, so the target is not in the list. The loop will stop, and the function will indicate a failed search by returning zero. So, the algorithm does return the correct answer.
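The two arguments above (the loop stops, and the answer is right) can also be checked empirically on small lists. The sketch below is only a sanity check, not a proof. The helper names are invented for this illustration, and the search uses two ordinary comparisons per pass instead of the single three-way Compare, purely to keep the check short.

```python
def checked_binary_search(items, target):
    """Binary search that asserts the candidate range shrinks on every pass."""
    start, end = 1, len(items)
    while start <= end:
        width_before = end - start + 1
        middle = (start + end) // 2
        if items[middle - 1] == target:
            return middle
        if items[middle - 1] < target:
            start = middle + 1
        else:
            end = middle - 1
        # Termination argument: the range under consideration strictly shrinks.
        assert end - start + 1 < width_before
    return 0


def linear_position(items, target):
    """Reference answer by exhaustive scan: 1-based position, or 0."""
    for position, value in enumerate(items, start=1):
        if value == target:
            return position
    return 0


def check_all(max_n):
    """Compare both searches for every key and every gap, for N = 0..max_n."""
    for n in range(max_n + 1):
        items = [2 * i for i in range(1, n + 1)]    # keys 2, 4, ..., 2n
        for target in range(1, 2 * n + 2):           # even = key, odd = gap
            if checked_binary_search(items, target) != linear_position(items, target):
                return False
    return True


print(check_all(64))
```

Using the even numbers as keys makes every odd target fall into one of the gaps between keys, so successful and failed searches are both exercised.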
Because of the halving nature of this algorithm, we will assume for our analysis that $N = 2^k - 1$ for some value of k. If this is the case, how many elements will be left for the second pass? What about the third pass? In general, you should see that if on some pass of the loop we have $2^j - 1$ elements under consideration, there are $2^{j-1} - 1$ elements in the first half, 1 element in the middle, and $2^{j-1} - 1$ in the second half. Therefore, the next pass will have $2^{j-1} - 1$ elements left (for $1 \le j \le k$). This assumption will make the following analysis easier to do, but this assumption is not necessary, as you will see in the exercises.
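As a small illustration of this shrinking pattern, the sketch below lists how many elements remain on each pass when the target is never matched. The function name `remaining_sizes` is made up for this example.

```python
def remaining_sizes(k):
    """Sizes of the range under consideration on each pass for N = 2**k - 1,
    assuming the middle element never matches the target."""
    sizes = []
    size = 2**k - 1
    while size >= 1:
        sizes.append(size)
        size = (size - 1) // 2    # drop the middle element, keep one half
    return sizes


print(remaining_sizes(4))  # [15, 7, 3, 1]
```

Every size in the sequence has the form $2^j - 1$, and there are exactly k of them.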
■ 2.2.1 Worst-Case Analysis
In the paragraph above, we showed that the power of 2 is decreased by one on each pass of the loop. It was also shown that the last pass of the loop occurs when the list has a size of 1, which occurs when j is 1 ($2^1 - 1 = 1$). This means that there are at most k passes when $N = 2^k - 1$. Solving this equation for k tells us that the worst case is $k = \lg(N + 1)$ comparisons.
³ You should also see this from the process of repeatedly doing an integer division by 2. No matter what size list you start with, if you keep dividing by 2 (throwing away the fractional portion), you will eventually wind up with a list of one element.
Building a decision tree for the search process can also help with this analysis. The nodes of the decision tree would have the element that is checked at each pass. Those elements that would be checked if the target is less than the current element would go into the left subtree and those checked when the target is greater would go into the right subtree. If our list had just seven elements, the tree that would result is shown in Fig. 2.1. In general, we know that this tree is relatively balanced because we always choose the middle of the various parts of the list. So, we can use formulas related to binary trees from Section 1.3.2 to get the number of comparisons.
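The decision tree just described can be generated mechanically by following both outcomes of every comparison. The sketch below groups the probed positions by tree level. The function name `decision_tree_levels` is an assumption of this example, not something defined in the text.

```python
def decision_tree_levels(n):
    """Group the 1-based positions probed by binary search by tree level."""
    levels = []
    ranges = [(1, n)]                  # (start, end) ranges still to be split
    while ranges:
        level, next_ranges = [], []
        for start, end in ranges:
            if start > end:            # empty range: a failed search ends here
                continue
            middle = (start + end) // 2
            level.append(middle)
            next_ranges.append((start, middle - 1))   # target smaller: go left
            next_ranges.append((middle + 1, end))     # target larger: go right
        if level:
            levels.append(level)
        ranges = next_ranges
    return levels


print(decision_tree_levels(7))  # [[4], [2, 6], [1, 3, 5, 7]]
```

For seven elements the levels hold 1, 2, and 4 nodes, so the tree is complete.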
Because we chose $N = 2^k - 1$, the resulting decision tree will be complete. There will be k levels in the resulting tree, where $k = \lg(N + 1)$. Because we do one comparison on each level, the most we do is $\lg(N + 1)$ comparisons.
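The worst-case bound can be checked by counting Compare calls directly. In the sketch below, which is an illustration with invented names, the key stored at position p is taken to be p itself, so positions can be compared instead of list values.

```python
import math


def count_comparisons(n, position):
    """Number of Compare calls made to find the key at a 1-based position."""
    start, end, count = 1, n, 0
    while start <= end:
        middle = (start + end) // 2
        count += 1                     # one three-way Compare per pass
        if middle < position:
            start = middle + 1
        elif middle == position:
            return count
        else:
            end = middle - 1
    return count


for k in range(1, 6):
    n = 2**k - 1
    worst = max(count_comparisons(n, p) for p in range(1, n + 1))
    print(n, worst, math.log2(n + 1))
```

For each of these sizes the largest count printed should agree with $\lg(N + 1)$.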
■ 2.2.2 Average-Case Analysis
As with sequential search, we will consider two situations when doing an average-case analysis. In the first, the target will always be in the list, and in the second, the target may not be in the list.
                  list[4]
        list[2]           list[6]
   list[1]  list[3]  list[5]  list[7]

■ FIGURE 2.1 Decision tree for a search of a list of seven elements

The first situation will have N possible locations for the target. We will consider each of these to be equivalent and so will give each a probability of 1/N. If we consider the binary tree that represents this search process, we will see that one comparison is done to find the element that is in the root of the tree on level 1. Two comparisons are done to find the elements that are in the nodes on level 2, and three comparisons are done to find the elements that are in the nodes on level 3. In general, i comparisons are done to find the elements that are in the nodes on level i. Section 1.3.2 showed that for a binary tree there are $2^{i-1}$ nodes on level i, and when $N = 2^k - 1$, there are k levels in the tree. This means that we can determine the total number of comparisons done for every possible case by adding, for every level, the product of the number of nodes on each level and the number of comparisons for that level. This gives an average-case analysis of
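$$A(N) = \frac{1}{N} \sum_{i=1}^{k} i\,2^{i-1} \qquad \text{for } N = 2^k - 1$$

$$A(N) = \frac{1}{N} \cdot \frac{1}{2} \sum_{i=1}^{k} i\,2^{i}$$

We can use Equation 1.19 to simplify this equation to

$$\begin{aligned}
A(N) &= \frac{1}{N} \cdot \frac{1}{2} \left[ (k-1)2^{k+1} + 2 \right] \\
A(N) &= \frac{1}{N} \left[ (k-1)2^{k} + 1 \right] \\
A(N) &= \frac{1}{N} \left[ k\,2^{k} - 2^{k} + 1 \right] \\
A(N) &= \frac{k\,2^{k} - (2^{k} - 1)}{N} \\
A(N) &= \frac{k\,2^{k} - N}{N} \\
A(N) &= \frac{k\,2^{k}}{N} - 1
\end{aligned}$$

Because $N = 2^k - 1$, $2^k = N + 1$.

$$\begin{aligned}
A(N) &= \frac{k(N+1)}{N} - 1 \\
A(N) &= \frac{kN + k}{N} - 1
\end{aligned}$$

As N gets larger, k/N becomes zero, giving

$$\begin{aligned}
A(N) &\approx k - 1 && \text{for } N = 2^k - 1 \\
A(N) &\approx \lg(N+1) - 1 && \text{for } N = 2^k - 1
\end{aligned}$$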
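The closed form derived above can be compared against a direct count. The sketch below is an illustration with made-up function names. It averages the number of Compare calls over all N target positions and checks the result against $k\,2^k/N - 1$ using exact fractions.

```python
from fractions import Fraction


def count_comparisons(n, position):
    """Number of Compare calls made to find the key at a 1-based position."""
    start, end, count = 1, n, 0
    while start <= end:
        middle = (start + end) // 2
        count += 1
        if middle < position:
            start = middle + 1
        elif middle == position:
            return count
        else:
            end = middle - 1
    return count


def average_when_found(k):
    """Exact average over all N = 2**k - 1 equally likely target positions."""
    n = 2**k - 1
    total = sum(count_comparisons(n, p) for p in range(1, n + 1))
    return Fraction(total, n)


for k in (3, 5, 10):
    n = 2**k - 1
    exact = average_when_found(k)
    closed_form = Fraction(k * 2**k, n) - 1
    print(n, float(exact), exact == closed_form, k - 1)
```

For seven elements the exact average is 17/7, noticeably above k - 1 = 2. The approximation k - 1 tightens as N grows because the dropped term k/N shrinks.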
Now, let’s consider the second situation where we include the possibility that the target is not in the list of elements. We still have N possibilities for the target being in the list, but now we have to add in the N + 1 possibilities that the target is not in the list. There are N + 1 of these possibilities because the target can be smaller than the element in location 1, larger than the element in location 1 but smaller than the one in location 2, larger than the element in location 2 but smaller than the one in location 3, and so on, through the possibility that the target is larger than the element in location N. In each of these cases, it takes k comparisons to learn that the target is not in the list. There are now 2N + 1 possibilities to include in our calculation. Putting all of this together, we get
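$$A(N) = \frac{1}{2N+1} \left[ \sum_{i=1}^{k} i\,2^{i-1} + (N+1)k \right] \qquad \text{for } N = 2^k - 1$$

By a similar series of substitutions as above, we get

$$\begin{aligned}
A(N) &= \frac{\left[ (k-1)2^{k} + 1 \right] + (N+1)k}{2N+1} \\
A(N) &= \frac{\left[ (k-1)2^{k} + 1 \right] + (2^{k} - 1 + 1)k}{2(2^{k} - 1) + 1} \\
A(N) &= \frac{(k\,2^{k} - 2^{k} + 1) + 2^{k}k}{2^{k+1} - 1} \\
A(N) &= \frac{k\,2^{k+1} - 2^{k} + 1}{2^{k+1} - 1} \\
A(N) &\approx \frac{k\,2^{k+1} - 2^{k} + 1}{2^{k+1}} \\
A(N) &\approx k - \frac{1}{2} = \lg(N+1) - \frac{1}{2} \qquad \text{for } N = 2^k - 1
\end{aligned}$$

This is just a little larger than the average case for when the key is known to be in the list. So, if the list has 1,048,575 ($2^{20} - 1$) elements, the first average case is about 19 and the second is about 19.5.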
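The second average can be checked the same way. The sketch below is illustrative only and its names are invented here. It takes the keys to be the even numbers 2, 4, ..., 2N so that each odd target falls into one of the N + 1 gaps, then averages the Compare calls over all 2N + 1 equally likely cases.

```python
from fractions import Fraction


def count_comparisons(n, target):
    """Compare calls when the keys are 2, 4, ..., 2n (odd targets are misses)."""
    start, end, count = 1, n, 0
    while start <= end:
        middle = (start + end) // 2
        count += 1
        key = 2 * middle               # key stored at position middle
        if key < target:
            start = middle + 1
        elif key == target:
            return count
        else:
            end = middle - 1
    return count


def average_all_cases(k):
    """Exact average over N successful and N + 1 failed searches."""
    n = 2**k - 1
    total = sum(count_comparisons(n, t) for t in range(1, 2 * n + 2))
    return Fraction(total, 2 * n + 1)


for k in (3, 5, 10):
    exact = average_all_cases(k)
    closed_form = Fraction(k * 2**(k + 1) - 2**k + 1, 2**(k + 1) - 1)
    print(2**k - 1, float(exact), exact == closed_form, k - 0.5)
```

For seven elements the exact value is 41/15, while the approximation gives 2.5. The gap narrows as k grows.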
■ 2.2.3 EXERCISES
1. Draw the decision tree for the binary search algorithm for a list of 12 elements. For the internal nodes of your decision tree, the node should be labeled with the element checked, the left child should represent what happens if the target is less than the element checked, and the right child should represent what happens if the target is greater than the element checked.
2. The analysis of binary search in this chapter assumed that the size was always $2^k - 1$ for some value of k. For this question, we will explore other possibilities for the size:
a. What is the worst case when $N \neq 2^k - 1$?
b. What is the average case when $N \neq 2^k - 1$, assuming that the key is in the list? Hint: Think about what this change in size means for the bottom of the search tree.
c. What is the average case when $N \neq 2^k - 1$, if the key might not be in the list? Hint: Think about what this change in size means for the bottom of the search tree.
3. When the collection of data is large, there can still be a large number of comparisons needed to do a binary search. For example, a telephone directory of a large city could easily take about 25 comparisons per search. To improve this, multiway searching uses a general tree, which is a tree data structure that can have more than two children. In multiway searching, we store a few keys in each tree node, and the children represent the subtrees containing (a) the entries smaller than all the keys, (b) the entries larger than the first key but smaller than the rest, (c) the entries larger than the first two keys but smaller than the rest, and so on. The following figure shows an example of a general tree that can be used for multiway searching. In the root of this tree we have the keys of 6 and 10, so if we are looking for a key less than 6, we would take the left branch. If we are looking for a key between 6 and 10, we would take the middle branch, and for a key larger than 10, we would take the right branch.
                        6/10
      2/4                8              12/14/16
   1   3   5           7   9         11  13  15  17
Write an algorithm to do a multiway search. For your answer, you can assume that each node has two arrays called Keys[3] and Links[4] and that you have a function called Compare(keyList, searchKey) that returns a positive integer indicating the key that matches or a negative integer indicating the link to take. (For example, Compare([2, 6, 10], 6) would return a value of 2 because the second key matches, and Compare([2, 6, 10], 7) would return a value of –3 because 7 would be found on the third link associated with the gap between the second and third key value.) When you have finished your algorithm, do a worst- and average-case analysis assuming that the tree is complete and each internal node has four children. (You might want to draw a sample tree.) What would be the impact on your two analyses if the tree was not complete or if some internal nodes had less than four children?
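The exercise only states what Compare(keyList, searchKey) returns, not how it works. One possible reading of that convention is sketched below, assuming the keys in a node are sorted in increasing order. The name `compare_keys` is chosen for this illustration, and the search algorithm itself is left to the exercise.

```python
def compare_keys(key_list, search_key):
    """Return +i if the i-th key (1-based) matches search_key,
    or -j if the search should continue along the j-th link."""
    for index, key in enumerate(key_list, start=1):
        if search_key == key:
            return index               # positive: the index-th key matches
        if search_key < key:
            return -index              # negative: take the link before this key
    return -(len(key_list) + 1)        # larger than every key: take the last link


print(compare_keys([2, 6, 10], 6))   # 2
print(compare_keys([2, 6, 10], 7))   # -3
```

With three keys there are four possible negative results, one for each entry of Links[4].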