a. If the key is a number in the range of 0 to 264, choose two options (one smaller and one larger) for the number of bits that will be used on each pass and indicate how many buckets and passes will be needed.
b. If the key is a string of 40 characters, choose two options (one smaller and one larger) for the number of bits that will be used on each pass and indicate how many buckets and passes will be needed.
c. Based on your answers to parts (a) and (b), can you give any general rec- ommendations for how to make the choice of passes and key subdivi- sions?
of the list grows. We can, however, use the space for the list itself and do this sort without extra space. We can “build” the list into a heap if we notice that in a heap each internal node has two children, except for perhaps one node toward the bottom. If we consider the following mapping, we can use the list to hold these values. For the node at location i, we will store its two children at locations 2 *i and 2 *i + 1. Notice that this process produces distinct locations for each node’s children. We know that node i is a leaf if 2 *iis greater than N, and we know that node i has only one child if 2 * i is equal to N. Figure 3.3 shows a heap and its list version.
FixHeap
When we take the largest element out of the root and move it to the list, this leaves the root vacant. We know that the larger of its two children needs to be moved up, but then that child’s node becomes vacant, and so we look at its two children, and so on. In this process, we need to maintain the heap as close to a complete tree as possible. When we fix the heap, we will also pass in the right- most node from the bottom level, to be inserted back into the heap. This will remove nodes evenly from the bottom. If we don’t do this and all of the large values are on one side of the heap, the heap will be unbalanced and the algo- rithm’s efficiency will go down. This gives us the following algorithm:
FixHeap( list, root, key, bound ) list the list/heap being sorted
root the index of the root of the heap
8
11
6
4 3
7 2
1
5 9
10
12
Locations 1 2 3 4 5 6 7 8 9 10 11 12
12 10 11 5 9 8 6 1 2 7 3 4
■ FIGURE 3.3 A heap and its list implementation
key the key value that needs to be reinserted in the heap bound the upper limit (index) on the heap
vacant = root
while 2*vacant ≤ bound do largerChild = 2*vacant
// find the larger of the two children
if (largerChild < bound) and (list[largerChild+1] > list[largerChild]) then largerChild = largerChild + 1
end if
// does the key belong above this child?
if key > list[ largerChild ] then // yes, stop looping
break else
// no, move the larger child up list[ vacant ] = list[ largerChild ] vacant = largerChild
end if end while
list[ vacant ] = key
When you look at this algorithm’s parameters, you might wonder why we have chosen to pass in the root location. Because this routine is not recursive, the root of the heap should always be location 1. You will see, however, that this additional parameter will make it possible for us to use this function to construct the heap from the bottom up. We pass in the size of the heap, because as the elements get moved from the heap to the list, the heap shrinks.
Constructing the Heap
The way that we have chosen to implement the FixHeap function means that we can use this in the initial construction of the heap. Any two values can be treated as leaves of a vacant node. We do not need to do any work on the sec- ond half of the list because they are all leaves. We just need to construct small heaps from the leaves and then combine these until eventually all values are in the heap. This is accomplished by the following loop:
for i = N/2 down to 1 do
FixHeap( list, i, list[ i ], N ) end for
Final Algorithm
Putting these pieces together, and adding the final details needed to move the elements from the heap to the list, gives the algorithm
for i = N/2 down to 1 do
FixHeap( list, i, list[ i ], N ) end for
for i = N down to 2 do max = list[ 1 ]
FixHeap( list, 1, list[ i ], i-1 ) list[ i ] = max
end for
■ 3.5.1 Worst-Case Analysis
We begin by analyzing FixHeap because the rest of the algorithm depends on it. This algorithm will do, for each level of the heap, a comparison between the two children and then the larger child and the key. This means that for a heap of depth D, there will be no more than 2D comparisons. 2
In the heap construction stage, we first call FixHeap for each node on the second level from the bottom, which means heaps of depth 1. Then it is called for each node on the third level from the bottom, which means heaps of depth 2. On the last pass at the root level, the heap will have depth lgN. The only thing we need to consider is how many nodes is FixHeap called for at each pass. At the root level there is one node, and it has two children at the second level. The next level down has at most four nodes, and the level after that has eight. Putting all of this together gives the following formula:
2 The depth is 1 less than the level number. A heap with four levels will have a maxi- mum depth of 3. The root is at depth 0, its children are at depth 1, their children are at depth 2, and so on.
WConstruction( )N 2(D–i)2i
i=0 D–1
∑
=
WConstruction( )N 2D 2i 2 i2i
i=0 D–1
∑
–
i=0 D–1
∑
=
Using Equations 1.17 and 1.19, we get
We now substitute D = lg N, giving
So the heap construction phase is linear with respect to the number of ele- ments in the list.
Now we consider the main loop of the algorithm. In this loop, we remove one element from the heap and then call FixHeap. This gets repeated until there is only one element left in the heap. On each pass the number of ele- ments decreases by 1, but how does this change the depth of the heap? We have already said that the entire heap has a depth of lg N, and so it is easy to see that if there are K nodes left in the heap, it will have depth of lgK, and the number of comparisons is twice this number. This means that the worst case for the loop is
The problem we now have is that there is no standard form for this summation.
Let’s think about how this breaks down. When k is 1, lgk will be 0. When k is 2 and 3, lgk will be 1. When k is 4 through 7, lgk will be 2. When k is 8 through 15, lg k will be 3. We notice that in each of these cases, when the result is j, there are 2j values that give that result. This will be true for all but the last level of the heap if it is not complete. On that level, we see that there are N 2lgN elements.3 This means that our equation can be represented as:
3 Because there is a floor involved with the exponent of 2, 2lgN < N, except when N is an exact power of 2, and then these are equal.
WConstruction( )N = 2D(2D–1)–2[(D–2)2D+2] WConstruction( )N = 2D+1D–2D–2D+1D+2D+2–4 WConstruction( )N = 2D+2–2D–4
WConstruction( )N = 4*2lgN–2 lg N–4 WConstruction( )N = 4N–2 lg N–4 = O N( )
Wloop( )N 2 lgk
k=1
N–1
∑
2 lgkk=1 N–1
∑
= =
Wloop( )N 2 k2k
k=1
∑
d–1
d N( –2d)
+ where d lgN
= =
Using Equation 1.19, we get
Now, substituting lgN for d, we get
Now, we have to add together the construction and loop stages to get our final result. This gives the equation
■ 3.5.2 Average-Case Analysis
To get the average-case analysis for heapsort, we will take a different approach.
Let’s consider the best case, where the values are initially in the array in the reverse order. You should be able to see that this is automatically a correct heap.
This means that each call to FixHeap in the construction phase will do only two comparisons to show the values are properly ordered. So, because Fix- Heap is called for about one-half of the elements and each call does two com- parisons, the construction stage does about N comparisons, which is the same order as in the worst case.
Notice that no matter what order the elements are in at the start, after the construction stage we always have a heap. So, in every case, the for loop will have to execute the same number of times as in the worst case, because to get the sorted values we have to take each element out of the heap and then fix the heap. So, in the best case, heapsort does about N + N lg N comparisons. This means that the best case is O(N lg N).
For heapsort, the best and worst cases are both O(N lg N). This can only mean that the average case must also be O(N lg N).
Wloop( )N = 2[(d–2)2d+2+d N( –2d)] Wloop( )N = d2d+1–2d+2+4+2d*N–d2d+1 Wloop( )N = 2d*N–2d+2+4
Wloop( )N = 2 lgN N–2 lgN +2+4
Wloop( )N = 2 lgN N–4 lgN +4 = O N( lgN)
W N( ) = Wconstruction( )N +Wloop( )N
W N( ) = 4N–2lgN–4+2 lgN N–4 lgN +4 W N( ) ≈ 4N+2 lgN (N–3)
W N( ) = O N( lgN)
3.5.3
1. Given the list of elements, [23, 17, 21, 3, 42, 9, 13, 1, 2, 7, 35, 4], what would be their order after the loop for the heap construction phase exe- cutes?
2. Given the list of elements, [3, 9, 14, 12, 2, 17, 15, 8, 6, 18, 20, 1], what would be their order after the loop for the heap construction phase exe- cutes?
3. We could shorten the second for loop in our heapsort by changing the ending condition to i ≥ 3. What, if anything, would need to be added after thisfor loop to assure that the final list is sorted? Would this change reduce the number of comparisons (give a detailed reason for your answer)?
4. Prove that a list in reverse order is a heap.