Splitting the List
There are at least two versions of the PivotList function. The first is easy to program and understand and is presented in this section. The other is more complicated to write but is faster than this version. The second version will be considered in the exercises.
The function PivotList will pick the first element of the list as its pivot element and will set the pivot point as the first location of the list. It then moves through the list, comparing this pivot element to the rest of the elements. Whenever it finds an element that is smaller than the pivot element, it increments the pivot point and then swaps that element into the new pivot point location. After some of the elements have been compared to the pivot inside the loop, the list has four parts. The first part is the pivot element in the first location. The second part, from location first + 1 through the pivot point, holds all of the elements we have looked at that are smaller than the pivot element. The third part, from the location after the pivot point through the loop index, holds all of the elements we have looked at that are larger than the pivot element. The rest of the list holds the values we have not yet examined. This is shown in Fig. 3.4.
The algorithm for PivotList is as follows:
PivotList( list, first, last )
   list    the elements to work with
   first   the index of the first element
   last    the index of the last element

   PivotValue = list[ first ]
   PivotPoint = first
   for index = first + 1 to last do
      if list[ index ] < PivotValue then
         PivotPoint = PivotPoint + 1
         Swap( list[ PivotPoint ], list[ index ] )
      end if
   end for
   // move pivot value into correct place
   Swap( list[ first ], list[ PivotPoint ] )
   return PivotPoint

■ FIGURE 3.4 Relationship between the indices and element values in PivotList [figure: the list divided into four regions, reading left to right from First to Last — the pivot element at First, the "< pivot" region up through Pivot Point, the "≥ pivot" region up through Index, and the "unknown" region through Last]
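As a concrete sketch, the pseudocode above translates directly into Python. This is an illustrative version, not the text's code: it assumes 0-based indexing, and the names `pivot_list` and `quicksort` are hypothetical.

```python
def pivot_list(lst, first, last):
    """Partition lst[first..last] around lst[first]; return the pivot's final index."""
    pivot_value = lst[first]
    pivot_point = first
    for index in range(first + 1, last + 1):
        if lst[index] < pivot_value:
            # grow the "< pivot" region and move this element into it
            pivot_point += 1
            lst[pivot_point], lst[index] = lst[index], lst[pivot_point]
    # move the pivot value into its correct place
    lst[first], lst[pivot_point] = lst[pivot_point], lst[first]
    return pivot_point


def quicksort(lst, first, last):
    """Sort lst[first..last] in place using the partition above."""
    if first < last:
        pivot = pivot_list(lst, first, last)
        quicksort(lst, first, pivot - 1)
        quicksort(lst, pivot + 1, last)
```

A call such as `quicksort(data, 0, len(data) - 1)` sorts the whole list; after one `pivot_list` call, everything left of the returned index is smaller than the pivot and everything right of it is larger.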
■ 3.7.1 Worst-Case Analysis
When PivotList is called with a list of N elements, it does N − 1 comparisons, as it compares the PivotValue with every other element in the list. Because we have already said that quicksort is a divide and conquer algorithm, you might assume that the best case occurs when PivotList creates two parts of the same size, and you would be correct. The worst case, then, occurs when the two parts are of drastically different sizes. The largest difference in size occurs when the PivotValue is smaller (or larger) than all of the other values in the list. In that case, we wind up with one part that has no elements and another that has N − 1 elements. If the same thing happens each time we apply this process, we remove only one element (the PivotValue) from the list at each recursive call. This means we do the number of comparisons given by the formula for W(N) displayed below.
What original ordering of elements would cause this behavior? Since each pass chooses the first element as the pivot, the worst case occurs when that element is always the smallest (or largest) remaining. A list that is already sorted is one arrangement that causes this worst-case behavior! In all of the other sort algorithms we have considered, the worst and average cases have been about the same, but as we are about to see, this is not true for quicksort.
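This sorted-list worst case is easy to check empirically. The sketch below (hypothetical Python, assuming 0-based indexing and the first-element pivot strategy described earlier) counts the key comparisons; on an already sorted list of N keys it always reports N(N − 1)/2.

```python
def quicksort_counting(lst, first, last):
    """Quicksort using the first-element PivotList strategy; returns the comparison count."""
    if first >= last:
        return 0
    comparisons = last - first        # PivotValue is compared with every other element
    pivot_value = lst[first]
    pivot_point = first
    for index in range(first + 1, last + 1):
        if lst[index] < pivot_value:
            pivot_point += 1
            lst[pivot_point], lst[index] = lst[index], lst[pivot_point]
    lst[first], lst[pivot_point] = lst[pivot_point], lst[first]
    comparisons += quicksort_counting(lst, first, pivot_point - 1)
    comparisons += quicksort_counting(lst, pivot_point + 1, last)
    return comparisons


# an already sorted list of N = 12 keys costs 12 * 11 / 2 = 66 comparisons
worst = quicksort_counting(list(range(12)), 0, 11)
```

Each partition of a sorted sublist peels off only the smallest element, so the counts telescope to (N − 1) + (N − 2) + … + 1.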
■ 3.7.2 Average-Case Analysis
You will recall that when we looked at shellsort, we considered the number of inversions that each comparison removed in our analysis. At that time, we pointed out that bubble sort and insertion sort didn’t do well on average because they both removed only one inversion for each comparison.
W(N) = Σ_{i=2}^{N} (i − 1) = N(N − 1) / 2
So, how does quicksort do in removing inversions? Consider a list of N elements that the PivotList algorithm is working on. Let's say that the PivotValue is greater than all of the values in the list. This means that at the end of the routine PivotPoint will be N, and so the PivotValue will be switched from the first location to the last location. It is also possible that the element in the last location is the smallest value in the list. So swapping these two values will move the largest element from the first location to the last and will move the smallest element from the last location to the first. If the largest element is first, there are N − 1 inversions of it with the rest of the elements in the list, and if the smallest element is last, there are N − 1 inversions of it with the rest of the elements in the list. This one swap can remove 2N − 2 inversions from the list. It is because of this possibility that quicksort has an average case that is significantly different from its worst case.
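The effect is easy to see with a small inversion counter (a hypothetical Python sketch, not the text's code). For N = 5 with the largest value first and the smallest last, the single first/last swap eliminates every inversion present: the N − 1 inversions of each end element, with the pair they share counted once.

```python
def inversions(lst):
    """Count pairs (i, j) with i < j and lst[i] > lst[j]."""
    n = len(lst)
    return sum(1 for i in range(n) for j in range(i + 1, n) if lst[i] > lst[j])


before = [9, 3, 5, 7, 1]   # largest value first, smallest value last (N = 5)
after = [1, 3, 5, 7, 9]    # the same list after swapping just those two elements
# the swap removes all 7 inversions at once, on the order of 2N per comparison pass
```

Contrast this with bubble sort or insertion sort, where each comparison can remove at most one inversion.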
Notice that PivotList is doing all of the work, and so we first look at this algorithm to see what it does in the average case. We first notice that it is possible for each of the N locations in the list to be the location of the PivotValue when PivotList is done. To get the average case, we have to look at what happens for each of these possibilities and average the results. When looking at the worst case, we noticed that for a list of N elements there are N − 1 comparisons done by PivotList in dividing the list. There is no work done to put the lists back together. Lastly, notice that when PivotList returns a value of P, we call Quicksort recursively with lists of P − 1 and N − P elements. Our average-case analysis needs to look at all N possible values for P. Putting this together gives the recurrence relation
A(N) = (N − 1) + (1/N) Σ_{i=1}^{N} [A(i − 1) + A(N − i)]   for N ≥ 2
A(1) = A(0) = 0

If you look closely at the summation, you will notice that the first term is used with values from 0 through N − 1, and the second term is used with values from N − 1 down to 0. This means that the summation adds up every value of A from 0 to N − 1 twice. This gives us the following simplification:

A(N) = (N − 1) + (2/N) Σ_{i=0}^{N−1} A(i)   for N ≥ 2
A(1) = A(0) = 0
This is a very complicated form of recurrence relation because it depends on not just one smaller value of A, but rather on every smaller value for A.
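Because the recurrence depends on every smaller value of A, a direct bottom-up evaluation is the easiest way to see what it produces. A minimal sketch (hypothetical Python, transcribing the summation form of the recurrence):

```python
def average_comparisons(n):
    """Evaluate A(N) = (N - 1) + (1/N) * sum_{i=1}^{N} [A(i-1) + A(N-i)] bottom-up."""
    a = [0.0, 0.0]                      # A(0) = A(1) = 0
    for m in range(2, n + 1):
        # every smaller value of A feeds into A(m)
        total = sum(a[i - 1] + a[m - i] for i in range(1, m + 1))
        a.append((m - 1) + total / m)
    return a[n]
```

For example, A(2) = 1 and A(3) = 2 + 2/3, matching what the recurrence gives by hand.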
There are two ways to go about solving this. The first is to come up with an educated guess for the answer and then prove that this answer satisfies the recurrence relation. The second is to look at the equations for both A(N) and A(N − 1); those two equations differ by only a few terms. We now compute A(N) * N and A(N − 1) * (N − 1) to get rid of the two fractions. This gives

A(N) * N = (N − 1)N + 2 Σ_{i=0}^{N−1} A(i)
A(N) * N = (N − 1)N + 2A(N − 1) + 2 Σ_{i=0}^{N−2} A(i)
A(N − 1) * (N − 1) = (N − 2)(N − 1) + 2 Σ_{i=0}^{N−2} A(i)

Now, we subtract the third equation above from the second and simplify to get

A(N) * N − A(N − 1) * (N − 1) = 2A(N − 1) + (N − 1)N − (N − 2)(N − 1)
A(N) * N − A(N − 1) * (N − 1) = 2A(N − 1) + N² − N − (N² − 3N + 2)
A(N) * N − A(N − 1) * (N − 1) = 2A(N − 1) + 2N − 2

Adding A(N − 1) * (N − 1) to both sides, we get

A(N) * N = 2A(N − 1) + A(N − 1) * (N − 1) + 2N − 2
A(N) * N = A(N − 1) * (2 + N − 1) + 2N − 2

This gives our final recurrence relation:

A(N) = [(N + 1) * A(N − 1) + 2N − 2] / N
A(1) = A(0) = 0

Solving this is not difficult but does require care because of all of the terms on the right-hand side of the equation. If you work through all of the details, you will see that the final result is A(N) ≈ 1.4 (N + 1) lg N. Quicksort is, therefore, O(N lg N) on average.
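The algebra above is easy to sanity-check numerically: the final one-term recurrence must produce the same values as the summation form it was derived from, and the values should track (N + 1) lg N. A hypothetical Python sketch:

```python
import math


def a_summation(n):
    """A(N) = (N - 1) + (2/N) * sum_{i=0}^{N-1} A(i), with A(0) = A(1) = 0."""
    a = [0.0, 0.0]
    for m in range(2, n + 1):
        a.append((m - 1) + 2.0 * sum(a[:m]) / m)
    return a


def a_closed(n):
    """A(N) = ((N + 1) * A(N - 1) + 2N - 2) / N, with A(0) = A(1) = 0."""
    a = [0.0, 0.0]
    for m in range(2, n + 1):
        a.append(((m + 1) * a[m - 1] + 2 * m - 2) / m)
    return a


vals = a_closed(1000)
ratio = vals[1000] / (1001 * math.log2(1000))
# the ratio sits near 1.1 at N = 1000 and climbs slowly toward 2 ln 2 ≈ 1.386,
# consistent with A(N) ≈ 1.4 (N + 1) lg N asymptotically
```

Both functions agree to floating-point precision, which confirms the subtraction step in the derivation did not lose or gain a term.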
■ 3.7.3 EXERCISES
1. Trace the operation of Quicksort on the list [23, 17, 21, 3, 42, 9, 13, 1, 2, 7, 35, 4]. Show the list order and the stack of (first, last, pivot) values at the start of every call. Count the number of comparisons and swaps that are done.
2. Trace the operation of Quicksort on the list [3, 9, 14, 12, 2, 17, 15, 8, 6, 18, 20, 1]. Show the list order and the stack of (first, last, pivot) values at the start of every call. Count the number of comparisons and swaps that are done.
3. We showed that the Quicksort algorithm performs poorly when the list is sorted because the pivot element is always smaller than all of the elements left in the list. Just picking a different location of the list would have the same problem because you could get "unlucky" and always pick the smallest remaining value. A better alternative would be to consider the three values list[ first ], list[ last ], and list[ (first + last) / 2 ] and pick the median, or middle, value of these three. The comparisons to pick the middle element must be included in the complexity analysis of the algorithm.
a. Do Question 1 using this alternative method for picking the pivot element.
b. Do Question 2 using this alternative method for picking the pivot element.
c. In general, how many comparisons are done in the worst case to sort a list of N keys? (Note: You are now guaranteed to not have the smallest value for the PivotValue, but the result can still be pretty bad.)
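One way to sketch the median-of-three selection (a hypothetical Python helper, not the text's code; it moves the chosen value into list[first] so the original PivotList logic can run unchanged):

```python
def median_of_three(lst, first, last):
    """Swap the median of lst[first], lst[middle], lst[last] into lst[first]."""
    middle = (first + last) // 2
    a, b, c = lst[first], lst[middle], lst[last]
    # find which of the three positions holds the middle value
    if (b <= a <= c) or (c <= a <= b):
        median = first
    elif (a <= b <= c) or (c <= b <= a):
        median = middle
    else:
        median = last
    lst[first], lst[median] = lst[median], lst[first]
```

Calling this just before the partition guarantees the pivot is never the smallest (or largest) of the three sampled values, at the cost of the extra comparisons the exercise asks you to account for.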
4. An alternative for the PivotList algorithm would be to have two indices into the list. The first moves up from the bottom, and the other moves down from the top. The main loop of the algorithm advances the lower index until a value greater than the PivotValue is found, and the upper index is moved until a value less than the PivotValue is found. Then these two are swapped. This process repeats until the two indices cross. These inner loops are very fast because the overhead of checking for the end of the list is eliminated, but the problem is that they will do an extra swap when the indices pass each other. So, the algorithm does one extra swap to correct this. The full algorithm is
PivotList( list, first, last )
   list    the elements to work with
   first   the index of the first element
   last    the index of the last element

   PivotValue = list[ first ]
   lower = first
   upper = last + 1
   do
      do upper = upper - 1 until list[ upper ] ≤ PivotValue
      do lower = lower + 1 until list[ lower ] ≥ PivotValue
      Swap( list[ upper ], list[ lower ] )
   until lower ≥ upper
   // undo the extra exchange
   Swap( list[ upper ], list[ lower ] )
   // move pivot value into correct place
   Swap( list[ first ], list[ upper ] )
   return upper
(Note: This algorithm requires one extra list location at the end to hold a special sentinel value that is larger than all of the valid key values.)
a. Do Question 1 using this alternative method for PivotList.
b. Do Question 2 using this alternative method for PivotList.
c. What operation is done significantly less frequently for this version of PivotList?
d. How many key comparisons does the new PivotList do in the worst case for a list of N elements? (Note: It is not N − 1.) How does this affect the overall worst case for quicksort?
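The two-index scheme in Question 4 can be sketched in Python as follows (hypothetical names, 0-based indexing; it assumes, per the note above, a sentinel larger than every key stored just past position last):

```python
def pivot_list_two_index(lst, first, last):
    """Two-index partition; requires lst[last + 1] to be a sentinel >= all keys."""
    pivot_value = lst[first]
    lower = first
    upper = last + 1
    while True:
        # do-until loops: move each index at least once, then keep going
        upper -= 1
        while lst[upper] > pivot_value:
            upper -= 1
        lower += 1
        while lst[lower] < pivot_value:
            lower += 1
        lst[upper], lst[lower] = lst[lower], lst[upper]
        if lower >= upper:
            break
    # undo the extra exchange made after the indices crossed
    lst[upper], lst[lower] = lst[lower], lst[upper]
    # move pivot value into its correct place
    lst[first], lst[upper] = lst[upper], lst[first]
    return upper
```

Note how the inner loops test only key values, never index bounds; the sentinel (and, in recursive calls, the previously placed pivot) is what stops the lower index from running off the end.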
5. How many comparisons will Quicksort do on a list of N elements that all have the same value?
6. What is the maximum number of times that Quicksort will move the largest or smallest value?