Sequences
4.7 Quicksort
In the second version of the list, two merges are done for the lists of length two.
However, each merge is done on one half of the list. The purple half is one merge, and the green half includes the items in the second merge. Together, these two merges include all n items again. So, at the second deepest level again at most n items are merged in O(n) time.
Finally, the last merge is of all the items in yellow from the two sorted sublists.
This merge also takes O(n) time since it merges all the items in the list, resulting in the sorted list seen in the last version of the list.
So, while merging is an O(n) operation, the merges take place on sublists of the n items in the list, which means we can count the merging at each level as O(n) and don't have to count each individual merge operation separately. Since there are log n levels to the merge sort algorithm and each level takes O(n) to merge, the algorithm is O(n log n).
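The per-level counting argument above can be made concrete with a short sketch. This function is not part of the text's listing; it is a minimal merge sort for illustration:

```python
def merge_sort(seq):
    # Base case: a list of zero or one items is already sorted.
    if len(seq) <= 1:
        return seq

    mid = len(seq) // 2
    left = merge_sort(seq[:mid])
    right = merge_sort(seq[mid:])

    # Merge the two sorted halves in O(n) time. Across all the merges
    # at one level of the recursion, every item is touched exactly once,
    # so each of the log n levels costs O(n) overall.
    merged = []
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged
```

Note that this sketch copies items into new lists when merging; avoiding that copying is one reason quicksort tends to run faster in practice.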
One way to improve the choice of the pivot is to have the quicksort algorithm start by randomizing the sequence.
The quicksort algorithm is given in Sect. 4.7.1.
4.7.1 The Quicksort Code
import random

def partition(seq, start, stop):
    # pivotIndex comes from the start location in the list.
    pivotIndex = start
    pivot = seq[pivotIndex]
    i = start + 1
    j = stop - 1

    while i <= j:
        # while i <= j and seq[i] <= pivot:
        while i <= j and not pivot < seq[i]:
            i += 1
        # while i <= j and seq[j] > pivot:
        while i <= j and pivot < seq[j]:
            j -= 1

        if i < j:
            seq[i], seq[j] = seq[j], seq[i]
            i += 1
            j -= 1

    # Swap the pivot into its final location.
    seq[pivotIndex] = seq[j]
    seq[j] = pivot

    return j

def quicksortRecursively(seq, start, stop):
    # A sublist of zero or one items is already sorted.
    if start >= stop - 1:
        return

    # pivotIndex ends up in between the two halves
    # where the pivot value is in its final location.
    pivotIndex = partition(seq, start, stop)

    quicksortRecursively(seq, start, pivotIndex)
    quicksortRecursively(seq, pivotIndex + 1, stop)

def quicksort(seq):
    # Randomize the sequence first.
    for i in range(len(seq)):
        j = random.randint(0, len(seq) - 1)
        seq[i], seq[j] = seq[j], seq[i]

    quicksortRecursively(seq, 0, len(seq))
Once the list is randomized, picking a random pivot becomes easy: the partition function simply picks the first item in the sequence as the pivot. The partitioning starts from both ends and works its way to the middle. Essentially, every time a value bigger than the pivot is found on the left side and a value smaller than the pivot is found on the right side, the two values are swapped. Once we reach the middle from both sides, the pivot is swapped into place. Once the sequence is partitioned, the quicksort algorithm is called recursively on the two halves. Variables i and j are the indices of the left and right values, respectively, during the partitioning process.
If you look at the partition code, the two commented while loop conditions are probably easier to understand than the uncommented code. However, the uncommented code only uses the less than operator. Python's built-in sort method on lists likewise requires only that the less than operator be defined between items in the sequence. By writing the two while loops as we have, the only required ordering is defined by the less than operator, just as Python requires.
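To see that the less than operator is the only comparison needed, consider a small illustration (the `Version` class is hypothetical, not from the text): a class that defines only `__lt__` can still be sorted, both by Python's built-in sort and by the quicksort above.

```python
class Version:
    """A value that defines only the less than operator."""
    def __init__(self, major, minor):
        self.major = major
        self.minor = minor

    def __lt__(self, other):
        # The only comparison the sorting code ever performs.
        return (self.major, self.minor) < (other.major, other.minor)

versions = [Version(2, 1), Version(1, 9), Version(2, 0)]
versions.sort()
# versions is now ordered: 1.9, 2.0, 2.1
```

No `__gt__`, `__le__`, or `__eq__` method is required, which is exactly why the uncommented while loops are written in terms of `pivot < seq[i]` and `pivot < seq[j]` alone.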
The snapshot in Fig. 4.7 shows the effect of partitioning on a sequence. In this figure, the sequence has been partitioned twice already. The first partitioning picked a pivot that was almost dead center. However, the second partitioning picked a pivot that was not so good. The red line indicates the part of the sequence that is currently being partitioned. See how the left-most value in that sub-sequence is the pivot value.
The two green dots are the pivot values that are already in their correct locations.
All values greater than the pivot end up in the partition to the right of the pivot, and all values to the left of the pivot are less than it. This is the nature of quicksort.
Again, by amortized complexity we can find that the quicksort algorithm runs in O(n log n) time on average.
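The argument can be summarized as a recurrence, a sketch following the same per-level reasoning as merge sort:

```latex
% Partitioning n items costs O(n). With a reasonably central pivot the
% recursion has about \log n levels, each doing O(n) total work:
T(n) \le T(\lfloor n/2 \rfloor) + T(\lceil n/2 \rceil) + cn
\quad\implies\quad T(n) = O(n \log n)
```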
Consider sorting the list [5 8 2 6 9 1 0 7] using quicksort. Figure 4.8 depicts the list after each call to the partition function. The pivot in each call is identified by the orange-colored item. The partition function partitions the list extending to the right of its pivot. After partitioning, the pivot is moved to its final location by swapping
Fig. 4.7 Quicksort Snapshot
Fig. 4.8 Quicksorting a List
it with the last item that is less than the pivot. Then partitioning is performed on the resulting two sublists.
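The first partitioning step shown in Fig. 4.8 can be reproduced with a condensed copy of the partition function, repeated here only so the example runs on its own:

```python
def partition(seq, start, stop):
    # Condensed copy of the partition function from Sect. 4.7.1.
    pivotIndex = start
    pivot = seq[pivotIndex]
    i = start + 1
    j = stop - 1
    while i <= j:
        while i <= j and not pivot < seq[i]:
            i += 1
        while i <= j and pivot < seq[j]:
            j -= 1
        if i < j:
            seq[i], seq[j] = seq[j], seq[i]
            i += 1
            j -= 1
    seq[pivotIndex] = seq[j]
    seq[j] = pivot
    return j

lst = [5, 8, 2, 6, 9, 1, 0, 7]
where = partition(lst, 0, len(lst))
# The pivot 5 lands at index 3 with smaller values to its left and
# larger values to its right: lst is now [1, 0, 2, 5, 9, 6, 8, 7]
```

This matches the second row of Fig. 4.8: the pivot is in its final location, and quicksort then recurses on the sublists to either side of it.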
The randomization done in the first step of quicksort helps to pick a more random pivot value. This has real consequences in the quicksort algorithm, especially when the sequence passed to quicksort has a chance of being sorted already. If the sequence given to quicksort is sorted, or almost sorted, in either ascending or descending order, then quicksort will not achieve O(n log n) complexity. In fact, the worst case complexity of the algorithm is O(n^2). If the pivot chosen is always the next least or greatest value, then the partitioning will not divide the problem into two smaller sublists, as occurred when 9 was chosen as a pivot for a sublist in Fig. 4.8. The algorithm will simply put one value in place and end up with one big partition of all the rest of the values. If this happened each time a pivot was chosen, it would lead to O(n^2) complexity. Randomizing the list prior to quicksorting it helps to ensure that this does not happen.
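The cost of a bad pivot can be measured by counting comparisons. In this sketch, the `less` helper and `comparisons` counter are instrumentation added for illustration, not part of the text's listing. Sorting an already sorted list without the randomizing step triggers the worst case, while a shuffled copy of the same list behaves like the average case:

```python
import random

comparisons = 0

def less(a, b):
    # Count every use of the less than operator.
    global comparisons
    comparisons += 1
    return a < b

def partition(seq, start, stop):
    pivotIndex = start
    pivot = seq[pivotIndex]
    i = start + 1
    j = stop - 1
    while i <= j:
        while i <= j and not less(pivot, seq[i]):
            i += 1
        while i <= j and less(pivot, seq[j]):
            j -= 1
        if i < j:
            seq[i], seq[j] = seq[j], seq[i]
            i += 1
            j -= 1
    seq[pivotIndex] = seq[j]
    seq[j] = pivot
    return j

def quicksortRecursively(seq, start, stop):
    if start >= stop - 1:
        return
    pivotIndex = partition(seq, start, stop)
    quicksortRecursively(seq, start, pivotIndex)
    quicksortRecursively(seq, pivotIndex + 1, stop)

n = 200
sortedList = list(range(n))
# Worst case: the pivot is always the smallest remaining value.
quicksortRecursively(sortedList, 0, n)
worstCase = comparisons

comparisons = 0
shuffledList = list(range(n))
random.shuffle(shuffledList)
# Average case: shuffling first makes a central pivot likely.
quicksortRecursively(shuffledList, 0, n)
averageCase = comparisons
```

On the sorted input the comparison count grows like n^2/2, while on the shuffled input it grows like n log n, so `worstCase` comes out several times larger than `averageCase` here.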
Merge sort is not affected by the choice of a pivot, since no choice is necessary.
Therefore, merge sort does not have a worst case or best case to consider. It will always achieve O(n log n) complexity. Even so, quicksort performs better in practice than merge sort because the quicksort algorithm does not need to copy to a new list and then back again. The quicksort algorithm is the de facto standard of sorting algorithms.