
can be removed by the exchange of two nonadjacent elements? Give an example for a list with 10 elements.

      bucketNumber = (list[entry].key / shift) mod 10
      Append( bucket[bucketNumber], list[entry] )
   end for entry
   list = CombineBuckets()
   shift = shift * 10
end for loop

We’ll begin by reviewing this algorithm. The calculation of bucketNumber will pull a single digit out of a key. The division by shift will cause the key value to be moved to the right some number of digits, and then the mod will eliminate all but the units digit of the resulting number. On the first pass with a shift value of 1, the division will do nothing, and the mod result will return just the units digit of the key. On the second pass, shift will now be 10, so the integer division and then the mod will return just the tens digit. On each succeeding pass, the next digit of the key will be used.
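The digit extraction described above can be tried in isolation. The following is a minimal Python sketch, not part of the original algorithm listing; the function name `digit` is an assumption made for illustration, and `//` stands in for the integer division of the pseudocode.

```python
def digit(key: int, shift: int) -> int:
    """Return the decimal digit of key selected by shift (1, 10, 100, ...)."""
    # Integer division moves the wanted digit into the units position,
    # and mod 10 discards everything to its left.
    return (key // shift) % 10


key = 4835
digits = [digit(key, shift) for shift in (1, 10, 100, 1000)]
print(digits)  # [5, 3, 8, 4]
```

Each successive value of shift selects the next digit to the left, which is the behavior the passes of the sort rely on.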

The CombineBuckets function will append the buckets back into one list starting with bucket[0] through bucket[9]. This recombined list is the starting point for the next pass. Because the buckets are recombined in order and because the numbers are added to the end of each bucket list, each pass preserves the relative order established by the earlier passes, so the keys will be sorted once the most significant digit has been processed. Figure 3.2 shows the three passes that would be done for keys with three digits. To make this example simpler, all of the keys just use the digits 0 through 3, so only four buckets are needed.

In looking at Fig. 3.2(c), you should see that if the buckets are again combined in order, the list will now be sorted.
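The passes of Fig. 3.2 can be reproduced with a short Python sketch. This is one possible rendering of the distribute-and-combine idea under stated assumptions, not the book's own implementation: the function name `radix_sort_passes`, its parameters, and the use of Python lists for the buckets are all choices made for illustration. Keys are written without leading zeros (023 becomes 23).

```python
def radix_sort_passes(keys, num_digits, base=10):
    """Return the list produced after each pass; the last one is sorted.

    Assumes non-negative integer keys with at most num_digits digits.
    """
    items = list(keys)
    results = []
    shift = 1
    for _ in range(num_digits):
        buckets = [[] for _ in range(base)]
        for key in items:
            # Appending to the end of the bucket keeps earlier order intact.
            buckets[(key // shift) % base].append(key)
        # Combine the buckets in order, bucket 0 first.
        items = [key for bucket in buckets for key in bucket]
        results.append(items)
        shift *= base
    return results


figure_keys = [310, 213, 23, 130, 13, 301, 222, 32,
               201, 111, 323, 2, 330, 102, 231, 120]
passes = radix_sort_passes(figure_keys, num_digits=3)
for number, result in enumerate(passes, start=1):
    print("Pass", number, "list:", result)
```

The list printed for the third pass is the fully sorted list, and the lists for the first two passes should match the Pass 1 and Pass 2 lists in the figure.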

3.4.1 Analysis

An analysis of radix sort requires that we consider issues beyond just number of operations, because in this case they are significant. How this particular algorithm is implemented has an impact on its overall efficiency. We consider both the time and space efficiency of this algorithm.

Each key is looked at once for each digit (or letter if the keys are alphabetic) of the longest key. So, if the longest key has M digits and there are N keys, radix sort has order O(M * N). But if we look at these two values, the size of the keys will be relatively small when compared to the number of keys. For example, if we have six-digit keys, we could have a million different records.

Recalling the discussion of Section 1.4 on rates of growth, we see that the size of the keys is not significant, and this algorithm is of linear complexity, O(N).
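The M * N count can be checked empirically. The sketch below is an illustration only; the function name `count_digit_inspections` and the choice of 1,000 six-digit keys are assumptions, not taken from the text.

```python
def count_digit_inspections(keys, num_digits):
    """Radix sort the keys and count how many times a key digit is examined."""
    items = list(keys)
    inspections = 0
    shift = 1
    for _ in range(num_digits):
        buckets = [[] for _ in range(10)]
        for key in items:
            inspections += 1  # one digit of one key is looked at
            buckets[(key // shift) % 10].append(key)
        items = [key for bucket in buckets for key in bucket]
        shift *= 10
    return items, inspections


# N = 1000 keys of at most M = 6 digits, supplied in descending order.
keys = list(range(999_000, -1, -1000))
sorted_keys, inspections = count_digit_inspections(keys, num_digits=6)
print(inspections)  # 6000, which is M * N
```

Doubling the number of keys doubles the count, while M stays fixed by the key format, which is the sense in which the text treats the algorithm as linear in N.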

FIGURE 3.2 The three passes of a radix sort

Original list:
310 213 023 130 013 301 222 032 201 111 323 002 330 102 231 120

(a) Pass 1, Units Digit

Bucket Number   Contents
0               310 130 330 120
1               301 201 111 231
2               222 032 002 102
3               213 023 013 323

Pass 1 list:
310 130 330 120 301 201 111 231 222 032 002 102 213 023 013 323

(b) Pass 2, Tens Digit

Bucket Number   Contents
0               301 201 002 102
1               310 111 213 013
2               120 222 023 323
3               130 330 231 032

Pass 2 list:
301 201 002 102 310 111 213 013 120 222 023 323 130 330 231 032

(c) Pass 3, Hundreds Digit

Bucket Number   Contents
0               002 013 023 032
1               102 111 120 130
2               201 213 222 231
3               301 310 323 330

This is very efficient, and so you might wonder why any of the other sorting algorithms are even used.

The issue in this case becomes space efficiency. In the sorts we’ve seen so far, we need extra space for at most one additional record as we are swapping. In this case, the space needs are more significant. If we use arrays for the buckets, these will need to be extremely large arrays. In fact, each bucket will need to be the size of the original list, because we can’t assume that the keys will be uniformly distributed among the buckets as in Fig. 3.2. The keys might be spread equally among the buckets, or they might all land in the same bucket. Both can happen. Using arrays means that we will need 10N additional space if the keys are numeric, 26N additional space if the keys are alphabetic, and even more if the keys are alphanumeric or if case matters in alphabetic characters. If we use arrays, we also have the time to copy the records to the buckets in the distribution step and from the buckets back into the original list in the coalescing step. This means each record will be “moved” 2M times. If the records are large, this can take a substantial amount of time.
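The array-based costs can be made concrete with a small Python sketch. It is a hypothetical illustration of the scheme discussed here, with each of the 10 buckets preallocated to hold all N records; the function name `radix_sort_array_buckets` and the move counter are assumptions added to show where the 10N slots and the 2M moves per record come from.

```python
def radix_sort_array_buckets(records, num_digits):
    """Sort records in place; return (record moves, extra slots allocated)."""
    n = len(records)
    # Every bucket must be able to hold all n records, so 10 * n slots.
    buckets = [[None] * n for _ in range(10)]
    moves = 0
    shift = 1
    for _ in range(num_digits):
        counts = [0] * 10
        # Distribution step: copy each record into its bucket.
        for record in records:
            b = (record // shift) % 10
            buckets[b][counts[b]] = record
            counts[b] += 1
            moves += 1
        # Coalescing step: copy the records back into the original list.
        position = 0
        for b in range(10):
            for i in range(counts[b]):
                records[position] = buckets[b][i]
                position += 1
                moves += 1
        shift *= 10
    return moves, 10 * n


records = [310, 213, 23, 130, 13, 301, 222, 32,
           201, 111, 323, 2, 330, 102, 231, 120]
moves, extra_slots = radix_sort_array_buckets(records, num_digits=3)
print(moves, extra_slots)  # 96 160
```

With N = 16 records and M = 3 digits, the 96 moves are 2M moves for each record, and the 160 extra slots are the 10N of the text.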

An alternative is to use a linked list structure for the records. Now, putting a record into a bucket just requires changing a link, and coalescing the buckets again just requires changing links. There is still significant space overhead, because most implementations of linked lists will require 2 to 4 bytes per link, making the total additional space needs 2N to 4N bytes.
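A linked-list version might look like the following Python sketch. It is one way to realize the idea under stated assumptions rather than a definitive implementation: the `Node` class, the helper functions, and the head and tail arrays for each bucket are all names introduced here for illustration. No record is copied; only `next` links change.

```python
class Node:
    """A record in a singly linked list (illustrative structure)."""
    def __init__(self, key):
        self.key = key
        self.next = None


def from_list(keys):
    """Build a linked list from a Python list and return its head."""
    head = tail = None
    for key in keys:
        node = Node(key)
        if tail is None:
            head = node
        else:
            tail.next = node
        tail = node
    return head


def to_list(head):
    """Collect the keys of a linked list into a Python list."""
    keys = []
    while head is not None:
        keys.append(head.key)
        head = head.next
    return keys


def radix_sort_linked(head, num_digits):
    """Radix sort a linked list by relinking nodes; return the new head."""
    shift = 1
    for _ in range(num_digits):
        heads = [None] * 10
        tails = [None] * 10
        node = head
        while node is not None:          # distribution: relink into buckets
            following = node.next
            node.next = None
            b = (node.key // shift) % 10
            if tails[b] is None:
                heads[b] = node
            else:
                tails[b].next = node
            tails[b] = node
            node = following
        head = tail = None
        for b in range(10):              # coalescing: chain bucket to bucket
            if heads[b] is None:
                continue
            if tail is None:
                head = heads[b]
            else:
                tail.next = heads[b]
            tail = tails[b]
        shift *= 10
    return head


keys = [310, 213, 23, 130, 13, 301, 222, 32]
print(to_list(radix_sort_linked(from_list(keys), num_digits=3)))
```

Keeping a tail reference for each bucket makes both appending a record and joining two buckets constant-time link changes, at the cost of one link per record plus the small head and tail arrays.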

3.4.2 EXERCISES

1. Use the RadixSort algorithm to sort the list [1405, 975, 23, 9803, 4835, 2082, 7368, 573, 804, 746, 4703, 1421, 4273, 1208, 521, 2050]. Show the buckets for each pass and the list after each bucket coalescing step.

2. Use the RadixSort algorithm to sort the list [117, 383, 4929, 144, 462, 1365, 9726, 241, 1498, 82, 1234, 8427, 237, 2349, 127, 462]. Show the buckets for each pass and the list after each bucket coalescing step.

3. Another way of looking at radix sort is to consider the key as just a bit pattern. So, if the keys are 4-byte integers, they are just considered as 32 bits, and if the keys are strings of 15 alphanumeric characters (15 bytes), they are just considered as 120 bits. These bit streams are then subdivided into pieces, which determine the number of passes and the number of buckets. So, if we have 120-bit keys, we might do 12 passes with 10-bit pieces, 10 passes with 12-bit pieces, or 5 passes with 24-bit pieces.

a. If the key is a number in the range of 0 to 2^64, choose two options (one smaller and one larger) for the number of bits that will be used on each pass and indicate how many buckets and passes will be needed.

b. If the key is a string of 40 characters, choose two options (one smaller and one larger) for the number of bits that will be used on each pass and indicate how many buckets and passes will be needed.

c. Based on your answers to parts (a) and (b), can you give any general recommendations for how to make the choice of passes and key subdivisions?
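For readers who want to experiment with the bit-pattern view from exercise 3, the following Python sketch treats each key as a string of bits and sorts on fixed-size pieces. It is an illustrative assumption, not a worked solution: the function name `radix_sort_bits` and its parameters `key_bits` and `piece_bits` are hypothetical, and the keys are assumed to be non-negative integers that fit in `key_bits` bits.

```python
def radix_sort_bits(keys, key_bits, piece_bits):
    """Sort keys by piece_bits-wide pieces; return (sorted, passes, buckets)."""
    passes = -(-key_bits // piece_bits)   # ceiling of key_bits / piece_bits
    num_buckets = 1 << piece_bits         # one bucket per possible piece value
    mask = num_buckets - 1
    items = list(keys)
    for p in range(passes):
        shift = p * piece_bits
        buckets = [[] for _ in range(num_buckets)]
        for key in items:
            # Shift the wanted piece to the low end and mask off the rest.
            buckets[(key >> shift) & mask].append(key)
        items = [key for bucket in buckets for key in bucket]
    return items, passes, num_buckets


keys = [0xDEADBEEF, 7, 65536, 255, 4096]
sorted_keys, passes, num_buckets = radix_sort_bits(keys, key_bits=32, piece_bits=8)
print(passes, num_buckets)  # 4 256
print(sorted_keys)
```

The shift and mask play the roles that the division by shift and the mod 10 play in the decimal version.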