Using associative containers

7.4.4 Selecting a random element

works in this case, because you know that at worst, its argument is a category that does not lead to any further bracketed words. Eventually, you will see that the function works in all cases, because each recursive call simplifies the argument.

We do not know any sure way to explain recursion. Our experience is that people stare at recursive programs for a long time without understanding how they work. Then, one day, they suddenly get it—and they don't understand why they ever thought it was difficult. Evidently, the key to understanding recursion is to begin by understanding recursion. The rest is easy.

Having written gen_sentence, read_grammar, and the associated auxiliary functions, we'll want to use them:

int main()
{
    // generate the sentence
    vector<string> sentence = gen_sentence(read_grammar(cin));

    // write the first word, if any
    vector<string>::const_iterator it = sentence.begin();
    if (!sentence.empty()) {
        cout << *it;
        ++it;
    }

    // write the rest of the words, each preceded by a space
    while (it != sentence.end()) {
        cout << " " << *it;
        ++it;
    }

    cout << endl;
    return 0;
}

We read the grammar, generate a sentence from it, and then write the sentence a word at a time. The only even minor complexity is that we put a space in front of the second and subsequent words of the sentence.
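The same "first word, then space-prefixed rest" pattern can be tried without a grammar file. The sketch below is only an illustration: join_words is a hypothetical helper that does not appear in the program above, and it builds a string instead of writing to cout so that the result is easy to inspect.

```cpp
#include <string>
#include <vector>

// hypothetical helper: join the words with single spaces, using the same
// pattern as main (first word alone, every later word preceded by a space)
std::string join_words(const std::vector<std::string>& sentence)
{
    std::string out;

    // copy the first word, if any
    std::vector<std::string>::const_iterator it = sentence.begin();
    if (!sentence.empty()) {
        out += *it;
        ++it;
    }

    // append the rest of the words, each preceded by a space
    while (it != sentence.end()) {
        out += " " + *it;
        ++it;
    }

    return out;
}
```

Handling the first word separately means that an empty sentence yields an empty string and that no sentence ends with a stray space.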

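All that remains is to write nrand. The library function rand() yields a random integer in the range [0, RAND_MAX]; our job is to reduce that range to [0, n), where n is no greater than RAND_MAX.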

You might think that it would suffice to compute rand() % n, which is the remainder when dividing the random integer by n. In practice, this technique fails for two reasons.

The most important reason is pragmatic: rand() really returns only pseudo-random numbers. Many C++ implementations' pseudo-random-number generators give remainders that aren't very random when the divisors are small integers. For example, it is not uncommon for successive results of rand() to be alternately even and odd. In that case, if n is 2, successive results of rand() % n will alternate between 0 and 1.
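To see how such alternation can arise, consider the following sketch. ToyRand is a hypothetical generator written for this illustration, not the rand() of any particular library: it is a linear congruential generator with an odd multiplier and an odd increment that returns its low-order 15 bits. With those assumptions, each step flips the lowest bit of the state, so the remainders modulo 2 alternate.

```cpp
#include <vector>

// Hypothetical toy generator (an assumption for illustration, not a real rand()):
// a linear congruential generator that returns its low-order 15 bits.
class ToyRand {
public:
    explicit ToyRand(unsigned long seed): state_(seed & 0xFFFFFFFFUL) { }

    // advance the 32-bit state and return a value in [0, 32767]
    int next()
    {
        state_ = (state_ * 1103515245UL + 12345UL) & 0xFFFFFFFFUL;
        return static_cast<int>(state_ & 0x7FFFUL);
    }

private:
    unsigned long state_;
};

// collect the first count values of next() % 2
std::vector<int> low_bits(unsigned long seed, int count)
{
    ToyRand gen(seed);
    std::vector<int> bits;
    for (int i = 0; i != count; ++i)
        bits.push_back(gen.next() % 2);
    return bits;
}
```

Because an odd number times x, plus an odd number, always has the opposite parity from x, no two adjacent entries of the returned vector are equal, whatever the seed.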

There is another, more subtle reason to avoid using rand() % n: If the value of n is large, and RAND_MAX is not evenly divisible by n, some remainders will appear more often than others. For example, suppose that RAND_MAX is 32767 (the smallest permissible value of RAND_MAX for any implementation) and n is 20000. In that case, there would be two distinct values of rand() that would cause rand() % n to be 10000 (namely, 10000 and 30000), but only one value of rand() that would cause rand() % n to be 15000 (namely, 15000). Therefore, the naive implementation of nrand would yield 10000 as a value of nrand(20000) twice as often as it would yield 15000.
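The counting argument can be checked by brute force. The sketch below assumes, as in the example, that RAND_MAX is 32767; kAssumedRandMax and count_sources are names invented for this illustration.

```cpp
// Assumed for illustration: the smallest RAND_MAX that any implementation may use.
const int kAssumedRandMax = 32767;

// count the values v in [0, kAssumedRandMax] for which v % n == target
int count_sources(int n, int target)
{
    int count = 0;
    for (int v = 0; v <= kAssumedRandMax; ++v)
        if (v % n == target)
            ++count;
    return count;
}
```

Under that assumption, count_sources(20000, 10000) is 2 and count_sources(20000, 15000) is 1, which matches the two-to-one bias described above.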

To avoid these pitfalls, we'll use a different strategy, by dividing the range of available random numbers into buckets of exactly equal size. Then we can compute a random number and return the number of the corresponding bucket. Because the buckets are of equal size, some random numbers may not fall into any bucket at all. In that case, we keep asking for random numbers until we get one that fits.

The function is easier to write than to describe:

// return a random integer in the range [0, n)
int nrand(int n)
{
    if (n <= 0 || n > RAND_MAX)
        throw domain_error("Argument to nrand is out of range");

    const int bucket_size = RAND_MAX / n;
    int r;

    do r = rand() / bucket_size;
    while (r >= n);

    return r;
}

The definition of bucket_size relies on the fact that integer division truncates its result. This property implies that RAND_MAX / n is the largest integer that is less than or equal to the exact quotient. As a consequence, bucket_size is the largest integer with the property that n * bucket_size <= RAND_MAX.

The next statement is a do while statement. A do while is like a while statement, except that it always executes the body at least once, and tests the condition at the end. If that condition yields true, then the loop repeats, executing the body again until the condition yields false. In this case, the body of the loop sets r to a bucket number. Bucket 0 will correspond to values of rand() in the range [0, bucket_size), bucket 1 will correspond to values in the range [bucket_size, bucket_size * 2), and so on. If the value of rand() is so large that r >= n, the program will continue trying random numbers until it finds one that it likes, at which point it returns the corresponding value of r.

For example, let's assume that RAND_MAX is 32767 and n is 20000. Then bucket_size will be 1, and nrand will work by discarding random numbers until it finds one less than 20000. As another example, assume that n is 3. Then bucket_size will be 10922. In this case, values of rand() in the range [0, 10922) will yield 0, values in the range [10922, 21844) will yield 1, values in the range [21844, 32766) will yield 2, and values of 32766 or 32767 will be discarded.
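These two examples can be verified by examining every possible value of rand(). The sketch below is an illustration only: bucket_of and count_in_bucket are hypothetical helpers, and the upper limit is passed as the parameter rand_max so that the result does not depend on any particular implementation's RAND_MAX.

```cpp
#include <stdexcept>

// bucket number that nrand would compute for one raw value,
// or -1 if nrand would discard that value and try again
int bucket_of(int value, int n, int rand_max)
{
    if (n <= 0 || n > rand_max)
        throw std::domain_error("Argument to bucket_of is out of range");

    const int bucket_size = rand_max / n;
    const int r = value / bucket_size;
    return r < n ? r : -1;
}

// count the raw values in [0, rand_max] that land in the given bucket;
// passing -1 as the bucket counts the discarded values
int count_in_bucket(int bucket, int n, int rand_max)
{
    int count = 0;
    for (int v = 0; v <= rand_max; ++v)
        if (bucket_of(v, n, rand_max) == bucket)
            ++count;
    return count;
}
```

With rand_max equal to 32767 and n equal to 3, each of the three buckets receives exactly 10922 values and 2 values are discarded. With n equal to 20000, every bucket holds a single value and the 12768 values from 20000 through 32767 are discarded.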

7.5 A note on performance

If you have used associative arrays in other languages, those arrays were probably implemented in terms of a data structure called a hash table. Hash tables can be very fast, but they have compensating disadvantages:

For each key type, someone must supply a hash function, which computes an appropriate integer value from the value of the key.

A hash table's performance is exquisitely sensitive to the details of the hash function.

There is usually no easy way to retrieve the elements of a hash table in a useful order.

C++ associative containers are hard to implement in terms of hash tables:

The key type needs only the < operator or equivalent comparison function.

The time to access an associative-container element with a given key is logarithmic in the total number of elements in that container, regardless of the keys' values.

Associative-container elements are always kept sorted by key.

In other words, although C++ associative containers will typically be slightly slower than the best hash-table data structures, they perform much better than naive data structures, their performance does not require their users to design good hash functions, and they are more convenient than hash tables because of their automatic ordering. If you're generally familiar with associative data structures, you might want to know that C++ libraries typically use a balanced self-adjusting tree structure to implement associative containers.
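The automatic ordering is easy to observe. In the following sketch, keys_in_map_order is a hypothetical helper written for this illustration: it counts words in a map and then returns the keys in the order in which the map's iterators yield them.

```cpp
#include <map>
#include <string>
#include <vector>

// count the words, then return the distinct keys in the order the map yields them
std::vector<std::string> keys_in_map_order(const std::vector<std::string>& words)
{
    std::map<std::string, int> counts;
    for (std::vector<std::string>::const_iterator w = words.begin();
         w != words.end(); ++w)
        ++counts[*w];

    std::vector<std::string> keys;
    for (std::map<std::string, int>::const_iterator it = counts.begin();
         it != counts.end(); ++it)
        keys.push_back(it->first);

    return keys;
}
```

Whatever order the words arrive in, the keys come back in ascending order, because the map keeps its elements sorted by key.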

If you really want hash tables, they are available as parts of many C++ implementations. However, because they are not part of standard C++, they are beyond the scope of this book. Although no standard can be ideal for every purpose, the standard associative containers are more than adequate for most applications.

7.6 Details

The do while statement is similar to the while statement (§2.3.1/19), except that the test is at the end. The general form of the statement is

do statement while (condition);

The statement is executed first, after which the condition and statement are executed alternately until the condition is false.
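The difference from while shows up only when the condition is false at the outset. The two functions in this sketch are hypothetical and exist only to count how many times each kind of loop executes its body.

```cpp
// count how many times a do while body runs, starting from remaining
int do_while_runs(int remaining)
{
    int runs = 0;
    do {
        ++runs;
        --remaining;
    } while (remaining > 0);
    return runs;
}

// the same loop written with while, for comparison
int while_runs(int remaining)
{
    int runs = 0;
    while (remaining > 0) {
        ++runs;
        --remaining;
    }
    return runs;
}
```

When remaining starts at 3, both loops run three times. When it starts at 0, the while body never runs, but the do while body runs once before the condition is tested.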

Value-initialization: Accessing a map element that doesn't yet exist creates an element with a value of V(), where V is the type of the values stored in the map. Such an element is said to be value-initialized. §9.5/164 explains the details of value-initialization; the most important aspect is that objects of built-in type are initialized to 0.
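A short sketch makes the effect visible. The two functions below are hypothetical helpers for this illustration; each starts with an empty map<string, int> and indexes it with a key that is not yet present.

```cpp
#include <map>
#include <string>

// index an empty map with a missing key and report the value found there
int value_of_missing_key(const std::string& word)
{
    std::map<std::string, int> counts;
    return counts[word];    // creates the element, value-initialized to 0
}

// report how many elements the map holds after that access
std::map<std::string, int>::size_type size_after_lookup(const std::string& word)
{
    std::map<std::string, int> counts;
    counts[word];           // the access alone inserts the element
    return counts.size();
}
```

The first function returns 0 for any key, and the second returns 1: merely looking at the element was enough to create it.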

rand() is a function that yields a random integer in the range [0, RAND_MAX]. Both rand and RAND_MAX are defined in <cstdlib>.

pair<K, V> is a simple type whose objects hold pairs of values. Access to these data values is through their names, first and second respectively.
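As a minimal sketch of access through first and second, the hypothetical function flip below builds a new pair whose members are those of its argument in the opposite order.

```cpp
#include <string>
#include <utility>

// return a pair holding the argument's members in the opposite order
std::pair<int, std::string> flip(const std::pair<std::string, int>& p)
{
    return std::pair<int, std::string>(p.second, p.first);
}
```

For example, flipping a pair that holds "the" and 3 yields a pair whose first member is 3 and whose second member is "the".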

map<K, V> is an associative array with key type K and value type V. The elements of a map are key-value pairs, which are maintained in key order to allow efficient access of elements by key. The iterators on maps are bidirectional (§8.2.5/148). Dereferencing a map iterator yields a value of type pair<const K, V>. The map operations include:

map<K, V> m;

Creates a new empty map, with keys of type const K and values of type V.

map<K, V> m(cmp);

Creates a new empty map with keys of type const K and values of type V, that uses the predicate cmp to determine the order of the elements.

m[k]

Indexes the map using a key, k, of type K, and returns an lvalue of type V. If there is no entry for the given key, a new value-initialized element is created and inserted into the map with this key. Because using [] to access a map might create a new element, [] is not allowed on a const map.
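The following sketch exercises these operations together. The names descending, Compare, DescendingMap, make_counts, and count_or_zero are all invented for this illustration. Note that when a map is given a predicate, the type of that predicate must also be named as the map's third template argument, which the abbreviated form map<K, V> m(cmp) leaves out.

```cpp
#include <map>
#include <string>

// hypothetical predicate: orders keys from largest to smallest
bool descending(const std::string& a, const std::string& b)
{
    return a > b;
}

typedef bool (*Compare)(const std::string&, const std::string&);
typedef std::map<std::string, int, Compare> DescendingMap;

// build a small map that uses descending to order its elements
DescendingMap make_counts()
{
    DescendingMap m(descending);
    ++m["apple"];   // creates the element with value 0, then increments it
    ++m["pear"];
    ++m["apple"];
    return m;
}

// [] is not allowed on a const map, so look the key up with find instead
int count_or_zero(const DescendingMap& m, const std::string& key)
{
    DescendingMap::const_iterator it = m.find(key);
    return it == m.end() ? 0 : it->second;
}
```

After make_counts, the map holds two elements, and its first element is the one with key "pear", because the predicate reverses the usual order. Looking up a missing key through count_or_zero returns 0 without adding anything to the map.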
