Using library algorithms
6.1 Analyzing strings
6.2.2 Analyzing the grades
We now know how to read and classify student records into the did and didnt vectors. The next step is to analyze them, which means we need to think a little about how to structure the
analysis.
We know that we have three analyses to perform, and each analysis has two parts, which analyze separately the students who did and who didn't do all the homework. Because we will do each analysis on two sets of data, we certainly want to make each analysis its own
function. However, there are some operations, such as reporting in a common format, that we are going to want to do on pairs of analyses, rather than on individual analyses. Evidently, we'll want to make writing the results of each pair of analyses into a function as well.
The tricky part is that we want to call the function that writes the analysis results three times, once for each kind of analysis. We want that function to call the appropriate analysis function twice, once each for the did and didnt objects. However, we want the function that generates the reports to call a different analysis function each time we call it! How do we arrange that?
The easiest solution is to define three analysis functions and pass each one as an argument to the reporting function. Remember that we've used such arguments already, such as when we passed the compare function to the library sort routine in §4.2.2/64. In this case, we want our output routine to take five arguments:
The stream on which to write the output
A string that represents the name of the analysis The function to use for the analysis
Two arguments, each of which is one of the vectors that we want to analyze For example, let's assume that the first analysis, which looks at the medians, is done by a function called median_analysis. Then, we'd like to report the results for each group of students by executing
write_analysis(cout, "median", median_analysis, did, didnt);
Before we define write_analysis, let's define median_analysis. We would like to give that function a vector of student records, and we would like it to compute the students' grades according to the normal grading scheme and to return the median of those grades. We can define that function as follows:
// this function doesn't quite work
double median_analysis(const vector<Student_info>& students) {
vector<double> grades;
transform(students.begin(), students.end(), back_inserter(grades), grade);
return median(grades);
}
Although this function might appear difficult at first glance, it introduces only one new idea, namely the transform function. This function takes three iterators and a function. The first two iterators specify a range of elements to transform; the third iterator is the destination into which to put the result of running the function.
When we call transform, we are responsible for ensuring that the destination has room for the values from the input sequence. In this case, there is no problem, because we obtain the destination by calling back_inserter (§6.1/102), thereby arranging that transform's results will be appended to grades, which will automatically grow as necessary to accommodate the results.
The fourth argument to transform is a function that transform applies to each element of the input sequence to obtain the corresponding element of the output sequence. Therefore, when we call transform in this example, the effect is to apply the grade function to each element of students, and to append each grade to the vector named grades. When we have all these students' grades, we call median, which we defined in §4.1.1/53, to compute their median.
There's only one problem: As the comment notes, this function doesn't quite work.
One reason that it doesn't work is that there are several overloaded versions of the grade function. The compiler doesn't know which version to call, because we haven't given grade any arguments. We know that we want to call the version from §4.2.2/63, but we need a way to tell the compiler to do so.
The other reason is that the grade function will throw an exception if any student did no homework at all, and the transform function does nothing about exceptions. If an exception occurs, the transform function will be stopped at the point of the exception, and control will return to median_analysis. Because median_analysis doesn't handle the exception either, the exception will continue to propagate outward. The effect will be that this function will also exit prematurely, passing control to its caller, and so on, until control reaches an appropriate catch. If there is no such catch, as would be likely in this case, the program itself is
terminated, and the message that was thrown is printed (or not, depending on the implementation).
We can solve both problems by writing an auxiliary function that will try the grade function and handle the exception. Because we are calling the grade function explicitly, rather than passing it as an argument, the compiler will be able to figure out which version we mean:
double grade_aux(const Student_info& s) {
try {
return grade(s);
} catch (domain_error) {
return grade(s.midterm, s.final, 0) ;
} }
This function will call the version of grade from §4.2.2/63. If an exception occurs, we will catch it and call the version of grade, from §4.1/52, that takes three doubles that represent the exam scores and overall homework grade. Thus, we'll assume that students who did no homework at all got a 0 grade on their homework, but their exams still count. Now, we can rewrite the analysis function to use grade_aux:
// this version works fine
double median_analysis(const vector<Student_info>& students) {
vector<double> grades;
transform(students.begin(), students.end(), back_inserter(grades), grade_aux);
return median(grades);
}
Having seen what an analysis routine looks like, we are now in a position to define write_analysis, which uses an analysis routine to compare two sets of students:
void write_analysis(ostream& out, const string& name, double analysis(const vector<Student_info>&), const vector<Student_info>& did,
const vector<Student_info>& didnt) {
out << name << ": median(did) = " << analysis(did) <<
", median(didnt) = " << analysis(didnt) << endl;
}
Again, this function is surprisingly small, although it does introduce two new ideas. The first is how to define a parameter that represents a function. The parameter definition for analysis looks just like the function declaration that we wrote in §4.3/67. (Actually, as we shall learn in
§10.1.2/172, there is slightly more going on here than meets the eye. The additional detail doesn't affect the current discussion directly.)
The other new idea is the return type, void. The built-in type void can be used only in a few restricted ways, one of which is to name a return type. When we say a function "returns" a void, we're really saying that it has no return value. We can exit from such a function by executing a return statement with no value, such as
return;
or, as we do here, by falling off the end of the function. Ordinarily, we cannot just fall off the end of a function, but the language allows functions that return void to do so. At this point, we can write the rest of our program:
int main() {
// students who did and didn't do all their homework vector<Student_info> did, didnt;
// read the student records and partition them Student_info student;
while (read(cin, student)) { if (did_all_hw(student)) did.push_back(student);
else
didnt.push_back(student);
}
// verify that the analyses will show us something if (did.empty()) {
cout << "No student did all the homework!" << endl;
return 1;
}
if (didnt.empty()) {
cout << "Every student did all the homework!" << endl;
return 1;
}
// do the analyses
write_analysis(cout, "median", median_analysis, did, didnt);
write_analysis(cout, "average", average_analysis, did, didnt);
write_analysis(cout, "median of homework turned in", optimistic_median_analysis, did, didnt);
return 0;
}
All that remains is to write average_analysis and optimistic_median_analysis. 6.2.3 Grading based on average homework grade
We would like the average_analysis function to compute the students' grades by using the average homework grade, rather than the median. Therefore, the logical first step is to write a
function to compute the average of a vector, with the aim of using it instead of median for grade computation:
double average(const vector<double>& v) {
return accumulate(v.begin(), v.end(), 0.0) / v.size();
}
This function uses accumulate, which, unlike the other library algorithms we've used, is declared in <numeric>. As this header's name implies, it offers tools for numeric computation.
The accumulate function adds the values in the range denoted by its first two arguments, starting the summation with the value given by its third argument.
The type of the sum is the type of the third argument, so it is crucially important for us to use 0.0, as we did here, instead of 0. Otherwise, the result would be an int, and any fractional part would be lost.
Having used accumulate to generate the sum of all the elements in the range, we divide that sum by v.size(), which is the number of elements in the range. The result of that division, of course, is the average, which we return to our caller.
Once we have the average function, we can use it to implement the average_grade function to reflect this alternative grading policy:
double average_grade(const Student_info& s) {
return grade(s.midterm, s.final, average(s.homework));
}
This function uses the average function to compute an overall homework grade, which it then gives to the grade function from §4.1/52 to use in computing the final grade.
With this infrastructure in place, the average_analysis function is simplicity itself:
double average_analysis(const vector<Student_info>& students) {
vector<double> grades;
transform(students.begin(), students.end(), back_inserter(grades), average_grade);
return median(grades);
}
The only difference between this function and median_analysis (§6.2.2/113) is its name and
its use of average_grade instead of grade_aux. 6.2.4 Median of the completed homework
The last analysis scheme, optimistic_median_analysis, gets its name from the optimistic assumption that the students' grades on the homework that they didn't turn in would have been the same as the homework that they did turn in. With that assumption, we would like to compute the median of just the homework that each student submitted. We'll call this
computation an optimistic median, and we'll begin by writing a function to compute it. Of course, we have to contend with the possibility that a student did no homework at all, in which case we'll use 0 as the overall homework grade:
// median of the nonzero elements of s.homework, or 0 if no such elements exist double optimistic_median(const Student_info& s)
{
vector<double> nonzero;
remove_copy(s.homework.begin(), s.homework.end(), back_inserter(nonzero), 0);
if (nonzero.empty())
return grade(s.midterm, s.final, 0);
else
return grade(s.midterm, s.final, median(nonzero));
}
This function works by extracting the nonzero elements from the homework vector and
putting them into a new vector, called nonzero. Once we have the nonzero homework grades, we call the version of grade defined in §4.1/52 to compute the final score based on the
median of the homework assignments that were actually submitted.
The only new idea in this function is how we get values into nonzero, which we do by calling the remove_copy algorithm. To understand the call to remove_copy, you may find it useful to know that the library provides "copying" versions of many of the algorithms. So, for example, remove_copy does what remove does, but copies its results to an indicated destination.
The remove function finds all values that match a given value and "removes" those values from the container. All the values in the input sequence that are not "removed" will be copied into the destination. We'll have more to say shortly about what "remove" means in this context.
The remove_copy function takes three iterators and a value. As with most algorithms, the first two iterators denote the input sequence. The third denotes the beginning of the destination for the copy. As with copy, the remove_copy algorithm assumes that there is enough space in
the destination to hold all the elements that are copied. We call back_inserter to grow nonzero as needed.
We should now be able to see that the effect of the remove_copy call is to copy into nonzero all the nonzero elements in s.homework. We then check whether v is empty, and if not, we do the normal grade calculation based on the median of the nonzero grades. If v is empty, then we use 0 as the homework grade.
Of course, to complete our analysis, we need to write an analysis function to call our optimistic_median function. We leave doing so as an exercise.
6.3 Classifying students, revisited
In Chapter 5, we looked at the problem of copying records with failing grades into a separate vector and then removing those records from the existing vector. The obvious, easy approach to this problem proved to have abysmal performance as the input size grew. We went on to show how to solve the performance problem by using a list instead of a vector, but we also promised to revisit the problem and show an algorithmic solution that would perform similarly to the revised data structure.
We can use the algorithm library to demonstrate two other solutions. The first is slightly slower because it uses a pair of library algorithms and visits every element twice. We can do better by using a more specialized library algorithm that will let us solve the problem in a single pass.