Using sequential containers and analyzing strings
5.8.3 Horizontal concatenation
By horizontal concatenation, we mean taking two pictures, and making a new picture in which one of the input pictures forms the left part of the new picture, and the other forms the right part. Before we start, we need to think about what we want to do when the pictures to concatenate are different sizes. We'll arbitrarily decide that we'll align them along their top edges. Thus, each row of the output picture will be the result of concatenating the corresponding rows of the two input pictures. We'll have to pad the left-hand picture's rows to make them take up the right amount of space in the output picture.
In addition to padding the left-hand picture, we also have to worry about what to do when the pictures have a different number of rows. For example, if p holds our initial picture, we might want to concatenate the original value of p horizontally with the result of framing p. That is, we'd like hcat(p, frame(p)) to produce
this is an **************
example    * this is an *
to         * example    *
illustrate * to         *
framing    * illustrate *
           * framing    *
           **************
Note that the left-hand picture has fewer rows than the right-hand picture. This fact implies that we will have to pad the output on the left-hand side to account for these missing rows. If the left-hand picture is longer, we'll just copy the strings from it into the new picture; we won't bother to pad the (empty) right side with blanks. With this analysis complete, we can write our function:
vector<string>
hcat(const vector<string>& left, const vector<string>& right)
{
    vector<string> ret;

    // add 1 to leave a space between pictures
    string::size_type width1 = width(left) + 1;

    // indices to look at elements from left and right respectively
    vector<string>::size_type i = 0, j = 0;

    // continue until we've seen all rows from both pictures
    while (i != left.size() || j != right.size()) {
        // construct new string to hold characters from both pictures
        string s;

        // copy a row from the left-hand side, if there is one
        if (i != left.size())
            s = left[i++];

        // pad to full width
        s += string(width1 - s.size(), ' ');

        // copy a row from the right-hand side, if there is one
        if (j != right.size())
            s += right[j++];

        // add s to the picture we're creating
        ret.push_back(s);
    }
    return ret;
}
We start, as we did for frame and vcat, by defining the picture that we'll return. Our next step is to compute the width to which we must pad the left-hand picture. That width will be one more than the width of the picture itself, to leave a space between the pictures when we concatenate them. Next, we iterate through both pictures, copying an element from the first, padded as necessary, followed by an element from the second.
The only tricky part is taking care of what to do if we run out of elements in one picture before we run out of elements in the other. Our iteration continues until we have copied all the elements for each input vector. Hence, the while loop continues until both indices reach the end of their respective pictures.
If we have not yet exhausted left, we copy its current element into s. Regardless of whether we copied anything from left, we next call the string compound assignment operator, +=, to pad the output to the appropriate width. The compound assignment operator defined by the string library operates as you might expect: It adds the right-hand operand to its left-hand operand and stores the result in the left-hand side. Of course, "add" here means string concatenation.
We determine how much to pad by subtracting s.size() from width1. We know that either s.size() is the size of the string that we copied from left, or it is zero because there was no entry to copy. In the first case, s.size() will be no greater than the width of left, and therefore less than width1, because we added one to the length of the longest string to account for a space between the two pictures. Thus, in this case, we'll append one or more blanks to s. If s.size() is zero, then we'll pad the entire output line.
Having copied and padded the string for the left-hand picture, we need only append the string from the right-hand picture, assuming that there still is an element from right to copy.
Regardless of whether we added a value from right, we push s onto the output vector, and continue until we've processed both input vectors. When the loop finishes, we must remember to return to our caller the picture that we've created.
It is important to note that the correct behavior of our program depends on the fact that s is local to the while loop. Because s is declared inside the while, it is created, as an empty string, and destroyed on each trip through the loop.
5.9 Details
Containers and iterators: The standard library is designed so that similar operations on different containers have the same interface and the same semantics. The containers we have used so far are all sequential containers. We'll see in Chapter 7 that the library also provides associative containers. All the sequential containers and the string type provide the following operations:
container<T>::iterator
container<T>::const_iterator
The name of the type of the iterator on this container.
container<T>::size_type
The name of the appropriate type to hold the size of the largest possible instance of this container.
c.begin()
c.end()
Iterators referring to the first and (one past) the last element in the container.
c.rbegin()
c.rend()
Iterators referring to the last and (one beyond) the first element in the container that grant access to the container's elements in reverse order.
container<T> c;
container<T> c(c2);
Defines c as a container that is empty or a copy of c2 if given.
container<T> c(n);
Defines c as a container with n elements that are value-initialized (§7.2/125) according to the type of T. If T is a class type, that type will control how to initialize the elements. If T is a built-in arithmetic type, then the elements will be initialized to 0.
container<T> c(n, t);
Defines c as a container with n elements that are copies of t.
container<T> c(b, e);
Creates a container that holds a copy of the elements denoted by iterators in the range [b, e).
c = c2
Replaces the contents of container c with a copy of the container c2.
c.size()
Returns the number of elements in c as a size_type.
c.empty()
Predicate that indicates whether c has no elements.
c.insert(d, b, e)
Copies elements denoted by iterators in the range [b, e) and inserts them into c immediately before d.
c.erase(it)
c.erase(b, e)
Removes the element denoted by it or the range of elements denoted by [b, e) from the container c. This operation is fast for list but can be slow for vector and string, because for these types it involves copying all the elements after the one that is removed. For list, only iterators to the element(s) that are erased are invalidated. For vector and string, all iterators to the erased element(s) and to the elements after them are invalidated.
c.push_back(t)
Adds an element to the end of c with the value t.
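The operations above can be sketched as follows. Both helper names, build and erase_value, are hypothetical. erase_value is written as a template function, a facility this chapter has not introduced, purely so that the same loop can be run unchanged on a vector and on a list. It relies on the fact that erase returns an iterator to the element after the one removed, which lets the loop continue without touching an invalidated iterator.

```cpp
#include <cassert>
#include <list>
#include <vector>

using std::list;
using std::vector;

// Hypothetical helper: build the sequence 1 2 7 7 7.
vector<int> build()
{
    vector<int> c(3, 7);            // container<T> c(n, t): 7 7 7

    list<int> front;                // container<T> c: empty
    front.push_back(1);
    front.push_back(2);

    // c.insert(d, b, e): copy [b, e) into c immediately before d.
    // The source range may come from a different kind of container.
    c.insert(c.begin(), front.begin(), front.end());

    assert(c.size() == 5 && !c.empty());
    return c;
}

// Hypothetical helper: remove every element equal to value from c.
template <class Container, class T>
void erase_value(Container& c, const T& value)
{
    typename Container::iterator it = c.begin();
    while (it != c.end()) {
        if (*it == value)
            it = c.erase(it);       // continue from the iterator erase returns
        else
            ++it;
    }
}
```

Calling erase_value(c, 7) on the result of build() leaves 1 2, whether c is the vector itself or a list constructed from it with list<int> l(c.begin(), c.end()).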
Containers that support random access, and the string type, also provide the following:
c[n]
Fetches the element at position n from the container c (for a string, the character at that position).
Iterator operations:
*it
Dereferences the iterator it to obtain the value stored in the container at the position that it denotes. This operation is often combined with . to obtain a member of a class object, as in (*it).x, which yields the member x of the object denoted by the iterator it. * has lower precedence than . and the same precedence as ++ and --.
it->x
Equivalent to (*it).x, which returns the member x denoted by the object obtained by dereferencing the iterator it. Same precedence as the . operator.
++it
it++
Increments the iterator so that it denotes the next element in the container.
b == e
b != e
Compares two iterators for equality or inequality.
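As a small illustration of these iterator operations, the sketch below walks a vector of records. The Entry type and the three function names are hypothetical, introduced only so that there is a member to reach through the iterator.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical record type, used only to have members to access.
struct Entry {
    std::string name;
    int score;
};

// Total of all scores: it->score is equivalent to (*it).score.
int total_score(const std::vector<Entry>& v)
{
    int sum = 0;
    for (std::vector<Entry>::const_iterator it = v.begin(); it != v.end(); ++it)
        sum += it->score;
    return sum;
}

// Name of the first entry whose score equals target, or "" if there is none.
std::string find_name(const std::vector<Entry>& v, int target)
{
    std::vector<Entry>::const_iterator it = v.begin();
    // The parentheses are required: . binds more tightly than *
    while (it != v.end() && (*it).score != target)
        ++it;
    return it == v.end() ? std::string() : (*it).name;
}

// Score of the first entry; the container must not be empty,
// because dereferencing begin() of an empty container is undefined.
int first_score(const std::vector<Entry>& v)
{
    assert(!v.empty());
    return (*v.begin()).score;
}
```

Note that each loop compares it against v.end() with != rather than <, so the same code would work with a list, whose iterators do not support <.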
The string type offers iterators that support the same operations as do iterators on vectors. In particular, string supports full random access, about which we'll learn more in Chapter 8. In addition to the operations on containers, string also provides:
s.substr(i, j)
Creates a new string that holds a copy of the characters in s with indices in the range [i, i + j).
getline(is, s)
Reads a line of input from is and stores it in s.
s += s2
Replaces the value of s by s + s2.
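The three string operations can be tried out with a short sketch. The helper names split_lines and rotate_left are hypothetical, and split_lines reads from a std::istringstream (from the <sstream> header) only so that the example needs no input file. Remember that the second argument to substr is a length, not an ending index.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: split text into lines by reading it with getline.
// getline discards the newline that terminates each line.
std::vector<std::string> split_lines(const std::string& text)
{
    std::istringstream is(text);
    std::vector<std::string> lines;
    std::string s;
    while (std::getline(is, s))
        lines.push_back(s);
    return lines;
}

// Hypothetical helper: move the first n characters of s to the end.
std::string rotate_left(const std::string& s, std::string::size_type n)
{
    assert(n <= s.size());
    std::string ret = s.substr(n, s.size() - n);   // indices [n, s.size())
    ret += s.substr(0, n);                         // indices [0, n)
    return ret;
}
```

For instance, rotate_left("abcdef", 2) yields "cdefab", and split_lines("first line\nsecond\n") yields two strings, neither of which contains a newline.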
The vector type offers the most powerful iterators, called random-access iterators, of any of the library containers. We'll learn more about these in Chapter 8.
Although all the functions we've written have relied on dynamically allocating our vector elements, there are also mechanisms for preallocating elements, and an operation to direct the vector to allocate, but not to use, additional memory in order to avoid the overhead of repeated memory allocations.
v.reserve(n)
Reserves space to hold n elements, but does not initialize them. This operation does not change the size of the container. It affects only the frequency with which vector may have to allocate memory in response to repeated calls to insert or push_back.
v.resize(n)
Gives v a new size equal to n. If n is smaller than the current size of v, elements beyond n are removed from the vector. If n is greater than the current size, then new elements are added to v and initialized as appropriate to the type in v.
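The difference between the two operations is easiest to see side by side. In this sketch, squares and fit are hypothetical helpers: the first uses reserve before a run of push_back calls, and the second uses resize to change the number of elements.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper: the squares 0, 1, 4, ... of the first n integers.
std::vector<int> squares(std::vector<int>::size_type n)
{
    std::vector<int> v;
    v.reserve(n);               // allocate room for n elements up front
    assert(v.size() == 0);      // reserve does not change the size
    for (std::vector<int>::size_type i = 0; i != n; ++i)
        v.push_back(static_cast<int>(i * i));   // fits in the reserved space
    return v;
}

// Hypothetical helper: force v to hold exactly n elements.
void fit(std::vector<int>& v, std::vector<int>::size_type n)
{
    // Shrinking drops elements from the end; growing appends
    // value-initialized elements, which are 0 for int.
    v.resize(n);
    assert(v.size() == n);
}
```

Starting from squares(5), which holds 0 1 4 9 16, the call fit(v, 3) leaves 0 1 4, and a subsequent fit(v, 6) produces 0 1 4 0 0 0.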
The list type is optimized for efficiently inserting and deleting elements at any point in the container. The operations on lists and list iterators include those described in §5.9/96. In addition,
l.sort()
l.sort(cmp)
Sorts the elements in l using the < operator for the type in the list, or the predicate cmp.
The <cctype> header provides useful functions for manipulating character data:
isspace(c) true if c is a whitespace character.
isalpha(c) true if c is an alphabetic character.
isdigit(c) true if c is a digit character.
isalnum(c) true if c is a letter or a digit.
ispunct(c) true if c is a punctuation character.
isupper(c) true if c is an uppercase letter.
islower(c) true if c is a lowercase letter.
toupper(c) Yields the uppercase equivalent to c.
tolower(c) Yields the lowercase equivalent to c.
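The list sort member and the <cctype> functions combine naturally. The following sketch, in which lowercase and less_ignoring_case are hypothetical names, sorts a list of strings without regard to case. It converts each char to unsigned char before calling tolower, a precaution we add because passing a negative char value to the <cctype> functions is undefined.

```cpp
#include <cassert>
#include <cctype>
#include <list>
#include <string>

// Hypothetical helper: a copy of s with every character in lowercase.
std::string lowercase(const std::string& s)
{
    std::string ret;
    for (std::string::size_type i = 0; i != s.size(); ++i) {
        unsigned char c = static_cast<unsigned char>(s[i]);
        ret += static_cast<char>(std::tolower(c));
    }
    assert(ret.size() == s.size());   // one output character per input character
    return ret;
}

// Hypothetical predicate for l.sort(cmp): orders strings ignoring case.
bool less_ignoring_case(const std::string& a, const std::string& b)
{
    return lowercase(a) < lowercase(b);
}
```

Given a list<string> l, the call l.sort(less_ignoring_case) orders the elements as a dictionary would, whereas l.sort() uses the < operator on string, which compares characters by their numeric values and so distinguishes uppercase from lowercase.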
Exercises
5-0. Compile, execute, and test the programs in this chapter.
5-1. Design and implement a program to produce a permuted index. A permuted index is one in which each phrase is indexed by every word in the phrase. So, given the following input,
The quick brown fox
jumped over the fence
the output would be
      The quick     brown fox
jumped over the     fence
The quick brown     fox
                    jumped over the fence
         jumped     over the fence
            The     quick brown fox
    jumped over     the fence
                    The quick brown fox
A good algorithm is suggested in The AWK Programming Language by Aho, Kernighan, and Weinberger (Addison-Wesley, 1988). That solution divides the problem into three steps:
1. Read each line of the input and generate a set of rotations of that line. Each rotation puts the next word of the input in the first position and rotates the previous first word to the end of the phrase. So the output of this phase for the first line of our input would be
The quick brown fox
quick brown fox The
brown fox The quick
fox The quick brown
Of course, it will be important to know where the original phrase ends and where the rotated beginning begins.
2. Sort the rotations.
3. Unrotate and write the permuted index, which involves finding the separator, putting the phrase back together, and writing it properly formatted.
5-2. Write the complete new version of the student-grading program, which extracts records for failing students, using vectors. Write another that uses lists. Measure the performance difference on input files of ten lines, 1,000 lines, and 10,000 lines.
5-3. By using a typedef, we can write one version of the program that implements either a vector-based solution or a list-based one. Write and test this version of the program.
5-4. Look again at the driver functions you wrote in the previous exercise. Note that it is possible to write a driver that differs only in the declaration of the type for the data structure that holds the input file. If your vector and list test drivers differ in any other way, rewrite them so that they differ only in this declaration.
5-5. Write a function named center(const vector<string>&) that returns a picture in which all the lines of the original picture are padded out to their full width, and the padding is as evenly divided as possible between the left and right sides of the picture. What are the properties of pictures for which such a function is useful? How can you tell whether a given picture has those properties?
5-6. Rewrite the extract_fails function from §5.1.1/77 so that instead of erasing each failing student from the input vector v, it copies the records for the passing students to the beginning of v, and then uses the resize function to remove the extra elements from the end of v. How does the performance of this version compare with the one in §5.1.1/77?
5-7. Given the implementation of frame in §5.8.1/93, and the following code fragment
vector<string> v;
frame(v);
describe what happens in this call. In particular, trace through how both the width function and the frame function operate. Now, run this code. If the results differ from your expectations, first understand why your expectations and the program differ, and then change one to match the other.
5-8. In the hcat function from §5.8.3/95, what would happen if we defined s outside the scope of the while? Rewrite and execute the program to confirm your hypothesis.
5-9. Write a program to write the lowercase words in the input followed by the uppercase words.
5-10. Palindromes are words that are spelled the same right to left as left to right. Write a program to find all the palindromes in a dictionary. Next, find the longest palindrome.
5-11. In text processing it is sometimes useful to know whether a word has any ascenders or descenders. Ascenders are the parts of lowercase letters that extend above the text line; in the English alphabet, the letters b, d, f, h, k, l, and t have ascenders. Similarly, the descenders are the parts of lowercase letters that descend below the line; in English, the letters g, j, p, q, and y have descenders. Write a program to determine whether a word has any ascenders or descenders. Extend that program to find the longest word in the dictionary that has neither ascenders nor descenders.