Extensions

From the document Think Data Structures (pages 166-172)

If you get a basic version of this exercise working, you might want to work on these optional exercises:

• Read about TF-IDF at http://thinkdast.com/tfidf and implement it. You might have to modify JavaIndex to compute document frequencies; that is, the total number of times each term appears on all pages in the index.

• For queries with more than one search term, the total relevance for each page is currently the sum of the relevance for each term. Think about when this simple version might not work well, and try out some alternatives.

• Build a user interface that allows users to enter queries with boolean operators. Parse the queries, generate the results, then sort them by relevance and display the highest-scoring URLs. Consider generating “snippets” that show where the search terms appeared on the page. If you want to make a Web application for your user interface, consider using Heroku as a simple option for developing and deploying Web applications using Java. See http://thinkdast.com/heroku.
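For the TF-IDF exercise in the first bullet, the scoring step might look something like the sketch below. It assumes the common variant tf × ln(N / df), where df is the number of pages that contain the term (a per-page count, which is not quite the total-occurrence wording used above). The class and method names (TfIdfSketch, termCounts, tfIdf) are hypothetical and are not part of the book's JavaIndex code.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration; not part of the book's code.
class TfIdfSketch {

    // Term frequency: how many times each word appears on one page.
    static Map<String, Integer> termCounts(List<String> words) {
        Map<String, Integer> counts = new HashMap<>();
        for (String word : words) {
            counts.merge(word, 1, Integer::sum);
        }
        return counts;
    }

    // Assumed variant: tf * ln(totalPages / pagesWithTerm).
    // Returns 0 when the term is absent, which avoids dividing by zero.
    static double tfIdf(int termCount, int totalPages, int pagesWithTerm) {
        if (termCount == 0 || pagesWithTerm == 0) {
            return 0.0;
        }
        return termCount * Math.log((double) totalPages / pagesWithTerm);
    }

    public static void main(String[] args) {
        List<String> page = Arrays.asList("java", "list", "java", "the");
        Map<String, Integer> counts = termCounts(page);

        // Suppose the index holds 4 pages: "java" is on 1, "the" on all 4.
        System.out.println(tfIdf(counts.get("java"), 4, 1));
        System.out.println(tfIdf(counts.get("the"), 4, 4)); // prints 0.0
    }
}
```

A term that appears on every page scores zero, which is the point of the inverse-document-frequency factor: very common words stop dominating the relevance sum.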

Chapter 17 Sorting

Computer science departments have an unhealthy obsession with sort algorithms. Based on the amount of time CS students spend on the topic, you would think that choosing sort algorithms is the cornerstone of modern software engineering. Of course, the reality is that software developers can go years, or entire careers, without thinking about how sorting works. For almost all applications, they use whatever general-purpose algorithm is provided by the language or libraries they use. And usually that’s just fine.

So if you skip this chapter and learn nothing about sort algorithms, you can still be an excellent developer. But there are a few reasons you might want to do it anyway:

1. Although there are general-purpose algorithms that work well for the vast majority of applications, there are two special-purpose algorithms you might need to know about: radix sort and bounded heap sort.

2. One sort algorithm, merge sort, makes an excellent teaching example because it demonstrates an important and useful strategy for algorithm design, called “divide-conquer-glue”. Also, when we analyze its performance, you will learn about an order of growth we have not seen before, linearithmic. Finally, some of the most widely-used algorithms are hybrids that include elements of merge sort.

3. One other reason to learn about sort algorithms is that technical interviewers love to ask about them. If you want to get hired, it helps if you can demonstrate CS cultural literacy.

So, in this chapter we’ll analyze insertion sort, you will implement merge sort, I’ll tell you about radix sort, and you will write a simple version of a bounded heap sort.

17.1 Insertion sort

We’ll start with insertion sort, mostly because it is simple to describe and implement. It is not very efficient, but it has some redeeming qualities, as we’ll see.

Rather than explain the algorithm here, I suggest you read the insertion sort Wikipedia page at http://thinkdast.com/insertsort, which includes pseudocode and animated examples. Come back when you get the general idea.

Here’s an implementation of insertion sort in Java:

    import java.util.Comparator;
    import java.util.List;

    public class ListSorter<T> {

        public void insertionSort(List<T> list, Comparator<T> comparator) {
            for (int i = 1; i < list.size(); i++) {
                T elt_i = list.get(i);
                int j = i;
                // shift larger elements right until elt_i's slot is found
                while (j > 0) {
                    T elt_j = list.get(j-1);
                    if (comparator.compare(elt_i, elt_j) >= 0) {
                        break;
                    }
                    list.set(j, elt_j);
                    j--;
                }
                list.set(j, elt_i);
            }
        }
    }

I define a class, ListSorter, as a container for sort algorithms. By using the type parameter, T, we can write methods that work on lists containing any object type.

insertionSort takes two parameters, a List of any kind and a Comparator that knows how to compare type T objects. It sorts the list “in place”, which means it modifies the existing list and does not have to allocate any new space.

The following example shows how to call this method with a List of Integer objects:

    List<Integer> list = new ArrayList<Integer>(
        Arrays.asList(3, 5, 1, 4, 2));

    Comparator<Integer> comparator = new Comparator<Integer>() {
        @Override
        public int compare(Integer elt1, Integer elt2) {
            return elt1.compareTo(elt2);
        }
    };

    ListSorter<Integer> sorter = new ListSorter<Integer>();
    sorter.insertionSort(list, comparator);
    System.out.println(list);
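Because the ordering lives entirely in the Comparator, the same sort can produce a different order just by passing a different comparator. The sketch below is one way to try that with Java 8 features (Comparator.naturalOrder() and a lambda) instead of an anonymous class. ComparatorDemo is a hypothetical class name, and it repeats the insertion sort as a static method, with the break folded into the loop condition, only so that the example runs by itself.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical demo class; the sort follows the same algorithm as
// ListSorter.insertionSort, restated here so the file is self-contained.
class ComparatorDemo {

    static <T> void insertionSort(List<T> list, Comparator<T> comparator) {
        for (int i = 1; i < list.size(); i++) {
            T elt_i = list.get(i);
            int j = i;
            // shift elements that compare greater than elt_i one slot right
            while (j > 0 && comparator.compare(elt_i, list.get(j - 1)) < 0) {
                list.set(j, list.get(j - 1));
                j--;
            }
            list.set(j, elt_i);
        }
    }

    public static void main(String[] args) {
        List<Integer> list = new ArrayList<>(Arrays.asList(3, 5, 1, 4, 2));

        Comparator<Integer> ascending = Comparator.naturalOrder();
        insertionSort(list, ascending);
        System.out.println(list); // prints [1, 2, 3, 4, 5]

        // A lambda that swaps its arguments reverses the order.
        Comparator<Integer> descending = (a, b) -> b.compareTo(a);
        insertionSort(list, descending);
        System.out.println(list); // prints [5, 4, 3, 2, 1]
    }
}
```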

insertionSort has two nested loops, so you might guess that its run time is quadratic. In this case, that turns out to be correct, but before you jump to that conclusion, you have to check that the number of times each loop runs is proportional to n, the size of the array.

The outer loop iterates from 1 to list.size(), so it is linear in the size of the list, n. The inner loop iterates from i to 0, so it is also linear in n. Therefore, the total number of times the inner loop runs is quadratic.

If you are not sure about that, here’s the argument:

• The first time through, i = 1 and the inner loop runs at most once.

• The second time, i = 2 and the inner loop runs at most twice.

• The last time, i = n − 1 and the inner loop runs at most n − 1 times.

So the total number of times the inner loop runs is the sum of the series 1, 2, ..., n − 1, which is n(n − 1)/2. And the leading term of that expression (the one with the highest exponent) is n^2.
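Written out, the sum is a standard arithmetic series; expanding the closed form makes the leading term visible:

```latex
\sum_{i=1}^{n-1} i \;=\; \frac{n(n-1)}{2} \;=\; \frac{1}{2}n^2 - \frac{1}{2}n
```

For example, with n = 5 the inner loop runs at most 1 + 2 + 3 + 4 = 10 times, and 5 · 4 / 2 = 10.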

In the worst case, insertion sort is quadratic. However:

1. If the elements are already sorted, or nearly so, insertion sort is linear. Specifically, if each element is no more than k locations away from where it should be, the inner loop never runs more than k times, and the total run time is O(kn).

2. Because the implementation is simple, the overhead is low; that is, although the run time is an^2, the coefficient of the leading term, a, is probably small.
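One way to check the nearly-sorted claim is to count how often the inner loop body runs on different inputs. The sketch below is an instrumented copy of the insertion sort for Integer lists. InsertionSortCounter, sortAndCount, and nearlySorted are hypothetical names, and reversing blocks of k + 1 elements is just one convenient way to build a list in which no element is more than k positions from its sorted location.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical instrumented version; counts inner-loop iterations.
class InsertionSortCounter {

    // Sorts in place and returns how many times the inner loop body ran
    // (the final comparison that triggers the break is counted too).
    static long sortAndCount(List<Integer> list) {
        long steps = 0;
        for (int i = 1; i < list.size(); i++) {
            Integer elt_i = list.get(i);
            int j = i;
            while (j > 0) {
                steps++;
                Integer elt_j = list.get(j - 1);
                if (elt_i.compareTo(elt_j) >= 0) {
                    break;
                }
                list.set(j, elt_j);
                j--;
            }
            list.set(j, elt_i);
        }
        return steps;
    }

    // Builds 0..n-1, then reverses each block of k+1 elements, so every
    // element ends up at most k positions from its sorted location.
    static List<Integer> nearlySorted(int n, int k) {
        List<Integer> list = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            list.add(i);
        }
        for (int start = 0; start < n; start += k + 1) {
            int end = Math.min(start + k + 1, n);
            Collections.reverse(list.subList(start, end));
        }
        return list;
    }

    public static void main(String[] args) {
        int n = 1000;
        // already sorted: one comparison per element, prints 999
        System.out.println(sortAndCount(nearlySorted(n, 0)));
        // k = 3: at most (k + 1)(n - 1) iterations
        System.out.println(sortAndCount(nearlySorted(n, 3)));
        // fully reversed: n(n - 1)/2 iterations, prints 499500
        System.out.println(sortAndCount(nearlySorted(n, n - 1)));
    }
}
```

The count for k = 3 stays within a small multiple of n, while the reversed list hits the quadratic worst case.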

So if we know that the array is nearly sorted, or is not very big, insertion sort might be a good choice. But for large arrays, we can do better. In fact, much better.
