Chapter 19. Data Structures and Algorithm Complexity

Algorithm complexity is a rough approximation of the number of steps that an algorithm performs, depending on the size of the input data. Quadratic complexity, O(N²), means that it takes on the order of N² steps, where N is the size of the input data, to perform a given operation. Cubic complexity, O(N³), means that it takes on the order of N³ steps, where N is the size of the input data, to perform an operation on N elements.
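To make the step counts concrete, here is a minimal sketch that simply counts the iterations of two and three nested loops. The class and method names (StepCounter, QuadraticSteps, CubicSteps) are hypothetical and serve only as an illustration.

```csharp
using System;

public static class StepCounter
{
    // Two nested loops over N elements perform N * N steps.
    public static long QuadraticSteps(int n)
    {
        long steps = 0;
        for (int i = 0; i < n; i++)
            for (int j = 0; j < n; j++)
                steps++;
        return steps;
    }

    // Three nested loops over N elements perform N * N * N steps.
    public static long CubicSteps(int n)
    {
        long steps = 0;
        for (int i = 0; i < n; i++)
            for (int j = 0; j < n; j++)
                for (int k = 0; k < n; k++)
                    steps++;
        return steps;
    }

    public static void Main()
    {
        foreach (int n in new[] { 10, 100 })
        {
            Console.WriteLine("N = {0}: {1} quadratic steps, {2} cubic steps",
                n, QuadraticSteps(n), CubicSteps(n));
        }
    }
}
```

Growing N from 10 to 100 multiplies the quadratic count by 100 and the cubic count by 1000, which is why the order of growth matters more than the exact number of steps.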

Comparison between Basic Data Structures

When to Use a Particular Data Structure?

Array (T[])

Searching a linked list is a slow operation because we have to traverse all its elements. Use LinkedList<T> when you need to add and remove elements at both ends of the data structure.

Dynamic Array (List<T>)

Linked lists are rarely used in practice because a dynamic array (List<T>) can perform almost exactly the same operations as LinkedList<T>, and for most of them it works faster and is more convenient. When you think you need a linked list, prefer List<T> to LinkedList<T>: it is no slower, is usually faster, and is more flexible. Use LinkedList<T> only when you need to add and remove elements at both ends of a list.

A dynamic array (List<T>) is suitable when we need to add elements frequently, keep them in their order of addition, and access them by index. If we need to find or delete elements frequently, List<T> is not the right data structure. Use List<T> when you need to quickly add elements and access them by index.
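The two recommendations can be sketched side by side: List<T> for appending and index access, LinkedList<T> for work at both ends. The class name and the sample values below are assumptions made only for this illustration.

```csharp
using System;
using System.Collections.Generic;

public static class ListVersusLinkedList
{
    public static string Demo()
    {
        // List<T>: fast appending and direct access by index.
        var list = new List<string>();
        list.Add("first");
        list.Add("second");
        list.Add("third");
        string byIndex = list[1];

        // LinkedList<T>: fast insertion and removal at both ends.
        var linked = new LinkedList<string>();
        linked.AddLast("middle");
        linked.AddFirst("front");
        linked.AddLast("back");
        linked.RemoveFirst(); // removes "front"
        linked.RemoveLast();  // removes "back"

        return byIndex + "," + linked.First.Value;
    }

    public static void Main()
    {
        Console.WriteLine(Demo());
    }
}
```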

Stack

Queue

Access by index is not available because the elements of a hash table have no order. For example, if we need to count how many times each word occurs in a text file, we can use Dictionary<string, int>: the key is a specific word and the value is the number of times we have seen it. Unlike hash tables, where the basic operations can degrade to linear complexity if we choose a bad hash function, in SortedDictionary<K, V> the number of steps for the basic operations is the same in the average and in the worst case: log2(N).
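The word-counting task can be sketched as follows. This is only an illustration under stated assumptions: the text is passed in as a string instead of being read from a file, words are split on a small fixed set of separators, and counting is made case-insensitive by lowercasing; the class name WordCounter is hypothetical.

```csharp
using System;
using System.Collections.Generic;

public static class WordCounter
{
    private static readonly char[] Separators =
        { ' ', ',', '.', '!', '?', '\t', '\r', '\n' };

    public static Dictionary<string, int> CountWords(string text)
    {
        var counts = new Dictionary<string, int>();
        string[] words = text.Split(Separators, StringSplitOptions.RemoveEmptyEntries);
        foreach (string word in words)
        {
            string key = word.ToLowerInvariant(); // assumption: ignore letter case
            int count;
            counts.TryGetValue(key, out count);   // count is 0 when the key is missing
            counts[key] = count + 1;
        }
        return counts;
    }

    public static void Main()
    {
        foreach (var pair in CountWords("The cat and the dog."))
        {
            Console.WriteLine("{0} -> {1}", pair.Key, pair.Value);
        }
    }
}
```

The order in which the pairs are printed is not guaranteed, which is exactly the "no order" property described above.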

As an example of using a SortedDictionary<K, V>, we can give the following task: find all words in a text file that appear exactly 10 times, and print them alphabetically. This is a task that we can also solve successfully with Dictionary<K, V>, but then we need to do an additional sort. A set implemented with a hash table (class HashSet<T>) is a special case of a hash table in which we only have keys.

Another similarity with hash tables is that, if we choose a bad hash function, the basic operations can degrade to linear complexity. As an example of using a HashSet<T>, we can point to the task of finding all the different words in a text file.
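A minimal sketch of the distinct-words task with HashSet<string> follows. It assumes the text is already in memory and that words are separated by spaces or line breaks; the class name DistinctWordFinder is hypothetical.

```csharp
using System;
using System.Collections.Generic;

public static class DistinctWordFinder
{
    public static HashSet<string> DistinctWords(string text)
    {
        var distinct = new HashSet<string>();
        string[] words = text.Split(
            new[] { ' ', '\t', '\r', '\n' }, StringSplitOptions.RemoveEmptyEntries);
        foreach (string word in words)
        {
            // Add returns false and changes nothing if the word is already in the set.
            distinct.Add(word);
        }
        return distinct;
    }

    public static void Main()
    {
        HashSet<string> words = DistinctWords("a b a c b");
        Console.WriteLine("Distinct words: {0}", words.Count);
    }
}
```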

Choosing a Data Structure – Examples

As an example of using a SortedSet<T>, we can point to the task of finding all the different words in a given text file and printing them alphabetically. Use SortedSet<T> when you need to quickly add an element to a set and check whether a particular element belongs to the set, and when all elements must be kept sorted in ascending order.
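The same task with alphabetical output can be sketched with SortedSet<string>. The assumptions are the same as before (text in memory, whitespace-separated words), plus ordinal string comparison so that the order does not depend on the current culture; the class name is hypothetical.

```csharp
using System;
using System.Collections.Generic;

public static class SortedWordFinder
{
    public static string SortedDistinctWords(string text)
    {
        string[] words = text.Split(
            new[] { ' ', '\t', '\r', '\n' }, StringSplitOptions.RemoveEmptyEntries);

        // The set drops duplicates and keeps its elements in ascending order.
        var sorted = new SortedSet<string>(words, StringComparer.Ordinal);
        return string.Join(" ", sorted);
    }

    public static void Main()
    {
        Console.WriteLine(SortedDistinctWords("pear apple pear fig apple"));
    }
}
```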

Generating Subsets

We choose an array because it is the simplest data structure of all and is easy to work with. The next step is to choose a structure in which to store one of the subsets we generate, for example {ocean, happiness}. The operations we need are checking whether an element exists and adding an element.

We're running out of options, so let's see what the set data structure offers. We don't need to sort the words in alphabetical order, so we choose the faster implementation: HashSet<string>. If we examine the subset generation algorithm, we will notice that each subset is processed in a "first generated, first processed" style.

This suggests keeping the subsets in a queue: Queue<HashSet<string>> subsetsQueue = new Queue<HashSet<string>>();. If we run code built this way, we will see that it successfully generates all subsets of S, but some of them are generated twice.
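The duplicate problem can be reproduced with a short sketch. This is not the original listing, only one possible reconstruction of the idea: start from the empty set, and for every dequeued subset enqueue a copy extended with each word it does not yet contain. The names NaiveSubsetGenerator, Generate and CountDistinct are hypothetical, and the words are assumed to be distinct and free of commas.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class NaiveSubsetGenerator
{
    // Returns every subset in the order it was processed, duplicates included.
    public static List<HashSet<string>> Generate(string[] words)
    {
        var generated = new List<HashSet<string>>();
        var subsetsQueue = new Queue<HashSet<string>>();
        subsetsQueue.Enqueue(new HashSet<string>()); // start with the empty set

        while (subsetsQueue.Count > 0)
        {
            HashSet<string> subset = subsetsQueue.Dequeue();
            generated.Add(subset);
            foreach (string word in words)
            {
                if (!subset.Contains(word))
                {
                    var extended = new HashSet<string>(subset);
                    extended.Add(word);
                    subsetsQueue.Enqueue(extended);
                }
            }
        }
        return generated;
    }

    // Counts subsets that differ in content, ignoring the order of insertion.
    public static int CountDistinct(List<HashSet<string>> subsets)
    {
        var keys = new HashSet<string>();
        foreach (HashSet<string> subset in subsets)
        {
            keys.Add(string.Join(",", subset.OrderBy(w => w, StringComparer.Ordinal)));
        }
        return keys.Count;
    }

    public static void Main()
    {
        var all = Generate(new[] { "ocean", "beer", "money" });
        Console.WriteLine("Generated: {0}, distinct: {1}", all.Count, CountDistinct(all));
    }
}
```

The subset {ocean, beer} is reached once by adding "beer" to {ocean} and once by adding "ocean" to {beer}, so it is processed twice. One common remedy, stated here as an assumption and not taken from the text, is to extend a subset only with words that come later in the array than every word already in it.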

Sorting Students

We can solve the problem using a hash table keyed by course name, which holds a list of students for each course. Another option is to use a SortedSet<T> for the students taking each course (because it is sorted internally), but since there can be students with the same name, we would have to use SortedSet<List<Student>>. We choose the easier way: using List<Student> and sorting it before printing it.

Now we are able to write the code that reads the students and their courses and stores them in a hash table that holds a list of students by course name (Dictionary<string, List<Student>>). We read the file and build the hash table of courses, Dictionary<string, List<Student>> courses. When a new course is encountered, we create a list of students for it: students = new List<Student>();.

After reading each student's data, we check whether the student's course already exists in the hash table. If the course is found, the student is added to the list of students of this course.
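The grouping and sorting steps can be sketched as follows. The input format ("FirstName | LastName | Course" on each line), the Student class with its two properties, and the choice to sort by last name and then by first name are assumptions made for this illustration; the lines are passed in memory instead of being read from a file.

```csharp
using System;
using System.Collections.Generic;

public class Student
{
    public string FirstName { get; set; }
    public string LastName { get; set; }
}

public static class CourseRoster
{
    public static Dictionary<string, List<Student>> GroupByCourse(IEnumerable<string> lines)
    {
        var courses = new Dictionary<string, List<Student>>();
        foreach (string line in lines)
        {
            string[] parts = line.Split('|');
            if (parts.Length != 3) continue; // skip malformed lines

            var student = new Student
            {
                FirstName = parts[0].Trim(),
                LastName = parts[1].Trim()
            };
            string course = parts[2].Trim();

            List<Student> students;
            if (!courses.TryGetValue(course, out students))
            {
                // New course: create a list of students for it.
                students = new List<Student>();
                courses.Add(course, students);
            }
            students.Add(student);
        }
        return courses;
    }

    // Sorts the list in place by last name, then first name, and returns full names.
    public static List<string> SortedNames(List<Student> students)
    {
        students.Sort((a, b) =>
        {
            int byLastName = string.CompareOrdinal(a.LastName, b.LastName);
            return byLastName != 0
                ? byLastName
                : string.CompareOrdinal(a.FirstName, b.FirstName);
        });

        var names = new List<string>();
        foreach (Student student in students)
        {
            names.Add(student.FirstName + " " + student.LastName);
        }
        return names;
    }

    public static void Main()
    {
        var courses = GroupByCourse(new[]
        {
            "Ivan | Petrov | C#",
            "Ana | Ivanova | Java",
            "Boris | Ivanov | C#"
        });
        foreach (var pair in courses)
        {
            Console.WriteLine("{0}: {1}", pair.Key, string.Join(", ", SortedNames(pair.Value)));
        }
    }
}
```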

Sorting a Phone Book

Searching in a Phone Book

So far so good, but what should we keep as key and value in the hash table? Why don't we make a hash table whose key is the name of a person and whose value is another hash table which, given a city name, returns a list of phone numbers? It looks like this could solve our problem, and we would use only one hash table for all the queries.

Using the last idea, we can come up with the following algorithm: we read the phone book line by line, and for each word d1, d2, …, dk from the name of a person and for the city name t we make new records in the phone book hash table under the following keys: d1, d2, …, dk and, combined with the town, "d1 of t" and likewise for the remaining words. After that, the search is trivial: we just search the hash table by a given word d or, if a town t is given, by "d of t". Since we can have many phone numbers under the same key, as a value in the hash table we need to use a list of strings (List<string>).
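The indexing scheme above can be sketched as follows. The entry format ("Name | Town | Phone"), the class name PhoneBookIndex and its methods are assumptions made for this illustration, and keys are lowercased so that lookups ignore letter case.

```csharp
using System;
using System.Collections.Generic;

public static class PhoneBookIndex
{
    public static Dictionary<string, List<string>> Build(IEnumerable<string> lines)
    {
        var index = new Dictionary<string, List<string>>();
        foreach (string line in lines)
        {
            string[] parts = line.Split('|');
            if (parts.Length != 3) continue; // skip malformed lines

            string name = parts[0].Trim();
            string town = parts[1].Trim();
            string phone = parts[2].Trim();

            string[] words = name.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
            foreach (string word in words)
            {
                AddEntry(index, word, phone);                 // key "d"
                AddEntry(index, word + " of " + town, phone); // key "d of t"
            }
        }
        return index;
    }

    private static void AddEntry(Dictionary<string, List<string>> index, string key, string phone)
    {
        key = key.ToLowerInvariant();
        List<string> phones;
        if (!index.TryGetValue(key, out phones))
        {
            phones = new List<string>();
            index.Add(key, phones);
        }
        phones.Add(phone);
    }

    // Looks up by a single word, or by "word of town" when a town is given.
    public static List<string> Find(
        Dictionary<string, List<string>> index, string word, string town = null)
    {
        string key = town == null ? word : word + " of " + town;
        List<string> phones;
        return index.TryGetValue(key.ToLowerInvariant(), out phones)
            ? phones
            : new List<string>();
    }

    public static void Main()
    {
        var index = Build(new[]
        {
            "Kiril Ivanov | Sofia | 111",
            "Kiril Petrov | Varna | 222"
        });
        Console.WriteLine(string.Join(", ", Find(index, "Kiril")));
        Console.WriteLine(string.Join(", ", Find(index, "kiril", "Varna")));
    }
}
```

Every query, with or without a town, becomes a single hash table lookup, at the cost of storing each phone number under several keys.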

After that, the names are split and each word is added to the hash table. To make the search case-insensitive, all keys in the hash table are stored in lowercase.

Choosing a Data Structure – Conclusions

The search is done directly using the hash table, which is created after reading the phonebook file.

External Libraries with .NET Collections

One of the most popular and richest libraries with efficient implementations of the fundamental data structures for C# and .NET software developers is the open source project "Wintellect's Power Collections for .NET" – http://powercollections.codeplex.com. Its Set<T> class is similar in functionality and way of working to the standard class HashSet<T> in the .NET Framework. Its OrderedDictionary<K, V> class is similar in functionality and way of working to the standard class SortedDictionary<K, V> in the .NET Framework.

Deque<T> represents an efficient implementation of a double-ended queue, which in practice combines the stack and queue data structures. BigList<T> is a list of elements, accessible by index, which allows quick insertion and deletion of an element at a given position. The structure is a good alternative to List<T>, in which inserting or removing an element at a given position takes linear time because a linear number of elements must be shifted left or right.

We leave it to the reader to download the "Power Collections for .NET" library from its site and experiment with it. Another very powerful library of data structures and collection classes is “The C5 Generic Collection Library for C# and CLI” (www.itu.dk/research/c5/).

Exercises

Such a library can be very useful when solving some of the problems from the exercises. C5 provides standard interfaces and collection classes such as lists, sets, bags, multi-sets, balanced trees and hash tables, as well as non-traditional data structures such as "hashed linked list", "wrapped arrays" and "interval heaps". The C5 collections and the book about them are the ultimate resource for data structure developers.

Implement a class BiDictionary, which allows adding triplets {key1, key2, value} and quickly searching by either of the keys key1 and key2, as well as by the combination of both keys. What data structure can we use to quickly find all products that cost between $5 and $10? What data structure could we use to quickly add events and quickly check whether the venue is available within a certain interval [start date and time; end date and time]?

Implement the PriorityQueue data structure, which allows fast execution of the following operations: adding an element, extracting the smallest element. What data structures would you use to ensure fast searching on one or more criteria?

Solutions and Guidelines

When searching with two keys, you can search the two hash tables separately and intersect the resulting subsets. If we keep the products sorted by price in an array (for example in List<T>, which we first populate and then sort), we can use binary search twice to find all the products that cost between 5 and 10 dollars. First, we find the smallest index start at which lies a product that costs at least 5 dollars.

Then we find the largest index end at which lies a product that costs at most 10 dollars. Alternatively, the class SortedSet<T> has an operation GetViewBetween(lowerBound, upperBound) that returns a subset of the elements in a certain range (interval). The class OrderedSet<T> from Power Collections has a method to extract a subrange of values: OrderedSet.Range(from, fromInclusive, to, toInclusive).
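The two binary searches can be sketched as follows. To keep the example short it works on a sorted list of prices instead of product objects, which is an assumption of this illustration, as are the names PriceRange, FirstAtLeast, LastAtMost and Between.

```csharp
using System;
using System.Collections.Generic;

public static class PriceRange
{
    // Smallest index whose price is at least 'bound' (Count if there is none).
    public static int FirstAtLeast(List<decimal> sortedPrices, decimal bound)
    {
        int low = 0, high = sortedPrices.Count;
        while (low < high)
        {
            int mid = low + (high - low) / 2;
            if (sortedPrices[mid] < bound) low = mid + 1;
            else high = mid;
        }
        return low;
    }

    // Largest index whose price is at most 'bound' (-1 if there is none).
    public static int LastAtMost(List<decimal> sortedPrices, decimal bound)
    {
        int low = 0, high = sortedPrices.Count;
        while (low < high)
        {
            int mid = low + (high - low) / 2;
            if (sortedPrices[mid] <= bound) low = mid + 1;
            else high = mid;
        }
        return low - 1;
    }

    // All prices in the closed interval [min, max]; the list must be sorted ascending.
    public static List<decimal> Between(List<decimal> sortedPrices, decimal min, decimal max)
    {
        int start = FirstAtLeast(sortedPrices, min);
        int end = LastAtMost(sortedPrices, max);
        return start <= end
            ? sortedPrices.GetRange(start, end - start + 1)
            : new List<decimal>();
    }

    public static void Main()
    {
        var prices = new List<decimal> { 1m, 5m, 5m, 7m, 10m, 12m };
        Console.WriteLine(string.Join(", ", Between(prices, 5m, 10m)));
    }
}
```

Both searches take a logarithmic number of steps, so the cost of a query is dominated by copying out the matching elements.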

We can create two sorted arrays (List<T>): the first will hold the events sorted in ascending order by start date and time; the other will keep the same events sorted by end date and time. Using binary search, we can find the entire set E of all events that end before a given end moment.