Sequences
4.2 Lists
In the first and second chapters we developed a sequence called PyList. The PyList class is really just a repackaging of the Python list class. The example sequence demonstrates some of the operators that are supported by Python. In this section we want to look more deeply into how lists are implemented. There are many operations supported on lists. Chapter 16 contains the full list. The table in Fig. 4.1 is a subset of the operations supported by lists.
Each of the operations in the table has an associated complexity. The performance of an algorithm depends on the complexity of the operations used in implementing that algorithm. In the following sections we’ll further develop our own list datatype, called PyList, using the built-in list only for setting and getting elements in a list. The indexed get and indexed set operations can be observed to have O(1) complexity. This complexity is achieved because the memory of a computer is randomly accessible, which is why it is called Random Access Memory. In Chap. 2 we spent some time demonstrating that each location within a list is accessible in the same amount of time regardless of list size and location being retrieved. In the following sections we’ll enhance the PyList datatype to support the operations given in this table.
Operation      Complexity    Usage          Method
List creation  O(n) or O(1)  x = list(y)    calls __init__(y)
indexed get    O(1)          a = x[i]       x.__getitem__(i)
indexed set    O(1)          x[i] = a       x.__setitem__(i,a)
concatenate    O(n)          z = x + y      z = x.__add__(y)
append         O(1)          x.append(a)    x.append(a)
insert         O(n)          x.insert(i,e)  x.insert(i,e)
delete         O(n)          del x[i]       x.__delitem__(i)
equality       O(n)          x == y         x.__eq__(y)
iterate        O(n)          for a in x:    x.__iter__()
length         O(1)          len(x)         x.__len__()
membership     O(n)          a in x         x.__contains__(a)
sort           O(n log n)    x.sort()       x.sort()
Fig. 4.1 Complexity of List Operations
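The Usage and Method columns of Fig. 4.1 describe the same operation written two ways. As a quick illustration on the built-in list class (the variable names are chosen only for this example), each operator form below gives the same result as the explicit method call.

```python
# Operator syntax and the special methods it invokes, shown on a built-in list.
x = [30, 10, 20]
y = [40, 50]

same_get = x[1] == x.__getitem__(1)             # indexed get
same_concat = (x + y) == x.__add__(y)           # concatenate
same_len = len(x) == x.__len__()                # length
same_member = (10 in x) == x.__contains__(10)   # membership

x.__setitem__(0, 99)    # same effect as x[0] = 99
x.__delitem__(1)        # same effect as del x[1]

print(same_get, same_concat, same_len, same_member, x)
```

In everyday code the operator form is preferred. The method form matters when we write our own class and want it to respond to the same operators.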
4.2.1 The PyList Datatype
In the first couple of chapters we began developing our PyList data structure. To support the O(1) complexity of the append operation, the PyList contains empty locations that can be filled when append is called, as first described in Sect. 2.10.
We’ll keep track of the number of locations being used and the actual size of the internal list in our PyList objects. So, we’ll need three pieces of information: the list itself called items, the size of the internal list called size, and the number of locations in the internal list that are currently being used called numItems. While we wouldn’t have to keep track of the size of the list, because we could call the len function, we’ll store the size in the object to avoid the overhead of calling len in multiple places in the code.
All the used locations in the internal list will occur at the beginning of the list. In other words, there will be no holes in the middle of a list that we will have to worry about. We’ll call this assumption an invariant on our data structure. An invariant is something that is true before and after any method call on the data structure. The invariant for this list is that the internal list will have the first numItems locations filled with no holes. The code in Sect. 4.2.2 provides a constructor that can also be passed a list for its initial contents.
Storing all the items at the beginning of the list, without holes, also means that we can randomly access elements of the list in O(1) time. We don’t have to search for the proper location of an element. Indexing into the PyList will simply index into the internal items list to find the proper element as seen in the next sections.
4.2.2 The PyList Constructor
class PyList:
    def __init__(self, contents=[], size=10):
        # The contents parameter allows the programmer to construct a
        # list with the initial contents of this value. The size
        # parameter lets the programmer pick a size for the internal
        # list. This is useful if the programmer knows he/she is going
        # to add a specific number of items right away to the list.
        self.items = [None] * size
        self.numItems = 0
        self.size = size

        for e in contents:
            self.append(e)
The code in Sect. 4.2.2 builds a PyList object by creating a list of 10 None values. None is the special value in Python for references that point at nothing. Figure 4.2 shows a sample list after it was created and three items were appended to it. The special None value is indicated in the figure by the three horizontal lines where the empty slots in the list point. The initial size of the internal items list is 10 by default, but a user could pass a larger size initially if they wanted to. This is only the initial size. The list will still grow when it needs to.
Fig. 4.2 A Sample PyList Object
The contents parameter lets the programmer pass in a list or sequence to put in the list initially. For instance, the object in Fig. 4.2 could have been created by writing the following.
sampleList = PyList(["a", "b", "c"])
Each element of the sequence is added as a separate list item. The complexity of creating a PyList object is O(1) if no value is passed to the constructor and O(n) if a sequence is passed to the constructor, where n is the number of elements in the sequence.
4.2.3 PyList Get and Set
    def __getitem__(self, index):
        if index >= 0 and index < self.numItems:
            return self.items[index]

        raise IndexError("PyList index out of range")

    def __setitem__(self, index, val):
        if index >= 0 and index < self.numItems:
            self.items[index] = val
            return

        raise IndexError("PyList assignment index out of range")
Our PyList class is a wrapper for the built-in list class. So, to implement the get item and set item operations on PyList, we’ll use the get and set operations on the built-in list class. The code is given here. The complexity of both operations is O(1).
In both cases, we want to make sure the index is in the range of acceptable indices. If it is not, we’ll raise an IndexError exception just as the built-in list class does.
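A small self-contained sketch of the bounds check follows. The class name BoundedList is hypothetical and exists only for this example; it copies the test used in __getitem__. One difference worth noticing, assuming the check is written this way, is that negative indices are rejected, whereas the built-in list accepts them and counts from the end.

```python
class BoundedList:
    """Hypothetical stand-in for PyList that supports only indexed get."""

    def __init__(self, contents):
        self.items = list(contents) + [None] * 5   # spare, unused slots
        self.numItems = len(contents)

    def __getitem__(self, index):
        if index >= 0 and index < self.numItems:
            return self.items[index]
        raise IndexError("index out of range")


b = BoundedList(["a", "b", "c"])
first = b[0]

try:
    b[3]                     # the slot exists internally but is not in use
    unused_slot_rejected = False
except IndexError:
    unused_slot_rejected = True

try:
    b[-1]                    # the built-in list would return "c" here
    negative_rejected = False
except IndexError:
    negative_rejected = True

print(first, unused_slot_rejected, negative_rejected)
```

The check against numItems, rather than against the length of the internal list, is what keeps the unused slots invisible to the user of the class.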
4.2.4 PyList Concatenate
    def __add__(self, other):
        result = PyList(size=self.numItems + other.numItems)

        for i in range(self.numItems):
            result.append(self.items[i])

        for i in range(other.numItems):
            result.append(other.items[i])

        return result
To concatenate two lists we must build a new list that contains the contents of both. This is an accessor method because it does not mutate either list. Instead, it builds a new list. We can do this operation in O(n) time where n is the sum of the lengths of the two lists. Here is some code to accomplish this.
In Sect. 4.2.4 the size is set to the needed size for the result of concatenating the two lists. The complexity of the __add__ method is O(n) where n is the sum of the lengths of the two lists. The initial size of the list does not have to be set because append has O(1) complexity as we saw in Sect. 2.10. However, since we know the size of the resulting list, setting the initial size should speed up the concatenation operation slightly.
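The accessor behavior of concatenation can be checked on the built-in list, which follows the same rule. This is only an illustration with made-up values: the result is a new list, and changing it leaves both operands as they were.

```python
x = [1, 2, 3]
y = [4, 5]
z = x + y            # builds a new list holding len(x) + len(y) items

z[0] = 100           # changing the result leaves the operands alone
print(x, y, z)
```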
4.2.5 PyList Append
    # Python mangles names that start with two underscores, so this
    # method is intended for use only inside the class.
    def __makeroom(self):
        # increase list size by 1/4 to make more room.
        # add one in case for some reason self.size is 0.
        newlen = (self.size // 4) + self.size + 1
        newlst = [None] * newlen
        for i in range(self.numItems):
            newlst[i] = self.items[i]

        self.items = newlst
        self.size = newlen

    def append(self, item):
        if self.numItems == self.size:
            self.__makeroom()

        self.items[self.numItems] = item
        self.numItems += 1  # Same as writing self.numItems = self.numItems + 1
In Sect. 2.10 we learned that the append method has O(1) amortized complexity. When appending, we will just add one more item to the end of the self.items list if there is room. In the description of the constructor we decided the PyList objects would contain a list that had room for more elements. When appending we can make use of that extra space. Once in a while (i.e. after appending some number of items), the internal self.items list will fill up. At that time we must increase the size of the items list, by an amount proportional to the current length of self.items, to make room for the new item we are appending.
As we learned in Chap. 2, to make the append operation run in O(1) time we can’t just add one more location each time we need more space. It turns out that adding 25% more space each time is enough to guarantee O(1) complexity. The choice of 25% is not significant. If we added even 10% more space each time we would get O(1) complexity. At the other extreme we could double the internal list size each time we needed more room as we did in Sect. 2.10. However, 25% seems like a reasonable amount to expand the list without gobbling up too much memory in the computer.
We just need a few more cyber dollars stored up for each append operation to pay for expanding the list when we run out of room. The code in Sect. 4.2.5 implements the append operation with an amortized complexity of O(1). Integer division by 4 is very quick in a computer because it can be implemented by shifting the bits of the integer to the right, so computing our new length, when needed, is relatively quick.
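The difference between growing by a fraction and growing by one location can be made concrete by counting element copies. The following is a minimal simulation, not part of the PyList class: the function names count_copies, grow_by_quarter and grow_by_one are invented for this sketch, and only the bookkeeping is simulated, with no real items stored.

```python
def count_copies(n, grow):
    """Append n items to a simulated list and count element copies.

    grow(size) returns the new internal size when the list is full.
    """
    size = 10          # same default initial size as the PyList constructor
    num_items = 0
    copies = 0
    for _ in range(n):
        if num_items == size:
            copies += num_items      # making room copies every used slot
            size = grow(size)
        num_items += 1
    return copies


def grow_by_quarter(size):
    return (size // 4) + size + 1    # the policy used by __makeroom


def grow_by_one(size):
    return size + 1                  # the policy the text warns against


n = 10000
quarter = count_copies(n, grow_by_quarter)
one = count_copies(n, grow_by_one)
print("25% growth:", quarter, "copies,", quarter / n, "per append")
print("+1 growth: ", one, "copies,", one / n, "per append")
```

Under these assumptions the copies per append stay below a small constant for the 25% policy, while for the add-one policy they grow in proportion to n. That is the difference between O(1) amortized and O(n) per append.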
The Python interpreter implements append in a similar way. The Python interpreter is implemented in C, so the interpreter uses C code. Python also chooses to increase the list size by other amounts. In Python the allocated list size grows through the values 4, 8, 16, 25, 35, 46, 58, 72, 88, and so on, although the exact values depend on the version of the interpreter. The additional space is calculated from the newly needed size of the list. You can see that the amount added grows as the list grows, and that leads to an amortized complexity of O(1) for the append operation in the Python interpreter.
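The stepwise growth of the built-in list can be observed from Python itself. The sketch below assumes the CPython interpreter, where sys.getsizeof reports the memory currently allocated to a list object. Other interpreters may not support it, and the lengths at which growth happens differ between versions, so no particular output is claimed here.

```python
import sys

lst = []
last_size = sys.getsizeof(lst)
resizes = []                 # list lengths at which the allocation changed

for i in range(1000):
    lst.append(i)
    size = sys.getsizeof(lst)
    if size != last_size:
        resizes.append(len(lst))
        last_size = size

print(len(resizes), "reallocations for", len(lst), "appends")
print("first few lengths that triggered growth:", resizes[:8])
```

Far fewer reallocations than appends should be reported, and the gaps between successive entries of resizes should widen as the list gets longer.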
4.2.6 PyList Insert
    def insert(self, i, e):
        if self.numItems == self.size:
            self.__makeroom()

        if i < self.numItems:
            for j in range(self.numItems - 1, i - 1, -1):
                self.items[j + 1] = self.items[j]

            self.items[i] = e
            self.numItems += 1
        else:
            self.append(e)
To insert into this sequential list we must make room for the new element. Given the way the list is organized, there is no choice but to copy each element after the point where we want to insert the new value to the next location in the list. This works best if we start from the right end of the list and work our way back to the point where the new value will be inserted. The complexity of this operation is O(n) where n is the number of elements in the list after the insertion point.
The index i is the location where the new value e is to be inserted. If the index provided is greater than or equal to the number of items in the list, the new item, e, is appended to the end of the list.
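The right-to-left copying can be tried on its own, outside the class. In this sketch the function name insert_shift and its parameters are hypothetical; it works on a plain Python list that stands in for the internal items list and assumes there is at least one unused slot at the end.

```python
def insert_shift(items, num_items, i, e):
    """Insert e at index i of a partly filled list; return the new count.

    Assumes 0 <= i <= num_items and at least one unused slot at the end.
    """
    # Walk from the last used slot back to i, moving each item one slot right.
    for j in range(num_items - 1, i - 1, -1):
        items[j + 1] = items[j]
    items[i] = e
    return num_items + 1


items = ["a", "b", "d", "e", None, None]
count = insert_shift(items, 4, 2, "c")
print(items, count)
```

Copying from left to right instead would overwrite each item before it had been moved, which is why the loop counts downward.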
4.2.7 PyList Delete
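    def __delitem__(self, index):
        # Reject an index outside the used slots by raising IndexError.
        if index < 0 or index >= self.numItems:
            raise IndexError("PyList index out of range")
        for i in range(index, self.numItems - 1):
            self.items[i] = self.items[i + 1]
        self.numItems -= 1  # same as writing self.numItems = self.numItems - 1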
When deleting an item at a specific index in the list, we must move everything after the item down to preserve our invariant that there are no holes in the internal list. This results in an O(n) implementation in the average and worst case where n is the number of items after the index in the list. Here is code that accomplishes deletion.
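The shifting step can also be exercised as a standalone function. The name delete_shift is hypothetical, and the line that resets the vacated last slot to None is an assumption of this sketch rather than something the text requires; it simply keeps the unused part of the list empty.

```python
def delete_shift(items, num_items, index):
    """Remove the item at index from a partly filled list; return the new count."""
    if index < 0 or index >= num_items:
        raise IndexError("index out of range")
    # Move every later item one slot to the left, closing the gap.
    for i in range(index, num_items - 1):
        items[i] = items[i + 1]
    items[num_items - 1] = None    # extra step: clear the vacated last slot
    return num_items - 1


items = ["a", "b", "c", "d", None]
count = delete_shift(items, 4, 1)
print(items, count)
```

Deleting the last used item needs no copying at all, while deleting the first moves every remaining item, which is where the O(n) bound comes from.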
In the Python interpreter, to conserve space, if a list reaches a point after deletion where less than half of the locations within the internal list are being used, then the size of the available space is reduced by one half.
4.2.8 PyList Equality Test
    def __eq__(self, other):
        if type(other) != type(self):
            return False

        if self.numItems != other.numItems:
            return False

        for i in range(self.numItems):
            if self.items[i] != other.items[i]:
                return False

        return True
Checking for equality of two lists requires the two lists be of the same type. If they are of different types, then we’ll say they are not equal. In addition, the two lists must have the same length. If they are not the same length, they cannot be equal. If these two preconditions are met, then the lists are equal if all the elements in the two lists are equal. Here is code that implements equality testing of two PyList objects. Equality testing is an O(n) operation.
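The built-in sequences apply the same three tests, so they make a convenient illustration. The variable names below are chosen only for this example.

```python
same = [1, 2, 3] == [1, 2, 3]             # same type, length and elements
shorter = [1, 2, 3] == [1, 2]             # lengths differ
changed = [1, 2, 3] == [1, 2, 4]          # one element differs
other_type = [1, 2, 3] == (1, 2, 3)       # a list and a tuple are different types

print(same, shorter, changed, other_type)
```

The type and length tests take constant time, so only lists that pass both pay the O(n) cost of comparing elements.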
4.2.9 PyList Iteration
    def __iter__(self):
        for i in range(self.numItems):
            yield self.items[i]
The ability to iterate over a sequence is certainly a requirement. Sequences hold a collection of similar data items and we frequently want to do something with each item in a sequence. Of course, the complexity of iterating over any sequence is O(n) where n is the size of the sequence. Here is code that accomplishes this for the PyList sequence. The yield statement in Python suspends the execution of the __iter__ method and returns the yielded item to the iterator.
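The suspend-and-resume behavior of yield is easier to see when the iterator is advanced by hand with next. The generator function used_items below is a hypothetical standalone version of the same loop, written for this illustration.

```python
def used_items(items, num_items):
    """Yield the first num_items entries of items, one at a time."""
    for i in range(num_items):
        yield items[i]


slots = ["a", "b", "c", None, None]
it = used_items(slots, 3)

first = next(it)        # runs the generator up to its first yield
second = next(it)       # resumes after that yield and stops at the next one
rest = list(it)         # drains whatever is left
print(first, second, rest)
```

A for loop performs the same calls to next behind the scenes and stops when the generator finishes.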
4.2.10 PyList Length
    def __len__(self):
        return self.numItems
If the number of items were not kept track of within the PyList object, then counting the number of items in the list would be an O(n) operation. Instead, if we keep track of the number of items in the list as items are appended or deleted from the list, then we need only return the value of numItems from the object, resulting in O(1) complexity.
4.2.11 PyList Membership
    def __contains__(self, item):
        for i in range(self.numItems):
            if self.items[i] == item:
                return True

        return False
Testing for membership in a list means checking to see if an item is one of the items in the list. The only way to do this is to examine each item in sequence in the list. If the item is found then True is returned, otherwise False is returned. This results in O(n) complexity.
This idea of searching for an item in a sequence is so common that computer scientists have named it. This is called linear search. It is named this because of its O(n) complexity.
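To see where the O(n) bound comes from, the search can be instrumented to count comparisons. The function linear_search below is a sketch written for this purpose, not a method of PyList, and the data values are made up.

```python
def linear_search(seq, item):
    """Return (found, comparisons) for a left-to-right scan of seq."""
    comparisons = 0
    for candidate in seq:
        comparisons += 1
        if candidate == item:
            return True, comparisons
    return False, comparisons


data = [7, 3, 9, 4]
best = linear_search(data, 7)       # found at the first position
worst = linear_search(data, 4)      # found at the last position
missing = linear_search(data, 5)    # absent, so every item is examined
print(best, worst, missing)
```

An unsuccessful search always examines all n items, and a successful one examines about half of them on average when every position is equally likely.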
4.2.12 PyList String Conversion
    def __str__(self):
        s = "["
        for i in range(self.numItems):
            s = s + repr(self.items[i])
            if i < self.numItems - 1:
                s = s + ", "
        s = s + "]"
        return s
It is convenient to be able to convert a list to a string so it can be printed. Python includes two methods that can be used for converting to a string. The first you are probably already familiar with. The str function calls the __str__ method on an object to create a string representation of itself suitable for printing. Here is code that implements the __str__ method for the PyList class.
4.2.13 PyList String Representation
    def __repr__(self):
        s = "PyList(["
        for i in range(self.numItems):
            s = s + repr(self.items[i])
            if i < self.numItems - 1:
                s = s + ", "
        s = s + "])"
        return s
The other method for converting an object to a string has a different purpose. Python includes a function called eval that will take a string containing an expression and evaluate the expression in the string. For instance, eval("6+5") results in 11 and eval("[1,2,3]") results in the list [1,2,3]. The repr function in Python calls the __repr__ method on an object. This method, if defined, should return a string representation of an object that is suitable to be given to the eval function. In the case of the PyList class, the repr form of the string would be something like "PyList([1,2,3])" for the PyList sequence containing these items. Here is the code that accomplishes this. It is nearly identical to the __str__ code, except that PyList prefixes the sequence.
Notice that in both Sects. 4.2.12 and 4.2.13 repr is called on the elements of the list. Calling repr is necessary because otherwise a list containing strings like ["hi","there"] would be converted to [hi,there] in its str or repr representation.
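The difference between str and repr on a string, and the round trip through eval, can be checked with built-in values. This is a small illustration with made-up data, using the built-in list rather than PyList.

```python
word = "hi"
plain = str(word)        # the characters only
quoted = repr(word)      # includes the quotation marks

values = [1, 2, 3]
round_trip = eval(repr(values))    # rebuilds an equal list from its repr

names = ["hi", "there"]
print(plain, quoted, repr(names), round_trip == values)
```

Because repr keeps the quotation marks, the text it produces for a list of strings is still a valid Python expression, which is what eval needs.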