        # if necessary start again
        if self.k < k: self.__init__(self.points, self.metric)
        # step until we get k clusters
        while self.k > k: self.step()
        # return list of cluster members
        return self.r, self.v
Given a set of points, we can estimate the most likely number of clusters in the data: we plot the number of clusters versus the merging distance and look for a plateau in the plot. The y-coordinate of the plateau gives the number of clusters. This is done by the function cluster in the preceding algorithm, which returns the average distance between clusters and a list of clusters.
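The plateau can also be located programmatically. The sketch below is a hypothetical helper, not part of nlib.py: it assumes a list of (distance, number of clusters) pairs, in the spirit of the c.dd list used in the next listing, and returns the number of clusters just before the largest jump in merging distance:

```python
def guess_k(dd):
    """Given pairs (distance, number_of_clusters) from an agglomerative
    clustering, ordered by increasing merge distance, return the number
    of clusters just before the largest jump in distance.
    (Hypothetical helper; assumes the structure of c.dd described above.)"""
    best_gap, best_k = 0.0, 1
    for (d1, k1), (d2, k2) in zip(dd, dd[1:]):
        gap = d2 - d1
        if gap > best_gap:
            best_gap, best_k = gap, k1
    return best_k
```

For the sample data above, the big jump in merge distance happens when the five natural clusters start being merged into each other, so the helper would report five.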
For example:
Listing 3.21: in file: nlib.py
>>> def metric(a,b):
...     return math.sqrt(sum((x-b[i])**2 for i,x in enumerate(a)))
>>> points = [[random.gauss(i % 5, 0.3) for j in xrange(10)] for i in xrange(200)]
>>> c = Cluster(points, metric)
>>> r, clusters = c.find(1) # cluster all points until one cluster only
>>> Canvas(title='clustering example', xlab='distance', ylab='number of clusters'
...     ).plot(c.dd[150:]).save('clustering1.png')
>>> Canvas(title='clustering example (2d projection)', xlab='p[0]', ylab='p[1]'
...     ).ellipses([p[:2] for p in points]).save('clustering2.png')
With our sample data, we obtain the following plot (“clustering1.png”):
The location where the curve bends corresponds to five clusters. Although our points live in 10 dimensions, we can project them into two dimensions and see the five clusters (“clustering2.png”):
Figure 3.7: Number of clusters found as a function of the distance cutoff.
3.9.2 Neural network
Neurons are organized in layers: one input layer of neurons connected only with the input and the next layer; one output layer of neurons connected only with the output and the previous layer; and one or more hidden layers of neurons connected only with other neurons. Each neuron is characterized by input links and output links. Each output of a neuron is a function of its inputs. The exact shape of that function depends on the network and on parameters that can be adjusted. Usually this function is chosen to be a monotonic increasing function of the sum of the inputs, where both the inputs and the outputs take values in the [0,1] range. The inputs can be thought of as electrical signals reaching the neuron. The output is the electrical signal emitted by the neuron. Each neuron is defined by a set of parameters a, which determine the relative weight of the input signals. A common choice for this characteristic function is:
output_ij = tanh( ∑_k a_ijk input_ik )     (3.97)

where i labels the neuron, j labels the output, k labels the input, and a_ijk are characteristic parameters describing the neurons.

Figure 3.8: Visual representation of the clusters, with the point coordinates projected in 2D.
The network is trained by providing an input and adjusting the characteristic parameters a_ijk of each neuron to produce the expected output. The network is trained iteratively until its parameters converge (if they converge), and then it is ready to make predictions. We say that the network has learned from the training data set.
Listing 3.22: in file: nlib.py
class NeuralNetwork:
    """
    Back-Propagation Neural Networks
    Placed in the public domain.
    Original author: Neil Schemenauer <nas@arctrix.com>
    Modified by: Massimo Di Pierro
    Read more: http://www.ibm.com/developerworks/library/l-neural/
    """

    @staticmethod
    def rand(a, b):
        """ calculate a random number where: a <= rand < b """
        return (b-a)*random.random() + a

    @staticmethod
    def sigmoid(x):
        """ our sigmoid function, tanh is a little nicer than the standard 1/(1+e^-x) """
        return math.tanh(x)

    @staticmethod
    def dsigmoid(y):
        """ derivative of our sigmoid function, in terms of the output """
        return 1.0 - y**2

    def __init__(self, ni, nh, no):
        # number of input, hidden, and output nodes
        self.ni = ni + 1 # +1 for bias node
        self.nh = nh
        self.no = no

        # activations for nodes
        self.ai = [1.0]*self.ni
        self.ah = [1.0]*self.nh
        self.ao = [1.0]*self.no

        # create weights
        self.wi = Matrix(self.ni, self.nh, fill=lambda r,c: self.rand(-0.2, 0.2))
        self.wo = Matrix(self.nh, self.no, fill=lambda r,c: self.rand(-2.0, 2.0))

        # last change in weights for momentum
        self.ci = Matrix(self.ni, self.nh)
        self.co = Matrix(self.nh, self.no)

    def update(self, inputs):
        if len(inputs) != self.ni-1:
            raise ValueError('wrong number of inputs')

        # input activations
        for i in xrange(self.ni-1):
            self.ai[i] = inputs[i]

        # hidden activations
        for j in xrange(self.nh):
            s = sum(self.ai[i] * self.wi[i,j] for i in xrange(self.ni))
            self.ah[j] = self.sigmoid(s)

        # output activations
        for k in xrange(self.no):
            s = sum(self.ah[j] * self.wo[j,k] for j in xrange(self.nh))
            self.ao[k] = self.sigmoid(s)
        return self.ao[:]

    def back_propagate(self, targets, N, M):
        if len(targets) != self.no:
            raise ValueError('wrong number of target values')

        # calculate error terms for output
        output_deltas = [0.0] * self.no
        for k in xrange(self.no):
            error = targets[k] - self.ao[k]
            output_deltas[k] = self.dsigmoid(self.ao[k]) * error

        # calculate error terms for hidden
        hidden_deltas = [0.0] * self.nh
        for j in xrange(self.nh):
            error = sum(output_deltas[k]*self.wo[j,k] for k in xrange(self.no))
            hidden_deltas[j] = self.dsigmoid(self.ah[j]) * error

        # update output weights (N is the learning rate, M the momentum factor)
        for j in xrange(self.nh):
            for k in xrange(self.no):
                change = output_deltas[k]*self.ah[j]
                self.wo[j,k] = self.wo[j,k] + N*change + M*self.co[j,k]
                self.co[j,k] = change

        # update input weights
        for i in xrange(self.ni):
            for j in xrange(self.nh):
                change = hidden_deltas[j]*self.ai[i]
                self.wi[i,j] = self.wi[i,j] + N*change + M*self.ci[i,j]
                self.ci[i,j] = change

        # calculate error
        error = sum(0.5*(targets[k]-self.ao[k])**2 for k in xrange(len(targets)))
        return error

    def test(self, patterns):
        for p in patterns:
            print p[0], '->', self.update(p[0])

    def weights(self):
        print 'Input weights:'
        for i in xrange(self.ni):
            print self.wi[i]
        print
        print 'Output weights:'
        for j in xrange(self.nh):
            print self.wo[j]

    def train(self, patterns, iterations=1000, N=0.5, M=0.1, check=False):
        # N: learning rate
        # M: momentum factor
        for i in xrange(iterations):
            error = 0.0
            for p in patterns:
                inputs = p[0]
                targets = p[1]
                self.update(inputs)
                error = error + self.back_propagate(targets, N, M)
            if check and i % 100 == 0:
                print 'error %-14f' % error

Figure 3.9: Example of a minimalist neural network.
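The listing above depends on the book's Matrix class and on the global random state. The same tanh activation and backpropagation update rule can be exercised independently with plain lists; the following is a hedged, self-contained sketch (the function name train_xor_sketch is ours, it omits the momentum term, and its exact error values depend on the seed):

```python
import math, random

def train_xor_sketch(hidden=3, iterations=2000, rate=0.3, seed=1):
    """Self-contained sketch of tanh + backpropagation on XOR, using
    plain lists instead of the Matrix class and no momentum term.
    Returns (first_epoch_error, last_epoch_error).
    (Illustrative sketch, not part of nlib.py.)"""
    rng = random.Random(seed)
    ni, nh = 3, hidden                           # 2 inputs + 1 bias input
    wi = [[rng.uniform(-0.5, 0.5) for _ in range(nh)] for _ in range(ni)]
    wo = [rng.uniform(-0.5, 0.5) for _ in range(nh)]
    patterns = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 0)]
    first = last = None
    for epoch in range(iterations):
        err = 0.0
        for x, t in patterns:
            ai = list(x) + [1.0]                 # append the bias input
            ah = [math.tanh(sum(ai[i]*wi[i][j] for i in range(ni)))
                  for j in range(nh)]            # hidden activations
            ao = math.tanh(sum(ah[j]*wo[j] for j in range(nh)))
            od = (1.0 - ao**2) * (t - ao)        # output delta: dsigmoid * error
            hd = [(1.0 - ah[j]**2) * od * wo[j]  # hidden deltas
                  for j in range(nh)]
            for j in range(nh):                  # gradient-descent updates
                wo[j] += rate * od * ah[j]
                for i in range(ni):
                    wi[i][j] += rate * hd[j] * ai[i]
            err += 0.5 * (t - ao)**2
        if first is None: first = err
        last = err
    return first, last
```

Running the sketch and comparing the first and last epoch errors shows the training loop reducing the error, which is all the back_propagate method in the listing is doing, pattern by pattern.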
In the following example, we teach the network the XOR function: we create a network with two inputs, two hidden neurons, and one output; we train it; and we check what it has learned:
Listing 3.23: in file: nlib.py
>>> pat = [[[0,0], [0]], [[0,1], [1]], [[1,0], [1]], [[1,1], [0]]]
>>> n = NeuralNetwork(2, 2, 1)
>>> n.train(pat)
>>> n.test(pat)
[0, 0] -> [0.00...]
[0, 1] -> [0.98...]
[1, 0] -> [0.98...]
[1, 1] -> [-0.00...]
Now we use our neural network to learn patterns in stock prices and predict the next-day return. We then check what it has learned by comparing the sign of the prediction with the sign of the actual return, for the same days used to train the network:
Listing 3.24: in file: test.py
>>> storage = PersistentDictionary('sp100.sqlite')
>>> v = [day['arithmetic_return']*300 for day in storage['AAPL/2011'][1:]]
>>> pat = [[v[i:i+5], [v[i+5]]] for i in xrange(len(v)-5)]
>>> n = NeuralNetwork(5, 5, 1)
>>> n.train(pat)
>>> predictions = [n.update(item[0]) for item in pat]
>>> success_rate = sum(1.0 for i,e in enumerate(predictions)
...     if e[0]*v[i+5]>0)/len(pat)
The learning process depends on the random number generator; therefore, for this small training data set, sometimes the network succeeds in predicting the sign of the next-day arithmetic return of the stock with more than 50% probability, and sometimes it does not. We leave it to the reader to study the significance of this result by using different subsets of the data for training the network and for testing its success rate.
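One way to make that out-of-sample check concrete is to measure the sign-agreement rate on returns not used for training. A minimal hypothetical helper (the name sign_success_rate is ours, not part of nlib.py or test.py):

```python
def sign_success_rate(predictions, actuals):
    """Fraction of cases where the predicted and the actual return have
    the same sign; a product <= 0 counts as a miss.
    (Hypothetical helper for the out-of-sample check described above.)"""
    hits = sum(1.0 for p, a in zip(predictions, actuals) if p*a > 0)
    return hits / len(actuals)
```

Training on the first part of the series and evaluating this rate on the remaining days, over many random initializations, gives a fairer estimate of whether the network beats the 50% baseline.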