

5.5.5 Ising Spin Systems

For each atom that constitutes a solid, it is possible to define a net spin magnetic moment μ⃗ resulting from the intrinsic spin of the subatomic particles and the orbital motion of electrons around their atomic nucleus. Such magnetic moments interact in complex ways, giving rise to a range of microscopic and macroscopic phenomena. A simple description of such interactions is given by the Ising model, where each μ⃗ in a crystal is represented by a spin S taking values from {+1, −1} on a regular discrete grid of points {i, j, k}. Furthermore, the interaction of the spins {Si} is considered only between nearest neighbours and is represented by a constant Ji,j, which determines whether the two neighbouring spins will tend to align parallel (Ji,j > 0) or anti-parallel (Ji,j < 0) with each other. Given a particular configuration of spin orientations ω, the energy of the system is then given by the Hamiltonian operator:

    Ĥ = −Σ_{i,j} Ji,j Si Sj − h Σ_i Si        (5.9)

where h is an external magnetic field that tends to align the spins in a preferential orientation [9]. In this form, each Ji,j defines a constraint Ci,j between the values D = {+1, −1} taken by the variables Si and Sj. It is easy to see that the more constraints are satisfied, the lower the value of Ĥ in Equation 5.9 becomes. This simple model allows the study of phase transitions between disordered configurations at high temperature and ordered ones at low temperature. For ferromagnetic (Ji,j > 0) and antiferromagnetic (Ji,j < 0) interactions the configurations are similar to those in Figure 5.22(d) and (e) for 3D lattices. These correspond to the stable states of our SNN solver when the Ising models for Ji,j > 0 and Ji,j < 0 are mapped to an SNN using Algorithm 5.1 and a 3D grid of 1,000 spins. Figure 5.22(g) shows the result for a 1D antiferromagnetic spin chain. It is interesting to note that the statistical mechanics of spin systems has been used extensively to understand the firing dynamics of SNNs, revealing a striking correspondence between their behaviours even in complex regimes. Our framework allows the inverse problem: mapping the SNN dynamics to spin interactions. This equivalence between dynamical systems and algorithms has largely been accepted, and we see an advantage in computing directly between equivalent dynamical systems. However, it is clear that the network parameters should be chosen adequately in order to keep the computation valid.
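To make Equation 5.9 concrete, the following sketch (plain Python with NumPy, not the SpiNNaker solver of Algorithm 5.1; the grid size, the uniform coupling J, the field h and the periodic boundaries are illustrative assumptions) evaluates the Hamiltonian for a configuration of ±1 spins on a 3D grid with nearest-neighbour couplings. The fewer violated constraints, the lower the returned energy.

```python
import numpy as np

def ising_energy(spins, J=1.0, h=0.0):
    """Evaluate Equation 5.9 for an array of spins in {+1, -1}.

    J > 0 gives ferromagnetic couplings, J < 0 antiferromagnetic.
    Only nearest neighbours along each axis are counted, once per bond,
    with periodic boundaries (an assumption of this sketch).
    """
    interaction = 0.0
    for axis in range(spins.ndim):
        # Sum S_i * S_j over nearest-neighbour pairs along this axis.
        interaction += np.sum(spins * np.roll(spins, shift=1, axis=axis))
    return -J * interaction - h * np.sum(spins)

# A 10x10x10 grid of 1,000 random spins, as in Figure 5.22(d) and (e).
rng = np.random.default_rng(seed=1)
spins = rng.choice([-1, +1], size=(10, 10, 10))
print(ising_energy(spins, J=1.0, h=0.0))
```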

If, instead of fixing Ji,j to some value U for all spin pairs {(i, j)}, one allows it to take random values from {U, −U} with probabilities pAF and pFM, it will be found that certain interactions are frustrated (unsatisfiable constraints).

Figure 5.22(f) illustrates this frustration with three antiferromagnetically interacting spins: whichever orientation the third spin takes, it will conflict with one of the other two. This extension of the Ising model, in which the grid of interactions is a random mixture of AF and FM couplings, was described by Surungan et al. [246]. The model is a representation of the spin glass systems found in nature; these are crystals with low concentrations of magnetic impurities that, due to the frustrated interactions, are quenched into a frozen random configuration when the temperature is lowered (at room or high temperature the magnetic moments of a material are constantly and randomly precessing around their average orientation).
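As a rough illustration of how such a mixed lattice can be set up, the sketch below (an assumption-laden Python fragment, not the published model of [246]) draws one coupling per nearest-neighbour bond along each axis, assigning −U with probability pAF and +U with probability pFM = 1 − pAF.

```python
import numpy as np

def spin_glass_couplings(shape, U=1.0, p_af=0.5, seed=None):
    """Draw one coupling per nearest-neighbour bond along each lattice axis.

    Each bond gets -U (antiferromagnetic) with probability p_af and
    +U (ferromagnetic) with probability 1 - p_af, as in the mixed
    AF/FM lattices of Figure 5.22(a) and (b). Periodic boundaries assumed.
    """
    rng = np.random.default_rng(seed)
    # couplings[d][i, j, ...] couples site (i, j, ...) to its neighbour
    # one step along axis d.
    return [rng.choice([-U, +U], size=shape, p=[p_af, 1.0 - p_af])
            for _ in range(len(shape))]

bonds = spin_glass_couplings((10, 10), U=1.0, p_af=0.1, seed=2)
```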

Figure 5.22. SNN simulation of Ising spin systems. (a) and (b) show two 2-dimensional spin glass quenched states obtained with interaction probabilities pAF = 0.5 and pAF = 0.1. The results for the three-dimensional lattices for CSPs of 1,000 spins with ferromagnetic and antiferromagnetic coupling constants are shown in (e) and (d), respectively. (c) plots the temporal dependence of the network entropy H, firing rate ν and states count during the stochastic search for the system in (d). (f) illustrates the origin of frustrated interactions in spin glasses. (g) depicts the result for the one-dimensional chain.

The statistical analysis of those systems was fundamental for the evolution of artificial neural networks and machine learning. Furthermore, the optimisation problem of finding the minimum energy configuration of a spin glass has been shown to be NP-complete [9]. The quenching of the grid happens when it gets trapped in a local minimum of the state space of all possible configurations. In Figure 5.22(a) and (b), we show quenched states found by our SNN with pAF = 0.5 and pAF = 0.1, respectively. A spin glass in nature will often be trapped in local minima and will need specific temperature variations to approach a lower energy state; our SNNs replicate this behaviour and allow for the study of thermal processes by controlling the time variation and intensity of the excitatory and inhibitory stimulations. If the underlying stochastic process of such stimulations is a good representative of heat in solids, they will correspond to an increase and a decrease of temperature, respectively, allowing, for example, the implementation of simulated annealing optimisation. Figure 5.22(c) shows the time evolution of the entropy, firing rate and states count for the antiferromagnetic 3D lattice of Figure 5.22(d). Similar plots, but converging to unsatisfying states, are found for the spin glasses in Figure 5.22(a) and (b). In the case of the ferromagnetic lattice in Figure 5.22(e) with very low noise, the network immediately converges to a solution. If the noise is high, however, it is necessary to stimulate the network several times to achieve a perfect ordering. This is because more noise implies more energy to violate constraints; even in nature, magnetic ordering is lost at high temperatures.
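As a generic illustration of this idea (and only that: the sketch below is standard Metropolis simulated annealing on the energy of Equation 5.9 in plain Python, not the SpiNNaker stimulation protocol), a decreasing 'temperature' first supplies enough noise to escape local minima and then lets the configuration freeze; the schedule parameters are illustrative.

```python
import math
import numpy as np

def neighbour_sum(spins, idx):
    """Sum of the nearest-neighbour spins of site idx (periodic boundaries)."""
    total = 0
    for axis in range(spins.ndim):
        for step in (-1, +1):
            j = list(idx)
            j[axis] = (j[axis] + step) % spins.shape[axis]
            total += spins[tuple(j)]
    return total

def anneal(spins, J=1.0, h=0.0, t_start=5.0, t_end=0.05, sweeps=200, seed=0):
    """Metropolis simulated annealing on the energy of Equation 5.9."""
    rng = np.random.default_rng(seed)
    for sweep in range(sweeps):
        # Geometric cooling: high noise first, then a slow freeze.
        t = t_start * (t_end / t_start) ** (sweep / max(sweeps - 1, 1))
        for _ in range(spins.size):
            idx = tuple(int(rng.integers(0, n)) for n in spins.shape)
            # Energy change of flipping spin idx under Equation 5.9.
            delta = 2 * spins[idx] * (J * neighbour_sum(spins, idx) + h)
            if delta <= 0 or rng.random() < math.exp(-delta / t):
                spins[idx] *= -1
    return spins
```

In the SNN solver, the analogous role of the cooling schedule is played by the time variation and intensity of the excitatory and inhibitory stimulation described above.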

Chapter 6

From Activations to Spikes

By Francesco Galluppi, Teresa Serrano Gotarredona, Qian Liu and Evangelos Stromatias

Tackling real-world tasks requires being comfortable with chance, trading off time with accuracy, and using approximations.

— Brian Christian in Algorithms to Live By: The Computer Science of Human Decisions

Deep learning has become the answer to an increasing number of AI problems since Hinton et al. [98] first proposed the training method of the Deep Belief Network (DBN). Machine learning is an extremely interesting space in 2019. The past few years have seen DeepMind create a narrow AI to master the game of Go and defeat Lee Sedol, a professional Go player [227]. Raising the ante, OpenAI designed a system to play a co-operative computer game and beat a professional team at it.¹ The game in question was DOTA 2 (Defense of the Ancients), a multiplayer online battle arena where each player (or AI) generally controls a single character. The same year, DeepMind showcased their system (AlphaStar) playing an arguably even more difficult game – Starcraft 2 – a real-time strategy game where each opponent can control up to 200 units, while also needing to focus on more abstract goals such as maintaining a functioning economy and production facilities [260]. These are just examples of applications of one AI technique (Reinforcement Learning) to a very narrow field, although even these have much wider applicability.

1. https://openai.com/five/

However, deep learning is not new ‘magic’; rather, it has a history spanning a few decades. An overview of some popular Artificial Neural Networks (ANNs) is offered below. The rest of the chapter reveals how one can use the application-focused insights from Deep Neural Networks (DNNs) to engineer SNNs, and an approach to convert pre-trained networks to use spikes.

6.1 Classical Models

We call the well-known and widely used deep learning models ‘classical’ and give a brief introduction to them in this section. As mentioned above, the first breakthrough in training deep (>2 layer) networks was the greedy layer-wise strategy [98] proposed to train stacked Restricted Boltzmann Machines (RBMs). Shortly after, this method was also proved to be efficient for training other kinds of deep networks, including stacked autoencoders (AEs) [13]. RBMs and AEs are suitable for dimensionality reduction and feature extraction when trained with unsupervised learning on unlabelled data. In 2012, using such an unsupervised deep learning architecture, the Google Brain team achieved a milestone in the deep learning era: the neural network learned to recognise cats by ‘watching’ 10 million images generated from random frames of YouTube videos [137].

Convolutional Neural Networks (ConvNets) are loosely inspired by biology and the significant discovery of Hubel and Wiesel that simple cells have a preferential response to oriented bars (convolution) and complex cells collate responses from the simple ones (pooling); these are believed to be the basic functions of the primary visual cortex in cats [109]. Simple cells fire at a high frequency to their preferred orientation of visual stimuli within their receptive fields, small subregions of the visual field. Meanwhile, a complex cell responds to the existence of a pattern within a larger receptive field but loses the exact position of the pattern.

The NeoCognitron [63] was the first network to mimic the functions of V1 simple and complex neurons in an ANN, and later this single-cell feature detection was improved by sharing weights among receptive fields in LeNet-5 [138]; ConvNets typically follow the same principle to this day. The mechanism of shared weights forms the essence of convolution in a ConvNet and hugely reduces the number of trainable parameters in a network. The usual procedure to train ConvNets is a supervised one, known as the back-propagation algorithm; it relies on the calculus chain rule to send error signals through the layers of the network, starting from the output and ending at the input.

The most significant examples of ConvNets have dominated the annual ImageNet Challenge [215]: AlexNet [132], VGG Net [228], GoogLeNet [249], ResNet [93] and MobileNet [108].

Despite the powerful capabilities of these feed-forward deep networks, sequence processing is a challenge for them, since the sizes of the input and output vectors are constrained by the number of neurons. Thus, Recurrent Neural Networks (RNNs), containing feed-back connections, are ideal solutions for dealing with sequential information, since their current output always depends on the previous ‘memory’. As training mechanisms have matured, for example using Long Short-Term Memory (LSTM) [99], RNNs have shown great success in many natural language processing tasks: language modelling [166], machine translation [247], speech recognition [83] and image caption generation [125].

The current trend in deep learning is to combine Machine Learning (ML) algorithms towards more complex objectives such as sequential decision-making and data generation.

Reinforcement Learning (RL) is inspired by animal behaviour, in which agents learn to make sequential optimised decisions to control an environment [248]. To address complex decision-making problems in practical life, RL requires a sufficiently abstract representation of the high-dimensional environment. Fortunately, deep learning nicely complements this requirement and performs effectively at dimensionality reduction and feature extraction. Advances in RL techniques, such as asynchronous advantage actor-critic (A3C) [167], are what allowed DeepMind and OpenAI to perform the feats presented at the beginning of this chapter.

Generative Adversarial Networks (GANs) [80] were proposed for training generative models of complex data. Instead of training discrimination networks (e.g. image classification using ConvNets) and generation networks (e.g. data sampling on RBMs) separately with different objectives, GANs train two competing networks – one the discriminator, the other the generator – simultaneously, by making them continuously play games with each other. Thus, the generator learns to produce more realistic data to fool the discriminator, while the discriminator learns to become better at distinguishing generated from real data. Exciting achievements have been reported in generating complex data, such as realistic image generation based on textual descriptions [203].

6.2 Symbol Card Recognition System with Spiking ConvNets

The ConvNet is the most commonly used machine learning architecture for image recognition. It is a biologically inspired generic architecture for intelligent data processing [139]. The generic architecture of a ConvNet for visual object recognition is depicted in Figure 6.1. The visual scene coming out of the retina is fed to a sequence of layers that emulate the processing layers of the brain's visual cortex. Each layer consists of the parallel application of 2D-filters to extract the main image characteristics. Each image representation obtained is named a feature map. The first layer extracts oriented edges of the image according to different orientations and different spatial scales. The subsequent layers combine the feature maps obtained in the previous layers to detect the presence of combinations of edges, detecting progressively more complex image characteristics, until achieving the recognition of complex objects in the higher levels. Along the ConvNet layers, the sizes of the feature maps are progressively reduced by applying image subsampling. This subsampling is intended to introduce invariance to object size and position.

Figure 6.1. Generic ConvNet architecture.

In conventional AI vision systems, the ConvNet architectures are used in a frame-based manner. A frame representing the particular scene to be analysed is fed to the architecture. The output of the different convolutional layers is computed in a sequential way (layer by layer) until a valid output is obtained in the upper layer indicating the category of the recognised object. However, this is not what happens in biological brains. In a biological system, the retina ‘pixels’ send, in an asynchronous way, sequences of spikes representing the visual scene. Those spikes are sent through the optic nerve to the visual cortex where they are processed as they arrive by the subsequent neuron layers with just the delay of the spike propagation and neuron processing.

We have used the SpiNNaker platform to implement a spiking ConvNet. Each time a spike is generated by a neuron in a layer, the spike is propagated to the connected neural populations of the next layer, and the weights of the corresponding 2D-filter kernels are added to the neuron states of the subsequent connected layer. In that way, the convolution is performed on the fly in a spike-based manner [220].
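The on-chip implementation is written in C for the SpiNNaker cores [220]; the Python sketch below only illustrates the idea, under simplifying assumptions (source and destination maps of equal size, reset to zero after firing, hypothetical function and parameter names): each incoming spike adds the shared kernel, centred on the spike coordinates, to the membrane states of the destination feature map, and any neuron crossing threshold fires immediately.

```python
import numpy as np

def process_spike(state, kernel, x, y, threshold, out_spikes):
    """Spike-driven convolution: add the kernel around (x, y), fire on threshold.

    state   : 2D membrane potentials of one destination feature map
    kernel  : 2D shared weights linking the source map to this map
    (x, y)  : coordinates of the incoming spike in the source map
    """
    kh, kw = kernel.shape
    h, w = state.shape
    for dy in range(kh):
        for dx in range(kw):
            ty, tx = y + dy - kh // 2, x + dx - kw // 2
            if 0 <= ty < h and 0 <= tx < w:
                state[ty, tx] += kernel[dy, dx]
                if state[ty, tx] >= threshold:
                    out_spikes.append((tx, ty))   # propagate immediately
                    state[ty, tx] = 0.0           # illustrative reset
```

The real neuron model, subsampling between layers and reset/leak behaviour follow the descriptions later in this section; here they are deliberately simplified.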

The input stimulus provided to the system is a flow of spikes representing the symbols of a poker card deck passing in front of a DVS [141, 143, 219] at high speed. We used an event-driven clustering algorithm [47] to track the passing symbols and, at the same time, we adjusted the tracking area to a 32×32 resolution.

Each symbol passed in 10–20 ms, producing 3k–6k spikes. The 40 symbols passed in 0.95 s, generating a total of 174,643 spikes.

To achieve real-time recognition while keeping the recordings reproducible, we loaded the spike sequence onto a data player board [218]. The data player board stores the neuron addresses and timestamps of the recorded spikes in a local memory and reproduces them as events through a parallel AER link in real time. The parallel AER events are converted to the 2-of-7 SpiNNaker protocol and fed in real time to the SpiNNaker machine.

The particular ConvNet architecture used for the card symbol recognition task is detailed in Figure 6.2. It consists of three convolutional layers (C1, C3 and C5) interleaved with two subsampling layers (S2 and S4) and a final fully-connected category layer (C6). Table 6.1 details the numbers and sizes of the feature maps as well as the sizes of the kernels in each layer.

The kernels in the first layer are a set of six Gabor filters, in three different orientations and two different spatial scales, and are fixed, not trained.

Figure 6.2. ConvNet architecture for card symbol recognition.

Table 6.1. Number and size of layers in the card symbol ConvNet architecture.

ConvNet structure      C1       S2       C3       S4      C5      C6
Feature maps (FM)      6        6        4        4       8       4
FM dimension           28×28    14×14    10×10    5×5     1×1     1×1
Kernel size            10×10    –        5×5      –       5×5     1×1
Number of kernels      6        –        24       –       32      32
Number of weights      600      –        600      –       800     32
Trainable weights      0        –        600      –       800     32

The rest of the network weights are trained using frames and a method to convert the weights to the spiking domain [190].
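As a quick sanity check on Table 6.1 (a plain Python fragment with the layer list transcribed from the table; nothing here is part of the tool flow), each convolutional layer stores one weight per kernel element, which reproduces the 'Number of weights' row:

```python
# Layers transcribed from Table 6.1: (name, feature maps, kernels, kernel size).
layers = [
    ("C1", 6, 6, (10, 10)),   # fixed Gabor filters, not trained
    ("C3", 4, 24, (5, 5)),
    ("C5", 8, 32, (5, 5)),
    ("C6", 4, 32, (1, 1)),    # fully-connected category layer
]

for name, maps, kernels, (kh, kw) in layers:
    weights = kernels * kh * kw
    print(f"{name}: {kernels} kernels of {kh}x{kw} -> {weights} weights")
# Prints 600, 600, 800 and 32 weights, matching the table.
```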

6.2.1 Spiking ConvNet on SpiNNaker

One of the peculiarities of a ConvNet architecture is the weight-sharing property. The weights of the kernels that connect two neurons in two different feature maps do not depend on the particular neurons but only on the relative positions of the two neurons in the origin and destination feature maps. Because of this weight-sharing property, the number of synaptic weights that must be stored for a neuronal population is greatly reduced compared to populations with full connectivity and independent, non-shared weights. To optimise the processing speed, the SpiNNaker tool flow was modified to admit a special ‘convolution connector’. The ‘convolution connector’ is shared by all the neurons belonging to the same convolutional feature map population and contains the kernel weights, which are stored in the local DTCM of the corresponding population. This solution avoids reading the kernel weights from the external SDRAM each time a spike arrives at the convolution module. Each time a spike arrives at a convolution module, the kernel corresponding to the source population of the incoming spike is read from the DTCM memory and the neuron states of the neighbouring pixels are updated accordingly. If any of the updated neurons passes the firing threshold, an output spike is generated and immediately sent to the next processing layer, in the usual way neural systems are implemented on SpiNNaker. In this way, the ConvNet is truly spike- or event-driven.

Another characteristic of the ConvNet is that most of the neuron parameters (such as neuron voltage thresholds, voltage reset levels, leakage rates and refractory times) are shared by all the neurons in the same population. Only the particular neuron state and firing times are individual to each neuron. In the standard SpiNNaker tool chain, all the neuron parameters are replicated and stored individually for each neuron in the DTCM. Thus, the DTCM capacity sometimes limits the number of neurons that can be implemented per core. Here the tool chain was also modified to distinguish between the parameters that are individual to each neuron and the parameters shared by the whole population. With this approach, we are able to implement 2,048 convolution neurons per core, where this number is determined by the maximum number of addressable neurons supported by the routeing scheme.
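The modified tool chain itself runs in C on the SpiNNaker cores; the sketch below (Python, with illustrative field names) merely shows the data-layout idea: population-wide parameters are stored once, and only the per-neuron state is replicated, which is what lets 2,048 convolution neurons fit in a core's DTCM.

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class PopulationParams:
    """Stored once per population (shared by every neuron on the core)."""
    v_threshold: float
    v_reset: float
    leak_rate: float
    t_refractory: float

@dataclass
class PopulationState:
    """Only the per-neuron quantities are replicated, one entry per neuron."""
    v: np.ndarray            # membrane potential of each neuron
    last_spike: np.ndarray   # last firing time of each neuron

params = PopulationParams(v_threshold=1.0, v_reset=0.0,
                          leak_rate=0.1, t_refractory=2.0)
state = PopulationState(v=np.zeros(2048), last_spike=np.full(2048, -np.inf))
```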

6.2.2 Results

To test the recognition rate, we used a test sequence of 40 32×32 tracked symbols obtained from the events recorded with a DVS [219]. As already explained, the recording consists of a total of 174,643 spikes encoded as AER events.

We first tested the correct functionality of the ConvNet for card-symbol classification programmed on the SpiNNaker board at low speed. For this experiment, we multiplied the timestamps of all the events of the sequence reproduced by the data player board by a factor of 100. To maintain the same classification capability as the ConvNet architecture optimised for card symbol recognition, we had to multiply the time parameters (the refractory and leakage times) of the network by the same factor of 100. In Figure 6.3, we reproduce four snapshots, grabbed with the jAER board, of the composition of the input stimulus and the output category obtained with the SpiNNaker ConvNet classifier. As can be seen, correct classification of the four card symbols is obtained. These snapshots are generated by collecting and histogramming events with the jAER [178] board over 1.2 ms.

The classification of the test sequence [190] of 40 card symbols, slowed down by a factor of 100, was repeated 30 times. During the presentation of each input symbol, the number of output events generated by the correct output category was counted, as well as the number generated by each of the other three output categories. The classification is considered successful if the correct category generates the maximum number of output events. The mean classification success rate over the 30 repetitions was 97%, with a maximum of 100% and a minimum of 93%, a recognition success rate slightly higher than that obtained in the software real-time experiment [190].
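The scoring rule can be summarised by the following sketch (plain Python over hypothetical (timestamp, category) event tuples, not the actual analysis scripts): count the output events of each category during the presentation window of a symbol and take the category with the maximum count, with ties broken arbitrarily.

```python
from collections import Counter

def classify_window(output_spikes, t_start, t_end):
    """output_spikes: iterable of (timestamp, category) pairs from layer C6."""
    counts = Counter(cat for t, cat in output_spikes if t_start <= t < t_end)
    return counts.most_common(1)[0][0] if counts else None

def success_rate(windows, output_spikes):
    """windows: list of (t_start, t_end, true_category) per presented symbol."""
    hits = sum(classify_window(output_spikes, t0, t1) == true_cat
               for t0, t1, true_cat in windows)
    return hits / len(windows)
```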

Once we had tested that the SpiNNaker ConvNet classifier functionality was correct, we tested its maximum operation speed. For that purpose, we repeated the experiment for different slow-down factors of the event timings of the input stimulus sequence while, at the same time, we applied the same factor to the ConvNet
