
6.2 Symbol Card Recognition System with Spiking ConvNets

6.2.2 Results

To test the recognition rate, we used a test sequence of 40 32×32 tracked symbols obtained from the events recorded with a DVS [219]. As already explained, the recording consists of a total of 174,643 spikes encoded as AER events.

We first tested the correct functionality of the ConvNet for card-symbol classification programmed on the SpiNNaker board at low speed. For this experiment, we multiplied by a factor of 100 the timestamps of all of the events of the sequence reproduced by the data player board. To maintain the same classification capability as the ConvNet architecture optimised for card symbol recognition, we had to multiply the time parameters (the refractory and leakage times) of the network by the same factor of 100. In Figure 6.3, we reproduce four snapshots, grabbed with the jAER board, of the composition of the input stimulus and the output category obtained with the SpiNNaker ConvNet classifier. As can be seen, correct classification of the four card symbols is obtained. These snapshots are generated by collecting and histogramming events with the jAER [178] board over 1.2 ms.
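As a concrete illustration, a minimal sketch of this time-scaling step is given below, assuming the recording is held as a list of (timestamp, address) AER tuples and the network time parameters as a plain dictionary; the names and data layout are illustrative only, not the actual SpiNNaker toolchain API.

# Hypothetical illustration of scaling an AER recording and the matching
# network time constants by a common slow-down factor.

SLOW_DOWN = 100  # factor used in the low-speed experiment

def slow_down_events(events, factor=SLOW_DOWN):
    """Stretch event timestamps (in microseconds) by 'factor'."""
    return [(t * factor, address) for (t, address) in events]

def slow_down_network(params, factor=SLOW_DOWN):
    """Scale the temporal parameters of every layer by the same factor,
    so the ConvNet keeps the same classification behaviour."""
    return {
        layer: {
            "refractory_time": p["refractory_time"] * factor,
            "leakage_time": p["leakage_time"] * factor,
        }
        for layer, p in params.items()
    }

# Example: a recording of (timestamp_us, AER address) pairs
recording = [(12, 0x1A2B), (15, 0x1A2C), (40, 0x2B3C)]
slowed = slow_down_events(recording)            # [(1200, ...), (1500, ...), (4000, ...)]
net = {"C1": {"refractory_time": 50, "leakage_time": 200}}
slowed_net = slow_down_network(net)             # refractory 5000 us, leakage 20000 us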

The classification of the test sequence [190] of 40 card symbols, slowed down by a factor of 100, was repeated 30 times. During the appearance time of each input symbol, the number of output events generated by the correct output category was counted, as well as the number of output events generated by each of the other three output categories. The classification is considered successful if the number of output events of the correct category is the maximum. The mean success classification rate was 97% over the 30 repetitions of the experiment, with a maximum of 100% and a minimum of 93%, thus achieving a recognition success rate slightly higher than the one obtained in the software real-time experiment [190].
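The success criterion described above can be stated compactly. The following sketch assumes the presentation window of each symbol and the stream of (timestamp, category) output events are already available; all names are illustrative.

from collections import Counter

def classify_window(output_events, window_start, window_end):
    """Return the category with the most output spikes inside the window."""
    counts = Counter(cat for (t, cat) in output_events
                     if window_start <= t < window_end)
    return counts.most_common(1)[0][0] if counts else None

def recognition_rate(output_events, windows, labels):
    """Fraction of symbol windows whose winning category matches the label."""
    hits = sum(classify_window(output_events, s, e) == lab
               for (s, e), lab in zip(windows, labels))
    return hits / len(labels)

# Repeating the 40-symbol sequence 30 times and averaging recognition_rate
# over the repetitions gives the mean/max/min figures quoted above.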

Once we had tested that the SpiNNaker ConvNet classifier functionality was correct, we tested its maximum operation speed. For that purpose, we repeated the experiment for different slow-down factors of the event timings of the input stimulus sequence while, at the same time, applying the same factor to the ConvNet timing parameters. We repeated the classification of each test sequence 30 times, measuring the classification success rate as explained above. Figure 6.4(a) shows the mean, maximum and minimum recognition success rates obtained for slow-down factors [1, 2, 5, 15, 20, 25, 30, 50, 100, 200]. A 1× slow-down factor means real-time operation, that is, classification of the sequence of the high-speed browsed cards as they each pass in a 400 microsecond interval. We can observe that for slow-down factors higher than 25, the mean successful classification rate is higher than 90%. However, for slow-down factors lower than 25, the success recognition rate suffers from a severe degradation. In Figure 6.4(b), we have plotted the total number of output events generated at the output of the SpiNNaker classifier as a function of the input stimulus slow-down factor. We can observe that for slow-down factors below 50, the number of SpiNNaker output events decreases quickly. Another observation is that there is a local peak in the recognition rate (and the corresponding number of output events) at a slow-down factor of 10. For higher slow-down factors (15 and 20), the recognition rate and the number of output events in the category layer are lower.

Figure 6.3. Snapshots of merging the input stimulus with the SpiNNaker classifier output. The input stimulus was generated with a 100× slow-down factor over the real recording time.

Figure 6.4. (a) Recognition rate for the sequence of 40 card symbols versus the slow-down factor of the input stimulus. (b) Total number of output events generated by the output recognition layer for the whole sequence of 40 card symbols versus the slow-down factor of the input stimulus.

Going into the details of the problem, we observed that the main bottleneck limiting the operation of the system is the processing time of the events in the convolution layers. We have also observed that when events are unevenly lost in subsequent layers, the spatio-temporal congruence of the recognised patterns is lost and the recognition rate decreases. This phenomenon has already been reported by Camuñas [31], who observed that queuing events in a highly saturated event-processing system gives worse performance than simply dropping them, because queuing introduces time delays, while dropping keeps the temporal coherence of the processed events. In the present case, when events are lost simultaneously in the different processing layers, the performance is better than when one layer has a dominant delay. This explains the lower recognition accuracy for intermediate slow-down factors.
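As a toy illustration only (a caricature of the effect reported by Camuñas [31], not a model of the SpiNNaker pipeline), the following sketch contrasts queuing and dropping in a saturated event processor.

def process(events, cost_us, drop_when_busy):
    """Toy saturated processor: each event takes 'cost_us' to handle.
    Returns (output timestamp, original timestamp) pairs for processed events."""
    busy_until, out = 0, []
    for t in events:                        # input timestamps in microseconds
        if t >= busy_until:                 # processor idle: handle immediately
            busy_until = t + cost_us
            out.append((t + cost_us, t))
        elif drop_when_busy:                # saturated: discard the event
            continue
        else:                               # saturated: queue it, adding delay
            out.append((busy_until + cost_us, t))
            busy_until += cost_us
    return out

events = list(range(0, 1000, 2))            # one event every 2 us
queued  = process(events, cost_us=5, drop_when_busy=False)
dropped = process(events, cost_us=5, drop_when_busy=True)
# Queued events drift ever further from their original timestamps, destroying
# temporal coherence; dropping keeps the surviving events on time.
max_delay_queued  = max(o - t for o, t in queued)    # grows along the sequence
max_delay_dropped = max(o - t for o, t in dropped)   # stays at cost_us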

In Figure 6.2, we show in red numbers the total number of events entering the corresponding layer that have to be processed by each feature map. It can be observed that each neural population in the second convolutional layer (C3) has to process 4.7× more events per second than in the first convolutional layer (C1). As we have tried to maximise the number of neurons implemented per SpiNNaker core, this has the downside that each core in layer C3 has to process the incoming 816,163 events in 0.95 seconds for real-time operation.
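A back-of-the-envelope check using only the figures quoted above gives the per-core load in layer C3.

# Per-core event load implied by the figures above (real-time operation).
c3_events  = 816_163              # events each C3 feature map must process
duration_s = 0.95                 # duration of the real-time test sequence
rate = c3_events / duration_s     # ~859,000 events/s per C3 core
time_budget_us = 1e6 / rate       # ~1.16 us available to process each event
print(rate, time_budget_us)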

As the SpiNNaker architecture is flexible, it allows us to trade off the maximum number of neurons per core against the maximum event-processing throughput.

In a first experiment, we noticed that more than one half of the weights in the first convolution layer (C1) were zero. Zero weights in the kernel add computation time per event but do not affect the result of the computation, so we eliminated all the zero values of the kernels in the first convolution layer. In Figure 6.5, the blue trace plots the recognition rate of the original experiment (before eliminating the zero elements in the C1 kernels) and the green trace plots the recognition rate after eliminating the zero values in the C1 kernels. It can be observed that both systems perform similarly for low and high slow-down factors. However, the ‘optimised’ system has worse performance for intermediate slow-down factors. The reason is that by speeding up the operation of the first convolution layer (C1), we obtain more decorrelation between the first and second convolutional layers, as the second convolutional layer (C3) is the one causing the performance bottleneck in this particular case.

Figure 6.5. Recognition accuracy for the whole sequence of 40 card symbols versus the slow-down factor of the input stimulus when splitting each C3 neural population among several cores.
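A sketch of the kernel-pruning idea is shown below; the per-event update is assumed to walk an explicit list of non-zero kernel taps, a data layout chosen for illustration rather than the actual SpiNNaker memory format.

import numpy as np

def sparsify_kernel(kernel):
    """Keep only the non-zero kernel taps, stored as offsets from the kernel
    centre, since zero weights only add per-event processing time."""
    cy, cx = kernel.shape[0] // 2, kernel.shape[1] // 2
    rows, cols = np.nonzero(kernel)
    return [(int(r) - cy, int(c) - cx, float(kernel[r, c]))
            for r, c in zip(rows, cols)]

def apply_event(state, taps, x, y):
    """Accumulate the sparse kernel around an incoming event at (x, y)."""
    h, w = state.shape
    for dr, dc, weight in taps:             # only non-zero taps are visited
        r, c = x + dr, y + dc
        if 0 <= r < h and 0 <= c < w:
            state[r, c] += weight
    return state

kernel = np.array([[0.0, 0.5, 0.0],
                   [0.5, 1.0, 0.5],
                   [0.0, 0.5, 0.0]])
taps = sparsify_kernel(kernel)              # 5 taps instead of 9
state = apply_event(np.zeros((32, 32)), taps, 10, 10)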

In a further experiment, to speed up the operation of the second convolutional layer (C3), we mapped each neural population of layer C3 onto different SpiNNaker cores. Figure 6.5 plots the recognition rates obtained for different distributions of the feature map populations of the second convolutional layer (C3).

In these experiments, we kept the elimination of the zero kernel elements in the C1 layer. In Figure 6.5, the red trace corresponds to splitting each C3 feature map operation across 2 cores. The cyan, black and magenta traces correspond to splitting each C3 feature map across 4, 5 and 6 cores, respectively. As can be observed, the 4-core division gives the optimum performance as it equalises the delays of the different layers. When the C3 layer is sped up further, the delay of the third convolutional layer (C5) becomes dominant.
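The core-splitting experiment amounts to repartitioning the neurons of each C3 feature map over several cores; a minimal sketch of one possible partition (into contiguous row bands, with illustrative map sizes) is given below. Splitting across more cores only helps until another layer's delay dominates, which is why the 4-core division is the optimum reported above.

def split_feature_map(width, height, n_cores):
    """Divide a feature map's neurons into n_cores contiguous row bands,
    so each core handles roughly 1/n_cores of the incoming event load."""
    rows_per_core = -(-height // n_cores)       # ceiling division
    bands = []
    for core in range(n_cores):
        first = core * rows_per_core
        last = min(height, first + rows_per_core)
        if first < last:
            bands.append((first, last))         # rows [first, last) on this core
    return bands

# e.g. splitting one C3 feature map across 4 cores (the best trade-off above)
print(split_feature_map(width=10, height=10, n_cores=4))
# [(0, 3), (3, 6), (6, 9), (9, 10)]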
