EXPLOITING THE INHERENT PARALLELISMS OF BACK-PROPAGATION NEURAL NETWORKS TO DESIGN A SYSTOLIC ARRAY

Jai-Hoon Chung, Hyunsoo Yoon, and Seung Ryoul Maeng
Department of Computer Science
Korea Advanced Institute of Science and Technology (KAIST)
373-1 Kusong-Dong, Yusung-Gu, Taejon 305-701, Korea
ABSTRACT
In this paper, a two-dimensional systolic array for the back-propagation neural network is presented. The design is based on the classical systolic algorithm of matrix-by-vector multiplication, and exploits the inherent parallelisms of back-propagation neural networks. This design executes the forward and backward passes in parallel, and exploits the pipelined parallelism of multiple patterns in each pass. The estimated performance of this design shows that the pipelining of multiple patterns is an important factor in VLSI neural network implementations.
1. Introduction
As simulations of artificial neural networks (ANNs) are time-consuming on a sequential computer, dedicated architectures that exploit the parallelisms inherent in the ANNs are required to implement these networks. A systolic array [Kung82] is one of the best solutions to these problems. It can overcome the communication problems generated by the highly interconnected neurons, and it can exploit the massive parallelism inherent in the problem. Moreover, since the computation of ANNs can be represented by a series of matrix-by-vector multiplications, the classical systolic algorithms can be used to implement them.
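As a concrete illustration of that classical algorithm, the following Python sketch (added here for clarity; the cell layout and timing details are only illustrative and are not taken from any of the cited designs) simulates a linear systolic array computing y = Wx. Each cell holds one row of W and accumulates one output element while the elements of x march through the array one cell per cycle.

```python
import numpy as np

def systolic_matvec(W, x):
    """Simulate a linear systolic array computing y = W @ x.

    Cell i holds row i of W and accumulates y[i].  The elements of x
    march through the array one cell per cycle, so cell i sees x[j]
    at cycle i + j and multiplies it by W[i, j].
    """
    m, n = W.shape
    y = np.zeros(m)
    n_cycles = m + n - 1          # time for the last x element to leave the last cell
    pipe = [None] * m             # value currently held at each cell's input register
    for t in range(n_cycles):
        # shift: each cell passes its input value to the next cell
        for i in range(m - 1, 0, -1):
            pipe[i] = pipe[i - 1]
        pipe[0] = x[t] if t < n else None   # feed the next x element into cell 0
        # compute: cell i multiplies the x element it sees this cycle
        for i in range(m):
            j = t - i             # index of the x element now at cell i
            if pipe[i] is not None and 0 <= j < n:
                y[i] += W[i, j] * pipe[i]
    return y

W = np.arange(6, dtype=float).reshape(2, 3)
x = np.array([1.0, 2.0, 3.0])
assert np.allclose(systolic_matvec(W, x), W @ x)
```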
There have been several research efforts on systolic algorithms and systolic array architectures to implement ANNs. The approaches can be classified into three groups: one is mapping the systolic algorithms for ANNs onto parallel computers such as Warp, MasPar MP-1, and Transputer arrays; another is designing programmable systolic arrays for general ANN models; and the other is designing a VLSI systolic array dedicated to a specific model. All of these implementations exploit the spatial parallelism and training pattern parallelism inherent in ANNs, and suggest systolic ring array or two-dimensional array structures.
The back-propagation neural network [Rume86], in addition to the above two types of parallelism, has one more parallel aspect: the forward and backward passes can be executed in parallel with pipelining of multiple patterns. This paper proposes a systolic design dedicated to the back-propagation neural network that can exploit this type of parallelism.
In the following sections, the existing systolic implementations and the inherent parallelisms that we can exploit are first reviewed. Then a systolic array that can exploit the forward/backward pipelining of multiple patterns is presented. Finally, the potential performance of this design is analyzed.
2. Background
The Back-Propagation Model
Let us consider an L-layer (layer 1 to layer L) network consisting of N_k neurons at the k-th layer. Layer 1 is the input layer, layer L is the output layer, and layers 2 to L-1 are the hidden layers. The operations of the back-propagation model are represented by the following equations.
The Forward Pass

    net_j^k(p) = \sum_{i=1}^{N_{k-1}} w_{ji}^k \, o_i^{k-1}(p)                          (1)
    o_j^k(p) = f(net_j^k(p)), where f is the non-linear activation function             (2)

The Backward Pass

    \delta_j^L(p) = (y_j(p) - o_j^L(p)) \, f'(net_j^L(p))                               (3)
    \delta_j^k(p) = f'(net_j^k(p)) \sum_{i=1}^{N_{k+1}} \delta_i^{k+1}(p) \, w_{ij}^{k+1}   (4)

The Weight Increment Update

    \Delta w_{ji}^k = \eta \sum_p \delta_j^k(p) \, o_i^{k-1}(p)                         (5)
    w_{ji}^k \leftarrow w_{ji}^k + \Delta w_{ji}^k                                      (6)

where o_i^{k-1}(p) is the output of neuron i of layer k-1 for training pattern p, w_{ji}^k is the weight of the connection from neuron i of layer k-1 to neuron j of layer k, y_j(p) is the desired output of output neuron j, \eta is the learning rate, and the error for pattern p is E_p = \frac{1}{2} \sum_j (y_j(p) - o_j^L(p))^2, with total error E = \sum_p E_p.
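The following NumPy sketch, added here as an illustration, restates Equations (1)-(6) in code. A sigmoid is assumed for f (any differentiable activation would do), and the weight increments are accumulated over the whole pattern set and applied once, i.e. batch updating.

```python
import numpy as np

def f(net):                       # assumed sigmoid activation, Equation (2)
    return 1.0 / (1.0 + np.exp(-net))

def train_epoch(weights, patterns, targets, eta=0.1):
    """One epoch of batch back-propagation over all patterns.

    weights[k] is the weight matrix from layer k+1 to layer k+2
    (shape N_{k+2} x N_{k+1}).  Weight increments are accumulated over the
    whole epoch and applied once at the end, matching Equations (5) and (6).
    """
    increments = [np.zeros_like(W) for W in weights]
    for x, y in zip(patterns, targets):
        # forward pass, Equations (1)-(2)
        outputs = [x]
        for W in weights:
            outputs.append(f(W @ outputs[-1]))
        # backward pass, Equations (3)-(4); f'(net) = o(1-o) for the sigmoid
        o_L = outputs[-1]
        delta = (y - o_L) * o_L * (1.0 - o_L)
        for k in range(len(weights) - 1, -1, -1):
            increments[k] += eta * np.outer(delta, outputs[k])   # Equation (5)
            o = outputs[k]
            delta = (weights[k].T @ delta) * o * (1.0 - o)       # propagate error downward
        # weights stay fixed within the epoch, so patterns can be pipelined
    for W, dW in zip(weights, increments):
        W += dW                                                  # Equation (6)
    return weights
```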
Inherent Parallelisms
There are three types of parallelisms inherent in back-propagation neural networks. The first is the spatial parallelism, which can be exploited by executing the operations required in each layer on many processors. The spatial parallelism may be classified into three levels according to the degree of parallelism exploited:

S1) Partitioning the neurons of each layer into groups
S2) Partitioning the operations of each neuron into the sum-of-products operations and the non-linear function
S3) Partitioning the sum-of-products operations into several groups

These partitioned operations can be executed on different processors in parallel. Most of the neural network simulators implemented on parallel computers exploit parallelisms S1) and S2), but parallelism S3) is not exploited much due to the communication overhead (a code sketch of S1) and S2) is given at the end of this subsection).

The second type of parallelism is the pattern parallelism, which can be exploited by executing the different patterns on many processors [Pome88, Mill89, Sing90]. The pattern parallelism can be classified as follows:

P1) Partitioning the training pattern set, independent execution on many processors, and epoch update
P2) Pipelining of multiple training patterns
P3) Pipelining of multiple pattern recalls

The parallelism P2) can be exploited by allowing processors that have completed their part of the operations required to train one pattern to start training the next pattern while the other processors are still processing the previously presented patterns. The parallelism P3) can be exploited by allowing the pipelining of multiple pattern recalls.

The third type of parallelism is the forward/backward pipelined parallelism, which can be exploited by executing the forward and backward passes of different training patterns in parallel [Sing90]. This parallelism implies the exploitation of the parallelism P2). The depth of pipelining can be maximized to two times the number of neurons in the network, and it is effective if the size of the training set is much larger than the number of neurons in the network. The forward/backward pipelined parallelism can be classified according to the depth of pipelining as follows:

F1) Pass-by-pass pipelining
F2) Layer-by-layer pipelining
F3) Neuron-by-neuron pipelining
F4) Connection-by-connection pipelining
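Referring back to the spatial parallelism levels above, the following sketch (an illustration added here; the group count and the NumPy formulation are only illustrative) shows S1) and S2) for a single layer: the neurons are split into groups whose sub-products could be evaluated on separate processors, and the non-linear function is applied as a separate step.

```python
import numpy as np

def layer_forward_partitioned(W, o_prev, n_groups):
    """Spatial parallelism S1): split the neurons of a layer (rows of W) into
    groups that can be evaluated independently, e.g. each on its own processor,
    then concatenate the results.  S2): the non-linear function is a separate
    operation from the sum-of-products.  A NumPy sketch, not a systolic schedule."""
    groups = np.array_split(np.arange(W.shape[0]), n_groups)
    parts = [W[g] @ o_prev for g in groups]   # independent sub matrix-by-vector products
    net = np.concatenate(parts)
    return 1.0 / (1.0 + np.exp(-net))         # non-linear function applied afterwards
```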
Systolic Implementations of Artificial Neural Networks
There have been several systolic implementations of artificial neural networks, which can be classified as follows:
1) Mapping systolic algorithms for neural networks onto parallel computers [Mann90, Mill89, Pome88, Chin90]
2) Design of a programmable systolic array for general neural networks [Kung88, Rama90]
3) Design of a VLSI systolic array dedicated to a specific neural network model [Blay89, Blay90, Kwan90]
These also can be classified according to the dimension of their resultant systolic array. The implementations are summarized in Table 1. As shown in Table 1, the two-dimensional approaches can exploit more parallelisms. Our approach is distinguished from the other approaches in that it exploits the forward/backward pipelined parallelism described above.
Table 1. Systolic Implementations of Artificial Neural Networks
Implementation              Model              Parallelism Exploited                Target
--------------------------  -----------------  -----------------------------------  ---------------
Blayo & Hurat [Blay89]      Hopfield           Spatial S1), S2), S3)                Dedicated VLSI
Blayo & Lehmann [Blay90]    Kohonen            Pattern P3)                          Dedicated VLSI
Kwan & Tsang [Kwan90]       Back-Propagation   Spatial S1), S2), S3); Pattern P3)   Dedicated VLSI
Chinn et al. [Chin90]       Back-Propagation   Spatial S1), S2), S3)                MasPar MP-1
Kato et al. [Kato90]        Back-Propagation   Spatial S1)                          Sandy/8
Updating the Weights
The issue of the weight updating time has been controversial, with contention between researchers who update network weights continuously after each training pattern is presented (continuous updating) and those who update weights only after some subset, or often after the entire set, of training patterns (batch updating). To exploit the training pattern pipelining, we use the batch updating method, because the weights must not be changed between the forward pass and the backward pass of a pattern, as shown in Equations (1)-(6); the systolic array proposed in this paper can, however, support both weight updating methods.
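A minimal sketch of the two updating disciplines follows, using hypothetical helpers grads() and apply_update() that stand for Equations (1)-(5) and (6). Because every pattern in a batch reads the same frozen weight snapshot, their forward and backward passes may overlap in the pipeline; a batch size of 1 degenerates to continuous updating, which forbids that overlap.

```python
def pipelined_batch_training(weights, patterns, grads, apply_update, batch_size):
    """Batch updating as used in this design (sketch with hypothetical helpers:
    grads() runs the forward and backward passes of Equations (1)-(5) for one
    pattern against a frozen weight snapshot; apply_update() sums the accumulated
    increments into the weights as in Equation (6))."""
    for start in range(0, len(patterns), batch_size):
        chunk = patterns[start:start + batch_size]
        increments = [grads(weights, p) for p in chunk]   # may all be in flight at once
        apply_update(weights, increments)                 # pattern feed stalls here
```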
3. A Systolic Array for Back-Propagation Neural Network
The computation of artificial neural networks can be represented by a series of matrix-by-vector multiplications interleaved with the non-linear activation functions. The matrix represents the weights associated with the connections between two adjacent layers, and the vector represents the input patterns or the outputs of a hidden layer in the forward pass, or the errors propagated in the backward pass. The resultant vector of a matrix-by-vector multiplication is applied to the non-linear activation function and to the next layer.
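In plain linear-algebra terms (a NumPy restatement added for clarity, not the systolic schedule itself), the same stationary weight matrix serves both passes: the forward activation uses W, while the back-propagated error uses its transpose.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((3, 5))        # weights between a 5-neuron and a 3-neuron layer

o_prev = rng.standard_normal(5)        # outputs of the lower layer (forward pass)
net = W @ o_prev                       # forward: activations move "upward" through W

delta_next = rng.standard_normal(3)    # errors of the upper layer (backward pass)
err = W.T @ delta_next                 # backward: errors move "downward" through W^T
```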
To exploit the spatial parallelism, we do not consider one-dimensional arrays. And to organize the systolic array efficiently even for a network in which the numbers of neurons in adjacent layers are very different, we did not use the systolic algorithms proposed in [Kung88, Kato90].
Array Organization
In Figure 1(a), the systolic array organization between two adjacent layers is shown. One layer consists of 5 neurons and the other layer consists of 3 neurons. The weight value w_ij and the weight increment \Delta w_ij correspond to the connection between neuron i of one layer and neuron j of the other layer. Each basic cell w_ij is a computational element which contains the sum-of-products operations of the forward and backward passes and the weight-increment accumulation (Equations (1), (4), and (5)); these operations can be executed in parallel, as shown in Figure 1(b).
[Figure 1 appears here: activation values flow across the array in one direction while the back-propagated errors flow in the opposite direction, and each cell extends the incoming forward and backward partial sums by a multiply-accumulate with its stored weight.]
Figure 1. Systolic array organization between two adjacent layers of multi-layer neural networks: (a) combined design for forward and backward passes; (b) basic cell operations.
[Figure 2 appears here: input patterns are fed at one edge of the array and desired outputs at the other.]

Figure 2. An example of a systolic array organization for a (5-3-2) 3-layer neural network.
In Figure 2, an example of a systolic array organization for a (5-3-2) 3-layer network is shown. The elements of the two weight matrices are set into the two-dimensional arrays: the 3-by-5 weight matrix represents the weights associated with the connections between the input layer and the hidden layer, and the 2-by-3 weight matrix represents the weights associated with the connections between the hidden layer and the output layer. The second matrix is transposed in the figure. While the weights are updated, the feeding of training patterns is stopped and the pipeline is stalled.
Basic Cell Architecture
The internal data path of the basic cell is shown in Figure 3. It consists of three multipliers, three adders, and two registers for the weight value and the weight increment. The communication clock cycle can be two times faster than the computation clock cycle, and the cell sends its value during the idle cycles.
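The following class is a behavioural sketch of such a cell (the names and the learning-rate handling are assumptions for illustration, not the exact data path of Figure 3): per cycle it extends the forward partial sum, extends the backward partial sum, and accumulates the weight increment; the increment is applied to the weight register only while the pattern feed is stalled.

```python
class BasicCell:
    """Behavioural sketch of one cell w_ij of the array (illustrative names)."""

    def __init__(self, weight, eta=0.1):
        self.w = weight          # weight register
        self.dw = 0.0            # weight-increment register
        self.eta = eta

    def step(self, activation_in, fwd_partial, error_in, bwd_partial):
        """One cycle: the three multiply-accumulate style operations in parallel."""
        fwd_out = fwd_partial + self.w * activation_in     # forward-pass MAC
        bwd_out = bwd_partial + self.w * error_in          # backward-pass MAC
        self.dw += self.eta * error_in * activation_in     # weight-increment accumulation
        return activation_in, fwd_out, error_in, bwd_out   # values passed to neighbours

    def update_weight(self):
        """Applied while the feeding of training patterns is stalled."""
        self.w += self.dw
        self.dw = 0.0
```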
Figure 3. Internal data path of the basic cell.

4. Performance Evaluation
Let us consider an L-layer network consisting of N_k neurons at the k-th layer. The required cycles, C_fwd and C_bwd, for the forward and backward passes of this design are denoted by Equation (8):
    C_fwd = C_bwd = \sum_{i=1}^{L} N_i + L - 2        (8)
When the forward and backward passes are pipelined, the required cycles for a single training pattern are denoted by Equation (9):
    C_single = C_fwd + C_bwd - N_L + 1        (9)

Equation (9) shows only the effects of the spatial parallelism and the forward/backward pipelining. The effect of the forward/backward pipelining on performance is not great even when N_L is much greater than 1. However, it makes the pipelining of multiple training patterns possible. When multiple patterns are pipelined, the required cycles for learning p training patterns are denoted by Equation (10):
    C_p = C_single + (p - 1)        (10)

The speedup, S_p, obtained by the pipelining of p training patterns is denoted by Equation (11):

    S_p = p \cdot C_single / (C_single + p - 1)        (11)

When p >> C_single, the speedup can be approximated to C_single, which is the depth of the pipelining and depends only on the network size.
For updating the weights, the cycles during which the feeding of training patterns is stopped are denoted by Equation (12):

    C_update = 2 \sum_{i=1}^{L} N_i - N_1 - N_2 + L + 1        (12)
When we update the weights t times during the presentation of p patterns, the total required cycles are denoted by Equation (13):

    C_total = C_p + t \cdot C_update        (13)
The performance of neurocomputers is measured in MCUPS (millions of connection updates per second). Let N_c be the number of connections of the target neural network; then the MCUPS is calculated by Equation (14):

    MCUPS = (N_c \cdot p) / (C_total \cdot t_cycle \cdot 10^6)        (14)

where t_cycle is the basic clock cycle time in seconds.
In this design, the basic clock cycle time is determined as the maximum among the time required by one multiplication and one addition, the time required for the non-linear activation function, and the time required to feed the input patterns continuously. Speed measurements of several high-speed neural network implementations have been performed for NETtalk [Sejn87]. Assuming we implement this design with a basic clock cycle time of 100 nsec, 248 MCUPS can be obtained for NETtalk for a single training pattern, using only the spatial parallelism and the forward/backward pipelining; for the (128-128-128) three-layer network [Sing90] with 64K training patterns pipelined, the performance that can be obtained is 324,000 MCUPS.
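The cycle-count expressions can be checked with a small calculator. The formulas below follow the reconstruction of Equations (8)-(11) and (14) given above, and the NETtalk size (203-60-26) is an assumption taken from the Warp benchmark configuration, so the printed figures are indicative rather than authoritative (weight-update stalls are ignored).

```python
def cycles_pass(layers):
    """Equation (8) as reconstructed: cycles of one forward (or backward) pass."""
    return sum(layers) + len(layers) - 2

def cycles_pattern(layers):
    """Equation (9): forward and backward passes pipelined for one pattern."""
    return 2 * cycles_pass(layers) - layers[-1] + 1

def cycles_patterns(layers, p):
    """Equation (10): p training patterns pipelined through the array."""
    return cycles_pattern(layers) + (p - 1)

def speedup(layers, p):
    """Equation (11): speedup of pipelining p patterns over one-at-a-time training."""
    return p * cycles_pattern(layers) / cycles_patterns(layers, p)

def mcups(layers, p, cycle_ns=100.0):
    """Equation (14): millions of connection updates per second."""
    connections = sum(a * b for a, b in zip(layers, layers[1:]))
    seconds = cycles_patterns(layers, p) * cycle_ns * 1e-9
    return connections * p / seconds / 1e6

print(round(mcups([203, 60, 26], p=1)))            # assumed NETtalk size, single pattern: ~248
print(round(mcups([128, 128, 128], p=64 * 1024)))  # 64K patterns pipelined: ~324,500
```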
5. Conclusion
In this paper, a systolic array for the back-propagation neural network that executes the forward and backward passes in parallel and exploits the pipelining of multiple patterns in each pass has been presented. The estimated performance of this design shows that the pipelining of multiple patterns is an important factor in VLSI implementations of artificial neural networks. To implement a large network with this design, an efficient mapping method that still allows the exploitation of the pipelining is required as a further study.

6. References
F. Blayo and P. Hurat, "A VLSI Systolic Array Dedicated to Hopfield Neural Network," VLSI for Artificial Intelligence, Kluwer Academic Press, 1989, pp. 255-264.

F. Blayo and C. Lehmann, "A Systolic Implementation of the Self Organization Algorithm," Proc. of INNC, Vol. II, Paris, July 1990.

Chinn, et al., "Systolic Array Implementations of Neural Nets on the MasPar MP-1 Massively Parallel Processor," Proc. of IJCNN, Vol. II, San Diego, California, June 1990, pp. 169-173.

B. Faure and G. Mazare, "Implementation of Back-Propagation on a VLSI Asynchronous Cellular Architecture," Proc. of INNC, Vol. II, Paris, July 1990, pp. 631-634.

H. Kato, et al., "A Parallel Neurocomputer Architecture Towards Billion Connection Updates Per Second," Proc. of IJCNN, Vol. II, Washington, D.C., January 1990, pp. 51-54.

H.K. Kwan and P.C. Tsang, "Systolic Implementation of Multi-Layer Feed-Forward Neural Network with Back-Propagation Learning Scheme," Proc. of IJCNN, Vol. II, Washington, D.C., January 1990, pp. 84-87.

H.T. Kung, "Why Systolic Architectures?," IEEE Computer, January 1982, pp. 37-46.

S.Y. Kung and J.N. Hwang, "Parallel Architectures for Artificial Neural Nets," Proc. of ICNN, Vol. II, San Diego, California, July 1988, pp. 165-172.

R. Mann and S. Haykin, "A Parallel Implementation of Kohonen Feature Maps on the Warp Systolic Computer," Proc. of IJCNN, Vol. II, Washington, D.C., January 1990, pp. 84-87.

J.R. Millan and P. Bofill, "Learning by Back-Propagation: A Systolic Algorithm and Its Transputer Implementation," Neural Networks, Vol. 1, No. 3, July 1989, pp. 119-137.

D.A. Pomerleau, et al., "Neural Network Simulation at Warp Speed: How We Got 17 Million Connections Per Second," Proc. of ICNN, Vol. II, San Diego, California, July 1988, pp. 143-150.

U. Ramacher and J. Beichter, "Systolic Synthesis of Neural Networks," Proc. of INNC, Vol. II, Paris, July 1990, pp. 572-576.

D.E. Rumelhart, G.E. Hinton, and R.J. Williams, "Learning Internal Representations by Error Propagation," in Parallel Distributed Processing: Explorations in the Microstructure of Cognition, Vol. 1: Foundations, MIT Press, 1986, pp. 318-362.

T.J. Sejnowski and C.R. Rosenberg, "Parallel Networks that Learn to Pronounce English Text," Complex Systems, Vol. 1, 1987, pp. 145-168.

A. Singer, "Exploiting the Inherent Parallelism of Artificial Neural Networks to Achieve 1300 Million Interconnects per Second," Proc. of INNC, Vol. II, Paris, July 1990, pp. 656-660.