Simulation of Artificial Neural Networks

Gerd Kock and Nikola B. Serbedzija

GMD FIRST Berlin

Rudower Chaussee 5, D-12489 Berlin

Abstract

The purpose of this paper is to give a structured overview of the current techniques used to simulate artificial neural networks. To illustrate the variety and the complexity of the problems that occur, a short survey of artificial neural networks is presented first. Then, various simulation approaches are explained, from implementations of specific network models on general purpose parallel machines, through architectural emulations that mimic neurobehaviour in hardware, to comprehensive neurosimulators that offer comfortable environments for neuroprogramming. Each approach is presented through its rationale and is judged on its usefulness, generality, flexibility, and efficiency. The paper concludes with a summary of the results achieved so far and points out general directions and perspectives for future neurosimulations.

1 Introduction

In the last years much effort has been put into the development of artificial neural networks or connectionist systems. These systems are inspired by biological neural systems and characterised by "knowledge" which is distributed across a network of interconnected units and represented by modifiable weighted connections. Artificial neural networks do exhibit learning abilities by adaptation but, nevertheless, are very simplified abstractions of their biological counterparts.

Connectionist systems consist of a large number of simple and mutually independent processing units which solve a given task by exchanging inhibitory or excitatory signals via some interconnection pattern. In general such networks are not programmed explicitly. Rather, they are trained by presenting examples and, in doing so, they adjust automatically to solve the given task. Therefore, the connectionist approach is different from classical symbol oriented data processing, where there is a strict separation between data and an explicit program that manipulates the data.

Connectionist systems are well suited for problems which are difficult to solve by traditional data processing (e.g. problems that are difficult to describe by mathematical or other formal means). Their use is either in the domain of associative memory (where information is retrieved by part of its content or according to relations among its constituents) or in the domain of learning systems (where a problem is solved by learning through examples, with learning being either supervised or unsupervised). The typical neurosolution goes like this: (1) submit a number of examples, (2) let the system learn about the problem, (3) given a new example, offer a solution through generalization. Such a solution avoids problem specification in the sense of the conventional approaches and does not require programming. There are successful applications in domains like character, speech and image recognition, signal analysis, process control, robot kinematics, medical diagnosis, time series prediction, and financial systems. Offering an alternative way to solve problems, artificial neural networks do not aim to substitute traditional digital techniques, but to complement them.

Due to the popularity of artificial neural networks, there is an explosion of simulation systems whose number is growing on a daily basis. They range from simple programs that solve a certain problem using a particular model to general purpose simulation systems. The underlying software/hardware support varies from standard programming languages (Fortran, C) to special purpose neural network specification languages, running on PCs, workstations, parallel machines or neurocomputers.

Instead of trying to discuss all results in the domain of artificial neural network simulation one by one, the main streams, their basic principles, and their main achievements will be pointed out. Consequently, this paper will not give a cross-reference of all existing neurosimulators. Rather, it will present a critical and structured survey of methods, techniques and tools which have been used in artificial neural network simulation.


Figure 1: Approaches to simulate artificial neural networks (parallel implementation: control parallel and data parallel; hardware emulation: silicon, optical and molecular; connectionist modelling (generic features) and simulation tools: module libraries, NN description languages and graphic-menu systems)

Specific examples will, however, be used to illustrate the argumentation. The text can be useful both for developers of new simulation systems and for users of existing systems.

Before any simulation is made, a model has to be defined which emphasizes the generic features of the phenomena to be simulated. Then, according to the abstraction level of the model and the techniques to be used, different simulations can be done. In Figure 1 three main streams in artificial neural network simulation are indicated. The major goal of the first effort is to use the potential parallelism offered by existing general-purpose parallel computers to achieve better performance in neurosimulation. The generic neural model is either seen as a collection of independent processing units and control parallelism is applied, or units with their interconnections are represented by weight matrices and data parallel techniques are used to optimize neuroprocessing. Hardware approaches to emulate neurons are characterized by the development of either general purpose neurocomputers that can support various connectionist models or special-purpose neurochips that, when interconnected, can mimic a particular neuromodel. Several architectural techniques have been used to construct hardware which directly supports neuroprocessing. They are based on silicon, optical or biological technology. Finally, the application of software engineering principles in the area of neurosimulation leads to comprehensive tools for network description, implementation, execution and analysis. Neurosimulators are usually implemented on PCs, workstations or special neurocomputers. There exists no strict separation among these three streams in artificial neural network simulation. Many approaches actually combine software and hardware methods to achieve a better simulation.

The rest of the paper is structured as follows. In Section 2 the functionality of connectionist systems is outlined and some network models are pointed out. Section 3 presents various parallel implementation techniques that successfully implement particular connectionist models on general purpose parallel machines. Section 4 is devoted to the efforts to construct neurocomputers which are able to emulate neural networks.


Figure 2: Taxonomy of neural network models (mapping networks, including feedforward and hierarchical networks; associative networks; and spatiotemporal, stochastic and self-organizing systems; examples shown include backpropagation, radial basis functions, recurrent backpropagation, the Neocognitron, binary associative memory, the Hopfield network, the Boltzmann machine, the Kohonen map and the neural-gas network)

Section 5 surveys the present neurosimulators, characterised by a set of tools for the development and execution of neural network models. The conclusion summarizes the results in neurosimulation and indicates directions for further research.

2 Artificial Neural Networks

Inspired by biology, the basic principle of artificial neural networks is to solve a problem via the cooperation of a large number of simple processing units (neurons) $n_1, \ldots, n_N$. Each unit has several inputs and an output value $y_i$. This, for example, can be a bipolar value from $\{-1,+1\}$, a binary value from $\{0,1\}$, or a real value from $[-1,1]$ or $[0,1]$. The processing units are connected via unidirectional links. These links are of different strengths, which is represented by positive or negative weights $w_{ij}$, $w_{ij}$ denoting the link from $n_j$ to $n_i$. When two units are not interconnected, this can be represented by a weight $w_{ij} = 0$. The neurons are often organised in layers representing a rather regular connection scheme. The interconnection structure of a network is called its topology.

Usually, each neuron $n_i$ has a net input $net_i = \sum_{j=1}^{N} w_{ij} y_j$ which is the weighted sum of all signals impinging on $n_i$. The activation $a_i$ of neuron $n_i$ is then computed by an activation function $F_i$: $a_i = F_i(net_i)$. Common activation functions are the sigmoid function $x \mapsto 1/(1 + \exp(-x))$ or the sign function ($x \mapsto 1$ for $x \geq 0$, else $x \mapsto -1$). In general, the output value $y_i$ of neuron $n_i$ is a transformation of the activation $a_i$, which is done by an output function. However, very often this output function is the identity mapping $x \mapsto x$. Therefore, in the following text, activation $a_i$ and output $y_i$ will be identified.
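To make these definitions concrete, the following minimal Python sketch (not part of the original text; the names sigmoid and unit_output are illustrative) computes the net input and the resulting output of a single unit, with the output function taken to be the identity:

    import numpy as np

    def sigmoid(x):
        # common activation function x -> 1 / (1 + exp(-x))
        return 1.0 / (1.0 + np.exp(-x))

    def unit_output(w_i, y):
        # net_i = sum_j w_ij * y_j, then a_i = F_i(net_i) = y_i
        net_i = np.dot(w_i, y)
        return sigmoid(net_i)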

In some network models in which neurons are organised in layers, their application (i.e. their use after the weights have been adjusted) consists of activating each neuron only once, starting with the input layer and proceeding to the output layer. In other network models the activation functions are executed until a fixed point is reached, i.e. until there is no more alternation when computing $y_i = F_i(net_i)$.

Figure 3: Feedforward network example (input layer, hidden layer, output layer)

With respect to a given problem, the art of designing a connectionist solution consists of specifying a suitable topology, choosing suitable activation functions, and determining suitable weights. The latter is done using so-called learning algorithms, a variety of which have been developed. Different choices for the mentioned parameters have led to a vast number of network models which cannot be compared with each other easily. There are also models which are driven randomly or which have dynamically developing topologies. Figure 2 presents a taxonomy of neural network models. To give an impression of the variety of models, the so-called backpropagation network, the Hopfield network, and the Kohonen network are outlined. These networks are specific examples of feedforward networks, associative memories, and self-organizing maps, respectively.

2.1 Feedforward Networks

Feedforward networks are so-called heteroassociative networks. For a given set of training data $\{(x,t)\}$ they are able to learn the mapping $x \mapsto t$. In this, input $x = (x_1, \ldots, x_I)$ and target $t = (t_1, \ldots, t_O)$ usually are vectors. In general, learning this mapping is not the only goal of training. Another goal is that the network is able to "generalize", i.e. after training it should be able to map an input $x$, which has not been seen so far, in a "sensible" way.

A feedforward network with one hidden layer consists of a total of three layers: beside the hidden layer there is an input and an output layer (see Figure 3). The input layer has $I$ neurons with activations $y^1_1, \ldots, y^1_I$, the hidden layer has $H$ neurons with activations $y^2_1, \ldots, y^2_H$, and the output layer has $O$ neurons with activations $y^3_1, \ldots, y^3_O$. All neurons of the input layer are connected to the hidden layer neurons ($w^{21}_{ij}$, $i = 1, \ldots, H$, $j = 1, \ldots, I$) and all neurons of the hidden layer are connected to the output layer neurons ($w^{32}_{ij}$, $i = 1, \ldots, O$, $j = 1, \ldots, H$).

The only task of the input layer is to present an input $x = (x_1, \ldots, x_I)$, i.e. $y^1_i = x_i$, $i = 1, \ldots, I$. The net input of the hidden layer is $net^2_i = \sum_{j=1}^{I} w^{21}_{ij} y^1_j$, $i = 1, \ldots, H$. Often, the activation function of the hidden layer neurons is the sigmoid function, i.e. $y^2_i = 1/(1 + \exp(-net^2_i))$, $i = 1, \ldots, H$. Finally, the net input of the output layer neurons is $net^3_i = \sum_{j=1}^{H} w^{32}_{ij} y^2_j$, $i = 1, \ldots, O$. Sometimes this net input is transformed by an activation function, but often it is the immediate output: $y^3_i = net^3_i$, $i = 1, \ldots, O$.
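The forward computation just described can be sketched in a few lines of Python (an illustration, not code from the paper; the weight matrices W21 and W32 and the function names are assumptions):

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def forward(x, W21, W32):
        # W21: H x I weights from input to hidden layer,
        # W32: O x H weights from hidden to output layer
        y1 = x                  # the input layer only presents x
        y2 = sigmoid(W21 @ y1)  # net2_i = sum_j w21_ij * y1_j, sigmoid activation
        y3 = W32 @ y2           # linear output units: y3_i = net3_i
        return y1, y2, y3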

A simple application or recall of a feedforward network consists of computing the network output $y = (y^3_1, \ldots, y^3_O)$ for a given input $x = (x_1, \ldots, x_I)$.

Figure 4: Hopfield network example

Before the network can be applied like this, it has to be trained by presenting the example pairs $\{(x,t)\}$. First of all the weights are initialized randomly. Then, for a given example input $x = (x_1, \ldots, x_I)$, the network output $y = (y^3_1, \ldots, y^3_O)$ is compared with the target vector $t = (t_1, \ldots, t_O)$, and the weights are modified in order to reduce the difference between the network output and the target vector. An important learning algorithm is the so-called backpropagation algorithm, which essentially is a gradient descent method minimizing the quadratic error measure $\sum_{i=1}^{O} (t_i - y^3_i)^2$ (seen as a function of the weights). Learning can be done either in online or in batch mode. Within online learning, the weights are adjusted each time a training example has been presented; within batch learning, weight adjustment takes place only after all training examples have been seen. In any case, usually the set of training data has to be presented many times. For this reason the training of a network is often very time-consuming. Once a network is trained, a recall is done in one forward pass. The backpropagation training algorithm is an example of supervised learning. As its name implies, the errors of the output neurons are propagated backward through the network such that the error contributions of the other neurons are computed successively.

Backpropagation networks are by far the most widely applied connectionist systems and there are a lot of variants with respect to the number of hidden layers, the degree of interconnection, learning algorithms etc. In principle, feedforward networks can approximate "any" function, and this accounts for the many fields of application. E.g. they are used for pattern recognition, time series analysis and prediction, data compression, control, etc.

2.2 Associative Memories

The so-called Hopfield network is an example of an associative memory and exists in many variants. Here, a network aiding in storing and reconstructing patterns is outlined (see Figure 4). A pattern is represented by a vector of $N$ bipolar components $\pm 1$. Accordingly, the network has $N$ neurons with bipolar output values $y_1, \ldots, y_N$. The neurons are fully interconnected: there are weights $w_{ij}$ for all $i, j = 1, \ldots, N$, $i \neq j$. Initially, all weights are set to 0. For a given pattern $(x_1, \ldots, x_N)$ to be stored, the weights are modified according to the so-called Hebbian learning law: $w_{ij} = w_{ij} + x_i x_j$.

Of course, it is not possible to store an arbitrary number of patterns: one has to pay attention to the capacity of the network. In the case that the occurrence of all patterns is equally probable, a network of $N$ neurons is able to store about $0.138N$ patterns.

Figure 5: Kohonen network example

The procedure for reconstructing a noisy pattern $(x_1, \ldots, x_N)$ is as follows. At first, $y_i$ is set to $x_i$, $i = 1, \ldots, N$. Then one of the neurons is randomly chosen and the corresponding net input $net_i = \sum_{j \neq i} w_{ij} y_j$ is computed. If $net_i \geq 0$ one sets $y_i = +1$, otherwise $y_i = -1$. The process of randomly choosing a neuron and recomputing its activity is repeated until the network output is stable, i.e. until there is no more change in the activity of any neuron. It is guaranteed that this stable state will be reached, and in the case that not too many patterns have been stored, this state will be the reconstruction of the noisy pattern.
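The storage and reconstruction procedures can be summarised in a short Python sketch (illustrative only; the fixed number of update steps stands in for the "repeat until stable" criterion of the text, and all names are assumptions):

    import numpy as np

    def store(patterns, N):
        # Hebbian learning: w_ij = w_ij + x_i * x_j, weights start at 0, w_ii = 0
        W = np.zeros((N, N))
        for x in patterns:                 # each x is a vector of +/-1 components
            W += np.outer(x, x)
        np.fill_diagonal(W, 0.0)
        return W

    def recall(W, x_noisy, steps=1000, seed=0):
        # asynchronous update: repeatedly pick a random neuron and recompute it
        rng = np.random.default_rng(seed)
        y = np.array(x_noisy, dtype=float)
        for _ in range(steps):
            i = rng.integers(len(y))
            net_i = W[i] @ y               # sum over j != i of w_ij * y_j (w_ii = 0)
            y[i] = 1.0 if net_i >= 0 else -1.0
        return y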

2.3 Self-Organizing Maps

Self-organizing maps are inspired by the cerebral cortex of the human brain. The cortex essentially is a large sheet consisting of six layers of neurons. With respect to different tasks, the different regions of the cortex can be viewed as ordered feature maps. For example, neighbouring neurons of the tonotopic map react to similar sound frequencies. The so-called Kohonen network or Kohonen map is an attempt to utilize this biological principle for technical applications.

In the standard case, the Kohonen network consists of nodes $n_i$, $i = 1, \ldots, N$, which are organized as a two-dimensional layer (see Figure 5). Each node has an $n$-dimensional weight vector $w_i$, where each weight vector represents a specific constellation of the features. Initially, the weights are set randomly. The training of a Kohonen network is an example of an unsupervised learning algorithm. It uses a neighbourhood function which, for example, assigns the numbers 0 and 1 to pairs of node indices: $\phi(i,j) \in \{0,1\}$. The number 1 is assigned to the indices of "neighbouring" nodes, and otherwise 0 is assigned. During training, the notion of neighbourhood becomes more and more restrictive, and at the end $\phi(i,j) = 1$ holds true only for $i = j$. When the next input $x = (x_1, \ldots, x_n)$ is presented to the network, first of all the winner node $n_{i_0}$ is determined. This is the node for which $\|w_{i_0} - x\| \leq \|w_i - x\|$ holds true for all $i = 1, \ldots, N$. Here, $\|\cdot\|$ stands for the Euclidean norm. In the second step, the weights are adapted: $w_i = w_i + \epsilon \, \phi(i_0, i) \, (x - w_i)$, where $0 < \epsilon < 1$. I.e. the weight vector of the winner node and the weight vectors of its neighbours are moved a step in the direction of the input vector $x$. The size of the step depends on the learning rate $\epsilon$. Like the neighbourhood, this rate is shrunk during the training process.
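One training step of this scheme might look as follows in Python (a sketch under assumptions: a binary neighbourhood of fixed radius on the node grid, with eps and radius meant to be shrunk by the caller over time; all names are illustrative):

    import numpy as np

    def kohonen_step(W, grid, x, eps, radius):
        # W: N x n weight vectors, grid: N x 2 coordinates of the nodes
        # 1) winner node i0: the node whose weight vector is closest to x
        i0 = np.argmin(np.linalg.norm(W - x, axis=1))
        # 2) binary neighbourhood phi(i0, i): 1 for nodes within 'radius'
        #    of the winner on the two-dimensional layer, 0 otherwise
        phi = (np.linalg.norm(grid - grid[i0], axis=1) <= radius).astype(float)
        # 3) w_i = w_i + eps * phi(i0, i) * (x - w_i)
        W += eps * phi[:, None] * (x - W)
        return W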

If everything works fine during training, the Kohonen map folds into the space from which the input feature vectors stem. That means that for two given input vectors two neighbouring nodes are the winners if and only if the two input vectors represent similar features. In this ideal case the Kohonen network is a topology-preserving map of the input data to the competitive units. A famous application of the Kohonen network has been its use as part of the phonetic typewriter which translates spoken Finnish language into written language. A two-dimensional Kohonen map was used to represent phonemes. It was trained with a 15-component spectral analysis of spoken words sampled every 9.83 milliseconds. After training, the resulting map was calibrated by using the spectra of phonemes as input vectors.

Another application has been to establish a relation between the angles of a robot arm and the position in space of the end effector. Here, the property of being a topology preserving map is essential for using the map in controlling robot arm movements.

2.4 Summary and Comments

The three network types mentioned above exist in many variants, there are other models not covered by these types, and there are networks combining different types. The Kohonen network, for example, can be used for vector quantization; another important method in this domain is the so-called neural-gas network. A further example of a feedforward network is the so-called RBF network (RBF: radial basis function). Here, the hidden units usually employ Gaussian activation functions $x \mapsto \exp(-(\|x - \mu\| / \sigma)^2)$; the centers $\mu$ of these functions often are determined by vector quantization. Variants of the Hopfield network are often used for solving optimization problems; in this case, the weights are derived from an energy function which is to be minimized.

The goal of many methods in neural computation can be stated in terms of minimizing an error function or an energy function. The actual algorithm often is a gradient descent algorithm. To avoid local minima, however, other techniques are employed. Examples are probabilistic and evolutionary methods, both resulting in high computational needs.

Also, there are network topologies explicitly derived from the structure of problems. An example is the Neocognitron, where the analysis of an input field (in the simplest case a letter or digit) is started by many local receptive fields looking for specific features in small parts of the input field. These local features are, from layer to layer, gradually put together for further analysis.

The reader interested in more details about the networks discussed above, or interested in variants or other network models, may start with one of the following textbooks: practically oriented introductions are [8, 9, 15, 20], more theory can be found in [16], and [14, 59] almost have the character of compendia.

3 Parallel Implementation Techniques

The first parallel implementations of artificial neural networks were done on general purpose parallel machines. In the late eighties and early nineties, when most of these approaches were exercised, the major challenge was to achieve maximal performance on the available parallel machines performing this non-trivial computation task. The most often reported work was in the domain of backpropagation networks, the most popular and widely applied model, and the favourite machines were the Connection Machine, MasPar, systolic arrays, and transputers.

The common strategy shared by all parallel neuroimplementations on general-purpose parallel computers was to speed up the processing using intrinsic artificial neural network features and special characteristics of the target architectures. Before the concrete examples are presented, some general remarks on parallel neurocomputing are given, including parallel decomposition, computation and communication demands, target architectures and performance.

Parallel decomposition is the major challenge in any parallel application. In [39] a widely accepted categorization has been proposed, suggesting a number of structuring approaches for parallelizing feed-forward networks: training session parallelism, training example parallelism, layer parallelism, node parallelism, weight parallelism and bit parallelism. This categorization can be applied to many other neural models as well.

Training session parallelism means to train a given network simultaneously with different learning parameters on the same training examples. For this kind of structuring one needs powerful computing nodes, such that different sessions can be placed on different processors.

Training example parallelism means simultaneous learning on different training examples within the same session, i.e. it implements batch learning. A given training set is split into a number of subsets and the corresponding number of network instances are trained simultaneously. For each instance and for each training subset the weight updates are accumulated. At the end, the accumulated weight updates are brought together and the weights are changed. For this type of structuring the different training subsets are distributed over the different processors.
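A minimal Python sketch of this structuring (illustrative only; grad_fn is a hypothetical function returning the accumulated weight update for one example, and in a real implementation each subset would be processed by its own processor rather than in a loop):

    import numpy as np

    def batch_update(subsets, W, grad_fn, eta=0.1):
        # training example parallelism: one network instance per subset,
        # local weight updates are accumulated and applied in a single step
        total_delta = np.zeros_like(W)
        for subset in subsets:               # conceptually: one subset per processor
            delta = np.zeros_like(W)
            for x, t in subset:
                delta += grad_fn(W, x, t)    # accumulate local weight updates
            total_delta += delta
        return W + eta * total_delta         # synchronous weight change at the end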

Layer parallelism, applicable to networks that contain more than one layer, provides concurrent computation of the layers. The layers are organized into a pipeline such that several training examples are fed through the network and each layer works on a different training example, improving overall throughput. To exploit this type of structuring the different layers are distributed over the different processors.

Node parallelism, applicable to all connectionist models, means parallel execution of the units (nodes). All processing units perform weighted input summation and other computation in parallel. Weight parallelism further refines node parallelism, allowing for simultaneous calculation of each weighted input. This form of structuring is also possible for most connectionist models.

Bit serial parallelism is the finest division of processing, where each bit is processed in parallel. It is very much hardware dependent and can be combined with other forms of structuring.

Theoretically, any neural network model allows for parallel evaluation within nodes. That makes node and weight parallelism both natural and the most often used structuring methods within parallel implementations. Training example and layer parallelism are restricted to those models that allow for batch training, basically feed-forward networks with back-propagation learning.

Computation demands are very high within neural simulations, specifically in the learning phase, when weights are to be adjusted. A neuroprogram is often structured such that input or activation values are grouped into a vector and weights are arranged into a matrix reflecting the neurons' interconnections. The actual neuroprocessing is to multiply a vector with the corresponding matrix and then to apply the appropriate activation function to the resulting elements, forming another vector of activation values. That implies a need for efficient vector-matrix multiplication, i.e. multiply-and-add operations. In many models, output values have to be compared and a maximum value has to be extracted, therefore there is also a need for maximum finding and comparison operations.

Communication and synchronization are key issues in parallel processing. Whatever structuring approach to parallelize a connectionist model is applied, one has to deal with massive data flow between processing elements. Although different approaches have different communication demands, there are a few communication patterns most commonly used within parallel architectures:

Broadcast is a common way of propagating values within neural networks, i.e. a unit sends a value to all nodes that are connected to it. It is also the fastest way to propagate data in parallel architectures. Broadcast is often used together with centralized synchronization (typical SIMD architectures).

Circulation is another efficient way to propagate values between processor arrays. It is specifically convenient when processing is synchronized such that processors form a pipeline with data arriving regularly and being partially processed (systolic arrays).

General routing is a technique that provides communication by message-passing, such that a message can be sent from one processing element to another. General routing may cause an overhead because it requires extra processing for determining the shortest path for a message and for the actual forwarding. The synchronization is usually explicit and is incorporated into send/receive mechanisms. This communication scheme is typical for MIMD architectures. A communication strategy has to be carefully considered for each parallel decomposition and for each specific computer architecture. It should allow for an efficient balance between computation and communication and should reflect the mutual dependence between algorithms and architecture. For example, broadcast can be exploited within architectures where numerous processors perform the same instruction on different data, communication by circulation should be used when an algorithm can be arranged for processing on systolic arrays, and general routing is optimal for architectures where different programs are executed on different processors.

Figure 6: Taxonomy of neurosimulations on general-purpose parallel computers (data-parallel: node/training and node/weight parallelization on the CM, MasPar and systolic arrays; control-parallel: layer/node parallelization, partitions and virtual neurons on transputers and hypercubes)

Target architectures are general purpose parallel machines which may be roughly divided, for the purpose of this paper, into two broad categories: data parallel and control parallel machines. Data parallel architectures simultaneously process a large set of different data using centralised (typically SIMD) or regular (e.g. pipelined) control flow. Control parallel architectures perform processing in a decentralised manner, allowing different programs to be executed on different processors (typically MIMD).

Data parallel architectures are well known for their efficiency in numerical computation, and as neurocomputing requires fast vector-matrix calculation, these architectures are often used. To match existing solutions for fast multiply-and-add operations, the neuroproblem is usually presented in the following way:

1. take a vector of input values;

2. multiply each input with corresponding weights;

3. accumulate the weighted sum;

4. calculate the non-linear transfer function;

5. multi-cast the vector of output values to the destination units.

Whether weighted connections are kept in a global memory or are circulated (together with activation values) among processor arrays is an architecture-dependent decision, but such an organisation of work appears to be very efficient and convenient for neurosimulations; a sketch of this scheme is given below.
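A minimal Python sketch of steps 1-5 for a single layer, written in the vector-matrix form described above (illustrative only; sequential NumPy code stands in for the data-parallel hardware, and the multicast of step 5 simply becomes returning the output vector):

    import numpy as np

    def layer_step(W, x):
        # 1. take the vector of input values x
        # 2./3. multiply each input with its weights and accumulate the sums
        net = W @ x
        # 4. calculate the non-linear transfer function
        y = 1.0 / (1.0 + np.exp(-net))
        # 5. the output vector y would then be multicast to the destination units
        return y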

The dominant characteristic of control parallel architectures is that processing is organized around control flow, providing execution of different threads on different processors. This approach is much more difficult to conceptualize because communication and synchronisation are explicit and parallelism can hardly be automated (contrary to data parallel techniques, where data dependences can be determined automatically). Common architectural properties of control parallel machines are: a number of independent processors (usually more complex than in data-parallel machines) with local memory, high-bandwidth interconnection channels, absence of global memory and global clock, and message-passing communication. The basic problem that has to be solved within control parallel implementations of neurosystems is the mapping of a great number of relatively simple processing units with an enormous number of interconnections onto an architecture that usually has fewer processors and much fewer communication paths.

The two mentioned categories require significantly different styles of programming. Furthermore, within each category there is a variety of architectural solutions with respect to the interconnection scheme and control flow. Figure 6 presents a taxonomy of parallel techniques for neurosimulations on general-purpose parallel computers.

Performance is an important issue within neuroimplementations. Since weighted-connection computation and weight updates are the most time-demanding operations, the following units of measure are commonly accepted for the evaluation and comparison of neuroimplementations:

CPS - Connections Per Second - measures how fast a network performs the mapping from input to output. The number of connections that can be calculated in one second gives a fair picture of the performance of a certain network implementation on a concrete architecture. Of course, there are other factors that can influence such a measure, like the precision used in calculations, or the choice of the nonlinear activation function (which is implicitly included in the measure).

CUPS - Connection Updates Per Second - measures how fast a network learns. The previous measure illustrates the speed of a system in the recall phase, but does not say anything about the learning phase, which is usually more time-critical as it includes not only evaluation but also the update of the weights. Usually, CUPS is between a fifth and a half of the CPS.

EPS - Epochs Per Second - measures how often the training set can be processed per second. It is an alternative to CUPS and reflects the speed of learning. Here, an epoch is defined as the processing of each example from the training set. Such a measure is very much "problem-oriented" and depends on the possibility to clearly define epochs within a given training set.
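As a purely illustrative calculation (all timing numbers below are assumed, not measured values from the paper), these measures can be related to a network of the NETtalk size used later in Section 3.2 (203 input, 60 hidden and 26 output units):

    # 203-60-26 network: number of weighted connections
    connections = 203 * 60 + 60 * 26          # = 13,740
    recall_examples_per_sec = 1000.0          # assumed measured recall rate
    cps = connections * recall_examples_per_sec
    train_examples_per_sec = 300.0            # assumed measured training rate
    cups = connections * train_examples_per_sec
    training_set_size = 5000                  # assumed size of the training set
    eps = train_examples_per_sec / training_set_size   # epochs per second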

A lot of databases for neural network benchmarking (speed and generalization of learning algorithms etc.) have been established. However, there are still no standards, although there is some work going on in this field [43]. For benchmarking the implementation performance, the NETtalk application [49] has very often been used. NETtalk uses a backpropagation network to translate a text into phonemes. The network has 29 input units (26 for English letters and 3 for punctuation characters), 60 to 120 hidden units and 21 output units (representing different phonemes).

3.1 Simulation on the Connection Machine

The Connection Machine is a massively parallel computer with up to 64K processors, each with 64K or 256K bits of local RAM. Processors are connected in a hypercube topology, which permits efficient n-dimensional grid communication. Local neighborhood communication is additionally supported, and for arbitrary message-sending general routing is used. The CM-2 uses a conventional computer such as a VAX or SUN as a front-end machine. Having numerous simple processors with efficient connection abilities, the CM used to be one of the most popular architectures for the implementation of artificial neural networks [46, 50, 58].

Node-and-training set parallelism is a combination of the two structuring techniques and is the most popular technique used for the parallel implementation of backpropagation networks. Zhang et al. [58] were among the first to show how node and training set parallelism (i.e. batch learning) can be combined. They organized the parallel implementation of a backpropagation network so that one processor is used to store a node from each of the layers of the network. With such a strategy "a slice" of nodes (one from each of the layers) is placed on a single processor. The number of processors needed to store a network is equal to the number of nodes in the largest layer of the network. The weights are stored in a memory structure that is shared by groups of 32 processors (reflecting the specific CM architecture which allows a 32-bit number to be stored across 32 processors, all sharing a floating point unit and efficient local memory access). Having 64K processors, the CM is a perfect candidate for training example parallelism. The authors called this replication, as they replicate networks to make full use of the machine (e.g. if $n$ is the number of nodes in the largest layer, then $m$ is the number of replications such that $n \cdot m \leq 64K$).

The simulation has three phases: forward-pass, backward-pass and weight-update. Each phase uses circular-rotate for interprocessor communication. Due to the memory sharing, described above, both memory saving and speed increase are achieved.

The authors have tested backpropagation networks of different sizes using NETtalk as a benchmark. The peak performance achieved in the training phase is 40 MCUPS, and the forward pass performs at 180 MCPS.

Node and weight parallelism is best illustrated by Rosenberg and Blelloch's implementation [46] of a backpropagation network. They organized the CM processors into a one-dimensional array such that one processor is allocated for each node and two processors are allocated for each weight, one for the output and one for the input side of a connection. The forward pass is implemented by spreading the activation value to the processors that hold weights for the outgoing connections of each node. The outgoing-weight processors multiply the value with their respective weights and forward the products to the corresponding input processors, such that the products are incrementally added at the destinations (incoming-weight processors). After the sums are accumulated and sent to the node processors, the sigmoid operation is performed and new activation values (for the next layer) are prepared. The backpropagation phase is performed in a similar way.

The implementation was tested on NETtalk and a maximum speed of 13 MCUPS was achieved.

3.2 Simulation on MasPar

The MasPar computer is a typical SIMD machine with up to 16K processing elements organized in an array. Processing elements are connected by a 3-stage crossbar, controlled by a router which provides up to 1K simultaneous connections. For local communication there is a 2-dimensional network which connects each processing element with its neighbors. Each processor has forty 32-bit registers, a 4-32-bit integer ALU, a floating-point unit, and a 4-32-bit broadcast bus. The whole system is controlled by a central control unit which is connected to a Unix subsystem.

The parallel implementation on a MasPar [56] for a backpropagation network is similar to Zhang's implementation on the Connection Machine. The network is placed on an array of processors such that each processor contains a vertical set of neurons (one neuron from each layer), making the total number of processors equal to the number of neurons in the largest layer. Each processor stores the weights of the corresponding neurons in its local memory. In the forward phase, the weighted sum is evaluated with intermediate results rotated from right to left (using MasPar's local interconnection scheme). Once the input value is evaluated, a sigmoid activation function is applied and the procedure is repeated for the next layer. In the backward phase, a similar procedure is performed from the output layer down to the input layer.

The described implementation exploits node parallelism. Similar to Zhang's implementation, as the MasPar computer has many more processors than an average neural net has nodes per layer, the node parallelism can be combined with training example parallelism (i.e. batch learning). This was achieved by placing multiple copies of the same network on the available processors. It was particularly convenient to exploit the two-dimensional connection scheme of the MasPar computer such that one instance of the network is placed along one dimension and network copies are duplicated along the other dimension. The batch learning was performed by first accumulating the weight changes within each copy of the network and then synchronously updating the weights.


Due to the high number of processors and a careful use of architectural advantages, the MasPar computer is one of the most efficient hosts for neurosimulations. The maximal performance obtained on MasPar, measured on the NETtalk benchmark with the backpropagation model using 203 input, 60 hidden and 26 output neurons, has been 176 MCPS in the recall phase and 42 MCUPS in the learning phase.

3.3 Simulation on Systolic Arrays

Systolic arrays are specific hardware architectures that aim at mapping high-level computation directly into hardware. Numerous simple processors are arranged in one- or more-dimensional arrays performing simple operations in a pipelined fashion. Communication is arranged such that data arrive at regular time intervals from (possibly) different directions, where they are processed and pipelined for further processing. A major efficiency gain is achieved through pipelining, which significantly reduces memory accesses and improves overall throughput. Nevertheless, implementations on systolic arrays, though efficient, are restricted to a certain class of problems and are very much hardware dependent.

In [41] the authors used the Warp computer with 10 processors organised into a "systolic ring". In the forward phase the activation values are shifted circularly along the processor array and are multiplied by the corresponding weights. Each processor accumulates the partial weighted sum. When the sum is evaluated, the activation function is applied. For the backward phase, the processing is similar, only instead of activation values, accumulated errors are shifted circularly. The performance obtained for the NETtalk application was 17 MCUPS.
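The circular shifting of activation values can be sketched as follows (a rough, sequential Python illustration of the idea, not the Warp implementation itself; in a real systolic ring the per-processor loop bodies run concurrently and the shift is a hardware data transfer):

    import numpy as np

    def ring_forward(W_parts, y, P):
        # W_parts[p]: the rows of the weight matrix held by processor p
        # y is split into P blocks which are rotated around the ring so that
        # every processor sees each block once and accumulates its partial sum
        blocks = np.array_split(y, P)
        cols = np.cumsum([0] + [len(b) for b in blocks])    # column offsets
        partial = [np.zeros(W_parts[p].shape[0]) for p in range(P)]
        held = list(range(P))           # which block each processor currently holds
        for _ in range(P):
            for p in range(P):
                b = held[p]
                partial[p] += W_parts[p][:, cols[b]:cols[b + 1]] @ blocks[b]
            held = held[-1:] + held[:-1]                    # circular shift
        return [1.0 / (1.0 + np.exp(-s)) for s in partial]  # apply the activation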

3.4 Simulation on Transputers

A Transputer system is a typical representative of a MIMD machine. It consists of a number of transputers, 32-bit RISC stack-machines with up to 4KB RAM and a maximum access rate of 80 MBytes/sec. Each transputer has four bidirectional communication links supported by DMA working in parallel with the CPU, operated at 1.8 MBaud in both directions. Special hardware support for communication, process switching and floating-point calculation make the Transputer an efficient and popular parallel machine. Numerous neuroimplementations on transputers [51, 35] have been reported.

One problem that neuroimplementations have to deal with is to match the usually high number of neurons onto the relatively small number of transputers. Some approaches provide support for virtual neurons and place several virtual neurons on the same transputer, others divide the neurocalculations into subtasks and place different tasks on different machines. In both approaches, communication efficiency remains the biggest problem, since communication requires extra processing for handling message-routing.

In [51] the authors describe a back-propagation implementation on the transputer system T8000, consisting of up to seven transputers (more transputers can be added). They divided the transputer system logically into one master and many slaves. The master machine controls the computation and maintains global tables of errors (needed for backpropagation) and of the state of the neurons. Each slave is allocated a package (sub-task) to work on and is synchronised by the master. A multi-layer feedforward network is "vertically divided" such that each slave contains a fragment of the nodes from each layer. Computation is synchronised by the master such that the layers are computed in sequence. The number of slaves corresponds to the number of parallel activities that perform the calculations per layer. Each slave maintains a local table of errors and states and receives other global information from the master, keeping the system consistent. Communication with the master is the major bottleneck within this implementation, as due to the transputer interconnection scheme, some slaves are not directly connected to the master.

The authors did not report on which neuroapplication they tested their implementation, but they claim performance improvements with a higher number of transputers, indicating their peak performance, achieved with 7 transputers, as 58.2 KCUPS for a smaller network (3-30-30-1 = 1051 interconnections) and 207 KCUPS for a larger network (3-150-150-1 = 23251 interconnections). The better performance for the larger network is due to the fact that the ratio of computation to communication is much better for bigger networks (though the communication time increases slightly for the bigger network, the slaves are much better utilised).

3.5 Summary and Comments

The significance gained by parallel neuroimplementations on general-purpose parallel computers is twofold: firstly, a number of parallel techniques have been discovered and, secondly, architectural features convenient for neuroprocessing have been outlined. This experience was crucial for the development of neurohardware.

According to the performance figures, the data parallel machines showed the best results. Node-and-training set and node-and-weight parallelism seem to be the most appropriate parallel techniques. Processor arrays organized either in a SIMD manner with broadcast communication or as systolic arrays with circular communication offer the most efficient platforms for neurosimulations. On the other hand, control-parallel architectures were not as successful hosts for neurosimulations. Straightforward implementations of neurons as independent processing units on MIMD architectures require too much overhead. It is often the case that MIMD architectures were used in a data-parallel manner [17, 24] to gain better performance and avoid the problems of communication overhead and topology mappings.

There are not many survey papers dedicated solely to simulations of artificial neural networks on general-purpose parallel computers. For more detailed information the interested reader should look into the original papers referenced throughout this section.

4 Hardware Technology for Neurosimulation

A hardware approach to support the simulation of connectionist systems is characterised by the development of a special purpose device to directly mimic the behaviour of the underlying models. The resulting machine is called a neurocomputer and is able to execute a single or various connectionist models.

The ultimate goal of the novel architectures is the development of simple processors with the possibility of massive interconnections that should better imitate neuroprocessing and increase execution speed. Well-known architectural techniques like caching, instruction and memory pipelining, superscalar technology and bit-serial processing are applied [10], as well as direct hardware support for specific neurooperations. Nevertheless, the question remains whether those digital techniques are sufficient or even adequate for the neuroprocessing paradigm. Thus, there is an increasing interest in the use of other technologies to emulate artificial neural networks. Special attention has recently been paid to optical and molecular computing. Efforts to exploit these alternative approaches add yet another colour to the multi-disciplinary spectrum of artificial neural network research. They are still in the experimental phase, but convincing results will certainly motivate more and more future work in this direction. Figure 7 presents a taxonomy of hardware approaches for neurosimulations.

Neurocomputers can be roughly divided, according to their constituent components, into: (1) custom design neurocomputers and (2) programmable neurocomputers. Custom design neurocomputers include specially designed chips that support a particular neural network model, while programmable neurocomputers consist of commercially available general-purpose chips that can support a wider range of artificial neural network models.

4.1 Custom Design Neurocomputers

Custom design neurocomputers or special purpose neurocomputers represent hardware implementations of specific neural network models. The difficulty in the construction of such a machine lies not only in the limitations of today's hardware technology but also in the complexity of the operation of certain artificial neural network models (e.g. backpropagation). Novel architectures are mainly based on silicon, optical, or molecular technology. The criteria for the comparison of neurocomputers are the number of neurons that can be packed on a certain area (e.g. a chip), the interconnection capacity, and the speed of neurocomputing, usually expressed in the number of updated connections per second (learning speed).

Figure 7: A taxonomy of hardware technologies for neurosimulations (special-purpose custom design neurocomputers: silicon, optical and molecular technology; general-purpose programmable neurocomputers: processor arrays - SIMD, SPMD, systolic - co-processors and PC-boards)

4.1.1 Silicon Technology

The common goal of neurochip designers is to pack as many processing elements as possible on a single silicon chip, thus providing faster connectivity and improving execution time. The simplified processing element (artificial neuron) is typically constructed in the following way: an amplifier models a cell body; resistors (placed between two amplifiers) represent synaptic connections; and wires are used to carry input and output. The artificial neural network is represented as a crossbar interconnection of these elements. Optimum performance would be achieved if all neurons were placed on a single chip. But current technology allows a maximum of several hundred elements to be packed on a single chip. There are two possible strategies for overcoming this problem: either the complete network is integrated on a single chip, or functional blocks, emulating a part of a neuroalgorithm, are integrated on a single chip, which is then added to a host processor performing the rest of the computation [44]. Another problem is weight adaptation, which requires run-time modifications. Though some progress has been made in designing a learning chip [28], an easier solution is to perform the learning phase off-line and then to construct the already trained network. The main advantage, however, of the use of silicon technology is the price of the end product.

Silicon neurons are realised using VLSI (Very Large Scale Integration) to place electronic circuits on CMOS (Complementary Metal Oxide Semiconductor), using digital, analog or hybrid design techniques.


Digital technology is most often used in the design of special-purpose neurochips. One example is the Ni1000 Recognition Accelerator, made by Intel and Nestor Inc. [40], specialised for pattern recognition problems. It emulates a radial basis function network. A chip has 512 processors, each emulating two nodes of a network. The maximal speed in the recall phase is 10 GCPS. Another example is a neurochip made by Hitachi [54], with 576 digital neurons integrated, and interconnected with each other, on a 5-inch silicon wafer. As an indication of the product's performance, the authors reported that the 16-city Traveling Salesman Problem was solved using a Hopfield network in less than 0.1 seconds.

Analogue technology, with its high packing density, high potential parallelism and low power consumption, is a very good candidate for neuroprocessing. Though sensitive to the environment (temperature, interference), analog technology has proved to be very successful in neurochip production.

One example is an analog chip [29] for feedforward networks. The authors applied pure analog technology to achieve maximal speed, giving up programmability, on-chip learning and accuracy. A prototype chip used obsolete 2.5-μm CMOS technology, performing 6G multiplications per second for the classification of 70-dimensional vectors using a feedforward network (70×4×1). Advanced 0.8-μm CMOS technology would increase the speed and the capacity by a factor of ten.

Hybrid technology combines the two technologies above, taking the precision and programmability of digital processing (convenient for the learning phase) and the efficiency and high packing density of analog technology.

A typical example of the hybrid approach is the ANNA chip [47] that combines on-chip, high-speed analog dot-product calculations with digital I/O processing and off-chip learning. The chip is made in 0.9-μm CMOS technology on a 4.5×7 mm² area with a capacity to hold 4K synapses. The reported recall speed was 240 MCPS for an optical character recognition problem, using a backpropagation network.

4.1.2 Optical Technology

Optical technology takes advantage of light beam processing that is inherently massively parallel, very fast and without interference side-effects. A lot of effort has been invested to develop optical components that can be efficiently used in neurocomputing. The results range from special purpose associative memory systems through various optical devices (e.g. holographic elements for implementing weighted interconnections) to optical neurochips.

Optical techniques ideally match the needs for the realisation of a dense network of weighted interconnections. Spatial optics offers three-dimensional interconnection networks with enormous bandwidth and a very low power consumption. A "classical" example of an optical neurocomputer is the Caltech Holographic Associative Memory [1]. The goal of the system is to find the best match between an input image and a set of holographic images (that represent its memory). Neurons are modeled by nonlinear optical switching elements (optical transistors) that are able to change their transmittance properties as the brightness of a light beam changes. Weighted interconnections are modeled by holograms which are able to record and reconstruct the intensity of light rays. A one-inch planar hologram, produced on a tiny photographic film, can fully interconnect 10,000 light sources with 10,000 light sensors, making 100M interconnections. The whole system, consisting of a set of lenses and mirrors, a pinhole array, 2 holograms and an "optical transistor", is realised as an optical loop: the input, an image to be recognised (e.g. a part of a stored image), is projected into the system, and after a few iterations (in which the input image interacts with the stored images) the system outputs the corresponding stored image.

In [21] the authors used both electronic and optical technology to solve problems in real-time image processing applications. They have fabricated an optical neurochip for fast analog multiplication with weight storage elements and on-chip learning capability. The chip can hold up to 128 fully interconnected neurons. They have also developed the "artificial retina chip", a device that can concurrently sense and process images (e.g. edge enhancement or feature extraction). Applications of their optical devices are in the domain of image compression and character recognition.

4.1.3 Molecular Technology

Molecular technology is a relatively new approach which combines protein engineering, biosensors and polymer chemistry in the effort to develop a molecular computer. The computation uses the physical recognition ability of large molecules, like proteins, which can change their shape depending on the chemical interactions with other molecules. The potential packing density of molecular devices (three-dimensional structures can be packed several orders of magnitude more densely than semiconductors [6]) makes this approach particularly attractive. A direct consequence is the massive parallelism obtained on the molecular level.

The building blocks of a molecular computer are proteins and enzymes. A protein is usually organised as a linear chain of up to 300 smaller molecules called amino acids. There are around 20 different types of amino acids. Under certain conditions (bio-chemical interaction among amino acids) a protein can change its shape. The role of the enzymes is to cause the change of shape and to recognise certain shapes. A possible molecular computer may consist of three parts: (1) a receptor, whose role is to transform analogue signals into messenger-molecules that are presented to (2) a tactilising medium, which consists of processing molecules that transform messenger-molecules, causing shape changes; and (3) readout enzymes, which read the local messenger-molecules and generate the output signals. In short, a molecular computer uses proteins to sense the signal, to transform it, and to output signals.

Molecular computing is still in its infancy. The major problem is to develop appropriate technology that would allow for the construction of bio-transistors. Nevertheless, it is attractive because it introduces a qualitatively different way of processing, thus being able to address different problems. The computing is context-dependent, i.e. inputs are processed as dynamic physical structures, not bit by bit. It is inherently parallel and has generalisation and adaptation capabilities that perfectly match the needs of neural networks. No complete molecular computer has been built so far, but partial results are very promising [7] and it is a matter of years before realistic solutions appear.

4.2 Programmable Neurocomputers

The common characteristic of programmable neurocomputers is to provide both hardware and software support for efficient and flexible execution of different neural network models. The hardware support consists of a selection of commercially available chips (optimized for dot-product calculations) which are then connected into an appropriate topology to reflect the needs of neuroprocessing. Having commercial and programmable components, such architectures are flexible and allow for the use of general-purpose programming environments.

Programmable neurocomputers can be further sub-divided into SIMD processor arrays, SPMD processor arrays, systolic arrays and neuroaccelerators. The processor array architectures use data-parallel techniques to provide highly parallel neurosimulations, while neuroaccelerators are simple co-processors that are added to PCs or workstations to accelerate neuroprocessing.

SIMD processor array neurocomputers are based on commercial processors, usually connected by a bus with fast broadcast communication. The style of programming is such that each processor executes the same instruction over different data and processing is centrally synchronized. One of the most popular SIMD-based neurocomputers is the CNAPS system, developed by Adaptive Solutions [13]. It has up to 64 processors per chip with local memory, connected into a one-dimensional array structure. The CNAPS system can perform 1.6 GCPS and up to 300 MCUPS.

A more recent SIMD-based neurocomputer is the DREAM (Dynamically Reconfigurable Extended Array Multiprocessor) machine [48], designed as a programmable and reconfigurable platform for the efficient implementation of different neural network models. It consists of a host machine, the controller and the processor array. Processors are arranged in a 2-D lattice, such that each processor is connected to its eight neighbors through four programmable switches. The DREAM machine memory is mapped into the memory space of the host and each processor has access to its part of the memory. With such an organization both communication and computation are programmable and the size of the local memory of each processor is extendible. That allows for high flexibility of the system. Several neural network models have been efficiently implemented using hardware supported mapping methods. For example, the performance achieved for NETtalk was 512 MCPS, and a solution for the Traveling Salesman Problem (TSP) with 30 cities, using a Hopfield network, ran at a speed of 2 GCPS.

SPMD processor array neurocomputers are based on DSP (Digital Signal Processing) chips and similar commercial chips optimized for fast dot-product and matrix operations. Further performance is gained by parallelizing the matrix operations. Each processor executes the same program in a Single Program Multiple Data style of programming, thus avoiding the strict lock-step synchronization of SIMD processing.

A successful example is the MUSIC system (Multiprocessor System with Intelligent Communication) [34]. The system uses up to 63 Motorola DSP chips connected into a global ring (allowing communication to overlap with computation). A group of three chips is placed on a 98.5in board, which also contains an Inmos T805 transputer used for load balancing and performance measurement. The whole system is connected either to a PC or to a Sun workstation. The peak performance, obtained with a backpropagation network, is 3.8 Gflops or 1.9 GCPS. For writing neuroapplications the C programming language may be used (though for best performance the use of assembly language is recommended by the authors).
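
A common way to parallelize the matrix operations in this style is to give each process its own block of rows of the weight matrix. The sketch below illustrates the idea in plain C; the parameter names (my_rank, num_procs) and the row-block partition are assumptions made for the example, not the MUSIC programming interface, and the gathering of the partial result vectors is omitted.

    /* SPMD sketch: P processes run this same program; each computes the
       output units whose weight rows it owns.  my_rank/num_procs would
       be supplied by the runtime (hypothetical names). */
    void spmd_forward(int my_rank, int num_procs,
                      int n_out, int n_in,
                      const double *W,      /* n_out x n_in, row-major */
                      const double *x, double *y)
    {
        int rows  = n_out / num_procs;      /* assume n_out divisible  */
        int first = my_rank * rows;
        for (int j = first; j < first + rows; ++j) {
            double sum = 0.0;
            for (int i = 0; i < n_in; ++i)
                sum += W[j * n_in + i] * x[i];
            y[j] = sum;                     /* partial result, gathered later */
        }
    }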

Systolic arrays proved to be a very convenient hardware organization for neuroprocessing. The basic components of a systolic system are simple processors dedicated to the multiply-accumulate (MAC) operation, organized in a pipeline (a ring or mesh topology) that rhythmically computes incoming data.
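
The following C fragment simulates, purely conceptually, how such a ring of MAC cells can compute a matrix-vector product: each cell holds one column of the weight matrix and one input value, and the partial sums circulate around the ring, one multiply-accumulate per cell and clock step. It is a toy-sized sketch and does not reflect the word lengths or data paths of any real systolic chip.

    #define N 4   /* number of systolic cells (toy size) */

    /* Ring-systolic sketch of y = W*x: cell j stores column j of W and
       the input x[j]; partial sums rotate around the ring, and every
       cell fires one MAC per clock step.  Conceptual simulation only. */
    void ring_systolic(const double W[N][N], const double x[N], double y[N])
    {
        double psum[N] = {0.0};   /* partial sum currently held by each cell */

        for (int step = 0; step < N; ++step) {
            /* all cells perform one MAC in the same clock cycle */
            for (int cell = 0; cell < N; ++cell) {
                int out = (cell - step + N) % N;   /* output this psum builds */
                psum[cell] += W[out][cell] * x[cell];
            }
            /* partial sums shift one position around the ring */
            double last = psum[N - 1];
            for (int cell = N - 1; cell > 0; --cell)
                psum[cell] = psum[cell - 1];
            psum[0] = last;
        }
        for (int cell = 0; cell < N; ++cell)   /* after N steps: psum[i] == y[i] */
            y[cell] = psum[cell];
    }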

A successful example is the SYNAPSE system, produced by Siemens. The basic components are eight pipelined MA16 chips containing 16-bit multipliers and adders. Each chip has a throughput of 800 MCPS. Processing is organised in a two-dimensional systolic array which, when connected to a workstation, gives a performance of 5.12 GCPS [45]. Recently, Siemens announced SYNAPSE-2, a PC board based on one MA16 chip.

Neuroaccelerators are widely used upgrades for neuroprocessing. They are special co-processors that can be plugged into PCs or workstations. Their basic purpose is to provide floating point processors for vector-matrix arithmetic and to speed up memory access. Besides the hardware upgrade, neuroaccelerators are delivered together with a software package for easier neuroprogramming.

Some of the better-known systems are the following. The Mark III and Mark IV series [15] upgrade VME-based workstations and are able to update between 450K and 5M connections per second; the Mark series uses the common ANSE (Artificial Neural Network Environment) software. The ANZA and ANZA Plus [15] co-processor boards are produced by Hecht-Nielsen Neurocomputer Corporation; the ANZA systems come together with PC-AT computers and a collection of routines called UISL (User Interface Subroutine Library). The SAIC SIGMA-1 neurocomputer [52] is a PC-AT computer with a DELTA floating point processor board and the software packages ANSim (a neural net library) and ANSpec (an object-oriented language). Philips has produced a general-purpose building block processor called Lneuro (LEP neuromimetic circuit) [30]; a number of Lneuro chips can be connected to a host transputer, combining the coarse-grain (MIMD-like) parallelism of the host with the finer-grain (SIMD-like) parallelism of the VLSI chips.

4.3 Summary and Comments

With growing needs for faster execution of neuroapplications it has become clear that better performance can only be achieved if neuroalgorithms are emulated in hardware. Two approaches have been discussed: the construction of custom, special-purpose neurohardware and the construction of programmable, general-purpose neurocomputers.

Figure 8: Taxonomy of simulation tools (menu-based/graphic-oriented systems, module libraries, specific programming languages)

Special-purpose neurohardware shows the best performance, but the use of custom-built products is too restrictive a solution (especially in the neurodomain, where existing models are often modified and many new ones are evolving). Novel techniques that use optical and molecular processing are particularly interesting; the future will show whether these analog technologies are closer to and more convenient for neural processing. However, due to its maturity, silicon technology still dominates the production of neurohardware.

General-purpose neurohardware offers the most practical platforms for neurosimulation, achieving both efficiency and flexibility at an acceptable price. Especially popular and convenient are neuroaccelerators, which are relatively inexpensive and widely available. It may be said that their appearance on the market is what actually made wider use of artificial neural networks possible.

The domain of neurocomputer design is very active, with technology constantly improving and offering better performance with every new product. It is very likely that future neurodevices will be built as co-processors fabricated in all of the technologies mentioned, offering high speed and massive parallelism on a microscale. Readers interested in neurohardware can find further references in [32, 11, 19, 3].

5 Simulation Tools

In contrast to parallel neurosimulations, where performance has been the major design goal, tools for neurosimulation primarily take a software engineering point of view. In general, these tools support a variety of network models and offer assistance in handling, profiling, and analysis. The actual execution of the networks usually takes place on PCs or workstations, which may have a coprocessor. Execution on parallel computers is the exception, and in this case only a subset of the network models that run on PCs or workstations can be used.

The tools for the development and simulation of artificial neural networks can be divided into three categories (see Figure 8):

Menu based/graphic oriented systems. Many systems are characterized by their menu based and/or graphic oriented interface. It is possible to choose from a predefined number of elements, to combine them, and to instantiate them with different parameters (e.g. for layer sizes) (SNNS[57], NeuralWorks[38], NeuroGraph[53]).

Module libraries. Other tools basically offer a library of programmed modules (written in a general purpose language like C or C++). Again, networks can be instantiated with different parameters (SESAME[25], RCS[12], Xerion[4]).

Specific programming languages. Finally, there are systems offering a special programming language for specifying networks. In addition there may also be a library of modules written in that language (DESIRE/NEUNET[20], Nessus[55], Aspirin[23], PlaNet[33], AXON[15], CONNECT[18]). Some systems are hybrid with respect to these categories (e.g. SNNS/Nessus and NeuralWorks with respect to the first and last category), but in these cases one category is predominant.

The categories mentioned above will be discussed with respect to a number of criteria, and specific tools will be used to illustrate the argumentation. The criteria can be organized into four groups:

System handling. The user interface of a system should be easy to learn and to use. It should offer abstract descriptions of network models which, at the same time, should be complete. On the one hand this means that it should offer a reasonably readable view onto the elements provided by the system; on the other hand it should be possible to find out all interesting details about the functionality of these elements. For example, a user interested in the learning law used in his application should be able to get the corresponding information. Finally, the user interface of a neural network simulation tool should support the analysis of networks by providing means for their graphical representation and for the inspection of error curves, weights, etc.

Flexibility. A neural network simulation tool should support at least the most important paradigms like backpropagation, feature maps, Hopfield nets etc. It should be easy to combine different paradigms and to experiment with different topologies, different learning laws etc. Ideally, the user can modify existing entities and can extend the system by integrating self-defined ones. As will be seen below, the flexibility of most systems can be characterized by an (implicit) generic model of connectionist systems. This generic model depends on data structures or other items fixed during the design and construction of a simulation tool.

System integration. From a practical point of view it is necessary that a neural network solution can be combined with other (software) systems. An important point, for example, is the combination with any kind of data preprocessing and postprocessing; specifically, this refers to database access. Also, it should be possible to use a neural network as part of a complex system which mainly consists of non-connectionist components.

Pragmatics. For a given simulation tool, scalability and performance play an important role. Running the same network with different parameters (e.g. for layer sizes) may enlarge the simulated network significantly, and that requires a scalable environment. Performance can be significantly improved by the use of optimization techniques, parallelization, or the inclusion of neuroaccelerators.

There exists no tool which fulfills all criteria in an optimal way. For example, a system consisting of C modules is basically flexible, as everything is explicitly coded in C, but such a system does not offer means of abstraction, and in practice this deficiency destroys the flexibility. These points will be discussed in more detail, and possible trade-offs between the different goals will be pointed out. However, the importance of the criteria mentioned above varies from user to user: those who want to use a tool just for learning about connectionist systems may not be interested in whether a network can be integrated into other software, whereas this is an important point in practice; those who have specific applications in mind may be satisfied with a tool offering the corresponding paradigm and not be so interested in flexibility, whereas this is an important point for a neural network researcher; etc.

5.1 Menu Based and Graphic Oriented Systems

User interfaces of systems like SNNS[57], NeuralWorks[38], or NeuroGraph[53] provide menus and graphical tools for creating networks and for controlling and analysing simulations. Usually, such systems offer a number of predefined network models and basic elements which can be combined more or less freely to create one's own networks. In this, units can be associated with a number of given activation and output functions, and topologies of any kind can be defined. Also, the networks can be trained with different learning algorithms. However, in general certain learning algorithms can only be applied to networks with certain activation functions. This has a natural reason in the mathematical definitions of the elements involved, but it breaks the concept that the predefined basic elements can be freely combined, and therefore may be confusing to a novice.

System handling. Menus and graphical means can be considered as "abstract" and "readable" representations of networks. Of course, some effort is needed to learn the handling of such an interface, but if it is well organized and documented, which is the case for the systems mentioned above, this is a question of days rather than weeks. A specific advantage lies in the analysis of networks: for example, it is easy to observe the development of weights, which might help to identify deficiencies of a network topology. However, the network representation is not a complete one; for example, menus and graphical network representations offer no possibility to find out all functional aspects of a training algorithm. Here, one has to rely on good (online) documentation.

Flexibility. Many of the systems under consideration offer a lot of predefined network models and basic elements for the creation of one's own networks. However, the flexibility of these systems is restricted. A problem occurs if a user needs a network model which is not offered by the system and which cannot be constructed by combining predefined elements, i.e. if he wants to modify given entities or to extend the system with additional ones. Carrying out such modifications or extensions might be possible, but requires specific system knowledge which goes far beyond knowledge of the user interface as discussed above.

The SNNS system, for example, can be extended by user-defined activation or output functions, and even by user-defined learning algorithms, but extending the system here means writing corresponding C routines and linking them to the system. As these routines work on the internal data structures, detailed knowledge of these structures is a prerequisite, specifically for writing new learning algorithms. NeuralWorks offers an assembly-like language for specifying new "control strategies". This language is difficult to use; however, the approach of extending the system by defining new elements with the aid of a specific description language is interesting.

Whatever mechanism is offered to the user, every extension has to be based on the internal data structures of the system. In general, these are fixed record structures (representing units, connections etc.), linked by pointers. These internal data structures constitute the implicit generic model of connectionist systems implemented by a given tool, and this implicit model defines the "space" of possible modifications and extensions.
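
The following C fragment sketches the kind of fixed, pointer-linked record structures meant here; the field names are invented for illustration and do not correspond to the internals of any particular tool. A user-written learning algorithm typically has to traverse exactly such lists, which is why detailed knowledge of the structures is required.

    /* Sketch of the fixed, pointer-linked records such tools use
       internally; all names are invented for illustration. */
    struct unit;                           /* forward declaration */

    struct connection {
        struct unit       *source;         /* presynaptic unit                   */
        float              weight;         /* modifiable connection strength     */
        struct connection *next;           /* next incoming connection of a unit */
    };

    struct unit {
        float              activation;
        float              output;
        float            (*act_fn)(float net);  /* chosen activation function */
        struct connection *incoming;             /* linked list of fan-in      */
        struct unit       *next;                 /* next unit in the network   */
    };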

System integration. The integration of neural network solutions into other software systems can be realized by "exporting" networks. NeuralWorks, for example, provides a tool that translates a network into C code, which can then easily be incorporated into any other C program. Also, the concept of the menu based systems allows for combination with any kind of data preprocessing and postprocessing.
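
To give an impression of what an exported network can look like, the following fragment shows a hypothetical piece of generated C code for a small trained layer with hard-coded weights; the values are invented, and the actual output format of NeuralWorks differs.

    #include <math.h>

    /* Hypothetical "exported" network: a trained 3-2 feedforward layer
       frozen into plain C, ready to be linked into a larger application.
       The weights are invented for this sketch. */
    static const double W[2][3] = { { 0.41, -1.30,  0.77 },
                                    { 2.05,  0.12, -0.66 } };
    static const double bias[2] = { -0.25, 0.90 };

    void net_recall(const double in[3], double out[2])
    {
        for (int j = 0; j < 2; ++j) {
            double net = bias[j];
            for (int i = 0; i < 3; ++i)
                net += W[j][i] * in[i];
            out[j] = 1.0 / (1.0 + exp(-net));   /* logistic activation */
        }
    }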

Pragmatics. For scalability, there are no fundamental limits. The fact, however, that internal structures (e.g. for units) are linked with the aid of pointers may lead to performance problems. Here, parallel simulations (or other specific implementations) can help. So far, no general solutions have been developed; rather, some tools have been extended by components for the parallel simulation of specific network models [27]. As usual, such a component is controlled via menus, but as specific data structures etc. are employed for its implementation, it cannot be combined with "sequential" elements in the usual way.

5.2 Systems Based on Module Libraries

The systems considered here are characterized by the provision of a module library written in a general purpose programming language (usually C or C++). Such a library can be considered as a toolbox consisting of basic building blocks for the construction and execution of connectionist experiments. These basic building blocks may be complete network models, elements thereof, or tools for training networks, but also tools for their graphical representation and analysis. Moreover, tools for database access, data preprocessing and postprocessing etc. can be found.
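
As a miniature illustration of this building-block style, the following self-contained C example defines one such module (a delta-rule training step for a single linear unit) and an application that combines the pieces; the code is purely illustrative and is not taken from RCS, Xerion, or SESAME.

    #include <stdio.h>

    /* A miniature "library module": a reusable building block of the
       kind such toolboxes collect.  Illustrative sketch only. */
    typedef struct { double w[3]; double bias; } LinearUnit;

    static double unit_output(const LinearUnit *u, const double x[3])
    {
        double net = u->bias;
        for (int i = 0; i < 3; ++i) net += u->w[i] * x[i];
        return net;
    }

    static void delta_rule_step(LinearUnit *u, const double x[3],
                                double target, double eta)
    {
        double err = target - unit_output(u, x);      /* output error     */
        for (int i = 0; i < 3; ++i) u->w[i] += eta * err * x[i];
        u->bias += eta * err;                          /* delta-rule update */
    }

    /* The "application" simply combines the building blocks. */
    int main(void)
    {
        LinearUnit u = { {0.0, 0.0, 0.0}, 0.0 };
        const double x[3] = { 1.0, 0.5, -1.0 };

        for (int epoch = 0; epoch < 100; ++epoch)
            delta_rule_step(&u, x, /*target=*/0.8, /*eta=*/0.1);

        printf("learned output: %f\n", unit_output(&u, x));
        return 0;
    }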

Of course, despite these common general characteristics, there are differences between the systems. RCS[12] and Xerion[4], for example, provide C libraries; the predefined routines and
