
Hybrid Algorithm for Intelligent Swarm Optimization in Non-deterministic Environment

Assel Akzhalova1, Dmitry Mukharsky2, Beisembetov Iskander3, Yekaterina Polichshuk4

1 Kazakh-British Technical University, Kazakhstan

2 Kazakh National University named after al-Farabi, Kazakhstan

3 Kazakh-British Technical University, Kazakhstan

4 Kazakh-British Technical University, Kazakhstan

1 assel.akzhalova@gmail.com

2 amiddd@rambler.ru (Tel: +7-7272-72-1502)

Abstract: A group of interacting agents is able to solve complex tasks in a dynamic, continuous, stochastic environment. The informational and intellectual load on the individual units of the group is substantially lower than for an autonomous agent. Every agent in the group is able to accumulate experience from the environment and share it with other team members. The behavior of a group of interacting, and even mutually hindering, agents leads to unexpected (emergent) properties [24]: the system behavior cannot be deduced from the properties of its parts. This work offers an approach to modeling the behavior of a group of mobile agents pursuing a common purpose. We have tested the proposed hybrid algorithm on a mock-up simulator in which agents must find radiation sources and bypass obstacles. The tests show that the method optimizes the target functions of the agents in minimal time.

Key words: intelligent agents, reinforcement learning, genetic algorithms, neural network, optimization.

1 Introduction

Modern science pays great attention to the development of artificial intelligence systems. The issue has important practical significance: autonomous agents controlled by artificial intelligence are able to replace humans in dangerous enterprises, in conditions of high risk to life, and in rescue operations [23].

There are many theoretical frameworks in the study of multi-agent systems. They are positioned as theories of coordinated work done by agents. Here we can distinguish the theory of common intentions [11]. A similar concept is called the theory of general plans [16]. These theories are based on the BDI (Belief Desire Intention) architecture [25].

Nowadays, the BDI architecture is considered to be the logical basis of intelligent agents.

The aim of developing an intellectual swarm is to build models for combining the beliefs, desires and intentions of agents and providing them with teamwork. Such models assume the existence of numerous communications between homogeneous, minimally intelligent agents.

For many models it is enough to use the simplest reactive agents, whose architecture is subordinated to the causal state-action logic. These agents form simple swarm intelligence. However, even such teams are capable of quite complex collective action.

Complicating the interaction of reactive agents with the environment and introducing a target function can significantly enrich the behavior. Agents seek to maximize the objective function and act more intelligently. Intelligent agents are forced to operate under uncertainty and with fuzzy information received from sensors. Trajectory planning must take into account the internal database and the stream of changing information from the sensors. A greater number of sensors provides better perception of the environment. An optimal target function provides optimal motion. Generally speaking, building the target function for non-deterministic stochastic environments is a difficult and not completely solved problem.

Artificial neural networks are often applied as control systems. Firstly, the principle of their operation is based on efficient biological neural networks. Secondly, there is rich experience of practical application of neural networks in various fields of automation [17].

The choice of topology and the setting of the weights of an artificial neural network are among the most important steps when using neural network technology to solve practical problems [17]. The paper [17] considers one of the possible approaches to training the neural network, the neuro-evolution approach. The method is based on the idea of a genetic algorithm, first proposed in [18].

This work presents an architecture of the neural network (NN) that controls agents with a large number of sensors in a non-deterministic, continuous environment. The NN architecture allows performing the task (finding the sources of radiation without running into obstacles) without prior training. Immediately after an agent is placed into the environment, it begins performing the task purely due to the network architecture. The agent receives information about the environment through a system of sensors. The agent receives a positive reward for correct actions and improves its rank in this way; the rank of the agent decreases due to erroneous actions. Training of the network is done by correcting weights based on the networks of agents with high ranks, which allows the agent to run the task more efficiently. If the rank of an agent becomes less than zero, its neural network undergoes a transformation. The proposed method has been tested and compared with similar approaches.

The article is organized as follows. The second section formulates the physical and mathematical problem statement and considers the base agent architecture. The third section discusses existing approaches to the task. The fourth section describes the construction of the interaction architecture of the agent swarm with the environment and proposes the management architecture of the neural network. The fifth section proposes a hybrid approach to increasing the intellectuality of agents based on genetic algorithms and reinforcement learning. The sixth section presents the results of numerical experiments on the simulator. The seventh section compares the results of the numerical experiments with similar approaches.

2 Problem statement

Swarm intelligence is a term first introduced by Gerardo Beni and Jing Wang in the study of cellular automata [6]. A set of homogeneous agents actively communicates with the environment. The environment is non-deterministic and continuous.

Interaction between agents makes the environment stochastic and dynamic. Each agent in the swarm is capable of only the simplest actions. The agents have no leaders and subordinates within the swarm. Exchange of information between agents is primitive in nature and does not contain complex statements about behavior. However, the entire swarm is capable of complex purposeful behavior. Such a system can be called self-organizing [8, 9].

Analyzing mathematically the exact behavior of the swarm is not possible due to the complexity of the mathematical model. An exact solution is possible only for a simplified model. In [10] the simplest exact solution of a one-dimensional task was proposed. However, as the authors note, even this simplified solution has an equivalent in nature. Due to the complexity of the exact solution, simulation plays an important role in research on swarm intelligence.

The behavior of an intelligent swarm of agents in a real continuous dynamic environment can be simulated in computer experiments. In the model it is easy to consider the impact of obstacles and mistakes in the perception of the environment by individual agents. It is also easy to change settings and features of the agents, such as the number of sensors that provide information about the environment. At the same time the model gives a clear and accurate representation of the behavior of the simulated swarm.

This paper discusses a model that simulates the behavior of swarm agents in two-dimensional space. The agents perceive the environment and other agents only by using sensors. There are no other ways for agents to interact. The number of sensors can vary widely.

The environment is partially observable. Arbitrary locations of the space may contain insurmountable obstacles that are unknown in advance. The agent is able to detect an obstacle by using distance sensors. The sensors are not able to distinguish obstacles from other agents. Thus for each agent the environment is partially observable, non-deterministic and dynamic. This model is dictated by the real purpose of robots searching for radiation sources in places inadmissible for people to work in [21].

Consider a rough surface on which there may be barriers that impede free movement. At arbitrary locations on the ground there are sources of radiation or radioactive contamination. The area is dangerous for a long stay, and there is a need to eliminate the sources of pollution as soon as possible. In such circumstances the search can be done only by remote-controlled or autonomous robots. The search is carried out by robots that communicate using radio waves or are deprived of opportunities for long-distance communication.

Using single autonomous or remotely controlled robots can be difficult due to their risk of failure and low speed of exploration. A swarm of autonomous robots with collective intelligence is fault tolerant thanks to the absence of centralized management: the failure of individual robots remains unnoticed on account of their independence from each other, and the swarm explores a large area in less time on account of the large number of robots [12, 13].

The problem statement is as follows. There is a two-dimensional space with obstacles and a number of sources of physical emission, for example radiation. The intensity of an emission source decreases with increasing distance from it as the inverse square of the distance. The intensity of the emission at a point is defined as the superposition of all sources.
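As a simple illustration of this emission model (inverse-square decay, superposition of all sources), the intensity at a point could be computed as in the following sketch; the function name, the data layout and the epsilon guard are our assumptions, not part of the original model description.

```python
def emission_intensity(point, sources):
    """Superposition of inverse-square contributions from all sources.

    `point` is an (x, y) tuple; `sources` is a list of (x, y, strength) tuples.
    A small epsilon avoids division by zero exactly at a source location.
    """
    x, y = point
    total = 0.0
    for sx, sy, strength in sources:
        d2 = (x - sx) ** 2 + (y - sy) ** 2
        total += strength / max(d2, 1e-9)
    return total
```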

Arbitrary regions of the space are occupied by obstacles. Obstacles are absolutely impervious to agents, and these areas are inaccessible to agent traffic, but sources can be located in them. The described model assumes that barriers do not in any way affect the distribution of intensity.

The agents are equipped with sensors for orientation in space; the number of agents may vary. The sensor layout is shown in Figure 1.

Figure 1. Location of the sensors on the body of the agent. The following symbols are used: D is the distance sensor to obstacles; f, r, b, l are intensity sensors.


The distance sensors estimate the distance from the agent to an obstacle. When there is no obstacle in the field of view of a sensor, the sensor outputs 0. The distance sensors have a limited range of coverage. At long distances the resolution of the sensors is reduced, and at the limit of the coverage range a sensor may give false or misrepresented information. Closely spaced obstacles require immediate action, so the information from the sensors is processed so that closely spaced obstacles give a large signal on the sensors. The number of sensors can be arbitrarily extended; the minimum configuration is a single range sensor.

The distance sensors react not only to stationary obstacles but also to other agents. A single sensor cannot distinguish static obstacles from moving agents. If the number of sensors is greater than five, the distances to different parts of an agent will vary, and the combined sensor readings yield a stable picture that allows distinguishing agents against a background of stationary obstacles.

The sensing elements are intended for measuring the intensity of the field ahead, to the right, behind and to the left of the agent. They measure the intensity directly at the point of their location, therefore the maximum distance between them may not be greater than the diameter of the agent. The set of intensity sensing elements can also be expanded. The basic sensing elements are the four sensors f, r, b, l shown in Figure 1; the number of sensing elements on the sides of the agent can be arbitrary.

During movement the agents collide with each other and with obstacles. A clash between agents is modeled as a perfectly elastic impact: after the collision, their velocities are reversed. A collision with an obstacle reverses both components of the velocity vector. Interaction with the edge of the search space is modeled as elastic reflection.

The agent can be in one of many possible states. For a non-deterministic continuous environment the number of possible states can be very large even for a limited space, but it is finite.

The agent operates in discrete time. At a given time step the agent is in one of the possible states, which is composed of the readings of the sensors and the internal status of the agent. The internal status includes the coordinates and the velocity of the agent.
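A minimal sketch of how such a state could be represented in the simulator; the field names are illustrative assumptions rather than the authors' data structures.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class AgentState:
    """Agent state at one discrete time step: sensor readings plus internal status."""
    intensity_sensors: List[float]   # readings of the f, r, b, l intensity sensors
    distance_sensors: List[float]    # range-sensor readings (0 means no obstacle seen)
    position: Tuple[float, float]    # coordinates of the agent in the 2-D space
    velocity: Tuple[float, float]    # components of the agent's velocity
```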

3 Approach

Swarm intelligence constructed from the described agents can be modeled in a continuous, partially observable, dynamic environment. The model must take into account possible errors in the perception of the environment by the sensors. Each agent must be given the freedom to choose from a large set of actions. Interaction of the agent with the environment is passive.

Interaction of agents with one another is possible only within the field of view of the distance sensors and is active. The set of sensors can be extended to provide a more detailed picture of the agent's environment. An expanded set of sensors allows the agent to interact with the environment more intelligently and perform tasks with greater efficiency.

The real environment is non-deterministic. An agent with a rigid program of actions will not be able to operate effectively in such an environment. Agents must be able to accumulate a limited amount of knowledge about the environment and about interaction with other agents of the swarm. Thus, a swarm of agents should be capable of learning.

Teaching a group of interacting agents is a complex and unresolved challenge. This issue is being actively explored [24, 31].

Recently, the reinforcement learning method, first proposed in [27], has been actively applied for training agents in harsh environments. For reinforcement learning there are effective learning algorithms [3], formalized for Markov decision processes [4]. Among the works of recent years on reinforcement learning one can mention [26, 20].

The ideas of reinforcement learning are most fully embodied in the Q-learning algorithm, first proposed by Watkins in his thesis [29]. The method was developed and formalized in [28]. The Q-learning algorithm gives good results for discrete non-deterministic environments with a small number of well-distinguishable states. Increasing the size of the environment or moving to a continuous environment leads to exponential growth in the number of states, and the task turns out to be NP-hard. In the literature the problem is called the curse of dimensionality and originates in the work of Bellman [5].

Approximating the target function with a NN is a way to reduce the influence of the curse of dimensionality. This technique was first applied in [22]. In our work we have abandoned the direct dependence of the NN weights on corroborating signals from the environment and returned to the ideas of classical reinforcement learning. The NN architecture that manages the agents is defined a priori. The NN weights approximate the target function of the agent. Because different agents have different NN weights, the target functions of different agents will differ.

Under the same conditions different agents will act differently.

A genetic strategy is used to optimize the target function. Rewards from the environment are signals that report to the network the success or failure of training. Application of the ideas of evolutionary simulation (simulated evolution) to optimization is especially justified in situations where no direct way of solving the problem is seen. In such cases evolutionary simulation can achieve a result. The disadvantage of this approach is the relative slowness of convergence of the algorithm in comparison with direct methods; evolutionary simulation does not guarantee results in a reasonable amount of time.

The next section constructs the architecture of interaction of the intelligent swarm with the environment and the repository of neural networks.

4 The Collective Intelligent Management Architecture


Each agent in the intelligent swarm has its own controlling NN. Information about all NN in the swarm is stored in a repository. The swarm of agents actively interacts with the environment, and in response to the actions of the agents the environment assigns each agent a reward. The management daemon periodically requests information about each agent's reward and selects the best and worst NN; the measure of a network's success is the reward. The task of the same daemon includes culling bad networks and replacing them with networks derived from the top networks. Each agent either gets a new network or continues to be managed by its old network for the next stage. The architecture of interaction of the agent swarm with the environment and with the store is presented schematically in Figure 2.

Figure 2. The general architecture of interaction of the swarm with the environment and with the depository of the NN. Black arrows mark the transfer of data. Grey arrows mark influence or control.
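A rough sketch of one pass of such a management daemon; `make_offspring` stands in for the genetic operators of Section 5, and the replacement fraction and attribute names are our assumptions.

```python
import random

def management_step(agents, repository, make_offspring, replace_fraction=0.1):
    """Rank agents by accumulated reward, store their networks in the
    repository, and replace the worst networks with offspring derived
    from the best ones (sketch of the daemon described above)."""
    ranked = sorted(agents, key=lambda a: a.rank, reverse=True)
    repository.update({a.id: a.network for a in ranked})   # keep all current networks

    n = max(1, int(len(ranked) * replace_fraction))
    best, worst = ranked[:n], ranked[-n:]
    for loser in worst:
        p1, p2 = random.sample(best, 2) if len(best) >= 2 else (best[0], best[0])
        loser.network = make_offspring(p1.network, p2.network, p1.rank, p2.rank)
```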

Figure 3 shows the general architecture of the swarm of agents. Each agent receives information from its sensors about the environment. The information is processed within the NN. At the output of the NN one of the possible actions is produced.

This information goes to the effector block. The agent performs the appropriate maneuver, which produces an effect on the environment. The status of the agent changes and the whole loop repeats itself.

Figure 3. The intelligent swarm architecture.

The environment passes information about the reward assigned at the previous step together with information about the current state of the agent. The environment passes this same information to the management program.

Interaction between agents is possible only within the field of view of the sensors. An agent that is outside the field of view of the sensors of other agents must operate autonomously. But its experience of interaction with the environment is entered into the repository and can be further used by other agents. Thus indirect interactions between agents still exist.

The following section considers the decentralized management of the intellectual swarm with regard to fault tolerance.

4.1 The Decentralized Management of Intellectual Swarm Considering Fault Tolerance

The agent is controlled by the NN. The input to the network is a vector of sensor activities. The network has a hidden layer that handles signals from the sensors and sends signals to the output layer. The output layer of the NN is directly related to the effectors. The effectors allow the agent to move at different angles to its current direction. In the model implementation the set of actions includes turns at different angles to the left, straight movement, and turns at different angles to the right. The base set allows performing only three maneuvers; an agent rotation is combined with a step movement.

The architecture of the NN has two kinds of connections between neurons. The first kind is the excitatory connection: the weights of excitatory connections are taken with the plus sign and amplify a signal passing through them. The second kind is the inhibitory connection: inhibitory connections are taken with a minus sign and suppress a signal passing through them. The connection type is set initially and cannot be changed during network operation. The weight of a connection between neurons can become equal to 0; in this case the connection between the neurons turns out to be broken. Initially all weights of the connections between neurons are either random numbers or fixed values. The idea of the described architecture is inspired by the textbook work [19].

Figure 4 shows a schematic diagram of the base NN for maneuvering toward the emitter.

Figure 4. The diagram of the NN. Arrows are excitatory connections. Circles are inhibitory connections. Solid lines show the relationships between the various layers. Dotted lines show the relationships that connect neurons within a single layer.

The architecture of the network is designed to route the agent to the emitter. In the first step of modeling the weights of all connections are set randomly but with the appropriate sign. During the simulation the type of connection (excitatory or inhibitory) does not change; the only change is to the absolute value within the interval [0, 1]. If the weight of a connection becomes zero during the process, then the synapse that emulates the relationship dies.

The proposed architecture has no recovery mode for broken connections. Initially the network is fully connected, so recovery of synapses is superfluous. If during the simulation the modification algorithm reduces weights to zero, the operation has to be improved.

There are internal connections between neurons in the hidden layer. The neuron with maximal activity at the current step inhibits all neighboring neurons. Thus a competitive learning model is implemented. The goal is to adjust the weights so that for any input vector only one motor neuron is excited, the one corresponding to the maximum of the target function.

In the diagram in Figure 4, more inhibitory than excitatory connections end on a hidden neuron. The sum of the weights of the excitatory connections should be, on average, greater than the sum of the weights of the inhibitory connections; in this case the hidden neurons will have non-zero activity. There is another possible case of non-zero activity of hidden neurons: the activity of the excitatory receptors is greater than the activity of the inhibitory receptors. This situation can occur in the vicinity of the source. When moving away from the source, network performance will fall, and at greater distances the network may cease to perform its functions. The difference between the weights of the excitatory and inhibitory connections can serve as a measure of the sensitivity of the network.

In what follows, index i denotes neurons from the input layer (the intensity sensors f, r, b, l or, for the obstacle subnet, the distance sensors), index j denotes neurons from the hidden layer, and index k denotes neurons from the output layer.

The matrices of weights for connections between neurons are denoted by the symbol W with a superscript indicating the layers; the first index belongs to the neuron where the connection begins and the second index to the neuron where the connection ends.

The activities of the neurons of the appropriate layers will be denoted by a with the corresponding layer superscript. For the diagram of the neural network shown in Figure 4, the connection matrix from the input layer to the hidden layer is:

$$W^{IH} = \begin{pmatrix} -r & +r & -r \\ -r & -r & +r \\ +r & -r & +r \\ +r & -r & -r \end{pmatrix}, \qquad (1)$$

where the rows correspond to the input neurons f, r, b, l located on the front, right, rear and left sides of the agent, the columns correspond to the hidden neurons associated with the left turn, the straight movement, and the right turn, and +r, -r are random numbers in the range [0, 1] taken with the appropriate sign.

Each input neuron has an excitatory connection to the hidden neuron related to the motor neuron responsible for the corresponding maneuver (the left turn, the straight movement, or the right turn), and inhibitory connections to the other neurons of the hidden layer. The neuron that registers the intensity behind the agent has excitatory connections to the neurons responsible for turns but inhibits the straight-movement neuron.
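A minimal sketch of initializing such a matrix with random magnitudes in [0, 1] and fixed, immutable signs, using the sign pattern of matrix (1); the function and constant names are ours.

```python
import numpy as np

# Sign mask of matrix (1): rows are the intensity sensors (f, r, b, l),
# columns are the hidden neurons for the left turn, straight movement and
# right turn; +1 marks an excitatory connection, -1 an inhibitory one.
SIGN_IH = np.array([
    [-1, +1, -1],   # front sensor: excites straight movement, inhibits turns
    [-1, -1, +1],   # right sensor: excites the right turn
    [+1, -1, +1],   # rear sensor: excites both turns, inhibits straight movement
    [+1, -1, -1],   # left sensor: excites the left turn
])

def init_signed_weights(sign_mask, rng=None):
    """Random magnitudes in [0, 1] combined with the fixed connection signs."""
    rng = rng or np.random.default_rng()
    return sign_mask * rng.uniform(0.0, 1.0, size=sign_mask.shape)

W_ih = init_signed_weights(SIGN_IH)
```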

The matrix for connections from the hidden layer to the output layer is:

$$W^{HO} = \begin{pmatrix} +r & -r & -r \\ -r & +r & -r \\ -r & -r & +r \end{pmatrix}, \qquad (2)$$

where the rows correspond to the hidden neurons and the columns to the output (motor) neurons for the left turn, the straight movement, and the right turn. The connections between hidden neurons and the corresponding motor neurons are direct: the excitation from a hidden neuron is immediately passed to its motor neuron.

The matrix of internal connections between the neurons of the hidden layer is

$$W^{HH} = \begin{pmatrix} 0 & -r & -r \\ -r & 0 & -r \\ -r & -r & 0 \end{pmatrix}. \qquad (3)$$

Each neuron of the hidden layer inhibits the neighboring neurons and thereby increases its chance to win the competition. Self-excitation of a neuron is not considered, so all diagonal elements of the matrix (3) have zero values.

A neural network with the task of finding an emitter is an approximation of the target (objective) function

$$F(\mathbf{x}) \in \{\text{turn left by } \alpha,\ \text{move straight},\ \text{turn right by } \alpha\}, \qquad (4)$$

where $\mathbf{x}$ is the input vector of the neural network and $\alpha$ is the turn angle.

The target function (4) does not depend on the internal state of the agent, only on the input vector of the NN. In the simulation the target function serves not only for orientation of the agent in space but also for compensating stochastic changes of velocity during collisions.

An agent with the optimal target function tends to move in the direction of the gradient of the field intensity. The drop of intensity over the distance between the sensors should not exceed the sensitivity limit. If the intensity on one sensor is higher than on the opposite one, the direction of movement must be changed accordingly. Depending on its internal state the network can order a turn to the right or to the left. The agent needs to make several turns for a full reversal and moves along an arc. At different stages of the turn different parts of function (4) are active, and the curvature of the arc can vary depending on the stage of the reversal.

The network architecture allows agents to find the right direction to the source and confidently move in that direction if the matrix was initialized correctly. When the network is initialized with random weights, not all agents respond correctly to the data from their sensors: only about 10% of all agents are able to respond adequately to the measured intensity.

An asymmetry in the initialization of the weights makes a certain contribution to the balance. The connections of the left and right sensors are assumed to be symmetrical, and the target function (4) is also symmetrical with respect to right and left turns.

Generally speaking, this statement about the target function is not true when the weights are initialized with random numbers. Therefore in the process of optimization of the target function a balanced attitude of the agent with respect to rotations must be maintained.

The second subsystem of the NN is used to prevent collisions of the agent with obstacles. Distance sensors are used for the early detection of obstacles. The angle between the outermost distance sensors can be changed.

The angle between neighboring sensors depends on the number of sensors. The maximum range of obstacle detection can be modified. The agent's ability to monitor the space to its sides is limited, so agents may collide with sharp corners during turns; such clashes simply add a stochastic component.

The diagram connecting the distance sensors with neurons in the hidden layer is presented in Figure 5. For simplicity, the connections between neurons within the hidden layer are not marked in Figure 5. The hidden neurons in Figure 5 are the same as in Figure 4.

Figure 5. The diagram of the NN responsible for obstacle avoidance.

The architecture of the obstacle-avoidance subnet is the inverse of the source-orientation subnet. A sensor that detects an obstacle on one side of the agent activates the neurons of the opposite side and at the same time inhibits the neurons of its own side. This steers the agent away from the collision. Detection of an obstacle during movement causes the agent to change direction either to the right or to the left.

The entire network operation algorithm is described in the following steps.

Step 1: Extract the data from the sensors and form the input vector of the NN. The agent is in its current state.

Step 2: Redistribute the input vector components between the hidden neurons and calculate the activity of the neurons of the hidden layer:

$$a^{H}_{j} = \sum_{i} a^{I}_{i} W^{IH}_{ij}, \qquad (5)$$

where $a^{H}_{j}$ is the activity of hidden neuron j, $a^{I}_{i}$ is the activity of input neuron i, and $W^{IH}$ is the matrix of input-to-hidden weights. The result forms the activity vector of the hidden neurons, where H is the number of hidden neurons. If the activity of a neuron is less than zero, the neuron is not activated and the corresponding component of the vector is set to zero.

Step 3: Calculate the activity of the neurons of the output layer:

$$a^{O}_{k} = \sum_{j} a^{H}_{j} W^{HO}_{jk}, \qquad (6)$$

where $a^{O}_{k}$ is the activity of output neuron k and $W^{HO}$ is the matrix of weights from the hidden to the output layer. The activity vector of the output neurons is then formed, where N is the number of output neurons. If an output is negative, the corresponding component of the vector is set to zero.

Step 4: Find the index corresponding to the maximum value in the output vector:

$$k^{*} = \arg\max_{k} a^{O}_{k}. \qquad (7)$$

Next the agent moves in the space according to the selected action and passes into a new state. After that the procedure returns to Step 1.
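The four steps can be condensed into a short routine; the sketch below assumes the matrix-product form of equations (5)-(7) and NumPy arrays for the weight matrices (the names are ours).

```python
import numpy as np

def select_action(sensor_vector, W_ih, W_ho):
    """Steps 2-4 of the control loop: propagate the sensor vector through the
    hidden and output layers, clip negative activities to zero, and choose the
    maneuver of the most active output neuron (winner-take-all)."""
    hidden = np.maximum(sensor_vector @ W_ih, 0.0)   # step 2, eq. (5)
    output = np.maximum(hidden @ W_ho, 0.0)          # step 3, eq. (6)
    return int(np.argmax(output))                    # step 4, eq. (7)
```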

The architecture, which is given a priori, allows about 10% of the agents to perform the task immediately, without a learning stage. But optimizing the weights requires additional information that informs the network about the correctness of the actions that were performed. This information is a reward which is assigned to the agent after each iteration. The rewarding scheme corresponds to reinforcement learning: if the previous step was successful the reward is positive, while in case of a wrong or ineffective action a negative reward is assigned. The rewards of an agent are summed up and form its rank.

Biological neural networks are built and operate on a similar principle. At birth the organism inherits from its parents a scheme of connections between neurons and the types of those connections. But how to apply this architecture correctly the organism learns during interaction with the environment, through positive and negative experience.

The strengthening of frequently used connections is often accompanied by gradual suppression and even subsequent death of seldom-used connections. Within the given architecture a priority pathway of signal transmission is selected, and the continued operation of the biological neural network follows this adjusted path.

Only in the case of sudden changes of external conditions does the network start searching for new alternative pathways, and the system adapts to the changed conditions. This is one of the constituent parts of the learning process.

The following section discusses the hybrid approach to increasing the intellectuality of agents based on genetic algorithms and reinforcement learning. The goal of the approach is to improve the target function of the interaction of the agents with the environment.

5 The Hybrid Approach to Increasing the Intellectuality of Agents Based on Genetic Algorithms and Reinforcement Learning

Collective intelligence tasks in a non-deterministic, continuous, dynamic environment can be simulated by a swarm of agents interacting with the environment and with each other. Each agent in the swarm has the freedom to choose actions in response to the state of the environment, and each agent has at its disposal a system of targeting and positioning. To improve intelligence and perform tasks more productively, the agent management system must adapt to changes of the dynamic non-deterministic environment. Adaptation helps each agent individually, and the whole swarm in general, act more purposefully in the future. The neural network orientation mechanism is augmented with an algorithm for the selection of the fittest agents. The hybrid algorithm is based on the genetic algorithm and reinforcement learning.

In the proposed model only the weights of the neural network are optimized; the a priori specified architecture remains unchanged. Some weights may take zero values during mutations. Such changes remain if they have a positive effect on the dynamics of the agent. The NN closest to the optimum may therefore have fewer active links than the original NN.

The weights of a NN initialized with random numbers are not optimal. The experiments show that in 90% of cases the agent behaves in a completely unexpected way: its trajectory is chaotic and has no intersection with the optimal trajectory. Often the movement of the agent becomes oscillatory in nature, the path being a circle or a more complex but regularly repeating trajectory. In 10% of cases the combination of weights allows the agent to move in the right direction.

For selection and optimization of the network weights we used a method of selecting NN with optimal parameters and forming new NN on their basis. The agent uses its available network and tries to reach the goal over a series of iterations, accumulating a total reward during this time. From the final reward conclusions are made about the effectiveness of the agent's network. The sum of the rewards of all agents is a measure of the effectiveness of the swarm as a whole.

The algorithm for generating new networks must meet several conditions. First, the algorithm must not allow unlimited growth of the weights: the weights must stay within the interval [0, 1] for an arbitrary number of cycles. The second condition is the preservation of the connection type: the algorithm must not change the architecture of the NN, so an excitatory synapse remains excitatory throughout the entire modeling process, and the same rule holds for inhibitory synapses. The third rule is that the weights of the resulting NN should be closer to those of the parent with the greater rank.

The genetic algorithm is divided into three parts. The first part of the algorithm generates new weights on the basis of two random NN with a positive rank. Every weight of the network is subject to modification. A new weight is expressed through the parental weights according to formula (8). In each mutation cycle 10% of the agents undergo transformation in accordance with this first part of the genetic algorithm.

$$w_{ij} = \frac{R_1 w^{(1)}_{ij} + R_2 w^{(2)}_{ij}}{R_1 + R_2} + \xi, \qquad (8)$$

where $w_{ij}$ is the resulting weight of the neural connection from neuron i to neuron j, $w^{(1)}_{ij}$ and $w^{(2)}_{ij}$ are the pattern weights of the neural connections from neuron i to neuron j in the two parents, $R_1$ and $R_2$ are the ratings of the agents used for the generation of the new neural network, and $\xi$ is a small random correction.

Formula (8) easily generalizes to an arbitrary number of samples. The random variable $\xi$ is used to prevent premature convergence: adding it pulls the network out of equilibrium and prevents it from lingering in local minima of the target function.
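A minimal sketch of this recombination step, reading formula (8) as a rank-weighted average of the two parental weights plus a small random correction; the clipping to [0, 1] and the preservation of the connection sign follow the three rules stated above, while the function name and the noise amplitude are our assumptions.

```python
import numpy as np

def recombine(w1, w2, rank1, rank2, noise=0.01, rng=None):
    """Rank-weighted combination of two parental weight matrices (formula (8)).
    The connection sign is preserved and the magnitude is kept in [0, 1]."""
    rng = rng or np.random.default_rng()
    sign = np.where(w1 != 0, np.sign(w1), np.sign(w2))          # fixed connection type
    magnitude = (rank1 * np.abs(w1) + rank2 * np.abs(w2)) / (rank1 + rank2)
    magnitude += noise * rng.uniform(-1.0, 1.0, size=magnitude.shape)
    return sign * np.clip(magnitude, 0.0, 1.0)
```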

The second part of the genetic algorithm is crossover. Two parental genes are split at an arbitrary point. The first part of the gene is taken from the first parent and the second part from the other parent; the resulting gene is given to the offspring. The second part of the first parental gene and the first part of the gene of the other parent are not used. The first and second parents are selected at random, so the loss of information from the unused genes is minimal. This mutation is applied to the remaining 90% of the agents. Thus, during each mutation cycle all NN are subjected either to gene crossover or to whole-network modification.


The third part of the genetic algorithm is random point mutation for each agent that has a negative rank. The number of point mutations in one gene increases if there are many agents with low ranks. A large number of agents with low ratings means that the previous parts of the genetic algorithm were not able to provide the variety of genes necessary for survival. The increase in the number of point mutations increases the diversity of genes. Many of them will be useless and even dangerous for the agent, but this increases the probability of a gene that is capable of leading the population out of the adverse conditions.
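A sketch of the second and third parts under the assumption that a gene is the flattened vector of network weights; the split point, the scaling of the mutation count with the number of low-rank agents, and the function names are illustrative.

```python
import numpy as np

def crossover(parent_a, parent_b, rng=None):
    """Single-point crossover of two flattened weight vectors (genes)."""
    rng = rng or np.random.default_rng()
    cut = rng.integers(1, parent_a.size)                 # arbitrary split point
    return np.concatenate([parent_a[:cut], parent_b[cut:]])

def point_mutation(gene, n_low_rank_agents, rng=None):
    """Random point mutations for a low-rank agent; the number of mutated
    positions grows with the number of low-rank agents in the swarm."""
    rng = rng or np.random.default_rng()
    n_mutations = max(1, n_low_rank_agents // 10)        # illustrative scaling
    for _ in range(n_mutations):
        i = rng.integers(gene.size)
        sign = np.sign(gene[i]) if gene[i] != 0 else 1.0  # keep the connection type
        gene[i] = sign * rng.uniform(0.0, 1.0)
    return gene
```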

A similar acceleration of mutations is observed in bacteria. When a bacterium is in unfavorable conditions it dramatically increases the production of defective proteins. Most of them will be useless for the bacterium, but some may help it to survive and evolve. Hypothetically, stress in higher organisms likewise increases the number of point mutations and contributes to rapid adaptation of the organism.

The genetic algorithm dynamically adapts to changing conditions. The number of mutations is tied to the number of agents with low ranks. This mechanism is intended to increase speciation and speed up evolution.

The agent rank is an indicator of the success of its neural network. Calculating a correct rating is challenging and depends on the search space. The rating is a measure of the success of the agent when performing the task.

In a non-deterministic and continuous environment the number of possible agent states is effectively unlimited. But many states may be close to one another in the metric space and can be grouped into clusters; the network must react to them in a similar manner. For example, in a space without obstacles and with one source we can distinguish four clusters depending on the angle at which the agent is oriented with respect to the source.

Generally, the number of clusters may be large, but it is much smaller than the number of states. At the end of an iteration the agent moves to a new state, located either in the same cluster as the previous state or in another one. There is a transfer function between states that returns the value of the reward for the transition. In a practical implementation it is more convenient to assign the reward for transitions between sufficiently large clusters than for transitions between individual states.

Five transfer functions are used in all numerical experiments in this paper:

1. a positive reward if the intensity of the emitter at the current step increased compared to the previous step;

2. a negative reward if the intensity at the current step is less than at the previous step;

3. a negative reward if there is a collision of the agent with an obstacle or with another agent;

4. a reduced reward when the agent sees an obstacle in the visibility sector of the distance sensors;

5. a positive reward if the agent finds a source of the radiation.

The penalty for movement in the direction opposite to the source is greater in absolute value than the reward for movement in the right direction. This choice eliminates looping of the agent along a circle or a more complex closed shape. While moving along a circle the agent travels half of the time in the right direction and half of the time in the wrong direction; the resulting rank will be lowered, the network with the low rank will be modified, and the agent will break out of the circular motion.

A collision is a serious error for the agent, and the penalty is a negative reward that is large in absolute value. Agents involved in a collision have a target function far from optimal and should be subject to correction. When an obstacle appears in the visibility arc of the agent its reward is reduced: the closer the hazard is to the agent, the lower the reward it receives. The aim is to train the agent in advance to take steps that prevent a recurrence of collisions.

The full reward over the whole modeling time is calculated by adding up the rewards at each step. The final rank of the agent is formed from the total reward accumulated over all time according to formula (9).
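A sketch of a per-iteration reward following the five transfer functions and the sign relations described above; the numeric values are purely illustrative, only their signs and relative magnitudes follow the text.

```python
def step_reward(intensity_now, intensity_prev, collided, obstacle_distance,
                found_source, sensor_range=150.0):
    """Reward for one iteration (cases 1-5 of the transfer functions above)."""
    if found_source:
        return 100.0                      # case 5: the source is located
    if collided:
        return -50.0                      # case 3: collision with obstacle or agent
    # cases 1 and 2: the penalty for moving away outweighs the reward for approaching
    reward = 1.0 if intensity_now > intensity_prev else -2.0
    if obstacle_distance is not None and obstacle_distance < sensor_range:
        # case 4: the closer the hazard, the lower the reward
        reward -= (sensor_range - obstacle_distance) / sensor_range
    return reward
```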

Only agents with a positive rank are used as pattern networks. Next, all agents are sorted in descending order of rank. Candidates for network updating are selected from the tail of the list; candidates for patterns are selected from the head of the list.

Agents with a small rank update their connections either at the end of each iteration step or at a periodic interval. The first method allows a quick search for a new network configuration. The second method is preferable for environments with many obstacles: in that case an agent with a good NN configuration may temporarily lose rank while trying to get away from obstacles, and the periodic method gives it time to recover the rank after bypassing the obstacles.

To accelerate learning, the symmetry of the connection matrix can be used. The same technique can be applied to the learning of the obstacle-avoidance subnet.

In the next section the proposed approach is tested in several experiments. The purpose of the experiments is to reveal the dependence of the convergence speed of the algorithm on the number of agents in the swarm.

6 Experiments


The test of the above method was conducted in specially designed simulator software. The initial weights of all matrices were initialized with random numbers respecting the sign of each connection.

The agents are circles with a fixed radius. The software takes possible collisions into account. The source cannot be reached by three agents simultaneously. An agent that approaches the center of the source to a distance equal to or less than the agent radius is considered to have found the source. To accelerate learning and to prevent the accumulation of individual rewards the following method is used: an agent that finds the source receives the reward and its location in space is changed at random. The agent keeps the accumulated reward and its matrix of connections, and the place near the emitter is constantly freed.

The proposed technique allows the other agents to get closer to the source. In addition, an agent which has already received the maximum reward gets an opportunity to try out a new mission from another location and thereby to boost its ranking.

The numerical experiments show that even this method is not able to fully eliminate the “crowd effect”. In the later stages of training, with a large number of agents, the agents come to the source much faster. The experiments with different numbers of agents in the space have shown that the most effective learning occurs when the number of agents is close to 30.

The paper presents the results for 30 agents and for 60 agents. Despite the success of the training in both cases, the final check on a new space shows a difference in the networks' performance. The NN is compared with the PSO algorithm [7]. PSO guarantees approach to the source, but only in a space with no obstacles.
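For reference, the comparison baseline is the PSO algorithm [7]; below is a minimal sketch of a standard global-best PSO update of the kind typically used as such a baseline. The coefficient values are conventional defaults, not taken from the paper, positions are assumed to be an (n_agents × 2) array, and `fitness` is assumed to be a vectorized function (for example, the field intensity at each position).

```python
import numpy as np

def pso_step(pos, vel, pbest, gbest, fitness, w=0.7, c1=1.5, c2=1.5, rng=None):
    """One standard global-best PSO update: velocities are pulled toward each
    particle's best known position and toward the swarm's best position."""
    rng = rng or np.random.default_rng()
    r1 = rng.uniform(size=pos.shape)
    r2 = rng.uniform(size=pos.shape)
    vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
    pos = pos + vel

    improved = fitness(pos) > fitness(pbest)             # per-agent comparison
    pbest = np.where(improved[:, None], pos, pbest)
    gbest = pbest[np.argmax(fitness(pbest))]
    return pos, vel, pbest, gbest
```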

Table I lists the parameters of the numerical experiments for the space in the presence of obstacles.

Table I. Parameters for numerical experiments.

Environment and agent properties, with their values:

Size of the space: 1200×669 pixels

Quantity of sources: one source

Maximal intensity of the source: 10000

Quantity of the agents: 30 agents, 60 agents

Full velocity of the agent: 3 pixels/iteration

Minimal velocity of the agent: 1 pixel/iteration

Agent radius: 14 pixels

Quantity of the distance sensors: 5

Angle of the review sector: 100°

Range limit of the distance sensors: 150 pixels

Quantity of the intensity sensors: 4

Turn angle of the agent: 15°

6.1 Stabilization of the System

Figure 6 shows a chart of the input-to-hidden weight matrix depending on the number of iterations. The chart shows the weight of each connection averaged over 60 agents and over 30 agents. In the chart one curve combines the connections from one input neuron to the different hidden neurons.

Figure 6. The change of the average values of the input-to-hidden weight matrix over 10000 iterations for 60 agents (on the left) and for 30 agents (on the right).

A strong fluctuation of the weights is observed for 30 agents. The plots stabilize and there are no changes after the 4000th iteration. At this point the matrix approaches one of many optimal configurations.

The initial settling period for 60 agents runs more smoothly. This can be explained by the larger number of original patterns available to build new networks.

Figure 7 shows a plot of the average hidden-to-output weight matrix against the number of iterations for 60 agents and for 30 agents.


Figure 7. The change of the average values of the hidden-to-output weight matrix over 10000 iterations for 60 agents (on the left) and for 30 agents (on the right).

The plots demonstrate a period of uncertainty for 30 agents and smoother behavior for 60 agents. In addition, both panels show a tendency of the positive diagonal weights to be drawn toward close values. The off-diagonal elements of the matrix have a very wide range of values for both 60 and 30 agents.

Figure 8 depicts the matrix of internal connections of the hidden layer for 60 and 30 agents.

Figure 8. The change of the average values of the matrix of internal connections of the hidden layer over 10000 iterations for 60 agents (on the left) and for 30 agents (on the right).

The common features of behavior identified in the previous figures are preserved in the internal connections of the hidden layer, but the behavior of these connections is much less stable. The connections between neurons in the hidden layer are complementary: they are designed to secure a decisive victory of one neuron over the others. They strongly depend on the values of the weights between the input and hidden layers but do not depend on the rewards received by the agent. It can be assumed that to stabilize this matrix more iterations are required than for the connections between the layers: only after complete stabilization of the weights between layers will the connections within the hidden layer start to stabilize.

These are the conditions for stabilization of the system on the basis of the conducted experiments. The input-to-hidden matrix carries the primary burden of controlling the agent, and its weights stabilize quickly. After a large number of iterations many weights show only slight fluctuations due to point mutations and the small additive term in formula (8), but some weights are susceptible to strong fluctuations even after 10000 iterations.

The matrix of internal connections of the hidden layer is designed to provide competitive learning. Its weights are susceptible to strong fluctuations over the 10000 iterations. In general this matrix is more unstable than the input-to-hidden matrix: its weight oscillations have a greater amplitude, which indicates a large spread of values across different NN. This matrix is less affected by the reward, and agents with very different hidden-layer matrices are less susceptible to extinction during evolution.

The hidden-to-output matrix also provides competition between the output neurons. It is the continuation of the competitive learning that begins in the hidden layer.

When there is a very large number of agents in a confined space, the ability of the system to learn is reduced. This is due to two reasons. Firstly, the environment is more populated and thus more random and dynamic, and adjusting to such an environment is more complicated. Secondly, the size of the neural network does not allow taking into account all the different possible states.

The next section continues the experiments of this section and considers the results of the analysis of the learning capability of the system.

6.2 Productivity of the System


To evaluate the learning progress achieved by the agents, we use the average reward received by the agents over all iterations. Increasing rewards indicate that the agents accumulate experience and improve their behavior over time.

Figure 9 shows the cumulative average reward for 60 and for 30 agents.

Figure 9. The average reward of the 60 and 30 agents for 10000 iterations.

The behavior of the charts shows rapid progress in the learning of the agents. Untrained agents with connectivity matrices initialized by random numbers on average receive negative rewards. Among the large number of random matrices there are matrices whose target functions are closer to optimal. Agents with these matrices get positive rewards and become donors for building hybrid networks. Matrices with bad target functions quickly become extinct, agents earn positive ranks, the curves of accumulated reward reach a turning point, and the decline gives way to growth.

For both charts the tipping point comes almost simultaneously. The share of connectivity matrices with good target functions is larger in the tests with 60 agents than in the tests with 30 agents, so that curve lies higher. In the first iterations the agent-averaged reward declines. A small share of agents with good target functions distribute their matrices to agents with low ranks, and random matrices with very bad target functions quickly disappear. In later iterations the agents encounter the non-determinism of the environment, and a slow selection of matrices with good target functions begins.

The number of states in the presence of obstacles is far greater than the number of states in free space, so the genetic algorithm works more slowly at later stages: the agents need more time to try out the different states with obstacles.

An additional experiment was carried out with a network in which the signs of the weights were not fixed in advance. The network was a simple fully connected three-layer perceptron, with no other prior information about the task to be solved. In all other respects the experiment was identical to the previous one. The input vector was presented to the network, the network calculated an output, and the agent moved according to that output. For its correct or incorrect actions the agent received a reward, and agents that received negative rewards modified their networks. The results of the experiment are presented in Figure 10.

Figure 10. The average reward of the 30 and of the 60 agents over 10000 iterations in the absence of a priori information about the signs of the weights.

The charts show that the reward averaged over all agents falls monotonically. There is no qualitative difference between the chart for 30 agents and the chart for 60 agents, although the chart for 30 agents lies above the chart for 60 agents; there is no satisfactory explanation for this.

The observed behavior can be attributed to the dramatic increase in the volume of the phase space of the weights in the absence of a priori information about the signs of the weights. This can be demonstrated for the phase space of three weights; Figure 11 shows an example of such a phase space.


Figure 11. Phase space for three weights. The digit 1 labels the unit hypercube for full prior information about the sign of each weight. The digit 2 labels the expansion of the unit hypercube in the case of an uncertain sign of one weight. The digit 3 marks the part of the phase space in the case of uncertain signs of two weights.

If all three signs of the weights are initially specified, a unit hypercube, labeled by the digit 1, is allocated in the phase space. In Figure 11 all three signs of the weights are specified as positive. Each neural network is represented by a point lying somewhere within the hypercube, and the search for the minimum will be confined to the same region. Because the hypercube is a convex shape, formula (8) will always give new networks belonging to the hypercube. But if we assume that the information on the sign of one weight is unknown, the hypercube has to be extended to negative values of that weight. The resulting figure is marked in Figure 11 by the digit 2. Its volume is twice the volume of the original figure, and the search space increases accordingly. Hence, in order to maintain the same pace of evolution, approximately double the initial population is required.

Now suppose the sign of a second weight is uncertain. The previous figure must be extended to negative values of that weight as well. The resulting volume, labeled in Figure 11 by the digit 3, increases 4 times compared to the original figure under digit 1, and the starting population must be increased correspondingly to maintain the retrieval rate.

In the case of our network we have 9 input neurons plus one threshold neuron, giving 10 neurons in the first layer. The second and third layers each have three working neurons and one threshold neuron. The total number of connections between the input layer and the hidden layer is 10 × 3 = 30; here it is taken into account that connections between the threshold neurons are not installed. Considering that the threshold neuron always has a negative sign with respect to all the others, we obtain 27 weights whose sign must be chosen. A similar calculation for the connections between the hidden layer and the output layer gives 9 more connections, and the internal connections between the neurons of the hidden layer add another 9. In total we have 45 free weights whose signs can be uncertain, and the neural network appears as a point in a 54-dimensional phase space.

If the signs of the 45 free weights are not fixed, then to obtain an acceptable pace of evolution of the neural network the size of the population must be increased by a factor of approximately 2^45. Thus, by making reasonable assumptions about the signs, the size of the initial population can be reduced significantly. Comparing Figure 9 with Figure 10 illustrates the validity of the above reasoning.
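In compact form, the scaling argument above reads as follows, with $k$ the number of weights whose sign is left unconstrained and $V_0$, $N_0$ the search volume and population size when all signs are fixed:

```latex
V_k = 2^{k} V_0, \qquad N_k \approx 2^{k} N_0 ,
```

so that for $k = 45$ free signs the required population grows by a factor of roughly $2^{45}$.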

6.3 Convergence of the System

The connectivity matrices of the agents with the highest rankings were saved after the learning was completed. These matrices took part in the following experiment on the convergence of the system. The resulting values of the weights are collected in Table II.

Table II. The best w eights.

Values for 60 agents / Values for 30 agents

Input -> Hide

-0,609 -0,561

-0,597 -0,277

0,515 0,580

0,521 0,192

-0,587 -0,631


-0,507 -0,615

0,531 0,489

0,570 0,503

0,373 0,541

-0,641 -0,477

0,582 0,771

-0,635 -0,334

-0,565 -0,825

0,260 0,065

-0,426 -0,306

-0,481 -0,451

-0,553 -0,507

-0,102 -0,00017

0,287 0,494

-0,460 -0,683

-0,514 -0,220

0,595 0,401

0,581 0,665

0,543 0,399

0,404 0,335

-0,481 -0,304

-0,606 -0,803

Hide -> Output

0,724 0,479

-0,292 -0,201

-0,655 -0,404

-0,414 -0,908

0,633 0,499

-0,652 -0,493

-0,471 -0,376

-0,635 -0,571

0,602 0,468

Hide -> Hide

0 0

-0,558 -0,493

-0,471 -0,627

-0,489 -0,693

0 0

-0,476 -0,489

-0,411 -0,570

-0,396 -0,552

0 0

These matrices are used for the experiments comparing the PSO and NN algorithms. No further training was carried out: all agents receive the matrix of weights from Table II, and the weights do not change during the simulation. The training is thus separated from the exploitation.

In each experiment 30 agents participated. The first series of experiments was carried out with NN that received random values of the weights, in a space which included obstacles. The result of the experiment is presented in Figure 12.


Figure 12. Comparison of the speed of the PSO algorithm and the neural network in the presence of obstacles in the space. The test was held for 30 agents and one source.

The y-axis of the graph is the total distance of all agents to the source; a shorter distance indicates approach to the source. Oscillations of the distance indicate a bad choice of trajectory or obstacle avoidance. Some of the agents are able to perform the task owing to the architecture alone, but this is true only for the small percentage of agents whose weights were close enough to the optimum values; for a larger percentage of the NN the outputs are undefined action signals.

The convergence of the PSO algorithm stops after about 600 iterations. The agents that are not blocked find the source, while the agents behind obstacles are not able to reach it, and at this point the algorithm stops converging. The algorithm demonstrates convergence for a good distribution of the weights. Another portion of the agents demonstrates circular motion. Near the 600th iteration there is a small rise of the curve: at this point a portion of the agents was detained by obstacles and was not able to overcome them.

A second series of experiments was carried out with 30 agents. The NN of the agents were initialized with the values from Table II. Figure 13 compares the PSO and NN algorithms with the NN initialized by the weights from Table II; the matrix of weights trained with 60 agents was taken.

Figure 13. Comparison of the speed of the PSO algorithm and the neural network in the presence of obstacles in the space. The test was held for 30 agents and one source. The weights of the neural networks of the agents were initialized with the second-column values from Table II.

The third series of experiments was carried out with 30 agents. Figure 14 shows a comparison of the PSO and neural network algorithms; the matrix of weights trained with 30 agents was taken.


Figure 14. Comparison of the speed of the PSO algorithm and the neural network in the presence of obstacles in the space. The test was held for 30 agents and one source. The weights of the neural networks of the agents were initialized with the third-column values from Table II.

The charts also show an analytical line for the highest possible speed of approach of the agents to the source [1, 2].

The matrix obtained when training 60 agents was less suited to work in the new environment. An environment with a large number of agents is very stochastic and dynamic: agents often meet each other, and the opportunities for maneuvering are minimal. The rank of an agent may fall due to an unfortunate location. The reward system is too simplified to take into account the unfortunate location of the agents, and the NN resources are simply not enough for an intelligent response to a rapidly changing environment. We call this problem the "crowd effect". It slows down the learning process. The solution to the problem of the "crowd effect" is to increase the training time and to increase the resources of the NN.

At the initial stage the PSO algorithm demonstrates a higher speed than the NN. But later the agents meet obstacles and the speed drops; the curve of the PSO algorithm becomes asymptotically parallel to the horizontal axis. The agents that are managed by the NN avoid obstacles well. At the final stages of the simulation the superiority of the NN is obvious: the agents converge to the lowest possible distance from the source and remain in this position until the end of the simulation.

The experiments show that the method of genetic selection optimizes the target functions of the agents in acceptable time. The proposed algorithm generates new networks based on the accumulated rewards.
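
A minimal sketch of such reward-driven selection, assuming a simple elitist scheme in which the flattened weight vectors of the best-rewarded agents survive and mutated copies of them replace the rest; the elite fraction and mutation noise are illustrative assumptions, not the paper's parameters.

import numpy as np

def next_generation(weight_vectors, rewards, elite_frac=0.2, noise_std=0.05):
    """Build a new population of NN weight vectors from accumulated rewards.

    weight_vectors: array of shape (n_agents, n_weights), one row per agent;
    rewards: array of shape (n_agents,), total reward collected per agent."""
    n = len(weight_vectors)
    order = np.argsort(rewards)[::-1]              # best-rewarded agents first
    n_elite = max(1, int(elite_frac * n))
    elite = weight_vectors[order[:n_elite]]        # survivors kept unchanged
    children = [elite[np.random.randint(n_elite)]
                + np.random.normal(0.0, noise_std, weight_vectors.shape[1])
                for _ in range(n - n_elite)]       # mutated copies of survivors
    return np.vstack([elite] + children)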

During testing, detection of other agents was disabled, so the agent sensors perceived only static obstacles. This is a forced measure: with a large concentration of agents around the source, newly arriving agents would be perceived as obstacles and would trigger the obstacle-avoidance behavior, which degrades the convergence of the algorithm.

The result of the convergence test depends greatly on the location and shape of the obstacles. The agents were trained in a space with round obstacles placed at random locations, and the trained agents bypass such obstacles easily. In the convergence tests, elongated obstacles were used; this form of obstacle delays the agents working on the PSO algorithm, but the trained agents have shown the ability to move along even the longest obstacles and find a way around them.

The test results are also strongly influenced by the angular sector covered by the distance sensors. If the angle is small, collisions can occur at sharp corners while skirting obstacles; agents with a small viewing sector cannot move along the longest obstacle without collisions. A large viewing angle combined with a small number of sensors leaves large blind spots, which is especially problematic for small obstacles. Increasing the number of sensors entails an increase in the size of the networks and slows down learning.
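
A small illustrative calculation of the blind gap left between adjacent sensor beams, assuming the sensors are spread evenly over the frontal field of view; all angles and counts here are hypothetical.

def blind_gap_deg(n_sensors, beam_angle_deg, field_of_view_deg=180.0):
    """Angular gap left uncovered between adjacent distance sensors,
    assuming the sensors are spaced evenly over the field of view."""
    spacing = field_of_view_deg / n_sensors        # angle between sensor axes
    return max(0.0, spacing - beam_angle_deg)      # uncovered angle per gap

# Three sensors with 30-degree beams over a 180-degree front
# leave 30-degree blind gaps, wide enough to hide a small obstacle.
print(blind_gap_deg(3, 30.0))   # -> 30.0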

The following section explores the difficulties that have arisen during the tests and examines possible approaches to addressing them.

7 Discussion

The model uses feed-forward neural networks. Such networks cannot provide long-term planning of actions: testing has shown that the agents are not able to remember and reuse the best routes they have found, and each agent is directed toward instant profit rather than the long term. A solution to this problem can be a recurrent network. Recurrent networks have connections from the output layer back to the inner layers or to the input layer, and the information carried by these recurrent connections is fed back with a delay. Consequently, the network can have internal states and memory. Using recurrent networks in the context of reinforcement learning keeps a history of the states and makes predictions possible [30, 15]. As emphasized in the listed sources, predicting the consequences of actions helps the agent cope with partial observability of the environment.
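
A minimal sketch of an Elman-style recurrent controller whose hidden state is fed back with a one-step delay, giving the agent internal memory; the layer sizes, initialization, and tanh activation are illustrative assumptions rather than part of the reported model.

import numpy as np

class RecurrentController:
    """Feed-forward core plus a delayed recurrent connection (Elman style)."""

    def __init__(self, n_inputs, n_hidden, n_outputs, seed=0):
        rng = np.random.default_rng(seed)
        self.W_in = rng.normal(0.0, 0.5, (n_hidden, n_inputs))
        self.W_rec = rng.normal(0.0, 0.5, (n_hidden, n_hidden))  # delayed feedback
        self.W_out = rng.normal(0.0, 0.5, (n_outputs, n_hidden))
        self.h = np.zeros(n_hidden)                               # internal state

    def step(self, sensors):
        # The new hidden state depends on the current sensor readings
        # and on the hidden state from the previous time step.
        self.h = np.tanh(self.W_in @ sensors + self.W_rec @ self.h)
        return np.tanh(self.W_out @ self.h)                       # action signal

Because self.h persists between calls to step, the controller can, in principle, encode a short history of states, which is exactly what the feed-forward network of the present model lacks.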

Another constraint on increasing the intelligence of the agents is the fixed neural network architecture. The architecture used in the model is specified a priori and is not necessarily the best; the proposed topology can be changed only in the direction of breaking connections. Neural networks with a larger number of hidden neurons adapt more quickly. An evolutionary search in the space of topologies would optimize not only the network weights but also the architecture of the network. The common name used in the literature for such methods is topology- and weight-evolving artificial neural networks (TWEANNs) [30, 14].
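
As an illustration only, one structural mutation from the TWEANN family: adding a new connection gene between two previously unconnected nodes. The tuple-based genome encoding assumed here is hypothetical and is not the representation used in this work.

import random

def add_connection_mutation(genome, n_nodes):
    """genome: list of connection genes (src, dst, weight, enabled).
    Inserts one new connection between nodes that were not yet connected."""
    existing = {(src, dst) for src, dst, _, _ in genome}
    candidates = [(s, d) for s in range(n_nodes) for d in range(n_nodes)
                  if s != d and (s, d) not in existing]
    if candidates:                                  # skip if the graph is already complete
        src, dst = random.choice(candidates)
        genome.append((src, dst, random.uniform(-1.0, 1.0), True))
    return genome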

The hybrid approach discussed in this article for training a swarm of weakly interacting agents in a continuous, dynamic, partially observed environment demonstrates quick convergence. The neural network allows the agents to store and generalize the experience of previous interactions with the environment. Genetic algorithms are able to find the target control functions with little initial information: only a value for each action of the agent needs to be provided. Such a value function allows a reinforcement learning algorithm to be built in real time.

The considered hybrid algorithm can be improved by expanding the neural network architecture. Introducing recurrent connections with delay into the neural network would allow the states immediately preceding the current one to be taken into account when choosing an action.

The genetic algorithm may be supplemented by a system of gene crossover and point mutations. Point mutations diversify the population and help the target function escape local minima faster.
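
A minimal sketch of uniform crossover followed by sparse point mutations on flattened weight vectors; the gene-selection mask, point-mutation probability, and noise scale are illustrative assumptions.

import numpy as np

def crossover_and_mutate(parent_a, parent_b, p_point=0.01, sigma=0.1, seed=None):
    """Uniform crossover of two weight vectors plus rare point mutations."""
    rng = np.random.default_rng(seed)
    mask = rng.random(parent_a.shape) < 0.5        # gene-wise choice between parents
    child = np.where(mask, parent_a, parent_b)
    point = rng.random(child.shape) < p_point      # rare point mutations
    child[point] += rng.normal(0.0, sigma, point.sum())
    return child

Point mutations of this kind keep most of the inherited weights intact while occasionally perturbing single genes, which is what allows the population to drift out of local minima.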

The hybrid algorithm is very sensitive to the choice of the reward system. In the experiments, the reward system was chosen in a purely empirical way; no analytical method for determining the reward system for a particular task exists. A balance must be struck between negative and positive rewards. A particularly acute problem arises at the beginning of training: a space filled with obstacles provokes frequent collisions, and only agents located close to the sources at the start of learning can receive a positive reward. Active exploration of the environment far from the sources leads to the accumulation of negative rewards. If the number of agents with a positive reward is small, the process of forming new networks slows down; on the other hand, high positive rewards promote the survival of inefficient neural networks. A possible way out, to be explored in future work, is a dynamic reward system.
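
A hedged sketch of how such a balance between positive and negative rewards might look, with a collision penalty, a small per-step cost, and a bonus for progress toward the source; the constants are purely illustrative, since the reward system of the paper was chosen empirically.

def reward(dist_to_source, prev_dist, collided,
           approach_bonus=1.0, collision_penalty=-5.0, step_cost=-0.01):
    """Illustrative reward term combining negative and positive components."""
    r = step_cost                                   # discourages aimless wandering
    if collided:
        r += collision_penalty                      # dominates early in training
    if dist_to_source < prev_dist:
        r += approach_bonus * (prev_dist - dist_to_source)  # reward progress
    return r

A dynamic variant could, for example, soften the collision penalty during the first generations and tighten it later, so that agents far from the sources are not eliminated before they have explored the space.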

8 Conclusion

Swarm intelligence is used in many areas of industry and science. The use of large groups of simple agents is justified in situations involving risk to life and health, such as areas of radioactive and chemical contamination, high-temperature zones, and zones of high pressure. Intelligent agents have been actively studied since the middle of the last century, while the active study of swarm intelligence started relatively recently; this is due to the large computational difficulties of examining collective intelligence. The purpose of this paper is to offer a hybrid approach to the problem of training an intelligent swarm of agents in a continuous, non-deterministic, partially observed environment.

The physical problem was formulated at the beginning of the article, and a mathematical model of the search environment was built. The architecture of the intelligent agent was considered in detail. Existing approaches to the solution of the problem were then discussed, the weaknesses of swarm intelligence algorithms, neural networks, and genetic algorithms were analyzed in detail, and the need for a hybrid algorithm based on the considered algorithms was justified.

A decentralized control architecture for swarm intelligence in a continuous environment, taking into account the fault tolerance of the system, was built. We have introduced both the architecture of interactions between the swarm agents and the environment and the architecture of interactions between the agents themselves. The approach is based on a neural network, with fault tolerance provided by genetic algorithms and reinforcement learning. The reward system that encourages the agents was designed in accordance with the reinforcement learning algorithm. Issues of neural network performance and information exchange between layers of the neural network were explored, and heuristic assumptions about possible ways to speed up the algorithm were taken into account. We have tested the proposed hybrid algorithm on a mock-up simulator in the task of finding radiation sources and bypassing obstacles. The test results show a convergence that depends directly on the location and shape of the obstacles. The experiments show that the method of genetic selection optimizes the target functions of the agents in acceptable time, and the proposed algorithm generates new networks based on the rewards, which positively affects the learning behavior of the agents.

Future work should find a more precise formula for generating neural networks. The proposed formula (8) does not take into account the history of weight changes or the symmetry of the network configuration with respect to right and left turns. Taking these factors into account can be expected to speed up learning and, consequently, to make it possible to increase the size of the network without compromising performance.

References

1. Akzhalova, A., Inoue, A., & Mukharsky, D. (2014). Intelligent Mobile Agents for Disaster Response: survivor search and simple communication support. AROB 2014 International Symposium on Artificial Life and Robotics. Beppu, Japan.

2. Akzhalova, A., Kornev, A., & Mukharsky, D. (2014). Intelligent control technique for autonomous collective robotics systems. Computational Intelligence and Computing Research (ICCIC), 2014 IEEE International Conference on (pp. 1-8). IEEE.

3. Baxter, J., Weaver, L., & Bartlett, P. (1999). Direct gradient-based reinforcement learning: II. Gradient ascent algorithms and experiments. National University.

4. Bellman, R. (1957). A Markovian decision process (No. P-1066). RAND Corp., Santa Monica, CA.

5. Bellman, R. E. (2015). Adaptive control processes: a guided tour. Princeton University Press.

6. Beni, G., & Wang, J. (1993). Swarm intelligence in cellular robotic systems. In Robots and Biological Systems: Towards a New Bionics? (pp. 703-712). Springer Berlin Heidelberg.

7. Beville, J. I. (2006). Particle Swarm Optimization. Technical report, Miki Lab, Doshisha University, Japan.

8. Bonabeau, E., Dorigo, M., & Theraulaz, G. (1999). Swarm Intelligence: From Natural to Artificial Systems. Oxford University Press.

9. Bonabeau, E., & Theraulaz, G. (2000). Swarm smarts (behavior of social insects as model for complex systems). Scientific American, 282(3), 72-79.

10. Chen, J., Zhang, W., & Lim, C. C. (2013). Self-Organization In 1-d Swarm Dynamics. arXiv preprint arXiv:1309.2959.

11. Cohen, P. R., & Levesque, H. J. (1991). Teamwork. Nous, 25(4), 487-512.

12. Couceiro, M. S., Rocha, R. P., & Ferreira, N. M. F. (2013). Fault-tolerance assessment of a darwinian swarm exploration algorithm under communication constraints. Robotics and Automation (ICRA), 2013 IEEE International Conference on, 2008-2013.

13. Couceiro, M. S., Figueiredo, C. M., Rocha, R. P., & Ferreira, N. M. (2014). Darwinian swarm exploration under communication constraints: Initial deployment and fault-tolerance assessment. Robotics and Autonomous Systems, 62(4), 528-544.

14. Dasgupta, D., & McGregor, D. R. (1992). Designing application-specific neural networks using the structured genetic algorithm. Combinations of Genetic Algorithms and Neural Networks, COGANN-92, International Workshop on, 87-96.

15. Gomez, F., & Schmidhuber, J. (2005). Evolving modular fast-weight networks for control. Artificial Neural Networks: Formal Models and Their Applications - ICANN 2005, 383-389. Springer Berlin Heidelberg.
