
Immuno-Inspired Embodied Lifelong Learning in Robots

Academic year: 2023


Full text

The results of the experiments indicate the acceleration in learning and the effectiveness of the neuronal transfer of controllers.

  • 3.6 Variation in the Champion fitness of the RMW-M version - Phototaxis
  • 3.7 Variation in the Champion fitness of the ψ-M version - Obstacle Avoidance

Inspirations from Nature

Inspiration from the biological immune system (BIS) has resulted in a new paradigm known as the Artificial Immune System (AIS), which applies theoretical immunology to solve a wide range of problems in the computational world. One of the main application areas of bio-inspired computing is robotics.

Learning in Robots

Algorithms inspired by biological evolution have opened up a wealth of solutions in the field of machine learning [5]. The enormous diversity exhibited by processes in the biological world offers rich inspiration for emulating such processes in robotics.

Research Challenges and Objectives

Such a bias can hinder the learning of the robot controllers or even interrupt it altogether, which poses a challenge. The selected controller must take into account the current state of the robot.

Background

Evolutionary Algorithms

Evolutionary robotics evolves robot controllers through evaluation, selection, and variation based on their fitness values. Based on the observed behavior of the robot and the desired goals, a predefined fitness function is used to evaluate the controllers.
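To make the evaluate-select-vary loop concrete, the following sketch shows a generic (1+1) online evolutionary loop of the kind referred to later in this text; the function names (`evaluate_fitness`, `mutate`) and the Gaussian mutation scheme are illustrative assumptions, not the exact implementation used in the thesis.

```python
import copy
import random

def mutate(weights, sigma=0.1):
    """Variation step: Gaussian perturbation of every weight (illustrative)."""
    return [w + random.gauss(0.0, sigma) for w in weights]

def one_plus_one_online_ea(evaluate_fitness, n_weights=50, generations=100):
    """Generic (1+1) online EA: each generation the champion controller is
    challenged by a mutated copy of itself and replaced if the copy is fitter."""
    champion = [random.uniform(-1.0, 1.0) for _ in range(n_weights)]
    champion_fitness = evaluate_fitness(champion)            # evaluation
    for _ in range(generations):
        challenger = mutate(copy.copy(champion))              # variation
        challenger_fitness = evaluate_fitness(challenger)     # evaluation
        if challenger_fitness >= champion_fitness:            # selection
            champion, champion_fitness = challenger, challenger_fitness
    return champion, champion_fitness
```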

Artificial Immune Systems

The primary purpose of the immune system is to identify the self (host cells) and non-self cells (antigens) in the body and eliminate the latter. Antibodies that bind to an antigen are stimulated to clone and proliferate. This increase in the population of antibodies better suited to the situation helps to quickly contain the growth of the antigens.

Contributions of the Thesis

Embodied Lifelong Learning

However, there is no principled way to determine the optimal number of controllers to be cached, especially when learning is performed in an online and on-board manner. This dynamic regulation of the number of controllers residing in a HoF calls for a proper mechanism for evicting the non-performers based on the current state of the system.

Enhancing Neuroevolution

Neuronal Transfer Learning

Integrating Lifelong Learning and Transfer Learning

Outline of the Thesis

The concept of Concentration for the re-selection of controllers from the HoF is also introduced. Recognition is based on the binding of the antibody (Ab) on an immune cell to the antigen (Ag).

Artificial Immune System

Related Work

AIS in Robotics

The misaligned robot detects the uncertain conditions and learns to detect and adapt to such conditions in the future. The authors of [94] have extended the Idiotypic Network theory [62] with information sharing so that multiple robots can learn and adapt their behavior to changes in the environment.

Hall-of-Fame Approach

This essentially means that the maximum number of individuals that can reside in the HoF must be determined somehow, either empirically or otherwise, and then set as a hyperparameter. The exposure, deletion rates, and diversity threshold must also be set as hyperparameters.

Methodology

  • Structure of Antigen and Antibody
  • The Immune Network
  • Hall-of-Fame of Antibodies
  • Eviction of Ex-Champs from HoF
  • iAES-HoF Algorithm

The concentration of the Champ and of all Ex-Champs in the associated HoF varies due to stimulations and suppressions as determined by the corresponding equation. The Ex-Champs, in turn, are suppressed by the Champ of the related HoF. R_Max is the maximum resource value initially assigned to an antibody upon entry into the HoF.
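A minimal sketch of how such concentration and resource bookkeeping could be realised is given below; the stimulation/suppression terms, the decay rate, and the per-generation resource decrement are assumptions made for illustration and do not reproduce the exact equations of the iAES-HoF algorithm.

```python
R_MAX = 10.0   # assumed maximum resource granted to an antibody on entry into the HoF

class Antibody:
    """A cached controller (antibody) with its concentration and resource."""
    def __init__(self, controller):
        self.controller = controller
        self.concentration = 1.0
        self.resource = R_MAX

def update_hof(champ, ex_champs, stimulation, suppression, decay=0.05):
    """Illustrative per-generation update of one HoF: the Champ is stimulated,
    the Ex-Champs are suppressed, and antibodies whose resource is exhausted
    are evicted."""
    champ.concentration += stimulation(champ) - decay * champ.concentration
    survivors = []
    for ab in ex_champs:
        ab.concentration -= suppression(champ, ab)
        ab.resource -= 1.0                     # non-performers drain their resource
        if ab.resource > 0.0:
            survivors.append(ab)               # keep; otherwise evict from the HoF
    return survivors
```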

Experimental setup

Simulation Setup

Three out of the eight IR sensors on board the e-puck are used to measure the distance of the robot from an obstacle. A simulated ceiling light suspended at the center of the arena within Webots serves as the light source used for Phototaxis. When there is no obstacle in the path of the robot, the value of d is high.
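As a hedged illustration of how the distance reading d could enter a fitness function for obstacle avoidance, the sketch below rewards fast, straight motion while d stays high; the weighting and the normalisation are assumptions, not the fitness function actually used in the experiments.

```python
def obstacle_avoidance_fitness(d, left_speed, right_speed, max_speed=1.0):
    """Illustrative OA fitness: large when the robot drives fast and straight
    while d (normalised distance to the nearest obstacle, in [0, 1]) is high."""
    speed = (abs(left_speed) + abs(right_speed)) / (2.0 * max_speed)     # move fast
    straight = 1.0 - abs(left_speed - right_speed) / (2.0 * max_speed)   # move straight
    return speed * straight * d                                          # stay clear of obstacles
```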

Real Robot Setup

As with the simulated setup, the experiments with the real robot aim for a high fitness value. Similar to simulation, here too the real robot placed in the arena samples the environment through its sensors. All related parameters in the real robot experimental setup were set to those stated in the simulation setup.

Results and Discussions

Effect of Caching Champs

The graph in Figure 2.7 shows the change in the Champs' fitness values over generations for the (1+1) Online EA. It can be seen from the graph that there are drastic rises and falls in the fitness values. The antibodies evolved in iAES and iAES-HoF are indicated by letters of the English alphabet.

Effect of ϵ

The iAES-HoF algorithm may also suffer from issues due to antigen switching and diversity similar to those in the iAES version. In the simulated robot experiments, for ϵ equal to 0.3, 0.45 and 0.6, the number of active regions generated for the OA task turned out to be 7, 4 and 2, respectively (in both iAES and iAES-HoF). The higher values of CAF, especially when ϵ was set to 0.45, indicate this improvement in learning.
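The role of ϵ in carving the shape space into active regions can be pictured with the following sketch, in which an antigen joins an existing region if it lies within ϵ of that region's centre and otherwise opens a new one; the Euclidean distance metric and the vector representation of antigens are assumptions for illustration, and the sketch shows why larger ϵ values yield fewer active regions.

```python
import math

def assign_active_region(antigen, regions, epsilon):
    """Map an antigen (a sensor-state vector) to an active region.
    A larger epsilon merges more states into a single region, so fewer
    regions (and hence fewer cached controllers) are created overall."""
    for idx, centre in enumerate(regions):
        if math.dist(antigen, centre) <= epsilon:
            return idx                        # falls inside an existing region
    regions.append(list(antigen))             # no region is close enough: open a new one
    return len(regions) - 1
```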

Significance of Resource

The concentration increases to C_M1(g) due to the stimulation received from the antigen and from all current Ex-Champs in the associated HoF. If M1 fails to retain its position as Champ at generation g+δ, it becomes an Ex-Champ and resides in the associated HoF. Figure 2.19a shows the sharp increase in the concentration of a Champ as a result of the stimulations it received from both the antigen and the Ex-Champs in the associated HoF.

Summary of the Chapter

ANN performance is measured using a task-dependent objective or fitness function, which is used to calculate the fitness values [132]. The puissances associated with the weights are consumed and replenished based on the efficacy of the evolving ANN. If the performance of the ANN improves as a result of a mutation, the puissances of the corresponding weights are augmented.

Methodology

Concept of Mutational Puissance

If the mutations result in an ANN showing better performance, the puissances of the weights that were mutated are augmented; otherwise they are reduced. Consumption is proportional to the current value of ψ and the difference in the fitness values of the parent and the evolving child ANN. Replenishment of ψ is proportional to the moving average of the differences in the fitness values of the parent and the child ANN over a window of generations.
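A minimal sketch of this consumption and replenishment scheme is given below; the proportionality constants, the bounds on ψ, and the window length are illustrative assumptions rather than the exact update rules of the thesis.

```python
from collections import deque

class Puissance:
    """Tracks the mutational puissance (psi) of a single weight (illustrative)."""
    def __init__(self, psi_max=1.0, window=10, k_consume=0.1, k_replenish=0.1):
        self.psi = psi_max
        self.psi_max = psi_max
        self.k_consume = k_consume
        self.k_replenish = k_replenish
        self.history = deque(maxlen=window)    # recent parent-child fitness differences

    def update(self, parent_fitness, child_fitness):
        delta = child_fitness - parent_fitness
        self.history.append(delta)
        if delta > 0:
            # replenish in proportion to the moving average of recent fitness gains
            moving_avg = sum(self.history) / len(self.history)
            self.psi = min(self.psi_max, self.psi + self.k_replenish * max(moving_avg, 0.0))
        else:
            # consume in proportion to the current psi and the parent-child fitness gap
            self.psi = max(0.0, self.psi - self.k_consume * self.psi * abs(delta))
        return self.psi
```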

Proposed Mutational Puissance assisted Neuroevolution Algorithm

The puissances associated with the weights in the weight matrices are first initialized to the maximum value of ψ. Based on a random probability Prdm_mutate, the challenger is created either by mutating a random number of the champion's weights (the Mutate Random Weights function) or by mutating the champion's weights according to the ψ values associated with them (the Mutate Puissance Weights function). A minimum of 25% of the total number of weights in a weight matrix is empirically set to mutate.
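The choice between the two mutation modes described above could be sketched as follows; the value of Prdm_mutate, the mutation magnitude, and the exact selection of high-ψ weights are illustrative assumptions, while the 25% lower bound on the number of mutated weights follows the text.

```python
import random

def create_challenger(champion_weights, puissances, p_random_mutate=0.5, sigma=0.2):
    """Mutate either a random subset of the champion's weights (Mutate Random
    Weights) or the weights with the highest puissance (Mutate Puissance Weights)."""
    n = len(champion_weights)
    n_mutate = random.randint(max(1, n // 4), n)      # at least 25% of the weights
    if random.random() < p_random_mutate:
        indices = random.sample(range(n), n_mutate)   # random-weight mutation
    else:
        ranked = sorted(range(n), key=lambda i: puissances[i], reverse=True)
        indices = ranked[:n_mutate]                   # puissance-guided mutation
    challenger = list(champion_weights)
    for i in indices:
        challenger[i] += random.gauss(0.0, sigma)
    return challenger
```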

Experimental Setup

It can be noted that the task of avoiding obstacles is less complicated than phototaxis, as the latter inherently also includes the task of avoiding obstacles during its movement towards the light source. The hidden layer consists of 13 nodes, while the output layer has two nodes, corresponding to the speeds of the left and right motors, respectively. The evaluation time in the run eval function was set to thirteen seconds, i.e. the respective ANN controller was evaluated after it had run for thirteen seconds.
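For concreteness, a controller with the architecture described above (13 hidden nodes and two output nodes driving the left and right motors) might be organised as in the sketch below; the number of inputs and the tanh activation are assumptions, since the text does not restate them here.

```python
import math
import random

class ControllerANN:
    """Feed-forward robot controller: sensor inputs -> 13 hidden -> 2 motor speeds."""
    def __init__(self, n_inputs=8, n_hidden=13, n_outputs=2):
        self.w_ih = [[random.uniform(-1.0, 1.0) for _ in range(n_inputs)]
                     for _ in range(n_hidden)]
        self.w_ho = [[random.uniform(-1.0, 1.0) for _ in range(n_hidden)]
                     for _ in range(n_outputs)]

    def forward(self, sensors):
        hidden = [math.tanh(sum(w * s for w, s in zip(row, sensors)))
                  for row in self.w_ih]
        # two outputs in [-1, 1]; the caller scales them to the motor speed range
        return [math.tanh(sum(w * h for w, h in zip(row, hidden)))
                for row in self.w_ho]
```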

Results and Discussions

Therefore, there is almost no improvement in the performance of the ANN controller over the generations. Figures 3.7 and 3.8 show the fitness values of the evolving champions across generations for the obstacle avoidance and phototaxis tasks, respectively, using the ψ-M strategy. The decay mechanism proposed in the algorithm prevents the accumulation of the mutational puissances associated with the ANN weights.

Summary of the Chapter

Transfer Learning

Lange et al. [80] present a method in which only the last few layers of a deep network are fine-tuned. In addition to fine-tuning, it is also essential to identify the specific layers of an ANN that need to be transferred. Based on a policy network, they decide whether to pass the input through a series of fine-tuned or pre-trained layers.

The Idiotypic Network

Methodology

Immuno-inspired Idiotypic network based Transfer

This work proposes an idiotypic network (IN) inspired transfer of neurons from ANN_S to ANN_T, where only some of the neurons in the layers are transferred. The neurons in each layer form a population and constitute an idiotypic network that is local to that layer. After transferring and freezing the weights of the hot neurons in ANN_T, the network is trained using the target dataset, D_T.
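A minimal sketch of such a layer-local hot-neuron transfer is shown below, with each layer represented as a weight matrix whose rows correspond to neurons; the hotness scores and the freezing mask are illustrative assumptions and do not reproduce the idiotypic-network computation itself.

```python
import numpy as np

def transfer_hot_neurons(source_layer, target_layer, hotness, n_hot):
    """Copy the weight rows of the n_hot hottest source neurons into the target
    layer and return a boolean mask marking them as frozen (illustrative)."""
    hot_idx = np.argsort(hotness)[-n_hot:]            # indices of the hottest neurons
    target_layer = target_layer.copy()
    target_layer[hot_idx] = source_layer[hot_idx]     # transfer their incoming weights
    frozen = np.zeros(target_layer.shape[0], dtype=bool)
    frozen[hot_idx] = True                            # keep these rows fixed during training
    return target_layer, frozen

# Usage sketch: while training on the target dataset D_T, zero the gradient rows of
# the frozen neurons (e.g. grads[frozen] = 0.0) so that only the weights of the
# non-transferred neurons are updated.
```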

Experimental Setup

Experiment #1: Transfer of Hot neurons from a XOR ANN_S

The targets, ANN_T1 and ANN_T2, used to learn the AND and OR logic, respectively, had the same neural architecture as ANN_S. All the weights associated with the non-transferred neurons of ANN_T1 and ANN_T2 were initialized randomly in both cases. The weights associated with the transferred neurons were frozen during the training phases of ANN_T1 and ANN_T2.

All hyperparameters were set to the same values as those of the CNN_S used to learn the Devanagari script. For the Idiotypic Network-based Transfer, ▶2, ▶3, and ▶4 were followed, transferring the top 2, 3, and 4 hot neurons of each population from the first two convolutional layers of CNN_S to the first two convolutional layers of CNN_T1 and CNN_T2, respectively. For the full layer-wise transfer, ▶λ, all neurons of the first two convolutional layers were transferred.

Experiment #3: Transfer of Hot neurons from a CNN_S trained on the MNIST dataset

Results and Discussions

  • Results from Experiment #1: XOR to AND and OR logic
  • Transferring Hot neurons from Devanagari Character dataset
  • Transferring Hot neurons from MNIST dataset to USPS dataset
  • Evolving the repertoire of Robot Controllers - iAES-HoF Algorithm

Hot neurons were identified in each population of neurons in the first and second convolutional layers. While the native repertoire evolves, the temperatures of the neurons in the evolving controllers change in proportion to the gradients of their respective fitness values and weights. TL is mostly successful in cases where the source and target domains are similar [99].

Methodology

NeEvoT Algorithm

The temperatures of all neurons in all controllers are calculated at each generation. Higher values of the temperature indicate that the associated neuron has positively influenced learning, making it a possible candidate for transfer to E_T. In the get_best_controller() function, the best controller from the corresponding HoF at that point in time is retrieved from Re_S and mutated, after which the fitness of the mutated controller is determined.

Learning in the Target Environment

As can be seen from the above equations, under such conditions the temperature increase can be insignificant or negative. Thus, over generations, the temperature of a neuron reflects its influence on the learning exercise. The hottest n neurons within the transferable layers of each controller are marked, and the complete repertoire of the source, Re_S, is transferred to the target robot, where it serves as its external repertoire (Re_Ex).
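A hedged sketch of this temperature bookkeeping is given below; the proportionality constant and the precise gradient terms are assumptions for illustration, since the actual update equations are given in the thesis.

```python
import numpy as np

def update_temperatures(temperatures, fitness_change, weight_changes, k=0.1):
    """Raise or lower each neuron's temperature in proportion to the change in
    fitness and the change in its weights over one generation (illustrative)."""
    return temperatures + k * fitness_change * np.abs(weight_changes)

def mark_hottest(temperatures, n):
    """Return the indices of the n hottest neurons within a transferable layer."""
    return np.argsort(temperatures)[-n:]
```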

Experimental Setup

Results and Discussions

Evolving Similar Tasks T =

The graphs in Figures 5.3–5.7 show the generation-wise variations in the fitness and cumulative average fitness (CAF) values for the repertoire transfer from E_S to E_T, where both environments evolve the same obstacle avoidance task (T=). In the case of T={0} (Figure 5.3), where there is no repertoire transfer (i.e., the controllers evolve from scratch), it can be seen that there are many rises and falls in fitness values over generations and the learning curve is not smooth. In the T={7} and T={λ} cases (Figures 5.6 and 5.7), the curves fluctuate less and end up with higher CAF values than the previous ones.

Evolving Dissimilar Tasks T≠

In the graph for T≠{0} shown in Figure 5.8, since the repertoire is not transferred to the target, the controllers must be evolved from scratch. It can be noted that the fitness values in the T= and T≠ cases differ because the corresponding equations applicable to them are different. In contrast to the observations in the DLS values for T= (Figure 5.13), those for T≠ achieve higher DLS values for lower orders compared to the higher orders (including full transfers).

Summary of the Chapter

The Concentration and Resource parameters controlled the re-selection and eviction of the controllers in a HoF. The concentration and resource values of the controllers were tuned dynamically based on the performance of the respective controllers and were not set a priori. The experimental results indicate improved learning in the target robot controllers after the transfer is effected, even when the source and target environments of the respective robots differ and when they have to learn new tasks.

Future Research Directions and Applications


List of Figures

  • A Shape Space depicting the Active Regions within
  • Idiotypic Network formed by antibodies within an HoF
  • Arena in the Webots simulator
  • An Artificial Neural Network as a Robot Controller
  • The Arena and the Firebird Robot used for experiments
  • Evolution of Champs in (1+1) Online EA for the OA task in the
  • Evolution of Champs in an Active Region with ϵ = 0.3 for the OA
  • Evolution of Champs in an Active Region with ϵ = 0.45 for the OA
  • Evolution of Champs in an Active Region with ϵ = 0.6 for the OA
  • Evolution of Champs in (1+1) Online EA for the PT-OA task in the
  • Evolution of Champs in an Active Region with ϵ = 0.3 for the PT-OA
  • Evolution of Champs in an Active Region with ϵ = 0.45 for the PT-
