A Computational Model of the Hybrid Bio-Machine MPMS for Ratbots Navigation


Lijuan Su, Nenggan Zheng, Min Yao, and Zhaohui Wu, Zhejiang University

As a typical cyborg intelligent system, ratbots possess not only their own biological brains but also machine visual sensation, memory, and computation. They could help us understand the memory and learning mechanisms of cyborg intelligent systems.

CYBORG INTELLIGENCE

The biological brain is the most sophisticated, efficient, parallel, and low-energy system that exhibits advanced cognitive functions. As the technology behind brain-computer interfaces becomes more and more … devices (artificial intelligence) and biological brains (biological intelligence). Researchers have proposed a concept and architecture for cyborg intelligence that integrates biological intelligence with artificial intelligence.1

Through complementary integration, cyborg intelligence can solve complicated problems in general environments that neither a biological nor an artificial intelligence system can tackle alone.

A typical cyborg intelligence system is the ratbot, a rat with electrodes implanted in the medial forebrain bundle (MFB) of its brain.2,3 The electrodes are connected to an embedded backpack fixed on the rat, which delivers stimulation pulses that reward the rat according to its behavior. A real-time reward tied to the rat’s transient actions can affect the animal’s learning and memory processes. In a goal-oriented task, for example, ratbots can reach the goal faster and learn the optimal path in fewer trials.

The remarkable performance that ratbots exhibit in maze-learning tasks comes from their novel memory and learning system, which is built by introducing a real-time MFB reward into the existing multiple parallel memory system in their biological brains. However, there is still a gap between computational models and these experimental results. We aim to build a computational model that explains the mechanisms underlying a ratbot’s superior learning performance and shows how input information is processed to generate behavior.

Multiple Parallel Memory Systems in Ratbot Navigation

Neurobiological results suggest that the memory system in rat brains is composed of several distinct, anatomically and functionally dissociable subsystems.4 The current classical version of the multiple parallel memory systems (MPMS) theory hypothesizes three central structures: the hippocampus, the dorsal striatum, and the amygdala.5 The respective neural circuits of these three structures encode and process specialized memory information and generate the rat’s behavior in the outside world.

The hippocampus subsystem is assumed to represent the relationships between stimuli and events in the physical environment. For example, a particular location can be stored by place cells; when the rat passes through this place, the corresponding place cells fire strong bursts of action potentials.6

The second subsystem, with the dorsal striatum as its central structure, represents stimulus-response (S-R) relationships.5 When first faced with an environmental stimulus, animals might respond accidentally or instinctively. But if they then encounter a reinforcer, the association between the stimulus and the accidental response is strengthened. The next time the animal is in the presence of the same stimulus, it will be more likely to exhibit the same response.

The relationship between neutral stimuli and reinforcers is represented in the third subsystem, whose central structure is the amygdala. Once associated with the reinforcer, a neutral stimulus can also evoke conditioned responses similar to those initially elicited by the reinforcer.

The hybrid bio-machine MPMS in ratbots combines the aforementioned neural circuits with the computer-delivered MFB reward loop. As Figure 1 shows, the physical relationships among environmental cues (barriers and walls in the maze) form spatial maps in the hippocampus subsystem that direct the ratbot’s ongoing navigation behavior. In behavioral experiments, the striatum subsystem processes the reward association between a landmark and the correct choice in the maze, making the ratbot more likely to select the correct route as training proceeds. By introducing computer-controlled MFB rewards into ratbots, the reinforcer distribution in the maze learned by the computer guides the rat to search for higher reinforcers with stronger MFB stimuli; this distribution is represented by the amygdala subsystem in the rat’s biological brain and by the MFB reinforcer distribution map in the computer.

Computational Model of the Hybrid Bio-Machine MPMS in Ratbots

Figure 2 depicts the computational model of a ratbot’s MPMS. By introducing real-time MFB stimulation into the rat brain, the new hybrid bio-machine MPMS encodes environmental sensory inputs as neural representations in the rat’s brain or as a map in the computer, integrates these representations into various associations, and then generates the motor selection. Three subsystems carry out the environmental information encoding, the memory association forming, and the motor output.

The hippocampus subsystem is responsible for processing allothetic and idiothetic information.7 Allothetic place cells (APCs) encode distances to walls and barriers, while idiothetic place cells (IPCs) represent the current position based on the rat’s movement speed and direction. Hippocampus place cells (HPCs) integrate these two pathways into spatial memory by associating the related representations.

The dorsal striatum subsystem encodes the relationships between landmarks and actions. The position of landmarks in the environment is represented by landmark cells (LCs).8

Given a landmark stimulus in the maze, the rat is rewarded for responding with a specific movement action. In the computational model, the strengthened association between the landmark and the corresponding action is represented by larger interconnection weights between dorsal striatum cells (DSCs) and action cells (ACs).

In the third subsystem, the computer (that is, the backpack) senses environmental cues in the physical world to recognize the rat’s current position and movement actions. The computer also iteratively learns a reward map (position s, reward r) for the experimental scenario using a Q-learning algorithm as the rat navigates the maze. For each rat state at position s, the computer delivers the real-time electric stimulus r to the MFB of the rat’s brain according to the reward map. The real-time virtual MFB reward affects the amygdala to update the connection strength … the actions (AC in Figure 2), the rat will prefer the movement action with the maximum real-time reward prediction and run in the corresponding direction. The computer then recognizes this movement action as input for the successive iterations of real-time MFB rewards.

[Figure 1: hybrid bio-machine MPMS in ratbots; labels: environment, computer, hippocampus, amygdala, striatum, distance to walls, landmark, motor, state and reward.]

Computational Model for the Hippocampus Subsystem

The hippocampus subsystem is constructed as a connectionist model (see Figure 3),4 with different inputs of spatial information encoded. In the allothetic pathway, distances to the maze walls are represented by APCs, whereas the proprioceptive inputs (the rat’s own speed and head direction) are processed by IPCs.8

Both APCs and IPCs project to HPCs, which integrate their representations into a spatial memory of the current position. HPC activity is linked to the input of ACs to generate motor output.

Allothetic information about the environment’s geometric properties plays an important role in hippocampal spatial representation. The environment’s geometric information is encoded as the distances from the rat’s current position to the surrounding maze walls, (d_1, d_2, …, d_{N_A}), in N_A direction angles (ϕ_1, ϕ_2, …, ϕ_{N_A}). The firing rate of APC j is computed as

$$a_j^{\mathrm{APC}} = \exp\left(\ldots\right) \tag{1}$$
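Because Equation 1 survives only as an exponential of the wall distances, the sketch below assumes the Gaussian tuning commonly used in place-cell models: APC j fires maximally when the current distance vector matches a distance vector it stored at its preferred location. The function name `apc_activity`, the stored vector, and the width `sigma` are illustrative assumptions, not the authors’ parameters.

```python
import math

def apc_activity(distances, stored_distances, sigma=10.0):
    """Firing rate of one allothetic place cell (APC).

    ASSUMPTION: Gaussian tuning around the wall-distance vector the cell
    stored at its preferred location; this is not the paper's exact
    Equation 1. `sigma` is a hypothetical width in the distance unit.
    """
    n_a = len(distances)  # N_A sampled direction angles
    if n_a != len(stored_distances):
        raise ValueError("distance vectors must have the same length N_A")
    sq_err = sum((d - s) ** 2 for d, s in zip(distances, stored_distances))
    # Dividing by N_A keeps the tuning width independent of how many
    # directions are sampled.
    return math.exp(-sq_err / (2.0 * sigma ** 2 * n_a))

# Hypothetical example: N_A = 4 directions, distances in cm.
here = [30.0, 45.0, 30.0, 75.0]
print(apc_activity(here, here))                      # 1.0 at the preferred spot
print(apc_activity(here, [40.0, 45.0, 20.0, 75.0]))  # smaller elsewhere
```

Normalizing by N_A is a design choice: without it, sampling more directions would sharpen every place field.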

In the idiothetic pathway, suppose the agent (that is, the rat or ratbot) is moving at speed v(t) in direction angle β(t); its current position p_t(x_t, y_t) is computed from the previous …

The activity of IPC i is computed as follows:

$$a_i^{\mathrm{IPC}} = \ldots \tag{4}$$

where p(t) is the agent’s current position, p_i is the field center of IPC i, and σ_IPC is the width of the IPC fields.
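The idiothetic pathway can be sketched as dead reckoning followed by a place-field readout. This minimal illustration assumes simple Euler integration of speed v(t) and direction angle β(t) in place of the lost position-update equations, and a Gaussian field of width σ_IPC for Equation 4; `integrate_path`, `ipc_activity`, the time step, and the default width are hypothetical.

```python
import math

def integrate_path(start, speeds, headings, dt=1.0):
    """Dead-reckon the position from speed v(t) and direction angle beta(t).

    ASSUMPTION: Euler integration, p_t = p_{t-1} + v*dt*(cos beta, sin beta).
    """
    x, y = start
    for v, beta in zip(speeds, headings):
        x += v * dt * math.cos(beta)
        y += v * dt * math.sin(beta)
    return x, y

def ipc_activity(position, center, sigma_ipc=15.0):
    """Activity of IPC i as a Gaussian of the distance between p(t) and the
    field center p_i with width sigma_IPC (ASSUMED form of Equation 4)."""
    dx = position[0] - center[0]
    dy = position[1] - center[1]
    return math.exp(-(dx * dx + dy * dy) / (2.0 * sigma_ipc ** 2))

# Hypothetical run: 10 cm east, then 10 cm north, starting from (0, 0).
p = integrate_path((0.0, 0.0), [10.0, 10.0], [0.0, math.pi / 2])
print(p)                               # approximately (10.0, 10.0)
print(ipc_activity(p, (10.0, 10.0)))   # close to 1.0 at the field center
```

Pure path integration accumulates error over time, which is one reason the model also combines it with the allothetic pathway in the HPCs.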

With inputs from the two different representations of spatial information in the APC and IPC populations, the HPC population represents the current location. For the HPC cell indexed m in the population, we calculate

$$a_m^{\mathrm{HPC}} = \sum_i w_{mi}\, a_i^{\mathrm{APC}} + \sum_j w_{mj}\, a_j^{\mathrm{IPC}} \tag{5}$$

The weights are updated with a Hebbian learning algorithm,9

$$\Delta w_{mi} = \mu\, a_m^{\mathrm{HPC}} \left(a_i^{\mathrm{APC}} - w_{mi}\right) \tag{6}$$

$$\Delta w_{mj} = \mu\, a_m^{\mathrm{HPC}} \left(a_j^{\mathrm{IPC}} - w_{mj}\right) \tag{7}$$

where μ is the learning rate, and a_i^APC, a_j^IPC, and a_m^HPC are the activities of APC cell i, IPC cell j, and HPC cell m, respectively.

Figure 2. Computational model of the hybrid bio-machine MPMS. AC = action cells; AMC = amygdala cells; APC = allothetic place cells; DSC = dorsal striatum cells; HPC = hippocampus place cells; IPC = idiothetic place cells; LC = landmark cells; MFB = medial forebrain bundle; and QL = Q-learning algorithm.
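Equations 5 to 7 can be sketched in a few lines, assuming Equation 5 is a plain weighted sum of the APC and IPC inputs (the original may also apply a transfer function). The `- w` term in each Hebbian update pulls a weight toward the presynaptic activity only while the HPC is active, which keeps the weights bounded. The function names and the learning-rate value are illustrative.

```python
def hpc_activity(w_apc, w_ipc, a_apc, a_ipc):
    """Equation 5 read as a weighted sum of APC and IPC inputs (assumed)."""
    return (sum(w * a for w, a in zip(w_apc, a_apc))
            + sum(w * a for w, a in zip(w_ipc, a_ipc)))

def hebbian_update(weights, pre, post, mu=0.1):
    """Equations 6 and 7: w <- w + mu * a_post * (a_pre - w).

    Each weight moves toward its presynaptic activity at a speed
    proportional to the postsynaptic (HPC) activity."""
    return [w + mu * post * (a - w) for w, a in zip(weights, pre)]

# Hypothetical example: one HPC with two APC inputs and one IPC input.
w_apc, w_ipc = [0.5, 0.5], [0.2]
a_apc, a_ipc = [1.0, 0.0], [0.5]
a_m = hpc_activity(w_apc, w_ipc, a_apc, a_ipc)   # 0.5 + 0.1
w_apc = hebbian_update(w_apc, a_apc, a_m)
w_ipc = hebbian_update(w_ipc, a_ipc, a_m)
print(a_m, w_apc, w_ipc)   # weights drift toward the active inputs
```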

Computational Model for the Striatum Subsystem

The striatum subsystem receives landmark information and associates it with a specific action when the landmark and the reward appear in pairs, as Figure 4 shows. The landmark information is processed by a population of LCs that correspond to the directions (ϕ_1, ϕ_2, …, ϕ_{N_LC}), where

$$\phi_i = \frac{2\pi i}{N_{\mathrm{LC}}}, \quad i = 1, \ldots, N_{\mathrm{LC}}.$$

If a landmark is detected in direction ϕ_i, the activity of LC i in that direction is 1, that is, I_i = 1. The firing rate of the DSC population is computed as

$$a^{\mathrm{DSC}} = \exp\left(\ldots\right) \tag{8}$$

where … representation of the landmark in N_LC directions for the DSCs, and I_k is the current view in DSCs.
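The landmark code itself can be stated exactly: N_LC cells with preferred directions ϕ_i = 2πi/N_LC, one of which is set to 1 when a landmark is detected in its direction. The sketch below assumes a continuous landmark bearing is assigned to the LC with the nearest preferred direction; that binning rule and the names `lc_directions` and `lc_activity` are assumptions, and Equation 8 is not reproduced.

```python
import math

def lc_directions(n_lc):
    """Preferred directions phi_i = 2*pi*i / N_LC for i = 1..N_LC."""
    return [2.0 * math.pi * i / n_lc for i in range(1, n_lc + 1)]

def lc_activity(landmark_angle, n_lc):
    """Binary LC code: I_i = 1 for the cell nearest the landmark bearing.

    ASSUMPTION: nearest-direction binning of a continuous bearing (radians).
    """
    dirs = lc_directions(n_lc)

    def ang_dist(a, b):
        # Smallest angle between two directions, accounting for wrap-around.
        d = (a - b) % (2.0 * math.pi)
        return min(d, 2.0 * math.pi - d)

    best = min(range(n_lc), key=lambda i: ang_dist(landmark_angle, dirs[i]))
    return [1 if i == best else 0 for i in range(n_lc)]

print(lc_activity(math.pi / 2, 8))   # [0, 1, 0, 0, 0, 0, 0, 0]
```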

Computational Model for the Amygdala Subsystem

With inputs of the rat’s action a and current state s captured by the computer’s camera at time t, the Q value Q(s_t, a_t) is calculated by the Q-learning algorithm running on the machine (Figure 5) as follows:

$$Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha\left[r_t + \beta \max_{a} Q(s_{t+1}, a) - Q(s_t, a_t)\right] \tag{9}$$

where r_t is the reward, α is the learning rate, and β is the discount factor that determines the present value of future rewards. For state s_t, the computer calculates Q(s_t, a_t) and delivers an MFB stimulus whose intensity SI is mapped from Q(s_t, a_t).
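A tabular version of Equation 9, as it might run on the backpack computer, is sketched below. The dictionary-based Q-table, the state labels, and the values of α and β are hypothetical. In this sketch the goal state is never updated, so its Q values stay at their default of 0.

```python
from collections import defaultdict

def q_update(q, s, a, r, s_next, actions, alpha=0.5, beta=0.9):
    """Equation 9: Q(s,a) <- Q(s,a) + alpha*(r + beta*max_a' Q(s',a') - Q(s,a)).

    `beta` is the discount factor, following the article's notation.
    Returns the updated Q(s, a).
    """
    best_next = max(q[(s_next, a2)] for a2 in actions)
    td_target = r + beta * best_next
    q[(s, a)] += alpha * (td_target - q[(s, a)])
    return q[(s, a)]

q = defaultdict(float)   # hypothetical Q-table keyed by (state, action)
actions = ("L", "R")
# Hypothetical transition: from decision point 5, turning right reaches
# the goal state "G" and yields reward 1.
print(q_update(q, 5, "R", 1.0, "G", actions))   # 0.5
```

Repeating such updates propagates value backward from the goal, which is why states closer to the goal end up with larger Q values and, through the SI mapping, stronger MFB stimuli.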

In the behavioral experiments, the number of MFB stimulation intensity levels was set to seven, based on our preliminary experiments, to ensure that the ratbots could distinguish the different intensities. Similarly, the computational model mapped Q(s, a) to seven groups of SI intensities. The smallest Q(s, a) was mapped to SI_low, whose value was 10; the largest Q(s, a) was mapped to SI_high, whose value was 70; and the remaining Q(s, a) values were mapped by linear interpolation.10
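The mapping from Q(s, a) to the seven stimulus intensities can be sketched as linear interpolation followed by rounding to the nearest level. The text gives only the endpoints (SI_low = 10, SI_high = 70) and the number of levels; evenly spaced levels (10, 20, …, 70), the clamping of out-of-range Q values, and the name `q_to_si` are assumptions.

```python
def q_to_si(q_value, q_min, q_max, si_low=10.0, si_high=70.0, levels=7):
    """Map a Q value to one of `levels` MFB stimulus intensities.

    Linear interpolation between SI_low (smallest Q) and SI_high (largest Q),
    then rounding to the nearest of `levels` evenly spaced intensities
    (the even spacing is an ASSUMPTION).
    """
    if q_max == q_min:
        return si_low
    frac = (q_value - q_min) / (q_max - q_min)
    frac = min(max(frac, 0.0), 1.0)          # clamp Q values outside the range
    step = (si_high - si_low) / (levels - 1)
    k = round(frac * (levels - 1))           # index of the nearest level
    return si_low + k * step

print([q_to_si(q, 0.0, 1.0) for q in (0.0, 0.5, 1.0)])   # [10.0, 40.0, 70.0]
```

Python’s `round` resolves exact ties to the even index, so a Q value exactly halfway between two levels may go either up or down; a different tie rule would not change the overall mapping.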

Action Selection for Behavior Outputs by TD Learning

A population of ACs sums up the projections of HPCs and DSCs, whose activities are calculated with Equations 5 and 8, to determine the agent’s movement direction angle φ. The weights w_li and w_lj between AC l and HPC i or DSC j are updated by the temporal difference (TD) learning algorithm,11 depicted in Figure 6.

The state-action value Q(s, φ) is defined over the state s, encoded by the activity of the HPCs and DSCs, and the action given by the movement angle φ. The activity of AC l represents movement in direction ϕ_l = 2πl/N_AC. We obtain the movement direction φ of the behavior output encoded by the AC population from the following equation:

$$\varphi = \ldots \tag{11}$$
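Equation 11 is not legible here, so the sketch below uses a standard population-vector readout as a stand-in: each AC votes for its preferred direction ϕ_l = 2πl/N_AC with a weight equal to its activity, and φ is the angle of the summed vector. Treating the AC input as a weighted sum of HPC and DSC activities is likewise an assumption, as are the function names.

```python
import math

def ac_directions(n_ac):
    """Preferred direction of AC l: phi_l = 2*pi*l / N_AC, l = 1..N_AC."""
    return [2.0 * math.pi * l / n_ac for l in range(1, n_ac + 1)]

def ac_activity(w_hpc, a_hpc, w_dsc, a_dsc):
    """ASSUMED AC input: weighted HPC plus weighted DSC projections.

    w_hpc[l][i] and w_dsc[l][j] play the roles of w_li and w_lj."""
    return [sum(w * a for w, a in zip(row_h, a_hpc))
            + sum(w * a for w, a in zip(row_d, a_dsc))
            for row_h, row_d in zip(w_hpc, w_dsc)]

def decode_direction(a_ac):
    """Population-vector readout, an ASSUMED stand-in for Equation 11:
    the angle of sum_l a_l * (cos phi_l, sin phi_l), in [0, 2*pi)."""
    dirs = ac_directions(len(a_ac))
    x = sum(a * math.cos(p) for a, p in zip(a_ac, dirs))
    y = sum(a * math.sin(p) for a, p in zip(a_ac, dirs))
    return math.atan2(y, x) % (2.0 * math.pi)

# Hypothetical example: four ACs, the first two equally active.
print(decode_direction([1.0, 1.0, 0.0, 0.0]))   # about 2.356 (3*pi/4)
```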

Evaluation of the Bio-Machine MPMS Computational Model

[Figure 3: connectionist model of the hippocampus subsystem; allothetic pathway with inputs (d_1, …, d_n) to the APCs, idiothetic pathway with inputs (v, θ) to the IPCs, both projecting to the HPCs.]

The task in the biological experiments is to reach the goal location in the maze shown in Figure 7, where S is the start location, G is the goal location, the yellow circle is the decision point, and the purple star is the … size of which is 150 cm × 150 cm × 15 cm. In each trial, the agent begins at the start point and finishes at the goal location. To reach the goal, the agent must make six decisions, each between two choices. One choice leads toward the target location, and the other leads to a dead end from which the agent can only go back to the previous location. The correct directions can be changed across mazes to conduct different experiments. Figure 7 shows maze A, whose correct directions are RLLRLR (R means right, L means left); the correct directions for maze B are LRRLRL and for maze C LLR-RLRLR. The biological experiments are performed in three groups:

• The control group has no landmark at the decision-making points. The rat learns the optimal path to the goal only through the hippocampus memory subsystem.

• The landmark group has a landmark at each decision-making point to indicate the correct direction. The rat learns the path to the goal location through the hippocampus and striatum subsystems.

• In the MFB group, the ratbot learns the goal location through the computer-delivered virtual reward, based on the amygdala and hippocampus subsystems.

In the control group, only the hippocampus subsystem forms the relationships among environmental information. With the distances to walls as inputs to the allothetic pathway and the rat’s own speed and direction as inputs to the idiothetic pathway, the hippocampus subsystem builds the spatial relationships. As Figure 8 shows, the control group needs six trials to raise the percentage of right choices to 83.3 percent or more, defined as the high level (that is, the agent makes correct choices at no fewer than five of the six decision-making points). The number of trials the control group needs to reach the high level is the same in the computational model and in the biological experiment: six trials in all three mazes.
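The high-level criterion is easy to check numerically: with six binary decisions, five correct choices give 5/6 ≈ 83.3 percent. The sketch below scores a trial against maze A’s correct sequence RLLRLR and returns the first trial whose score reaches the criterion. The per-trial scores in the example are made up for illustration and are not taken from Figure 8.

```python
MAZE_A = "RLLRLR"     # correct directions of maze A, as given in the text
HIGH_LEVEL = 5 / 6    # five of six correct choices, about 83.3 percent

def percent_correct(choices, correct=MAZE_A):
    """Fraction of the decision points at which the choice was correct."""
    if len(choices) != len(correct):
        raise ValueError("need exactly one choice per decision point")
    hits = sum(c == k for c, k in zip(choices, correct))
    return hits / len(correct)

def first_high_trial(scores):
    """1-based index of the first trial whose score reaches HIGH_LEVEL,
    or None if no trial does. A tiny tolerance absorbs float rounding."""
    for trial, score in enumerate(scores, start=1):
        if score >= HIGH_LEVEL - 1e-9:
            return trial
    return None

print(percent_correct("RLLRLL"))                  # 5 of 6 correct
print(first_high_trial([0.5, 0.67, 0.83, 1.0]))   # 4, since 0.83 < 5/6
```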

In Figures 8b, 8d, and 8f, the blue line represents the simulated learning process for control rats. In this group, we assume that only the hippocampus subsystem is involved. In the maze-learning task, two kinds of representations of spatial information are encoded by the APC and IPC populations. The environmental geometric information is passed to the APCs, whose firing rates are computed using Equation 1. In our simulation, N APCs represent the rat’s current position. The other kind, idiothetic information, is used to compute the activity of the IPCs using Equation 4. With these inputs, the HPC activity is calculated by Equation 5 to represent the current environmental information. Because the weights between HPCs and APCs/IPCs are initialized randomly, the percentage of right choices in the first trial starts at a random level, as shown in Figure 8b. Over the following trials, the weights between HPCs and APCs/IPCs are updated using Equations 6 and 7 so that the agent chooses the direction that leads to the goal position. As Figure 8b shows, from the sixth trial on, the percentage of right choices is greater than 86.85 percent.

Figure 4. Computational model for the dorsal striatum subsystem in the MPMS.

[Figure 5: Q-learning in the amygdala subsystem; labels: state detection (s_t), action detection (a_t), Q-value table, update Q(s_t, a_t), reward level SI.]

Both the hippocampus and striatum subsystems are involved in the landmark group. As in the control group, the hippocampus subsystem builds spatial relationships among the walls in the maze. In the striatum subsystem, the landmark at the decision-making point is associated with the correct movement direction. The activity of the AC encoding this movement direction increases through the summed input from these two subsystems. In this computational model, the number of trials needed to reach the high level is the same as in the biological experiment in all three mazes.

In Figures 8b, 8d, and 8f, the green line represents the simulated learning process for rats in mazes with landmarks. In this group, the hippocampus and striatum subsystems are assumed to be involved in our proposed MPMS model. In the maze-learning task, the two subsystems encode the spatial and landmark information: the spatial properties information is processed by the hippocampus subsystem, and the landmark information is passed to the striatum subsystem. With this information, the firing rates of HPCs and DSCs are computed using Equations 5 and 8. In the first trial (Figure 8b), the percentage of right choices at the six decision-making points is 52.15 percent. Over the following trials, the weights between ACs and HPCs/DSCs are updated using Hebbian learning. As shown in Figure 8b, from the fifth trial on, the percentage of correct choices is greater than 87.23 percent. With the involvement of landmark information, the computational model can process the information effectively and predicts the number of trials needed to reach the high level in the landmark group (five trials) and the control group (six trials), matching the biological results.

Figure 6. The temporal difference (TD) learning algorithm used to update the weights.

Initialize weights randomly in the open interval (0, 1)
for Trial = 1 : N_trial do
    while the current state is not the goal do
        Choose the action a defined by Equation 11 with probability ξ, or a random action a with probability 1 − ξ
        Execute action a and update the time step: t = t + 1
        Calculate the MFB stimulus intensity SI in the amygdala subsystem
        Calculate the reward prediction error δ(t):
            δ(t) = R(t) + γQ(s_t, a_t) + SI − Q(s_{t−1}, a_{t−1}),
            where R(t) is the reward received on the transition from state s_{t−1} to s_t at time t, and γ is the discount factor
        Calculate the eligibility trace:
            e_ji(t) = λe_ji(t − 1) + a_j^AC · a_i,
            where λ is the decay rate of the eligibility trace
        Update the connection weights w_ji between HPC/DSC cell i and AC cell j:
            w_ji = w_ji + μ · δ(t) · e_ji(t), where μ is the learning rate
    end while
end for

[Figure 7: maze A, with start location S and goal location G.]

[Figure 8, panels (a) to (f): percentage of right choices over trials T1 to T8.]
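The body of the while loop in Figure 6 can be sketched in Python as follows. It follows the figure’s update rules (TD error with the SI term, decaying eligibility trace, weight update) and its ξ-greedy choice, where ξ is the probability of taking the greedy action. Using the largest Q value as a stand-in for Equation 11, the default values of γ, λ, μ, and ξ, and the function names are assumptions; the raw SI values of 10 to 70 would probably need rescaling relative to R(t), so the example uses a small hypothetical SI.

```python
import random

def choose_action(q_values, xi=0.9, rng=random):
    """xi-greedy choice from Figure 6: with probability xi take the action
    with the largest Q value (a stand-in for Equation 11), otherwise pick
    an action uniformly at random."""
    if rng.random() < xi:
        return max(range(len(q_values)), key=q_values.__getitem__)
    return rng.randrange(len(q_values))

def td_step(w, e, a_in, a_ac, reward, q_now, q_prev, si,
            gamma=0.9, lam=0.8, mu=0.05):
    """One pass through the body of the while loop in Figure 6.

    w[j][i] and e[j][i] are the weight and eligibility trace between input
    cell i (HPC or DSC) and AC j; both are updated in place.
    Returns the reward prediction error delta(t).
    """
    # delta(t) = R(t) + gamma*Q(s_t, a_t) + SI - Q(s_{t-1}, a_{t-1})
    delta = reward + gamma * q_now + si - q_prev
    for j, a_j in enumerate(a_ac):
        for i, a_i in enumerate(a_in):
            # e_ji(t) = lambda*e_ji(t-1) + a_j^AC * a_i
            e[j][i] = lam * e[j][i] + a_j * a_i
            # w_ji <- w_ji + mu * delta(t) * e_ji(t)
            w[j][i] += mu * delta * e[j][i]
    return delta

# Hypothetical example: two input cells, two ACs, a small normalized SI.
w = [[0.5, 0.5], [0.5, 0.5]]
e = [[0.0, 0.0], [0.0, 0.0]]
delta = td_step(w, e, a_in=[1.0, 0.0], a_ac=[1.0, 0.0],
                reward=0.0, q_now=0.5, q_prev=0.4, si=0.2)
print(round(delta, 4), round(w[0][0], 4))   # 0.25 0.5125
```

Only the connection between the active input cell and the active AC changes here, because the eligibility trace is the product of their activities; the positive SI term raises δ(t) and so strengthens that connection faster than the TD error alone would.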

In the MFB group, the amygdala subsystem receives the computer-delivered MFB stimulus in the rat brain. The Q-learning algorithm calculates the MFB stimulus intensity so that the closer the rat is to the goal, the higher the stimulus intensity. Through the amygdala subsystem, the rat in the MFB group builds an association between the spatial representation (encoded in the hippocampus subsystem) and the reward prediction (computed by the computer). Because the rat chooses movements with higher reward prediction, it needs fewer trials to learn the optimal path to the goal than the control group does, as shown in Figures 8b, 8d, and 8f. The number of trials the ratbots in the MFB group needed to reach the high level was the same in the computational model and the biological experiment in all three mazes.

In Figures 8b, 8d, and 8f, the magenta line represents the simulated learning process for rats in the MFB simulation. In this group, the hippocampus, striatum, and amygdala subsystems are assumed to be involved. Spatial properties information is processed by the hippocampus subsystem, and landmark information is processed by the striatum subsystem. With these two kinds of information, the firing rates of HPCs and DSCs are computed using Equations 5 and 8. In the MFB group, however, after the rat chooses an action via Equation 11, the amygdala subsystem calculates the MFB stimulus intensity and updates the weights between HPCs/DSCs and ACs using Equation 10 and Figure 6. In the first trial, as Figure 8b shows, the percentage of right choices is 69.21 percent, which is higher than in the other two groups because of the involvement of the MFB subsystem. Over the following trials, the weights between ACs and HPCs/DSCs are updated with the algorithm in Figure 6 to choose the direction that leads to the goal position. As Figure 8b shows, the MFB group reaches the high level in fewer trials (three) than the other two groups (five and six trials), which means that with MFB stimulation, the ratbots update the weights toward the goal position much more quickly.

These results are promising. As a next step, we will implement the MPMS model in a robot to conduct similar behavioral experiments and further validate our proposed computational processes. Comparative research will help us discover the critical structures and weights in a hybrid bio-machine memory system, which can enhance the learning and memory functionalities of cyborg intelligent systems, such as the ratbots described in this article.

Acknowledgments

This work was supported by the National Key Basic Research Program of China (973 program 2013CB329504) and partially supported by the Zhejiang Provincial Natural Science Foundation of China (LZ14F020002). Correspondence and questions should be addressed to Nenggan Zheng ([email protected]).

References

1. Z. Wu, “The Convergence of Machine and Biological Intelligence,” IEEE Intelligent Systems, vol. 28, no. 5, 2013, pp. 28–43.

2. C. Sun et al., “Automatic Navigation for Rat-Robots with Modeling of the Human Guidance,” J. Bionic Eng., vol. 10, no. 1, 2013, pp. 46–56.

3. S.K. Talwar et al., “Behavioural Neuroscience: Rat Navigation Guided by Remote Control,” Nature, vol. 417, 2002, pp. 37–38.

THE AUTHORS

Lijuan Su is a PhD student in the Department of Computer Science at Zhejiang University. Her research interests include computational intelligence, neural computation, and artificial intelligence. Contact her at [email protected].

Nenggan Zheng is an associate professor in the Qiushi Academy for Advanced Studies at Zhejiang University. His research interests include neural computation and real-time systems. Zheng has a PhD in computer science from Zhejiang University. He’s the corresponding author for this work. Contact him at [email protected].

Min Yao is a professor in the Department of Computer Science at Zhejiang University. His research interests include computational intelligence, fuzzy systems, data mining, and service computing. Yao has a PhD in biomedical engineering from Zhejiang University. Contact him at [email protected].

Zhaohui Wu is a professor in the Department of Computer Science at Zhejiang University. His research interests include pervasive computing, distributed computing, and computational intelligence. Wu has a PhD in computer science from Zhejiang University. Contact him at [email protected].


5. N.M. White and R.J. McDonald, “Multiple Parallel Memory Systems in the Brain of the Rat,” Neurobiology of Learning and Memory, vol. 77, no. 2, 2002, pp. 125–184.

6. J. O’Keefe, “A Review of the Hippocampal Place Cells,” Progress in Neurobiology, vol. 13, no. 4, 1979, pp. 419–439.

al Model of Parallel Navigation Systems in Rodents,” Neuroinformatics, vol. 3, no. 3, 2005, pp. 223–241.

9. R. Kempter, W. Gerstner, and J.L. Van Hemmen, “Hebbian Learning and Spiking Neurons,” Physical Rev. E, vol. 59, no. 4, 1999, pp. 4498–4514.

10. C. Zhang et al., “Bio-Robots Automatic Navigation with Graded Electric Reward …

…ment Learning with Replacing Eligibility Traces,” Machine Learning, vol. 22, nos. 1–3, 1996, pp. 123–158.

Selected CS articles and columns are also available for free at http://ComputingNow.computer.org.


IEEE Intelligent Systems, The #1 AI Magazine (www.computer.org/intelligent): Stay on the Cutting Edge. IEEE Intelligent Systems provides peer-reviewed, cutting-edge articles on the theory and applications of systems that perceive, reason, learn, and act intelligently.