A convolutional neural network based policy inspired by the cerebellum

However, these methods pay little attention to which neural network structure is well suited to processing the robot configuration input. In this paper, we present a novel neural network, inspired by the human cerebellum, that contains 1-dimensional convolution layers. We demonstrate experimentally that the new network represents joint information better than previous models.

Implementing such a capability in robots is an important problem, and much research has been carried out to solve it [3, 4, 6]. One such visuomotor policy network consists of three convolutional layers and a new spatial softmax layer to extract image features, and three fully connected layers to generate motor commands from the combined image features and robot configuration. However, that work pays little attention to the appropriate network structure for processing the robot configuration input.
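To make the spatial softmax layer mentioned above concrete, the following is a minimal NumPy sketch of the operation as it is commonly described: a softmax over all spatial positions of each feature channel, followed by the expected image coordinates under that distribution. The function name `spatial_softmax` and the choice of normalizing coordinates to [-1, 1] are assumptions made for illustration, not details taken from the cited work.

```python
import numpy as np

def spatial_softmax(features):
    """Expected (x, y) position of each channel's activation.

    features: array of shape (channels, height, width).
    Returns an array of shape (channels, 2); coordinates are
    normalized to [-1, 1] (an illustrative convention).
    """
    c, h, w = features.shape
    flat = features.reshape(c, h * w)
    flat = flat - flat.max(axis=1, keepdims=True)  # numerical stability
    probs = np.exp(flat)
    probs /= probs.sum(axis=1, keepdims=True)      # softmax over positions
    probs = probs.reshape(c, h, w)

    xs = np.linspace(-1.0, 1.0, w)
    ys = np.linspace(-1.0, 1.0, h)
    # Marginalize over rows (or columns), then take the expectation.
    expected_x = (probs.sum(axis=1) * xs).sum(axis=1)
    expected_y = (probs.sum(axis=2) * ys).sum(axis=1)
    return np.stack([expected_x, expected_y], axis=1)
```

Each channel is thus reduced to a single 2-D point, which is what allows the image features to be concatenated with the low-dimensional robot configuration before the fully connected layers.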

In addition, the plasticity of synapses between parallel fiber and Purkinje cell of the cerebellum provides an ability to learn new behavior according to changes in environments [14, 2]. Based on the functional characteristics of the cerebellum, we hypothesized that a neural network layer that mimics the structure of the neural circuit of the cerebellum can outperform fully connected layers to extract features from robot configuration data. By analyzing the cerebellum, we implemented a new neural network that contains 1-dimensional convolutional layers instead of fully connected layers.

Experimental results show that our new neural network converges faster than a fully connected network in most cases.

Figure 1: Structure of a novel CNN network used to represent visuomotor policy

Policy training based on Guided Policy Search (GPS)

Optimization of local controller under unknown dynamics

For convenience, the rest of the explanation assumes that $p(\tau)$ is a single Gaussian distribution, since in a real implementation the method can be applied in parallel to each mixture element. One way to fit this time-varying linear Gaussian is to use linear regression to determine the coefficients that make up its mean and covariance. However, simple linear regression requires a large number of samples for a high-dimensional system such as a robot.
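The plain linear-regression fit described above can be sketched as follows. This is a minimal illustration for a single time step, assuming the samples are stacked row-wise; the function name `fit_linear_dynamics` and the symbols `F`, `f`, and `cov` are hypothetical names chosen here, and the sketch does not include the prior that is used to reduce the number of samples required.

```python
import numpy as np

def fit_linear_dynamics(x, u, x_next):
    """Least-squares fit of x_next ~= F @ [x; u] + f at one time step.

    x: (N, dx) states, u: (N, du) actions, x_next: (N, dx) next states.
    Returns F (dx, dx + du), f (dx,), and the residual covariance (dx, dx).
    """
    n = x.shape[0]
    # Append a column of ones so the offset f is fitted jointly with F.
    xu1 = np.hstack([x, u, np.ones((n, 1))])
    coef, *_ = np.linalg.lstsq(xu1, x_next, rcond=None)
    F = coef[:-1].T
    f = coef[-1]
    residual = x_next - xu1 @ coef
    cov = residual.T @ residual / n
    return F, f, cov
```

The regression has (dx + du + 1) unknowns per output dimension, so at least that many samples are needed at every time step, which is why the fit becomes expensive for high-dimensional robots.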

Because the dynamics of adjacent time steps are highly correlated with each other, using data from different time steps and previous iterations can greatly reduce the sample complexity. To use data from other time steps and iterations, a global model is introduced as a prior on the dynamics and fitted to the state transitions $\{x_t, u_t, x_{t+1}\}$ of all time steps and three previous iterations. A Gaussian Mixture Model (GMM) is used to represent this global model because it is good at approximating a piecewise linear system [7], which makes it a good choice for the dynamics of a robotic system that frequently makes contact with its environment.

Fitting a Gaussian model $\mathcal{N}(\mu, \Sigma)$ to the $\{x_t, u_t, x_{t+1}\}$ data at each time step and conditioning it on $(x_t; u_t)$ also helps reduce the sample complexity of the dynamics fitting. A normal-inverse-Wishart prior on the Gaussian model, defined by the parameters $\Phi$, $\mu_0$, $m$, and $n_0$, allows the use of prior data. If $\bar{\mu}$ and $\bar{\Sigma}$ are the mean and covariance of the global model for the dynamics, the prior parameters are set to $\Phi = n_0 \bar{\Sigma}$ and $\mu_0 = \bar{\mu}$.
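The conditioning step can be illustrated with the standard formula for conditioning a joint Gaussian. The sketch below assumes the joint mean and covariance over the stacked vector [x_t; u_t; x_{t+1}] have already been estimated; it deliberately omits the normal-inverse-Wishart prior, and the names `condition_gaussian` and `d_in` are hypothetical.

```python
import numpy as np

def condition_gaussian(mu, sigma, d_in):
    """Condition a joint Gaussian over [x_t; u_t; x_{t+1}] on (x_t; u_t).

    mu: (d,) joint mean, sigma: (d, d) joint covariance,
    d_in: dimension of the conditioning block (dx + du).
    Returns F, f, cov with x_{t+1} | x_t, u_t ~ N(F @ [x_t; u_t] + f, cov).
    """
    s_ii = sigma[:d_in, :d_in]   # covariance of (x_t; u_t)
    s_oi = sigma[d_in:, :d_in]   # cross-covariance of x_{t+1} with (x_t; u_t)
    s_oo = sigma[d_in:, d_in:]   # covariance of x_{t+1}
    # F = s_oi @ inv(s_ii), computed with a solve since s_ii is symmetric.
    F = np.linalg.solve(s_ii, s_oi.T).T
    f = mu[d_in:] - F @ mu[:d_in]
    cov = s_oo - F @ s_oi.T
    return F, f, cov
```

With a prior, only the estimates of `mu` and `sigma` passed to this function would change; the conditioning itself stays the same.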

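The dynamics $p(x_{t+1} \mid x_t, u_t)$ required to optimize the trajectory are generally unknown. However, the dynamics, represented as a linear Gaussian, can easily be fitted from samples generated from the trajectory distribution of the previous iteration, denoted by $\hat{p}(\tau)$. Because the fitted dynamics are valid only near those samples, a large update could move the new trajectory distribution into regions where the model is inaccurate. To prevent this, the update of the trajectory distribution is bounded by the KL-divergence between $\hat{p}(\tau)$ and $p(\tau)$.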
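As a building block for such a bound, the KL-divergence between two multivariate Gaussians has a closed form. The sketch below shows only this closed form for a single pair of Gaussians; the constraint in the method is over whole trajectory distributions, so this is an illustration of the quantity involved rather than the full constrained optimization. The function name `gaussian_kl` is a hypothetical choice.

```python
import numpy as np

def gaussian_kl(mu0, sigma0, mu1, sigma1):
    """KL( N(mu0, sigma0) || N(mu1, sigma1) ) for full-covariance Gaussians.

    Note that the KL-divergence is not symmetric, so argument order matters.
    """
    k = mu0.shape[0]
    diff = mu1 - mu0
    trace_term = np.trace(np.linalg.solve(sigma1, sigma0))
    mahalanobis = diff @ np.linalg.solve(sigma1, diff)
    _, logdet0 = np.linalg.slogdet(sigma0)
    _, logdet1 = np.linalg.slogdet(sigma1)
    return 0.5 * (trace_term + mahalanobis - k + logdet1 - logdet0)
```

A trust-region style update would accept a new distribution only if this value stays below a chosen step size, which keeps the new trajectories close to the region where the dynamics were fitted.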

Supervised learning of global policy

Cerebellar neural circuit

Function of the cerebellum

Anatomy and somatotopy of the cerebellum

Structure of the cerebellar neural circuit

Parallel fibers run parallel to each other in the cerebellum and form synapses with Purkinje cells, which have a flat dendritic shape lying in a plane perpendicular to the direction of the parallel fibers. This structural characteristic maximizes the number of synapses between parallel fibers and Purkinje cells per unit volume. The climbing fiber is another major source of input to the cerebellar neural circuit and carries error feedback signals.

The Purkinje cell processes signals from parallel fibers and climbing fibers together and sends the result to the deep cerebellar nuclei. The circuit therefore has two pathways: one consisting of mossy fiber - granule cell - parallel fiber - Purkinje cell - deep cerebellar nuclei, and another consisting of climbing fiber - Purkinje cell - deep cerebellar nuclei. In our neural network, we do not model the second pathway, which originates from the climbing fibers.

Instead, we hypothesize that the update of the neural network weights can replace the role of the pathway that receives the error feedback signal as input.

Approaches to mimic the cerebellar neural circuit

We hypothesized that a neural network that mimics the structure of the cerebellar neural circuit can process joint information better than a conventional fully connected network. In mimicking the circuit, we focus on the Purkinje cell and the parallel fiber for two reasons: first, the Purkinje cell produces the sole output of the cerebellar neural circuit from the signals transmitted through the parallel fibers; second, the Purkinje cell and the parallel fiber are unique features found only within the cerebellar neural circuit.

The dendrites of the Purkinje cell extend from the cell body and bifurcate many times to form a flat dendritic shape. This structure resembles the receptive field of a CNN, which uses adjacent pixels of the input to compute the output. Finally, we implemented the synapses between the Purkinje cell and the parallel fibers with a 1-dimensional convolution layer.

The structure of the new neural network we propose (conv1d-fc-net) is shown in Figure 4a. As described in Section 4.4, two fully connected layers are attached after the 1-dim convolutional layers because a neural network without fully connected layers cannot be trained successfully. The ideal input format for conv1d-fc-net is a 1-dimensional vector in which the values belonging to a particular joint are adjacent to each other and are followed by the values of the next joint, as shown in Figure 5.
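The input layout described above can be sketched as a small helper that interleaves per-joint quantities. This is an illustration only: the function name `interleave_joint_observations` is hypothetical, and the ordering within a joint (position first, then velocity) is an assumption, since the figure that defines the exact layout is not reproduced here.

```python
import numpy as np

def interleave_joint_observations(positions, velocities):
    """Return [q_0, dq_0, q_1, dq_1, ...] so each joint's values are adjacent.

    positions, velocities: 1-D sequences with one entry per joint,
    ordered from the joint closest to the body outward.
    """
    positions = np.asarray(positions, dtype=float)
    velocities = np.asarray(velocities, dtype=float)
    if positions.ndim != 1 or positions.shape != velocities.shape:
        raise ValueError("positions and velocities must be 1-D and equal length")
    # Stack as (n_joints, 2) rows and flatten row by row.
    return np.stack([positions, velocities], axis=1).reshape(-1)
```

For a three-joint arm, positions `[0.1, 0.2, 0.3]` and velocities `[1.0, 2.0, 3.0]` are arranged so that each consecutive pair of entries belongs to one joint.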

The window and step sizes are set to multiples of the number of observation dimensions per joint, so that a receptive field never processes only part of the information about a particular joint. For example, we used a window of size 4 or 6 and a step of size 2, since most of the experiments in this paper used joint position and velocity as the observation. Based on the analysis of the cerebellar neural circuit, we decided to mimic its structure with a 1-dim convolution layer.
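The effect of the window and step sizes can be seen in a small single-channel 1-D convolution. The sketch below is an illustration under stated assumptions rather than the network used in the experiments: it assumes two observation dimensions per joint (position and velocity), an arm with seven joints chosen only as an example, no padding, and hypothetical helper names `conv1d` and `aligned_with_joints`.

```python
import numpy as np

def aligned_with_joints(window, stride, dims_per_joint):
    """True if every receptive field starts and ends on a joint boundary."""
    return window % dims_per_joint == 0 and stride % dims_per_joint == 0

def conv1d(signal, kernels, stride):
    """Unpadded 1-D convolution (cross-correlation) of one input channel.

    signal: (length,), kernels: (n_filters, window).
    Returns an array of shape (n_filters, n_outputs).
    """
    window = kernels.shape[1]
    n_out = (signal.shape[0] - window) // stride + 1
    # Each row of `patches` is one receptive field of the input.
    patches = np.stack(
        [signal[i * stride : i * stride + window] for i in range(n_out)]
    )
    return kernels @ patches.T

# Example: 7 joints x (position, velocity) = 14 inputs, window 4, step 2.
signal = np.arange(14.0)                 # placeholder observation values
kernels = np.array([[1.0, 0.0, 0.0, 0.0],
                    [0.0, 0.0, 0.0, 1.0]])
features = conv1d(signal, kernels, stride=2)   # shape (2, 6)
```

With a window of 4 and a step of 2, each output feature sees exactly two neighboring joints, and consecutive receptive fields overlap by one joint; a window of 6 would cover three neighboring joints instead.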

The purpose of this experiment is to test whether 1-dim conv layers alone are sufficient to represent a policy. To test the ability of this network to process joint information, we trained it with the GPS algorithm on the Baxter reaching task and the Mujoco 2D reaching and peg insertion tasks.

Figure 4: Two neural networks used in experiments. (a) A novel neural network inspired by the human cerebellum and (b) a fully connected network

Experiment 2 : Training a neural network composed of 1-dim conv layers and fc

Experimental settings

Mujoco 2D reacher task

The 2D reacher from the Mujoco simulator has a 2-DoF arm with the first joint attached to the origin. The goal of this task is to bring the end effector as close as possible to the target, shown as a white box. The state includes the angle and velocity of each joint and the position and velocity of the end effector, while the observation includes only the angle and velocity of each joint.

Mujoco peg insertion task

Baxter reaching task

Experiment 2 : Training a neural network composed of 1-dim conv layers and fc

Our new network showed a better convergence rate than the fully connected network regardless of the number of parameters in the model. To identify the factors that determine the successful training of the neural network, we analyzed the visualized weight matrices of the fully connected layers in conv1d-fc-net, as shown in Figure 10. Input pixels with lower indices contain information about joints closer to the body.

For example, information from the shoulder joint is placed at the front of the input and information from the end effector joint is at the back of the input. The same configuration is applied to the output of the neural network, so pixels at the front contain a motor command for the shoulder joint and pixels at the back correspond to the end effector joint. Most of the other features in the feature map contribute to each joint output and it is difficult to observe a fixed pattern in the weight matrix.

We suspect that the reason for the uniform pattern of weights across features is the simplicity of the Baxter reaching task compared to the Mujoco simulator tasks. We conclude that these results suggest that a global model is necessary in robot control. Except for a few features that have zero weight for all joint outputs, most hidden features contribute to the control of the joints with non-zero weights.

Another notable result is that conv1d-fc-net converged faster than a typical fully connected network. Because a convolutional layer exploits local structure among adjacent input elements, we suspect that there is some kind of spatial relationship between adjacent pixels of the robot configuration input we used. We hypothesize that a neural network that mimics the structure of the human cerebellar neural circuit can process joint information well, because the human cerebellum plays an important role in movement control.

A neural network consisting only of the 1-dim convolutional layers could not be trained successfully. This problem could be solved by introducing fully connected layers at the end of 1-dim convolution layers. For both environments, conv1d-fc-net converged faster than a fully connected network in most cases.

In: Proceedings of the 2015 IEEE International Conference on Robotics and Automation (ICRA), Seattle, WA, USA.

"The role of the cerebellum in movements: control of timing or movement transitions?" In: Experimental Brain Research, p.