4.5.4 Comparison to baseline methods
In approaching this problem, we have focused on dimensionality reduction so as to obtain a terrain classification solution as a subproblem. The rationale is that learning to recognize terrain types is essentially the fundamental problem that needs to be solved for autonomous robots. Once this problem has been solved, various types of mechanical behavior of the robot (e.g., slippage, traversability, etc.) can be learned and predicted.
However, if one focuses only on slip learning and prediction, without obtaining a terrain classifier as a byproduct, the problem in our setup (Section 4.4.1) can be solved by a number of well-established baseline regression techniques. In this section we compare the proposed method to two baseline nonlinear regression methods: k-Nearest Neighbor (kNN) and the Locally Weighted Projection Regression (LWPR) algorithm [124] (described in Section 2.9.1), both of which learn the mapping from the inputs (visual features x and slope y) to the output (slip z) directly, without applying dimensionality reduction as an explicit intermediate step. For both algorithms we explored the performance over a set of parameters (Figures 4.7 and 4.8) and then compared the best-performing instances to the proposed nonlinear dimensionality reduction from automatic supervision (Figure 4.9).
The k-Nearest Neighbor algorithm is a common baseline algorithm which, although simple, has been shown to be successful in a variety of applications.
[Plot: slip prediction error (%) as a function of the number of neighbors, k ∈ {1, 2, 3, 7, 10, 20, 40}]
Figure 4.7: Slip prediction with the k-Nearest Neighbor algorithm. The best performance here is for k = 3, which is worse than the proposed dimensionality reduction from automatic supervision.
It learns a direct mapping from the input space to the given output by averaging the outputs of a predefined number of nearest neighbors. For the k-Nearest Neighbor algorithm we varied the number of neighbors k; the experimental results are shown in Figure 4.7. Naturally, the algorithm performs best for a number of nearest neighbors which is neither too large nor too small (k = 3 here). It does not perform as well as the proposed supervised nonlinear dimensionality reduction method.
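For concreteness, the following is a minimal sketch of this baseline, assuming Euclidean distance in the joint input space of visual features and slope; the data dimensions and variable names are hypothetical.

```python
import numpy as np

def knn_predict(X_train, z_train, x_query, k=3):
    """Predict slip for one query point by averaging the slip values
    of its k nearest training points (Euclidean distance in input space)."""
    dists = np.linalg.norm(X_train - x_query, axis=1)
    nearest = np.argsort(dists)[:k]      # indices of the k closest inputs
    return z_train[nearest].mean()       # average their slip outputs

# Toy usage: inputs are visual features x concatenated with slope y;
# the output is the measured slip z (both drawn at random here).
rng = np.random.default_rng(0)
X_train = rng.normal(size=(100, 10))     # 100 samples, 10-D inputs
z_train = rng.uniform(0.0, 1.0, size=100)
print(knn_predict(X_train, z_train, X_train[0], k=3))
```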
The LWPR algorithm [124] is particularly suited to working with high-dimensional data. It also learns the direct mapping from the given inputs to the outputs; unlike the k-Nearest Neighbor, however, it does perform an inherent dimensionality reduction while learning the desired mapping, and it has been shown to be very successful on complex high-dimensional data [124]. For the LWPR algorithm we varied the parameter which controls the receptive field size.² Figure 4.8 shows the average slip prediction error for different receptive field sizes (λ = 1, 7, 9, 15, 20, 30, 100).
²For this comparison we used the code of Vijayakumar and Schaal [124], which automatically selects the most appropriate number of lower dimensions. Some other parameters, e.g., the learning rates, did not affect the final performance.
[Plot: slip prediction error (%) as a function of the receptive field size, λ ∈ {1, 7, 9, 15, 20, 30, 100}]
Figure 4.8: Slip prediction with the LWPR algorithm. The best performance here is for λ = 15, which is also outperformed by the proposed dimensionality reduction from automatic supervision.
The average numbers of receptive fields corresponding to these parameter values, across the runs, are (644, 274, 168, 50, 27, 12, 1). As seen, LWPR outperforms the k-Nearest Neighbor algorithm but does not outperform the proposed nonlinear dimensionality reduction algorithm.
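To illustrate the role of the receptive field size, below is a deliberately simplified kernel-smoothing stand-in for locally weighted regression; it is not the actual LWPR algorithm (which incrementally fits local linear models via partial least squares), and the mapping between λ and the field width here is only illustrative.

```python
import numpy as np

def rf_predict(X_train, z_train, x_query, lam=15.0):
    """Locally weighted regression sketch: every training point contributes
    through a Gaussian receptive field; larger lam means wider fields,
    i.e., a smoother fit with fewer effective local models."""
    sq_dists = np.sum((X_train - x_query) ** 2, axis=1)
    w = np.exp(-sq_dists / (2.0 * lam))   # receptive-field activations
    return np.dot(w, z_train) / w.sum()   # activation-weighted slip average

rng = np.random.default_rng(1)
X_train = rng.normal(size=(200, 10))
z_train = rng.uniform(0.0, 1.0, size=200)
for lam in (1, 7, 9, 15, 20, 30, 100):    # the values swept in Figure 4.8
    print(lam, rf_predict(X_train, z_train, X_train[0], lam=lam))
```

This mirrors the qualitative trade-off in Figure 4.8: too small a field overfits locally, too large a field oversmooths, and an intermediate size (λ = 15 in our experiments) performs best.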
We additionally tested the best-performing instances of both algorithms (k = 3 for k-Nearest Neighbor and λ = 15 for LWPR) and the proposed algorithm on exactly the same random training and testing subsets of the data, following the experimental setup in Section 4.5.3. As seen in Figure 4.9, both the k-Nearest Neighbor and the LWPR are outperformed by the methods based on dimensionality reduction.
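This comparison protocol can be summarized by the sketch below; the 80/20 split ratio and the mean-absolute error expressed in percent are assumptions for illustration, not the exact definitions used in Section 4.5.3.

```python
import numpy as np

def average_error(X, z, predict_fn, n_runs=10, train_frac=0.8, seed=0):
    """Average prediction error (%) over n_runs random train/test splits.
    All methods being compared are given the identical splits."""
    rng = np.random.default_rng(seed)
    errors = []
    for _ in range(n_runs):
        idx = rng.permutation(len(X))          # one random split per run
        n_tr = int(train_frac * len(X))
        tr, te = idx[:n_tr], idx[n_tr:]
        preds = np.array([predict_fn(X[tr], z[tr], x) for x in X[te]])
        errors.append(100.0 * np.mean(np.abs(preds - z[te])))
    return float(np.mean(errors))

# E.g., with the kNN sketch above: average_error(X_train, z_train, knn_predict)
```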
Note that the unsupervised MFA algorithm also utilizes both the visual data and the (slip and slope) supervision information; its mechanism differs from the proposed supervised nonlinear dimensionality reduction in that it uses these two sets of information independently (in particular, the slip and slope information is used only after the lower-dimensional representation has been learned). In contrast, the proposed algorithm uses both sets of information together and allows them to interact.
[Bar plot: slip prediction error (%) for five learning scenarios: 1. Unsupervised; 2. Automatic supervision; 3. Human supervision; 4. LWPR; 5. Nearest Neighbor]
Figure 4.9: Direct comparison of the proposed algorithm to k-Nearest Neighbor and LWPR for the same random splits of the data (10 runs).
Note also that in directly learning the desired outputs, as is done with both the k-Nearest Neighbor and the LWPR, important information about the structure of the problem is ignored, namely that there are several underlying terrain types on which potentially different slip behaviors occur. This may explain why both methods are outperformed by the dimensionality reduction ones, which exploit this structural knowledge about the data.
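The 'switching' structure can be made explicit with a small, purely hypothetical sketch: a terrain decision followed by a terrain-specific slip model. The terrain names and slip curves below are invented for illustration; the proposed algorithm discovers this structure implicitly through supervised dimensionality reduction rather than through a separate classifier stage.

```python
def switching_predict(x_vis, slope, classify, slip_models):
    """First decide the terrain type from the visual features, then apply
    that terrain's own slip-vs-slope model. A direct regressor (kNN, LWPR)
    never sees this two-stage structure explicitly."""
    terrain = classify(x_vis)              # e.g., "sand", "gravel", "soil"
    return slip_models[terrain](slope)     # terrain-specific nonlinearity

# Hypothetical per-terrain slip behaviors (slip as a function of slope angle):
slip_models = {
    "sand":   lambda s: min(1.0, 0.08 * s),  # large slip even on mild slopes
    "gravel": lambda s: min(1.0, 0.03 * s),
    "soil":   lambda s: min(1.0, 0.01 * s),  # almost no slip
}
print(switching_predict([0.2, 0.7], 10.0, lambda xv: "sand", slip_models))
```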
In summary, we have compared to a set of alternative algorithms which, although not able to provide terrain classification results, are suitable for the final slip learning and prediction task. We have seen that they are outperformed by the proposed nonlinear dimensionality reduction algorithm because it exploits the inherent structure of the problem, namely that a potential 'switching' behavior between slippage on different terrains is possible. Other learning algorithms, e.g., Neural Networks, were not successful with this data because they cannot tackle the high dimensionality or learn the 'switching' behavior with limited training data without easily getting stuck in local minima. Note also that algorithms which rely on clustering all the available data (e.g., the supervised EM algorithm of Ghahramani [44] and some others applied in robotics [80]) are not applicable here because the slippage signal cannot be clustered.