
We consider the problem of predicting the slip Z of the robot in each map cell on the forthcoming terrain using the visual information x ∈ Ω of the cell and some information about the terrain geometry y ∈ Φ (e.g., local terrain slope) as input (Ω is the vision space, Φ is the space of terrain slopes). Let us denote the function that needs to be evaluated as Z = F(x, y). The main goal of this chapter is to learn the slip prediction function Z = F(x, y) using the slip measurements as the only supervision to the system.

Because of the physical nature of the problem, we can assume that there are a limited number (K) of terrain types that can be encountered, and that on each terrain the robot experiences different slip:

\[
F(x, y) =
\begin{cases}
f_1(y), & \text{if } x \in \Omega_1 \\
\;\;\vdots & \\
f_K(y), & \text{if } x \in \Omega_K
\end{cases}
\tag{3.1}
\]

where x ∈ Ω, y ∈ Φ, Ω ∩ Φ = ∅, the Ω_i ⊂ Ω are different subsets of the vision space with Ω_i ∩ Ω_j = ∅ for i ≠ j, the f_k(y) are nonlinear functions defined on the domain Φ whose slip behavior depends on the terrain, and K is the number of terrains. In other words, different slip behaviors occur on different terrain types as determined by appearance. While we focus on learning and prediction of rover slip, the problem can apply to various types of robot mechanical behavior. For example, f can represent terrain traversability, ground surface compressibility [82, 87], or load-bearing surface height [128] for each terrain type. In general, f can be a function of some additional sensor-based input signal y; here, for example, slip is a function of terrain slope angles.
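To make the piecewise structure of Equation (3.1) concrete, the following minimal Python sketch evaluates F(x, y) by first deciding the terrain type from the visual input and then applying that terrain's slip model. The terrain classifier and the per-terrain slip curves here are hypothetical placeholders; in the actual problem both are unknown and must be learned from the rover's own measurements.

```python
import numpy as np

# Hypothetical per-terrain slip functions f_k(y): slip (%) as a nonlinear
# function of longitudinal slope y (deg).  In the actual problem these
# functions are unknown and must be learned from rover measurements.
slip_models = {
    "soil":    lambda y: 100.0 / (1.0 + np.exp(-0.4 * (y - 12.0))),  # slips at lower slopes
    "gravel":  lambda y: 100.0 / (1.0 + np.exp(-0.3 * (y - 18.0))),
    "asphalt": lambda y: 100.0 / (1.0 + np.exp(-0.3 * (y - 25.0))),  # slips only at high slopes
}

def classify_terrain(x):
    """Placeholder vision-based terrain classifier: maps an appearance
    feature vector x to one of the K terrain types, i.e., decides which
    subset Omega_k of the vision space x falls into."""
    # A real system would use learned texture/color features; here we
    # simply pretend the label is encoded in the first feature.
    labels = list(slip_models.keys())
    return labels[int(x[0]) % len(labels)]

def F(x, y):
    """Piecewise slip prediction Z = F(x, y) of Equation (3.1): pick the
    slip model of the terrain recognized from vision, evaluate it at slope y."""
    terrain = classify_terrain(x)
    return slip_models[terrain](y)

if __name__ == "__main__":
    x_query = np.array([1.0, 0.2, 0.7])   # appearance features of a map cell
    y_query = 10.0                         # estimated slope (deg)
    print(F(x_query, y_query))
```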

Our goal is to learn the mapping Z = F(x, y) from training data D = {(x_i, y_i), z_i}, i = 1, ..., N, where x_i are the visual representations of patches from the observed terrain, y_i are the terrain slopes, and z_i are the particular slip measurements when the robot traverses that terrain. Similarly to Chapter 2, the input variables x_i and y_i are measured from stereo data and, for example, the IMU, and the corresponding output, i.e., rover slip z, is measured by VO [88].

Figure 3.2: A schematic of the main learning setup using automatic supervision: several unknown nonlinear models describe the mechanical behavior corresponding to different terrain types; each training example consists of a vision part (e.g., an image patch of this terrain) and one single point on the curve (marked with a diamond) describing the mechanical behavior.

Figure 3.2 visualizes the problem when measurements of slip as a function of terrain slope are used as supervision. Each terrain measurement is composed of an appearance patch, a terrain slope estimate and a measurement of the amount of slip occurring at the location with this particular appearance and slope (note that one training example is a single point on the nonlinear curve of slip behavior). The system works without human supervision and does not use ground truth. Instead, it relies on the goodness-of-fit of the measured slip data to potential slip models to learn both the terrain classification and the nonlinear slip behaviors.
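For concreteness, a single automatically collected training example (x_i, y_i, z_i), as described above, could be represented as in the sketch below; the field names and feature dimensionality are illustrative assumptions, not the representation used by the actual system.

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class SlipExample:
    """One automatically collected training example: the visual appearance
    of a terrain patch, the local slope estimate, and the slip measured
    when the rover actually drove over that patch."""
    appearance: np.ndarray   # x_i: feature vector of the image patch (from stereo imagery)
    slope_deg: float         # y_i: longitudinal slope estimate (e.g., from IMU / stereo geometry)
    slip_pct: float          # z_i: slip measured by visual odometry (VO), noisy

# The training set D = {(x_i, y_i), z_i} is then simply a list of such examples;
# no human labels of terrain type are attached to any of them.
D = [
    SlipExample(appearance=np.random.rand(16), slope_deg=4.2, slip_pct=7.5),
    SlipExample(appearance=np.random.rand(16), slope_deg=15.0, slip_pct=62.0),
]
```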

Figure 3.3: Slip measurements (%) plotted as a function of the estimated slope angles (deg) retrieved from actual rover traversals on soil, gravel, and asphalt. The ground truth terrain types in this figure are provided by human labeling for visualization purposes. The proposed algorithm does not use the ground truth.

Figure 3.3 visualizes the problem when actual measurements of slip as a function of terrain slope are used as supervision. The slip and slope measurements in Figure 3.3 are obtained completely automatically, as only the vehicle’s sensors are needed to compute them. The data is very challenging: the slip measurements to be used as supervision are very noisy and can overlap in parts of the domain. A nonlinear model can presumably approximate the slip behavior as a function of the slope for each terrain type. These models will essentially act as supervision, but they are unknown and have to be learned from the data.
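If the examples coming from a single terrain type were already known, recovering that terrain's nonlinear slip model would be a standard regression problem. The sketch below, assuming a hypothetical sigmoid-shaped parametric model and SciPy's curve_fit, shows what fitting one such curve to noisy (slope, slip) measurements might look like; it is only an illustration of the per-terrain models f_k(y), not the learning method of this chapter.

```python
import numpy as np
from scipy.optimize import curve_fit

def sigmoid_slip(y, a, b, c):
    """Hypothetical parametric slip model: slip (%) rises nonlinearly with
    slope y (deg) and saturates near 100%."""
    return a / (1.0 + np.exp(-b * (y - c)))

# Simulated noisy measurements from one (unknown) terrain type.
rng = np.random.default_rng(0)
slopes = rng.uniform(-5.0, 20.0, size=200)              # y_i (deg)
slips = sigmoid_slip(slopes, 100.0, 0.35, 14.0)         # underlying curve
slips += rng.normal(0.0, 8.0, size=slopes.shape)        # VO measurement noise

# Least-squares fit of the parametric model to the noisy data.
params, _ = curve_fit(sigmoid_slip, slopes, slips, p0=[100.0, 0.3, 10.0])
print("fitted (a, b, c):", params)
print("predicted slip at 10 deg:", sigmoid_slip(10.0, *params))
```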

We consider only the slip in the forward motion direction as dependent on the longitudinal slope, similar to slip measurements done for the Mars Exploration Rover [81]. Although rover slip depends also on the lateral slope, as seen in Chapter 2, here, for simplicity, we exploit the most significant slip signal available for the purposes of providing supervision. After the robot has learned how to visually discriminate the terrains with this form of supervision, it is conceivable that it could learn more complex slip models using additional input variables (e.g., both longitudinal and lateral slopes, roughness, etc.), as in Chapter 2.

The main problem in our formulation is that the slip signal to be used as supervision can be of very weak form. In particular, because of the nonlinearity of the slip models f_i(y), it is possible that some of the models overlap in parts of their domain (i.e., f_i(y) ≡ f_j(y) for some i ≠ j and all y ∈ Φ_0, for some Φ_0 ⊆ Φ). For example, several terrains might exhibit approximately the same slip at near-zero slopes, as seen in Figure 3.3, or simply two visually different terrain types might have the same slip behavior. We call this supervision ambiguous. Note that, since the slip measurements come from some unknown nonlinear functions (Figure 3.3), they cannot be simply clustered into clearly discriminable classes, as was previously done for characterizing terrains from mechanical vibration signatures [28, 37], or for learning terrain traversability in self-supervised learning [30, 50, 68]. That is, using these slip measurements as supervision is not a trivial extension of supervised learning. However, this form of supervision can still provide useful information for discrimination. The intuition is that, although the supervision might not always be useful, some of the examples which provide useful supervision information can propagate this information to other examples through their visual similarity. In this way, two visually similar terrains, which might not normally be discriminated in the vision space, will be discriminated after introducing the supervision. Conversely, if two terrains exhibit different slip behavior, the supervision should force a better discrimination in the visual space. Additionally, as the supervision is collected automatically by the robot's mechanical sensors, it will be noisy (e.g., including occasional outliers due to unmodeled events or ground-truth measurement errors). Coping with ambiguous and noisy supervision signals necessitates a framework which allows reasoning under uncertainty.
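One simple way to reason under this kind of uncertainty, shown purely as an illustration and not as the framework developed in this chapter, is an EM-style mixture-of-regressions alternation: examples are softly assigned to candidate slip models according to goodness-of-fit, and the models are then refit from their weighted examples. In the full problem the visual similarity of the patches would additionally inform these assignments; it is omitted in this sketch, and all function names and parameter choices below are assumptions.

```python
import numpy as np

def fit_poly(y, z, w, deg=2):
    """Polynomial fit of slip z against slope y with per-example weights w."""
    return np.polyfit(y, z, deg, w=w)

def em_slip_models(y, z, K=2, iters=30, sigma=10.0, deg=2, seed=0):
    """Alternate between refitting K candidate slip models and softly
    re-assigning each example to the model that best explains it."""
    rng = np.random.default_rng(seed)
    n = len(y)
    resp = rng.dirichlet(np.ones(K), size=n)          # random initial soft assignments
    for _ in range(iters):
        # M-step: refit each candidate slip model from its weighted examples.
        coeffs = [fit_poly(y, z, resp[:, k] + 1e-6, deg) for k in range(K)]
        # E-step: responsibilities from the goodness-of-fit of each model.
        res = np.stack([z - np.polyval(c, y) for c in coeffs], axis=1)
        loglik = -0.5 * (res / sigma) ** 2
        loglik -= loglik.max(axis=1, keepdims=True)    # numerical stability
        resp = np.exp(loglik)
        resp /= resp.sum(axis=1, keepdims=True)
    return coeffs, resp

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    y1 = rng.uniform(0, 20, 150); z1 = 0.2 * y1**2 + rng.normal(0, 5, 150)   # terrain A
    y2 = rng.uniform(0, 20, 150); z2 = 1.5 * y2 + rng.normal(0, 5, 150)      # terrain B
    coeffs, resp = em_slip_models(np.r_[y1, y2], np.r_[z1, z2], K=2)
    print([np.round(c, 2) for c in coeffs])
```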

To summarize, our goal is to learn the function Z = F(x, y) from the available training data D = {(x_i, y_i), z_i}, i = 1, ..., N, where x_i, y_i are the visual and mechanical domain inputs and z_i are the mechanical measurements collected by the vehicle. Thus, after the learning has completed, the mechanical behavior z for some query input example (x_q, y_q) will be predicted as z = F(x_q, y_q). We do not want to use manual labeling of the terrain types during training, so the mechanical measurements z_i, which are assumed to have come from one of the unknown nonlinear models, will act as the only supervision to the whole system. The main problem is that, using the mechanical measurements as the only ground truth, or supervision, we have to learn both the terrain classification and the unknown nonlinear functions for each terrain (note that the models for the particular mechanical behavior might not be known beforehand, as is the case with slip). The difficulty of the formulated problem lies in the fact that a combinatorial enumeration problem needs to be solved as a subproblem, which is known to be computationally intractable [66].