Chapter V: Mixed-Initiative Learning for Exoskeleton Gait Optimization
Figure 5.5 shows the simulation results. In each case, the mixed-initiative simulations involving coactive feedback improve upon those with only preferences. Learning is slowest for $n = 2$, $b = 0$ (Figure 5.5), since that case elicits the fewest preferences.
Figure 5.4b depicts the utility model's posterior mean for the objective function in Figure 5.4a, learned in the simulation with $n = 1$, $b = 1$, and mixed-initiative feedback. In comparing Figure 5.4b to Figure 5.4a, we see that CoSpar learns a sharp peak around the optimum, as it is designed to converge to sampling preferred regions, rather than giving the user undesirable options by exploring elsewhere.
Figure 5.5: CoSpar simulation results on 100 2D synthetic objective functions, comparing CoSpar with and without coactive feedback for three settings of the parameters $n$ and $b$ (see Algorithm 13). Mean +/- standard error of the objective values achieved over the 100 repetitions. The minimal and maximal objective function values are normalized to 0 and 1, respectively. We see that coactive feedback always helps, and that $n = 2$, $b = 0$, which receives the fewest preferences, performs worst.
5.5 Deployment of CoSpar in Human Subject Exoskeleton Experiments
The first experiment optimizes over step lengths, i.e., over a one-dimensional feature space. The second experiment⁵ demonstrates CoSpar's effectiveness in two-dimensional feature spaces, optimizing simultaneously over two different gait feature pairs. Importantly, CoSpar operates independently of the choice of gait features. The subjects' metabolic expenditure was also recorded via indirect calorimetry as shown in Figure 5.1, but this data was uninformative of user preferences, as users are not required to expend effort toward walking.
Learning Preferences between Step Lengths
In the first experiment, all three subjects walked inside the Atalante exoskeleton, with CoSpar selecting the gaits. We considered 15 equally-spaced step lengths between 0.08 and 0.18 meters, each with a precomputed gait from the gait library. The feature discretization was based on users' ability to distinguish nearby values. The users decided when to end each trial, so as to be comfortable providing feedback. Since users have difficulty remembering more than two trials at once, we used CoSpar with $n = 1$ and $b = 1$, which corresponds to asking the user to compare each new trial with the preceding one. Additionally, we queried the user for coactive feedback:
after each trial, the user could suggest a longer or shorter step length (±20% of the range), a slightly longer or shorter step length (±10%), or no feedback. Coactive feedback was added to the dataset and treated as additional preference feedback.
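As an illustration of this protocol, the sketch below (our own minimal Python rendering; the function and variable names are hypothetical, not from the experimental code) converts a user's suggestion into a suggested action on the discretized grid and an implicit preference pair:

```python
import numpy as np

STEP_LENGTHS = np.linspace(0.08, 0.18, 15)  # discretized feature space (meters)
RANGE = STEP_LENGTHS[-1] - STEP_LENGTHS[0]

def coactive_to_preference(executed, suggestion):
    """Map a verbal suggestion to a suggested action and a preference pair.

    `suggestion` is one of 'longer', 'shorter', 'slightly_longer',
    'slightly_shorter', or None; returns (suggested_action, preference),
    where preference = (preferred, dispreferred), or None for no feedback.
    """
    offsets = {
        "longer": 0.20 * RANGE,
        "shorter": -0.20 * RANGE,
        "slightly_longer": 0.10 * RANGE,
        "slightly_shorter": -0.10 * RANGE,
    }
    if suggestion is None:
        return None
    target = np.clip(executed + offsets[suggestion],
                     STEP_LENGTHS[0], STEP_LENGTHS[-1])
    # Snap the suggestion to the nearest discretized step length.
    suggested = STEP_LENGTHS[np.argmin(np.abs(STEP_LENGTHS - target))]
    # Treat the suggested action as preferred over the executed one.
    return suggested, (suggested, executed)

print(coactive_to_preference(0.12, "slightly_longer"))
```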
Each participant completed 20 gait trials, providing preference and coactive feedback after each trial. Figure 5.6 illustrates the posterior's evolution over the experiment.
After only five exoskeleton trials, CoSpar was already able to identify a relatively compact subregion of preferred step lengths. After the 20 trials, three points along the utility model's posterior mean were selected: the maximum, mean, and minimum.
The user walked in the exoskeleton with each of these step lengths in a randomized ordering, and gave a blind ranking of the three, as shown in Figure 5.6. For each subject, the blind ranking matches the preference posterior obtained by CoSpar, indicating effective learning of individual user preferences.
Learning Preferences over Multiple Features
We further demonstrate CoSpar's practicality for personalizing over multiple features by optimizing over two different feature pairs: 1) step length and step duration, and 2) step length and step width. The protocol of the one-dimensional experiment
⁵Gaussian process kernel: same parameters as in footnote 4, except for a step duration lengthscale of 0.08 and a step width lengthscale of 0.03.
Figure 5.6: Experimental results for optimizing step length with three subjects (one row per subject). Columns 1-4 illustrate the evolution of the preference model posterior (mean +/- standard deviation), shown at various trials. CoSpar converges to similar but distinct optimal gaits for different subjects. Column 5 depicts the subjects' blind rankings of the three gaits executed after 20 trials. The rightmost column displays the experimental trials in chronological order, with the background depicting the posterior preference mean at each step length. CoSpar draws more samples in the region of higher posterior preference.
was repeated for Subject 1, with step lengths discretized as before, step duration discretized into 10 equally-spaced values between 0.85 and 1.15 seconds (with 10% and 20% modifications under coactive feedback), and step width into 6 values between 0.25 and 0.30 meters (20% and 40% modifications). After each trial, the user was queried for both a pairwise preference and coactive feedback. Figure 5.7 shows the results for both feature spaces. The estimated preference values were consistent with a three-sample blind ranking evaluation, suggesting that CoSpar successfully identified user-preferred parameters. Figure 5.8 displays phase diagrams of the gaits with minimum, mean, and maximum posterior utility values to illustrate the difference between preferred and non-preferred gaits.
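For concreteness, the two-dimensional feature spaces above can be assembled as Cartesian products of the per-feature discretizations; this is a minimal sketch with our own variable names, not the experimental code:

```python
import numpy as np
from itertools import product

step_lengths = np.linspace(0.08, 0.18, 15)    # meters
step_durations = np.linspace(0.85, 1.15, 10)  # seconds
step_widths = np.linspace(0.25, 0.30, 6)      # meters

# Each 2D action space is the Cartesian product of two feature grids.
length_duration = np.array(list(product(step_lengths, step_durations)))
length_width = np.array(list(product(step_lengths, step_widths)))
print(length_duration.shape, length_width.shape)  # (150, 2) (90, 2)
```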
5.6 The LineCoSpar Algorithm for High-Dimensional Preference-Based Learning
While the CoSpar algorithm reliably identifies user-preferred gaits in one- and two-dimensional action spaces, the preference-based gait optimization problem can become intractable in larger action spaces. CoSpar must jointly maintain and sample from a posterior over every action, resulting in a computational complexity that increases exponentially in the action space dimension $d$.
Figure 5.7: Experimental results from two-dimensional feature spaces (top row: step length and duration; bottom row: step length and width). Columns 1-4 illustrate the evolution of the preference model's posterior mean. Column 4 also shows the subject's blind rankings of the three gaits executed after 20 trials. Column 5 depicts the experimental trials in chronological order, with the background as in Figure 5.6. CoSpar draws more samples in the region of higher posterior preference.
Figure 5.8: Experimental phase diagrams of the left leg joints over 10 seconds of walking. The gaits shown correspond to the maximum, mean, and minimum preference posterior values for both of Subject 1's 2D experiments. For instance, Subject 1 preferred gaits with longer step lengths, as shown by the larger range of sagittal hip angles in the phase diagram.
Specifically, CoSpar optimizes over the $d$-dimensional action space $\mathcal{A}$ by discretizing the entire space before beginning the learning process. With $m$ uniformly-spaced points in each dimension of $\mathcal{A}$, this discretization results in an action space of cardinality $A = |\mathcal{A}| = m^d$, where larger $m$ enables finer-grained search at a higher computational cost. The Bayesian preference model is updated over all $A$ points during each iteration. This update is intractable for higher values of $d$, since computing the posterior over all $A$ points involves expensive matrix operations, such as inverting $\Sigma^{pr}, \Sigma_i \in \mathbb{R}^{A \times A}$.
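To see how quickly this blows up, take an illustrative $m = 10$ points per dimension: the grid that CoSpar must maintain reaches a million actions by $d = 6$, while a single discretized line always contributes only $m$ points.

```python
m = 10  # points per dimension (illustrative choice)
for d in range(1, 7):
    print(f"d={d}: full grid m**d = {m**d:>9,} actions; one line = {m} points")
```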
The LineCoSpar algorithm (Alg. 14) integrates the CoSpar framework with techniques from high-dimensional Gaussian process learning to model users' preferences in high-dimensional action spaces. Drawing inspiration from the LineBO algorithm of Kirschner, Mutny, et al. (2019), LineCoSpar exploits low-dimensional structure in the search space by sequentially considering one-dimensional subspaces from which to sample actions. This allows the algorithm to maintain its Bayesian preference relation function over only a subset of the action space in each iteration. Compared to CoSpar, LineCoSpar learns the model posterior much more efficiently and can be scaled to higher dimensions. Figure 5.9 compares computation times for CoSpar and LineCoSpar.
Algorithm 14 LineCoSpar
1: Input: $\mathcal{A}$ = action set; utility prior parameters ($c$ and kernel hyperparameters); $m$ = granularity of discretization
2: $\mathcal{D} = \emptyset$, $\mathcal{W} = \emptyset$ ▷ $\mathcal{D}$: preference data, $\mathcal{W}$: actions in $\mathcal{D}$
3: Set $a_1$ to a uniformly-random action in $\mathcal{A}$
4: for $i = 1, 2, \ldots, N$ do
5:   $\mathcal{L}_i$ = random line through $a_i$, discretized via $m$
6:   $\mathcal{V}_i = \mathcal{L}_i \cup \mathcal{W}$ ▷ Points over which to update posterior
7:   $(\mu_i, \Sigma_i)$ = posterior over points in $\mathcal{V}_i$, given $\mathcal{D}$
8:   Sample utility function $f_i \sim \mathcal{N}(\mu_i, \Sigma_i)$
9:   Execute action $x_i = \arg\max_{x \in \mathcal{V}_i} f_i(x)$
10:  Add pairwise preference between $x_i$ and $x_{i-1}$ to $\mathcal{D}$
11:  Add coactive feedback $x_i' \succ x_i$ to $\mathcal{D}$
12:  Set $\mathcal{W} = \mathcal{W} \cup \{x_i\} \cup \{x_i'\}$ ▷ Update set of actions in $\mathcal{D}$
13:  Set $a_{i+1} = \arg\max_{x \in \mathcal{V}_i} \mu_i(x)$
14: end for
Figure 5.9: Curse of dimensionality for CoSpar. Average time per iteration of CoSpar versus LineCoSpar. The y-axis is on a logarithmic scale. For LineCoSpar, the time per iteration is roughly constant in the number of dimensions $d$, while the runtime of CoSpar increases exponentially. For $d = 4$, the duration of a CoSpar iteration is inconvenient in the human-in-the-loop learning setting, and for $d \geq 5$, it is intractable.
This section provides background on existing approaches for high-dimensional Gaussian process learning, and then describes the LineCoSpar algorithm, including 1) defining the posterior updating procedure, 2) achieving high-dimensional learning, and 3) incorporating posterior sampling and coactive feedback.
High-Dimensional Bayesian Optimization
Bayesian optimization is a powerful approach for optimizing expensive-to-evaluate black-box functions. It maintains a model posterior over the unknown function, and cycles through a) using the posterior to acquire actions at which to query the function, b) querying the function, and c) updating the posterior using the obtained data. This procedure is challenging in high-dimensional search spaces due to the computational cost of the acquisition step (a), which often requires solving a non-convex optimization problem over the search space, and of maintaining the posterior in the update step (c), which can require manipulating matrices that grow exponentially with the action space's dimension. Dimensionality reduction techniques are therefore an area of active interest. Solutions vary from optimizing variable subsets (DropoutBO) (Li, Gupta, et al., 2017) to projecting into lower-dimensional spaces (REMBO) (Wang et al., 2016) to sequentially optimizing over one-dimensional subspaces (LineBO) (Kirschner, Mutny, et al., 2019). We draw upon the approach of LineBO because of its state-of-the-art performance in high-dimensional spaces. Furthermore, it is especially sample-efficient in spaces with underlying low-dimensional structure. In the case of exoskeleton walking, low-dimensional structure could appear as linear relationships between two gait parameters in the user's utility function; for instance, users who prefer short step lengths may also prefer short step durations.
The LineCoSpar Algorithm
Modeling Utilities Using Pairwise Preference Data. Similarly to CoSpar, LineCoSpar uses pairwise comparisons to learn a Bayesian model posterior over the relative utilities of actions (i.e., gait parameter combinations) to the user, based upon the Gaussian process preference model of Chu and Ghahramani (2005b). We focus on Gaussian process methods because they model smooth, non-parametric utility functions.
As previously, $\mathcal{A} \subset \mathbb{R}^d$ represents the set of possible actions. In iteration $i$ of the algorithm, we consider a subset of the actions $\mathcal{V}_i \subseteq \mathcal{A}$, with cardinality $n_i = |\mathcal{V}_i|$. Though we will define $\mathcal{V}_i$ later, we note that it includes all points in the dataset $\mathcal{D}$; the posterior is specifically modeled over points in $\mathcal{V}_i$. As in the CoSpar framework, we assume that each action $x \in \mathcal{A}$ has a latent utility to the user, denoted as $f(x)$. Throughout the learning process, LineCoSpar stores a dataset of all user feedback, $\mathcal{D} = \{x_{k1} \succ x_{k2} \mid k = 1, \ldots, K\}$, consisting of $K$ preferences, where $x_{k1} \succ x_{k2}$ indicates that the user prefers action $x_{k1}$ to action $x_{k2}$. The preference data $\mathcal{D}$ is used to update the posterior utilities of the actions in $\mathcal{V}_i$. Defining $f = [f(x_1), f(x_2), \ldots, f(x_{n_i})]^T \in \mathbb{R}^{n_i}$, where $x_j$ is the $j$th action in $\mathcal{V}_i$, the utilities $f$ have posterior:
$$P(f \mid \mathcal{D}) \propto P(\mathcal{D} \mid f)\, P(f). \qquad (5.9)$$
In each iteration $i$, we define a Gaussian process prior over the utilities $f$ of actions in $\mathcal{V}_i$:
$$P(f) = \frac{1}{(2\pi)^{n_i/2} |\Sigma_i^{pr}|^{1/2}} \exp\left( -\frac{1}{2} f^T [\Sigma_i^{pr}]^{-1} f \right), \qquad (5.10)$$
where $\Sigma_i^{pr} \in \mathbb{R}^{n_i \times n_i}$ is the prior covariance matrix, which must now be recalculated in each iteration $i$: $[\Sigma_i^{pr}]_{jk} = \mathcal{K}(x_j, x_k)$ for an appropriate kernel function $\mathcal{K}$. Our experiments use the squared exponential kernel.
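A minimal sketch of this prior construction follows; the lengthscale and signal variance values are illustrative placeholders, not the experimental settings:

```python
import numpy as np

def se_kernel(X, Y, lengthscale=0.1, signal_var=1.0):
    """Squared exponential kernel K(x, x') = s^2 exp(-|x - x'|^2 / (2 l^2))."""
    sq_dists = np.sum((X[:, None, :] - Y[None, :, :]) ** 2, axis=-1)
    return signal_var * np.exp(-0.5 * sq_dists / lengthscale ** 2)

# Prior covariance over the n_i actions in V_i, rebuilt each iteration.
V_i = np.random.default_rng(0).uniform(size=(8, 2))        # 8 actions in 2D
Sigma_pr = se_kernel(V_i, V_i) + 1e-6 * np.eye(len(V_i))   # jitter for stability
```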
The likelihood $P(\mathcal{D} \mid f)$ is calculated identically to the likelihood in CoSpar.
Importantly, $\mathcal{V}_i$ contains all points in the dataset $\mathcal{D}$, and therefore the likelihood is well-defined:
$$P(x_{k1} \succ x_{k2} \mid f) = g\left( \frac{f(x_{k1}) - f(x_{k2})}{c} \right),$$
where $g(\cdot) \in [0, 1]$ is a monotonically-increasing link function, and $c > 0$ is a hyperparameter indicating the magnitude of the preference noise.
While the previous CoSpar results utilize the Gaussian cumulative distribution function for $g$, we empirically found that using the heavier-tailed sigmoidal link function, $g_{\log}(x) := \frac{1}{1 + e^{-x}}$, improves performance. The sigmoidal link function $g_{\log}(x)$ satisfies the convexity conditions for the Laplace approximation described in Section 5.3 and has been used to model preferences in other contexts (Wirth, Akrour, et al., 2017). The full likelihood expression becomes:
$$P(\mathcal{D} \mid f) = \prod_{k=1}^{K} g_{\log}\left( \frac{f(x_{k1}) - f(x_{k2})}{c} \right).$$
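In code, the link function and log-likelihood might look like the following sketch, where `prefs` holds (preferred, dispreferred) index pairs into the action set; the names and the noise value are our own:

```python
import numpy as np

def g_log(x):
    """Heavier-tailed sigmoidal link function g_log(x) = 1 / (1 + e^{-x})."""
    return 1.0 / (1.0 + np.exp(-x))

def log_likelihood(f, prefs, c=0.1):
    """Sum of log g_log((f[p] - f[q]) / c) over preference pairs (p, q)."""
    p, q = np.array(prefs).T          # indices of preferred / dispreferred
    return np.sum(np.log(g_log((f[p] - f[q]) / c)))

# Example: utilities over 4 actions, two observed preferences.
f = np.array([0.2, 1.0, -0.3, 0.5])
prefs = [(1, 0), (3, 2)]              # action 1 ≻ action 0, action 3 ≻ action 2
print(log_likelihood(f, prefs))
```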
As with CoSpar, the posterior in Eq. (5.9) is estimated via the Laplace approximation to yield a multivariate Gaussian distribution, $\mathcal{N}(\mu_i, \Sigma_i)$.
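A sketch of this Laplace step under the definitions above: find the MAP utilities by minimizing the negative log posterior, then take the covariance as the inverse Hessian at the mode. This reuses `se_kernel` and `g_log` from the earlier sketches; the function name is ours.

```python
import numpy as np
from scipy.optimize import minimize

def laplace_posterior(V, prefs, c=0.1):
    """Laplace approximation N(mu, Sigma) of the preference posterior over V."""
    n = len(V)
    K = se_kernel(V, V) + 1e-6 * np.eye(n)       # prior covariance with jitter
    K_inv = np.linalg.inv(K)
    p, q = (np.array(prefs, int).T if prefs else (np.zeros(0, int),) * 2)

    def neg_log_post(f):
        z = (f[p] - f[q]) / c
        return -np.sum(np.log(g_log(z))) + 0.5 * f @ K_inv @ f

    def grad(f):
        z = (f[p] - f[q]) / c
        g = K_inv @ f
        np.add.at(g, p, -(1.0 - g_log(z)) / c)   # d/df_p of -log likelihood
        np.add.at(g, q, (1.0 - g_log(z)) / c)    # d/df_q of -log likelihood
        return g

    f_map = minimize(neg_log_post, np.zeros(n), jac=grad, method="L-BFGS-B").x
    # Hessian of the negative log posterior at the mode: K^{-1} + likelihood term.
    z = (f_map[p] - f_map[q]) / c
    w = g_log(z) * (1.0 - g_log(z)) / c ** 2
    H = K_inv.copy()
    for k in range(len(p)):
        e = np.zeros(n)
        e[p[k]], e[q[k]] = 1.0, -1.0
        H += w[k] * np.outer(e, e)
    return f_map, np.linalg.inv(H)
```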
Sampling Approach for Higher Dimensions. Inspired by Kirschner, Mutny, et al. (2019), LineCoSpar overcomes CoSpar's computational intractability by iteratively modeling the posterior over one-dimensional subspaces (lines), rather than considering the full action space $\mathcal{A}$ at once. In each iteration $i$, LineCoSpar selects uniformly-spaced points along a new random line $\mathcal{L}_i$ within the action space, which lies along a uniformly-random direction and intersects the action $a_i$ that maximizes the posterior mean. Including $a_i$ in the subspace $\mathcal{L}_i$ encourages exploration of higher-utility areas. The posterior $P(f \mid \mathcal{D})$ is calculated over $\mathcal{V}_i := \mathcal{L}_i \cup \mathcal{W}$, where $\mathcal{W}$ is the set of actions that appear in the preference feedback dataset $\mathcal{D}$. Critically, this approach reduces the model's covariance matrices $\Sigma_i^{pr}, \Sigma_i$ from size $A \times A$ to $n_i \times n_i$. Rather than growing exponentially in $d$, which is impractical for online learning, the size of these matrices is constant in the dimension $d$ and grows only linearly in the number of iterations $N$. Since queries are expensive in many human-in-the-loop robotics settings, $N$ is typically low.
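One way to realize the random line $\mathcal{L}_i$, assuming $\mathcal{A}$ is an axis-aligned box (a sketch; the published implementation may discretize differently):

```python
import numpy as np

def random_line(anchor, bounds, m, rng):
    """Uniformly-spaced points along a random direction through `anchor`,
    clipped to the box `bounds` (shape (d, 2)); granularity set by m."""
    direction = rng.normal(size=len(anchor))
    direction /= np.linalg.norm(direction)
    # Extend far enough to span the box in both directions, then clip.
    span = np.linalg.norm(bounds[:, 1] - bounds[:, 0])
    t = np.linspace(-span, span, 2 * m + 1)
    pts = np.clip(anchor + t[:, None] * direction, bounds[:, 0], bounds[:, 1])
    return np.unique(np.round(pts, 6), axis=0)  # drop duplicates from clipping

rng = np.random.default_rng(0)
bounds = np.array([[0.0, 1.0]] * 3)
L_i = random_line(np.array([0.5, 0.5, 0.5]), bounds, m=15, rng=rng)
```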
Posterior Sampling Framework. Utilities are learned using the SelfSparring approach to posterior sampling (Sui, Zhuang, et al., 2017). Specifically, in each iteration, we calculate the posterior of the utilities $f$ over the points in $\mathcal{V}_i = \mathcal{L}_i \cup \mathcal{W}$, obtaining the posterior $\mathcal{N}(\mu_i, \Sigma_i)$ over $\mathcal{V}_i$. The algorithm then samples a utility function $f_i$ from the posterior, which assigns a utility to each action in $\mathcal{V}_i$. Next, LineCoSpar executes the action $x_i$ that maximizes $f_i$: $x_i = \arg\max_{x \in \mathcal{V}_i} f_i(x)$. The user provides a preference (or indicates indifference, i.e., "no preference") between $x_i$ and the preceding action $x_{i-1}$.
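Combining the sketches above, a single iteration of this posterior-sampling step could look as follows (reusing `random_line` and `laplace_posterior`; the user query itself is omitted, and the bookkeeping is simplified):

```python
import numpy as np

rng = np.random.default_rng(1)
bounds = np.array([[0.0, 1.0]] * 3)        # 3D box action space (illustrative)
W, prefs = [], []                          # actions in D; preference index pairs
a_i = rng.uniform(bounds[:, 0], bounds[:, 1])   # anchor: posterior-mean maximizer

# One iteration: draw a line, update the posterior, Thompson-sample, act.
L_i = random_line(a_i, bounds, m=15, rng=rng)
V_i = np.vstack([L_i] + W) if W else L_i        # V_i = L_i ∪ W
mu_i, Sigma_i = laplace_posterior(V_i, prefs)   # Laplace posterior over V_i
f_i = rng.multivariate_normal(mu_i, Sigma_i)    # sampled utility function
x_i = V_i[np.argmax(f_i)]                       # action executed on the robot
a_next = V_i[np.argmax(mu_i)]                   # anchor for the next line
```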
In addition, for each executed action $x_i$, the user can provide coactive feedback, specifying the dimension, direction (higher or lower), and degree in which to change $x_i$. The user's suggested action $x_i'$ is added to $\mathcal{W}$, and the feedback is added to $\mathcal{D}$ as $x_i' \succ x_i$. In each iteration, preference and coactive feedback each add at most one action to $\mathcal{W}$. Thus, in iteration $i$, $\mathcal{V}_i$ contains at most $m + 2(i - 1)$ actions, so its size is independent of the dimensionality $d$. In the subsequent analysis, $x_{\max}$ is defined as the action maximizing the final posterior mean after $N$ iterations, i.e., $x_{\max} := \arg\max_{x \in \mathcal{V}_N} \mu_{N+1}(x)$.
Note that LineCoSpar can be generalized to include the $n$ and $b$ hyperparameters from CoSpar, which respectively allow the algorithm to sample multiple actions per learning iteration and to query the user for preferences between trials in non-consecutive iterations. The LineCoSpar description in Algorithm 14 sets $n = b = 1$, since it is hard for exoskeleton users to remember more than the current and previous gait trials at any given time.