Chapter V: Mixed-Initiative Learning for Exoskeleton Gait Optimization
Figure 5.5 shows the simulation results. In each case, the mixed-initiative simulations involving coactive feedback improve upon those with only preferences. Learning is slowest for $n = 2$, $b = 0$ (Figure 5.5), since that case elicits the fewest preferences.
Figure 5.4b depicts the utility model's posterior mean for the objective function in Figure 5.4a, learned in the simulation with $n = 1$, $b = 1$, and mixed-initiative feedback. In comparing Figure 5.4b to Figure 5.4a, we see that CoSpar learns a sharp peak around the optimum, as it is designed to converge to sampling preferred regions, rather than giving the user undesirable options by exploring elsewhere.
Figure 5.5: CoSpar simulation results on 100 2D synthetic objective functions, comparing CoSpar with and without coactive feedback for three settings of the parameters $n$ and $b$ (see Algorithm 13). Mean +/- standard error of the objective values achieved over the 100 repetitions. The minimal and maximal objective function values are normalized to 0 and 1, respectively. We see that coactive feedback always helps, and that $n = 2$, $b = 0$, which receives the fewest preferences, performs worst.
5.5 Deployment of CoSpar in Human Subject Exoskeleton Experiments
The first experiment optimizes over step lengths, i.e., over a one-dimensional feature space. The second experiment⁵ demonstrates CoSpar's effectiveness in two-dimensional feature spaces, optimizing simultaneously over two different gait feature pairs. Importantly, CoSpar operates independently of the choice of gait features. The subjects' metabolic expenditure was also recorded via indirect calorimetry as shown in Figure 5.1, but this data was uninformative of user preferences, as users are not required to expend effort toward walking.
Learning Preferences between Step Lengths
In the first experiment, all three subjects walked inside the Atalante exoskeleton, with CoSpar selecting the gaits. We considered 15 equally-spaced step lengths between 0.08 and 0.18 meters, each with a precomputed gait from the gait library. The feature discretization was based on users' ability to distinguish nearby values. The users decided when to end each trial, so as to be comfortable providing feedback. Since users have difficulty remembering more than two trials at once, we used CoSpar with $n = 1$ and $b = 1$, which corresponds to asking the user to compare each new trial with the preceding one. Additionally, we queried the user for coactive feedback:
after each trial, the user could suggest a longer or shorter step length (±20% of the range), a slightly longer or shorter step length (±10%), or no feedback. Coactive feedback was added to the dataset and treated as additional preference feedback.
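As an illustration of this protocol, the sketch below (our own minimal Python rendering; the function and variable names are hypothetical, not from the experimental code) converts a user's suggestion into a suggested action on the discretized grid and an implicit preference pair:

```python
import numpy as np

STEP_LENGTHS = np.linspace(0.08, 0.18, 15)  # discretized feature space (meters)
RANGE = STEP_LENGTHS[-1] - STEP_LENGTHS[0]

def coactive_to_preference(executed, suggestion):
    """Map a verbal suggestion to a suggested action and a preference pair.

    `suggestion` is one of 'longer', 'shorter', 'slightly_longer',
    'slightly_shorter', or None; returns (suggested_action, preference),
    where preference = (preferred, dispreferred), or None for no feedback.
    """
    offsets = {
        "longer": 0.20 * RANGE,
        "shorter": -0.20 * RANGE,
        "slightly_longer": 0.10 * RANGE,
        "slightly_shorter": -0.10 * RANGE,
    }
    if suggestion is None:
        return None
    target = np.clip(executed + offsets[suggestion],
                     STEP_LENGTHS[0], STEP_LENGTHS[-1])
    # Snap the suggestion to the nearest discretized step length.
    suggested = STEP_LENGTHS[np.argmin(np.abs(STEP_LENGTHS - target))]
    # Treat the suggested action as preferred over the executed one.
    return suggested, (suggested, executed)

print(coactive_to_preference(0.12, "slightly_longer"))
```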
Each participant completed 20 gait trials, providing preference and coactive feedback after each trial. Figure 5.6 illustrates the posterior's evolution over the experiment.
After only five exoskeleton trials, CoSpar was already able to identify a relatively compact subregion of preferred step lengths. After the 20 trials, three points along the utility model's posterior mean were selected: the maximum, mean, and minimum.
The user walked in the exoskeleton with each of these step lengths in a randomized ordering, and gave a blind ranking of the three, as shown in Figure 5.6. For each subject, the blind ranking matches the preference posterior obtained by CoSpar, indicating effective learning of individual user preferences.
Learning Preferences over Multiple Features
We further demonstrate CoSpar's practicality for personalizing over multiple features by optimizing over two different feature pairs: 1) step length and step duration, and 2) step length and step width. The protocol of the one-dimensional experiment
⁵Gaussian process kernel: same parameters as in footnote 4, except for a step duration lengthscale of 0.08 and a step width lengthscale of 0.03.
Figure 5.6: Experimental results for optimizing step length with three subjects (one row per subject). Columns 1-4 illustrate the evolution of the preference model posterior (mean +/- standard deviation), shown at various trials. CoSpar converges to similar but distinct optimal gaits for different subjects. Column 5 depicts the subjects' blind rankings of the three gaits executed after 20 trials. The rightmost column displays the experimental trials in chronological order, with the background depicting the posterior preference mean at each step length. CoSpar draws more samples in the region of higher posterior preference.
was repeated for Subject 1, with step lengths discretized as before, step duration discretized into 10 equally-spaced values between 0.85 and 1.15 seconds (with 10% and 20% modifications under coactive feedback), and step width into 6 values between 0.25 and 0.30 meters (20% and 40% modifications). After each trial, the user was queried for both a pairwise preference and coactive feedback. Figure 5.7 shows the results for both feature spaces. The estimated preference values were consistent with a three-sample blind ranking evaluation, suggesting that CoSpar successfully identified user-preferred parameters. Figure 5.8 displays phase diagrams of the gaits with minimum, mean, and maximum posterior utility values to illustrate the difference between preferred and non-preferred gaits.
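For concreteness, the two-dimensional feature spaces above can be assembled as Cartesian products of the per-feature discretizations; this is a minimal sketch with our own variable names, not the experimental code:

```python
import numpy as np
from itertools import product

step_lengths = np.linspace(0.08, 0.18, 15)    # meters
step_durations = np.linspace(0.85, 1.15, 10)  # seconds
step_widths = np.linspace(0.25, 0.30, 6)      # meters

# Each 2D action space is the Cartesian product of two feature grids.
length_duration = np.array(list(product(step_lengths, step_durations)))
length_width = np.array(list(product(step_lengths, step_widths)))
print(length_duration.shape, length_width.shape)  # (150, 2) (90, 2)
```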
5.6 The LineCoSpar Algorithm for High-Dimensional Preference-Based Learning
While the CoSpar algorithm reliably identifies user-preferred gaits in one- and two-dimensional action spaces, the preference-based gait optimization problem can become intractable in larger action spaces. CoSpar must jointly maintain and sample from a posterior over every action, resulting in a computational complexity that increases exponentially in the action space dimension $d$.
Figure 5.7: Experimental results from two-dimensional feature spaces (top row: step length and duration; bottom row: step length and width). Columns 1-4 illustrate the evolution of the preference model's posterior mean. Column 4 also shows the subject's blind rankings of the three gaits executed after 20 trials. Column 5 depicts the experimental trials in chronological order, with the background as in Figure 5.6. CoSpar draws more samples in the region of higher posterior preference.
Figure 5.8: Experimental phase diagrams of the left leg joints over 10 seconds of walking. The gaits shown correspond to the maximum, mean, and minimum preference posterior values for both of Subject 1's 2D experiments. For instance, Subject 1 preferred gaits with longer step lengths, as shown by the larger range of sagittal hip angles in the phase diagram.
Specifically, CoSpar optimizes over the $d$-dimensional action space $\mathcal{A}$ by discretizing the entire space before beginning the learning process. With $m$ uniformly-spaced points in each dimension of $\mathcal{A}$, this discretization results in an action space of cardinality $A = |\mathcal{A}| = m^d$, where larger $m$ enables finer-grained search at a higher computational cost. The Bayesian preference model is updated over all $A$ points during each iteration. This update is intractable for higher values of $d$, since computing the posterior over all $A$ points involves expensive matrix operations, such as inverting $\Sigma^{pr}, \Sigma_i \in \mathbb{R}^{A \times A}$.
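To see how quickly this blows up, take an illustrative $m = 10$ points per dimension: the grid that CoSpar must maintain reaches a million actions by $d = 6$, while a single discretized line always contributes only $m$ points.

```python
m = 10  # points per dimension (illustrative choice)
for d in range(1, 7):
    print(f"d={d}: full grid m**d = {m**d:>9,} actions; one line = {m} points")
```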
The LineCoSpar algorithm (Alg. 14) integrates the CoSpar framework with techniques from high-dimensional Gaussian process learning to model users' preferences in high-dimensional action spaces. Drawing inspiration from the LineBO algorithm of Kirschner, Mutny, et al. (2019), LineCoSpar exploits low-dimensional structure in the search space by sequentially considering one-dimensional subspaces from which to sample actions. This allows the algorithm to maintain its Bayesian preference relation function over only a subset of the action space in each iteration. Compared to CoSpar, LineCoSpar learns the model posterior much more efficiently and can be scaled to higher dimensions. Figure 5.9 compares computation times for CoSpar and LineCoSpar.
Algorithm 14 LineCoSpar
1: Input: $\mathcal{A}$ = action set; utility prior parameters ($c$ and kernel hyperparameters); $m$ = granularity of discretization
2: $\mathcal{D} = \emptyset$, $\mathcal{W} = \emptyset$ ▷ $\mathcal{D}$: preference data, $\mathcal{W}$: actions in $\mathcal{D}$
3: Set $a_1$ to a uniformly-random action in $\mathcal{A}$
4: for $i = 1, 2, \ldots, N$ do
5:   $\mathcal{L}_i$ = random line through $a_i$, discretized via $m$
6:   $\mathcal{V}_i = \mathcal{L}_i \cup \mathcal{W}$ ▷ Points over which to update posterior
7:   $(\mu_i, \Sigma_i)$ = posterior over points in $\mathcal{V}_i$, given $\mathcal{D}$
8:   Sample utility function $f_i \sim \mathcal{N}(\mu_i, \Sigma_i)$
9:   Execute action $x_i = \arg\max_{x \in \mathcal{V}_i} f_i(x)$
10:  Add pairwise preference between $x_i$ and $x_{i-1}$ to $\mathcal{D}$
11:  Add coactive feedback $x_i' \succ x_i$ to $\mathcal{D}$
12:  Set $\mathcal{W} = \mathcal{W} \cup \{x_i\} \cup \{x_i'\}$ ▷ Update set of actions in $\mathcal{D}$
13:  Set $a_{i+1} = \arg\max_{x \in \mathcal{V}_i} \mu_i(x)$
14: end for
Figure 5.9: Curse of dimensionality for CoSpar. Average time per iteration of CoSpar versus LineCoSpar. The y-axis is on a logarithmic scale. For LineCoSpar, the time per iteration is roughly constant in the number of dimensions $d$, while the runtime of CoSpar increases exponentially. For $d = 4$, the duration of a CoSpar iteration is inconvenient in the human-in-the-loop learning setting, and for $d \geq 5$, it is intractable.
This section provides background on existing approaches for high-dimensional Gaussian process learning, and then describes the LineCoSpar algorithm, including 1) defining the posterior updating procedure, 2) achieving high-dimensional learning, and 3) incorporating posterior sampling and coactive feedback.
High-Dimensional Bayesian Optimization
Bayesian optimization is a powerful approach for optimizing expensive-to-evaluate black-box functions. It maintains a model posterior over the unknown function, and cycles through a) using the posterior to acquire actions at which to query the function, b) querying the function, and c) updating the posterior using the obtained data. This procedure is challenging in high-dimensional search spaces due to the computational cost of the acquisition step (a), which often requires solving a non-convex optimization problem over the search space, and of maintaining the posterior in the update step (c), which can require manipulating matrices that grow exponentially with the action space's dimension. Dimensionality reduction techniques are therefore an area of active interest. Solutions vary from optimizing variable subsets (DropoutBO) (Li, Gupta, et al., 2017) to projecting into lower-dimensional spaces (REMBO) (Wang et al., 2016) to sequentially optimizing over one-dimensional subspaces (LineBO) (Kirschner, Mutny, et al., 2019). We draw upon the approach of LineBO because of its state-of-the-art performance in high-dimensional spaces. Furthermore, it is especially sample-efficient in spaces with underlying low-dimensional structure. In the case of exoskeleton walking, low-dimensional structure could appear as linear relationships between two gait parameters in the user's utility function; for instance, users who prefer short step lengths may also prefer short step durations.
The LineCoSpar Algorithm
Modeling Utilities Using Pairwise Preference Data. Similarly to CoSpar, LineCoSpar uses pairwise comparisons to learn a Bayesian model posterior over the relative utilities of actions (i.e., gait parameter combinations) to the user, based upon the Gaussian process preference model of Chu and Ghahramani (2005b). We focus on Gaussian process methods because they model smooth, non-parametric utility functions.
As previously, $\mathcal{A} \subset \mathbb{R}^d$ represents the set of possible actions. In iteration $i$ of the algorithm, we consider a subset of the actions $\mathcal{V}_i \subseteq \mathcal{A}$, with cardinality $n_i = |\mathcal{V}_i|$. Though we will define $\mathcal{V}_i$ later, we note that it includes all points in the dataset $\mathcal{D}$; the posterior is specifically modeled over points in $\mathcal{V}_i$. As in the CoSpar framework, we assume that each action $x \in \mathcal{A}$ has a latent utility to the user, denoted as $f(x)$. Throughout the learning process, LineCoSpar stores a dataset of all user feedback, $\mathcal{D} = \{x_{k1} \succ x_{k2} \mid k = 1, \ldots, K\}$, consisting of $K$ preferences, where $x_{k1} \succ x_{k2}$ indicates that the user prefers action $x_{k1}$ to action $x_{k2}$. The preference data $\mathcal{D}$ is used to update the posterior utilities of the actions in $\mathcal{V}_i$. Defining $f = [f(x_1), f(x_2), \ldots, f(x_{n_i})]^T \in \mathbb{R}^{n_i}$, where $x_j$ is the $j$th action in $\mathcal{V}_i$, the utilities $f$ have posterior:
$$P(f \mid \mathcal{D}) \propto P(\mathcal{D} \mid f)\, P(f). \qquad (5.9)$$
In each iteration $i$, we define a Gaussian process prior over the utilities $f$ of actions in $\mathcal{V}_i$:
$$P(f) = \frac{1}{(2\pi)^{n_i/2} |\Sigma_i^{pr}|^{1/2}} \exp\left( -\frac{1}{2} f^T [\Sigma_i^{pr}]^{-1} f \right), \qquad (5.10)$$
where $\Sigma_i^{pr} \in \mathbb{R}^{n_i \times n_i}$ is the prior covariance matrix, which must now be recalculated in each iteration $i$: $[\Sigma_i^{pr}]_{jk} = \mathcal{K}(x_j, x_k)$ for an appropriate kernel function $\mathcal{K}$. Our experiments use the squared exponential kernel.
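A minimal sketch of this prior construction follows; the lengthscale and signal variance values are illustrative placeholders, not the experimental settings:

```python
import numpy as np

def se_kernel(X, Y, lengthscale=0.1, signal_var=1.0):
    """Squared exponential kernel K(x, x') = s^2 exp(-|x - x'|^2 / (2 l^2))."""
    sq_dists = np.sum((X[:, None, :] - Y[None, :, :]) ** 2, axis=-1)
    return signal_var * np.exp(-0.5 * sq_dists / lengthscale ** 2)

# Prior covariance over the n_i actions in V_i, rebuilt each iteration.
V_i = np.random.default_rng(0).uniform(size=(8, 2))        # 8 actions in 2D
Sigma_pr = se_kernel(V_i, V_i) + 1e-6 * np.eye(len(V_i))   # jitter for stability
```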
The likelihood $P(\mathcal{D} \mid f)$ is calculated identically to the likelihood in CoSpar.
Importantly, $\mathcal{V}_i$ contains all points in the dataset $\mathcal{D}$, and therefore the likelihood is well-defined:
$$P(x_{k1} \succ x_{k2} \mid f) = g\left( \frac{f(x_{k1}) - f(x_{k2})}{c} \right),$$
where $g(\cdot) \in [0, 1]$ is a monotonically-increasing link function, and $c > 0$ is a hyperparameter indicating the magnitude of the preference noise.
While the previous CoSpar results utilize the Gaussian cumulative distribution function for $g$, we empirically found that using the heavier-tailed sigmoidal link function, $g_{\log}(x) := \frac{1}{1 + e^{-x}}$, improves performance. The sigmoidal link function $g_{\log}(x)$ satisfies the convexity conditions for the Laplace approximation described in Section 5.3 and has been used to model preferences in other contexts (Wirth, Akrour, et al., 2017). The full likelihood expression becomes:
$$P(\mathcal{D} \mid f) = \prod_{k=1}^{K} g_{\log}\left( \frac{f(x_{k1}) - f(x_{k2})}{c} \right).$$
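In code, the link function and log-likelihood might look like the following sketch, where `prefs` holds (preferred, dispreferred) index pairs into the action set; the names and the noise value are our own:

```python
import numpy as np

def g_log(x):
    """Heavier-tailed sigmoidal link function g_log(x) = 1 / (1 + e^{-x})."""
    return 1.0 / (1.0 + np.exp(-x))

def log_likelihood(f, prefs, c=0.1):
    """Sum of log g_log((f[p] - f[q]) / c) over preference pairs (p, q)."""
    p, q = np.array(prefs).T          # indices of preferred / dispreferred
    return np.sum(np.log(g_log((f[p] - f[q]) / c)))

# Example: utilities over 4 actions, two observed preferences.
f = np.array([0.2, 1.0, -0.3, 0.5])
prefs = [(1, 0), (3, 2)]              # action 1 ≻ action 0, action 3 ≻ action 2
print(log_likelihood(f, prefs))
```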
As with CoSpar, the posterior in Eq. (5.9) is estimated via the Laplace approximation to yield a multivariate Gaussian distribution, $\mathcal{N}(\mu_i, \Sigma_i)$.
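A sketch of this Laplace step under the definitions above: find the MAP utilities by minimizing the negative log posterior, then take the covariance as the inverse Hessian at the mode. This reuses `se_kernel` and `g_log` from the earlier sketches; the function name is ours.

```python
import numpy as np
from scipy.optimize import minimize

def laplace_posterior(V, prefs, c=0.1):
    """Laplace approximation N(mu, Sigma) of the preference posterior over V."""
    n = len(V)
    K = se_kernel(V, V) + 1e-6 * np.eye(n)       # prior covariance with jitter
    K_inv = np.linalg.inv(K)
    p, q = (np.array(prefs, int).T if prefs else (np.zeros(0, int),) * 2)

    def neg_log_post(f):
        z = (f[p] - f[q]) / c
        return -np.sum(np.log(g_log(z))) + 0.5 * f @ K_inv @ f

    def grad(f):
        z = (f[p] - f[q]) / c
        g = K_inv @ f
        np.add.at(g, p, -(1.0 - g_log(z)) / c)   # d/df_p of -log likelihood
        np.add.at(g, q, (1.0 - g_log(z)) / c)    # d/df_q of -log likelihood
        return g

    f_map = minimize(neg_log_post, np.zeros(n), jac=grad, method="L-BFGS-B").x
    # Hessian of the negative log posterior at the mode: K^{-1} + likelihood term.
    z = (f_map[p] - f_map[q]) / c
    w = g_log(z) * (1.0 - g_log(z)) / c ** 2
    H = K_inv.copy()
    for k in range(len(p)):
        e = np.zeros(n)
        e[p[k]], e[q[k]] = 1.0, -1.0
        H += w[k] * np.outer(e, e)
    return f_map, np.linalg.inv(H)
```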
Sampling Approach for Higher Dimensions. Inspired by Kirschner, Mutny, et al. (2019), LineCoSpar overcomes CoSpar's computational intractability by iteratively modeling the posterior over one-dimensional subspaces (lines), rather than considering the full action space $\mathcal{A}$ at once. In each iteration $i$, LineCoSpar selects uniformly-spaced points along a new random line $\mathcal{L}_i$ within the action space, which lies along a uniformly-random direction and intersects the action $a_i$ that maximizes the posterior mean. Including $a_i$ in the subspace $\mathcal{L}_i$ encourages exploration of higher-utility areas. The posterior $P(f \mid \mathcal{D})$ is calculated over $\mathcal{V}_i := \mathcal{L}_i \cup \mathcal{W}$, where $\mathcal{W}$ is the set of actions that appear in the preference feedback dataset $\mathcal{D}$. Critically, this approach reduces the model's covariance matrices $\Sigma_i^{pr}, \Sigma_i$ from size $A \times A$ to $n_i \times n_i$. Rather than growing exponentially in $d$, which is impractical for online learning, the size of these matrices is constant in the dimension $d$ and grows only linearly in the number of iterations $N$. Since queries are expensive in many human-in-the-loop robotics settings, $N$ is typically low.
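One way to realize the random line $\mathcal{L}_i$, assuming $\mathcal{A}$ is an axis-aligned box (a sketch; the published implementation may discretize differently):

```python
import numpy as np

def random_line(anchor, bounds, m, rng):
    """Uniformly-spaced points along a random direction through `anchor`,
    clipped to the box `bounds` (shape (d, 2)); granularity set by m."""
    direction = rng.normal(size=len(anchor))
    direction /= np.linalg.norm(direction)
    # Extend far enough to span the box in both directions, then clip.
    span = np.linalg.norm(bounds[:, 1] - bounds[:, 0])
    t = np.linspace(-span, span, 2 * m + 1)
    pts = np.clip(anchor + t[:, None] * direction, bounds[:, 0], bounds[:, 1])
    return np.unique(np.round(pts, 6), axis=0)  # drop duplicates from clipping

rng = np.random.default_rng(0)
bounds = np.array([[0.0, 1.0]] * 3)
L_i = random_line(np.array([0.5, 0.5, 0.5]), bounds, m=15, rng=rng)
```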
Posterior Sampling Framework. Utilities are learned using the SelfSparring approach to posterior sampling (Sui, Zhuang, et al., 2017). Specifically, in each iteration, we calculate the posterior of the utilities $f$ over the points in $\mathcal{V}_i = \mathcal{L}_i \cup \mathcal{W}$, obtaining the posterior $\mathcal{N}(\mu_i, \Sigma_i)$ over $\mathcal{V}_i$. The algorithm then samples a utility function $f_i$ from the posterior, which assigns a utility to each action in $\mathcal{V}_i$. Next, LineCoSpar executes the action $x_i$ that maximizes $f_i$: $x_i = \arg\max_{x \in \mathcal{V}_i} f_i(x)$. The user provides a preference (or indicates indifference, i.e., "no preference") between $x_i$ and the preceding action $x_{i-1}$.
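Combining the sketches above, a single iteration of this posterior-sampling step could look as follows (reusing `random_line` and `laplace_posterior`; the user query itself is omitted, and the bookkeeping is simplified):

```python
import numpy as np

rng = np.random.default_rng(1)
bounds = np.array([[0.0, 1.0]] * 3)        # 3D box action space (illustrative)
W, prefs = [], []                          # actions in D; preference index pairs
a_i = rng.uniform(bounds[:, 0], bounds[:, 1])   # anchor: posterior-mean maximizer

# One iteration: draw a line, update the posterior, Thompson-sample, act.
L_i = random_line(a_i, bounds, m=15, rng=rng)
V_i = np.vstack([L_i] + W) if W else L_i        # V_i = L_i ∪ W
mu_i, Sigma_i = laplace_posterior(V_i, prefs)   # Laplace posterior over V_i
f_i = rng.multivariate_normal(mu_i, Sigma_i)    # sampled utility function
x_i = V_i[np.argmax(f_i)]                       # action executed on the robot
a_next = V_i[np.argmax(mu_i)]                   # anchor for the next line
```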
In addition, for each executed action $x_i$, the user can provide coactive feedback, specifying the dimension, direction (higher or lower), and degree in which to change $x_i$. The user's suggested action $x_i'$ is added to $\mathcal{W}$, and the feedback is added to $\mathcal{D}$ as $x_i' \succ x_i$. In each iteration, preference and coactive feedback each add at most one action to $\mathcal{W}$. Thus, in iteration $i$, $\mathcal{V}_i$ contains at most $m + 2(i - 1)$ actions, so its size is independent of the dimensionality $d$. In the subsequent analysis, $x_{\max}$ is defined as the action maximizing the final posterior mean after $N$ iterations, i.e., $x_{\max} := \arg\max_{x \in \mathcal{V}_N} \mu_{N+1}(x)$.
Note that LineCoSpar can be generalized to include the $n$ and $b$ hyperparameters from CoSpar, which respectively allow the algorithm to sample multiple actions per learning iteration and to query the user for preferences between trials in non-consecutive iterations. The LineCoSpar description in Algorithm 14 sets $n = b = 1$, since it is hard for exoskeleton users to remember more than the current and previous gait trials at any given time.