Chapter V: Mixed-Initiative Learning for Exoskeleton Gait Optimization
5.1 Introduction
The field of human-robot interaction is receiving increasing attention in many application domains, from mobility assistance to autonomous driving, and from education to dialog systems. In many such domains, for a robotic system to interact optimally with a human user, it must adapt to user feedback. In particular, learning from user feedback could help to improve robotic assistive devices.
This work focuses on optimizing walking gaits for a lower-body exoskeleton, Atalante (pictured in Figure 5.1), in order to maximize user comfort. Atalante, developed by Wandercraft (Wandercraft, n.d.), uses 12 actuated joints to restore mobility to individuals with lower-limb mobility impairments. Previous work with Atalante demonstrated dynamically stable walking using the method of partial hybrid zero dynamics (PHZD), originally designed for bipedal robots (Harib et al., 2018; Gurriet, Finet, et al., 2018; Agrawal, Harib, et al., 2017). While this method generates stable bipedal locomotion, it lacks the ability to optimize for the user’s comfort; yet, user comfort should be a critical objective of gait optimization for exoskeleton walking. While existing methods (Ames, 2014) can generate human-like walking gaits for bipedal robots, it is unlikely that these methods fulfill the preferences of individuals using robotic assistance.
Figure 5.1: Atalante Exoskeleton with and without a user. The user is wearing a mask to measure metabolic expenditure.
The exoskeleton gait optimization problem is challenging, as it involves searching over the vast space of all possible walking trajectories, accounting for user feedback reliability, and learning from limited data obtained from time-intensive human trials. Several existing approaches for customizing walking with various exoskeletons optimize quantitative metrics, including body parameters and targeted walking speeds (Wu, Liu, et al., 2018; Ren et al., 2019) and metabolic expenditure (Kim et al., 2017; Zhang et al., 2017). However, since the goal of this work is to optimize for user comfort, our learning approach instead queries the user for preferences between sequential gait trials. Directly incorporating personalized feedback avoids making overly strong assumptions about gait preference, or optimizing for a numerical quantity not aligned to personalized comfort.
In many real-world settings that involve learning from human feedback, it is challenging or impossible for people to reliably specify numerical scores or to provide demonstrations (Amodei et al., 2016; Argall et al., 2009; Basu, Yang, et al., 2017; Joachims et al., 2005). In particular, this is true in the exoskeleton application, as it is difficult for users to remember many gaits at once. In contrast, the users’ relative preferences can measure their comfort more accurately. Indeed, previous studies have found preferences to be more reliable than numerical scores in a range of domains, including information retrieval (Chapelle, Joachims, et al., 2012) and autonomous driving (Basu, Yang, et al., 2017). In the exoskeleton domain, querying users for pairwise preferences only requires them to remember the current and previous gait trials. Conversely, prompting users for numerical scores requires them to remember all gait trials to ensure that the scoring is consistent over time.
While interactive preference learning has previously been used to tune parameters for an ankle exoskeleton in Thatte, Duan, and Geyer (2018), that approach utilizes domain knowledge to narrow the search space before performing online learning. To generate preference queries, Thatte et al. employ Double Thompson Sampling (Wu and Liu, 2016). This algorithm operates in the K-armed dueling bandit setting, in which outcomes corresponding to different actions are assumed to be independent. Yet, pairwise preferences provide a sparse feedback signal, as the algorithm only receives one bit of information per preference query, so an approach that does not share information across actions scales poorly to large action spaces. In contrast to Thatte, Duan, and Geyer (2018), we aim to optimize gaits over large gait parameter spaces without prior assumptions on the users’ preferences.
Building upon techniques from dueling bandits (Sui, Zhuang, et al., 2017; Sui, Zoghi, et al., 2018; Yue, Broder, et al., 2012) and coactive learning (Shivaswamy and Joachims, 2012; Shivaswamy and Joachims, 2015), this work proposes the CoSpar algorithm to learn user-preferred exoskeleton gaits. CoSpar is a mixed-initiative approach, which both queries the user for preferences and allows the user to suggest improvements via coactive feedback. By combining multiple types of user feedback within a Gaussian process-based learning framework, CoSpar is able to identify well-performing gaits within relatively few trials. CoSpar is validated both in simulation and in human subject experiments with the Atalante exoskeleton, in which CoSpar finds user-preferred gaits within a gait library.
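One way to picture how two feedback types can feed a single learner is to record both as pairwise comparisons. The following Python sketch is illustrative only: the names `Preference`, `FeedbackLog`, `add_query_result`, and `add_coactive_suggestion` are hypothetical, the gait parameter vectors are made-up values, and treating a coactive suggestion as a preference for the suggested gait over the executed one is an assumption made here for illustration rather than a description of the CoSpar implementation.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# Hypothetical representation: a gait is a tuple of parameter values.
Gait = Tuple[float, ...]


@dataclass(frozen=True)
class Preference:
    """One pairwise comparison: `preferred` was judged better than `dispreferred`."""
    preferred: Gait
    dispreferred: Gait
    source: str  # "query" or "coactive"


@dataclass
class FeedbackLog:
    """Collects both feedback types in a single list of pairwise preferences."""
    preferences: List[Preference] = field(default_factory=list)

    def add_query_result(self, current: Gait, previous: Gait,
                         user_prefers_current: bool) -> None:
        # A preference query compares only the current and previous trials.
        if user_prefers_current:
            self.preferences.append(Preference(current, previous, "query"))
        else:
            self.preferences.append(Preference(previous, current, "query"))

    def add_coactive_suggestion(self, executed: Gait, suggested: Gait) -> None:
        # Assumption: a suggested improvement is stored as
        # "suggested is preferred over executed".
        self.preferences.append(Preference(suggested, executed, "coactive"))


log = FeedbackLog()
# The user preferred the previous trial over the current one.
log.add_query_result(current=(0.2, 1.0), previous=(0.1, 1.0),
                     user_prefers_current=False)
# The user suggested a modified version of the executed gait.
log.add_coactive_suggestion(executed=(0.2, 1.0), suggested=(0.25, 1.0))
print(len(log.preferences))
```

Storing both kinds of feedback in one format means a single preference model could, in principle, be fit to all of the data at once.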
This work also presents the LineCoSpar algorithm, which integrates CoSpar with techniques from high-dimensional Bayesian optimization (Kirschner, Mutny, et al., 2019) to create a unified framework for performing high-dimensional preference-based learning. In simulation, LineCoSpar exhibits sample-efficient convergence to user-preferred actions in high-dimensional spaces. The algorithm is then deployed experimentally to optimize exoskeleton walking over six gait parameters (shown in Figure 5.2) for six able-bodied subjects.
Figure 5.2: Human subject experiments with the LineCoSpar algorithm exploring six exoskeleton gait parameters: step length, step duration, step width, maximum step height, pelvis roll, and pelvis pitch.
In summary, the CoSpar and LineCoSpar algorithms perform sample-efficient, mixed-initiative human-in-the-loop learning to identify preferred actions in possibly high-dimensional spaces. This work not only identifies exoskeleton users’ preferred walking trajectories, but can also provide insights into their preferences for certain gaits. Such knowledge could potentially help to design more comfortable exoskeleton gaits in the future.
5.2 Background on the Atalante Exoskeleton and Gait Generation for Bipedal