This dissertation develops new theoretical bounds on the performance of these and similar algorithms, empirically evaluates these algorithms against several competitors in simulation, and uses a closed-loop version of the GP-BUCB algorithm to control SCI therapy via epidural electrostimulation in four living rats. The dura is the outermost of the three meninges; the middle layer is the arachnoid and the innermost is the pia.
Spinal Cord Injury
The spinal cord is composed of the gray matter and white matter regions in the middle of the figure. Sufficiently severe damage to the spinal cord can lead to the loss of voluntary control (often accompanied by loss of sensation) of the legs (paraplegia) or of the legs and arms (quadriplegia).
Epidural Electrostimulation
Another thorough discussion of spinal cord organization, function, and dysfunction can be found in the text and atlas of Watson et al. Patients with SCI have different clinical symptoms depending on the location of the injury in the spinal cord, including different syndromes that are symptomatic of injuries to different structures in the spinal cord.
Active Learning
If these competing imperatives are properly balanced, it can sometimes be demonstrated that the algorithm will converge to the optimal action (i.e., the rate of sub-optimal actions approaches zero) with high probability in the limit of infinite time. Recent work in this area has brought together bandits and Bayesian optimization, deriving algorithms that seek to explore and exploit over very large decision sets, using response function models (e.g., the GP-UCB algorithm of Srinivas et al., 2010, which uses Gaussian processes to model the reward function).
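For concreteness, the GP-UCB decision rule of Srinivas et al. (2010) selects, at round t,
\[
x_t \;=\; \operatorname*{argmax}_{x \in D} \left[ \mu_{t-1}(x) + \beta_t^{1/2}\, \sigma_{t-1}(x) \right],
\]
where μ_{t−1} and σ_{t−1} are the posterior mean and standard deviation of the GP model of the reward function after t − 1 observations, and β_t is a coefficient that controls the balance between exploiting the current mean estimate and exploring uncertain actions.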
Objective Statement: Major Problem
If the algorithm operates continuously, the therapeutic efficacy of the stimuli it delivers must be an essential component of its decision-making if an effective treatment is to be delivered; poor stimulus choices destroy much of the utility of experimental or therapeutic training.
Contributions
Performance guarantees are an important requirement, since the practical performance of the algorithm is easier to interpret in light of such guarantees. Modularity is highly desirable because different components can be interchanged to suit the problem at hand.
Organization
Compared to the existing GP-UCB algorithm (Srinivas et al., 2010), GP-BUCB and GP-AUCB can select sets of experiments or use knowledge of pending experimental observations to help select future experiments. An area of considerable interest is the acute mitigation of this secondary damage (Zhang et al., 2013).
Existing Therapeutic Approaches
- Functional Electrical Stimulation
- Regenerative Therapies
- Cord-Rehabilitative Approaches
- Epidural Electrostimulation
- Combined Approaches
The results of Cai et al. (2006) may provide evidence that variability in the training paradigm is important to avoid this outcome. This type of stimulation is believed to activate afferent fibers as they enter the spinal cord through dorsal nerve roots (Minassian et al., 2007).
Active Learning and Bandits
- Bandit Algorithms
- Classical Setting
- Making Large Problems Tractable: Structural Assumptions
- Bayesian Optimization
- Parallel Selection
- Active Learning in the Face of Time Variation
- Learning Systems and Control Algorithms in Biological Contexts
A review of the literature on control algorithms and learning agents in biological applications follows in Section 2.3.5. The exploration-exploitation tradeoff has also been studied in global Bayesian optimization and response surface modeling, where Gaussian process models are often used due to their flexibility in incorporating prior assumptions about the structure of the reward function (Brochu et al., 2009).
Gaussian Processes
Regression Using Gaussian Processes
Of great interest in this dissertation is the use of the GP model to predict f(x∗), the value of a function drawn from the GP at the test point x∗, given some finite set of observations y corresponding to the set X. These forms represent the uncertainty about which function drawn from the GP explains the observations, and capture the marginalization over all functions that could be drawn from the GP.
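For a zero-mean GP prior with covariance function k and i.i.d. Gaussian observation noise of variance σ_n² (the notation follows Rasmussen and Williams, 2006), the predictive distribution of f(x∗) is Gaussian, with
\[
\mu(x_*) \;=\; \mathbf{k}_*^{\top} \left( K + \sigma_n^2 I \right)^{-1} \mathbf{y},
\qquad
\sigma^2(x_*) \;=\; k(x_*, x_*) - \mathbf{k}_*^{\top} \left( K + \sigma_n^2 I \right)^{-1} \mathbf{k}_* ,
\]
where K is the covariance matrix over the observed inputs X and k∗ is the vector of covariances between x∗ and the points in X.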
Covariance Functions
- Reproducing Kernel Hilbert Spaces
- Stationary Covariance Functions on R^d
- Non-stationary Covariance Functions on R^d
- Constructing Covariance Functions
Thus, the first-order autoregressive process AR(1) corresponds to the special case of the exponential covariance function (ν = 1/2), and the second-order process AR(2) corresponds to the special case of the Matérn covariance with ν = 3/2. In the remainder of the chapter, we begin with a formal description of the problem setting (Section 3.2).
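For reference, with r = |x − x′| and illustrative hyperparameters σ_f (signal standard deviation) and ℓ (lengthscale), these two covariance functions can be written
\[
k_{\nu = 1/2}(r) \;=\; \sigma_f^2 \exp\!\left( -\frac{r}{\ell} \right),
\qquad
k_{\nu = 3/2}(r) \;=\; \sigma_f^2 \left( 1 + \frac{\sqrt{3}\, r}{\ell} \right) \exp\!\left( -\frac{\sqrt{3}\, r}{\ell} \right).
\]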
Problem Setting and Background
- The Problem: Parallel or Delayed Selection
- Modeling f via Gaussian Processes (GPs)
- Conditional Mutual Information
- The GP-UCB approach
It is important to note that σ²_{t0−1}(x_{t0}) is independent of the values of the observations. Implicit in the definition of the GP-UCB decision rule, Equation (3.5), is the corresponding confidence interval.
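Written out, this confidence interval is
\[
C_t(x) \;=\; \left[ \mu_{t-1}(x) - \beta_t^{1/2}\, \sigma_{t-1}(x),\;\; \mu_{t-1}(x) + \beta_t^{1/2}\, \sigma_{t-1}(x) \right],
\]
and the decision rule selects the action whose upper confidence bound is largest. Because the posterior variance depends only on the locations of the queries, and not on the observed values, these interval widths can be computed before the corresponding observations have arrived.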
GP-BUCB Algorithm and Regret Bounds
- GP-BUCB: An Overview
- General Regret Bound
- Suitable Choices for C
- Corollary Regret Bound: GP-BUCB
- Better Bounds Through Initialization
This ratio quantifies the degree of “overconfidence” with respect to the posterior at the start of the batch. Let R_T be the regret of the two-stage initialized GP-BUCB algorithm, which ignores feedback for the first T_init rounds.
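Using the standard Gaussian identity relating conditional mutual information to a ratio of predictive standard deviations (and writing fb[t] for the most recent round for which feedback is available), this ratio can be expressed as
\[
\frac{\sigma_{fb[t]}(x)}{\sigma_{t-1}(x)}
\;=\;
\exp\!\Big( I\big( f(x);\, \mathbf{y}_{fb[t]+1 : t-1} \,\big|\, \mathbf{y}_{1 : fb[t]} \big) \Big),
\]
so bounding the hallucinated conditional information by a constant C guarantees that the hallucinated confidence intervals are never narrower than the feedback-only intervals by more than a factor of exp(C).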
Adaptive Parallelism: GP-AUCB
GP-AUCB Algorithm
To ensure that this happens, C can be set such that C > γ_{Bmin−1}, that is, so that no set of fewer than Bmin queries could gain enough hallucinated information to terminate the batch. If the algorithm converges to the optimal subset X∗ ⊆ D, as the regret bound suggests, and X∗ has a finite size, then the variances of the selected actions (and thus the terms of the sum in the expression above) can be expected to generally be very small, producing very long batches, even for very small values of C.
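A minimal sketch of this adaptive batch construction is given below. The GP surrogate is assumed to expose hypothetical methods posterior_mean_var and fantasize (names invented for illustration, not drawn from any particular library), hallucinated information is accrued as 0.5·log(1 + σ²/σ_n²) per selected action, and the optional max_batch cap is an illustrative safeguard against exactly the very long batches discussed above.

```python
import numpy as np

def select_adaptive_batch(gp, candidates, beta, C, noise_var,
                          b_min=1, max_batch=None):
    """Sketch of adaptive batch selection in the spirit of GP-AUCB.

    Actions are appended to the batch until the information that would be
    hallucinated within the batch exceeds the budget C (subject to emitting
    at least b_min actions and, optionally, at most max_batch actions).
    """
    batch, hallucinated_info = [], 0.0
    while max_batch is None or len(batch) < max_batch:
        mu, var = gp.posterior_mean_var(candidates)      # hypothetical API
        idx = int(np.argmax(mu + np.sqrt(beta * var)))   # UCB maximizer
        info_gain = 0.5 * np.log(1.0 + var[idx] / noise_var)
        # Terminate the batch once the accumulated hallucinated information
        # would exceed C (but always emit at least b_min actions).
        if len(batch) >= b_min and hallucinated_info + info_gain > C:
            break
        batch.append(candidates[idx])
        hallucinated_info += info_gain
        gp = gp.fantasize(candidates[idx])               # hallucinate feedback
    return batch
```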
Local Stopping Conditions Versus Global Stopping Conditions
Lazy Variance Calculations
In implementing GP-AUCB Local, we can run what is effectively lazy GP-AUCB until the global stopping condition is met, at which time we switch to GP-AUCB Local.
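The sketch below illustrates the lazy idea with a max-heap of stale scores; it assumes (as within a GP-BUCB batch, where the mean and β are held fixed between feedback events) that each candidate's score can only decrease between refreshes, so a stale score is a valid upper bound. The callables mean_fn and var_fn are placeholders for the current posterior, not the exact implementation used in this work.

```python
import heapq
import numpy as np

def lazy_ucb_argmax(candidates, stale_ucb, mean_fn, var_fn, beta):
    """Lazily find the candidate with the largest upper confidence bound.

    stale_ucb[i] holds the last score computed for candidates[i]; since the
    hallucinated posterior variances only shrink, stale scores are valid
    upper bounds, and most candidates never need their variance recomputed.
    """
    heap = [(-stale_ucb[i], i) for i in range(len(candidates))]
    heapq.heapify(heap)
    while heap:
        _, i = heapq.heappop(heap)
        # Refresh this candidate's score with the up-to-date posterior variance.
        fresh = mean_fn(candidates[i]) + np.sqrt(beta * var_fn(candidates[i]))
        stale_ucb[i] = fresh
        # If the refreshed score beats every remaining stale upper bound, it is
        # the true maximizer; otherwise push it back and examine the next one.
        if not heap or fresh >= -heap[0][0]:
            return i
        heapq.heappush(heap, (-fresh, i))
```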
Computational Experiments
- Experimental Comparisons
- Data Sets
- Synthetic Benchmark Problems
- Automated Vaccine Design
- Spinal Cord Injury (SCI) Therapy
- Computational Performance
- Parallelism: Costs and Tradeoffs
For each of the experimental data sets used in this chapter, the kernel functions and experimental constants are listed in Table 3.2. Since most of the algorithms perform very comparably in terms of both average and minimum regret when run at the same batch length, the most interesting results are …
Conclusions
The aim of this experiment was to increase the amplitude of the resulting evoked potential. If this amplitude is not a good proxy for therapeutic benefit, then even if the algorithm converges to the maximizer of the reward function, the true therapeutic utility may not be maximized.
Experimental Methods
- Injury, Implantation, and Animal Care
- Parylene Arrays
- Wire-based Spinal Stimulating Arrays
- Animal Testing Procedures
The smaller red text overlaid on the array indicates the segmental level of the spine that lies below that portion of the array. Due to the design of the implanted circuit board, 36 of these pairs cannot be stimulated, but the remaining 666 can.
Objective Function
In each of the first three runs, the experimenter selected the first three actions based on knowledge of the anatomical location of the electrodes relative to the distal motor pools of the legs, combined with observations from the previous runs of the day. In the fourth and fifth runs, the experimenter selected actions to explore parts of the array that were judged to be less likely to elicit strong responses.
Modifications to the GP-BUCB algorithm
Time Variation of the Reward Function
This is in stark contrast to the nominal setting for the UCB family of algorithms, in which the observations of points in D become progressively denser as the algorithm runs, so that the problem becomes more of an interpolation problem and the posterior uncertainty does not increase. Because the posterior uncertainty cannot decrease below a finite level, which depends on the rate at which the response function changes relative to the rate at which observations arrive, the algorithm will never truly "converge".
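One standard construction for encoding such time variation (a sketch of the general idea, not necessarily the exact kernel used in these experiments) is to treat the measurement time as an additional input and multiply a spatial kernel over stimulus configurations by a temporal kernel that decays with the separation between measurement times:
\[
k\big( (x, t), (x', t') \big) \;=\; k_{\mathrm{space}}(x, x')\; k_{\mathrm{time}}\big( |t - t'| \big),
\qquad
k_{\mathrm{time}}(\tau) \;=\; \exp\!\left( -\frac{\tau}{\ell_{\mathrm{time}}} \right).
\]
Under such a kernel, old observations are gradually discounted, so the posterior uncertainty at any configuration that has not been queried recently remains bounded away from zero.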
Redundancy Control and Repeated Observations
Kernel and Mean Functions
Unfortunately, due to the strong smoothness assumptions implicit in this kernel (the squared exponential kernel implies that the Gaussian process is infinitely mean-square differentiable; see Rasmussen and Williams, 2006, Section 4.1.1), problems resulted from intra-day variations in the responses of individual configurations, as well as from long gaps in testing, e.g., weekends. Functionally, this means that the uncertainty about f(x) never becomes less than a small, positive value, which is useful because short-timescale variations in the function are subsumed into this noise-like term, while the overall shape is captured by the smooth component of the model.
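A minimal sketch of one such construction, assuming the trial-to-trial variability is modeled as a delta-correlated component added to the squared exponential kernel (σ_f, ℓ, and σ_v are illustrative hyperparameters, not the fitted values used in these experiments):
\[
k(x, x') \;=\; \sigma_f^2 \exp\!\left( -\frac{\lVert x - x' \rVert^2}{2\ell^2} \right) \;+\; \sigma_v^2\, \delta_{x, x'} .
\]
The smooth squared exponential component captures the overall shape of the response surface, while the delta-correlated component contributes an extra σ_v² of variance that is not shared between distinct configurations, so the uncertainty about configurations that have not been directly observed never falls below this level.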
Results
Wire-Based Array Animals: Results
Both of these animals also showed an increase in response level over the course of the experiments. The qualitative form of this change in responsiveness may be a function of recovery from the injury.
Parylene Microarray Animals: Results
Computational Performance
The number of observations can potentially be reduced by a factor m of 10 or 20, depending on the number of pulses typically corresponding to each action, potentially reducing the computational time by a factor of roughly m³, since the Cholesky decomposition scales as O(n³); for m = 10, this amounts to roughly a thousandfold reduction.
Discussion
- Wire-based Array Animals
- Cross-animal Comparisons
- Parylene Array Animals
- Therapeutic Relevance
- Kernels and Hyperparameters
Stronger support for the effectiveness of the algorithm's search can be derived from the fact that the algorithm avoided visiting many of the configurations in the stimulus space. However, the extent to which the algorithm was able to distribute its actions effectively despite these limitations can still be examined.
Conclusions
The algorithm's average reward is again typically superior to that of the human experimenter, while maintaining competitive maximum reward at similar times or action indices. The algorithm initiated 25 actions (17 unique electrode pairs) and the human initiated 23 actions (all unique).
Prior Human Experiments
It can be very difficult to extract this kind of information from EMG in an automated manner. As with standing, it may be that what is truly desirable in stepping is not simply the basic pattern of motor activity, but the ability to respond to perturbation while stepping.
Pilot Applications of GP-BUCB to Human SCI Therapy: Introduction
Since the effect of the stimulus change is almost immediate, observations of the patient's responses can potentially provide feedback on the resulting performance. A major obstacle during this optimization process, due in part to the limitations of the redesigned Medtronic hardware, is that stimulation must be temporarily stopped when the active stimulation electrode pattern is changed.
Mathematical Methods
Performance Measures
- Subjective Ratings
- Grading Vector-valued EMG
The difficulties inherent in this non-equivalence of rating and utility are discussed in Section 5.4.2.1. Performance must necessarily be quantified as a scalar for performance optimization to be a meaningful concept.
Algorithmic Extensions
- Divorcing Reward from the Function Regressed Upon
- Making Decisions Using Vector-Valued Functions
- Choosing Paths
Finally, in the scalar case, this decision rule reduces to the GP-BUCB decision rule, Equation (3.7). A version of this decision rule was implemented in the code package prepared for the pilot experiments.
Novel Covariance Functions
These localizations can also be verified with respect to the array using the data from the supine stimulation experiments described in Section 5.2. One simple way to do this is to choose the weights as the heights of a Gaussian bump centered on the location of the motor pool associated with each muscle.
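A minimal sketch of this weighting scheme, assuming each motor pool is summarized by a single rostro-caudal coordinate and that the bump width sigma is an illustrative hyperparameter (the function and argument names are invented for illustration, not taken from the pilot code):

```python
import numpy as np

def gaussian_bump_weights(site_z, motor_pool_z, sigma=1.0):
    """Weight each muscle by a Gaussian bump centered on its motor pool.

    site_z       : rostro-caudal coordinate of the stimulating site
    motor_pool_z : dict mapping muscle name -> motor pool coordinate
    sigma        : bump width (assumed, illustrative)
    """
    return {muscle: float(np.exp(-(site_z - z) ** 2 / (2.0 * sigma ** 2)))
            for muscle, z in motor_pool_z.items()}
```

A stimulation site lying directly over a muscle's motor pool then receives weight 1 for that muscle, and the weight falls off smoothly for muscles whose motor pools lie farther away.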
Preliminary Results and Discussion
After the completion of the first path, the stimulus was set at 4.4 V and 30 Hz, which was used as the starting condition for the remaining three paths of the first session. Within this second, EMG-based session, the algorithm selected two paths, the first starting from the best stimuli found during the fourth path of the previous session.
Extensions
- Time Series Information and Coordination of Muscles
- Dynamical Systems Approaches: Cost Functions and LQG
- Alternative Covariance Functions
- Expansions of the Decision Set
In this case, a physical human model can be used to infer a simple parameterization of the composite controller (spinal cord under the influence of the EES system); this becomes a system identification problem, classically treated in the control and dynamical systems literature (for an introduction to system identification, see the text by Ljung, 1999). As the experiments progress, the algorithm gradually builds a set of human-interpretable predictions about the response of the spinal cord.
Future Work
Theorem 4: Initialization Set Size Bounds
Initialization Set Size: Linear Kernel
We seek to initialize GP-BUCB with a set Dinit of size Tinit, assuming, motivated by this bound and the form of Inequality (3.16), that Tinit takes this form. Thus, for a linear kernel and such a k, an initialization set Dinit of size Tinit, where Tinit ≥ kηd(B−1) log(B), ensures that the hallucinated conditional information in any future batch of size B is ≤ 2e.
Initialization Set Size: Matérn Kernel
Initialization Set Size: Squared Exponential (RBF) Kernel
GP-AUCB: Finite Batch Size
During two of the animal experiment runs (animal 5, run 1 and animal 7), a significant number of actions performed by the human experimenter were missed or abandoned. Without inserted passes, the same action indices for the human and algorithm do not correspond to the same time, and visual interpretation of the regret plots is difficult; after inserted passes, this synchrony is restored.
Proof Multi-Muscle Uncertainty Term is Non-Increasing
Path-Based Decision Rules
- The Vertebral Column and Spinal Cord
- The Spinal Cord in Cross-section
- EES-Based SCI Therapy: Schematic
- GP-BUCB: Confidence Interval Containment
- Regret: Batch Size = 5
- Regret: Delay = 5
- Regret: Non-adaptive Algorithms, Batch Sizes = 5, 10, & 20
- Regret: Adaptive Algorithms, Batch Sizes = 5, 10, & 20
- Regret: Delays = 5, 10, & 20
- Elapsed Computational Time By Algorithm
- Cost Parameterization: Algorithmic Tradeoffs
- Parylene Array Device
- Placement of the Array Device Relative to the Spinal Cord
- Experimental System Diagram
- Animal Experiment Overview: All Observed Evoked Potentials
- Reward: Animal 2, Run 1
- Reward: Animal 2, Run 2
- Reward: Animal 2, Run 3
- Reward: Animal 5, Run 1
- Reward: Animal 5, Run 2
- Reward: Animal 3
- Reward: Animal 7
- One Testing Day: Animal 5, Run 2, P35 Responses
- Animal 5: Retrospective
- Animal 7: Final Day Rewards
- Cross-animal Comparisons
- The Consequences of Kernel Mis-specification
- Initialization Set Sizes for Theorem 4
- Kernel Functions and Parameters: Computational Experiments
- Stimulus Latency Windows
- Kernel and Hyperparameter Choices by Experimental Run
- Actions Taken Per Run
- Repeatability Between Individual Animals
- Proportion of Stimuli Yielding Satisfactory Responses
This decision rule was implemented for a version of the human experimental code that is designed to search through the space of voltage and frequency parameters corresponding to a fixed array of active electrodes.