This dissertation develops new theoretical bounds on the performance of these and similar algorithms, empirically evaluates these algorithms against several competitors in simulation, and uses a closed-loop version of the GP-BUCB algorithm to control SCI therapy via epidural electrostimulation in four living rats. The dura is the outermost of the three meninges; the middle layer is the arachnoid and the innermost is the pia.
Spinal Cord Injury
The spinal cord is composed of the gray matter and white matter regions in the middle of the figure. Sufficiently severe damage to the spinal cord can lead to the loss of voluntary control (often accompanied by loss of sensation) of the legs (paraplegia) or of the legs and arms (quadriplegia).
Epidural Electrostimulation
Another thorough discussion of spinal cord organization, function, and dysfunction can be found in the text and atlas of Watson et al. Patients with SCI have different clinical symptoms depending on the location of the injury in the spinal cord, including different syndromes that are symptomatic of injuries to different structures in the spinal cord.
Active Learning
If these competing imperatives are properly balanced, it can sometimes be demonstrated that the algorithm will converge to the optimal action (i.e., the rate of sub-optimal actions approaches zero) with high probability in the limit of infinite time. Recent work in this area has brought together bandits and Bayesian optimization, deriving algorithms that seek to explore and exploit over very large decision sets, using response function models (e.g., the GP-UCB algorithm of Srinivas et al., 2010, which uses Gaussian processes to model the reward function).
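For concreteness, the GP-UCB decision rule of Srinivas et al. (2010) selects, at round t,
\[
x_t \;=\; \operatorname*{argmax}_{x \in D} \left[ \mu_{t-1}(x) + \beta_t^{1/2}\, \sigma_{t-1}(x) \right],
\]
where μ_{t−1} and σ_{t−1} are the posterior mean and standard deviation of the GP model of the reward function after t − 1 observations, and β_t is a coefficient that controls the balance between exploiting the current mean estimate and exploring uncertain actions.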
Objective Statement: Major Problem
If the algorithm operates continuously, the therapeutic efficacy of the stimuli it delivers must be an essential component of its decision-making if an effective treatment is to be delivered; poor stimulus choices destroy much of the utility of experimental or therapeutic training.
Contributions
Performance guarantees are an important requirement, since the practical performance of the algorithm is easier to interpret in light of such guarantees. Modularity is highly desirable because different components can be interchanged to suit the problem at hand.
Organization
Compared to the existing GP-UCB algorithm (Srinivas et al., 2010), GP-BUCB and GP-AUCB can select sets of experiments or use knowledge of pending experimental observations to help select future experiments. An area of considerable interest is the acute mitigation of this secondary damage (Zhang et al., 2013).
Existing Therapeutic Approaches
- Functional Electrical Stimulation
- Regenerative Therapies
- Cord-Rehabilitative Approaches
- Epidural Electrostimulation
- Combined Approaches
The results of Cai et al. (2006) may provide evidence that variability in the training paradigm is important to avoid this outcome. This type of stimulation is believed to activate afferent fibers as they enter the spinal cord through dorsal nerve roots (Minassian et al., 2007).
Active Learning and Bandits
- Bandit Algorithms
- Classical Setting
- Making Large Problems Tractable: Structural Assumptions
- Bayesian Optimization
- Parallel Selection
- Active Learning in the Face of Time Variation
- Learning Systems and Control Algorithms in Biological Contexts
A review of the literature on control algorithms and learning agents in biological applications follows in Section 2.3.5. The exploration-exploitation tradeoff has also been studied in global Bayesian optimization and response surface modeling, where Gaussian process models are often used due to their flexibility in incorporating prior assumptions about the structure of the reward function (Brochu et al., 2009).
Gaussian Processes
Regression Using Gaussian Processes
Of great interest in this dissertation is the use of the GP model to predict f(x∗), the value of a function drawn from the GP at the test point x∗, given some finite set of observations y corresponding to the set X. These forms represent the uncertainty about which function drawn from the GP explains the observations, and capture the marginalization over all functions that could be drawn from the GP.
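For a zero-mean GP prior with covariance function k and i.i.d. Gaussian observation noise of variance σ_n² (the notation follows Rasmussen and Williams, 2006), the predictive distribution of f(x∗) is Gaussian, with
\[
\mu(x_*) \;=\; \mathbf{k}_*^{\top} \left( K + \sigma_n^2 I \right)^{-1} \mathbf{y},
\qquad
\sigma^2(x_*) \;=\; k(x_*, x_*) - \mathbf{k}_*^{\top} \left( K + \sigma_n^2 I \right)^{-1} \mathbf{k}_* ,
\]
where K is the covariance matrix over the observed inputs X and k∗ is the vector of covariances between x∗ and the points in X.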
Covariance Functions
- Reproducing Kernel Hilbert Spaces
- Stationary Covariance Functions on R^d
- Non-stationary Covariance Functions on R^d
- Constructing Covariance Functions
Thus, the first-order autoregressive process AR(1) corresponds to the special case of the exponential covariance function (ν = 1/2), and the second-order process AR(2) corresponds to the special case of the Matérn covariance with ν = 3/2. In the remainder of the chapter, we begin with a formal description of the problem setting (Section 3.2).
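For reference, with r = |x − x′| and illustrative hyperparameters σ_f (signal standard deviation) and ℓ (lengthscale), these two covariance functions can be written
\[
k_{\nu = 1/2}(r) \;=\; \sigma_f^2 \exp\!\left( -\frac{r}{\ell} \right),
\qquad
k_{\nu = 3/2}(r) \;=\; \sigma_f^2 \left( 1 + \frac{\sqrt{3}\, r}{\ell} \right) \exp\!\left( -\frac{\sqrt{3}\, r}{\ell} \right).
\]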
Problem Setting and Background
- The Problem: Parallel or Delayed Selection
- Modeling f via Gaussian Processes (GPs)
- Conditional Mutual Information
- The GP-UCB approach
It is important to note that σ²_{t0−1}(x_{t0}) is independent of the values of the observations. Implicit in the definition of the GP-UCB decision rule, Equation (3.5), is the corresponding confidence interval.
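Written out, this confidence interval is
\[
C_t(x) \;=\; \left[ \mu_{t-1}(x) - \beta_t^{1/2}\, \sigma_{t-1}(x),\;\; \mu_{t-1}(x) + \beta_t^{1/2}\, \sigma_{t-1}(x) \right],
\]
and the decision rule selects the action whose upper confidence bound is largest. Because the posterior variance depends only on the locations of the queries, and not on the observed values, these interval widths can be computed before the corresponding observations have arrived.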
GP-BUCB Algorithm and Regret Bounds
- GP-BUCB: An Overview
- General Regret Bound
- Suitable Choices for C
- Corollary Regret Bound: GP-BUCB
- Better Bounds Through Initialization
This ratio quantifies the degree of “overconfidence” with respect to the posterior at the start of the batch. Let R_T be the regret of the two-stage initialized GP-BUCB algorithm, which ignores feedback for the first T_init rounds.
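Using the standard Gaussian identity relating conditional mutual information to a ratio of predictive standard deviations (and writing fb[t] for the most recent round for which feedback is available), this ratio can be expressed as
\[
\frac{\sigma_{fb[t]}(x)}{\sigma_{t-1}(x)}
\;=\;
\exp\!\Big( I\big( f(x);\, \mathbf{y}_{fb[t]+1 : t-1} \,\big|\, \mathbf{y}_{1 : fb[t]} \big) \Big),
\]
so bounding the hallucinated conditional information by a constant C guarantees that the hallucinated confidence intervals are never narrower than the feedback-only intervals by more than a factor of exp(C).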
Adaptive Parallelism: GP-AUCB
GP-AUCB Algorithm
To ensure that this happens, C can be set such that C > γ_{Bmin−1}, that is, so that no set of fewer than Bmin queries could gain enough hallucinated information to terminate the batch. If the algorithm converges to the optimal subset X∗ ⊆ D, as the regret bound suggests, and X∗ has a finite size, then the variances of the selected actions (and thus the terms of the sum in the expression above) can be expected to generally be very small, producing very long batches, even for very small values of C.
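A minimal sketch of this adaptive batch construction is given below. The GP surrogate is assumed to expose hypothetical methods posterior_mean_var and fantasize (names invented for illustration, not drawn from any particular library), hallucinated information is accrued as 0.5·log(1 + σ²/σ_n²) per selected action, and the optional max_batch cap is an illustrative safeguard against exactly the very long batches discussed above.

```python
import numpy as np

def select_adaptive_batch(gp, candidates, beta, C, noise_var,
                          b_min=1, max_batch=None):
    """Sketch of adaptive batch selection in the spirit of GP-AUCB.

    Actions are appended to the batch until the information that would be
    hallucinated within the batch exceeds the budget C (subject to emitting
    at least b_min actions and, optionally, at most max_batch actions).
    """
    batch, hallucinated_info = [], 0.0
    while max_batch is None or len(batch) < max_batch:
        mu, var = gp.posterior_mean_var(candidates)      # hypothetical API
        idx = int(np.argmax(mu + np.sqrt(beta * var)))   # UCB maximizer
        info_gain = 0.5 * np.log(1.0 + var[idx] / noise_var)
        # Terminate the batch once the accumulated hallucinated information
        # would exceed C (but always emit at least b_min actions).
        if len(batch) >= b_min and hallucinated_info + info_gain > C:
            break
        batch.append(candidates[idx])
        hallucinated_info += info_gain
        gp = gp.fantasize(candidates[idx])               # hallucinate feedback
    return batch
```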
Local Stopping Conditions Versus Global Stopping Conditions
Lazy Variance Calculations
In implementing GP-AUCB Local, we can run what is effectively lazy GP-AUCB until the global stopping condition is met, at which time we switch to GP-AUCB Local.
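The sketch below illustrates the lazy idea with a max-heap of stale scores; it assumes (as within a GP-BUCB batch, where the mean and β are held fixed between feedback events) that each candidate's score can only decrease between refreshes, so a stale score is a valid upper bound. The callables mean_fn and var_fn are placeholders for the current posterior, not the exact implementation used in this work.

```python
import heapq
import numpy as np

def lazy_ucb_argmax(candidates, stale_ucb, mean_fn, var_fn, beta):
    """Lazily find the candidate with the largest upper confidence bound.

    stale_ucb[i] holds the last score computed for candidates[i]; since the
    hallucinated posterior variances only shrink, stale scores are valid
    upper bounds, and most candidates never need their variance recomputed.
    """
    heap = [(-stale_ucb[i], i) for i in range(len(candidates))]
    heapq.heapify(heap)
    while heap:
        _, i = heapq.heappop(heap)
        # Refresh this candidate's score with the up-to-date posterior variance.
        fresh = mean_fn(candidates[i]) + np.sqrt(beta * var_fn(candidates[i]))
        stale_ucb[i] = fresh
        # If the refreshed score beats every remaining stale upper bound, it is
        # the true maximizer; otherwise push it back and examine the next one.
        if not heap or fresh >= -heap[0][0]:
            return i
        heapq.heappush(heap, (-fresh, i))
```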
Computational Experiments
- Experimental Comparisons
- Data Sets
- Synthetic Benchmark Problems
- Automated Vaccine Design
- Spinal Cord Injury (SCI) Therapy
- Computational Performance
- Parallelism: Costs and Tradeoffs
For each of the experimental data sets used in this chapter, the kernel functions and experimental constants are listed in Table 3.2. Since most of the algorithms perform very comparably in terms of both average and minimum regret when run at the same batch length, the most interesting results are …
Conclusions
The aim of this experiment was to increase the amplitude of the resulting evoked potential. If this amplitude is not a good proxy for therapeutic benefit, then even if the algorithm converges to the maximizer of the reward function, the true therapeutic utility may not be maximized.
Experimental Methods
- Injury, Implantation, and Animal Care
- Parylene Arrays
- Wire-based Spinal Stimulating Arrays
- Animal Testing Procedures
The smaller red text overlaid on the array indicates the segmental level of the spine that lies below that portion of the array. Due to the design of the implanted circuit board, 36 of these pairs cannot be stimulated, but the remaining 666 can.
Objective Function
In each of the first three runs, the experimenter selected the first three actions based on knowledge of the anatomical location of the electrodes relative to the distal motor pools of the legs, combined with observations from the previous runs of the day. In the fourth and fifth runs, the experimenter selected actions to explore parts of the array that were judged to be less likely to elicit strong responses.
Modifications to the GP-BUCB algorithm
Time Variation of the Reward Function
This is in stark contrast to the nominal setting for the UCB family of algorithms, in which the observations of points in D become progressively denser as the algorithm runs, so that the problem becomes more of an interpolation problem and the posterior uncertainty does not increase. Because the posterior uncertainty cannot decrease below a finite level, which depends on the rate at which the response function changes relative to the rate at which observations arrive, the algorithm will never truly "converge".
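One standard construction for encoding such time variation (a sketch of the general idea, not necessarily the exact kernel used in these experiments) is to treat the measurement time as an additional input and multiply a spatial kernel over stimulus configurations by a temporal kernel that decays with the separation between measurement times:
\[
k\big( (x, t), (x', t') \big) \;=\; k_{\mathrm{space}}(x, x')\; k_{\mathrm{time}}\big( |t - t'| \big),
\qquad
k_{\mathrm{time}}(\tau) \;=\; \exp\!\left( -\frac{\tau}{\ell_{\mathrm{time}}} \right).
\]
Under such a kernel, old observations are gradually discounted, so the posterior uncertainty at any configuration that has not been queried recently remains bounded away from zero.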
Redundancy Control and Repeated Observations
Kernel and Mean Functions
Unfortunately, due to the strong smoothness assumptions implicit in this kernel (the squared exponential kernel implies that the Gaussian process is infinitely mean-square differentiable; see Rasmussen and Williams, 2006, Section 4.1.1), problems resulted from intra-day variations in the responses of individual configurations, as well as from long gaps in testing, e.g., weekends. Functionally, this means that the uncertainty about f(x) never becomes less than a small, positive value, which is useful because short-timescale variations in the function are subsumed into this noise-like term, while the overall shape is captured by the smooth component of the model.
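A minimal sketch of one such construction, assuming the trial-to-trial variability is modeled as a delta-correlated component added to the squared exponential kernel (σ_f, ℓ, and σ_v are illustrative hyperparameters, not the fitted values used in these experiments):
\[
k(x, x') \;=\; \sigma_f^2 \exp\!\left( -\frac{\lVert x - x' \rVert^2}{2\ell^2} \right) \;+\; \sigma_v^2\, \delta_{x, x'} .
\]
The smooth squared exponential component captures the overall shape of the response surface, while the delta-correlated component contributes an extra σ_v² of variance that is not shared between distinct configurations, so the uncertainty about configurations that have not been directly observed never falls below this level.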
Results
Wire-Based Array Animals: Results
Both of these animals also showed an increase in response level over the course of the experiments. The qualitative form of this change in responsiveness may be a function of recovery from the injury.
Parylene Microarray Animals: Results
Computational Performance
The number of observations can potentially be reduced by a factor m of 10 or 20, depending on the number of pulses typically corresponding to each action, potentially reducing the computational time by a factor of roughly m³, since the Cholesky decomposition scales as O(n³); for m = 10, this amounts to roughly a thousandfold reduction.
Discussion
- Wire-based Array Animals
- Cross-animal Comparisons
- Parylene Array Animals
- Therapeutic Relevance
- Kernels and Hyperparameters
Stronger support for the effectiveness of the algorithm's search can be derived from the fact that the algorithm avoided visiting many of the configurations in the stimulus space. However, the extent to which the algorithm was able to distribute its actions effectively despite these limitations can still be examined.
Conclusions
The algorithm's average reward is again typically superior to that of the human experimenter, while maintaining competitive maximum reward at similar times or action indices. The algorithm initiated 25 actions (17 unique electrode pairs) and the human initiated 23 actions (all unique).
Prior Human Experiments
It can be very difficult to extract this kind of information from EMG in an automated manner. As with standing, it may be that what is truly desirable in stepping is not simply the basic pattern of motor activity, but the ability to respond to perturbation while stepping.
Pilot Applications of GP-BUCB to Human SCI Therapy: Introduction
Since the effect of the stimulus change is almost immediate, observations of the patient's responses can potentially provide feedback on the resulting performance. A major obstacle during this optimization process, due in part to the limitations of the redesigned Medtronic hardware, is that stimulation must be temporarily stopped when the active stimulation electrode pattern is changed.
Mathematical Methods
Performance Measures
- Subjective Ratings
- Grading Vector-valued EMG
The difficulties inherent in this non-equivalence of rating and utility are discussed in Section 5.4.2.1. Performance must necessarily be quantified as a scalar for performance optimization to be a meaningful concept.
Algorithmic Extensions
- Divorcing Reward from the Function Regressed Upon
- Making Decisions Using Vector-Valued Functions
- Choosing Paths
Finally, in the scalar case, this decision rule reduces to the GP-BUCB decision rule, Equation (3.7). A version of this decision rule was implemented in the code package prepared for the pilot experiments.
Novel Covariance Functions
These localizations can also be verified with respect to the array using the data from the supine stimulation experiments described in Section 5.2. One simple way to do this is to choose the weights as the heights of a Gaussian bump centered on the location of the motor pool associated with each muscle.
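A minimal sketch of this weighting scheme, assuming each motor pool is summarized by a single rostro-caudal coordinate and that the bump width sigma is an illustrative hyperparameter (the function and argument names are invented for illustration, not taken from the pilot code):

```python
import numpy as np

def gaussian_bump_weights(site_z, motor_pool_z, sigma=1.0):
    """Weight each muscle by a Gaussian bump centered on its motor pool.

    site_z       : rostro-caudal coordinate of the stimulating site
    motor_pool_z : dict mapping muscle name -> motor pool coordinate
    sigma        : bump width (assumed, illustrative)
    """
    return {muscle: float(np.exp(-(site_z - z) ** 2 / (2.0 * sigma ** 2)))
            for muscle, z in motor_pool_z.items()}
```

A stimulation site lying directly over a muscle's motor pool then receives weight 1 for that muscle, and the weight falls off smoothly for muscles whose motor pools lie farther away.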
Preliminary Results and Discussion
After the completion of the first path, the stimulus was set at 4.4 V and 30 Hz, which was used as the starting condition for the remaining three paths of the first session. Within this second, EMG-based session, the algorithm selected two paths, the first starting from the best stimuli found during the fourth path of the previous session.
Extensions
- Time Series Information and Coordination of Muscles
- Dynamical Systems Approaches: Cost Functions and LQG
- Alternative Covariance Functions
- Expansions of the Decision Set
In this case, a physical human model can be used to infer a simple parameterization of the composite controller (spinal cord under the influence of the EES system); this becomes a system identification problem, classically treated in the control and dynamical systems literature (for an introduction to system identification, see the text by Ljung, 1999). As the experiments progress, the algorithm gradually builds a set of human-interpretable predictions about the response of the spinal cord.
Future Work
Theorem 4: Initialization Set Size Bounds
Initialization Set Size: Linear Kernel
We seek to initialize GP-BUCB with a set Dinit of size Tinit, assuming, motivated by this bound and the form of Inequality (3.16), that Tinit takes this form. Thus, for a linear kernel and such a k, an initialization set Dinit of size Tinit, where Tinit ≥ kηd(B−1) log(B), ensures that the hallucinated conditional information in any future batch of size B is ≤ 2e.
Initialization Set Size: Matérn Kernel
Initialization Set Size: Squared Exponential (RBF) Kernel
GP-AUCB: Finite Batch Size
During two of the animal experiment runs (animal 5, run 1 and animal 7), a significant number of actions performed by the human experimenter were missed or abandoned. Without inserted passes, the same action indices for the human and algorithm do not correspond to the same time, and visual interpretation of the regret plots is difficult; after inserted passes, this synchrony is restored.
Proof Multi-Muscle Uncertainty Term is Non-Increasing
Path-Based Decision Rules
- The Vertebral Column and Spinal Cord
- The Spinal Cord in Cross-section
- EES-Based SCI Therapy: Schematic
- GP-BUCB: Confidence Interval Containment
- Regret: Batch Size = 5
- Regret: Delay = 5
- Regret: Non-adaptive Algorithms, Batch Sizes = 5, 10, & 20
- Regret: Adaptive Algorithms, Batch Sizes = 5, 10, & 20
- Regret: Delays = 5, 10, & 20
- Elapsed Computational Time By Algorithm
- Cost Parameterization: Algorithmic Tradeoffs
- Parylene Array Device
- Placement of the Array Device Relative to the Spinal Cord
- Experimental System Diagram
- Animal Experiment Overview: All Observed Evoked Potentials
- Reward: Animal 2, Run 1
- Reward: Animal 2, Run 2
- Reward: Animal 2, Run 3
- Reward: Animal 5, Run 1
- Reward: Animal 5, Run 2
- Reward: Animal 3
- Reward: Animal 7
- One Testing Day: Animal 5, Run 2, P35 Responses
- Animal 5: Retrospective
- Animal 7: Final Day Rewards
- Cross-animal Comparisons
- The Consequences of Kernel Mis-specification
- Initialization Set Sizes for Theorem 4
- Kernel Functions and Parameters: Computational Experiments
- Stimulus Latency Windows
- Kernel and Hyperparameter Choices by Experimental Run
- Actions Taken Per Run
- Repeatability Between Individual Animals
- Proportion of Stimuli Yielding Satisfactory Responses
This decision rule was implemented for a version of the human experimental code that is designed to search through the space of voltage and frequency parameters corresponding to a fixed array of active electrodes.