
We develop the GP-BUCB and GP-AUCB algorithms for parallelizing exploration-exploitation trade-offs in Gaussian process bandit optimization. The analytical framework used to bound the regret of GP-BUCB and GP-AUCB is generalized to include all GP-UCB-type algorithms. We prove Theorem 1, which provides high-probability bounds on the cumulative regret of algorithms in this class that hold in both the batch and delay settings; these bounds consequently provide guarantees on the convergence of such algorithms. Further, we prove Theorem 4, which establishes a high-probability regret bound for initialized GP-BUCB that scales independently of the batch size or delay length B, provided B is constant or polylogarithmic in T. Finally, we introduce lazy variance calculations, which dramatically accelerate the computation of GP-based active learning decision rules.
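To illustrate the idea behind lazy variance calculations, the sketch below shows one way a GP-UCB-type decision rule can be evaluated lazily. It is a minimal Python sketch under illustrative naming assumptions (sigma_fn, stale_ucb, and candidates are hypothetical placeholders for the model's posterior-variance routine, the cached bounds, and the decision set), not the implementation evaluated in the experiments. It exploits the fact that, within a batch, the exploitation term is fixed and the posterior standard deviation can only shrink as hallucinated observations are added, so a stale upper confidence bound never underestimates the current one.

```python
import heapq
import numpy as np

def lazy_ucb_argmax(candidates, mu, sigma_fn, beta, stale_ucb):
    """Return the index maximizing mu[i] + sqrt(beta) * sigma(candidates[i]),
    recomputing sigma only for candidates that reach the top of a max-heap of
    stale UCB values.  Valid whenever sigma is non-increasing as (hallucinated)
    observations accumulate, so stale values are upper bounds."""
    heap = [(-stale_ucb[i], i) for i in range(len(candidates))]
    heapq.heapify(heap)
    while heap:
        _, i = heapq.heappop(heap)
        # Refresh only the current leader's variance and UCB value.
        fresh_ucb = mu[i] + np.sqrt(beta) * sigma_fn(candidates[i])
        stale_ucb[i] = fresh_ucb
        # If the refreshed value still beats the best remaining stale bound,
        # no other candidate can exceed it, so it is the true argmax.
        if not heap or fresh_ucb >= -heap[0][0]:
            return i
        heapq.heappush(heap, (-fresh_ucb, i))
```

In the typical case only a few posterior variances are recomputed per selection step, which is the intended source of the acceleration.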

Across the experimental settings examined, GP-BUCB and GP-AUCB performed comparably with state-of-the-art parallel and adaptive parallel Bayesian optimization algorithms, which are not equipped with theoretical bounds on regret. GP-BUCB and GP-AUCB also perform comparably to the sequential GP-UCB algorithm, indicating that they successfully overcome the disadvantage of receiving only delayed or batched feedback. With respect to the cost of achieving a given level of regret, GP-AUCB appears to offer substantial advantages over both the fully parallel and fully sequential approaches. We believe that our results provide an important step toward solving complex, large-scale exploration-exploitation tradeoffs.

[Figure 3.2 image omitted. Panels: (a) Matérn: AR, (b) SE: AR, (c) Rosenbrock: AR, (d) Matérn: MR, (e) SE: MR, (f) Rosenbrock: MR, (g) Cosines: AR, (h) Vaccine: AR, (i) SCI: AR, (j) Cosines: MR, (k) Vaccine: MR, (l) SCI: MR. Axes: Time (Queries) versus Time-average Regret (AR) or Minimum Regret (MR). Algorithms shown: GP-UCB, GP-BUCB, SM-UCB, SM-MEI.]

Figure 3.2: Time-average (AR) and minimum (MR) regret plots, batch setting, for a batch size of 5.

[Figure 3.3 image omitted. Panels: (a) Matérn: AR, (b) SE: AR, (c) Rosenbrock: AR, (d) Matérn: MR, (e) SE: MR, (f) Rosenbrock: MR, (g) Cosines: AR, (h) Vaccine: AR, (i) SCI: AR, (j) Cosines: MR, (k) Vaccine: MR, (l) SCI: MR. Axes: Time (Rounds) versus Time-average Regret (AR) or Minimum Regret (MR). Algorithms shown: GP-UCB, GP-BUCB, GP-AUCB.]

Figure 3.3: Time-average (AR) and minimum (MR) regret plots, delay setting, with a delay length of 5 rounds between action and observation.

[Figure 3.4 image omitted. Panels: (a) Matérn: AR, (b) SE: AR, (c) Rosenbrock: AR, (d) Matérn: MR, (e) SE: MR, (f) Rosenbrock: MR, (g) Cosines: AR, (h) Vaccine: AR, (i) SCI: AR, (j) Cosines: MR, (k) Vaccine: MR, (l) SCI: MR. Axes: Time (Queries) versus Time-average Regret (AR) or Minimum Regret (MR). Algorithms shown: GP-BUCB, SM-UCB, and SM-MEI, each with B = 5, 10, and 20.]

Figure 3.4: Time-average (AR) and minimum (MR) regret plots, non-adaptive batch algorithms, batch sizes 5, 10, and 20.

[Figure 3.5 image omitted. Panels: (a) Matérn: AR, (b) SE: AR, (c) Rosenbrock: AR, (d) Matérn: MR, (e) SE: MR, (f) Rosenbrock: MR, (g) Cosines: AR, (h) Vaccine: AR, (i) SCI: AR, (j) Cosines: MR, (k) Vaccine: MR, (l) SCI: MR. Axes: Time (Queries) versus Time-average Regret (AR) or Minimum Regret (MR). Algorithms shown: GP-AUCB, GP-AUCB Local, HBBO UCB, and HBBO MEI, each with Bmax = 5, 10, and 20.]

Figure 3.5: Time-average (AR) and minimum (MR) regret plots, adaptive batch algorithms, maximum batch sizes 5, 10, and 20. For the adaptive algorithms, the minimum batch size Bmin was set to 1, as in HBBO. The algorithms tended to run fully sequentially at the beginning, but quite rapidly switched to maximal parallelism.

[Figure 3.6 image omitted. Panels: (a) Matérn: AR, (b) SE: AR, (c) Rosenbrock: AR, (d) Matérn: MR, (e) SE: MR, (f) Rosenbrock: MR, (g) Cosines: AR, (h) Vaccine: AR, (i) SCI: AR, (j) Cosines: MR, (k) Vaccine: MR, (l) SCI: MR. Axes: Time (Rounds) versus Time-average Regret (AR) or Minimum Regret (MR). Algorithms shown: GP-BUCB with B = 5, 10, and 20; GP-AUCB and GP-AUCB Local with Bmax = 5, 10, and 20.]

Figure 3.6: Time-average (AR) and minimum (MR) regret plots, delay setting, with delay lengths of 5, 10, and 20 rounds between action and observation. Note that the adaptive algorithms, GP-AUCB and GP-AUCB Local, may balk at some rounds. The time-average regret is calculated over the number of actions actually executed as of that round; this means that the number of queries submitted as of any particular round is hidden with respect to the plots shown, and may vary across runs of the same algorithm.

[Figure 3.7 image omitted. Panels: (a) Matérn, (b) SE, (c) Rosenbrock, (d) Cosines, (e) Vaccine, (f) SCI. Axes: Time (Queries) versus total wall-clock time elapsed (seconds), on a logarithmic vertical scale. Algorithms shown: GP-UCB, GP-BUCB, SM-UCB, SM-MEI, GP-AUCB, and GP-AUCB Local, each with and without lazy variance calculations, plus HBBO UCB and HBBO MEI.]

Figure 3.7: Elapsed computational time in batch experiments, B = 5. Note the logarithmic vertical scaling in all plots. Note also the substantial separation between the three groups of algorithms, discussed in Section 3.6.3.

[Figure 3.8 image omitted. Panels: (a) Lowest-Cost Algorithms, plotting the cost tradeoff parameter w (Cost = (1 − w) * rounds + w * actions) against the average regret attained, averaged over 200 runs; (b) w = 0, (c) w = 1/2, and (d) w = 1, each plotting cost of execution against average regret. Algorithms shown: GP-UCB Balking, GP-BUCB, GP-AUCB.]

Figure 3.8: Parameterized cost comparison on the SCI data set, simple delay case, B = 5. The same experiment, with a different set of algorithms shown, is presented in Figure 3.3(i). Figure 3.8(a): the space of cost tradeoff parameter w and attained average regret r̄ is colored according to which algorithm has the lowest mean cost at the round in which the mean, time-average regret is first ≤ r̄. The algorithm denoted GP-UCB Balking refuses to submit another query while one is pending, i.e., it is the GP-UCB algorithm obeying the delay constraint of the problem setting. Figures 3.8(b), 3.8(c), and 3.8(d) show r̄ as a function of the cost C and correspond to vertical slices through Figure 3.8(a) at the left, center, and right. Since GP-AUCB and GP-UCB Balking pass on some rounds, the terminal cost of GP-AUCB and GP-UCB Balking is possibly less than 300.
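As a concrete reading of Figure 3.8(a), the sketch below computes the cost C = (1 − w) * rounds + w * actions at the first round at which an algorithm's mean time-average regret reaches a target r̄ and then picks the cheapest algorithm; the array layout and function names are illustrative assumptions, not the analysis code used to generate the figure.

```python
import numpy as np

def cost_to_reach(avg_regret, rounds, actions, w, r_bar):
    """Cost C = (1 - w) * rounds + w * actions at the first round where the
    mean time-average regret is <= r_bar; np.inf if r_bar is never reached.
    avg_regret, rounds, actions: per-round arrays (means over runs)."""
    hit = np.flatnonzero(avg_regret <= r_bar)
    if hit.size == 0:
        return np.inf
    t = hit[0]
    return (1.0 - w) * rounds[t] + w * actions[t]

def lowest_cost_algorithm(results, w, r_bar):
    """results: dict mapping algorithm name -> (avg_regret, rounds, actions).
    Returns the name of the algorithm cheapest at first reaching r_bar."""
    return min(results, key=lambda name: cost_to_reach(*results[name], w=w, r_bar=r_bar))
```

Setting w = 0 counts only rounds (pure delay cost) and w = 1 counts only actions, matching the extreme panels of the figure.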

Chapter 4

Animal Studies

4.1 Introduction

A rigorous and automated method is required to select the stimuli applied by the arrays reviewed in Chapter 1 for SCI therapy. Such a method would allow these techniques to be applied by non-experts, or even autonomously, and would additionally allow customization to individual patients and their unique, time-varying responses to the stimuli. This dissertation suggests that a variant of GP-BUCB, or a similar GP-based active learning algorithm, is suitable for this task.

To show that it would be feasible to use GP-BUCB as a learning system for SCI therapy, a closed-loop implementation of the algorithm in real animals was developed. These experiments represent a first step toward a more complex closed-loop implementation in human patients. A somewhat simplified problem was chosen for demonstrating feasibility: a variant of GP-BUCB was used to control an experiment in which a rat was stimulated using an epidural electrode array and the evoked potential in a muscle was measured via EMG. The goal of this experiment was to maximize the amplitude of the resulting evoked potential. While the evoked potential is not a complex motor behavior, this experiment has many of the important characteristics of the full SCI therapy problem: in particular, the evoked potential varies with the pattern of active electrodes on the array and over the course of the animal's experimental lifetime, and evoked potentials are critically dependent on the spinal interneuronal circuitry. Showing that the algorithm can successfully control this activity, and additionally learn something about the structure of the spinal cord's responses (considered as a function over the space of active electrode configurations), demonstrates a major step toward a therapeutic implementation.
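A minimal sketch of this closed loop, under illustrative interface names, is shown below; select_batch, stimulate, peak_to_peak, and the gp model object are hypothetical stand-ins for the GP-BUCB-style selection rule, the stimulation and EMG-recording hardware, and the response metric, not the actual experimental software.

```python
def run_closed_loop(gp, configs, select_batch, stimulate, peak_to_peak,
                    n_rounds, batch_size):
    """Closed-loop evoked-potential optimization, schematically: each round,
    a GP-BUCB-style rule proposes a batch of electrode configurations, the
    stimulator applies them while EMG is recorded, and the GP posterior is
    updated with the measured peak-to-peak response amplitudes."""
    for _ in range(n_rounds):
        batch = select_batch(gp, configs, batch_size)   # UCB-style batch choice
        observations = []
        for cfg in batch:
            emg_trace = stimulate(cfg)                  # apply stimulus train, record EMG
            observations.append((cfg, peak_to_peak(emg_trace)))
        for cfg, amplitude in observations:             # feedback arrives after the batch
            gp.add_observation(cfg, amplitude)
    return gp
```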

The simplifications inherent in using evoked potentials present a number of substantial advantages for demonstrating feasibility. First, because evoked potentials represent a relatively low-level function of the spinal interneuron networks, they may reasonably be expected to be less sensitive to the parameter choices than higher-level motor functions, making the search over stimuli inherently easier and thus appropriate for a feasibility experiment. Demonstrating that the regression models can indeed capture the important features of an individual muscle's response function while using relatively little data shows that the responses of single muscles can be modeled effectively using Gaussian processes, plausibly leading to effective models of high-level behavior based upon combining the predicted responses of multiple individual muscles. Certainly, it is plausible to create a model which focuses only on high-level phenomena, e.g., user-reported quality of stimuli; but, particularly if physiological monitoring data will form an important component of response monitoring (desirable in a fully implanted system), this sort of prediction of high-level quality from low-level data is highly desirable. Conversely, if the activity of an individual muscle cannot be effectively captured by a GP, this argues that GPs are inappropriate for modeling the responses of individual muscles and that the system is likely too sensitive to model without exhaustive testing of all potential stimuli (an exponentially large set), suggesting that the full problem is nearly infeasible.
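For concreteness, modeling a single muscle's response function with a GP reduces to standard GP regression; the sketch below uses a squared-exponential kernel with illustrative hyperparameters and a generic textbook posterior computation, not the specific models or kernel settings fit in these experiments.

```python
import numpy as np

def se_kernel(A, B, lengthscale=1.0, signal_var=1.0):
    """Squared-exponential covariance between the rows of A and B."""
    d2 = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2.0 * A @ B.T
    return signal_var * np.exp(-0.5 * d2 / lengthscale**2)

def gp_posterior(X, y, X_star, noise_var=0.01, **kern):
    """Posterior mean and variance at test inputs X_star, given noisy
    observations (X, y) of a single muscle's response function."""
    K = se_kernel(X, X, **kern) + noise_var * np.eye(len(X))
    K_s = se_kernel(X, X_star, **kern)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    mu = K_s.T @ alpha                       # posterior mean
    v = np.linalg.solve(L, K_s)
    var = np.diag(se_kernel(X_star, X_star, **kern)) - np.sum(v**2, axis=0)
    return mu, var

# Example use (illustrative): X holds tested electrode configurations encoded
# as feature vectors, y the corresponding peak-to-peak amplitudes.
# mu, var = gp_posterior(X, y, X_candidates, noise_var=0.05, lengthscale=0.5)
```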

The ability to successfully manage a problem like evoked potential optimization using a GP-BUCB-like algorithm is thus a necessary condition for success on the full clinical problem, making this an appropriate first step toward applying active learning algorithms to SCI therapy. Second, the EMG recordings arising from a train of stimulus pulses can be temporally separated into the individual responses to each stimulus pulse, meaning that each separate pulse and response may be taken as an individual, independent observation; standing or stepping are much less easily separable into individual “observations.” Further, in stepping or standing, it should be expected that successive observations (i.e., blocks of time within a bout, such as strides) would not be independent samples from the same distribution, but would instead be highly dependent on their predecessors (e.g., a stumble on stride n − 1 could reasonably be expected to affect stride n). Third, evoked potentials are naturally expressed as a scalar function of time for each muscle, and easily repeatable scalar measurements (i.e., peak-to-peak amplitude) have already been established for them. In contrast, stepping and standing are complex, high-level behaviors of many muscles, for which no single easily and automatically computed, ordinal measurement has yet been canonically established. There are, however, a variety of measurements which aim to quantify standing performance (Prieto et al., 1996; Santos et al., 2008) or stepping performance; the latter is generally assessed either by human observation (Basso et al., 1995; Antri et al., 2002) or by automated post-hoc analysis (Fong et al., 2005; Cai et al., 2006).
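The temporal separation and the peak-to-peak measure described above might be computed as in the following sketch; the window length and names are illustrative assumptions rather than the acquisition code actually used.

```python
import numpy as np

def evoked_responses(emg, pulse_indices, window):
    """Split a continuous EMG recording into one window per stimulus pulse.
    emg: 1-D array of samples; pulse_indices: sample index of each pulse;
    window: number of samples retained after each pulse."""
    return [emg[i:i + window] for i in pulse_indices if i + window <= len(emg)]

def peak_to_peak(response):
    """Peak-to-peak amplitude of a single evoked response (one scalar observation)."""
    return float(np.max(response) - np.min(response))

# Each pulse in a stimulus train yields one scalar observation for the learner:
# amplitudes = [peak_to_peak(r) for r in evoked_responses(emg, pulses, window=200)]
```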

Particularly for measures of locomotor performance based on human-graded observations, it is not clear that the grading scale is ordinal; i.e., while the numerical grade may nominally correspond to quality, these grades might be more properly thought of as loosely ordered class labels. These label-based grading schemes are often designed to be easy for humans to implement, e.g., using visual features of the stride cycle which are easy to describe semantically but difficult to describe mathematically, consequently making them very difficult to automate. Further, it is not clear what the “optimal” values of any of these measurements are for SCI animals or humans (as opposed to normals), nor is it clear that attaining the nominally optimal value of an individual metric is therapeutically desirable