
Algorithms for Large-Scale Adversarial Decision Problems


Academic year: 2023



Motivation

The main function of an antibody is to protect against external infections and pathogens, e.g., viruses. This gives rise to the idea of broad-affinity antibodies that bind to many different known strains of the same virus, for example, many different strains of influenza or HIV found "in nature" [17].

Research Challenges and Contributions

MDP Interdiction

The defender's optimization forms the outer layer, and the attacker's MDP solution is the inner layer of optimization. We propose a new interdiction model in which the defender modifies the initial state of the attacker.

Applications in Antibody Design

While this makes finding virus escapes practical, the two-tiered nature of the problem means that designing antibodies is still quite time-consuming. To address these challenges, we formulate a two-level optimization problem (corresponding to the antibody design game) in terms of combined binding and stability.

Organization of the Dissertation

We present a compact single-level mixed-integer program formulation by relaxing the integrality constraints and then obtaining the dual linear program. In Chapter 9, we present algorithms that exploit the structure of antibody-virus binding interactions to formulate the antibody design game as a mixed-integer program.

Computational Game Theory

Stackelberg Games

The two players in a Stackelberg game, the leader and the follower, do not necessarily represent individuals; each can be a group working together to implement a common strategy, such as a police force or a terrorist organization. The follower first observes the leader's strategy and then acts to optimize its own payoff.

Stackelberg Security Games

Planning in Artificial Intelligence

Classical Planning

Planning with Uncertainty: Markov Decision Processes (MDPs)

  • Factored Representation
  • Linear Programming Methods for Solving MDPs
  • Factored MDPs and Approximate Linear Programming
  • Representing and Computing the Optimal Policy

Γ_a(C) = ∪_{X_i ∈ C} Parents_a(X_i) is the set of parent state variables, in the DBN for action a, of the variables in C = Scope[h]. In the approximate solution with a subset of basis functions, not all state variables may be represented in the basis set.

Reinforcement Learning and Value Function Approximation

Temporal Difference Learning

Temporal difference (TD) learning combines Monte Carlo ideas with dynamic programming ideas; SARSA and Q-learning are its most prominent instances [51, 40].
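To fix ideas, here is a minimal tabular Q-learning sketch. The three-state chain environment, the reward of 1 at the goal, and all hyperparameters are illustrative assumptions, not taken from this work.

```python
import random

def q_learning(n_states, n_actions, step, episodes=2000, alpha=0.1,
               gamma=0.9, eps=0.2, max_steps=50, seed=0):
    """Tabular Q-learning; `step(s, a)` returns (next_state, reward, done)."""
    rng = random.Random(seed)
    Q = [[0.0] * n_actions for _ in range(n_states)]
    for _ in range(episodes):
        s = 0
        for _ in range(max_steps):
            # epsilon-greedy action selection
            if rng.random() < eps:
                a = rng.randrange(n_actions)
            else:
                a = max(range(n_actions), key=lambda i: Q[s][i])
            s2, r, done = step(s, a)
            # one-step TD update toward r + gamma * max_a' Q(s', a')
            target = r if done else r + gamma * max(Q[s2])
            Q[s][a] += alpha * (target - Q[s][a])
            if done:
                break
            s = s2
    return Q

# Toy 3-state chain: action 1 moves right; reaching the right end pays 1.
def chain_step(s, a):
    if a == 1:
        return (s, 1.0, True) if s == 2 else (s + 1, 0.0, False)
    return s, 0.0, False

Q = q_learning(3, 2, chain_step)
```

Because the environment is deterministic, the learned values approach the discounted optima (1, 0.9, and 0.81 along the chain).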

Deep Reinforcement Learning

Plan Interdiction

The Plan Interdiction Problem

The goal in plan interdiction is to compute the defender's optimal strategy for thwarting the attacker's plan. Consequently, if M is the subset of mitigations deployed by the defender, the attacker's planning problem P becomes P(M).

Interdiction of Deterministic Plans

Formally, the plan interdiction problem is defined by {M, C_m, R_D, P}, where M is the set of the defender's mitigations, C_m is the cost of a mitigation m ∈ M, R_D(x) = Σ_j R_j(x_j) is the defender's reward function, additive over state variables, and P is the attacker's planning problem. A mitigation m can have two kinds of impact: a) it can modify the current (initial) state x0, and b) it can remove a subset of the attacker's actions. In the plan interdiction problem, the defender's goal is to choose the optimal set of mitigations, trading off the defender's utility (total discounted reward) against the cost of the mitigations.
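A brute-force sketch on a tiny deterministic MDP may help fix ideas. Here the defender simply minimizes the attacker's value at x0 plus mitigation cost (a simplification of the full defender utility described above); the states, actions, rewards, and costs are all illustrative assumptions.

```python
from itertools import combinations

def value_iteration(actions, T, R, gamma=0.9, iters=100):
    """Value iteration for a deterministic MDP. actions[s]: available
    actions at s, T[(s, a)]: next state, R[(s, a)]: immediate reward."""
    V = {s: 0.0 for s in actions}
    for _ in range(iters):
        V = {s: max((R[s, a] + gamma * V[T[s, a]] for a in acts), default=0.0)
             for s, acts in actions.items()}
    return V

def interdict(actions, T, R, x0, mitigations, gamma=0.9):
    """Brute-force interdiction: pick the mitigation set M minimizing the
    attacker's value at x0 plus total mitigation cost.
    mitigations: {name: (cost, set of (state, action) pairs removed)}."""
    best = (float("inf"), None)
    for k in range(len(mitigations) + 1):
        for M in combinations(mitigations, k):
            removed = set().union(*(mitigations[m][1] for m in M))
            acts = {s: [a for a in al if (s, a) not in removed]
                    for s, al in actions.items()}
            cost = sum(mitigations[m][0] for m in M)
            total = value_iteration(acts, T, R, gamma)[x0] + cost
            if total < best[0]:
                best = (total, set(M))
    return best

# Tiny attack graph as a deterministic MDP: 0 = start, 2 = goal (reward 10).
actions = {0: ["a", "b"], 1: ["c"], 2: []}
T = {(0, "a"): 1, (0, "b"): 2, (1, "c"): 2}
R = {(0, "a"): 0.0, (0, "b"): 10.0, (1, "c"): 10.0}
mitigations = {"m1": (0.5, {(0, "b")}), "m2": (0.6, {(1, "c")})}
best_total, best_M = interdict(actions, T, R, 0, mitigations)
```

Enumerating all subsets is only feasible for toy problems; the chapters that follow replace this enumeration with mixed-integer programming and constraint generation.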

Computational Protein Design and Drug Design

  • Antibody Design
    • Multi-Specificity Design
  • Game-Theoretic Models of Vaccination Decisions
  • Combinatorial Drug Design
  • Learning Protein-Protein Interactions from Data

This comes with the ultimate goal of rationally designing antigens for vaccination that can elicit such antibodies. In [109], computational models have been used to develop an approach that translates available viral sequence data into quantitative landscapes of viral fitness as a function of the amino acid sequences of its constituent proteins.

The MDP Interdiction Problem

Problem Definition

In this chapter, we formally introduce the MDP interdiction problem, in which the attacker solves an MDP with reduced reward. In the MDPI Stackelberg game, the defender first chooses M ⊆ M and the attacker subsequently chooses a policy π in the resulting interdicted MDP τ(M).

General Approach

Since the attacker effectively faces a decision problem, it suffices to restrict attention to optimal attacker policies that are deterministic and stationary. Define VA(x, π) as the attacker's value function for a policy π starting at state x in the MDP τ(M), and let VD(x, π) be the defender's value function (i.e., computed with the defender's reward function RD).
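Since VA and VD share the attacker's policy π and differ only in the reward function, both can be computed by the same policy-evaluation routine. A minimal sketch follows; the two-state MDP and reward values are illustrative assumptions.

```python
def policy_value(P, R, policy, gamma=0.95, iters=2000):
    """Iterative evaluation of a deterministic stationary policy pi:
    V <- R_pi + gamma * P_pi V. P[a][s][s2] are transition probabilities,
    R[a][s] rewards, policy[s] the action pi(s)."""
    n = len(policy)
    V = [0.0] * n
    for _ in range(iters):
        V = [R[policy[s]][s]
             + gamma * sum(p * V[t] for t, p in enumerate(P[policy[s]][s]))
             for s in range(n)]
    return V

# One action ("stay put"); the same policy evaluated under the attacker's
# reward RA gives VA, and under the defender's reward RD gives VD.
P = [[[1.0, 0.0], [0.0, 1.0]]]
RA = [[1.0, 0.0]]
RD = [[0.0, 1.0]]
policy = [0, 0]
VA = policy_value(P, RA, policy)
VD = policy_value(P, RD, policy)
```

With γ = 0.95, a self-loop paying 1 per step has value 1 / (1 − 0.95) = 20.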

Research Objectives

Contributions

Factored MDP Interdiction

A Mixed-Integer Linear Programming Formulation for Factored MDP

If H, the set of basis functions under consideration, is rich enough to span the full value function space, the solution to this MILP yields the optimal interdiction decision for the defender. However, the set of constraints must capture all possible attack policies, making the MILP unmanageably large even with a compact basis set.

Constraint Generation for Factored MDP Interdiction

Constraint Generation with Basis Function Selection

  • Reducing the Number of Iterations of Constraint Generation 45

However, this results in a large number of iterations of the constraint generation procedure, since enough attack policies must be accumulated to rule out trivial mitigation solutions (for example, solutions that mitigate no action at all). While warm initialization significantly reduces the number of iterations, each iteration still involves costly computation merely to evaluate whether new policies should be added.

Basis Generation

Fourier Basis Functions on Boolean Feature Space

Iterative Basis Function Selection

If X_j appears in any set C ∪ Γ_a(C) (with C = Scope[h]) and/or in W_a (the scope of any local reward function), these sets of state variables are "merged" while eliminating X_j. In Algorithm 3, ATTACKERPOLICY(A, H) solves LP (2.3), where H is a set of parity basis functions over subsets of the state variables.
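For context, parity (Fourier) basis functions on Boolean state vectors are defined by φ_b(x) = (−1)^(b·x), and distinct parities are orthogonal over the full cube, which is why they form a complete basis for functions of Boolean state variables. A minimal sketch (the helper names are illustrative):

```python
from itertools import product

def parity_basis(b):
    """Fourier (parity) basis function on {0,1}^n: phi_b(x) = (-1)^(b . x)."""
    def phi(x):
        return -1.0 if sum(bi & xi for bi, xi in zip(b, x)) % 2 else 1.0
    return phi

def inner(b, c, n):
    """Inner product of two parity functions over the full Boolean cube."""
    pb, pc = parity_basis(b), parity_basis(c)
    return sum(pb(x) * pc(x) for x in product((0, 1), repeat=n))
```

Two parities agree on every point exactly when their index vectors coincide, so the inner product is 2^n for b = c and 0 otherwise.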

Greedy Interdiction

Experiments

  • Problem Domains
  • Comparison with Exact MDP Interdiction
  • Scalability
  • Effectiveness of Greedy Interdiction

On the other hand, the use of fast constraint generation (Algorithm 2, marked as "fast bilevel") significantly improves scalability (Figure 5.1, left). As shown in Figure 5.2, we observe a similar trend as before: the fast constraint generation approach significantly outperforms the baseline without compromising solution quality.

Figure 4.1: Comparison of exact and approximate MDP interdiction in terms of runtime (left) and attacker utility (right; lower is better for the defender).

Conclusions

Finally, we proposed a greedy MDP interdiction algorithm and showed that it can further improve scalability.

Contributions

MDP State Interdiction

Problem Definition

We propose reasonable approximations for the attacker's value function and scalable algorithms to compute it using a) factored MDP solution approaches and b) reinforcement learning. The key observation is that the attacker's optimal policy is independent of the interdiction decision: because the attacker solves an MDP, its optimal policy is defined at every state, and in particular at whichever initial state the defender chooses.

Integer Linear Program for Approximately Optimal Interdiction

In the interdiction problem, the weights of the value function are then fixed, and the goal is to optimize the initial state x0 and, consequently, the associated basis function values φ_j(x0). The main bottleneck in this approach is step 1: the difficulty of solving the factored MDP grows exponentially with the number of interdependencies between state variables.
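With the weights fixed, initial-state interdiction reduces to searching over candidate initial states for the one minimizing the attacker's approximate value w·φ(x0); the integer program performs this search implicitly over the full Boolean state space. A sketch with illustrative basis functions over two Boolean state variables (all values are assumptions):

```python
def interdict_initial_state(weights, basis, candidates):
    """With the MDP solved once (weights w fixed), the attacker's value at a
    candidate initial state x0 is the dot product of w with phi(x0); the
    defender picks the feasible x0 minimizing it."""
    value = lambda x: sum(w * phi(x) for w, phi in zip(weights, basis))
    x_best = min(candidates, key=value)
    return x_best, value(x_best)

# Illustrative basis over two Boolean state variables: a constant plus the
# two indicator features.
basis = [lambda x: 1.0, lambda x: float(x[0]), lambda x: float(x[1])]
weights = [1.0, 2.0, 3.0]
candidates = [(0, 0), (1, 0), (0, 1), (1, 1)]
x_best, v_best = interdict_initial_state(weights, basis, candidates)
```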

Interdiction Using RL with Linear Action-Value Functions

Basis Generation

During the learning iterations, we add a new basis function to the set B if it significantly reduces the squared error measure over the samples s = (x, a, r, x′) ∈ D̂, (Q′(x, a; w) − Q(x, a; w))², compared to the current basis set, where Q′(x, a; w) = r + γ max_{a′} Q(x′, a′; w). The following MILP computes the Boolean basis vector b corresponding to the new basis function with the largest marginal impact |I| on this error.
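The squared TD-error measure above can be sketched as follows. The sample set and the two hypothetical action-value functions are illustrative; the actual selection criterion compares this error before and after refitting with the candidate basis function.

```python
def td_squared_error(samples, q, actions, gamma=0.9):
    """Mean squared one-step TD error over samples s = (x, a, r, x2, done),
    with TD target r + gamma * max over a2 of q(x2, a2)."""
    total = 0.0
    for x, a, r, x2, done in samples:
        target = r if done else r + gamma * max(q(x2, a2) for a2 in actions)
        total += (target - q(x, a)) ** 2
    return total / len(samples)

# A single self-loop transition paying reward 1 per step (gamma = 0.9):
# the true value is 1 / (1 - 0.9) = 10, so a constant q = 10 has zero error.
samples = [(0, 0, 1.0, 0, False)]
fixed_point = lambda x, a: 10.0
off = lambda x, a: 5.0
```

For `off`, the TD target is 1 + 0.9 × 5 = 5.5, so the squared error is 0.25.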

Integer Linear Program for Interdiction

To prevent the basis set from becoming too large, we periodically monitor the weights and remove all basis functions with normalized weights below a predefined threshold. While this approach allows for direct model-free learning and significantly more scalable interdiction (see the Experiments section), the performance still depends on the subset of basis functions chosen for approximation.

Interdiction with Non-Linear Function Approximation

Interdiction Using Greedy Local Search

Interdiction Using Local Linear Approximation

The output layer has a linear activation (since it predicts an action-value function that is in principle unbounded). The first-order Taylor approximation of a multivariable scalar function is f(x + δx) ≈ f(x) + Σ_i (∂f(x)/∂x_i) δx_i.
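A quick numerical sanity check of the first-order Taylor approximation; the function f here is an arbitrary example, not taken from this work.

```python
def taylor_first_order(f, grad, x, dx):
    """First-order Taylor estimate: f(x + dx) ~ f(x) + sum_i df/dx_i * dx_i."""
    return f(x) + sum(g * d for g, d in zip(grad(x), dx))

# Example: f(x) = x0^2 + 3 x1, so grad f = (2 x0, 3).
f = lambda x: x[0] ** 2 + 3.0 * x[1]
grad = lambda x: (2.0 * x[0], 3.0)
approx = taylor_first_order(f, grad, (1.0, 1.0), (0.01, 0.02))
exact = f((1.01, 1.02))
```

For this quadratic, the approximation error is exactly the second-order term (0.01)² = 1e-4, illustrating why the local linear approximation is accurate only for small perturbations δx.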

Stabilizing the Q-Network

Bayesian Interdiction Problem

Experiments

MDP State Interdiction

The utility differs significantly for NI and RI, since these baselines perform no optimization for interdiction. In this case, we scale the original rewards down by a factor of 100 to ensure better convergence of the learning algorithms.

Figure 5.1: Comparison between the proposed interdiction approaches on the sysadmin domain in terms of utility (from two different starting states, left and center) and runtime (right).

Bayesian Interdiction

Conclusions

In this chapter, we move into the field of immunology and summarize the vaccine design problem we presented in the introduction to the dissertation. As antibodies develop in response to a vaccine against a particular pathogen, they remain in the individual's bloodstream and rapidly neutralize and clear the pathogen if the individual is ever infected, thereby preventing disease.

Antibody Design as a Plan Interdiction Problem

However, binding to a single fixed antigen (the part of the pathogen that typically interacts with the antibody) is often insufficient: viruses such as HIV and influenza have many strains, and an antibody that neutralizes one will often fail to neutralize another. Nevertheless, as a pathogen evolves, it may still escape neutralization; HIV, for example, has an extremely high mutation rate [18].

Research Objectives

Second, it must be robust to virus mutations, i.e., the engineered antibody should continue to bind even as the virus mutates to escape. Finally, it must be energetically stable, i.e., the designed antibody must be stable in complex (a minimum-energy configuration) with the virus.

Contributions

Experimental Workflow

Sequence-based Linear Classification and Regression Models to predict

The binding classifier is based on the assumption that the amino acids at the binding positions of the antibody interact with those at the binding positions of the virus. The binding site comprises the FR2, CDR2, FR3 and CDR3 regions of the heavy chain of the antibody.

Figure 7.1: Experimental workflow of the BROAD design method. The method uses ROSETTA structural modeling to generate a large set of mutated antibodies, support vector machines (SVM) to predict ROSETTA energy from amino acid sequence, and integer linear programming.

Algorithm

To validate these optimized antibody candidates, we predicted binding and stability scores using a model trained on all data:

  • Generate data: ROSETTA (virus panel, antibody variants)
  • Learn models: binding Φ and stability Ψ on all data
  • Select 50 random subsamples of 100 viruses

Results

Redesign of VRC23 Improves Predicted Breadth

We found that the BROAD method yielded a significant increase in predicted breadth over the RECON multistate design method (Figure 7.6 A). Notably, both methods were able to increase the predicted breadth from the starting value of 53.3% for wild-type VRC23.

Designed Residues Recapitulate Known Binding Motifs

Finally, the D102E mutant on CDRH3 places a carboxylic acid group in the same position as a glutamic acid on NIH45-46, improving electrostatic interactions with the antigen (Figure 7.8, bottom right). We see that BROAD samples sequences at several positions that are present in the VRC01 line but absent in MSD-sampled sequences (Figure 7.9, blue boxes).

Figure 7.7: Score comparison of redesigned antibodies. The ROSETTA score (A) and binding energy (DDG) (B) are shown for ten redesigned antibodies made either by BROAD or multistate design, paired with 180 viruses

Discussions

Summary of Results

Percent similarity to the VRC01 lineage was calculated for the BROAD and MSD sequences (similarity is shown in parentheses). Future directions in this work include optimization of gp120 homology modeling protocols to reduce this discrepancy and allow experimental validation.

Figure 7.9: Sequences from BROAD design recapitulate sequences observed in the lineage of broadly neutralizing antibody VRC01

Backbone Optimization in Protein Design

Application to HIV Immunology

This technology can be used in the future as part of the antibody discovery and characterization process by rapidly searching sequence space for variants for greater breadth. We can also foresee the application of the BROAD method to this problem by optimizing immunogens for the recognition of germline precursors to known broadly neutralizing antibodies.

Materials and Methods

  • Structural Modeling
  • Training Set
  • Linear Classification and Regression
  • Breadth Maximization Integer Program
  • RECON Multistate Design
  • Sequence Validation
  • Comparison to VRC01 Lineage Sequences

Let {A_i} be a set of discrete random variables representing the amino acids at the binding positions of the antibody. A linear regression model Ψ(a) predicts stability scores as a function of antibody sequence features.

Contributions

Antibody Design as Stackelberg Game

However, the first-order effect on an antibody's binding properties is determined by the part of the sequence that forms the native virus binding site. Note that the antibody-virus interaction in our model is a Stackelberg game in which the designer (antibody) is the leader and the virus is the follower, choosing an alternative virus sequence in response to the antibody selected by the designer.

Rosetta Protocol

To obtain 3D structures corresponding to single point mutations, we make the appropriate amino acid change in the virus/antibody part of the sequence. This is followed by one repacking and one energy minimization step (as opposed to the many cycles of these two steps that fast relax performs until a convergence criterion is met), for faster results.

Computing Minimal Virus Escape

Greedy Local Search

The fast relax procedure is performed on this complex; it works by iteratively alternating side-chain repacking and energy minimization steps. The DDG of the selected relaxed complex is the resulting binding score between the native virus and the native antibody.
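The greedy local search over point mutations can be sketched as below. The Hamming-distance objective is a stand-in assumption for the expensive ROSETTA DDG evaluation, and the minimization direction is illustrative; the actual escape search seeks a loss of binding with as few mutations as possible.

```python
def greedy_search(seq, alphabet, score, max_iters=100):
    """Greedy local search: at each step take the single point mutation that
    most improves the objective; stop at a local optimum."""
    seq = list(seq)
    best = score(seq)
    for _ in range(max_iters):
        move = None
        for i in range(len(seq)):
            orig = seq[i]
            for aa in alphabet:
                if aa == orig:
                    continue
                seq[i] = aa          # try this point mutation
                s = score(seq)
                if s < best:
                    best, move = s, (i, aa)
            seq[i] = orig            # restore before trying the next position
        if move is None:             # no improving mutation: local optimum
            break
        seq[move[0]] = move[1]
    return "".join(seq), best

# Toy objective: Hamming distance to a target sequence (a stand-in for the
# expensive structural evaluation).
target = "AAAA"
score = lambda s: sum(c != t for c, t in zip(s, target))
result, value = greedy_search("CDCD", "ACD", score)
```

Each iteration evaluates every single-point mutant, which is exactly the evaluation cost that the learned classifier in the next section is used to avoid.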

Speeding Up Search through Learning

Antibody Design

Stochastic Local Search for Antibody Design

BiasedRandom: Our simplest algorithm is a random search biased toward the sequence of the native antibody a0 (and, like all other methods, restricted to changes in its binding site), exploiting the structure of the native antibody a0. In addition, it uses the binding-predictive classifier described above to check whether the antibody generated in a given step binds the native virus v0, discarding any that do not.

Speeding Up Antibody Search through Learning

Evaluation

Computing Virus Escape

The average accuracy of the classifier Ω in predicting which neighbors will cause a significant change in the baseline score is 90.3% when 75% of the data is used for training, and 90.7%. The results comparing the baseline and classifier-based greedy approaches for computing virus escape are shown in Figure 8.3.

Figure 8.3: Comparison between baseline (A) and classifier-based greedy (B) algorithms for computing virus escape in terms of the number of evaluations (top) and computed escape time (bottom)

Antibody Design

It is clear from the figures that the classifier-based approach is often even better, partly due to the randomness that the classifier's inaccuracy introduces into the process (as a result, it is no longer strictly hill climbing). The results, shown in Figure 8.5, indicate that the ordering predicted by the Poisson regression is consistent with the evaluation results: random search is again significantly better than simulated annealing (p-value < 0.001).

The Best Antibody

Finally, we report the actual set of antibodies generated as part of our search process, ranked by evaluated escape cost (Figure 8.6). It is worth noting that we found many antibodies that are much more robust to escape than the native antibody when θ = 0.

Discussions

Furthermore, we exhibited an antibody that is much more robust to viral escape than the native antibody (i.e., the antibody found in nature to bind the corresponding virus epitope). The second problem affects all research on antibody design and characterization, and is not specific to our method [17].

Contributions

A Game Theoretic Model of Antibody Design

Also, lower (more negative) scores indicate stronger binding and stability of the antibody-virus complex. The virus sequence tries to escape binding to the antibody by making a series of mutations.

Solution Approach

A Bi-Linear Representation of Energy Scores

Our bilinear model thus has four sets of parameters: x_i and y_j for all antibody and virus positions i and j, respectively, Q_ij for all pairs of positions, and an intercept I. We learn these parameters by generating a data set of ROSETTA energy function values for many pairs of antibody-virus sequences (as detailed in the experiments).
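Evaluating the bilinear model for given sequences a and v is then a direct sum of singleton and pairwise terms. A sketch with a single antibody position and a single virus position; the amino-acid coding and parameter values are illustrative assumptions.

```python
def bilinear_score(a, v, x, y, Q, intercept):
    """Bilinear energy model: intercept + singleton antibody terms x[i][a_i]
    + singleton virus terms y[j][v_j] + pairwise terms Q[i][j][a_i][v_j]."""
    s = intercept
    s += sum(x[i][ai] for i, ai in enumerate(a))
    s += sum(y[j][vj] for j, vj in enumerate(v))
    s += sum(Q[i][j][ai][vj]
             for i, ai in enumerate(a)
             for j, vj in enumerate(v))
    return s

# One antibody position, one virus position; amino acids coded as 'A'/'G'.
x = [{"A": 1.0, "G": 0.0}]
y = [{"A": 2.0, "G": -1.0}]
Q = [[{"A": {"A": 0.5, "G": 0.0}, "G": {"A": 0.0, "G": 0.0}}]]
s = bilinear_score("A", "A", x, y, Q, intercept=-1.0)
```

For fixed v, the pairwise terms Q become linear in the antibody variables (and vice versa), which is what makes the escape and design subproblems expressible as integer linear programs.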

Integer Linear Program for Virus Escape

Hence, given a and v, the energy score varies as the sum of individual amino acid effects and pairwise interaction effects. Constraint 9.4c encodes, as a linear constraint, that we only allow mutations at a position to amino acids that have been observed there with frequency p_ij ≥ θ; here, L is a large constant.
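Outside the ILP, the same frequency-threshold restriction can be computed directly per position; a sketch (the frequency table is illustrative):

```python
def allowed_mutations(freq, theta):
    """Per-position allowed amino acids: those observed with frequency
    p >= theta, mirroring the restriction that constraint 9.4c encodes."""
    return [{aa for aa, p in pos.items() if p >= theta} for pos in freq]

# Two virus positions with observed amino-acid frequencies.
freq = [{"A": 0.6, "G": 0.3, "T": 0.1}, {"A": 0.05, "C": 0.95}]
allowed = allowed_mutations(freq, 0.2)
```

Raising θ shrinks the virus's feasible mutation set and therefore makes escape harder in the model.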

Mixed Integer Linear Program for Antibody Design

The corresponding constraint matrix has at most two non-zero elements in any column corresponding to the variables v^t_ij: the first non-zero element, +1, comes from the relevant constraint 9.4a, and the second, -1, from 9.4b.

Experiments

Bi-linear Z-score Model

An important observation is that although we originally had bilinear terms involving both antibody and virus decision variables, these decouple after the transformation, leaving only linear terms. We performed a cross-validation experiment with 80% of the data for training and 20% for testing. We denote our proposed antibody design approach STRONG (STackelberg game-theoretic model for robust antibody design) and compare it with two previous approaches: a) BROAD [166] and b) the game-theoretic approach proposed in [171] (henceforth referred to as AAMAS2015).

Comparison against BROAD

Finally, we evaluate in terms of the breadth of binding (fraction of viruses in the evaluation panel to which the designed antibody binds) generated using ROSETTA structure modeling. We perform ROSETTA structural modeling on these antibody candidates (one BROAD and one STRONG candidate) and the escape set of 30 virus sequences.

Figure 9.1: Comparison between STRONG and BROAD in terms of the z-score objective (lower is better): on the full 180 virus panel (left) and the 180 escaping virus set (right).

Comparison against AAMAS2015

Conclusions

Our experiments show that our approach significantly outperforms both the previous game-theoretic alternative, and a state-of-the-art broad-binding antibody design algorithm.

Summary of Contributions

In the first part, we present game-theoretic models for MDP interdiction and develop efficient algorithms for computing optimal defender decisions. Before considering a game-theoretic model, we focus on the general problem of broad-binding antibody design, treating it as a single-level optimization problem.

MDP Interdiction

  • Factored Representation and Scalable Bi-Level Optimization
  • MDP Initial State Interdiction: Single-Level Optimization
  • Improving Scalability with Reinforcement Learning
  • Bayesian Interdiction

We develop several algorithmic approaches for solving the resulting hard bilevel optimization problems. We model uncertainty about the attacker (e.g., its capabilities and actions) in a Bayesian game framework that admits multiple possible attacker types (in terms of the attack's initial state, to which the defender does not have full access).

Robust Antibody Design as an Interdiction Game

Broadly Binding Antibody: Single-Level Optimization

Extensive experiments show a large advantage to the defender of considering Bayesian interdiction compared to the baseline interdiction of a worst-case attack. Although our modeled antibodies have not been tested in vitro, we predict that these variants would have significantly greater breadth compared to the wild-type antibody.

Game Theoretic Robust Antibody Optimization

We predict that if we test these optimal antibodies against the HIV panel, they will have greater neutralization breadth compared to existing antibodies. Finally, we display an antibody that is much more robust compared to the native antibody.

Global Solution to the Bi-Level Optimization

Specifically, we report an optimized antibody that requires a minimum of 7 mutations for the virus to escape binding to it. The native antibody, on the other hand, fails to bind the virus after a single strategic escape mutation.

Future Work

Randomized Strategy Commitment

Partial Observability

Multiple Defenders

Repeated Games

Challenges in the Antibody Design Application


Figures

Figure 2.1: Example of an attack graph. The boxes correspond to initial attacker capabilities, ovals are attack actions, and diamonds are attack goals
Figure 2.2: Example of attack plan interdiction: blocked actions are colored blue and the resulting alternative attack plan is highlighted in red.
Figure 4.1: Comparison of exact and approximate MDP interdiction in terms of runtime (left) and attacker utility (right; lower is better for the defender).
Figure 4.2: Comparison between baseline (slow) and fast interdiction on the sysadmin domain in terms of runtime (left) and utility (right).
