Figure 4.1: Comparison of exact and approximate MDP interdiction with respect to running time (left) and utility for the attacker (right; lower is better for the defender).
Figure: The y-axis shows the score/ΔΔG difference between the transformed antibody and the wild type.
Motivation
The main function of an antibody is to protect against infections by external pathogens, e.g., viruses. This motivates broadly binding antibodies, which bind to many different known strains of the same virus, for example, to many different strains of influenza or HIV found "in nature" [17].
Research Challenges and Contributions
MDP Interdiction
The defender's optimization forms the outer layer, and the attacker's MDP solution is the inner layer of optimization. We propose a new interdiction model in which the defender modifies the initial state of the attacker.
Applications in Antibody Design
While this makes computing virus escape practical, the bilevel nature of the problem means that designing antibodies remains time-consuming. To address these challenges, we formulate a bilevel optimization problem (corresponding to the antibody design game) in terms of combined binding and stability scores.
Organization of the Dissertation
We present a compact single-level mixed-integer programming formulation, obtained by relaxing the integrality constraints and then taking the dual of the resulting linear program. In Chapter 9, we present algorithms that exploit the structure of antibody-virus binding interactions to formulate the antibody design game as a mixed-integer program.
Computational Game Theory
Stackelberg Games
The two players in a Stackelberg game, the leader and the follower, need not represent individuals. They can also be groups working together to implement a common strategy, such as the police or a terrorist organization. The leader commits to a strategy first, and the follower observes the leader's strategy and then acts to optimize its own payoff.
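The following minimal Python sketch makes the commitment order concrete. The leader commits to a pure strategy and the follower best-responds. The 2×2 payoff matrices are hypothetical. The sketch enumerates only pure commitments, whereas Stackelberg security games usually allow the leader to commit to a mixed strategy.

```python
# Hypothetical 2x2 game; payoff[i][j] = payoff when the leader plays i and the follower plays j.
LEADER = [[2, 4],
          [1, 3]]
FOLLOWER = [[1, 0],
            [0, 1]]

def follower_best_response(follower_payoff, i):
    """Follower's best action against the leader's pure commitment i.

    max() returns the first maximizer, so ties go to the lowest index here;
    a strong Stackelberg equilibrium would instead break ties in the leader's favor.
    """
    row = follower_payoff[i]
    return max(range(len(row)), key=lambda j: row[j])

def pure_stackelberg(leader_payoff, follower_payoff):
    """Enumerate the leader's pure commitments; return (i, j, leader_value)."""
    best = None
    for i in range(len(leader_payoff)):
        j = follower_best_response(follower_payoff, i)
        if best is None or leader_payoff[i][j] > best[2]:
            best = (i, j, leader_payoff[i][j])
    return best

print(pure_stackelberg(LEADER, FOLLOWER))  # (1, 1, 3)
```

In the simultaneous-move version of this game, row 0 strictly dominates row 1 for the leader, which gives the leader a payoff of 2. Committing to row 1 first induces the follower to play column 1 and raises the leader's payoff to 3.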
Stackelberg Security Games
Planning in Artificial Intelligence
Classical Planning
Planning with Uncertainty: Markov Decision Processes (MDPs)
- Factored Representation
- Linear Programming Methods for Solving MDPs
- Factored MDPs and Approximate Linear Programming
- Representing and Computing the Optimal Policy
… $\{\Gamma_a(C_k) : \forall a\}$, where $\Gamma_a(C) = \bigcup_{X_i \in C} \mathrm{PARENTS}_a(X_i) = \mathrm{Scope}[g]$ is the set of parent state variables of the variables in $C$ ($C = \mathrm{Scope}[h]$) in the DBN for action $a$. In the approximate solution with a subset of basis functions, not all state variables may be represented in the set of $\mu$ variables.
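Below is a minimal sketch of computing Γ_a(C) as a union of parent sets. It assumes that the DBN for each action is stored as a dictionary mapping each state variable to its list of parents. The action names `scan` and `patch`, the variables X1, X2, X3, and the helper names are hypothetical.

```python
def gamma_a(parents_a, scope):
    """Gamma_a(C): union over X_i in C of PARENTS_a(X_i) in the DBN for action a."""
    result = set()
    for var in scope:
        result |= set(parents_a.get(var, ()))
    return frozenset(result)

def backprojected_scopes(dbn, scope):
    """{Gamma_a(C) : for all a}, keyed by action."""
    return {a: gamma_a(parents, scope) for a, parents in dbn.items()}

# Hypothetical two-action DBN over Boolean state variables X1, X2, X3.
dbn = {
    "scan":  {"X1": ["X1"], "X2": ["X1", "X2"], "X3": ["X3"]},
    "patch": {"X1": ["X1"], "X2": ["X2"], "X3": ["X2", "X3"]},
}
scopes = backprojected_scopes(dbn, {"X2", "X3"})
print(sorted(scopes["scan"]), sorted(scopes["patch"]))  # ['X1', 'X2', 'X3'] ['X2', 'X3']
```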
Reinforcement Learning and Value Function Approximation
Temporal Difference Learning
Temporal-difference (TD) learning combines ideas from Monte Carlo methods and dynamic programming; its most prominent algorithms are SARSA and Q-learning [51, 40].
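The two TD methods differ in their bootstrap target. The following minimal tabular sketch assumes a dictionary-backed Q-table and ignores terminal states. Q-learning bootstraps from the greedy action in the next state, while SARSA bootstraps from the action actually taken there. The function names are illustrative.

```python
from collections import defaultdict

def q_learning_update(Q, s, a, r, s_next, actions, alpha, gamma):
    """Off-policy TD update: bootstrap from the greedy action in s_next."""
    target = r + gamma * max(Q[(s_next, b)] for b in actions)
    Q[(s, a)] += alpha * (target - Q[(s, a)])

def sarsa_update(Q, s, a, r, s_next, a_next, alpha, gamma):
    """On-policy TD update: bootstrap from the action a_next actually taken."""
    target = r + gamma * Q[(s_next, a_next)]
    Q[(s, a)] += alpha * (target - Q[(s, a)])

Q = defaultdict(float)            # unseen (state, action) pairs start at 0.0
Q[("s1", "b")] = 2.0
q_learning_update(Q, "s0", "a", 1.0, "s1", ["a", "b"], alpha=0.5, gamma=0.9)
print(Q[("s0", "a")])             # approximately 1.4
```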
Deep Reinforcement Learning
Plan Interdiction
The Plan Interdiction Problem
The goal in plan interdiction is to compute the defender's optimal strategy for thwarting the attacker's plan. Specifically, if M is the subset of mitigations deployed by the defender, the attacker's planning problem P becomes P(M).
Interdiction of Deterministic Plans
Formally, the plan interdiction problem is defined by the tuple $\{\mathcal{M}, C_M, R_D, P\}$. Here $\mathcal{M}$ is the set of the defender's mitigations, $C_M$ is the cost of a mitigation $M \in \mathcal{M}$, $R_D(x) = \sum_j R_j(x_j)$ is the defender's reward function (additive over state variables), and $P$ is the attacker's planning problem. A mitigation $M$ can have two effects: a) it can modify the current (initial) state $x_0$, and b) it can remove a subset of the attacker's actions. In the plan interdiction problem, the defender's goal is to choose the optimal set of mitigations, accounting for both the defender's utility (total discounted reward) and the cost of the mitigations.
Computational Protein Design and Drug Design
- Antibody Design
- Multi-Specificity Design
- Game-Theoretic Models of Vaccination Decisions
- Combinatorial Drug Design
- Learning Protein-Protein Interactions from Data
The ultimate goal is the rational design of antigens that, when used in vaccination, can elicit these antibodies. In […, 109], computational models have been used to translate available viral sequence data into quantitative landscapes of viral fitness as a function of the amino acid sequences of the virus's constituent proteins.
The MDP Interdiction Problem
Problem Definition
In this chapter, we formally introduce the MDP interdiction problem, in which the attacker solves an MDP with discounted reward. In the MDPI Stackelberg game, the defender first chooses $M \subseteq \mathcal{M}$, and the attacker then chooses a policy $\pi$ in the resulting restricted MDP $\tau(M)$.
General Approach
Once the defender's choice is fixed, the attacker faces a single-agent decision problem (an MDP), so it suffices to restrict attention to optimal attacker policies that are deterministic and stationary. Define $V_A(x,\pi)$ as the attacker's value function for a policy $\pi$ starting at state $x$ in the MDP $\tau(M)$. Let $V_D(x,\pi)$ be the defender's value function, i.e., the value computed using the defender's reward function $R_D$.
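The following brute-force sketch shows this bilevel structure on a toy MDP. It rests on several assumptions. Mitigations only remove attacker actions (effect b of the plan interdiction setting; changes to the initial state are not modeled), at least one action always remains available, the defender's reward depends only on the state, and mitigation costs are subtracted from the defender's value. The inner level solves the restricted MDP τ(M) by value iteration, which yields a deterministic stationary policy. The outer level enumerates every subset M. All names, including `interdict`, `blocks`, and the `wait`/`exploit` actions, are hypothetical. Because the enumeration is exponential in the number of mitigations, the sketch only illustrates the problem that the MILP formulations address at scale.

```python
from itertools import combinations

def solve_attacker(states, allowed, P, RA, gamma, iters=500):
    """Value iteration on the restricted MDP tau(M); returns a deterministic stationary policy."""
    def q(V, s, a):
        return RA[s][a] + gamma * sum(p * V[t] for p, t in P[s][a])
    V = {s: 0.0 for s in states}
    for _ in range(iters):
        V = {s: max(q(V, s, a) for a in allowed) for s in states}
    return {s: max(allowed, key=lambda a: q(V, s, a)) for s in states}

def defender_value(states, policy, P, RD, gamma, iters=500):
    """Iterative policy evaluation of V_D(x, pi) under the defender's reward R_D."""
    V = {s: 0.0 for s in states}
    for _ in range(iters):
        V = {s: RD[s] + gamma * sum(p * V[t] for p, t in P[s][policy[s]]) for s in states}
    return V

def interdict(states, actions, P, RA, RD, blocks, cost, x0, gamma=0.9):
    """Brute-force outer level: try every subset M of mitigations (exponential in |M|)."""
    best_M, best_obj = None, float("-inf")
    names = list(blocks)
    for k in range(len(names) + 1):
        for M in combinations(names, k):
            removed = set().union(*(blocks[m] for m in M))
            allowed = [a for a in actions if a not in removed]   # assumed non-empty
            policy = solve_attacker(states, allowed, P, RA, gamma)  # inner level
            obj = defender_value(states, policy, P, RD, gamma)[x0] - sum(cost[m] for m in M)
            if obj > best_obj:
                best_M, best_obj = M, obj
    return best_M, best_obj

# Hypothetical toy instance: state 1 ("compromised") is absorbing.
states, actions = [0, 1], ["wait", "exploit"]
P = {0: {"wait": [(1.0, 0)], "exploit": [(1.0, 1)]},
     1: {"wait": [(1.0, 1)], "exploit": [(1.0, 1)]}}
RA = {0: {"wait": 0.0, "exploit": 0.0}, 1: {"wait": 1.0, "exploit": 1.0}}
RD = {0: 0.0, 1: -1.0}
print(interdict(states, actions, P, RA, RD,
                blocks={"patch": {"exploit"}}, cost={"patch": 2.0}, x0=0))
# (('patch',), -2.0)
```

Without the patch, the attacker exploits immediately and the defender's value from state 0 is about -9. Paying 2 for the patch is therefore the better choice.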
Research Objectives
Contributions
Factored MDP Interdiction
A Mixed-Integer Linear Programming Formulation for Factored MDP
If H, the set of basis functions under consideration, is expressive enough to span the full value function space, the solution to this MILP yields the optimal interdiction decision for the defender. However, the set of constraints ranges over all possible attack policies, which makes the MILP unmanageably large even with a compact set of basis functions.
Constraint Generation for Factored MDP Interdiction
Constraint Generation with Basis Function Selection
- Reducing the Number of Iterations of Constraint Generation
However, this results in many iterations of the constraint generation procedure, which must construct enough attack policies to rule out trivial mitigation solutions. These are solutions that mitigate no action at all, or that mitigate a single action sufficient to render all of the currently generated policies infeasible. Warm initialization significantly reduces the number of iterations, but each iteration still involves costly computations, even just to evaluate whether new policies should be added.
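The following sketch illustrates constraint generation on a deliberately simplified, zero-sum, deterministic variant. Attack policies are represented as plans (a set of actions plus a value), a mitigation blocks every plan that uses one of its actions, and the defender minimizes mitigation cost plus the attacker's best value. This is not the factored-MDP formulation, and the plan library and all names are hypothetical. The master problem is solved by enumeration against only the generated plans. The oracle adds the attacker's true best response whenever that response beats the master's estimate.

```python
from itertools import combinations

def best_response(plans, removed):
    """Attacker's best plan avoiding removed actions; (None, 0.0) means 'do nothing'."""
    best_i, best_v = None, 0.0
    for i, (acts, v) in enumerate(plans):
        if not (acts & removed) and v > best_v:
            best_i, best_v = i, v
    return best_i, best_v

def removed_by(M, blocks):
    return set().union(*(blocks[m] for m in M))

def master(generated, blocks, cost):
    """Defender's best M against the generated plans only (brute force)."""
    names, best = list(blocks), None
    for k in range(len(names) + 1):
        for M in combinations(names, k):
            _, v = best_response(generated, removed_by(M, blocks))
            obj = sum(cost[m] for m in M) + v        # defender minimizes cost + attacker value
            if best is None or obj < best[0]:
                best = (obj, M, v)
    return best

def constraint_generation(plans, blocks, cost):
    generated, iterations = [], 0
    while True:
        iterations += 1
        obj, M, v_master = master(generated, blocks, cost)
        i, v_true = best_response(plans, removed_by(M, blocks))   # oracle over all plans
        if i is None or v_true <= v_master:
            return M, obj, iterations                             # no violated constraint left
        generated.append(plans[i])                                # add the new attack plan

# Hypothetical plan library: (actions used, attacker value).
plans = [(frozenset({"a1"}), 5.0), (frozenset({"a2"}), 4.0), (frozenset({"a1", "a2"}), 6.0)]
blocks = {"m1": {"a1"}, "m2": {"a2"}}
cost = {"m1": 1.0, "m2": 1.0}
print(constraint_generation(plans, blocks, cost))   # (('m1', 'm2'), 2.0, 4)
```

On this instance the master first chooses no mitigation. It then chooses a single mitigation that blocks every plan generated so far, and only after four iterations does it settle on blocking both actions. These intermediate choices are the trivial solutions that drive up the iteration count.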
Basis Generation
Fourier Basis Functions on Boolean Feature Space
Iterative Basis Function Selection
If $X_j$ appears in any set $C \cup \Gamma_a(C)$ (with $C = \mathrm{Scope}[h]$) or in $W_a$ (the scope of any local reward function), these sets of state variables are merged when eliminating $X_j$. In Algorithm 3, ATTACKERPOLICY(A, H) solves LP (2.3), where H is a set of parity basis functions, each defined over a subset of the state variables.
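Parity (Fourier) basis functions on Boolean features can be sketched as follows. The sketch assumes the ±1 convention χ_S(x) = (-1)^(Σ_{i∈S} x_i) for a subset S of variable indices, and the exact encoding used here may differ. The helper names are hypothetical.

```python
from itertools import combinations

def parity(subset, x):
    """chi_S(x) = (-1)^(sum of x_i over i in S) for a Boolean vector x."""
    return -1 if sum(x[i] for i in subset) % 2 else 1

def parity_basis(n, max_degree):
    """All index subsets of size <= max_degree; the empty subset is the constant function."""
    return [s for k in range(max_degree + 1) for s in combinations(range(n), k)]

def features(x, basis):
    """Evaluate every parity basis function at x."""
    return [parity(s, x) for s in basis]

print(features((1, 0, 1), parity_basis(3, 1)))   # [1, -1, 1, -1]
```

Limiting `max_degree` keeps the number of basis functions polynomial in the number of variables, which matters when candidate bases are added iteratively.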
Greedy Interdiction
Experiments
- Problem Domains
- Comparison with Exact MDP Interdiction
- Scalability
- Effectiveness of Greedy Interdiction
In contrast, fast constraint generation (Algorithm 2, labeled "fast bilevel") significantly improves scalability (Figure 5.1, left). Figure 5.2 shows a similar trend: the fast constraint generation approach significantly outperforms the baseline without compromising solution quality.
Conclusions
Finally, we proposed a greedy MDP interdiction algorithm and showed that it further improves scalability.
Contributions
MDP State Interdiction
Problem Definition
We propose reasonable approximations of the attacker's value function, together with scalable algorithms to compute them using a) factored MDP solution approaches and b) reinforcement learning. The key observation is that the attacker's optimal policy is independent of the interdiction decision, because the solution to an MDP prescribes an action for every state. In other words, since the attacker solves an MDP, its optimal policy already accounts for whichever initial state the defender chooses.
Integer Linear Program for Approximately Optimal Interdiction
In the interdiction problem, the weights of the value function are then fixed, and the goal is to optimize the initial state $x_0$ and, consequently, the associated basis function values $\phi_j(x_0)$. The main bottleneck in this approach is step 1, where the difficulty of solving the factored MDP grows exponentially with the number of interdependencies between state variables.
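With the weights fixed, the interdiction step only has to search over initial states. The brute-force sketch below stands in for the integer linear program. It assumes that the defender can flip designated Boolean state variables at given costs, subject to a budget, and wants to minimize the attacker's approximate value V(x0) = Σ_j w_j φ_j(x0). The basis functions, weights, costs, and helper names are hypothetical.

```python
from itertools import combinations

def linear_value(x, weights, basis):
    """Approximate attacker value V(x) = sum_j w_j * phi_j(x), with the weights held fixed."""
    return sum(w * phi(x) for w, phi in zip(weights, basis))

def interdict_initial_state(x_nominal, flip_cost, budget, weights, basis):
    """Pick the set of variable flips (within budget) that minimizes V(x0)."""
    best_v, best_S = linear_value(x_nominal, weights, basis), ()
    idx = list(flip_cost)
    for k in range(1, len(idx) + 1):
        for S in combinations(idx, k):
            if sum(flip_cost[i] for i in S) > budget:
                continue
            x0 = list(x_nominal)
            for i in S:
                x0[i] = 1 - x0[i]          # Boolean flip of a defender-controllable variable
            v = linear_value(x0, weights, basis)
            if v < best_v:
                best_v, best_S = v, S
    return best_v, best_S

# Hypothetical basis functions over three Boolean state variables.
basis = [lambda x: 1, lambda x: x[0], lambda x: x[0] * x[1], lambda x: x[2]]
weights = [0.5, 2.0, 3.0, -1.0]
print(interdict_initial_state((1, 1, 0), {0: 2.0, 1: 1.0, 2: 1.0}, 2.0, weights, basis))
# (0.5, (0,))
```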
Interdiction Using RL with Linear Action-Value Functions
Basis Generation
During the learning iterations, we add a new basis function to the set B if it significantly reduces the squared error over the samples $s = (x, a, r, x') \in \hat{D}$, $\sum_{s \in \hat{D}} \left(Q'(x,a;w) - Q(x,a;w)\right)^2$, relative to the current basis set, where $Q'(x,a;w) = r + \gamma \max_{a'} Q(x',a';w)$. The following MILP computes the Boolean basis vector b corresponding to the new basis function with the largest marginal impact |I| on …
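Below is a minimal sketch of the squared-error criterion for a linear action-value function Q(x, a; w) = Σ_k w_k f_k(x, a). As a simplifying assumption, it measures the marginal benefit of a candidate basis function g with the targets Q' held fixed. Fitting a single weight c to minimize Σ_s (e_s - c g_s)² over the TD residuals e_s reduces the error by (Σ_s e_s g_s)² / Σ_s g_s². This closed form stands in for the MILP-based selection. The feature functions, samples, threshold, and helper names are hypothetical.

```python
def q_value(x, a, w, feats):
    """Linear action-value function Q(x, a; w) = sum_k w_k * f_k(x, a)."""
    return sum(wk * f(x, a) for wk, f in zip(w, feats))

def td_residuals(samples, w, feats, actions, gamma):
    """e_s = Q'(x, a; w) - Q(x, a; w), with Q' = r + gamma * max_a' Q(x', a'; w)."""
    res = []
    for x, a, r, x_next in samples:
        target = r + gamma * max(q_value(x_next, b, w, feats) for b in actions)
        res.append(target - q_value(x, a, w, feats))
    return res

def error_reduction(samples, residuals, candidate):
    """Drop in the sum of squared residuals from adding `candidate` with its best single weight."""
    g = [candidate(x, a) for x, a, _, _ in samples]
    gg = sum(v * v for v in g)
    if gg == 0:
        return 0.0
    eg = sum(e * v for e, v in zip(residuals, g))
    return eg * eg / gg

def accept(samples, residuals, candidate, rel_threshold=0.1):
    """Hypothetical rule: accept if the error drops by more than rel_threshold of its current value."""
    current = sum(e * e for e in residuals)
    return error_reduction(samples, residuals, candidate) > rel_threshold * current

actions = [0, 1]
feats = [lambda x, a: 1.0]              # current basis: a single constant feature
w = [0.0]
samples = [((0,), 1, 1.0, (0,)), ((1,), 0, 0.0, (1,))]
res = td_residuals(samples, w, feats, actions, gamma=0.9)
candidate = lambda x, a: float(a)       # hypothetical candidate basis function
print(res, error_reduction(samples, res, candidate))  # [1.0, 0.0] 1.0
```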
Integer Linear Program for Interdiction
To prevent the basis set from becoming too large, we periodically inspect the weights and remove all basis functions whose normalized weights fall below a predefined threshold. This approach allows direct model-free learning and significantly more scalable interdiction (see the Experiments section), but its performance still depends on the subset of basis functions chosen for the approximation.
Interdiction with Non-Linear Function Approximation
Interdiction Using Greedy Local Search
Interdiction Using Local Linear Approximation
The output layer has a linear activation, since it predicts an action-value function that is in principle unbounded. The first-order Taylor approximation of a multivariable scalar function is $f(x + \delta x) \approx f(x) + \sum_i \frac{\partial f(x)}{\partial x_i}\,\delta x_i$.
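The following sketch shows how the first-order approximation can guide Boolean interdiction decisions. It assumes that the defender flips up to k state variables to lower the attacker's Q-value. For a Boolean variable, a flip changes x_i by +1 or -1, so the predicted change is the gradient component times that step. Central finite differences on a hypothetical smooth surrogate `q_surrogate` stand in for backpropagation through the Q-network. The helper names are also hypothetical.

```python
def numerical_gradient(f, x, h=1e-5):
    """Central finite differences; a stand-in for backpropagating through the Q-network."""
    grad = []
    for i in range(len(x)):
        up, down = list(x), list(x)
        up[i] += h
        down[i] -= h
        grad.append((f(up) - f(down)) / (2 * h))
    return grad

def linear_flip_choice(f, x0, k):
    """Choose up to k Boolean flips with the most negative first-order predicted change."""
    grad = numerical_gradient(f, x0)
    deltas = [g * (1 - 2 * xi) for g, xi in zip(grad, x0)]   # delta x_i = +1 if x_i = 0, else -1
    order = sorted(range(len(x0)), key=lambda i: deltas[i])
    chosen = [i for i in order[:k] if deltas[i] < 0]
    return chosen, sum(deltas[i] for i in chosen)

def q_surrogate(x):
    """Hypothetical smooth surrogate for the attacker's Q-value as a function of the state."""
    return 3 * x[0] + 2 * x[0] * x[1] - x[2]

x0 = [1, 1, 0]
chosen, predicted = linear_flip_choice(q_surrogate, x0, k=2)
flipped = [1 - xi if i in chosen else xi for i, xi in enumerate(x0)]
print(chosen, round(predicted, 6), q_surrogate(flipped) - q_surrogate(x0))  # [0, 1] -7.0 -5
```

Here the linear model predicts a change of -7, while the true change is -5. The difference arises because a single first-order step does not capture the interaction term 2·x0·x1.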
Stabilizing the Q-Network
Bayesian Interdiction Problem
Experiments
MDP State Interdiction
The utility differs significantly for NI and RI, since they perform no interdiction optimization and serve only as baselines. In this case, we scale down the original rewards by a factor of 100 to improve the convergence of the learning algorithms.
Bayesian Interdiction
Conclusions
In this chapter, we move to the field of immunology and revisit the vaccine design problem presented in the introduction to the dissertation. Once antibodies develop in response to a vaccine against a particular pathogen, they remain in the individual's bloodstream. If the individual is ever infected, they can rapidly neutralize and clear the pathogen, thereby preventing disease.
Antibody Design as a Plan Interdiction Problem
However, binding to a single antigen (the part of the pathogen that typically interacts with the antibody) is often insufficient. Viruses such as HIV and influenza have many strains, and an antibody that neutralizes one strain will often fail to neutralize another. Nevertheless, as a pathogen evolves, it may still escape neutralization; for example, HIV has an extremely high mutation rate [18].
Research Objectives
Second, it must be robust to virus mutations, i.e., the engineered antibody should continue to bind as the virus sequence mutates in an attempt to escape. Finally, it must satisfy energy-stability requirements, i.e., the designed antibody must be stable in complex with the virus (in a minimum-energy configuration).
Contributions
Experimental Workflow
Sequence-based Linear Classification and Regression Models to predict
The binding classifier is based on the assumption that the amino acids at the binding positions of the antibody interact with those at the binding positions of the virus. The binding site comprises the FR2, CDR2, FR3 and CDR3 regions of the heavy chain of the antibody.
Algorithm
To validate these optimized antibody candidates, we predicted binding and stability scores using a model trained on all data. The algorithm proceeds as follows:
- Generate data: Rosetta (virus panel, antibody variants).
- Learn models: binding Φ and stability Ψ on all data.
- Select 50 random subsamples of 100 viruses.
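The subsampling step can be sketched as below. The sketch assumes a hypothetical predicate `binds(antibody, virus)` built from the learned binding model Φ. Exhaustive selection over a fixed candidate list stands in for the breadth-maximization integer program, and the stability model Ψ is omitted. The toy data and helper names are hypothetical.

```python
import random

def predicted_breadth(antibody, panel, binds):
    """Fraction of viruses in the panel that the antibody is predicted to bind."""
    return sum(bool(binds(antibody, v)) for v in panel) / len(panel)

def designs_over_subsamples(candidates, viruses, binds, n_subsamples=50, size=100, seed=0):
    """For each random virus subsample, keep the candidate with the highest predicted breadth."""
    rng = random.Random(seed)
    designs = []
    for _ in range(n_subsamples):
        panel = rng.sample(viruses, min(size, len(viruses)))
        designs.append(max(candidates, key=lambda a: predicted_breadth(a, panel, binds)))
    return designs

def binds(antibody, virus):
    """Hypothetical toy model: an 'antibody' is simply the set of virus IDs it binds."""
    return virus in antibody

candidates = [frozenset(range(0, 120, 2)), frozenset(range(40))]
viruses = list(range(200))
designs = designs_over_subsamples(candidates, viruses, binds)
print(len(designs), predicted_breadth(candidates[0], viruses, binds))  # 50 0.3
```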
Results
Redesign of VRC23 Improves Predicted Breadth
We found that the BROAD method produced a significant increase in predicted breadth over the RECON multistate design method (Figure 7.6 A). Notably, both methods increased the predicted breadth from the starting value of 53.3% for wild-type VRC23.
Designed Residues Recapitulate Known Binding Motifs
Finally, the D102E mutation in CDRH3 places a carboxylic acid group in the same position as a glutamic acid in NIH45-46, improving electrostatic interactions with the antigen (Figure 7.8, bottom right). BROAD samples amino acids at several positions that occur in the VRC01 lineage but are absent from MSD-sampled sequences (Figure 7.9, blue boxes).
Discussions
Summary of Results
Percent similarity to the VRC01 lineage was calculated for the BROAD and MSD sequences (similarity is shown in parentheses). Future directions in this work include optimization of gp120 homology modeling protocols to reduce this discrepancy and allow experimental validation.
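Below is a minimal percent-identity sketch. It assumes that the sequences are already aligned to equal length (gaps are not handled) and that similarity means exact residue identity. The measure actually used may differ, for example by using a substitution matrix. The sequences and helper names are hypothetical.

```python
def percent_identity(s1, s2):
    """Percentage of aligned positions with identical residues."""
    if len(s1) != len(s2):
        raise ValueError("sequences must be pre-aligned to equal length")
    return 100.0 * sum(a == b for a, b in zip(s1, s2)) / len(s1)

def max_similarity_to_lineage(seq, lineage):
    """Highest percent identity between seq and any sequence in the lineage."""
    return max(percent_identity(seq, ref) for ref in lineage)

print(percent_identity("ARDY", "ARDF"))                             # 75.0
print(max_similarity_to_lineage("ARDY", ["ARDF", "GRDY", "ARDY"]))  # 100.0
```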
Backbone Optimization in Protein Design
Application to HIV Immunology
In the future, this technology could be used as part of the antibody discovery and characterization process, rapidly searching sequence space for variants with greater breadth. We also foresee applying the BROAD method to immunogen design, by optimizing immunogens for the recognition of germline precursors of known broadly neutralizing antibodies.
Materials and Methods
- Structural Modeling
- Training Set
- Linear Classification and Regression
- Breadth Maximization Integer Program
- RECON Multistate Design
- Sequence Validation
- Comparison to VRC01 Lineage Sequences
Let $A = \{A_1, \ldots, A_{N_a}\}$ be a set of discrete random variables representing the amino acids at the binding positions of the antibody. A linear regression model, $\Psi(a)$, predicts stability scores as a function of antibody sequence features.
Contributions
Antibody Design as Stackelberg Game
However, to first order, the virus's antibody-binding properties are determined by the part of its sequence that forms the native binding site. Note that the antibody-virus interaction in our model is a Stackelberg game. The designer (antibody) is the leader, and the virus is the follower, which chooses an alternative virus sequence in response to the antibody once the designer has selected it.
Rosetta Protocol
To obtain 3D structures corresponding to single point mutations, we make the appropriate amino acid change in the virus or antibody portion of the sequence. For faster results, this is followed by a single repacking step and a single energy minimization step. This replaces the many cycles of these two steps that fast relaxation performs until a convergence limit is reached.
Computing Minimal Virus Escape
Greedy Local Search
The fast relaxation procedure is performed on this complex; it iteratively performs side-chain repacking and energy minimization steps. The ΔΔG of the selected relaxed complex is taken as the binding score between the native virus and the native antibody.
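The following sketch shows greedy local search for virus escape. It assumes a hypothetical `score(v)` function that stands in for the Rosetta ΔΔG of the relaxed complex (lower means stronger binding), together with an escape threshold. At each step, the search applies the single-point mutation that most weakens binding, and it stops once the score reaches the threshold. Any escape path it finds gives an upper bound on the minimum number of mutations needed to escape.

```python
def greedy_escape(v0, alphabet, score, threshold, max_steps=20):
    """Apply the most binding-weakening point mutation until score(v) >= threshold."""
    v, path = list(v0), []
    for _ in range(max_steps):
        if score(v) >= threshold:
            return v, path
        best = None
        for i in range(len(v)):
            for aa in alphabet:
                if aa == v[i]:
                    continue
                s = score(v[:i] + [aa] + v[i + 1:])
                if best is None or s > best[0]:
                    best = (s, i, aa)
        if best is None:          # no alternative amino acids available at all
            break
        _, i, aa = best
        v[i] = aa
        path.append((i, aa))
    return (v, path) if score(v) >= threshold else (None, path)

def toy_score(v):
    """Hypothetical score: each K residue contributes -1 (stronger binding)."""
    return -sum(1 for aa in v if aa == "K")

escaped, path = greedy_escape("KKA", "KAE", toy_score, threshold=-0.5)
print("".join(escaped), path)  # AAA [(0, 'A'), (1, 'A')]
```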
Speeding Up Search through Learning
Antibody Design
Stochastic Local Search for Antibody Design
BiasedRandom: Our simplest algorithm is a random search biased toward the sequence of the native antibody a0, which takes advantage of the structure of a0. Like all other methods, it changes only the binding site. In addition, it uses the binding classifier described above to check whether the antibody generated at each step binds to the source virus v0, and it discards any antibody that does not.
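Below is a sketch of a biased random search in this spirit. It assumes that each step resamples every binding-site position independently and keeps the native residue of a0 with probability `keep_prob`. `binds_v0` (standing in for the binding classifier) and `objective` are hypothetical callables, and the parameter values are illustrative.

```python
import random

def biased_random_step(a0, positions, alphabet, keep_prob, rng):
    """Resample binding-site positions, keeping the native residue with probability keep_prob."""
    a = list(a0)
    for i in positions:
        if rng.random() >= keep_prob:
            a[i] = rng.choice(alphabet)   # may redraw the native residue by chance
    return "".join(a)

def biased_random_search(a0, positions, alphabet, binds_v0, objective,
                         steps=1000, keep_prob=0.8, seed=0):
    """Sample variants around a0, keep only predicted binders, and track the best objective."""
    rng = random.Random(seed)
    best, best_val = a0, objective(a0)
    for _ in range(steps):
        a = biased_random_step(a0, positions, alphabet, keep_prob, rng)
        if not binds_v0(a):
            continue                      # discard antibodies predicted not to bind v0
        val = objective(a)
        if val > best_val:
            best, best_val = a, val
    return best, best_val

best, best_val = biased_random_search(
    "ACDE", positions=[1, 2], alphabet="ACDE",
    binds_v0=lambda a: a[1] != "E",       # hypothetical classifier stand-in
    objective=lambda a: a.count("D"),     # hypothetical objective to maximize
    steps=200)
print(best, best_val)
```

Positions outside the binding site (here 0 and 3) are never modified, matching the restriction to binding-site changes.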
Speeding Up Antibody Search through Learning
Evaluation
Computing Virus Escape
The average accuracy of the classifier Ω in predicting which neighbors will cause a significant change in the baseline score is 90.3% when 75% of the data is used for training, and 90.7% […]. Figure 8.3 shows the results of comparing the baseline and classification-based greedy approaches for computing virus escape.
Antibody Design
The numbers show that the classifier-based approach is often even better. This is partly because of the randomness that the classifier's inaccuracy introduces into the process; as a result, the search is no longer strict hill climbing. The results, shown in Figure 8.5, indicate that the ordering predicted by the Poisson regression is consistent with the evaluation: random search is again significantly better than simulated annealing (p < 0.001).
The Best Antibody
Finally, we report the actual set of antibodies generated during our search process, ranked by their evaluated escape costs (Figure 8.6). Notably, when θ = 0, we found many antibodies that are much more robust to escape than the native antibody.
Discussions
Furthermore, we presented an antibody that is much more robust to viral escape than the native antibody (i.e., the antibody found in nature to bind to the corresponding virus epitope). The second problem affects all research on antibody design and characterization and is not specific to our method [17].
Contributions
A Game Theoretic Model of Antibody Design
Also, lower (more negative) scores indicate stronger binding and stability of the antibody-virus complex. The virus sequence tries to escape binding to the antibody by making a series of mutations.
Solution Approach
A Bi-Linear Representation of Energy Scores
Our bilinear model thus has four sets of parameters: $x_i$ and $y_j$ for antibody positions $i$ and virus positions $j$, respectively; $Q_{ij}$ for all pairs of antibody-virus positions $(i, j)$; and an intercept $I$. We learn these parameters by generating a data set of ROSETTA energy function values for many pairs of antibody-virus sequences (as detailed in the experiments).
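The bilinear form can be written out directly. The sketch below assumes sparse dictionaries for the parameters. x[i] maps an amino acid at antibody position i to its effect, y[j] does the same for virus position j, and Q[(i, j)] maps an amino-acid pair to its interaction effect. Missing entries count as zero. All parameter values and the function name are hypothetical.

```python
def bilinear_score(a, v, I, x, y, Q):
    """E(a, v) = I + sum_i x_i(a_i) + sum_j y_j(v_j) + sum_{i,j} Q_ij(a_i, v_j)."""
    score = I
    score += sum(x[i].get(aa, 0.0) for i, aa in enumerate(a))
    score += sum(y[j].get(vv, 0.0) for j, vv in enumerate(v))
    score += sum(Q.get((i, j), {}).get((aa, vv), 0.0)
                 for i, aa in enumerate(a) for j, vv in enumerate(v))
    return score

# Hypothetical parameters for a 2-position antibody and a 2-position virus.
x = [{"D": -0.5}, {"K": 0.2}]
y = [{"R": 0.1}, {}]
Q = {(0, 0): {("D", "R"): -2.0}, (1, 1): {("K", "E"): -1.5}}
print(round(bilinear_score("DK", "RE", -1.0, x, y, Q), 6))  # -4.7
```

For a fixed antibody a, every term is a sum of per-position effects of v, so the score is linear in the virus's one-hot encoding (and vice versa).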
Integer Linear Program for Virus Escape
Hence, given a and v, the energy score is the sum of individual amino acid effects and pairwise interaction effects. Constraint (9.4c) encodes, as a linear constraint, the requirement that mutations at each position are allowed only to amino acids observed with frequency $p_{ij} \geq \theta$; here, $L$ is a large number.
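For tiny instances, the virus-escape problem can be solved exactly by enumeration, which provides a useful check on an ILP. The sketch below assumes a hypothetical `toy_escape_score(v)` for the fixed antibody (lower means stronger binding) and an escape threshold. It also assumes a per-position frequency table, from which only amino acids with p ≥ θ are allowed as mutations. It returns the minimum number of mutations needed to escape, together with an escape sequence.

```python
from itertools import combinations, product

def min_escape(v0, freq, theta, score, threshold):
    """Fewest mutations (to amino acids with frequency >= theta) reaching score >= threshold."""
    allowed = [[aa for aa, p in freq[j].items() if p >= theta and aa != v0[j]]
               for j in range(len(v0))]
    for k in range(len(v0) + 1):                      # try 0, 1, 2, ... mutations
        for positions in combinations(range(len(v0)), k):
            for choice in product(*(allowed[j] for j in positions)):
                v = list(v0)
                for j, aa in zip(positions, choice):
                    v[j] = aa
                if score(v) >= threshold:
                    return k, "".join(v)
    return None                                       # no escape within the allowed mutations

def toy_escape_score(v):
    """Hypothetical binding score for a fixed antibody (lower = stronger binding)."""
    return -2.0 * (v[0] == "R") - 1.0 * (v[0] == "K") - 1.5 * (v[1] == "E")

freq = [{"R": 0.7, "K": 0.2, "G": 0.01}, {"E": 0.6, "Q": 0.3}]
print(min_escape("RE", freq, theta=0.05, score=toy_escape_score, threshold=-1.0))  # (2, 'KQ')
```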
Mixed Integer Linear Program for Antibody Design
The corresponding constraint matrix has at most two nonzero elements in each column corresponding to the variables $v^t_{ij}$: a +1 from the relevant constraint (9.4a) and a -1 from (9.4b).
Experiments
Bi-linear Z-score Model
An important observation is that, although we originally had bilinear terms involving both antibody and virus decision variables, these are decoupled after taking the dual, leaving only linear terms. … validation experiment with 80% of the data for training and 20% for testing. We refer to our proposed antibody design approach as STRONG (STackelberg game-theoretic model for robust antibody design). We compare it with two previous approaches: a) BROAD [166] and b) the game-theoretic approach proposed in [171] (henceforth referred to as AAMAS2015).
Comparison against BROAD
Finally, we evaluate the designs in terms of breadth of binding (the fraction of viruses in the evaluation panel to which the designed antibody binds), computed using ROSETTA structural modeling. We perform ROSETTA structural modeling on these antibody candidates (one BROAD and one STRONG candidate) and the escape set of 30 virus sequences.
Comparison against AAMAS2015
Conclusions
Our experiments show that our approach significantly outperforms both the previous game-theoretic alternative and a state-of-the-art broadly binding antibody design algorithm.
Summary of Contributions
In the first part, we present game-theoretic models for MDP interdiction and develop efficient algorithms for computing the defender's optimal decisions. Before considering a game-theoretic model, we focus on the general problem of broadly binding antibody design, i.e., optimizing the breadth of binding.
MDP Interdiction
- Factored Representation and Scalable Bi-Level Optimization
- MDP Initial State Interdiction: Single-Level Optimization
- Improving Scalability with Reinforcement Learning
- Bayesian Interdiction
We develop several algorithmic approaches to solve the difficult bilevel optimization problems that arise from these games. We model uncertainty about the attacker (e.g., its capabilities and actions) in a Bayesian game framework that includes multiple possible attacker types. The types differ in the initial state of the attack, since the defender does not observe the full initial state.
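The following sketch compares the Bayesian criterion with a worst-case criterion. It assumes that the defender's utility against each attacker type has already been computed, for example by solving that type's MDP from its initial state. The type names, prior, utility table, and function names are hypothetical.

```python
def bayesian_interdiction(defender_actions, prior, utility):
    """Maximize expected defender utility over attacker types drawn from the prior."""
    return max(defender_actions,
               key=lambda d: sum(p * utility[(d, t)] for t, p in prior.items()))

def worst_case_interdiction(defender_actions, types, utility):
    """Maximize the defender's utility against the least favorable attacker type."""
    return max(defender_actions, key=lambda d: min(utility[(d, t)] for t in types))

prior = {"insider": 0.1, "outsider": 0.9}
utility = {("harden_perimeter", "insider"): -10.0, ("harden_perimeter", "outsider"): -1.0,
           ("monitor_hosts", "insider"): -4.0, ("monitor_hosts", "outsider"): -5.0}
actions = ["harden_perimeter", "monitor_hosts"]
print(bayesian_interdiction(actions, prior, utility))          # harden_perimeter
print(worst_case_interdiction(actions, list(prior), utility))  # monitor_hosts
```

The worst-case defender guards against the rare insider and accepts an expected utility of -4.9. The Bayesian defender weighs the types by their prior and achieves -1.9.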
Robust Antibody Design as an Interdiction Game
Broadly Binding Antibody: Single-Level Optimization
Extensive experiments show that the defender gains a large advantage by using Bayesian interdiction rather than the baseline of interdicting a worst-case attacker. Although our modeled antibodies have not been tested in vitro, we predict that these variants would have significantly greater breadth than the wild-type antibody.
Game Theoretic Robust Antibody Optimization
We predict that, if tested against the HIV panel, these optimal antibodies will have greater neutralization breadth than existing antibodies. Finally, we present an antibody that is much more robust to escape than the native antibody.
Global Solution to the Bi-Level Optimization
Specifically, we report an optimized antibody that requires a minimum of 7 mutations for the virus to escape binding to it. The native antibody, on the other hand, fails to bind the virus after a single strategic escape mutation.
Future Work
Randomized Strategy Commitment
Partial Observability
Multiple Defenders
Repeated Games
Challenges in the Antibody Design Application
In Proceedings of the 7th International Joint Conference on Autonomous Agents and Multiagent Systems: Industrial Track, pages 125–132.
In Proceedings of the 7th International Joint Conference on Autonomous Agents and Multiagent Systems, Volume 2, pages 895–902.