2.5 Computational Protein Design and Drug Design
2.5.1 Antibody Design
An antibody is a large, Y-shaped protein that the immune system uses to identify and neutralize pathogens such as bacteria and viruses. The antibody recognizes a unique molecule of the harmful agent, called an antigen, via its variable region. Each tip of the "Y" of an antibody contains a paratope (analogous to a lock) that is specific for one particular epitope (analogous to a key) on an antigen, allowing these two structures to bind together with precision. Using this binding mechanism, an antibody can tag a microbe or an infected cell for attack by other parts of the immune system, or can neutralize its target directly (for example, by blocking a part of a microbe that is essential for its invasion and survival).
It is not surprising that, over the years, extensive effort has gone into designing antibodies and antibody libraries. A number of experimental techniques have been developed and successfully applied to design antibodies that bind to desired antigens or to improve the binding characteristics of an existing antibody. Computational design has been used successfully by protein engineers for many years to alter the physicochemical properties of proteins [80, 81]. In the simplest case, protein design involves optimizing the amino acid sequence of a protein to accommodate a desired 3-D conformation. This approach has been extended to related tasks such as protein-protein interface design, de novo design of protein-binding molecules, and design of self-assembling protein nano-cages [82, 83, 84, 85]. Our growing understanding of sequence-structure relationships in antibodies, together with advances in computational protein modeling, has enabled progress toward computational methods that can assist in re-designing antibodies for higher binding affinity or other desired modifications [86, 87, 88]. Further, efficient experimental platforms exist to test predicted antibody structure or designed antibody function, creating an iterative feedback loop between computation and experiment. Recently, an important proof-of-principle experiment for computer-aided epitope-focused vaccine design was reported [89]. Sevy et al. [90] discuss computational prediction of antibody structure and design of function. Although much focus has been directed at engineering antibodies with desired properties, recent work has targeted the opposite side of the problem: engineering an antigen that can elicit a desired antibody in an effective and reproducible manner. The ultimate goal is the rational design of vaccine antigens that elicit such antibodies.
We note that a substantial part of the work in antibody design is directed toward predicting structure. Antibodies are proteins, and proteins have four levels of structure. The primary structure is the linear sequence of amino acids in a chain. The secondary structure consists of local folds, such as alpha helices and pleated sheets, that are stabilized by hydrogen bonds. The tertiary structure is the overall three-dimensional fold of a single chain, which arises from attractions between the alpha helices and pleated sheets. The quaternary structure is the arrangement of more than one amino acid chain. We focus on primary sequence representation in our work. However, determining the amino acids in the primary sequence is only the first step of antibody design. Comparative modeling of proteins refers to constructing an atomic-resolution model of the target protein from its amino acid sequence and an experimental three-dimensional structure of a related homologous protein (the template). It is also worthwhile to briefly mention dead-end elimination algorithms [91], which have been developed and applied mainly to the problems of predicting and designing the structures of proteins.
In general, the dead-end elimination algorithm minimizes a function over a discrete set of independent variables. It identifies ‘dead ends’, i.e., combinations of variables that are not necessary to define a global minimum because each can be replaced by a better or equally good combination. The elimination criteria are applied repeatedly until convergence, when no more eliminations can be performed. Because eliminated combinations are never needed to reach a global minimum, the algorithm is guaranteed to retain an optimal (global) solution; when the eliminations converge to a single combination, that combination is a global minimum. It significantly outperforms several alternatives derived from genetic algorithms and Monte Carlo methods, although it can be relatively slow to converge.
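To make the elimination step concrete, the following is a minimal sketch under stated assumptions, not the implementation of [91]. It assumes the objective decomposes into per-position ("self") and pairwise energies, and it uses the Goldstein form of the elimination test, which may differ from the criterion used in [91]. All names (`self_e`, `pair_e`, `dead_end_elimination`) and the random toy energies are hypothetical.

```python
import itertools
import random


def pair_energy(pair_e, i, r, j, s):
    """Pairwise energy of option r at position i and option s at position j.

    pair_e stores each unordered pair of positions once, keyed (low, high).
    """
    return pair_e[(i, j)][r][s] if i < j else pair_e[(j, i)][s][r]


def total_energy(assignment, self_e, pair_e):
    """Energy of a full assignment: self terms plus all pairwise terms."""
    n = len(assignment)
    energy = sum(self_e[i][assignment[i]] for i in range(n))
    for i in range(n):
        for j in range(i + 1, n):
            energy += pair_e[(i, j)][assignment[i]][assignment[j]]
    return energy


def can_eliminate(i, r, t, alive, self_e, pair_e):
    """Goldstein test: True if option t is strictly better than option r at
    position i for every surviving choice at the other positions."""
    gap = self_e[i][r] - self_e[i][t]
    for j in range(len(self_e)):
        if j != i:
            gap += min(
                pair_energy(pair_e, i, r, j, s) - pair_energy(pair_e, i, t, j, s)
                for s in alive[j]
            )
    return gap > 0


def dead_end_elimination(self_e, pair_e):
    """Repeat the elimination test until no option can be removed."""
    n = len(self_e)
    alive = [set(range(len(self_e[i]))) for i in range(n)]
    changed = True
    while changed:
        changed = False
        for i in range(n):
            for r in sorted(alive[i]):
                if any(can_eliminate(i, r, t, alive, self_e, pair_e)
                       for t in alive[i] if t != r):
                    alive[i].discard(r)
                    changed = True
    return alive


# Toy problem (hypothetical energies): 4 positions, 3 options each.
rng = random.Random(0)
n_pos, n_opt = 4, 3
self_e = [[rng.uniform(-1, 1) for _ in range(n_opt)] for _ in range(n_pos)]
pair_e = {
    (i, j): [[rng.uniform(-0.3, 0.3) for _ in range(n_opt)] for _ in range(n_opt)]
    for i in range(n_pos) for j in range(i + 1, n_pos)
}

alive = dead_end_elimination(self_e, pair_e)
best = min(itertools.product(range(n_opt), repeat=n_pos),
           key=lambda a: total_energy(a, self_e, pair_e))
print("surviving options per position:", [sorted(s) for s in alive])
print("brute-force optimum:", best)
```

The brute-force search is included only as a check on this small example: every option used by the optimum should survive the eliminations. If more than one option survives at some position, the remaining combinations still have to be searched by another method.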
2.5.1.1 Multi-Specificity Design
Advances in computational protein design have enabled the design of better antibodies. The protein design approaches above apply design methodologies to a single, static protein conformation. However, some protein design tasks cannot be modeled by this traditional single-state strategy of finding a sequence that is optimal for a single fixed backbone, and there is a need to extend protein design to several conformations simultaneously. These approaches, referred to as multistate design (MSD), can be used to modulate protein specificity, model protein flexibility, and engineer proteins to undergo conformational changes [92, 93, 94, 95, 96, 97, 98]. Multistate design [99] considers the impact that a sequence has on multiple structures (states) simultaneously in order to rule one sequence more favorable (fit) for a particular purpose than another. A single sequence is threaded onto multiple backbones (states) and evaluated for its strengths and weaknesses on each backbone. For example, to design a protein that can switch between two specific conformations, it is necessary to find a sequence that is compatible with both backbone conformations. Multistate design is the most appropriate approach in such cases.
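The difference between single-state and multistate design can be illustrated with a small sketch. It is a toy under stated assumptions, not the method of [99]: the four-letter alphabet, the per-position energy tables in `STATE_ENERGY`, and the choice of the worst-state energy as the fitness function are all hypothetical. Real multistate design uses structure-based energy functions and may combine states differently.

```python
from itertools import product

ALPHABET = "AVDK"  # tiny hypothetical alphabet so that enumeration stays cheap
LENGTH = 3

# Hypothetical energies, STATE_ENERGY[state][position][residue].
# Lower is better. Each state stands for one fixed backbone.
STATE_ENERGY = {
    "state_1": [
        {"A": -1.0, "V": -2.0, "D": 1.0, "K": 0.5},
        {"A": 0.0, "V": -1.5, "D": 2.0, "K": -0.5},
        {"A": -0.5, "V": 0.0, "D": -1.0, "K": 1.0},
    ],
    "state_2": [
        {"A": -0.5, "V": 3.0, "D": -1.0, "K": -2.0},
        {"A": -1.0, "V": 2.5, "D": 0.0, "K": -1.5},
        {"A": 0.5, "V": 1.0, "D": -2.0, "K": 0.0},
    ],
}


def state_energy(seq, state):
    """Energy of one sequence threaded onto one backbone state."""
    return sum(STATE_ENERGY[state][pos][res] for pos, res in enumerate(seq))


def multistate_fitness(seq):
    """Assumed fitness: the energy of the worst (highest-energy) state."""
    return max(state_energy(seq, state) for state in STATE_ENERGY)


def best_sequence(score):
    """Exhaustively find the sequence that minimizes the given score."""
    candidates = ("".join(p) for p in product(ALPHABET, repeat=LENGTH))
    return min(candidates, key=score)


single_1 = best_sequence(lambda seq: state_energy(seq, "state_1"))
single_2 = best_sequence(lambda seq: state_energy(seq, "state_2"))
multi = best_sequence(multistate_fitness)

for name, seq in [("state_1 only", single_1), ("state_2 only", single_2),
                  ("multistate", multi)]:
    energies = {s: state_energy(seq, s) for s in STATE_ENERGY}
    print(name, seq, energies)
```

In this toy, each single-state optimum is favorable on its own backbone but comparatively poor on the other, while the multistate optimum is a compromise that is acceptable on both. Exhaustive enumeration is feasible only because the sequence space here has 64 members.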
While existing experimental and computational antibody design methods have made key contributions, there is still a need for a general computational method that can quickly design libraries of antibodies that bind to rapidly mutating virus sequences. Several methods have been developed to enable computationally expensive multistate design [21, 99]. However, these methods all suffer from large energetic barriers that limit sampling in sequence space, resulting in sub-optimal designs [21]. Recently, algorithms have been proposed for multi-specificity design, which extends general protein design by creating a sequence that has low energy with multiple binding partners [21]. Our research aims to address these significant limitations in sampling by developing global solution approaches for the design of broadly binding antibodies (and, equivalently, protein sequences).
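The effect of an energetic barrier on sampling can be seen in a deliberately small sketch. It is an illustration under stated assumptions and does not reproduce the methods of [21] or [99]: the two-residue sequences, the energy tables in `PARTNER_ENERGY`, and the use of greedy single-mutation descent as a stand-in for a local sampler are all hypothetical. The multi-specificity objective is assumed to be the sum of the energies with each binding partner.

```python
from itertools import product

ALPHABET = "AK"
LENGTH = 2

# Hypothetical binding energies of each sequence with two partners.
# Lower is better.
PARTNER_ENERGY = {
    "partner_1": {"AA": -1.0, "AK": 2.0, "KA": 2.5, "KK": -3.0},
    "partner_2": {"AA": -1.5, "AK": 3.0, "KA": 1.0, "KK": -2.0},
}


def total_energy(seq):
    """Assumed multi-specificity objective: sum over all binding partners."""
    return sum(table[seq] for table in PARTNER_ENERGY.values())


def single_mutants(seq):
    """All sequences that differ from seq at exactly one position."""
    for pos in range(len(seq)):
        for res in ALPHABET:
            if res != seq[pos]:
                yield seq[:pos] + res + seq[pos + 1:]


def greedy_descent(start):
    """Accept the best single mutation until none lowers the energy."""
    current = start
    while True:
        candidate = min(single_mutants(current), key=total_energy)
        if total_energy(candidate) >= total_energy(current):
            return current
        current = candidate


local_best = greedy_descent("AA")
global_best = min(("".join(p) for p in product(ALPHABET, repeat=LENGTH)),
                  key=total_energy)

print("greedy descent from AA:", local_best, total_energy(local_best))
print("exhaustive search:     ", global_best, total_energy(global_best))
```

Starting from `AA`, both single mutants have higher energy, so the local search stops there even though `KK` has lower energy with both partners. A global approach is not affected by this barrier, which is the motivation stated above.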