Non-contiguous Protein Recombination

Academic year: 2023

The past four and a half years have been filled with the desire to learn and experience new things. I have been fortunate to work with many exceptional people in the Arnold lab, and I appreciate all the help I have received during that time.

Introduction

Swapping sequence elements while preserving protein function

Swapping sequence elements and recovering protein function

Interestingly, the five most beneficial mutations that Rosetta selected were all located at the interface between the parent fragments (Figure 1.2). Given the importance of residues at the interface between swapped sequence elements, it may be useful to design recombination experiments to allow for amino acid variation in these regions.

Swapping sequence elements to transfer protein function

Additionally, testing individual mutations within a stable block of sequence can reveal highly stabilizing amino acid substitutions [29]. Sequence regions were selected for their known functional importance in substrate binding, metal ion binding, and catalysis, and the chimeric proteases exhibited novel specificities for a variety of peptide substrates.

Swapping sequence elements can probe sequence-function relationships

Pushing the limits of protein recombination

Summary

This work

In chapters 4, 5 and 6 I present a new computational tool for identifying structural blocks that can be swapped between homologous proteins with minimal disruption and I present several examples of enzyme engineering using this approach.

Figures

The chimera is made of fragments of a response regulator CheY (green) and an imidazole glycerol phosphate synthase HisF (blue). Five stabilizing mutations predicted by Rosetta are highlighted in red and a model of the structure is in gray.

Figure 1.2: Interfacial mutations stabilize a chimera made from fragments of different folds.

(2011) Generating families of construct variants using golden gate shuffling.
(2009) User-friendly DNA recombination (USERec): a simple and flexible nearly homology-independent method for gene library construction.
(2002) Structure-based combinatorial protein engineering (SCOPE). J.
Eisenbeis S., Proffitt W., Coles M., Truffault V., Shanmugaratnam S., Meiler J. and Hocker B. (2012) The potential of fragment recombination for rational protein design.

Summary

Introduction

By analyzing a subset of the possible chimera sequences, we can build predictive models and identify the chimeras with useful changes in these properties [8]. We have previously designed a very similar library [6], and analysis of a subset of chimeras led us to identify chimeric CBH1s that are more stable than any of the 5 parents.

Materials

However, new combinations of amino acids in other parts of the protein can lead to significant changes in key properties such as stability [4, 5], expression level [6] or substrate specificity [7]. Sequence alignment of one of the parent sequences with the sequence from the PDB structure file (see Note 7).

Methods

We used ClustalW2 [9] to align the parental sequences and we named our alignment file 'CBH1-msa.txt'. We used ClustalW2 to align the parental sequences and we named our alignment file 'Temer-1Q9H.txt'.

Notes

The parent sequence must have the same identifier in both alignment files ('Temer') and the PDB sequence identifier must be the name of the PDB structure ('1Q9H'). Crossover points are given by the first residue of each new fragment (except the first fragment, which is always 1) based on the alignment numbering of the parent sequence.
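The crossover convention in this note can be made concrete with a short sketch. The function name `blocks_from_crossovers` and the example numbers are hypothetical and are not part of the SCHEMA or RASPP tools; the sketch only assumes that residues are numbered from 1 along the alignment of the parent sequence.

```python
def blocks_from_crossovers(crossovers, alignment_length):
    """Turn crossover points into (start, end) residue ranges.

    Ranges are 1-indexed and inclusive. Each crossover is the first residue
    of a new fragment; the first fragment always starts at residue 1.
    """
    starts = [1] + sorted(crossovers)
    # Reject duplicates, crossovers at or before residue 1, and crossovers
    # that fall beyond the end of the alignment.
    if any(b <= a for a, b in zip(starts, starts[1:])) or starts[-1] > alignment_length:
        raise ValueError("crossovers must be distinct, greater than 1, and within the alignment")
    ends = [s - 1 for s in starts[1:]] + [alignment_length]
    return list(zip(starts, ends))


# Hypothetical example: two crossovers in a 12-residue alignment give 3 fragments.
example_blocks = blocks_from_crossovers([5, 9], 12)
print(example_blocks)
```

With n crossover points the function returns n + 1 fragments, so a library of 8 blocks would be described by 7 crossover points.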

Figures

A graph of the possible libraries plotting the average SCHEMA energy (<E>) of each library against the average number of mutations (<m>). The multiple sequence alignment of the parent CBH1s with each of the 8 blocks highlighted in a different color.

Figure 2.2: Libraries returned by RASPP. (a) The contents of ‘opt.txt’, which lists the crossover locations of candidate libraries identified by RASPP

In: Keasling, A (ed) Methods in Enzymology: Methods in Protein Design, Elsevier Ltd, Oxford, U.K.
(2007) Comparative Protein Structure Modeling Using MODELLER. Curr.

Abstract

Introduction

The modularity of cellulosomes has spurred interest in 'designer cellulosomes' [4, 6], where different cellulases are synthetically combined for a specific application. As one of the most important families of bacterial cellulases [7, 8], they are usually a major component of bacterial cellulosomes [9, 10].

Results

For all cellulases that have a dockerin, activity was much higher in the presence of miniscaffoldin than without it (Figure 3.2A-C). We did not observe a correlation between Topt and specific activity at that temperature for all chimeras sampled (Figure 3.5B).

Discussion

The thermostability model highlights stabilizing amino acid blocks, whether they occur in the most stable proteins or not. These blocks are located in the C-terminus of the catalytic domain, close to where the dockerin attaches, indicating an important stabilizing interaction between these blocks and the C. In other words, the ability to resist temperature-induced denaturation at increasingly higher temperatures leads to increases in the optimal temperature for activity.

Materials and methods

Chimeric genes were assembled from 24 gene fragments, representing 8 blocks from each of 3 parents, using the Sequence-Independent Site-Directed Chimeragenesis (SISDC) method [23]. The following consensus sites were used for junction sites: 1) CCG, 2) GCC, 3) GAC, 4) CAT, 5) GGT, 6) AAC, 7) TTA (Supplementary Table 3.6). From each well, 50 µL of the supernatant was transferred to a 96-well PCR plate and analyzed using either the Park-Johnson assay or the enzymatic glucose assay. Samples were spun for 1 min at 200 × g and 50 µL of a 1:10 dilution of the supernatant was analyzed using the Park-Johnson assay.
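As a quick check of the library size, the sketch below counts the gene fragments and enumerates every block combination. It assumes that each of the 8 blocks can be inherited independently from any of the 3 parents; the digit-string encoding of a chimera (one parent number per block) is a convention chosen here for illustration.

```python
from itertools import product

N_BLOCKS = 8   # blocks per parent, as stated in the text
N_PARENTS = 3  # number of parent enzymes

# One gene fragment per (block, parent) pair.
n_fragments = N_BLOCKS * N_PARENTS

# Every block is chosen independently, so the library has N_PARENTS ** N_BLOCKS members.
n_chimeras = N_PARENTS ** N_BLOCKS


def all_chimeras(n_blocks=N_BLOCKS, n_parents=N_PARENTS):
    """Yield each chimera as a string of parent numbers, one digit per block."""
    for choice in product(range(1, n_parents + 1), repeat=n_blocks):
        yield "".join(str(parent) for parent in choice)


print(n_fragments, n_chimeras)
```

The enumeration includes the three parents themselves (for example `11111111`), so the number of novel sequences is three fewer than the total.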

Figures

Arel is the cellulase-specific activity at its respective optimal temperature measured in a 1-hour assay with 0.2 µM enzyme and 0.2 µM miniscaffoldin in 10 g/L Avicel. The activities are measured in a 1-hour test, with 0.2 µM enzyme and 0.2 µM miniscaffoldin in 10 g/L Avicel at the respective optimal catalytic temperature. The maximal activities of the three parent constructs and two of the most stable, most active chimeras.

Figure 3.2: Activities of purified family 48 cellulases as a function of temperature, in the presence and absence of equimolar amounts of miniscaffoldin

Processive endocellulase CelF, a major component of the Clostridium cellulolyticum cellulosome: purification and characterization of the recombinant form.
(2009) The Carbohydrate-Active EnZymes (CAZy) database: an expert resource for glycogenomics.
A diverse family of thermostable cytochrome P450 generated by recombination of stabilizing fragments.
(1997) Structure of the Clostridium stercorarium celY gene encoding Avicelase II exo-1,4-β-glucanase.
(1993) Cloning and DNA sequence of the gene encoding Clostridium thermocellum cellulase Ss (CelS), a major component of the cellulosome.
(1990) Cloning and expression of Clostridium thermocellum genes encoding thermostable exoglucanases (cellobiohydrolases) in Escherichia coli cells.

Supplementary information

Supporting information

Abstract

Formulating recombination as a graph partitioning problem allows us to identify non-contiguous segments of sequence that should be inherited together in the progeny proteins. We demonstrate this non-contiguous recombination approach by constructing a chimera of β-glucosidases from two different kingdoms of life. Although the alpha-beta barrel fold of the proteins has no obvious subdomains for recombination, non-contiguous SCHEMA recombination generated a functional chimera that takes approximately half of its structure from each parent.

Introduction

To design libraries of chimeric proteins, we used structural information to select crossover sites that minimize the average number of non-native residue-residue contacts in the resulting chimeras [6]. However, sequence blocks that are contiguous in the primary structure are not necessarily optimal elements for recombination [11]. Because elements that are distant in the primary structure are often brought together in the folded protein, structural blocks may not be contiguous in the polypeptide chain.
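The quantity being minimized can be illustrated with a minimal sketch of a SCHEMA-style disruption count. It follows the commonly used definition, in which a contact counts as non-native when the residue pair it joins in the chimera is not seen at those two positions in any parent. The function names, the 4-residue toy parents, and the contact list are all hypothetical and are not taken from the thesis software.

```python
def schema_energy(chimera, parents, contacts):
    """Count contacts whose residue pair is absent from every parent.

    `contacts` holds 0-indexed pairs (i, j) of alignment positions that are
    in contact in the parent structure.
    """
    energy = 0
    for i, j in contacts:
        pair = (chimera[i], chimera[j])
        if all((parent[i], parent[j]) != pair for parent in parents):
            energy += 1
    return energy


def mutation_count(chimera, parents):
    """Number of substitutions relative to the closest parent."""
    return min(sum(a != b for a, b in zip(chimera, parent)) for parent in parents)


# Toy data (invented): two aligned parents and three contacts.
toy_parents = ["AKDE", "GRDL"]
toy_contacts = [(0, 1), (1, 3), (2, 3)]

# Positions 0 and 2 come from the first parent, positions 1 and 3 from the second.
toy_chimera = "ARDL"
print(schema_energy(toy_chimera, toy_parents, toy_contacts),
      mutation_count(toy_chimera, toy_parents))
```

In this toy case only the contact between positions 0 and 1 is non-native: position 2 is conserved, so its contact with position 3 survives recombination.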

Results

Chimera NcrBgl has the TIM barrel fold and catalytic residues E170 and E374 (numbering based on the alignment shown in Figure 4.2a) of the parent enzymes. Thus, for NcrBgl, we combined the structures of the TrBgl2 block and the TmBglA block to predict the structure of NcrBgl. Our ability to predict finer structural features is limited by the current low resolution of the chimera structure.

Discussion

We tested whether we could model the structure of the chimera by combining the parent structures of the chimera blocks, using an alignment of the parent structures to position each block. This approach does not rely on detailed atomistic models of the parent and progeny proteins. Alternatively, structures of the chimeric proteins can provide detailed and accurate information about the structures of the parent proteins.

Materials and methods

The hmetis graph partition suite [14, 15] was used to find 2-way partitions of the SCHEMA contact map; these partitions gave designs for 2-block chimeragenesis of TmBglA and TrBgl2. Fractions containing the NcrBgl protein were buffer exchanged to 20 mM Tris, pH 7.4 and loaded onto a 5 mL HiTrap Q HP column (GE Healthcare). A homology model of the NcrBgl was constructed in MODELLER [27] using 2WBG.pdb, chain A and 3AHY.pdb, chain C.
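The idea of a 2-way partition of a contact map can be sketched on a toy graph. This is not hmetis and does not use its algorithm or interface: the sketch searches exhaustively, which only works for a handful of nodes, and it treats every edge as one contact that is disrupted if its two residues land in different blocks. The function name, the six-residue graph, and the minimum block size are all assumptions made for illustration.

```python
from itertools import combinations


def best_bipartition(n_nodes, edges, min_size):
    """Exhaustively find the 2-way partition that cuts the fewest edges.

    Returns (cut_size, block_a, block_b). Each block must contain at least
    `min_size` nodes so that the trivial empty partition is excluded.
    """
    best = None
    nodes = range(n_nodes)
    for size in range(min_size, n_nodes - min_size + 1):
        for block_a in combinations(nodes, size):
            if 0 not in block_a:
                # Keep node 0 in block A to skip mirror-image duplicates.
                continue
            in_a = set(block_a)
            cut = sum((u in in_a) != (v in in_a) for u, v in edges)
            if best is None or cut < best[0]:
                block_b = tuple(n for n in nodes if n not in in_a)
                best = (cut, block_a, block_b)
    return best


# Toy contact graph (invented): residues 0, 1, 4 contact each other, as do
# residues 2, 3, 5, and a single contact (1, 2) links the two groups.
toy_edges = [(0, 1), (1, 4), (0, 4), (2, 3), (3, 5), (2, 5), (1, 2)]
print(best_bipartition(6, toy_edges, min_size=2))
```

The best partition groups residues 0, 1 and 4 together even though they are not adjacent in the sequence, which is the sense in which the resulting blocks are non-contiguous.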

Figures

Most of the SCHEMA contacts are within both blocks and are therefore not disturbed upon recombination. For greater clarity, conserved residues have been assigned to one of the two blocks based on structural proximity. Structural alignment of TmBglA 2WBG.pdb and TrBgl2 3AHY.pdb (RMSD = 3.34 Å) shows significant variation between these two homologues.

Figure 4.2: β-glucosidase non-contiguous chimera design chosen for construction. A) Numbered sequence alignment of the eukaryotic (top) and prokaryotic (bottom) β-glucosidases.

(2003) Iminosugar glycosidase inhibitors: structural and thermodynamic dissection of isofagomine and 1-deoxynojirimycin binding to β-glucosidases.
Molecular cloning and expression of novel fungal β-glucosidase genes from Humicola grisea and Trichoderma reesei.
(2011) Structural and functional analysis of three β-glucosidases from the bacterium Clostridium cellulovorans, the fungus Trichoderma reesei and the termite Neotermes koshunensis.
(1997) MOLREP: an automated molecular replacement program.
(1997) Refinement of macromolecular structures by the method of maximum likelihood.

Supplementary information

Summary

Introduction

We previously designed a library very similar to this one (Smith et al. in preparation) and identified several stabilizing sequence elements. NCR-designed libraries can exhibit significantly less disruption than RASPP (contiguous) designs from the same parent sequences. We recommend analysis of NCR-designed libraries by creating an informative sample set of genes and using them to build predictive models, as we have done for RASPP-designed libraries [4].

Materials

This alignment must be in FASTA format (see Note 6) and the file must be named 'alignment.fasta'. The available crystal structures are for the catalytic domain, therefore we considered this domain for recombination only (see Note 7). If no structure is specified, the NCR tools can also search for suitable structures in the PDB database (see Note 10).
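Before running the design tools, it can help to confirm that the alignment file is well formed. The sketch below is a stand-alone check and is not part of the NCR tools: the function name `read_fasta_alignment` and the toy sequences are hypothetical, and the only assumptions are standard FASTA conventions (a `>` header line followed by one or more sequence lines) and that aligned sequences share one length.

```python
def read_fasta_alignment(text):
    """Parse FASTA text into {identifier: aligned sequence}.

    Raises ValueError if the input is malformed or if the aligned sequences
    do not all have the same length.
    """
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            parts = line[1:].split()
            if not parts:
                raise ValueError("header line without an identifier")
            name = parts[0]  # identifier is the first word after '>'
            if name in records:
                raise ValueError(f"duplicate identifier: {name}")
            records[name] = ""
        elif name is None:
            raise ValueError("sequence data before the first '>' header")
        else:
            records[name] += line  # sequences may span several lines
    if len({len(seq) for seq in records.values()}) > 1:
        raise ValueError("aligned sequences must all have the same length")
    return records


# Toy alignment (invented sequences); '-' marks a gap.
toy_fasta = """>parentA
ACD-EF
GH
>parentB
ACDKEF
-H
"""
print(read_fasta_alignment(toy_fasta))
```

To check a real file, the same function can be applied to the contents of 'alignment.fasta', for example `read_fasta_alignment(open("alignment.fasta").read())`.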

Methods

This generates a list of all chimeras in the selected library along with their SCHEMA energies, number of mutations, and sequences (see Note 18). This list is saved as the text file 'chimeras.output' in the directory 'ncr/picked libraries/library12 2'. Before expressing CBH1 chimeras, we add a binding and cellulose binding domain to the recombinant catalytic domains.

Notes

However, in larger libraries, desirable chimeras are more difficult to find, and increasing the number of blocks increases the <E> of the library. The Python script 'ncr.py' generates one or more parent contact maps, computes SCHEMA contacts, and searches for low-<E> libraries. In the terminal window, NCR lists <E> and <m> for each library and the distribution of mutations among the 12 blocks.
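The library-level averages <E> and <m> can be sketched by enumerating every chimera of a small design. This is not the code in 'ncr.py': the function name, the toy parents, and the contact list are hypothetical, and the sketch assumes that the averages run over all block combinations, parents included. Blocks are given as one label per alignment position, so they do not need to be contiguous.

```python
from itertools import product


def library_averages(parents, contacts, block_of):
    """Return (<E>, <m>) averaged over every chimera in the library.

    `block_of[i]` is the block label of alignment position i, and `contacts`
    holds 0-indexed position pairs that are in contact in the structure.
    """
    blocks = sorted(set(block_of))
    total_e = total_m = n_chimeras = 0
    for choice in product(range(len(parents)), repeat=len(blocks)):
        source = dict(zip(blocks, choice))  # block label -> parent index
        chimera = "".join(parents[source[b]][i] for i, b in enumerate(block_of))
        # A contact is disrupted if its residue pair occurs in no parent.
        total_e += sum(
            all((p[i], p[j]) != (chimera[i], chimera[j]) for p in parents)
            for i, j in contacts
        )
        # Mutations are counted relative to the closest parent.
        total_m += min(sum(a != b for a, b in zip(chimera, p)) for p in parents)
        n_chimeras += 1
    return total_e / n_chimeras, total_m / n_chimeras


# Toy design (invented): positions 0 and 2 form one block, positions 1 and 3 the other.
toy_parents = ["AKDE", "GRDL"]
toy_contacts = [(0, 1), (1, 3), (2, 3)]
print(library_averages(toy_parents, toy_contacts, block_of=[0, 1, 0, 1]))
```

Because the number of chimeras grows exponentially with the number of blocks, exhaustive enumeration like this is only practical for small libraries.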

Figures

The alignment of multiple sequences of the parent CBH1s, with each of the 12 blocks highlighted in a different color.

Figure 5.2: Visualizing the chosen NCR design. A) The multiple sequence alignment of the parent CBH1s with each of the 12 blocks highlighted in a different color
  • Abstract
  • Introduction
  • Results
  • Discussion
  • Materials and methods
  • Figures
  • References
  • Supplementary information

While almost all of the blocks are continuous pieces of structure (Figure 6.1B), they each consist of many fragments of the polypeptide chain (Figure 6.1C). We constructed predictive models of TR50 and TA50 based on the sequences of the 32 functional chimeras and three parent cellulases (see Materials and Methods). As shown in Figure 6.3A, the TR50 model accurately predicts the stability of the library sample (r² = 0.81).
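One common way to build a linear model of this kind is to give each block from each parent an additive contribution and fit those contributions by least squares. The sketch below assumes that form; the exact model and regression procedure used in the thesis are described in its Materials and Methods, not here. It uses NumPy, the function names are hypothetical, and the stability values are invented purely for illustration and are not measurements from the thesis.

```python
from itertools import product

import numpy as np


def one_hot(chimera, n_parents):
    """Intercept plus one indicator per (block, non-reference parent).

    `chimera` is a tuple of 0-based parent indices, one per block; parent 0
    is the reference whose blocks contribute nothing beyond the intercept.
    """
    row = [1.0]
    for parent in chimera:
        row.extend(1.0 if parent == p else 0.0 for p in range(1, n_parents))
    return row


def fit_block_model(chimeras, t50s, n_parents):
    """Least-squares fit of block contributions; returns (coefficients, r_squared)."""
    X = np.array([one_hot(c, n_parents) for c in chimeras])
    y = np.array(t50s, dtype=float)
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    predicted = X @ coef
    ss_res = float(np.sum((y - predicted) ** 2))
    ss_tot = float(np.sum((y - y.mean()) ** 2))
    return coef, 1.0 - ss_res / ss_tot


# Invented example: 2 parents, 3 blocks, noise-free additive stabilities.
toy_chimeras = list(product(range(2), repeat=3))
toy_t50s = [60.0 + 2.0 * a - 1.0 * b + 3.0 * c for a, b, c in toy_chimeras]
coefficients, r_squared = fit_block_model(toy_chimeras, toy_t50s, n_parents=2)
print(coefficients, r_squared)
```

With real, noisy measurements the fit would not be exact, and a model trained on a sample of the library would normally be checked on chimeras that were held out of the fit.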

A graph of the elevated temperature at which an enzyme loses half its activity (TA50) versus the incubation temperature at which an enzyme loses half its (unincubated) activity (TR50). A linear thermostability model trained on the TR50s of the chimeras accurately predicts the measured values (r² = 0.81).

Figure 6.1: Non-contiguous recombination library design. A) A graph view of the blue block and neighboring residues

Figures

Figure 1.1: Site-directed homologous recombination. Two or more proteins are fragmented into well-defined pieces
Figure 1.2: Interfacial mutations stabilize a chimera made from fragments of different folds.
Figure 1.3: Different protein fragments can be responsible for different protein functions.
Figure 1.4: A 1-dimensional example of a protein fitness landscape, predicted from experimental data using Gaussian processes
