
CHAPTER 2 Materials and methods

2.3 Bioinformatics methods

This section describes the bioinformatics tools used in this study.

2.3.1 Sequence identification and analysis

Copper transport protein

To establish a parasite requirement for copper, the P. falciparum proteome (www.plasmodb.org) was searched using BLASTp for orthologues of copper-requiring proteins (Table 3.1).

The identification of a number of copper-requiring enzymes within the P. falciparum proteome suggested that the parasite needs copper for metabolic function. Sequences for a putative Plasmodium copper transport protein were similarly identified by a Basic Local Alignment Search Tool search for proteins (BLASTp) of PlasmoDB, using the Theileria parva (Muguga stock) polymorphic immunodominant molecule (PIM) (GenBank: AAA99499) as the query. Putative copper transport protein sequences were identified for eight Plasmodium species. Each sequence was retrieved and saved as a text file in FASTA format. To support sequence identity, each retrieved Plasmodium spp. sequence was aligned with sequences of characterised copper transport proteins using the ClustalW™ server (Thompson et al., 1994). The characterised sequences included were the Homo sapiens (Accession No. NP_001850), Arabidopsis thaliana (GenBank: BAE98928) and Saccharomyces cerevisiae (GenBank: AAB68064) copper transport proteins. To identify the presence of three characteristic transmembrane domains in the retrieved Plasmodium spp. sequences, each was submitted to both the HMMTOP (Tusnády and Simon, 2001) and TMHMM 2.0 (Krogh et al., 2001) topology prediction servers. The presence of potential signal sequences in each protein was established using the TMHMM server as well as the SignalP 3.0 program (Center for Biological Sequence Analysis, Lyngby, Denmark). The presence of parasite-specific targeting signals was determined using the PlasmoAP and PlasmoMit servers found at PlasmoDB as well as the PSEApred2 server for PEXEL motif prediction. The genome organisation of the two P. falciparum copper transport protein coding domains was constructed from information retrieved from PlasmoDB and the nucleotide database of the National Center for Biotechnology Information (NCBI). P. falciparum expression data for lactate dehydrogenase, S-adenosylhomocysteine hydrolase, Cox11 and cytochrome-c oxidase subunit I were retrieved from PlasmoDB.
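The retrieved sequences were stored as FASTA text files before alignment and topology prediction. As an illustration only, the following Python sketch shows one way such a file could be read back into memory. The function name parse_fasta and the two example records are hypothetical and are not part of the original workflow; the sequences shown are made up.

```python
from typing import Dict


def parse_fasta(text: str) -> Dict[str, str]:
    """Parse FASTA-formatted text into a {header: sequence} dictionary."""
    records: Dict[str, str] = {}
    header = None
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A '>' line starts a new record; drop the '>' marker
            header = line[1:].strip()
            records[header] = ""
        elif header is not None:
            # Sequence lines are concatenated until the next header
            records[header] += line.upper()
    return records


# Hypothetical two-record file; the sequences are invented for illustration
example = """>seq_A hypothetical
MKTAY
IAKQR
>seq_B hypothetical
MNDDG
"""

records = parse_fasta(example)
for name, seq in records.items():
    print(name, len(seq))
```

Holding the records in a dictionary keyed by header keeps each sequence attached to its identifier when it is later submitted to an alignment or prediction server.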

Cox17

The sequence of a putative Plasmodium spp. Cox17 copper chaperone was identified from the BLASTp search of PlasmoDB for copper-requiring protein orthologues. The specific sequence used to search PlasmoDB was the H. sapiens Cox17 sequence (GenBank: AAA98114). As with the putative Plasmodium spp. copper transport protein, a putative Cox17 sequence was identified in the same eight species of the parasite. To identify characteristic Cox17 features, the retrieved sequences were aligned with characterised Cox17 sequences using the ClustalW™ server (Thompson et al., 1994). The characterised sequences included were the H. sapiens (GenBank: AAA98114), A. thaliana (Accession No. NP_566508) and S. cerevisiae (GenBank: CAA97453) Cox17 proteins. This alignment revealed the conservation of six cysteine residues known to be essential for Cox17 function. As with the copper transport proteins, the presence of potential signal sequences in each protein was established using the TMHMM server as well as the SignalP 3.0 server (Center for Biological Sequence Analysis, Lyngby, Denmark). The presence of parasite-specific targeting signals was determined using the PlasmoAP and PlasmoMit servers found at PlasmoDB as well as the PSEApred2 server for PEXEL motif prediction. The genome organisation of the P. falciparum Cox17 protein coding domain was constructed from information retrieved from PlasmoDB and the nucleotide database of the National Center for Biotechnology Information (NCBI).
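Conservation of cysteine residues across an alignment, as described above for Cox17, can be checked programmatically. The sketch below is a minimal illustration that assumes the alignment is available as equal-length gapped strings. The function name conserved_columns and the toy alignment are hypothetical; the toy sequences are not the Cox17 sequences used in this study.

```python
from typing import List


def conserved_columns(aligned: List[str], residue: str = "C") -> List[int]:
    """Return 0-based alignment columns where every sequence has `residue`."""
    if not aligned:
        return []
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("aligned sequences must all have the same length")
    return [
        col
        for col in range(length)
        if all(seq[col] == residue for seq in aligned)
    ]


# Toy alignment invented for illustration ('-' marks an alignment gap)
toy_alignment = [
    "MC-AKCCPE",
    "MCGA-CCPD",
    "-CGSKCCAE",
]

cysteine_columns = conserved_columns(toy_alignment)
print(cysteine_columns)
```

A column counts as conserved only when no sequence carries a gap or a different residue at that position, which is the strictest reading of conservation.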

2.3.2 Homology modelling

A homology model of the putative P. falciparum Cox17 metallochaperone (PF10_0252) was constructed using the NMR-solved structures of Homo sapiens Cox17, with and without copper bound [Protein Data Bank (PDB) IDs: 2RN8 (plus Cu⁺) and 2RN9 (minus Cu⁺)] (Banci et al., 2008b). Each of these structures was used individually to produce a PfCox17 model using the Swiss-pdb DeepView program (Guex and Peitsch, 1997). Once the desired HsCox17 structure was loaded into the DeepView program, the PfCox17 sequence was loaded for subsequent mapping. This was achieved by aligning the PfCox17 sequence to the HsCox17 sequence (AAA98114) with the assistance of a ClustalW™ alignment (Thompson et al., 1994). Sequence insertions in the alignment resulted in unsolved gaps in the hypothetical PfCox17 structure. These were accounted for by inserting a fold that placed the least structural strain on the protein model. The predicted folds are derived from solved protein structures having similar amino acid sequence arrangements. The final hypothetical protein model was checked for amino acid side chain or backbone clashes and corrected as necessary.

2.3.3 Phylogenetic tree construction

For phylogenetic analysis of the retrieved sequences, the same approach was adopted for both the putative Plasmodium spp. copper transport protein and Cox17 sequences. The eight Plasmodium sequences, for each respective protein, were aligned with the relevant sequences from the organisms listed in Table 2.1. Sequences were aligned using the ClustalW™ server (Thompson et al., 1994) and a phylogenetic tree constructed from this alignment. The resulting tree diagram was copied as a JPEG image and edited, if necessary, with the GIMP™ image editor software.

Table 2.1 Accession numbers of sequences used for copper transport protein and Cox17 phylogenetic tree construction

Organism                    Accession number
                            Copper transport protein    Cox17

Homo sapiens                NP_001850                   AAA98114
Mus musculus                NP_780299                   BAB32486
Rattus norvegicus           NP_598284                   NP_445992
Danio rerio                 NP_991280                   NP_001004652
Arabidopsis thaliana        BAE98928                    NP_566508
Saccharomyces cerevisiae    AAB68064                    CAA97453
Theileria parva             AAA99499                    EAN30678
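Distance-based tree construction, as performed by ClustalW, starts from pairwise distances calculated on the multiple alignment. The following Python sketch illustrates the idea with a simple uncorrected p-distance (the fraction of differing residues at positions where neither sequence has a gap). It is a simplified stand-in and does not reproduce the exact ClustalW calculation; the function names and the toy alignment are hypothetical.

```python
from itertools import combinations
from typing import Dict, Tuple


def p_distance(seq_a: str, seq_b: str) -> float:
    """Fraction of differing residues over ungapped aligned positions."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    differing = 0
    for res_a, res_b in zip(seq_a, seq_b):
        if res_a == "-" or res_b == "-":
            continue  # ignore positions with a gap in either sequence
        compared += 1
        if res_a != res_b:
            differing += 1
    if compared == 0:
        raise ValueError("no ungapped positions to compare")
    return differing / compared


def distance_matrix(aligned: Dict[str, str]) -> Dict[Tuple[str, str], float]:
    """Pairwise p-distances for every pair of names, in sorted name order."""
    return {
        (name_a, name_b): p_distance(aligned[name_a], aligned[name_b])
        for name_a, name_b in combinations(sorted(aligned), 2)
    }


# Toy alignment invented for illustration; not sequences from Table 2.1
toy = {
    "sp1": "MKCA-LCE",
    "sp2": "MKCAGLCD",
    "sp3": "MRCS-ICE",
}

distances = distance_matrix(toy)
for pair, dist in distances.items():
    print(pair, round(dist, 3))
```

A tree-building method such as neighbour-joining would then group the sequences with the smallest distances first.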

2.3.4 Predict7™ antigenic peptide prediction

The production of synthetic peptide antigens for antibody production relies on the precise prediction of antigenic sites on a protein molecule (Saravanan and Kumar, 2009). Immunogenic regions on the proteins encoded by PY00413, PY03823, PF10_0252, PF14_0211 and PF14_0369 were identified using the Predict7™ programme (Cármenes et al., 1989). The programme calculates various features of an amino acid sequence, namely antigenicity (Welling et al., 1985), hydrophilicity (Hopp and Woods, 1983), flexibility (Karplus and Schulz, 1985), surface probability (Emini et al., 1985), hydropathy, secondary structure and N-glycosylation sites. For this study, the respective protein sequences (saved as text files) were uploaded into Predict7™ and regions satisfying the more important criteria of hydrophilicity, surface probability and side chain flexibility were selected (Saravanan and Kumar, 2009). Furthermore, a selected peptide was required to contain no more than two lysine residues, since multiple lysines can interfere with peptide-specific antibody production due to their long side chains. For the copper transport proteins, peptide sequences were specifically chosen from the amino-terminal domain due to its extracellular location, which presumably makes it more accessible to antibodies. The prediction file for each protein was saved as a .xls file and the three desirable properties were plotted using Microsoft Excel™ graphing software. Protein-protein BLAST searches were carried out on the selected peptides to ensure that each sequence was absent from all G. gallus (antibody production host), H. sapiens (P. falciparum host) and M. musculus (P. yoelii host) native proteins. Where a peptide sequence lacked a native cysteine at its amino-terminus, a cysteine residue was added at this position. The cysteine residue enables conjugation to a carrier molecule and to SulfoLink® affinity resin. The specific peptides selected for antibody production are shown in Table 2.2.

Table 2.2 Peptide sequences selected for anti-peptide antibody production in chickens

Protein name                                PlasmoDB identifier   Peptide sequence^a

P. yoelii copper transport protein          PY00413               CSDKQSGDDECKPILD
P. yoelii Cox17                             PY03823               CPLNTTEESKTα-Bu(C)A^b
P. falciparum copper transport protein I    PF14_0211             CHSKNDDGVMLPMY
P. falciparum copper transport protein II   PF14_0369             CNLQKEEDTVVQLQD
P. falciparum Cox17                         PF10_0252             CPINNTNEANKGE

a Cysteines represented by a bold 'C' are indicative of residues added for coupling, whilst underlined three letter codes indicate the name by which the peptide is referred to

b α-Bu is the abbreviation for α-butyric acid, which is used to substitute any internal cysteine residues
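Two of the peptide selection rules described in section 2.3.4, the limit of two lysine residues and the addition of an amino-terminal cysteine where none is present, are simple enough to express in code. The Python sketch below is an illustration under stated assumptions and was not part of the original workflow. The function names are hypothetical, the input strings are examples only (the second is invented to show a rejection), and the sketch handles plain one-letter sequences, not modified residues such as α-Bu.

```python
from typing import List

MAX_LYSINES = 2  # selection rule from the text: no more than two lysines


def lysine_count(peptide: str) -> int:
    """Count lysine (K) residues in a one-letter peptide sequence."""
    return peptide.upper().count("K")


def passes_lysine_rule(peptide: str, max_lysines: int = MAX_LYSINES) -> bool:
    """True if the peptide contains at most `max_lysines` lysines."""
    return lysine_count(peptide) <= max_lysines


def with_n_terminal_cysteine(peptide: str) -> str:
    """Prepend a cysteine unless the peptide already starts with one."""
    peptide = peptide.upper()
    return peptide if peptide.startswith("C") else "C" + peptide


# Example inputs for illustration; 'KKAKDEEST' is an invented sequence
candidates: List[str] = ["HSKNDDGVMLPMY", "KKAKDEEST", "CPINNTNEANKGE"]

selected = [
    with_n_terminal_cysteine(peptide)
    for peptide in candidates
    if passes_lysine_rule(peptide)
]
print(selected)
```

The remaining criteria (hydrophilicity, surface probability, flexibility and absence from host proteomes) depend on Predict7™ output and BLAST searches and are not covered by this sketch.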
