• Tidak ada hasil yang ditemukan

Bioinformatic methods

CHAPTER 3 Bioinformatic studies

3.2 Bioinformatic methods

CHAPTER 3

The amino acid sequence of Pca 96 was used as a query sequence in a BLASTP (protein to protein database) and TBLASTN (protein to translated nucleic acid database) search at PlasmoDB (http://www.plasmodb.org/).

The nucleotide sequences from the TBLASTN search that did not have protein sequences available, were downloaded from PlasmoDB (http://www.plasmodb.org/) and translated into amino acid sequences at Expasy (http://www.expasy.org/tools/dna.html). The amino acid sequence of Pca 96 was aligned with the amino acid sequences and translated amino acid sequences producing matches with the BLAST searches. The sequence alignment was performed using the program CLUSTALW (Chenna et al., 2003), at the European Bioinformatics Institute (EBI) website (http://www.ebi.ac.uk/clustalw/).

3.2.2 Expression profiles

High-density oligonucleotide arrays were used to create mRNA expression profiles of genes for all of the stages of the P. falciparum life cycle (Le Roch et al, 2003). In the present study, the data from Le Roch et al. (2003) was used to construct graphs of the mRNA expression levels for the gene encoding the P. falciparum protein PFC0760c. This was done to determine at what stage of the parasite life cycle the expression increased, as this could possibly provide insight into the role PFC0760c has within the parasite life cycle, as it is assumed that the mRNA expression levels could possibly reflect the protein expression level, although it is understood that this may not necessarily always be the case. P. falciparum lactate dehydrogenase (PfLDH) (PF13_0141) was used as a comparison for levels of mRNA expression.

3.2.3 Plasmodium export element (PEXEL) motif searches and signal sequence predictions

Secreted Plasmodium protein families such as P. falciparum erythrocyte membrane protein (PfEMP1), repetitive interspersed family (RIFIN) and subtelomeric variable open reading frame (STEVOR) are exported out of the parasitophorous vacuole and into the host erythrocyte, where they alter the host red blood cell to maximize the parasites survival.

Exported proteins generally have a sequence located at the N-terminus that directs the protein to the endoplasmic reticulum (Wickham et al., 2001; Haldar et al., 2002). The proteins are

then transported through the parasite membrane via the golgi apparatus, to the parasitophorous vacuole. In order for the proteins to be transported from the parasitophorous vacuole into the host red blood cell, it appears necessary for the proteins to have both a signal sequence and a motif that targets the protein to the host red blood cell. The motif, the Plasmodium export element (PEXEL) (Marti et al., 2004) or the vacuolar transport signal (VTS) (Hiller et al., 2004) as it is also known, consists of a short sequence of alternating charged, hydrophobic amino acids separated by non-charged amino acid residues (Marti et al., 2005). The sequence R/KxLxQ/E is typically located 15-20 amino acids downstream of the signal sequence (Marti et al., 2005). The presence of a PEXEL motif in a protein sequence can aid with predictions of where a protein is located within a cell and can essentially aid with predictions of the proteins function. All of the different combinations of the motif R/KxLxQ/E were searched for in the amino acid sequences of PFC0760c and the P. yoelii yoelii sequence PY05757 which was found to have high sequence similarity to PFC0760c and Pca 96 (Section 3.2.1).

The programs SIG-Pred (http://bmbpcu36.leeds.ac.uk/prot_analysis/Signal.html), and Sigfind 2.1 (http://www.stepc.gr/~synaptic/sigfind.html) were used to determine if the sequences possessed hydrophobic N-terminals signal sequences.

3.2.4 Transmembrane domain prediction

Transmembrane proteins are proteins having one or more sections of their structure crossing a biological membrane. These sections have amino acids that are predominantly hydrophobic in nature, suiting the hydrophobic environment of a lipid membrane bi-layer. These sections are composed of twenty or more amino acid residues (Westhead et al., 2002). The sections of the protein that are on either side of the membrane tend to be hydrophilic due to the aqueous environment. The membrane spanning regions of proteins often adopt the structure of an alpha helix.

One method of predicting membrane-spanning regions within proteins is with the program TMHMM. It is a program based on a hidden Markov model and is able to predict the presence of a transmembrane domain and the orientation, with respect to which sections of the protein are inside the membrane, and which are outside the membrane.

The amino acid sequences of Pca 96, PFC0760c and PY05757 were assessed for membrane spanning regions using TMHMM (http://www.cbs.dtu.dk/Services/TMHMM/).

3.2.5 Subcellular localization predictions

Prediction of a proteins sub cellular localisation can give insight to the proteins role in the cell. Due to so little being known about Pca 96, PFC0760c and PY05757, it was decided attempt to predict their subcellular location, using the web-based programs SubLoc and pTARGET.

3.2.5.1 pTARGET subcellular localization prediction

pTARGET (http://bioapps.rit.albany.edu/pTARGET/) has the ability to predict the subcellualr location of eukaryotic proteins. The program predicts proteins targeted to the cytoplasm, endoplasmic reticulum, golgi, lysozyme, mitochondria, nucleus, plasma membrane, peroxisome and whether they are secreted extracellularly (Guda, 2006). The prediction algorithm calculates two scores which are then summed to give the final prediction of the proteins subcellular localization. One score is based on the relative amino acid weights which are calculated from the amino acid composition of the protein sequence. The other score is based on the presence or absence of Pfam domains that are in a specific location within the protein sequence (Guda and Subramaniam, 2005). pTARGET has been shown to accurately predict protein subcellular localization to 96-99% (Guda and Subramaniam, 2006).

A confidence level is given with each of the predictions which indicate the trustworthiness of that prediction. The confidence value is expressed as a percentage of the ratio of the calculated score to the total score required for making a true prediction.

3.2.5.2 Prediction of Subcellular Localization by SubLoc

SubLocv1.0 (http://www.bioinfo.tsinghua.edu.cn/SubLoc/) has the ability to predict the subcellular localization of prokaryotic and eukaryotic proteins, based on their amino acid composition (Hua and Sun, 2001). For eukaryotes, SubLoc can predict proteins targeted to the nucleus, cytoplasm, mitochondria and extra cellular matrix. SubLoc has a prediction accuracy of 79.4% for eukaryotic proteins, and 91.4% for prokaryotic proteins (Hua and Sun, 2001).

3.2.6 Conserved domains

Conserved domains are compact structural or functional parts of a protein that have been used and are still being used throughout the process of evolution (Wheeler et al., 2001).

Searching for conserved domains in a protein of unknown structure and function could give insight into possible functions or structures of the protein (Wheeler et al., 2001). The conserved domain search (CD-Search) at the National Centre for Biotechnology Information (NCBI) (http://www.ncbi.nlm.nih.gov/Structure/cdd/wrpsb.cgi), allows conserved domains to be searched for on a protein query. CD-Search uses the Conserved Domain Database (CDD).

The CDD contains data from Simple Modular Architecture Research Tool (SMART), Protein families (Pfam) and Clusters of Orthologous Groups of proteins (COGs) databases.

3.2.7 Helical wheel plot

A helical plot shows an end-on view of an alpha helix. The angle of rotation between adjacent amino acid residues is 100˚. In the arrangement of alpha helices, the hydrophobic amino acid residues are typically located on one side of the alpha helix (Mount, 2001). The conserved peptide sequence was used to plot a helical wheel plot using the programs Helixator (http://www.tcdb.org/progs/helical_wheel.php) and EMBOSS Pepwheel (http://www.tcdb.org/progs/pepwheel.php). The plot was performed to deduce if the sequence could possibly be an alpha helix. The programs utilised use different algorithms and thus will provide greater confidence in the results if both algorithms provide a similar result.

3.2.8 Structure prediction

A protein with a known 3 dimensional structure, having significant sequence similarity to Pca 96, PFC0760c or PY05757 was searched for using a sequence search at the Protein Data Bank (PDB) (http://pdb.org/pdb/search/advSearch.do?st=SequenceQuery). A significant sequence similarity of 40-50% would allow the amino acid of the query sequence to be superimposed onto the PDB entry, resulting in a sufficiently accurate model of the proteins structure (Mount, 2001; Lesk, 2002). In addition, the sequences of the proteins were used as query sequences with the program Phyre (www.sbg.bio.ic.ac.uk/phyre/) (Kelley and Sternberg, 2009), to determine which protein folds may be present in the proteins.

3.2.9 T-cell epitope prediction

SYFPEITHI (Rammensee et al., 1999) is a database allowing for the prediction of T- cell epitopes. The database consists of peptide sequences known to bind MHC class I and II molecules. The SYFPEITHI program (http://www.syfpeithi.de/) was used to determine if T- cell epitopes were present in the Pca 96 protein sequence. The epitopes predicted for Pca 96 were searched for in the amino acid sequence of PFC0760c.

3.2.10 Protein interaction

A version of the yeast-two hybrid assay has been used to determine P falciparum protein-protein interactions (LaCount et al., 2005). The plasmoMAP project at PlasmoDB uses the query gene entered to determine functional interactions with other P. falciparum genes.

Each gene determined to have a functional interaction with the query gene is given a likelihood score. The higher the likelihood score the greater the confidence of the functional interaction. A likelihood score of 10 or above is a fairly good indication of the functional interaction being true. A second program called PlasmoPredict was used for predicting protein

interactions for PFC0760c

(www.bioinformatics.leeds.ac.uk/~bio5pmrt/PlasmoPredict/PlasmoPredict.html).