
University of the Philippines Manila

Academic year: 2023

La-anan in partial fulfillment of the requirements for the bachelor's degree in computer science has been examined and is recommended for acceptance. Accepted and approved as partial fulfillment of the requirements for the bachelor's degree in computer science. STRIDE Protein Topology Cartoon Generator and Database (SPTCGaD) is a system that converts the 3D representation of a protein domain into its 2D form using the knowledge-based STRIDE (STRuctural IDEntification) algorithm.

INTRODUCTION

  • Background of the Study
  • Statement of the Problem
  • Objectives of the Study
  • Significance of the Study
  • Scope and Limitations

In the existing version of SPTCGaD, some functionalities for the three types of users are incomplete. The main purpose of the STRIDE Protein Topology Cartoon Generator and Database (SPTCGaD) system is to provide protein topology information by parsing uploaded PDB files with STRIDE's knowledge-based algorithm (STRuctural IDEntification) and using the result to generate 2D representations of protein folds based on the CATH classification [11]. Also, the PDB ID, rather than the entire CATH code, will be used in the system to identify a protein fold.

Figure 2. Conversion of 3D to 2D representation of a protein domain

REVIEW OF RELATED LITERATURE

STRIDE describes the secondary structure of proteins more accurately than other algorithms [2]. Algorithms such as Define Secondary Structure of Proteins (DSSP) use a single hydrogen-bond energy expression to assign one of eight secondary structure element (SSE) states from three-dimensional coordinates, whereas STRIDE combines hydrogen-bond energies with backbone torsion-angle information. Topology diagrams consist of a sequence of SSEs and two sets of relationships defined between pairs of SSEs: hydrogen bonding and chirality. The purpose of these cartoons is to make protein structure easier to interpret by showing the secondary and super-secondary structure clearly in two dimensions.
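To make the contrast concrete, DSSP's single hydrogen-bond energy expression is E = q1·q2·f·(1/r(ON) + 1/r(CH) - 1/r(OH) - 1/r(CN)), with q1 = 0.42e, q2 = 0.20e and f = 332 kcal·Å/mol, and a bond is counted when E < -0.5 kcal/mol. The Python sketch below evaluates that expression. It is not part of SPTCGaD, and the coordinates in the usage lines are an idealized, made-up linear N-H···O=C geometry rather than data from any structure.

```python
import math

# Constants of the DSSP electrostatic hydrogen-bond model
Q1 = 0.42            # partial charge on the carbonyl C=O (units of e)
Q2 = 0.20            # partial charge on the amide N-H (units of e)
F = 332.0            # dimensional factor (kcal * Angstrom / mol)
HBOND_CUTOFF = -0.5  # kcal/mol; more negative energies count as a bond

def dssp_hbond_energy(o, c, n, h):
    """Energy (kcal/mol) between the C=O of one residue and the N-H of another.
    Each argument is an (x, y, z) coordinate in Angstroms."""
    r_on = math.dist(o, n)
    r_ch = math.dist(c, h)
    r_oh = math.dist(o, h)
    r_cn = math.dist(c, n)
    return Q1 * Q2 * F * (1 / r_on + 1 / r_ch - 1 / r_oh - 1 / r_cn)

def is_hbond(o, c, n, h):
    """True when the DSSP energy falls below the -0.5 kcal/mol cutoff."""
    return dssp_hbond_energy(o, c, n, h) < HBOND_CUTOFF

if __name__ == "__main__":
    # Idealized linear N-H...O=C arrangement along the x axis (illustrative only)
    n, h = (0.0, 0.0, 0.0), (1.0, 0.0, 0.0)
    o, c = (2.9, 0.0, 0.0), (4.13, 0.0, 0.0)
    print(round(dssp_hbond_energy(o, c, n, h), 2), is_hbond(o, c, n, h))
```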

Figure 6. Topological diagram (left) and a cartoon diagram (right)

THEORETICAL FRAMEWORK

Protein Topology

Class, Architecture, Topology and Homologous Structure

For a structure classified in the database, CATH provides information about the structure and the known functions of that protein. Evolutionary relationships between the structure of interest and other proteins in the database can also be determined. Domain identifiers changed in the latest version of CATH because of the emergence of protein structures with more than nine domains, which require two-digit domain numbers.
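The four CATH levels are usually written as a dotted code, and the system works at the class and architecture levels. As a hedged illustration (the helper names are hypothetical and not taken from SPTCGaD), a code of the form 3.40.50.300 can be split into its levels like this:

```python
from typing import NamedTuple

# Top-level CATH classes (the four classes referred to in Table 2)
CATH_CLASSES = {
    1: "Mainly Alpha",
    2: "Mainly Beta",
    3: "Alpha Beta",
    4: "Few Secondary Structures",
}

class CathCode(NamedTuple):
    class_: int
    architecture: int
    topology: int
    homology: int

def parse_cath_code(code: str) -> CathCode:
    """Split a dotted CATH code such as '3.40.50.300' into its four levels."""
    parts = code.strip().split(".")
    if len(parts) != 4:
        raise ValueError(f"expected 4 dot-separated levels, got {code!r}")
    return CathCode(*(int(p) for p in parts))

def architecture_key(code: CathCode) -> str:
    """Class plus architecture, the level at which cartoons are grouped."""
    return f"{code.class_}.{code.architecture}"
```

For example, `architecture_key(parse_cath_code("3.40.50.300"))` yields `"3.40"`, and `CATH_CLASSES[3]` names its class.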

Table 2. Summary of the number of clusters within each of the four classes in CATH

Each protein structure is decomposed into one or more chains, which in turn are split into one or more domains before being classified into homologous superfamilies accordi

Structural Protein Motifs

Examples include the 3D representations of 1ipo, classified in CATH as mainly beta with a trefoil architecture; 1h12, classified as mainly alpha with a barrel architecture; and 3fxy, classified as mainly beta with a roll architecture.

Figure 9. 3D representation of 1ipo, classified as mainly beta and belonging to the trefoil architecture based on CATH

Protein Databank File

Sandwiches are protein motifs that can consist of strands, Greek keys, rolls, sheets and folds. These structures can be alpha, beta or mixed, and their elements can be parallel or antiparallel. In a PDB file, the list of ATOM records in each polymer chain must be terminated with a TER record.
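The TER rule can be illustrated with a small fixed-column reader. This is a sketch based on the standard PDB column layout, not SPTCGaD's code; `read_chains` is a hypothetical name, and the sketch ignores HETATM records, alternate locations and multiple MODELs.

```python
from collections import defaultdict

def read_chains(pdb_text: str):
    """Group ATOM records by chain ID; a TER record closes the chain of the
    preceding ATOM records. Returns (chains, chains_missing_ter)."""
    chains = defaultdict(list)
    terminated = set()
    last_chain = None
    for line in pdb_text.splitlines():
        record = line[0:6].strip()          # columns 1-6: record name
        if record == "ATOM":
            chain_id = line[21]             # column 22: chain identifier
            chains[chain_id].append({
                "name": line[12:16].strip(),        # columns 13-16: atom name
                "res_name": line[17:20].strip(),    # columns 18-20: residue name
                "res_seq": int(line[22:26]),        # columns 23-26: residue number
                "x": float(line[30:38]),            # columns 31-38
                "y": float(line[38:46]),            # columns 39-46
                "z": float(line[46:54]),            # columns 47-54
            })
            last_chain = chain_id
        elif record == "TER" and last_chain is not None:
            terminated.add(last_chain)
    # Chains whose ATOM list was never closed by a TER record
    missing = sorted(c for c in chains if c not in terminated)
    return dict(chains), missing
```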

Protein Cartoon Generator

The record name of each line must exactly match one of the declared record names detailed below. The alpha helix is a rod-like structure stabilized by hydrogen bonds between the carbonyl group of each amino acid residue in the chain and the amino group of the amino acid located four residues ahead in the linear sequence. The N-terminus is the start of a protein chain, which ends in a free amine group (-NH2).
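The i to i+4 bonding pattern can be written down directly. This is a small illustrative sketch with hypothetical helper names; it describes an ideal helix only, and real helices are often irregular.

```python
from __future__ import annotations

def ideal_helix_hbonds(first: int, last: int) -> list[tuple[int, int]]:
    """(carbonyl residue, amide residue) pairs in an ideal alpha helix running
    from residue `first` to `last` inclusive: C=O of i bonds to N-H of i + 4."""
    return [(i, i + 4) for i in range(first, last - 3)]

def is_alpha_bond(pair: tuple[int, int]) -> bool:
    """A backbone hydrogen bond follows the alpha-helix pattern when it spans 4 residues."""
    carbonyl, amide = pair
    return amide - carbonyl == 4
```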

STRIDE Algorithm

The main cartoon generator reads the STR record type, which contains secondary structure summary information. The STRIDE output file uses a set of secondary structure element codes for the elements that may be present in the protein domain. The ASG record type contains detailed, per-residue secondary structure information (see Figure 13) that will be used to create the 2D representation of the protein domain.

Table 7. Detailed format of STRIDE output file
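Since the cartoon is built from ASG records, a parser might read them and merge consecutive residues into secondary structure elements. This sketch assumes the whitespace-separated ASG field order given in the docstring and that no field is blank; the function names are hypothetical and the sample lines in the check are made up, not real STRIDE output.

```python
from __future__ import annotations
from dataclasses import dataclass
from itertools import groupby

@dataclass
class Residue:
    res_name: str
    chain: str
    pdb_number: str   # kept as text because PDB numbers may carry insertion codes
    ordinal: int
    ss_code: str      # one-letter STRIDE code: H, G, I, E, B/b, T or C
    phi: float
    psi: float

def parse_asg(stride_text: str) -> list[Residue]:
    """Collect the per-residue ASG records of a STRIDE output file.
    Assumed field order: ASG, residue, chain, PDB number, ordinal, code,
    full name, phi, psi, area."""
    residues = []
    for line in stride_text.splitlines():
        if not line.startswith("ASG"):
            continue
        f = line.split()
        residues.append(Residue(f[1], f[2], f[3], int(f[4]), f[5],
                                float(f[7]), float(f[8])))
    return residues

def sse_segments(residues: list[Residue]) -> list[tuple[str, str, int, int]]:
    """Merge runs of consecutive residues sharing a chain and code into
    (chain, code, first ordinal, last ordinal) secondary structure elements."""
    segments = []
    for (chain, code), run in groupby(residues, key=lambda r: (r.chain, r.ss_code)):
        run = list(run)
        segments.append((chain, code, run[0].ordinal, run[-1].ordinal))
    return segments
```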

Existing System of STRIDE Protein Topology Cartoon Generator and Database

A 1D graphical representation of a STRIDE output file is available on the website http://webclu.bio.wzw.tum.de/stride/. The existing system also uses external programs, such as the STRIDE output parser, which accepts PDB files as input. Other external algorithms for comparing protein domains, namely Needleman-Wunsch, Smith-Waterman, CLUSTALW, FSA, POA-MSA and FASTA, are also implemented on the site.

The user can upload a file or select an already uploaded file for the protein domain to be compared, then select the algorithm to use and click the Start Comparison button to begin the comparison. An example is the comparison of 1YBZ and 1EHS (both with an up-down architecture) using the Needleman-Wunsch algorithm.
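For reference, Needleman-Wunsch is a global dynamic-programming alignment. The sketch below aligns two strings (for example amino-acid sequences or one-letter SSE strings); the scoring values are illustrative assumptions, not the ones used by the site.

```python
def needleman_wunsch(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -2):
    """Global alignment of strings a and b. Returns (score, aligned_a, aligned_b)."""
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    def sub(i, j):
        return match if a[i - 1] == b[j - 1] else mismatch

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + sub(i, j),  # align a[i-1] with b[j-1]
                              score[i - 1][j] + gap,            # gap in b
                              score[i][j - 1] + gap)            # gap in a
    # Trace back from the bottom-right cell to recover one optimal alignment
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))
```

When several alignments tie, the traceback order (diagonal first, then gap in `b`) decides which one is reported.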

One page accepts a protein domain code and displays its 2D topology cartoon representation. Another displays the basic information (chain, domain, architecture and sequence) of the protein domains stored in the database. A third page lets a registered user either upload a PDB file to generate the STRIDE output file and, if the architecture is supported, the topology cartoon, or upload a STRIDE output file to generate the topology cartoon if the architecture is supported.

Figure 16. Login Page in the existing system of SPTCGaD

Database Management System

  • Information System

Definition of Terms

Anti-parallel – describes two adjacent protein structural elements that run in opposite directions, one up and one down.
Domain – a part of a protein's sequence and structure that can evolve, function and exist independently of the rest of the protein chain.
Architecture – the arrangement and orientation of secondary structural elements, without regard to their connectivity.

Figure 27. Greek key protein topology diagram

DESIGN AND IMPLEMENTATION

Context Diagram

Use-Case Diagram

Activity Diagrams

Activity diagram of the functionality for uploading a STRIDE output file for parsing, for registered users in SPTCGaD.

Figure 33. Create an account functionality for registered users in SPTCGaD

Flowcharts

The STRIDE algorithm used to parse PDB files is an external application written in C, which can be downloaded at http://webclu.bio.wzw.tum.de/stride/. Given a PDB file with the 3D coordinates of a protein domain, STRIDE exports its secondary structure assignments to a text file in the STRIDE output format.
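Calling the external program from the web application might look like the Python sketch below. It assumes the compiled `stride` binary is on the PATH and prints its report to standard output when given a PDB file; `run_stride` and its parameters are hypothetical names, not part of the STRIDE distribution.

```python
import subprocess
from pathlib import Path
from typing import Optional

def run_stride(pdb_path: str, out_path: Optional[str] = None,
               stride_bin: str = "stride") -> str:
    """Run STRIDE on a PDB file and return its text report.
    Raises FileNotFoundError if the binary is missing and
    subprocess.CalledProcessError if STRIDE exits with an error."""
    result = subprocess.run([stride_bin, pdb_path],
                            capture_output=True, text=True, check=True)
    report = result.stdout
    # Sanity check: a usable report should contain per-residue ASG records
    if not any(line.startswith("ASG") for line in report.splitlines()):
        raise ValueError("STRIDE produced no ASG records")
    if out_path is not None:
        Path(out_path).write_text(report)  # save the STRIDE output file
    return report
```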

Figure 44. Flowchart of the process that generates the cartoon file and saves it in the database in SPTCGaD

Process Explosion

The input files are the STRIDE output files to be parsed; the CATH Domain List text file, which lists CATH protein domains and their classification (in this system, the class and architecture numbers); and the CATH Domall text file, which contains the domain boundaries for each PDB chain and is needed to identify the architecture of each part of the protein. The protein domain name is retrieved from the HDR line of the STRIDE output file. The process then searches the CATH Domain List file for the class and architecture numbers of that protein domain, which become the result.

In the CATH Domain List file, the first column is the protein domain name, the second is the class of the protein domain and the third is the architecture number of the protein domain. Since a protein can have many domains with different architectures, the CATH Domall text file is used to identify how many domains exist in each protein chain. Only columns 1 (chain name) and 2 (number of domains, in D%02d format) are used in the process.
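Joining the two CATH files could be sketched as follows. It relies only on the columns described above; the `1abc` names in the check are hypothetical, and treating a domain name as the chain name plus two digits, as well as lower-casing the PDB code, are assumptions.

```python
from __future__ import annotations

def _data_lines(text: str):
    """Yield the whitespace-split columns of non-empty, non-comment lines."""
    for line in text.splitlines():
        if line.strip() and not line.startswith("#"):
            yield line.split()

def load_domain_list(text: str) -> dict[str, tuple[int, int]]:
    """CATH Domain List: column 1 = domain name, 2 = class, 3 = architecture."""
    return {cols[0]: (int(cols[1]), int(cols[2])) for cols in _data_lines(text)}

def load_domall(text: str) -> dict[str, int]:
    """CATH Domall: column 1 = chain name, column 2 = domain count as D%02d."""
    counts = {}
    for cols in _data_lines(text):
        if not cols[1].startswith("D"):
            raise ValueError(f"unexpected domain field {cols[1]!r}")
        counts[cols[0]] = int(cols[1][1:])
    return counts

def chain_architectures(chain: str, domain_list: dict, domall: dict) -> list[tuple]:
    """Return (domain, class, architecture) for each domain of `chain`,
    checking the number found against the Domall count."""
    # Assumption: a domain name is the chain name followed by two digits
    domains = sorted(d for d in domain_list
                     if d.startswith(chain) and len(d) == len(chain) + 2)
    expected = domall.get(chain)
    if expected is not None and expected != len(domains):
        raise ValueError(f"{chain}: Domall lists {expected} domains, found {len(domains)}")
    return [(d, *domain_list[d]) for d in domains]

def normalize_name(name: str) -> str:
    """Assumption: CATH uses lower-case PDB codes, e.g. '1ABCB' -> '1abcB'."""
    return name[:4].lower() + name[4:]
```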

The result of this process is the list of protein chains, with their domains and the architecture each domain belongs to. The input to this process is the STRIDE output file uploaded by the user.

Figure 47 shows the ASG information of 1RG8 from its STRIDE Output file

Entity-Relationship Diagram

Data Dictionary

Technical Architecture

RESULTS

The Search Protein Domain page (see Figure 63) lets the user search for both available and unavailable architectures on the website, as long as the STRIDE output file is in the database. The Registered User Home Page (see Figure 73) displays the tools that can be used on the website. The menu in the upper left corner of the website lets the user log out.

If a registered user wants to analyze a PDB file to obtain its STRIDE output, he/she must click the Analyze PDB File link on the MySTRIDE home page or in the Tools menu. To obtain the PDB file, the user enters the PDB ID of the protein in the search bar of the RCSB website (see Figure 78), clicks the drop-down box on the upper right side of the page (see Figure 79), and then clicks the link to download the PDB file to be analyzed in the SPTCGaD system.
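The manual RCSB steps could also be scripted. The sketch below builds a download URL using RCSB's public files.rcsb.org endpoint; this is not part of SPTCGaD, the function names are hypothetical, and very large entries may not be available in the legacy PDB format.

```python
import urllib.request

# RCSB's public file-download endpoint (legacy PDB format)
RCSB_PDB_URL = "https://files.rcsb.org/download/{pdb_id}.pdb"

def pdb_download_url(pdb_id: str) -> str:
    """Build the download URL for a four-character PDB ID such as '1rg8'."""
    pdb_id = pdb_id.strip().upper()
    if len(pdb_id) != 4 or not pdb_id.isalnum() or not pdb_id[0].isdigit():
        raise ValueError(f"not a four-character PDB ID: {pdb_id!r}")
    return RCSB_PDB_URL.format(pdb_id=pdb_id)

def download_pdb(pdb_id: str, dest_path: str) -> str:
    """Save the PDB file locally (requires network access); returns the path."""
    path, _headers = urllib.request.urlretrieve(pdb_download_url(pdb_id), dest_path)
    return path
```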

If a registered user wants to parse the STRIDE output file and view its 2D cartoon equivalent, they must click the Generate Cartoon link on the MySTRIDE home page or in the Tools menu (see Figure 78). If the system administrator wants to approve or reject accounts, they can click the Approve Accounts link on the MySTRIDE home page or in the Tools menu. If the system administrator wants to view the messages that have been sent to the system, they can click the View Messages link on the MySTRIDE home page or in the Tools menu.

If the system administrator wants to reply to a message sent by site users, he/she must fill out the message form below the message content on the message page (see Figure 85).

Figure 54. Website FAQ Page

DISCUSSION

Split the beta strands into two halves and place them in the second and fourth columns of the main array, recording the maximum beta strand length in each column. If a beta strand to be placed is shorter than the maximum beta strand length in its column, pad it with coils. Note the direction in which the helices are placed and also the maximum length of the helices in each column or row.
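The strand-placement rule can be sketched in Python. Assumptions not stated in the source: strands are given as residue counts in sequence order, the characters 'E' and 'C' stand for strand and coil cells, the array has five columns, and with an odd number of strands the first half gets the extra one. Helix handling is left out, and `place_strands` is a hypothetical name.

```python
from __future__ import annotations

def place_strands(strand_lengths: list[int], n_cols: int = 5) -> list[list[str]]:
    """Put the first half of the strands in column index 1 and the second half
    in column index 3 (the 2nd and 4th columns), padding each strand with
    coil up to the longest strand in its column."""
    half = (len(strand_lengths) + 1) // 2
    groups = {1: strand_lengths[:half], 3: strand_lengths[half:]}
    columns: list[list[str]] = [[] for _ in range(n_cols)]
    for col, lengths in groups.items():
        longest = max(lengths, default=0)
        for length in lengths:
            # 'E' cells for the strand itself, 'C' cells for the coil padding
            columns[col].append("E" * length + "C" * (longest - length))
    return columns
```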

The connectors are coils and are placed in the first and last columns of the main box. An example of a generated 2D cartoon of a 3-layer (aba) sandwich is shown in Figure 97. The value of the orientation variable changes each time a secondary structure is retrieved from the array.
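A minimal sketch of that row construction, assuming (the source only says the orientation "changes") that the orientation alternates between up and down each time a secondary structure is read from the array; the function name and cell labels are hypothetical.

```python
from __future__ import annotations

def build_row(sses: list[str]) -> list[tuple[str, str]]:
    """Return (label, orientation) cells: coil connectors in the first and last
    cells, with the secondary structures in between."""
    row = [("coil", "-")]
    orientation = "up"
    for sse in sses:
        row.append((sse, orientation))
        # Assumption: the orientation flips for the next element read
        orientation = "down" if orientation == "up" else "up"
    row.append(("coil", "-"))
    return row
```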

An example of a generated 2D cartoon of an up-down bundle is shown in Figure 101. Note the length of every group of alpha helices, since the longest one determines the maximum length of the vertical strings and connecting coils. The alpha solenoid cartoon of 1ppr was generated by combining algorithms that already exist in the system; an example of a generated 2D cartoon of an alpha solenoid is shown in Figure 104.
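A sketch of the up-down bundle rule in Python: the longest helix fixes the column height, and shorter helices are padded with coil so every vertical column has the same length. The alternating direction, the 'H'/'c' characters and the ASCII output are illustrative assumptions, not the system's actual drawing code.

```python
from __future__ import annotations

def up_down_bundle(helix_lengths: list[int]) -> list[str]:
    """Render helices as equal-height vertical columns, top row first."""
    height = max(helix_lengths, default=0)
    columns = []
    for k, length in enumerate(helix_lengths):
        # Pad each helix with coil cells up to the longest helix
        col = "H" * length + "c" * (height - length)   # index 0 = bottom cell
        if k % 2 == 1:
            col = col[::-1]   # odd helices run downward, starting at the top
        columns.append(col)
    rows = ["".join(col[r] for col in columns) for r in range(height)]
    return rows[::-1]   # print the top row first
```

For example, `print("\n".join(up_down_bundle([3, 2])))` draws a three-row, two-column bundle in which the shorter, downward helix ends in a coil cell at the bottom.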

Examples of generated 2D cartoons of alpha-beta barrels are shown in Figures 106 and 108. The beta strands start printing at the y-coordinate of the last structure.

Figure 86. Topology cartoon of a Beta Hairpin

Algorithm:

CONCLUSION

RECOMMENDATION

Use different secondary structure assignment algorithms, such as DSSP, to compare the resulting 2D cartoons.
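If DSSP were added for comparison, its eight-state output and STRIDE's codes would need a common alphabet. One common reduction (H, G, I to helix; E, B to strand; everything else to coil) is sketched below as an assumption; other reductions exist, and the function names are hypothetical.

```python
# Assumed 8-to-3 state reduction; anything not listed becomes coil ("C")
THREE_STATE = {"H": "H", "G": "H", "I": "H", "E": "E", "B": "E", "b": "E"}

def to_three_state(codes: str) -> str:
    """Reduce per-residue DSSP or STRIDE codes to H/E/C."""
    return "".join(THREE_STATE.get(c, "C") for c in codes)

def agreement(a: str, b: str) -> float:
    """Fraction of residues whose reduced states agree (a Q3-style score)."""
    if len(a) != len(b):
        raise ValueError("assignments must cover the same residues")
    ra, rb = to_three_state(a), to_three_state(b)
    return sum(x == y for x, y in zip(ra, rb)) / len(ra) if ra else 1.0
```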

Table 11 shows the updated list of algorithms for the CATH-based representative protein domains that now exist in SPTCGaD

BIBLIOGRAPHY

APPENDIX

Figures

Figure 9. 3D representation of 1ipo, classified as mainly beta and belonging to the trefoil architecture based on CATH
Figure 11. 3D representation of 3fxy, classified as mainly beta and belonging to the roll architecture based on CATH
Figure 12. 3D representation of 1kvi, classified as mixed alpha and beta and belonging to the sandwich architecture based on CATH
Figure 18. Compare Similarity Page of the existing system of SPTCGaD