Representation of Molecular Structures - AN INTRODUCTION TO COMPUTATIONAL BIOCHEMISTRY

Figure 4.2. GenBank format for nucleotide sequence of chicken egg-white lysozyme.

correspond to different hues, but different spectra can give the same perceived hue. A flat spectrum appears achromatic: white, gray, or black.

2. Tone (value): Roughly speaking, tone reflects the total amount of light per unit area, so multiplying a spectrum by a constant changes the tone. The members of the series white, gray, black differ in tone. Empirical scales of tone are not linear in integrated intensity; moreover, changes in tone can alter perceived hue.

3. Saturation (intensity or chroma): The difference between a color and the gray of the same tone exemplifies saturation. A pure or saturated color can be reduced in saturation by adding white light and normalizing the result to the same perceptual tone.

As a result, colors may be thought of as points in a three-dimensional space, the axes of which might be the primary colors (primaries): red, green, and blue (Figure 4.1). Each color is a vector whose components are the intensities of the primaries required to match it. For displays generated by three-gun (red, green, and blue) monitors, these are the numbers specified. The computer graphics literature reflects considerable effort toward truly convincing representations of real objects (Roger, 1985; Wyszecki and Stiles, 1982).
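The vector view of color can be sketched in a few lines. This is a minimal illustration, not a colorimetric model: it assumes component intensities on an arbitrary linear scale and, as a loud simplification, uses the plain mean of the three components as a "linear tone" proxy, even though the text notes that perceptual tone scales are not linear. Under that assumption, adding white light and renormalizing to the original tone lowers saturation while leaving the tone unchanged, and scaling a color changes its tone but not its hue.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Color:
    """A color as a vector of primary intensities (red, green, blue)."""
    r: float
    g: float
    b: float

    def __add__(self, other: "Color") -> "Color":
        # Additive mixing: light from two sources sums component-wise.
        return Color(self.r + other.r, self.g + other.g, self.b + other.b)

    def scaled(self, k: float) -> "Color":
        # Multiplying every component by a constant changes tone, not hue.
        return Color(k * self.r, k * self.g, k * self.b)


def linear_tone(c: Color) -> float:
    # ASSUMPTION: the mean component is used as a crude tone proxy;
    # real perceptual tone scales are not linear in intensity.
    return (c.r + c.g + c.b) / 3.0


def is_achromatic(c: Color, tol: float = 1e-9) -> bool:
    # Equal primaries (a "flat" mixture) give white, gray, or black.
    return abs(c.r - c.g) <= tol and abs(c.g - c.b) <= tol


def desaturate(c: Color, white_amount: float) -> Color:
    # Add white light, then rescale so the (linear) tone matches the original.
    mixed = c + Color(white_amount, white_amount, white_amount)
    t_mixed = linear_tone(mixed)
    if t_mixed == 0.0:
        return mixed  # black stays black
    return mixed.scaled(linear_tone(c) / t_mixed)


red = Color(1.0, 0.0, 0.0)
pink = desaturate(red, 1.0)        # less saturated, same linear tone
dark_red = red.scaled(0.5)         # same hue, lower tone
white = red + Color(0.0, 1.0, 0.0) + Color(0.0, 0.0, 1.0)
```

Here `pink` is still reddish (its red component exceeds the others) but lies closer to the gray axis than `red`, while `white`, the sum of all three primaries, has equal components and is therefore achromatic.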

Figure 4.3. EMBL format for nucleotide sequence of chicken egg-white lysozyme.

Figure 4.4. Fasta format for nucleotide sequence of chicken egg-white lysozyme.

Figure 4.5. PIR format for amino acid sequence of chicken egg-white lysozyme.

Figure 4.6. Swiss-Prot format for amino acid sequence of chicken egg-white lysozyme.

REPRESENTATION OF MOLECULAR STRUCTURES 57

Figure 4.7. GenPept format for amino acid sequence of chicken egg-white lysozyme.

Figure 4.8. Fasta format for amino acid sequence of chicken egg-white lysozyme.

Figure 4.9. TOPS diagrams for trypsin domains.

Protein topology cartoons (TOPS) are two-dimensional schematic representations of protein structures as a sequence of secondary structure elements, showing their arrangement in space and their direction (Flores et al., 1994; Sternberg and Thornton, 1977). The TOPS diagrams of trypsin domains shown in Figure 4.9 use the following conventions:

1. Circular symbols represent helices (α and 3₁₀).

2. Triangular symbols represent strands.

3. The peptide chain is divided into a number of fragments, and each fragment lies in only one domain.

4. Each fragment is labeled with an integer $i$, beginning at $N_i$ and ending at $C_i$, the first fragment being $N_1 \rightarrow C_1$.

5. If the chain crosses between domains, it leaves the first domain at $C_i$ to join the next at $N_{i+1}$.

6. Each secondary structure element has a direction (N to C) that is either up (out of the plane of the diagram) or down (into the plane of the diagram).

7. The direction is up if the N-terminal connection is drawn to the edge of the symbol and the C-terminal connection is drawn to the center of the symbol. Conversely, the direction is down if the N-terminal connection is drawn to the center of the symbol and the C-terminal connection is drawn to the edge.

58 MOLECULAR GRAPHICS: VISUALIZATION OF BIOMOLECULES

8. For strands,up strands are indicated by upward-pointing triangles whereas down strands are indicated by downward-pointing triangles.
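The drawing conventions listed above can be captured as a small data structure. The sketch below is a hypothetical encoding, not the TOPS server's own software: the names `Kind`, `Direction`, `SSE`, `symbol`, `attachment_points`, `direction_from_drawing`, and `domain_crossing` are invented for illustration, and the symbols are returned as descriptive strings rather than drawn.

```python
from dataclasses import dataclass
from enum import Enum


class Kind(Enum):
    HELIX = "helix"      # alpha or 3-10 helix (rule 1)
    STRAND = "strand"    # rule 2


class Direction(Enum):
    UP = "up"            # out of the plane of the diagram
    DOWN = "down"        # into the plane of the diagram


@dataclass(frozen=True)
class SSE:
    """One secondary structure element in a TOPS-style cartoon (hypothetical)."""
    kind: Kind
    direction: Direction


def symbol(sse: SSE) -> str:
    # Rules 1, 2 and 8: circles for helices, oriented triangles for strands.
    if sse.kind is Kind.HELIX:
        return "circle"
    return "up-triangle" if sse.direction is Direction.UP else "down-triangle"


def attachment_points(sse: SSE) -> tuple[str, str]:
    # Rule 7: where the (N-terminal, C-terminal) connections meet the symbol.
    if sse.direction is Direction.UP:
        return ("edge", "center")
    return ("center", "edge")


def direction_from_drawing(n_end: str, c_end: str) -> Direction:
    # Rule 7 read in reverse: recover an element's direction from a drawing.
    if (n_end, c_end) == ("edge", "center"):
        return Direction.UP
    if (n_end, c_end) == ("center", "edge"):
        return Direction.DOWN
    raise ValueError(f"not a valid TOPS attachment: {(n_end, c_end)}")


def domain_crossing(i: int) -> tuple[str, str]:
    # Rule 5: the chain leaves fragment i at C_i and joins the next at N_(i+1).
    return (f"C{i}", f"N{i + 1}")


def describe(chain: list[SSE]) -> list[str]:
    # Walk the chain from N to C, listing each element's symbol and direction.
    return [f"{symbol(s)} ({s.direction.value})" for s in chain]
```

For example, a chain of an up strand followed by a down helix is described as `["up-triangle (up)", "circle (down)"]`. Keeping the direction as explicit data and deriving both the symbol and the connection points from it mirrors the way the rules tie the drawing to the chain direction.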

The topology cartoons can be browsed and searched at the TOPS server (http://tops.ebi.ac.uk/tops/).

The most obvious data in a typical 3D structure record are the atomic coordinates: the location in space of each atom of a molecule, represented by an (x, y, z) triple giving its distances along each axis from some arbitrary origin. The coordinates of each atom are attached to labeling information in the structure record, such as residue names and numbers derived from the protein or nucleic acid sequence.
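Because the origin is arbitrary, physically meaningful quantities are computed from differences between coordinates. The sketch below pairs each (x, y, z) triple with a few labels and computes an interatomic distance; the `Atom` fields are an illustrative choice rather than any database's schema, and the coordinates used with it are made up (in angstroms).

```python
import math
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Atom:
    # Labeling information plus coordinates (in angstroms) from the record.
    name: str       # atom name, e.g. "CA"
    residue: str    # residue name, e.g. "LYS"
    seq_num: int    # residue number in the sequence
    x: float
    y: float
    z: float


def distance(a: Atom, b: Atom) -> float:
    # Euclidean distance between two atoms.
    return math.dist((a.x, a.y, a.z), (b.x, b.y, b.z))


def translate(a: Atom, dx: float, dy: float, dz: float) -> Atom:
    # Moving the origin shifts every atom by the same vector.
    return replace(a, x=a.x + dx, y=a.y + dy, z=a.z + dz)
```

Translating both atoms by the same vector leaves `distance` unchanged, which is why the choice of origin does not matter for structural analysis.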

Three-dimensional molecular structure database records employ two different “minimalist” approaches to storing bond data. The chemistry-rule approach applies observable physical principles of chemistry to record molecular structures without bond information. No residue dictionary is required to interpret data encoded this way, just a table of bond lengths and bond types for every conceivable pair of bonded atoms. This approach is the basis of the 3D biomolecular structure file format of the Protein Data Bank (Bernstein et al., 1997). The other approach is used in the database records of the Molecular Modeling Database (MMDB) at NCBI, which relies on a standard residue dictionary listing all atoms and bonds of the amino acid and nucleotide residues, plus their terminal variants (Hogue et al., 1996). Software that reads MMDB data (which are derived from PDB data) can use the bonding information supplied in the dictionary to connect atoms, without trying to enforce the rules of chemistry.
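The two approaches can be contrasted in a short sketch. For the chemistry rule, it swaps the per-pair bond-length table for a common shortcut: two atoms are considered bonded when their distance is at most the sum of their covalent radii plus a tolerance. The radii and the 0.4 Å tolerance below are assumed, approximate values. For the dictionary approach, `RESIDUE_BONDS` is a hypothetical, deliberately tiny dictionary covering only glycine backbone atoms; it is not MMDB's actual dictionary.

```python
import math

# ASSUMED approximate covalent radii (angstroms) and bonding tolerance.
COVALENT_RADIUS = {"H": 0.31, "C": 0.76, "N": 0.71, "O": 0.66, "S": 1.05}
TOLERANCE = 0.4


def bonds_by_chemistry(atoms):
    """Chemistry rule: infer bonds from element types and distances alone.

    atoms: list of (element, (x, y, z)) tuples.
    Returns index pairs (i, j) with i < j.
    """
    bonds = []
    for i in range(len(atoms)):
        for j in range(i + 1, len(atoms)):
            (ei, pi), (ej, pj) = atoms[i], atoms[j]
            cutoff = COVALENT_RADIUS[ei] + COVALENT_RADIUS[ej] + TOLERANCE
            if math.dist(pi, pj) <= cutoff:
                bonds.append((i, j))
    return bonds


# HYPOTHETICAL minimal residue dictionary: glycine backbone bonds only.
RESIDUE_BONDS = {"GLY": [("N", "CA"), ("CA", "C"), ("C", "O")]}


def bonds_by_dictionary(residue_name, atom_names):
    """Dictionary approach: connect atoms by name; no geometry is consulted."""
    present = set(atom_names)
    return [(a, b) for a, b in RESIDUE_BONDS[residue_name]
            if a in present and b in present]
```

The first function needs coordinates and can misjudge unusual geometry, whereas the second needs no coordinates at all but depends entirely on the dictionary covering every residue type present.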

Almost all known protein structures have been determined by X-ray crystallography. A few contain details derived from neutron diffraction, and a few have been determined by nuclear magnetic resonance (NMR). Recently, theoretical models derived from molecular modeling have also been added. The resolution of a structure is a measure of how much data were collected. The more data collected, the more detailed the features in the electron density map to be fitted, and the greater the ratio of observations to parameters to be determined (i.e., the atomic coordinates and B-factors). Resolution is expressed in angstroms (Å), a unit of distance; the lower the number, the higher the resolution. Protein structures are generally determined to a resolution between 1.7 and 3.5 Å; those determined at 2.0 Å or better are considered high-resolution. The R-factor of a structure determination measures how well the model reproduces the experimental intensity data. Other things being equal, the lower the R-factor, the better the structure. The R-factor is a fraction expressed as a percentage: R = 0% would be an impossible ideal case (no disorder, no experimental error), and R ≈ 58% is obtained for a collection of atoms placed randomly in the unit cell of the crystal.
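The text does not spell out the formula, but the conventional crystallographic R-factor compares observed and calculated structure-factor amplitudes, $R = \sum \lvert\, |F_{obs}| - |F_{calc}| \,\rvert / \sum |F_{obs}|$. A minimal sketch follows, together with the 2.0 Å high-resolution cutoff quoted above; the function names are illustrative, and real refinement programs compute R over particular reflection sets with details not modeled here.

```python
def r_factor(f_obs, f_calc):
    """Conventional R-factor: sum(| |F_obs| - |F_calc| |) / sum(|F_obs|).

    Returns a fraction; multiply by 100 to express it as a percentage.
    """
    if len(f_obs) != len(f_calc):
        raise ValueError("f_obs and f_calc must have the same length")
    denominator = sum(abs(o) for o in f_obs)
    if denominator == 0:
        raise ValueError("observed amplitudes sum to zero")
    numerator = sum(abs(abs(o) - abs(c)) for o, c in zip(f_obs, f_calc))
    return numerator / denominator


def is_high_resolution(resolution_angstrom, threshold=2.0):
    # Lower numbers mean higher resolution; 2.0 Å or better counts as high.
    return resolution_angstrom <= threshold
```

For instance, observed amplitudes `[10, 20]` against calculated `[11, 18]` give R = (1 + 2) / 30 = 0.10, i.e., 10%.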

The Protein Data Bank (PDB) is the collection of publicly available structures of proteins, nucleic acids, and other biological macromolecules, initiated by Brookhaven National Laboratory and now maintained by the Research Collaboratory for Structural Bioinformatics (RCSB) at http://www.rcsb.org/pdb/ (Berman et al., 2000).

PDB coordinate entries for biomacromolecules fall into the following classes:

1. Protein structures determined by X-ray or neutron diffraction or by NMR, which may include cofactors, substrates, inhibitors, or other ligands


2. Oligonucleotide or nucleic acid structures determined by X-ray crystallography

3. Carbohydrate structures determined by X-ray diffraction

4. Hypothetical models of protein structures

5. Bibliographic entries

Each set of coordinates deposited with the PDB becomes a separate entry. Each entry is assigned a unique accession code (PDB ID) of four alphanumeric characters; for example, 1LYZ is hen egg-white lysozyme. The first character is a version number, and an identifier beginning with 0 signifies that the entry is purely bibliographic. PDB and its mirror sites offer a text search engine that indexes all the textual information in each PDB record (e.g., the PDB ID). A PDB file is a text file with an explanatory header followed by a set of atomic coordinates. The atomic coordinates are subjected to a set of standard stereochemical checks and translated into a standard entry format; for example, Figure 4.10 shows part of the coordinate file for 1LYZ (1LYZ.pdb or pdb1LYZ.ent).
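The identifier conventions described above can be checked mechanically. This sketch assumes exactly the rules stated in the text (four alphanumeric characters, a leading digit as version number, 0 for bibliographic entries) and builds the two file names in the form shown for 1LYZ; `parse_pdb_id` is a hypothetical helper, and letter case in file names may differ between archives and mirrors.

```python
import re

# Four alphanumeric characters, the first a digit (the version number).
PDB_ID = re.compile(r"[0-9][A-Za-z0-9]{3}")


def parse_pdb_id(code: str) -> dict:
    """Split a PDB ID into the parts described in the text (hypothetical helper)."""
    if not PDB_ID.fullmatch(code):
        raise ValueError(f"not a four-character PDB ID: {code!r}")
    code = code.upper()
    return {
        "id": code,
        "version": int(code[0]),
        "bibliographic_only": code[0] == "0",
        # File-name forms as given for 1LYZ in the text; case may vary in practice.
        "file_names": (f"{code}.pdb", f"pdb{code}.ent"),
    }
```

Calling `parse_pdb_id("1lyz")` normalizes the code to `1LYZ`, reports version 1, and flags it as a coordinate entry rather than a bibliographic one.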

The PDB format includes information about the structure determination, bibliographic references describing the structure, the types and locations of secondary structures, and the atomic coordinates. Most programs for molecular graphics or other computational analysis of protein structures can read files in PDB format, with the file extension .pdb or .ent. The 3D graphical representations of these biomacromolecules can be displayed with RasMol (Sayle and Milner-White, 1995), Cn3D (Wang et al., 2000), or KineMage (Richardson and Richardson, 1992; Richardson and Richardson, 1994), as shown in Figure 4.11.

For online visualization of 3D structures, the MIME type for the PDB format is chemical/x-pdb, which enables 3D structures on the Web to be displayed with RasMol. Once the atomic coordinates are loaded, the reader can manipulate the image in the browser to rotate the molecule, view it from a different perspective, or change the manner in which the structure is presented. A comprehensive list of chemical MIME media types (Rzepa et al., 1998) is available from http://www.ch.ic.ac.uk/chemime/.
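Because the PDB format stores coordinates in fixed columns, a reader for coordinate records needs only string slicing. The sketch below follows the published column layout for ATOM/HETATM records (serial in columns 7 to 11, atom name 13 to 16, residue name 18 to 20, chain 22, residue number 23 to 26, x/y/z in 31 to 54, occupancy 55 to 60, temperature factor 61 to 66, element 77 to 78), converted to 0-based Python slices. It is a simplified illustration: alternate locations, insertion codes, charges, and multi-model files are ignored.

```python
from typing import NamedTuple


class AtomRecord(NamedTuple):
    serial: int
    name: str
    res_name: str
    chain_id: str
    res_seq: int
    x: float
    y: float
    z: float
    occupancy: float
    temp_factor: float
    element: str


def parse_atom_line(line: str) -> AtomRecord:
    """Parse one fixed-column ATOM/HETATM record (0-based slices of PDB columns)."""
    line = line.rstrip("\n").ljust(80)  # tolerate trimmed trailing blanks
    return AtomRecord(
        serial=int(line[6:11]),
        name=line[12:16].strip(),
        res_name=line[17:20].strip(),
        chain_id=line[21].strip(),
        res_seq=int(line[22:26]),
        x=float(line[30:38]),
        y=float(line[38:46]),
        z=float(line[46:54]),
        occupancy=float(line[54:60]),
        temp_factor=float(line[60:66]),
        element=line[76:78].strip(),
    )


def read_atoms(text: str) -> list[AtomRecord]:
    # Keep only coordinate records; header lines such as HEADER or REMARK are skipped.
    return [parse_atom_line(ln) for ln in text.splitlines()
            if ln.startswith(("ATOM  ", "HETATM"))]
```

Applied to the text of a coordinate file such as the one excerpted in Figure 4.10, `read_atoms` would return one record per atom, with the labels and coordinates separated into named fields.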

4.3. DRAWING AND DISPLAY OF MOLECULAR STRUCTURES
