2. Oligonucleotide or nucleic acid structures determined by X-ray crystallography
3. Carbohydrate structures determined by X-ray diffraction
4. Hypothetical models of protein structures
5. Bibliographic entries
Each set of coordinates deposited with the PDB becomes a separate entry. Each entry is assigned an accession code (PDB ID), a unique set of four alphanumeric characters; an example is 1LYZ for hen’s egg-white lysozyme. The first character is a version number, and an identifier beginning with the number 0 signifies that the entry is purely bibliographic. PDB and its mirror sites offer a text search engine that uses an index of all the textual information in each PDB record (e.g., PDB ID). The pdb file is a text file with an explanatory header followed by a set of atomic coordinates. The atomic coordinates are subjected to a set of standard stereochemical checks and are translated into a standard entry format; for example, Figure 4.10 shows a partial coordinate file for 1LYZ.pdb or pdb1LYZ.ent.
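The identifier rules just described can be written as a short check. The following Python sketch is illustrative only: the function names classify_pdb_id and entry_file_names are hypothetical, and the pattern assumes the classic four-character identifier whose first character is a digit.

```python
import re

# Assumed pattern: one digit followed by three alphanumeric characters,
# as in 1LYZ. This is an interpretation of the description in the text.
PDB_ID_PATTERN = re.compile(r"^[0-9][A-Za-z0-9]{3}$")


def classify_pdb_id(pdb_id: str) -> str:
    """Return 'invalid', 'bibliographic', or 'coordinate' for an identifier."""
    pdb_id = pdb_id.strip()
    if not PDB_ID_PATTERN.match(pdb_id):
        return "invalid"
    # An identifier beginning with 0 marks a purely bibliographic entry.
    if pdb_id[0] == "0":
        return "bibliographic"
    return "coordinate"


def entry_file_names(pdb_id: str) -> tuple:
    """Return the two file-name conventions mentioned in the text."""
    return (f"{pdb_id}.pdb", f"pdb{pdb_id}.ent")


if __name__ == "__main__":
    for candidate in ("1LYZ", "0ABC", "LYZ1"):
        print(candidate, classify_pdb_id(candidate))
```

Here 0ABC and LYZ1 are made-up identifiers used only to exercise the bibliographic and invalid branches.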
The PDB format includes information about the structure determination, bibliographic references describing the structure, types/locations of secondary structures, and the atomic coordinates. Most software programs created for molecular graphics or other computational analysis of protein structures can read files in PDB format, with a file extension of either .pdb or .ent. The 3D graphical representations of these biomacromolecules can be displayed with RasMol (Sayle and Milner-White, 1995), Cn3D (Wang et al., 2000), or KineMage (Richardson and Richardson, 1992; Richardson and Richardson, 1994) as shown in Figure 4.11. For online visualization of 3D structures, the MIME type for PDB format is chemical/x-pdb, which enables the display of 3D structures on the Web with RasMol. Once the atomic coordinates are known, the reader can manipulate the image in the browser to rotate the molecule, view it from a different perspective, or change the manner in which the structure is presented. A comprehensive list of chemical MIME media types (Rzepa et al., 1998) is available from http://www.ch.ic.ac.uk/chemime/.
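As a small illustration of the MIME association, the sketch below registers chemical/x-pdb for the two file extensions using Python's standard mimetypes module. The helper name pdb_mime_type is hypothetical, and a private registry is used so that the process-wide table is left unchanged. How a browser or server is actually configured is outside the scope of this sketch.

```python
import mimetypes

# A private registry; the module-level table is not modified.
registry = mimetypes.MimeTypes()
for extension in (".pdb", ".ent"):
    # Associate both PDB file extensions with the chemical MIME type.
    registry.add_type("chemical/x-pdb", extension)


def pdb_mime_type(file_name: str):
    """Return the MIME type registered for a coordinate file name."""
    mime_type, _encoding = registry.guess_type(file_name)
    return mime_type


if __name__ == "__main__":
    print(pdb_mime_type("1LYZ.pdb"))
    print(pdb_mime_type("pdb1LYZ.ent"))
```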
4.3. DRAWING AND DISPLAY OF MOLECULAR STRUCTURES
Figure 4.10. PDB file (partial) for 3D structure of hen’s egg-white lysozyme (1LYZ.pdb).
The abbreviated file shows partial atomic coordinates for residues 34 to 36. Informational lines such as AUTHOR (contributing authors of the 3D structure), REVDAT, JRNL (primary bibliographic citation), REMARK (other references, corrections, refinements, resolution, and missing residues in the structure), SEQRES (amino acid sequence), FTNOTE (list of possible hydrogen bonds), HELIX (initial and final residues of α-helices), SHEET (initial and final residues of β-sheets), TURN (initial and final residues of turns, types of turns), and SSBOND (disulfide linkages) are deleted here for brevity. Atomic coordinates for amino acid residues are listed sequentially on ATOM lines. The following HETATM lines list atomic coordinates of water and/or ligand molecules.
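The ATOM and HETATM lines use fixed columns, so they can be read by slicing each line. The Python sketch below assumes the standard PDB column layout (serial number in columns 7-11, atom name in 13-16, residue name in 18-20, chain in 22, residue number in 23-26, and x, y, z in 31-54). The names AtomRecord, parse_atom_line, format_atom_line, and read_atoms are hypothetical, and the coordinates in the demonstration are invented values, not data from 1LYZ.

```python
from typing import Iterable, List, NamedTuple


class AtomRecord(NamedTuple):
    record: str      # "ATOM" or "HETATM"
    serial: int
    atom_name: str
    res_name: str
    chain_id: str
    res_seq: int
    x: float
    y: float
    z: float


def parse_atom_line(line: str) -> AtomRecord:
    """Slice one coordinate line using 0-based Python indices."""
    return AtomRecord(
        record=line[0:6].strip(),
        serial=int(line[6:11]),
        atom_name=line[12:16].strip(),
        res_name=line[17:20].strip(),
        chain_id=line[21].strip(),
        res_seq=int(line[22:26]),
        x=float(line[30:38]),
        y=float(line[38:46]),
        z=float(line[46:54]),
    )


def format_atom_line(record, serial, atom_name, res_name, chain_id,
                     res_seq, x, y, z) -> str:
    """Build a simplified fixed-column line (atom-name alignment is
    simplified; occupancy and temperature factor are omitted)."""
    return (f"{record:<6}{serial:>5} {atom_name:<4} {res_name:>3} "
            f"{chain_id:1}{res_seq:>4}    {x:8.3f}{y:8.3f}{z:8.3f}")


def read_atoms(lines: Iterable[str]) -> List[AtomRecord]:
    """Keep only coordinate records, skipping header lines."""
    return [parse_atom_line(line) for line in lines
            if line.startswith(("ATOM  ", "HETATM"))]


if __name__ == "__main__":
    demo = [
        "REMARK invented coordinates for illustration only",
        format_atom_line("ATOM", 1, "CA", "PHE", "A", 34, 1.5, -2.25, 3.0),
        format_atom_line("HETATM", 2, "O", "HOH", "", 130, 10.0, 11.0, 12.0),
    ]
    for atom in read_atoms(demo):
        print(atom)
```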
conforms to the lowest normal valence consistent with explicit bonds; that is, C(4), N(3,5), O(2), P(3,5), and S(2,4,6).
3. Bond specification: Single, double, triple, and aromatic bonds are represented by the symbols -, =, #, and :, respectively; for example, CC=O for acetaldehyde. Generally, single and aromatic bond symbols are omitted.
4. Branch specification: Branches are enclosed in nested or stacked parentheses; for example, CC(C)C(N)C(=O)O for valine.
Figure 4.11. Graphic representations of protein 3D structure. Three-dimensional graphics of hen’s egg-white lysozyme as visualized with RasMol (first and second rows, 1LYZ.pdb) and Cn3D (third row, 1LYZ.val) are shown from left to right (color type) in wireframe (atom), spacefill (atom), dots (residue), backbone (residue), ribbons (secondary structure), strands (secondary structure), secondary structure (secondary structure), ball-and-stick (residue), and tubular (domain) representations.
5. Ring specification: Ring closure bonds are specified by appending matching digits to the specifications of the joined atoms, with the bond symbol preceding the digit; for example, N1CCCC1C(=O)O for proline.
6. Aromaticity (a ring having sp2-hybridized carbons with 4N + 2 π-electrons): Aromatic atoms are specified with lowercase atomic symbols, with matching digits appended to the joined atomic symbols; for example, Oc1ccc(cc1)CC(N)C(=O)O or OC1=CC=C(C=C1)CC(N)C(=O)O for tyrosine.
7. Disconnection: The period or dot is used to represent disconnections; for example, C=C(OP(=O)(O)O)C(=O)[O-].[Na+] for sodium phosphoenolpyruvate.
8. Isotope: Isotopic specification is indicated by prefixing the atomic symbol with a number equal to the integral isotopic mass; for example, [2H] for deuterium and [13C] for carbon-13.
62 MOLECULAR GRAPHICS: VISUALIZATION OF BIOMOLECULES
9. Isomerism (geometric): Configuration around double bonds is specified by the characters / and \, indicating relative directionality between the atoms connected by the double bond; for example, CCCCCCCC/C=C\CCCCCCCC(=O)O for oleic acid with a cis double bond.
10. Isomerism (chiral): The most common type of chirality in biochemistry is tetrahedral. The tetrahedral chiral specification (@ or @@) is written as an atomic property following the atomic symbol of the chiral atom. Looking at the chiral center from the direction of the “from” atom (preceding the chiral atom), @ (or @1) means “the other” three atoms (following the chiral atom) are listed anticlockwise; @@ (or @2) means clockwise. If the chiral atom has a nonexplicit hydrogen, it will be listed inside the chiral atom’s brackets as [C@H] or [C@@H]; for example, C([C@@H]1[C@H]([C@@H]([C@H]([C@@H](O1)O)O)O)O)O for β-D-glucopyranose, which has all hydroxyl groups in the equatorial configuration.
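Several of the rules above (branches in parentheses, paired ring-closure digits, bracketed atoms, and the dot for disconnection) lend themselves to a quick consistency check on a line-notation string. The Python sketch below is a rough syntax check under simplifying assumptions, not a full parser: it ignores two-digit ring closures, does not validate element symbols or valence, and the function name check_smiles_syntax is hypothetical. The example strings are written in standard notation.

```python
def check_smiles_syntax(smiles: str) -> dict:
    """Report simple structural properties of a line-notation string."""
    depth = 0             # current parenthesis nesting level
    open_rings = set()    # ring-closure digits still waiting for a partner
    in_bracket = False    # True while inside [...]
    components = 1        # parts separated by the disconnection dot

    for ch in smiles:
        if in_bracket:
            # Digits and signs inside brackets (isotope, charge) are not
            # ring closures, so everything up to ']' is skipped.
            if ch == "]":
                in_bracket = False
            continue
        if ch == "[":
            in_bracket = True
        elif ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                break     # a closing parenthesis with no opener
        elif ch.isdigit():
            # First occurrence opens a ring bond, second closes it.
            open_rings ^= {ch}
        elif ch == ".":
            components += 1

    return {
        "parentheses_balanced": depth == 0,
        "rings_closed": not open_rings,
        "brackets_closed": not in_bracket,
        "components": components,
    }


if __name__ == "__main__":
    for text in ("N1CCCC1C(=O)O", "[13C]", "CC(C)C(N)C(=O)O"):
        print(text, check_smiles_syntax(text))
```

A string that passes these checks can still be chemically meaningless; a cheminformatics toolkit would be needed for real validation.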
The default sequence formats (nucleotide/amino acid) retrieved from the three integrated database retrieval systems are GenBank/GenPept from Entrez as well as DBGet, and EMBL/Swiss-Prot from EBI. Although different formats can be specified at the time of retrieval, these formats can also be interconverted with the ReadSeq facility at http://dot.imgen.bcm.tmc.edu:9331/seq-util/Options/readseq.html. The formats supported by ReadSeq are IG/Stanford, GenBank/GB, NBRF, EMBL, GCG, DNAStrider, Fitch, Pearson/Fasta, Phylip3.2, Phylip, PIR/CODATA, MSF, ASN.1, and PAUP/NEXUS.
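Before converting a retrieved file, it can help to identify its format from the first nonblank line. The Python sketch below covers only four of the formats listed and relies on their usual leading markers (">" for Pearson/Fasta, ">P1;" style for NBRF/PIR, "LOCUS" for GenBank/GenPept, and "ID" for EMBL/Swiss-Prot). The function name guess_sequence_format is hypothetical, and this is not how ReadSeq itself detects formats.

```python
def guess_sequence_format(text: str) -> str:
    """Guess a sequence file format from its first nonblank line."""
    for line in text.splitlines():
        if not line.strip():
            continue                      # skip leading blank lines
        if line.startswith(">"):
            # NBRF/PIR headers look like ">P1;CODE": a two-character
            # type code followed by a semicolon.
            if len(line) > 3 and line[3] == ";":
                return "NBRF/PIR"
            return "Pearson/Fasta"
        if line.startswith("LOCUS"):
            return "GenBank/GenPept"
        if line.startswith("ID "):
            return "EMBL/Swiss-Prot"
        return "unknown"                  # first real line decides
    return "unknown"                      # empty input


if __name__ == "__main__":
    print(guess_sequence_format(">demo_sequence\nSTANLEY\n"))
```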
4.3.2. Drawing of Molecular Structures
The 1D nucleotide/amino acid sequences in character format (without index, e.g., fasta format) can be converted into 2D chemical structures with ISIS Draw, which can be downloaded from MDL Information Systems at http://www.mdli.com/download/isisdraw.html for academic use. Install the package by issuing the Run command, C:\IsisDraw23.exe. Launch ISIS Draw to open the Draw window.
Retrieve the nucleotide/amino acid sequence file in fasta format (remove the heading line) or prepare a text file of the sequence in one-letter characters. Rename the file as seqname.seq. Prepare to import the sequence by checking (✓) Show sequence bond, Show leaving groups, and Amino acid-/DNA-/RNA-1 letter from Sequence options of the Chemistry menu. Invoke File → Import → Sequences. Select Amino acid-/DNA-/RNA-1 letter. The sequence (with bonds and leaving groups attached) should appear within the draw window. Mark the whole sequence with Select All from the Edit menu or by using the Lasso tool. From the Chemistry menu, select Residue → Expand; the 1D (text string) sequence is transformed into the 2D molecular structure. Save it as struname.skc (e.g., heptapeptide STANLEY as stanley.skc) as shown in Figure 4.12.
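The file preparation step (removing the fasta heading line and saving the bare one-letter sequence as seqname.seq) can be scripted. The Python sketch below also expands a one-letter peptide into three-letter residue names so that the imported sequence can be checked against the sketch. The function names are hypothetical, and only the 20 standard amino acids are handled.

```python
from pathlib import Path

# One-letter to three-letter codes for the 20 standard amino acids.
THREE_LETTER = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
}


def fasta_to_plain(fasta_text: str) -> str:
    """Drop heading lines (those starting with '>') and join the rest."""
    lines = (line.strip() for line in fasta_text.splitlines())
    return "".join(line for line in lines if line and not line.startswith(">"))


def to_three_letter(sequence: str) -> str:
    """Expand a one-letter peptide sequence; raises KeyError on other letters."""
    return "".join(THREE_LETTER[residue] for residue in sequence.upper())


def save_as_seq(sequence: str, path) -> Path:
    """Write the bare sequence to a .seq text file and return its path."""
    target = Path(path).with_suffix(".seq")
    target.write_text(sequence + "\n", encoding="ascii")
    return target


if __name__ == "__main__":
    plain = fasta_to_plain(">demo heptapeptide\nSTAN\nLEY\n")
    print(plain, to_three_letter(plain))
```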
Figure 4.12. Two-dimensional structure sketch with ISIS Draw. The two-dimensional structure of a heptapeptide, SerThrAlaAsnLeuGluTyr (without hydrogens), is sketched on the ISIS display window after importing the sequence file in text format (STANLEY).
To draw 2D molecular structures, users should refer to the ISIS Draw Quick Start (ISIS Draw help) for operating instructions. Draw the basic framework with the template tools (horizontal template-tool icons and template pages from the Template menu) and drawing tools (vertical drawing-tool icons). The small triangular sign on a drawing-tool icon indicates that additional tools are available for selection. For example, pressing on the Single bond tool provides a selection for drawing a double bond or a triple bond. Verify the chemistry of the sketch by clicking the Run Chemistry Inspector icon (or select Chemistry Inspector from the Chemistry menu). To ensure uniform bond lengths and angles for the sketched molecule, select the molecule and then choose Object → Clean Molecule. Save the sketch as struname.skc.
To place a template, click one of the template-tool icons or an atom/bond in the structural fragment/molecule on the template page and place the template anywhere inside the window by clicking an empty area. The template can be fused/attached to an existing bond/atom by simply clicking the bond/atom. To draw bonds, click a bond tool (single/double/triple bond or up wedge/down wedge/either/up bond/down bond), and then click the drawing area or drag the mouse from an existing atom to add a bond. To sprout a bond from an atom, click a bond tool and then click the atom. To draw a chain in one direction or a ring of specific shape, click the chain/multibond tool. To draw atoms, click the Atom tool and enter an atom symbol or choose one from the drop-down list. The Arrow tool provides options for drawing a variety of arrows for chemical reactions (e.g., unidirectional, equilibrium, double-headed, and electron-shift arrows, chosen after pressing the Arrow tool). Use the Lasso select tool to select any structure/structural component for
Figure 4.13. Home page for an access to TOPS cartoons.
editing or relocating. To delete an atom/bond/object one at a time, click the Eraser tool and then the atom/bond/object. The Text tool appends text descriptions to the structures/reactions.
To search for TOPS cartoons at http://tops.ebi.ac.uk/tops/, select Browse the Atlas of topology cartoons and Browse HTML page version to open the query form (Figure 4.13). Enter the PDB ID in the Protein code query box (the Chain query box can be left blank). The search may request a choice of chain (if more than one chain is available) and returns TOPS atlas information listing the protein of your choice and the representative protein in the atlas. Click to view the TOPS cartoon(s) of the representative protein. Right-click on the diagram to save the TOPS cartoon as cartoon.gif.
4.3.3. Display of 3D Structures with Molecular Graphics Programs
For a 3D view in ISIS/Draw, the ACD/3D Viewer add-in can be installed. Retrieve ACD/3D Viewer for ISIS/Draw from http://www.acdlabs.com/download/download.cgi and install it as an add-in according to the instructions. To view the 3D structure of a molecule that is opened/sketched in the ISIS/Draw window, select the ACD/3D Viewer tool from the Object menu to open the ACD/3D Viewer window, which displays the 3D structure. The 3D structure can be optimized, but it can be saved only in .s3d format, which is not recognized by other modeling packages.
The 2D structures of ISIS Draw can be transformed into 3D structures by WebLab Viewer Lite, which can be downloaded free for academic use from Accelrys Inc. at http://www.accelrys.com/viewer/viewlite/index.html. Select Download Viewlite to register and download. Open the file by selecting MDL (*.skc); the 2D structure (e.g., heptapeptide, stanley.skc) is converted into the 3D structure (Figure 4.14), whose coordinate file can be saved as struname.pdb (e.g., stanley.pdb).
Figure 4.14. Conversion of 2D structure into 3D structure. The 2D structure file from ISIS Draw (stanley.skc) is converted into the 3D structure with WebLab Viewer Lite. It should be noted that the atomic coordinate file does not contain ATOM columns with residue ID.
Alternatively, commercial molecular modeling software programs such as ChemOffice (http://www.camsoft.com) can be used. The ISIS Draw sketch file (struname.skc) is first converted to ChemDraw format (struname.cdx), which is then transformed into a 3D structure (struname.c3d) with Chem3D (Chapter 14) and saved in PDB format (struname.pdb).
The common atomic coordinate file format for 3D structures in biochemistry is the PDB format. The pdb files of polysaccharides, proteins, and nucleic acids can be retrieved from the Protein Data Bank at RCSB (http://www.rcsb.org/pdb/). On the home page (Figure 4.15), enter the PDB ID (check the box “query by PDB id only”) or keywords (check the box “match exact word”) and click the Find a structure button. Alternatively, initiate search/retrieval by selecting SearchLite. On the query page, enter the keyword (e.g., the name of the ligand or biomacromolecule) and click the Search button.
Select the desired entry from the list of hits to access the Summary information of the selected molecule. From the Summary information, select Download/Display file and then PDB Text and PDB noncompression format to retrieve the pdb file. In order to display the 3D structure online, choose View structure followed by selecting one of the 3D display options. The display can be saved in .jpg or .gif image format.
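Retrieval of a pdb file by its ID can also be scripted. The Python sketch below is an assumption-laden illustration: the download address https://files.rcsb.org/download/ is not taken from the text above (it is the file server RCSB uses at the time of writing and may change), and the function names are hypothetical. Only the URL construction is exercised in the demonstration; fetch_pdb_text needs network access.

```python
from urllib.request import urlopen

# Assumed download location, not part of the original procedure.
DOWNLOAD_TEMPLATE = "https://files.rcsb.org/download/{pdb_id}.pdb"


def pdb_download_url(pdb_id: str) -> str:
    """Build the download URL for a four-character PDB ID."""
    pdb_id = pdb_id.strip()
    if len(pdb_id) != 4 or not pdb_id.isalnum():
        raise ValueError(f"not a four-character PDB ID: {pdb_id!r}")
    return DOWNLOAD_TEMPLATE.format(pdb_id=pdb_id.upper())


def fetch_pdb_text(pdb_id: str, timeout: float = 30.0) -> str:
    """Download the uncompressed PDB text for an entry (requires network)."""
    with urlopen(pdb_download_url(pdb_id), timeout=timeout) as response:
        return response.read().decode("ascii", errors="replace")


if __name__ == "__main__":
    print(pdb_download_url("1lyz"))
```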
Figure 4.15. Home page of PDB at Research Collaboratory for Structural Bioinformatics.
Most molecular modeling software programs accept pdb files (struname.pdb). RasMol, which is one of the most widely used molecular graphics freeware programs, can be downloaded from http://www.umass.edu/microbio/rasmol/index2.htm. In addition to PDB (struname.pdb or struname.ent) files, RasMol also reads Alchemy, Sybyl MOL2, MDL mol, CHARMm, and MOPAC files. Launch RasMol (double click rswin.exe or rw32b2a.exe) to open the display window. Open the pdb file from the File menu. The 3D structure can be displayed as wireframe, backbone, sticks, spacefill, ball and stick, ribbons, strands, and cartoons (Figure 4.16). The display can be exported as bmp, gif, epsf, ppm, and rast graphics.
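The same display and export steps can be collected in a RasMol command script instead of being selected from the menus. The Python sketch below only writes out the script text; it assumes the RasMol commands load, wireframe, backbone, spacefill, ribbons, strands, cartoon, and write behave as in the RasMol manual, and it covers a subset of the display styles and export formats named above. The function name rasmol_script and the file names are hypothetical.

```python
# Display styles from the text mapped to the assumed RasMol command words.
DISPLAY_COMMANDS = {
    "wireframe": "wireframe",
    "backbone": "backbone",
    "spacefill": "spacefill",
    "ribbons": "ribbons",
    "strands": "strands",
    "cartoons": "cartoon",
}

# Export formats handled by this sketch (a subset of those listed).
EXPORT_FORMATS = ("bmp", "gif", "epsf", "ppm")


def rasmol_script(pdb_file: str, style: str, image_file: str,
                  image_format: str = "gif") -> str:
    """Return the text of a short RasMol script for one display style."""
    if style not in DISPLAY_COMMANDS:
        raise ValueError(f"unsupported display style: {style}")
    if image_format not in EXPORT_FORMATS:
        raise ValueError(f"unsupported export format: {image_format}")
    lines = [
        f"load pdb {pdb_file}",            # read the coordinate file
        "wireframe off",                   # clear the initial wireframe view
        DISPLAY_COMMANDS[style],           # switch on the requested style
        f"write {image_format} {image_file}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(rasmol_script("1LYZ.pdb", "ribbons", "lysozyme.gif"))
```

The resulting text can be saved to a file and run from within RasMol with its script command.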
KineMage (kinetic image) is an interactive 3D structure illustration software that can be downloaded from http://orca.st.usm.edu/~rbateman/kinemage/. It has been adopted for the structure representation of biological molecules by many biochemical textbooks. The program consists of two components: PREKIN and MAGE. The PREKIN program converts a struname.pdb file into a kinemage struname.kin file, which is then displayed and manipulated with the MAGE program. To start the PREKIN program, click Proceed to enter an output file name. This opens a dialog box, “Starting ranges.” Accepting the default (Backbone browsing script) saves the script producing Cα and disulfides for all subunits in the file to struname.kin. To start the MAGE program, click Proceed and select Open new file from the File menu to open struname.kin with three windows (caption, display, and text). The 3D structure with a connected series of alpha carbons is shown in the display window. To highlight the secondary structures, choose Selection of built-in scripts from the dialog box, “Starting ranges,” to open the Built-in scripts box. Select ribbon: HELIX, SHEET from pdb to save as ribbon.kin. The MAGE program opens ribbon.kin as shown in Figure 4.17.
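The backbone view, a connected series of alpha carbons, corresponds to picking out the CA atoms from the ATOM lines of the pdb file. The Python sketch below illustrates that selection and the distances between consecutive alpha carbons. It is not the PREKIN algorithm; it assumes the standard PDB column layout, and the function names and coordinates are invented for illustration.

```python
import math


def alpha_carbons(pdb_lines):
    """Return (residue name, residue number, (x, y, z)) for each CA atom."""
    trace = []
    for line in pdb_lines:
        # Atom name occupies columns 13-16; coordinates occupy 31-54.
        if line.startswith("ATOM  ") and line[12:16].strip() == "CA":
            xyz = (float(line[30:38]), float(line[38:46]), float(line[46:54]))
            trace.append((line[17:20].strip(), int(line[22:26]), xyz))
    return trace


def consecutive_distances(trace):
    """Distances between successive alpha carbons along the trace."""
    return [math.dist(first[2], second[2])
            for first, second in zip(trace, trace[1:])]


def demo_line(serial, name, res_name, res_seq, x, y, z):
    """Build a simplified ATOM line with invented coordinates."""
    return (f"{'ATOM':<6}{serial:>5} {name:<4} {res_name:>3} A"
            f"{res_seq:>4}    {x:8.3f}{y:8.3f}{z:8.3f}")


if __name__ == "__main__":
    lines = [
        demo_line(1, "N", "PHE", 34, 0.0, 0.0, 0.0),
        demo_line(2, "CA", "PHE", 34, 1.0, 0.0, 0.0),
        demo_line(3, "CA", "GLU", 35, 4.8, 0.0, 0.0),
    ]
    trace = alpha_carbons(lines)
    print(trace)
    print(consecutive_distances(trace))
```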
Cn3D is a molecular graphics program that interprets structure files in MMDB (ASN.1) format (struname.val or struname.cgi) of Entrez/MMDB (Wang et al.,
Figure 4.16. Graphic display of 3D structure with RasMol. The display shows the 3D structure of liver alcohol dehydrogenase complex (6ADH.pdb) with two subunits and bound NAD+. The protein molecule is visualized with RasMenu.
Figure 4.17. Graphic display of KineMage in ribbon representation. The Cα chain of hen’s egg-white lysozyme (1LYZ.kin derived from 1LYZ.pdb) is displayed in ribbons showing secondary structure features.
Figure 4.18. Graphic display of macromolecular interaction with Cn3D. The display window of Cn3D illustrates the 3D structure of Zn finger peptide fragments (secondary structure features) bound to the duplex oligonucleotides (brown backbone). Zinc atoms are depicted as spheres. The alignment window shows the amino acid sequence depicting the secondary structures (blue helices and arrows for α-helical and β-strand structures, respectively) and interacting (thin brown arrows) residues. The structure file, 1A1K.val, is derived from 1AAY.pdb.
2002). Cn3D can be accessed online from Entrez at http://www.ncbi.nlm.nih.gov/Entrez or downloaded from http://www.ncbi.nlm.nih.gov/Structure/CN3D/cn3d.html to be installed and used locally. This program accepts coordinate files in MMDB ASN.1 format (*.val or *.cgi), and the structure can be saved in PDB format (*.pdb) or KineMage format (*.kin). It is a structure-sequence interactive program. In addition to the structure view in the Graphic window, Cn3D provides the sequence view in the Sequence window, which can be activated via View → Sequence Window (Figure 4.18). This enables the user to view the structure and sequence interactively. Select a region of the protein molecule in the structure view by double-clicking the mouse, and both the region in the Graphic window and the amino acid residue(s) in the Sequence window are highlighted (yellow), and vice versa. Using this tool, it is possible to map the interaction sites between structure and sequence. The View menu of the Graphic window also provides an option for Animation, and the Sequence window offers options for alignment (Align menu). The Style menu enables the user to display the structure in secondary structure, wireframe, neighbor, tubular, spacefill, or ball-and-stick modes (Figure 4.11).
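The link between a selected region of the structure and the residues highlighted in the sequence can be pictured as a lookup from residue number to sequence position. The Python sketch below illustrates that idea only; it does not reproduce Cn3D's internals, the function names are hypothetical, and highlighting is imitated by showing selected residues in uppercase and the rest in lowercase.

```python
# Three-letter to one-letter codes for the 20 standard amino acids.
ONE_LETTER = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}


def build_sequence_map(residues):
    """residues: list of (residue name, residue number) in chain order.

    Returns the one-letter sequence and a dict from residue number to
    position in that sequence. Unknown residues are written as 'X'.
    """
    sequence = "".join(ONE_LETTER.get(name.upper(), "X") for name, _ in residues)
    index_of = {res_seq: i for i, (_, res_seq) in enumerate(residues)}
    return sequence, index_of


def highlight(sequence, index_of, selected_res_seqs):
    """Uppercase the selected residues and lowercase all others."""
    marked = {index_of[r] for r in selected_res_seqs if r in index_of}
    return "".join(letter if i in marked else letter.lower()
                   for i, letter in enumerate(sequence))


if __name__ == "__main__":
    chain = [("SER", 1), ("THR", 2), ("ALA", 3), ("ASN", 4),
             ("LEU", 5), ("GLU", 6), ("TYR", 7)]
    sequence, index_of = build_sequence_map(chain)
    print(sequence)
    print(highlight(sequence, index_of, [2, 3]))
```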