Piero Procacci (1, ‡), Tom A. Darden (2), Emanuele Paci (3, †), Massimo Marchi (3)
1 Centre Europeen de Calcul Atomique et Moleculaire (CECAM), Ecole Normale Superieure de Lyon, 46 Allee d'Italie, 69364 Lyon, FRANCE
2 National Institute of Environmental Health Sciences, Research Triangle Park, NC 27709
3 Section de Biophysique des Proteines et des Membranes, DBCM, DSV, CEA, Centre d'Etudes, Saclay, 91191 Gif-sur-Yvette Cedex, FRANCE
‡ Permanent address: Dipartimento di Chimica, Universita degli Studi di Firenze, 50120 Firenze, Italy
† Present address: Laboratoire de Chimie Biophysique, Institut Le Bel, Universite Louis Pasteur, 67000 Strasbourg, France
Author to whom all correspondence should be addressed
As we shall see, many of ORAC's commands open external files. No unit number needs to be provided, since ORAC opens the required files sequentially and assigns each file a unit number according to its order of occurrence in the input file.
A. The ORAC Input Files: sys.mddata
Since ORAC is designed not as a modeling interface but as a molecular dynamics program which performs computationally intensive tasks, it works only in a non-interactive batch mode. The user must provide an input file (hereafter referred to as sys.mddata) containing commands for execution, and a series of auxiliary files whose names are also provided in sys.mddata. At execution time, the input file is read from standard input in free format. This means that each input line is read as a character string and parsed into its component substrings, i.e. series of characters separated by blanks or commas. Each substring represents an instruction and is interpreted by specific routines. A line having the "#" character in column 1 is always considered a comment.
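The free-format reading described above can be illustrated with a minimal Python sketch. It is not ORAC's Fortran parser; the function name `tokenize_line` and the numeric values in the sample line are hypothetical, and only the rules stated in the text (blank or comma separators, "#" in column 1 marks a comment) are assumed.

```python
import re


def tokenize_line(line):
    """Split one free-format input line into its substrings.

    Returns an empty list for blank lines and for comment lines,
    i.e. lines with '#' in column 1.
    """
    if line.startswith("#"):
        return []
    # Substrings are separated by blanks and/or commas.
    return [tok for tok in re.split(r"[,\s]+", line.strip()) if tok]


# Hypothetical input lines; the numbers are placeholders.
print(tokenize_line("CRYSTAL 30.0, 30.0 30.0"))
print(tokenize_line("# this line is a comment"))
```

A "#" that is not in column 1 is treated here as an ordinary character, since the text only specifies the column-1 case.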
ORAC's instruction set has been designed to include three different kinds of instructions: environments, commands and subcommands. A sys.mddata file is made up of a series of environments, whose order is unimportant, each including a series of commands which in turn may use a few subcommands. An environment name is a string that always begins with the & character followed by capital letters. Each environment ends with the instruction &END. Environments are reminiscent of the Fortran namelist, but have not been programmed as such and are portable. Command names are character strings containing only capital letters. Each command reads a variable set of parameters which can be characters and/or numbers (real or integer). Commands composed of more than one input line (structured commands) also exist. Each structured command ends with the instruction END and allows a series of subcommands inside it. Subcommands are always in lower case and can read substrings containing characters and/or real or integer numbers.
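As a rough illustration of this layout, the following Python sketch groups the lines of an input file by environment. It is only a sketch under stated assumptions: the function name `parse_environments` and the sample values are hypothetical, structured commands and their subcommands are kept as flat token lists rather than nested, and ORAC's own checks are not reproduced.

```python
def parse_environments(text):
    """Group input lines by environment.

    Returns a dict mapping each environment name (e.g. '&RUN') to the
    list of token lists found between that name and the closing '&END'.
    """
    environments = {}
    current = None
    for raw in text.splitlines():
        if raw.startswith("#"):          # comment: '#' in column 1
            continue
        tokens = raw.replace(",", " ").split()
        if not tokens:
            continue
        head = tokens[0]
        if head == "&END":
            current = None               # close the open environment
        elif head.startswith("&"):
            current = head               # open a new environment
            environments[current] = []
        elif current is not None:
            environments[current].append(tokens)
        else:
            raise ValueError(f"instruction outside any environment: {raw!r}")
    return environments


# Hypothetical input; the values are placeholders.
SAMPLE = """\
# example input
&SIMULATION
   MDSIM
&END
&RUN
   CONTROL 0
   TIME 1000.0
&END
"""
envs = parse_environments(SAMPLE)
print(envs)
```

Because the result is keyed by environment name, the order in which the environments appear in the file does not affect lookups, which mirrors the statement that their order is unimportant.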
In the following paragraphs we briefly discuss the basic structure of the input to ORAC. This is intended to be a very concise and by no means exhaustive guide to ORAC's environment directives. More details about the specific environment commands and their syntax, including the syntax of the auxiliary topology and potential files, are given in the subsequent sections, where a practical example is illustrated. For a complete description of all supported environments and commands the reader is referred to the ORAC manual [48].
In general, the environments specified in sys.mddata are roughly classified into three categories:
The Description Environments contain commands referring to the structure of the system and to the interaction potential. These commands may instruct ORAC, for example, to use a particular potential form, to adopt a simulation box of a given size and shape, to insert solvent molecules, to read the potential and topology parameter files, to add extra topology, etc.
With the Simulation Environments commands, one can choose the kind of simulation to perform, e.g. the temperature and pressure of the system, the integration scheme to be used, etc.
The Output Environments commands control the output of the simulation, such as property calculations, binary or ASCII history files and restart files.
1. Description Environments
To this category belong the environments &SETUP, &SOLUTE, &SOLVENT and &PARAMETERS.
In &SETUP the box size is specified by appropriate arguments to the commands CELL and CRYSTAL. The name of the PDB file containing the solute and/or solvent coordinates is also entered in &SETUP through the command READ PDB.
Commands defining certain potential options for the solute molecules can be specified in the &SOLUTE environment. Examples of such commands are STRETCHING, which allows for bond stretching, I-TORSION, which defines the functional form of the improper torsion potential, and AUTO DIHEDRAL, which is used when all the possible proper torsions of the solute molecules are to be included in the potential. The command INSERT is also available from this environment, and is used to fill the simulation box with solvent. When INSERT is specified, the environment &SOLVENT must also be present.
All the parameters needed to completely define the atomic and geometrical structure of the solvent molecule and its LJ and electrostatic interaction potentials can be specified in the environment &SOLVENT. Although ORAC cannot generate coordinates for solute molecules, simulations of systems containing only solvent molecules can be run by the program without the need to read any external coordinate files. To do so, in addition to entering the appropriate commands in the &SOLVENT environment, the crystal basic cell and the number of replicas to be included in the MD simulation must be constructed using the commands CELL and CRYSTAL of the environment &SETUP. Alternatively, the solvent may also be read from the PDB file specified in &SETUP.
The &PARAMETERS environment has been designed to define a series of operations strictly connected with the topology of the solute molecules. Thus, &PARAMETERS must appear in the input file only if the environment &SOLUTE is also present. In &PARAMETERS the primary structure of the solute is specified in the structured command JOIN as a sequence of molecular units or residues forming the solute molecules, e.g. the amino acid residues. Each unit is labelled by a name to which correspond topology data (atomic labels, connectivity, etc.) in an ASCII topology file, hereafter referred to as field.tpg. The name of the file field.tpg is specified in another &PARAMETERS command, READ TPG ASCII. The atomic charges on each unit of the solute are also defined in the file field.tpg. The solute potential parameters, which include stretching, bending, proper torsion, improper torsion and non-bonded parameters, are read from a second ASCII parameter file, hereafter referred to as field.prm. The structure and format of both the field.tpg and field.prm files are examined later in this section. The name of the file field.prm can be specified in the command READ PRM ASCII field.prm. While the intra-residue topology is defined in the field.tpg file, inter-residue topology, such as disulphide bonds and the associated added topology (bendings and torsions involving the two sulphur atoms), can be specified in &PARAMETERS using the structured command ADD TPG.
2. Simulation Environments
To this category belong the environments &SIMULATION, &RUN, &INTEGRATOR and &POTENTIAL. In &SIMULATION, parameters and keywords connected to the type of simulation to be performed must be specified. Temperature and pressure are entered in this environment. A regular MD is performed only if the command MDSIM is entered. If, along with MDSIM, other commands such as STRESS and/or CONST TEMP are given, an extended system simulation in the specified ensemble is performed. Simple minimizations (steepest descent only) are done if the command MINIMIZE is present.
The &RUN environment contains commands specifying actions to be taken during the MD run. For example, the length of the rejection phase, during which velocities are scaled, is specified by REJECT, the length of the production run by TIME, the printing interval for instantaneous energies by PRINT, etc. Moreover, &RUN includes the command CONTROL, which specifies whether the simulation must be run from new input coordinates or from a restart file containing coordinates and velocities generated by an older simulation.
The integration algorithm to be used during the simulation and the time step size are specified in the &INTEGRATOR environment. If the selected integrator is r-RESPA, the command TIMESTEP reads the largest time step of the algorithm, namely $\Delta t_h$ in Eq. IV.51. Only two mutually exclusive commands can be provided to select the integrator: SINGLE STEP, in which case a conventional single step MD is performed, or the structured command MTS RESPA, which performs an NVE ensemble simulation with the r-RESPA integration algorithm. SINGLE STEP must be specified if the dynamics is carried out at constant temperature and/or pressure, i.e. if CONST TEMP and/or (ISO)STRESS are given in &SIMULATION.
The structured command MTS RESPA defines the parameters of the r-RESPA integrator, namely the radii and healing lengths of the short, medium and long range shells for the non-bonded interactions, and the time steps defined in Eq. IV.51. MTS RESPA also includes a subcommand defining the reference system associated with the calculation of the reciprocal space contribution to the SPME or standard Ewald summation.
The &POTENTIAL environment is used to define various parameters related to the non-bonded potential interaction and affecting both solute and solvent molecules. &POTENTIAL provides commands to set up cutoff schemes (commands GROUP CUTOFF and EWALD), to change the direct lattice cutoff (commands CUTOFF, GROUP CUTOFF), to modify the radius of the Verlet neighbor list and the frequency of its evaluation (command UPDATE), or to set the parameters of the linked cell neighbor list (command LINKED CELL). The reciprocal space convergence parameter of the Ewald sums ($\alpha$ in Eq. III.20) must also be specified here (command EWALD), along with the grid constants $K_1, K_2, K_3$ and the order $n$ of the B-spline interpolation if SPME is used.
3. Output Environments
To this category of environments belong &INOUT and &PROPERTIES.
The &INOUT environment handles the output operations carried out by ORAC on files other than the standard output. Commands are provided that instruct the program to save coordinates to a file and how frequently this must happen. The binary trajectory files can be written onto sequential (command DUMP) or direct access files (command DUMP RAND). ORAC can also write coordinate files in PDB ASCII format. This is accomplished by the command ASCII.
The &PROPERTIES directive is used to compute statistical properties while the simulation is being carried out. ORAC can compute radial distribution functions and structure factors (GOFR, SOLVENT GOFAR), velocity autocorrelation functions (VACF, MTS VACF), infrared spectra (MTS SPECTRA) and root mean square deviations from a reference structure (X RMS).
The &PROPERTIES environment, together with the corresponding readproperties.f Fortran source, provides the programmer with a simple framework for adding a command for the calculation of a new "property". Interfacing a user-developed property computation to ORAC requires, in principle, very limited programming effort. This is discussed further in the manual [48].
B. ORAC Auxiliary Files
Compared to molecular liquids, simulating proteins, or any complex biomolecule, poses additional problems due to the molecules' covalent structure, which must be known before the potential energy of the system can be evaluated.
The covalent topology of any complex biomolecule can be computed from the structure of its constituent residues. In ORAC, to curtail the complexity of the input data, only minimal information on each residue needs to be provided, such as the constituent atoms, the covalent bonds and, in the case of polymers or biopolymers, the terminal atoms used to connect the unit to the rest of the chain. In addition, in order to assign the correct potential parameters to the bonds, bendings and torsions of the residue, the type of each atom needs to be specified. Finally, to each atom type must correspond a set of non-bonded parameters. Once the bonding topology of the different residues contained in the solute molecule(s) is known, these units are linked together according to their occurrence in the sequence. In this fashion the total bonding topology of the molecule is obtained. From this information, all possible bond angles are collected by searching for all pairs of bonds which share one atom. Similarly, all the torsions are obtained by selecting all pairs of bonds that are linked to each other by a distinct bond.
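The search for bond angles and torsions described above can be sketched in a few lines of Python. This is an illustration of the idea only, not ORAC's implementation; the function name `build_angles_and_torsions` and the atom labels are hypothetical, and the exclusion of three-membered rings is an added assumption.

```python
from itertools import combinations


def build_angles_and_torsions(bonds):
    """Derive bond angles and proper torsions from a list of bonds.

    `bonds` is a list of (atom, atom) label pairs. Angles are returned as
    (i, vertex, k) and torsions as (i, j, k, l) with j-k the central bond.
    """
    neighbors = {}
    for a, b in bonds:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)

    # An angle is a pair of bonds sharing one atom (the vertex).
    angles = []
    for vertex in sorted(neighbors):
        for i, k in combinations(sorted(neighbors[vertex]), 2):
            angles.append((i, vertex, k))

    # A torsion is a pair of bonds i-j and k-l linked by a distinct bond j-k.
    torsions = []
    for j, k in sorted({tuple(sorted(b)) for b in bonds}):
        for i in sorted(neighbors[j] - {k}):
            for l in sorted(neighbors[k] - {j}):
                if i != l:  # assumption: skip three-membered rings
                    torsions.append((i, j, k, l))
    return angles, torsions


# Four-atom chain with illustrative labels.
chain = [("c1", "c2"), ("c2", "c3"), ("c3", "c4")]
angles, torsions = build_angles_and_torsions(chain)
print(angles)
print(torsions)
```

Each torsion is generated once per central bond, so the list contains no duplicates that differ only by reading direction.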
The following sections sketch the format of the topology and force field parameter files read by ORAC (field.tpg and field.prm, respectively). The topology and force field parameter files are strongly dependent on each other and together fully define the molecular force field of the solute molecule(s).
C. ORAC Auxiliary Files: field.tpg
ORAC is instructed to read the topology file by the command READ TPG ASCII field.tpg of the &PARAMETERS environment. The file field.tpg contains information on the series of residues needed to define the topology of the actual solute molecules. This information is provided through a series of free format keywords and their corresponding input data, as in the main input file sys.mddata. In this way, ORAC reads the solute connectivity, the atomic charges, the atomic labels corresponding to those found in the PDB file, and the atomic types according to the chosen force field (e.g. AMBER, CHARMM or others). The atomic groups and the improper torsions are also defined here.
As for sys.mddata, the file field.tpg is parsed and the component substrings of each line are interpreted. Comment lines must have the "#" character in column 1. Each residue or unit definition starts with the keyword
RESIDUE residue name
where residue name is a character label which must match the labels found in the command JOIN of the environment &PARAMETERS, and must end with the keyword RESIDUE END. These residue delimiting keywords are the only ones in capital letters in field.tpg. In Fig. 3 we give two examples of "residue" definitions, i.e. the alanine N terminus and the acetone molecule, coded with the strings ala-h and aceto, respectively.
Atom type definitions and charges are read between the keywords atom and end. For each atom three strings must be entered: the PDB atom label, the potential type according to the selected force field as specified in field.prm (see later in this section) and the point charge in electron units. Groups are composed of all atoms entered between two successive group keywords. The PDB labels must all be different from each other, since they are used to establish the topology and connectivity of the solute.
In the alanine example, the atomic types and the charges are those of the AMBER force field. For acetone, instead, the atomic types are still those of AMBER, while the charges are obtained from the MOPAC program [49] with the ESP fitting procedure [50]. Four groups are defined for alanine and three groups for acetone. When using r-RESPA with SPME the groups should be as small as possible (ideally two or three atoms each) to enhance the stability of the fast integrator. Defining large groups, on the other hand, saves a substantial amount of memory, since it decreases the size of the nested Verlet neighbor lists used by the r-RESPA algorithm. Hence, in selecting the group size for large biological systems a compromise must be made.
The bond connectivity is specified between the keywords bond and end by providing the series of bonds present in the residue. Each bond is specified by two atom labels corresponding to the atoms participating in the bond. In the example in Fig. 3, residue ala-h has eleven bonds, while nine bonds are found in aceto.
All possible bendings and proper torsions are computed by ORAC from the bond connectivity. Improper torsions are used to impose geometrical constraints on specific quadruplets of atoms in the solute. In modern all-atom force fields, improper torsions are generally used to ensure the planarity of an $sp^2$ hybridized atom. The convention in ORAC to compute the proper or improper torsion dihedral angle is the following: if $\mathbf{r}_1, \mathbf{r}_2, \mathbf{r}_3, \mathbf{r}_4$ are the position vectors of the four atoms identifying the torsion, the dihedral angle is defined as
$$\phi = \arccos\left[ \frac{(\mathbf{r}_2 - \mathbf{r}_1) \times (\mathbf{r}_3 - \mathbf{r}_2)}{|\mathbf{r}_2 - \mathbf{r}_1|\,|\mathbf{r}_3 - \mathbf{r}_2|} \cdot \frac{(\mathbf{r}_3 - \mathbf{r}_2) \times (\mathbf{r}_4 - \mathbf{r}_3)}{|\mathbf{r}_3 - \mathbf{r}_2|\,|\mathbf{r}_4 - \mathbf{r}_3|} \right] \qquad \text{(VI.1)}$$
In the case of improper torsions involving a terminal atom, a particular quadruplet of atoms must be selected. For instance, the alanine N-terminus is connected to the peptide chain from only one end. One improper torsion must then be specified involving the amino nitrogen (n+) of the subsequent residue to ensure planarity of the peptide plane.
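The dihedral convention of Eq. VI.1 can be tried out with the short Python sketch below. It is an illustration, not ORAC's code: the function name `dihedral_angle` and the test coordinates are hypothetical, and each cross product is normalized by its own length (the usual textbook form), which is an assumption about how the normalization in Eq. VI.1 is meant to be read.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def dihedral_angle(r1, r2, r3, r4):
    """Return the unsigned dihedral angle of atoms 1-2-3-4 in degrees."""
    b1, b2, b3 = _sub(r2, r1), _sub(r3, r2), _sub(r4, r3)
    n1 = _cross(b1, b2)  # normal to the plane of atoms 1, 2, 3
    n2 = _cross(b2, b3)  # normal to the plane of atoms 2, 3, 4
    cos_phi = _dot(n1, n2) / math.sqrt(_dot(n1, n1) * _dot(n2, n2))
    cos_phi = max(-1.0, min(1.0, cos_phi))  # guard against rounding
    return math.degrees(math.acos(cos_phi))


# Planar zig-zag (trans) arrangement of four points.
print(dihedral_angle((0, 1, 0), (0, 0, 0), (1, 0, 0), (1, -1, 0)))
```

Because the arccosine is used, the angle is returned in the range 0 to 180 degrees and carries no sign; collinear triplets of atoms would give a zero-length normal and are not handled here.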
There are other, less important topology directives in field.tpg which allow, e.g., specific bendings to be omitted, hydrogen bonds to be defined, etc. For a complete description we again refer to the manual [48].
D. ORAC Auxiliary Files: field.prm
While the file field.tpg provides electrostatic point charges for each atom of the residues, ORAC reads the potential parameters for the bonded and the Lennard-Jones interactions from the file field.prm. The parsing of this additional auxiliary file is carried out in the same way as for field.tpg and sys.mddata. In general, the parameters for a given interaction are listed between two keywords: a first keyword identifying the type of interaction (e.g. BOND, BENDING, etc.) and the keyword END. The order of the interaction types and their associated parameters in the auxiliary file is unimportant.
Bond stretchings (interaction keyword BOND) are entered by specifying on one line the two atoms involved, followed by two numeric fields providing, in this order, the force constant $K_r$ and the equilibrium bond distance $r_0$ (see Eq. II.7). For example:
BOND
....
# AMBER carbonyl stretching in Kcal and A
c o 570.00 1.229
....
END
For a bending (interaction keyword BENDING), three atoms must be entered along with the force constant and the equilibrium bending angle, in this order ($K_\theta$ and $\theta_0$ in Eq. II.8). In the atom sequence, the vertex atom is given second, while the order of the other two atoms is immaterial:
BENDING
...
# h20 bending in Kcal and rad
hw ow hw 100.0 104.52
...
END
For each proper torsion (interaction keyword TORSION PROPER) four atoms must be provided, the second and the third atoms being those on the dihedral angle axis. After the atom sequence the barrier height $k$, the angle $\gamma$ and the integer $n$ (see Eq. II.13) must follow. For example:
TORSION PROPER
...
# Kcal/mole Gamma n
x ct ct x 0.1556 0.0 3
...
END
The symbol x is the wild card symbol and stands for any atomic type.
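How a wild card entry such as "x ct ct x" might be matched against the atom types of an actual torsion can be sketched as follows. The function name `torsion_matches` is hypothetical, and accepting the quadruplet read in either direction is an assumption made for the illustration, not a documented ORAC rule.

```python
def torsion_matches(pattern, atom_types, wildcard="x"):
    """Return True if the four atom types match the parameter pattern.

    The wildcard symbol in `pattern` matches any atomic type. The torsion
    is tried in both reading directions (assumption).
    """
    def match(p, t):
        return all(a == wildcard or a == b for a, b in zip(p, t))

    atom_types = tuple(atom_types)
    return match(pattern, atom_types) or match(pattern, atom_types[::-1])


pattern = ("x", "ct", "ct", "x")
print(torsion_matches(pattern, ("hc", "ct", "ct", "n")))
print(torsion_matches(pattern, ("hc", "ct", "c", "o")))
```

In a full parameter lookup one would typically prefer an exact match over a wild card match when both exist; that priority rule is not covered by this sketch.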
In the improper torsion potential (interaction keyword TORSION IMPROPER) the quadruplet of atoms involved must again be specified. The CHARMM harmonic torsional form ($C_1 = 1$ in Eq. II.13) or the AMBER form ($C_1 = 0$ in Eq. II.13) is assumed if two or three additional numeric fields are provided, respectively. The following is an example of the two possibilities:
TORSION IMPROPER
...
# for this torsion choose AMBER
x x n h 1.00 180.0 2
# for this instead choose CHARMM
cpb cpa nph cpa 20.80 0.0
....
END
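The rule that the number of numeric fields selects the functional form can be made concrete with a small Python sketch. The function name `parse_improper` and the returned tuple layout are hypothetical; only the two-versus-three field rule comes from the text.

```python
def parse_improper(line):
    """Parse one TORSION IMPROPER line.

    Returns (atom_types, form, parameters), where `form` is 'CHARMM' when
    two numeric fields follow the four atom types and 'AMBER' when three do.
    """
    tokens = line.split()
    atoms = tokens[:4]
    params = [float(t) for t in tokens[4:]]
    if len(params) == 2:
        form = "CHARMM"   # harmonic form
    elif len(params) == 3:
        form = "AMBER"
    else:
        raise ValueError(f"expected 2 or 3 numeric fields, got {len(params)}")
    return atoms, form, params


# The two example lines shown above.
print(parse_improper("x x n h 1.00 180.0 2"))
print(parse_improper("cpb cpa nph cpa 20.80 0.0"))
```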
Finally, the Lennard-Jones non-bonded atomic parameters are specified by entering six fields: the first is the atomic type according to the chosen force field, the second and third are the Rmin (see footnote 1) and $\epsilon$ constants, the fourth and fifth are the 1-4 interaction Rmin and $\epsilon$ constants, and the last is the atomic mass. To obtain cross interaction potentials, the Lennard-Jones parameters are combined according to standard sum rules (see Eq. II.3). For the Lennard-Jones 1-4 non-bonded interactions, the potential function may be multiplied by a so-called 1-4 factor, usually less than or equal to 1. If zeros are entered in the fourth and fifth fields, the 1-4 factor is set by the command LJ-FUDGE in the environment &SOLUTE.
If some specific Lennard-Jones 1-4 interactions need to be multiplied by a different constant, the resulting Lennard-Jones constants must be entered in these fields. Examples of the various alternatives are shown in the following:
NONBONDED MIXRULE
...
# type o has the 1-4 factor provided by LJ-FUDGE in &SOLUTE
o 1.661 0.210 0.000 0.000 16.0
# type oa is same as o but has the 1-4 factor equal to 1
oa 1.661 0.210 1.661 0.210 16.0
# type ob is same as o but has a different 1-4 potential
ob 1.661 0.210 1.861 0.105 16.0
....
END
We stress that a 1-4 factor may also be specified for the 1-4 electrostatic interaction by means of the command QQ-FUDGE in the &SOLUTE environment.
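The handling of the 1-4 Lennard-Jones fields can be sketched in Python as follows. This is an illustration under stated assumptions: the function names and the value 0.5 passed as `lj_fudge` are hypothetical placeholders, and multiplying the potential by the 1-4 factor is modelled by scaling the well depth while keeping Rmin unchanged. The conversion to $\sigma$ uses the relation $\sigma = 2 R_{min} 2^{-1/6}$ given in footnote 1.

```python
def parse_nonbonded(line, lj_fudge):
    """Parse one NONBONDED line and resolve its 1-4 parameters.

    When the fourth and fifth fields are both zero, the 1-4 parameters
    are derived from the regular ones using the LJ-FUDGE factor
    (assumption: the factor scales the well depth only).
    """
    name, *fields = line.split()
    rmin, eps, rmin14, eps14, mass = (float(f) for f in fields)
    if rmin14 == 0.0 and eps14 == 0.0:
        rmin14, eps14 = rmin, lj_fudge * eps
    return {"type": name, "rmin": rmin, "eps": eps,
            "rmin14": rmin14, "eps14": eps14, "mass": mass}


def rmin_to_sigma(rmin):
    """Convert Rmin to sigma with sigma = 2 * Rmin * 2**(-1/6)."""
    return 2.0 * rmin * 2.0 ** (-1.0 / 6.0)


LJ_FUDGE = 0.5  # placeholder value, as would be set in &SOLUTE
for entry in ("o  1.661 0.210 0.000 0.000 16.0",
              "oa 1.661 0.210 1.661 0.210 16.0",
              "ob 1.661 0.210 1.861 0.105 16.0"):
    print(parse_nonbonded(entry, LJ_FUDGE))
```

With the placeholder factor, type o ends up with a 1-4 well depth of half its regular value, type oa keeps its full value (a 1-4 factor of 1) and type ob uses the explicitly entered 1-4 constants.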
1 Rmin corresponds to the minimum of the Lennard-Jones potential and is related to the parameter $\sigma$ by $\sigma = 2 R_{min} 2^{-1/6}$.
VII. A TYPICAL EXAMPLE: BPTI IN WATER SOLUTION
ORAC is a general MD code which can simulate a variety of systems ranging from simple homogeneous fluids and solids to complex heterogeneous systems. Here, we provide an example run for a solvated biomolecule. This is the type of system that ORAC has been designed to simulate and for which the highest performance can be achieved. We chose to simulate the typical guinea pig of protein simulations, namely the Bovine Pancreatic Trypsin Inhibitor (BPTI), in water at 300 K. We start our simulation from the available experimental X-ray structure of the orthorhombic type I crystal at low temperature [51]. In the following sections we go through all the steps needed to prepare the system for a typical MD run and to run the simulation itself. In particular, we discuss the following sequential steps:
Step I: Minimization of the protein structure in vacuum using the AMBER force field by means of r-RESPA MD at 20 K.
Step II: Solvent (water molecules) is added to the simulation box. The solvent structure is relaxed at 300 K with a short r-RESPA simulation.
Step III: A few ps of molecular dynamics simulation at constant pressure and at 300 K are performed in order to find the equilibrium density at P=1 MPa.
Step IV: A simulation of the hydrated BPTI at the equilibrium density at 300 K is performed using NVE r-RESPA molecular dynamics.
The discussion that follows serves as an introduction to the use of the program.
A. Step I: Starting a Run from the X-ray PDB file
Our example run was started from the X-ray coordinates of the native bovine pancreatic trypsin inhibitor taken from the Protein Data Bank at the Brookhaven National Laboratory, file pdb1bpi.ent. The PDB coordinate file contains 58 residues for a total of 460 non-hydrogen protein atoms, a phosphate anion (5 atoms) and 167 water oxygens. Although ORAC is able to read the PDB file as is, in file pdb1bpi.ent the GLU7 and ARG53 residues and the phosphate anion are given in two alternative conformations named A and B. We therefore retained only the "B" conformation and erased the coordinates of the "A" conformation. With these changes made, the input file sys.mddata is given in Fig. 4.
1. Description of the Input File
Although, as we saw in the previous section, the order of the environments is immaterial, we chose to order them according to the same arbitrary subdivision and order used before. Thus, the description environments are given first. Since at this stage no solvent is present, only the environments &SETUP, &SOLUTE and &PARAMETERS are specified.
a. &SETUP In &SETUP only two commands are entered: CRYSTAL, where the simulation cell parameters ($a$, $b$, $c$, $\alpha$, $\beta$ and $\gamma$, discussed in Sec. IIB) are provided, and READ PDB bpti xray.ent, where bpti xray.ent is the filename of the initial solute coordinates obtained from the Brookhaven PDB.
b. &SOLUTE The &SOLUTE environment contains five commands: i) STRETCHING prevents ORAC from enforcing bond constraints, which are in conflict with the r-RESPA integrator to be used. ii) Two commands are used to define the 1-4 multiplicative factors for electrostatic (QQ-FUDGE) and Lennard-Jones (LJ-FUDGE) interactions. These are discussed in section VIID. iii) RESET CM shifts the origin of the simulation box to the center of mass of the solute. iv) The command SCALE CHARGES instructs ORAC to distribute any excess charge (see footnote 2) over the first two solute molecules, i.e. the BPTI and the phosphate ion. Since there is no &SOLVENT directive, the 167 crystallographic water molecules along with the phosphate anion are considered as part of the "solute".
c. &PARAMETERS In the &PARAMETERS environment we enter the filenames of the topology and parameter auxiliary files by using the commands READ TPG ASCII and READ PRM ASCII, respectively. The two files amber95.tpg and amber95.prm, corresponding to the AMBER force field, are provided with the ORAC distribution. If no hydrogen coordinates are provided in the PDB file, as is generally the case, ORAC generates the hydrogen atoms according to simple geometric rules. The structured command JOIN is used to define the residue sequence given in the PDB file. Note that n identical and consecutive "residues", like the water molecules hoh, can be specified with the format hohn. In addition, by entering the subcommand BOND of the structured command ADD TPG, three extra bonds corresponding to the three disulphide bridges (namely CYS5-CYS55, CYS14-CYS38 and CYS30-CYS51) are added. Finally, the binary file bpti amber95 osf.prmtpg containing the full topology and interaction parameters of the system is written by the command WRITE PFR BIN. The expensive computation of the topology and parameters can be avoided in subsequent runs by reading the file created by WRITE PFR BIN with the command READ PER BIN.
2 Using the standard protonation at pH 7 for his, glu, asp, arg, lys and a charge of -3e for the phosphate anion, the system has a total charge of +3e.
d. &SIMULATION In the example, &SIMULATION indicates that a normal MD simulation is to be run at a temperature of 20 K with an oscillation bandwidth of 10 K. By specifying MDSIM in &SIMULATION and by selecting r-RESPA as the integrator in &INTEGRATOR, the minimization is run with an r-RESPA NVE MD algorithm rather than with the ORAC minimization algorithm, i.e. the module drvmin (see footnote 3).
e. &INTEGRATOR The first command entered in the &INTEGRATOR environment is TIMESTEP. Since r-RESPA is used as the integrating algorithm, the time step given to the command TIMESTEP corresponds to the $\Delta t_h$ time step in Eq. IV.51. The parameters of the integration algorithm are given in the structured command MTS RESPA. In the example, the r-RESPA multiple time step scheme includes five time steps, of which two involve bonded forces (step intra) and three involve non-bonded forces (step nonbond). The first field after each subcommand is one of the integers $n_0, n_1, m, l, h$ in Eq. IV.51 associated with each time step. Therefore, in the example we have: $\Delta t_h = 16.0/1$ fs, $\Delta t_l = 16.0/4 = 4.0$ fs, $\Delta t_m = \Delta t_l/4 = 1.0$ fs, $\Delta t_{n_1} = \Delta t_m/2 = 0.5$ fs and $\Delta t_{n_0} = \Delta t_{n_1}/2 = 0.5/2 = 0.25$ fs. The long, medium and short range potentials ($V_h$, $V_l$ and $V_m$ in Eqs. IV.37 through IV.39) are defined sequentially by the step nonbond subcommands. For each of these, one real number, corresponding to the shell radius $r$, must be entered. For each shell radius, two more optional parameters can be specified: the corresponding healing length $\lambda$ and the neighbor list offset $\Delta r$, the neighbor radius for each shell being defined as $r_{list} = r + \lambda + \Delta r$. The values for the healing lengths and the neighbor list offsets given in this example have been tested for an energy conserving five time step algorithm in solvated protein at 300 K [12]. The final keyword reciprocal in the second step nonbond subcommand indicates that the reciprocal lattice sum must be computed during the $l$-th time step. The option very cold start is used to prevent the simulation from crashing because the initial system is very far from equilibrium. The argument following very cold start is the maximum allowed increment per step of a Cartesian coordinate, in units of Å. Since the total system energy does not need to be conserved during minimization, the parameters of the r-RESPA algorithm can be selected with more freedom.
3 To use drvmin, MINIMIZE should have been entered in place of MDSIM, along with the choice of a single time step integrator.
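The time step arithmetic quoted for the MTS RESPA example can be checked with a few lines of Python. The function name `respa_time_steps` and its argument names are hypothetical; the dividers 4, 4, 2 and 2 are read off the arithmetic in the text, and the nesting (each step obtained by dividing the next larger one) is the assumption being illustrated.

```python
def respa_time_steps(dt_h, l, m, n1, n0):
    """Return the nested r-RESPA time steps in fs.

    Each time step is the next larger one divided by its integer divider:
    dt_l = dt_h / l, dt_m = dt_l / m, dt_n1 = dt_m / n1, dt_n0 = dt_n1 / n0.
    """
    dt_l = dt_h / l
    dt_m = dt_l / m
    dt_n1 = dt_m / n1
    dt_n0 = dt_n1 / n0
    return {"h": dt_h, "l": dt_l, "m": dt_m, "n1": dt_n1, "n0": dt_n0}


steps = respa_time_steps(16.0, l=4, m=4, n1=2, n0=2)
for name, dt in steps.items():
    print(f"dt_{name} = {dt} fs")

# Number of innermost (n0) steps per outermost (h) step.
print(int(steps["h"] / steps["n0"]))
```

With these values the innermost bonded forces are evaluated 64 times for every evaluation of the long range non-bonded forces.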
f. &POTENTIAL In the &POTENTIAL environment the command EWALD specifies that the SPME method will be used in the simulation. The value of the convergence parameter $\alpha$ is given in Å$^{-1}$ and must follow the keyword pme. The subsequent four integers are the constants $K_1, K_2, K_3$ (see Eq. III.30), which determine the fineness of the grid in reciprocal space, and the order $n$ of the B-spline interpolation. In this example the relative accuracy $|E - E_{exact}|/E_{exact}$ of the Coulomb energy is of the order of $10^{-4}$. The SPME reciprocal lattice contribution $V_{qr}$ is assigned to the $l$ shell by the structured command MTS RESPA in &INTEGRATOR. Following EWALD, the command UPDATE indicates that the Verlet neighbor list is to be recalculated every 40.0 fs with a cutoff 1.5 Å larger than the potential cutoff. In this example, the system is not large enough to make the linked-cell neighbor lists (accessed with the command LINKED CELL) more convenient than the more conventional Verlet lists.
g. &RUN The first command of the environment &RUN, CONTROL 0, specifies that the simulation is not started from a restart file and that the velocities must be initialized from scratch. The subsequent REJECT 496.0 indicates that 496 fs of simulation with velocity rescaling will be carried out. Velocity rescaling occurs each time the system temperature goes beyond the oscillation bandwidth of 10 K defined in the environment &SIMULATION. The command TIME defines the length of the production run with no velocity rescaling. Since Step I is a minimization, this length is set to zero. The last command, PRINT 2.0, indicates that intermediate results are to be written every 2.0 fs.
h. &INOUT The output generated by ORAC is specified in the environment &INOUT and consists of a binary restart file printed every 248 fs and of a PDB file printed every 496 fs, i.e. only at the end of the run. While the restart file is rewound at each print, the PDB file is not, and configurations accumulate during the run.
2. Results and Output from the Run
At execution time, if syntax errors or incompatible options are detected in sys.mddata, ORAC aborts with an error message before attempting any calculation. If no error is found, the program builds up the molecules of the system using the sequence specified in the structured command JOIN and the topology definition given in the topology file amber95.tpg. As the next step, ORAC tries to match bonds, bends, proper and improper torsions with the potential parameters specified in amber95.prm. If matching fails, ORAC stops with an error message. Finally, before the simulation can begin, the PDB file bpti.pdb is read in. This preliminary phase, which constructs the system topology and the parameter arrays and corresponds to the execution of the modules start, read input, join and bldbox, may take several minutes for large biomolecules. The successful completion of this phase is signaled by the printing of a summary system description and topology information. For our example, the following output is obtained:
***************************************************************
* Solute TOPOLOGY List *
* *
* 1398 Atoms 1244 Bonds 1244 FLexible Bonds *
* 0 Rigid Bonds 1799 Angles 2732 P-Torsions *
* 199 I-Torsions 2347 1-4 Inter. 524 Atomic Groups *
* *
***************************************************************
Subsequently, ORAC enters the routine mtsmd, as instructed by the directive MTS RESPA in &INTEGRATOR, and the simulation starts.
When running with r-RESPA, at the very beginning of the run ORAC prints out an estimated CPU time for the scheduled run. The cost per force call for each of the potential contributions in Eqs. IV.35 to IV.39 is also printed. This output helps in tuning the efficiency of the integration schemes. For the simulation length specified in the input file we obtained the following output on a DEC alpha 3000/800 workstation (see footnote 4):
CPUtime for m-contribution: RECP = 0.00 DIR = 0.137 TOT = 0.137
CPUtime for l-contribution: RECP = 0.37 DIR = 0.444 TOT = 0.811
CPUtime for h-contribution: RECP = 0.00 DIR = 0.683 TOT = 0.683
THEORIC SPEED UP FOR NON BONDED PART = 4.27
CPUtime for n1-contribution = 0.0654
CPUtime for n0-contribution = 0.0215
OVERALL THEORIC SPEED UP = 11.48
Expected CPU time for the RUN: 0 hours and 4 min
Expected average time per M step: 0.60 sec.
Expected average time per femto : 0.60 sec.
4 The DEC alpha 3000/800 workstation runs at 30 MF (Megaflops) for the Linpack benchmark.
Thus, the run is expected to last for 4 minutes. This estimate is quite accurate: the effective CPU time at the end of the simulation was 306 seconds. While the simulation is running, intermediate results are printed to standard output. These include various energies (in kJ per mole of "solute", thus encompassing all the 1398 atoms in the simulation box) and temperatures (in K). The following is an example of the output:
Tstep = 494.000 Total = -15684.279 TotPot = -16053.392 Coulom = -17820.143 Recipr = -10938.008 NonBond = -18188.098 Ener14 = 1009.599 Bonded = 2134.706 Stretch = 348.402 Angle = 616.197 I-Tors = 29.845 P-Tors = 1140.262 TotTemp = 21.2 RotTemp = .000E+00 TraTemp = .000E+00
Tstep = 496.000 Total = -15683.211 TotPot = -16126.250 Coulom = -17777.902 Recipr = -10938.008 NonBond = -18161.346 Ener14 = 1019.448 Bonded = 2035.096 Stretch = 265.103 Angle = 601.302 I-Tors = 29.942 P-Tors = 1138.749 TotTemp = 25.4 RotTemp = .000E+00 TraTemp = .000E+00
The meaning of the symbols in the output is self-evident: Tstep is the instantaneous simulation time in fs; Total is the total energy; TotPot is the total potential energy; Coulom is the electrostatic energy; Recipr is the reciprocal SPME lattice energy; NonBond is the total electrostatic + Lennard-Jones non-bonded energy; Ener14 is the 1-4 non-bonded Lennard-Jones interaction energy; Bonded is the total energy due to intra-molecular interactions; and Stretch, Angle, I-Tors, P-Tors are the stretching, bending, improper and proper torsion contributions, respectively.
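For post-processing, records of this kind can be split into label/value pairs. The following is only a minimal sketch: the helper name parse_record and the regular expression are assumptions made for illustration and are not part of ORAC; the sample line is abridged from the output shown above.

```python
import re

# Hypothetical helper (not part of ORAC): matches "Label = number" pairs.
# Labels may contain hyphens (e.g. I-Tors); numbers may start with "." and
# may carry a Fortran-style exponent (E or D).
_FIELD = re.compile(
    r"([A-Za-z][\w-]*)\s*=\s*([-+]?(?:\d+\.?\d*|\.\d+)(?:[EeDd][-+]?\d+)?)"
)

def parse_record(text):
    """Return {label: float} for every 'Label = number' pair found in text."""
    record = {}
    for label, value in _FIELD.findall(text):
        # Python's float() does not accept the Fortran 'D' exponent marker.
        record[label] = float(value.replace("D", "E").replace("d", "e"))
    return record

line = ("Tstep = 496.000 Total = -15683.211 TotPot = -16126.250 "
        "I-Tors = 29.942 TotTemp = 25.4 RotTemp = .000E+00")
rec = parse_record(line)

# Total minus total potential energy, i.e. the kinetic part of this record.
kinetic = rec["Total"] - rec["TotPot"]
print(rec["Tstep"], round(kinetic, 3))
```

Collecting one such dictionary per printed step gives a simple time series of any of the quantities listed above.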
At the end of the run, the last configuration is saved both to a binary restart file, bpti1.rst, and to an ascii PDB file, bpti1.pdb.
The next step consists of hydrating the protein. To do so, the simulation box containing the protein is filled with solvent molecules generated on a regular grid. Only molecules at a sufficient distance from any protein atom are included. Subsequently, a short simulation of about 1 ps in the NVE ensemble at about 300 K is carried out in order to randomize the solvent molecules around the protein. To accomplish this task, the file sys.mddata of Step I needs to be modified. We show in Fig. 5 the input file for Step II.
1. Changes to the Input File
a. &SETUP The environment &SETUP is changed to:
&SETUP
   CRYSTAL 35.0 35.0 35.0 90.0 90.0 90.0
   READ_PDB bpti1.pdb
   INSERT 0.75
   CELL sc 11 11 11
&END
Here, two new commands, INSERT and CELL, are used. The real argument to INSERT specifies the criterion for discarding overlapping molecules. A solvent molecule is discarded if the distance between any atom of the solute and any atom of the solvent molecule satisfies
$$ r_{i_s j_p} < \mathrm{radius}\,(\sigma_{i_s} + \sigma_{j_p}), \qquad \text{(VII.2)} $$
where radius is the argument to INSERT, $r_{i_s j_p}$ is the distance between the $i_s$-th atom of the solvent molecule and the $j_p$-th atom of the protein, and $\sigma_{i_s}$, $\sigma_{j_p}$ are the corresponding Lennard-Jones diameters. Trial and error has shown that a reasonable solvent density can be achieved with values of radius in the range between 0.6 and 0.8 units. In this example it was set to 0.75.
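The rejection test of Eq. VII.2 can be sketched in a few lines. This is an illustration under stated assumptions, not ORAC's actual routine: the function name overlaps and the tuple layout are hypothetical, periodic images are ignored, and the solute diameter of 3.4 used in the example is an arbitrary illustrative value.

```python
import math

def overlaps(solvent_atoms, solute_atoms, radius):
    """Return True if any solvent/solute pair satisfies Eq. VII.2.

    Each atom is a tuple (x, y, z, sigma): Cartesian coordinates and the
    Lennard-Jones diameter, all in the same length unit. Periodic images
    are not considered in this sketch.
    """
    for xs, ys, zs, sigma_s in solvent_atoms:
        for xp, yp, zp, sigma_p in solute_atoms:
            r = math.dist((xs, ys, zs), (xp, yp, zp))
            if r < radius * (sigma_s + sigma_p):
                return True
    return False

# One water oxygen (sigma = 3.1656, as in the INTERACTION command of the
# example) tested against a single, made-up solute atom with sigma = 3.4.
solute = [(0.0, 0.0, 0.0, 3.4)]
near = overlaps([(4.0, 0.0, 0.0, 3.1656)], solute, 0.75)  # 4.0 < 4.9242
far = overlaps([(6.0, 0.0, 0.0, 3.1656)], solute, 0.75)   # 6.0 > 4.9242
print(near, far)
```

A solvent molecule for which the test returns True would be discarded; a smaller radius therefore keeps more molecules and gives a denser solvent.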
CELL generates a periodic structure of solvent molecules with a simple cubic (keyword sc) repeating unit. This basic cell is repeated 11 times in each of the three directions (arguments 11 11 11) so as to reproduce, approximately, the water density at 300 K. Body-centered and face-centered cubic cells could also have been chosen, with keywords bcc and fcc, respectively. Since the simple cubic lattice has one molecule per repeating unit, 11^3 = 1331 solvent molecules are added to the simulation box, which already contained 167 crystallization water molecules. The equilibrium density at 300 K will be obtained in Step III when running the constant pressure simulation. The initial protein coordinates, read by the command READ_PDB bpti1.pdb, were obtained from the simulation described in Step I.
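The lattice construction performed by CELL can be illustrated as follows. The sketch assumes a cubic box and places one site per basis point of the repeating unit; the function name solvent_lattice and the choice of origin are assumptions, and the actual placement used by ORAC may differ.

```python
def solvent_lattice(kind, n, edge):
    """Generate lattice sites in a cubic box of side `edge`.

    kind : "sc", "bcc" or "fcc" (repeating unit, in fractional coordinates)
    n    : number of repetitions of the unit along each direction
    """
    basis = {
        "sc": [(0.0, 0.0, 0.0)],
        "bcc": [(0.0, 0.0, 0.0), (0.5, 0.5, 0.5)],
        "fcc": [(0.0, 0.0, 0.0), (0.5, 0.5, 0.0),
                (0.5, 0.0, 0.5), (0.0, 0.5, 0.5)],
    }[kind]
    a = edge / n  # edge of the repeating unit
    sites = []
    for i in range(n):
        for j in range(n):
            for k in range(n):
                for bx, by, bz in basis:
                    sites.append(((i + bx) * a, (j + by) * a, (k + bz) * a))
    return sites

sites = solvent_lattice("sc", 11, 35.0)
volume_per_site = 35.0 ** 3 / len(sites)
print(len(sites), round(volume_per_site, 2))
```

For CELL sc 11 11 11 in the 35 Å box this gives 1331 sites, about 32 Å^3 each before overlapping molecules are removed.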
b. &PARAMETERS Since all topology and force field information has already been generated in the previous step, the &PARAMETERS environment is changed to:
   READ_PFR_BIN bpti_amber95.prmtpg
This instructs ORAC to read the complete topology of the solute from the binary file bpti_amber95.prmtpg. In this fashion, the expensive computations of the protein topology and force field parameter arrays are skipped.
c. &SOLVENT A new environment appears in the input file to signal that solvent molecules are present in the system, namely:
&SOLVENT
   ATOM o 1 P 16.0 0.0 0.0 0.0
   ATOM h 2 P 1.0 0.81650 -0.57735 0.0
   ATOM h 2 P 1.0 -0.81650 -0.57735 0.0
   INTERACTION 1 3.1656 0.1554 -0.82
   INTERACTION 2 1.6 0.0 0.41
   STRETCHING 1 2 524.86 1.0
   STRETCHING 1 3 524.86 1.0
   BENDING 2 1 3 55.00 109.47
&END
The first three instructions define the coordinates of the atoms contained in each of the solvent molecules. The command ATOM expects the atom symbol, type, rank and mass, followed by its coordinates. The atom rank informs ORAC whether the site should be considered as primary or secondary in the calculation of constraints (see Ref. [23]). Acceptable ranks are P or S for primary and secondary atoms, respectively. The interaction atom type must be defined by INTERACTION, which associates Lennard-Jones parameters and charges to atomic types. In the example, parameters for the SPC water model are provided. Moreover, the commands STRETCHING and BENDING define the intra-molecular parameters for the solvent molecule. The parameters (in kcal/mole and Å) are taken from the CHARMM force field. Finally, we stress that without the environment &SOLVENT the changes made to &SETUP will produce an error condition.
d. Additional Changes In the new simulation step the temperature is raised to 300 K and the rejection phase is lengthened to 992.0 fs. Thus, the argument to the command TEMPERATURE in &SIMULATION is replaced with TEMPERATURE 300.0 20.0 to raise the temperature to 300 K with an oscillation band of 20 K. In addition, the command MTS_RESPA in &INTEGRATOR is changed by removing the keyword very_cold_start. Finally, the rejection phase is lengthened by changing the command REJECT in &RUN to REJECT 992.0.
2. Results and Output for Step II
In Step II, most of the computational time of the preliminary phase is spent in hydrating the BPTI molecule. This operation involves the time-consuming calculation of all the contacts between the protein and the solvent molecules. Compared to Step I, ORAC skips the computation of the topology and force field arrays needed for the simulation phase and instead reads the binary file bpti_amber95.prmtpg generated in Step I.
In the example, the protein hydration is completed with the following message on standard output:
495 molecules over 1331 have been removed
This means that 836 water molecules have been left in the 42875 Å^3 cubic box. The total number of atoms of the new system is now N = 1398 + 836 × 3 = 3906.
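The bookkeeping behind these numbers is easy to check. The following sketch simply repeats the arithmetic with the values quoted in the text; the function name hydrated_system is hypothetical.

```python
def hydrated_system(n_solute_atoms, n_generated, n_removed, atoms_per_solvent=3):
    """Return (solvent molecules kept, total number of atoms)."""
    kept = n_generated - n_removed
    return kept, n_solute_atoms + kept * atoms_per_solvent

# 1398 "solute" atoms, 11^3 generated waters, 495 removed by INSERT.
kept, n_atoms = hydrated_system(1398, 11 ** 3, 495)
box_volume = 35.0 ** 3  # cubic box of edge 35 A, in A^3
print(kept, n_atoms, box_volume)
```

The same function can be reused to see how a different INSERT argument, and hence a different number of removed molecules, changes the system size.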
ORAC changes the output of the intermediate results according to the simulated system. Since in Step II the system consists of "solute" and "solvent" molecules, the printout of the intermediate results will have a different format than in Step I, namely
Tstep = 496.000 Total = -31480.790 SlvPot = -34304.498 SlvCoul = -39226.780 SlvRec = -9510.855 SlvReal = -29715.925 SlvInt = 4584.133 SltTot = -8586.383 SltPot = -14068.423 SltCoul = -18441.684 SltL-J = -954.148 SltHyd = .000 SltBond = 5327.410 SltStr = 1741.395 SltBen = 2079.249 SltItor = 131.662 SltPtor = 1375.104 S-SPot = -3097.657 S-SCoul = -4187.726 SltTemp = 314.651 SlvTrTem = 308.998 SlvRoTem = 321.402 TotTemp = 316.331
Here, the prefix Slt in the output labels corresponds to energies or temperatures of the "solute". On the other hand, solvent properties are indicated by the prefix Slv. Thus, SltBen is the bending energy of the solute and SlvRoTem is the rotational temperature of the solvent.
Since we have used the same r-RESPA algorithm and the same SPME parameters as in Step I, we expect that the CPU time spent to compute energies and forces will scale linearly with N, the number of particles. Indeed, although the SPME algorithm scales in general as N log N, at small N (N up to about 20000) the algorithm is effectively linear in N.
In reality, although the number of particles increases about 2.8 times from Step I to Step II (3906/1398 ≈ 2.8), Step II takes only 2.3 times more CPU time than Step I, corresponding to 1.4 s per fs on our alpha 3000/800 workstation. This smaller than expected increase is due to differences in the direct force routines handling solvent and solute (see footnote 5). In Fig. 6 we show the starting system configuration after solvent is added at time t = 0 and the final configuration at t ≈ 0.5 ps. We see that at the end of Step II the configuration of the solvent molecules appears to be sufficiently randomized.
C. Step III: Obtaining the Equilibrium Density at 300 K
In the previous step we have run the hydrated protein at constant volume, guessing the equilibrium density. Here, we perform instead a constant pressure simulation at 300 K and at atmospheric pressure (P = 0.1 MPa) in order to obtain a better estimate of the system volume. As a starting point for the run, we use the solute and solvent coordinates obtained after 992 fs of simulation in Step II and contained in the file bpti2.pdb. In Fig. 7 we show the input file for Step III.
1. Changes to the Input File
a. &SOLUTE Currently, ORAC cannot carry out simulations at constant pressure with multiple time step r-RESPA algorithms. In order to perform the simulation of Step III at a reasonable speed, constraints need to be used. To do so, the keyword HEAVY is added after the command STRETCHING of the environment &SOLUTE to impose constraints only on bonds involving hydrogen atoms. All the X-H bonds in the solute and the H-H distance within each crystallographic water molecule will be constrained. Moreover, the command INSERT (&SOLUTE) is removed, as the starting configuration is already solvated.
5 In the solvent-solvent and solvent-solute routines there is no need to perform an additional loop over the "masked list" to exclude the bonded contacts between neighboring groups.
b. &SETUP In the environment &SETUP the PDB coordinate file produced at the end of Step II is now read through the command READ_PDB bpti2.pdb, the file bpti2.pdb containing both solvent and solute coordinates.
c. &SOLVENT As for the solute, we need here to remove the internal degrees of freedom of the solvent molecules and impose constraints. This is performed by replacing the commands STRETCHING and BENDING with the appropriate constraints. The environment will then be:
&SOLVENT
   ATOM o 1 P 16.0 0.0 0.0 0.0
   ATOM h 2 P 1.0 0.81650 -0.57735 0.0
   ATOM h 2 P 1.0 -0.81650 -0.57735 0.0
   INTERACTION 1 3.1656 0.1554 -0.82
   INTERACTION 2 1.6 0.0 0.41
   CONSTRAINT 1 2
   CONSTRAINT 1 3
   CONSTRAINT 2 3
&END
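The three constraints fix the geometry defined by the ATOM coordinates. As a quick check, the sketch below computes the corresponding distances and the H-O-H angle from those coordinates. It assumes that the integers following CONSTRAINT are atom indices in the order of the ATOM commands (1 = o, 2 and 3 = h); the helper names are hypothetical.

```python
import math

# Coordinates copied from the ATOM commands of &SOLVENT (in Angstrom).
atoms = {
    1: (0.0, 0.0, 0.0),           # o
    2: (0.81650, -0.57735, 0.0),  # h
    3: (-0.81650, -0.57735, 0.0), # h
}

def constraint_length(i, j):
    """Distance between atoms i and j, i.e. the length a constraint would fix."""
    return math.dist(atoms[i], atoms[j])

def bend_angle(i, j, k):
    """Angle i-j-k in degrees, with atom j at the vertex."""
    u = [a - b for a, b in zip(atoms[i], atoms[j])]
    v = [a - b for a, b in zip(atoms[k], atoms[j])]
    cos_theta = sum(a * b for a, b in zip(u, v)) / (math.hypot(*u) * math.hypot(*v))
    return math.degrees(math.acos(cos_theta))

d_oh = constraint_length(1, 2)
d_hh = constraint_length(2, 3)
theta = bend_angle(2, 1, 3)
print(round(d_oh, 4), round(d_hh, 4), round(theta, 2))
```

The O-H distance of 1.0 Å and the angle of 109.47 degrees agree with the equilibrium values given to STRETCHING and BENDING in Step II, so replacing those commands by constraints leaves the reference geometry unchanged.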
d. &SIMULATION To carry out the simulation in the NPH ensemble, a new command should be added to the environment &SIMULATION:
   ISOSTRESS PRESS-EXT 0.1 BARO-MASS 40.0 COMPR 5.3D-04
This command instructs ORAC to run a simulation allowing only for isotropic changes of the simulation box volume. This is done by using the Andersen [17] extended Lagrangian method. The imposed external pressure (in MPa) is provided after the keyword PRESS-EXT, while ORAC computes the mass of the barostat according to Eq. V.61. To this end, the command ISOSTRESS optionally reads (as is done in the example) the frequency of the barostat (keyword BARO-MASS) and the system compressibility (keyword COMPR), corresponding to $\omega_Q$ and $B^{-1}$ in Eq. V.61, respectively. Finally, the system temperature defined by the command TEMPERATURE is left unchanged at the value of 300 K used in Step II.
e. &INTEGRATOR, &RUN and &POTENTIAL We choose a time step of 1.0 fs by using the command TIMESTEP 1.0 of the environment &INTEGRATOR and replace MTS_RESPA with SINGLE_STEP. In the environment &RUN, we change the length of the rejection phase to 2000.0 fs and set the length of the production phase to 6000.0 fs. Finally, in &POTENTIAL we impose a direct lattice cutoff of 10.0 Å with the command CUTOFF 10.0.
2. Results and Output for Step III
Once the first 2000.0 fs of the rejection phase are completed, ORAC reports that:
Temperature has been rescaled 4 times
These four velocity rescalings occur near the beginning of the rejection phase, which means that the sample was already somewhat thermalized. At the end of the simulation, after the 6 ps unscaled run, the average temperature is 315 K. As an example of ORAC's typical output during the constant pressure simulation run, we show the intermediate results at time t = 5994 fs during the production phase:
Tstep = 5994.000 Total = -42608.023 SlvPot = -28504.079 SlvCoul = -33046.281 SlvRec = -8600.088 SlvReal = -24446.193 SlvInt = .000 SltTot = -8167.073 SltPot = -12691.405 SltCoul = -16082.588 SltL-J = -1174.510 SltHyd = .000 SltBond = 4565.693 SltStr = 685.674 SltBen = 2329.648 SltItor = 155.260 SltPtor = 1395.112 S-SPot = -12642.611 S-SCoul = -16720.347 SltTemp = 318.038 SlvTrTem = 322.703 SlvRoTem = 320.220 TotTemp = 320.073
TotPre = 69.33 ConPre = -44.02 KinPre = 113.36 TmpPre = 38.20 Volume = 39431.87 PV = 2.3748
    ... cell parameters ....          .... stress ...
XYZ 34.0371 34.0371 34.0371    .1152E+06 -.7702E+05  .1324E+06
ABC 90.0000 90.0000 90.0000    .6704E+05 -.6433E+05  .6976E+05
    ...     ...     ...        .4689E+05  .1876E+06 -.3569E+06
With respect to the NVE runs, information about the instantaneous values of the pressure, the cell parameters and the stress tensor is now added to the output. Since the example only allows for isotropic volume changes, the angles do not vary and the edges change only isotropically in the output (for a cubic lattice the three cell edges are equal).
The command PROPERTY in the environment &RUN can be used to compute system averages and test whether the system has reached statistical equilibrium. This command is active only in the production phase, after the rejection part of the simulation is over. It instructs ORAC to print the running averages and their standard deviations at time intervals defined by its argument. In an equilibrated sample the running averages and their standard deviations must not change with time. The output produced by the command PROPERTY at the end of our 6000 fs production run is shown in Fig. 8.
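Running averages and standard deviations of this kind can be accumulated without storing the whole time series. The sketch below uses Welford's one-pass update; this is a swapped-in, generic technique chosen for illustration, and whether ORAC accumulates its averages in this way, or normalizes the variance by n or by n - 1, is not stated in the text. The class name RunningStat is hypothetical.

```python
import math

class RunningStat:
    """One-pass running mean and standard deviation (Welford's update)."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self._m2 = 0.0  # sum of squared deviations from the current mean

    def add(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self._m2 += delta * (x - self.mean)

    def std(self):
        # Population standard deviation (division by n): an assumption.
        return math.sqrt(self._m2 / self.n) if self.n else 0.0

# Toy series standing in for, e.g., instantaneous volumes.
stat = RunningStat()
for value in (1.0, 2.0, 3.0, 4.0):
    stat.add(value)
print(stat.mean, round(stat.std(), 4))
```

Printing mean and std() at regular intervals and checking that neither keeps drifting is the equilibrium test described above.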
In Fig. 9 we plot the volume as a function of time for the total 8 picoseconds of the run (2 ps of rejection and 6 ps of production). The average value in the 6 ps production phase (indicated by the straight line) is 39509 Å^3, to be compared to the starting volume of 42875 Å^3. Thus, the cell has shrunk by 3366 Å^3 which, assuming a water molecular volume of 30 Å^3, corresponds to the volume of about 112 water molecules. As in Steps I and II, the files produced by ORAC at the end of Step III are the standard output, and the PDB and restart files entered as arguments to the commands ASCII and RESTART of the environment &INOUT, respectively.
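The volume comparison can be reproduced with a few lines of arithmetic. The sketch uses only numbers quoted in the text and in Fig. 8; the function name shrinkage is hypothetical, and the cube root of the average volume is only an approximation to the average cell edge.

```python
def shrinkage(v_start, v_avg, v_molecule=30.0):
    """Return (volume lost, equivalent number of molecules of volume v_molecule)."""
    dv = v_start - v_avg
    return dv, dv / v_molecule

# Starting volume of the 35 A cubic box versus the production-phase average.
dv, n_water = shrinkage(35.0 ** 3, 39509.0)

# Cell edge implied by the average volume reported in Fig. 8.
edge = 39509.82 ** (1.0 / 3.0)
print(dv, round(n_water, 1), round(edge, 3))
```

The implied edge is close to the value of 34.0590 Å that is carried over to the CRYSTAL command of the next step.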
D. Step IV: Production Run with Multiple Time Steps and SPME
Step IV consists of a production run carried out with a fast and energy conserving r-RESPA algorithm. During such a run some properties of the system at equilibrium are computed and analyzed. As stated previously, ORAC can compute at run time some general properties of the system, such as root mean square displacements or power spectra of velocity autocorrelation functions. For a more complete analysis, the coordinates of all particles in the system can be written to a file in binary or ascii format.
The input file for Step IV is similar to that discussed for Step II and is shown in Fig. 10. With respect to Step II we must change a series of environments. In the first place, we remove the command INSERT of the &SOLUTE environment. Then, we replace the cell parameters in the &SETUP environment with those obtained from Step III, i.e. with cell edges of 34.0590 Å. Again in the same environment, the PDB coordinate file obtained from Step III must be read by READ_PDB. In addition, the commands MOLECULES 836 and READ_PDB should be added to the environment &SOLVENT. While the former indicates that there are 836 solvent molecules, the latter states that the coordinates of the solvent must be read from a PDB file.
The r{RESPA integration algorithm needs to be changed in &INTEGRATOR. Indeed, the
level for a production run. Thus, we decrease the th timestep and modify the relative
magnitude of the other time steps:
&INTEGRATOR
   TIMESTEP 10.2
   MTS_RESPA
      step intra 3
      step intra 2
      step nonbond 2 4.2 0.3 0.4
      step nonbond 3 7.4 0.3 0.4 reciprocal
      step nonbond 1 9.7 0.3 1.5
      test-times OPEN bpti.tt
   END
&END
The parameters of the r-RESPA algorithm have been tested in past studies on solvated C-phycocyanin using both the AMBER and the CHARMM force fields [12]. To the structured command MTS_RESPA, we have also added the subcommand test-times. This subcommand instructs ORAC to dump onto the file bpti.tt the values of the total, potential and kinetic energies at each full propagation step, i.e. when the propagator of Eq. IV.50 has acted completely on the state vector (p, q). Energy is rigorously conserved to order $O(\Delta t^3)$ only at the end of the full propagation step in Eq. IV.50 [3,4], i.e. every 10.2 fs in the present example. Velocities in the interior of the r-RESPA propagation step are not corrected to order $O(\Delta t^3)$ as they are in a simple single time step velocity Verlet scheme.
We stress here that the total energy printed in the ORAC standard output when using r-RESPA is computed at the end of each m step and not at the end of the h step. Thus, the intermediate total energy printed by ORAC is not rigorously conserved, because the system velocities are corrected only up to the m reference system and do not include corrections for the steps of the higher order reference systems.
In Table I, we show the performance (in CPU seconds per femtosecond of simulation) of the above multiple time step algorithm on various machines. A nanosecond simulation, running with Ewald and integrating all the 33912 degrees of freedom of the system, takes about 311 hours on a medium size workstation such as the HP 735. In Fig. 11, the fluctuations of the total energy E and of the kinetic energy K are compared (see footnote 6). This level of conservation, which yields an energy conservation ratio of $\Delta E/\Delta K \simeq 0.05$, is generally sufficient to obtain accurate structural and dynamical properties, including those properties depending on velocities [6,7]. In Fig. 12 we show the power spectrum of the atomic velocity autocorrelation function of the hydrated protein, while Fig. 13 presents, instead, the mean square deviation of the instantaneous BPTI coordinates from the crystallographic structure. This deviation was averaged over all non-hydrogen atoms and over the backbone atoms. Figs. 12 and 13 were generated from data computed during the MD run with commands of the environment &PROPERTIES, namely:
&PROPERTIES
   X_RMS 50.0 OPEN bpti.xrms
   X_RMS CA HEAVY BACKBONE
   VACF 4012.0 OPEN bpti4.vacf
   VACF_PRM 6963.2 3.4
&END
&ENDINPUT
To compute the coordinate mean square deviation from the X-ray structure (commands X_RMS 50.0 OPEN bpti.xrms and X_RMS CA HEAVY BACKBONE), a reference PDB file named bpti_xray.ent is specified in the environment &SOLUTE through the command TEMPLATE bpti_xray.ent. During the run, ORAC computes the mean square displacements of the instantaneous coordinates from this reference structure.
If, finally, a trajectory file needs to be generated, the commands DUMP or DUMP_RAND of the environment &RUN can print the system coordinates at a chosen time interval in ascii or binary format, respectively. In the example we dump the system coordinates every 10.2 fs to the file bpti4.dmp with the command:
   DUMP_RAND 10.2 OPEN bpti4.dmp
6 The ORAC r-RESPA algorithms can sometimes give, for very long runs of solvated proteins, a small energy drift. This is due to the intrinsically inhomogeneous nature of the system. As discussed in section II (subsection 3), we choose group-based cut-offs that are identical for all kinds of non-bonded interactions. The subdivision of non-bonded interactions with respect to the selected time steps can be "inappropriate" for some LJ pair potentials with large σ, and exceedingly "appropriate" for other LJ pair potentials with low σ. The same argument can be used for direct space electrostatic interactions. Again, the cut-offs are selected irrespective of the intensity of the charges and of the Lennard-Jones diameters. LJ potentials with large σ and/or interactions among large charges can produce a small energy drift. For example, the algorithm used in Step IV yields a drift of 0.1 kJ per picosecond. Such a drift is virtually undetectable for runs of the order of 10-100 picoseconds, but on a nanosecond time scale it can produce visible effects, e.g. a rise in temperature of 20 K.
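The quantity requested with X_RMS, a mean square deviation of instantaneous coordinates from a reference structure averaged over a chosen set of atoms, can be sketched as below. This is an illustration only: the function name is hypothetical, coordinates are compared as given (no superposition or removal of overall translation and rotation is attempted), and whether ORAC applies such a fit is not stated in the text.

```python
def mean_square_deviation(coords, reference, selection=None):
    """Mean square deviation between two coordinate sets.

    coords, reference : sequences of (x, y, z) tuples in the same atom order
    selection         : optional iterable of atom indices to average over
                        (e.g. heavy atoms or backbone atoms)
    """
    indices = range(len(coords)) if selection is None else list(selection)
    total = 0.0
    count = 0
    for i in indices:
        total += sum((a - b) ** 2 for a, b in zip(coords[i], reference[i]))
        count += 1
    return total / count

# Two-atom toy example: only the first atom has moved, by 1 A along z.
reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
current = [(0.0, 0.0, 1.0), (1.0, 0.0, 0.0)]
msd_all = mean_square_deviation(current, reference)
msd_first = mean_square_deviation(current, reference, selection=[0])
print(msd_all, msd_first)
```

Evaluating such a function on frames read back from a trajectory file would give curves of the kind shown in Fig. 13, one per atom selection.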
TABLE I. Performance of ORAC on a series of desktop workstations. The simulation runs were carried out on two different systems: i) System 1 corresponded to the hydrated BPTI molecule discussed in the example and contained 3,906 atoms. ii) System 2 was a hydrated reaction center of Rhodobacter Sphaeroides containing 33,495 atoms in a non-orthogonal box of dimensions a = 73.14 Å, b = 82.07 Å, c = 57.85 Å, and α = 90.0, β = 88.6, γ = 90.5 degrees. The algorithm used in both cases was the five step r-RESPA algorithm discussed in Step IV of the example. In order to obtain the same level of convergence in the electrostatic sum, the same SPME parameters used in the examples were adopted for both systems, except that for System 2 the number of grid points was increased to 64, 72 and 48 in the three directions, respectively. This compensated for the larger dimensions of the simulation box. The timings reported in this table are given in units of CPU seconds needed to run 1 femtosecond of simulation.
            HP 735   DEC alpha 3000/800   IBM 580H   SGI R10000
System 1     1.10          1.33             1.22        0.41
System 2     9.62         13.61             9.82        4.13
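The entries of Table I translate directly into wall-clock estimates for longer runs. The following sketch converts CPU seconds per femtosecond into CPU hours per nanosecond; the dictionary layout and the function name are assumptions made for illustration.

```python
# CPU seconds per femtosecond, copied from Table I: (System 1, System 2).
TIMINGS = {
    "HP 735": (1.10, 9.62),
    "DEC alpha 3000/800": (1.33, 13.61),
    "IBM 580H": (1.22, 9.82),
    "SGI R10000": (0.41, 4.13),
}

def hours_per_ns(sec_per_fs):
    """CPU hours needed for 1 ns (10^6 fs) of simulation."""
    return sec_per_fs * 1.0e6 / 3600.0

estimates = {
    machine: tuple(round(hours_per_ns(t), 1) for t in pair)
    for machine, pair in TIMINGS.items()
}
for machine, (sys1, sys2) in estimates.items():
    print(f"{machine:20s} {sys1:8.1f} {sys2:8.1f}")
```

For System 1 on the HP 735 this gives roughly 306 hours per nanosecond, close to the figure of about 311 hours quoted in the text; the small difference may simply reflect rounding of the table entry.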
FIG. 3. The content of file field.tpg for N-terminus alanine and acetone. See text for discussion.
FIG. 4. Example Step I: ORAC input file to start a minimization run from an X-ray PDB file. See text for discussion.
FIG. 5. Example Step II: ORAC input file to add solvent molecules to minimized solute coordinates. See text for discussion.
FIG. 6. a) Solvated BPTI, represented by the strands structure, at the beginning of Step II (see text), before equilibration. The disordered water molecules are those from the BPTI X-ray structure. b) The same system after 1.0 ps of simulation at 300 K with velocity rescaling.
FIG. 7. Example Step III: ORAC input file to obtain the equilibrium density of a hydrated protein. See text for discussion.
FIG. 8. Example Step III: ORAC computed average quantities from the standard output. See text for discussion.
FIG. 9. Volume fluctuations as a function of time in a constant pressure (isotropic stress) simulation of solvated BPTI. The production run (i.e. no velocity rescaling) starts at 2.0 ps. The average value of the volume in the unscaled part of the simulation is the dashed line.
FIG. 10. Example Step IV: ORAC input file for a production run of 20 ps. See text for discussion.
FIG. 11. Comparison between the fluctuations of the kinetic energy and of the total energy for the five time step r-RESPA integrator used in Step IV (see text, section 7.4).
FIG. 12. Power spectra of the velocity autocorrelation function as computed by ORAC in a 20 ps run during Step IV (see text, section 7.4). The bottom, medium and top curves are the spectra of all atoms, "solute" (in the ORAC sense; see text, section 2 and sections 7.1, 7.2) atoms and solvent atoms, respectively.
FIG. 13. Instantaneous mean square deviation of BPTI from its X-ray coordinates. This deviation is averaged over all non-hydrogen and backbone atoms (solid and dashed line, respectively).
[1] A. Rahman. Phys. Rev.,136, 405, (1964).
[2] L. Verlet. Phys. Rev.,159, 98, (1967).
[3] M. E. Tuckerman, B.J. Berne, and G.J. Martyna. J. Chem. Phys.,97, 1990, (1992).
[4] D. D. Humphreys, R. A. Friesner, and B. J. Berne. J. Phys. Chem.,98, 6885, (1994).
[5] M. Watanabe and M. Karplus. J. Phys. Chem., 99, 5680, (1995).
[6] P. Procacci and B. J. Berne. J. Chem. Phys.,1015, 2421, (1994).
[7] P. Procacci and M. Marchi. J. Chem. Phys., 104, 3003, (1996).
[8] L. Greengard and V. Rokhlin. J. Comput. Phys., 73, 325, (1987).
[9] K. E. Schmidt and M. A. Lee. J. Stat. Phys.,63, 1223, (1991).
[10] T. Darden, D. York, and L. Pedersen. J. Chem. Phys.,98, 10089, (1993).
[11] U. Essmann, L. Perera, M. L. Berkowitz, T. Darden, H. Lee, and L. G. Pedersen. J. Chem. Phys.,101, 8577, (1995).
[12] P. Procacci, T. Darden, and M. Marchi. J. Phys. Chem,100, 10464, (1996).
[13] H. Berendsen. Molecular Dynamics and Protein Structure. Polycrystal Book Service, Western Spring, Illinois, (1985).
[14] H. Lee, T. A. Darden, and L. G. Pedersen. J. Chem. Phys.,102, 3830, (1995).
[15] M. Saito. J. Chem. Phys.,101, 4055, (1994).
[16] P. Procacci and B. J. Berne. Mol. Phys.,83, 255, (1994).
[17] H. C. Andersen. J. Chem. Phys.,72, 2384, (1980).
[18] M. Parrinello and A. Rahman. Phys. Rev. Letters,45, 1196, (1980).
[19] S. Nose. J. Chem. Phys.,81, 511, (1984).
[20] B. R. Brooks, R. E. Bruccoleri, B. D. Olafson, D.J. States, S. Swaminathan, and M. Karplus. J. Comput. Chem.,4, 187, (1983).
[21] S. J. Wiener, P. A. Kollmann, D. T. Nguyen, and D. A. Case. J. Comput. Chem., 7, 230, (1986).
[22] W. D. Cornell, P. Cieplak, C. I. Bavly, I. R. Gould, K. M. Merz Jr., D. M. Ferguson, D. C. Spellmeyer, T. Fox, J. W. Caldwell, and P. Kollmann. J. Am. Chem. Soc., 117, 5179, (1995).
[23] G. Ciccotti, M. Ferrario, and J.-P. Ryckaert. Mol. Phys.,47, 1253, (1982).
[24] W. L. Jorgensen, J. Chandrasekhar, J. D. Madura, R. W. Impey, and M. L. Klein. J. Chem. Phys.,79, 926, (1983).
[25] M. P. Allen and D. J. Tildesley. Computer Simulation of Liquids. Oxford University Press, Walton Street, Oxford OX2 6DP, (1989).
[26] P. Ewald. Ann. Phys.,64, 253, (1921).
[27] S.W. deLeeuw, J. W. Perram, and E. R. Smith. Proc. R. Soc. London A,373, 27, (1980).
[28] P. E. Smith and B. M. Pettitt. J. Chem. Phys.,105, 4289, (1996).
[29] R. W. Hockney. Computer Simulation Using Particles. McGraw-Hill, New York, (1989).
[30] H.G. Petersen. J. Chem. Phys.,103, 3668, (1995).
[31] M. E. Tuckerman and B. J. Berne. J. Chem. Phys.,95, 8362, (1991).
[32] M. E. Tuckerman, G. J. Martyna, and B. J. Berne. J. Chem. Phys.,94, 6811, (1991).
[33] M. E. Tuckerman, B. J. Berne, and A. Rossi. J. Chem. Phys., 94, 1465, (1990).
[34] Polygen Corp. Parameter and Topology files for CHARMm, Version 22, (Copyright 1986, Release 1992).
[35] M. E. Tuckerman and M. Parrinello. J. Chem. Phys.,101, 1302, (1994).
[36] M. E. Tuckerman and M. Parrinello. J. Chem. Phys.,101, 1316, (1994).
[37] H. de Raedt and B. De Raedt. Phys. Rev. A,28, 3575, (1983).
[38] H. Goldstein. Classical Mechanics. Addison-Wesley, Reading MA, (1980).
[39] S. K. Grey. J. Chem. Phys., 101, 4062, (1994).
[40] E. Paci and M. Marchi. J. Phys. Chem.,104, 3003, (1996).
[41] M. Ferrario and J.-P. Ryckaert. Mol. Phys.,78, 7368, (1985).
[42] S. Nose. Prog. Theor. Phys. Supp.,103, 1, (1991).
[43] M. Ferrario. In M.P.Allen and D.J.Tildesley, editors, Computer Simulation in Chemical Physics, page 153. Kluwer Academic Publishers, (1993).
[44] G. Ciccotti and J.-P. Ryckaert. Computer Phys. Rep.,4, 345, (1986).
[45] G.L. Martyna, M.L. Klein, and M. Tuckerman. J. Chem. Phys.,97, 2635, (1992).
[46] J.-P. Ryckaert and G. Ciccotti. J. Chem. Phys.,78, 7368, (1983).
[47] S. Nose and M.L. Klein. Mol. Phys.,50, 1055, (1983).
[48] M. Marchi and P. Procacci. ORAC Manual and Guide. CECAM, Available at ftp.cecam.fr:/pub/orac/doc/manual.ps CECAM-ENS Lyon, (1997).
[49] J.J.P. Stewart. J. Comp. Chem.,10, 221, (1989).
[50] S. J. Weiner, P. A. Kollmann, D. A. Case, U. C. Singh, C. Ghio, G. Alagona, S. Profeta Jr., and P. Weiner. J. Am. Chem. Soc., 106, 765, (1984).
[51] S. Parkin, B. Rupp, and H. Hope. The structure of bovine pancreatic trypsin inhibitor at 125 K: Definition of carboxyl-terminal residues glycine-57 and alanine-58. To be published.
[52] G. J. Martyna, M. E. Tuckerman, D. J. Tobias, and M. L. Klein. Mol. Phys.,87, 1117, (1996).
....
RESIDUE ala-h
atoms
group
n n3 0.14140 h1 h 0.19970 h2 h 0.19970 h3 h 0.19970
group
ca ct 0.09620 ha h1 0.08890
group
cb ct -0.05970 hb1 hc 0.03000 hb2 hc 0.03000 hb3 hc 0.03000
group
c c 0.61630
o o -0.57220
end
bonds
cb ca n h1 n h2 n h3
n ca o c c ca ca ha
cb hb1 cb hb2 cb hb3
end
imphd
ca +n c o
end
termatom * c
backbone n ca c
RESIDUE_END
....
RESIDUE aceto ( Total Charge = 0.0 )
atoms
group
c0 c 0.7865
o o -0.5811
group
c1 ct -0.4573 h1 h1 0.1182 h2 h1 0.1182 h3 h1 0.1182
group
c2 ct -0.4573 h4 h1 0.1182 h5 h1 0.1182 h6 h1 0.1182
end
bonds
c0 o c0 c1 c0 c2 c1 h1 c1 h2 c1 h3 c2 h4 c2 h5 c2 h6
end
imphd
c2 o c0 c1
end
#
# Description Commands:
#
&SETUP
   CRYSTAL 35.0 35.0 35.0 90.0 90.0 90.0
   READ_PDB bpti_xray.ent
&END
&SOLUTE
   STRETCHING
   QQ-FUDGE 0.83333
   LJ-FUDGE 0.50
   RESET_CM
   SCALE_CHARGES 2 1 2
&END
&PARAMETERS
   WRITE_PFR_BIN bpti_amber95.prmtpg
   READ_TPF_ASCII amber95.tpg
   READ_PRM_ASCII amber95.prm
   JOIN
      arg-h pro asp phe cys leu glu pro pro tyr thr gly pro cys lys ala arg ile ile arg tyr phe tyr asn ala lys ala gly leu cys gln thr phe val tyr gly gly cys arg ala lys arg asn asn phe lys ser ala glu asp cys met arg thr cys gly gly ala-o po4 hoh x 167
   END
   ADD_TPG
      bond 1sg 2sg residue 5 55
      bond 1sg 2sg residue 14 38
      bond 1sg 2sg residue 30 51
   END
&END
#
# Simulation Commands:
#
&SIMULATION
   MDSIM
   TEMPERATURE 20.0 10.0
&END
&INTEGRATOR
   TIMESTEP 16.0
   MTS_RESPA
      very_cold_start 0.1
      step intra 2
      step intra 2
      step nonbond 4 4.2 0.3 0.4
      step nonbond 4 7.4 0.3 0.4 reciprocal
      step nonbond 1 9.7 0.3 1.5
   END
&END
&POTENTIAL
   EWALD pme 0.43 32 32 32 4
   UPDATE 40.0 1.5
&END
&RUN
   CONTROL 0
   PROPERTY 496.0
   REJECT 496.0
   TIME 0.0
   PRINT 2.0
&END
#
# Output Commands:
#
&INOUT
#
# Description Commands:
#
&SETUP
   CRYSTAL 35.0 35.0 35.0 90.0 90.0 90.0
   READ_PDB bpti1.pdb
   INSERT 0.75
   CELL sc 11 11 11
&END
&SOLUTE
   STRETCHING
   QQ-FUDGE 0.83333
   LJ-FUDGE 0.50
   RESET_CM
   SCALE_CHARGES 2 1 2
&END
&PARAMETERS
   READ_PFR_BIN bpti_amber95.prmtpg
&END
&SOLVENT
   ATOM o 1 P 16.0 0.0 0.0 0.0
   ATOM h 2 P 1.0 0.81650 -0.57735 0.0
   ATOM h 2 P 1.0 -0.81650 -0.57735 0.0
   INTERACTION 1 3.1656 0.15540 -0.82
   INTERACTION 2 1.6 0.0 0.41
   STRETCHING 1 2 524.86 1.0
   STRETCHING 1 3 524.86 1.0
   BENDING 2 1 3 55.00 109.47
&END
#
# Simulation Commands:
#
&SIMULATION
   MDSIM
   TEMPERATURE 300.0 20.0
&END
&INTEGRATOR
   TIMESTEP 16.0
   MTS_RESPA
      step intra 2
      step intra 2
      step nonbond 4 4.2 0.3 0.4
      step nonbond 4 7.4 0.3 0.4 reciprocal
      step nonbond 1 9.7 0.3 1.5
   END
&END
&POTENTIAL
   EWALD pme 0.43 32 32 32 4
   UPDATE 40.0 1.5
   CUTOFF 10.0
&END
&RUN
   CONTROL 0
   PROPERTY 496.0
   REJECT 992.0
   TIME 0.0
   PRINT 2.0
&END
#
# Output Commands:
#
&INOUT
#
# Description Commands:
#
&SETUP
   CRYSTAL 35.0 35.0 35.0 90.0 90.0 90.0
   READ_PDB bpti2.pdb
&END
&SOLUTE
   STRETCHING HEAVY
   QQ-FUDGE 0.83333
   LJ-FUDGE 0.50
&END
&PARAMETERS
   READ_PFR_BIN bpti_amber95.prmtpg
&END
&SOLVENT
   MOLECULES 836
   ATOM o 1 P 16.0 0.0 0.0 0.0
   ATOM h 2 P 1.0 0.81650 -0.57735 0.0
   ATOM h 2 P 1.0 -0.81650 -0.57735 0.0
   INTERACTION 1 3.1656 0.15540 -0.82
   INTERACTION 2 1.6 0.0 0.41
   CONSTRAINT 1 2
   CONSTRAINT 1 3
   CONSTRAINT 2 3
   READ_PDB
&END
#
# Simulation Commands:
#
&SIMULATION
   MDSIM
   TEMPERATURE 300.0 20.0
   ISOSTRESS PRESS-EXT 0.1 BARO-MASS 40.0 COMPR 5.3D-04
&END
&INTEGRATOR
   TIMESTEP 1.0
   SINGLE_STEP
&END
&POTENTIAL
   EWALD pme 0.43 16 16 16 4
   UPDATE 40.0 1.5
   CUTOFF 10.0
&END
&RUN
   CONTROL 0
   PROPERTY 500.0
   REJECT 2000.0
   TIME 6000.0
   PRINT 2.0
&END
#
# Output Commands:
#
&INOUT
================================================================================
= =
= Averages over 6000.0 fs of Simulation =
= =
================================================================================
**********************************
* *
* Energies in _Kjoule/mole_ *
* Temperatures in _Kelvin_ *
* Pressures in MPascal *
* Volumes in A^3 *
* Distances in A *
* Stress Tensor in Joule/A**3 *
* *
**********************************
Total    = -42610.755+/-    2.769  SlvPot   = -29299.835+/-  541.694
SlvCoul  = -33833.541+/-  711.731  SlvRec   =  -8600.088+/-     .005
SlvReal  = -25233.453+/-  711.731  SlvInt   =       .000+/-     .000
SltTot   =  -8728.819+/-  465.326  SltPot   = -13181.195+/-  394.840
SltCoul  = -16530.580+/-  360.207  SltL-J   =  -1122.090+/-   62.765
SltHyd   =       .000+/-     .000  SltBond  =   4471.475+/-   99.464
SltStr   =    747.711+/-   46.135  SltBen   =   2208.239+/-   72.024
SltItor  =    121.239+/-   13.797  SltPtor  =   1394.286+/-   39.698
S-SPot   = -11184.717+/- 1024.034  S-SCoul  = -14698.337+/- 1354.610
SltTemp  =    312.980+/-    7.890  SlvTrTem =    316.996+/-    8.186
SlvRoTem =    315.918+/-    8.556  TotTemp  =    315.047+/-    4.918
SlvKin   =   6598.854+/-  118.810  SltKin   =   4452.376+/-  112.239
-- Fluctuating Box ---
TotPre   =        .25+/-   54.889  ConPre   =    -111.14+/-   54.614
KinPre   =     111.39+/-    3.019  TmpPre   =     332.69+/-   463.58
Volume   =   39509.82+/-  333.219  PV       =       2.38+/-     .020
... cell parameters ....              ... + / - ...
34.0590 34.0590 34.0590               .0959396 .0959396 .0959396
90.0000 90.0000 90.0000               .0000000 .0000000 .0000000
... stress ....                       ... + / - ...
-.2559E+06  .2994E+05  .7462E+04      .17039E+06 .14961E+06 .18375E+06
 .2877E+05 -.2441E+06 -.2443E+05      .15448E+06 .18867E+06 .17406E+06
 .7778E+04 -.2968E+05 -.2741E+06      .12618E+06 .16754E+06 .20909E+06
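The units stated in the output header can be cross-checked against the PV entry of the averages. The following sketch converts a pressure in MPa and a volume in A^3 into kJ/mole. It assumes, since the output itself does not say so, that PV is evaluated with the external pressure given by PRESS-EXT (0.1 MPa) and not with the averaged TotPre; the function name pv_kj_per_mol is hypothetical.

```python
AVOGADRO = 6.02214076e23   # molecules per mole
A3_TO_M3 = 1.0e-30         # cubic Angstrom to cubic metre
MPA_TO_PA = 1.0e6          # MPa to Pa


def pv_kj_per_mol(pressure_mpa, volume_a3):
    """Convert P*V from MPa * A^3 to kJ/mole."""
    joules = pressure_mpa * MPA_TO_PA * volume_a3 * A3_TO_M3
    return joules * AVOGADRO / 1000.0


# ASSUMPTION: the external pressure (PRESS-EXT 0.1) is the one used.
# Volume and its fluctuation are copied from the averages above.
pv = pv_kj_per_mol(0.1, 39509.82)
pv_err = pv_kj_per_mol(0.1, 333.219)
print(f"PV = {pv:.2f} +/- {pv_err:.3f} kJ/mole")   # PV = 2.38 +/- 0.020 kJ/mole
```

Both the mean and the fluctuation reproduce the printed PV entry, which is consistent with the assumption that the external pressure enters the product.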
#
# Description Commands: define the MD box with solute,
# define solute topology
#
&SETUP
CRYSTAL 34.0590 34.0590 34.0590 90.0 90.0 90.0
READ_PDB bpti3.pdb
TEMPLATE bpti_xray.ent
&END
&SOLUTE
STRETCHING
QQ-FUDGE 0.83333
LJ-FUDGE 0.50
&END
&PARAMETERS
READ_PFR_BIN bpti_amber95.prmtpg
&END
&SOLVENT
MOLECULES 836
ATOM o 1 P 16.0 0.0 0.0 0.0
ATOM h 2 P 1.0 0.81650 -0.57735 0.0
ATOM h 2 P 1.0 -0.81650 -0.57735 0.0
INTERACTION 1 3.1656 0.15540 -0.82
INTERACTION 2 1.6 0.001 0.41
STRETCHING 1 2 524.86 1.0
STRETCHING 1 3 524.86 1.0
BENDING 2 1 3 55.00 109.47
READ_PDB
&END
#
# Simulation Commands: NVE MD with RESPA with velocity scaling
# at 50 K using Ewald - PME
#
&SIMULATION
MDSIM
TEMPERATURE 300.0 20.0
&END
&INTEGRATOR
TIMESTEP 10.2
MTS_RESPA
step intra 3
step intra 2
step nonbond 2 4.2 0.3 0.4
step nonbond 3 7.4 0.3 0.4 reciprocal
step nonbond 1 9.7 0.3 1.5
END
&END
&POTENTIAL
EWALD pme 0.43 32 32 32 4
UPDATE 40.0 1.5
&END
&RUN
CONTROL 0
PROPERTY 500.0
REJECT 2000.0
TIME 20000.0
PRINT 2.0
&END
#
# Output Commands: The restart and PDB files are dumped
#
&INOUT
RESTART 400.0 OPEN bpti4.rst
ASCII 6000.0 OPEN bpti4.pdb
DUMP_RAND 10.2 OPEN bpti4.dmp
&END
&PROPERTIES
X_RMS 50.0 OPEN bpti4.xrms
X_RMS CA HEAVY BACKBONE
VACF 4012.0 OPEN bpti4.vacf
VACF_PRM 6963.2 3.4