

Piero Procacci¹‡, Tom A. Darden², Emanuele Paci³† and Massimo Marchi³

¹ Centre Europeen de Calcul Atomique et Moleculaire (CECAM), Ecole Normale Superieure de Lyon, 46 Allee d'Italie, 69364 Lyon, FRANCE

² National Institute of Environmental Health Sciences, Research Triangle Park, NC 27709

³ Section de Biophysique des Proteines et des Membranes, DBCM, DSV, CEA, Centre d'Etudes, Saclay, 91191 Gif-sur-Yvette Cedex, FRANCE

‡ Permanent address: Dipartimento di Chimica, Universita degli Studi di Firenze, 50120 Firenze, Italy

† Present address: Laboratoire de Chimie Biophysique, Institut Le Bel, Universite Louis Pasteur, 67000 Strasbourg, France

Author to whom all correspondence should be addressed

As we shall see, many of ORAC's commands open external files. No unit number needs to be provided, since ORAC opens the required files sequentially, assigning each file a unit number according to its order of occurrence in the input file.

A. The ORAC Input Files: sys.mddata

Since ORAC is designed not as a modeling interface but as a molecular dynamics program which performs computing-intensive tasks, it works only in a non-interactive batch mode: the user must provide an input file (hereafter referred to as sys.mddata) containing commands for execution, and a series of auxiliary files whose names are also provided in sys.mddata. At execution time, the input file is read from standard input in free format. This means that each input line is read as a character string and parsed into its component substrings, i.e. series of characters separated by blanks or commas. Each substring represents an instruction and is interpreted by specific routines. A line having the "#" character in column 1 is always considered a comment.
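The free-format reading described above can be sketched in a few lines of Python. This is only an illustration of the splitting rule (blanks or commas as separators, "#" in column 1 marking a comment), not ORAC's own parser; the function name and the numeric arguments in the sample lines are hypothetical.

```python
import re

def tokenize_line(line):
    """Split one free-format input line into substrings.

    Returns an empty list for comment lines ('#' in column 1) and blank lines.
    """
    if line.startswith("#"):
        return []
    # Blanks and commas both act as separators; drop empty pieces.
    return [tok for tok in re.split(r"[,\s]+", line.strip()) if tok]

# Hypothetical sample lines (the cell values are made up for illustration).
sample = [
    "# this line is a comment",
    "CRYSTAL 30.0, 30.0 30.0",
    "  READ PDB   bpti.pdb",
]
tokens = [tokenize_line(line) for line in sample]
```

Each resulting list of substrings would then be handed to the routine that interprets the corresponding instruction.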

ORAC's instruction set has been designed to include three different kinds of instructions: environments, commands and subcommands. A sys.mddata file is made up of a series of environments, whose order is unimportant, each including a series of commands which in turn might use a few subcommands. The environment name is a string always beginning with the & character followed by capital letters. Each environment ends with the instruction &END. Environments are reminiscent of the Fortran namelist, but have not been programmed as such and are portable. Command names are character strings containing only capital letters. Each command reads a variable set of parameters which can be characters and/or numbers (real or integer). Moreover, commands composed of more than one input line (structured commands) also exist. Each structured command ends with the instruction END and allows a series of subcommands inside it. Subcommands are always in lower case and can read substrings containing characters and/or real or integer numbers.
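As a rough illustration of this three-level structure, the sketch below groups the lines of an input file by environment and tells commands from subcommands by letter case. It is a minimal reading of the rules stated above, not the program's actual parser; the function names and the argument of `step intra` are hypothetical, and the command spellings follow the text as printed here.

```python
def split_environments(lines):
    """Group input lines into {environment name: [token lists]}."""
    environments = {}
    current = None
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # comment or blank line
        tokens = line.replace(",", " ").split()
        head = tokens[0]
        if head == "&END":
            current = None
        elif head.startswith("&"):
            current = head
            environments[current] = []
        elif current is not None:
            environments[current].append(tokens)
        else:
            raise ValueError(f"instruction outside any environment: {line!r}")
    return environments

def kind(tokens):
    """Classify a line inside an environment by the case of its first word."""
    return "subcommand" if tokens[0].islower() else "command"

# Hypothetical input fragment, for illustration only.
example = """\
&SIMULATION
  MDSIM
&END
&INTEGRATOR
  TIMESTEP 16.0
  MTS RESPA
    step intra 2
  END
&END
""".splitlines()
envs = split_environments(example)
```

Because environments are stored by name, their order in the file does not matter, in line with the description above.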

In the following paragraphs we briefly discuss the basic structure of the input to ORAC. This is intended to be a very concise and by no means exhaustive guide to ORAC's environment directives. More details about the specific environment commands and their syntax, including the syntax of the auxiliary topological and potential files, will be found in the subsequent sections, where a practical example is illustrated. For a complete description of all supported environments and commands the reader is referred to the ORAC manual [48].

In general, the environments specified in sys.mddata are roughly classified into three categories:

The Description Environments contain commands referring to the structure of the system and to the interaction potential. These commands may instruct ORAC, for example, to use a particular potential form, to adopt a simulation box of a given size and shape, to insert solvent molecules, to read the potential and topology parameter files, to add extra topology, etc.

With the Simulation Environments commands, one can choose the kind of simulation to perform, e.g. the temperature and the pressure of the system, the integration scheme to be used, etc.

The Output Environments commands control the output of the simulation, such as property calculations, binary or ASCII history files, and restart files.

1. Description Environments

To this category belong the environments &SETUP, &SOLUTE, &SOLVENT and &PARAMETERS. In &SETUP the box size is specified by appropriate arguments to the commands CELL and CRYSTAL. The name of the PDB file containing the solute and/or solvent coordinates is also entered in &SETUP through the command READ PDB.

Commands defining certain potential options for the solute molecules can be specified in the &SOLUTE environment. Examples of such commands are STRETCHING, which allows for bond stretching; I-TORSION, which defines the functional form of the improper torsion potential; and AUTO DIHEDRAL, which is used when all the possible proper torsions of the solute molecules are to be included in the potential. The command INSERT is also available from this environment, and it is used to fill the simulation box with solvent. Obviously, when INSERT is specified, the environment &SOLVENT must also be present.

All the parameters needed to completely define the atomic and geometrical structure of the solvent molecule and its LJ and electrostatic interaction potentials can be specified in the environment &SOLVENT. Although ORAC cannot generate solute molecule coordinates, simulations of systems containing only solvent molecules can be run by the program without the need to read any external coordinate files. To do so, in addition to defining the appropriate commands in the &SOLVENT environment, the basic crystal cell and the number of replicas to be included in the MD simulation must be constructed using the commands CELL and CRYSTAL of the environment &SETUP. Alternatively, the solvent may also be read from the PDB file specified in &SETUP.

The &PARAMETERS environment has been designed to define a series of operations strictly connected with the topology of the solute molecules. Thus, &PARAMETERS must appear in the input file only if the environment &SOLUTE is also present. In &PARAMETERS the primary structure of the solute is specified in the structured command JOIN as a sequence of molecular units or residues forming the solute molecules, e.g. the amino acid residues. Each unit is labelled by a name to which correspond topology data (atomic labels, connectivity, etc.) in an ASCII topological file, hereafter referred to as field.tpg. The name of the file field.tpg is specified in another &PARAMETERS command, READ TPG ASCII. In the file field.tpg the atomic charges on each unit of the solute are also defined. The solute potential parameters, which include stretching, bending, proper torsion, improper torsion and non-bonded parameters, are read in from the other ASCII parameter file, hereafter referred to as field.prm. The structure and format of both the field.tpg and field.prm files will be examined later on in this section. The name of the file field.prm can be specified in the command READ PRM ASCII field.prm. While the intra-residue topology is defined in the field.tpg file, inter-residue topology, such as disulphide bonds and the added topology (bendings and torsions involving the two sulphur atoms), can be specified in &PARAMETERS using the structured command ADD TPG.


2. Simulation Environments

To this category belong the environments &SIMULATION, &RUN, &INTEGRATOR and &POTENTIAL. In &SIMULATION, parameters and keywords connected to the type of simulation to be performed must be specified. Temperature and pressure are entered in this environment. A regular MD run is performed only if the command MDSIM is entered. If, along with MDSIM, other commands such as STRESS and/or CONST TEMP are given, an extended system simulation in the specified ensemble is performed. Simple minimizations (steepest descent only) are done if the command MINIMIZE is present.

The &RUN environment contains commands specifying actions to be taken during the MD run. For example, the length of the rejection phase, when velocities are scaled, is specified by REJECT, the length of the production run by TIME, the printing interval for instantaneous energies by PRINT, etc. Moreover, &RUN includes the command CONTROL, which specifies whether the simulation must be run from new input coordinates or from a restart file containing coordinates and velocities generated from an older simulation.

The integration algorithm to be used during the simulation and the time step size are specified in the &INTEGRATOR environment. If the selected integrator is r-RESPA, the command TIMESTEP reads the largest time step of the algorithm, namely $\Delta t_h$ in Eq. IV.51. Only two mutually exclusive commands can be provided to select the integrator: SINGLE STEP, in which case a conventional single-step MD is performed, or the structured command MTS RESPA, to perform an NVE ensemble simulation with the r-RESPA integration algorithm. SINGLE STEP must be specified if the dynamics is carried out at constant temperature and/or pressure, i.e. if CONST TEMP and/or (ISO)STRESS are given in &SIMULATION. The structured command MTS RESPA defines the parameters of the r-RESPA integrator, namely the radii and healing lengths of the short, medium and long range shells for the non-bonded interactions, and the time steps defined in Eq. IV.51. MTS RESPA also includes a subcommand defining the reference system associated with the calculation of the reciprocal space contribution to the SPME or to the standard Ewald summation.

The &POTENTIAL environment is used to define various parameters related to the non-bonded potential interaction and affecting both solute and solvent molecules. &POTENTIAL allows commands to set up cutoff schemes (commands GROUP CUTOFF and EWALD), to change the direct lattice cutoff (commands CUTOFF, GROUP CUTOFF), to modify the radius of the Verlet neighbor list and the frequency of its evaluation (command UPDATE), or the parameters of the linked cell neighbor list (command LINKED CELL). Also, the reciprocal space convergence parameter of the Ewald sums ($\alpha$ in Eq. III.20) must be specified here (command EWALD), along with the grid constants $K_1, K_2, K_3$ and the order $n$ of the B-spline interpolation if SPME is used.

3. Output Environments

To this category of environments belong &INOUT and &PROPERTIES.

The &INOUT environment handles the output operations carried out by ORAC on files other than the standard output. Commands are provided that instruct the program to save coordinates to a file and how frequently this must happen. The binary trajectory files can be written onto sequential (command DUMP) or direct access files (command DUMP RAND). ORAC also provides, of course, the possibility of writing coordinate files in PDB ASCII format. This is accomplished by the command ASCII.

The &PROPERTIES directive is used to compute statistical properties while the simulation is being carried out. ORAC can compute radial distribution functions and structure factors (GOFR, SOLVENT GOFAR), velocity autocorrelation functions (VACF, MTS VACF), infrared spectra (MTS SPECTRA), and the root mean square deviation from a reference structure (X RMS). The &PROPERTIES environment and the corresponding readproperties.f Fortran source provide a simple framework to the programmer for adding a command for the calculation of a new "property". Interfacing a user-developed property computation to ORAC requires, in principle, a very limited programming effort. This is discussed further in the manual [48].

B. ORAC Auxiliary Files

Compared to molecular liquids, simulating proteins, or any complex biomolecule, poses additional problems due to the molecules' covalent structure, knowledge of which must precede any evaluation of the potential energy of the system.

The covalent topology of any complex biomolecule can be computed from the structure of its constituent residues. In ORAC, to curtail the complexity of the input data, only minimal information on each residue needs to be provided, such as the constituent atoms, the covalent bonds and, in the case of polymers or biopolymers, the terminal atoms used to connect the unit to the rest of the chain. In addition, in order to assign the correct potential parameters to the bonds, bendings and torsions of the residue, the type of each atom needs to be specified. Finally, to each atom type must correspond a set of non-bonded parameters. When the bonding topology of the different residues contained in the solute molecule(s) is known, these units are linked together according to their occurrence in the sequence. In this fashion the total bonding topology for the molecule is obtained. From this information, all possible bond angles are collected by searching for all possible pairs of bonds which share one atom. Similarly, by selecting all pairs of bonds linked to each other by a distinct bond, all the torsions can be obtained.
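The search for angles and torsions described above can be illustrated with a short Python sketch. It is not ORAC's implementation: the function name, the data layout and the example molecule are assumptions made for illustration, and only the two combinatorial rules (two bonds sharing an atom give an angle; two bonds joined by a distinct third bond give a torsion) come from the text.

```python
from itertools import combinations

def build_topology(bonds):
    """Derive bond angles and proper torsions from a list of bonds."""
    neighbors = {}
    for a, b in bonds:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)

    # An angle is a pair of bonds sharing one atom (the vertex, listed second).
    angles = [(i, j, k)
              for j, nbrs in neighbors.items()
              for i, k in combinations(sorted(nbrs), 2)]

    # A torsion i-j-k-l is a pair of bonds (i-j, k-l) linked by a distinct bond j-k.
    torsions = []
    for j, k in bonds:
        for i in sorted(neighbors[j] - {k}):
            for l in sorted(neighbors[k] - {j}):
                if i != l:  # skip three-membered rings
                    torsions.append((i, j, k, l))
    return angles, torsions

# Hypothetical four-atom chain c1-c2-c3-c4 (labels are made up).
bonds = [("c1", "c2"), ("c2", "c3"), ("c3", "c4")]
angles, torsions = build_topology(bonds)
```

For the four-atom chain this yields two angles and one torsion, as expected for three consecutive bonds.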

The following sections sketch the format of the topology and force field parameter files read by ORAC (field.tpg and field.prm, respectively). The topology and force field parameter files are strongly dependent on each other and together fully define the molecular force field of the solute molecule(s).

C. ORAC Auxiliary Files: field.tpg

ORAC is instructed to read the topology file by the command READ TPG ASCII field.tpg of the &PARAMETERS environment. File field.tpg contains information on the series of residues needed to define the topology of the actual solute molecules. This information is provided through a series of free format keywords and their corresponding input data, as done in the main input file sys.mddata. In this way, ORAC reads the solute connectivity, the atomic charges, the atomic labels corresponding to those found in the PDB file, and the atomic types according to the chosen force field (e.g. AMBER, CHARMM or others). Moreover, the atomic groups and the improper torsions are also defined.

As for sys.mddata, the file field.tpg is parsed and the component substrings of each line are interpreted. Comment lines must have the "#" character in column 1. Each residue or unit definition starts with the keyword

    RESIDUE residue name

where residue name is a character label which must match labels found in the command JOIN of the environment &PARAMETERS, and must end with the keyword RESIDUE END. These residue-delimiting keywords are the only ones in capital letters in field.tpg. In Fig. 3 we give two examples of "residue" definitions, i.e. the alanine N terminus and the molecule of acetone, coded with the strings ala-h and aceto, respectively.

Atom type definitions and charges are read in between the keywords atom and end. For each atom three strings must be entered: the PDB atom label, the potential type according to the selected force field as specified in field.prm (see later on in this section) and the point charge in electron units. Groups are composed of all atoms entered between two successive group keywords. The PDB labels must all be different from each other, since they are used to establish the topology and connectivity of the solute.

In the alanine example, the atomic types and the charges are those of the AMBER force field. For acetone, instead, the atomic types are still those of AMBER, while the charges are obtained from the MOPAC program [49] with the ESP fitting procedure [50]. Four groups are defined for alanine and three groups for acetone. When using r-RESPA with SPME the groups should be defined as small as possible (ideally they should be composed of two or three atoms) to enhance the stability of the fast integrator. Defining large groups, on the other hand, allows substantial saving of memory, since it decreases the size of the nested Verlet neighbor lists used by the r-RESPA algorithm. Hence, in selecting the group size for large biological systems a compromise must be made.

The bond connectivity is specified between the keywords bond and end by providing the series of bonds present in the residue. Each bond is specified by two atom labels corresponding to the atoms participating in the bond. In the example in Fig. 3, residue ala-h has eleven bonds while nine bonds are found in aceto.

All possible bendings and proper torsions are computed by ORAC from the bond connectivity. Improper torsions are used to impose geometrical constraints on specific quadruplets of atoms in the solute. In modern all-atom force fields, improper torsions are generally used to ensure the planarity of an sp² hybridized atom. The convention in ORAC to compute the proper or improper torsion dihedral angle is the following: if $\mathbf{r}_1, \mathbf{r}_2, \mathbf{r}_3, \mathbf{r}_4$ are the position vectors of the four atoms identifying the torsion, the dihedral angle is defined as

$$\phi = \arccos\left[ \frac{(\mathbf{r}_2-\mathbf{r}_1)\times(\mathbf{r}_3-\mathbf{r}_2)}{|\mathbf{r}_2-\mathbf{r}_1|\,|\mathbf{r}_3-\mathbf{r}_2|} \cdot \frac{(\mathbf{r}_3-\mathbf{r}_2)\times(\mathbf{r}_4-\mathbf{r}_3)}{|\mathbf{r}_3-\mathbf{r}_2|\,|\mathbf{r}_4-\mathbf{r}_3|} \right] \qquad \text{(VI.1)}$$

In the case of improper torsions involving a terminal atom, a particular quadruplet of atoms must be selected. For instance, the alanine N-terminus is connected to the peptide chain from only one end. One improper torsion must then be specified involving the amino nitrogen (n+) of the subsequent residue to ensure planarity of the peptide planes.

There are other, less important, topology directives in field.tpg which allow one, e.g., to omit specific bendings, define hydrogen bonds, etc. For a complete description we again refer to the manual [48].

D. ORAC Auxiliary Files: field.prm

While the file field.tpg provides electrostatic point charges for each atom of the residues, ORAC reads the potential parameters for the bonded and the Lennard-Jones interactions from the file field.prm. The parsing of this additional auxiliary file is carried out in the same way as for field.tpg and sys.mddata. In general, the parameters for a given interaction are listed between two keywords: a first keyword identifying the type of interaction (e.g. BOND, BENDING, etc.) and the keyword END. The order of the types of interaction and their associated parameters in the auxiliary file is unimportant.

Bond stretchings (interaction keyword BOND) are entered by specifying on a line the two atoms involved, followed by a numeric string providing, in this order, the force constant $K_r$ and the equilibrium bond distance $r_0$ (see Eq. II.7). For example:

    BOND
    ....
    # AMBER carbonyl stretching in Kcal and A
    c o 570.00 1.229
    ....
    END

For a bending (interaction keyword BENDING), three atoms must be entered along with the force constant and the equilibrium bending angle, in this order ($K_\theta$ and $\theta_0$ in Eq. II.8). In the atom sequence, the vertex atom is given second, while the order of the other two atoms is immaterial:

    BENDING
    ...
    # h20 bending in Kcal and rad
    hw ow hw 100.0 104.52
    ...
    END

For each proper torsion (interaction keyword TORSION PROPER) four atoms must be provided, the second and the third atoms being those on the dihedral angle axis. After the atom sequence, the barrier height $k$, the angle $\gamma$ and the integer $n$ (see Eq. II.13) must follow. For example:

    TORSION PROPER
    ...
    # Kcal/mole Gamma n
    x ct ct x 0.1556 0.0 3
    ...
    END

The symbol x is the wild card symbol and is representative of any atomic type.
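The role of the wild card can be illustrated with a small matching function. This is a sketch only: the function name is hypothetical, and the rule that a torsion may also be matched when read in reverse order is an assumption, not something stated in the text.

```python
def torsion_matches(pattern, types, wildcard="x"):
    """True if a four-type pattern matches the atom types of a torsion."""
    def same(p, t):
        return all(a == wildcard or a == b for a, b in zip(p, t))
    # Assumption: a torsion read backwards is the same torsion.
    return same(pattern, types) or same(pattern, types[::-1])

# Pattern taken from the TORSION PROPER example above.
pattern = ("x", "ct", "ct", "x")
hit = torsion_matches(pattern, ("hc", "ct", "ct", "n"))
miss = torsion_matches(pattern, ("hc", "ct", "c", "o"))
```

With this rule the single line `x ct ct x` covers every torsion whose two central atoms are both of type ct, whatever the terminal types are.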

In the improper torsion potential (interaction keyword TORSION IMPROPER) the quadruplet of atoms involved must again be specified. The CHARMM harmonic torsional form ($C_1 = 1$ in Eq. II.13) or the AMBER form ($C_1 = 0$ in Eq. II.13) are assumed if two or three additional numeric strings are provided, respectively. The following is an example of the two possibilities:

    TORSION IMPROPER
    ...
    # for this torsion choose AMBER
    x x n h 1.00 180.0 2
    # for this instead choose CHARMM
    cpb cpa nph cpa 20.80 0.0
    ....
    END
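The selection between the two functional forms by the number of numeric fields can be sketched as follows. The function name and the returned layout are hypothetical; only the two-versus-three rule is taken from the text.

```python
def parse_improper(line):
    """Parse one TORSION IMPROPER line into (atom types, form, parameters)."""
    fields = line.split()
    types, params = fields[:4], fields[4:]
    if len(params) == 2:
        form = "CHARMM"   # harmonic form: two numeric fields
    elif len(params) == 3:
        form = "AMBER"    # three numeric fields, the last an integer
    else:
        raise ValueError(f"expected 2 or 3 parameters, got {len(params)}")
    values = [float(p) for p in params]
    return types, form, values

# The two data lines of the example above.
amber = parse_improper("x x n h 1.00 180.0 2")
charmm = parse_improper("cpb cpa nph cpa 20.80 0.0")
```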

Finally, the Lennard-Jones non-bonded atomic parameters are specified by entering 6 strings: the first is the atomic type according to the chosen force field, the second and third are the Rmin¹ and $\epsilon$ constants, the fourth and fifth are the 1-4 interaction Rmin and $\epsilon$ constants, and the last is the atomic mass. To obtain cross interaction potentials, the Lennard-Jones parameters are combined according to standard sum rules (see Eq. II.3). For the Lennard-Jones 1-4 non-bonded interactions, the potential function may be multiplied by a so-called 1-4 factor, usually less than or equal to 1. If zeros are entered in the fourth and fifth fields, the 1-4 factor is set by the command LJ-FUDGE in the environment &SOLUTE. If some specific Lennard-Jones 1-4 interactions need to be multiplied by some alternative constants, the resulting Lennard-Jones constants must be entered in these fields. In the following, examples of the various alternatives are shown:

    NONBONDED MIXRULE
    ...
    # type o has the 1-4 factor provided by LJ-FUDGE in &SOLUTE
    o 1.661 0.210 0.000 0.000 16.0
    # type oa is same as o but has the 1-4 factor equal to 1
    oa 1.661 0.210 1.661 0.210 16.0
    # type ob is same as o but has a different 1-4 potential
    ob 1.661 0.210 1.861 0.105 16.0
    ....
    END
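A short sketch may help to see how the six fields and the 1-4 rule fit together. It is illustrative only: the function names are hypothetical, the value 0.5 used for the LJ-FUDGE factor is made up, and treating the 1-4 factor as a scaling of the well depth is an inference from the statement that the whole potential is multiplied by it. The conversion to the more common sigma parameter uses the relation given in the footnote on Rmin.

```python
def parse_nonbonded(line, lj_fudge):
    """Parse one NONBONDED line: type, Rmin, eps, Rmin(1-4), eps(1-4), mass."""
    name, *numbers = line.split()
    rmin, eps, rmin14, eps14, mass = (float(x) for x in numbers)
    if rmin14 == 0.0 and eps14 == 0.0:
        # Zeros in fields four and five: fall back on the global 1-4 factor.
        # Multiplying the whole potential by a factor equals scaling eps.
        rmin14, eps14 = rmin, lj_fudge * eps
    return {"type": name, "rmin": rmin, "eps": eps,
            "rmin14": rmin14, "eps14": eps14, "mass": mass}

def sigma_from_rmin(rmin):
    """sigma = 2 * Rmin * 2**(-1/6), the relation given in the footnote."""
    return 2.0 * rmin * 2.0 ** (-1.0 / 6.0)

# Data lines from the example above; lj_fudge=0.5 is a hypothetical value.
o = parse_nonbonded("o 1.661 0.210 0.000 0.000 16.0", lj_fudge=0.5)
ob = parse_nonbonded("ob 1.661 0.210 1.861 0.105 16.0", lj_fudge=0.5)
sigma_o = sigma_from_rmin(o["rmin"])
```

With the hypothetical factor of 0.5, type o ends up with the same 1-4 well depth as the explicitly specified type ob, but keeps its own Rmin.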

We stress that a 1-4 factor might also be specified for the 1-4 electrostatic interaction by means of the command QQ-FUDGE in the &SOLUTE environment.

¹ Rmin corresponds to the minimum of the Lennard-Jones potential and is related to the parameter $\sigma$ by $\sigma = 2 R_{min}\, 2^{-1/6}$.

VII. A TYPICAL EXAMPLE: BPTI IN WATER SOLUTION

ORAC is a general MD code which can simulate a variety of systems ranging from simple homogeneous fluids and solids to complex heterogeneous systems. Here, we provide an example run for a solvated biomolecule. This is the type of system that ORAC has been designed to simulate and for which the highest performance can be achieved. We chose to simulate the typical guinea pig of protein simulation, namely the Bovine Pancreatic Trypsin Inhibitor (BPTI), in water and at 300 K. We start our simulation from the available experimental X-ray structure of the orthorhombic type I crystal at low temperature [51]. In the following sections we go through all the steps that are needed to prepare the system for a typical MD run and to run the simulation itself. In particular, we discuss the following sequential steps:

Step I: Minimization of the protein structure in vacuum using the AMBER force field by means of r-RESPA MD at 20 K.

Step II: Solvent (water molecules) is added to the simulation box. The solvent structure is relaxed at 300 K with a short r-RESPA simulation.

Step III: A few ps of molecular dynamics simulation at constant pressure and at 300 K are performed in order to find the equilibrium density at P = 1 MPa.

Step IV: A simulation of the hydrated BPTI at the equilibrium density at 300 K is performed using NVE r-RESPA molecular dynamics.

The discussion that follows is propaedeutic to the program usage.

A. Step I: Starting a Run from the X-ray PDB file

Our example run was started from the X-ray coordinates of the native bovine pancreatic trypsin inhibitor taken from the Protein Data Bank at the Brookhaven National Laboratory, file pdb1bpi.ent. The PDB coordinate file contains 58 residues for a total of 460 non-hydrogen protein atoms, a phosphate anion (5 atoms) and 167 water oxygens. Although ORAC is able to read the PDB file as is, in file pdb1bpi.ent the GLU7 and ARG53 residues and the phosphate anion are given in two alternative conformations named A and B. Thus, we retained only the "B" conformation and erased the coordinates of the "A" conformation. With these changes made, the input file sys.mddata is given in Fig. 4.

1. Description of the Input File

Although, as we saw in the previous section, the order of the environments is immaterial, we chose to order them according to the same arbitrary subdivision and order used before. Thus, the description environments are given first. Since at this stage no solvent is present, only the environments &SETUP, &SOLUTE and &PARAMETERS are specified.

a. &SETUP In &SETUP only two commands are entered: CRYSTAL, where the simulation cell parameters ($a$, $b$, $c$, $\alpha$, $\beta$ and $\gamma$, discussed in Sec. IIB) are provided, and READ PDB bpti xray.ent, where bpti xray.ent is the file name of the initial solute coordinates obtained from the Brookhaven PDB.

b. &SOLUTE The &SOLUTE environment contains five commands: i) STRETCHING prevents ORAC from enforcing constraints on bonds, which are in conflict with the r-RESPA integrator to be used. ii) Two commands are used to define the 1-4 multiplicative factors for electrostatic, QQ-FUDGE, and Lennard-Jones, LJ-FUDGE, interactions. These are discussed in section VIID. iii) RESET CM shifts the origin of the simulation box to the center of mass of the solute. iv) The command SCALE CHARGES instructs ORAC to distribute any excess charge² over the first two solute molecules, i.e. the BPTI and the phosphate ion. Since there is no &SOLVENT directive, the 167 crystallographic water molecules along with the phosphate anion are considered as part of the "solute".

c. &PARAMETERS In the &PARAMETERS environment we enter the file names of the topology and parameter auxiliary files by using the commands READ ASCII TPG and READ ASCII PRM, respectively. The two files amber95.tpg and amber95.prm corresponding to the AMBER force field are provided with the ORAC distribution files. If no hydrogen coordinates are provided in the PDB file, as is generally the case, ORAC generates the hydrogen atoms according to simple geometric rules. The structured command JOIN is used to define the residue sequence given in the PDB file. We notice that n identical and consecutive "residues" like the water molecules hoh can be specified with the format hohn. In addition, by entering the subcommand BOND of the structured command ADD TPG, three extra bonds corresponding to the three disulphide bridges (namely CYS5-CYS55, CYS14-CYS38 and CYS30-CYS51) are added. Finally, the binary file bpti amber95 osf.prmtpg containing the full topology and interaction parameters of the system is written by the command WRITE PFR BIN. The expensive computation of the topology and parameters can be avoided in subsequent runs by reading the file created by WRITE PFR BIN with the command READ PER BIN.

² Using the standard protonation at pH 7 for his, glu, asp, arg, lys and charge -3e for the phosphate anion, the system has a total charge of +3e.

d. &SIMULATION In the example, &SIMULATION indicates that a normal MD simulation is to be run at the temperature of 20 K with an oscillation bandwidth of 10 K. By specifying MDSIM in &SIMULATION and by selecting r-RESPA as the integrator in &INTEGRATOR, the minimization is run with an r-RESPA NVE MD algorithm rather than using the ORAC minimization algorithm, i.e. the module drvmin³.

e. &INTEGRATOR The first command entered in the &INTEGRATOR environment is TIMESTEP. Since r-RESPA is used as the integrating algorithm, the time step given as input to the command TIMESTEP corresponds to the $\Delta t_h$ time step in Eq. IV.51. The parameters of the integration algorithm are given in the structured command MTS RESPA. In the example, the r-RESPA multiple time step scheme includes five time steps, of which two involve bonded forces (step intra) and three involve non-bonded forces (step nonbond). The first field after the subcommands represents the integers $n_0, n_1, m, l, h$ in Eq. IV.51 associated with each time step. Therefore, in the example we have: $\Delta t_h = 16.0/1 = 16.0$ fs, $\Delta t_l = 16.0/4 = 4.0$ fs, $\Delta t_m = \Delta t_l/4 = 1.0$ fs, $\Delta t_{n_1} = \Delta t_m/2 = 0.5$ fs and $\Delta t_{n_0} = \Delta t_{n_1}/2 = 0.5/2 = 0.25$ fs. The long, medium and short range potentials ($V_h$, $V_l$ and $V_m$ in Eqs. IV.37 through IV.39) are defined sequentially by the commands step nonbond. For each of these one real number, corresponding to the shell radius $r$, must be entered. For each shell radius, two more optional parameters can be specified, i.e. the corresponding healing length and the neighbor list offset; the neighbor radius for each shell is defined as the sum of the shell radius, the healing length and the offset. The values for the healing lengths and the neighbor list offsets given in this example have been tested for an energy-conserving 5 time step algorithm in solvated protein at 300 K [12]. The final keyword reciprocal in the second subcommand step nonbond indicates that the reciprocal lattice sum must be computed during the l-th time step. The option very cold start is used to prevent the simulation from crashing due to an initial system very far from equilibrium. The argument following the command very cold start is the maximum allowed increment per step of a Cartesian coordinate in units of Å. Since during minimization the total system energy does not need to be conserved, the parameters of the r-RESPA algorithm can be selected with more freedom.

³ To use drvmin, MINIMIZE should have been entered in place of MDSIM along with the choice of a single time step integrator.
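The nesting of the five time steps quoted in the &INTEGRATOR paragraph can be reproduced with a few lines of arithmetic. The sketch below is illustrative only: the function name is hypothetical, and the list of integer divisors (1, 4, 4, 2, 2 for h, l, m, n1, n0) is read off the divisions written in the text rather than from the input file of Fig. 4.

```python
def respa_time_steps(largest_step, divisors):
    """Return nested time steps, from the outermost (largest) inwards.

    Each divisor says how many times a level is executed per step of the
    level just outside it; the first divisor applies to `largest_step`.
    """
    steps = []
    current = largest_step
    for d in divisors:
        current = current / d
        steps.append(current)
    return steps

# TIMESTEP 16.0 and the divisors inferred from the text (h, l, m, n1, n0).
steps = respa_time_steps(16.0, [1, 4, 4, 2, 2])
labels = ["h", "l", "m", "n1", "n0"]
for label, dt in zip(labels, steps):
    print(f"dt_{label} = {dt} fs")
```

Under these assumptions one outermost step of 16 fs contains 64 innermost steps of 0.25 fs.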

f. &POTENTIAL In the &POTENTIAL environment the command EWALD specifies that the SPME method will be used in the simulation. The value of the convergence parameter $\alpha$ is given in Å$^{-1}$ and must follow the keyword pme. The subsequent four integers are the constants $K_1, K_2, K_3$ (see Eq. III.30), determining the fineness of the grid in reciprocal space, and the order $n$ of the B-spline interpolation. In this example the relative accuracy $|E - E_{exact}|/E_{exact}$ of the Coulomb energy is of the order of $10^{-4}$. The SPME reciprocal lattice contribution $V_{qr}$ is assigned to the $l$ shell by the structured command MTS RESPA in &INTEGRATOR. Following EWALD, the command UPDATE indicates that the Verlet neighbor list is to be recalculated every 40.0 fs with a cutoff 1.5 Å larger than the potential cutoff. In this example, the size of the system is not sufficiently large to make it convenient to use the linked-cell neighbor lists (accessed with the command LINKED CELL) rather than the more conventional Verlet lists.

g. &RUN The first command of the environment &RUN, CONTROL 0, specifies that the simulation is not started from a restart file and that the velocities must be initialized from scratch. The subsequent REJECT 496.0 indicates that 496 fs of simulation with velocity rescaling will be carried out. Velocity rescaling will occur each time the system temperature goes beyond the oscillation bandwidth of 10 K defined in the environment &SIMULATION. The command TIME is used to define the length of the production run with no velocity rescaling. Since Step I is a minimization, this length is set to zero. The last command, PRINT 2.0, indicates that intermediate results are to be written every 2.0 fs.
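The rescaling rule used during the rejection phase can be pictured with the following sketch. The text only states that velocities are rescaled when the temperature leaves the allowed band; the choice of scaling back to the target temperature with a square-root factor, the function name and the data layout are all assumptions made for illustration.

```python
import math

def rescale_if_needed(velocities, temperature, target, band):
    """Rescale velocities when the temperature leaves target +/- band."""
    if abs(temperature - target) <= band:
        return velocities
    # Assumption: bring the kinetic temperature (~ v**2) back to the target.
    factor = math.sqrt(target / temperature)
    return [[factor * v for v in atom] for atom in velocities]

# Hypothetical one-atom system with target 20 K and a band of 10 K.
v = [[1.0, 0.0, 0.0]]
unchanged = rescale_if_needed(v, temperature=25.4, target=20.0, band=10.0)
scaled = rescale_if_needed(v, temperature=40.0, target=20.0, band=10.0)
```

A temperature of 25.4 K lies inside the 20 K ± 10 K window and leaves the velocities untouched, whereas 40 K triggers a rescaling.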

h. &INOUT The output generated by ORAC is specified in the environment &INOUT and consists of a binary restart file printed every 248 fs and of a PDB file printed every 496 fs, i.e. only at the end of the run. While the restart file is rewound at each print, the PDB file is not, and configurations accumulate during the run.

2. Results and Output from the Run

At execution time, if syntax errors or incompatible options are detected in sys.mddata, ORAC aborts with an error message before attempting any calculation. If no error is found, the program builds up the molecules of the system using the sequence specified in the structured command JOIN and the topology definition given in the topology file amber95.tpg. As the next step, ORAC tries to match bonds, bends, proper and improper torsions with the potential parameters specified in amber95.prm. If matching fails, ORAC stops with an error message. Finally, before the simulation can begin, the PDB file bpti.pdb is read in. This preliminary phase, which constructs the system topology and the parameter arrays and corresponds to the execution of the modules start, read input, join and bldbox, may take several minutes for large biomolecules. The successful completion of this phase is signaled by the printing of a synthetic system description and topology information. For our example, the following output is obtained:

    ***************************************************************
    *                    Solute TOPOLOGY List                     *
    *                                                             *
    *  1398 Atoms     1244 Bonds     1244 FLexible Bonds          *
    *     0 Rigid Bonds     1799 Angles     2732 P-Torsions       *
    *   199 I-Torsions     2347 1-4 Inter.     524 Atomic Groups  *
    *                                                             *
    ***************************************************************

Subsequently, ORAC enters the routine mtsmd, as instructed by the directive MTS RESPA in &INTEGRATOR, and the simulation starts.

When running with r-RESPA, at the very beginning of the run ORAC prints out an estimated CPU time for the scheduled run. The cost per force call for each of the potential contributions in Eqs. IV.35 to IV.39 is also printed. This output helps in tuning the efficiency of the integration schemes. For the simulation length specified in the input file we obtained the following output on a DEC alpha 3000/800 workstation⁴:

CPUtime for m-contribution: RECP = 0.00 DIR = 0.137 TOT = 0.137

CPUtime for l-contribution: RECP = 0.37 DIR = 0.444 TOT = 0.811

CPUtime for h-contribution: RECP = 0.00 DIR = 0.683 TOT = 0.683

THEORIC SPEED UP FOR NON BONDED PART = 4.27

4 The DEC alpha 3000/800 workstation runs at 30 Mflops (megaflops) on the Linpack benchmark.

CPUtime for n1-contribution = 0.0654

CPUtime for n0-contribution = 0.0215

OVERALL THEORIC SPEED UP = 11.48

Expected CPU time for the RUN: 0 hours and 4 min

Expected average time per M step: 0.60 sec.

Expected average time per femto : 0.60 sec.

Thus, the run is expected to last for 4 minutes. This estimate is quite accurate: the effective CPU time at the end of the simulation was 306 seconds. While the simulation is running, intermediate results are printed to standard output. These include various energies (in kJ per mole of "solute", thus encompassing all the 1398 atoms in the simulation box) and temperatures (in K). The following is an example of the output:

Tstep = 494.000 Total = -15684.279 TotPot = -16053.392 Coulom = -17820.143 Recipr = -10938.008 NonBond = -18188.098 Ener14 = 1009.599 Bonded = 2134.706 Stretch = 348.402 Angle = 616.197 I-Tors = 29.845 P-Tors = 1140.262 TotTemp = 21.2 RotTemp = .000E+00 TraTemp = .000E+00

Tstep = 496.000 Total = -15683.211 TotPot = -16126.250 Coulom = -17777.902 Recipr = -10938.008 NonBond = -18161.346 Ener14 = 1019.448 Bonded = 2035.096 Stretch = 265.103 Angle = 601.302 I-Tors = 29.942 P-Tors = 1138.749 TotTemp = 25.4 RotTemp = .000E+00 TraTemp = .000E+00

The meaning of the symbols in the output is as follows: Tstep is the instantaneous simulation time in fs; Total is the total energy; TotPot is the total potential energy; Coulom is the electrostatic energy; Recipr is the reciprocal SPME lattice energy; NonBond is the total electrostatic + Lennard-Jones non-bonded energy; Ener14 is the 1-4 non-bonded Lennard-Jones interaction energy; Bonded is the total energy due to intra-molecular interactions; and Stretch, Angle, I-Tors, P-Tors are the stretching, bending, improper and proper torsion contributions, respectively.
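For post-processing, the intermediate results can be collected from standard output with a small script. The following is only a sketch in Python, which is not part of ORAC: the function name parse_record is hypothetical, and the record is assumed to consist of "label = value" pairs exactly as in the sample output shown in this section. The difference Total - TotPot is used here as an estimate of the kinetic energy, which follows from the definitions of the two labels.

```python
import re

# One "Label = value" pair, e.g. "TotPot = -16053.392" or "RotTemp = .000E+00".
PAIR = re.compile(
    r"([A-Za-z0-9\-]+)\s*=\s*([-+]?(?:\d+\.?\d*|\.\d+)(?:[EeDd][-+]?\d+)?)"
)

def parse_record(line):
    """Return a dict mapping each output label to its numeric value."""
    record = {}
    for label, value in PAIR.findall(line):
        # Fortran may print exponents with 'D'; Python expects 'E'.
        record[label] = float(value.replace("D", "E").replace("d", "e"))
    return record

# Record copied from the Step I sample output (Tstep = 494 fs).
line = ("Tstep = 494.000 Total = -15684.279 TotPot = -16053.392 "
        "Coulom = -17820.143 Recipr = -10938.008 NonBond = -18188.098 "
        "Ener14 = 1009.599 Bonded = 2134.706 Stretch = 348.402 "
        "Angle = 616.197 I-Tors = 29.845 P-Tors = 1140.262 "
        "TotTemp = 21.2 RotTemp = .000E+00 TraTemp = .000E+00")

rec = parse_record(line)
kinetic = rec["Total"] - rec["TotPot"]  # kinetic energy implied by the record
print(rec["Tstep"], kinetic)
```

Applied to every record of a run, the resulting dictionaries can be used to plot any of the energy terms against Tstep.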

At the end of the run, the last configuration is saved both to a binary restart file, bpti1.rst, and to an ASCII PDB file, bpti1.pdb.

The next step consists in hydrating the protein. To do so, the simulation box containing the protein is filled with solvent molecules generated on a regular grid. Only molecules at a sufficient distance from any protein atom are included. Subsequently, a short simulation of about 1 ps in the NVE ensemble at about 300 K is carried out in order to randomize the solvent molecules around the protein. To accomplish this task, the file sys.mddata of Step I needs to be modified. We show in Fig. 5 the input file for Step II.

1. Changes to the Input File

a. &SETUP The environment &SETUP is changed to:

&SETUP
CRYSTAL 35.0 35.0 35.0 90.0 90.0 90.0
READ_PDB bpti1.pdb
INSERT 0.75
CELL sc 11 11 11
&END

Here, two new commands, INSERT and CELL, are used. The real argument to INSERT specifies the criterion for discarding overlapping molecules. A solvent molecule is discarded if the distance between any atom of the solute and any atom of the solvent molecule satisfies

$$ r_{i_s j_p} < \mathrm{radius}\,(\sigma_{i_s} + \sigma_{j_p}), \qquad \mathrm{(VII.2)} $$

where radius is the argument to INSERT, $r_{i_s j_p}$ is the distance between the $i_s$-th atom of the solvent molecule and the $j_p$-th atom of the protein, and $\sigma_{i_s}$, $\sigma_{j_p}$ are the corresponding Lennard-Jones diameters. Trial and error has shown that a reasonable solvent density can be achieved with values of radius in the range between 0.6 and 0.8 units. In this example it was set to 0.75.
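The rejection test of Eq. VII.2 can be illustrated with a few lines of Python. This is a minimal sketch and not ORAC's own routine: the function name overlaps is hypothetical, periodic images are ignored, and the solute diameter of 3.4 Å used in the example is an arbitrary illustrative value.

```python
import math

def overlaps(solvent_atoms, solute_atoms, radius):
    """Apply the criterion of Eq. VII.2 to one solvent molecule.

    Each atom is given as ((x, y, z), sigma), with sigma the
    Lennard-Jones diameter in the same length unit as the coordinates.
    Returns True if the molecule should be discarded.
    """
    for pos_s, sigma_s in solvent_atoms:
        for pos_p, sigma_p in solute_atoms:
            if math.dist(pos_s, pos_p) < radius * (sigma_s + sigma_p):
                return True
    return False

# One water oxygen (sigma = 3.1656, as in the &SOLVENT environment) and
# one hypothetical solute atom with sigma = 3.4.
water = [((0.0, 0.0, 0.0), 3.1656)]
near = [((4.0, 0.0, 0.0), 3.4)]   # 4.0 < 0.75 * 6.5656 = 4.92: discarded
far = [((6.0, 0.0, 0.0), 3.4)]    # 6.0 > 4.92: kept

print(overlaps(water, near, 0.75), overlaps(water, far, 0.75))
```

A larger radius discards more molecules and therefore lowers the initial solvent density.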

CELL generates a periodic structure of solvent molecules with a simple cubic (keyword sc) repeating unit. This basic cell is repeated 11 times in each of the three directions (keyword 11 11 11) so as to reproduce, approximately, the water density at 300 K. Body-centered and face-centered cubic cells could also have been chosen with the keywords bcc and fcc, respectively. Since the simple cubic lattice has one molecule per repeating unit, $11^3 = 1331$ solvent molecules are added to the simulation box, which already contained 167 crystallization water molecules. The equilibrium density at 300 K will be obtained in Step III when running the constant pressure simulation. The initial protein coordinates, read by the command READ_PDB bpti1.pdb, were obtained from the simulation described in Step I.
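The choice of 11 repetitions can be checked with a short calculation. The sketch below is illustrative only: the function lattice_count is hypothetical, the numbers of molecules per bcc and fcc cell (2 and 4) are the standard crystallographic values and are assumed to apply to ORAC's keywords, and a water molar mass of 18.015 g/mol is assumed.

```python
AVOGADRO = 6.02214076e23
WATER_MOLAR_MASS = 18.015        # g/mol, assumed value

# Molecules per repeating unit (standard values for these lattices).
MOLECULES_PER_CELL = {"sc": 1, "bcc": 2, "fcc": 4}

def lattice_count(kind, nx, ny, nz):
    """Number of molecules generated by repeating the unit cell."""
    return MOLECULES_PER_CELL[kind] * nx * ny * nz

n_water = lattice_count("sc", 11, 11, 11)      # 1331
volume_A3 = 35.0 ** 3                          # 42875 cubic Angstrom
volume_cm3 = volume_A3 * 1.0e-24               # 1 A^3 = 1e-24 cm^3
density = n_water * WATER_MOLAR_MASS / AVOGADRO / volume_cm3

print(n_water, density)   # density in g/cm^3 of the lattice filling the whole box
```

The resulting density, a little above 0.9 g/cm^3, is close to that of liquid water, which is why the lattice is described as reproducing the density only approximately.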

b. &PARAMETERS Since all topology and force field information has already been generated in the previous step, the &PARAMETERS environment is changed to:

READ_PFR_BIN bpti_amber95.prmtpg

This instructs ORAC to read the complete topology of the solute from the binary file bpti_amber95.prmtpg. In this fashion, the expensive computation of the protein topology and force field parameter arrays is skipped.

c. &SOLVENT A new environment appears in the input file to signal that solvent molecules are present in the system, namely:

&SOLVENT
ATOM o 1 P 16.0 0.0 0.0 0.0
ATOM h 2 P 1.0 0.81650 -0.57735 0.0
ATOM h 2 P 1.0 -0.81650 -0.57735 0.0
INTERACTION 1 3.1656 0.1554 -0.82
INTERACTION 2 1.6 0.0 0.41
STRETCHING 1 2 524.86 1.0
STRETCHING 1 3 524.86 1.0
BENDING 2 1 3 55.00 109.47
&END

The first three instructions define the coordinates of the atoms contained in each of the solvent molecules. The command ATOM expects the atom symbol, type, rank and mass, followed by its coordinates. The atom rank informs ORAC whether the site should be considered as primary or secondary in the calculation of constraints (see Ref. [23]). Acceptable ranks are P or S for primary and secondary atoms, respectively. The interaction atom type must be defined by INTERACTION, which associates Lennard-Jones parameters and charges with atomic types. In the example, parameters for the SPC water model are provided. Moreover, the commands STRETCHING and BENDING define the intra-molecular parameters for the solvent molecule. The parameters (in kcal/mol and Å) are taken from the CHARMM force field. Finally, we stress that without the environment &SOLVENT the changes made to &SETUP will produce an error condition.
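As a consistency check, the atomic coordinates given to ATOM can be compared with the equilibrium values passed to STRETCHING and BENDING. The sketch below is plain Python and makes no assumption about ORAC itself beyond the numbers quoted in the &SOLVENT environment; the variable names are illustrative.

```python
import math

# Coordinates from the three ATOM commands (in Angstrom).
o = (0.0, 0.0, 0.0)
h1 = (0.81650, -0.57735, 0.0)
h2 = (-0.81650, -0.57735, 0.0)

def vec(a, b):
    """Vector pointing from a to b."""
    return tuple(y - x for x, y in zip(a, b))

oh1, oh2 = vec(o, h1), vec(o, h2)
bond = math.dist(o, h1)                       # O-H bond length
cos_angle = sum(a * b for a, b in zip(oh1, oh2)) / (bond * math.dist(o, h2))
angle = math.degrees(math.acos(cos_angle))    # H-O-H angle
hh = math.dist(h1, h2)                        # H-H distance

print(bond, angle, hh)
```

The bond length and angle reproduce the values 1.0 and 109.47 given as the last arguments of STRETCHING and BENDING, so the molecule is generated at its equilibrium geometry.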

d. Additional Changes In the new simulation step the temperature is raised to 300 K and the rejection phase is lengthened to 992.0 fs. Thus, the argument to the command TEMPERATURE in &SIMULATION is replaced with TEMPERATURE 300.0 20.0 to raise the temperature to 300 K with an oscillation band of 20 K. In addition, the command MTS_RESPA in &INTEGRATOR is changed by removing the keyword very_cold_start. Finally, the rejection phase is lengthened by changing the command REJECT in &RUN to REJECT 992.0.

2. Results and Output for Step II

In Step II, most of the computational time of the preliminary phase is spent in hydrating the BPTI molecule. This operation involves the time-consuming calculation of all the contacts between the protein and the solvent molecules. Compared to Step I, ORAC skips the computation of the topology and force field arrays needed for the simulation phase and instead reads the binary file bpti_amber95.prmtpg generated in Step I.

In the example, the protein hydration is completed with the following message on standard output:

495 molecules over 1331 have been removed

This means that 836 water molecules have been left in the 42875 Å^3 cubic box. The total number of atoms of the new system is now N = 1398 + 836 × 3 = 3,906.
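The bookkeeping behind these numbers is simple enough to verify directly. The following Python lines are only a worked check of the arithmetic, using the values quoted in the text; the variable names are illustrative.

```python
lattice_waters = 11 ** 3          # molecules generated by CELL sc 11 11 11
removed = 495                     # reported by ORAC after the overlap test
remaining = lattice_waters - removed

solute_atoms = 1398               # protein plus crystallization waters
atoms_per_water = 3
total_atoms = solute_atoms + atoms_per_water * remaining

box_volume = 35.0 ** 3            # cubic Angstrom, from CRYSTAL 35.0 35.0 35.0

print(remaining, total_atoms, box_volume)
```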

ORAC changes the output of the intermediate results according to the simulated system. Since in Step II the system consists of "solute" and "solvent" molecules, the printout of the intermediate results will have a different format than in Step I, namely

Tstep = 496.000 Total = -31480.790 SlvPot = -34304.498 SlvCoul = -39226.780 SlvRec = -9510.855 SlvReal = -29715.925 SlvInt = 4584.133 SltTot = -8586.383 SltPot = -14068.423 SltCoul = -18441.684 SltL-J = -954.148 SltHyd = .000 SltBond = 5327.410 SltStr = 1741.395 SltBen = 2079.249 SltItor = 131.662 SltPtor = 1375.104 S-SPot = -3097.657 S-SCoul = -4187.726 SltTemp = 314.651 SlvTrTem = 308.998 SlvRoTem = 321.402 TotTemp = 316.331

Here, the prefix Slt in the output labels corresponds to energies or temperatures of the "solute". On the other hand, solvent properties are indicated by the prefix Slv. Thus, SltBen is the bending energy of the solute and SlvRoTem is the rotational temperature of the solvent.

Since we have used the same r-RESPA algorithm and the same SPME parameters as in Step I, we expect the CPU time spent computing energies and forces to scale linearly with N, the number of particles. Indeed, although the SPME algorithm scales in general as N log N, at small N ($N \lesssim 20000$) the algorithm is effectively linear in N.

In reality, although the number of particles increases about 2.8 times from Step I to Step II ($3906/1398 \simeq 2.8$), Step II takes only 2.3 times more CPU time than Step I, corresponding to 1.4 s per fs on our alpha 3000/800 workstation. This smaller than expected increase is due to differences in the direct force routines handling solvent and solute (see footnote 5). In Fig. 6 we show the starting system configuration after solvent is added at time t = 0 and the final configuration at $t \simeq 0.5$ ps. We see that at the end of Step II the configuration of the solvent molecules appears to be sufficiently randomized.
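The comparison between the growth in system size and the growth in CPU cost can be reproduced from the figures quoted for Steps I and II. The sketch below assumes that the 306 s measured in Step I refer to the 496 fs of that run, which is consistent with the expected cost of 0.60 s per fs printed by ORAC; the variable names are illustrative.

```python
# Step I: 1398 atoms, 306 s of CPU time for a 496 fs run.
atoms_step1 = 1398
cost_step1 = 306.0 / 496.0        # seconds of CPU per femtosecond

# Step II: 3906 atoms, 1.4 s per fs as quoted in the text.
atoms_step2 = 3906
cost_step2 = 1.4

size_ratio = atoms_step2 / atoms_step1
cpu_ratio = cost_step2 / cost_step1

print(size_ratio, cpu_ratio)
```

The size ratio rounds to 2.8 and the cost ratio to 2.3, so the cost per atom is lower in Step II than in Step I.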

C. Step III: Obtaining the Equilibrium Density at 300 K

In the previous step we have run the hydrated protein at constant volume, guessing the equilibrium density. Here, we perform instead a constant pressure simulation at 300 K and at atmospheric pressure (P = 0.1 MPa) in order to obtain a better estimate of the system volume. As a starting point for the run, we use the solute and solvent coordinates obtained after 992 fs of simulation in Step II and contained in the file bpti2.pdb. In Fig. 7 we show the input file for Step III.

1. Changes to the Input File

a. &SOLUTE Currently, ORAC cannot carry out simulations at constant pressure with multiple time step r-RESPA algorithms. In order to perform the simulation of Step III at a reasonable speed, constraints need to be used. To do so, the keyword HEAVY is added after the command STRETCHING of the environment &SOLUTE to impose constraints only on bonds involving hydrogen atoms. All the X-H bonds in the solute and the H-H distance within each crystallographic water molecule will be constrained. Moreover, the command INSERT (environment &SOLUTE) is removed, as the starting configuration is already solvated.

5 In the solvent-solvent and solvent-solute routines there is no need to perform an additional loop over the "masked list" to exclude the bonded contacts between neighboring groups.

b. &SETUP In the environment &SETUP the PDB coordinate file produced at the end of Step II is now read through the command READ_PDB bpti2.pdb, the file bpti2.pdb containing both solvent and solute coordinates.

c. &SOLVENT As for the solute, we need here to remove the internal degrees of freedom of the solvent molecules and impose constraints. This is performed by replacing the commands STRETCHING and BENDING with the appropriate constraints. The environment will then be:

&SOLVENT
ATOM o 1 P 16.0 0.0 0.0 0.0
ATOM h 2 P 1.0 0.81650 -0.57735 0.0
ATOM h 2 P 1.0 -0.81650 -0.57735 0.0
INTERACTION 1 3.1656 0.1554 -0.82
INTERACTION 2 1.6 0.0 0.41
CONSTRAINT 1 2
CONSTRAINT 1 3
CONSTRAINT 2 3
&END

d. &SIMULATION To carry out the simulation in the NPH ensemble, a new command should be added to the environment &SIMULATION:

ISOSTRESS PRESS-EXT 0.1 BARO-MASS 40.0 COMPR 5.3D-04

This command instructs ORAC to run a simulation allowing only for isotropic changes of the simulation box volume. This is done by using the Andersen [17] extended Lagrangian method. The imposed external pressure (in MPa) is provided after the keyword PRESS-EXT, while ORAC computes the mass of the barostat according to Eq. V.61. To this end, the command ISOSTRESS optionally reads (as is done in the example) the frequency of the barostat (keyword BARO-MASS) and the system compressibility (keyword COMPR), corresponding to $\omega_Q$ and $B^{-1}$ in Eq. V.61, respectively. Finally, the system temperature defined by the command TEMPERATURE is left unchanged at the value of 300 K used in Step II.

e. &INTEGRATOR, &RUN and &POTENTIAL We choose a time step of 1.0 fs by using the command TIMESTEP 1.0 of the environment &INTEGRATOR and replace MTS_RESPA with SINGLE_STEP. In the environment &RUN, we change the length of the rejection phase to 2000.0 fs and set the length of the production phase to 6000.0 fs. Finally, in &POTENTIAL we impose a direct lattice cutoff of 10.0 Å with the command CUTOFF 10.0.

2. Results and Output for Step III

Once the first 2000.0 fs of the rejection phase are completed, ORAC reports that:

Temperature has been rescaled 4 times

These four velocity rescalings occur near the beginning of the rejection phase, which means that the sample was already somewhat thermalized. At the end of the simulation, after the 6 ps unscaled run, the average temperature is 315 K. As an example of ORAC's typical output during the constant pressure simulation run, we show the intermediate results at time t = 5994 fs during the production phase:

Tstep = 5994.000 Total = -42608.023 SlvPot = -28504.079 SlvCoul = -33046.281 SlvRec = -8600.088 SlvReal = -24446.193 SlvInt = .000 SltTot = -8167.073 SltPot = -12691.405 SltCoul = -16082.588 SltL-J = -1174.510 SltHyd = .000 SltBond = 4565.693 SltStr = 685.674 SltBen = 2329.648 SltItor = 155.260 SltPtor = 1395.112 S-SPot = -12642.611 S-SCoul = -16720.347 SltTemp = 318.038 SlvTrTem = 322.703 SlvRoTem = 320.220 TotTemp = 320.073

TotPre = 69.33 ConPre = -44.02 KinPre = 113.36
TmpPre = 38.20 Volume = 39431.87 PV = 2.3748
... cell parameters .... .... stress ...
XYZ 34.0371 34.0371 34.0371 .1152E+06 -.7702E+05 .1324E+06
ABC 90.0000 90.0000 90.0000 .6704E+05 -.6433E+05 .6976E+05
... ... ... .4689E+05 .1876E+06 -.3569E+06

With respect to the NVE runs, information about the instantaneous values of the pressure, the cell parameters and the stress tensor is now added to the output. Since the example only allows for isotropic volume changes, the angles do not vary and the edges change only isotropically in the output (for a cubic lattice the three cell edges are equal).

The command PROPERTY in the environment &RUN can be used to compute system averages and test whether the system has reached statistical equilibrium. This command is active only in the production phase, after the rejection part of the simulation is over. It instructs ORAC to print the running averages and their standard deviations at time intervals defined by its argument. In an equilibrated sample the running averages and their standard deviations must not change with time. The output produced by the command PROPERTY at the end of our 6000 fs production run is shown in Fig. 8.

In Fig. 9 we plot the volume as a function of time for the total 8 picoseconds of the run (2 ps of rejection and 6 ps of production). The average value in the 6 ps production phase (indicated by the straight line) is 39509 Å^3, to be compared to the starting volume of 42875 Å^3. Thus, the cell has shrunk by 3366 Å^3 which, assuming a water molecular volume of 30 Å^3, corresponds to the volume of about 112 water molecules. As in Steps I and II, the files produced by ORAC at the end of Step III are the standard output, and the PDB and restart files entered as arguments to the commands ASCII and RESTART of the environment &INOUT, respectively.
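The volume change and its interpretation in terms of water molecules follow from a few lines of arithmetic. The Python lines below simply restate the calculation with the values quoted in the text; the variable names are illustrative.

```python
start_volume = 35.0 ** 3          # cubic Angstrom, box used in Steps I and II
average_volume = 39509.0          # cubic Angstrom, production-phase average
water_volume = 30.0               # cubic Angstrom per molecule, as assumed in the text

shrink = start_volume - average_volume
equivalent_waters = shrink / water_volume

# Edge of a cubic box having the average volume.
edge = average_volume ** (1.0 / 3.0)

print(shrink, equivalent_waters, edge)
```

The edge obtained in this way, about 34.059 Å, agrees with the average cell parameter of 34.0590 Å reported in Fig. 8.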

D. Step IV: Production Run with Multiple Time Steps and SPME

Step IV consists in a production run carried out with a fast and energy-conserving r-RESPA algorithm. During such a run some properties of the system at equilibrium are computed and analyzed. As stated previously, ORAC can compute at run time some general properties of the system such as root mean square displacements or power spectra of velocity autocorrelation functions. For a more complete analysis the coordinates of all particles in the system can be written to a file in binary or ASCII format.

The input file for Step IV is similar to that discussed for Step II and is shown in Fig. 10. With respect to Step II we must change a series of environments. First, we remove the command INSERT of the &SOLUTE environment. Then, we replace the cell parameters in the &SETUP environment with those obtained from Step III, i.e. with a cell edge of 34.0590 Å. Again in the same environment, the PDB coordinate file obtained from Step III must be read by READ_PDB. In addition, the commands MOLECULES 836 and READ_PDB should be added to the environment &SOLVENT. While the former indicates that there are 836 solvent molecules, the latter states that the coordinates of the solvent must be read from a PDB file.

The r-RESPA integration algorithm needs to be changed in &INTEGRATOR. Indeed, the

level for a production run. Thus, we decrease the th timestep and modify the relative

magnitude of the other time steps:

&INTEGRATOR
TIMESTEP 10.2
MTS_RESPA
step intra 3
step intra 2
step nonbond 2 4.2 0.3 0.4
step nonbond 3 7.4 0.3 0.4 reciprocal
step nonbond 1 9.7 0.3 1.5
test-times OPEN bpti.tt
END
&END

The parameters of the r-RESPA algorithm have been tested in past studies on solvated C-phycocyanin using both the AMBER and the CHARMM force field [12]. To the structured command MTS_RESPA, we have also added the subcommand test-times. This subcommand instructs ORAC to dump onto the file bpti.tt the values of the total, potential and kinetic energies at each full propagation step, i.e. when the propagator IV.50 has acted completely on the state vector (p, q). Energy is rigorously conserved at order $O(\Delta t^3)$ only at the end of the full propagation step in Eq. IV.50 [3,4], i.e. every 10.2 fs in the present example. Velocities in the interior of the r-RESPA propagation step are not corrected to order $O(\Delta t^3)$ as they are in a simple single time step velocity Verlet scheme.
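To see how a hierarchy of nested time steps follows from one outer time step and a list of integer subdivisions, consider the sketch below. It rests on an assumption that should be checked against the ORAC manual: the value given to TIMESTEP is taken as the full propagation step (10.2 fs, as stated above), and the integers of the five step subcommands are read from the outermost level inwards as 1, 3, 2, 2, 3. The function name nested_timesteps and the level labels are illustrative.

```python
def nested_timesteps(outer_dt, ratios):
    """Return the time step of each level, outermost first.

    ratios[k] is the number of substeps of level k contained in one
    step of the level above it (the first entry refers to outer_dt).
    """
    steps = []
    dt = outer_dt
    for n in ratios:
        dt = dt / n
        steps.append(dt)
    return steps

# ASSUMED ordering of the subdivisions, outermost level first.
levels = ["h", "l", "m", "n1", "n0"]
steps = nested_timesteps(10.2, [1, 3, 2, 2, 3])

for name, dt in zip(levels, steps):
    print(name, round(dt, 4), "fs")
```

Under this reading the innermost forces are integrated with a step of about 0.28 fs while the most slowly varying forces are evaluated only every 10.2 fs.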

We stress here that the total energy printed out in the ORAC standard output when using r-RESPA is computed at the end of each m step and not at the end of the h step. Thus, the ORAC intermediate total energy is not rigorously conserved, because the system velocities are corrected only up to the m reference system and do not include corrections for the steps of the higher order reference systems.

In Table I, we show the performance (in CPU time per femtosecond of simulation) of the above multiple time step algorithm on various machines. A nanosecond simulation, running with Ewald and integrating all the 33912 degrees of freedom of the system, takes about 311 hours on a medium size workstation such as the HP 735. In Fig. 11, the fluctuations of the total energy E and the kinetic energy K are compared (see footnote 6). This level of conservation, which yields an energy conservation ratio of $\Delta E/\Delta K \simeq 0.05$, is generally sufficient to obtain accurate structural and dynamical properties, including those properties depending on velocities [6,7]. In Fig. 12 we show the power spectrum of the atomic velocity autocorrelation function of the hydrated protein, while Fig. 13 presents the mean square deviation of the instantaneous BPTI coordinates from the crystallographic structure. This deviation was averaged over all non-hydrogen atoms and over the backbone atoms. Figs. 12 and 13 were generated from data computed during the MD run with commands of the environment &PROPERTIES, namely:

&PROPERTIES
X_RMS 50.0 OPEN bpti.xrms
X_RMS CA HEAVY BACKBONE
VACF 4012.0 OPEN bpti4.vacf
VACF_PRM 6963.2 3.4
&END
&ENDINPUT

6 The ORAC r-RESPA algorithms can sometimes give, for very long runs of solvated proteins, a small energy drift. This is due to the intrinsically inhomogeneous nature of the system. As discussed in section II (subsection 3), we choose group-based cut-offs identical for all kinds of non-bonded interactions. The subdivision of non-bonded interactions with respect to the selected time steps can be "inappropriate" for some LJ pair potentials with large σ, and exceedingly "appropriate" for other LJ pair potentials with low σ. The same argument can be used for direct space electrostatic interactions. Again, the cut-offs are selected irrespective of the intensity of the charges and of the Lennard-Jones diameters. LJ potentials with large σ and/or interactions among large charges can produce a small energy drift. For example, the algorithm used in Step IV yields a drift of 0.1 kJ per picosecond. Such a drift is virtually undetectable for runs of the order of 10-100 picoseconds, but on a nanosecond time scale it can produce visible effects, e.g. a rise in temperature of 20 K.

To compute the coordinate mean square deviation from the X-ray structure (commands X_RMS 50.0 OPEN bpti.xrms and X_RMS CA HEAVY BACKBONE), a reference PDB file named bpti_xray.ent is specified in the environment &SOLUTE through the command TEMPLATE bpti_xray.ent. During the run, ORAC computes the mean square displacements of the instantaneous coordinates from this reference structure.

If, finally, a trajectory file needs to be generated, the commands DUMP or DUMP_RAND of the environment &RUN can print the system coordinates at a chosen time interval in ASCII or binary format, respectively. In the example we dump the system coordinates every 10.2 fs to the file bpti4.dmp with the command:

DUMP_RAND 10.2 OPEN bpti4.dmp
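The quantity accumulated for the deviation from the reference structure can be illustrated with a short Python sketch. It is not ORAC's implementation: the function name mean_square_deviation is hypothetical, no superposition of the two structures is performed (an assumption, since the text does not say whether ORAC fits the structures first), and the optional selection stands in for atom subsets such as the heavy or backbone atoms.

```python
import math

def mean_square_deviation(coords, reference, selection=None):
    """Mean square deviation between two sets of coordinates.

    coords, reference: lists of (x, y, z) tuples in the same atom order.
    selection: optional list of atom indices to average over.
    """
    indices = range(len(coords)) if selection is None else selection
    total = 0.0
    count = 0
    for i in indices:
        total += sum((a - b) ** 2 for a, b in zip(coords[i], reference[i]))
        count += 1
    return total / count

# Two-atom toy example (coordinates in Angstrom, purely illustrative).
reference = [(0.0, 0.0, 0.0), (0.0, 0.0, 0.0)]
current = [(1.0, 0.0, 0.0), (0.0, 2.0, 0.0)]

msd_all = mean_square_deviation(current, reference)        # (1 + 4) / 2
msd_first = mean_square_deviation(current, reference, [0])
rms_all = math.sqrt(msd_all)

print(msd_all, msd_first, rms_all)
```

Evaluating this function at regular intervals along a dumped trajectory gives a curve of the kind shown in Fig. 13.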

TABLE I. Performance of ORAC on a series of desktop workstations. The simulation runs were carried out on two different systems: i) System 1 corresponded to the hydrated BPTI molecule discussed in the example and contained 3,906 atoms. ii) System 2 was a hydrated reaction center of Rhodobacter Sphaeroides containing 33,495 atoms in a non-orthogonal box of dimensions a = 73.14 Å, b = 82.07 Å, c = 57.85 Å, and α = 90.0, β = 88.6, γ = 90.5 degrees. The algorithm used in both cases was the five-step r-RESPA algorithm discussed in Step IV of the example. In order to obtain the same level of convergence in the electrostatic sum, the same SPME parameters used in the examples were adopted for both systems, except that for System 2 the number of grid points was increased to 64, 72 and 48 in the three directions, respectively. This compensated for the larger dimensions of the simulation box. The timings reported in this table are given in units of CPU seconds needed to run 1 femtosecond of simulation.

            HP 735    DEC alpha 3000/800    IBM 580H    SGI R10000
System 1     1.10           1.33              1.22         0.41
System 2     9.62          13.61              9.82         4.13
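The timings of Table I are easier to appreciate when converted to wall-clock estimates for a long run. The sketch below converts CPU seconds per femtosecond into hours per nanosecond; the dictionary layout and the function name hours_per_ns are illustrative, and the numbers are those of the table.

```python
# CPU seconds per femtosecond, copied from Table I.
TIMINGS = {
    "System 1": {"HP 735": 1.10, "DEC alpha 3000/800": 1.33,
                 "IBM 580H": 1.22, "SGI R10000": 0.41},
    "System 2": {"HP 735": 9.62, "DEC alpha 3000/800": 13.61,
                 "IBM 580H": 9.82, "SGI R10000": 4.13},
}

def hours_per_ns(seconds_per_fs):
    """CPU hours needed for 1 ns (10^6 fs) of simulation."""
    return seconds_per_fs * 1.0e6 / 3600.0

for system, machines in TIMINGS.items():
    for machine, cost in machines.items():
        print(system, machine, round(hours_per_ns(cost)), "hours per ns")
```

For System 1 on the HP 735 the conversion gives roughly 306 hours per nanosecond, of the same order as the figure of about 311 hours quoted in the text.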

FIG. 3. The content of the file field.tpg for N-terminus alanine and acetone. See text for discussion.

FIG. 4. Example Step I: ORAC input file to start a minimization run from an X-ray PDB file. See text for discussion.

FIG. 5. Example Step II: ORAC input file to add solvent molecules to minimized solute coordinates. See text for discussion.

FIG. 6. a) Solvated BPTI, represented by the strands structure, at the beginning of Step II (see text), before equilibration. The disordered water molecules are those from the BPTI X-ray structure. b) The same system after 1.0 ps of simulation at 300 K with velocity rescaling.

FIG. 7. Example Step III: ORAC input file to obtain the equilibrium density of a hydrated protein. See text for discussion.

FIG. 8. Example Step III: ORAC computed average quantities from the standard output. See text for discussion.

FIG. 9. Volume fluctuations as a function of time in a constant pressure (isotropic stress) simulation of solvated BPTI. The production run (i.e. no velocity rescaling) starts at 2.0 ps. The average value of the volume in the unscaled part of the simulation is the dashed line.

FIG. 10. Example Step IV: ORAC input file for a production run of 20 ps. See text for discussion.

FIG. 11. Comparison between the fluctuations of the kinetic energy and of the total energy for the five-time-step r-RESPA integrator used in Step IV (see text, section 7.4).

FIG. 12. Power spectra of the velocity autocorrelation function as computed by ORAC in a 20 ps run during Step IV (see text, section 7.4). The bottom, medium and top curves are the spectra of all atoms, "solute" atoms (in the ORAC sense; see text, section 2 and sections 7.1, 7.2) and solvent atoms, respectively.

FIG. 13. Instantaneous mean square deviation of BPTI from its X-ray coordinates. This deviation is averaged over all non-hydrogen and backbone atoms (solid and dashed line, respectively).

[1] A. Rahman. Phys. Rev., 136, 405, (1964).

[2] L. Verlet. Phys. Rev., 159, 98, (1967).

[3] M. E. Tuckerman, B. J. Berne, and G. J. Martyna. J. Chem. Phys., 97, 1990, (1992).

[4] D. D. Humphreys, R. A. Friesner, and B. J. Berne. J. Phys. Chem., 98, 6885, (1994).

[5] M. Watanabe and M. Karplus. J. Phys. Chem., 99, 5680, (1995).

[6] P. Procacci and B. J. Berne. J. Chem. Phys., 101, 2421, (1994).

[7] P. Procacci and M. Marchi. J. Chem. Phys., 104, 3003, (1996).

[8] L. Greengard and V. Rokhlin. J. Comput. Phys., 73, 325, (1987).

[9] K. E. Schmidt and M. A. Lee. J. Stat. Phys., 63, 1223, (1991).

[10] T. Darden, D. York, and L. Pedersen. J. Chem. Phys., 98, 10089, (1993).

[11] U. Essmann, L. Perera, M. L. Berkowitz, T. Darden, H. Lee, and L. G. Pedersen. J. Chem. Phys., 101, 8577, (1995).

[12] P. Procacci, T. Darden, and M. Marchi. J. Phys. Chem., 100, 10464, (1996).

[13] H. Berendsen. Molecular Dynamics and Protein Structure. Polycrystal Book Service, Western Spring, Illinois, (1985).

[14] H. Lee, T. A. Darden, and L. G. Pedersen. J. Chem. Phys., 102, 3830, (1995).

[15] M. Saito. J. Chem. Phys., 101, 4055, (1994).

[16] P. Procacci and B. J. Berne. Mol. Phys., 83, 255, (1994).

[17] H. C. Andersen. J. Chem. Phys., 72, 2384, (1980).

[18] M. Parrinello and A. Rahman. Phys. Rev. Letters, 45, 1196, (1980).

[19] S. Nose. J. Chem. Phys., 81, 511, (1984).

[20] B. R. Brooks, R. E. Bruccoeri, B. D. Olafson, D. J. States, S. Swaminanthan, and M. Karplus. J. Comput. Chem., 4, 187, (1983).

[21] S. J. Wiener, P. A. Kollmann, D. T. Nguyen, and D. A. Case. J. Comput. Chem., 7, 230, (1986).

[22] W. D. Cornell, P. Cieplak, C. I. Bavly, I. R. Gould, K. M. Merz Jr., D. M. Ferguson, D. C. Spellmeyer, T. Fox, J. W. Caldwell, and P. Kollmann. J. Am. Chem. Soc., 117, 5179, (1995).

[23] G. Ciccotti, M. Ferrario, and J.-P. Ryckaert. Mol. Phys., 47, 1253, (1982).

[24] W. L. Jorgensen, J. Chandrasekhar, J. D. Madura, R. W. Impey, and M. L. Klein. J. Chem. Phys., 79, 926, (1983).

[25] M. P. Allen and D. J. Tildesley. Computer Simulation of Liquids. Oxford University Press, Walton Street, Oxford OX2 6DP, (1989).

[26] P. Ewald. Ann. Phys., 64, 253, (1921).

[27] S. W. deLeeuw, J. W. Perram, and E. R. Smith. Proc. R. Soc. London A, 373, 27, (1980).

[28] P. E. Smith and B. M. Pettitt. J. Chem. Phys., 105, 4289, (1996).

[29] R. W. Hockney. Computer Simulation Using Particles. McGraw-Hill, New York, (1989).

[30] H. G. Petersen. J. Chem. Phys., 103, 3668, (1995).

[31] M. E. Tuckerman and B. J. Berne. J. Chem. Phys., 95, 8362, (1991).

[32] M. E. Tuckerman, G. J. Martyna, and B. J. Berne. J. Chem. Phys., 94, 6811, (1991).

[33] M. E. Tuckerman, B. J. Berne, and A. Rossi. J. Chem. Phys., 94, 1465, (1990).

[35] M. E. Tuckerman and M. Parrinello. J. Chem. Phys., 101, 1302, (1994).

[36] M. E. Tuckerman and M. Parrinello. J. Chem. Phys., 101, 1316, (1994).

[37] H. de Raedt and B. De Raedt. Phys. Rev. A, 28, 3575, (1983).

[38] H. Goldstein. Classical Mechanics. Addison-Wesley, Reading MA, (1980).

[39] S. K. Grey. J. Chem. Phys., 101, 4062, (1994).

[40] E. Paci and M. Marchi. J. Phys. Chem., 104, 3003, (1996).

[41] M. Ferrario and J.-P. Ryckaert. Mol. Phys., 78, 7368, (1985).

[42] S. Nose. Prog. Theor. Phys. Supp., 103, 1, (1991).

[43] M. Ferrario. In M. P. Allen and D. J. Tildesley, editors, Computer Simulation in Chemical Physics, page 153. Kluwer Academic Publishers, (1993).

[44] G. Ciccotti and J.-P. Ryckaert. Computer Phys. Rep., 4, 345, (1986).

[45] G. L. Martyna, M. L. Klein, and M. Tuckerman. J. Chem. Phys., 97, 2635, (1992).

[46] J.-P. Ryckaert and G. Ciccotti. J. Chem. Phys., 78, 7368, (1983).

[47] S. Nose and M. L. Klein. Mol. Phys., 50, 1055, (1983).

[48] M. Marchi and P. Procacci. ORAC Manual and Guide. CECAM, Available at ftp.cecam.fr:/pub/orac/doc/manual.ps CECAM-ENS Lyon, (1997).

[49] J. J. P. Stewart. J. Comp. Chem., 10, 221, (1989).

[50] S. J. Weiner, P. A. Kollmann, D. A. Case, U. C. Singh, C. Ghio, G. Alagona, S. Profeta Jr., and P. Weiner. J. Am. Chem. Soc., 106, 765, (1984).

[51] S. Parkin, B. Rupp, and H. Hope. The structure of bovine pancreatic trypsin inhibitor at 125 K: Definition of carboxyl-terminal residues glycine-57 and alanine-58. To be published.

[52] G. J. Martyna, M. E. Tuckerman, D. J. Tobias, and M. L. Klein. Mol. Phys., 87, 1117, (1996).

....
RESIDUE ala-h
atoms
group
n n3 0.14140 h1 h 0.19970 h2 h 0.19970 h3 h 0.19970
group
ca ct 0.09620 ha h1 0.08890
group
cb ct -0.05970 hb1 hc 0.03000 hb2 hc 0.03000 hb3 hc 0.03000
group
c c 0.61630
o o -0.57220
end
bonds
cb ca n h1 n h2 n h3
n ca o c c ca ca ha
cb hb1 cb hb2 cb hb3
end
imphd
ca +n c o
end
termatom * c
backbone n ca c
RESIDUE_END
....
RESIDUE aceto ( Total Charge = 0.0 )
atoms
group
c0 c 0.7865
o o -0.5811
group
c1 ct -0.4573 h1 h1 0.1182 h2 h1 0.1182 h3 h1 0.1182
group
c2 ct -0.4573 h4 h1 0.1182 h5 h1 0.1182 h6 h1 0.1182
end
bonds
c0 o c0 c1 c0 c2 c1 h1 c1 h2 c1 h3 c2 h4 c2 h5 c2 h6
end
imphd
c2 o c0 c1
end

#
# Description Commands:
#
&SETUP
CRYSTAL 35.0 35.0 35.0 90.0 90.0 90.0
READ_PDB bpti_xray.ent
&END
&SOLUTE
STRETCHING
QQ-FUDGE 0.83333
LJ-FUDGE 0.50
RESET_CM
SCALE_CHARGES 2 1 2
&END
&PARAMETERS
WRITE_PFR_BIN bpti_amber95.prmtpg
READ_TPF_ASCII amber95.tpg
READ_PRM_ASCII amber95.prm
JOIN
arg-h pro asp phe cys leu glu pro pro tyr thr gly pro cys lys ala arg ile ile arg tyr phe tyr asn ala lys ala gly leu cys gln thr phe val tyr gly gly cys arg ala lys arg asn asn phe lys ser ala glu asp cys met arg thr cys gly gly ala-o po4 hoh x 167
END
ADD_TPG
bond 1sg 2sg residue 5 55
bond 1sg 2sg residue 14 38
bond 1sg 2sg residue 30 51
END
&END
#
# Simulation Commands:
#
&SIMULATION
MDSIM
TEMPERATURE 20.0 10.0
&END
&INTEGRATOR
TIMESTEP 16.0
MTS_RESPA
very_cold_start 0.1
step intra 2
step intra 2
END
&END
&POTENTIAL
EWALD pme 0.43 32 32 32 4
UPDATE 40.0 1.5
&END
&RUN
CONTROL 0
PROPERTY 496.0
REJECT 496.0
TIME 0.0
PRINT 2.0
&END
#
# Output Commands:
#
&INOUT

#
&SETUP
CRYSTAL 35.0 35.0 35.0 90.0 90.0 90.0
READ_PDB bpti1.pdb
INSERT 0.75
CELL sc 11 11 11
&END
&SOLUTE
STRETCHING
QQ-FUDGE 0.83333
LJ-FUDGE 0.50
RESET_CM
SCALE_CHARGES 2 1 2
&END
&PARAMETERS
READ_PFR_BIN bpti_amber95.prmtpg
&END
&SOLVENT
ATOM o 1 P 16.0 0.0 0.0 0.0
INTERACTION 2 1.6 0.0 0.41
STRETCHING 1 2 524.86 1.0
STRETCHING 1 3 524.86 1.0
BENDING 2 1 3 55.00 109.47
&END
#
&SIMULATION
MDSIM
&INTEGRATOR
TIMESTEP 16.0
MTS_RESPA
END
&END
&POTENTIAL
EWALD pme 0.43 32 32 32 4
UPDATE 40.0 1.5
CUTOFF 10.0
&END
&RUN
CONTROL 0
PROPERTY 496.0
REJECT 992.0
TIME 0.0
PRINT 2.0
&END
#
&INOUT

#
&SETUP
CRYSTAL 35.0 35.0 35.0 90.0 90.0 90.0
READ_PDB bpti2.pdb
&END
&SOLUTE
STRETCHING HEAVY
QQ-FUDGE 0.83333
LJ-FUDGE 0.50
&END
&PARAMETERS
READ_PFR_BIN bpti_amber95.prmtpg
&END
&SOLVENT
MOLECULES 836
ATOM o 1 P 16.0 0.0 0.0 0.0
INTERACTION 2 1.6 0.0 0.41
CONSTRAINT 1 2
CONSTRAINT 1 3
CONSTRAINT 2 3
READ_PDB
&END
#
&SIMULATION
MDSIM
TEMPERATURE 300.0 20.0
ISOSTRESS PRESS-EXT 0.1 BARO-MASS 40.0 COMPR 5.3D-04
&END
&INTEGRATOR
TIMESTEP 1.0
SINGLE_STEP
&END
&POTENTIAL
EWALD pme 0.43 16 16 16 4
UPDATE 40.0 1.5
CUTOFF 10.0
&END
&RUN
CONTROL 0
PROPERTY 500.0
REJECT 2000.0
TIME 6000.0
PRINT 2.0
&END
#
&INOUT

================================================================================
=                                                                              =
=                    Averages over 6000.0 fs of Simulation                     =
=                                                                              =
================================================================================

**********************************
*                                *
*  Energies in _Kjoule/mole_     *
*  Temperatures in _Kelvin_      *
*  Pressures in MPascal          *
*  Volumes in A^3                *
*  Distances in A                *
*  Stress Tensor in Joule/A**3   *
*                                *
**********************************

Total    = -42610.755 +/-    2.769
SlvPot   = -29299.835 +/-  541.694
SlvCoul  = -33833.541 +/-  711.731
SlvRec   =  -8600.088 +/-     .005
SlvReal  = -25233.453 +/-  711.731
SlvInt   =       .000 +/-     .000
SltTot   =  -8728.819 +/-  465.326
SltPot   = -13181.195 +/-  394.840
SltCoul  = -16530.580 +/-  360.207
SltL-J   =  -1122.090 +/-   62.765
SltHyd   =       .000 +/-     .000
SltBond  =   4471.475 +/-   99.464
SltStr   =    747.711 +/-   46.135
SltBen   =   2208.239 +/-   72.024
SltItor  =    121.239 +/-   13.797
SltPtor  =   1394.286 +/-   39.698
S-SPot   = -11184.717 +/- 1024.034
S-SCoul  = -14698.337 +/- 1354.610
SltTemp  =    312.980 +/-    7.890
SlvTrTem =    316.996 +/-    8.186
SlvRoTem =    315.918 +/-    8.556
TotTemp  =    315.047 +/-    4.918
SlvKin   =   6598.854 +/-  118.810
SltKin   =   4452.376 +/-  112.239

-- Fluctuating Box ---
TotPre   =        .25 +/-   54.889
ConPre   =    -111.14 +/-   54.614
KinPre   =     111.39 +/-    3.019
TmpPre   =     332.69 +/-   463.58
Volume   =   39509.82 +/-  333.219
PV       =       2.38 +/-     .020

... cell parameters ....           ... + / - ...
34.0590   34.0590   34.0590        .0959396   .0959396   .0959396
90.0000   90.0000   90.0000        .0000000   .0000000   .0000000

... stress ....                          ... + / - ...
-.2559E+06   .2994E+05   .7462E+04       .17039E+06   .14961E+06   .18375E+06
 .2877E+05  -.2441E+06  -.2443E+05       .15448E+06   .18867E+06   .17406E+06
 .7778E+04  -.2968E+05  -.2741E+06       .12618E+06   .16754E+06   .20909E+06
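The `name = mean +/- error` entries of the averages listing are regular enough to be read back with a short script, for instance to tabulate or cross-check them. The sketch below is not part of ORAC. It assumes every mean and error is printed with a decimal point (possibly without a leading zero, as in `.000`), and it reads the `PV` entry as the external pressure (`PRESS-EXT 0.1`, in MPa) times the average volume, converted to kJ/mol. That interpretation is an assumption, and the names `parse_averages` and `pv_kj_per_mol` are hypothetical.

```python
import re

AVOGADRO = 6.02214076e23  # 1/mol

# "Name = mean +/- error"; numbers may lack a leading zero (".000").
ENTRY = re.compile(r"([A-Za-z][\w-]*)\s*=\s*(-?\d*\.\d+)\s*\+/-\s*(\d*\.\d+)")


def parse_averages(text):
    """Map each property name to its (mean, error) pair."""
    return {name: (float(mean), float(err))
            for name, mean, err in ENTRY.findall(text)}


def pv_kj_per_mol(pressure_mpa, volume_a3):
    """P*V for one simulation box, expressed per mole of boxes in kJ/mol."""
    joule_per_box = (pressure_mpa * 1.0e6) * (volume_a3 * 1.0e-30)
    return joule_per_box * AVOGADRO / 1000.0


# A few entries copied from the listing above.
SAMPLE = (
    "Total = -42610.755+/- 2.769 SlvInt = .000+/- .000 "
    "SltL-J = -1122.090+/- 62.765 "
    "---TotPre = .25+/- 54.889 Volume = 39509.82+/- 333.219 PV = 2.38+/- .020"
)

if __name__ == "__main__":
    averages = parse_averages(SAMPLE)
    for name, (mean, err) in averages.items():
        print(f"{name:<8} {mean:>12.3f} {err:>10.3f}")
    # Assumed external pressure of 0.1 MPa, taken from PRESS-EXT in the input.
    print("P*V from PRESS-EXT and Volume:",
          round(pv_kj_per_mol(0.1, averages["Volume"][0]), 2), "kJ/mol")
```

With the assumed 0.1 MPa external pressure and the listed average volume, the product agrees with the printed `PV` entry to the two decimals shown.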


#

# Description Commands: define the MD box with solute,
# define solute topology
#
&SETUP
  CRYSTAL 34.0590 34.0590 34.0590 90.0 90.0 90.0
  READ_PDB bpti3.pdb
  TEMPLATE bpti_xray.ent
&END
&SOLUTE
  STRETCHING
  QQ-FUDGE 0.83333
  LJ-FUDGE 0.50
&END
&PARAMETERS
  READ_PFR_BIN bpti_amber95.prmtpg
&END
&SOLVENT
  MOLECULES 836
  ATOM o 1 P 16.0 0.0 0.0 0.0
  INTERACTION 2 1.6 0.001 0.41
  STRETCHING 1 2 524.86 1.0
  STRETCHING 1 3 524.86 1.0
  BENDING 2 1 3 55.00 109.47
  READ_PDB
&END
#
# Simulation Commands: NVE MD with RESPA with velocity scaling
# at 50 K using Ewald - PME
#

&SIMULATION MDSIM

&INTEGRATOR
  TIMESTEP 10.2
  MTS_RESPA
  END
&END
&POTENTIAL
  EWALD pme 0.43 32 32 32 4
  UPDATE 40.0 1.5
&END
&RUN
  CONTROL 0
  PROPERTY 500.0
  REJECT 2000.0
  TIME 20000.0
  PRINT 2.0
&END
#
# Output Commands: The restart and PDB files are dumped
#

&INOUT
  RESTART 400.0 OPEN bpti4.rst
  ASCII 6000.0 OPEN bpti4.pdb
  DUMP_RAND 10.2 OPEN bpti4.dmp
&END
&PROPERTIES
  X_RMS 50.0 OPEN bpti4.xrms
  X_RMS CA HEAVY BACKBONE
  VACF 4012.0 OPEN bpti4.vacf
  VACF_PRM 6963.2 3.4