2 Force Field Methods

2.3 Force Field Parameterization

Having settled on the functional description and a suitable number of cross terms, the problem of assigning numerical values to the parameters arises. This is by no means trivial.³⁶ Consider for example MM2(91) with 71 atom types. Not all of these can form stable bonds with each other; hydrogens and halogens can only have one bond, etc.

For the sake of argument, however, assume that the effective number of atom types capable of forming bonds between each other is 30.

• Each of the 71 atom types has two van der Waals parameters, R₀^A and ε^A, giving 142 parameters.

• There are approximately ½ × 30 × 30 = 450 possible different E_str terms, each requiring at least two parameters, k^AB and R₀^AB, for a total of at least 900 parameters.

• There are approximately ½ × 30 × 30 × 30 = 13 500 possible different E_bend terms, each requiring at least two parameters, k^ABC and θ₀^ABC, for a total of at least 27 000 parameters.

• There are approximately ½ × 30 × 30 × 30 × 30 = 405 000 possible different E_tors terms, each requiring at least three parameters, V₁^ABCD, V₂^ABCD and V₃^ABCD, for a total of at least 1 215 000 parameters.

• Cross terms may add another million possible parameters.
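The arithmetic behind these estimates can be reproduced with a few lines of Python. This is only a sketch of the counting argument above; the function name and the dictionary keys are illustrative choices, and cross terms are left out.

```python
def count_parameters(n_types_total=71, n_types_bonding=30):
    """Estimate the number of force field parameters per energy term.

    n_types_total   : all atom types (each carries van der Waals parameters)
    n_types_bonding : assumed effective number of types that bond to each other
    """
    n = n_types_bonding
    counts = {
        "vdw": 2 * n_types_total,   # R0 and epsilon for each atom type
        "str": 2 * (n ** 2 // 2),   # k and R0 for each A-B pair
        "bend": 2 * (n ** 3 // 2),  # k and theta0 for each A-B-C triple
        "tors": 3 * (n ** 4 // 2),  # V1, V2 and V3 for each A-B-C-D quadruple
    }
    counts["total"] = sum(counts.values())
    return counts


if __name__ == "__main__":
    for term, number in count_parameters().items():
        print(f"{term:>5}: {number}")
```

With the default arguments the total comes to roughly 1.2 million parameters before cross terms are considered, almost all of them torsional.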

To achieve just a rudimentary assignment of the value of one parameter, at least 3–4 independent data points should be available. To parameterize MM2 for all molecules described by the 71 atom types would thus require of the order of 10⁷ independent experimental data points, not counting cross terms, which clearly is impossible. Furthermore, the parameters that are the most numerous, the torsional constants, are also the ones for which experimental data are hardest to obtain. Experimental techniques normally probe a molecule near its equilibrium geometry. Obtaining energetic information about the whole rotational profile is very demanding and has only been done for a handful of small molecules. In recent years, it has therefore become common to rely on data from electronic structure calculations to derive force field parameters. Calculating, for example, rotational energy profiles is computationally fairly easy. The so-called “Class II” and “Class III” force fields rely heavily on data from electronic structure calculations to derive force field parameters, especially the bonded parameters (stretch, bend and torsional).

While the non-bonded terms are relatively unimportant for the “local” structure, they are the only contributors to intermolecular interactions, and the major factor in determining the global structure of a large molecule, such as protein folding. The electrostatic part of the interaction may be assigned based on fitting parameters to the electrostatic potential derived from an electronic wave function, as discussed in Section 2.2.6. The van der Waals interaction, however, is difficult to calculate reliably by electronic structure methods, requiring a combination of electron correlation and very large basis sets, and these parameters are therefore usually assigned based on fitting to experimental data for either the solid or liquid state.³⁷

For a system containing only a single atom type (e.g. liquid argon), the R₀ (atomic size) and ε (interaction strength) parameters can be determined by requiring that the experimental density and heat of evaporation are reproduced, respectively. Since the parameterization implicitly takes many-body effects into account, a (slightly) different set of van der Waals parameters will be obtained if the parameterization instead focuses on reproducing the properties of the crystal phase. For systems where several atom types are involved (e.g. water), there are two van der Waals parameters for each atom type, and the experimental density and heat of evaporation alone therefore give insufficient data for a unique assignment of all parameters. Although one may include additional experimental data, for example the variation of the density with temperature, this still provides insufficient data for a general system containing many atom types. Furthermore, it is possible that several combinations of van der Waals parameters for different atoms may be able to reproduce properties of a liquid, i.e. even if there are sufficient experimental data, the derived parameter set may not be unique.

One approach for solving this problem is to use electronic structure methods to determine relative values for van der Waals parameters, for example using a neon atom as the probe, and determine the absolute values by fitting to experimental values.³⁸

An alternative procedure is to derive the van der Waals parameters from other physical (atomic) properties. The interaction strength ε_ij between two atoms is related to the polarizabilities α_i and α_j, i.e. the ease with which the electron densities can be distorted by an electric field. The Slater–Kirkwood equation³⁹ (2.37) provides an explicit relationship between these quantities, which has been found to give good results for the interaction of rare gas atoms.

\[
\varepsilon_{ij} = C \, \frac{\alpha_i \alpha_j}{\sqrt{\alpha_i / N_i^{\mathrm{eff}}} + \sqrt{\alpha_j / N_j^{\mathrm{eff}}}} \tag{2.37}
\]

Here C is a constant for converting between the units of ε and α, and N_i^eff is the effective number of electrons, which may be taken either as the number of valence electrons or treated as a fitting parameter. The R₀ parameter may similarly be taken from atomic quantities. One problem with this procedure is that the atomic polarizability will of course be modified by the bonding situation (i.e. the atom type), which is not taken into account by the Slater–Kirkwood equation.
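As a minimal sketch, the Slater–Kirkwood relationship can be coded directly, assuming the usual form ε_ij = C α_i α_j / (√(α_i/N_i^eff) + √(α_j/N_j^eff)). The function name, the argument names and the default C = 1 (i.e. no unit conversion) are assumptions made for illustration, and the numbers in the usage lines are arbitrary rather than real polarizabilities.

```python
import math


def slater_kirkwood_epsilon(alpha_i, alpha_j, n_eff_i, n_eff_j, c=1.0):
    """Interaction strength from polarizabilities and effective electron numbers.

    c is the unit conversion constant; c=1.0 is a placeholder, not a
    physical value.
    """
    denominator = math.sqrt(alpha_i / n_eff_i) + math.sqrt(alpha_j / n_eff_j)
    return c * alpha_i * alpha_j / denominator


if __name__ == "__main__":
    # Arbitrary illustrative inputs, not tabulated atomic data.
    print(slater_kirkwood_epsilon(4.0, 4.0, 4.0, 4.0))
    print(slater_kirkwood_epsilon(4.0, 9.0, 4.0, 1.0))
```

The expression is symmetric in i and j, so swapping the two atoms gives the same interaction strength.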

The above considerations illustrate the inherent contradiction in designing highly accurate force fields. To get a high accuracy for a wide variety of molecules, and a range of properties, many functionally complex terms must be included in the force field expression. For each additional parameter introduced in an energy term, the potential number of new parameters to be derived grows with the number of atom types to a power between 1 and 4. The higher the accuracy that is needed, the more finely the fundamental units must be separated, i.e. the more atom types must be used. In the extreme limit, each atom that is not symmetry related to another is a new atom type in each new molecule. In this limit, each molecule will have its own set of parameters to be used just for this one molecule. To derive these parameters, the molecule must be subjected to many different experiments, or a large number of electronic structure calculations.

This is the approach used in “inverting” spectroscopic data to produce a potential energy surface. From a force field point of view, the resulting function is essentially worthless; it just reproduces known results. In order to be useful, a force field should be able to predict unknown properties of molecules from known data on other molecules, i.e. act as a sophisticated form of interpolation or extrapolation. If the force field becomes very complicated, the amount of work required to derive the parameters may be larger than the work required for measuring the property of interest for a given molecule.

The fundamental assumption of force fields is that structural units are transferable between different molecules. A compromise between accuracy and generality must thus be made. In MM2(91) the actual number of parameters compared with the theoretically estimated possible number (based on the 30 effective atom types above) is shown in Table 2.3.

As seen from Table 2.3, there are a large number of possible compounds for which there are no parameters, and on which it is then impossible to perform force field calculations (a good listing of available force field parameters is Osawa and Lipkowitz⁴⁰).

Actually, the situation is not as bad as it would appear from Table 2.3. Although only ~0.2% of the possible combinations for the torsional constants have been parameterized, these encompass the majority of the chemically interesting compounds. It has been estimated that ~20% of the ~15 million known compounds can be modelled by the parameters in MM2, the majority with a good accuracy. However, the problem of lacking parameters is very real, and anyone who has used a force field for all but the most rudimentary problems has encountered the problem. How does one progress if there are insufficient parameters for the molecule of interest?

There are two possible routes. The first is to estimate the missing parameters by comparison with force field parameters for similar systems. If, for example, there are missing torsional parameters for rotation around a H—X—Y—O bond in your molecule, but parameters exist for H—X—Y—C, then it is probably a good approximation to use the same values. In other cases, it may be less obvious what to choose. What if your system has an O—X—Y—O torsion, and parameters exist for O—X—Y—C and C—X—Y—O, but they are very different? What do you choose then, one or the other, or the average? After a choice has been made, the results should ideally be evaluated to determine how sensitive they are to the exact value of the guessed parameters. If the guessed parameters can be varied by ±50% without seriously affecting the final results, the property of interest is insensitive to the guessed parameters, and can be trusted to the usual degree of the force field. If, on the other hand, the final results vary by a factor of two when the guessed parameters are changed by 10%, a better estimate of the critical parameters should be sought from external sources. If many parameters are missing from the force field, such an evaluation of the sensitivity to parameter changes becomes impractical, and one should consider either the second route described below, or abandon force field methods altogether.
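The sensitivity test described above can be sketched as a simple scan: scale the guessed parameter and record the relative change in the property of interest. Everything here is a toy stand-in, not an MM2 calculation. The three-term torsional form is assumed, and `gauche_anti_gap` with its fixed `other_terms` contribution is a hypothetical property chosen only to make the example runnable.

```python
import math


def torsion_energy(omega_deg, v1, v2=0.0, v3=0.0):
    """Assumed three-term torsional energy for a dihedral angle in degrees."""
    w = math.radians(omega_deg)
    return (0.5 * v1 * (1.0 + math.cos(w))
            + 0.5 * v2 * (1.0 - math.cos(2.0 * w))
            + 0.5 * v3 * (1.0 + math.cos(3.0 * w)))


def gauche_anti_gap(v1, other_terms=3.0):
    """Hypothetical property: gauche/anti energy difference.

    other_terms stands for the contribution of all terms that were not guessed.
    """
    return other_terms + torsion_energy(60.0, v1) - torsion_energy(180.0, v1)


def relative_change(property_fn, guessed_value, scales=(0.5, 0.9, 1.1, 1.5)):
    """Relative change of the property when the guessed parameter is scaled."""
    reference = property_fn(guessed_value)
    return {s: property_fn(s * guessed_value) / reference - 1.0 for s in scales}


if __name__ == "__main__":
    for scale, change in relative_change(gauche_anti_gap, 1.0).items():
        print(f"parameter x {scale:.1f}: property changes by {100 * change:+.1f}%")
```

In this toy case a ±50% change in the guessed V₁ shifts the property by about ±10%, which by the criterion above would count as insensitive.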

The second route to missing parameters is to use external information, experimental data or electronic structure calculations. If the missing parameters are the bond length and force constant for a specific bond type, it is possible that an experimental bond distance may be obtained from an X-ray structure and the force constant estimated from measured vibrational frequencies, or missing torsional parameters may be obtained from a rotational energy profile calculated by electronic structure calculations. If many parameters are missing, this approach rapidly becomes very time-consuming, and may not give as good final results as you may have expected from the “rigorous” way of deriving the parameters. The reason for this is discussed below.

Table 2.3 Comparison of possible and actual number of MM2(91) parameters

Term      Estimated number of parameters    Actual number of parameters
E_vdw     142                               142
E_str     900                               290
E_bend    27 000                            824
E_tors    1 215 000                         2466
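Extracting torsional constants from a calculated rotational profile can be sketched as follows. The sketch assumes the three-term form E_tors = ½V₁(1 + cos ω) + ½V₂(1 − cos 2ω) + ½V₃(1 + cos 3ω) and a profile sampled on a uniform grid over the full 360° rotation, so that the cosine terms can be separated by a discrete Fourier projection. It further assumes that the profile passed in is only the part of the reference energy not already described by the other force field terms. The function names are illustrative.

```python
import math


def torsion_energy(omega_deg, v1, v2, v3):
    """Assumed three-term torsional energy for a dihedral angle in degrees."""
    w = math.radians(omega_deg)
    return (0.5 * v1 * (1.0 + math.cos(w))
            + 0.5 * v2 * (1.0 - math.cos(2.0 * w))
            + 0.5 * v3 * (1.0 + math.cos(3.0 * w)))


def fit_torsion(profile):
    """Recover (V1, V2, V3) from energies at omega = 0, 360/N, ..., 360(N-1)/N.

    Requires N >= 7 so that cos(w), cos(2w) and cos(3w) are orthogonal on the
    grid. A constant energy offset in the profile does not affect the result.
    """
    n = len(profile)
    if n < 7:
        raise ValueError("need at least 7 uniformly spaced points")

    def cosine_coefficient(k):
        return 2.0 / n * sum(e * math.cos(2.0 * math.pi * k * j / n)
                             for j, e in enumerate(profile))

    # Coefficients of cos(w), cos(2w), cos(3w) are V1/2, -V2/2 and V3/2.
    return (2.0 * cosine_coefficient(1),
            -2.0 * cosine_coefficient(2),
            2.0 * cosine_coefficient(3))


if __name__ == "__main__":
    # Synthetic profile on a 30-degree grid from made-up constants.
    synthetic = [torsion_energy(30.0 * j, 1.0, -0.5, 3.0) for j in range(12)]
    print(fit_torsion(synthetic))
```

For non-uniform or partial scans a general least-squares fit would be needed instead of the projection used here.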

Assume now that the functional form of the force field has been settled. The next task is to select a set of reference data – for the sake of argument let us assume that they are derived from experiments, but they could also be taken from electronic structure calculations. The problem is then to assign numerical values to all the parameters such that the results from force field calculations match the reference data set as closely as possible. The reference data may be of very different types and accuracy, containing bond distances, bond angles, relative energies, vibrational frequencies, dipole moments, etc. These data of course have different units, and a decision must be made as to how they should be weighted. How much weight should be put on reproducing a bond length of 1.532 Å relative to an energy difference of 10 kJ/mol? Should the same weight be used for all bond distances, if for example one distance is determined to 1.532 ± 0.001 Å while another is known only to 1.73 ± 0.07 Å? The selection is further complicated by the fact that different experimental methods may give slightly different answers for, say, the bond distance, even in the limit of no experimental uncertainty. The reason for this is that different experimental methods do not measure the same property. X-ray diffraction, for example, determines the electron distribution, while microwave spectroscopy primarily depends on the nuclear position. The maximum in the electronic distribution may not be exactly identical to the nuclear position, and these two techniques will therefore give slightly different bond lengths.

Once the question of assigning weights for each reference datum has been decided, the fitting process can begin. It may be formulated in terms of an error function.⁴¹

\[
\mathrm{ErrF}(\text{parameters}) = \sum_{i}^{\text{data}} \text{weight}_i \cdot (\text{reference value} - \text{calculated value})_i^{2} \tag{2.38}
\]

The problem is to find the minimum of ErrF with the parameters as variables. From an initial set of guess parameters, force field calculations are performed for the whole set of reference molecules and the results compared with the reference data. The deviation is calculated and a new improved set of parameters can be derived. This is continued until a minimum has been found for the ErrF function. To find the best set of force field parameters corresponds to finding the global minimum for the multidimensional ErrF function. The simplest optimization procedure performs a cyclic minimization, reducing the ErrF value by varying one parameter at a time. More advanced methods rely on the ability to calculate the gradient (and possibly also the second derivative) of the ErrF with respect to the parameters. Such information may be used in connection with optimization procedures as described in Chapter 12.

The parameterization process may be done sequentially or in a combined fashion. In the sequential method, a certain class of compounds, such as hydrocarbons, is parameterized first. These parameters are held fixed, and a new class of compounds, for example alcohols and ethers, is then parameterized. This method is in line with the basic assumption of force fields, i.e. that parameters are transferable. The advantage is that only a fairly small number of parameters is fitted at a time. The ErrF is therefore a relatively low dimensional function, and one can be reasonably certain that a “good” minimum has been found (although it may not be the global minimum). The disadvantage is that the final set of parameters necessarily provides a poorer fit (as defined from the value of the ErrF) than if all the parameters are fitted simultaneously.
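The error function and the cyclic, one-parameter-at-a-time minimization can be illustrated with a short sketch. It assumes the weighted sum-of-squares form of ErrF; the `model` callable is a stand-in for a full force field calculation, and the step-halving rule and all names are illustrative choices rather than a procedure prescribed by the text. One common, but here merely assumed, choice of weight is the inverse squared uncertainty of each reference value.

```python
def err_f(params, data, model):
    """Weighted sum of squared deviations.

    data is a list of (weight, reference_value, descriptor) tuples and
    model(params, descriptor) returns the calculated value.
    """
    return sum(w * (ref - model(params, x)) ** 2 for w, ref, x in data)


def cyclic_minimize(params, data, model, step=0.1, tol=1e-8, max_sweeps=10000):
    """Lower ErrF by trial steps on one parameter at a time.

    The step is halved whenever a full sweep over all parameters fails to
    improve ErrF. Only a local minimum is found.
    """
    params = list(params)
    best = err_f(params, data, model)
    for _ in range(max_sweeps):
        improved = False
        for i in range(len(params)):
            for delta in (step, -step):
                trial = params[:]
                trial[i] += delta
                value = err_f(trial, data, model)
                if value < best:
                    params, best, improved = trial, value, True
                    break
        if not improved:
            step *= 0.5
            if step < tol:
                break
    return params, best


def line_model(params, x):
    """Toy two-parameter 'calculation' used only for demonstration."""
    return params[0] * x + params[1]


if __name__ == "__main__":
    reference = [(1.0, 1.0, 0.0), (1.0, 3.0, 1.0), (1.0, 5.0, 2.0), (1.0, 7.0, 3.0)]
    print(cyclic_minimize([0.0, 0.0], reference, line_model))
```

For the handful of parameters fitted in one sequential step such a search may be adequate; with thousands of coupled parameters, gradient-based methods become necessary.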

The combined approach tries to fit all the constants in a single parameterization step. Considering that the number of force field parameters may be many thousands, it is clear that the ErrF function will have a very large number of local minima. To find the global minimum of such a multivariable function is very difficult. It is thus likely that the final set of force field parameters derived by this procedure will in some sense be less than optimal, although it may still be “better” than that derived by the sequential procedure. Furthermore, many of the parameter sets that give low ErrF values (including the global minimum) may be “non-physical”, e.g. force constants for similar bonds being very different. Due to the large dimensionality of the problem, such combined optimizations require the ability to calculate the gradient of the ErrF with respect to the parameters, and writing such programs is not trivial. There is also a more fundamental problem when new classes of compounds are introduced at a later time than the original parameterization. To be consistent, the whole set of parameters should be re-optimized. This has the consequence that (all) parameters change when a new class of compounds is introduced, or whenever more data are included in the reference set. Such “time-dependent” force fields are clearly not desirable. Most parameterization procedures therefore employ a sequential technique, although the number of compound types parameterized in each step varies.

There is one additional point to be mentioned in the parameterization process that is also important for understanding why the addition of missing parameters by comparison with existing data or from external sources is somewhat problematic. This is the question of redundant variables, as can be exemplified by considering acetaldehyde.

Figure 2.17 The structure of acetaldehyde (CH₃CHO)

In the bending energy expression there will be four angle terms describing the geometry around the carbonyl carbon: an HCC, an HCO, a CCO and an out-of-plane bend. Assuming the latter to be zero for the moment, it is clear that the other three angles are not independent. If the θ_HCO and θ_CCO angles are given, the θ_HCC angle must be 360° − θ_HCO − θ_CCO. Nevertheless, there will be three natural angle parameters, and three force constants associated with these angles. For the whole molecule there are six stretch terms, nine bending terms and six torsional terms (count them!) in addition to at least one out-of-plane term. This means that the force field energy expression has 22 degrees of freedom, in contrast to the 15 (3N_atom − 6) independent coordinates necessary to completely specify the system. The force field parameters, as defined by the E_FF expression, are therefore not independent.
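The count for acetaldehyde can be checked from its bond list. The sketch below assumes an acyclic molecule, counts one out-of-plane term for each atom with exactly three neighbours (an assumption standing in for "trigonal centre"), and uses arbitrary atom labels.

```python
import math

# Bond list of acetaldehyde, CH3-CHO. The atom labels are arbitrary.
ACETALDEHYDE_BONDS = [
    ("C1", "H1"), ("C1", "H2"), ("C1", "H3"),  # methyl C-H bonds
    ("C1", "C2"),                              # C-C bond
    ("C2", "H4"), ("C2", "O"),                 # aldehyde C-H and C=O
]


def count_internal_terms(bonds):
    """Count force field energy terms for an acyclic molecule."""
    neighbours = {}
    for a, b in bonds:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    n_atoms = len(neighbours)
    terms = {
        "stretch": len(bonds),
        # one bend for each pair of neighbours around a central atom
        "bend": sum(math.comb(len(nb), 2) for nb in neighbours.values()),
        # one torsion for each choice of outer atoms around a central bond
        "torsion": sum((len(neighbours[a]) - 1) * (len(neighbours[b]) - 1)
                       for a, b in bonds),
        # assumed rule: one out-of-plane term per three-coordinate atom
        "out_of_plane": sum(1 for nb in neighbours.values() if len(nb) == 3),
    }
    terms["total"] = sum(terms.values())
    terms["independent"] = 3 * n_atoms - 6
    return terms


if __name__ == "__main__":
    print(count_internal_terms(ACETALDEHYDE_BONDS))
```

The difference between the total number of terms and 3N_atom − 6 is the number of redundant coordinates in the energy expression.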

The implicit assumption in force field parameterization is that, given sufficient amounts of data, this redundancy will cancel out. In the above case, additional data for other aldehydes and ketones may be used (at least partly) for removing this ambiguity in assigning angle bend parameters, but in general there are more force field parameters than required for describing the system. This clearly illustrates that force field parameters are just that, parameters. They do not necessarily have any direct connection with experimental force constants. Experimental vibrational frequencies can be related to a unique set of force constants, but only in the context of a non-redundant set of coordinates.

It is also clear that errors in the force field due to inadequacies in the functional forms used for each of the energy terms will to some extent be absorbed by the parameter redundancy. Adding new parameters from external sources, or estimating missing parameters by comparison with those for “similar” fragments, may partly destroy this cancellation of errors. This is also the reason why parameters are not transferable between different force fields: the parameter values are dependent on the functional form of the energy terms, and are mutually correlated. The energy profile for rotating around a bond, for example, contains contributions from the electrostatic, the van der Waals and the torsional energy terms. The torsional parameters are therefore intimately related to the atomic partial charges, and cannot be transferred to another force field.

The parameter redundancy is also the reason that care should be exercised when trying to decompose energy differences into individual terms. Although it may be possible to rationalize the preference of one conformation over another by, for example, increased steric repulsion between certain atom pairs, this is intimately related to the chosen functional form for the non-bonded energy, and the balance between this and the angle bend/torsional terms. The rotational barrier in ethane, for example, may be reproduced solely by an HCCH torsional energy term, solely by an H-H van der Waals repulsion or solely by H-H electrostatic repulsion. Different force fields will have (slightly) different balances of these terms, and while one force field may attribute a conformational difference primarily to steric interactions, another may have the torsional energy as the major determining factor, and a third may “reveal” that it is all due to electrostatic interactions.

2.3.1 Parameter reductions in force fields

The overwhelming problem in developing force fields is the lack of enough high quality reference data. As illustrated above, there are literally millions of possible parameters in even quite simple force fields. The most numerous of these are the torsional parameters, followed by the bending constants. As force fields are designed for predicting properties of unknown molecules, it is inevitable that the problem of lacking parameters will be encountered frequently. Furthermore, many of the existing parameters may be based on very few reference data, and therefore be associated with substantial uncertainty.

Many modern force field programs are commercial. Having the program tell the user that his or her favourite molecule cannot be calculated owing to lack of parameters is not good for business. Making the user derive new parameters, and getting the program to accept them, may require more knowledge than the average user, who is just interested in the answer, has. Many force fields thus have “generic” parameters. This is just a fancy word for the program making more or less educated guesses for the missing parameters.
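One simple way a program might fall back on generic parameters is a wildcard lookup keyed on the central bond only. The sketch below is purely hypothetical: the table layout, the "*" wildcard convention and the numerical values are invented for illustration and do not come from any particular force field.

```python
def lookup_torsion(table, a, b, c, d):
    """Return (parameters, is_generic) for the torsion a-b-c-d.

    Specific entries are tried first, in both directions, before falling
    back to a wildcard entry that depends only on the central b-c bond.
    """
    for key in ((a, b, c, d), (d, c, b, a)):
        if key in table:
            return table[key], False
    for key in (("*", b, c, "*"), ("*", c, b, "*")):
        if key in table:
            return table[key], True
    raise KeyError(f"no torsional parameters for {a}-{b}-{c}-{d}")


if __name__ == "__main__":
    # Hypothetical (V1, V2, V3) values, not taken from a real parameter set.
    table = {
        ("H", "C", "C", "H"): (0.0, 0.0, 0.24),
        ("*", "C", "C", "*"): (0.0, 0.0, 0.30),
    }
    print(lookup_torsion(table, "H", "C", "C", "H"))  # specific entry
    print(lookup_torsion(table, "O", "C", "C", "N"))  # generic fallback
```

Returning the `is_generic` flag lets the caller warn the user that a result depends on guessed parameters.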

Source: Introduction to Computational Chemistry (pages 72–83)