
3 Electronic Structure

3.8 SCF Techniques

As discussed in Section 3.6, the Roothaan–Hall (or Pople–Nesbet for the UHF case) equations must be solved iteratively, since the Fock matrix depends on its own solutions. The procedure illustrated in Figure 3.3 involves the following steps:

(1) Calculate all one- and two-electron integrals.

(2) Generate a suitable start guess for the MO coefficients.

(3) Form the initial density matrix.

(4) Form the Fock matrix as the core (one-electron) integrals plus the density matrix contracted with the two-electron integrals.

(5) Diagonalize the Fock matrix. The eigenvectors contain the new MO coefficients.

(6) Form the new density matrix. If it is sufficiently close to the previous density matrix, we are done; otherwise go to step 4.
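In code, the loop (steps 2–6) is only a few lines. The following is a minimal Python/NumPy sketch, assuming an orthonormal basis (S = 1) so that an ordinary diagonalization suffices, and taking the integral arrays h and eri from step (1) as given; all names are illustrative, not tied to any particular program.

```python
import numpy as np

def scf(h, eri, n_occ, max_iter=100, tol=1e-8):
    """Bare-bones closed-shell SCF loop in an orthonormal basis.
    h   : (M, M) core (one-electron) integrals
    eri : (M, M, M, M) two-electron integrals (ab|cd), chemists' notation
    """
    M = h.shape[0]
    D = np.zeros((M, M))                       # steps 2-3: core guess, D = 0
    for _ in range(max_iter):
        J = np.einsum('abcd,cd->ab', eri, D)   # Coulomb contribution
        K = np.einsum('acbd,cd->ab', eri, D)   # exchange contribution
        F = h + 2.0 * J - K                    # step 4: Fock matrix
        _, C = np.linalg.eigh(F)               # step 5: new MO coefficients
        D_new = C[:, :n_occ] @ C[:, :n_occ].T  # step 6: new density matrix
        if np.linalg.norm(D_new - D) < tol:
            return F, C, D_new                 # converged
        D = D_new
    raise RuntimeError("SCF did not converge")
```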

There are several points hidden in this scheme. Will the procedure actually converge at all? Will the SCF solution correspond to the desired energy minimum (and not a maximum or saddle point)? Can the number of iterations necessary for convergence be reduced? Does the most efficient method depend on the type of computer and/or the size of the problem?

Let us look at some of the SCF techniques used in practice.

3.8.1 SCF convergence

There is no guarantee that the above iterative scheme will converge. For geometries near equilibrium and using small basis sets, the straightforward SCF procedure often converges without problems. Distorted geometries (such as transition structures) and large basis sets containing diffuse functions, however, frequently lead to convergence problems, and metal complexes, where several states with similar energies are possible, are even more troublesome. There are various tricks that can be tried to help convergence:12

(1) Extrapolation. This is a method for accelerating convergence by extrapolating the previous Fock matrices to generate a (hopefully) better Fock matrix than the one calculated directly from the current density matrix. Typically, the last three matrices are used in the extrapolation.

(2) Damping. The reason for divergence, or very slow convergence, is often oscillation. A given density matrix Dn gives a Fock matrix Fn, which, upon diagonalization, gives a density matrix Dn+1. The Fock matrix Fn+1 from Dn+1 gives a density matrix Dn+2 that is close to Dn, but Dn and Dn+1 are very different, as illustrated in Figure 3.5. The damping procedure tries to solve this by replacing the current density matrix with a weighted average of the last two, D′n+1 = wDn + (1 − w)Dn+1. The weighting factor w may be chosen as a constant or changed dynamically during the SCF procedure.

[Figure: density plotted against iteration number; the successive values D0, D2, D4, ... and D1, D3, D5, ... (with associated Fock matrices F0, ..., F5) oscillate on either side of the converged value.]

Figure 3.5 An oscillating SCF procedure
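In code, damping amounts to a single weighted average applied between iterations; a minimal sketch, with an illustrative fixed weight w:

```python
def damp_density(D_old, D_new, w=0.5):
    """Replace the new density matrix by a weighted average of the last two,
    suppressing the oscillation illustrated in Figure 3.5."""
    return w * D_old + (1.0 - w) * D_new
```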

(3) Level shifting. This technique13 is perhaps best understood in the formulation of a rotation of the MOs that form the basis for the Fock operator (Section 3.6). At convergence, the Fock matrix elements in the MO basis between occupied and virtual orbitals are zero. The iterative procedure involves mixing (making linear combinations of) occupied and virtual MOs. During the iterative procedure, these mixings may be large, causing oscillations or making the total energy increase. The degree of mixing may be reduced by artificially increasing the energy of the virtual orbitals. If a sufficiently large constant is added to the virtual orbital energies, it can be shown that the total energy is guaranteed to decrease, thereby forcing convergence. The more the virtual orbitals are raised in energy, the more stable is the convergence, but the rate of convergence also decreases with level shifting. For large enough shifts, convergence is guaranteed, but it is likely to occur very slowly, and may in some cases be towards a state that is not the ground state.
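A minimal sketch of a level shift in code, assuming an orthonormal basis (S = 1) so that the MO coefficient matrix C is orthogonal; the function name and the value of the shift are illustrative only.

```python
import numpy as np

def level_shift(F, C, n_occ, shift=1.0):
    """Raise the virtual orbital energies by a constant before the next
    diagonalization, damping occupied-virtual mixing (assumes S = 1)."""
    F_mo = C.T @ F @ C                   # Fock matrix in the current MO basis
    n_virt = F_mo.shape[0] - n_occ
    F_mo[n_occ:, n_occ:] += shift * np.eye(n_virt)   # shift the virtual block
    return C @ F_mo @ C.T                # back-transform to the AO basis
```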

(4) Direct inversion in the iterative subspace (DIIS). This procedure was developed by P. Pulay and is an extrapolation procedure.14 It has proved to be very efficient in forcing convergence and in reducing the number of iterations at the same time, and it is now one of the most commonly used methods for helping SCF convergence. The idea is as follows. As the iterative procedure runs, a sequence of Fock and density matrices (F0, F1, F2, . . . and D0, D1, D2, . . .) is produced. At each iteration, it is also assumed that an estimate of the “error” (E0, E1, E2, . . .) is available, i.e. how far the current Fock/density matrix is from the converged solution. The converged solution has an error of zero, and the DIIS method forms the linear combination of the error indicators that, in a least squares sense, is a minimum (as close to zero as possible). In the function space generated by the previous iterations we try to find the point with the lowest error, which is not necessarily one of the points actually calculated. It is common to use the trace (sum of diagonal elements) of the matrix product of the error matrix with itself as a scalar indicator of the error.

\mathrm{ErrF}(\mathbf{c}) = \mathrm{trace}(\mathbf{E}_{n+1} \cdot \mathbf{E}_{n+1}), \qquad \mathbf{E}_{n+1} = \sum_{i=0}^{n} c_i \mathbf{E}_i, \qquad \sum_{i=0}^{n} c_i = 1 \qquad (3.64)

Minimization of ErrF subject to the normalization constraint is handled by the Lagrange method (Section 12.5), and leads to the following set of linear equations, where λ is the multiplier associated with the normalization.

\begin{pmatrix}
a_{11} & a_{12} & \cdots & a_{1n} & 1 \\
a_{21} & a_{22} & \cdots & a_{2n} & 1 \\
\vdots & \vdots & \ddots & \vdots & \vdots \\
a_{n1} & a_{n2} & \cdots & a_{nn} & 1 \\
1 & 1 & \cdots & 1 & 0
\end{pmatrix}
\begin{pmatrix} c_1 \\ c_2 \\ \vdots \\ c_n \\ -\lambda \end{pmatrix}
=
\begin{pmatrix} 0 \\ 0 \\ \vdots \\ 0 \\ 1 \end{pmatrix},
\qquad a_{ij} = \mathrm{trace}(\mathbf{E}_i \cdot \mathbf{E}_j) \qquad (3.65)

In iteration n, the A matrix has dimension (n + 1) × (n + 1), where n usually is less than 20. The coefficients c can be obtained by directly inverting the A matrix and multiplying it onto the b vector, c = A⁻¹b, i.e. in the “subspace” of the “iterations” the linear equations are solved by “direct inversion”, thus the name DIIS.

Having obtained the coefficients that minimize the error function at iteration n, the same set of coefficients is used for generating an extrapolated Fock matrix F*n at iteration n, which is used in place of Fn for generating the new density matrix.

\mathbf{F}_n^{*} = \sum_{i=0}^{n} c_i \mathbf{F}_i \qquad (3.66)

The only remaining question is the nature of the error function. Pulay suggested the difference FDS − SDF (S is the overlap matrix), which is related to the gradient of the SCF energy with respect to the MO coefficients, and this has been found to work well in practice. A closely related method uses the energy as the error indicator, and has the acronym EDIIS.15
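A compact sketch of the DIIS machinery of eqs (3.64)–(3.66) in Python/NumPy; the function names are illustrative, and a production implementation would additionally discard the oldest matrices once the subspace grows beyond 10–20 entries.

```python
import numpy as np

def pulay_error(F, D, S):
    """Pulay's error indicator FDS - SDF; it vanishes at convergence."""
    return F @ D @ S - S @ D @ F

def diis_extrapolate(focks, errors):
    """Solve the bordered linear equations (3.65) for the stored Fock/error
    matrices and return the extrapolated Fock matrix of eq. (3.66)."""
    n = len(focks)
    A = np.zeros((n + 1, n + 1))
    for i in range(n):
        for j in range(n):
            A[i, j] = np.trace(errors[i] @ errors[j])  # a_ij = trace(Ei . Ej)
    A[:n, n] = A[n, :n] = 1.0       # border of ones from the constraint
    b = np.zeros(n + 1)
    b[n] = 1.0                      # normalization: the c_i sum to 1
    c = np.linalg.solve(A, b)[:n]   # last component is the multiplier
    return sum(ci * Fi for ci, Fi in zip(c, focks))   # F* = sum_i c_i F_i
```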

(5) “Direct minimization” techniques. The variational principle indicates that we want to minimize the energy as a function of the MO coefficients or the corresponding density matrix elements, as given by eq. (3.54). In this formulation, the problem is no different from other types of non-linear optimization, and the same types of technique, such as steepest descent, conjugate gradient or Newton–Raphson methods, can be used (see Chapter 12 for details).

As mentioned in Section 3.6, the variational procedure can be formulated in terms of an exponential transformation of the MOs, with the (independent) variational parameters contained in an X matrix. Note that the X-variables are preferred over the MO coefficients in eq. (3.54) for optimization, since the latter are not independent (the MOs must be orthonormal). The exponential may be written as a series expansion, and the energy expanded in terms of the X-variables describing the occupied–virtual mixing of the orbitals.16

e^{\mathbf{X}} = \mathbf{1} + \mathbf{X} + \tfrac{1}{2}\mathbf{X}\mathbf{X} + \cdots
E(\mathbf{X}) = E(\mathbf{0}) + E'(\mathbf{0})\mathbf{X} + \tfrac{1}{2}\mathbf{X} E''(\mathbf{0})\mathbf{X} + \cdots \qquad (3.67)

The first and second derivatives of the energy with respect to the X-variables (E′(0) and E″(0)) can be written in terms of Fock matrix elements and two-electron integrals in the MO basis.17 For an RHF type wave function, these are given in eq. (3.68).

\frac{\partial E}{\partial x_{ia}} = 4\langle \phi_i | \mathbf{F} | \phi_a \rangle
\frac{\partial^2 E}{\partial x_{ia}\,\partial x_{jb}} = 4\left(\delta_{ij}\langle \phi_a | \mathbf{F} | \phi_b \rangle - \delta_{ab}\langle \phi_i | \mathbf{F} | \phi_j \rangle\right) + 4\left(4\langle \phi_i \phi_a | \phi_j \phi_b \rangle - \langle \phi_i \phi_b | \phi_a \phi_j \rangle - \langle \phi_i \phi_j | \phi_a \phi_b \rangle\right) \qquad (3.68)

The gradient of the energy is an off-diagonal element of the molecular Fock matrix, which is easily calculated from the atomic Fock matrix. The second derivative, however, involves two-electron integrals that require an AO to MO transformation (see Section 4.2.1), and is therefore computationally expensive.
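A minimal sketch of the exponential orbital update, assuming real orbitals and an orthonormal basis; SciPy's expm provides the matrix exponential, and the variable names (x_ov for the occupied–virtual parameters x_ia) are illustrative.

```python
import numpy as np
from scipy.linalg import expm

def rotate_orbitals(C, x_ov, n_occ):
    """Update the MOs as C' = C exp(X), where X is anti-symmetric and holds
    only the occupied-virtual mixings x_ia; since exp(X) is orthogonal for
    anti-symmetric X, the rotated MOs remain orthonormal exactly."""
    M = C.shape[1]
    X = np.zeros((M, M))
    X[:n_occ, n_occ:] = x_ov    # occupied-virtual block (the x_ia variables)
    X -= X.T                    # anti-symmetrize: lower block becomes -x^T
    return C @ expm(X)
```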

In a density matrix formulation, the energy depends on the density matrix elements as variables, and can formally be written as the trace of the contraction of the density matrix with the one-electron matrix h and the two-electron matrix G, with the latter depending implicitly on D.

E(\mathbf{D}) = \mathrm{trace}(\mathbf{D}\mathbf{h}) + \mathrm{trace}(\mathbf{D}\mathbf{G}(\mathbf{D})) \qquad (3.69)

The density matrix elements cannot be varied freely, however, as the orbitals must remain orthonormal, and this constraint can be formulated as the density matrix having to be idempotent, DSD = D.


It is difficult to ensure idempotency during an optimization step, but the non-idempotent density matrix resulting from such a step can be “purified” by the McWeeny procedure.18

\mathbf{D}_{\mathrm{purified}} = 3\mathbf{D}^2 - 2\mathbf{D}^3 \qquad (3.70)

The idempotency condition ensures that each orbital is occupied by exactly one electron. E. Cancès has shown that relaxing this condition to allow fractional occupancy during the optimization improves the convergence, a procedure named the relaxed constraint algorithm (RCA),19 which was subsequently improved using ideas from the DIIS algorithm, leading to the EDIIS (Energy DIIS) method.15 The optimization in terms of density matrix elements has the potential advantage that the matrix becomes sparse for large systems, and can therefore be handled by techniques that scale linearly with the system size.20
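A minimal sketch of McWeeny purification, eq. (3.70), written for an orthonormal basis (S = 1, so DSD = D reduces to DD = D); a few repeated applications drive a nearly idempotent matrix to idempotency.

```python
import numpy as np

def mcweeny_purify(D, n_iter=5):
    """Iterate D <- 3 D^2 - 2 D^3 (eq. 3.70); the fixed points are
    idempotent matrices, so an almost-idempotent D converges rapidly."""
    for _ in range(n_iter):
        D2 = D @ D
        D = 3.0 * D2 - 2.0 * D2 @ D
    return D
```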

The Newton–Raphson method has the advantage of being quadratically convergent, i.e. sufficiently near the minimum it converges very fast. The main problem in using Newton–Raphson methods for wave function optimization is computational efficiency. The exact calculation of the second derivative matrix is somewhat demanding, and each iteration in a Newton–Raphson optimization therefore takes longer than in the simple Roothaan–Hall iterative scheme. Owing to the fast convergence near the minimum, a Newton–Raphson approach normally takes fewer iterations than, for example, DIIS, but the overall computational time is still a factor of ~2 longer. Alternative schemes, where an approximation to the second derivative matrix is used (pseudo-Newton–Raphson), have also been developed, and they are often competitive with DIIS.21 It should be kept in mind that the simple Newton–Raphson method is unstable, and requires some form of stabilization, for example by using the augmented Hessian techniques discussed in Section 12.2.22 Alternatively, for large systems (thousands of basis functions) the optimization may be carried out by conjugate gradient methods, but the convergence characteristics of these methods are significantly poorer.23 Direct minimization methods have the advantage of a more stable convergence for difficult systems, where DIIS may display problematic behaviour or converge to solutions that are not the global minimum.

3.8.2 Use of symmetry

From group theory it may be shown that an integral can only be non-zero if the integrand belongs to the totally symmetric representation. Furthermore, the product of two functions can only be totally symmetric if they belong to the same irreducible representation. As both the Hamiltonian and Fock operators are totally symmetric (otherwise the energy would change upon a rotation of the coordinate system), integrals of the following type can only be non-zero if the basis functions involving the same electron coordinate belong to the same representation.

\int \chi_a(1)\,\chi_b(1)\,d\mathbf{r}_1 \qquad \int \chi_a(1)\,\mathbf{F}\,\chi_b(1)\,d\mathbf{r}_1 \qquad \int \chi_a(1)\,\mathbf{H}\,\chi_b(1)\,d\mathbf{r}_1 \qquad (3.71)

Similar considerations hold for the two-electron integrals.

By forming suitable linear combinations of basis functions (symmetry-adapted functions), many one- and two-electron integrals need not be calculated, as they are known to be exactly zero owing to symmetry. Furthermore, the Fock matrix (in an HF calculation) or Hamiltonian matrix (in a configuration interaction (CI) calculation) becomes block-diagonal, as only matrix elements between functions having the same symmetry can be non-zero. The saving depends on the specific system, but as a guideline the computational time is reduced by roughly a factor corresponding to the order of the point group (the number of symmetry operations). Although the large majority of molecules do not have any symmetry, a sizeable proportion of the small molecules for which ab initio electronic structure calculations are possible are symmetric. Almost all ab initio programs employ symmetry as a tool for reducing the computational effort.

3.8.3 Ensuring that the HF energy is a minimum, and the correct minimum

The standard iterative procedure produces a solution where the HF energy is stationary with respect to all orbital variations, i.e. the first derivatives of the energy with respect to the MO coefficients are zero. In order to ensure that this corresponds to an energy minimum, the second derivatives should also be calculated.24 This is a matrix with a dimension equal to the number of occupied MOs multiplied by the number of virtual MOs (identical to that arising in quadratically convergent SCF methods (Section 3.8.1)), and its eigenvalues should all be positive for the solution to be an energy minimum. Of course, only the lowest eigenvalue is required to probe whether the solution is a minimum. A negative eigenvalue means that it is possible to get to a lower energy state by “exciting” an electron from an occupied to an unoccupied orbital, i.e. the solution is unstable. In practice, the stability is rarely checked; it is assumed that the iterative procedure has converged to a minimum. It should be noted that a positive definite second-order matrix only ensures that the solution is a local minimum; there may be other minima with lower energies.
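A minimal sketch of such a stability probe, assuming the orbital-rotation Hessian of eq. (3.68) has already been built as a square array of dimension n_occ × n_virt; in practice one would use an iterative eigensolver to extract only the lowest eigenvalue.

```python
import numpy as np

def is_local_minimum(hessian, tol=1e-6):
    """Return True if the orbital-rotation Hessian is positive definite
    (within a tolerance), i.e. the SCF solution is a local minimum."""
    lowest = np.linalg.eigvalsh(hessian)[0]   # eigenvalues in ascending order
    return lowest > -tol
```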

The problem of convergence to saddle points in the wave function parameter space and the existence of multiple minima is rarely a problem for systems composed of elements from the first two rows of the periodic table. For systems having more than one metal atom with several partially filled d-orbitals, however, care must be taken to ensure that the iterative procedure converges to the desired solution. Consider for example the Fe2S2 system in Figure 3.6, where the d-electrons of the two Fe atoms are coupled through the bridging sulfur atoms.

[Structure: two Fe atoms bridged by two S atoms, each Fe carrying additional R ligands; the d-electrons on the two centres can be coupled either high-spin or low-spin.]

Figure 3.6 Two different singlet states generated by coupling either two high-spin or two low-spin states

Each of the two Fe atoms is formally in the +III oxidation state, and therefore has a d5 configuration. A high-spin state corresponding to all ten d-electrons being aligned can readily be described by a single determinant wave function, but the situation is more complicated for a low-spin singlet state. A singlet HF wave function must have an equal number of orbitals with α and β electron spin, but this can be obtained in several different ways. If each metal atom is in a high-spin state, an overall singlet state must have all the d-orbitals on one Fe atom occupied by electrons with α spin, while all the d-orbitals on the other Fe atom must be occupied by electrons with β spin.

An alternative singlet state, however, can be generated by coupling the single unpaired electron from each of the two Fe centres in a low-spin configuration. Each of these two wave functions will be a valid minimum in the orbital parameter space, but they clearly describe complexes with different properties. Note also that neither of these two singlet wave functions can be described by an RHF type wave function. UHF type wave functions with the above two types of spin coupling can be generated, but will often be severely spin contaminated. One can consider other spin coupling schemes to generate an overall singlet wave function, and the situation becomes more complicated if intermediate (triplet, pentet, etc.) spin states are desired, and for mixed valence states (Fe2+/Fe3+). The complications increase further when larger clusters are considered, as for example with the Fe4S4 moiety involved in electron transfer in the photosystem I and nitrogenase enzymes.

The question as to whether the energy is a minimum is closely related to the concept of wave function stability. If a lower energy RHF solution can be found, the wave function is said to possess a singlet instability. It is also possible that an RHF type wave function is a minimum in the coefficient space, but is a saddle point if the constraint of double occupancy of each MO is relaxed. This indicates that a lower energy wave function of the UHF type can be constructed, and this is called a triplet instability. It should be noted that in order to generate such UHF wave functions for a singlet state, an initial guess of the SCF coefficients must be specified that has the spatial parts of at least one set of α and β MOs different. There are other types of such instabilities, such as relaxing the constraint that the MOs should be real (allowing complex orbitals), or the constraint that an MO should only have a single spin function. Relaxing the latter produces the “general” HF method, where each MO is written as a spatial part having α spin plus another spatial part having β spin.25 Such wave functions are no longer eigenfunctions of the Sz operator, and are rarely used.

Another aspect of wave function instability concerns symmetry breaking, i.e. the wave function having a lower symmetry than the nuclear framework.26 It occurs, for example, for the allyl radical with an ROHF type wave function. The nuclear geometry has C2v symmetry, but the C2v symmetric wave function corresponds to a (first-order) saddle point. The lowest energy ROHF solution has only Cs symmetry, and corresponds to a localized double bond and a localized electron (radical). Relaxing the double occupancy constraint, and allowing the wave function to become UHF, re-establishes the correct C2v symmetry. Such symmetry breaking phenomena usually indicate that the type of wave function used is not flexible enough for even a qualitatively correct description.

3.8.4 Initial guess orbitals

The quality of the initial guess orbitals influences the number of iterations necessary for achieving convergence. As each iteration involves a computational effort proportional to M_basis^4, it is of course desirable to generate as good a guess as possible. Different start orbitals may in some cases result in convergence to different SCF solutions, or make the difference between convergence and divergence. One possible way of generating a set of start orbitals is to diagonalize the Fock matrix consisting only of the one-electron contributions, the “core” matrix. This corresponds to initializing the density matrix as a zero matrix, totally neglecting the electron–electron repulsion in the first step. This is generally a poor guess, but it is available for all types of basis set and is easily implemented. Essentially all programs therefore have it as an option.
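A minimal sketch of the core guess, assuming the one-electron matrix h and the overlap matrix S are available; SciPy's generalized eigensolver handles the non-orthogonal basis.

```python
import numpy as np
from scipy.linalg import eigh

def core_guess(h, S, n_occ):
    """Diagonalize the core (one-electron) matrix in the non-orthogonal
    basis, h C = S C e, and occupy the lowest MOs; this is equivalent to
    starting the SCF cycle from a zero density matrix."""
    _, C = eigh(h, S)                        # generalized eigenvalue problem
    return C[:, :n_occ] @ C[:, :n_occ].T     # closed-shell density matrix
```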

More sophisticated procedures involve taking the start MO coefficients from a semi-empirical calculation, such as Extended Hückel Theory (EHT) or Intermediate Neglect of Differential Overlap (INDO) (Sections 3.13 and 3.10). The EHT method has the advantage that it is readily parameterized for all elements, and it can provide start orbitals for systems involving elements from essentially the whole periodic table.

An INDO calculation normally provides better start orbitals, but at a price: the INDO calculation itself is iterative, and it may suffer from convergence problems, just as the ab initio SCF itself.

Many systems of interest are symmetric. The MOs will transform as one of the irreducible representations of the point group, and most programs use this to speed up calculations. The initial guess for the start orbitals involves selecting how many MOs of each symmetry should be occupied, i.e. the electron configuration.

Different start configurations may produce different final SCF solutions. Many programs automatically select the start configuration based on the orbital energies of the starting MOs, which may be “wrong” in the sense that it does not produce the desired solution. Of course, a given solution may be checked to see if it actually corresponds to an energy minimum, but as stated above, this is rarely done. Furthermore, there may be several (local) minima, thus the verification that the found solution is an energy minimum is no guarantee that it is the global minimum. A particular case is open-shell systems having at least one element of symmetry, as the open-shell orbital(s) determine the overall wave function symmetry. An example is the N2+ radical cation, where two states of Σg and Πu symmetry exist with a difference of only ~70 kJ/mol in energy.

The reason different initial electron configurations may generate different final solutions is that matrix elements between orbitals belonging to different representations are exactly zero, thus only orbitals belonging to the same representation can mix.

Forcing the program to run the calculation without symmetry usually does not help. Although turning the symmetry off will make the program actually calculate all matrix elements, those between MOs of different symmetry will still be zero (except for numerical inaccuracies). It is therefore often necessary to specify manually which orbitals should be occupied initially to generate the desired solution.
