Sarah (Sally) L. Price
Department of Chemistry, University College London, London, U.K.
INTRODUCTION
A computational method of predicting all polymorphs of a given pharmaceutical molecule, and the conditions under which they could be found, requires a fundamental understanding of the causes of polymorphism. A computational model would only be reliable if it incorporated all the factors that can affect which polymorphs can be found. Given the diversity of methods that can generate new polymorphs (1), and the disappearance of polymorphs due to changes in impurity profiles (2), modelling all relevant factors currently seems almost impossible. At the moment, we can aspire to compute the crystal energy landscape, the set of structures that are thermodynamically feasible, for a specific compound (3). We can predict the most thermodynamically stable structure that should exist at specified thermodynamic conditions, if we have performed the calculation of the relative energies sufficiently accurately. Currently, this is the only crystal structure that can be predicted, by assuming thermodynamic control of crystallization. However, comparing the other low-energy structures on the computed crystal energy landscape with each other and with the known polymorphs can provide considerable insight into the possible solid form diversity (4). Using computational modelling in conjunction with the experimentally determined crystal structures can help provide an atomic level picture of the factors that are influencing the crystallization of a molecule, from guiding the experimental search to seek polymorphs with alternative packing motifs, to using the similarity between predicted structures to suggest the likely forms of disorder or crystal growth problems. Gaining a molecular level of understanding of crystallization presents challenges to both experimental characterization of solids and nucleation processes (5), and computational chemistry (6).
Thus, this chapter seeks to demonstrate the types of insight into polymorphism that can come from combining various computational tools with experimental work, with due allowance for the limitations of the complementary techniques.
STRUCTURE COMPARISON TOOLS
Many visualization tools are available for viewing organic crystal structures, but their three-dimensional nature often makes even qualitative comparison difficult, and quantifying similarity is a challenge. Some methods are demonstrated by comparing pairs of structures of acetaminophen, aspirin, and eniluracil. In all three cases the two structures have the same types of hydrogen bonds, and the same graph sets (7). In acetaminophen (paracetamol), the two polymorphs (8) are held together by O–H⋅⋅⋅O=C C(9) and N–H⋅⋅⋅O(H) C(7) hydrogen-bonded chains, but form I has highly undulating sheets, whereas the sheets are almost flat in form II.
An attempt to overlay the 15 molecule coordination sphere of the two structures, using the Compack methodology (9) (incorporated in the CalculateSimilarity facility in Mercury (10)), shows that only one molecule can be overlaid within the default tolerance of 20% in the atom–atom distances and 20° in the angles. The relative orientations of the other molecules are very different (Fig. 1A) despite having the same hydrogen bonding pattern. The conformations are very similar, as shown by the RMSD1 value. [RMSDn is the minimum root-mean-square difference in the non-hydrogen atom positions for the n molecules that can be overlaid of the 15 (default value) coordination cluster of the two structures.] These differences in the packing are also evident in the Hirshfeld (11, 12) surfaces (Fig. 1B), which are defined by the surface where the molecule contributes half of the model for the electron density in the crystal (11). These shapes, particularly when color coded to show the nearest intermolecular atom distances, quickly show up the differences in packing.
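The RMSDn quantity defined above can be illustrated with a minimal Python sketch. It assumes that the two clusters have already been optimally overlaid and that the non-hydrogen atoms have been matched one-to-one; the search for the best overlay, which the Compack methodology performs, is not reproduced here. The function name `rmsd` and the coordinates in the usage note are hypothetical.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square difference between matched atom positions.

    coords_a, coords_b: equally long lists of (x, y, z) tuples for the
    non-hydrogen atoms of the overlaid molecules, in the same order.
    Assumption: the optimal overlay has already been applied, so this
    only evaluates the residual difference, in the units of the input.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty, equally sized coordinate lists")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))
```

For two hypothetical atoms displaced by 0.1 Å each, `rmsd([(0, 0, 0), (1, 0, 0)], [(0, 0, 0.1), (1, 0, -0.1)])` evaluates to 0.1 Å.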
Other derived plots can assist the structure comparison (11). As would be expected, the simulated powder patterns of the two crystal structures are obviously different (Fig. 1C). The similarity in some peaks can be quantified (13) using the program CalculateSimilarity (14).
Aspirin illustrates a case where the differences between the two structures are more subtle. Eleven of the 15 molecule coordination group of the recently published structure for form II (15) can be overlaid with form I (16) to give an RMSD11 of 0.07 Å (Fig. 2A). The two structures have the same hydrogen bonded layers (15), but these stack with different C–H⋅⋅⋅O interactions, which can be seen as small differences in the acetyl region of the Hirshfeld surfaces (Fig. 2B). The comparison of the energetic fingerprints (17) (the center of mass distance, symmetry relationship, and the components of the interaction energy between a central molecule and each of its coordinating molecules) of the crystal structures adds further clarity to the debate as to whether these two structures should be considered as polymorphs (18). In this case, the simulated powder patterns are very similar (Fig. 2C), with a CalculateSimilarity (14) index of 0.96. It is therefore difficult to discriminate between the two structures by powder or single crystal X-ray diffraction work (19). This value of 0.96 is also in the gray area where this index does not clearly distinguish (20) between polymorphs and redeterminations of the same structure. This calibration of the powder pattern similarity index was established (20) using the polymorphs and redeterminations from different samples in different laboratories at different temperatures (with an approximate correction for thermal expansion), in the Cambridge Structural Database (CSD) (21).
The dangers of just comparing powder patterns are illustrated by two structures proposed for eniluracil (5-ethynyluracil) from powder X-ray data (22). Both structures are based on R22(8) N–H⋅⋅⋅O=C hydrogen bonded ribbons, but only five molecules of the 15 molecule coordination sphere can be overlaid (Fig. 3A) with an RMSD5 of 0.045 Å. The entire coordination sphere would overlay if C4=O was chemically identical to C6–H, providing an almost identical coordination environment (Fig. 3B). Distinguishing between this oxygen and hydrogen, one structure is composed of polar ribbons and the other of non-polar ribbons. Their simulated diffraction patterns are very similar (Fig. 3C) with a CalculateSimilarity index of 0.98, a value more in keeping with different determinations of the same structure, although the structures would normally be classified as polymorphs.
Other methods of comparing crystal structures are being developed, for example, the Xpac (23) methodology, which helps avoid the tendency to concentrate on hydrogen bonding and instead looks at the importance of molecular shape. This approach demonstrated the relationship between the packing in 25 crystal structures of carbamazepine and close analogues (24).
As experimental screening methods produce more crystal structures containing the same or closely related molecules, the use of complementary comparison tools will become more widespread. Because computed crystal energy landscapes often generate huge numbers of thermodynamically feasible structures, further automation and development of comparison methods will be needed to obtain the real benefits of comparing known and computer-generated crystal structures. The ability to differentiate different types of polymorphism and solid-form diversity helps assess the implications for quality control of possible pharmaceutical products, as will be exemplified by these three examples in the section “Interpretation of Crystal Energy Landscapes.”
FIGURE 1 Different methods of comparing the two polymorphs of acetaminophen (paracetamol) (8): (A) optimal overlay of central molecule, showing the hydrogen-bonded coordinating molecules in the two forms [form I HXACAN07 (gray); form II HXACAN08 (black); RMSD1 = 0.096]; (B) Hirshfeld surfaces, which emphasize the differences in the stacking in the two forms; and (C) the simulated powder patterns (CalculateSimilarity index = 0.75). Abbreviation: RMSD1, root-mean-square difference in overlay of the molecule.
[Figure 1 image not reproduced. Panel (C) axes: intensity / arbitrary units against 2θ/°.]
FIGURE 2 Different methods of comparing the two experimental structures of aspirin represented by ACSALA02 (gray), a 100 K determination of form I (16) and the form II structure ACSALA13 (black) (15): (A) optimal overlay of the 11 molecule cluster in common (RMSD11 = 0.07 Å); (B) Hirshfeld surfaces aligned to show the difference in packing of the acetyl groups; and (C) the simulated powder patterns (CalculateSimilarity index = 0.96). Abbreviation: RMSD11, root-mean-square difference in overlay of the 11 molecule cluster.
[Figure 2 image not reproduced. Panel (C) axes: intensity / arbitrary units against 2θ/°.]
CALCULATION OF CRYSTAL ENERGIES
The calculation of the relative energies of polymorphs provides a major challenge to computational chemistry. There is currently no method that can be considered reliable for all pharmaceutical molecules for all purposes, although this is an objective of considerable research because it is closely related to other fields such as computational drug design. However, there can be a choice of methods that could be applied to a given molecular system, and a key question is whether any one is accurate enough for your purposes. Very crude models, such as a computationally generated space-filling model, can readily deal with questions such as whether a structure is plausibly close-packed or sterically implausible. At the other extreme, periodic electronic structure methods are beginning to be evaluated for calculating organic crystal energies. The traditional approach to modelling organic crystals (25, 26) sums the energy of the interactions between all the molecules in the crystal as evaluated from a model intermolecular potential. The molecules are either modelled as rigid or the energy penalty for the change in conformation is added. Organic crystal structure modelling is challenging because the energy differences between polymorphs are so small compared with the covalent bond energies.
FIGURE 3 Different methods of comparing the two idealized crystal structures of eniluracil (22), based on polar (ah27 in black) and non-polar hydrogen bonded ribbons (ak56 in gray): (A) the ribbon portion of the optimal overlay of the five molecule cluster in common, showing how the ribbon is completed by molecules that differ in the position of C4=O and C6–H; the other two molecules that overlay are in the sheet above (RMSD5 = 0.045); (B) Hirshfeld surfaces, which show the very slight differences from the O/H distinction in the packing of the layer above; and (C) the simulated powder patterns (CalculateSimilarity index = 0.98). Abbreviation: RMSD5, root-mean-square difference in overlay of the five molecule cluster.
[Figure 3 image not reproduced. Panel (C) axes: intensity / arbitrary units against 2θ/°.]
A straightforward evaluation of the energy difference between two or more experimental crystal structures, by even the most expensive computational method, could be very misleading for several reasons. First, computed lattice energies are extremely sensitive to the location of the protons involved in hydrogen bonding. X-ray determinations have a systematic error in hydrogen atom positions, and so the position of all protons must be adjusted so the X–H bond length is more realistic, by using average neutron values (27) or ab initio optimization. Also, the hydrogen charge density may have been carefully located in the published structure, but often the crystallographer has to make assumptions to include the proton positions. If, for example, a planar conformation had been assumed for an amine group, which in reality distorts to a pyramidal conformation to form better hydrogen bonds, the hydrogen bonding energy would be significantly underestimated.
Second, the crystal structure should be optimized using the computational model for the energy. The van der Waals contacts within crystals are where the attractive and repulsive forces balance, and so small changes in these distances can lead to large energy differences because of the exponential distance dependence of the repulsion. Temperature affects organic crystal structures in an anisotropic fashion, reflecting the nature of the intermolecular interactions in the different directions. Hence, modelling based on low-temperature structures is always preferred, and mixing structures determined at different temperatures can lead to significant uncertainties. For example, the lattice energy of form I acetaminophen, after rigid-body lattice energy optimization, differs by 2.1 kJ mol⁻¹, depending on whether the molecular conformation determined at 20 K or 330 K is used (28). This is greater than the 1.0 kJ mol⁻¹ difference between the two polymorphs, using the conformations in structures determined at 123 K, and the same as the polymorphic energy difference using the molecular conformations determined at room temperature (28). An ab initio estimate of the difference in energy due to the change in the molecular conformation between conformational polymorphs can be affected by experimentally insignificant variations in, for example, the C=O bond lengths. A more realistic estimate would be made by fixing the degrees of freedom that have been affected by the crystal packing, such as torsions around single bonds, to those determined in the crystal structures, and optimizing all other bond lengths and angles.
Finally, computational work can reveal “errors” in the crystal structure, such as the diffraction experiment not detecting a small amount of disordered solvent. Recent computational analyses of form II of carbamazepine, by either Hirshfeld surfaces (29) or energy calculations (30), prompted investigations that showed that this polymorph is being stabilized by solvent.
Thus, the development of accurate methods of computing polymorphic energy differences is very dependent on the quality of the crystallography used for validation, although it is not unknown for modelling work to raise questions about the accuracy of a published structure.
Lattice Energy Evaluation
Most crystal structure modelling only considers the lattice energy, that is, the energy of the static crystal lattice relative to infinitely separated molecules, both nominally at 0 K and neglecting zero-point vibrational motion. There are many programs that can calculate the energy of an infinite static perfect lattice by using various mathematical techniques to sum up all the contributions. These range from electronic structure methods, which explicitly model the electrons in the structure by an approximate solution of the quantum mechanical equations, through to atom–atom force fields that use an equation for the energy as a function of the nuclear positions. These empirically parameterized equations represent the energy penalties for various conformational distortions as well as the intermolecular interactions. The current state-of-the-art method for most organic crystal structures is the intermediate “monomer + model” approach, in which ab initio calculations on the isolated molecule are used to model the molecular structure, energy, and charge density as a function of conformation, and then this charge density is used to construct a model for the intermolecular potential. These three approaches to evaluating lattice energies are outlined, before the additional requirements to include the effect of temperature on the relative thermodynamics of pharmaceutical polymorphs are described in the section “Free Energies and Other Properties.”
Electronic Structure Modelling
Modern electronic structure methods are increasingly being applied to the solid state. However, organic crystals provide a particular challenge for an approximate solution of the Schrödinger equation, because the importance of modelling the dispersion forces adequately can vary significantly between polymorphs. Because the dispersion forces arise from the correlation of electron motions, they are not described at all by routine molecular orbital methods, such as the Hartree–Fock approximation, which as the alternative name of Self-Consistent-Field indicates, only allows each electron to respond to the average field of all the other electrons. There are a variety of methods that include electron correlation under development, including many variants of density functional theory. However, correctly predicting the most stable gas phase conformations of flexible molecules, such as polypeptides, where there is a significant dispersion contribution between the different functional groups, challenges all currently widely available methods (31). The problem in modelling dispersion also produces very variable results for organic crystals, often producing unphysical expansion of the crystal in the directions where the dispersion interaction provides the bonding. For example, one polymorph with hydrogen bonds in all three dimensions may be well reproduced, whereas a polymorph based on a hydrogen bonded sheet will have the stacking separation overestimated. This has been demonstrated (32) by applying several types of periodic density functional theory to the two polymorphs of o-acetamidobenzamide and the five polymorphs of oxalyl dihydrazide. The structures and relative energies are much more reasonably modelled (32) by a new empirically dispersion-corrected density functional, where the damping function for adding a C6/R^6 model for the long-range dispersion to the electronic energy had been empirically fitted to organic crystal structures (33). This model was successful in the international blind test of crystal structure prediction (34) held in 2007 (35), correctly predicting all four target structures (Fig. 4) as the most stable (36).
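The idea of adding a damped C6/R^6 term to the electronic energy can be sketched in Python as follows. The Fermi-type switching function used here is an assumption chosen for illustration, being a form widely used in dispersion-corrected density functional theory; it is not necessarily the damping function fitted in reference (33). The names `fermi_damping` and `damped_dispersion`, the steepness `d`, and all numerical values are hypothetical.

```python
import math


def fermi_damping(r, r0, d=20.0):
    """Assumed Fermi-type damping: near 0 at short range, near 1 at long range.

    r  : interatomic separation (must be > 0)
    r0 : separation at which the damping equals one half
    d  : steepness of the switch (illustrative default)
    """
    return 1.0 / (1.0 + math.exp(-d * (r / r0 - 1.0)))


def damped_dispersion(r, c6, r0, d=20.0):
    """Pairwise dispersion energy -f_damp(r) * C6 / r**6.

    The damping switches off the -C6/r**6 term at short separations,
    where the density functional already describes the interaction.
    """
    return -fermi_damping(r, r0, d) * c6 / r ** 6
```

At separations well beyond `r0` the function returns essentially the undamped -C6/R^6 value, whereas at short range the correction vanishes, which is the behavior that has to be tuned when such a damping function is fitted empirically.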
Force Fields
The simplest force fields, which are useful for organic crystal structure modelling, are the isotropic atom–atom exp-6 model intermolecular potentials of the form:
$$U = \sum_{i \in M,\, k \in N} \left[ A_{\iota\kappa} \exp(-B_{\iota\kappa} R_{ik}) - \frac{C_{\iota\kappa}}{R_{ik}^{6}} \right] \qquad (1)$$
where atom i in rigid molecule M and atom k in rigid molecule N are of atom types ι and κ, respectively, and are separated by a distance R_ik. This potential explicitly models only the repulsion between the atoms as their charge clouds overlap, and the dispersion force. The parameters for atomic types ι = C, N, O, Cl, S, and separate parameters for H bonded to C, N, and O, have been derived (37, 38) by fitting to heats of sublimation and the crystal structures of rigid molecules. There is no explicit electrostatic term, so the lattice energies can be quickly evaluated by direct summation. This results in the hydrogen bonding potentials having particularly deep wells to absorb the missing electrostatic term. This exp-6 model does remarkably well for its simplicity, and can be used for approximate comparisons with the molecule held rigid at the experimental conformation.
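The direct summation of equation (1) over the atoms of two rigid molecules can be sketched in Python as below. This is a minimal illustration, not the implementation of any particular program: the function name `exp6_energy`, the data layout, and the parameter values in the usage note are hypothetical, and the fitted parameters of references (37, 38) are not reproduced. A lattice energy would additionally require summing over all molecules surrounding a central molecule out to a suitable cutoff.

```python
import math


def exp6_energy(mol_m, mol_n, params):
    """Exp-6 repulsion-dispersion energy between two rigid molecules.

    mol_m, mol_n : lists of (atom_type, (x, y, z)) entries.
    params       : dict mapping an alphabetically sorted pair of atom
                   types, e.g. ("C", "H"), to the (A, B, C) parameters
                   of equation (1). Values are supplied by the caller.
    """
    total = 0.0
    for type_i, pos_i in mol_m:
        for type_k, pos_k in mol_n:
            a, b, c = params[tuple(sorted((type_i, type_k)))]
            r = math.dist(pos_i, pos_k)  # separation R_ik
            # Exponential repulsion minus C/R^6 dispersion for this atom pair
            total += a * math.exp(-b * r) - c / r ** 6
    return total
```

For a single hypothetical C⋅⋅⋅H contact at a separation of 2.0 with placeholder parameters (A, B, C) = (1000, 3, 100), `exp6_energy([("H", (0, 0, 0))], [("C", (0, 0, 2.0))], {("C", "H"): (1000.0, 3.0, 100.0)})` returns 1000 exp(−6) − 100/64. Because no long-range electrostatic term is present, the sum converges quickly with distance, which is why direct summation is adequate for this model.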
Most commercial modelling programs use one of the many force fields that are being developed for biomolecular modelling, where the molecular flexibility is represented by bond stretching, bond bending, and torsional terms, and the intermolecular forces are modelled in the same way as the intramolecular interactions between atoms separated by a few covalent bonds. These non-bonded interactions are usually of the form of equation (1), or the Lennard–Jones 12-6 model, with the addition of an atomic point-charge electrostatic model. There are many force fields available (39) and the choice for a particular study should be dictated by the properties and types of molecules used in the parameterization and validation. The essential preliminary test of a force field for crystal structure modelling is whether it gives a minimum in the lattice energy reasonably close to the experimental crystal structure for a range of similar molecules. There are cases where the intramolecular forces cause a change in the conformation of a flexible molecule that ensures that the optimized crystal structure is qualitatively wrong (40). A prediction that aspirin should have a more stable polymorph with the molecule in a planar conformation (41) arose from the use of a force field that predicted that the isolated molecule should be planar. Ab initio calculations show that the planar conformation is a transition state, although the conformation observed in the crystal is close to a local rather than the global minimum in the conformational energy (42). A general limitation of such force fields is that the same atomic charges are simultaneously modelling the intermolecular interactions and determining the conformation of the molecule, and are unable to represent the changes in charge distribution with conformation sufficiently realistically (43).
FIGURE 4 The four molecules used in the 2007 Cambridge Crystallographic Data Centre’s international blind test of crystal structure prediction, with Roman numerals defined by this series of tests (35). These represent a simple rigid molecule, one with less common functional groups, a flexible molecule and a cocrystal, believed to be within the claimed capabilities of many of the available methodologies. All these crystal structures were correctly predicted by methods based on the monomer + model approach and the dispersion-corrected density functional method (36). The success of these lattice energy-based predictions implies that the target crystal structures were the most stable for all compounds and monotropically related to any polymorphs.
[Figure 4 image not reproduced: structural diagrams of molecules XII, XIII, XIV, and XV.]
Monomer + Model
The approach that has proved adequate for a wide range of organic crystal structures, including those in the 2007 blind test of crystal structure prediction (Fig. 4), is to concentrate on obtaining the best possible model for the intermolecular interactions (44, 45). The energy penalty for any significant change in conformation from the ab initio-optimized molecular structure, ΔE_intra, is evaluated by the best affordable ab initio calculations on monomers. The lattice energy is then given by E_latt = U_inter + ΔE_intra, where U_inter is the intermolecular lattice energy.
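The bookkeeping implied by E_latt = U_inter + ΔE_intra can be shown with a short Python sketch that ranks candidate structures by total lattice energy. The function names and all energies in the usage note are hypothetical and serve only to illustrate how the intramolecular penalty can change the ranking suggested by the intermolecular term alone.

```python
def lattice_energy(u_inter, delta_e_intra):
    """E_latt = U_inter + dE_intra, with both terms in the same units."""
    return u_inter + delta_e_intra


def rank_structures(structures):
    """Rank structures by lattice energy, most stable first.

    structures: dict mapping a structure label to a tuple
                (U_inter, dE_intra), e.g. in kJ/mol.
    Returns a list of (label, E_latt, E_latt relative to the minimum).
    """
    energies = {
        label: lattice_energy(u_inter, de_intra)
        for label, (u_inter, de_intra) in structures.items()
    }
    e_min = min(energies.values())
    ranked = sorted(energies.items(), key=lambda item: item[1])
    return [(label, e, e - e_min) for label, e in ranked]
```

With the invented values `{"A": (-100.0, 2.0), "B": (-105.0, 8.0)}` in kJ/mol, structure B has the more favorable intermolecular energy, but its larger conformational penalty places it 1.0 kJ/mol above structure A once both terms are included.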
Atom–atom models for U_inter explicitly model at least the electrostatic and repulsion–dispersion contributions. The electrostatic model is usually derived from the charge density of the molecule, preferably calculated for every distinct conformation to represent the redistribution of charge with changes in the intramolecular interactions. The electrostatic model can use the atomic charges that give the best possible reproduction of the electrostatic potential in the van der Waals contact region around the molecule (46). However, modelling organic crystal structures satisfactorily often requires (47) additional point charges on non-nuclear sites to represent the electrostatic forces arising from lone pair and π electron density. These non-spherical features in the atomic charge distribution can be more effectively and automatically represented (44, 45) by a distributed multipole model obtained by analyzing (48) the ab initio charge density of the molecule. Figure 5 shows the electrostatic potential around a fairly spherical molecule, and the errors from using an atomic point charge representation of the same charge density relative to the more complete distributed multipole representation. There are significant differences even around the saturated hydrocarbon rings. The differences are more marked for molecules that form stronger hydrogen bonds (49). A survey of the computed lattice energy landscapes for 50 rigid molecules containing only C, H, N, and O (50, 51) concluded that the 64 known structures were significantly more likely to be found at or near the global minimum in the lattice energy when a distributed multipole model was used rather than an atomic point-charge model.
The electrostatic interactions mainly determine the directionality of the hydrogen bonding and π–π stacking, whereas the repulsion between atoms is critical in determining the van der Waals contact distances and the dispersion favors dense, close-packed crystals. Thus, in addition to the electrostatic interactions, U_inter has to