University of Auckland Research Repository, ResearchSpace


Libraries and Learning Services


Version

This is the publisher’s version. This version is defined in the NISO recommended practice RP-8-2008 http://www.niso.org/publications/rp/

Suggested Reference

Busby, J. N., Lott, J. S., & Panjikar, S. (2016). Combining cross-crystal averaging and MRSAD to phase a 4354-amino-acid structure. Acta Crystallographica. Section D, Biological Crystallography, 72(2), 182-191. doi:10.1107/S2059798315023566

Copyright

Items in ResearchSpace are protected by copyright, with all rights reserved, unless otherwise indicated. Previously published items are made available in accordance with the copyright policy of the publisher.

http://journals.iucr.org/services/copyrightpolicy.html http://www.sherpa.ac.uk/romeo/issn/0907-4449/

https://researchspace.auckland.ac.nz/docs/uoa-docs/rights.htm


ISSN: 2059-7983

journals.iucr.org/d

Combining cross-crystal averaging and MRSAD to phase a 4354-amino-acid structure

Jason Nicholas Busby, J. Shaun Lott and Santosh Panjikar

Acta Cryst. (2016). D72, 182–191

IUCr Journals

CRYSTALLOGRAPHY JOURNALS ONLINE

Copyright © International Union of Crystallography

Author(s) of this paper may load this reprint on their own web site or institutional repository provided that this cover page is retained. Republication of this article or its storage in electronic databases other than as specified above is not permitted without prior permission in writing from the IUCr.

For further information see http://journals.iucr.org/services/authorrights.html


http://dx.doi.org/10.1107/S2059798315023566

Acta Cryst. (2016). D72, 182–191

Received 23 August 2015

Accepted 8 December 2015

Edited by Q. Hao, University of Hong Kong

Keywords: cross-crystal averaging; MRSAD; Auto-Rickshaw; phasing.

PDB reference: B and C proteins from the ABC toxin complex of Yersinia entomophaga, 4igl

Supporting information: this article has supporting information at journals.iucr.org/d

Combining cross-crystal averaging and MRSAD to phase a 4354-amino-acid structure

Jason Nicholas Busby,^a* J. Shaun Lott^a and Santosh Panjikar^b,c*

^aSchool of Biological Sciences, The University of Auckland, Private Bag 92019, Auckland 1142, New Zealand, ^bMX, Australian Synchrotron, 800 Blackburn Road, Clayton, Melbourne, VIC 3168, Australia, and ^cDepartment of Biochemistry and Molecular Biology, Monash University, Melbourne, VIC 3800, Australia. *Correspondence e-mail:

[email protected], [email protected]

The B and C proteins from the ABC toxin complex of Yersinia entomophaga form a large heterodimer that cleaves and encapsulates the C-terminal toxin domain of the C protein. Determining the structure of the complex formed by B and the N-terminal region of C was challenging owing to its large size, the non-isomorphism of different crystals and their sensitivity to radiation damage. A native data set was collected to 2.5 Å resolution and a non-isomorphous Ta6Br12-derivative data set was collected that showed strong anomalous signal at low resolution. The tantalum-cluster sites could be found, but the anomalous signal did not extend to a high enough resolution to allow model building. Selenomethionine (SeMet)-derivatized protein crystals were produced, but the high number (60) of SeMet sites and the sensitivity of the crystals to radiation damage made phasing using the SAD or MAD methods difficult. Multiple SeMet data sets were combined to provide 30-fold multiplicity, and the low-resolution phase information from the Ta6Br12 data set was transferred to this combined data set by cross-crystal averaging. This allowed the Se atoms to be located in an anomalous difference Fourier map; they were then used in Auto-Rickshaw for multiple rounds of autobuilding and MRSAD.

1. Introduction

The phase problem remains one of the key difficulties in X-ray crystallography. The number of structures deposited in the Protein Data Bank (PDB) increases every year (Berman et al., 2000) and this provides an increasing pool of structures for use in molecular replacement, but experimental phasing is still required when investigating proteins for which no good molecular replacement models exist. The crystallographer’s next tool of choice tends to be SAD phasing using selenomethionine (SeMet)-substituted protein (Hendrickson et al., 1990). This technique is favoured for several reasons: SeMet can be incorporated into expressed proteins easily by supplementation during growth (Cowie & Cohen, 1957), SeMet incorporation can occur at high rates and the hydrophobic methionine residues are often buried in the hydrophobic core of proteins and are highly ordered. An alternative is isomorphous replacement; however, this technique often suffers from non-isomorphism between different derivatives and the native crystal. Even relatively small changes in the unit cell between heavy-atom-soaked and native crystals can obliterate the isomorphous signal (Garman & Murray, 2003).

SAD or MAD phasing using SeMet-substituted protein can sometimes be very difficult. Complications can include crystals that fail to diffract to high resolution, limited anomalous signal, radiation damage and a very small or large number of methionine residues limiting the amount of phase information or making it difficult to determine the heavy-atom substructure. Some of these problems can be remedied by collecting highly redundant data from a single crystal, but with additional X-ray exposure comes additional radiation damage. Anomalous scatterers are often highly sensitive to radiation damage owing to their increased X-ray absorption cross-section, particularly at absorption peaks (Murray et al., 2005), so collecting additional data from a single crystal may not help.

The advent of microfocus beamlines has enabled the collection of multiple data sets from different regions of a single crystal (Li et al., 2004; Moukhametzianov et al., 2008; Sanishvili et al., 2008), helping to combat radiation damage. Another method is to combine data from multiple different crystals (Liu et al., 2011, 2013, 2014). This can improve the signal-to-noise ratio, reduce crystal-specific systematic error and allow high completeness while maintaining a low dose to limit radiation damage.

There are a number of different phasing strategies available, and an even greater number of programs for pursuing them. Many of these programs work in different ways, and in difficult cases one program may succeed where others have failed, necessitating a large amount of trial and error. In some difficult cases a combination of phasing techniques can work where individual techniques have failed. Auto-Rickshaw is an automated structure-determination platform that is able to try a wide variety of phasing strategies separately and in combination (Panjikar et al., 2005, 2009). The techniques available include S-SAD, SAD, two-, three- and four-wavelength MAD, SIRAS, MR, MRSAD, MRSIRAS, RIP and MRRIP. The goal of Auto-Rickshaw (Panjikar et al., 2005) is to emulate, automatically and on the fly during data evaluation, the decisions that an experienced crystallographer makes about which approaches to try and which programs to use.

We have described the structure and function of the B–C component of the ABC toxin complex from Yersinia entomophaga previously (Busby et al., 2013). Here, we detail the methodologies undertaken to solve this difficult structure and their applicability to general X-ray crystallographic problems.

2. Methods and results

2.1. Cloning, expression and purification of YenB–YenC2^NTR

The cloning, expression and purification of the YenB–C2^NTR protein complex are described in Busby et al. (2013). The YenB and YenC2 genes from Y. entomophaga (GenBank accession No. DQ400808.1) were cloned into the pETDuet-1 co-expression vector (EMD Biosciences). YenB was cloned into multiple cloning site 1 (MCS1), including an N-terminal 6×His tag followed by a Tobacco etch virus (TEV) protease cleavage site. YenC2 was cloned into MCS2 with no tags.

Expression was performed in Escherichia coli Rosetta 2 (DE3) cells using ZYM-5052 auto-induction medium (Studier, 2005) at 291 K. The cells were lysed in 20 mM HEPES pH 7.5, 150 mM NaCl, 1 mM β-mercaptoethanol by a continuous-flow cell disruptor (Microfluidics M-110p) and insoluble material was removed by centrifugation. The protein was purified by immobilized metal-affinity chromatography (IMAC) followed by removal of the His tag by TEV cleavage, subtractive IMAC and size-exclusion chromatography (SEC).

When co-expressed with YenB, YenC2 was cleaved into N-terminal and C-terminal regions (YenC2^NTR and YenC2^CTR) and all three peptides remained tightly associated throughout purification. This complex was used for crystallization trials, but no crystals could be obtained. The YenB–YenC2^NTR–YenC2^CTR complex was dialysed against acetate buffer overnight (20 mM sodium acetate pH 4.5, 150 mM NaCl, 1 mM β-mercaptoethanol), causing YenC2^CTR to dissociate from the complex and precipitate. This precipitate was removed by filtration and the remaining supernatant was subjected to SEC in acetate buffer before dialysis against buffer to restore the pH to 7.5. This resulted in a YenB–YenC2^NTR complex that was subsequently used for crystallization trials.

2.2. Crystallization

The YenB–YenC2^NTR protein complex was concentrated to 7.3 mg ml⁻¹ and crystallization trials were carried out with nanolitre-dispensing robotics using the conditions described in Moreland et al. (2005) and Gorrec (2009). Several conditions were found to produce crystals.

Crystals were fine-screened on a larger scale by hanging-drop vapour diffusion using 24-well plates. The best crystals were produced with well solution consisting of 18%(w/v) PEG 3350, 0.15 M KH2PO4 pH 4.8. Microseeding (Luft & DeTitta, 1999) was used to further optimize the crystallization, significantly improving the size, quality and reproducibility of the crystals.

Figure 1

SeMet (a) and tantalum-cluster-derivatized (b) crystals of YenB–YenC2^NTR. Scale bars are 100 µm in length.


2.3. Heavy-atom derivatization

No structures of proteins homologous to either YenB or YenC2 were available, so experimental phasing was pursued.

SeMet-labelled protein was produced by expression in LB supplemented with selenomethionine. The protein was purified using the same methods as used for the native protein and was crystallized in the same condition using microseeding with native crystals. The crystals were cryoprotected by soaking them briefly in mother liquor with the addition of 5–20% glycerol before being snap-cooled in liquid nitrogen (Fig. 1a).

Hexatantalum dodecabromide (Ta6Br12^2+) is a cluster compound that has been used to successfully phase the structures of large biological assemblies (Knäblein et al., 1997; Banumathi et al., 2003). Aliquots of this cluster compound (Jena Bioscience) were dissolved in mother liquor [18%(w/v) PEG 3350, 0.15 M KH2PO4 pH 4.8] to a concentration of 2 mM, with the addition of 5% glycerol as a cryoprotectant. Crystals were transferred to drops of this solution and incubated over a well of mother liquor for 3 h. By this time, the crystals had taken on a distinct green colouration (Fig. 1b) and the green colour of the solution had diminished, indicating that derivatization had occurred. At this point the crystals were removed, briefly back-soaked in mother liquor lacking the tantalum cluster and snap-cooled in liquid nitrogen.

2.4. Data collection and processing

X-ray diffraction data were collected on the MX1 (Cowieson et al., 2015) and MX2 (microfocus) beamlines at the Australian Synchrotron. Typically, images were collected at a temperature of 100 K with 0.5 or 1.0 s exposure time, 0.5 or 1.0° oscillation and a total of 360 or 720° collected (Table 1). Data were integrated using XDS (Kabsch, 2010), with careful attention paid to the change in anomalous correlation as additional frames were added. If the anomalous correlation began to drop, this was taken as a sign of radiation damage and the data were truncated before this point (e.g. data set SeMet1). Integrated data were scaled and merged using AIMLESS from the CCP4 suite (Evans & Murshudov, 2013; Winn et al., 2011).
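The truncation rule described above can be sketched in a few lines of Python. This is a minimal illustration only, assuming that the anomalous correlation has already been computed for a series of cumulative frame counts; the function name, the `tolerance` parameter and the example values are hypothetical and are not taken from the processing scripts used in this work.

```python
from typing import Sequence


def frames_before_signal_drop(
    frame_counts: Sequence[int],
    cc_anom: Sequence[float],
    tolerance: float = 0.0,
) -> int:
    """Return the frame count at which the cumulative anomalous correlation peaks.

    frame_counts[i] is the number of frames merged at checkpoint i and
    cc_anom[i] is the overall anomalous correlation of that partial data set.
    Scanning stops at the first checkpoint whose value falls more than
    `tolerance` below the best value seen so far.
    """
    if len(frame_counts) != len(cc_anom) or not frame_counts:
        raise ValueError("need equally long, non-empty inputs")
    best_index = 0
    for i in range(1, len(cc_anom)):
        if cc_anom[i] > cc_anom[best_index]:
            best_index = i  # signal still improving
        elif cc_anom[best_index] - cc_anom[i] > tolerance:
            break  # signal has started to decay: stop adding frames
    return frame_counts[best_index]


# Hypothetical checkpoints (illustrative values, not from Table 1).
checkpoints = [40, 80, 120, 160, 200]
correlations = [0.18, 0.24, 0.27, 0.25, 0.21]
print(frames_before_signal_drop(checkpoints, correlations))
```

A non-zero `tolerance` would allow small fluctuations in the correlation to be ignored before the data are cut.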

Diffraction data from a native crystal were collected at 0.9537 Å to a resolution of 2.26 Å, but ultimately a high-resolution cutoff of 2.5 Å was used. This crystal showed significant differences in unit-cell dimensions from other SeMet-labelled and tantalum-cluster-soaked crystals (Table 1) and so could not be used for isomorphous replacement directly.

Diffraction data from SeMet-labelled crystals were collected above the selenium peak (0.9791 Å) and at the inflection point (0.9795 Å). The anomalous signal from the best peak data set (SeMet1) only extended to 5.2 Å resolution, based on the resolution at which CC_anom drops below 0.3. This data set had to be truncated owing to radiation damage, resulting in low completeness. When peak data sets from multiple isomorphous crystals were combined, however, the anomalous correlation was found to improve, so further data sets were added iteratively until adding more data no longer improved CC_anom. This resulted in a data set (Combined SeMet) with high multiplicity (30) and anomalous signal that extended to ~5 Å resolution (Fig. 2).
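The anomalous resolution criterion used here (the resolution at which CC_anom drops below 0.3) can be expressed as a simple scan over resolution shells. The sketch below assumes that shells are supplied from low to high resolution as (high-resolution limit, CC_anom) pairs with CC_anom given as a fraction; the function name and the example shell values are illustrative and not taken from the data in Table 1.

```python
from typing import Optional, Sequence, Tuple


def anomalous_resolution(
    shells: Sequence[Tuple[float, float]],
    threshold: float = 0.3,
) -> Optional[float]:
    """Return the high-resolution limit (in Å) of the last shell before
    CC_anom first drops below `threshold`.

    `shells` must be ordered from low to high resolution, each entry being
    (d_min of the shell in Å, CC_anom of the shell as a fraction).
    Returns None if even the lowest-resolution shell is below the threshold.
    """
    limit = None
    for d_min, cc_anom in shells:
        if cc_anom < threshold:
            break  # first shell below the threshold ends the scan
        limit = d_min
    return limit


# Hypothetical shells for illustration only.
example_shells = [(10.0, 0.62), (7.0, 0.45), (5.5, 0.33), (4.5, 0.21), (3.5, 0.08)]
print(anomalous_resolution(example_shells))
```

A finer estimate could interpolate between the two shells that bracket the threshold; this sketch simply reports the shell boundary.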

Table 1

Data-collection and processing statistics for YenB–YenC2^NTR. HREM, high-energy remote; LREM, low-energy remote.

| | Native | SeMet1 | SeMet2 | SeMet3 | SeMet4 | Combined SeMet (SeMet4+1+2+3) | Ta1 HREM | Ta1 LREM | Ta2 HREM |
|---|---|---|---|---|---|---|---|---|---|
| Space group | P2₁2₁2₁ | P2₁2₁2₁ | P2₁2₁2₁ | P2₁2₁2₁ | P2₁2₁2₁ | P2₁2₁2₁ | P2₁2₁2₁ | P2₁2₁2₁ | P2₁2₁2₁ |
| Unit-cell parameter a (Å) | 133.7 | 134.6 | 134.3 | 134.3 | 134.9 | 134.5 | 134.4 | 134.4 | 134.5 |
| Unit-cell parameter b (Å) | 147.6 | 149.7 | 150.5 | 150.4 | 150.3 | 150.2 | 152.2 | 152.2 | 150.7 |
| Unit-cell parameter c (Å) | 274.4 | 276.0 | 276.7 | 276.6 | 276.3 | 276.4 | 275.0 | 275.0 | 275.6 |
| Wavelength (Å) | 0.9537 | 0.9791 | 0.9791 | 0.9791 | 0.9791 | 0.9791 | 1.2516 | 1.2580 | 1.2516 |
| No. of frames | 360 | 136† | 360 | 360 | 720 | 1576 | 1440 | 1440 | 720 |
| Oscillation (°) | 1.0 | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 | 1.0 |
| Resolution (Å) | 49.55–2.49 (2.53–2.49) | 94.11–2.78 (2.82–2.78) | 96.36–2.92 (2.97–2.92) | 96.32–2.91 (2.96–2.91) | 96.53–2.77 (2.82–2.77) | 96.44–2.77 (2.82–2.77) | 44.57–3.15 (3.20–3.15) | 44.58–3.15 (3.20–3.15) | 39.38–2.79 (2.84–2.79) |
| R_meas‡ | 0.268 (1.912) | 0.080 (0.662) | 0.319 (3.445) | 0.279 (3.073) | 0.344 (2.490) | 0.367 (1.985) | 0.368 (4.463) | 0.364 (4.493) | 0.827 (12.843) |
| R_p.i.m.‡ | 0.069 (0.487) | 0.056 (0.466) | 0.162 (1.825) | 0.141 (1.644) | 0.124 (1.158) | 0.090 (0.894) | 0.094 (1.252) | 0.093 (1.281) | 0.214 (4.013) |
| ⟨I/σ(I)⟩ | 11.3 (1.8) | 12.0 (1.5) | 7.7 (0.7) | 8.2 (0.8) | 14.0 (1.0) | 14.6 (1.1) | 13.9 (1.0) | 14.3 (1.0) | 9.5 (0.3) |
| CC1/2 | 99.4 (52.0) | 99.4 (58.9) | 98.4 (17.7) | 99.0 (20.7) | 99.3 (44.7) | 99.5 (47.2) | 99.7 (38.8) | 99.8 (40.7) | 98.6 (8.7) |
| Completeness (%) | 100.0 (100.0) | 93.5 (69.5) | 99.5 (90.0) | 99.8 (96.4) | 99.0 (84.2) | 99.2 (84.4) | 90.6 (96.9) | 89.8 (96.5) | 92.7 (89.1) |
| Multiplicity | 14.9 (15.2) | 2.9 (1.8) | 7.4 (6.2) | 7.4 (6.3) | 14.4 (6.6) | 29.8 (7.0) | 29.0 (23.2) | 29.1 (22.8) | 28.7 (18.2) |
| CC_anom (%) | N/A | 27.6 (10.9) | 8.8 (1.0) | 9.9 (1.7) | 9.7 (0) | 13.8 (0) | 52.5 (9.5) | 27.4 (12.1) | 13.3 (7.6) |
| RCR_anom§ | N/A | 1.328 (1.115) | 1.091 (1.010) | 1.104 (1.017) | 1.103 (0.989) | 1.149 (0.995) | 1.791 (1.100) | 1.325 (1.130) | 1.143 (1.079) |
| Anomalous resolution¶ (Å) | N/A | 5.18 | 6.7 | 6.6 | 5.5 | 5.0 | 5.68 | 6.7 | 5.9 |
| No. of unique reflections | 189788 (9300) | 131321 (4767) | 122012 (5408) | 123352 (5835) | 141318 (5888) | 141967 (5939) | 89256 (4841) | 88515 (4823) | 129593 (6081) |

† Data were limited owing to radiation damage becoming apparent. Data were truncated at the point where adding additional frames no longer increased the anomalous signal. ‡ R_meas (R_r.i.m.) and R_p.i.m. are as defined in Weiss (2001). § R.m.s. correlation ratio between two half data sets (Evans, 2011). ¶ Anomalous resolution is defined as the resolution at which CC_anom drops below 0.3.


Diffraction data from tantalum-cluster-derivatized crystals were collected above and below the tantalum edge (1.2516 and 1.2580 Å, respectively). The high-energy remote data set showed a very high anomalous signal at low resolution (Fig. 2), but this signal dropped off very sharply at moderate resolution (5.7 Å), a phenomenon that has been identified previously in tantalum-cluster derivatization (Banumathi et al., 2003).

2.4.1. Initial phasing attempts. Phasing was initially attempted using SeMet SAD/MAD phasing. The protein complex under study contains 30 non-initiation methionine residues and the unit-cell parameters suggested two heterodimers in the asymmetric unit (Matthews coefficient 2.78 Å³ Da⁻¹, solvent content 56%; Matthews, 1968), giving a total of 60 heavy-atom sites. Initial attempts at substructure determination were carried out with the individual SeMet data sets and the combined data set using SHELXC/D v.2006/4.3 (Sheldrick, 2010), phenix.hyss v.1.8_1069 (Grosse-Kunstleve & Adams, 2003; McCoy et al., 2004) and SnB v.2.3 (Rappleye et al., 2002), but these attempts were unsuccessful. The most likely explanations for this are the limited resolution of the phase information (Fig. 2), the large number of sites (60) and possible radiation damage.
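The solvent content quoted above follows from the Matthews coefficient through the standard relation, solvent fraction ≈ 1 − 1.23/V_M. The short Python sketch below illustrates both quantities; the function names are illustrative, and the molecular weight must be supplied by the user because it is not stated in this section.

```python
def matthews_coefficient(
    cell_volume_A3: float,
    n_asymmetric_units: int,
    copies_per_asu: int,
    molecular_weight_Da: float,
) -> float:
    """Matthews coefficient V_M in Å³ Da⁻¹.

    cell_volume_A3      unit-cell volume (a*b*c for an orthorhombic cell)
    n_asymmetric_units  number of asymmetric units per cell (4 for P2(1)2(1)2(1))
    copies_per_asu      assumed number of molecules/complexes per asymmetric unit
    molecular_weight_Da mass of one molecule/complex in daltons
    """
    return cell_volume_A3 / (n_asymmetric_units * copies_per_asu * molecular_weight_Da)


def solvent_fraction(v_m: float) -> float:
    """Approximate solvent fraction from V_M using the usual 1.23 Å³ Da⁻¹ factor."""
    return 1.0 - 1.23 / v_m


# The coefficient reported in the text reproduces the quoted solvent content.
print(round(100 * solvent_fraction(2.78)))  # percentage of the cell that is solvent
```

In practice `matthews_coefficient` is evaluated for one, two, three... copies per asymmetric unit and the copy number giving a physically sensible solvent fraction is chosen.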

Tantalum bromide cluster compounds have been used to successfully phase large structures, where their extremely large anomalous and isomorphous signal and propensity to bind to relatively few sites are advantages (Knäblein et al., 1997; Banumathi et al., 2003; Gomis-Rüth & Coll, 2001).

Several data sets were collected from crystals derivatized with tantalum clusters and they showed very strong anomalous signal at low resolution (Fig. 2 and Table 1), but this dropped off sharply at around 6 Å. We searched for heavy-atom sites for SAD/MAD phasing using SnB and for SIRAS using SHELX. The best tantalum-cluster data set (Ta1 in Table 1) had very different unit-cell parameters compared with the native, and could be used for SAD or two-wavelength MAD but not SIRAS. To identify a suitably isomorphous crystal for SIRAS, we compared all tantalum-cluster and SeMet data sets to find the two with the most similar unit-cell parameters. Ultimately, we used a second tantalum-cluster data set (Ta2 in Table 1) as a derivative and the SeMet4 data set to act as the native data set. Seven heavy-atom sites could be easily found using either method, and the same sites were found using both methods, giving us confidence that they were correct.

As there are most likely to be two heterodimers in the asymmetric unit, twofold noncrystallographic symmetry (NCS) may be present. To test for this, we inspected a self-rotation function. The self-rotation function contained several peaks in addition to the origin, one of which had a much higher Rf/σ than the others, suggesting that NCS is present.

We also tried to detect NCS from the positions of the tantalum clusters identified earlier. The CCP4 program PROFESSS (Winn et al., 2011) found a set of operators that matched the heavy-atom positions, and the rotation angles were the same as those determined from the self-rotation function (self-rotation function, 84.69°, 90.00°, 180.00°; PROFESSS, 84.78°, 90.40°, 179.33°), giving confidence in both the NCS determination and the positions of the tantalum clusters.

Figure 2

Anomalous correlation versus resolution for data sets. Data were collected at the peak Se wavelength (0.9791 Å) or above the tantalum edge (1.2516 Å).

Figure 3

Horizontal cross-section of the final refined structure (above) compared with a cross-section of the electron density from SIRAS phasing of the Ta2 data set after density modification (below). The overall shape is clearly similar, indicating that phasing has been successful, but the electron density was not of sufficient quality to allow model building.

Figure 4

Cα traces of autobuilt structures starting from tantalum-cluster phases (green), initial SeMet phases (blue) or MRSAD SeMet phases (red). The tantalum data fail to allow autobuilding, the autobuilt model from the initial SeMet data shows some correctly built areas of β-sheet and the MRSAD SeMet data enabled the full model to be built correctly using Buccaneer (Cowtan, 2006), RESOLVE (Terwilliger, 2000) and SHELXE (Sheldrick, 2010) and refined with REFMAC (Murshudov et al., 2011) and phenix.refine (Afonine et al., 2012).

SAD, MAD and SIRAS phasing were attempted using the tantalum-cluster data sets. Density modification produced solvent masks that, in retrospect, showed the correct molecular shape (Fig. 3), but the maps were not sufficiently well resolved to enable model building a priori. For example, the map correlation coefficient between a density-modified NCS-averaged SIRAS map and the final refined model map is just 0.19 and the phase error is 85.9° (calculated using the CCP4 program CPHASEMATCH; Winn et al., 2011) to 2.8 Å resolution. The most likely explanation for this is the poor starting phase information. The Ta6Br12 cluster compound has a bipyramidal arrangement of Ta atoms at its centre and at low resolution (lower than 6 Å) these atoms essentially scatter in phase, allowing them to be treated as a single ‘super atom’. At higher resolution the Bijvoet differences drop sharply, reaching a local minimum at ~4.5 Å before increasing again at higher resolution (Banumathi et al., 2003). If the anomalous diffraction data are of high enough quality, the tantalum cluster can be fully modelled and correctly oriented within the cluster density, allowing phasing at high resolution. Unfortunately, this was not the case for our data set, as the anomalous signal only extended to 5.7 Å resolution. The tantalum clusters could not be accurately oriented within the density, and the phase information was therefore limited to low resolution.
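To put the quoted map correlation (0.19) and phase error (85.9°) in context, note that two sets of completely unrelated phases differ by 90° on average, so the tantalum-derived phases were only marginally better than random when assessed to 2.8 Å. The Python sketch below shows how such statistics can be computed in their simplest form. It is an unweighted illustration under stated assumptions: CPHASEMATCH may apply amplitude or figure-of-merit weighting and origin/hand matching that are not reproduced here, and the function names are hypothetical.

```python
import math
from typing import Sequence


def mean_phase_error(phases_a: Sequence[float], phases_b: Sequence[float]) -> float:
    """Unweighted mean absolute phase difference in degrees (range 0 to 180)."""
    if len(phases_a) != len(phases_b) or not phases_a:
        raise ValueError("need equally long, non-empty phase lists")
    total = 0.0
    for a, b in zip(phases_a, phases_b):
        diff = abs(a - b) % 360.0
        total += min(diff, 360.0 - diff)  # wrap so that 350° vs 10° counts as 20°
    return total / len(phases_a)


def map_correlation(rho_a: Sequence[float], rho_b: Sequence[float]) -> float:
    """Pearson correlation between two maps sampled on the same grid points."""
    if len(rho_a) != len(rho_b) or len(rho_a) < 2:
        raise ValueError("need equally long inputs with at least two points")
    mean_a = sum(rho_a) / len(rho_a)
    mean_b = sum(rho_b) / len(rho_b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(rho_a, rho_b))
    var_a = sum((x - mean_a) ** 2 for x in rho_a)
    var_b = sum((y - mean_b) ** 2 for y in rho_b)
    return cov / math.sqrt(var_a * var_b)


# Toy values for illustration only.
print(mean_phase_error([10.0, 350.0], [350.0, 10.0]))
print(map_correlation([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]))
```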

In order to provide high-resolution phase information, we attempted to locate SeMet sites by using the anomalous difference Fourier method. The best tantalum-cluster data set (Ta1) has significant differences in unit-cell dimensions compared with the best SeMet data sets (Tables 1 and 2), so in order to use the low-resolution phase information from the tantalum cluster with the SeMet data, cross-crystal averaging was performed. The position of tantalum-cluster sites and a self-rotation function (MOLREP; Vagin & Teplyakov, 2010) could be used to find the NCS present in the tantalum-cluster two-wavelength MAD maps. These maps were density modified with NCS averaging (DM; Cowtan, 1994) and the electron density was cut out and placed into the SeMet data set by molecular replacement (Phaser; McCoy et al., 2007). The low-resolution phase information thus provided was combined with the anomalous difference intensity data from the SeMet peak wavelength to calculate an anomalous difference Fourier map. Fifty peaks in this map were >5σ and were used to define starting heavy-atom positions. At this point, the data were submitted to Auto-Rickshaw for rounds of automated model building and MRSAD phasing (Panjikar et al., 2009). The phase information from the SeMet data proved to be sufficient for initial automated model building. The calculated phases from this model were combined with the experimental phase information, improving the anomalous difference Fourier and LLG map (McCoy & Read, 2010) and allowing additional heavy-atom sites to be found. This process was repeated and eventually all 60 non-initiation SeMet residues could be correctly placed and most of the structure could be built automatically using Buccaneer (Cowtan, 2006), RESOLVE (Terwilliger, 2000) and SHELXE (Sheldrick, 2010) and refined with REFMAC5 (Murshudov et al., 2011) and phenix.refine (Afonine et al., 2012) (Fig. 4).

Table 2

Cross-crystal R factors for different derivative data sets.

Cross-crystal R factors were calculated by scale refinement using the CCP4 program SCALEIT (Winn et al., 2011). HREM, high-energy remote.

| | SeMet2 | SeMet3 | SeMet4 | Ta1 HREM | Ta2 HREM | Native |
|---|---|---|---|---|---|---|
| SeMet1 | 0.144 | 0.139 | 0.155 | 0.377 | 0.291 | 0.278 |
| SeMet2 | | 0.085 | 0.139 | 0.352 | 0.252 | 0.309 |
| SeMet3 | | | 0.138 | 0.355 | 0.253 | 0.307 |
| SeMet4 | | | | 0.346 | 0.267 | 0.346 |
| Ta1 HREM | | | | | 0.255 | 0.453 |
| Ta2 HREM | | | | | | 0.401 |

2.4.2. Ex post facto analysis of phasing. While the strategy of combining data from several crystals into a high-multiplicity SeMet data set and cross-crystal averaging with tantalum-cluster phases was eventually successful in solving the YenB–YenC2^NTR structure, the individual data sets seemed to be in the ‘twilight zone’ for experimental phasing. We therefore performed an ex post facto analysis on the requirements for successfully phasing this structure.

SHELX (SHELXC v.2014/2 and SHELXD v.2013/2) was used to attempt to locate heavy atoms in the individual SeMet data sets, searching for 60 sites and running 1000 trials, using a high-resolution cutoff of 4.7 Å and an E_min of 1.5. This high-resolution limit was determined through a trial-and-error approach using repeated SHELX runs with varying high-resolution limits. Phasing attempts were unsuccessful for every individual SeMet data set, but the heavy-atom substructure could successfully be determined for the combined SeMet data set (Fig. 5). This is somewhat surprising, as our original attempts to solve the heavy-atom substructure of this data set were unsuccessful. Possible explanations include a better choice of high-resolution cutoff, using a larger number of trials and advances in the software used.

Figure 5

Attempted substructure solution for SeMet data sets. The first four rows show individual SeMet data sets, for which the substructure could not be determined. The last row shows the combined data set containing all four individual data sets, from which the substructure was successfully determined. This can be seen by the difference between hands for contrast, connectivity and estimated CC(map), the sharp decrease in site occupancy and the fact that some trials had significantly higher CC_all and CC_weak than the majority.

Figure 6

Substructure determination for combinations of SeMet data sets. The best individual SeMet data set was SeMet4; the other SeMet data sets were added to this in various combinations and SAD phasing was attempted. Only the combination of data sets 4+1+2 and all four combined were successful when performing 1000 trials. The substructure of data sets 4+2+3 could be solved with 10 000 trials and data set 4 alone could be solved with 10 000 trials when using random-omit seeding rather than Patterson seeding.

To determine whether the heavy-atom substructure had been successfully determined, we looked for trials in which the CC_all and CC_weak were significantly higher than the rest, in which there were differences in contrast, connectivity and estimated CC(map) between the original and inverted hands, and in which there was a clear drop in occupancy values of the heavy-atom sites. Interestingly, for the combined data set the CC_all/CC_weak plot shows that the vast majority of trials failed to find the correct substructure, with only two trials differing from the main group. This finding emphasizes that for borderline data the SHELX default of 100 trials is insufficient and that 1000 or more trials may be required for success.
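The first of these criteria, spotting the few trials whose correlation coefficients stand clear of the bulk, was judged by eye from the plots, but it can also be automated. The sketch below is one possible heuristic and is not part of SHELXD: it flags trials whose CC_all lies more than a chosen number of median absolute deviations above the median. The function name, the default cutoff of 6 and the example values are assumptions made purely for illustration.

```python
import statistics
from typing import List, Sequence


def standout_trials(cc_all: Sequence[float], n_mad: float = 6.0) -> List[int]:
    """Return indices of trials whose CC_all is far above the main group.

    A trial is flagged when its value exceeds
    median + n_mad * (median absolute deviation) of all trials.
    """
    values = list(cc_all)
    if not values:
        return []
    median = statistics.median(values)
    mad = statistics.median([abs(v - median) for v in values])
    if mad == 0.0:
        # Degenerate case: at least half of the trials are identical.
        return [i for i, v in enumerate(values) if v > median]
    cutoff = median + n_mad * mad
    return [i for i, v in enumerate(values) if v > cutoff]


# Hypothetical CC_all values (%) for eight trials; only one stands out.
trial_cc_all = [8.1, 7.9, 8.4, 8.0, 8.2, 7.8, 21.5, 8.3]
print(standout_trials(trial_cc_all))
```

Any trial flagged in this way would still need to be checked against the other criteria (hand discrimination and the drop in site occupancy) before being accepted as a solution.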

The high multiplicity of the combined SeMet data set seems to be key for successfully solving the substructure, as any one of the data sets alone is insufficient. These data sets produce no trials that are significantly better than the majority, show little difference between the two hands and have a heavy-atom site occupancy that drops rapidly to low levels (Fig. 5). We combined the four SeMet data sets in various combinations (Tables 2 and 3) and attempted to determine the heavy-atom substructure as above. We used SeMet4 as the base data set as this had a strong anomalous signal and high completeness.

None of the two-data-set combinations succeeded in solving the substructure, and only the three-data-set combination 4+1+2 was successful (Fig. 6). While this trial was successful, it found fewer sites than the combination of all four data sets and only a single trial had high CC_all/CC_weak.

Many of these phasing attempts found only a few successful runs out of 1000 trials. The fact that data set 4+1+2 succeeded while 4+2+3 failed was somewhat unexpected as the latter data set has higher multiplicity, so we repeated this phasing run using an even larger number of trials (10 000). In this case, the substructure was successfully determined (Fig. 6), emphasizing the need for a very large number of trials when dealing with weak anomalous data. Klinke et al. (2015) recently determined the structure of a large protein with a large number of anomalous sites by S-SAD. One of the conclusions that they drew is that when a large number of anomalous scatterers are present, Patterson seeding can adversely affect the phasing process. Therefore, we attempted substructure determination using each individual data set with 10 000 trials and either Patterson or random-omit seeding. Substructure determination was unsuccessful for SeMet1, SeMet2 and SeMet3 (not shown), but was successful in two of the 10 000 trials with SeMet4 when using random-omit seeding (Fig. 6). This supports the use of random seeding over Patterson seeding when searching for large numbers of heavy atoms.
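The dependence on the number of trials can be made concrete with a simple estimate. If each trial is treated as an independent attempt with success probability p, the chance of at least one success in N trials is 1 − (1 − p)^N. The Python sketch below applies this using the observed success frequencies as rough estimates of p; treating trials as independent and estimating p from only two successes are both simplifying assumptions, and the function name is illustrative.

```python
def chance_of_any_success(
    successes_observed: int,
    trials_observed: int,
    trials_planned: int,
) -> float:
    """Probability of at least one successful trial in `trials_planned` attempts,
    assuming independent trials with success rate successes_observed/trials_observed."""
    p = successes_observed / trials_observed
    return 1.0 - (1.0 - p) ** trials_planned


# Combined SeMet data: two standout trials in 1000.
print(chance_of_any_success(2, 1000, 100))     # with the default of 100 trials
print(chance_of_any_success(2, 1000, 1000))    # with 1000 trials

# SeMet4 alone with random-omit seeding: two successes in 10 000.
print(chance_of_any_success(2, 10000, 1000))   # with 1000 trials
print(chance_of_any_success(2, 10000, 10000))  # with 10 000 trials
```

Under these assumptions a run of 100 trials on the combined data would succeed less than one time in five, whereas 1000 trials would succeed about 86% of the time; the same figures apply to SeMet4 alone when the number of trials is scaled up tenfold.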

For any of the combinations of data sets where heavy atoms could be located, the maps were sufficient to allow autobuilding, and phase information from this initial model could be combined with the experimental phases using MRSAD, allowing additional heavy-atom sites to be located and phases to be improved. This process is automated in Auto-Rickshaw, allowing the structure to be fully determined from any of these starting points.

2.4.3. Isomorphism of data sets. Despite growing in identical conditions, crystals of YenB–YenC2^NTR showed varying levels of non-isomorphism from batch to batch. This may be attributed to intrinsic differences in the crystals, changes caused by soaking in heavy-atom solutions (in the case of the tantalum-cluster crystal) or varying levels of dehydration occurring during harvesting, cryoprotection and cryocooling. The largest changes are seen between the native and tantalum-cluster data sets, with changes in individual unit-cell dimensions of 1.4, 5.3 and 1.9 Å (a change of 1.0, 3.6 and 0.7%, respectively). This results in an increase in the unit-cell volume of 5.4%.
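For an orthorhombic cell the volume is simply the product of the three axis lengths, so the fractional volume change is the product of the per-axis changes. The short Python check below reproduces the quoted 5.4% from the percentage changes given in the text; the function name is illustrative.

```python
from typing import Iterable


def volume_change_percent(axis_changes_percent: Iterable[float]) -> float:
    """Percentage change in the volume of an orthorhombic cell given the
    percentage change in each of its axes."""
    factor = 1.0
    for pct in axis_changes_percent:
        factor *= 1.0 + pct / 100.0  # each axis scales the volume multiplicatively
    return (factor - 1.0) * 100.0


# Per-axis increases of 1.0, 3.6 and 0.7% as quoted in the text.
print(round(volume_change_percent([1.0, 3.6, 0.7]), 1))
```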

Table 3

Data-collection and processing statistics for combinations of SeMet data sets.

The best individual SeMet data set was SeMet4. Other data sets were added to this to test whether the Se substructure could be solved.

| | SeMet4+1 | SeMet4+1+2 | SeMet4+1+3 | SeMet4+2 | SeMet4+3 | SeMet4+2+3 |
|---|---|---|---|---|---|---|
| Space group | P2₁2₁2₁ | P2₁2₁2₁ | P2₁2₁2₁ | P2₁2₁2₁ | P2₁2₁2₁ | P2₁2₁2₁ |
| Unit-cell parameters | | | | | | |
| a (Å) | 134.9 | 134.7 | 134.7 | 134.7 | 134.7 | 134.6 |
| b (Å) | 150.2 | 150.3 | 150.3 | 150.4 | 150.3 | 150.4 |
| c (Å) | 276.3 | 276.4 | 276.4 | 276.5 | 276.4 | 276.5 |
| Wavelength (Å) | 0.9791 | 0.9791 | 0.9791 | 0.9791 | 0.9791 | 0.9791 |
| No. of frames | 856 | 1216 | 1216 | 1080 | 1080 | 1440 |
| Oscillation (°) | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 |
| Resolution (Å) | 96.50–2.77 (2.82–2.77) | 96.46–2.77 (2.82–2.77) | 96.45–2.77 (2.82–2.77) | 96.47–2.77 (2.82–2.77) | 96.46–2.77 (2.82–2.77) | 96.44–2.77 (2.82–2.77) |
| Rmeas† | 0.335 (2.112) | 0.371 (2.080) | 0.356 (2.042) | 0.381 (2.409) | 0.365 (2.388) | 0.388 (2.339) |
| Rp.i.m.† | 0.112 (0.930) | 0.104 (0.919) | 0.100 (0.902) | 0.113 (1.122) | 0.108 (1.113) | 0.100 (1.099) |
| ⟨I/σ(I)⟩ | 14.7 (1.2) | 15.1 (1.2) | 14.6 (1.2) | 14.6 (1.0) | 14.6 (1.0) | 14.6 (1.0) |
| CC1/2 | 99.4 (47.7) | 99.3 (48.8) | 99.4 (49.3) | 99.2 (45.1) | 99.4 (46.3) | 99.5 (45.2) |
| Completeness (%) | 99.4 (89.0) | 99.4 (87.4) | 99.4 (87.8) | 98.9 (82.5) | 98.9 (82.9) | 98.8 (80.4) |
| Multiplicity | 17 (7.3) | 23.4 (7.3) | 23.4 (7.3) | 20.8 (6.7) | 20.9 (6.7) | 27.3 (6.6) |
| CCanom (%) | 9.4 (0.7) | 11.0 (0.3) | 11.0 (0.4) | 11.5 (0.6) | 11.7 (1.7) | 13.6 (2.5) |
| RCRanom‡ | 1.099 (0.993) | 1.117 (0.969) | 1.116 (0.997) | 1.123 (0.994) | 1.124 (0.984) | 1.147 (0.975) |
| Anomalous resolution§ (Å) | 5.4 | 5.1 | 5.2 | 5.1 | 5.3 | 5.0 |
| No. of unique reflections | 141965 (6422) | 141967 (6315) | 141967 (6330) | 141324 (5936) | 141327 (5967) | 141327 (5813) |

† Rmeas (Rr.i.m.) and Rp.i.m. are as defined in Weiss (2001). ‡ R.m.s. correlation ratio between two half data sets (Evans, 2011). § Anomalous resolution is defined as the resolution at which CCanom drops below 0.3.

In order to compare the imperfect isomorphism of the various data sets, we scaled them against each other in a pairwise fashion and calculated the cross-crystal R factors (Table 2) using the CCP4 program SCALEIT (Winn et al., 2011). The SeMet data sets were all relatively isomorphous to one another, with data sets 2 and 3 being the most similar. This justifies our decision to combine all four data sets to increase the multiplicity for SAD phasing. The tantalum-cluster and native data sets, in comparison, showed relatively large R factors. This is not surprising considering the large differences in unit-cell parameters (Table 1) and confirms our suspicion that non-isomorphism would make isomorphous replacement phasing difficult with these data sets.
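The idea behind a pairwise cross-crystal R factor can be illustrated with a minimal sketch. This is not the SCALEIT algorithm: it assumes a single overall least-squares scale factor and one common definition of the R factor, R = Σ|F1 − kF2| / ΣF1, whereas real scaling programs typically also refine resolution-dependent terms. The function names and the amplitudes are invented for illustration.

```python
def scale_factor(f_ref, f_other):
    """Overall scale k that minimises sum((f_ref - k * f_other)**2)."""
    num = sum(a * b for a, b in zip(f_ref, f_other))
    den = sum(b * b for b in f_other)
    return num / den


def cross_r_factor(f_ref, f_other):
    """R = sum|F1 - k*F2| / sum(F1) for amplitudes of matching reflections."""
    k = scale_factor(f_ref, f_other)
    num = sum(abs(a - k * b) for a, b in zip(f_ref, f_other))
    return num / sum(f_ref)


# Hypothetical amplitudes for the same four reflections in three crystals.
crystal_a = [100.0, 250.0, 80.0, 400.0]
crystal_b = [2.0 * f for f in crystal_a]      # same crystal form, different scale
crystal_c = [120.0, 210.0, 95.0, 380.0]       # perturbed, i.e. less isomorphous

r_ab = cross_r_factor(crystal_a, crystal_b)
r_ac = cross_r_factor(crystal_a, crystal_c)
print(f"R(a,b) = {r_ab:.3f}, R(a,c) = {r_ac:.3f}")
```

Two data sets that differ only in overall scale give an R factor of zero, while genuine differences in the amplitudes, such as those caused by unit-cell changes, raise it.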

2.4.4. Refinement. The autobuilt model was manually corrected and several rounds of refinement and model building took place. The phase information was then transferred to the higher resolution native data set by molecular replacement. Rounds of manual model building using Coot (Emsley et al., 2010) and refinement using phenix.refine (Afonine et al., 2012) followed. The final refinement statistics and the details and interpretation of the structure have been published in Busby et al. (2013).

3. Discussion

The crystal structure of the YenB–YenC2^NTR complex presented a difficult phasing problem. The complex is large, with a total of 4354 amino acids in the asymmetric unit, 64 of which are methionines. SeMet-labelled crystals diffracted relatively poorly, the anomalous signal was low and the large number of heavy-atom sites initially made SAD/MAD methods unsuccessful. A tantalum-cluster data set provided a much higher anomalous signal and heavy atoms could be located, but the limited resolution and poor phase quality at high resolution meant that the resulting electron density could not be interpreted. The phase problem could only be solved by combining both derivative data sets: the low-resolution phase information from the tantalum-cluster data set was used to find heavy-atom sites in the SeMet data set, and multiple rounds of autobuilding and MRSAD were then used to bootstrap to a structure that could be refined.

Despite a strong anomalous signal at low resolution, the tantalum-cluster data set did not result in an interpretable electron-density map. A sharp decrease in anomalous signal at 6 Å is often seen with this cluster compound (Banumathi et al., 2003), and while heavy-atom sites could be easily located and the maps produced are (in retrospect) correct, the resolution was not sufficient to allow model building. This low-resolution phase information could, however, be used to determine the heavy-atom substructure of other derivatives. If these derivatives were isomorphous, this process would have been relatively straightforward, analogous to the process used in MIRAS (Panjikar & Tucker, 2002). A similar process can be used even with non-isomorphous derivatives by performing molecular replacement using electron density rather than an atomic model. This approach should be generally applicable in cases where an anomalous data set is available but is not quite good enough to determine the heavy-atom substructure on its own. An external source of phase information, even low-resolution or poor-quality phases, can aid in determining the substructure and allow bootstrapping into the build-and-refine cycle. Such phase information could come from a low-resolution heavy-atom data set, as is the case here, or from a poor molecular-replacement solution.

Our SeMet data sets appear to be on the border of solvability. The heavy-atom substructure could be determined in the high-multiplicity combined data set and in certain combinations of three data sets with our initial settings, but the single-crystal data set could only be solved by using very large numbers of trials with random-omit seeding rather than Patterson seeding. The benefit of highly redundant SAD data has been known for some time for single crystals (Dauter et al., 2002), and recently averaging data sets collected from a number of different crystals has been shown to be advantageous at low resolution (Liu et al., 2011, 2013, 2014; Mancusso et al., 2012; Akey et al., 2014; Wöhlert et al., 2014; Bleichert et al., 2015; Kim et al., 2015; Klinke et al., 2015). This is often necessary for successful phasing using very weak anomalous signal such as from S-SAD, but can also help in other difficult cases such as with a large number of anomalous scatterers, large asymmetric units, poor resolution or crystals that are very sensitive to radiation damage. High-multiplicity data, particularly when combining data from multiple crystals, have several benefits for phase determination. Aside from the obvious benefit that more measurements mean greater accuracy of measurement (which is particularly important when measuring very small anomalous differences), combining isomorphous data from multiple crystals allows the creation of a high-completeness data set with a much lower total dose per crystal, limiting the effects of radiation damage. It also helps to remove systematic errors owing to individual crystal defects, as these will tend to average out over multiple crystals.
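The statistical part of this argument, that more measurements give a more accurate mean, can be illustrated with a toy simulation. The sketch below assumes independent Gaussian measurement errors with a fixed standard deviation and ignores radiation damage and systematic errors. The function name, the chosen multiplicities and the number of repeats are illustrative and are not taken from the data processing described here.

```python
import random
import statistics


def spread_of_mean(n_measurements, sigma=1.0, n_repeats=2000, seed=0):
    """Standard deviation of the mean of n noisy measurements.

    Each measurement is drawn from a Gaussian with zero mean and standard
    deviation sigma (a toy error model); the experiment is repeated
    n_repeats times to estimate how much the averaged value scatters.
    """
    rng = random.Random(seed)
    means = [
        statistics.fmean(rng.gauss(0.0, sigma) for _ in range(n_measurements))
        for _ in range(n_repeats)
    ]
    return statistics.pstdev(means)


for multiplicity in (4, 16, 64):
    observed = spread_of_mean(multiplicity)
    expected = 1.0 / multiplicity ** 0.5
    print(f"multiplicity {multiplicity:3d}: "
          f"scatter {observed:.3f} (1/sqrt(n) = {expected:.3f})")
```

In this idealised model the scatter of the averaged value falls as 1/√n, so quadrupling the multiplicity halves the random error on a small anomalous difference.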

In summary, the work presented here emphasizes several key points about the experimental phasing of difficult structures. (i) Combining data sets from multiple isomorphous crystals can allow phasing where individual crystal data sets could not. Several groups have observed this in recent years and multiple-crystal data collection is becoming more common (Cherezov et al., 2007; Liu et al., 2011, 2014; Rasmussen et al., 2011). (ii) Very high levels of multiplicity can aid in substructure determination and phasing. (iii) When solving a substructure with large numbers of heavy atoms, higher numbers of trials (>10 000) may be necessary, and random-omit seeding may work better than Patterson seeding. (iv) Low-resolution phase information that is insufficient for model building on its own (such as from a poor molecular-replacement model or limited experimental phases) can be useful in locating heavy atoms in other data sets. When data sets are isomorphous, the phases can be used directly, but even when non-isomorphous they can be used via cross-crystal averaging and, when available, NCS averaging can assist this process.
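Point (iii) can be made concrete with a simple probability estimate. The sketch below assumes that substructure-search trials are independent and that each has the same small chance of success. The per-trial success rate used here is purely hypothetical, not a value measured in this work, and the function names are illustrative.

```python
import math


def prob_at_least_one_success(p_single, n_trials):
    """Chance that at least one of n independent trials succeeds."""
    return 1.0 - (1.0 - p_single) ** n_trials


def trials_needed(p_single, target=0.99):
    """Smallest number of trials giving at least the target success chance."""
    return math.ceil(math.log(1.0 - target) / math.log(1.0 - p_single))


p = 0.0005  # hypothetical per-trial success rate for a borderline data set
for n in (100, 1000, 10000):
    print(f"{n:6d} trials: P(solution found) = "
          f"{prob_at_least_one_success(p, n):.3f}")
print("trials for 99% chance:", trials_needed(p))
```

With a success rate this low, a few hundred trials will usually fail, whereas on the order of 10 000 trials make a solution very likely, which is consistent with the recommendation to run large numbers of trials on borderline data.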

Acknowledgements

This work was supported by a subcontract to JSL and JNB from the New Zealand Foundation for Research, Science and Technology contract C10X0804 awarded to Mark R. H. Hurst.

All X-ray diffraction data were collected on the MX1 and MX2 beamlines of the Australian Synchrotron with the support of the New Zealand Synchrotron Group for synchrotron-access arrangements. We wish to thank all of the beamline staff of the MX beamlines for their support in X-ray data collection.

References

Afonine, P. V., Grosse-Kunstleve, R. W., Echols, N., Headd, J. J., Moriarty, N. W., Mustyakimov, M., Terwilliger, T. C., Urzhumtsev, A., Zwart, P. H. & Adams, P. D. (2012). Acta Cryst. D68, 352–367.

Akey, D. L., Brown, W. C., Konwerski, J. R., Ogata, C. M. & Smith, J. L. (2014). Acta Cryst. D70, 2719–2729.

Banumathi, S., Dauter, M. & Dauter, Z. (2003). Acta Cryst. D59, 492–498.

Berman, H. M., Westbrook, J., Feng, Z., Gilliland, G., Bhat, T. N., Weissig, H., Shindyalov, I. N. & Bourne, P. E. (2000). Nucleic Acids Res. 28, 235–242.

Bleichert, F., Botchan, M. R. & Berger, J. M. (2015). Nature (London), 519, 321–326.

Busby, J. N., Panjikar, S., Landsberg, M. J., Hurst, M. R. H. & Lott, J. S. (2013). Nature (London), 501, 547–550.

Cherezov, V., Rosenbaum, D. M., Hanson, M. A., Rasmussen, S. G. F., Thian, F. S., Kobilka, T. S., Choi, H.-J., Kuhn, P., Weis, W. I., Kobilka, B. K. & Stevens, R. C. (2007). Science, 318, 1258–1265.

Cowie, D. B. & Cohen, G. N. (1957). Biochim. Biophys. Acta, 26, 252–261.

Cowieson, N. P., Aragao, D., Clift, M., Ericsson, D. J., Gee, C., Harrop, S. J., Mudie, N., Panjikar, S., Price, J. R., Riboldi-Tunnicliffe, A., Williamson, R. & Caradoc-Davies, T. (2015). J. Synchrotron Rad. 22, 187–190.

Cowtan, K. D. (1994). Jnt CCP4/ESF–EACBM Newsl. Protein Crystallogr. 31, 34–38.

Cowtan, K. (2006). Acta Cryst. D62, 1002–1011.

Dauter, Z., Dauter, M. & Dodson, E. J. (2002). Acta Cryst. D58, 494–506.

Emsley, P., Lohkamp, B., Scott, W. G. & Cowtan, K. (2010). Acta Cryst. D66, 486–501.

Evans, P. R. (2011). Acta Cryst. D67, 282–292.

Evans, P. R. & Murshudov, G. N. (2013). Acta Cryst. D69, 1204–1214.

Garman, E. & Murray, J. W. (2003). Acta Cryst. D59, 1903–1913.

Gomis-Rüth, F. X. & Coll, M. (2001). Acta Cryst. D57, 800–805.

Gorrec, F. (2009). J. Appl. Cryst. 42, 1035–1042.

Grosse-Kunstleve, R. W. & Adams, P. D. (2003). Acta Cryst. D59, 1966–1973.

Hendrickson, W. A., Horton, J. R. & LeMaster, D. M. (1990). EMBO J. 9, 1665–1672.

Kabsch, W. (2010). Acta Cryst. D66, 125–132.

Kim, M.-S., Lapkouski, M., Yang, W. & Gellert, M. (2015). Nature (London), 518, 507–511.

Klinke, S., Foos, N., Rinaldi, J. J., Paris, G., Goldbaum, F. A., Legrand, P., Guimarães, B. G. & Thompson, A. (2015). Acta Cryst. D71, 1433–1443.

Knäblein, J., Neuefeind, T., Schneider, F., Bergner, A., Messerschmidt, A., Löwe, J., Steipe, B. & Huber, R. (1997). J. Mol. Biol. 270, 1–7.

Li, J., Edwards, P. C., Burghammer, M., Villa, C. & Schertler, G. F. X. (2004). J. Mol. Biol. 343, 1409–1438.

Liu, Q., Guo, Y., Chang, Y., Cai, Z., Assur, Z., Mancia, F., Greene, M. I. & Hendrickson, W. A. (2014). Acta Cryst. D70, 2544–2557.

Liu, Q., Liu, Q. & Hendrickson, W. A. (2013). Acta Cryst. D69, 1314–1332.

Liu, Q., Zhang, Z. & Hendrickson, W. A. (2011). Acta Cryst. D67, 45–59.

Luft, J. R. & DeTitta, G. T. (1999). Acta Cryst. D55, 988–993.

Mancusso, R., Gregorio, G. G., Liu, Q. & Wang, D.-N. (2012). Nature (London), 491, 622–626.

Matthews, B. W. (1968). J. Mol. Biol. 33, 491–497.

McCoy, A. J., Grosse-Kunstleve, R. W., Adams, P. D., Winn, M. D., Storoni, L. C. & Read, R. J. (2007). J. Appl. Cryst. 40, 658–674.

McCoy, A. J. & Read, R. J. (2010). Acta Cryst. D66, 458–469.

McCoy, A. J., Storoni, L. C. & Read, R. J. (2004). Acta Cryst. D60, 1220–1228.

Moreland, N., Ashton, R., Baker, H. M., Ivanovic, I., Patterson, S., Arcus, V. L., Baker, E. N. & Lott, J. S. (2005). Acta Cryst. D61, 1378–1385.

Moukhametzianov, R., Burghammer, M., Edwards, P. C., Petitdemange, S., Popov, D., Fransen, M., McMullan, G., Schertler, G. F. X. & Riekel, C. (2008). Acta Cryst. D64, 158–166.

Murray, J. W., Rudiño-Piñera, E., Owen, R. L., Grininger, M., Ravelli, R. B. G. & Garman, E. F. (2005). J. Synchrotron Rad. 12, 268–275.

Murshudov, G. N., Skubák, P., Lebedev, A. A., Pannu, N. S., Steiner, R. A., Nicholls, R. A., Winn, M. D., Long, F. & Vagin, A. A. (2011). Acta Cryst. D67, 355–367.

Panjikar, S., Parthasarathy, V., Lamzin, V. S., Weiss, M. S. & Tucker, P. A. (2005). Acta Cryst. D61, 449–457.

Panjikar, S., Parthasarathy, V., Lamzin, V. S., Weiss, M. S. & Tucker, P. A. (2009). Acta Cryst. D65, 1089–1097.

Panjikar, S. & Tucker, P. A. (2002). Acta Cryst. D58, 1413–1420.

Rappleye, J., Innus, M., Weeks, C. M. & Miller, R. (2002). J. Appl. Cryst. 35, 374–376.

Rasmussen, S. G. F. et al. (2011). Nature (London), 477, 549–555.

Sanishvili, R., Nagarajan, V., Yoder, D., Becker, M., Xu, S., Corcoran, S., Akey, D. L., Smith, J. L. & Fischetti, R. F. (2008). Acta Cryst. D64, 425–435.

Sheldrick, G. M. (2010). Acta Cryst. D66, 479–485.

Studier, F. W. (2005). Protein Expr. Purif. 41, 207–234.

Terwilliger, T. C. (2000). Acta Cryst. D56, 965–972.

Vagin, A. & Teplyakov, A. (2010). Acta Cryst. D66, 22–25.

Weiss, M. S. (2001). J. Appl. Cryst. 34, 130–135.

Winn, M. D. et al. (2011). Acta Cryst. D67, 235–242.

Wöhlert, D., Kühlbrandt, W. & Yildiz, O. (2014). Elife, 3, e03579.