Chapter 3: Results and Discussion
3.3 Structural characterisation of CysB by X-ray crystallography
3.3.3 Solving the CysB structure
There are currently no full-length CysB homologue structures deposited in the PDB. There are, however, truncated structures from three organisms that contain either the C-terminal regulatory domain or the N-terminal DNA-binding domain alone. The regulatory domain has been solved from K. aerogenes (1AL3, 1.80 Å), from S. typhimurium in complex with different inducers and anti-inducers (4GXA, 4GWO, 4M4G, 4LQ5, 4LQ2, 2.4-2.8 Å) and from P. aeruginosa (5Z50, 2.21 Å). The DNA-binding domain has only been solved from P. aeruginosa (5Z4Y, 2.40 Å). These truncated structures showed good sequence similarity to N. gonorrhoeae CysB (≥ 30%) and conserved secondary structure (Figure 3.19), identifying them as good candidates for molecular replacement. The P. aeruginosa models were particularly similar to N. gonorrhoeae CysB, with 54.5% similarity for the DNA-binding domain (5Z4Y) and 49.7% similarity for the regulatory domain (5Z50). Because both domains had been solved from P. aeruginosa and both showed good similarity to N. gonorrhoeae CysB, 5Z4Y and 5Z50 were selected as search models for molecular replacement.
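Percent similarity depends on how a "similar" substitution is defined, and the text does not state which alignment tool produced the values above. The sketch below assumes a pairwise alignment is already available as two gapped strings of equal length and, as an assumption, counts a pair as similar if it is identical or falls within one of the Clustal "strong" conservation groups; other tools (for example EMBOSS Needle, which counts positive BLOSUM62 scores) may give somewhat different numbers. The function names are illustrative, and this sketch divides by the full alignment length, gaps included.

```python
# Clustal "strong" conservation groups, used here as an ASSUMED definition of
# a similar substitution; other programs use substitution-matrix scores instead.
STRONG_GROUPS = ("STA", "NEQK", "NHQK", "NDEQ", "QHRK", "MILV", "MILF", "HY", "FYW")


def is_similar(a: str, b: str) -> bool:
    """True if two residues are identical or share a strong conservation group."""
    if a == b:
        return True
    return any(a in group and b in group for group in STRONG_GROUPS)


def identity_and_similarity(aligned_1: str, aligned_2: str) -> tuple[float, float]:
    """Percent identity and similarity over the full gapped alignment length."""
    if len(aligned_1) != len(aligned_2):
        raise ValueError("aligned sequences must have equal length")
    identical = similar = 0
    for a, b in zip(aligned_1.upper(), aligned_2.upper()):
        if a == "-" or b == "-":
            continue  # gapped columns contribute to the length only
        if a == b:
            identical += 1
        if is_similar(a, b):
            similar += 1
    length = len(aligned_1)
    return 100.0 * identical / length, 100.0 * similar / length


# Toy alignment for illustration only (not CysB sequences).
print(identity_and_similarity("MKLV-E", "MRIVDE"))
```

Because identity and similarity are normalised differently by different programs, values from separate tools should not be compared directly.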
Solving this structure was difficult and required iterative rounds of manual building in Coot (Emsley & Cowtan, 2004; Emsley et al., 2010), from the CCP4 suite (Winn et al., 2011), and refinement with phenix.refine (Afonine et al., 2012) to assemble a coherent model from the individual domains used as search models. The ambiguous density at 2.73 Å resolution also made side chains difficult to position.
Figure 3.19 ENDscript analysis of the secondary structure of CysB from N. gonorrhoeae. For each residue, solvent accessibility and hydrophobicity are shown below the sequence.
Solvent accessibility is labelled "acc" and coloured blue (accessible), cyan (less accessible) or white (completely buried); red indicates that it was not calculated. Hydrophobicity is labelled "hyd" and coloured cyan (hydrophilic), purple (hydrophobic) or white (neutral). Secondary structure elements are shown in grey above the peptide sequence, with α-helices, β-strands and β-turns represented by linked loops, arrows and TT, respectively. Letters indicate non-crystallographic contacts with the chain denoted by that letter. Refer to Appendix AC.2 for the chain labelling of the monomers in the asymmetric unit (ASU). Chain B was used for the analysis. Figure generated using ENDscript 2.0 (Robert & Gouet, 2014).
The Matthews coefficient (Matthews, 1968) was used to estimate the number of monomers in the asymmetric unit. The most likely number of monomer copies was four, based on a predicted molecular weight of 35.2 kDa for the CysB monomer (calculated using ProtParam; Gasteiger et al., 2005). Molecular replacement with the P. aeruginosa search models (5Z4Y and 5Z50) was therefore set up to place four monomers in the asymmetric unit.
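The Matthews calculation itself is short: V_M = V / (Z × n × MW), where V is the unit-cell volume, Z the number of asymmetric units in the cell, n the number of monomers per asymmetric unit and MW the monomer mass, and the solvent fraction is approximately 1 − 1.23/V_M. The sketch below uses the 35.2 kDa monomer mass quoted above, but the cell dimensions and space group are HYPOTHETICAL placeholders (they are not given in this section), so the printed values illustrate the method rather than the CysB result.

```python
import math

MW_CYSB_DA = 35_200.0  # ProtParam-derived CysB monomer mass quoted in the text


def cell_volume(a, b, c, alpha=90.0, beta=90.0, gamma=90.0):
    """Unit-cell volume in cubic ångströms (edges in Å, angles in degrees)."""
    ca, cb, cg = (math.cos(math.radians(x)) for x in (alpha, beta, gamma))
    return a * b * c * math.sqrt(1.0 - ca**2 - cb**2 - cg**2 + 2.0 * ca * cb * cg)


def matthews(volume, z, mw_da, n_copies):
    """Return (V_M, solvent fraction) for n_copies monomers per asymmetric unit.

    V_M = V / (Z * n * MW) in Å^3/Da; the solvent fraction 1 - 1.23 / V_M
    assumes a protein density of about 1.35 g/cm^3.
    """
    vm = volume / (z * n_copies * mw_da)
    return vm, 1.0 - 1.23 / vm


if __name__ == "__main__":
    # HYPOTHETICAL cell, for illustration only (not the CysB crystal):
    # orthorhombic P2(1)2(1)2(1), which has Z = 4 asymmetric units per cell.
    v = cell_volume(80.0, 120.0, 180.0)
    for n in range(1, 7):
        vm, solvent = matthews(v, z=4, mw_da=MW_CYSB_DA, n_copies=n)
        print(f"n = {n}: V_M = {vm:.2f} Å^3/Da, solvent = {solvent:.0%}")
```

In practice the copy number is chosen as the value of n whose V_M and solvent content are most typical of protein crystals, which is how a single "most likely" value such as four is arrived at.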
Initial attempts using the P. aeruginosa DNA-binding domain (5Z4Y) and regulatory domain (5Z50) as search models generated multiple solutions with poor translation function Z-scores (TFZ) and log-likelihood gains (LLG). We hypothesised that this was caused by the crossover regions between domain I and domain II of the regulatory domain, which allow flexibility within the structure, so that the relative orientation of the two subdomains in CysB could differ from that in the search model.
We therefore split the regulatory domain into two models to allow for this structural flexibility (Figure 3.20). When the DNA-binding domain and the two parts of the split regulatory domain were used as separate search models, Phaser (McCoy et al., 2007) found a single solution with acceptable TFZ and LLG scores of 13.6 and 410.135, respectively.
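TFZ and LLG are easier to judge against the rule-of-thumb bands given in the Phaser documentation (TFZ above 8 is usually taken as a definite solution, 7-8 probable, 6-7 possible, 5-6 unlikely). The sketch below encodes those bands, as an assumption about how the scores were read, and adds a simple check that the LLG rises as each search component is placed. The documentation notes exceptions (for example, the first component placed in P1 or in a polar space group), so this is guidance rather than a test of correctness. Function names are illustrative.

```python
# Rule-of-thumb TFZ bands (ASSUMED, after the Phaser documentation):
# lower bound of each band and its interpretation, highest band first.
TFZ_BANDS = (
    (8.0, "definitely solved"),
    (7.0, "probably solved"),
    (6.0, "possibly solved"),
    (5.0, "unlikely"),
)


def interpret_tfz(tfz: float) -> str:
    """Return the verdict for the first band whose lower bound tfz exceeds."""
    for lower_bound, verdict in TFZ_BANDS:
        if tfz > lower_bound:
            return verdict
    return "not solved"


def llg_rises_with_each_component(llg_values: list[float]) -> bool:
    """True if the LLG increases every time another component is placed."""
    return all(later > earlier for earlier, later in zip(llg_values, llg_values[1:]))


# TFZ values reported in the text for the two successful searches.
for label, tfz in (("split-domain search", 13.6), ("monomer search", 76.2)):
    print(f"{label}: TFZ = {tfz} -> {interpret_tfz(tfz)}")
```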
Figure 3.20 Initial input models for molecular replacement and autobuild output. A Input models for molecular replacement derived from 5Z4Y and 5Z50. The 5Z4Y N-terminal domain model (green) was used unmodified, whereas the 5Z50 C-terminal regulatory domain was split into two smaller domains (yellow and blue) with the crossover regions (pink) deleted. B Output from phenix.autobuild visualised in Coot, showing multiple chains in arbitrary orientations in which the two portions of the regulatory domain are not linked. Figure generated using PyMOL version 2.3.4 and Coot.
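The domain-splitting step can be sketched in plain Python, assuming the 5Z50 coordinates are available as PDB-format text. The segment boundaries below are HYPOTHETICAL placeholders, because the actual residue ranges of the two regulatory subdomains and of the deleted crossover regions are not given here; each subdomain is a list of segments because in LysR-type regulatory domains a subdomain can be formed from discontinuous stretches of sequence. Residues that fall in neither list (the crossovers) are simply omitted. Such edits are more commonly made in a model-editing program, so this is only an illustration of the operation.

```python
# HYPOTHETICAL residue ranges, for illustration only: the real boundaries of the
# two regulatory subdomains and of the deleted crossover regions are not given here.
SUBDOMAIN_I = [(88, 160), (262, 300)]
SUBDOMAIN_II = [(165, 257)]


def extract_segments(pdb_lines: list[str], chain_id: str,
                     segments: list[tuple[int, int]]) -> list[str]:
    """Keep ATOM records of one chain whose residue number lies in any inclusive
    (start, end) segment. Residues outside every segment (e.g. crossovers) are
    dropped, as are HETATM records (ligands, waters). Insertion codes and
    alternate conformations are not handled."""
    kept = []
    for line in pdb_lines:
        if not line.startswith("ATOM"):
            continue
        if line[21] != chain_id:  # PDB column 22: chain identifier
            continue
        resnum = int(line[22:26])  # PDB columns 23-26: residue sequence number
        if any(start <= resnum <= end for start, end in segments):
            kept.append(line)
    kept.append("END")
    return kept
```

To produce the two search models, the regulatory-domain chain would be read with `open(path).read().splitlines()`, passed to `extract_segments` once with each segment list, and each result written to its own file with `"\n".join(...)`; the DNA-binding domain model would be used unchanged.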
The CysB model was built using phenix.autobuild (Terwilliger et al., 2008) from the PHENIX suite (Adams et al., 2010). The programme was supplied with the model and map output by phenix.phaser (McCoy et al., 2007), the reflection file (including the FreeR flag dataset) and the CysB amino acid sequence. Default settings were used (Appendix AC.3), except that waters were not placed during refinement. Rebuild was set to auto so that the programme could build into unmodelled density outside the starting model and edit the model to match the supplied CysB sequence. The structure went through six rounds of iterative building and three cycles of refinement, giving acceptable Rwork/Rfree values of 0.2507/0.3069 for the starting model. However, the resulting model consisted of 18 separate fragments, and the two portions of the regulatory domain had not been built in positions relative to one another that would form a single, complete C-terminal regulatory domain (Figure 3.20).
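Rwork and Rfree are the same crystallographic R-factor, R = Σ||Fobs| − k|Fcalc|| / Σ|Fobs|, evaluated over different reflections: Rwork over the reflections used in refinement and Rfree over the test set marked by the FreeR flags, which is why the flag dataset is supplied to the programme. The sketch below assumes a single overall least-squares scale factor k fitted on the working set; phenix.refine uses a more elaborate scaling model (including bulk solvent), so this shows the definition rather than reproducing PHENIX's numbers. Names are illustrative.

```python
def r_factors(f_obs: list[float], f_calc: list[float],
              free_flags: list[bool]) -> tuple[float, float]:
    """Return (R_work, R_free) from structure-factor amplitudes.

    free_flags[i] is True for reflections in the test (free) set.
    """
    work = [(fo, fc) for fo, fc, free in zip(f_obs, f_calc, free_flags) if not free]
    test = [(fo, fc) for fo, fc, free in zip(f_obs, f_calc, free_flags) if free]

    # ASSUMED simple scaling: least-squares k minimising sum (|Fo| - k|Fc|)^2,
    # fitted on the working set only so the free set stays independent.
    k = sum(fo * fc for fo, fc in work) / sum(fc * fc for _, fc in work)

    def r(pairs: list[tuple[float, float]]) -> float:
        return sum(abs(fo - k * fc) for fo, fc in pairs) / sum(fo for fo, _ in pairs)

    return r(work), r(test)
```

Because the free reflections never influence the model or the scale, Rfree is normally higher than Rwork, and a large gap between them is a warning sign of overfitting.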
Inspection of the model from this initial phenix.autobuild run revealed density supporting symmetry-related chains, which allowed two complete chains to be built manually from the initial 18 chain fragments. These fully built chains were then supplied as individual monomer search models (Figure 3.21), and phenix.phaser found a single solution with good TFZ and LLG scores of 76.2 and 6788.248, respectively. phenix.autobuild was then run with the settings described above and gave improved Rwork/Rfree values of 0.2280/0.2821 for the starting model. Four monomers were built in the asymmetric unit, arranged as a homotetramer. Automated building placed most of each monomer (95%), but some sections of the amino acid sequence were missing (Figure 3.21). These unbuilt segments were built manually in Coot. During manual building it became apparent that, owing to ambiguous density at 2.73 Å resolution, the autobuilding process had placed some residues incorrectly within the density. In the regulatory domains the density was continuous, with no breaks, indicating an unbroken Cα backbone; these chains were therefore rebuilt so that the protein backbone sat correctly within the density. Manual building in Coot and refinement with phenix.refine were alternated iteratively to build the remaining residues and place the protein chains correctly within the density.
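Figures such as the 95% built by autobuilding, and the list of segments still to be built by hand, can be derived per chain by comparing the modelled residue numbers with the full sequence range. The sketch below assumes residues are numbered consecutively from the first to the last position of the construct and that the list of modelled residue numbers has already been extracted from the model file; the function names are illustrative, not part of any crystallographic package.

```python
def find_gaps(modelled: list[int], first: int, last: int) -> list[tuple[int, int]]:
    """Return inclusive (start, end) runs of residue numbers in first..last
    that are absent from the modelled residues."""
    present = set(modelled)
    gaps: list[tuple[int, int]] = []
    start = None
    for resnum in range(first, last + 1):
        if resnum not in present and start is None:
            start = resnum  # a gap opens here
        elif resnum in present and start is not None:
            gaps.append((start, resnum - 1))  # the gap closed on the previous residue
            start = None
    if start is not None:  # gap runs to the end of the sequence
        gaps.append((start, last))
    return gaps


def completeness(modelled: list[int], first: int, last: int) -> float:
    """Percentage of residues first..last that are present in the model."""
    present = set(modelled) & set(range(first, last + 1))
    return 100.0 * len(present) / (last - first + 1)
```

Applied chain by chain, the gap list gives the residue ranges to build manually, and comparing it between chains shows whether the missing segments cluster in the same region, as they did here in the regulatory domains.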
Figure 3.21 Final input models for molecular replacement and autobuild output. A Input models (green and blue) used for the second round of molecular replacement, built from the output of the first round of autobuild (Figure 3.20). B Output from the second round of autobuild visualised in Coot, showing four chains arranged as a tetramer. Gaps in the regulatory domains of two of the monomers (blue and purple) are shown as dotted lines. Figure generated using PyMOL version 2.3.4 and Coot.