
Ecology and Evolution of Betacoronaviruses

3.2 Origin and Evolution of SARS-CoV-2 and Other

3.2.1 Genome Evolution

Mutation is the fundamental evolutionary mechanism driving diversification in biological entities.

Although RNA virus populations generally have high mutation rates, not all mutations in a virus population have the same fate. Some mutations are lost over time by natural selection or genetic drift, while others increase in frequency and eventually become fixed in the virus population (Pagán 2018). Generally, the rate of fixation determines the long-term rate of evolution (Pagán 2018; Vandamme 2009) and results from the combined effects of mutation rates, generation times, effective population size, and fitness in viruses (Duffy et al. 2008), and it can be calculated using Bayesian phylogenetic inference or regression methods (Bouckaert et al. 2019).
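As a rough illustration of the regression approach, the sketch below fits a least-squares line to root-to-tip genetic distance against sampling date: the slope estimates the substitution rate and the date at which the fitted distance reaches zero estimates the tMRCA. The function name `root_to_tip_rate` and the data points are hypothetical and chosen only for illustration; a real analysis would start from a rooted phylogeny and would first test for temporal signal.

```python
from statistics import mean


def root_to_tip_rate(dates, distances):
    """Least-squares fit of distance = rate * (date - root_date).

    dates:     sampling dates in decimal years
    distances: root-to-tip distances in substitutions/site
    Returns (rate in s/s/y, estimated root date).
    """
    if len(dates) != len(distances) or len(dates) < 2:
        raise ValueError("need at least two paired observations")
    mean_t, mean_d = mean(dates), mean(distances)
    sxx = sum((t - mean_t) ** 2 for t in dates)
    if sxx == 0:
        raise ValueError("all samples share one date, so no rate can be fitted")
    sxy = sum((t - mean_t) * (d - mean_d) for t, d in zip(dates, distances))
    rate = sxy / sxx
    if rate == 0:
        raise ValueError("zero slope: no temporal signal in these data")
    intercept = mean_d - rate * mean_t
    root_date = -intercept / rate  # date at which the fitted distance is zero
    return rate, root_date


# Hypothetical, noise-free data generated with rate 1.0e-3 s/s/y and root 2019.9
dates = [2020.0, 2020.1, 2020.2, 2020.3]
distances = [1.0e-3 * (t - 2019.9) for t in dates]

rate, root_date = root_to_tip_rate(dates, distances)
print(f"rate = {rate:.2e} s/s/y, estimated tMRCA = {root_date:.2f}")
```

Bayesian methods replace this single fitted line with a posterior distribution over rates and node dates, which is where the intervals reported with such estimates come from.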

The mutation rates of RNA viruses are in the range from 1 × 10⁻³ to 1 × 10⁻⁶ mutations/nucleotide/replication round (m/n/r) (González-Candelas et al. 2018), and generally, this results in a rate of between 1 × 10⁻² and 1 × 10⁻⁵ nucleotide substitutions/site/year (s/s/y) (Duffy et al. 2008; González-Candelas et al. 2018; Hanada et al. 2004; Jenkins et al. 2002). These rates result from low-fidelity copying by their RdRps, which lack a proofreading mechanism (Drake and Holland 1999). Although coronaviruses possess the largest genomes (26–32 kb) among riboviruses (Anthony et al. 2017; Gorbalenya et al. 2006), they have lower mutation rates than others as they have proofreading activity provided by the 3′ exoribonuclease (ExoN) (Minskaia et al. 2006; Zhang and Holmes 2020). Currently, the coronaviruses are the only group of RNA viruses known to have polymerases with this type of exonuclease activity (Sanjuán and Domingo-Calap 2019).

The genetic inactivation of ExoN activity in engineered SARS-CoV genomes results in viable mutants that have 15- to 20-fold increases in mutation rates, up to 18 times greater than those tolerated for fidelity mutants of other RNA viruses, and thus this protein is essential for replication fidelity and coronavirus genetic diversity (Denison et al. 2011; Gribble et al. 2020). ExoN has been identified as the first non-RdRp viral protein involved in the sensitivity of RNA viruses to mutagens and could be considered a good target for therapeutics in coronaviruses (Smith et al. 2013) by, for example, inducing lethal mutagenesis in coronavirus populations (Perales et al. 2011).

The effect of ExoN is to produce a mutation rate for SARS-CoV genomes of around 9.06 × 10⁻⁷ substitutions/nucleotide/replication round (s/n/r), i.e., about two orders of magnitude less than the average of other RNA viruses (1 × 10⁻⁴) (Cuevas et al. 2009; Drake 1991; Eckerle et al. 2010), and more like that of some ssDNA viruses such as the bacterial virus ΦX174 (Eckerle et al. 2010; Sanjuán et al. 2010). However, when ExoN is inactivated, the mutation rate for SARS-CoV increases to around 1.26 × 10⁻⁵ s/n/r (Eckerle et al. 2010), which is closer to that of other RNA viruses and near the limit beyond which the virus population cannot tolerate further mutations and suffers an error catastrophe (Belshaw et al. 2011; Duffy et al. 2008; Sanjuán and Domingo-Calap 2019).
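The ratios implied by these figures can be checked with a few lines of arithmetic. The sketch below uses only the three rates quoted in the text; the 30 kb genome length is an assumption made for illustration (it lies within the 26–32 kb range given earlier), and the helper names are hypothetical.

```python
import math

# Mutation rates quoted in the text (substitutions/nucleotide/replication round)
EXON_ACTIVE = 9.06e-7      # SARS-CoV with functional ExoN
EXON_INACTIVE = 1.26e-5    # SARS-CoV with ExoN inactivated
OTHER_RNA_VIRUSES = 1e-4   # average quoted for other RNA viruses

# Assumption for illustration only: a 30 kb genome (within the 26-32 kb range)
GENOME_LENGTH = 30_000


def fold_difference(higher, lower):
    """How many times larger one rate is than another."""
    return higher / lower


def orders_of_magnitude(higher, lower):
    """The same comparison expressed on a log10 scale."""
    return math.log10(higher / lower)


def mutations_per_genome(rate, genome_length=GENOME_LENGTH):
    """Expected number of mutations per genome per replication round."""
    return rate * genome_length


print("ExoN inactive vs active (fold):",
      round(fold_difference(EXON_INACTIVE, EXON_ACTIVE), 1))
print("Other RNA viruses vs ExoN active (log10):",
      round(orders_of_magnitude(OTHER_RNA_VIRUSES, EXON_ACTIVE), 2))
print("Mutations/genome/round, ExoN active:",
      round(mutations_per_genome(EXON_ACTIVE), 3))
print("Mutations/genome/round, ExoN inactive:",
      round(mutations_per_genome(EXON_INACTIVE), 3))
```

Expressing the rates per genome rather than per nucleotide makes the cost of losing proofreading easier to see for a genome of this size.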

Knowledge of the rates of virus evolution allows the times of coalescence (or divergence) of taxa or lineages, and the date of their most recent common ancestor (tMRCA), to be inferred. Note that although all mutations contribute to the “molecular clock” of a population, most of the most recent mutations are quickly lost (Duchêne et al. 2014), so that the older a population is, the more diverse it will be, and the lower its average evolutionary rate will appear.

Short-term rates of evolution of the current populations have been estimated for the five human betacoronaviruses (Table 3.3). It has been estimated that the tMRCA for the hCoVs OC43 and HKU1 is around 120 and 70 years before present (ybp), respectively (Al-Khannaq et al. 2016; Forni et al. 2017; Vijgen et al. 2005), and the tMRCA for hCoVs SARS-CoV and MERS-CoV has been estimated to be around 1985–1998 and 2006, respectively (Forni et al. 2017).

Table 3.3 Evolutionary rate estimates of human betacoronaviruses

| Virus | Evolution rate (×10⁻³ s/s/y) | References |
|---|---|---|
| SARS-CoV-2 | 1.04 (0.71–1.40) | Virological website |
| SARS-CoV | 0.80–2.38 | Zhao et al. (2004) |
| MERS-CoV | 0.63 (0.14–1.1) | Cotten et al. (2013) |
| | 1.12 (0.88–1.37) | Cotten et al. (2014) |
| | 0.96 (0.83–1.09) | Dudas et al. (2018) |
| hCoV-OC43 | 0.43 (0.27–0.60) | Vijgen et al. (2005) |
| hCoV-HKU1 | 0.62 (0.42–0.78)ᵃ | Al-Khannaq et al. (2016) |

Prepared with data from http://virological.org/
ᵃ Only the S gene

Similarly, the tMRCA date for the divergence of SARS-CoV-2 and RaTG13 or RmYN02 was around 40–70 ybp and 37.02 ybp (18.19–55.85), respectively (Boni et al. 2020; Nielsen et al. 2020). Likewise, if all viruses of the SARS-2 lineage (Fig. 3.2b) are evolving at the same rate as that of the complete SARS-CoV-2 genome (1.04 × 10⁻³ s/s/y) (Table 3.3), one can estimate the dates of their divergences. The nonrecombinant 1-11502 nts region of the SARS-CoV-2 genome is evolving 9.2% faster than the complete genome. Thus, it has an evolutionary rate of 1.135 × 10⁻³ s/s/y. Therefore, using the pairwise evolutionary (i.e., patristic) distances between the SARS-2 lineage viruses, the nodes marked “a” to “e” in Fig. 3.2b are dated as 1996.8 CE (23.2 years before present; ybp), 1987.3 CE (32.7 ybp), 1945.1 CE (74.9 ybp), 1919.3 CE (100.7 ybp), and 1800 CE (220 ybp), respectively. The standard deviation of the branch length estimates was less than 2%, and the most recent estimates are probably the most accurate because, as explained above, all mutations contribute to the “molecular clock,” but most are quickly lost (Duchêne et al. 2014), and therefore times to the older dates are overestimated. However, it is likely that the two recombinant regions characteristic of the SARS-CoV-2, EPI_ISL_412977, and MN996532 sequences were acquired by their shared progenitor more than 20 years ago!
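The dating arithmetic in this paragraph can be sketched as follows. The sketch assumes that a patristic distance between two contemporaneous sequences spans both branches back to their shared node, so that node age = distance / (2 × rate); the text does not spell out this formula, so it should be read as one plausible interpretation. The example distance is hypothetical and is not a value reported in the source, and the reference year 2020.0 is inferred from the paired CE/ybp figures above.

```python
COMPLETE_GENOME_RATE = 1.04e-3  # s/s/y, complete SARS-CoV-2 genome (Table 3.3)
REGION_SPEEDUP = 0.092          # nts 1-11502 region evolves 9.2% faster
PRESENT = 2020.0                # inferred from 1996.8 CE = 23.2 ybp


def region_rate(base_rate=COMPLETE_GENOME_RATE, speedup=REGION_SPEEDUP):
    """Rate of the nonrecombinant region; about 1.136e-3 s/s/y,
    close to the 1.135e-3 quoted in the text."""
    return base_rate * (1.0 + speedup)


def node_age(patristic_distance, rate):
    """Years before present of the node joining two present-day tips.

    Assumption: the distance is the sum of both branches from the node,
    hence the factor of 2.
    """
    if rate <= 0:
        raise ValueError("rate must be positive")
    return patristic_distance / (2.0 * rate)


def node_date(patristic_distance, rate, present=PRESENT):
    """Calendar date (CE) of the node."""
    return present - node_age(patristic_distance, rate)


rate = region_rate()
example_distance = 0.05  # hypothetical patristic distance, substitutions/site
print(f"region rate = {rate:.4e} s/s/y")
print(f"node age    = {node_age(example_distance, rate):.1f} ybp")
print(f"node date   = {node_date(example_distance, rate):.1f} CE")
```

Because the age scales inversely with the rate, any bias in the assumed rate carries over proportionally to every node date.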

It is noteworthy that the tMRCA for all members of the subfamily Orthocoronavirinae is estimated to be around 300 million years ago, which is believed to coincide with the separation of the classes Mammalia and Aves (Forni et al. 2017). This means that the evolutionary rate of the coronaviruses since their origins is many orders of magnitude smaller than the evolutionary rate of their extant populations.

As already described, coronaviruses can undergo homologous recombination between viral genomes of the same species (intraspecific recombination) (Gorbalenya et al. 2020a; Lai 1996; Luk et al. 2019; Tao et al. 2017). Recombination between genomes of members of different coronavirus species (interspecific recombination) has been suggested to play an important role in the diversification of this family (Tao et al. 2017); note that the “minor parents” of recombinants “a” and “b” in Table 3.2 are in the SARS-1 lineage, but those of the more recent recombinants are all from the SARS-2 lineage. The frequency of recombination in betacoronaviruses has been determined to be 25% or more over the entire genome, the highest recombination frequency determined for non-segmented single-stranded positive-sense RNA viruses (Baric et al. 1990). Recombination is not just an experimental artifact and is frequent under natural conditions (Decaro et al. 2009; Herrewegh et al. 1998; Hon et al. 2008; Kahn and McIntosh 2005; Lau et al. 2010, 2011), and as a result, the recombination of the spike protein gene may be the crucial event that changes the host range of coronaviruses (El-Duah et al. 2019).

Coronaviruses with sequences closely similar to those of human isolates have been found in wild animals and may be a source of human isolates (Azhar et al. 2014; Corman et al. 2016; Guan et al. 2003; Tao et al. 2017). The coronavirus that is genetically most similar to SARS-CoV-2 is RaTG13 (96.1% nt identity) (Zhou et al. 2020), which was isolated from a bat. Recently, the RmYN02 coronavirus was also isolated from a bat, but even though it shares the highest nt identity in the nonstructural region (ORF 1ab), 97.2%, its RBD region has only 61.3% nt identity with the corresponding SARS-CoV-2 region (Zhou et al. 2020), as it is recombinant in that region (Fig. 3.3).
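Identity figures of this kind depend on which region of the alignment is compared, which is why a recombinant can be very close in one region and distant in another. The sketch below computes percent identity over a whole pairwise alignment or over a chosen slice; the function name, the toy sequences, and the choice to skip gapped columns are all assumptions made for illustration (conventions for handling gaps vary between tools).

```python
def percent_identity(seq_a, seq_b, start=0, end=None):
    """Percent of identical positions between two aligned sequences.

    start/end select a slice of alignment columns (0-based, end exclusive).
    Columns where either sequence has a gap ('-') are skipped; this is one
    convention among several.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [
        (a, b)
        for a, b in zip(seq_a[start:end].upper(), seq_b[start:end].upper())
        if a != "-" and b != "-"
    ]
    if not pairs:
        raise ValueError("no ungapped columns in the selected region")
    matches = sum(a == b for a, b in pairs)
    return 100.0 * matches / len(pairs)


# Toy alignment (hypothetical sequences, not real coronavirus data)
aligned_a = "ATGCCGTA-A"
aligned_b = "ATGACGTATA"

print(f"whole alignment: {percent_identity(aligned_a, aligned_b):.1f}%")
print(f"columns 0-2:     {percent_identity(aligned_a, aligned_b, 0, 3):.1f}%")
print(f"columns 3-5:     {percent_identity(aligned_a, aligned_b, 3, 6):.1f}%")
```

Scanning such a function along the genome in windows is one simple way to reveal regions where the closest relative changes, as expected at recombination breakpoints.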

The sequence of coronavirus Guangdong/1/2019, isolated from a Malayan pangolin (Manis javanica), has an nt identity of 91.02% with the whole SARS-CoV-2 genome, but interestingly its RBD region shares 97.4% aa identity with that of SARS-CoV-2, including six key residues in the site probably involved in the interaction between the S1 subunit of the spike protein and the cell receptor in humans (ACE2) (Andersen et al. 2020; Lam et al. 2020). The 12-nt insertion in the SARS-CoV-2 genome encoding -PRRA-, between the S1 and S2 subunits, makes it a site for cleavage by furin and other proteases, and it is not found in other sarbecoviruses (Andersen et al. 2020). This distinctive feature has contributed to conspiracy theories about the origin of SARS-CoV-2. The recombination history we report above for SARS-CoV-2 and its nearest relatives resolves those suggested by others (Lau et al. 2020; Xiao et al. 2020) and indicates that the most reliable comparison of full-length homologs is between the SARS-CoV-2 and RaTG13 sequences, and that the slightly closer comparison between the SARS-CoV-2 and RmYN02 sequences is only valid for nts 1-11502.

We, therefore, compared them directly (Fig. 3.4). It can be seen that there are more synonymous (S) than non-synonymous (NS) changes (van Dorp et al. 2020), but both are distributed genome-wide. As expected, most of the differences, especially NS ones, are in the spike protein gene, especially its RBD region and an adjacent “-PRRA-” insertion (Andersen et al. 2020), but there is also a region of NS differences around codon 1000, which seems to have evaded scrutiny so far. It is in the nsp3 region and named “DUF (domain of unknown function) 3655”; it is a “disordered binding region” (Prates et al. 2020) and is N-terminally adjacent to the ADP-ribose phosphatase. Most of the amino acid differences in that region are conservative, but notably, three involve proline and would therefore change the shape of the protein!

Fig. 3.4 A histogram that scans the S/NS differences between the SARS-CoV-2 and RaTG13 concat sequences using the DnDscan method (Gibbs et al. 2007). The sequences were scanned codon by codon, and their S/NS differences determined one nucleotide position at a time, before calculating sliding running sums for ten codons at each codon position. S and NS differences are in blue and orange, respectively, with their lengths indicating the score (NB the “PRRA” insert is of four NS differences)
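The codon-by-codon scan described in the Fig. 3.4 caption can be approximated with a short script. This is a simplified reading of the caption, not the DnDscan implementation: it assumes two gap-free, in-frame aligned coding sequences, classifies each differing nucleotide by substituting it alone into the first sequence's codon, and sums the counts over the ten codons starting at each position. The function names and the toy sequences are hypothetical.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def codon_differences(seq_a, seq_b):
    """Per-codon (S, NS) counts between two aligned coding sequences.

    Each differing nucleotide is tested one position at a time: it is placed
    into the codon from seq_a and scored as synonymous (S) if the amino acid
    is unchanged, otherwise non-synonymous (NS). Ambiguous bases or gaps are
    not handled and would raise a KeyError.
    """
    if len(seq_a) != len(seq_b) or len(seq_a) % 3:
        raise ValueError("sequences must be equal length and a multiple of 3")
    counts = []
    for i in range(0, len(seq_a), 3):
        codon_a = seq_a[i:i + 3].upper()
        codon_b = seq_b[i:i + 3].upper()
        s = ns = 0
        for pos in range(3):
            if codon_a[pos] != codon_b[pos]:
                mutated = codon_a[:pos] + codon_b[pos] + codon_a[pos + 1:]
                if CODON_TABLE[mutated] == CODON_TABLE[codon_a]:
                    s += 1
                else:
                    ns += 1
        counts.append((s, ns))
    return counts


def running_sums(counts, window=10):
    """Sum S and NS counts over `window` codons starting at each codon.

    Windows near the end are truncated (an assumption of this sketch).
    """
    sums = []
    for i in range(len(counts)):
        chunk = counts[i:i + window]
        sums.append((sum(s for s, _ in chunk), sum(n for _, n in chunk)))
    return sums


# Toy example: CCT->CCC is synonymous (Pro), AAA->AGA is Lys->Arg
counts = codon_differences("ATGCCTAAA", "ATGCCCAGA")
print(counts)
print(running_sums(counts, window=2))
```

Plotting the two running sums against codon position would give a histogram of the same general kind as Fig. 3.4, with clusters of NS differences standing out as peaks.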

Source: Book Coronavirus Disease - COVID-19 (pages 65-69)