We Know About the Characteristics of Viruses?
2.4 Molecular Features of SARS-CoV-2 Genome and Proteins
2.4.1 Noncoding Genes
The 5′ UTR is approximately 265 nt long and contains cis-acting sequences necessary for transcription and replication; it is also involved in the initiation of cap-dependent translation (Liu et al. 2007). Its secondary structure consists of five stem-loops (SL1-SL5) and is relatively conserved among betacoronaviruses (Masters and Perlman 2013; Liu et al. 2007; Rangan et al. 2020a, b). SL3 contains a short transcription regulatory sequence (TRS), which is located immediately adjacent to an ORF. The SARS-CoV-2 genome contains nine TRSs, each located upstream of a different ORF (Fig. 2.3). The TRS of SARS-CoV-2 is a conserved sequence (5′-ACGAAC-3′), consistent with that of SARS-CoV (Masters and Perlman 2013). SL5 contains the AUG start codon of ORF1ab (Rangan et al. 2020b).
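Locating a short conserved element such as the TRS core in a nucleotide sequence amounts to a plain substring search. The following is a minimal sketch of that idea; the function name `find_motif` and the toy fragment are illustrative assumptions, and the fragment is not real genome data.

```python
def find_motif(sequence: str, motif: str) -> list[int]:
    """Return 0-based start positions of every (possibly overlapping) motif hit."""
    # Treat RNA and DNA spellings alike: upper-case and map U to T.
    seq = sequence.upper().replace("U", "T")
    pat = motif.upper().replace("U", "T")
    hits = []
    start = seq.find(pat)
    while start != -1:
        hits.append(start)
        start = seq.find(pat, start + 1)
    return hits


TRS_CORE = "ACGAAC"  # core sequence quoted in the text

# Hypothetical toy fragment used only for illustration.
toy_fragment = "AAACGAACTTACGAACG"
trs_positions = find_motif(toy_fragment, TRS_CORE)
print(trs_positions)
```

Applied to a full reference sequence, the same search would only list candidate positions; whether a given hit acts as a functional TRS depends on its context upstream of an ORF.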
The 3′ UTR is approximately 229 nt long, possesses a polyadenylate tail, and is considered important for virus replication and potentially translation (Rangan et al. 2020b; Taiaroa et al. 2020). The 3′ UTR includes a switch-like domain (mutually exclusive formation of a pseudoknot and a stem-loop) and a hypervariable region (HVR) (Rangan et al. 2020a, b). The switch-like domain is essential, while the HVR is dispensable for viral replication (Rangan et al. 2020b). The HVR contains a conserved octanucleotide sequence (5′-GGAAGAGC-3′) and a conserved subregion, the stem-loop II-like motif (s2m) (Rangan et al. 2020b; Masters and Perlman 2013).
Fig. 2.2 SARS-CoV-2 structure and spike protein. (a) Diagram showing the RNA genome and the four major structural proteins: spike protein (S), envelope protein (E), membrane protein (M), and nucleocapsid protein (N). (b) Schematic of the trimeric spike protein structure. (c) Linear representation of the monomeric spike protein
Fig. 2.3 Genome organization and protein products of SARS-CoV-2. A schematic of the complete genome of SARS-CoV-2 (NC_045512.2) is shown in the middle. Replicase genes and products are presented at the bottom. Other genes from the seven human coronaviruses are placed on the top panel: OC43 (NC_006213.1), HKU1 (NC_006577.2), NL63 (NC_005831.2), 229E (NC_002645.1), MERS-CoV (NC_019843.3), SARS-CoV (NC_004718.3), and SARS-CoV-2 (NC_045512.2). The RNA structure of the frameshift motif was calculated using the UNAFold web server
2.4.2 Nonstructural Proteins
The nsp1 gene encodes a 19.78 kDa protein of approximately 180 amino acids. By binding to 40S and 80S ribosomes and blocking the mRNA entry channel, nsp1 can shut down the translation of capped mRNAs, including mRNAs coding for antiviral defense factors (Thoms et al. 2020). The C-terminal region of nsp1 is crucial for ribosome binding and translation suppression, as in SARS-CoV (Kamitani et al. 2009): mutations (K164A/H165A) in a C-terminal motif abrogate its binding to the ribosomal subunit (Thoms et al. 2020). This mechanism enables nsp1 to facilitate immune evasion by blocking the type I interferon innate immune response (Thoms et al. 2020). Besides, nsp1 shares 84.44% identity with that of SARS-CoV at the protein level (Table 2.1).
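The molecular sizes quoted in this section can be sanity-checked against the common rule of thumb that an average amino acid residue contributes roughly 110 Da to an unmodified protein. The sketch below uses only residue counts and sizes stated in this section; the helper name `mean_residue_mass_da` is an illustrative assumption.

```python
# (residues, size in kDa) pairs as quoted in this section.
proteins = {
    "nsp1": (180, 19.78),
    "nsp3": (1945, 217.28),
    "nsp4": (500, 56.19),
    "nsp5": (306, 33.8),
    "nsp12": (932, 106.67),
}


def mean_residue_mass_da(residues: int, size_kda: float) -> float:
    """Average mass per residue in daltons."""
    return size_kda * 1000.0 / residues


per_residue = {
    name: mean_residue_mass_da(n, kda) for name, (n, kda) in proteins.items()
}
for name, mass in per_residue.items():
    print(f"{name}: {mass:.1f} Da per residue")
```

Values that cluster near 110 Da suggest that the listed residue counts and sizes agree with each other; a large deviation would point to a transcription error in one of the two columns.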
The nsp2 gene encodes a 638-amino-acid protein that shares 68.34% identity with SARS-CoV nsp2 at the protein level. The function of nsp2 has not yet been investigated in SARS-CoV-2 but has been reported in SARS-CoV: deletion of the nsp2 gene from the SARS-CoV genome attenuates viral growth and RNA synthesis (Graham et al. 2005), which indicates that nsp2 is dispensable for viral replication and suggests a new avenue for live attenuated vaccine design.
The nsp3 gene encodes a 217.28 kDa multi-transmembrane protein of 1945 amino acids, the largest protein of the replicase-transcriptase complex (RTC) (Angeletti et al. 2020; Wu et al. 2020a). It contains multiple tandem domains that perform distinct functions: the macrodomain (Mac), SARS-unique domain (SUD), papain-like proteinase domain (PLpro2), nucleic acid-binding domain (NAB), transmembrane domains, and Y domain (Alhammad et al. 2020; Frick et al. 2020; Rut et al. 2020b). Mac contains three tandem domains: Mac1, Mac2, and Mac3. Mac1 exists in all coronaviruses and was previously known as ADP-ribose-1″-phosphatase (ADRP) (Egloff et al. 2006; Putics et al. 2005; Saikatendu et al. 2005). Mac1 functions as a highly efficient mono-adenosine diphosphate (ADP)-ribosylhydrolase, which binds ADP-ribose and hydrolyzes single mono-ADP-ribose units (a posttranslational modification) from protein substrates (Alhammad et al. 2020; Frick et al. 2020).
Besides, a bioinformatic analysis suggested that Mac may remove mono-ADP-ribosylation from STAT1, which may be related to the cytokine storm observed in COVID-19 (Claverie 2020). PLpro2 acts as a cysteine protease that recognizes the LXGG motif in polyproteins 1a and 1ab at the junctions between nsp1 and nsp2, nsp2 and nsp3, and nsp3 and nsp4 (nsp1/nsp2, nsp2/nsp3, nsp3/nsp4) (Rut et al. 2020a). PLpro2 cleaves precisely after this motif to release nsp1, nsp2, and nsp3 from the polyproteins (Rut et al. 2020a). PLpro2 also harbors deISGylating activity that contributes to immune evasion, similar to SARS-CoV and MERS-CoV (Clasman et al. 2020; Ratia et al. 2014; Mielech et al. 2014). ISGylation is the conjugation of ISG15 to target proteins, which mediates the antiviral response and can be reversed by deISGylation. Through this mechanism, PLpro2 blocks interferon-responsive factor 3 (IRF3) nuclear translocation and thereby attenuates type I interferon responses, which makes PLpro2 a potential antiviral target (Shin et al. 2020). SARS-CoV PLpro2 additionally hydrolyzes K48-linked Ub chains to mediate immune escape, an activity that SARS-CoV-2 PLpro2 does not possess (Rut et al. 2020a; Shin et al. 2020). The functions of the other domains remain unclear and need further investigation.

Table 2.1 SARS-CoV-2 (NC_045512.2) protein features
Protein   Residues (amino acids)   Molecular size (kDa, without modifications)   Similarity with SARS-CoV (protein sequence level)
nsp1      180                      19.78                                         84.44%
nsp2      638                      70.52                                         68.34%
nsp3      1945                     217.28                                        75.97%
nsp4      500                      56.19                                         80.00%
nsp5      306                      33.8                                          96.08%
nsp6      290                      33.04                                         88.15%
nsp7      83                       9.24                                          98.80%
nsp8      198                      21.89                                         97.47%
nsp9      113                      12.38                                         97.35%
nsp10     139                      14.79                                         97.12%
nsp11     13                       1.33                                          84.62%
nsp12     932                      106.67                                        96.35%
nsp13     601                      66.86                                         99.83%
nsp14     527                      59.82                                         95.07%
nsp15     346                      38.82                                         88.73%
nsp16     298                      33.33                                         93.29%
S         1273                     141.2                                         75.96%
E         75                       8.37                                          94.74%
M         222                      25.15                                         90.54%
N         419                      45.64                                         90.52%
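The rule that the papain-like protease cuts immediately after an LXGG motif can be illustrated as a pattern match on a one-letter amino acid string. This is a minimal sketch under stated assumptions: the toy sequence is hypothetical and is not a real polyprotein, and the function name `cleave_after_lxgg` is introduced here only for illustration. Real substrate recognition also depends on structural context that a pattern match cannot capture.

```python
import re

# Recognition motif from the text: Leu, any residue, Gly, Gly.
LXGG = re.compile(r"L.GG")


def cleave_after_lxgg(polyprotein: str) -> list[str]:
    """Split a one-letter-code sequence immediately after each LXGG match."""
    fragments = []
    start = 0
    for match in LXGG.finditer(polyprotein):
        fragments.append(polyprotein[start:match.end()])
        start = match.end()
    fragments.append(polyprotein[start:])  # remainder after the last cut
    return fragments


# Hypothetical toy sequence containing three LXGG sites.
toy = "MESLNGGAYTKLKGGAPTKALKGGKIV"
fragments = cleave_after_lxgg(toy)
print(fragments)
```

Three cuts yield four fragments, mirroring how cleavage at the nsp1/nsp2, nsp2/nsp3, and nsp3/nsp4 junctions releases nsp1, nsp2, and nsp3 and leaves the remainder of the polyprotein.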
Nsp4 is a 56.19 kDa protein that consists of 500 amino acids. It contains multiple transmembrane helices that anchor it in intracellular membranes (Wu et al. 2020a). SARS-CoV-2 nsp4 is 80% identical to SARS-CoV nsp4 at the amino acid level. The function of nsp4 has not yet been investigated in SARS-CoV-2 but has been reported in SARS-CoV: binding of nsp4 to nsp3 is essential but not sufficient for the membrane rearrangement that is crucial for RNA replication (Sakai et al. 2017).
The nsp5 gene encodes a 33.8 kDa protein consisting of 306 amino acids. This protein is a conserved 3-chymotrypsin-like protease (3CLpro) that shares 96.08% identity between SARS-CoV and SARS-CoV-2 at the amino acid level (Tahir ul Qamar et al. 2020; Kneller et al. 2020). 3CLpro is the main protease (Mpro); it cleaves polyprotein 1a and polyprotein 1ab at 11 distinct sites to produce 13 nonstructural proteins, nsp4–nsp16 (Kneller et al. 2020; Zhang et al. 2020). Mpro consists of three domains: I, II, and III (Kneller et al. 2020; Zhang et al. 2020; Jin et al. 2020a). Domains I and II (residues 8–184) are the catalytic domains, and domain III (residues 201–303) mediates the dimerization of nsp5, which is a prerequisite for catalytic activity (Zhang et al. 2020; Kneller et al. 2020). Given its importance to viral replication, nsp5 is an attractive drug target (Jin et al. 2020a, b; Zhang et al. 2020; Tahir ul Qamar et al. 2020; Elfiky et al. 2020).
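The statement that 11 cleavage sites give rise to 13 proteins (nsp4–nsp16) can look inconsistent at first sight. One way to reconcile the two numbers is sketched below. It rests on an assumption that this section does not spell out: polyprotein 1a ends with nsp11, whereas polyprotein 1ab continues from nsp10 directly into nsp12, so the nsp10/nsp11 and nsp10/nsp12 junctions share the same site at the C-terminus of nsp10. The variable names are illustrative.

```python
# Assumed product order (not spelled out in the text): pp1a ends with nsp11,
# while pp1ab runs from nsp10 straight into nsp12.
pp1a = [f"nsp{i}" for i in range(4, 12)]               # nsp4 .. nsp11
pp1ab = [f"nsp{i}" for i in range(4, 17) if i != 11]   # nsp4 .. nsp10, nsp12 .. nsp16


def junctions(order):
    """Adjacent (upstream, downstream) pairs in a polyprotein."""
    return list(zip(order, order[1:]))


all_junctions = set(junctions(pp1a)) | set(junctions(pp1ab))

# A site is identified here by the protein on its upstream side, so
# nsp10/nsp11 and nsp10/nsp12 are counted as a single site.
distinct_sites = {upstream for upstream, _ in all_junctions}
products = set(pp1a) | set(pp1ab)

print(len(distinct_sites), len(products))
```

The nsp3/nsp4 junction is left out of the count because, as described for nsp3, that cut is made by the papain-like protease rather than by Mpro.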
Nsp6 is a 33.04 kDa protein that consists of 290 amino acids. It contains multiple transmembrane domains and, together with nsp3 and nsp4, positions the RTC in the intracellular membrane (Benvenuto et al. 2020; Wu et al. 2020a). SARS-CoV-2 nsp6 is 88.15% identical to SARS-CoV nsp6 at the amino acid level, and its function needs further investigation.
Nsp7 is a 9.24 kDa protein that consists of 83 amino acids. Nsp8 is a 21.89 kDa protein that consists of 198 amino acids. Nsp7 and nsp8 form the accessory subunits of the RNA-dependent RNA polymerase (RdRp) (Peng et al. 2020; Hillen et al. 2020; Romano et al. 2020). An in vitro polymerase assay showed that the absence of either accessory subunit makes replication impossible (Peng et al. 2020). Besides, nsp7 and nsp8 are conserved between SARS-CoV-2 and SARS-CoV, with 98.8% and 97.47% identity at the amino acid level, respectively.
Nsp9 is a 12.38 kDa protein that consists of 113 amino acids. Nsp9 takes part in viral genome replication by binding single-stranded RNA (Littler et al. 2020). Notably, it interacts with NKRF (NF-κB repressor) and facilitates both IL-8 and IL-6 induction, which is involved in cytokine storm syndromes and elevated mortality in COVID-19 patients (Mehta et al. 2020; Li et al. 2020; Wang et al. 2020a). Besides, SARS-CoV-2 and SARS-CoV nsp9 share 97.35% protein sequence identity and a similar crystal structure (Littler et al. 2020).
Nsp10 is a 14.79 kDa protein that consists of 139 amino acids. It is a zinc finger protein that can bind RNA nonspecifically and interacts with nsp16 as its cofactor (Rosas-Lemus et al. 2020; Li et al. 2020; Krafcikova et al. 2020). Nsp10 also interacts with nsp14 as its cofactor. Notably, nsp10 elevates both IL-8 and IL-6 production in lung epithelial A549 cells, which suggests that, like nsp9, it is a potential virulence factor in hyperinflammation (Li et al. 2020). Besides, nsp10 is conserved between SARS-CoV-2 and SARS-CoV, with 97.12% identity at the protein sequence level. Located immediately after nsp10, nsp11 is a remnant peptide left by polyprotein 1a cleavage. It is a 1.33 kDa peptide that consists of 13 amino acids and shares 84.62% identity with that of SARS-CoV at the protein sequence level.
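The identity percentages quoted in this section are consistent with a simple ratio of identical positions to sequence length. For example, 84.62% for the 13-residue nsp11 corresponds to 11 of 13 identical positions. The sketch below assumes a gap-free alignment over the full length, which the source does not state, and the two peptides are hypothetical strings rather than real nsp11 sequences.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of identical positions between two pre-aligned, equal-length sequences."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be aligned to the same length")
    if not aligned_a:
        raise ValueError("sequences must not be empty")
    matches = sum(a == b for a, b in zip(aligned_a, aligned_b))
    return 100.0 * matches / len(aligned_a)


# Hypothetical 13-residue peptides that differ at two positions.
toy_a = "ACDEFGHIKLMNP"
toy_b = "ACDEYGHIKLMQP"
identity = percent_identity(toy_a, toy_b)
print(f"{identity:.2f}%")
```

For longer proteins the sequences would first have to be aligned with a dedicated tool, and the treatment of gaps in the denominator can shift the reported percentage slightly.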
Nsp12 is a 106.67 kDa protein that consists of 932 amino acids. It is the core catalytic subunit of the polymerase complex, which also requires an nsp7-nsp8 heterodimer and an additional nsp8 subunit (Peng et al. 2020; Hillen et al. 2020). Nsp12, nsp7, and nsp8 are all crucial for polymerase activity because the absence of any one of them inactivates the enzyme (Peng et al. 2020). Nsp12 contains an N-terminal nidovirus RdRp-associated nucleotidyltransferase (NiRAN) domain and a C-terminal RdRp domain, with an interface domain between them (Peng et al. 2020). The RdRp activity can be terminated in vitro by tenofovir and emtricitabine (two FDA-approved HIV drugs for pre-exposure prophylaxis, PrEP), which may therefore be candidates for PrEP against COVID-19 (Copertino Jr. et al. 2020; Jockusch et al. 2020). Besides, nsp12 is 96.35% identical to that of SARS-CoV at the protein sequence level.
Nsp13 is a 66.86 kDa protein that consists of 601 amino acids. Nsp13 is the most conserved protein between SARS-CoV-2 and SARS-CoV, with 99.83% identity at the amino acid level. Nsp13 of both viruses is a multifunctional protein comprising an N-terminal Zn2+-binding domain (ZBD) and a C-terminal helicase domain (Mirza and Froeyen 2020; Romano et al. 2020). SARS-CoV nsp13 can unwind duplex RNA or DNA with 5′ to 3′ directionality in an NTP-dependent manner, and this catalytic efficiency can be enhanced twofold by nsp12 (Adedeji et al. 2012a). It also has 5′-triphosphatase activity, which is required for the first step of 5′ cap synthesis (Adedeji et al. 2012b; Ivanov et al. 2004; Tanner et al. 2003). Whether SARS-CoV-2 nsp13 has the same functions needs further investigation and confirmation.
Nsp14 is a 59.82 kDa protein that consists of 527 amino acids. Nsp14 is a conserved protein with 95.07% identity between SARS-CoV-2 and SARS-CoV at the amino acid level. Nsp14 of each virus is a bifunctional enzyme with an N-terminal 3′-5′ exonuclease (ExoN) domain and a C-terminal guanine-N7 methyltransferase (N7-MTase) domain (Minskaia et al. 2006; Romano et al. 2020). ExoN can proofread the elongating RNA and excise mismatched bases (Ogando et al. 2019; Eckerle et al. 2010). N7-MTase is an S-adenosylmethionine (SAM)-dependent methyltransferase responsible for gRNA and sgRNA m7GpppA cap synthesis (Krafcikova et al. 2020). A SARS-CoV study showed that the nsp10 cofactor enhances the ExoN activity of nsp14 (>35-fold) without influencing its N7-MTase activity (Bouvet et al. 2012). Yeast two-hybrid analysis provides evidence for a SARS-CoV-2 nsp14-nsp10 interaction. However, the function of this complex needs further investigation (Li et al. 2020).
Nsp15 is a 38.82 kDa protein that consists of 346 amino acids. It is an endonuclease (NendoU) composed of an N-terminal oligomerization domain, a middle domain, and a C-terminal NendoU catalytic domain (Kim et al. 2020b). NendoU activity is Mn2+-dependent; the enzyme cleaves single-stranded RNA substrates specifically downstream of uridylate residues and releases 2′-3′ cyclic phosphates (Kim et al. 2020b; Bhardwaj et al. 2006). Also, the nsp15 protein sequence is relatively conserved, with 88.73% identity between SARS-CoV-2 and SARS-CoV.
Nsp16 is a 33.33 kDa protein that consists of 298 amino acids. It is an m7GpppA-specific, SAM-dependent 2′-O-MTase that is active only with the help of its cofactor nsp10 (Decroly et al. 2011; Krafcikova et al. 2020). The nsp16 protein sequence is conserved, with 93.29% identity between SARS-CoV-2 and SARS-CoV. SARS-CoV nsp10/nsp16 MTase activity requires Mg2+ (Bouvet et al. 2010), and the divalent cation is present in SARS-CoV-2 nsp16, outside the active site, with an undefined function (Decroly et al. 2011).
2.5 Structural Proteins