We used a graph-based approach to identify novel variants in the MHC region within African null samples. Nicky, thanks especially for your patience during the many months (and even years) of me not working on my thesis and somehow managing to keep up with my updates.
The human immune response to pathogenic infection
- Key features of the human immune system
- How the ”world’s most successful pathogen” - Mtb - evades the human
- Human immune response to HIV infection and the strategies for escaping
- Effects of HIV and Mtb on the immunity to each other
- Treating individuals co-infected with HIV and TB
Other genes involved in RNI resistance in Mtbin include methionine sulfoxide reductase (msrA), nox1, andnox2 (Gupta et al., 2012). Early infection of T cells in the mucosa enables rapid viral expansion (due to the high replication rate of T cells) and transfer to dendritic cells (Gupta et al., 2002).
Human genetic variation and susceptibility to HIV and TB
- Variants contributing to TB susceptibility
- HLA variants contributing to HIV susceptibility and disease progression . 29
- Biases in read alignment and variant classification
- From a linear reference towards a reference graph
Dilthey et al.(2015) describe an approach to represent variation within a reference, known as a population reference graph (PRG), and tested the approach on the MHC region. A similar method to the PRG has been implemented to identify variation in the malaria parasite Plasmodium falciparum (Miles et al., 2015).
Human-pathogen protein interaction networks
Protein interactions and interaction networks
In addition, PPIs can be predicted using silico methods, such as sequence-based approaches, gene expression-based approaches, and structure-based approaches (Rao et al., 2014). PPINs are constructed as network graphs and as such concepts and algorithms from graph theory can be used to identify subnetworks of key proteins and interactions within the full network (Mulder et al., 2014).
Host-pathogen interactions
Interactions between humans and HIV protein are much better documented thanks to the HIV-1 Human Interaction Database, but even interactions in this database may need to be filtered to avoid false positive results (MacPherson et al., 2010), which have been reported in more articles shall be discussed. detail in chapter 2. In such networks, few proteins are essential for the survival of the system, so proteins with a high centrality measure can be considered as potential drug targets (Mulder et al., 2014).
Aims
Which human proteins are most likely to facilitate interactions between HIV-1 and Mtbproteins and what are the roles of these human proteins. Do human proteins likely to facilitate interactions between HIV-1 and Mtb proteins have variants in their respective gene sequences that could influence disease progression.
Thesis structure
Introduction
- Human-human PPIs
- HIV-1-human PPIs
- Mtb-human PPIs
- Drug-target interactions
- Network analysis of PPINs
- Interpreting proteins and PPIs using gene ontology enrichment analysis . 44
The network built by Huo et al. (2015) contains predicted interactions between Mtblab strain H37RV and human proteins. The interactions predicted by Rapanoel et al. (2013) were predicted for the Mtb clinical isolate CDC1551 (Oshkosh), by means of interologists.
Methods
- Human-human PPIN dataset
- Mtb-Mtb PPIN dataset
- Mtb-human PPIN dataset
- HIV-human PPIN dataset
- Drug-target interactions
- Human-pathogen PPIN dataset
- Network analysis
- Protein annotation
- Statistical methods for comparing network properties
- GO term enrichment using DAVID
- GO semantic similarity comparisons of interacting pairs of human and
The interactions of Reactome in the above network were updated with the latest version of the database (version 46). The residual pathogenicity bridge centrality was calculated using the degree function of NetworkXin aPythonscript and multiplied by betweenness_centrality_subset.
Results
Comparison of the network properties of MHC proteins with other
Instead of looking at the importance of MHC proteins versus non-MHC proteins in the network, the distance from MHC proteins to proteins found to be important for pathogen bridging was calculated. This suggests that although MHC proteins are not necessarily more important for pathogen association in the high-confidence network, proteins important for pathogen association are closer to MHC proteins.
Network properties of drug target proteins
Interactions with Mtb and HIV-1 proteins are shown by blue and green lines from the human proteins, respectively, while interactions between the 28 binding proteins are colored red. Interactions with Mtb proteins and HIV-1 proteins are shown by blue and green lines from human proteins, respectively, while interactions between the 28 binding proteins are colored red.
The distance between human proteins important in the PPIN and existing
Discussion
Drug interactions
This is consistent with the idea that higher network centrality implies greater biological relevance and can be used to narrow down potential drug targets (Mulder et al., 2014). The only protein in PPIN known to interact with an anti-TB drug and an anti-HIV drug was human serum albumin (HSA), which interacts with the first-line TB drug rifampicin and the HIV protease inhibitor saquinavir, as well as four of the 28 bridging proteins, namely FN1, CTSD, NFκB1 and GAPDH. HSA is the most prominent protein in plasma and is known for its excellent drug binding capacity (Bocedi et al., 2004).
Studies have shown that rifampicin reduces the plasma concentration of saquinavir and ritonavir (which is taken with saquinavir) in HIV and TB co-infected patients, resulting in an increased risk of virologic failure (Ribera et al., 2007).
MHC proteins interact with proteins important for pathogen bridging
Conclusion
- de Bruijn graphs and sequence graphs
- The linked de Bruijn graph
- Multi-coloured linked de Bruijn graphs
- Aims and hypotheses
Sequence graphs are an extension of de novoassembly methods, which use the de Bruijn graph approach to assemble next-generation sequencing reads (Turner et al., 2017). Most genome assemblers use de Bruijn graphs for assembly of reads and variant identification relative to a reference genome (Li et al., 2011). Turner et al.(2017) have extended the Cortexassembler developed by Iqbal et al.(2012) to use the linked de Bruijn graph method in their program McCortex, which can be used for.
This figure was adapted from Ayling et al. (2019) to illustrate the difference between Overlap Consensus Layout (OLC) and de Bruijn graph layout.
Methods
- Whole genome sequence and variant file collection
- Read extraction and preparation
- PCA of chromosome 6 variants
- De novo assembly into a coloured multi-sample linked de Bruijn graph . 97
This figure is adapted from Iqbal et al. (2012). a) When comparing a diploid sample (blue line) with a reference (red line), you can detect heterozygous alleles where both sides of the bubble have blue lines. At that time, high-coverage whole-genome sequences for Panel C of the SGDP dataset, which included an additional 45 sequences from African individuals, were not available (Prüfer et al., 2014). For each of the 33 individuals, chromosome 6 plots were constructed separately using the build command in McCortex.
Finally, variants were compared to dbSNP based on variant position (ignoring nucleotide variation).
Results
Chromosome 6 sequence graph construction
Identifying variation from the graph
Variants called from the plot were annotated using ANNOVAR and compared with variants originally called from the reference-based assembly. Almost twice as many variants were identified from the plot than were called from the reference assembly. Although more variants were identified in the plot, only 46.23% of the variants recalled from the reference assembly were re-identified in the plot.
In contrast, 2.5 times more variants were identified in the MHC region; however, less than a quarter of all MHC variants identified in the plot map to reference chromosome 6 (see Table 3.3).
Annotation of novel variants
-associated protein KIAA0319 is involved in neuronal migration during the development of the cerebral neocortex, and it functionally interacts with human proteins in the functional host-pathogen PPIN. In the functional host pathogen PPIN, it is a high quality protein that functionally interacts with human proteins. Proteasome assembly chaperone 4 is a chaperone protein that promotes the assembly of the 20S proteasome and functionally interacts with human proteins in the functional host pathogen PPIN.
Tudor domain-containing protein 6 is involved in spermiogenesis, chromatoid body formation, and proper expression of precursor and mature miRNAs and functionally interacts with human proteins in host-pathogen functional PPIN.
Discussion
Variant identification from a de novo assembled graph
Research has shown that agreement between standard reference-based variant calling methods is often low. Thus, it is not surprising that there was a low intersection between the graph-based variant calls and reference-based variant calls. Had we made the reference-based variant called ourselves, we could have included the alternative loci as well.
Instead, the graph-based method identified more than twice as many variants as the reference-based method.
The feasibility of reference-free sequence graphs
We expected fewer variants to be called from the plot than against the reference because of the ancestral similarity of the samples and their diversity from the reference. Because most of the identified variants could be identified in one of the genetic variation databases, it is possible that part of the increase can be explained by changes in the original method of reference-based variant calling. A large proportion of decontaminated unmapped reads were included in the graph (68%), indicating that although unmapped reads are often discarded, they may contain biologically relevant information.
More work needs to be done to determine the locations of the unmapped reads that are included in the plot.
Recommendations for future reference graphs
Methods included unmapped reads as well as decontamination steps, similar to Sherman et al. (2019). In our case, the 68% of unmapped reads were filtered to include k-mers that matched chromosome 6 reads, but the overall percentage may still be an overestimate. It is possible that the unmapped reads map to multiple locations on the genome and were unmapped because of that.
For example, instead of coloring the sequences by individual, we could have used additional colors for unmapped reads so that they could be more easily distinguished.
Conclusion
- Graph databases for biological data integration
- Prioritising disease-associated genes by integrating gene expression and
- Analysing the impact of variants by integrating genetic variation, gene
- Aim and hypotheses
Incorporate known variation that has been called against the existing linear reference (Rakocevic et al., 2019). Furthermore, analyzing variants with gene expression data has been shown to be advantageous, as highly differentially expressed genes have been shown to be more likely to have variants associated with disease (Chen et al., 2008). Gene expression data from a case-control study of a cohort of HIV-TB co-infected and latent TB-infected African adults from Malawi and South Africa (Kaforou et al., 2013) were used to identify significantly differentially expressed genes that could be used as seeds to.
Genes that are highly differentially expressed have been found to be more likely to have disease-associated variants (Chen et al., 2008).
Methods
- Analysis of gene expression data
- Prioritisation of disease-associated genes by integrating gene expression
- Variant data analysis
- Integrating the data using a graph database
Each message is scored as the score of the node that sent the message plus the road weight. In this algorithm, the node score is the average of the normalized scores for each of the other three methods. In addition, nodes were annotated with the priority scores from each of the algorithms used.
Edges were labeled with the results of the interaction and any details about the source of the interaction.
Results
- Network importance of differentially expressed genes
- Gene prioritisation based on differential expression
- Known variants associated with HIV and Mtb
- Variants in prioritised proteins
- Properties of MHC proteins with novel variants identified from the
This tabular plot shows the log-fold change in expression of the 28 bridge proteins under three disease states compared to latent TB infection (HIV-TB co-infection, HIV infection and TB infection). Of the 145 MHC proteins in the network, 125 were assessed for differential expression, of which 46 were found to be significantly differentially expressed during co-infection. In addition, 22 888 of the potentially harmful or deleterious variants (37.37%) had a significantly higher frequency in an African population.
Of the deleterious or deleterious variants within bridge proteins were common in African populations and had significantly higher frequency in an African population.
Discussion
Differentially expressed genes have higher network importance
The human proteins (orange circles) are bridge proteins that are DE during co-infection with HIV and tuberculosis and monoinfection of the two diseases (NAP1L1, CDC42 and PRKCH). In addition, one third of the proteins containing clinically associated variants were differentially expressed during co-infection with HIV and tuberculosis. One of the bridge proteins, Protein Kinase R (EIF2AK2), was significantly differentially expressed during co-infection with HIV and tuberculosis and had a high priority score.
Subcellular localization of the interaction between the human immunodeficiency virus transactivator Tat and nucleosome assembly protein 1. Induction of apoptosis by binding of the carboxyl terminus of human immunodeficiency virus type 1 gp160 to calmodulin. The myristoyl moiety of HIV Nef is involved in regulating the interaction with calmodulin in vivo.
The usefulness of retaining link information when traversing de Bruijn graphs
Analysing variants from bubbles in de Bruijn graphs
Geographical distribution of the individuals whose genomic data were included 93
Distance matrix illustrating the proportion of k -mers shared between the 33
PCA of chromosome 6 variants genotyped in 32 African individuals
Comparison of the PCA distribution and geographical location
Venn diagram of how the novel variants were annotated
Log fold change in gene expression levels of the bridge proteins during HIV-TB
Visualisation of PPIs, drug-target interactions and variants
Visualisation of MHC proteins, with their variants and interactions
HLA-DOB was differentially expressed during HIV-TB co-infection and was in the 90th percentile for This chapter aimed to integrate gene expression, genetic variation, and host-pathogen PPIs to facilitate analysis of the potential impact of variation on HIV-TB co-infection. Genetic restriction of HIV-1 infection and progression to AIDS by a deletion allele of the CKR5 structural gene.
Ezrin is part of the HIV-1 virological presynapse and contributes to the inhibition of cell-cell fusion. HIV-1 Tat regulates endothelial cell cycle progression through activation of the Ras/ERK MAPK signaling pathway. Initiation of the adaptive immune response to Mycobacterium tuberculosis depends on antigen production in the local lymph nodes, not the lungs.
Host-pathogen PPIs for bridge proteins that are DE during mono- and co-infection.142
Visualisation of a variant in HLA-DOB in the context of the host-pathogen
Spearman’s rank correlation between scores of gene prioritisation algorithms
Prioritisation scores and ranking of the bridge proteins
Top ten ranking proteins across each of the four prioritisation algorithms
Wilcoxon rank-sum test comparing prioritisation measures for MHC and non-
Wilcoxon rank-sum test comparing network properties of proteins mapped to
Description of variants within prioritised proteins
Novel MHC variants that were mapped to the PPIN
Geographical distribution, sex, and sequence coverage of the individuals whose
Anti-TB drug-target interactions included in the analysis
Anti-HIV drug-target interactions included in the analysis
Read length and basic statistics for the sample sequences
Processing times for the multi-coloured de Bruijn graph
List of properties stored against nodes and relationships in the Neo4j graph
MHC proteins and their expression values and prioritisation scores
Description of interactions with and variants found in MHC proteins that have