Investigation of HIV-TB co-infection through analysis of the potential impact of host genetic variation on host-pathogen protein interactions

We used a graph-based approach to identify novel variants in the MHC region within African samples. Nicky, thanks especially for your patience during the many months (and even years) when I was not working on my thesis, and for somehow managing to keep up with my updates.

The human immune response to pathogenic infection

Key features of the human immune system
How the “world’s most successful pathogen” - Mtb - evades the human
Human immune response to HIV infection and the strategies for escaping
Effects of HIV and Mtb on the immunity to each other
Treating individuals co-infected with HIV and TB

Other genes involved in RNI resistance in Mtb include methionine sulfoxide reductase (msrA), nox1, and nox2 (Gupta et al., 2012). Early infection of T cells in the mucosa enables rapid viral expansion (due to the high replication rate of T cells) and transfer to dendritic cells (Gupta et al., 2002).

Human genetic variation and susceptibility to HIV and TB

Variants contributing to TB susceptibility
HLA variants contributing to HIV susceptibility and disease progression
Biases in read alignment and variant classification
From a linear reference towards a reference graph

Dilthey et al. (2015) describe an approach to represent variation within a reference, known as a population reference graph (PRG), and tested the approach on the MHC region. A method similar to the PRG has been implemented to identify variation in the malaria parasite Plasmodium falciparum (Miles et al., 2015).

Human-pathogen protein interaction networks

Protein interactions and interaction networks

In addition, PPIs can be predicted using in silico methods, such as sequence-based, gene expression-based, and structure-based approaches (Rao et al., 2014). PPINs are constructed as network graphs and, as such, concepts and algorithms from graph theory can be used to identify subnetworks of key proteins and interactions within the full network (Mulder et al., 2014).

Host-pathogen interactions

Interactions between human and HIV proteins are much better documented thanks to the HIV-1 Human Interaction Database, but even interactions in this database may need to be filtered, for example by favouring those that have been reported in more articles, to avoid false-positive results (MacPherson et al., 2010). This is discussed in more detail in chapter 2. In such networks, few proteins are essential for the survival of the system, so proteins with a high centrality measure can be considered as potential drug targets (Mulder et al., 2014).

Aims

Which human proteins are most likely to facilitate interactions between HIV-1 and Mtb proteins, and what are the roles of these human proteins? Do human proteins likely to facilitate interactions between HIV-1 and Mtb proteins have variants in their respective gene sequences that could influence disease progression?

Thesis structure

Introduction

Human-human PPIs
HIV-1-human PPIs
Mtb-human PPIs
Drug-target interactions
Network analysis of PPINs
Interpreting proteins and PPIs using gene ontology enrichment analysis

The network built by Huo et al. (2015) contains predicted interactions between the Mtb lab strain H37Rv and human proteins. The interactions from Rapanoel et al. (2013) were predicted for the Mtb clinical isolate CDC1551 (Oshkosh) by means of interologs.

Methods

Human-human PPIN dataset
Mtb-Mtb PPIN dataset
Mtb-human PPIN dataset
HIV-human PPIN dataset
Drug-target interactions
Human-pathogen PPIN dataset
Network analysis
Protein annotation
Statistical methods for comparing network properties
GO term enrichment using DAVID
GO semantic similarity comparisons of interacting pairs of human and

The Reactome interactions in the above network were updated using the latest version of the database (version 46). The residual pathogenicity bridge centrality was calculated in a Python script by taking the degree of each node, using the degree function of NetworkX, and multiplying it by the result of betweenness_centrality_subset.
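The calculation described above, degree multiplied by a betweenness restricted to chosen source and target sets, can be sketched as follows. This is a minimal standard-library illustration, not the thesis script: the toy network, the protein labels, and the choice of source and target sets are assumptions, and the NetworkX function named in the text may scale its values differently (for example for undirected graphs).

```python
from collections import deque
from itertools import product


def shortest_path_counts(adj, source):
    """BFS from source: distance and number of shortest paths to each reachable node."""
    dist = {source: 0}
    sigma = {source: 1}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                sigma[v] = 0
                queue.append(v)
            if dist[v] == dist[u] + 1:
                sigma[v] += sigma[u]
    return dist, sigma


def betweenness_subset(adj, sources, targets):
    """Fraction of source-to-target shortest paths passing through each node."""
    bfs = {n: shortest_path_counts(adj, n) for n in set(sources) | set(targets)}
    score = {n: 0.0 for n in adj}
    for s, t in product(sources, targets):
        dist_s, sigma_s = bfs[s]
        dist_t, sigma_t = bfs[t]
        if s == t or t not in dist_s:
            continue
        for v in adj:
            if v in (s, t) or v not in dist_s or v not in dist_t:
                continue
            # v lies on a shortest s-t path when the two distances add up exactly.
            if dist_s[v] + dist_t[v] == dist_s[t]:
                score[v] += sigma_s[v] * sigma_t[v] / sigma_s[t]
    return score


def bridging_scores(edges, sources, targets):
    """Degree of each node multiplied by its subset betweenness."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    between = betweenness_subset(adj, sources, targets)
    return {n: len(adj[n]) * between[n] for n in adj}


# Hypothetical toy network: "A" stands for a human protein that interacts with
# HIV-1 and "D" for one that interacts with Mtb; B, C and E are other human proteins.
toy_edges = [("A", "B"), ("B", "D"), ("A", "C"), ("C", "D"), ("B", "E")]
scores = bridging_scores(toy_edges, sources=["A"], targets=["D"])
```

In this toy network B and C each carry half of the shortest paths between A and D, so the degree factor is what separates them.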

Table 2.1 Scoring of the protein interaction data sources used by Bossi and Lehner (2009)

Results

Comparison of the network properties of MHC proteins with other

Instead of looking at the importance of MHC proteins versus non-MHC proteins in the network, the distance from MHC proteins to proteins found to be important for pathogen bridging was calculated. This suggests that although MHC proteins are not necessarily more important for pathogen association in the high-confidence network, proteins important for pathogen association are closer to MHC proteins.
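One way to compute such distances is a multi-source breadth-first search that starts from all bridging proteins at once and records the number of hops to every other node. The sketch below is illustrative only: the adjacency list and the protein labels are hypothetical, and the thesis does not state which shortest-path routine was actually used.

```python
from collections import deque


def distance_to_nearest(adj, targets):
    """Hops from every reachable node to its closest node in targets."""
    dist = {t: 0 for t in targets}
    queue = deque(targets)
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


# Hypothetical undirected network stored as a symmetric adjacency list.
toy_adj = {
    "BRIDGE1": {"P1", "MHC2"},
    "P1": {"BRIDGE1", "MHC1", "P2"},
    "MHC1": {"P1"},
    "MHC2": {"BRIDGE1"},
    "P2": {"P1", "OTHER"},
    "OTHER": {"P2"},
}
hops = distance_to_nearest(toy_adj, ["BRIDGE1"])
mhc_hops = {name: d for name, d in hops.items() if name.startswith("MHC")}
```

The resulting per-protein distances for MHC and non-MHC proteins can then be compared as two samples.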

Table 2.4 Wilcoxon rank-sum test comparing network properties of MHC and non-MHC proteins.
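For readers unfamiliar with the test named in this table, the following is a minimal sketch of a two-sided Wilcoxon rank-sum test using the normal approximation without a tie correction. It is not the statistical code used in the thesis, and the example values are invented purely for illustration.

```python
import math


def average_ranks(values):
    """Rank values from 1, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for idx in order[i:j + 1]:
            ranks[idx] = mean_rank
        i = j + 1
    return ranks


def rank_sum_test(x, y):
    """Return (z, two-sided p) for the rank sum of x within the pooled sample."""
    n1, n2 = len(x), len(y)
    ranks = average_ranks(list(x) + list(y))
    r1 = sum(ranks[:n1])
    expected = n1 * (n1 + n2 + 1) / 2
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (r1 - expected) / sd
    p = math.erfc(abs(z) / math.sqrt(2))
    return z, p


# Invented degree values for two groups of proteins.
mhc_degree = [1, 2, 3]
non_mhc_degree = [4, 5, 6]
z, p = rank_sum_test(mhc_degree, non_mhc_degree)
```

A negative z here means the first group tends to have lower values than the second.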

Network properties of drug target proteins

Interactions with Mtb and HIV-1 proteins are shown by blue and green lines from the human proteins, respectively, while interactions between the 28 bridging proteins are coloured red.

Table 2.6 Wilcoxon rank-sum test comparing network properties of proteins targeted by anti-HIV or anti-TB medication to non-drug targets.

The distance between human proteins important in the PPIN and existing

Discussion

Drug interactions

This is consistent with the idea that higher network centrality implies greater biological relevance and can be used to narrow down potential drug targets (Mulder et al., 2014). The only protein in the PPIN known to interact with both an anti-TB drug and an anti-HIV drug was human serum albumin (HSA), which interacts with the first-line TB drug rifampicin and the HIV protease inhibitor saquinavir, as well as with four of the 28 bridging proteins, namely FN1, CTSD, NFκB1 and GAPDH. HSA is the most prominent protein in plasma and is known for its excellent drug binding capacity (Bocedi et al., 2004).

Studies have shown that rifampicin reduces the plasma concentration of saquinavir and ritonavir (which is taken with saquinavir) in HIV and TB co-infected patients, resulting in an increased risk of virologic failure (Ribera et al., 2007).

Figure 2.13 How inhibiting human Protein Kinase R may improve the host’s ability to control HIV and TB during co-infection.

MHC proteins interact with proteins important for pathogen bridging

Conclusion

de Bruijn graphs and sequence graphs
The linked de Bruijn graph
Multi-coloured linked de Bruijn graphs
Aims and hypotheses

Sequence graphs are an extension of de novo assembly methods, which use the de Bruijn graph approach to assemble next-generation sequencing reads (Turner et al., 2017). Most genome assemblers use de Bruijn graphs for assembly of reads and variant identification relative to a reference genome (Li et al., 2011). Turner et al. (2017) have extended the Cortex assembler developed by Iqbal et al. (2012) to use the linked de Bruijn graph method in their program McCortex, which can be used for.
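To make the de Bruijn graph idea concrete, the sketch below builds a plain (unlinked, single-colour) de Bruijn graph from two short sequences, using k-mers as nodes and joining k-mers that follow one another in a read. It is a toy illustration with invented sequences and k = 3, and does not reproduce the linked or coloured graphs built by McCortex.

```python
from collections import defaultdict


def build_de_bruijn(reads, k):
    """Map each k-mer to the set of k-mers that directly follow it in a read."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k):
            kmer = read[i:i + k]
            successor = read[i + 1:i + k + 1]
            graph[kmer].add(successor)
    return graph


# Two invented sequences that differ at a single base (T versus A).
reference = "ACGTCAG"
sample = "ACGACAG"
graph = build_de_bruijn([reference, sample], k=3)

# Nodes with more than one successor mark where the sequences diverge.
branch_points = sorted(node for node, nxt in graph.items() if len(nxt) > 1)
```

The two paths leave the shared k-mer ACG and rejoin at CAG, forming the kind of bubble that graph-based variant callers look for.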

This figure was adapted from Ayling et al. (2019) to illustrate the difference between Overlap-Layout-Consensus (OLC) and de Bruijn graph assembly.

Figure 3.1 Overlap-Layout-Consensus vs. de Bruijn graph sequence assembly.

Methods

Whole genome sequence and variant file collection
Read extraction and preparation
PCA of chromosome 6 variants
De novo assembly into a coloured multi-sample linked de Bruijn graph

This figure is adapted from Iqbal et al. (2012). a) When comparing a diploid sample (blue line) with a reference (red line), heterozygous alleles can be detected where both sides of the bubble have blue lines. At that time, high-coverage whole-genome sequences for Panel C of the SGDP dataset, which included an additional 45 sequences from African individuals, were not available (Prüfer et al., 2014). For each of the 33 individuals, chromosome 6 graphs were constructed separately using the build command in McCortex.

Finally, variants were compared to dbSNP based on variant position (ignoring nucleotide variation).
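A position-only comparison of this kind can be sketched as a set lookup on (chromosome, position) pairs, ignoring the reference and alternate alleles. The tuple layout and the coordinates below are hypothetical and are not taken from the thesis data or from dbSNP.

```python
def split_by_known_position(called, known):
    """Split called variants into those at a known position and the rest.

    Variants are (chrom, pos, ref, alt) tuples; only chrom and pos are compared.
    """
    known_positions = {(variant[0], variant[1]) for variant in known}
    seen, novel = [], []
    for variant in called:
        key = (variant[0], variant[1])
        (seen if key in known_positions else novel).append(variant)
    return seen, novel


# Hypothetical variants on chromosome 6.
called_variants = [
    ("6", 1000, "A", "G"),
    ("6", 2000, "C", "T"),
    ("6", 3000, "G", "GA"),
]
database_variants = [
    ("6", 1000, "A", "T"),  # same position, different alternate allele
    ("6", 3000, "G", "GA"),
]
seen, novel = split_by_known_position(called_variants, database_variants)
```

Because alleles are ignored, the first called variant counts as known even though its alternate allele differs from the database entry.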

Figure 3.4 Geographical distribution of the individuals whose genomic data were included

Results

Chromosome 6 sequence graph construction

Identifying variation from the graph

Variants called from the graph were annotated using ANNOVAR and compared with variants originally called from the reference-based assembly. Almost twice as many variants were identified from the graph as were called from the reference assembly. Although more variants were identified in the graph, only 46.23% of the variants called from the reference assembly were re-identified in the graph.

In contrast, 2.5 times more variants were identified in the MHC region; however, fewer than a quarter of all MHC variants identified in the graph map to reference chromosome 6 (see Table 3.3).

Figure 3.6 Distance matrix illustrating the proportion of k-mers shared between the 33 sequences. The individuals are labelled at the top and the left of the matrix, indicating the population, the individual’s sequence identifier, and the sequencing projec

Annotation of novel variants

-associated protein KIAA0319 is involved in neuronal migration during the development of the cerebral neocortex, and it functionally interacts with human proteins in the functional host-pathogen PPIN. In the functional host-pathogen PPIN, it is a high quality protein that functionally interacts with human proteins. Proteasome assembly chaperone 4 is a chaperone protein that promotes the assembly of the 20S proteasome and functionally interacts with human proteins in the functional host-pathogen PPIN.

Tudor domain-containing protein 6 is involved in spermiogenesis, chromatoid body formation, and proper expression of precursor and mature miRNAs, and it functionally interacts with human proteins in the functional host-pathogen PPIN.

Table 3.4 Annotation of the novel variants. Column headers: Variant type; Total “novel”; Any

Discussion

Variant identification from a de novo assembled graph

Research has shown that agreement between standard reference-based variant calling methods is often low. Thus, it is not surprising that there was a low intersection between the graph-based variant calls and the reference-based variant calls. Had we made the reference-based variant calls ourselves, we could have included the alternative loci as well.

Instead, the graph-based method identified more than twice as many variants as the reference-based method.

The feasibility of reference-free sequence graphs

We expected fewer variants to be called from the graph than against the reference because of the ancestral similarity of the samples and their divergence from the reference. Because most of the identified variants could be found in one of the genetic variation databases, it is possible that part of the increase can be explained by changes in the original method of reference-based variant calling. A large proportion of decontaminated unmapped reads were included in the graph (68%), indicating that although unmapped reads are often discarded, they may contain biologically relevant information.

More work needs to be done to determine the locations of the unmapped reads that are included in the graph.

Recommendations for future reference graphs

Our methods included unmapped reads as well as decontamination steps, similar to Sherman et al. (2019). In our case, the 68% of unmapped reads were filtered to include k-mers that matched chromosome 6 reads, but the overall percentage may still be an overestimate. It is possible that the unmapped reads map to multiple locations on the genome and were left unmapped for that reason.

For example, instead of colouring the sequences by individual, we could have used additional colours for unmapped reads so that they could be more easily distinguished.

Conclusion

Graph databases for biological data integration
Prioritising disease-associated genes by integrating gene expression and
Analysing the impact of variants by integrating genetic variation, gene
Aim and hypotheses

Incorporate known variation that has been called against the existing linear reference (Rakocevic et al., 2019). Furthermore, analysing variants together with gene expression data has been shown to be advantageous, because highly differentially expressed genes are more likely to have variants associated with disease (Chen et al., 2008). Gene expression data from a case-control study of a cohort of HIV-TB co-infected and latent TB-infected African adults from Malawi and South Africa (Kaforou et al., 2013) were used to identify significantly differentially expressed genes that could be used as seeds to.

Genes that are highly differentially expressed have been found to be more likely to have disease-associated variants (Chen et al., 2008).

Methods

Analysis of gene expression data
Prioritisation of disease-associated genes by integrating gene expression
Variant data analysis
Integrating the data using a graph database

Each message is scored as the score of the node that sent the message plus the edge weight. In this algorithm, the node score is the average of the normalised scores from each of the other three methods. In addition, nodes were annotated with the priority scores from each of the algorithms used.
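The two scoring rules just described can be sketched as below. This is a hedged illustration: min-max scaling is an assumption (the text does not say how scores were normalised), and the gene labels, method scores, and edge weight are invented.

```python
def min_max_normalise(scores):
    """Rescale a {node: score} mapping to the range 0..1 (assumed normalisation)."""
    lowest, highest = min(scores.values()), max(scores.values())
    span = highest - lowest
    return {node: (value - lowest) / span if span else 0.0
            for node, value in scores.items()}


def combined_node_scores(method_scores):
    """Average the normalised scores of several prioritisation methods per node."""
    normalised = [min_max_normalise(scores) for scores in method_scores]
    nodes = normalised[0].keys()
    return {node: sum(scores[node] for scores in normalised) / len(normalised)
            for node in nodes}


def message_score(node_scores, sender, edge_weight):
    """Score of a message: the sender's node score plus the weight of the edge."""
    return node_scores[sender] + edge_weight


# Invented scores from three hypothetical prioritisation methods.
method_a = {"G1": 0, "G2": 5, "G3": 10}
method_b = {"G1": 1, "G2": 3, "G3": 2}
method_c = {"G1": 10, "G2": 20, "G3": 30}
node_scores = combined_node_scores([method_a, method_b, method_c])
outgoing = message_score(node_scores, sender="G3", edge_weight=0.25)
```

Averaging after normalisation keeps any one method's scale from dominating the combined node score.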

Edges were labeled with the results of the interaction and any details about the source of the interaction.

Table 4.1 Demographics of individuals enrolled in the gene expression analysis by Kaforou et al

Results

Network importance of differentially expressed genes
Gene prioritisation based on differential expression
Known variants associated with HIV and Mtb
Variants in prioritised proteins
Properties of MHC proteins with novel variants identified from the

This tabular plot shows the log-fold change in expression of the 28 bridge proteins under three disease states compared to latent TB infection (HIV-TB co-infection, HIV infection and TB infection). Of the 145 MHC proteins in the network, 125 were assessed for differential expression, of which 46 were found to be significantly differentially expressed during co-infection. In addition, 22 888 of the potentially harmful or deleterious variants (37.37%) had a significantly higher frequency in an African population.

Some of the potentially harmful or deleterious variants within bridge proteins were common in African populations and had a significantly higher frequency in an African population.

Table 4.2 Wilcoxon rank-sum test comparing network properties of proteins mapped to genes differentially expressed during HIV-TB co-infection compared with latent TB infection.

Discussion

Differentially expressed genes have higher network importance

The human proteins (orange circles) are bridge proteins that are DE during HIV-TB co-infection and during mono-infection with either pathogen (NAP1L1, CDC42 and PRKCH). In addition, one third of the proteins containing clinically associated variants were differentially expressed during HIV-TB co-infection. One of the bridge proteins, Protein Kinase R (EIF2AK2), was significantly differentially expressed during HIV-TB co-infection and had a high priority score.

Figure 4.4 Host-pathogen PPIs for bridge proteins that are DE during mono- and co-infection

The usefulness of retaining link information when traversing de Bruijn graphs

Analysing variants from bubbles in de Bruijn graphs

Geographical distribution of the individuals whose genomic data were included

Distance matrix illustrating the proportion of k-mers shared between the 33

PCA of chromosome 6 variants genotyped in 32 African individuals

Comparison of the PCA distribution and geographical location

Venn diagram of how the novel variants were annotated

Log fold change in gene expression levels of the bridge proteins during HIV-TB

Visualisation of PPIs, drug-target interactions and variants

Visualisation of MHC proteins, with their variants and interactions

HLA-DOB was differentially expressed during HIV-TB co-infection and was in the 90th percentile for

This chapter aimed to integrate gene expression, genetic variation, and host-pathogen PPIs to facilitate analysis of the potential impact of variation on HIV-TB co-infection.

Host-pathogen PPIs for bridge proteins that are DE during mono- and co-infection

Visualisation of a variant in HLA-DOB in the context of the host-pathogen

Spearman’s rank correlation between scores of gene prioritisation algorithms

Prioritisation scores and ranking of the bridge proteins

Top ten ranking proteins across each of the four prioritisation algorithms

Wilcoxon rank-sum test comparing prioritisation measures for MHC and non-

Wilcoxon rank-sum test comparing network properties of proteins mapped to

Description of variants within prioritised proteins

Novel MHC variants that were mapped to the PPIN

Geographical distribution, sex, and sequence coverage of the individuals whose

Anti-TB drug-target interactions included in the analysis

Anti-HIV drug-target interactions included in the analysis