To our knowledge this study is the first to analyse the network properties of genes within a host-pathogen protein interaction network alongside differential expression during HIV-TB co-infection. The finding that differentially expressed genes had significantly higher network importance than non-differentially expressed genes provides reassurance that network measures can be used as a proxy for biological relevance. The measure proposed to detect importance for involvement in functional interactions between the pathogens (Pathogenicity bridging centrality), was shown to be significantly higher for differentially expressed genes than non-differentially expressed genes. This shows that the measure may be useful for ranking genes based on their potential importance in HIV-TB co-infection.
Three of the bridge proteins were significantly differentially expressed during HIV-TB
co-infection as well as HIV and TB mono-infection compared with latent TB infection, namely:
NAP1L1, CDC42, and PRKCH. Both NAP1L1 and CDC42 functionally interact with HIV-1 protein Tat, while CDC42 functionally interacts with HIV-1 protein Nef. CDC42 and PRKCH both functionally interact withMtbproteins tuf and htpG. The interactions between these three bridge proteins and the pathogen proteins, as well as the variants in the three bridge proteins are depicted in Figure4.4. All three of the proteins had at least one variant with a higher frequency observed in an African population than an East Asian or European population. In addition, PRKCH and CDC42 had possibly deleterious variants.
Differentially expressed genes were used as seeds to assign priority ranking scores to other proteins in the network. The prioritised genes, as such, tended to be both differentially expressed and have high network centrality measures. The top scoring bridge proteins included EIF2AK2 (Protein Kinase R). In chapter2, we proposed that EIF2AK2 could be a potential drug target as inhibition of this protein has been shown to lead to apoptosis ofMtb infected cells and inhibition of the protein may reduce viral infectivity. EIF2AK2 expression was significantly lower during latent TB infection compared with HIV-TB co-infection (the log fold change was -1.11 which equates to just over 2 times lower).
While few MHC genes were significantly differentially expressed during co-infection, and, probably as a result, MHC proteins did not rank significantly higher on any of the priority
Huma nHIVPPI
HumanMTBPPI
Huma nMT BPPI
Huma nM…
Hu ma
nMT BPPI
HumanMT… HumanMT
…
HumanHIVPPI Huma
nMT BPPI
HumanMTBPPI Huma
nMT BPPI
HumanHIVPPI VariantOf
VariantOf VariantO
f
VariantOf
VariantOf
Varia ntOf Varia
ntOf
VariantOf
Vari an
tOf Varia
ntOf VariantOf
VariantOf MTBMT
BPPI MT
BMT B…
CDC42 nef
tuf
clpB
clpC1 pnp
kdc ilvD
NAP1L1
tat
cydD
PRKCH
htpG r…
r…
r…
r…
r…
r…
r…
r…
r…
r…
r…
r…
MtbProteins HIV-1 Proteins Human Proteins Human Variant
Variant with higher frequency in African populations
Possibly deleterious variant
Figure 4.4 Host-pathogen PPIs for bridge proteins that are DE during mono- and co-infection. The human proteins (orange circles) are bridge proteins that are DE during HIV-TB co-infection and mono- infection of the two diseases (NAP1L1, CDC42 and PRKCH). The blue and green circles represent Mtb and HIV-1 proteins respectively. The small beige and grey circles represent human variants, with the beige circles highlighting variants with a higher frequency in African populations than East Asian or European populations. Variants that are possibly deleterious based on SIFT/Polyphen scores have been flagged with a red lightning bolt. This subnetwork is available in an interactive viewer at the following link: https://www.ndexbio.org/#/network/9ddc6f74-76ff-11ec-b3be-0ac135e8bacf?
accesskey=387cb1d7a05b1bc0570be635c66b4821b433ff1737aa250e1166dedba4814ec4.
scoring measures, differentially expressed genes were shown to have significantly shorter minimum distance to MHC proteins. This corroborates the results in chapter2, and elevates the role of MHC proteins in facilitating interactions between the pathogens.
4.4.2 Variants associated with HIV and TB are found in proteins with higher network importance
There were few variants with known clinical association with HIV or TB, but the few proteins that had clinically-associated variants ranked significantly higher across the four priority measures used and had significantly higher network centrality. In addition, one third of the proteins that contained clinically-associated variants were differentially expressed during HIV-TB co-infection. This aligns with the findings byChen et al.(2008) that differentially expressed genes are more likely to have disease-associated DNA variants.
The only variant that was clinically-associated with both HIV-1 andMtbsusceptibility was located in the protein CD209. The interactions between HIV-1 andMtbwith CD209 were discussed at length in chapter2, section2.4.1.6. CD209 both enables the infection of dendritic cells withMtband enhances the infection of target cells with HIV-1. CD209 is the gene that encodes DC-SIGN, a C-type lectin known to be the major receptor ofMtbon human
dendritic cells (Barreiro et al.,2006). Within a South African cohort,Barreiro et al.(2006) found that two variants (-871G and -336A) of CD209 were associated with a lower risk of developing tuberculosis and that they may have a higher frequency in non-African populations.Martin et al.(2004) showed that one of the same variants identified byBarreiro et al.(2006) (-336C) was associated with higher susceptibility for HIV-1 infection in participants at risk of infection.
4.4.3 Potentially impactful variants in high priority proteins
Common non-synonymous variants were mapped to proteins in the interaction network and analysed for various sets of prioritised proteins in terms of their allele frequencies in African populations as well as their predicted effect. Several SNPs were identified within drug-target proteins that may impact HIV-TB co-infection. Using the network it is possible to find and visualise potentially deleterious variants in drug-target proteins of interest (refer to Figure4.5).
The treatment of drug-sensitive TB in HIV-infected individuals in South Africa typically includes a multi-drug TB regimen including both isoniazid and rifampicin, as well as an ART regimen which would usually include efavirenz as the NNRTI (Boulle et al.,2008). Using the network we could visualise that all three drugs interact with the human enzyme Cytochrome P450 3A4 (CYP3A4). Efavirenz is metabolised by CYP3A4, and low plasma concentration of efavirenz is associated with virologic failure while high plasma concentration is associated with drug toxicity (Atwine et al.,2018). Rifampicin induces CYP3A4, which enhances efavirenz metabolism and results in a lower plasma concentration (Atwine et al.,2018). This is typically offset by the inibitory effect of isoniazid on CYP3A4, which reduces efavirenz metabolism (Atwine et al.,2018). Two SNPs in CYP3A4 that were predicted to have deleterious effects and were known to have higher alternative allele frequency in African populations were visualised in the network (rs4986908 and rs57409622). rs57409622 was first identified in South African populations and was predicted to have an impact on the enzymes function due to the amino
Dru gT…
VariantOf
DrugTargetPPI Vari
antO f VariantOf
VariantOf VariantOf
VariantOf Varia
ntOf
VariantOf VariantOf
VariantO
f
VariantOf
VariantOf DrugTargetPPI Varia
ntOf
Varia ntOf DrugTargetPPI
VariantO
f
VariantOf
VariantOf VariantOf Varian
tOf
VariantOf
DrugTargetPPI
DrugTargetPPI
DrugTarg etPPI DrugTargetPPI
DrugTargetPPI
VariantOf DrugTarg
etPPI
Varia… VariantOf
VariantOf
DrugTarg etPPI
VariantOf
VariantOf
VariantOf
VariantOfVariantOf VariantOf
Dru gTarg
etPPI
DrugTar…
DrugTargetPPI
DrugTargetPPI
VariantOf
DrugTarg etPPI
VariantOf
VariaVarintanOtOff
VariantOf
DrugTargetPPI
VariantOf
VariantOf VariantOf
VariantOf
VariantO
f
Dru gTarg
etPPI
VariantOf Vari
antO f
VariantOf
Varia ntOf VariantO
f VariantOf
DrugTargetPPI
Varia ntOf Dru
gTarg etPPI
Varia ntOf Vari
antO f DrugTargetPPI
VariantOf VariantOfDrugTargetPPI
VariantOf DrugTargetPPI
VariantOf VariantOf
Varia ntOf VariantOf
VariantOf
VariantOf VariantOf Dru
gTarg etPPI
Varia ntOf DrugTargetPPI
VariantOf
Varian tOf Dru
gT…
VariantOf
DrugTargetPPI
VariantOf
DrugTargetPPI
VariantOf VariantOf
VariantOf
VariantOf pol
r…
CYP2D6
Efavirenz r…
r…
r…
r…
r…
r…
r…
r…
r…
r…
r… CYP3A4
r…
r…
CYP2B6 r…
r…
r…
r… r…
r…
inhA
Isoniazid
folA nat
katG
r…
CYP2A6
r… r…
r…
NAT2 r…
r…
r…
r…
r…
r…
rpoB
Rifampicin
rpoC
r…
ABCB11 r…
r…
r…
r…
CYP4A11
r…
r…
r…
r…
r…
ABCC1
r…
r…
r…
r…
r…
r…
SLC22A7
r…
ABCB1
r…
r…
r…
NR1I2
r…
ABCC3 r…
r…
r…
r… r…
r…
r… r…
r…
SLCO1A2
r…
r…
SLCO1B1
r…
CYP51A1
r…
SLCO1B3
r…
r… r…
MtbProteins r…
HIV-1 Proteins
Variant with higher frequency in African populations Drug Drug Target protein
Figure 4.5 Potentially deleterious variants with higher frequencies in African populations in targets of the drugs rifampicin, isoniazid and efavirenz. The network includes three drugs (dark pink circles), human targets (light pink circles), Mtb targets (blue circles), HIV-1 targets (green circles) and human genetic variants (small beige circles). Only variants that have higher alternative allele frequency in African populations and that were potentially deleterious are included in the visualisation. A yellow circle highlights the protein CYP3A4 that interacts with all three drugs and has two potentially deleterious SNPs. This subnetwork is available in an interactive viewer at the following link: https://www.ndexbio.org/#/network/2d1bccf1-76fe-11ec-b3be-0ac135e8bacf?
accesskey=cb64bc5bab6c318c41a7982ee7741039ee6cdf462a6cf5ca5e8d875c36627ed0.
acid change (Drögemöller et al.,2013). Both mutations may have functional effects on CYP3A4 and thus play a role in drug efficacy or toxicity during the treatment of TB in HIV-infected individuals.
4.4.4 Using the network to investigate the potential impact of MHC variants Here we discuss an example of how the network can be used to investigate the potential impact of one of the novel variants within the MHC gene HLA-DOB. HLA-DOB was differentially expressed during HIV-TB co-infection and was in the 90th percentile for
prioritisation. The novel variant identified was an intergenic five base pair insertion (-/ACAAC) between HLA-DOB and HLA-DQB2 at chromosome 6 position 32782641. An insertion has been found in this position before with a similar pattern (rs28987081, -/C -/CAAC
-/CAACAAC) and the variant identified in this analysis may be another alternative allele.
Using the network visualisation, both of these MHC proteins functionally interact with the HIV-1 protein Nef and the human bridge protein CTSD, which interacts with theMtbprotein PurT and the HIV-1 protein Env (refer to figure4.6). In chapter2, section2.4.1.6we discuss the
HumanPPI VariantOf
VariantOf
VariantOf VariantOf
VariantOf Varian
tOf Vari
antO f Varia
ntOf
VariantOf
HumanHIVPPI
Huma nMT
BPPI
Huma nHIVPPI VariantOf
Huma HumanPPI nPPI
Huma nHIVPPI HLA-DOB CTSD
r…
r…
r…
r…
r…
nef
purT
env HLA-DQ…
MtbProteins HIV-1 Proteins
Human Proteins – DE co-infection Human Variant. Larger = possibly deleterious
Variant with higher frequency in African populations. Larger = possibly deleterious Bridge Proteins MHC Proteins
Novel variant
Figure 4.6 Visualisation of a variant in HLA-DOB in the context of the host-pathogen interaction network. The figure shows the novel variant (small light blue circle) is an intergenic variant of the two MHC proteins (large circles with yellow borders), HLA-DOB and HLA-DQB2. The MHC proteins both interact with the bridge protein (large circle with red border) CTSD which interacts with the HIV protein (green circle) Env and Mtb protein (large blue circle) PurT. HLA-DOB and CTSD are both differentially expressed during HIV and TB co-infection (illustrated by their orange fill). Additional variants of HLA-DOB are depicted by small grey and beige circles, with beige ones representing variants with higher alternative allele frequency in African populations and larger circles representing variants that are possibly deleterious. This subnetwork is available in an interactive viewer at the following link: https://www.ndexbio.org/#/network/f33d122d-76fa-11ec-b3be-0ac135e8bacf?
accesskey=1bd036b286baebfb33cc3928a6649716e595e57b1a6d02b151b1b4c2f6749f66.
function of CTSD, which is involved in viral infectivity. Using the GWAS catalog, several
intergenic variants between HLA-DOB and HLA-DQB2 have been significantly associated with various traits, although none specifically report HIV or tuberculosis (Buniello et al.,2019).
However a variant in this intergenic region was associated with Kawasaki disease (Onouchi et al.,2012). Kawasaki disease is typically a childhood illness, however amongst adults with the disease there is a disproportionate number that are HIV-infected, which is thought to be due to the immuno-compromise brought about by HIV infection (Johnson et al.,2001). It is possible that this intergenic variant, and others in the region may also be associated with this susceptibility to Kawasaki disease amongst HIV-infected individuals.
4.4.5 Integrating different types of biological data into a graph database In this chapter, we integrated the gene expression, genetic variation and drug and protein interactions into a graph database that enabled efficient identification of interesting
subnetworks and visualisations across different data types. The user friendly Neo4j browser andCypherquery language enable the data to be animated without the need for front-end development. To filter out as much noise as possible, only high confidence interactions and common variants that confer protein changes or variants with known clinical association with HIV or TB were included. While this has the advantage of reducing noise, it also raises a limitation due to study biases. Often interactions are high confidence, and variants well annotated with allele frequency by virtue of them being found in highly studied genes. Only including common missense coding variants is another potential limitation, as variants in non-coding regions can also be clinically relevant. Future work could include a wider range of variants. In addition, only one gene expression study was included in the analysis as it was the only one at the time that compared HIV-TB co-infection to uninfected controls. Triangulating gene expression data from multiple studies with larger sample sizes would be useful and may assist with more accurate prioritisation of proteins within the network.
However, the data included in the current database provide a comprehensive picture of various biological elements involved in HIV-TB co-infection. The annotations attached to each node and relationship allow rapid, in depth analyses and visualisations of subnetworks of interest.
4.5 Conclusion
This chapter aimed to integrate gene expression, genetic variation, and host-pathogen PPIs to facilitate the analysis of the potential impact of variation on HIV-TB co-infection. We identified that genes that are differentially expressed in HIV-TB co-infected individuals have higher network importance in the functional host-pathogen PPIN than genes that are not differentially expressed. Similarly, we showed that variants with known clinical association with HIV and TB map to proteins that have higher network importance. Finally, we identified missense variants within prioritised genes in the network that may warrant further investigation of their impact on HIV-TB co-infection. This analysis has produced a graph database that integrates different kinds of biological information, which provides three useful features, namely: (1) it enables rapid and high throughput prioritisation of sets of genes or variants, (2) it facilitates detailed investigations of smaller sets of genes or variants, and (3) it allows network-based visualisation of the relationships between various biological elements. . All subnetworks in the figures presented, as well as the complete network are browsable as interactive networks and downloadable from NDEx at the following link:
https://www.ndexbio.org/#/networkset/8d0f15ef-d936-11ec-b397-0ac135e8bacf?accesskey=
d53b38beb93e7be80f374d3d8cebc626dcd5680a6767521ac9bac80e92c20e7a. To extend this resource beyond HIV-TB co-infection, future work may go into providing a regularly updated database with this information alongside other host-pathogen interactions, and gene
expression in different disease states.
5. Conclusion
This study focused on two often co-infecting pathogens, HIV andMtb, which continue to evolve to evade host responses and treatment strategies. The interaction between the
pathogens and their human host is further complicated by host genetic variation, the study of which is under-represented amongst African individuals. To contribute to the study of genetic variation in African populations, we used a reference free method to identify genetic variation in the highly diverse MHC region, which contains several genes involved in immune response.
Additionally, to contribute to the study of HIV-TB co-infection, we aimed to generate a host-pathogen interaction network that integrated the genetic variation with functional protein interactions, gene expression data and drug interactions.
5.1 Contributions
The under-representation of African sequences in genome analysis is widely acknowledged.
Despite containing the most genetic diversity, less than two percent of all human genomes analysed so far have originated in Africa (Wonkam,2021). Analysing sequences from 33 African individuals spanning the continent, we identified novel variants in the MHC region using a reference-free graph-based method for variant identification, which to our knowledge has not been used at this scale before. The novel variants that we identified provide a
contribution to the knowledge of variation in this region, particularly in African sequences, which are subject to reference mapping bias when using traditional methods of variant identification (Dilthey et al.,2015). Furthermore, in our analysis we developed a pipeline for sequence formatting, variant identification and variant annotation using existing tools that have not been used together in this manner previously. This pipeline could be used and improved by others.
We have generated, to our knowledge, the first host-pathogen interaction network that displays inter- and intraspecies interactions between HIV,Mtband Human proteins, along with drug-target interactions. Using the network we identified 28 Human ”bridge” proteins that functionally interact with both HIV andMtbproteins. One of the bridge proteins, Protein Kinase R (EIF2AK2), was significantly differentially expressed during HIV-TB co-infection and had a high prioritisation score. The existing research regarding the function of this protein in HIV and TB mono-infection, coupled with it’s prioritisation in the network made it stand out for further investigation. More apoptosis has been observed inMtbinfected cells in EIF2AK2 knockouts in mice (Wu et al.,2012b). Further more, during HIV-1 infection, HIV-1 protein Tat usually binds to EIF2AK2 increasing TNF-αproduction which in turn leads to increased viral replication andMtbgrowthImperiali et al.(2001);Li et al.(2005). We propose that if an EIF2AK2 inhibitor were a viable treatment method, it may reduce bothMtbpresence and HIV-1 viral load.
In addition to identifying individual proteins of interest, the network enables the identification of pathways of interactions between the two pathogens via interactions with their human