Network analysis of PPINs - Investigation of HIV-TB co-infection through analysis of the potent

2.1 Introduction

2.1.5 Network analysis of PPINs

Network analysis is a way of studying the relationships between individual points, and between groups of points, in large datasets by representing the points as nodes and the relationships between them as edges connecting the nodes. In PPIN analysis, the network is represented as an undirected graph $G(V, E)$, in which $V$ is a set of nodes (proteins) and $E$ is a set of edges connecting the proteins (the protein-protein or drug-target interactions). The relative importance of nodes and edges in a network can be determined by calculating measures that describe volume, distance and mediation (Borgatti and Everett, 2006). Four frequently used measures are degree centrality, shortest path distance, closeness centrality and betweenness centrality. These measures are introduced below, along with a fifth, less frequently used measure called bridging centrality.
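As a minimal illustration of this representation (not the pipeline used in this study), the Python sketch below stores a small hypothetical PPIN as an adjacency map. The function name `build_ppin` and the protein labels are invented for the example; duplicate or reversed pairs collapse into a single undirected edge.

```python
from collections import defaultdict


def build_ppin(edges):
    """Return (V, E, adj) for an undirected PPIN built from (protein, protein) pairs."""
    adj = defaultdict(set)
    E = set()
    for a, b in edges:
        if a == b:
            continue  # this sketch ignores self-interactions
        adj[a].add(b)  # undirected: record the edge from both ends
        adj[b].add(a)
        E.add(frozenset((a, b)))  # (a, b) and (b, a) are the same edge
    V = set(adj)
    return V, E, dict(adj)


# Hypothetical interactions; ("P2", "P1") duplicates ("P1", "P2").
interactions = [("P1", "P2"), ("P2", "P3"), ("P3", "P1"), ("P3", "P4"), ("P2", "P1")]
V, E, adj = build_ppin(interactions)
print(len(V), len(E))  # 4 4
```

Storing each protein's partners as a set makes later checks such as "do $p$ and $q$ interact?" a constant-time membership test.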

2.1.5.1 Degree centrality

Degree centrality measures the number (volume) of nodes to which a given node is connected. In a PPIN, the degree of a protein indicates how many proteins it interacts with. The formula for calculating degree is:

$$D(p) = \sum_{i=1}^{n} u(p, p_i),$$

where $u(p, p_i)$ is a Kronecker-delta-style indicator that equals 1 if proteins $p$ and $p_i$ interact and 0 otherwise, $p$ is a protein in the network $G$, and $n$ is the total number of proteins in the network. This formula can be normalised by dividing by the maximum possible degree, $n-1$:

$$D'(p) = \frac{1}{n-1} \sum_{i=1}^{n} u(p, p_i).$$
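A minimal Python sketch of $D(p)$ and $D'(p)$ on a hypothetical four-protein network follows. It assumes the network is stored as a dictionary mapping each protein to the set of its interaction partners, so $u(p, p_i)$ becomes a membership test; `degree` and `normalised_degree` are illustrative names only.

```python
def degree(adj, p):
    """D(p): sum of u(p, p_i) over all proteins, where u is 1 if p and p_i interact."""
    return sum(1 for q in adj if q != p and q in adj[p])


def normalised_degree(adj, p):
    """D'(p) = D(p) / (n - 1), where n is the number of proteins."""
    return degree(adj, p) / (len(adj) - 1)


# Hypothetical network: a triangle A-B-C with D attached to C.
adj = {
    "A": {"B", "C"},
    "B": {"A", "C"},
    "C": {"A", "B", "D"},
    "D": {"C"},
}
for p in sorted(adj):
    print(p, degree(adj, p), round(normalised_degree(adj, p), 3))
# A 2 0.667 / B 2 0.667 / C 3 1.0 / D 1 0.333
```

Protein C interacts with every other protein, so its normalised degree reaches the maximum value of 1.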

2.1.5.2 Shortest path distance

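The shortest path distance between two nodes is the minimum number of edges that must be traversed to reach a destination node from a given node. In PPIN terms, it is the smallest number of interactions in a chain linking a source protein to a target protein; for example, a distance of 2 means that the two proteins do not interact directly but share at least one interaction partner.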
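Because PPIN edges are unweighted, shortest path distances can be found with breadth-first search. The sketch below is a minimal illustration on a hypothetical network; `shortest_path_distances` is an illustrative name, and proteins that cannot be reached from the source are simply absent from the result.

```python
from collections import deque


def shortest_path_distances(adj, source):
    """Breadth-first search from source; returns {protein: edges on a shortest path}."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:  # unweighted graph: BFS first reaches v along a shortest path
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


# Hypothetical network: chain A-B-C-D plus a shortcut A-C; E has no interactions.
adj = {
    "A": {"B", "C"},
    "B": {"A", "C"},
    "C": {"A", "B", "D"},
    "D": {"C"},
    "E": set(),
}
print(sorted(shortest_path_distances(adj, "A").items()))
# [('A', 0), ('B', 1), ('C', 1), ('D', 2)]
```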

2.1.5.3 Closeness centrality

Another distance-based measure is closeness, which is based on the total distance from a given node to every other node (Freeman, 1979). It reflects how quickly information can spread from a node to the rest of the network (Valente et al., 2008). Closeness centrality is calculated as the reciprocal of the sum of the shortest path distances from a given node to all other nodes, normalised by multiplying by $n-1$, the smallest possible value of that sum (reached when the node interacts directly with every other node). In this way, closeness is interpreted in the same way as the other measures: the higher the closeness, the nearer the node is to the other nodes in the network. The formula is as follows:

$$C(p) = \frac{n-1}{\sum_{q \neq p} d(p, q)},$$

where $d(p, q)$ is the shortest path distance between proteins $p$ and $q$, the sum runs over the $n-1$ proteins other than $p$, and $n$ is the number of proteins in the network.
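A minimal sketch of $C(p)$ follows, reusing breadth-first search for $d(p, q)$. It assumes a connected network, because the formula above is undefined when some protein is unreachable, so the sketch raises an error in that case. The toy network and the function names are hypothetical.

```python
from collections import deque


def shortest_path_distances(adj, source):
    """Breadth-first search distances (number of edges) from source."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def closeness(adj, p):
    """C(p) = (n - 1) / sum of d(p, q) over the n - 1 other proteins q."""
    dist = shortest_path_distances(adj, p)
    n = len(adj)
    if len(dist) < n:
        raise ValueError("closeness as defined here assumes a connected network")
    return (n - 1) / sum(dist.values())  # dist[p] == 0, so p adds nothing to the sum


# Hypothetical network: a triangle A-B-C with D attached to C.
adj = {
    "A": {"B", "C"},
    "B": {"A", "C"},
    "C": {"A", "B", "D"},
    "D": {"C"},
}
for p in sorted(adj):
    print(p, round(closeness(adj, p), 3))
# A 0.75 / B 0.75 / C 1.0 / D 0.6
```

C is one step from every protein, so its closeness equals 1; the peripheral protein D scores lowest.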

2.1.5.4 Betweenness centrality

To measure a node's importance for mediation, Freeman (1979) defined betweenness centrality. The betweenness of a node is based on the number of shortest paths between pairs of other nodes that pass through the given node. As such, it can be described as a measure of the potential influence of a node on both direct and indirect pathways in a network (Valente et al., 2008). The betweenness centrality of a node $p$ is calculated by summing, over all pairs of proteins, the fraction of the shortest paths between them that pass through $p$:

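$$B(p) = \sum_{\substack{s, t \in V \\ s \neq p \neq t}} \frac{\sigma(s, t \mid p)}{\sigma(s, t)},$$

where $V$ is the set of all proteins, $\sigma(s, t)$ is the number of shortest paths between proteins $s$ and $t$, $\sigma(s, t \mid p)$ is the number of those shortest paths that pass through $p$, and the sum runs over unordered pairs of distinct proteins $s$ and $t$, neither of which is $p$. This formula is normalised to fall within the range 0 to 1 as follows:

$$B'(p) = \frac{2}{(n-1)(n-2)} \sum_{\substack{s, t \in V \\ s \neq p \neq t}} \frac{\sigma(s, t \mid p)}{\sigma(s, t)},$$

where $n$ is the number of proteins in the network, so that $(n-1)(n-2)/2$ is the number of protein pairs that do not include $p$.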
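Counting shortest paths pair by pair becomes expensive on large PPINs. A common alternative, swapped in here purely for illustration, is Brandes' algorithm, which computes the same sums with one breadth-first search per source protein. The sketch below is a minimal unweighted, undirected version; the function names and the five-protein example are hypothetical.

```python
from collections import deque


def betweenness(adj):
    """Brandes' algorithm (unweighted, undirected): raw B(p) over unordered pairs."""
    B = dict.fromkeys(adj, 0.0)
    for s in adj:
        # BFS from s, recording shortest-path counts (sigma) and predecessors.
        stack, preds = [], {v: [] for v in adj}
        sigma, dist = dict.fromkeys(adj, 0), dict.fromkeys(adj, -1)
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate dependencies in order of decreasing distance from s.
        delta = dict.fromkeys(adj, 0.0)
        while stack:
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                B[w] += delta[w]
    # Each unordered pair {s, t} was counted once from s and once from t.
    return {v: b / 2 for v, b in B.items()}


def normalised_betweenness(adj):
    """B'(p) = 2 B(p) / ((n - 1)(n - 2))."""
    n = len(adj)
    return {v: 2 * b / ((n - 1) * (n - 2)) for v, b in betweenness(adj).items()}


# Hypothetical diamond A-B-D, A-C-D with protein E attached to D.
adj = {
    "A": {"B", "C"},
    "B": {"A", "D"},
    "C": {"A", "D"},
    "D": {"B", "C", "E"},
    "E": {"D"},
}
print(betweenness(adj))
print(normalised_betweenness(adj))
```

In this example the two shortest paths between A and D split evenly between B and C, so each receives half a path for that pair, while D lies on every path to E and scores highest.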

2.1.5.5 Bridging centrality

High-degree nodes tend to have high betweenness, because the nodes with the highest degree tend to lie in highly connected, core areas of the network, so that many shortest paths pass through them (Hwang et al., 2006). Betweenness is commonly used as a measure of how important a node is for information flow between other nodes; however, as explained above, it is really a measure of global importance. To measure the local importance of a node, Hwang et al. (2006) proposed bridging centrality, which identifies nodes that are important for bridging submodules of the network. Unlike nodes with high values for other centrality measures, bridging nodes can cause network disruption without dismemberment (Hwang et al., 2008). Hwang et al. (2008) define a bridge as a node or an edge that connects modules in a graph, and the bridging centrality of a node as the product of its global importance and its bridging coefficient. The global importance of a node or edge is calculated as its betweenness, and the bridging coefficient of a node $p$ is the average probability of leaving the set of nodes directly connected to $p$. The bridging coefficient of an edge is the product of the weighted average of the bridging coefficients of the two nodes it connects and the reciprocal of the number of direct neighbours those two nodes have in common. In mathematical terms, the bridging coefficient of a node is calculated as follows:

$$BC(p) = \frac{D(p)^{-1}}{\sum_{i \in N(p)} \frac{1}{D(i)}},$$

where $D(p)$ is the degree of node $p$ and $N(p)$ is the set of neighbours of $p$. The bridging centrality of a node $p$ is then:

$$Br(p) = BC(p) \times B(p),$$

where $B(p)$ is the betweenness centrality and $BC(p)$ is the bridging coefficient of node $p$.
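The bridging coefficient and bridging centrality can be sketched in Python as follows. This is a minimal illustration under stated assumptions: betweenness is computed directly from the pairwise definition, using $\sigma(s, t \mid p) = \sigma(s, p)\,\sigma(p, t)$ whenever $p$ lies on a shortest path between $s$ and $t$ (adequate for toy networks, too slow for genome-scale PPINs); raw rather than normalised betweenness is used, which rescales every node by the same factor and so leaves the ranking unchanged; and every protein has at least one interaction. The network (two fully connected four-protein modules joined through a single protein X) and all function names are hypothetical.

```python
from collections import deque
from itertools import combinations


def bfs_counts(adj, s):
    """Distances d(s, v) and shortest-path counts sigma(s, v) from source s."""
    dist, sigma = {s: 0}, {s: 1}
    queue = deque([s])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                sigma[v] = 0
                queue.append(v)
            if dist[v] == dist[u] + 1:
                sigma[v] += sigma[u]  # every shortest path to u extends to v
    return dist, sigma


def betweenness(adj):
    """Raw B(p) over unordered pairs {s, t} with s != p != t."""
    info = {v: bfs_counts(adj, v) for v in adj}
    B = dict.fromkeys(adj, 0.0)
    for p in adj:
        d_p, sig_p = info[p]
        for s, t in combinations([v for v in adj if v != p], 2):
            d_s, sig_s = info[s]
            # p is on a shortest s-t path iff d(s, p) + d(p, t) == d(s, t)
            if p in d_s and t in d_s and d_s[p] + d_p[t] == d_s[t]:
                B[p] += sig_s[p] * sig_p[t] / sig_s[t]
    return B


def bridging_coefficient(adj, p):
    """BC(p) = D(p)^-1 / sum over neighbours i of D(i)^-1 (assumes D(p) > 0)."""
    return (1 / len(adj[p])) / sum(1 / len(adj[i]) for i in adj[p])


def bridging_centrality(adj):
    """Br(p) = BC(p) * B(p), using raw betweenness."""
    B = betweenness(adj)
    return {p: bridging_coefficient(adj, p) * B[p] for p in adj}


# Hypothetical network: two fully connected modules joined through protein X.
edges = list(combinations("ABCD", 2)) + list(combinations("EFGH", 2)) + [("D", "X"), ("X", "E")]
adj = {}
for a, b in edges:
    adj.setdefault(a, set()).add(b)
    adj.setdefault(b, set()).add(a)

br = bridging_centrality(adj)
print(max(br, key=br.get))  # X
```

Protein X has only two interactions, but both partners are highly connected, so its bridging coefficient is 1 and it ranks first. D and E also have high betweenness, but because they sit inside dense modules their bridging coefficients, and hence their bridging centralities, are much lower.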

Nodes with high bridging centrality lie between densely connected modules and link them to one another, which makes them structurally important in the network. In addition, deleting a node with high bridging centrality disrupts the path length distribution to a similar extent as deleting a node with high betweenness, but it creates fewer singleton nodes, because the isolated modules it leaves behind are larger on average than those produced by deleting high-betweenness nodes (Hwang et al., 2008).

As this study is interested in the human proteins that play an important role in HIV-TB co-infection, bridging centrality was adapted to calculate a new measure, “pathogenicity bridging centrality”, which is described in the methods section.

2.1.6 Interpreting proteins and PPIs using gene ontology enrichment analysis
