How to Answer - Thesis submitted in partial fulfilment of the requirements for the degree of

Q. Given data collected from healthy individuals and from individuals suffering from a particular disease, can we find novel features that differentiate between the two classes of individuals?

Q. If so many of us are infected, why are not all of us sick? Can we identify a set of features that can distinguish individuals more susceptible to a particular disease than the rest?

understanding why different cellular systems behave differently under the same condition and why the same cell behaves differently under different conditions. The leading theory of how signals flow through a cell can be summarised as follows. The cellular system has multiple sensors, or receptors, each assigned to sense a particular set of stimuli (Alon, 2006, Chapter 2). The receptors that sense external stimuli are generally found on the cell membrane, whereas the receptors that sense internal stimuli are located inside the cell. Given a stimulus, the corresponding receptors sense the input signal and compare it with a reference signal. If the input deviates from the reference by more than some threshold, the receptors initiate a chain of events known as the signal transduction pathway of the given stimulus.

Typically, a signal transduction pathway involves a series of protein-protein interactions. It begins with the activation of a protein molecule specific to the receptor. This protein molecule, in turn, travels and activates another intermediate protein molecule.

The process continues until the target protein molecule, known as the Transcription Factor (TF), is activated. Once activated, the TF reaches the DNA and activates the target gene. The activated gene produces a particular type of molecule called messenger RNA (mRNA) through a process called transcription. The number of copies of mRNA transcribed from a particular gene per unit volume of the cellular environment at a specific time point is referred to as the gene expression of that gene at that time point. Transcribed mRNAs travel to ribosomes, where they are converted to the corresponding amino acid chains; this conversion is called translation. Each amino acid chain folds into a 3D structure, forming a protein. These newly produced proteins in turn cause a cascade of protein-protein interactions that finally generates the cellular response to the aforementioned stimulus.

The previous paragraph shows that the response of the cellular system emerges from a mechanistic interplay between different inter-dependent components of the system. Hence, if we can decipher the inter-dependencies between them, we will be able to understand and predict how the system behaves under a particular condition. The question, therefore, is how to decipher the inter-dependencies between the components of a given cellular system.

2.2.1.1 Deciphering Dependencies: Traditional Experimentation-only Approach

The traditional approach is to perturb a system variable of interest and study how changes in its values affect the values of other variables of interest (Markowetz and Spang, 2007). Typical perturbation strategies are to block the component corresponding to the system variable either completely (knockout) or partially (knockdown). For example, given a set of genes, the aim is to reverse engineer how they depend on each other. In that case, each gene can be perturbed by experimentally removing it from the DNA (knocked out) or by reducing the number of mRNAs transcribed from it (knocked down), and the effect on the expressions of the remaining genes is then observed. In addition, multiple genes can be perturbed simultaneously to study their combinatorial effect on the rest of the genes.

The limitation of this approach is that the number of perturbation experiments grows exponentially with the number of system variables. Hence, when the number of system variables of interest is very large, the effort, time and cost required for the experiments become prohibitive. For example, there are around 25,000 genes in human DNA. Therefore, inferring their inter-dependencies through the experimentation-only approach is infeasible. In such a case, we need complementary approaches that can significantly reduce the experimental burden.
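To give a sense of scale, the following sketch counts perturbation experiments under the simplifying assumption that each experiment knocks out one distinct non-empty subset of genes. The function names are illustrative and do not come from the cited literature.

```python
from math import comb


def experiments_up_to_order(num_genes: int, max_order: int) -> int:
    """Count distinct knockout sets that contain 1..max_order genes."""
    return sum(comb(num_genes, k) for k in range(1, max_order + 1))


def all_combinatorial_experiments(num_genes: int) -> int:
    """Count every non-empty subset of genes, i.e. 2**V - 1."""
    return 2 ** num_genes - 1


# Assumed gene count, taken from the approximate figure quoted in the text.
NUM_GENES = 25_000

single_only = experiments_up_to_order(NUM_GENES, 1)
single_and_pairs = experiments_up_to_order(NUM_GENES, 2)

print(single_only)        # one experiment per gene
print(single_and_pairs)   # already hundreds of millions
```

Under this assumption, single and pairwise knockouts alone already require more than 3 × 10^8 experiments, and the count of all combinations, 2^V − 1, is far beyond any experimental budget.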

2.2.1.2 Deciphering Dependencies: The Computational Systems Biology (CSB) Approach

The CSB approach is a complementary approach that reduces the experimental burden. It requires data collected by measuring the system variables simultaneously. A computational model is then constructed from the observed data. This model indicates the potential inter-dependencies among the variables, so perturbation experiments need to be performed only to verify those potential inter-dependencies.

Graphical Models or Network Models. In the CSB approach, a specific type of computational model, known as a graphical or network model, is found to be very convenient for visualising inter-dependencies among a large number of variables (Markowetz and Spang, 2007; Raval and Ray, 2016). In a network model, the variables are represented as nodes and their dependency relationships are represented as edges. The absence of an edge between a pair of variables signifies their mutual independence. The presence of an edge, on the other hand, implies that they are not mutually independent. The edge weight, if any, represents a quantitative measure of their dependence. Reverse engineering such a network from an input dataset is known as the Network Reconstruction task. The task can be formally defined as follows:

• The input is a data matrix of dimensions (V × N); it contains N measurements for each of the V variables of interest.

• The output is a network adjacency matrix of dimensions (V × V). The (i, j)^th element of the matrix represents the dependency relationship between the i^th and the j^th variables. Note that the data structure used to physically store the output network need not be an adjacency matrix. Depending upon the properties of the network and the desired operations, an efficient data structure can be chosen.
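The input and output shapes defined above can be illustrated with a minimal sketch. It uses absolute Pearson correlation with a fixed threshold as the dependency measure, which is an assumption made here purely for illustration and is not a reconstruction algorithm proposed in this thesis; the names `reconstruct_network`, `to_adjacency_list` and the threshold value are likewise hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # constant variable: treated as uncorrelated (assumption)
    return cov / sqrt(var_x * var_y)


def reconstruct_network(data, threshold=0.8):
    """Map a V x N data matrix to a V x V weighted adjacency matrix."""
    v = len(data)
    adjacency = [[0.0] * v for _ in range(v)]
    for i in range(v):
        for j in range(i + 1, v):
            weight = abs(pearson(data[i], data[j]))
            if weight >= threshold:  # keep only strong dependencies
                adjacency[i][j] = adjacency[j][i] = weight
    return adjacency


def to_adjacency_list(adjacency):
    """Sparse alternative storage: keep only the non-zero edges of each node."""
    return {i: {j: w for j, w in enumerate(row) if w != 0.0}
            for i, row in enumerate(adjacency)}


# Toy data matrix with V = 3 variables and N = 4 measurements each.
data = [
    [1.0, 2.0, 3.0, 4.0],  # variable 0
    [2.0, 4.0, 6.0, 8.1],  # variable 1, nearly proportional to variable 0
    [3.0, 1.0, 4.0, 1.0],  # variable 2, unrelated pattern
]
network = reconstruct_network(data)
sparse_network = to_adjacency_list(network)
```

In this toy example only variables 0 and 1 are connected. The adjacency-list form stores just that one edge, which is the kind of storage choice that pays off when V is large and the network is sparse.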

The computational algorithms designed to accomplish the aforementioned task are known as network-reconstruction algorithms (henceforth, simply reconstruction algorithms). Once the network is reconstructed, network-analysis and network-visualisation techniques are applied to it. Two main types of analysis are performed on the reconstructed networks:

• Firstly, statistical and functional analyses are performed to check whether the reconstructed network can explain the known functionalities, if any, of the concerned system. As an example, a network that models a system responsible for fast responses to external stresses is expected to have a statistically significantly shorter mean path length than a network that models a system with a slower response. As another example, the genes known to be functionally involved in the development of a butterfly's wings are expected to have considerably more inter-connections in a network that models the metamorphosis stage than in a network that models another stage.

• Secondly, new experiments are designed to verify the edges for which prior knowledge is non-existent or limited. If the experimental results verify an edge, the domain knowledge is enriched. Otherwise, the edge may be considered incorrect, and the corresponding reconstruction algorithm can be modified to reject such incorrect edges.
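The mean path length mentioned in the first type of analysis can be computed as sketched below. The sketch assumes an unweighted, undirected network stored as an adjacency list and averages over reachable pairs of distinct nodes only; it does not perform the statistical significance test itself, and the two toy networks are invented for illustration.

```python
from collections import deque


def shortest_path_lengths(adjacency_list, source):
    """Breadth-first search: hop counts from source to each reachable node."""
    distances = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbour in adjacency_list[node]:
            if neighbour not in distances:
                distances[neighbour] = distances[node] + 1
                queue.append(neighbour)
    return distances


def mean_path_length(adjacency_list):
    """Average shortest-path length over ordered, reachable pairs of distinct nodes."""
    total, pairs = 0, 0
    for source in adjacency_list:
        for target, dist in shortest_path_lengths(adjacency_list, source).items():
            if target != source:
                total += dist
                pairs += 1
    return total / pairs if pairs else float("nan")


# Two hypothetical four-node networks: a chain and a hub-centred star.
chain = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
star = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}

print(mean_path_length(chain))
print(mean_path_length(star))
```

The star network has the shorter mean path length of the two, since every pair of nodes is at most two hops apart through the hub.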

Thus, computer-based reconstructions do not replace experiment-based approaches. Rather, the former attempt to accelerate the latter by narrowing down the experimental search space.

Focus of the Literature Survey. The network reconstruction task is the scope of this thesis (Figure 2.1).

[Figure 2.1 diagram labels: System; Observations/Measurements; Reconstruction algorithm; Network model (independent of the reconstruction algorithm)]

Figure 2.1: Network Reconstruction. For a system of interest, values of its variables are measured across time and under different conditions. These measurements are inputted into a reconstruction algorithm that outputs a network model of the concerned system. For each class of network models, multiple reconstruction algorithms can be designed to suit different types of measurements.

Different types of network models have been designed to address the network reconstruction task. Moreover, for each type of model, a range of reconstruction algorithms has been proposed. Hence, our literature survey aims to answer the following questions:

Q. What are the different types of network models designed to represent inter-dependencies of variables in cellular systems?

Q. Given a particular type of network model, what are the reconstruction algorithms proposed to reconstruct it?
