2.4 Graph reduction

2.4.1 Graph reduction through graph summarization

One possible way to reduce a graph is to summarize it. The process of graph summarization removes unnecessary detail while retaining the general properties of the original graph [27].

The summarization process is performed by creating supernodes by grouping together similar nodes from the original graph [28]. The graph summarization process can be seen in Figure 2.4 [27]. The graph on the left-hand side is the original, and the graph on the right-hand side is the summarized version. Similar nodes are represented by the same colour and are grouped into a supernode.

Figure 2.4: Graph summarization process [27].

All graphs can be classified as either stationary or stream graphs [27]. A stationary graph’s structure remains the same as time progresses, while a stream graph’s structure changes over time as nodes are added to or removed from the graph. An example of a stationary graph is a graph representing several cities and how they are connected, while a social media network can be seen as a stream graph, since nodes are added as new members join the network over time.

Since this study is centred around the attributed graph of a plant process, summarization methods of stationary graphs will be the main focus as the structure of a process plant typically remains the same.

All the summarization methods are based on similarity, and this similarity can be structural, attribute-based, or a combination of the two. Therefore, the summarization methods can be classified as structural, attribute-based, or hybrid approaches.

2.4.1.1 Structural graph summarization

Navlakha et al. [29] discuss a graph summarization method that is based on the structural composition of the graph. The method merges two or more nodes whose edges go to the same (or a very similar) set of other nodes into a supernode. The edges going to each common neighbour are then replaced with a superedge. This process is illustrated in Figure 2.5 [29], where the graph G = (V(G), E(G)) on the left is summarized to create the graph S = (V(S), E(S)) on the right.

Figure 2.5: An example of structural graph summarization [29].

The representation of the graph G consists of a summary S and a set of edge corrections C, which together are used to reconstruct the original graph. Navlakha et al. [29] define the cost of the representation R = (S, C) as the sum of the storage costs of its two components. The Minimum Description Length (MDL) principle is applied to determine the best possible summary. If R̂ = (Ŝ, Ĉ) is the minimum-cost representation, then according to the MDL principle, Ŝ is the best possible summary graph.
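The merging step can be illustrated with a minimal sketch. It is not the algorithm of [29]: it assumes an undirected graph and only merges nodes whose neighbour sets are exactly identical, so the set C stays empty and no MDL search is needed. The function name and the example edge list are hypothetical.

```python
from collections import defaultdict

def summarize_by_neighbours(edges):
    """Merge nodes that share exactly the same neighbour set into supernodes."""
    neighbours = defaultdict(set)
    for u, v in edges:
        neighbours[u].add(v)
        neighbours[v].add(u)

    # Nodes with identical neighbour sets end up in the same supernode.
    groups = defaultdict(list)
    for node in sorted(neighbours):
        groups[frozenset(neighbours[node])].append(node)
    supernodes = [tuple(members) for members in groups.values()]

    # Map each original node to its supernode, then collapse the edges.
    owner = {node: s for s in supernodes for node in s}
    superedges = {frozenset((owner[u], owner[v])) for u, v in edges}
    return supernodes, superedges

# Hypothetical example: "a" and "b" both connect to "x" and "y".
edges = [("a", "x"), ("a", "y"), ("b", "x"), ("b", "y"), ("x", "z")]
supernodes, superedges = summarize_by_neighbours(edges)
print(supernodes)       # "a" and "b" share one supernode
print(len(superedges))  # fewer edges than the original five
```

In this example the five original edges collapse to three superedges, which is the kind of storage saving that the cost of a representation is meant to capture.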

2.4.1.2 Attribute-based summarization

In the paper by Tian et al. [30], two summarization operations are proposed. The first operation is called SNAP (Summarization by Grouping Nodes on Attributes and Pairwise Relationships). The SNAP operation works by grouping nodes that are homogeneous in terms of relationships and attributes together. Edges are then used to show the relationship between the different groups.

An example of the SNAP operation can be seen in Figure 2.6 [30]. The original graph on the left represents students (nodes) with different attributes (gender, department) and the relationship between the students (edges). Not all the relationships are shown in the figure.

The summary is created by grouping students of the same gender and department together, which results in four different groups (G1, G2, G3, G4). The edges are assigned to represent the relationships between the different groups in the summary. The summarized graph can be seen on the right-hand side of the figure.

Figure 2.6: Illustration of the SNAP operation [30].
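The grouping idea behind SNAP can be sketched as follows. This is a simplified illustration rather than the implementation in [30]: it assumes an undirected graph with a single relationship type, starts from the attribute-only grouping, and keeps splitting groups until all members of a group are connected to the same set of groups. The function name and the student data are hypothetical.

```python
from collections import defaultdict

def snap(attributes, edges):
    """Group nodes that agree on attributes and on the groups they link to."""
    neighbours = defaultdict(set)
    for u, v in edges:
        neighbours[u].add(v)
        neighbours[v].add(u)

    # Start from the attribute-only grouping.
    group_of = dict(attributes)
    while True:
        # A node's signature: its current group plus the groups it links to.
        signature = {
            node: (group_of[node],
                   frozenset(group_of[n] for n in neighbours[node]))
            for node in attributes
        }
        if len(set(signature.values())) == len(set(group_of.values())):
            break  # no group was split, so the grouping is stable
        group_of = signature

    # Number the groups in order of first appearance and collect members.
    index, groups = {}, []
    for node in sorted(attributes):
        label = group_of[node]
        if label not in index:
            index[label] = len(groups)
            groups.append([])
        groups[index[label]].append(node)
    group_edges = {
        tuple(sorted((index[group_of[u]], index[group_of[v]])))
        for u, v in edges
    }
    return groups, group_edges

# Hypothetical students with (gender, department) attributes.
attributes = {
    "ann": ("F", "CS"), "bea": ("F", "CS"), "cat": ("F", "CS"),
    "dan": ("M", "EE"), "eli": ("M", "EE"),
}
edges = [("ann", "dan"), ("bea", "eli")]
groups, group_edges = snap(attributes, edges)
print(groups)
print(group_edges)
```

Here "cat" shares her attributes with "ann" and "bea" but has no relationship with the (M, EE) group, so she is split into a group of her own.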

The second operation proposed by Tian et al. [30] is the k-SNAP operation. It relaxes the homogeneity requirement of the SNAP operation by not requiring that every node in a group participates in a group relationship. The k-SNAP operation also allows the user to control the size of the resulting summary by specifying the required number of groups, which is denoted by k.

An improvement on the k-SNAP operation is presented in a paper by Zhang et al. [31]. They propose using the CANAL (Categorization of Attributes with Numerical Values based on Attribute Values and Link Structures of Nodes) technique to summarize a graph. The CANAL technique automatically categorises the numerical values by assessing the similarities of the attribute values and the link structure of all the nodes in the graph.

The inputs to this algorithm are the graph G = (V, E), the attributes of all the nodes, denoted by a, and the number of categories required by the user, denoted by C. It starts by grouping nodes based on their attribute values, so that all the nodes in one group have the same numerical value. The groups are ordered numerically. The algorithm then iteratively merges groups based on the similarity of their link structures until one group remains.

During the merging process, the algorithm continually evaluates the quality of the summary. If merging two groups causes the summary quality to decrease significantly, the boundary between the groups is a good cut-off position, where a cut-off position separates two categories. In the final step, the algorithm uses the boundaries of the C − 1 merging operations that produce the worst-quality summaries to categorise the numerical attributes.

The k-means clustering algorithm is another method that can be used to create a summarized graph based on node attributes. The algorithm sorts the data into k non-overlapping clusters. The user specifies the value of k, and each cluster is represented by its centroid, which is the mean of all the points in the cluster [32]. This method is better suited to graphs that have numerical attributes.

The algorithm starts by selecting k centroids. Then, each point in the dataset is assigned to the closest centroid, and the collection of points around each centroid forms a cluster. Next, an updated centroid is assigned to the cluster according to the data points in that cluster. This process repeats until the data points stop changing clusters [32].
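The steps above can be sketched in plain Python. The sketch makes two assumptions that [32] does not prescribe: the first k points seed the centroids, and squared Euclidean distance defines "closest". The node attribute values in the example are hypothetical.

```python
def k_means(points, k, max_iter=100):
    """Cluster points (tuples of floats) into k groups."""
    # Assumption: the first k points are used as the initial centroids.
    centroids = [points[i] for i in range(k)]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its closest centroid.
        new_labels = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2
                                  for a, b in zip(p, centroids[c])))
            for p in points
        ]
        if new_labels == labels:
            break  # no point changed cluster
        labels = new_labels
        # Update step: move each centroid to the mean of its cluster.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = tuple(sum(x) / len(members)
                                     for x in zip(*members))
    return labels, centroids

# Hypothetical one-dimensional node attributes (for example, temperatures).
points = [(1.0,), (1.2,), (0.8,), (5.0,), (5.3,), (4.9,)]
labels, centroids = k_means(points, k=2)
print(labels)
print(centroids)
```

With this data the first three points end up in one cluster and the last three in the other, even though the initial centroids both lie in the lower group.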

Although graph summarization and graph clustering are two distinct operations, Riondato et al. [33] exploit the connection between graph summarization and geometric clustering (k-means clustering) to develop an algorithm that is capable of producing a summarized graph. The clusters produced by the k-means clustering form the supernodes in the summarized graph.

2.4.1.3 Hybrid graph summarization

In [34], a method is proposed that creates a summary of a graph based on both virtual and real edges. The technique is called SGVR (Summarizing Graph based on Virtual and Real links), and it works by aggregating similar nodes into non-overlapping groups using user-selected attributes. It considers both virtual edges, which represent node attributes, and real edges, which represent the graph structure.

Ashrafi & Kangavari [35] propose an approach that generates a hybrid summary of an attributed graph by allowing the user to specify, as a percentage, the contribution of the structural information to the summarized graph. This method further allows the user to specify the resulting summary’s size and the importance of attributes if nodes have multi-valued similarities.

They also introduce the concepts of density and entropy, which are measures used to determine the quality of summarized graphs, depending on the type of summary. They compared their method with the method in [34] using these two measures and found that their method results in a higher-quality summary. Figure 2.7 [35] illustrates this summarization method. The summarized graph in Figure 2.7(b) contains a supernode and two regular nodes.

Figure 2.7: Example of hybrid graph summarization: (a) Original graph. (b) Summarized graph [35].

2.4.1.4 Using a summarized graph for FDI

For graph comparison to be applied to a summarized graph, an attribute has to be assigned to each supernode, and its value should represent all the individual nodes from the original graph that make up the supernode. Furthermore, this attribute must be selected such that the graph comparison process can detect a fault at any of the individual nodes forming the supernode, without flagging a fault when the individual node attributes exhibit only normal levels of variation.
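One conceivable way to meet this requirement is sketched below. It is purely an illustrative assumption and not a method proposed in the cited works: the supernode attribute is taken as the largest absolute deviation of any member node from its NOC mean, scaled by that node's NOC standard deviation. The function name, node names, readings, and the threshold of 3 are all hypothetical.

```python
def supernode_deviation(readings, noc_mean, noc_std):
    """Largest absolute scaled deviation among the member nodes of a supernode."""
    return max(
        abs(readings[node] - noc_mean[node]) / noc_std[node]
        for node in readings
    )

# Hypothetical NOC statistics for two nodes merged into one supernode.
noc_mean = {"pump_a": 50.0, "pump_b": 60.0}
noc_std = {"pump_a": 2.0, "pump_b": 3.0}
THRESHOLD = 3.0  # assumed alarm level, in standard deviations

normal = {"pump_a": 51.0, "pump_b": 58.0}
faulty = {"pump_a": 51.0, "pump_b": 75.0}

print(supernode_deviation(normal, noc_mean, noc_std) > THRESHOLD)  # normal variation
print(supernode_deviation(faulty, noc_mean, noc_std) > THRESHOLD)  # fault at pump_b
```

Taking the maximum rather than the average means that a fault at a single member node is not diluted by the other members of the supernode, while readings within normal variation stay below the threshold.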

While Jouili & Tabbone [24] show that it is possible to compare graphs of different sizes, comparing a reference graph with an operational graph after both have been summarized will result in a more accurate FDI process than comparing one unsummarized graph with one summarized graph, since the latter comparison leaves valuable data unconsidered. This FDI process involves comparing the summarized graph with itself under normal operating conditions (NOC). Taking an eigendecomposition FDI method as an example, this is done by generating a cost matrix and calculating its eigenvectors and eigenvalues for normal operation.

The attributes of the summarized graph of the system are then continually updated and monitored during operation as the plant measurements are updated. Condition monitoring then takes place: graph comparison operations are applied to generate a cost matrix that quantifies the difference between the normal-operation summarized graph and the monitored summarized graph. Finally, the eigenvectors and eigenvalues of this cost matrix are calculated and compared to those of normal plant operation. A fault is considered to have occurred if the deviation between the two sets of eigenvectors and eigenvalues is sufficiently large.