The special problem entitled “Community Detection in Philippine Congress Legislators,” prepared and submitted by Kharl Gaebriel A. Agir in partial fulfillment of the requirements for the degree of Bachelor of Science in Computer Science, has been examined and is recommended for acceptance. Accepted and approved in partial fulfillment of the requirements for the degree of Bachelor of Science in Computer Science.
A program that locates communities of specific sizes in a network of legislators and calculates the centralities of each node in that network is a valuable tool in political science, because the entities in that network play an important role in our government. The tool extracts clusters, or communities, together with the centrality measures of each node, to identify which members are more strongly connected to others. The software also produces a graph visualization of the network and exports the results as a PDF file.
Background of the Study
In a network, not only the nodes and their direct interconnections are important, but also the larger-scale structure these entities form, such as groups and clusters. A cluster is a group of nodes that are more likely to be connected to each other than to members of other groups. Policy researchers can use clusters and related network measures to test theories about how networks influence politics, whether through coercion, agenda setting, or changes in interests or identities within the network.
In a political network, a node's meaning and importance derive from the actual or potential interactions between two or more social actors. Community detection helps researchers understand the entire network, either by examining nodes individually or by identifying clusters and communities. Because the network in this study is already delimited by an underlying function and its nodes share identical characteristics, clique-finding algorithms are an appropriate way to detect its communities.
Statement of the Problem
Weighted clique problems are NP-hard [4], so no polynomial-time exact algorithm for them is known; under certain conditions, however, they can be approximated. Many approximation algorithms have been proposed for this problem, but this study specifically addresses the 2-approximation algorithm for finding a clique of minimum weight [4]. This algorithm works through the Row Subset of a Symmetric Matrix (RSSM) problem, which considers the sum of the entries in a subset of the matrix rows containing a specified number of elements.
Objectives of the Study
Display a graphical visualization of the political network formed, as well as the communities detected in it. Create a PDF document listing the detected communities and the centrality table of the legislators.
Significance of the Project
Scope and Limitations
Assumptions
Review of Related Literature
A community in which all entities are connected to one another provides valuable information and can be represented as a clique. Although community structure is relevant to many real problems, detecting it, especially completely, is one of the most challenging tasks in network analysis. Now consider reformulating the maximum clique problem so that edge weights, rather than the cardinality of the clique, are what is optimized; this yields the weighted clique problem (WCP).
The computational complexity of the WCP follows from the reducibility of the aforementioned clique problem, which is NP-complete in the strong sense [4]. By exploiting a property of the graph, the authors of [4] solved a weighted clique problem with an efficient algorithm. The proposed algorithm relies on a polynomial-time reduction between the MWCP and the Row Subset of a Symmetric Matrix (RSSM) problem.
Philippine Congress Structure
Graph Construction
Centrality
The basic measure of betweenness centrality assumes that nodes lying on the shortest paths between other nodes have an advantage in the network, because other nodes depend on them for information or resources. Closeness centrality, by contrast, is highest for the node with the shortest average path to all other nodes; conversely, the longer a node's average path to the other nodes, the lower its closeness centrality.
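Both centrality measures can be made concrete with a short sketch. The Java class below is a minimal illustration, assuming an unweighted, undirected graph stored as adjacency lists; the class and method names (`Centrality`, `undirected`, `closeness`, `betweenness`) are hypothetical and are not taken from the tool. Closeness is computed as the number of other reachable nodes divided by the sum of the shortest-path distances to them, and betweenness uses Brandes' algorithm, a standard method that may differ from the tool's own implementation.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

// Hypothetical helper class; assumes an unweighted, undirected graph (adjacency lists).
final class Centrality {

    /** Builds adjacency lists for an undirected graph from an edge list. */
    static List<List<Integer>> undirected(int n, int[][] edges) {
        List<List<Integer>> adj = new ArrayList<>();
        for (int i = 0; i < n; i++) adj.add(new ArrayList<>());
        for (int[] e : edges) {
            adj.get(e[0]).add(e[1]);
            adj.get(e[1]).add(e[0]);
        }
        return adj;
    }

    /** Closeness: (number of other reachable nodes) / (sum of shortest-path lengths to them). */
    static double[] closeness(List<List<Integer>> adj) {
        int n = adj.size();
        double[] result = new double[n];
        for (int s = 0; s < n; s++) {
            int[] dist = new int[n];
            Arrays.fill(dist, -1);              // -1 marks "not reached yet"
            dist[s] = 0;
            Deque<Integer> queue = new ArrayDeque<>();
            queue.add(s);
            long total = 0;
            int reached = 0;
            while (!queue.isEmpty()) {          // breadth-first search from s
                int v = queue.poll();
                for (int w : adj.get(v)) {
                    if (dist[w] < 0) {
                        dist[w] = dist[v] + 1;
                        total += dist[w];
                        reached++;
                        queue.add(w);
                    }
                }
            }
            result[s] = total == 0 ? 0.0 : (double) reached / total;
        }
        return result;
    }

    /** Brandes' algorithm for unweighted, undirected graphs. */
    static double[] betweenness(List<List<Integer>> adj) {
        int n = adj.size();
        double[] cb = new double[n];
        for (int s = 0; s < n; s++) {
            Deque<Integer> stack = new ArrayDeque<>();    // nodes in order of discovery
            List<List<Integer>> pred = new ArrayList<>();  // shortest-path predecessors
            for (int i = 0; i < n; i++) pred.add(new ArrayList<>());
            long[] sigma = new long[n];                     // number of shortest paths from s
            int[] dist = new int[n];
            Arrays.fill(dist, -1);
            sigma[s] = 1;
            dist[s] = 0;
            Deque<Integer> queue = new ArrayDeque<>();
            queue.add(s);
            while (!queue.isEmpty()) {
                int v = queue.poll();
                stack.push(v);
                for (int w : adj.get(v)) {
                    if (dist[w] < 0) {
                        dist[w] = dist[v] + 1;
                        queue.add(w);
                    }
                    if (dist[w] == dist[v] + 1) {
                        sigma[w] += sigma[v];
                        pred.get(w).add(v);
                    }
                }
            }
            double[] delta = new double[n];
            while (!stack.isEmpty()) {                      // farthest nodes first
                int w = stack.pop();
                for (int v : pred.get(w)) {
                    delta[v] += (double) sigma[v] / sigma[w] * (1.0 + delta[w]);
                }
                if (w != s) cb[w] += delta[w];
            }
        }
        // Each undirected pair was counted once from each endpoint.
        for (int i = 0; i < n; i++) cb[i] /= 2.0;
        return cb;
    }
}
```

On the path 0-1-2, the middle node gets closeness 1.0 and betweenness 1.0, since it lies on the only shortest path between the two end nodes.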
Cliques
The figure below shows a clique of the largest size (a maximum clique), whose members are drawn as the blue nodes.
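A maximum clique can be found exactly on small graphs. The sketch below uses the Bron–Kerbosch algorithm with pivoting, a standard exact method; it is shown for illustration and is not necessarily the exact algorithm the tool uses. The `MaxClique` class and its `find` method are hypothetical names, and the graph is assumed to be given as a symmetric boolean adjacency matrix.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical class; Bron–Kerbosch with pivoting on a symmetric boolean adjacency matrix.
final class MaxClique {
    private final boolean[][] adj;
    private List<Integer> best = new ArrayList<>();

    private MaxClique(boolean[][] adj) { this.adj = adj; }

    /** Returns one maximum clique (a largest set of pairwise adjacent nodes). */
    static List<Integer> find(boolean[][] adj) {
        MaxClique mc = new MaxClique(adj);
        List<Integer> candidates = new ArrayList<>();
        for (int v = 0; v < adj.length; v++) candidates.add(v);
        mc.expand(new ArrayList<>(), candidates, new ArrayList<>());
        return mc.best;
    }

    // r = current clique, p = nodes that can extend it, x = nodes already explored.
    private void expand(List<Integer> r, List<Integer> p, List<Integer> x) {
        if (p.isEmpty() && x.isEmpty()) {               // r is a maximal clique
            if (r.size() > best.size()) best = new ArrayList<>(r);
            return;
        }
        if (r.size() + p.size() <= best.size()) return; // cannot beat the current best
        // Pivot: the node in p or x with the most neighbours in p.
        int pivot = -1, bestCount = -1;
        for (List<Integer> set : List.of(p, x)) {
            for (int u : set) {
                int count = 0;
                for (int v : p) if (adj[u][v]) count++;
                if (count > bestCount) { bestCount = count; pivot = u; }
            }
        }
        for (int v : new ArrayList<>(p)) {
            if (adj[pivot][v]) continue;                // neighbours of the pivot are skipped
            List<Integer> newP = new ArrayList<>(), newX = new ArrayList<>();
            for (int w : p) if (w != v && adj[v][w]) newP.add(w);
            for (int w : x) if (w != v && adj[v][w]) newX.add(w);
            r.add(v);
            expand(r, newP, newX);
            r.remove(r.size() - 1);
            p.remove(Integer.valueOf(v));
            x.add(v);
        }
    }
}
```

Because the search is exponential in the worst case, it is practical only for modest network sizes.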
Weighted Clique Problem
Approximation Algorithm for Finding a Clique with Minimum
The problem of finding a clique of fixed size with minimum total weight of its nodes and edges in a complete undirected weighted graph is considered together with its subclasses. The problem is also proven to be non-approximable in the general case. One possible motivation comes from grouping similar objects and from cluster analysis: finding, within the set of analyzed objects, elements that have the same values of the measured properties over a fixed collection of important characteristics.
In these subclasses, the weights are combinations of the weights of the paired objects being compared and either the Euclidean distances or the squared Euclidean distances between them. The weighted clique problem takes as input the adjacency matrix of the graph, in which each entry is the non-negative weight of the edge between the corresponding pair of nodes.
It is also proven that, if an approximation algorithm exists, the value of its solution is bounded by a scalar multiple of the number of edges in the k-sized clique, which is $k(k-1)/2$. The search for an approximation algorithm was approached from the perspective of the input matrix. The input is a weighted adjacency matrix that is symmetric, with nonnegative entries and all diagonal entries equal to 0.
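For reference, the objective can be written explicitly. Assuming the edge-weight form used by the RSSM formulation (node weights omitted), with $W = (w_{ij})$ the symmetric input matrix just described, the weight of a clique $C$ of size $k$ is the sum of its $k(k-1)/2$ edge weights. Because the diagonal is zero and $W$ is symmetric, this equals half the sum of the entries of $W$ in the rows and columns indexed by $C$:

```latex
f(C) = \sum_{\substack{i, j \in C \\ i < j}} w_{ij}
     = \frac{1}{2} \sum_{i \in C} \sum_{j \in C} w_{ij},
\qquad |C| = k,\quad w_{ij} = w_{ji} \ge 0,\quad w_{ii} = 0,
\qquad \text{minimum-weight } k\text{-clique: } \min_{|C| = k} f(C).
```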
The RSSM problem is then proved to be polynomially equivalent to the Minimum Edge-Weight Clique Problem when both are stated in the form of a property verification (decision) problem. In the algorithm, for each $j = 1, \dots, n$, a set $B_j$ is formed from the indices of the $m$ smallest entries in the $j$th row of the matrix $W$, including $j$ itself (possible because $w_{jj} = 0$ is a smallest entry of its row). This algorithm is proven to solve the RSSM with approximation guarantee 2 in $O(n^2)$ time and to be asymptotically feasible.
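The candidate-set construction can be sketched in Java. This is a minimal illustration under stated assumptions: the text does not preserve how the final answer is chosen among the sets $B_j$, so this sketch assumes it returns the candidate with the smallest total edge weight. It also sorts each row for clarity, so it runs in $O(n^2 \log n + n m^2)$ time rather than within the $O(n^2)$ bound stated for the original algorithm. The class name `RowSubsetApprox` is hypothetical.

```java
import java.util.Arrays;

// Hypothetical class name; sketch of the candidate-set 2-approximation for RSSM.
final class RowSubsetApprox {

    /** Total edge weight of the clique on `nodes`: sum of w[a][b] over pairs a < b. */
    static double cliqueWeight(double[][] w, int[] nodes) {
        double total = 0.0;
        for (int a = 0; a < nodes.length; a++) {
            for (int b = a + 1; b < nodes.length; b++) {
                total += w[nodes[a]][nodes[b]];
            }
        }
        return total;
    }

    /**
     * Builds B_j (index j plus the m - 1 other smallest entries of row j) for every j.
     * ASSUMPTION: the candidate with the smallest clique weight is returned.
     * Expects a symmetric matrix with nonnegative entries and a zero diagonal.
     */
    static int[] approximate(double[][] w, int m) {
        int n = w.length;
        if (m < 1 || m > n) throw new IllegalArgumentException("need 1 <= m <= n");
        int[] best = null;
        double bestWeight = Double.POSITIVE_INFINITY;
        for (int j = 0; j < n; j++) {
            final int row = j;
            Integer[] order = new Integer[n];
            for (int i = 0; i < n; i++) order[i] = i;
            // Index j goes first (w[j][j] = 0 is a smallest entry); the rest by value, then index.
            Arrays.sort(order, (a, b) -> {
                if (a.intValue() == b.intValue()) return 0;
                if (a == row) return -1;
                if (b == row) return 1;
                int byValue = Double.compare(w[row][a], w[row][b]);
                return byValue != 0 ? byValue : Integer.compare(a, b);
            });
            int[] candidate = new int[m];
            for (int k = 0; k < m; k++) candidate[k] = order[k];
            double weight = cliqueWeight(w, candidate);
            if (weight < bestWeight) {   // keep the lightest candidate seen so far
                bestWeight = weight;
                best = candidate;
            }
        }
        return best;
    }
}
```

For the points 0, 1, 2, 10, 11 on a line with squared-distance weights and $m = 3$, the sketch returns nodes 0, 1, 2 with total weight 6.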
The authors also showed that the general case of the problem is NP-hard and non-approximable.
Data Specifications
System Design
The first step in using the tool is to choose between the built-in legislator data and the user's own legislator data, and to enter the desired size of the clique to be found. After analyzing the data, the tool generates a graph that visualizes how the entities are connected to one another.
Along with this graph, the tool generates its adjacency matrix, a tabular form of the connections between the nodes and the weights of the edges connecting them. The output of the tool is based on this graph and matrix: the clique-finding algorithm finds a maximum weighted k-clique in the generated graph and displays the legislators who are members of it, and the tool calculates the centrality of each node.
The centralities show how much each legislator interacts with the others under different circumstances and factors.
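The source does not spell out how legislator features become edge weights, so the following is only a hypothetical sketch of building the symmetric adjacency matrix described above. It assumes each legislator is a map from feature name to value and, as an illustrative rule, weights each edge by the number of selected features on which two legislators agree; an empty feature selection is rejected, mirroring the empty attribute set error the tool raises. `LegislatorGraph` and `adjacencyMatrix` are hypothetical names.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical class; the weighting rule below is an illustrative assumption.
final class LegislatorGraph {

    /**
     * HYPOTHETICAL weighting: the edge weight between legislators a and b is the
     * number of selected features on which their values are equal.
     */
    static int[][] adjacencyMatrix(List<Map<String, String>> legislators, Set<String> selected) {
        if (selected.isEmpty()) {
            throw new IllegalArgumentException("empty attribute set");
        }
        int n = legislators.size();
        int[][] w = new int[n][n];                  // diagonal stays 0
        for (int a = 0; a < n; a++) {
            for (int b = a + 1; b < n; b++) {
                int shared = 0;
                for (String feature : selected) {
                    String va = legislators.get(a).get(feature);
                    if (va != null && va.equals(legislators.get(b).get(feature))) shared++;
                }
                w[a][b] = shared;                   // fill both halves: symmetric matrix
                w[b][a] = shared;
            }
        }
        return w;
    }
}
```

Any other similarity rule could replace the equality count, as long as it keeps the matrix symmetric and nonnegative with a zero diagonal.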
System Architecture
Technical Architecture
Results
When the tool starts, the user is presented with the main menu interface, from which the user can either start the analysis or view information about the tool by clicking the "Help" button. If the user chooses to add a file, a file selector appears immediately and prompts the user to select the desired legislator data file. The user can import any file that follows the dataset format specified in this paper (Figure 9).
After the file is read and analyzed, the list of legislative features specified in the dataset is shown, and the user can select which of these features will be used to build and evaluate the network of legislators. The user can choose one, two, or all of the features in the list (Figure 11). If the user has not selected any attribute, an empty attribute set error is raised.
Once the network has been built and displayed as an adjacency matrix, the user is prompted to specify the communities he/she wants to display. The text field lets the user enter the size of each maximum weighted community to be displayed. After specifying these input parameters, the user chooses between an approximation algorithm and an exact algorithm for community detection.
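The exact option can be illustrated with exhaustive search. The sketch below is an assumption, not the tool's documented method: it treats the network as a complete weighted graph, enumerates every set of k legislators, and keeps the one with the largest total edge weight. The class name `ExactKClique` is hypothetical. Because it examines all C(n, k) subsets, it is feasible only for small inputs, which illustrates why an approximation option is useful.

```java
// Hypothetical class; exhaustive maximum-weight k-clique search on a complete weighted graph.
final class ExactKClique {
    private final double[][] w;
    private final int k;
    private final int[] current;
    private int[] best;
    private double bestWeight = Double.NEGATIVE_INFINITY;

    private ExactKClique(double[][] w, int k) {
        this.w = w;
        this.k = k;
        this.current = new int[k];
    }

    /** Checks all C(n, k) node subsets; exponential, so only for small n. */
    static int[] maxWeight(double[][] w, int k) {
        if (k < 1 || k > w.length) throw new IllegalArgumentException("need 1 <= k <= n");
        ExactKClique search = new ExactKClique(w, k);
        search.search(0, 0, 0.0);
        return search.best;
    }

    // depth = nodes chosen so far; next = smallest index still allowed; weight = edge sum so far.
    private void search(int depth, int next, double weight) {
        if (depth == k) {
            if (weight > bestWeight) {
                bestWeight = weight;
                best = current.clone();
            }
            return;
        }
        // Stop early enough that k - depth nodes can still be chosen.
        for (int v = next; v <= w.length - (k - depth); v++) {
            double added = 0.0;
            for (int i = 0; i < depth; i++) added += w[current[i]][v];  // edges to chosen nodes
            current[depth] = v;
            search(depth + 1, v + 1, weight + added);
        }
    }
}
```

Carrying the running edge sum through the recursion avoids recomputing each subset's weight from scratch.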
If the exact algorithm is chosen and a GPU device is present on the host, the user can choose where to run it: on the GPU device or on the host. The user can then view the centralities of each node, generate a PDF with the list of communities and the centrality table, or view the graph visualization generated from the input data file. If the user chooses to export the results as a PDF, he/she is prompted to enter a file name for the PDF file.
The user can move the graph with the arrow keys and zoom with the Page Up/Page Down keys. Once the input data are loaded, the user can select the features used to define the relationships between the legislator nodes. The app also lets the user choose which algorithm detects the communities: the approximation algorithm or the exact algorithm.
The tool implements both algorithms to discover the number of communities desired by the user. The GraphStream library for Java visualizes the graph generated from the user's input and parameters.