To my knowledge, no part of the work reported in this dissertation has been submitted for the award of a degree at any other institution. This dissertation proposes a new, first-of-its-kind community view of edges in a graph.
Representing real-world data as graphs
Machine Learning on graphs
Network Representation Learning: an introduction
Local versus global contexts in graphs for learning network representations
Therefore, the macroscopic network view, i.e., the higher-order structural features of the underlying network, plays a crucial role in the learning mechanism. State-of-the-art (SoTA) NRL link prediction methods for heterogeneous graphs primarily employ graph neural networks that learn local neighborhood contexts, such as the enclosing subgraph around the (source, target) node pair, to learn edge representations.
Research Objectives
Thesis Overview and Contributions
To this end, we propose three approaches: i) a semi-supervised cluster-invariance property for homogeneous graphs, ii) InfoMax over cluster-aware graphs for multiplex graphs, and iii) metapath and community views of relations in heterogeneous graphs, all toward learning structure-aware network representations. An invariance property over semi-supervised node sets is proposed with the aim of building a unified framework for transductive node representation learning, called Unified Semi-Supervised Non-Negative Matrix Factorization (USS-NMF).
Thesis Organization
- Graph Laplacian Matrix
- Non-Negative Matrix Factorization (NMF)
- Pointwise Mutual Information Matrix from Graph Adjacency
- Clusters and Communities
- Semi-Supervised Learning on graphs
- Semi-Supervised Node Classification Task
The number of connected components in a graph is given by the dimension of the null space of its Laplacian matrix. So, two design choices are involved: the notion of similarity over A to use, and the target space f(X) to be smoothed.
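This relationship between the Laplacian's null space and connected components can be checked numerically. Below is a small NumPy sketch (illustrative, not from the dissertation) using a 5-node graph with two components:

```python
import numpy as np

# Adjacency of a 5-node graph with two components: {0,1,2} and {3,4}
A = np.array([
    [0, 1, 1, 0, 0],
    [1, 0, 1, 0, 0],
    [1, 1, 0, 0, 0],
    [0, 0, 0, 0, 1],
    [0, 0, 0, 1, 0],
], dtype=float)

D = np.diag(A.sum(axis=1))       # degree matrix
L = D - A                        # unnormalized graph Laplacian

eigvals = np.linalg.eigvalsh(L)  # L is symmetric positive semi-definite
n_components = int(np.sum(np.isclose(eigvals, 0.0)))
print(n_components)              # 2 zero eigenvalues -> 2 components
```

Counting the (numerically) zero eigenvalues of L recovers the number of connected components directly.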
Preliminaries: Multiplex Graphs
Multiplex Graphs
The cross-connections can be trivial, when the edges connect two replica nodes across layers based on shared identity, or non-trivial, when the bipartite edges encode some other notion of closeness. For example, in a document multiplex that models different relationships among a set of document nodes, cross-layer edges can either connect replica nodes across the layers or encode the cosine similarity between a pair of document nodes based on their constituent terms.
InfoMax Principle for learning network representations
- Learning network representations on multiplex graphs
- InfoMax Principle applied to homogeneous graphs
- InfoMax Principle applied to multiplex graphs
Deep Graph InfoMax (DGI) [2] is the first work of its kind to propose an InfoMax-based learning objective for node embeddings on homogeneous graphs in a completely unsupervised manner. Deep Multiplex Graph InfoMax (DMGI) [2] is the first work of its kind to propose an InfoMax-based model for multiplex graphs.
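The DGI objective can be illustrated roughly as follows. This is a minimal NumPy sketch, not the authors' implementation: the GNN encoder is replaced by random embeddings, and a bilinear discriminator scores (node, summary) pairs from the real graph against pairs from a corrupted (row-shuffled) view:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

n, d = 6, 4                       # 6 nodes, 4-dim embeddings
H = rng.normal(size=(n, d))       # local node embeddings (stand-in for an encoder)
H_neg = rng.permutation(H)        # corrupted view: row-shuffled embeddings
W = rng.normal(size=(d, d))       # bilinear discriminator weights

s = sigmoid(H.mean(axis=0))       # readout: global graph summary

pos = sigmoid(H @ W @ s)          # D(h_i, s) for real pairs
neg = sigmoid(H_neg @ W @ s)      # D(h~_i, s) for corrupted pairs

# Binary cross-entropy InfoMax objective (to be minimized)
eps = 1e-9
loss = -(np.log(pos + eps).mean() + np.log(1.0 - neg + eps).mean())
print(float(loss))
```

Minimizing this loss pushes the discriminator to tell real (node, summary) pairs apart from corrupted ones, which maximizes a lower bound on local-global mutual information.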
Semi-Supervised Node Classification Task
It then proposes a consensus-arranging technique that uses an attention mechanism to meaningfully aggregate relation-specific node embeddings via the aggregation function Q in Figure 2.9. The consensus regularization strategy minimizes disagreement between relation-specific node embeddings to systematically learn a final, unified consensus node representation.
Preliminaries: Heterogeneous Graphs
Heterogeneous Graphs
Heterogeneous Information for predicting links
Contextual encoding of triples for Link Prediction
The problem of Structural Link Prediction
Challenges
It is a simplification of the underlying path that retains only the sequence of edge types. For graph-based SSL methods, the choice of the underlying data representation (the embedding space) is critical.
Motivation
As shown in Figure 3.3c, the smoothness assumption states that if two nodes are closely related by some notion of similarity, the associated data points are highly likely to have the same label. Thus, the optimal decision boundary should respect clusters of high-density data points and lie in the low-density region.
Research Objective
Although previous works exist that embed data from high-density regions into clusters, they are mainly unsupervised. In the classic SSL literature, graph-based and geodesic-distance-preserving approaches are known to learn high-density clusters.
Contributions
Literature Review
NRL on homogeneous graphs
- NRL for learning local representations
- NRL for learning global representations
NetMF uses a low-rank approximation of the proposed matrices via SVD to obtain node embeddings. Discriminative Deep Random Walk (DDRW) [101] jointly learns topological structures in the graph via random walks and optimizes for the node classification objective to obtain more discriminative node embeddings.
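The NetMF-style pipeline can be sketched roughly as follows (a simplified illustration, not the reference implementation; window size T = 3, one negative sample, and a tiny toy graph are assumptions made here): build the averaged random-walk matrix, take the element-wise truncated logarithm, then factorize with a rank-k SVD.

```python
import numpy as np

# Toy adjacency: two triangles joined by one edge
A = np.array([
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 1, 0, 0],
    [0, 0, 1, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
], dtype=float)

d = A.sum(axis=1)
vol = d.sum()                       # volume of the graph (2 * #edges)
Dinv = np.diag(1.0 / d)
P = Dinv @ A                        # random-walk transition matrix

T = 3                               # context window size (assumed)
M = sum(np.linalg.matrix_power(P, t) for t in range(1, T + 1)) / T
M = vol * M @ Dinv                  # closed form with b = 1 negative sample
logM = np.log(np.maximum(M, 1.0))   # shifted PPMI: truncate below 1 before log

k = 2                               # embedding dimension
U, S, Vt = np.linalg.svd(logM)
emb = U[:, :k] * np.sqrt(S[:k])     # rank-k factor as node embeddings
print(emb.shape)                    # (6, 2)
```

The truncated SVD here is the "low-rank approximation" step: the top-k singular directions of the log-transformed matrix become the node embeddings.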
Research Gaps
DGI [2] is a representation learning method that computes a global graph summary by naively aggregating the contextual embeddings of all local nodes. In contrast, we use unsupervised node clustering strategies to generate a personalized global graph context that enriches local node embeddings via the InfoMax principle.
Proposed Framework: Unified Semi-Supervised Non-Negative Matrix Factorization
Unified Semi-Supervised NMF (USS-NMF)
- Encoding local invariance aka network structure
- Encoding supervision knowledge
- Encoding local neighborhood invariance in label space
- USS-NMF Model
- Derivation of multiplicative update rules
Here, we address blocks of size one by enforcing orthogonality constraints, which encourage the groups to differ from one another; i.e., we enforce HH^T = I_k, similar to [29]. We define the label-similarity network on the training data of G as E = (W ⊙ Y)^T (W ⊙ Y) ∈ R^{N×N}, where Δ(E) = D(E) − E is the unnormalized Laplacian operator on E.
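A small sketch of the label-similarity construction, assuming labels are stored column-wise as Y ∈ R^{c×N} (one column per node, c classes) with a binary mask W over labeled entries; the shapes and the toy labeling below are assumptions for illustration, not the dissertation's setup:

```python
import numpy as np

c, N = 3, 5                          # 3 classes, 5 nodes (hypothetical sizes)
Y = np.zeros((c, N))                 # one-hot labels, one column per node
Y[0, 0] = Y[0, 1] = 1                # nodes 0,1 -> class 0
Y[1, 2] = 1                          # node 2    -> class 1
Y[2, 3] = Y[2, 4] = 1                # nodes 3,4 -> class 2

W = np.ones((c, N))
W[:, 4] = 0                          # node 4 is unlabeled: masked out

WY = W * Y                           # keep labels of training nodes only
E = WY.T @ WY                        # N x N label-similarity network
L_E = np.diag(E.sum(axis=1)) - E     # unnormalized Laplacian D(E) - E
print(np.allclose(L_E.sum(axis=1), 0.0))
```

E[i, j] is 1 exactly when both i and j are labeled training nodes of the same class, so smoothing over Δ(E) pulls same-class training nodes together; rows of a Laplacian always sum to zero, which the last line checks.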
Evaluation Methodology
Datasets
In all these datasets, the task is to predict the node's research area (paper/author). PPI [120] is a protein-protein interaction dataset, where the task is to predict the functional properties of proteins.
Baselines
Experiment Setup
NMF:S+Y. We construct a version of MMDW that also incorporates supervision into the node embeddings by jointly optimizing Equations 3.1 and 3.2. Table 3.3 gives the range of values used for each coefficient across datasets of different sizes.
Performance Analysis
- Node classification
- Unsupervised Models
- Semi-Supervised Models
- Clusterability of Learned Representations
- Node Clustering
- t-SNE Visualization
- Ablation Study
- Importance of Label Information
- Importance of Cluster Information
- Study on Laplacian smoothing variants
- SSL with balanced dataset
- SSL with varying ratio of labeled data
- USS-NMF’s sensitivity to number of clusters
- Convergence Analysis for USS-NMF
The model in the last column is USS-NMF. USS-NMF is the winner across the board on all datasets, with a rank of 1 and a penalty of 0. The proposed USS-NMF is thus well positioned to contribute to real-world problems.
Challenges
This is because higher-order graph structures such as walks, relational paths, hyper-edges, communities, and clusters can span across layers. Depending on the downstream task, some layers may prove more useful than others.
Motivation
However, naively defining a different global context for each node, such as a subgraph-based approach, will defeat the original objective of learning common useful information from the entire graph. Therefore, the global context for node i should be more inclined towards C1, C2, C3 rather than a naive summation of all candidate clusters.
Research Objective
When a trivial global graph summary function, such as the average of all node embeddings, is used, the global context of every node becomes identical. Naively maximizing the MI of a node's local representation with this shared global context can bias the model toward encoding trivial and noisy information that is present across all nodes' local information.
Contributions
Literature Review
NRL on multiplex graphs
- Global context-based NRL
- InfoMax based NRL
- Semi-Supervised Learning (SSL)
It formulates a minimum-entropy clustering criterion based on the Cauchy kernel to cluster the node embeddings output by relation-specific autoencoders. It incorporates supervision via a cross-entropy-based semi-supervised prediction loss over the training samples.
Research Gaps
Proposed Framework: Semi-Supervised Deep Clustered Multiplex (SSDCM)
- Local Node Representations
- Contextual Global Node Representations
- Clustering
- Cross-relation Regularization
- Joint embedding with Consensus Regularization
- Semi-Supervised Deep Multiplex Clustered InfoMax
Similar to [39], this discriminator is learned universally, i.e., its weights are shared across all layers in order to capture local-global correlations of relational representations. Since, in practice, sparse interactions are generally modeled as cross-edges [71–73] (see Section 2.2.1) to limit modeling costs, we do not optimize for the special case where cross-edges are dense.
Evaluation Methodology
We selected State-Of-The-Art (SOTA) competing methods applicable to a wide range of multiplex graph settings. These methods lack strategies to 1) capture global structural information, 2) aggregate information across a node's counterparts in different layers, or 3) capture useful higher-order structures.
Performance Analysis
- Node Classification
- Node Clustering
- t-SNE Visualizations
- Node Similarity Search
For unsupervised methods, we train a logistic regression classifier on the learned embeddings of the training nodes and report the classifier's performance on the test-node embeddings, averaged over twenty runs. For a query node, all remaining nodes are ranked by similarity score.
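The similarity-search step can be sketched as plain cosine-similarity ranking over the learned embeddings (an illustrative sketch with random embeddings; rank_by_similarity is a hypothetical helper, not from the dissertation):

```python
import numpy as np

rng = np.random.default_rng(7)
emb = rng.normal(size=(8, 4))                 # learned node embeddings (stand-in)
emb[3] = emb[0] + 0.01 * rng.normal(size=4)   # node 3 nearly duplicates node 0

def rank_by_similarity(emb, query):
    """Rank all nodes (except the query) by cosine similarity to the query."""
    normed = emb / np.linalg.norm(emb, axis=1, keepdims=True)
    scores = normed @ normed[query]           # cosine similarity to the query
    order = np.argsort(-scores)               # descending
    return [int(i) for i in order if i != query]

print(rank_by_similarity(emb, query=0))
```

Since node 3 was constructed as a near-copy of node 0, it should appear at the top of node 0's ranking.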
Ablation Study
- Novelty of cluster-based graph summary
- Visualizing discriminator weights
- Effect of various regularizations
- Analyzing clusters
- Varying number of clusters
- Comparing relation-wise clustering performance
We see that removing cross-edge-based regularization from the layer-wise node embeddings significantly degrades the performance of SSDCM, especially on FLICKR and ACM. In the FT configuration, cluster orthogonality is varied in the absence of the cluster-learning term.
Challenges
Type-Specific Structural Diversity. In HINs, nodes of different types tend to connect to the rest of the graph in very different ways. We see that the boxplots have different variances, with medians located at very different positions, suggesting the divergent tendencies of nodes to connect to the rest of the graph.
Motivation
Usefulness of metapaths
In a case study shown in Figure 5.3, we plot the co-occurrence characteristics of 1-hop relations and different metapaths (of length >1) between the same set of (source, target) node pairs. For example, complex semantic associations have been found to co-occur with direct relationships: i) similar species causing similar diseases via shared genes (9,7,1,2) co-occurs with the ART-with-DISEASE relationship (r=8), and ii) chemicals found in certain species being useful for treating diseases that affect the genes corresponding to those species (6,7,0,1) co-occurs with the CHEMICAL-with-DISEASE relationship (r=4).
Usefulness of communities
Therefore, without looking at the identities of the neighborhood nodes, we can predict high-probability links between a (source, target) node pair by looking only at the relevant higher-order semantic associations, which act as logical evidence supporting the direct relationship. The resemblance of the retrieved nodes to both the test PAPER and TERM nodes provides strong evidence for the direct relationship {HAS-TERM} between them.
Effective aggregation of views to contextualize a triple
Paper: Backtesting: an efficient framework for choosing between classifiers under sample selection bias Paper: Entity-subject statistical models.
Research Objective
Contributions
Literature Review
Context-based link prediction models
- Local contexts for link prediction
- Global contexts for link prediction
- Research gaps in learning from metapaths
- Research gaps in learning from communities
A number of works have incorporated metapath information into network representation learning setups focused on link prediction. To our knowledge, we are the first to propose a community view for link prediction in HINs, jointly learning communities while optimizing a link prediction loss in a network representation learning setup.
Research Gaps: Summary
However, to the best of our knowledge, no recent work has considered communities or clusters as global information for optimizing link prediction objectives in HINs. Past research examines the effect of incorporating communities for predicting links in homogeneous graphs, mainly using topological metrics in settings without representation learning.
Proposed Framework: Multi-View Heterogeneous Relation Embedding (MV-HRE)
- Learning Subgraph View
- Learning Metapath View
- Learning Community View
- Attentive View Aggregation
- Triplet Representation and Scoring
- Multi-Task Learning Objectives
- Inference Complexity
To generate the community view of a triple (s, r, t): i) we simultaneously learn community structures from the heterogeneous network, ii) generate s's and t's graph-wide community summaries, and iii) obtain their community-similarity-based triple plausibility representation as the community view. Community membership learning via modularity maximization. We employ modularity-maximization-based community learning [29,34], as it is one of the most widely used unsupervised approaches [61] for community detection.
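The modularity objective being maximized can be illustrated directly. This sketch (a hypothetical 6-node graph, not from the dissertation) builds the modularity matrix B = A − dd^T/2m and scores two partitions via Q = Tr(H^T B H)/2m, where H is the community indicator matrix:

```python
import numpy as np

# Two 3-cliques joined by a single edge
A = np.array([
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 1, 0, 0],
    [0, 0, 1, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
], dtype=float)

d = A.sum(axis=1)
two_m = d.sum()                       # 2m: twice the number of edges
B = A - np.outer(d, d) / two_m        # modularity matrix

def modularity(B, two_m, labels):
    # Q = Tr(H^T B H) / 2m, with H the one-hot community indicator matrix
    H = np.eye(int(labels.max()) + 1)[labels]
    return np.trace(H.T @ B @ H) / two_m

good = modularity(B, two_m, np.array([0, 0, 0, 1, 1, 1]))  # the two cliques
bad = modularity(B, two_m, np.array([0, 1, 0, 1, 0, 1]))   # mixed partition
print(good, bad)
```

The clique-aligned partition scores higher, which is exactly what a modularity-maximizing community learner searches for; in MV-HRE-style setups, H is relaxed to a learnable soft membership matrix.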
Evaluation Methodology
Here, the link prediction task is formulated as a classification problem that learns to distinguish between positive and negative triples. We use AUC-ROC and AUC-PR estimates to evaluate link prediction as a binary classification problem.
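Under this binary-classification framing, the two metrics can be computed with scikit-learn (an illustrative sketch; the triple scores below are made up, not results from the dissertation):

```python
import numpy as np
from sklearn.metrics import average_precision_score, roc_auc_score

# Plausibility scores for positive (1) and negative/corrupted (0) triples
y_true = np.array([1, 1, 1, 1, 0, 0, 0, 0])
y_score = np.array([0.9, 0.8, 0.75, 0.4, 0.6, 0.3, 0.2, 0.1])

auc_roc = roc_auc_score(y_true, y_score)            # ranking quality over all pairs
auc_pr = average_precision_score(y_true, y_score)   # area under precision-recall
print(auc_roc, auc_pr)
```

AUC-ROC measures how often a positive triple outscores a negative one, while AUC-PR is more informative when negatives heavily outnumber positives, as is typical for sampled link prediction evaluation.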
Results
- Transductive Link Prediction
- Performance on Benchmark splits
- Inductive Link Prediction
- Ablation Study
- Visualizing view-wise attention weights
- Similarity analysis of node communities
- Metapath clustering
- Intrinsic evaluation of communities
However, MV-HRE remains the best or second-best performer on most datasets for the ranking metrics. MV-HRE's link prediction performance on PubMed is comparable to the rest of the competitive methods.
Structure-Aware NRL on Multiplex Graphs
The proposed method simultaneously learns node embeddings from the cluster structures as well as from the local neighborhood contexts. The utility of such embeddings is verified in several challenging scenarios, such as different test-node sampling strategies and label sparsity.
Structure-Aware NRL on Heterogeneous Graphs
Publications
From Thesis
"Revisiting Link Prediction on Heterogeneous Graphs with A Multi-view Perspective," to appear in Proceedings of the 22nd IEEE International Conference on Data Mining (ICDM 2022).
Outside Thesis
Github Repositories
From Thesis
Outside Thesis
Miscellaneous Research Activities
Research internships
Invited Talks
Service
Zhu, "Asymmetric transitivity preserving graph embedding," in Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2016, pp. Skiena, "HARP: Hierarchical representation learning for networks," in Proceedings of the AAAI Conference on Artificial Intelligence, vol.
Graphs from diverse domains [Image Source: Internet]
Various representations of networked data [Image Source: Internet]
A toy example of embedding a graph G1 into 2D space with different granularities
A toy example of showing usefulness of community information for A) node
Higher-order structures at multi-scale for graphs. Examples include: walks,
Summary of contributions in the dissertation
An example graph G
Various Graph Laplacians
Homogeneous Networks: Examples [Image Source: Internet]
Visualizing node associations on Washington
Visualizing community similarity heatmaps on Washington
An example bibliographic multiplex graph and the structure of its adjacency