To my knowledge, no part of the work reported in this dissertation has been submitted for the award of a degree at any other institution. This dissertation proposes a new, first-of-its-kind community view of edges in a graph.
Representing real-world data as graphs
Machine Learning on graphs
Network Representation Learning: an introduction
Local versus global contexts in graphs for learning network representations
Therefore, the macroscopic network view, i.e., the higher-order structural features of the underlying network, plays a crucial role in the learning mechanism. State-of-the-art (SoTA) NRL link prediction methods for heterogeneous graphs primarily employ graph neural networks that learn local neighborhood contexts, such as the enclosing subgraph around the (source, target) node pair, to learn edge representations.
Research Objectives
Thesis Overview and Contributions
To this end, we propose three approaches: i) a semi-supervised cluster-invariance property for homogeneous graphs, ii) InfoMax over cluster-aware graphs for multiplex graphs, and iii) metapath and community views of relations in heterogeneous graphs, all toward learning structure-aware network representations. An invariance property over semi-supervised node sets is proposed with the aim of building a unified framework for transductive node representation learning, called Unified Semi-Supervised Non-Negative Matrix Factorization (USS-NMF).
Thesis Organization
- Graph Laplacian Matrix
- Non-Negative Matrix Factorization (NMF)
- Pointwise Mutual Information Matrix from Graph Adjacency
- Clusters and Communities
- Semi-Supervised Learning on graphs
- Semi-Supervised Node Classification Task
The number of connected components in a graph is given by the dimension of the null space of its Laplacian matrix. So, two design choices are involved: the notion of similarity over A to use, and the target space f(X) to be smoothed.
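This relationship between the Laplacian's null space and connected components can be checked numerically. Below is a small NumPy sketch (illustrative, not from the dissertation) using a 5-node graph with two components:

```python
import numpy as np

# Adjacency of a 5-node graph with two components: {0,1,2} and {3,4}
A = np.array([
    [0, 1, 1, 0, 0],
    [1, 0, 1, 0, 0],
    [1, 1, 0, 0, 0],
    [0, 0, 0, 0, 1],
    [0, 0, 0, 1, 0],
], dtype=float)

D = np.diag(A.sum(axis=1))       # degree matrix
L = D - A                        # unnormalized graph Laplacian

eigvals = np.linalg.eigvalsh(L)  # L is symmetric positive semi-definite
n_components = int(np.sum(np.isclose(eigvals, 0.0)))
print(n_components)              # 2 zero eigenvalues -> 2 components
```

Counting the (numerically) zero eigenvalues of L recovers the number of connected components directly.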
Preliminaries: Multiplex Graphs
Multiplex Graphs
The cross-connections can be trivial, when the edges connect two replica nodes across layers based on shared identity, or non-trivial, when the bipartite edges encode some other notion of closeness. For example, in a document multiplex that models different relationships among a set of document nodes, cross-layer edges can either connect replica nodes across the layers or encode the cosine similarity between a pair of document nodes based on their constituent terms.
InfoMax Principle for learning network representations
- Learning network representations on multiplex graphs
- InfoMax Principle applied to homogeneous graphs
- InfoMax Principle applied to multiplex graphs
Deep Graph InfoMax (DGI) [2] is the first work of its kind to propose an InfoMax-based learning objective for node embeddings on homogeneous graphs in a completely unsupervised manner. Deep Multiplex Graph InfoMax (DMGI) [2] is the first work of its kind to propose an InfoMax-based model for multiplex graphs.
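The DGI objective can be illustrated roughly as follows. This is a minimal NumPy sketch, not the authors' implementation: the GNN encoder is replaced by random embeddings, and a bilinear discriminator scores (node, summary) pairs from the real graph against pairs from a corrupted (row-shuffled) view:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

n, d = 6, 4                       # 6 nodes, 4-dim embeddings
H = rng.normal(size=(n, d))       # local node embeddings (stand-in for an encoder)
H_neg = rng.permutation(H)        # corrupted view: row-shuffled embeddings
W = rng.normal(size=(d, d))       # bilinear discriminator weights

s = sigmoid(H.mean(axis=0))       # readout: global graph summary

pos = sigmoid(H @ W @ s)          # D(h_i, s) for real pairs
neg = sigmoid(H_neg @ W @ s)      # D(h~_i, s) for corrupted pairs

# Binary cross-entropy InfoMax objective (to be minimized)
eps = 1e-9
loss = -(np.log(pos + eps).mean() + np.log(1.0 - neg + eps).mean())
print(float(loss))
```

Minimizing this loss pushes the discriminator to tell real (node, summary) pairs apart from corrupted ones, which maximizes a lower bound on local-global mutual information.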
Semi-Supervised Node Classification Task
It then proposes a consensus-arranging technique that uses an attention mechanism to meaningfully aggregate relation-specific node embeddings via the aggregation function Q in Figure 2.9. The consensus regularization strategy minimizes disagreement between relation-specific node embeddings to systematically learn a final, unified consensus node representation.
Preliminaries: Heterogeneous Graphs
Heterogeneous Graphs
Heterogeneous Information for predicting links
Contextual encoding of triples for Link Prediction
The problem of Structural Link Prediction
Challenges
It is a simplification of the underlying path that retains only the sequence of edge types. For graph-based SSL methods, the choice of the underlying data representation (the embedding space) is critical.
Motivation
As shown in Figure 3.3c, the smoothness assumption states that if two nodes are closely related by some notion of similarity, the associated data points are highly likely to have the same label. Thus, the optimal decision boundary should respect clusters of high-density data points and lie in the low-density region.
Research Objective
Although previous works exist that embed data from high-density regions into clusters, they are mainly unsupervised. In the classic SSL literature, graph-based and geodesic-distance-preserving approaches are known to learn high-density clusters.
Contributions
Literature Review
NRL on homogeneous graphs
- NRL for learning local representations
- NRL for learning global representations
NetMF uses a low-rank approximation of the proposed matrices via SVD to obtain node embeddings. Discriminative Deep Random Walk (DDRW) [101] jointly learns topological structures in the graph via random walks and optimizes for the node classification objective to obtain more discriminative node embeddings.
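The NetMF-style pipeline can be sketched roughly as follows (a simplified illustration, not the reference implementation; window size T = 3, one negative sample, and a tiny toy graph are assumptions made here): build the averaged random-walk matrix, take the element-wise truncated logarithm, then factorize with a rank-k SVD.

```python
import numpy as np

# Toy adjacency: two triangles joined by one edge
A = np.array([
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 1, 0, 0],
    [0, 0, 1, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
], dtype=float)

d = A.sum(axis=1)
vol = d.sum()                       # volume of the graph (2 * #edges)
Dinv = np.diag(1.0 / d)
P = Dinv @ A                        # random-walk transition matrix

T = 3                               # context window size (assumed)
M = sum(np.linalg.matrix_power(P, t) for t in range(1, T + 1)) / T
M = vol * M @ Dinv                  # closed form with b = 1 negative sample
logM = np.log(np.maximum(M, 1.0))   # shifted PPMI: truncate below 1 before log

k = 2                               # embedding dimension
U, S, Vt = np.linalg.svd(logM)
emb = U[:, :k] * np.sqrt(S[:k])     # rank-k factor as node embeddings
print(emb.shape)                    # (6, 2)
```

The truncated SVD here is the "low-rank approximation" step: the top-k singular directions of the log-transformed matrix become the node embeddings.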
Research Gaps
DGI [2] is a representation learning method that computes a global graph summary by naively aggregating the contextual embeddings of all local nodes. In contrast, we use unsupervised node clustering strategies to generate a personalized global graph context that enriches local node embeddings via the InfoMax principle.
Proposed Framework: Unified Semi-Supervised Non-Negative Matrix Factorization
Unified Semi-Supervised NMF (USS-NMF)
- Encoding local invariance aka network structure
- Encoding supervision knowledge
- Encoding local neighborhood invariance in label space
- USS-NMF Model
- Derivation of multiplicative update rules
Here, we address blocks of size one by enforcing orthogonality constraints, which encourage the groups to differ from one another; i.e., we enforce HH^T = I_k, similar to [29]. We define the label-similarity network on the training data of G as E = (W ⊙ Y)^T (W ⊙ Y) ∈ R^{N×N}, where Δ(E) = D(E) − E is the unnormalized Laplacian operator on E.
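A small sketch of the label-similarity construction, assuming labels are stored column-wise as Y ∈ R^{c×N} (one column per node, c classes) with a binary mask W over labeled entries; the shapes and the toy labeling below are assumptions for illustration, not the dissertation's setup:

```python
import numpy as np

c, N = 3, 5                          # 3 classes, 5 nodes (hypothetical sizes)
Y = np.zeros((c, N))                 # one-hot labels, one column per node
Y[0, 0] = Y[0, 1] = 1                # nodes 0,1 -> class 0
Y[1, 2] = 1                          # node 2    -> class 1
Y[2, 3] = Y[2, 4] = 1                # nodes 3,4 -> class 2

W = np.ones((c, N))
W[:, 4] = 0                          # node 4 is unlabeled: masked out

WY = W * Y                           # keep labels of training nodes only
E = WY.T @ WY                        # N x N label-similarity network
L_E = np.diag(E.sum(axis=1)) - E     # unnormalized Laplacian D(E) - E
print(np.allclose(L_E.sum(axis=1), 0.0))
```

E[i, j] is 1 exactly when both i and j are labeled training nodes of the same class, so smoothing over Δ(E) pulls same-class training nodes together; rows of a Laplacian always sum to zero, which the last line checks.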
Evaluation Methodology
Datasets
In all these datasets, the task is to predict the node's research area (paper/author). PPI [120] is a protein-protein interaction dataset, where the task is to predict the functional properties of proteins.
Baselines
Experiment Setup
NMF:S+Y. We construct a version of MMDW that also incorporates supervision into the node embeddings by jointly optimizing Equations 3.1 and 3.2. Table 3.3 gives the range of values used for each coefficient across datasets of different sizes.
Performance Analysis
- Node classification
- Unsupervised Models
- Semi-Supervised Models
- Clusterability of Learned Representations
- Node Clustering
- t-SNE Visualization
- Ablation Study
- Importance of Label Information
- Importance of Cluster Information
- Study on Laplacian smoothing variants
- SSL with balanced dataset
- SSL with varying ratio of labeled data
- USS-NMF’s sensitivity to number of clusters
- Convergence Analysis for USS-NMF
The model in the last column is USS-NMF. USS-NMF is the winner across the board on all datasets, with a rank of 1 and a penalty of 0. The proposed USS-NMF is thus well positioned to contribute to real-world problems.
Challenges
This is because higher-order graph structures such as walks, relational paths, hyper-edges, communities, and clusters can span across layers. Depending on the downstream task, some layers may prove more useful than others.
Motivation
However, naively defining a different global context for each node, such as a subgraph-based approach, will defeat the original objective of learning common useful information from the entire graph. Therefore, the global context for node i should be more inclined towards C1, C2, C3 rather than a naive summation of all candidate clusters.
Research Objective
When a trivial global graph summary function, such as the average of all node embeddings, is used, the global context of every node becomes identical. Naively maximizing the MI of a node's local representation with this shared global context can bias the model toward encoding trivial and noisy information that is present across all nodes' local information.
Contributions
Literature Review
NRL on multiplex graphs
- Global context-based NRL
- InfoMax based NRL
- Semi-Supervised Learning (SSL)
It formulates a minimum-entropy clustering criterion based on the Cauchy kernel to cluster the node embeddings output by relation-specific autoencoders. It incorporates supervision via a cross-entropy-based semi-supervised prediction loss over the training samples.
Research Gaps
Proposed Framework: Semi-Supervised Deep Clustered Multiplex (SSDCM)
- Local Node Representations
- Contextual Global Node Representations
- Clustering
- Cross-relation Regularization
- Joint embedding with Consensus Regularization
- Semi-Supervised Deep Multiplex Clustered InfoMax
Similar to [39], this discriminator is learned universally, i.e., its weights are shared across all layers in order to capture local-global correlations of relational representations. Since, in practice, sparse interactions are generally modeled as cross-edges [71–73] (see Section 2.2.1) to limit modeling costs, we do not optimize for the special case where cross-edges are dense.
Evaluation Methodology
We selected State-Of-The-Art (SOTA) competing methods applicable to a wide range of multiplex graph settings. These methods lack strategies to 1) capture global structural information, 2) aggregate information across a node's counterparts in different layers, or 3) capture useful higher-order structures.
Performance Analysis
- Node Classification
- Node Clustering
- t-SNE Visualizations
- Node Similarity Search
For unsupervised methods, we train a logistic regression classifier on the learned embeddings of the training nodes and report the classifier's performance on the test-node embeddings, averaged over twenty runs. For a query node, all remaining nodes are ranked by similarity score.
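The similarity-search step can be sketched as plain cosine-similarity ranking over the learned embeddings (an illustrative sketch with random embeddings; rank_by_similarity is a hypothetical helper, not from the dissertation):

```python
import numpy as np

rng = np.random.default_rng(7)
emb = rng.normal(size=(8, 4))                 # learned node embeddings (stand-in)
emb[3] = emb[0] + 0.01 * rng.normal(size=4)   # node 3 nearly duplicates node 0

def rank_by_similarity(emb, query):
    """Rank all nodes (except the query) by cosine similarity to the query."""
    normed = emb / np.linalg.norm(emb, axis=1, keepdims=True)
    scores = normed @ normed[query]           # cosine similarity to the query
    order = np.argsort(-scores)               # descending
    return [int(i) for i in order if i != query]

print(rank_by_similarity(emb, query=0))
```

Since node 3 was constructed as a near-copy of node 0, it should appear at the top of node 0's ranking.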
Ablation Study
- Novelty of cluster-based graph summary
- Visualizing discriminator weights
- Effect of various regularizations
- Analyzing clusters
- Varying number of clusters
- Comparing relation-wise clustering performance
We see that removing cross-edge-based regularization from the layer-wise node embeddings significantly degrades the performance of SSDCM, especially on FLICKR and ACM. In the FT configuration, cluster orthogonality is varied in the absence of the cluster-learning term.
Challenges
Type-Specific Structural Diversity. In HINs, nodes of different types tend to connect to the rest of the graph in very different ways. We see that the boxplots have different variances, with medians located at very different positions, suggesting the divergent tendencies of nodes to connect to the rest of the graph.
Motivation
Usefulness of metapaths
In a case study shown in Figure 5.3, we plot the co-occurrence characteristics of 1-hop relations and different metapaths (of length >1) between the same set of (source, target) node pairs. For example, complex semantic associations have been found to co-occur with direct relationships: i) similar species causing similar diseases via shared genes (9,7,1,2) co-occurs with the ART-with-DISEASE relationship (r=8), and ii) chemicals found in certain species being useful for treating diseases that affect the genes corresponding to those species (6,7,0,1) co-occurs with the CHEMICAL-with-DISEASE relationship (r=4).
Usefulness of communities
Therefore, without looking at the identities of the neighborhood nodes, we can predict high-probability links between a (source, target) node pair by looking only at the relevant higher-order semantic associations, which act as logical evidence supporting the direct relationship. The resemblance of the retrieved nodes to both the test PAPER and TERM nodes provides strong evidence for the direct relationship {HAS-TERM} between them.
Effective aggregation of views to contextualize a triple
Paper: Backtesting: an efficient framework for choosing between classifiers under sample selection bias Paper: Entity-subject statistical models.
Research Objective
Contributions
Literature Review
Context-based link prediction models
- Local contexts for link prediction
- Global contexts for link prediction
- Research gaps in learning from metapaths
- Research gaps in learning from communities
A number of works have incorporated metapath information into network representation learning setups focused on link prediction. To our knowledge, we are the first to propose a community view for link prediction in HINs, jointly learning communities while optimizing a link prediction loss in a network representation learning setup.
Research Gaps: Summary
However, to the best of our knowledge, no recent work has considered communities or clusters as global information for optimizing link prediction objectives in HINs. Past research examines the effect of incorporating communities for predicting links in homogeneous graphs, mainly using topological metrics in settings without representation learning.
Proposed Framework: Multi-View Heterogeneous Relation Embedding (MV-HRE)
- Learning Subgraph View
- Learning Metapath View
- Learning Community View
- Attentive View Aggregation
- Triplet Representation and Scoring
- Multi-Task Learning Objectives
- Inference Complexity
To generate the community view of a triple (s, r, t): i) we simultaneously learn community structures from the heterogeneous network, ii) generate s's and t's graph-wide community summaries, and iii) obtain their community-similarity-based triple plausibility representation as the community view. Community membership learning via modularity maximization. We employ modularity-maximization-based community learning [29,34], as it is one of the most widely used unsupervised approaches [61] for community detection.
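The modularity objective being maximized can be illustrated directly. This sketch (a hypothetical 6-node graph, not from the dissertation) builds the modularity matrix B = A − dd^T/2m and scores two partitions via Q = Tr(H^T B H)/2m, where H is the community indicator matrix:

```python
import numpy as np

# Two 3-cliques joined by a single edge
A = np.array([
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 1, 0, 0],
    [0, 0, 1, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
], dtype=float)

d = A.sum(axis=1)
two_m = d.sum()                       # 2m: twice the number of edges
B = A - np.outer(d, d) / two_m        # modularity matrix

def modularity(B, two_m, labels):
    # Q = Tr(H^T B H) / 2m, with H the one-hot community indicator matrix
    H = np.eye(int(labels.max()) + 1)[labels]
    return np.trace(H.T @ B @ H) / two_m

good = modularity(B, two_m, np.array([0, 0, 0, 1, 1, 1]))  # the two cliques
bad = modularity(B, two_m, np.array([0, 1, 0, 1, 0, 1]))   # mixed partition
print(good, bad)
```

The clique-aligned partition scores higher, which is exactly what a modularity-maximizing community learner searches for; in MV-HRE-style setups, H is relaxed to a learnable soft membership matrix.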
Evaluation Methodology
Here, the link prediction task is formulated as a classification problem that learns to distinguish between positive and negative triples. We use AUC-ROC and AUC-PR estimates to evaluate link prediction as a binary classification problem.
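Under this binary-classification framing, the two metrics can be computed with scikit-learn (an illustrative sketch; the triple scores below are made up, not results from the dissertation):

```python
import numpy as np
from sklearn.metrics import average_precision_score, roc_auc_score

# Plausibility scores for positive (1) and negative/corrupted (0) triples
y_true = np.array([1, 1, 1, 1, 0, 0, 0, 0])
y_score = np.array([0.9, 0.8, 0.75, 0.4, 0.6, 0.3, 0.2, 0.1])

auc_roc = roc_auc_score(y_true, y_score)            # ranking quality over all pairs
auc_pr = average_precision_score(y_true, y_score)   # area under precision-recall
print(auc_roc, auc_pr)
```

AUC-ROC measures how often a positive triple outscores a negative one, while AUC-PR is more informative when negatives heavily outnumber positives, as is typical for sampled link prediction evaluation.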
Results
- Transductive Link Prediction
- Performance on Benchmark splits
- Inductive Link Prediction
- Ablation Study
- Visualizing view-wise attention weights
- Similarity analysis of node communities
- Metapath clustering
- Intrinsic evaluation of communities
However, MV-HRE remains the best or second-best performer on most datasets for the ranking metrics. MV-HRE's link prediction performance on PubMed is comparable to the rest of the competitive methods.
Structure-Aware NRL on Multiplex Graphs
The proposed method simultaneously learns node embeddings from the cluster structures as well as from the local neighborhood contexts. The utility of such embeddings is verified in several challenging scenarios, such as different test-node sampling strategies and label sparsity.
Structure-Aware NRL on Heterogeneous Graphs
Publications
From Thesis
"Revisiting Link Prediction on Heterogeneous Graphs with A Multi-view Perspective," to appear in Proceedings of the 22nd IEEE International Conference on Data Mining (ICDM 2022).
Outside Thesis
Github Repositories
From Thesis
Outside Thesis
Miscellaneous Research Activities
Research internships
Invited Talks
Service
Zhu, "Asymmetric transitivity preserving graph embedding," in Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2016, pp. Skiena, "HARP: Hierarchical representation learning for networks," in Proceedings of the AAAI Conference on Artificial Intelligence, vol.
Graphs from diverse domains [Image Source: Internet]
Various representations of networked data [Image Source: Internet]
A toy example of embedding a graph G1 into 2D space with different granularities
A toy example of showing usefulness of community information for A) node
Higher-order structures at multi-scale for graphs. Examples include: walks,
Summary of contributions in the dissertation
An example graph G
Various Graph Laplacians
Homogeneous Networks: Examples [Image Source: Internet]
Visualizing node associations on Washington
Visualizing community similarity heatmaps on Washington
An example bibliographic multiplex graph and the structure of its adjacency