A Survey of Spatio-temporal Data Mining
2.4 Preliminary of Deep Learning Methods in STDM
2.4.4 Graph Neural Networks
Over the past decade, deep learning techniques like CNNs and RNNs have made significant strides in various application domains. These methods are primarily designed for regular Euclidean data, such as images (2D grids) and text (1D sequences), which exhibit a well-structured grid format. In contrast, graphs are a prevalent data type in the real world, marked by complex relationships and dependencies between entities. Traditional deep learning models are ill-suited for direct application to graphs due to their inherent intricacy and non-Euclidean nature. To overcome this limitation, recent research has focused
on adapting deep learning approaches to handle graph data, leading to the development of a novel category of models called Graph Neural Networks (GNNs).
Drawing inspiration from CNNs, RNNs, and autoencoders in the field of deep learning, innovative generalizations and definitions of essential operations have rapidly emerged in recent years to tackle the complexities associated with graph data. These techniques, collectively known as graph neural networks, can be further categorized into more specialized types. Graph neural networks provide an end-to-end approach for learning relationships and extracting structural information from graph data.
2.4.4.1 A Brief History of GNNs
Sperduti et al. (1997) [113] first introduced the concept of applying neural networks to directed acyclic graphs, sparking early research on GNNs. The idea of graph neural networks was initially proposed by Gori et al. (2005) [48] and further expanded upon by Scarselli et al. (2008) [106]. These pioneering studies are categorized as recurrent graph neural networks (RecGNNs). They learn the representation of a target node by iteratively propagating neighbor information until a stable fixed point is achieved, making this process computationally demanding.
Nevertheless, GNNs did not remain in this preliminary form and rapidly incorporated concepts from other successful areas of deep learning, resulting in more advanced architectures. Inspired by the accomplishments of CNNs in the field of computer vision, numerous methods have been developed to redefine the concept of convolution for graph data. These techniques fall under the category of convolutional graph neural networks (ConvGNNs), such as graph convolutional networks (GCNs) and graph attention networks (GATs). In addition to RecGNNs and ConvGNNs, various other GNNs have emerged in recent years, including graph autoencoders (GAEs) and spatio-temporal graph neural networks (STGNNs).
2.4.4.2 Graph Convolution Networks
As the name implies, graph convolutional networks (GCNs) adapt the concept of convolutional neural networks for use in the graph domain. In essence, GCNs extend the traditional convolution from conventional grid data to graph-structured data.
The term GCNs was initially introduced by Kipf and Welling [71]. GCNs are designed to process local information in the graph, allowing them to learn meaningful representations of nodes while considering their surrounding context. The primary motivation behind GCNs is to generalize the traditional convolution operation
from grid data, such as images, to irregular data structures like graphs. This generalization enables GCNs to efficiently capture neighborhood information for each node in the graph and effectively learn from the inherent relational structure.
A fundamental component of GCNs is the convolution operation, which can be thought of as a localized weighted average of node features in the neighborhood. In its simplest form, a single layer of a GCN can be represented mathematically using the following equation:
H^{(l+1)} = \sigma\left( \tilde{D}^{-\frac{1}{2}} \tilde{A} \tilde{D}^{-\frac{1}{2}} H^{(l)} W^{(l)} \right)    (2.22)
In this equation, H^{(l)} denotes the feature matrix at layer l, where each row represents the feature vector of a node in the graph, and H^{(l+1)} is the output feature matrix of the layer. \tilde{A} is the adjacency matrix of the graph, augmented with self-connections, which allows each node to include its own features in the convolution operation. \tilde{D} is the diagonal degree matrix corresponding to the augmented adjacency matrix, and W^{(l)} is the weight matrix for layer l, which is learned during training. The activation function \sigma is typically a non-linear function such as ReLU or tanh.
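To make the operation concrete, the following is a minimal sketch of a single GCN layer in PyTorch, written directly from Eq. (2.22). The class name GCNLayer, the dense adjacency representation, and the choice of ReLU as \sigma are illustrative assumptions rather than a reference implementation.

import torch
import torch.nn as nn

class GCNLayer(nn.Module):
    # One GCN layer: H^(l+1) = sigma( D~^{-1/2} A~ D~^{-1/2} H^(l) W^(l) )
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.weight = nn.Linear(in_dim, out_dim, bias=False)   # W^(l)

    def forward(self, H, A):
        # H: (N, in_dim) node features, A: (N, N) dense adjacency matrix
        N = A.size(0)
        A_tilde = A + torch.eye(N)                  # add self-connections
        deg = A_tilde.sum(dim=1)                    # degrees of the augmented graph
        D_inv_sqrt = torch.diag(deg.pow(-0.5))      # D~^{-1/2}
        A_hat = D_inv_sqrt @ A_tilde @ D_inv_sqrt   # symmetric normalization
        return torch.relu(A_hat @ self.weight(H))   # sigma = ReLU

Stacking two or three such layers and feeding the resulting node representations to a classifier yields the standard GCN setup for tasks such as node classification.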
The primary advantage of GCNs is their ability to exploit both the graph topology and node features simultaneously. By using the graph's structure, GCNs can effectively capture the local context around each node, learning more meaningful representations. This makes GCNs particularly suitable for tasks such as node classification, link prediction, and graph classification, among others.
GCNs can be extended and combined with various other techniques to enhance their performance. For example, they can be integrated with attention mechanisms to allow the model to learn the importance of different neighboring nodes. Moreover, GCNs can be combined with other GNN architectures, such as GraphSAGE or GAT, to develop more powerful and sophisticated models that can handle diverse graph-structured data and a wide range of tasks in various application domains.
2.4.4.3 Graph Attention Networks
Graph attention networks (GATs) represent a notable advancement in GNNs by introducing the attention mechanism into the graph domain. First proposed by Velickovic et al. [129], GATs enable the model to weigh the importance of neighboring nodes differently, thereby learning more expressive and adaptive node representations.
The primary motivation behind GATs is to address the limitations of GCNs, which often assume equal importance for all neighboring nodes when aggregating information. By contrast, GATs assign different importance scores to neighbors, allowing the model to focus on the most relevant nodes in the neighborhood. This attention mechanism is particularly advantageous for dealing with noisy graphs, where some nodes may be less relevant or even misleading. Mathematically, the GAT layer can be described as follows:
H_i^{(l+1)} = \sigma\left( \sum_{j \in \mathcal{N}(i)} \alpha_{ij} W^{(l)} H_j^{(l)} \right)    (2.23)
In this equation, H_i^{(l+1)} represents the feature vector of node i at layer l+1, W^{(l)} is the trainable weight matrix for layer l, and \alpha_{ij} denotes the attention weight between nodes i and j. The summation is taken over the neighborhood \mathcal{N}(i) of node i. The attention weights \alpha_{ij} are computed using an attention mechanism, typically a single-layer feedforward neural network with a LeakyReLU activation function:
\alpha_{ij} = \mathrm{softmax}_j\left( \mathrm{LeakyReLU}\left( a^{T} \left[ W^{(l)} H_i^{(l)} \,\|\, W^{(l)} H_j^{(l)} \right] \right) \right)    (2.24)
where a is a trainable weight vector and \| denotes the concatenation operation. The softmax function normalizes the attention scores across all neighboring nodes of i.
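As an illustration, the following is a minimal single-head attention layer in PyTorch that follows Eqs. (2.23) and (2.24). The class name GATLayer, the dense adjacency mask, and the use of a single attention head are simplifying assumptions; practical GAT implementations typically use multiple heads and operate on sparse neighborhoods.

import torch
import torch.nn as nn
import torch.nn.functional as F

class GATLayer(nn.Module):
    # Single-head graph attention layer, Eqs. (2.23)-(2.24).
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.W = nn.Linear(in_dim, out_dim, bias=False)   # shared weight W^(l)
        self.a = nn.Parameter(torch.randn(2 * out_dim))   # attention vector a

    def forward(self, H, A):
        # H: (N, in_dim) node features, A: (N, N) adjacency with self-loops
        Wh = self.W(H)                                     # (N, out_dim)
        N = Wh.size(0)
        # e_ij = LeakyReLU(a^T [Wh_i || Wh_j]) for every pair of nodes (i, j)
        pairs = torch.cat([Wh.unsqueeze(1).expand(N, N, -1),
                           Wh.unsqueeze(0).expand(N, N, -1)], dim=-1)
        e = F.leaky_relu(pairs @ self.a)                   # (N, N) raw scores
        e = e.masked_fill(A == 0, float('-inf'))           # keep only neighbors
        alpha = torch.softmax(e, dim=1)                    # softmax over j in N(i)
        return torch.relu(alpha @ Wh)                      # Eq. (2.23), sigma = ReLU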
GATs provide a powerful and adaptive method for learning node representa- tions in graph-structured data by incorporating the attention mechanism. This innovation allows GATs to focus on the most relevant nodes in the neighborhood, making them more robust and flexible than traditional GCNs.
2.4.4.4 GraphSAGE
GraphSAGE, short for Graph SAmple and aggreGatE, is a significant contribution to the field of GNNs introduced by Hamilton et al. [56]. This innovative approach allows for the inductive learning of node embeddings, meaning it can generate embeddings for previously unseen nodes or entire graphs during the training process. GraphSAGE is particularly well-suited for large-scale graphs where nodes and edges are continually added or removed, such as social networks or knowledge graphs.
The key idea behind GraphSAGE is to learn an aggregation function that combines the feature information of a node’s neighbors to generate a new node
representation. This approach is inductive because the aggregation function can be applied to any node, regardless of whether it was part of the training graph or not. Mathematically, the GraphSAGE layer can be defined as follows:
H_i^{(l+1)} = \sigma\left( W^{(l)} \cdot \mathrm{AGGREGATE}\left( \left\{ H_j^{(l)} \mid j \in \mathcal{N}(i) \right\} \right) \right)    (2.25)
where H_i^{(l+1)} denotes the feature vector of node i at layer l+1, W^{(l)} represents the trainable weight matrix for layer l, and AGGREGATE is the aggregation function applied to the feature vectors of node i's neighbors (j \in \mathcal{N}(i)). The non-linear activation function is denoted by \sigma. Several aggregation functions can be used in the GraphSAGE framework, such as mean, max-pooling, or LSTM-based aggregation; the choice depends on the specific problem and the graph's properties.
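The sketch below illustrates Eq. (2.25) in PyTorch with a mean aggregator. The class name SAGELayer and the neighbor-list input format are illustrative assumptions; the mean aggregator could be replaced by max-pooling or an LSTM-based aggregator as discussed above.

import torch
import torch.nn as nn

class SAGELayer(nn.Module):
    # GraphSAGE-style layer with a mean aggregator, Eq. (2.25).
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.W = nn.Linear(in_dim, out_dim, bias=False)    # W^(l)

    def forward(self, H, neighbors):
        # H: (N, in_dim); neighbors[i] is a list of neighbor indices of node i
        aggregated = torch.stack([
            H[idx].mean(dim=0) if len(idx) > 0 else torch.zeros_like(H[0])
            for idx in neighbors
        ])                                                  # AGGREGATE over N(i)
        return torch.relu(self.W(aggregated))               # sigma = ReLU

Because the aggregation is defined over an arbitrary neighbor set rather than a fixed graph, the same layer can be applied to nodes that were never seen during training, which is what makes the approach inductive.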
GraphSAGE is a powerful and flexible approach for inductive learning of node embeddings in graph-structured data. By learning an aggregation function that combines neighboring node information, GraphSAGE can generate embeddings for unseen nodes or entire graphs, making it particularly useful for large-scale and dynamic graph settings.
2.4.4.5 Graph Autoencoders
Graph Autoencoders (GAEs) are a class of GNNs that adopt the concept of autoencoders from traditional deep learning and apply it to graph-structured data. GAEs are unsupervised learning models that aim to capture the underlying structure and features of a graph by learning a low-dimensional latent representation for each node. These learned embeddings can be used for various downstream tasks, such as link prediction, node classification, and graph clustering.
The GAE framework consists of two main components: an encoder and a decoder. The encoder function maps the input node features and graph structure to a low-dimensional embedding space, while the decoder function reconstructs the graph structure or adjacency matrix from these embeddings.
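As a rough sketch of this encoder-decoder structure, the snippet below pairs the GCN-layer sketch from Section 2.4.4.2 with an inner-product decoder that reconstructs the adjacency matrix from the node embeddings. The class name and the specific encoder and decoder choices are illustrative assumptions, since GAEs admit many different encoder and decoder designs.

import torch
import torch.nn as nn

class GraphAutoencoder(nn.Module):
    # Minimal GAE sketch: graph encoder + inner-product decoder.
    def __init__(self, in_dim, latent_dim):
        super().__init__()
        self.encoder = GCNLayer(in_dim, latent_dim)   # GCNLayer sketch from Sect. 2.4.4.2

    def forward(self, H, A):
        Z = self.encoder(H, A)                         # (N, latent_dim) node embeddings
        A_rec = torch.sigmoid(Z @ Z.T)                 # predicted edge probabilities
        return Z, A_rec

# Training sketch: minimize a binary cross-entropy loss between A_rec and the
# observed adjacency matrix A, so that the embeddings preserve the graph structure.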