Spatio-temporal Graph Representation Learning

Song Yang

A thesis submitted in fulfilment of the requirements for the degree of Doctor of Philosophy in Computer Science,

The University of Auckland, Feb 2024.

Abstract

This thesis delves into Spatio-temporal Graph Representation Learning (STGRL), an emerging and specialized subset of Spatio-temporal Data Mining (STDM).

STGRL capitalizes on deep learning methodologies applied to the inherent graph structure of spatio-temporal data, thereby facilitating a deeper understanding and exploitation of spatio-temporal interdependence in various tasks. Our research primarily investigates STGRL, aiming to explore its transferability, generalization, scalability, construction of informative graph structures from raw data, and the sparse sampling issue within several concrete application scenarios. To achieve this, we undertake the following three tasks: (1) Discerning that spatial features, temporal features, and spatio-temporal correlations can be concurrently learned when constructing a spacetime manifold, we propose an innovative spacetime neural network for learning translation-invariant spatio-temporal patterns, serving as a universal traffic model. This approach not only tackles transferability and generalization challenges but also presents a viable solution to enhance scalability.

(2) We examine the graph construction process, illustrating how a more insightful and conceptual graph can facilitate next POI recommendations, as opposed to relying solely on less informative geographic graphs. This insight suggests that in numerous STDM tasks, the straightforward geographic graph structure may not be the optimal choice, and alternative, more beneficial graph structures could be employed. (3) Recognizing the often sparse and discrete nature of STDM and STGRL data, we introduce an innovative representation learning framework adept at inferring latent representations, thereby decoding target features for arbitrary timestamps from the local spatio-temporal context. We evaluate these models using multiple real-world datasets, encompassing traffic forecasting, next POI recommendations, weather pattern reconstruction, and air quality reconstruction. Experimental results substantiate the efficacy of these novel algorithms, demonstrating their superior performance compared to existing methods.

Acknowledgement

First and foremost, I would like to express my deepest gratitude to my main supervisor, Prof. Jiamou Liu, for his invaluable guidance, unwavering support, and patient mentorship throughout my PhD journey. His expertise, constructive feedback, and enthusiasm have been pivotal in shaping my research and academic growth. I also extend my heartfelt gratitude to my co-supervisor, Dr. Kaiqi Zhao, for his keen insights, encouragement, and invaluable advice, which have significantly contributed to the success of my study.

Secondly, I am very grateful to Professor Ehsan for providing me with the extraordinary opportunity to apply my expertise in AI to real-world commercial healthcare challenges, ultimately contributing to the betterment of society. His steadfast confidence in my abilities and commitment to fostering social improvement have served as unwavering sources of motivation throughout my research in the field of deep learning.

Also, I would like to extend my appreciation to my friends and colleagues at the University of Auckland. Your camaraderie, stimulating discussions, and shared experiences have enriched my time at the university and made it a memorable and enjoyable experience. I am truly fortunate to have been surrounded by such a supportive and inspiring academic community.

Lastly, I cannot thank my parents enough for their unconditional love, support, and encouragement in all my endeavors. Their unwavering faith in me and the sacrifices they have made to ensure my success have been my guiding light throughout my life. I am eternally grateful for everything they have done for me.

To everyone who has played a part in my academic journey, thank you from the bottom of my heart.

List of Figures

2.1 A diagram illustrating the various categories of ST data instances that can be derived from ST data types. Furthermore, the visual highlights the possible representations of each data instance and the popular DL methods employed.

2.2 An illustration of dilated TCN from [142].

2.3 The architecture of DCRNN [79]. The historical time series are fed into an encoder whose final states are used to initialize the decoder. The decoder makes predictions based on either previous ground truth or the model output.

2.4 The architecture of STGCN [152]. The STGCN is composed of two ST-Conv blocks and a fully-connected output layer at the end. Each ST-Conv block encompasses two temporal gated convolution layers and a single spatial graph convolution layer situated in the middle.

2.5 The architecture of Graph WaveNet [142]. Several spatio-temporal layers are successively arranged on the left, with an output layer situated on the right. A pair of gated temporal convolution modules (Gated TCN) operate sequentially to extract temporal characteristics, succeeded by a graph convolutional layer (GCN) devised to discern spatial features.

2.6 The architecture of ASTGNN [54]. Successive temporal trend-aware self-attention blocks and spatial dynamic GCN blocks are alternatively combined in both the encoder and decoder.

2.7 The architecture of LSTPM [116] for Next POI Recommendation

2.8 The architecture of STAN [86]

3.1 An illustration of spacetime interval. The figure shows three snapshots of the network. The target node’s state at t₃ is strongly influenced by the state of a at t₁ and the state of b at t₂, but only weakly influenced by the state of a at t₂ and the state of b at t₁.

3.2 The architecture of STNN with an example local-spacetime constructed from the input data. STNN consists of k spacetime modules (ST-Modules) and a fully-connected output layer. Each ST-Module contains a spacetime attention block (ST-Attn block) and a spacetime convolution block (ST-Conv block). The ST-Attn block uses self-attention mechanism to spotlight the most contributive traffic events. In each ST-Conv block, three different convolution kernels are employed to aggregate the spatio-temporal correlations in different perspectives. Then, the extracted features are stacked, and condensed by the 1×1 convolution.

3.3 Spatial kernel, temporal kernel and spacetime kernel on the sub-spacetime

3.4 Simulated road network illustration

3.5 Case study

4.1 Two trajectories (red and blue) of two users in different days share the same fragment (restaurant to theater) in Manhattan, NYC

4.2 An overview of the proposed GETNext model

4.3 Average hourly check-in frequency of two POI categories (“train station” and “bar”) in NYC dataset

4.4 Partial trajectory flow map of NYC (directions and edge weights are removed for better visualization)

5.1 Difference between imputation and continuous reconstruction. Imputation aims to replace the missing value (shaded circle) in the given dataset with a rigid data shape. Continuous reconstruction tries to learn the underlying continuous temporal pattern and be able to infer data for any timestamp (arrow).

5.2 Model architecture overview. The proposed STRL model consists of three parts: an encoder, representation construction, and a decoder. The encoder captures spatiotemporal correlations and learns embeddings for spacetime events. The representation of a given timestamp is then constructed from its neighboring time steps. Finally, the representation is fed into the decoder, which projects it back into the feature space.

5.3 Continuous reconstruction of PM2.5 readings for sensor 0 in AQI-Seoul dataset from time steps 10350 to 10365.

5.4 Continuous reconstruction of traffic condition for sensor 0 in PeMS-Bay dataset from time steps 1400 to 1445.

List of Tables

1.1 Common deep learning techniques for spatio-temporal data mining regarding to traffic forecasting

3.1 Existing methods fall into the same paradigm

3.2 Table of Notations

3.3 Statistics of datasets

3.4 Overall performance of short-term (15 mins), mid-term (30 mins) and long-term (60 mins) traffic forecasting.

3.5 Performance of traffic volume prediction on the simulated dynamic network. We highlight sensor 7,8,9 as they are most affected by the changing network topology.

3.6 Ablation study

4.1 Statistics of dataset

4.2 Performance comparison in Acc@k and MRR on three datasets

4.3 Cold Start (due to inactive users) performance on NYC

4.4 Cold Start (due to short trajectory) performance on NYC

4.5 Performance of proposed model without trajectory flow map on NYC

4.6 Ablation study: Comparing the full model with 6 variants

5.1 Statistics of datasets

5.2 Performance comparison under different number of missing steps on three datasets

Chapter 1

Introduction

1.1 Background

1.1.1 Spatio-temporal Data

Spatio-temporal data, a multidimensional dataset characterized by its spatial and temporal attributes, has become a critical component in a wide range of scientific disciplines and practical applications [34, 46, 110]. These datasets consist of observations or measurements recorded at specific spatial locations and time instances, providing a comprehensive representation of phenomena unfolding in both space and time [3]. The fusion of spatial and temporal dimensions in such data enables researchers and practitioners to explore and understand complex patterns, relationships, and dynamics that may not be discernible when either the spatial or temporal aspect is examined in isolation [154]. As a result, the study and analysis of spatio-temporal data have rapidly evolved, expanding the horizons of knowledge in fields as diverse as geography [45], environmental science [122, 75], climate science [73, 44], epidemiology [17, 52], urban planning [10], transportation [23, 90], and social sciences [119, 21], to name just a few.

The concept of spatio-temporal data can be traced back to the earliest cartographic endeavors and timekeeping practices, which sought to record and represent various phenomena on the Earth’s surface and their temporal variations [101]. The development of advanced data collection and storage technologies, such as remote sensing, global positioning systems (GPS), and the Internet of Things (IoT), has led to an exponential growth in the volume and complexity of spatio-temporal data [1]. This data deluge has created new opportunities and challenges for researchers, necessitating the development of novel computational methods and tools to efficiently analyze this unique type of data and capture the interdependence between spatial and temporal dimensions.

A concrete example of spatio-temporal data in action can be observed in the context of traffic dynamics, where road-side sensors play a crucial role in monitoring and managing transportation systems. There are different types of road-side traffic sensors, including inductive loop detectors, radar sensors, infrared sensors, and video cameras [13]. Installed along roads or highways, these sensors continuously record measurements such as vehicle speed and volume together with their timestamps, generating a rich spatio-temporal dataset that captures the intricate patterns and interactions occurring within the transportation network [8]. By analyzing this spatio-temporal data, researchers can identify recurring congestion hotspots, evaluate the impact of infrastructure changes or traffic management strategies, and, furthermore, forecast future traffic conditions based on historical trends and real-time updates [79, 152, 54].

Predicting future traffic conditions is of paramount importance in managing urban transportation systems and enhancing their efficiency, safety, and sustainability. Accurate traffic forecasts enable transportation planners and engineers to proactively identify and mitigate potential congestion points, optimize traffic signal timings, and enhance the overall performance of the transportation network [130]. Furthermore, real-time traffic predictions provide valuable information for commuters, allowing them to make informed decisions on route planning and travel mode choices, thereby reducing travel time and fuel consumption [30]. Additionally, reliable traffic forecasts facilitate the development of innovative transportation solutions, such as smart traffic management systems and autonomous vehicles, which rely on accurate predictions of traffic conditions to optimize their performance. In other words, the utilization of historical spatio-temporal sensor data for traffic prediction plays a crucial role in promoting sustainable urban mobility and improving the quality of life in modern cities.

Another example is in the realm of climate science, where spatio-temporal data plays a pivotal role in enhancing our understanding of the Earth’s atmospheric processes and the underlying factors influencing global climate change. Weather sensors, strategically distributed across the globe, continuously record a multitude of weather conditions, such as temperature, humidity, wind, and precipitation, generating a complex spatio-temporal dataset that encapsulates the dynamic nature of the Earth’s climate [73]. By harnessing the power of this data, researchers can analyze historical climate patterns, discern trends, and develop predictive models to project future climate scenarios based on various environmental and anthropogenic factors [51]. This holds significant importance in various aspects of human life and environmental management. Accurate and timely weather forecasts facilitate informed decision-making across a multitude of sectors, including agriculture, aviation, energy management, and emergency response planning [89]. For instance, farmers can optimize crop yields and reduce the risk of crop failure by adjusting their planting, irrigation, and harvesting schedules based on anticipated weather patterns [60]. In aviation, accurate weather predictions can help minimize risks associated with adverse weather conditions, ensuring safer air travel and reducing operational costs [137]. Furthermore, energy utilities can use weather forecasts to better match energy supply with demand, improving the efficiency of power generation and distribution systems [63]. Overall, leveraging historical spatio-temporal data for weather prediction plays a crucial role in enhancing societal resilience and promoting sustainable development across diverse sectors.

1.1.2 Graph

Graph data, a fundamental data structure in computer science, has been extensively utilized to represent, model, and analyze a wide range of complex systems across various scientific disciplines and practical applications. At the core of graph data are two primary components: nodes and edges. Nodes, also referred to as vertices, represent discrete entities or objects in the system, while edges, sometimes called links or connections, signify relationships, interactions, or dependencies between these entities. This subsection aims to provide a brief introduction to graph data, focusing on the structural characteristics, properties, and potential applications of nodes and edges in the context of diverse research fields.

Nodes serve as the building blocks of a graph, embodying the individual components that constitute the system under study. In different contexts, nodes can represent a broad array of entities, such as individuals in a social network, web pages in the World Wide Web, genes in a biological network, or traffic sensors in a transportation network [93]. The attributes or properties of nodes, often referred to as node features, can encapsulate various quantitative or qualitative characteristics of the entities they represent, such as demographic information, geographical coordinates, or functional annotations. The study of nodes in a graph can reveal crucial insights into the system’s structure, organization, and dynamics, enabling researchers to identify central or influential nodes, detect communities or clusters, and uncover patterns of connectivity and hierarchy [15].

Edges, on the other hand, represent the connections or relationships between nodes, capturing the interactions or dependencies that drive the behavior and evolution of the system. Edges can be directed or undirected, depending on the nature of the relationships they represent. Directed edges have an inherent directionality, indicating a one-way relationship between two nodes. Undirected edges, conversely, imply a mutual or bidirectional relationship. Edges may also be weighted or unweighted, with weights encoding the strength, intensity, or cost associated with the connections, such as distances between locations, capacities of transportation links, or similarity scores between entities [16]. The analysis of edge properties in a graph can lead to a deeper understanding of the system’s connectivity patterns, robustness, and dynamics, allowing researchers to examine the role of edge weights in the emergence of global and local structures, study the distribution and correlation of edge weights, and explore the effects of edge addition or removal on the system’s stability and resilience.
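The distinctions drawn above (node features, directed versus undirected edges, weighted versus unweighted edges) can be made concrete with a small data structure. The following is a minimal sketch only: the `Graph` class, the sensor names, the coordinates, and the distances are all hypothetical and chosen purely for illustration.

```python
from collections import defaultdict


class Graph:
    """A small weighted graph stored as an adjacency dictionary."""

    def __init__(self, directed=False):
        self.directed = directed
        self.adj = defaultdict(dict)   # adj[u][v] = weight of edge u -> v
        self.features = {}             # node -> dictionary of node features

    def add_node(self, node, **features):
        self.features[node] = features
        self.adj[node]                 # touch the key so isolated nodes are kept

    def add_edge(self, u, v, weight=1.0):
        for node in (u, v):
            if node not in self.features:
                self.add_node(node)
        self.adj[u][v] = weight
        if not self.directed:
            # An undirected edge is a mutual relationship.
            self.adj[v][u] = weight

    def neighbors(self, node):
        return sorted(self.adj[node])


# Hypothetical road sensors; weights are illustrative distances in km.
roads = Graph(directed=True)
roads.add_node("s1", lat=-36.85, lon=174.76)
roads.add_edge("s1", "s2", weight=1.2)   # one-way link s1 -> s2
roads.add_edge("s2", "s3", weight=0.8)

# An unweighted, undirected social tie: the default weight 1.0 is used.
friends = Graph(directed=False)
friends.add_edge("alice", "bob")
```

In the directed road graph, `s2` is reachable from `s1` but not the other way round, whereas the undirected friendship edge is stored in both directions.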

Graph data’s inherent flexibility in representing complex systems with diverse characteristics has led to its widespread adoption across numerous research domains. For instance, in social network analysis, nodes can represent individuals, organizations, or groups, while edges signify various types of social relationships, such as friendships, collaborations, or communication links [133]. In biological and ecological networks, nodes can stand for genes, proteins, or species, while edges depict functional interactions, regulatory relationships, or predator-prey dynamics [97]. In transportation and logistics, nodes can correspond to traffic sensors or facilities, such as airports, bus stops, or warehouses, and edges can represent transportation links or routes, with associated attributes such as distances, travel times, or capacities [25]. Graph-based methods can be applied to optimize routing, scheduling, and resource allocation, ultimately enhancing the efficiency and sustainability of transportation systems.

In recent years, graph data has been increasingly employed in the context of data mining, machine learning, and artificial intelligence, leading to the development of novel techniques and frameworks for graph-based learning and pattern recognition, such as graph neural networks, graph embedding, and graph-based clustering algorithms [166, 140, 173]. These advancements have further broadened the scope and applicability of graph data, enabling researchers and practitioners to address complex, large-scale, and dynamic problems with high accuracy and efficiency.

1.1.3 Graph Structure in Spatio-temporal Data

As discussed, the graph data structure, characterized by its versatile representation of nodes and edges, plays a pivotal role in modern data-driven research and applications, including spatio-temporal data mining¹.

Spatio-temporal data and graph data are intrinsically connected in various ways, particularly when representing and analyzing complex systems with spatial and temporal dimensions. Graphs can efficiently represent the spatial attributes of spatio-temporal data, while also capturing the relationships and interactions between spatial entities. This enables researchers to exploit the rich body of graph theory and network analysis techniques to investigate the structure, dynamics, and properties of the underlying spatio-temporal system. For instance, in transportation networks, nodes can represent sensors or transit stops, while edges signify transit connections, with the associated spatio-temporal attributes such as travel times or traffic volumes being attached to the edges [152]. In such cases, graph-based methods can be employed to identify the spatial relationships and even aggregate the spatio-temporal correlations. Similarly, in social networks or epidemiological studies, graph data can be used to model the spatial distribution of individuals, their connections, and the spatio-temporal dynamics of information or disease spread. The synthesis of spatio-temporal data with graph representation thus offers a powerful and versatile framework for analyzing and understanding a wide range of complex systems.
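To illustrate how a graph and a set of time-stamped measurements fit together, the sketch below stores a toy sensor network as an adjacency matrix and its readings as a three-dimensional array. It assumes, for simplicity, that measurements are attached to nodes (sensors) rather than edges; the network layout, the two features, and the random readings are hypothetical.

```python
import numpy as np

# Hypothetical toy network: 4 sensors on a line, 0 - 1 - 2 - 3.
num_steps, num_nodes, num_features = 6, 4, 2   # features: speed, volume

# Symmetric adjacency matrix for an undirected sensor graph.
A = np.zeros((num_nodes, num_nodes))
for u, v in [(0, 1), (1, 2), (2, 3)]:
    A[u, v] = A[v, u] = 1.0

# Graph signal: X[t, i, f] is feature f observed at node i and time step t.
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 100.0, size=(num_steps, num_nodes, num_features))

# Spatial aggregation: mean over each node's neighbours at every time step.
degree = A.sum(axis=1, keepdims=True)      # shape (N, 1), non-zero here
neighbour_mean = (A / degree) @ X          # broadcasts over time: (T, N, F)

# Temporal aggregation: mean of each node's own last three observations.
recent_mean = X[-3:].mean(axis=0)          # shape (N, F)
```

The spatial step mixes information across neighbouring nodes within one time step, while the temporal step mixes information across time steps within one node; spatio-temporal models combine both kinds of aggregation.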

Furthermore, the adoption of graph representations in spatio-temporal data has opened up new avenues for leveraging state-of-the-art deep learning techniques, specifically graph neural networks (GNNs), to effectively capture and model complex spatial features embedded in the data.

GNNs, a class of deep learning methods designed to operate directly on graphs, have emerged as powerful tools for learning meaningful representations and patterns in graph-structured data [173, 140]. By representing the spatial aspect of spatio-temporal data as graphs, researchers can leverage the capabilities of GNNs to automatically learn spatial features and dependencies, thereby enabling the development of more accurate and efficient predictive models for various spatio-temporal tasks, such as forecasting, anomaly detection, and pattern recognition [79]. The ability of GNNs to model non-Euclidean data structures, coupled with their capacity to exploit both local and global information in the graph, offers a significant advantage in capturing the intricacies of spatial relationships and dynamics inherent in spatio-temporal data. Consequently, the synergy between graph representation and graph neural networks is driving advancements in the analysis and understanding of spatio-temporal phenomena across diverse disciplines and applications.

¹ We use spatio-temporal data mining and spatio-temporal data analysis interchangeably in this thesis, despite minor distinctions between the two.

1.1.4 Deep Learning Methods

The predominant methodologies used in spatio-temporal data mining have been completely transformed by the rise of deep learning. In the past, spatio-temporal data were primarily analyzed with traditional statistical and machine learning approaches, such as linear regression, time series models, vector autoregression, autoregressive integrated moving average, support vector machines, and clustering techniques [34]. However, with the advent of deep learning and its impressive performance in various domains, such as computer vision, natural language processing, and speech recognition, researchers have increasingly shifted their focus towards adopting and adapting deep learning methods for spatio-temporal data analysis [3, 131, 55, 1].

Deep learning, a subfield of artificial intelligence (AI) and machine learning, has emerged as one of the most significant advancements in recent years. Deep learning is a collection of algorithms and techniques that employ deep artificial neural networks (ANNs) with multiple hidden layers to learn hierarchical representations of data. It has its roots in the perceptron, a simple linear classifier proposed by Frank Rosenblatt in the late 1950s [102]. In the 1980s, researchers developed the backpropagation algorithm, which enabled the efficient training of multi-layered neural networks [104]. However, early efforts in training deep neural networks were plagued by several challenges, such as the vanishing gradient problem, overfitting, and, most importantly, the limited computational resources available. It was not until 2012 that the introduction of AlexNet stunned the computer vision community by achieving a significant improvement in the ImageNet classification challenge, far surpassing previous traditional computer vision methods. Since then, deep learning has rapidly evolved, leading to the development of more sophisticated architectures, such as convolutional neural networks, recurrent neural networks, long short-term memory networks, and transformers. These architectures have been successfully applied to various tasks, achieving state-of-the-art performance in many research fields.

One of the primary factors that have contributed to the rapid adoption of deep learning in spatio-temporal data mining is its exceptional representation power and ability to automatically learn complex features from raw data. Unlike traditional methods, which often rely on manual feature engineering and pre-defined assumptions about data distributions, deep learning models have the remarkable ability to learn intricate, high-level features and representations directly from raw data [78]. This process of automatic feature learning enables deep learning models to capture complex patterns and relationships in the data that may be difficult or even impossible for humans to engineer manually. As a result, deep learning models are often more flexible and adaptive to a wide range of problems, exhibiting superior performance in many tasks. Furthermore, deep learning models are capable of handling large-scale, high-dimensional, and noisy datasets, which are prevalent in many real-world applications. Consequently, the representation power of deep learning models has revolutionized the way we analyze data, leading to more accurate and robust models that can uncover previously hidden insights and patterns in complicated spatio-temporal data.

Recurrent neural networks (RNNs) and their variants, such as long short-term memory (LSTM) networks and gated recurrent units (GRUs), have emerged as some of the most widely used deep learning architectures in the analysis of spatio-temporal data due to their ability to model sequential data and capture long-term dependencies [62, 32]. By modeling the temporal aspect of spatio-temporal data, RNNs can effectively capture dynamic patterns and trends, which is particularly useful in forecasting and prediction tasks, such as weather forecasting, traffic flow prediction, and disease spread modeling [170].

Convolutional neural networks (CNNs), among the most widely used deep learning architectures for the analysis of image-based data, have also gained popularity in spatio-temporal data mining. In some cases, the spatial features can be transformed into an image-like format to utilize CNNs for analysis. This is particularly useful when the spatial features in the data exhibit complex patterns that may benefit from the spatial processing capabilities of CNNs. Consider, for instance, a dataset comprising weather patterns across a certain timeframe, encompassing both spatial information (i.e., latitude and longitude) and temporal information (such as time of day, month, or year). One can transform this data into a grid-like arrangement, where each cell corresponds to a distinct region and encapsulates pertinent weather-related parameters (such as temperature, humidity, and wind speed). By conceptualizing these grids as images, with each cell acting as a pixel and every weather parameter serving as a channel, one can process them with CNNs accordingly [112]. Furthermore, the introduction of 3D convolutions and recurrent neural network (RNN) layers in CNN architectures has enabled the incorporation of temporal information, allowing for the joint analysis of spatial and temporal features in spatio-temporal data [66, 148, 147].
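The grid transformation described above can be sketched in a few lines. The helper below is a hypothetical illustration, not a procedure taken from the cited works: the function name `rasterize`, the station coordinates, and the readings are assumptions, and cells are filled with the plain average of the stations they contain.

```python
import numpy as np


def rasterize(lats, lons, values, bounds, grid_shape):
    """Average point observations into a (channels, rows, cols) grid.

    lats, lons: arrays of shape (n,) with station coordinates.
    values:     array of shape (n, channels), one row per station.
    bounds:     (lat_min, lat_max, lon_min, lon_max) of the region.
    Cells without any station are left as NaN.
    """
    lat_min, lat_max, lon_min, lon_max = bounds
    rows, cols = grid_shape
    # Map each coordinate to a cell index; clip so the upper bound stays inside.
    r = np.clip(((lats - lat_min) / (lat_max - lat_min) * rows).astype(int), 0, rows - 1)
    c = np.clip(((lons - lon_min) / (lon_max - lon_min) * cols).astype(int), 0, cols - 1)

    channels = values.shape[1]
    total = np.zeros((channels, rows, cols))
    count = np.zeros((rows, cols))
    np.add.at(count, (r, c), 1)                      # stations per cell
    for ch in range(channels):
        np.add.at(total[ch], (r, c), values[:, ch])  # per-cell sums

    grid = np.full((channels, rows, cols), np.nan)
    filled = count > 0
    grid[:, filled] = total[:, filled] / count[filled]
    return grid


# Hypothetical stations: two fall in the same cell, one in the opposite corner.
lats = np.array([0.1, 0.2, 0.9])
lons = np.array([0.1, 0.2, 0.9])
values = np.array([[10.0, 50.0],    # columns: temperature, humidity
                   [20.0, 70.0],
                   [30.0, 90.0]])
grid = rasterize(lats, lons, values, bounds=(0.0, 1.0, 0.0, 1.0), grid_shape=(2, 2))

# Stacking one grid per time step gives a (time, channels, rows, cols) tensor.
frames = np.stack([grid, grid])
```

Each weather parameter becomes one channel of the resulting image. The empty cells in this toy example also show why a grid may be a poor fit for sparsely or irregularly placed sensors.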

Graph neural networks (GNNs), discussed earlier, are another major development in spatio-temporal data mining driven by deep learning. GNNs offer a powerful way to model and analyze graph-structured data, such as spatial networks, and can be particularly useful in capturing complex spatial dependencies and relationships in spatio-temporal data [166]. By combining GNNs with other deep learning architectures, such as CNNs or RNNs, researchers can develop advanced hybrid models that can effectively analyze both spatial and temporal dimensions of spatio-temporal data [152, 37, 165, 141].

The attention mechanism has become the dominant tool in natural language processing and has also proven powerful in many other deep learning domains, including computer vision and spatio-temporal data mining [127]. The attention mechanism enables models to focus on the specific spatial and temporal features that are most relevant for the task at hand. In the context of spatio-temporal data mining, attention mechanisms can be incorporated into various deep learning architectures, such as convolutional neural networks, recurrent neural networks, and graph neural networks, to enhance their performance in capturing and processing the complex interactions and dependencies among spatial and temporal components of the data [171]. By assigning different weights to spatial and temporal features, attention-based models can adaptively focus on the most informative parts of the input data, leading to improved interpretability and predictive accuracy in tasks such as traffic forecasting, climate modeling, and event detection.

Furthermore, attention mechanisms can help mitigate the challenges posed by large-scale, high-dimensional, and noisy spatio-temporal datasets by allowing the models to selectively attend to the most relevant information while filtering out noise and irrelevant details. As a result, attention-based deep learning models have become increasingly popular in spatio-temporal data mining, opening up new possibilities for more accurate and robust analyses in a wide range of applications.
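The weighting idea behind attention can be shown with a small numerical sketch. The code below uses scaled dot-product attention, which is one common formulation and is assumed here for illustration; the function names and toy inputs are hypothetical, and the items being attended over could equally be time steps, sensors, or spacetime events.

```python
import numpy as np


def softmax(scores, axis=-1):
    # Subtract the maximum for numerical stability.
    shifted = scores - scores.max(axis=axis, keepdims=True)
    weights = np.exp(shifted)
    return weights / weights.sum(axis=axis, keepdims=True)


def scaled_dot_product_attention(query, keys, values):
    """query: (d,), keys: (n, d), values: (n, d_v)."""
    d = query.shape[-1]
    scores = keys @ query / np.sqrt(d)   # relevance of each item, shape (n,)
    weights = softmax(scores)            # non-negative and sums to 1
    context = weights @ values           # weighted average, shape (d_v,)
    return context, weights


# Toy input: three items with orthogonal keys; the query matches the first key.
keys = np.eye(3)
values = np.array([[1.0], [2.0], [3.0]])
query = np.array([10.0, 0.0, 0.0])
context, weights = scaled_dot_product_attention(query, keys, values)
```

Because the query aligns with the first key, almost all of the weight falls on the first item and the output stays close to its value, while the contributions of the other items are largely suppressed.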

In summary, the rise of deep learning has revolutionized the research area of spatio-temporal data mining by introducing powerful and flexible modeling techniques, such as CNNs, RNNs, GNNs, and attention mechanisms, which can automatically learn complex spatial and temporal patterns from raw data.

The integration of deep learning methods in spatio-temporal data mining has led to significant advancements in various domains, including environmental science, transportation, urban planning, and epidemiology, among others. Furthermore, the development of interpretability techniques, along with efficient and scalable computing solutions, has enabled researchers to better understand and harness the potential of deep learning models for spatio-temporal data mining.

1.1.5 Spatio-temporal Data Mining with Deep Learning

Spatio-temporal data mining is the process of extracting valuable information, patterns, and relationships from spatio-temporal data [131]. The rapid increase in spatio-temporal data generated by diverse sources, including GPS, remote sensing, and social media, has fueled the growth of this research area. A key aspect of spatio-temporal data mining is the identification and quantification of spatial and temporal dependencies, which arise due to the interconnectedness and interdependence of the processes governing the observed phenomena. These dependencies can manifest in various forms, such as spatial autocorrelation, where the values of a variable at nearby locations are more similar than those at distant locations, or temporal autocorrelation, where the values of a variable at a given location are more similar across adjacent time periods than across distant time periods. Understanding and modeling these dependencies is crucial for making accurate predictions and inferences about the underlying processes and their future behavior.
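Both kinds of dependency can be quantified with simple statistics. The sketch below computes a lag-based sample autocorrelation for a single time series and Moran's I for a single spatial snapshot. Moran's I is a standard spatial autocorrelation statistic that we bring in here for illustration; the function names, the four-node line graph, and the toy values are assumptions.

```python
import numpy as np


def temporal_autocorrelation(series, lag=1):
    """Sample autocorrelation of a 1-D series at the given lag (lag >= 1)."""
    x = np.asarray(series, dtype=float)
    x = x - x.mean()
    return float((x[:-lag] * x[lag:]).sum() / (x * x).sum())


def morans_i(values, weights):
    """Moran's I for one snapshot; weights[i, j] > 0 when i and j are neighbours."""
    x = np.asarray(values, dtype=float)
    w = np.asarray(weights, dtype=float)
    z = x - x.mean()
    n = x.size
    return float(n / w.sum() * (z @ w @ z) / (z @ z))


# Hypothetical line graph 0 - 1 - 2 - 3 used as the spatial neighbourhood.
line_graph = np.zeros((4, 4))
for u, v in [(0, 1), (1, 2), (2, 3)]:
    line_graph[u, v] = line_graph[v, u] = 1.0

smooth = morans_i([1.0, 2.0, 3.0, 4.0], line_graph)        # similar neighbours
alternating = morans_i([1.0, 4.0, 1.0, 4.0], line_graph)   # dissimilar neighbours
trend = temporal_autocorrelation([1, 2, 3, 4, 5, 6], lag=1)
```

On these toy inputs the smoothly varying snapshot yields a positive Moran's I and the alternating snapshot a negative one, while the steadily increasing series has a positive lag-1 autocorrelation.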

We have discussed several prevalent deep learning methods that are widely used in spatio-temporal data mining. Nevertheless, it is crucial to emphasize that these approaches are customarily integrated, rather than used in isolation, to address the multifaceted dimensions of spatio-temporal data. This characteristic distinguishes spatio-temporal data from other data types.

The heterogeneity of spatial and temporal data poses significant challenges in developing a single unified model for spatio-temporal data mining. This complexity arises from the distinct characteristics and relationships that exist within and across the spatial and temporal dimensions, necessitating the use of specialized models and techniques to effectively capture these aspects. Consequently, most studies in this domain employ a combination of deep learning techniques, such as CNNs, GNNs, RNNs, and Transformers, to capture the spatial and temporal features independently and subsequently integrate them for holistic analysis [1].

Spatial feature extraction. In terms of spatial feature extraction, GNNs and CNNs have demonstrated their proficiency in learning intricate spatial patterns and structures from input data by automatically learning spatially invariant feature representations. These techniques excel at capturing local and global spatial dependencies in data, such as neighborhood interactions, hierarchical structures, and spatial correlations, which are crucial for understanding the underlying spatial phenomena. Specifically, Graph Convolutional Networks (GCNs) [71], Graph Attention Networks (GATs) [128], and GraphSAGE [56] are frequently employed to learn representations of the spatial relationships encoded in graph structures. These graph-based methods have gained popularity due to their powerful ability to model intricate spatial relationships and interactions present in various real-world datasets. Conversely, CNNs often require the transformation of spatial data into grid image data, a representation that might not always be the most suitable choice for certain types of spatial information.
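As a point of reference for how a graph layer aggregates spatial information, the sketch below implements one propagation step of the widely used GCN rule, H' = ReLU(Â H W) with Â = D^{-1/2}(A + I)D^{-1/2}, using plain Python lists. The dense-matrix representation, the function names, and the toy graph are assumptions made for readability; practical implementations rely on sparse tensor libraries.

```python
import math


def normalized_adjacency(adj):
    """Return D^{-1/2} (A + I) D^{-1/2} for a dense adjacency matrix A."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
             for i in range(n)]
    degree = [sum(row) for row in a_hat]  # at least 1 because of the self-loop
    return [[a_hat[i][j] / math.sqrt(degree[i] * degree[j]) for j in range(n)]
            for i in range(n)]


def matmul(a, b):
    """Dense matrix product of two list-of-lists matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def gcn_layer(adj, features, weight):
    """One GCN propagation step: ReLU(A_norm @ H @ W)."""
    aggregated = matmul(normalized_adjacency(adj), features)
    transformed = matmul(aggregated, weight)
    return [[max(0.0, v) for v in row] for row in transformed]


# Toy path graph 0 - 1 - 2 with a single feature that is non-zero only at node 0.
path_graph = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
node_features = [[1.0], [0.0], [0.0]]
identity_weight = [[1.0]]
hidden = gcn_layer(path_graph, node_features, identity_weight)
```

After one layer, the feature of node 0 has reached its direct neighbour (node 1) but not node 2, which illustrates that each layer aggregates information from one-hop neighbourhoods only.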

Temporal feature extraction. On the other hand, temporal feature extraction often leverages RNNs and their variants, such as LSTM and GRU networks, which are specifically designed to model and capture temporal dependencies and dynamics in sequential data. By maintaining an internal memory state, these models can effectively learn long-range temporal patterns and dependencies, providing a powerful tool for capturing the temporal aspect of spatio-temporal data. Besides RNN-based models, Temporal Convolutional Networks (TCNs) [7] are another powerful and efficient approach for time series data mining, designed specifically to capture and model temporal dependencies within sequential data. Unlike traditional recurrent architectures, such as LSTMs and GRUs, TCNs utilize a series of dilated convolutional layers to model temporal relationships, enabling the efficient learning of long-range dependencies without the need for recurrent connections. One key aspect of TCNs is the incorporation of gated activation functions, which help regulate the flow of information through the network and facilitate the learning of complex temporal patterns. Lastly, some studies exploit attention-based methods to learn the temporal correlations from the entire sequence [171, 54].
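The dilated convolution and gated activation described above can be sketched for a single-channel series as follows. This is a minimal illustration under stated assumptions: one input channel, left zero-padding to keep the convolution causal, and the common tanh-times-sigmoid gate; the function names and kernel values are hypothetical.

```python
import math


def dilated_causal_conv(x, kernel, dilation):
    """y[t] = sum_k kernel[k] * x[t - k * dilation], treating x before time 0 as zero."""
    out = []
    for t in range(len(x)):
        acc = 0.0
        for k, w in enumerate(kernel):
            idx = t - k * dilation
            if idx >= 0:  # causal: only the present and the past are visible
                acc += w * x[idx]
        out.append(acc)
    return out


def gated_tcn_layer(x, filter_kernel, gate_kernel, dilation):
    """Gated activation: tanh(filter branch) multiplied by sigmoid(gate branch)."""
    f = dilated_causal_conv(x, filter_kernel, dilation)
    g = dilated_causal_conv(x, gate_kernel, dilation)
    return [math.tanh(a) * (1.0 / (1.0 + math.exp(-b))) for a, b in zip(f, g)]


def receptive_field(kernel_size, dilations):
    """Number of time steps (including the current one) seen by the last layer."""
    return 1 + (kernel_size - 1) * sum(dilations)


impulse = [1.0, 0.0, 0.0, 0.0, 0.0, 0.0]
response = dilated_causal_conv(impulse, kernel=[0.5, 0.25], dilation=2)
gated = gated_tcn_layer(impulse, [0.5, 0.25], [1.0, -1.0], dilation=2)
```

With a kernel size of 2 and dilations 1, 2, 4, and 8, `receptive_field` returns 16, which shows how stacking dilated layers reaches long-range history with few layers and no recurrent connections.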

Once the spatial and temporal features are extracted separately, various approaches have been proposed to integrate and fuse these features for comprehensive spatio-temporal analysis. One common approach is the sandwich model design, wherein the spatial and temporal features are combined through a series of layers, such as convolutional or recurrent layers, that successively refine the fused representation to capture higher-level spatio-temporal patterns and interactions. Alternatively, some studies utilize linear layers or other fusion techniques, such as element-wise addition or concatenation, to combine the spatial and temporal features in a more straightforward manner.
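The simpler fusion options mentioned above, element-wise addition and concatenation followed by a linear layer, can be sketched as follows. The feature vectors, weights, and function names are hypothetical and serve only to show the shapes involved.

```python
def fuse_by_addition(spatial, temporal):
    """Element-wise addition; both feature vectors must have the same length."""
    if len(spatial) != len(temporal):
        raise ValueError("addition requires equally sized feature vectors")
    return [s + t for s, t in zip(spatial, temporal)]


def fuse_by_concatenation(spatial, temporal):
    """Concatenation keeps both feature vectors intact, side by side."""
    return list(spatial) + list(temporal)


def linear_layer(features, weights, bias):
    """Dense layer: out[j] = dot(weights[j], features) + bias[j]."""
    return [sum(w * x for w, x in zip(row, features)) + b
            for row, b in zip(weights, bias)]


spatial_features = [0.2, 0.5]    # toy output of a spatial layer for one node
temporal_features = [1.0, -0.5]  # toy output of a temporal layer for the same node

added = fuse_by_addition(spatial_features, temporal_features)
concatenated = fuse_by_concatenation(spatial_features, temporal_features)
# A linear layer maps the concatenated vector to a single fused feature.
fused = linear_layer(concatenated, weights=[[0.5, 0.5, 0.5, 0.5]], bias=[0.0])
```

Addition keeps the feature dimension unchanged but requires matching sizes, whereas concatenation doubles the dimension and leaves the mixing to the subsequent linear layer.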

Table 1.1: Common deep learning techniques for spatio-temporal data mining in traffic forecasting

| Studies | Spatial layers | Temporal layers |
|---|---|---|
| DCRNN (2017) [79] | Diffusion Conv | GRU |
| STGCN (2017) [152] | GCN | Gated TCN |
| DMVST-Net (2018) [148] | CNN | LSTM |
| STDN (2019) [147] | CNN | LSTM + Attn |
| GraphWaveNet (2019) [142] | GCN | Gated TCN |
| DSTGCN (2019) [37] | GCN | TCN |
| ASTGCN (2019) [53] | GCN + Attn | TCN + Attn |
| SLCNN (2020) [165] | GCN | Gated TCN |
| GMAN (2020) [171] | Spatial Attn | Temporal Attn |
| MTGCN (2020) [141] | GCN | Gated TCN |
| ASTGNN (2021) [54] | GCN + Attn | Temporal Attn |
| D2STGNN (2022) [109] | Diffusion CNN | GRU |
| STEP (2022) [108] | GNN | Transformer |

Here we take the traffic forecasting problem, a crucial research domain in spatio-temporal data mining, as an example. Due to the importance of accurate traffic prediction, numerous models have been proposed to address this challenge.

In particular, various deep learning techniques have been employed to handle the spatial and temporal features inherent in traffic data. A summary of some representative studies and their respective spatial and temporal layers is provided in Table 1.1. It showcases a diverse range of deep learning techniques employed for spatio-temporal data mining in the context of traffic forecasting. The listed studies span from 2017 to 2022 and utilize various spatial layers, such as diffusion convolution, GCNs, and CNNs, often enhanced with attention mechanisms to improve model performance. In terms of temporal layers, GRUs, LSTM, TCNs, and attention-based approaches are common choices for modeling the temporal dependencies in traffic data. This overview demonstrates the rapid evolution and innovation in deep learning techniques for spatio-temporal data mining.

The increasing complexity and diversity of these techniques reflect the growing interest and need for accurate and efficient traffic forecasting. As research in this area continues to advance, it is anticipated that novel deep learning architectures and methodologies will further improve the state-of-the-art in spatio-temporal data mining, enabling more effective and reliable traffic forecasting solutions.

In summary, the inherent complexity and heterogeneity of spatio-temporal data necessitate the use of a combination of deep learning techniques to capture the distinct spatial and temporal features independently, followed by their integration through various fusion approaches. These hybrid models provide a flexible and robust framework for tackling the challenges of spatio-temporal data mining, allowing researchers to uncover complex patterns, relationships, and dynamics across a wide range of applications.

1.2 Challenges

Despite the growing interest in spatio-temporal data mining owing to its prospective applications across various domains and the progress in deep learning techniques, the analysis of current spatio-temporal data still presents several challenges, particularly in the pursuit of precise and effective deep learning models.

In this section, we discuss four key challenges in spatio-temporal data mining: (1) the transferability and generalization issue, (2) the scalability issue, (3) building efficient graph structures from raw data to augment model efficacy, and (4) the sparse and discrete data sampling issue. We introduce each of the four challenges in turn in the rest of this section.

1) Transferability and generalization issue. A major challenge in spatio-temporal data mining is the transferability and generalization of models across different geographic regions. Due to the diverse nature of spatio-temporal data, models trained on one dataset may not necessarily perform well on others, leading to poor generalization [55]. For example, a traffic prediction model trained on data from New York City may not be directly applicable to San Francisco, as the underlying road network, traffic patterns, and other factors may differ significantly between the two cities. Moreover, the lack of labeled data in certain application areas makes it difficult to train and validate models. One approach to address this issue is transfer learning, which involves leveraging the knowledge gained from one domain to improve performance in another, related domain [175]. In the context of spatio-temporal data mining, the term "domain" specifically refers to geographic locations and their unique characteristics. Going further, an even better solution would be to train a single model that functions akin to a Language Model (LM) in the natural language domain, possessing the capability to be effectively utilized across multiple geographic regions and scenarios in a specific spatio-temporal data mining task such as traffic forecasting. Recent work has explored the application of transfer learning to spatio-temporal data mining, with promising results [132]. However, the exploration and discussion of universal models for practical applications in spatio-temporal data mining remains an open research area that warrants further investigation.

2) Scalability issue. Scalability is another critical challenge in spatio-temporal data mining, as real-world datasets often involve large-scale networks and long time ranges. Firstly, typical GNN models assume a small, fixed network size, limited by computational capability, and may struggle to handle large-scale graphs efficiently [26]. Although researchers have proposed various strategies to improve the scalability of GNN models, such as sampling techniques, graph partitioning, and parallelization [56, 157, 29], achieving scalability without sacrificing accuracy remains a major challenge, and further research is needed to handle large-scale spatio-temporal datasets. Secondly, the combination of temporal and spatial features in spatio-temporal data mining exacerbates scalability concerns beyond those posed by large graphs. Temporal Convolutional Networks (TCNs) [7], which offer faster processing compared to RNN-based models, have become a popular choice for handling sequential data in spatio-temporal analysis, and exemplify the trade-off between speed and accuracy. However, TCNs are limited in their ability to efficiently capture complex temporal interactions. As a result, there is an ongoing need to improve the scalability and computational efficiency of current methods to accommodate the ever-increasing volume and complexity of spatio-temporal data.

3) Build efficient graph structures from raw data to augment model efficacy. Another challenge in spatio-temporal data mining is the construction of informative graph structures from raw data to enhance the effectiveness of deep learning models. Raw spatio-temporal data often requires preprocessing and transformation into a structured format, such as graphs, to enable effective analysis. However, determining the most suitable graph structures and representations that capture the underlying spatial and temporal patterns is a non-trivial task [140]. The most common approach to constructing graphs is to exploit geographical locations. In this method, a graph is built to represent spatial locations, where each node denotes a single location, and edges represent the distance between two nodes. For instance, in a traffic prediction scenario, the graph would reflect the geographical distribution of sensors, with each node representing a traffic sensor recording features such as speed and volume, and each edge reflecting the travel distance between two sensors along the road network [152]. However, in other applications, multiple ways of constructing graphs may be possible. Taking next point-of-interest (POI) recommendation as an example, a straightforward graph construction method would involve nodes representing POIs and edges representing the distance between them. Consequently, in a next POI recommendation scenario, the model would select from the nearby neighborhood of the current location, as these locations are more likely to be visited due to spatial convenience.

Alternatively, another possibly more informative graph can be constructed, where nodes still represent POIs, but edges now denote the population-level frequency of transitions between two nodes. In other words, this graph captures the most frequently visited subsequent POIs given the current location and time. Such a graph not only contains geographical information but also reflects general human tendencies, making it superior to a simple geo-based graph. However, to the best of our knowledge, no existing studies specifically discuss this issue.
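The contrast between the two graph constructions can be sketched on toy data as follows. The POI names, planar coordinates (in kilometres), and check-in trajectories are invented for illustration, and the transition graph below ignores the time of day for brevity; none of this is the construction used later in the thesis.

```python
import math
from collections import Counter


def distance_graph(coords):
    """Geographic graph: edge (a, b) is weighted by the distance between POIs a and b."""
    names = sorted(coords)
    return {(a, b): math.dist(coords[a], coords[b])
            for a in names for b in names if a != b}


def transition_graph(trajectories):
    """Population-level graph: edge (a, b) counts how often b was visited right after a."""
    counts = Counter()
    for trajectory in trajectories:
        for src, dst in zip(trajectory, trajectory[1:]):
            counts[(src, dst)] += 1
    return counts


def nearest_poi(dist, current):
    """Candidate suggested by the geographic graph: the closest POI."""
    candidates = {b: d for (a, b), d in dist.items() if a == current}
    return min(candidates, key=candidates.get)


def most_frequent_next(counts, current):
    """Candidate suggested by the transition graph: the most common successor."""
    candidates = {b: c for (a, b), c in counts.items() if a == current}
    return max(candidates, key=candidates.get) if candidates else None


# Hypothetical POIs and check-in sequences from several users.
poi_coords = {"cafe": (0.0, 0.0), "office": (0.5, 0.0), "gym": (5.0, 0.0)}
checkins = [
    ["cafe", "office", "gym"],
    ["cafe", "office"],
    ["office", "gym"],
    ["cafe", "gym"],
]

geo = distance_graph(poi_coords)
flow = transition_graph(checkins)
```

For a user currently at the office, the geographic graph points to the nearby cafe, while the transition graph points to the gym, which most users in this toy data visit next despite the larger distance.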

4) Sparse and discrete data sampling issue. The fourth challenge concerns the issue of sparse and discrete data sampling. Real-world spatio-temporal data is often marked by sparsity, with numerous data points missing or unobserved due to various factors, such as sensor failures, limited sampling intervals, or inherent variability in the underlying processes [33, 120]. In many existing spatio-temporal data mining tasks, temporal data is represented as a sequence of discrete time steps, causing the prediction granularity to be highly dependent on the input data’s sampling interval. For example, if weather sensors record temperature data hourly, the model is unlikely to accurately predict the temperature 30 minutes ahead. Consequently, these models are limited in their ability to infer data at arbitrary time points. This limitation primarily stems from the intrinsic discreteness of neural networks and the data collection process, which is often constrained by practical factors such as energy consumption and storage limitations. As a result, sensors in applications like meteorology, aerography, and environmental monitoring collect data at fixed intervals. Reconstructing the continuous world from observed discrete data is essential for accurate spatio-temporal data analysis. However, the continuous reconstruction problem, estimating data at any given timestamp within the period of interest, differs from traditional imputation tasks that aim to fill missing data at predefined timestamps. This distinction highlights the need for further research in this area.
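The difference between the two problem settings can be made concrete with a deliberately naive baseline: linear interpolation, which answers a query at any timestamp inside the observed period rather than only at predefined slots. This is a sketch of the problem interface under that assumption, not the representation learning framework proposed in this thesis; the function name and the hourly toy readings are hypothetical.

```python
from bisect import bisect_left


def reconstruct(timestamps, values, query_time):
    """Linearly interpolate a value at any query_time within the observed period.

    timestamps must be sorted in increasing order and aligned with values.
    """
    if not timestamps[0] <= query_time <= timestamps[-1]:
        raise ValueError("query_time lies outside the observed period")
    i = bisect_left(timestamps, query_time)
    if timestamps[i] == query_time:
        return values[i]  # the query coincides with an observation
    t0, t1 = timestamps[i - 1], timestamps[i]
    v0, v1 = values[i - 1], values[i]
    return v0 + (v1 - v0) * (query_time - t0) / (t1 - t0)


# Hourly temperature readings (toy values); queries may fall between samples.
hours = [0.0, 1.0, 2.0]
temperature = [10.0, 12.0, 11.0]
half_past_midnight = reconstruct(hours, temperature, 0.5)
half_past_one = reconstruct(hours, temperature, 1.5)
```

An imputation method would only fill missing entries on the hourly grid, whereas a continuous reconstruction method must return an estimate for queries such as 0.5 or 1.5; a learned model would replace the straight-line assumption used here.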

1.3 Research Questions

This thesis aims to address the aforementioned four challenges by posing three research questions, which encompass two levels of analysis: macro-level and micro-level.

The macro-level analysis aims to capture the general, aggregate patterns in spatio-temporal data, which are often individual-agnostic. These general patterns may represent phenomena such as overall traffic flow, shared preferences among users, or large-scale weather systems. By identifying and understanding these macro-level patterns, researchers can gain insights into the underlying processes governing the spatio-temporal data and develop more effective models for prediction and analysis. Data-driven methods often fall into this category. These methods leverage machine learning and statistical techniques to identify patterns and relationships within the data, enabling the discovery of previously unknown insights and the development of more accurate predictive models.

The micro-level analysis focuses on the individual entities within the system, such as vehicles in traffic prediction, users in point-of-interest (POI) recommendation, or airflows in weather prediction. These micro-level entities exhibit unique spatio-temporal characteristics, and their movements and interactions drive the observed spatio-temporal patterns in the data. Model-driven methods usually fall into this category. These methods utilize domain-specific knowledge to develop mathematical or simulation models that capture the underlying mechanics of a system, allowing for improved understanding, prediction, and control of individual entity behavior.

It is important to note that our focus in this thesis is not on developing mathematical or simulation models; instead, we concentrate on employing deep learning methods at both macro and micro levels in spatio-temporal data mining.

The interplay between micro-level and macro-level analysis is crucial for attaining a comprehensive understanding of the data. The movements and interactions of micro-level entities contribute to the emergence of macro-level patterns observed in the data. Concurrently, these macro-level patterns can offer valuable context and guidance for interpreting and predicting the behavior of micro-level entities.

To address the first challenge of transferability and generalization, macro-level analysis is particularly advantageous, as individual entities may exhibit significant variation, whereas aggregate patterns tend to be more stable across diverse spatial or temporal scenarios. For example, in traffic forecasting, driving habits may differ from person to person; however, the underlying rules governing traffic patterns are likely to be consistent. For instance, despite the diverse driving habits, congestion around rush hours is expected to disperse along the incoming road direction. The critical question is how we can capture these macro patterns, which appear to possess spatio-temporal translation invariance properties. Notably, certain local patterns are common across different geographic regions, much as edges and corners appear in almost all images.

The challenge lies in the fact that while edges and corners in images can be easily captured through human feature engineering, identifying spatio-temporal features is more complex. If a method can be developed to learn integrated spatio-temporal features with translation invariance, the issues of transferability and generalization can be effectively resolved. Furthermore, by designing models that focus on smaller areas at a time and can be run in parallel, scalability issues can also be addressed as a bonus. Thus, a more refined approach to capturing macro patterns could lead to the development of robust and scalable spatio-temporal data mining techniques, overcoming the challenges of transferability, generalization, and scalability. Therefore, we present our first research question (RQ):

RQ 1. Can we develop a macro-level approach to learn integrated spatio-temporal features with translation invariance that is applicable across different geographic regions and time periods, thereby addressing the transferability, generalization, and scalability issues in spatio-temporal data mining?

Potential solutions to RQ 1 offered by macro-level approaches would address the challenges related to transferability, generalization, and scalability in spatio-temporal data mining. These approaches can effectively learn integrated spatio-temporal features with translation invariance, thereby addressing the key issues identified earlier. As we progress, our focus shifts towards leveraging macro-level information for constructing efficient graph structures and enhancing micro-level model predictions in spatio-temporal data mining. In particular, we concentrate on next point-of-interest (POI) recommendation tasks, where micro-level data consists of the user’s visited sequence.

Using the next point-of-interest (POI) recommendation as an illustrative example, a basic approach to building a graph would have nodes symbolizing POIs and edges signifying the distances between them. On the other hand, a potentially more insightful graph could be created where nodes continue to represent POIs, but edges now indicate the frequency of population-level transitions between pairs of nodes. Essentially, this graph encapsulates the most common subsequent POIs visited, given a specific location and time. This type of graph encompasses not only geographical information but also captures general human behavior patterns, rendering it more advantageous than a mere geospatial-based graph.

In this context, our next research question seeks to explore the use of macro-level information to develop graph structures that can significantly augment model efficacy. Moreover, this part of the study also aims to propose a novel framework that seamlessly integrates macro-level information and micro-level predictions. By doing so, we expect not only to address the challenge of constructing efficient graph structures but also to contribute to the broader goal of improving the overall performance and applicability of spatio-temporal data mining models.

As such, we propose the next research question:

RQ 2. How can we leverage macro-level information to construct an efficient graph structure and use it to enhance micro-level predictions in spatio-temporal data mining?

In RQ 1 and RQ 2, we discussed data mining applications at both macro-level and micro-level, focusing on specific application scenarios, such as predicting traffic conditions like average speed or generating a list of recommended POIs for users. In these cases, we were primarily concerned with leveraging embeddings for direct application purposes, rather than obtaining representations for individual nodes in the graph at a given time. However, as we shift our attention to the final challenge concerning sparsity and discrete data sampling, our objective transitions from a specific prediction problem to a genuine representation learning problem.

Representation learning seeks to uncover meaningful and compact representations of raw data, which can enable efficient learning, generalization, and interpretability across various tasks. In this context, our goal is to learn representations for each spatio-temporal unit (i.e., sensor measurements at a specific location and timestamp) and utilize these representations to capture the underlying continuous data distribution based on observed data points. This can be likened to estimating a surface in 3D space from discrete points. To tackle the sparsity and discrete data sampling challenge and enhance spatio-temporal data analysis, it is essential to reconstruct continuous representations from observed discrete data. With this in mind, we propose the third research question:

RQ 3. How can we combine macro and micro-level information to reconstruct the underlying continuous distribution from a discretely sampled dataset, addressing the sparse data sampling issue in spatio-temporal data mining?

By investigating this research question, our objective is to devise innovative methodologies that harness the power of both macro and micro-level information.

In doing so, we strive to surmount the limitations inherent in data collection processes and existing spatio-temporal data mining techniques, ultimately leading to more robust and accurate analysis by using the learned representations.

1.4 Thesis Outline

In this thesis, our primary objective is to address the three research questions proposed earlier. The thesis is organized into four main parts, each dedicated to exploring various aspects of the research questions.

Chapter 2 presents an in-depth and extensive survey on current spatio-temporal graph representation learning methods. In this chapter, we systematically introduce the research status of spatio-temporal data mining and discuss the latest advancements in the rapidly evolving field of graph neural networks, since graph neural networks are increasingly becoming the predominant tool for extracting spatial relationships.

Chapter 3 is devoted to resolving RQ 1, using traffic forecasting as an illustrative application case to demonstrate the practical implications of our research.

In this chapter, we delve into the investigation and design of a novel framework that captures the universal spatio-temporal patterns present in traffic datasets across various regions. By harnessing these universal traffic patterns, we can significantly enhance the generalization ability and scalability of the proposed traffic prediction model, ultimately improving its real-world applicability.

Chapter 4 shifts focus to RQ 2, exploring an alternative application in the form of next POI recommendation. In this realm, micro-level information has been the primary focus of researchers, with macro-level graphs often overlooked.

We address this gap by proposing a trajectory flow map that is based on population movement trends, and demonstrate how it can be effectively leveraged in individual-level next POI recommendation tasks, showcasing its potential for enhancing the overall performance of the recommendation system.

In Chapter 5, we aim to tackle RQ 3 by proposing a new research problem called Continuous Spatio-Temporal Data Reconstruction. In this chapter, we propose a groundbreaking spatio-temporal representation learning framework that seamlessly combines macro-level and micro-level information to reconstruct the underlying continuous data distribution from discretely observed data points.

In the subsequent sections, we delve into the topics and present the results obtained in each chapter of the thesis, providing a comprehensive understanding of our research contributions.

Chapter 2. A Survey of Spatio-temporal Graph Representation Learning

Before delving into the research forefront, it is essential to gain an overview of the relevant research fields. In this context, our primary focus lies on Spatio-temporal Data Mining (STDM) and Graph Representation Learning (GRL).

Spatio-temporal data mining is a burgeoning research field that focuses on the extraction of valuable patterns, relationships, and knowledge from vast amounts of data that contain both spatial and temporal components. This interdisciplinary area of study brings together principles from diverse domains, such as geography, computer science, data mining, machine learning, and statistics, to address complex problems that arise in various real-world scenarios, including transportation, urban planning, environmental monitoring, and public health. The importance of spatio-temporal data mining has grown rapidly in recent years, largely due to the proliferation of location-aware devices, remote sensing technologies, and the increasing availability of geospatial data. By developing and applying advanced analytical techniques to spatio-temporal data, researchers aim to uncover hidden patterns and insights that can not only improve decision-making processes but also facilitate a deeper understanding of the intricate relationships between space and time.

Meanwhile, graph representation learning is an emerging research field that focuses on developing methods to learn compact, expressive, and informative embeddings of graph-structured data. As a subfield of machine learning, graph representation learning aims to encode the topological structure, node features, and relationships within graphs into continuous vector spaces, which can then be utilized for various downstream tasks such as node classification, link prediction, and community detection. This research area has gained significant momentum in recent years, primarily due to the prevalence of complex and interconnected data in diverse domains such as social networks, biological systems, transportation, and recommender systems. Graph representation learning methods, including Graph Neural Networks (GNNs), Graph Convolutional Networks (GCNs), and Graph Attention Networks (GATs), have demonstrated remarkable success in tackling these complex data structures by leveraging their inherent relational information. In Chapter 2, we will delve into the current state of the art in graph representation learning, review the key methodologies, and discuss the main challenges and future directions in this rapidly advancing field.

There is a strong connection between STDM and GRL, as many spatio-temporal data sets can be naturally modeled as graphs, where nodes represent spatial entities and edges capture the relationships or interactions between them over time. By leveraging GRL techniques, researchers can better analyze the complex and dynamic nature of spatio-temporal data, incorporating both the spatial and temporal dependencies that are vital for accurate analysis and prediction tasks. Furthermore, the integration of STDM and GRL can lead to the development of more robust, scalable, and generalizable models that can be applied across different geographic regions and time periods. In this regard, the synergy between STDM and GRL offers significant potential for advancing the state of the art in both fields and enabling more powerful data-driven solutions for a wide range of real-world problems.

In short, in Chapter 2, we explore the current state of the art in spatio-temporal data mining, including spatio-temporal data types, applications, methods, and recent advances in graph neural networks.

Chapter 3. Space Meets Time: Local Spacetime Neural Network For Traffic Flow Forecasting

In this chapter, we examine the traffic prediction problem and introduce a Local Space-Time Neural Network (STNN). This innovative approach utilizes spacetime convolution and attention mechanisms to learn the universal spatio-temporal correlations, effectively addressing RQ 1.

The growing influence of data-driven technologies in modern transportation systems has led to an increased focus on traffic flow forecasting. Accurate and timely predictions of traffic dynamics can significantly enhance transportation management, alleviate congestion, and improve overall efficiency. Traffic systems, characterized by changing flows in road networks, exhibit salient patterns influenced by various extrinsic factors and intrinsic principles. Accurate traffic flow predictions rely on a model’s ability to capture not only extrinsic features but also the intrinsic, universal patterns that govern traffic flow. Uncovering these patterns and understanding the latent correlations between a location’s current state and its surrounding locations’ past are crucial for developing effective traffic forecasting models.

Recent advancements in neural-based techniques, especially GNNs, have significantly improved traffic feature representation and prediction results compared to earlier statistical methods. However, existing GNN-based traffic forecasting models face three major challenges: (1) their heavy reliance on graph structure limits their applicability to specific road networks, preventing the discovery of intrinsic traffic system properties; (2) the computationally expensive feature aggregation operations, such as graph convolutions, impede scalability for large road networks with numerous sensors; and (3) the separate components for spatial and temporal feature extraction in these models assume uniform correlations between locations over time, which may not accurately reflect the dynamic nature of traffic systems.

In order to address the above challenges and answer RQ 1, we begin by defining several key concepts that will be useful throughout the thesis. One such concept is the traffic event, which integrates both the spatial and temporal aspects of a traffic measurement collected by a sensor station. This concept, which parallels the concept of events in physics [22], will prove essential in our efforts to solve the challenges at hand.

Definition 1 (Traffic Event). Given a traffic measurement s (e.g., speed) observed at sensor v_i and time t, a traffic event is a tuple consisting of the measurement, time, and location, namely, (s, t, v_i).

Accordingly, we can generalize this definition to encompass a broader range of scenarios, not solely limited to traffic prediction.

Definition 2 (Spacetime Event). Given any sensor measurement m observed at location v_i and time t, a spacetime event is a tuple consisting of the measurement, time, and location, namely, (m, t, v_i).
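Definitions 1 and 2 map directly onto a small record type. The sketch below is one possible encoding, assuming that a location v_i is identified by its integer index i and that raw readings arrive as per-sensor lists of (time, measurement) pairs; the class and function names are hypothetical.

```python
from typing import NamedTuple


class SpacetimeEvent(NamedTuple):
    """A spacetime event (m, t, v_i); a traffic event is the special case m = s."""
    measurement: float  # m, e.g. a speed reading s
    time: float         # t
    location: int       # index i of the sensor or location v_i


def to_events(readings):
    """Flatten {sensor index: [(time, measurement), ...]} into events ordered by time."""
    events = [SpacetimeEvent(m, t, v)
              for v, series in readings.items()
              for t, m in series]
    return sorted(events, key=lambda e: (e.time, e.location))


# Toy speed readings from two sensors (invented values).
raw_readings = {0: [(0.0, 60.0), (5.0, 55.0)], 1: [(0.0, 42.0)]}
events = to_events(raw_readings)
```

Treating every reading as a self-contained event, rather than as a cell in a sensor-by-time matrix, is what allows events from different sensors and different times to be related to one another directly.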

The central concept of this chapter, and of the entire thesis, is the spacetime interval. The spacetime interval between two traffic events signifies the degree to which one event impacts the other; a smaller interval indicates a stronger connection between the two traffic events. Within a local-spacetime context, we are primarily concerned with the intervals between traffic events at the target sensor and those at other sensors.

Definition 3 (Spacetime Interval). The spacetime interval is the quantified influence that one traffic event exerts on another traffic event with respect to the traffic measurement.

Figure 3.1 provides an illustration of the spacetime interval concept. The figure displays three instances of the network. The state of the target node at t₃ is significantly affected by the state of a at t₁ and the state of b at t₂. However, it is only mildly influenced by the state of a at t₂ and the state of b at t₁.

Figure 3.1: An illustration of spacetime interval. (Figure labels: time axis with t1, t2, t3; nodes a, b, and the target; impact scale from 0 to 1.)

Armed with these key concepts, we propose a novel spatio-temporal correlation learning paradigm called Spacetime Interval Learning. This approach fuses spatial and temporal dimensions into a single manifold, referred to as spacetime, and captures correlations as intervals between traffic events. The paradigm extracts traffic data from nearby sensors within a fixed time window, called the local-spacetime context, which allows the model to focus on relevant sensors. Our method correlates nodes at different times within the local-spacetime context, resulting in a model that is universal, independent of graph structure, and applicable to various traffic systems. By shifting the focus from network-level to node-level predictions, our approach facilitates parallel predictions for multiple locations and efficiently captures varying spatial correlations between locations over time.

The architecture of the resulting model, the spacetime neural network (STNN), is shown in Figure 3.2. The STNN comprises k spacetime modules (ST-Modules) and a fully-connected output layer. Each ST-Module contains a spacetime attention block (ST-Attn block) and a spacetime convolution block (ST-Conv block). The ST-Attn block utilizes a self-attention mechanism to emphasize the most influential traffic events. Within each ST-Conv block, three distinct convolution kernels are employed to aggregate the spatio-temporal correlations from various perspectives. Subsequently, the extracted features are stacked and condensed using a 1×1 convolution.
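To make the local-spacetime context concrete, the sketch below selects the traffic events recorded by sensors near a target sensor within a fixed time window. It is an illustration under stated assumptions, not the thesis implementation: the function name, the use of a distance radius to decide which sensors count as nearby, the integer time steps, and all example values are hypothetical.

```python
from typing import Dict, List, Tuple

Event = Tuple[float, int, str]  # (measurement, time step, sensor id)


def local_spacetime_context(
    events: List[Event],
    distance_to_target: Dict[str, float],
    t_now: int,
    radius: float,
    window: int,
) -> List[Event]:
    """Keep events from sensors within `radius` of the target whose time step
    lies in the `window` steps ending at `t_now` (inclusive).

    Sensors missing from `distance_to_target` are treated as infinitely far.
    """
    t_start = t_now - window + 1
    return [
        (m, t, v)
        for (m, t, v) in events
        if distance_to_target.get(v, float("inf")) <= radius
        and t_start <= t <= t_now
    ]


# Hypothetical readings from two nearby sensors and one distant sensor.
events = [
    (60.0, 1, "a"), (55.0, 2, "a"), (58.0, 3, "a"),
    (40.0, 2, "b"), (42.0, 3, "b"),
    (70.0, 3, "far"),
]
distances = {"a": 0.5, "b": 1.2, "far": 9.0}

context = local_spacetime_context(events, distances, t_now=3, radius=2.0, window=2)
```

Because the context is built per target sensor, the same routine can be run independently for every location, which is what makes node-level predictions straightforward to parallelize.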

Figure 3.2: The architecture of STNN with an example local-spacetime. (Figure labels: sub-spacetime network input; ST Modules, each containing an ST-Attn block and an ST-Conv block with residual connections; within the ST-Conv block, Conv branches are concatenated and condensed by a 1x1 Conv; a final FC layer produces the output.)
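The role of self-attention in the ST-Attn block, emphasizing the most influential traffic events, can be illustrated with generic scaled dot-product self-attention. The sketch below is not the ST-Attn block itself: it assumes identity query, key, and value projections (no learned weights), and the feature vectors are hypothetical stand-ins for the events in a local-spacetime context.

```python
import math
from typing import List

Vector = List[float]


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u: Vector, v: Vector) -> float:
    return sum(a * b for a, b in zip(u, v))


def self_attention(features: List[Vector]) -> List[Vector]:
    """Scaled dot-product self-attention with identity projections
    (an assumption; a trained model would learn these projections)."""
    scale = math.sqrt(len(features[0]))
    outputs = []
    for query in features:
        # Attention weights: how strongly each event influences this one.
        weights = softmax([dot(query, key) / scale for key in features])
        # Output: the weighted average of all event features.
        mixed = [
            sum(w * value[d] for w, value in zip(weights, features))
            for d in range(len(query))
        ]
        outputs.append(mixed)
    return outputs


# Hypothetical feature vectors, one per event in a local-spacetime context.
features = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
attended = self_attention(features)
```

Each output row is a convex combination of the input rows, so events with similar features reinforce one another while dissimilar events contribute less.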

Finally, the proposed STNN model was evaluated in several settings to demonstrate its robustness and, unlike existing state-of-the-art models, its ability to generalize to unseen traffic networks. In the first setting, STNN was trained and tested on the same network, where it surpassed the baselines in prediction accuracy. In the second setting, STNN