Spatio-temporal Graph Representation Learning
Song Yang
A thesis submitted in fulfilment of the requirements for the degree of Doctor of Philosophy in Computer Science,
The University of Auckland, Feb 2024.
Abstract
This thesis delves into Spatio-temporal Graph Representation Learning (STGRL), an emerging and specialized subset of Spatio-temporal Data Mining (STDM).
STGRL capitalizes on deep learning methodologies applied to the inherent graph structure of spatio-temporal data, thereby facilitating a deeper understanding and exploitation of spatio-temporal interdependence in various tasks. Our research primarily investigates STGRL, aiming to explore its transferability, generalization, scalability, construction of informative graph structures from raw data, and the sparse sampling issue within several concrete application scenarios. To achieve this, we undertake the following three tasks: (1) Discerning that spatial features, temporal features, and spatio-temporal correlations can be concurrently learned when constructing a spacetime manifold, we propose an innovative spacetime neural network for learning translation-invariant spatio-temporal patterns, serving as a universal traffic model. This approach not only tackles transferability and generalization challenges but also presents a viable solution to enhance scalability.
(2) We examine the graph construction process, illustrating how a more insightful and conceptual graph can facilitate next POI recommendations, as opposed to relying solely on less informative geographic graphs. This insight suggests that in numerous STDM tasks, the straightforward geographic graph structure may not be the optimal choice, and alternative, more beneficial graph structures could be employed. (3) Recognizing the often sparse and discrete nature of STDM and STGRL data, we introduce an innovative representation learning framework adept at inferring latent representations, thereby decoding target features for arbitrary timestamps from the local spatio-temporal context. We evaluate these models using multiple real-world datasets, encompassing traffic forecasting, next POI recommendations, weather pattern reconstruction, and air quality reconstruction. Experimental results substantiate the efficacy of these novel algorithms, demonstrating their superior performance compared to existing methods.
Acknowledgement
First and foremost, I would like to express my deepest gratitude to my main supervisor, Prof. Jiamou Liu, for his invaluable guidance, unwavering support, and patient mentorship throughout my PhD journey. His expertise, constructive feedback, and enthusiasm have been pivotal in shaping my research and academic growth. I also extend my heartfelt gratitude to my co-supervisor, Dr. Kaiqi Zhao, for his keen insights, encouragement, and invaluable advice, which have significantly contributed to the success of my study.
Secondly, I am very grateful to Professor Ehsan for providing me with the extraordinary opportunity to apply my expertise in AI to real-world commercial healthcare challenges, ultimately contributing to the betterment of society. His steadfast confidence in my abilities and commitment to fostering social improvement have served as unwavering sources of motivation throughout my research in the field of deep learning.
Also, I would like to extend my appreciation to my friends and colleagues at the University of Auckland. Your camaraderie, stimulating discussions, and shared experiences have enriched my time at the university and made it a memorable and enjoyable experience. I am truly fortunate to have been surrounded by such a supportive and inspiring academic community.
Lastly, I cannot thank my parents enough for their unconditional love, support, and encouragement in all my endeavors. Their unwavering faith in me and the sacrifices they have made to ensure my success have been my guiding light throughout my life. I am eternally grateful for everything they have done for me.
To everyone who has played a part in my academic journey, thank you from the bottom of my heart.
Contents
CONTENTS iv
LIST OF FIGURES vii
LIST OF TABLES x
1 INTRODUCTION 1
1.1 Background . . . 1
1.1.1 Spatio-temporal Data . . . 1
1.1.2 Graph . . . 3
1.1.3 Graph Structure in Spatio-temporal Data . . . 5
1.1.4 Deep Learning Methods . . . 6
1.1.5 Spatio-temporal Data Mining with Deep Learning . . . . 9
1.2 Challenges . . . 12
1.3 Research Questions . . . 15
1.4 Thesis Outline . . . 18
1.5 Contributions . . . 29
2 A SURVEY OF SPATIO-TEMPORAL DATA MINING 31
2.1 Introduction . . . 31
2.2 Spatio-temporal Data Types and Instances . . . 33
2.2.1 Data Types . . . 33
2.2.2 Data Instances . . . 35
2.2.3 Data Representations . . . 36
2.3 Spatio-temporal Data Mining Applications . . . 37
2.4 Preliminary of Deep Learning Methods in STDM . . . 40
2.4.1 Recurrent Neural Networks and Their Variants . . . 40
2.4.2 Attention and Transformer . . . 43
2.4.3 Convolutional Neural Networks . . . 46
2.4.4 Graph Neural Networks . . . 48
2.5 Advanced Spatio-temporal Data Mining Models . . . 52
2.5.1 Traffic Prediction Models . . . 53
2.5.2 Next POI Recommendation Models . . . 60
2.6 Conclusion . . . 63
3 SPACE MEETS TIME: LOCAL SPACETIME NEURAL NETWORK FOR TRAFFIC FLOW FORECASTING 65
3.1 Introduction . . . 65
3.2 Related Works . . . 68
3.3 Problem Formulation . . . 70
3.4 Spacetime Interval Learning . . . 71
3.5 Spacetime Neural Network . . . 75
3.5.1 Spacetime Attention Block . . . 76
3.5.2 Spacetime Convolution Block. . . 77
3.6 Experiments . . . 78
3.6.1 Experimental Setup . . . 78
3.6.2 Evaluation on Real-World Data . . . 81
3.6.3 Evaluation on Simulated Dynamic Network. . . 83
3.6.4 Case Study . . . 84
3.6.5 Ablation Study . . . 86
3.6.6 Complexity Analysis and Scalability . . . 86
3.7 Conclusion . . . 87
4 GETNEXT: TRAJECTORY FLOW MAP ENHANCED TRANSFORMER FOR NEXT POI RECOMMENDATION 89
4.1 Introduction . . . 89
4.2 Related Works . . . 92
4.2.1 Next POI Recommendation . . . 92
4.2.2 Graphs in Location-based Recommendation . . . 93
4.3 Problem Formulation . . . 94
4.4 Our Approach: GETNext . . . 95
4.4.1 Model Structure Overview . . . 95
4.4.2 Learning with Trajectory Flow Map . . . 96
4.4.3 Contextual Embedding Module . . . 98
4.4.4 Transformer Encoder and MLP Decoders . . . 101
4.5 Experiments . . . 103
4.5.1 Experimental Setup . . . 104
4.5.2 Results . . . 106
4.5.3 Inspecting the Trajectory Flow Map . . . 108
4.5.4 Ablation Study . . . 111
4.6 Conclusion . . . 112
5 REPRESENTATION LEARNING FOR CONTINUOUS SPATIO-TEMPORAL DATA RECONSTRUCTION 113
5.1 Introduction . . . 113
5.2 Literature Review . . . 116
5.2.1 Spatio-temporal Data Analysis. . . 116
5.2.2 Imputation . . . 117
5.3 Problem Formulation . . . 118
5.4 Our Approach . . . 119
5.4.1 Model Structure Overview . . . 119
5.4.2 Spacetime Encoder . . . 120
5.4.3 Representation Construction . . . 124
5.4.4 Representation Decoder . . . 125
5.4.5 Proposed Training Algorithm . . . 126
5.5 Experiments . . . 127
5.5.1 Experimental Setup . . . 128
5.5.2 Results . . . 130
5.5.3 Flexible Granularity . . . 132
5.6 Conclusion . . . 134
6 CONCLUSION AND OUTLOOK 137
BIBLIOGRAPHY 141
List of Figures
2.1 A diagram illustrating the various categories of ST data instances that can be derived from ST data types. Furthermore, the visual highlights the possible representations of each data instance and the popular DL methods employed. . . . 35
2.2 An illustration of dilated TCN from [142]. . . . 48
2.3 The architecture of DCRNN [79]. The historical time series are fed into an encoder whose final states are used to initialize the decoder. The decoder makes predictions based on either previous ground truth or the model output. . . . 54
2.4 The architecture of STGCN [152]. The STGCN is composed of two ST-Conv blocks and a fully-connected output layer at the end. Each ST-Conv block encompasses two temporal gated convolution layers and a single spatial graph convolution layer situated in the middle. . . . 55
2.5 The architecture of Graph WaveNet [142]. Several spatio-temporal layers are successively arranged on the left, with an output layer situated on the right. A pair of gated temporal convolution modules (Gated TCN) operate sequentially to extract temporal characteristics, succeeded by a graph convolutional layer (GCN) devised to discern spatial features. . . . 57
2.6 The architecture of ASTGNN [54]. Successive temporal trend-aware self-attention blocks and spatial dynamic GCN blocks are alternatively combined in both the encoder and decoder. . . . 59
2.7 The architecture of LSTPM [116] for Next POI Recommendation . . . 61
2.8 The architecture of STAN [86] . . . 62
3.1 An illustration of spacetime interval. The figure shows three snapshots of the network. The target node’s state at t3 is strongly influenced by the state of a at t1 and the state of b at t2, but only weakly influenced by the state of a at t2 and the state of b at t1. . . . 66
3.2 The architecture of STNN with an example local-spacetime constructed from the input data. STNN consists of k spacetime modules (ST-Modules) and a fully-connected output layer. Each ST-Module contains a spacetime attention block (ST-Attn block) and a spacetime convolution block (ST-Conv block). The ST-Attn block uses self-attention mechanism to spotlight the most contributive traffic events. In each ST-Conv block, three different convolution kernels are employed to aggregate the spatio-temporal correlations in different perspectives. Then, the extracted features are stacked, and condensed by the 1×1 convolution. . . . 75
3.3 Spatial kernel, temporal kernel and spacetime kernel on the sub-spacetime . . . 77
3.4 Simulated road network illustration . . . 79
3.5 Case study . . . 85
4.1 Two trajectories (red and blue) of two users in different days share the same fragment (restaurant to theater) in Manhattan, NYC . . . 90
4.2 An overview of the proposed GETNext model . . . 95
4.3 Average hourly check-in frequency of two POI categories (“train station” and “bar”) in NYC dataset . . . 100
4.4 Partial trajectory flow map of NYC (directions and edge weights are removed for better visualization) . . . 111
5.1 Difference between imputation and continuous reconstruction. Imputation aims to replace the missing value (shaded circle) in the given dataset with a rigid data shape. Continuous reconstruction tries to learn the underlying continuous temporal pattern and be able to infer data for any timestamp (arrow). . . . 114
5.2 Model architecture overview. The proposed STRL model consists of three parts: an encoder, representation construction, and a decoder. The encoder captures spatiotemporal correlations and learns embeddings for spacetime events. The representation of a given timestamp is then constructed from its neighboring time steps. Finally, the representation is fed into the decoder, which projects it back into the feature space. . . . 119
5.3 Continuous reconstruction of PM2.5 readings for sensor 0 in AQI-Seoul dataset from time steps 10350 to 10365. . . . 133
5.4 Continuous reconstruction of traffic condition for sensor 0 in PeMS-Bay dataset from time steps 1400 to 1445. . . . 133
List of Tables
1.1 Common deep learning techniques for spatio-temporal data mining with regard to traffic forecasting . . . 11
3.1 Existing methods fall into the same paradigm . . . 70
3.2 Table of Notations . . . 70
3.3 Statistics of datasets . . . 78
3.4 Overall performance of short-term (15 mins), mid-term (30 mins) and long-term (60 mins) traffic forecasting. . . 82
3.5 Performance of traffic volume prediction on the simulated dynamic network. We highlight sensor 7,8,9 as they are most affected by the changing network topology. . . 84
3.6 Ablation study . . . 86
4.1 Statistics of dataset . . . 104
4.2 Performance comparison in Acc@k and MRR on three datasets . . . . 107
4.3 Cold Start (due to inactive users) performance on NYC . . . 109
4.4 Cold Start (due to short trajectory) performance on NYC . . . 110
4.5 Performance of proposed model without trajectory flow map on NYC . . . 110
4.6 Ablation study: Comparing the full model with 6 variants . . . 112
5.1 Statistics of datasets . . . 129
5.2 Performance comparison under different number of missing steps on three datasets . . . 131
Chapter 1
Introduction
1.1 Background
1.1.1 Spatio-temporal Data
Spatio-temporal data, a multidimensional dataset characterized by its spatial and temporal attributes, has become a critical component in a wide range of scientific disciplines and practical applications [34, 46, 110]. These datasets consist of observations or measurements recorded at specific spatial locations and time instances, providing a comprehensive representation of phenomena unfolding in both space and time [3]. The fusion of spatial and temporal dimensions in such data enables researchers and practitioners to explore and understand complex patterns, relationships, and dynamics that may not be discernible when either the spatial or temporal aspect is examined in isolation [154]. As a result, the study and analysis of spatio-temporal data have rapidly evolved, expanding the horizons of knowledge in fields as diverse as geography [45], environmental science [122, 75], climate science [73, 44], epidemiology [17, 52], urban planning [10], transportation [23, 90], and social sciences [119, 21], to name just a few.
The concept of spatio-temporal data can be traced back to the earliest cartographic endeavors and timekeeping practices, which sought to record and represent various phenomena on the Earth’s surface and their temporal variations [101]. The development of advanced data collection and storage technologies, such as remote sensing, global positioning systems (GPS), and the Internet of Things (IoT), has led to an exponential growth in the volume and complexity of spatio-temporal data [1]. This data deluge has created new opportunities and challenges for researchers, necessitating the development of novel computational methods and tools to efficiently analyze this unique type of data and capture the interdependence between spatial and temporal dimensions.
A concrete example of spatio-temporal data in action can be observed in the context of traffic dynamics, where road-side sensors play a crucial role in monitoring and managing transportation systems. There are different types of road-side traffic sensors, including inductive loop detectors, radar sensors, infrared sensors, and video cameras [13]. These sensors, installed along roads or highways, continuously record measurements such as vehicle speed and volume together with their timestamps, generating a rich spatio-temporal dataset that captures the intricate patterns and interactions occurring within the transportation network [8]. By analyzing this spatio-temporal data, researchers can identify recurring congestion hotspots, evaluate the impact of infrastructure changes or traffic management strategies, and forecast future traffic conditions based on historical trends and real-time updates [79, 152, 54].
Predicting future traffic conditions is of paramount importance in managing urban transportation systems and enhancing their efficiency, safety, and sustainability. Accurate traffic forecasts enable transportation planners and engineers to proactively identify and mitigate potential congestion points, optimize traffic signal timings, and enhance the overall performance of the transportation network [130]. Furthermore, real-time traffic predictions provide valuable information for commuters, allowing them to make informed decisions on route planning and travel mode choices, thereby reducing travel time and fuel consumption [30]. Additionally, reliable traffic forecasts facilitate the development of innovative transportation solutions, such as smart traffic management systems and autonomous vehicles, which rely on accurate predictions of traffic conditions to optimize their performance. In other words, the utilization of historical spatio-temporal sensor data for traffic prediction plays a crucial role in promoting sustainable urban mobility and improving the quality of life in modern cities.
Another example is in the realm of climate science, where spatio-temporal data plays a pivotal role in enhancing our understanding of the Earth’s atmospheric processes and the underlying factors influencing global climate change. Weather sensors, strategically distributed across the globe, continuously record a multitude of weather conditions, such as temperature, humidity, wind, and precipitation, generating a complex spatio-temporal dataset that encapsulates the dynamic nature of the Earth’s climate [73]. By harnessing the power of this data, researchers can analyze historical climate patterns, discern trends, and develop predictive models to project future climate scenarios based on various environmental and anthropogenic factors [51]. This holds significant importance in various aspects of human life and environmental management. Accurate and timely weather forecasts facilitate informed decision-making across a multitude of sectors, including agriculture, aviation, energy management, and emergency response planning [89]. For instance, farmers can optimize crop yields and reduce the risk of crop failure by adjusting their planting, irrigation, and harvesting schedules based on anticipated weather patterns [60]. In aviation, accurate weather predictions can help minimize risks associated with adverse weather conditions, ensuring safer air travel and reducing operational costs [137]. Furthermore, energy utilities can use weather forecasts to better match energy supply with demand, improving the efficiency of power generation and distribution systems [63]. Overall, leveraging historical spatio-temporal data for weather prediction plays a crucial role in enhancing societal resilience and promoting sustainable development across diverse sectors.
1.1.2 Graph
Graph data, a fundamental data structure in computer science, has been extensively utilized to represent, model, and analyze a wide range of complex systems across various scientific disciplines and practical applications. At the core of graph data are two primary components: nodes and edges. Nodes, also referred to as vertices, represent discrete entities or objects in the system, while edges, sometimes called links or connections, signify relationships, interactions, or dependencies between these entities. This subsection aims to provide a brief introduction to graph data, focusing on the structural characteristics, properties, and potential applications of nodes and edges in the context of diverse research fields.
Nodes serve as the building blocks of a graph, embodying the individual components that constitute the system under study. In different contexts, nodes can represent a broad array of entities, such as individuals in a social network, web pages in the World Wide Web, genes in a biological network, or traffic sensors in a transportation network [93]. The attributes or properties of nodes, often referred to as node features, can encapsulate various quantitative or qualitative characteristics of the entities they represent, such as demographic information, geographical coordinates, or functional annotations. The study of nodes in a graph can reveal crucial insights into the system’s structure, organization, and dynamics, enabling researchers to identify central or influential nodes, detect communities or clusters, and uncover patterns of connectivity and hierarchy [15].
Edges, on the other hand, represent the connections or relationships between nodes, capturing the interactions or dependencies that drive the behavior and evolution of the system. Edges can be directed or undirected, depending on the nature of the relationships they represent. Directed edges have an inherent directionality, indicating a one-way relationship between two nodes. Undirected edges, conversely, imply a mutual or bidirectional relationship. Edges may also be weighted or unweighted, with weights encoding the strength, intensity, or cost associated with the connections, such as distances between locations, capacities of transportation links, or similarity scores between entities [16]. The analysis of edge properties in a graph can lead to a deeper understanding of the system’s connectivity patterns, robustness, and dynamics, allowing researchers to examine the role of edge weights in the emergence of global and local structures, study the distribution and correlation of edge weights, and explore the effects of edge addition or removal on the system’s stability and resilience.
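As a minimal sketch of the distinction between directed and undirected, weighted edges, the following Python snippet stores a small graph as a weighted adjacency matrix. The function name `build_adjacency`, the four-node example, and the distance weights are illustrative assumptions and are not taken from this thesis.

```python
def build_adjacency(num_nodes, edges, directed=False):
    """Build a dense weighted adjacency matrix from (source, target, weight) triples."""
    adj = [[0.0] * num_nodes for _ in range(num_nodes)]
    for src, dst, weight in edges:
        adj[src][dst] = weight
        if not directed:
            # An undirected edge is stored symmetrically.
            adj[dst][src] = weight
    return adj


# Hypothetical example: four locations, weights are distances in km.
edges = [(0, 1, 1.2), (1, 2, 0.8), (2, 3, 2.5)]
undirected = build_adjacency(4, edges)
directed = build_adjacency(4, edges, directed=True)

print(undirected[1][0])  # weight of the mirrored edge 1 -> 0
print(directed[1][0])    # entry for 1 -> 0 in the directed graph
```

A dense matrix is chosen here only for readability; for large sparse graphs an edge list or a sparse matrix would usually be preferable.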
Graph data’s inherent flexibility in representing complex systems with diverse characteristics has led to its widespread adoption across numerous research domains. For instance, in social network analysis, nodes can represent individuals, organizations, or groups, while edges signify various types of social relationships, such as friendships, collaborations, or communication links [133]. In biological and ecological networks, nodes can stand for genes, proteins, or species, while edges depict functional interactions, regulatory relationships, or predator-prey dynamics [97]. In transportation and logistics, nodes can correspond to traffic sensors or facilities, such as airports, bus stops, or warehouses, and edges can represent transportation links or routes, with associated attributes such as distances, travel times, or capacities [25]. Graph-based methods can be applied to optimize routing, scheduling, and resource allocation, ultimately enhancing the efficiency and sustainability of transportation systems.
In recent years, graph data has been increasingly employed in the context of data mining, machine learning, and artificial intelligence, leading to the development of novel techniques and frameworks for graph-based learning and pattern recognition, such as graph neural networks, graph embedding, and graph-based clustering algorithms [166, 140, 173]. These advancements have further broadened the scope and applicability of graph data, enabling researchers and practitioners to address complex, large-scale, and dynamic problems with high accuracy and efficiency.
1.1.3 Graph Structure in Spatio-temporal Data
As discussed, the graph data structure, characterized by its versatile representation of nodes and edges, plays a pivotal role in modern data-driven research and applications, including spatio-temporal data mining.¹
Spatio-temporal data and graph data are intrinsically connected in various ways, particularly when representing and analyzing complex systems with spatial and temporal dimensions. Graphs can efficiently represent the spatial attributes of spatio-temporal data, while also capturing the relationships and interactions between spatial entities. This enables researchers to exploit the rich body of graph theory and network analysis techniques to investigate the structure, dynamics, and properties of the underlying spatio-temporal system. For instance, in transportation networks, nodes can represent sensors or transit stops, while edges signify transit connections, with the associated spatio-temporal attributes such as travel times or traffic volumes being attached to the edges [152]. In such cases, graph-based methods can be employed to identify the spatial relationships and even aggregate the spatio-temporal correlations. Similarly, in social networks or epidemiological studies, graph data can be used to model the spatial distribution of individuals, their connections, and the spatio-temporal dynamics of information or disease spread. The synthesis of spatio-temporal data with graph representation thus offers a powerful and versatile framework for analyzing and understanding a wide range of complex systems.
Furthermore, the adoption of graph representation in spatio-temporal data has opened up new avenues for leveraging state-of-the-art deep learning techniques, specifically graph neural networks (GNNs), to effectively capture and model complex spatial features embedded in the data. GNNs, a class of deep learning methods designed to operate directly on graphs, have emerged as powerful tools for learning meaningful representations and patterns in graph-structured data [173, 140]. By representing the spatial aspect of spatio-temporal data as graphs, researchers can leverage the capabilities of GNNs to automatically learn spatial features and dependencies, thereby enabling the development of more accurate and efficient predictive models for various spatio-temporal tasks, such as forecasting, anomaly detection, and pattern recognition [79]. The ability of GNNs to model non-Euclidean data structures, coupled with their capacity to exploit both local and global information in the graph, offers a significant advantage in capturing the intricacies of spatial relationships and dynamics inherent in spatio-temporal data. Consequently, the synergy between graph representation and graph neural networks is driving advancements in the analysis and understanding of spatio-temporal phenomena across diverse disciplines and applications.

¹ We use spatio-temporal data mining and spatio-temporal data analysis interchangeably in this thesis, despite minor distinctions between the two.
1.1.4 Deep Learning Methods
The predominant methodologies used in spatio-temporal data mining have been transformed by the rise of deep learning. In the past, traditional statistical and machine learning approaches, such as linear regression, time series models, vector autoregression, autoregressive integrated moving average, support vector machines, and clustering techniques, were primarily employed for the analysis of spatio-temporal data [34]. However, with the advent of deep learning and its impressive performance in various domains, such as computer vision, natural language processing, and speech recognition, researchers have increasingly shifted their focus towards adopting and adapting deep learning methods for spatio-temporal data analysis [3, 131, 55, 1].
Deep learning, a subfield of artificial intelligence (AI) and machine learning, has emerged as one of the most significant advancements in recent years. Deep learning is a collection of algorithms and techniques that employ deep artificial neural networks (ANNs) with multiple hidden layers to learn hierarchical representations of data. It has its roots in the perceptron, a simple linear classifier proposed by Frank Rosenblatt in the late 1950s [102]. In the 1980s, researchers developed the backpropagation algorithm, which enabled the efficient training of multi-layered neural networks [104]. However, early efforts in training deep neural networks were plagued by several challenges, such as the vanishing gradient problem, overfitting, and, most importantly, the limited computational resources available. It was not until 2012 that AlexNet stunned the computer vision community by achieving a significant improvement in the ImageNet classification challenge, far surpassing previous traditional computer vision methods. Since then, deep learning has rapidly evolved, leading to the development of more sophisticated architectures, such as convolutional neural networks, recurrent neural networks, long short-term memory networks, and transformers. These architectures have been successfully applied to various tasks, achieving state-of-the-art performance in many research fields.
One of the primary factors that have contributed to the rapid adoption of deep learning in spatio-temporal data mining is its exceptional representation power and ability to automatically learn complex features from raw data. Unlike traditional methods, which often rely on manual feature engineering and pre-defined assumptions about data distributions, deep learning models have the remarkable ability to learn intricate, high-level features and representations directly from raw data [78]. This process of automatic feature learning enables deep learning models to capture complex patterns and relationships in the data that may be difficult or even impossible for humans to engineer manually. As a result, deep learning models are often more flexible and adaptive to a wide range of problems, exhibiting superior performance in many tasks. Furthermore, deep learning models are capable of handling large-scale, high-dimensional, and noisy datasets, which are prevalent in many real-world applications. Consequently, the representation power of deep learning models has revolutionized the way we analyze data, leading to more accurate and robust models that can uncover previously hidden insights and patterns in complicated spatio-temporal data.
Recurrent neural networks (RNNs) and their variants, such as long short-term memory (LSTM) networks and gated recurrent units (GRUs), have emerged as one of the most widely used deep learning architectures in the analysis of spatio-temporal data due to their ability to model sequential data and capture long-term dependencies [62, 32]. By modeling the temporal aspect of spatio-temporal data, RNNs can effectively capture dynamic patterns and trends, which is particularly useful in forecasting and prediction tasks, such as weather forecasting, traffic flow prediction, and disease spread modeling [170].
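To make the idea of carrying information across time steps concrete, the sketch below unrolls a vanilla RNN cell, h_t = tanh(W_xh x_t + W_hh h_{t-1} + b), over a short sequence. It is the plain recurrent cell, not an LSTM or GRU, and the function names, weights, and input readings are hypothetical values chosen only for illustration.

```python
import math


def rnn_step(x, h_prev, w_xh, w_hh, bias):
    """One vanilla RNN update: h_t = tanh(W_xh x_t + W_hh h_{t-1} + b)."""
    h_new = []
    for i in range(len(bias)):
        total = bias[i]
        total += sum(w_xh[i][j] * x[j] for j in range(len(x)))
        total += sum(w_hh[i][j] * h_prev[j] for j in range(len(h_prev)))
        h_new.append(math.tanh(total))
    return h_new


def run_rnn(sequence, w_xh, w_hh, bias):
    """Unroll the cell over a sequence, starting from a zero hidden state."""
    h = [0.0] * len(bias)
    states = []
    for x in sequence:
        h = rnn_step(x, h, w_xh, w_hh, bias)
        states.append(h)
    return states


# Hypothetical input: three time steps of (speed, volume) readings rescaled to [0, 1].
sequence = [[0.9, 0.2], [0.7, 0.4], [0.3, 0.8]]
w_xh = [[0.5, -0.3], [0.1, 0.4]]  # input-to-hidden weights (2 hidden units)
w_hh = [[0.2, 0.0], [0.0, 0.2]]   # hidden-to-hidden weights
bias = [0.0, 0.0]

states = run_rnn(sequence, w_xh, w_hh, bias)
print(states[-1])  # final hidden state summarizing the whole sequence
```

The hidden state at each step depends on all earlier inputs through the recurrent weights, which is the property that forecasting models built on RNNs rely on.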
Convolutional neural networks (CNNs), among the most widely used deep learning architectures for the analysis of image-based data, have also gained popularity in spatio-temporal data mining. In some cases, the spatial features can be transformed into an image-like format to utilize CNNs for analysis. This is particularly useful when the spatial features in the data exhibit complex patterns that may benefit from the spatial processing capabilities of CNNs. Consider, for instance, a dataset comprising weather patterns across a certain timeframe, encompassing both spatial information (i.e., latitude and longitude) and temporal information (such as time of day, month, or year). One can transform this data into a grid-like arrangement, where each cell corresponds to a distinct region and encapsulates pertinent weather-related parameters (such as temperature, humidity, and wind speed). By conceptualizing these grids as images, with each cell acting as a pixel and every weather parameter serving as a channel, one can process them with standard image-based techniques [112]. Furthermore, the introduction of 3D convolutions and recurrent neural network (RNN) layers in CNN architectures has enabled the incorporation of temporal information, allowing for the joint analysis of spatial and temporal features in spatio-temporal data [66, 148, 147].
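The gridding step described above can be sketched as follows for a single time step. The function `to_grid`, the coordinate ranges, and the readings are hypothetical, and filling empty cells with 0.0 is an assumption of this sketch rather than a recommendation.

```python
def to_grid(observations, lat_min, lon_min, cell_size, n_rows, n_cols, n_channels):
    """Average point observations into a (channel, row, col) image-like grid."""
    sums = [[[0.0] * n_cols for _ in range(n_rows)] for _ in range(n_channels)]
    counts = [[0] * n_cols for _ in range(n_rows)]
    for lat, lon, values in observations:
        row = int((lat - lat_min) // cell_size)
        col = int((lon - lon_min) // cell_size)
        # Observations outside the grid extent are ignored.
        if 0 <= row < n_rows and 0 <= col < n_cols:
            counts[row][col] += 1
            for c in range(n_channels):
                sums[c][row][col] += values[c]
    # Cells without observations keep the fill value 0.0 (an assumption of this sketch).
    return [
        [
            [sums[c][r][k] / counts[r][k] if counts[r][k] else 0.0 for k in range(n_cols)]
            for r in range(n_rows)
        ]
        for c in range(n_channels)
    ]


# Hypothetical readings: (latitude, longitude, (temperature in Celsius, relative humidity)).
observations = [
    (0.2, 0.7, (20.0, 0.5)),
    (0.8, 0.1, (22.0, 0.7)),
    (1.5, 1.5, (15.0, 0.9)),
]
grid = to_grid(observations, lat_min=0.0, lon_min=0.0, cell_size=1.0,
               n_rows=2, n_cols=2, n_channels=2)
print(grid[0])  # temperature channel as a 2 x 2 "image"
```

Stacking one such grid per time step would give a video-like array of shape (time, channel, row, col), which is the kind of input that 3D convolutions or CNN-RNN hybrids operate on.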
Graph neural networks (GNNs), discussed earlier, are another major development in spatio-temporal data mining driven by deep learning. GNNs offer a powerful way to model and analyze graph-structured data, such as spatial networks, and can be particularly useful in capturing complex spatial dependencies and relationships in spatio-temporal data [166]. By combining GNNs with other deep learning architectures, such as CNNs or RNNs, researchers can develop advanced hybrid models that can effectively analyze both spatial and temporal dimensions of spatio-temporal data [152, 37, 165, 141].
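As a rough illustration of how a GNN layer uses graph structure, the sketch below implements one simplified graph convolution: each node averages its own features with those of its neighbors, applies a linear map, and passes the result through a ReLU. This mean-aggregation variant is a simplification chosen for readability and is not the specific formulation of any model cited in this thesis; the function name `graph_conv`, the three-node graph, and the weights are hypothetical.

```python
def graph_conv(adj, features, weight):
    """One mean-aggregation graph convolution layer with self-loops and ReLU."""
    n = len(adj)
    n_in = len(features[0])
    n_out = len(weight[0])
    out = []
    for i in range(n):
        # Neighborhood of node i, including the node itself (self-loop).
        neighbors = [j for j in range(n) if adj[i][j] != 0 or j == i]
        # Mean of the neighborhood features.
        agg = [sum(features[j][k] for j in neighbors) / len(neighbors)
               for k in range(n_in)]
        # Linear transform followed by ReLU.
        out.append([max(0.0, sum(agg[k] * weight[k][m] for k in range(n_in)))
                    for m in range(n_out)])
    return out


# Hypothetical path graph 0 - 1 - 2 with one feature per node.
adj = [[0, 1, 0],
       [1, 0, 1],
       [0, 1, 0]]
features = [[1.0], [2.0], [3.0]]
weight = [[1.0]]  # identity map, so the output shows the pure aggregation

hidden = graph_conv(adj, features, weight)
print(hidden)
```

Stacking several such layers lets information propagate beyond immediate neighbors, and in hybrid models the per-node outputs at each time step would then be passed to a temporal component such as an RNN or a temporal convolution.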
The attention mechanism (Attns), has evolved into the preeminent instrument within the realm of natural language processing, concurrently manifesting as a formidable apparatus in a multitude of other deep learning domains, encom- passing computer vision and spatio-temporal data mining [127]. The attention mechanism enable models to focus on specific spatial and temporal features that are most relevant for the task at hand. In the context of spatio-temporal data mining, attention mechanisms can be incorporated into various deep learning architectures, such as convolutional neural networks, recurrent neural networks, and graph neural networks, to enhance their performance in capturing and pro- cessing the complex interactions and dependencies among spatial and temporal components of the data [171]. By assigning different weights to spatial and tempo- ral features, attention-based models can adaptively focus on the most informative parts of the input data, leading to improved interpretability and predictive accu- racy in tasks such as traffic forecasting, climate modeling, and event detection.
Furthermore, attention mechanisms can help mitigate the challenges posed by large-scale, high-dimensional, and noisy spatio-temporal datasets by allowing the models to selectively attend to the most relevant information while filtering out noise and irrelevant details. As a result, attention-based deep learning models have become increasingly popular in spatio-temporal data mining, opening up new possibilities for more accurate and robust analyses in a wide range of applications.
In summary, the rapid growth of deep learning has revolutionized the research area of spatio-temporal data mining by introducing powerful and flexible modeling techniques, such as CNNs, RNNs, GNNs, and attention mechanisms, which
can automatically learn complex spatial and temporal patterns from raw data.
The integration of deep learning methods in spatio-temporal data mining has led to significant advancements in various domains, including environmental science, transportation, urban planning, and epidemiology, among others. Furthermore, the development of interpretability techniques, along with efficient and scalable computing solutions, has enabled researchers to better understand and harness the potential of deep learning models for spatio-temporal data mining.
1.1.5 Spatio-temporal Data Mining with Deep Learning
Spatio-temporal data mining is the process of extracting valuable information, patterns, and relationships from spatio-temporal data [131]. The rapid increase in spatio-temporal data generated by diverse sources, including GPS, remote sensing, and social media, has fueled the growth of this research area. A key aspect of spatio-temporal data mining is the identification and quantification of spatial and temporal dependencies, which arise due to the interconnectedness and interdependence of the processes governing the observed phenomena. These dependencies can manifest in various forms, such as spatial autocorrelation, where the values of a variable at nearby locations are more similar than those at distant locations, or temporal autocorrelation, where the values of a variable at a given location are more similar across adjacent time periods than across distant time periods. Understanding and modeling these dependencies is crucial for making accurate predictions and inferences about the underlying processes and their future behavior.
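To make the two kinds of dependency concrete, the following minimal Python sketch computes a lag-k temporal autocorrelation and Moran's I, a standard statistic for spatial autocorrelation. It is an illustration only: the function names, the toy values, and the binary chain-shaped weight matrix are assumptions introduced here, not part of the methods developed in this thesis.

```python
from statistics import fmean


def temporal_autocorrelation(series, lag=1):
    """Lag-`lag` autocorrelation of a 1-D series (biased estimator)."""
    mean = fmean(series)
    denom = sum((x - mean) ** 2 for x in series)
    num = sum((series[t] - mean) * (series[t + lag] - mean)
              for t in range(len(series) - lag))
    return num / denom


def morans_i(values, weights):
    """Moran's I for values at n locations, given an n x n spatial weight matrix."""
    n = len(values)
    mean = fmean(values)
    dev = [v - mean for v in values]
    w_sum = sum(sum(row) for row in weights)
    num = sum(weights[i][j] * dev[i] * dev[j]
              for i in range(n) for j in range(n))
    denom = sum(d * d for d in dev)
    return (n / w_sum) * (num / denom)


# Hypothetical example: four locations on a line, each linked to its neighbours.
chain = [[0, 1, 0, 0],
         [1, 0, 1, 0],
         [0, 1, 0, 1],
         [0, 0, 1, 0]]
smooth = morans_i([1, 2, 3, 4], chain)          # neighbours are similar
alternating = morans_i([1, -1, 1, -1], chain)   # neighbours are dissimilar
print(smooth > 0, alternating < 0)
```

A positive Moran's I indicates that neighbouring locations carry similar values, while a negative value indicates that they tend to differ; the temporal statistic reads the same way along the time axis.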
We have discussed several deep learning methods that are widely used in spatio-temporal data mining. It is important to emphasize, however, that these approaches are usually combined rather than used in isolation, in order to address the multiple dimensions of spatio-temporal data. This characteristic distinguishes spatio-temporal data from other data types.
The heterogeneity of spatial and temporal data poses significant challenges in developing a single unified model for spatio-temporal data mining. This complexity arises from the distinct characteristics and relationships that exist within and across the spatial and temporal dimensions, necessitating the use of specialized models and techniques to effectively capture these aspects. Consequently, most studies in this domain employ a combination of deep learning techniques, such as CNNs, GNNs, RNNs, and Transformers, to capture the spatial and temporal features independently and subsequently integrate them for holistic analysis [1].
Spatial feature extraction. In terms of spatial feature extraction, GNNs and CNNs have demonstrated their ability to learn intricate spatial patterns and structures from input data by automatically learning spatially invariant feature representations. These techniques excel at capturing local and global spatial dependencies in data, such as the neighborhood interactions, hierarchical structures, and spatial correlations, which are crucial for understanding the underlying spatial phenomena. Specifically, Graph Convolutional Networks (GCNs) [71], Graph Attention Networks (GATs) [128], and GraphSAGE [56] are frequently employed to learn representations of the spatial relationships encoded in graph structures. These graph-based methods have gained popularity due to their powerful ability to model intricate spatial relationships and interactions present in various real-world datasets. Conversely, CNNs often require the transformation of spatial data into grid image data, a representation that might not always be the most suitable choice for certain types of spatial information.
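As an illustration of how a graph convolution aggregates neighborhood information, the sketch below implements a single layer in the commonly used symmetric-normalized form, ReLU(D^{-1/2} (A + I) D^{-1/2} H W), in plain Python. The helper names and the two-node toy graph are assumptions made for this example; practical models rely on a deep learning framework and learned weights.

```python
import math


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def gcn_layer(adj, feats, weight):
    """One graph convolution: add self-loops, normalize, aggregate, apply ReLU."""
    n = len(adj)
    # A_hat = A + I, so that each node also keeps its own features.
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
             for i in range(n)]
    deg = [sum(row) for row in a_hat]
    # Symmetric normalization D^{-1/2} A_hat D^{-1/2}.
    norm = [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]
    out = matmul(matmul(norm, feats), weight)
    return [[max(0.0, x) for x in row] for row in out]


# Hypothetical toy graph: two connected sensors with one feature each.
adj = [[0, 1], [1, 0]]
feats = [[1.0], [3.0]]
weight = [[1.0]]  # a 1 x 1 identity "weight" for illustration
print(gcn_layer(adj, feats, weight))
```

With these toy inputs each node ends up with the average of its own and its neighbor's feature, which is the smoothing behavior that lets graph convolutions capture local spatial dependencies.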
Temporal feature extraction. On the other hand, temporal feature extraction often leverages RNNs and their variants, such as LSTM and GRU networks, which are specifically designed to model and capture temporal dependencies and dynamics in sequential data. By maintaining an internal memory state, these models can effectively learn long-range temporal patterns and dependencies, providing a powerful tool for capturing the temporal aspect of spatio-temporal data. Besides RNN-based models, Temporal Convolutional Networks (TCNs) [7] are another powerful and efficient approach for time series data mining, designed specifically to capture and model temporal dependencies within sequential data. Unlike traditional recurrent architectures, such as LSTMs and GRUs, TCNs utilize a series of dilated convolutional layers to model temporal relationships, enabling the efficient learning of long-range dependencies without the need for recurrent connections. Many TCN variants used in spatio-temporal models also incorporate gated activation functions, which help regulate the flow of information through the network and facilitate the learning of complex temporal patterns. Lastly, some studies exploit attention-based methods to learn the temporal correlations from the entire sequence [171, 54].
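The dilated, gated temporal convolution described above can be sketched in a few lines of plain Python. This is a simplified, single-channel illustration: the function names and kernel values are assumptions for this example, and the gate follows the common tanh-times-sigmoid form rather than any specific model's implementation.

```python
import math


def dilated_causal_conv(x, kernel, dilation):
    """y[t] = sum_k kernel[k] * x[t - k * dilation]; inputs before t = 0 count as zero."""
    y = []
    for t in range(len(x)):
        acc = 0.0
        for k, w in enumerate(kernel):
            idx = t - k * dilation
            if idx >= 0:
                acc += w * x[idx]
        y.append(acc)
    return y


def gated_tcn_layer(x, filter_kernel, gate_kernel, dilation):
    """Gated activation: tanh(filter branch) * sigmoid(gate branch)."""
    f = dilated_causal_conv(x, filter_kernel, dilation)
    g = dilated_causal_conv(x, gate_kernel, dilation)
    return [math.tanh(a) * (1.0 / (1.0 + math.exp(-b))) for a, b in zip(f, g)]


def receptive_field(kernel_size, dilations):
    """Number of past steps visible after stacking layers with the given dilations."""
    return 1 + (kernel_size - 1) * sum(dilations)


# Four stacked layers with kernel size 2 and doubling dilations.
print(receptive_field(2, [1, 2, 4, 8]))
print(gated_tcn_layer([1.0, 2.0, 3.0, 4.0], [0.5, 0.5], [1.0, -1.0], 1))
```

Because each output depends only on current and past inputs, all time steps can be computed in parallel, and doubling the dilation at each layer makes the receptive field grow exponentially with depth.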
Once the spatial and temporal features are extracted separately, various approaches have been proposed to integrate and fuse these features for comprehensive spatio-temporal analysis. One common approach is the sandwich model design, wherein the spatial and temporal features are combined through a series of layers, such as convolutional or recurrent layers, that successively refine
the fused representation to capture higher-level spatio-temporal patterns and interactions. Alternatively, some studies utilize linear layers or other fusion techniques, such as element-wise addition or concatenation, to combine the spatial and temporal features in a more straightforward manner.
Table 1.1: Common deep learning techniques for spatio-temporal data mining in traffic forecasting
Studies                    Spatial layers    Temporal layers
-------------------------  ----------------  ---------------
DCRNN (2017) [79]          Diffusion Conv    GRU
STGCN (2017) [152]         GCN               Gated TCN
DMVST-Net (2018) [148]     CNN               LSTM
STDN (2019) [147]          CNN               LSTM + Attn
GraphWaveNet (2019) [142]  GCN               Gated TCN
DSTGCN (2019) [37]         GCN               TCN
ASTGCN (2019) [53]         GCN + Attn        TCN + Attn
SLCNN (2020) [165]         GCN               Gated TCN
GMAN (2020) [171]          Spatial Attn      Temporal Attn
MTGCN (2020) [141]         GCN               Gated TCN
ASTGNN (2021) [54]         GCN + Attn        Temporal Attn
D2STGNN (2022) [109]       Diffusion CNN     GRU
STEP (2022) [108]          GNN               Transformer
Here we take traffic forecasting, a crucial research domain in spatio-temporal data mining, as an example. Due to the importance of accurate traffic prediction, numerous models have been proposed to address this challenge.
In particular, various deep learning techniques have been employed to handle the spatial and temporal features inherent in traffic data. A summary of some representative studies and their respective spatial and temporal layers is provided in Table 1.1. It showcases a diverse range of deep learning techniques employed for spatio-temporal data mining in the context of traffic forecasting. The listed studies span from 2017 to 2022 and utilize various spatial layers, such as diffusion convolution, GCNs, and CNNs, often enhanced with attention mechanisms to improve model performance. In terms of temporal layers, GRUs, LSTM, TCNs, and attention-based approaches are common choices for modeling the temporal dependencies in traffic data. This overview demonstrates the rapid evolution and innovation in deep learning techniques for spatio-temporal data mining.
The increasing complexity and diversity of these techniques reflect the growing interest and need for accurate and efficient traffic forecasting. As research in this area continues to advance, it is anticipated that novel deep learning architectures
and methodologies will further improve the state-of-the-art in spatio-temporal data mining, enabling more effective and reliable traffic forecasting solutions.
In summary, the inherent complexity and heterogeneity of spatio-temporal data necessitate the use of a combination of deep learning techniques to capture the distinct spatial and temporal features independently, followed by their integration through various fusion approaches. These hybrid models provide a flexible and robust framework for tackling the challenges of spatio-temporal data mining, allowing researchers to uncover complex patterns, relationships, and dynamics across a wide range of applications.
1.2 Challenges
Despite the growing interest in spatio-temporal data mining owing to its prospective applications across various domains and the progress in deep learning techniques, the analysis of current spatio-temporal data still presents several challenges, particularly in the pursuit of precise and effective deep learning models.
In this section, we discuss four key challenges in spatio-temporal data mining: (1) the transferability and generalization issue, (2) the scalability issue, (3) the construction of efficient graph structures from raw data to augment model efficacy, and (4) the sparse and discrete data sampling issue. We introduce each of the four challenges in turn in the rest of this section.
1) Transferability and generalization issue. A major challenge in spatio-temporal data mining is the transferability and generalization of models across different geographic regions. Due to the diverse nature of spatio-temporal data, models trained on one dataset may not necessarily perform well on others, leading to poor generalization [55]. For example, a traffic prediction model trained on data from New York City may not be directly applicable to San Francisco, as the underlying road network, traffic patterns, and other factors may differ significantly between the two cities. Moreover, the lack of labeled data in certain application areas makes it difficult to train and validate models. One approach to address this issue is transfer learning, which involves leveraging the knowledge gained from one domain to improve performance in another, related domain [175]. In the context of spatio-temporal data mining, the term "domain" specifically refers to geographic locations and their unique characteristics. Going further, an even better solution would be to train a single model that functions akin to a
Language Model (LM) in the natural language domain, possessing the capability to be effectively utilized across multiple geographic regions and scenarios in a specific spatio-temporal data mining task such as traffic forecasting. Recent work has explored the application of transfer learning to spatio-temporal data mining, with promising results [132]. However, the exploration and discussion of universal models for practical applications in spatio-temporal data mining remains an open research area that warrants further investigation.
2) Scalability issue. Scalability is another critical challenge in spatio-temporal data mining, as real-world datasets often involve large-scale networks and long time ranges. Firstly, typical GNN models assume a small, fixed network size, limited by computational capability, and may struggle to handle large-scale graphs efficiently [26]. Although researchers have proposed various strategies to improve the scalability of GNN models, such as sampling techniques, graph partitioning, and parallelization [56, 157, 29], achieving scalability without sacrificing accuracy remains a major challenge, and further research is needed to handle large-scale spatio-temporal datasets. Secondly, the combination of temporal and spatial features in spatio-temporal data mining exacerbates scalability concerns beyond those posed by large graphs. Temporal Convolutional Networks (TCNs) [7], which offer faster processing compared to RNN-based models, have become a popular choice for handling sequential data in spatio-temporal analysis, and exemplify the trade-off between speed and accuracy. However, TCNs are limited in their ability to efficiently capture complex temporal interactions. As a result, there is an ongoing need to improve the scalability and computational efficiency of current methods to accommodate the ever-increasing volume and complexity of spatio-temporal data.
3) Building efficient graph structures from raw data to augment model efficacy. Another challenge in spatio-temporal data mining is the construction of informative graph structures from raw data to enhance the effectiveness of deep learning models. Raw spatio-temporal data often requires preprocessing and transformation into a structured format, such as graphs, to enable effective analysis. However, determining the most suitable graph structures and representations that capture the underlying spatial and temporal patterns is a non-trivial task [140]. The most common approach to constructing graphs is to exploit geographical locations. In
this method, a graph is built to represent spatial locations, where each node denotes a single location, and edges represent the distance between two nodes. For instance, in a traffic prediction scenario, the graph would reflect the geographical distribution of sensors, with each node representing a traffic sensor recording features such as speed and volume, and each edge reflecting the travel distance between two sensors along the road network [152]. However, in other applications, multiple ways of constructing graphs may be possible. Considering the next point-of-interest (POI) recommendation as an example, a straightforward graph construction method would involve nodes representing POIs and edges representing the distance between them. Consequently, in a next POI recommendation scenario, the model would select from the nearby neighborhood of the current location, as these locations are more likely to be visited due to spatial convenience.
Alternatively, another possibly more informative graph can be constructed, where nodes still represent POIs, but edges now denote the population-level frequency of transitions between two nodes. In other words, this graph captures the most frequent subsequently visited POIs given the current location and time. Such a graph not only contains geographical information but also reflects general human tendencies, making it superior to a simple geo-based graph. However, to the best of our knowledge, no existing studies specifically discuss this issue.
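The contrast between the two graph constructions can be sketched as follows. This is a minimal illustration under stated assumptions: the POI names, coordinates, and trajectories are hypothetical, Euclidean distance stands in for geographic distance, and the transition graph ignores the time of day, which a fuller construction would take into account.

```python
from collections import Counter
from math import dist


def distance_graph(coords, radius):
    """Undirected edges between POIs within `radius`, weighted by distance."""
    edges = {}
    names = sorted(coords)
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            d = dist(coords[a], coords[b])
            if d <= radius:
                edges[(a, b)] = d
    return edges


def transition_graph(trajectories):
    """Directed edges weighted by how often users moved from one POI to the next."""
    counts = Counter()
    for traj in trajectories:
        for src, dst in zip(traj, traj[1:]):
            counts[(src, dst)] += 1
    return dict(counts)


# Hypothetical POIs and check-in sequences from three users.
coords = {"cafe": (0, 0), "gym": (3, 4), "office": (0, 1)}
trajectories = [["cafe", "office", "gym"],
                ["cafe", "office"],
                ["office", "gym"]]

print(distance_graph(coords, radius=2))
print(transition_graph(trajectories))
```

In this toy example the distance graph links only the cafe and the office, whereas the transition graph also records that users frequently travel from the office to the distant gym, a pattern that pure proximity does not reveal.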
4) Sparse and discrete data sampling issue. The fourth challenge concerns the issue of sparse and discrete data sampling. Real-world spatio-temporal data is often marked by sparsity, with numerous data points missing or unobserved due to various factors, such as sensor failures, limited sampling intervals, or inherent variability in the underlying processes [33, 120]. In many existing spatio-temporal data mining tasks, temporal data is represented as a sequence of discrete time steps, causing the prediction granularity to be highly dependent on the input data’s sampling interval. For example, if weather sensors record temperature data hourly, it becomes highly unlikely for the model to accurately predict the temperature in the next 30 minutes. Consequently, these models are limited in their ability to infer data at arbitrary time points. This limitation primarily stems from the intrinsic discreteness of neural networks and the data collection process, which is often constrained by practical factors such as energy consumption and storage limitations. As a result, sensors in applications like meteorology, aerography, and environmental monitoring collect data at fixed intervals. Reconstructing the continuous world from observed discrete data is essential for
accurate spatio-temporal data analysis. However, the continuous reconstruction problem, estimating data at any given timestamp within the period of interest, differs from traditional imputation tasks that aim to fill missing data at predefined timestamps. This distinction highlights the need for further research in this area.
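To illustrate what querying an arbitrary timestamp means, the sketch below answers such queries with plain linear interpolation between the two nearest observations. This is only a naive baseline chosen for illustration, not the reconstruction method proposed in this thesis; the function name and the hourly temperature readings are assumptions.

```python
from bisect import bisect_left


def reconstruct(times, values, query):
    """Linearly interpolate a value at any timestamp inside the observed period.

    `times` must be sorted in increasing order and aligned with `values`.
    """
    if not times[0] <= query <= times[-1]:
        raise ValueError("query outside the observed period")
    i = bisect_left(times, query)
    if times[i] == query:
        return values[i]  # the query coincides with an observation
    t0, t1 = times[i - 1], times[i]
    w = (query - t0) / (t1 - t0)
    return (1 - w) * values[i - 1] + w * values[i]


# Hypothetical hourly temperature readings (minutes, degrees Celsius).
times = [0, 60, 120]
temps = [10.0, 12.0, 11.0]
print(reconstruct(times, temps, 30))   # half-way between the first two readings
print(reconstruct(times, temps, 90))
```

An imputation method would only fill gaps on the predefined hourly grid, whereas a continuous reconstruction must return a value for any query time; interpolation does so using a single sensor's neighboring readings and ignores the wider spatio-temporal context.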
1.3 Research Questions
This thesis aims to address the aforementioned four challenges by posing three research questions, which encompass two levels of analysis: macro-level and micro-level.
The macro-level analysis aims to capture the general, aggregate patterns in spatio-temporal data, which are often individual-agnostic. These general patterns may represent phenomena such as overall traffic flow, shared preferences among users, or large-scale weather systems. By identifying and understanding these macro-level patterns, researchers can gain insights into the underlying processes governing the spatio-temporal data and develop more effective models for prediction and analysis. Data-driven methods often fall into this category. These methods leverage machine learning and statistical techniques to identify patterns and relationships within the data, enabling the discovery of previously unknown insights and the development of more accurate predictive models.
The micro-level analysis focuses on the individual entities within the system, such as vehicles in traffic prediction, users in point-of-interest (POI) recommendation, or airflows in weather prediction. These micro-level entities exhibit unique spatio-temporal characteristics, and their movements and interactions drive the observed spatio-temporal patterns in the data. Model-driven methods usually fall into this category. These methods utilize domain-specific knowledge to develop mathematical or simulation models that capture the underlying mechanics of a system, allowing for improved understanding, prediction, and control of individual entity behavior.
It is important to note that our focus in this thesis is not on developing mathematical or simulation models; instead, we concentrate on employing deep learning methods at both macro and micro levels in spatio-temporal data mining.
The interplay between micro-level and macro-level analysis is crucial for attaining a comprehensive understanding of the data. The movements and interactions of micro-level entities contribute to the emergence of macro-level patterns observed in the data. Concurrently, these macro-level patterns can offer valuable context and guidance for interpreting and predicting the behavior of micro-level entities.
To address the first challenge of transferability and generalization, macro-level analysis is particularly advantageous, as individual entities may exhibit significant variation, whereas aggregate patterns tend to be more stable across diverse spatial or temporal scenarios. For example, in traffic forecasting, driving habits may differ from person to person; however, the underlying rules governing traffic patterns are likely to be consistent. Despite the diverse driving habits, congestion around rush hours is expected to disperse along the incoming road direction. The critical question is how we can capture these macro patterns that appear to possess spatio-temporal translation invariance properties. Notably, certain local patterns recur across different geographic regions, much as edges and corners recur in almost all images.
The challenge lies in the fact that while edges and corners in images can be easily captured through human feature engineering, identifying spatio-temporal features is more complex. If a method can be developed to learn integrated spatio-temporal features with translation invariance, the issues of transferability and generalization can be effectively resolved. Furthermore, by designing models that focus on smaller areas at a time and can be run in parallel, scalability issues can also be addressed as a bonus. Thus, a more refined approach to capturing macro patterns could lead to the development of robust and scalable spatio-temporal data mining techniques, overcoming the challenges of transferability, generalization, and scalability. Therefore, we present our first research question (RQ):
RQ 1. Can we develop a macro-level approach to learn integrated spatio-temporal features with translation invariance that is applicable across different geographic regions and time periods, thereby addressing the transferability, generalization, and scalability issues in spatio-temporal data mining?
The potential solutions to RQ 1 offered by macro-level approaches would address the challenges related to transferability, generalization, and scalability in spatio-temporal data mining. These approaches can effectively learn integrated spatio-temporal features with translation invariance, thereby addressing the key issues identified earlier. As we progress, our focus shifts towards leveraging macro-level information for constructing efficient graph structures and enhancing micro-level model predictions in spatio-temporal data mining. In particular, we
concentrate on next point-of-interest (POI) recommendation tasks, where micro-level data consists of the user’s visited sequence.
Using the next point-of-interest (POI) recommendation as an illustrative example, a basic approach to building a graph would have nodes symbolizing POIs and edges signifying the distances between them. On the other hand, a potentially more insightful graph could be created where nodes continue to represent POIs, but edges now indicate the frequency of population-level transitions between pairs of nodes. Essentially, this graph encapsulates the most common subsequent POIs visited, given a specific location and time. This type of graph encompasses not only geographical information but also captures general human behavior patterns, rendering it more advantageous than a mere geospatial-based graph.
In this context, our next research question seeks to explore the use of macro-level information to develop graph structures that can significantly augment model efficacy. Moreover, this part of the study also aims to propose a novel framework that seamlessly integrates macro-level information and micro-level predictions. By doing so, we expect to address not only the challenge of constructing efficient graph structures but also contribute to the broader goal of improving the overall performance and applicability of spatio-temporal data mining models.
As such, we propose the next research question:
RQ 2. How can we leverage macro-level information to construct an efficient graph structure and use it to enhance micro-level predictions in spatio-temporal data mining?
In RQ 1 and RQ 2, we discussed data mining applications at both macro-level and micro-level, focusing on specific application scenarios, such as predicting traffic conditions like average speed or generating a list of recommended POIs for users. In these cases, we were primarily concerned with leveraging embeddings for direct application purposes, rather than obtaining representations for individual nodes in the graph at a given time. However, as we shift our attention to the final challenge concerning sparsity and discrete data sampling, our objective transitions from a specific prediction problem to a genuine representation learning problem.
Representation learning seeks to uncover meaningful and compact representations of raw data, which can enable efficient learning, generalization, and interpretability across various tasks. In this context, our goal is to learn representations for each spatio-temporal unit (i.e., sensor measurements at a specific
location and timestamp) and utilize these representations to capture the underlying continuous data distribution based on observed data points. This can be likened to estimating a surface in 3D space from discrete points. To tackle the sparsity and discrete data sampling challenge and enhance spatio-temporal data analysis, it is essential to reconstruct continuous representations from observed discrete data. With this in mind, we propose the third research question:
RQ 3. How can we combine macro- and micro-level information to reconstruct the underlying continuous distribution from a discretely sampled dataset, addressing the sparse data sampling issue in spatio-temporal data mining?
By investigating this research question, our objective is to devise innovative methodologies that harness the power of both macro and micro-level information.
In doing so, we strive to surmount the limitations inherent in data collection processes and existing spatio-temporal data mining techniques, ultimately leading to more robust and accurate analysis by using the learned representations.
1.4 Thesis Outline
In this thesis, our primary objective is to address the three research questions proposed earlier. The thesis is organized into four main parts, each dedicated to exploring various aspects of the research questions.
Chapter 2 presents an in-depth and extensive survey on current spatio-temporal graph representation learning methods. In this chapter, we systematically introduce the research status of spatio-temporal data mining and discuss the latest advancements in the rapidly evolving field of graph neural networks, since graph neural networks are increasingly becoming the predominant tool for extracting spatial relationships.
Chapter 3 is devoted to resolving RQ 1, using traffic forecasting as an illustrative application case to demonstrate the practical implications of our research.
In this chapter, we delve into the investigation and design of a novel framework that captures the universal spatio-temporal patterns present in traffic datasets across various regions. By harnessing these universal traffic patterns, we can significantly enhance the generalization ability and scalability of the proposed traffic prediction model, ultimately improving its real-world applicability.
Chapter 4 shifts focus to RQ 2, exploring an alternative application in the form of next POI recommendation. In this realm, micro-level information has been the primary focus of researchers, with macro-level graphs often overlooked.
We address this gap by proposing a trajectory flow map that is based on population movement trends, and demonstrate how it can be effectively leveraged in individual-level next POI recommendation tasks, showcasing its potential for enhancing the overall performance of the recommendation system.
In Chapter 5, we aim to tackle RQ 3 by proposing a new research problem called Continuous Spatio-Temporal Data Reconstruction. In this chapter, we propose a novel spatio-temporal representation learning framework that combines macro-level and micro-level information to reconstruct the underlying continuous data distribution from discretely observed data points.
In the subsequent sections, we delve into the topics and present the results obtained in each chapter of the thesis, providing a comprehensive understanding of our research contributions.
Chapter 2. A Survey of Spatio-temporal Graph Representation Learning
Before delving into the research forefront, it is essential to gain an overview of the relevant research fields. In this context, our primary focus lies on Spatio-temporal Data Mining (STDM) and Graph Representation Learning (GRL).
Spatio-temporal data mining is a burgeoning research field that focuses on the extraction of valuable patterns, relationships, and knowledge from vast amounts of data that contain both spatial and temporal components. This interdisciplinary area of study brings together principles from diverse domains, such as geography, computer science, data mining, machine learning, and statistics, to address complex problems that arise in various real-world scenarios, including transportation, urban planning, environmental monitoring, and public health. The importance of spatio-temporal data mining has grown exponentially in recent years, largely due to the proliferation of location-aware devices, remote sensing technologies, and the increasing availability of geospatial data. By developing and applying advanced analytical techniques to spatio-temporal data, researchers aim to uncover hidden patterns and insights that can not only improve decision-making processes but also facilitate a deeper understanding of the intricate relationships between space and time.
Meanwhile, graph representation learning is an emerging research field that focuses on developing methods to learn compact, expressive, and informative
embeddings of graph-structured data. As a subfield of machine learning, graph representation learning aims to encode the topological structure, node features, and relationships within graphs into continuous vector spaces, which can then be utilized for various downstream tasks such as node classification, link prediction, and community detection. This research area has gained significant momentum in recent years, primarily due to the prevalence of complex and interconnected data in diverse domains such as social networks, biological systems, transportation, and recommender systems. Graph representation learning methods, including Graph Neural Networks (GNNs), Graph Convolutional Networks (GCNs), and Graph Attention Networks (GATs), have demonstrated remarkable success in tackling these complex data structures by leveraging their inherent relational information. In Chapter 2, we will delve into the current state of the art in graph representation learning, review the key methodologies, and discuss the main challenges and future directions in this rapidly advancing field.
There is a strong connection between STDM and GRL, as many spatio-temporal data sets can be naturally modeled as graphs, where nodes represent spatial entities and edges capture the relationships or interactions between them over time. By leveraging GRL techniques, researchers can better analyze the complex and dynamic nature of spatio-temporal data, incorporating both the spatial and temporal dependencies that are vital for accurate analysis and prediction tasks. Furthermore, the integration of STDM and GRL can lead to the development of more robust, scalable, and generalizable models that can be applied across different geographic regions and time periods. In this regard, the synergy between STDM and GRL offers significant potential for advancing the state of the art in both fields and enabling more powerful data-driven solutions for a wide range of real-world problems.
In short, in Chapter 2, we will explore the current state of the art in spatio-temporal data mining, including spatio-temporal data types, applications, and methods, as well as the recent advances in graph neural networks.
Chapter 3. Space Meets Time: Local Spacetime Neural Network For Traffic Flow Forecasting
In this chapter, we examine the traffic prediction problem and introduce a Local Space-Time Neural Network (STNN). This innovative approach utilizes space-time convolution and attention mechanisms to learn the universal spatio-temporal correlations, effectively addressing RQ 1.
The growing influence of data-driven technologies in modern transportation systems has led to an increased focus on traffic flow forecasting. Accurate and timely predictions of traffic dynamics can significantly enhance transportation management, alleviate congestion, and improve overall efficiency. Traffic systems, characterized by changing flows in road networks, exhibit salient patterns influenced by various extrinsic factors and intrinsic principles. Accurate traffic flow predictions rely on a model’s ability to capture not only extrinsic features but also the intrinsic, universal patterns that govern traffic flow. Uncovering these patterns and understanding the latent correlations between a location’s current state and its surrounding locations’ past are crucial for developing effective traffic forecasting models.
Recent advancements in neural-based techniques, especially GNNs, have significantly improved traffic feature representation and prediction results compared to earlier statistical methods. However, existing GNN-based traffic forecasting models face three major challenges: (1) their heavy reliance on graph structure limits their applicability to specific road networks, preventing the discovery of intrinsic traffic system properties; (2) the computationally expensive feature aggregation operations, such as graph convolutions, impede scalability for large road networks with numerous sensors; and (3) the separate components for spatial and temporal feature extraction in these models assume uniform correlations between locations over time, which may not accurately reflect the dynamic nature of traffic systems.
To address these challenges and RQ 1, we begin by defining several key concepts that are used throughout the thesis. One such concept is the traffic event, which integrates both the spatial and temporal aspects of a traffic measurement collected by a sensor station. It parallels the concept of an event in physics [22] and is essential to our efforts to solve the challenges at hand.
Definition 1 (Traffic Event). Given a traffic measurement s (e.g., speed) observed at sensor v_i and time t, a traffic event is a tuple consisting of the measurement, time, and location, namely (s, t, v_i).
Accordingly, we can generalize this definition to encompass a broader range of scenarios beyond traffic prediction.
Definition 2 (Spacetime Event). Given any sensor measurement m observed at location v_i and time t, a spacetime event is a tuple consisting of the measurement, time, and location, namely (m, t, v_i).
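As a minimal sketch of Definitions 1 and 2, an event can be held in a small named tuple. The class and field names below are illustrative assumptions rather than notation from the thesis, as is the choice to encode time as an integer time step and location as a sensor index.

```python
from typing import NamedTuple


class SpacetimeEvent(NamedTuple):
    """A measurement m observed at time t and location v_i, i.e. (m, t, v_i)."""

    measurement: float  # m (or s for a traffic event, e.g. speed)
    time: int           # t, encoded here as a time-step index (assumption)
    location: int       # v_i, encoded here as a sensor index (assumption)


# A traffic event is the special case whose measurement is a traffic quantity,
# so the same type is reused under a second name.
TrafficEvent = SpacetimeEvent

event = TrafficEvent(measurement=62.5, time=3, location=7)
m, t, v = event  # unpacks in the (measurement, time, location) order
```

Keeping the measurement, time, and location together in one record mirrors the definitions: an event is only meaningful as the full tuple, not as a value detached from where and when it was observed.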
Additionally, the essential concept for this chapter, as well as the entire thesis, is the spacetime interval. The spacetime interval between two traffic events signifies the degree to which one event impacts the other; a smaller interval indicates a stronger connection between the two traffic events. Within a local-spacetime context, we are primarily concerned with the intervals between traffic events at the target sensor and those at other sensors.
Definition 3 (Spacetime Interval). The spacetime interval is the quantified influence that one traffic event exerts on another traffic event with respect to the traffic measurement.
Figure 3.1 provides an illustration of the spacetime interval concept. The figure displays three instances of the network. The state of the target node at t3 is significantly affected by the state of a at t1 and the state of b at t2. However, it is only mildly influenced by the state of a at t2 and the state of b at t1.
Figure 3.1: An illustration of the spacetime interval. (Figure labels: time axis with t1, t2, t3; nodes a, b, and target; impact scale from 0 to 1.)
Armed with these key concepts, we propose a novel spatio-temporal correlation learning paradigm called Spacetime Interval Learning. This approach fuses the spatial and temporal dimensions into a single manifold, referred to as spacetime, and captures correlations as intervals between traffic events. The paradigm extracts traffic data from nearby sensors within a fixed time window, called the local-spacetime context, which allows the model to focus on relevant sensors. Our method correlates nodes at different times within the local-spacetime context, resulting in a model that is universal, independent of graph structure, and applicable to various traffic systems. By shifting the focus from network-level to node-level predictions, our approach facilitates parallel predictions for multiple locations and efficiently captures varying spatial correlations between locations over time.
The architecture of the model is shown in Figure 3.2. The STNN comprises k spacetime modules (ST-Modules) and a fully-connected output layer. Each ST-Module contains a spacetime attention block (ST-Attn block) and a spacetime convolution block (ST-Conv block). The ST-Attn block utilizes a self-attention mechanism to emphasize the most influential traffic events. Within each ST-Conv block, three distinct convolution kernels are employed to aggregate the spatio-temporal correlations from various perspectives. Subsequently, the extracted features are stacked and condensed using a 1×1 convolution.
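The local-spacetime context described above can be sketched as a simple selection step. This is a rough illustration under stated assumptions, not the thesis's implementation: "nearby" is taken to mean the nearest sensors according to a given distance matrix, and the function name, parameter names, and toy data are all hypothetical.

```python
from typing import List, Tuple

Event = Tuple[float, int, int]  # (measurement, time step, sensor index)


def local_spacetime_context(
    readings: List[List[float]],   # readings[t][i]: measurement at step t, sensor i
    distances: List[List[float]],  # distances[i][j]: distance between sensors i and j
    target: int,
    t_now: int,
    num_neighbors: int,
    window: int,
) -> List[Event]:
    """Collect events from the target and its nearest sensors over a time window."""
    # Rank sensors by distance to the target. Assuming a zero self-distance,
    # the target itself is ranked first and is therefore always included.
    ranked = sorted(range(len(distances)), key=lambda j: distances[target][j])
    nearby = ranked[: num_neighbors + 1]
    # Fixed time window ending at t_now, clipped at the start of the record.
    start = max(0, t_now - window + 1)
    return [
        (readings[t][j], t, j)
        for t in range(start, t_now + 1)
        for j in nearby
    ]


# Toy data: 4 time steps, 3 sensors.
readings = [
    [10.0, 20.0, 30.0],
    [11.0, 21.0, 31.0],
    [12.0, 22.0, 32.0],
    [13.0, 23.0, 33.0],
]
distances = [
    [0.0, 1.0, 5.0],
    [1.0, 0.0, 2.0],
    [5.0, 2.0, 0.0],
]
context = local_spacetime_context(
    readings, distances, target=0, t_now=3, num_neighbors=1, window=2
)
```

Because the context is built per target sensor, contexts for different sensors are independent of one another, which is what makes the node-level predictions mentioned above amenable to parallel execution.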
Figure 3.2: The architecture of STNN with an example local-spacetime. (Figure labels: sub-spacetime input network; ST Modules, each containing an ST-Attn block and an ST-Conv block with residual connections; within the ST-Conv block, Conv outputs are concatenated and condensed by a 1x1 Conv; FC output layer.)
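To make the role of self-attention in the ST-Attn block more concrete, the sketch below applies generic scaled dot-product self-attention to a list of event feature vectors. It is an assumption-laden illustration and not the thesis's ST-Attn block: the learned query, key, and value projections are replaced by the identity, and the function names and toy features are hypothetical.

```python
import math
from typing import List


def softmax(xs: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    shift = max(xs)
    exps = [math.exp(x - shift) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(features: List[List[float]]) -> List[List[float]]:
    """Scaled dot-product self-attention with identity projections (assumption).

    Each output row is a weighted average of all input rows, with larger
    weights given to rows that are more similar to the query row.
    """
    d = len(features[0])
    out = []
    for query in features:
        scores = [
            sum(a * b for a, b in zip(query, key)) / math.sqrt(d)
            for key in features
        ]
        weights = softmax(scores)
        out.append(
            [sum(w * row[c] for w, row in zip(weights, features)) for c in range(d)]
        )
    return out


# Toy features for three events in a local-spacetime context.
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
attended = self_attention(features)
```

In a trained model the attention weights would come from learned projections, so that events with a larger influence on the target receive larger weights; the identity projections here only show the mechanics of the weighting.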
Finally, the proposed STNN model was evaluated in various settings to demonstrate its robustness and its ability to generalize to unseen traffic networks, an ability that existing state-of-the-art models lack. In the first setting, STNN was trained and tested on the same network, where it surpassed the baselines in terms of prediction accuracy. In the second setting, STNN