

A Survey of Spatio-temporal Data Mining

2.5 Advanced Spatio-temporal Data Mining Models

2.5.1 Traffic Prediction Models

2.5.1.1 DCRNN

The Diffusion Convolutional Recurrent Neural Network (DCRNN), proposed by Li et al. (2017) [79], is a pioneering study that applies graph-based deep learning to traffic prediction while explicitly exploiting the graph structure of the road network. This work effectively opened up a new line of investigation into graph-based deep learning techniques for traffic prediction tasks.

The authors set out to solve three main challenges in traffic forecasting: (1) the complex spatial dependency within road networks, (2) the non-linear temporal dynamics caused by changing road conditions, and (3) the inherent difficulty of making long-term predictions. To this end, they proposed modeling traffic flow as a diffusion process on a directed graph and introduced the DCRNN.

This deep learning framework for traffic forecasting effectively incorporates both spatial and temporal dependencies in traffic flow. DCRNN captures spatial dependency using bidirectional random walks on the graph, while temporal dependency is handled through an encoder-decoder architecture with scheduled sampling. The model structure is shown in Figure 2.3.

In DCRNN, the diffusion convolution over a graph signal $X \in \mathbb{R}^{N \times P}$ with a filter $g_\phi$ is defined as:

$$X_{:,p} \star_{\mathcal{G}} g_\phi = \sum_{k=0}^{K-1} \left( \phi_{k,1} \left( D_O^{-1} W \right)^k + \phi_{k,2} \left( D_I^{-1} W^\top \right)^k \right) X_{:,p} \quad \text{for } p \in \{1, \dots, P\} \tag{2.26}$$

where $\phi \in \mathbb{R}^{K \times 2}$ signifies the filter parameters, while $D_O^{-1} W$ and $D_I^{-1} W^\top$ represent the transition matrices of the diffusion process and its reverse, respectively. A diffusion convolutional layer is then constructed to map $P$-dimensional features to $Q$-dimensional outputs. The parameter tensor is denoted as $\Theta \in \mathbb{R}^{Q \times P \times K \times 2} = [\phi]_{q,p}$, with $\Theta_{q,p,:,:} \in \mathbb{R}^{K \times 2}$ parameterizing the convolutional filter for the $p$th input and the $q$th output. The diffusion convolutional layer is expressed as:

$$H_{:,q} = b \left( \sum_{p=1}^{P} X_{:,p} \star_{\mathcal{G}} g_{\Theta_{q,p,:,:}} \right) \quad \text{for } q \in \{1, \dots, Q\} \tag{2.27}$$
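The bidirectional diffusion of Eq. (2.26) can be sketched numerically. The following NumPy snippet is an illustrative implementation, not the authors' code; the function name and the truncation of the power series via iterative matrix products are assumptions for clarity:

```python
import numpy as np

def diffusion_conv(X, W, phi):
    """Sketch of the diffusion convolution in Eq. (2.26).

    X   : (N, P) graph signal
    W   : (N, N) weighted adjacency matrix of the directed graph
    phi : (K, 2) filter parameters (forward / reverse diffusion)
    """
    K = phi.shape[0]
    # Transition matrices D_O^{-1} W (forward) and D_I^{-1} W^T (reverse).
    D_O = W.sum(axis=1, keepdims=True)        # out-degrees
    D_I = W.sum(axis=0, keepdims=True).T      # in-degrees
    P_fwd = W / np.maximum(D_O, 1e-10)
    P_bwd = W.T / np.maximum(D_I, 1e-10)

    out = np.zeros_like(X)
    Tf = np.eye(W.shape[0])                   # P_fwd^0 = I
    Tb = np.eye(W.shape[0])                   # P_bwd^0 = I
    for k in range(K):
        out += phi[k, 0] * (Tf @ X) + phi[k, 1] * (Tb @ X)
        Tf = P_fwd @ Tf                       # advance to the next power
        Tb = P_bwd @ Tb
    return out
```

With $K = 1$ and $\phi = [1, 0]$ only the zeroth power (the identity) contributes, so the operation reduces to the input signal itself, which is a convenient sanity check.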


Figure 2.3: The architecture of DCRNN [79]. The historical time series are fed into an encoder whose final states are used to initialize the decoder. The decoder makes predictions based on either previous ground truth or the model output.

In this equation, $X \in \mathbb{R}^{N \times P}$ denotes the input, $H \in \mathbb{R}^{N \times Q}$ represents the output, $g_{\Theta_{q,p,:,:}}$ refers to the filters, and $b$ is the activation function. The diffusion convolutional layer learns representations for graph-structured data and can be trained using stochastic gradient-based methods.

To model the temporal dynamics, the authors employed a slightly modified GRU in which the matrix multiplications within the GRU are replaced by diffusion convolutions. This results in the Diffusion Convolutional Gated Recurrent Unit (DCGRU):

$$\begin{aligned}
r^{(t)} &= \sigma\left( \Theta_r \star_{\mathcal{G}} \left[ X^{(t)}, H^{(t-1)} \right] + b_r \right) \\
u^{(t)} &= \sigma\left( \Theta_u \star_{\mathcal{G}} \left[ X^{(t)}, H^{(t-1)} \right] + b_u \right) \\
C^{(t)} &= \tanh\left( \Theta_C \star_{\mathcal{G}} \left[ X^{(t)}, r^{(t)} \odot H^{(t-1)} \right] + b_c \right) \\
H^{(t)} &= u^{(t)} \odot H^{(t-1)} + \left( 1 - u^{(t)} \right) \odot C^{(t)}
\end{aligned} \tag{2.28}$$

where $X^{(t)}$ and $H^{(t)}$ represent the input and output at time $t$, respectively, while $r^{(t)}$ and $u^{(t)}$ denote the reset and update gates. The symbol $\star_{\mathcal{G}}$ indicates the diffusion convolution, and $\Theta_r$, $\Theta_u$, and $\Theta_C$ correspond to the parameters of the respective filters. Analogous to the GRU, the DCGRU can be utilized to construct recurrent neural network layers and trained using backpropagation through time.
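A single DCGRU step can be sketched as an ordinary GRU cell whose matrix products are swapped for a graph convolution. In this illustrative NumPy sketch, `gconv` stands in for any diffusion convolution on the concatenated input and state; all names are hypothetical:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def dcgru_step(X_t, H_prev, gconv, Th_r, Th_u, Th_C, b_r, b_u, b_c):
    """One DCGRU step in the spirit of Eq. (2.28).

    X_t, H_prev : (N, d_in) node features and (N, d_h) previous state
    gconv       : callable mapping ((N, d_in + d_h), theta) -> (N, d_h),
                  standing in for the diffusion convolution.
    """
    XH = np.concatenate([X_t, H_prev], axis=1)        # [X(t), H(t-1)]
    r = sigmoid(gconv(XH, Th_r) + b_r)                # reset gate
    u = sigmoid(gconv(XH, Th_u) + b_u)                # update gate
    XrH = np.concatenate([X_t, r * H_prev], axis=1)   # [X(t), r ⊙ H(t-1)]
    C = np.tanh(gconv(XrH, Th_C) + b_c)               # candidate state
    return u * H_prev + (1.0 - u) * C                 # H(t)
```

Passing `gconv = lambda Z, th: Z @ th` recovers a plain GRU cell, which makes the relationship between the two models explicit.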


2.5.1.2 STGCN

Another important pioneering work is Spatio-Temporal Graph Convolutional Networks (STGCN), proposed by Yu et al. in 2017 [152]. The authors of STGCN contend that while commonly used statistical methods excel at short-interval predictions, their effectiveness diminishes for longer-term forecasts due to the inherent uncertainty and complexity of traffic flow. Moreover, previous studies often overlook the spatial attributes of traffic networks, such as connectivity and globality, by dividing them into segments or grids.

To tackle these challenges, they introduced a novel deep learning architecture named STGCN. This architecture comprises multiple spatio-temporal convolutional blocks, which merge graph convolutional layers and convolutional sequence-learning layers to capture spatial and temporal dependencies. They represent the traffic network as a general graph to fully harness spatial information, and they utilize a fully convolutional structure along the time axis to address the innate shortcomings of recurrent networks. The authors claim STGCN marks the first instance where purely convolutional structures have been employed to concurrently extract spatio-temporal features from graph-structured time series in a traffic study.

Figure 2.4: The architecture of STGCN [152]. The STGCN is composed of two ST-Conv blocks and a fully-connected output layer at the end. Each ST-Conv block encompasses two temporal gated convolution layers and a single spatial graph convolution layer situated in the middle.

The architecture of STGCN is depicted in Figure 2.4. Each ST-Conv block features a "sandwich" structure, consisting of two gated sequential convolution layers with a spatial graph convolution layer positioned between them. The spatial graph convolution layer employs the standard GCN introduced in Chapter 2.4.4.2.

To extract temporal features, STGCN utilizes a 1D convolutional layer with gated linear units, adding non-linearity to the model.

Specifically, the convolution kernel $\Gamma \in \mathbb{R}^{K_t \times C_i \times 2C_o}$ is designed to map the input $Y$ to a single output element $[P \; Q] \in \mathbb{R}^{(M - K_t + 1) \times (2C_o)}$, where $P$ and $Q$ are split in half, each with the same number of channels. Thus, the temporal gated convolution can be defined as:

$$\Gamma \ast_{\mathcal{T}} Y = P \odot \sigma(Q) \in \mathbb{R}^{(M - K_t + 1) \times C_o} \tag{2.29}$$

where $P$ and $Q$ are the inputs of the gates in the GLU, and $\odot$ denotes the element-wise Hadamard product. The sigmoid gate $\sigma(Q)$ determines which input $P$ of the current states is relevant for identifying the compositional structure and dynamic variations in the time series. The non-linearity gates also aid in exploiting the full input field through stacked temporal layers. Additionally, residual connections are employed among stacked temporal convolutional layers.
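The split-and-gate mechanism of Eq. (2.29) can be sketched directly. The NumPy snippet below is an illustrative, unoptimized valid 1D convolution followed by a GLU; function and variable names are hypothetical:

```python
import numpy as np

def temporal_gated_conv(Y, Gamma):
    """Sketch of the temporal gated convolution with a GLU (Eq. 2.29).

    Y     : (M, C_i) input sequence of length M with C_i channels
    Gamma : (K_t, C_i, 2*C_o) kernel; the output channels are split in
            half into P (linear part) and Q (gate).
    """
    K_t, C_i, two_Co = Gamma.shape
    C_o = two_Co // 2
    M = Y.shape[0]
    out = np.empty((M - K_t + 1, two_Co))
    # Valid 1D convolution along the time axis (no padding).
    for t in range(M - K_t + 1):
        window = Y[t:t + K_t]                          # (K_t, C_i)
        out[t] = np.einsum('kc,kcd->d', window, Gamma)
    P, Q = out[:, :C_o], out[:, C_o:]
    return P * (1.0 / (1.0 + np.exp(-Q)))              # P ⊙ σ(Q)
```

Note that the output length shrinks to $M - K_t + 1$, exactly as in the equation, because no padding is applied along the time axis.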

STGCN represents one of the pioneering efforts in explicitly utilizing GCN, garnering significant interest and making substantial contributions to the field, ultimately transforming traffic prediction models.

2.5.1.3 Graph WaveNet

A brief and clear model design is Graph WaveNet, proposed by Wu et al. in 2019 [142]. The authors argue that existing studies have two significant shortcomings in capturing spatial and temporal dependencies simultaneously. First, prior studies assume that the graph structure of the data reflects the genuine dependency relationships among nodes. Second, prior studies may be ineffective in learning temporal dependencies due to the limitations of RNN and CNN methods.

To address these shortcomings, Graph WaveNet was proposed. This model features a graph convolution layer with a self-adaptive adjacency matrix, which can be learned from the data through end-to-end supervised training, thereby preserving hidden spatial dependencies. Moreover, stacked dilated causal convolutions are adopted to capture temporal dependencies, allowing the model to efficiently and effectively handle spatio-temporal graph data with long-range temporal sequences.


Figure 2.5: The architecture of Graph WaveNet [142]. Several spatio-temporal layers are successively arranged on the left, with an output layer situated on the right. A pair of gated temporal convolution modules (Gated TCN) operate sequentially to extract temporal characteristics, succeeded by a graph convolutional layer (GCN) devised to discern spatial features.

The model structure of Graph WaveNet is shown in Figure 2.5. It mainly consists of several GCN and TCN layers for spatio-temporal correlation learning. The modified GCN introduced in Chapter 2.4.4.2, with a generalized diffusion convolution, is employed to capture spatial features with a self-adaptive adjacency matrix.

The self-adaptive adjacency matrix can be viewed as a 2D parameter matrix learnable through stochastic gradient descent. In detail, this is accomplished by randomly initializing two node embedding dictionaries with learnable parameters, $E_1, E_2 \in \mathbb{R}^{N \times c}$. The self-adaptive adjacency matrix can be expressed as follows:

$$\tilde{A}_{adp} = \mathrm{SoftMax}\left( \mathrm{ReLU}\left( E_1 E_2^\top \right) \right) \tag{2.30}$$

Employing the ReLU activation function effectively eliminates tenuous connections, while the SoftMax function ensures normalization of the self-adaptive adjacency matrix. As a result, the normalized self-adaptive adjacency matrix may be regarded as the transition matrix of an inherent diffusion process. By amalgamating pre-established spatial dependencies and the self-learned adjacency matrix, the graph convolution layer used in Graph WaveNet can be represented as:

$$Z = \sum_{k=0}^{K} \left( P_f^k X W_{k1} + P_b^k X W_{k2} + \tilde{A}_{adp}^k X W_{k3} \right) \tag{2.31}$$

where $P_f$ and $P_b$ are the forward and backward transition matrices defined from the given graph adjacency matrix. In particular, $P_f = A / \mathrm{rowsum}(A)$ and $P_b = A^\top / \mathrm{rowsum}(A^\top)$. The superscript $k$ denotes the power of the transition matrix in the diffusion series.
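Equations (2.30) and (2.31) combine into a compact layer. The NumPy sketch below is illustrative only (function names, the epsilon guard, and the iterative power computation are assumptions, not the authors' implementation):

```python
import numpy as np

def softmax_rows(M):
    e = np.exp(M - M.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def self_adaptive_adj(E1, E2):
    """Eq. (2.30): A_adp = SoftMax(ReLU(E1 @ E2.T)), row-normalized."""
    return softmax_rows(np.maximum(E1 @ E2.T, 0.0))

def graph_wavenet_gcn(X, A, E1, E2, W1, W2, W3, K=2):
    """Sketch of the Graph WaveNet graph convolution (Eq. 2.31).

    X          : (N, D) node features
    A          : (N, N) given adjacency matrix
    W1, W2, W3 : lists of (D, D_out) weights, one per power k = 0..K
    """
    P_f = A / np.maximum(A.sum(axis=1, keepdims=True), 1e-10)
    P_b = A.T / np.maximum(A.T.sum(axis=1, keepdims=True), 1e-10)
    A_adp = self_adaptive_adj(E1, E2)
    Z = np.zeros((X.shape[0], W1[0].shape[1]))
    Tf, Tb, Ta = np.eye(X.shape[0]), np.eye(X.shape[0]), np.eye(X.shape[0])
    for k in range(K + 1):
        # k-th power terms of the forward, backward, and adaptive matrices.
        Z += Tf @ X @ W1[k] + Tb @ X @ W2[k] + Ta @ X @ W3[k]
        Tf, Tb, Ta = P_f @ Tf, P_b @ Tb, A_adp @ Ta
    return Z
```

Because SoftMax normalizes each row, every row of the self-adaptive matrix sums to one, which is what lets it act as a diffusion transition matrix.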

In each spatio-temporal layer, two TCN layers are adopted to extract temporal features, combined with an output gate. For a given input $\mathcal{X} \in \mathbb{R}^{N \times D \times S}$, the expression takes the form:

$$h = g\left( \Theta_1 \star \mathcal{X} + b \right) \odot \sigma\left( \Theta_2 \star \mathcal{X} + c \right),$$

where $\Theta_1$, $\Theta_2$, $b$, and $c$ represent model parameters, $\odot$ signifies the element-wise product, $g(\cdot)$ is the activation function for the outputs, and $\sigma(\cdot)$ denotes the sigmoid function, which determines the proportion of information conveyed to the subsequent layer.

2.5.1.4 ASTGNN

Unlike the previous studies listed above, the Attention based Spatial-Temporal Graph Neural Network (ASTGNN), proposed by Guo et al. in 2021 [54], relies heavily on the attention mechanism and the transformer model, as shown in Figure 2.6. In particular, a novel self-attention mechanism was devised within the temporal dimension, adept at harnessing local context and tailored for numerical sequence representation transformation. This empowers the prediction model to capture traffic data's temporal dynamics and benefit from global receptive fields, advantageous for long-term forecasting. In the spatial dimension, a dynamic graph convolution module was developed, utilizing self-attention to dynamically capture spatial correlations. Moreover, periodicity was explicitly modeled, and spatial heterogeneity was encapsulated through embedding modules.

Temporal trend-aware multi-head self-attention is a mechanism developed to tackle the insensitivity to local trends common in traditional multi-head self-attention models when forecasting numerical data. This method takes the local context into account and can be regarded as a form of convolutional self-attention. The projection operations on queries and keys are replaced with 1D convolutions, enabling the model to recognize the local changing trends concealed within traffic data series. By integrating this trend-aware self-attention mechanism, the model can appropriately align points based on their local trends, resulting in more precise long-term forecasts.

Figure 2.6: The architecture of ASTGNN [54]. Successive temporal trend-aware self-attention blocks and spatial dynamic GCN blocks are alternately combined in both the encoder and decoder.
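The idea of replacing the query/key projections with 1D convolutions can be sketched for a single head. This NumPy snippet is a simplified illustration (the 'same' padding scheme, single-head setup, and all names are assumptions, not ASTGNN's actual implementation):

```python
import numpy as np

def conv1d_same(X, kernel):
    """'Same'-length 1D convolution along time: X (T, d), kernel (k, d, d)."""
    k = kernel.shape[0]
    pad = k // 2
    Xp = np.pad(X, ((pad, k - 1 - pad), (0, 0)))
    return np.stack([np.einsum('kc,kcd->d', Xp[t:t + k], kernel)
                     for t in range(X.shape[0])])

def trend_aware_attention(X, Kq, Kk, Wv):
    """Single-head sketch of trend-aware self-attention: queries and keys
    come from 1D convolutions over local windows instead of pointwise
    projections, so attention scores reflect local trends.

    X : (T, d) sequence;  Kq, Kk : (k, d, d) conv kernels;  Wv : (d, d).
    """
    Q, K, V = conv1d_same(X, Kq), conv1d_same(X, Kk), X @ Wv
    scores = Q @ K.T / np.sqrt(X.shape[1])          # (T, T) trend-aware scores
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)   # row-wise softmax
    return weights @ V
```

With kernel size 1 this degenerates to ordinary self-attention, which is exactly the contrast the mechanism is designed around.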

Spatial dynamic graph convolution (DGCN) is a method designed to capture dynamics across the spatial dimension of traffic networks by adaptively adjusting the correlation strengths among nodes. Traditional GCNs are time-invariant, meaning the weight matrix is constant for a given graph. However, this approach fails to capture the changing correlations among nodes in traffic networks. DGCN addresses this issue by employing self-attention to dynamically calculate the spatial correlation strengths among nodes based on their input representations.

The spatial correlation weight matrix, which represents the correlation strength between nodes, is then used to adjust the static weight matrix with an element-wise dot-product operation. Consequently, the dynamic graph convolution blocks aggregate neighbor information based on the varying correlation matrix, resulting in a spatially informed output.
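The element-wise modulation described above can be sketched for a single time step. The following NumPy snippet is a rough illustration in the spirit of DGCN, not the paper's implementation; all parameter names and the row-softmax normalization are assumptions:

```python
import numpy as np

def dynamic_graph_conv(X, A_static, Wq, Wk, W):
    """Sketch of a spatial dynamic graph convolution: self-attention over
    node representations yields a time-varying correlation matrix S, which
    rescales the static adjacency element-wise before aggregation.

    X        : (N, d) node representations at one time step
    A_static : (N, N) static (normalized) adjacency matrix
    Wq, Wk   : (d, d) attention projections;  W : (d, d_out) output weights
    """
    Q, K = X @ Wq, X @ Wk
    scores = Q @ K.T / np.sqrt(X.shape[1])
    S = np.exp(scores - scores.max(axis=1, keepdims=True))
    S /= S.sum(axis=1, keepdims=True)       # row-normalized correlations
    A_dyn = A_static * S                    # element-wise modulation
    return A_dyn @ X @ W                    # aggregate neighbors, project
```

Because `S` is recomputed from each time step's representations, the effective adjacency `A_dyn` varies over time even though `A_static` is fixed.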

2.5.2 Next POI Recommendation Models