



10) Video Processing: In the domain of video processing, spatio-temporal data mining plays a significant role in extracting valuable insights and patterns from large volumes of video data. The typical ST data in this domain consists of sequences of images or frames captured over time, which together form a video.

Each frame contains spatial information in the form of pixel values, while the temporal aspect is captured through the chronological order of frames. Video processing in the context of STDM can be employed for various purposes, such as video summarization, object tracking, motion detection, and activity recognition.

By analyzing the spatio-temporal patterns within video data, researchers and practitioners can develop algorithms to identify and classify objects or actions, detect and track movement, and even predict future events or behaviors. This application domain is particularly relevant in areas such as surveillance, sports analytics, traffic management, and human-computer interaction [81, 121].

2.4 Preliminary of Deep Learning Methods in STDM

2.4.1 Recurrent Neural Networks and Their Variants

Recurrent Neural Networks (RNNs) are a class of deep learning models designed to process sequential data. They capture dependencies and temporal patterns within sequences, making them well-suited for various applications. However, standard RNNs face challenges with vanishing or exploding gradients when learning long-term dependencies, leading to the development of RNN variants such as Long Short-Term Memory (LSTM) networks and Gated Recurrent Units (GRUs). These variants introduce specialized gating mechanisms that facilitate the learning of longer-range dependencies, improving the overall performance and stability of the network.

In the context of spatio-temporal data mining, RNNs have been widely utilized to model and analyze the temporal aspects of the data, capturing complex dependencies and patterns over time and thereby improving the overall performance of models in various applications.

2.4.1.1 Recurrent Neural Networks

The central idea behind RNNs is the incorporation of recurrent connections, which enable the network to retain and process information from previous time steps, thus giving the model a form of memory. The basic RNN architecture can be mathematically described as follows:

h_t = \sigma(W_{xh} x_t + W_{hh} h_{t-1} + b_h)    (2.1)

y_t = W_{hy} h_t + b_y    (2.2)

where x_t denotes the input at time step t, h_t represents the hidden state at time step t, and y_t is the output at time step t. The weight matrices W_{xh}, W_{hh}, and W_{hy} are the learnable parameters of the model, and b_h and b_y are the bias terms.

The function σ represents the activation function, which is typically a non-linear function such as the hyperbolic tangent (tanh) or the sigmoid function.
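To make Eqs. (2.1) and (2.2) concrete, the sketch below performs one forward step of a vanilla RNN in NumPy and unrolls it over a short random sequence. It is a minimal illustration, not a reference implementation: the function name rnn_step, the layer sizes, the random initialization, and the use of tanh as the activation σ are all assumptions made for this example.

```python
import numpy as np

def rnn_step(x_t, h_prev, W_xh, W_hh, W_hy, b_h, b_y):
    """One forward step of a vanilla RNN, following Eqs. (2.1)-(2.2)."""
    # Hidden state: h_t = tanh(W_xh x_t + W_hh h_{t-1} + b_h)   -- Eq. (2.1)
    h_t = np.tanh(W_xh @ x_t + W_hh @ h_prev + b_h)
    # Output: y_t = W_hy h_t + b_y                               -- Eq. (2.2)
    y_t = W_hy @ h_t + b_y
    return h_t, y_t

# Illustrative sizes: input dim 3, hidden dim 5, output dim 2.
rng = np.random.default_rng(0)
n_in, n_hid, n_out = 3, 5, 2
W_xh = 0.1 * rng.standard_normal((n_hid, n_in))
W_hh = 0.1 * rng.standard_normal((n_hid, n_hid))
W_hy = 0.1 * rng.standard_normal((n_out, n_hid))
b_h, b_y = np.zeros(n_hid), np.zeros(n_out)

# Unroll over a toy sequence of length 4, carrying the hidden state forward.
h_t = np.zeros(n_hid)
for x_t in rng.standard_normal((4, n_in)):
    h_t, y_t = rnn_step(x_t, h_t, W_xh, W_hh, W_hy, b_h, b_y)
```

The same recurrence is unrolled through time during training; the repeated multiplication by W_{hh} in the backward pass is where the vanishing gradient issue discussed next originates.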

While RNNs have shown promising results in various applications, they suffer from a well-known issue called the vanishing gradient problem, which hinders their ability to learn long-range dependencies in the input data.

2.4.1.2 Long Short-Term Memory

Long Short-Term Memory (LSTM) networks are a variant of RNNs specifically designed to overcome the vanishing gradient problem that plagues traditional RNNs. They were introduced by Hochreiter and Schmidhuber in 1997 [62]. The key innovation of LSTMs is the introduction of memory cells and gating mechanisms, which


allow the network to selectively remember and forget information from previous time steps.

An LSTM consists of memory cells and three gates: input, output, and forget.

These gates control the flow of information within the network and are defined by the following equations:

i_t = \sigma(W_{xi} x_t + W_{hi} h_{t-1} + b_i)    (2.3)

f_t = \sigma(W_{xf} x_t + W_{hf} h_{t-1} + b_f)    (2.4)

o_t = \sigma(W_{xo} x_t + W_{ho} h_{t-1} + b_o)    (2.5)

\tilde{c}_t = \tanh(W_{xc} x_t + W_{hc} h_{t-1} + b_c)    (2.6)

where i_t, f_t, and o_t represent the input, forget, and output gates, respectively, at time step t. The weight matrices W_{xi}, W_{hi}, W_{xf}, W_{hf}, W_{xo}, and W_{ho} are the learnable parameters of the gates, and b_i, b_f, and b_o are the corresponding bias terms. The function σ denotes the sigmoid activation function, and x_t and h_{t-1} represent the input and previous hidden state, respectively.

The memory cell state at time step t, denoted by c_t, is updated using the following equation:

c_t = f_t \odot c_{t-1} + i_t \odot \tilde{c}_t    (2.7)

where ⊙ denotes element-wise multiplication, and \tilde{c}_t is the candidate cell state, which is a function of the input and previous hidden state. Finally, the hidden state at time step t, h_t, is calculated as follows:

h_t = o_t \odot \tanh(c_t)    (2.8)

Their ability to learn and maintain long-range dependencies makes them particularly suitable for tasks that require modeling complex temporal patterns.

Furthermore, LSTMs have inspired the development of other gated RNN architectures, such as GRUs, which also effectively capture long-range dependencies but with a simpler structure.
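As a sketch of how Eqs. (2.3)–(2.8) fit together, the NumPy code below performs a single LSTM forward step. The helper name lstm_step, the dictionary layout of the weights, and the chosen dimensions are assumptions made purely for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM forward step following Eqs. (2.3)-(2.8).

    W and b hold per-gate parameters, e.g. W["xi"], W["hi"], b["i"].
    """
    i_t = sigmoid(W["xi"] @ x_t + W["hi"] @ h_prev + b["i"])      # input gate,  Eq. (2.3)
    f_t = sigmoid(W["xf"] @ x_t + W["hf"] @ h_prev + b["f"])      # forget gate, Eq. (2.4)
    o_t = sigmoid(W["xo"] @ x_t + W["ho"] @ h_prev + b["o"])      # output gate, Eq. (2.5)
    c_tilde = np.tanh(W["xc"] @ x_t + W["hc"] @ h_prev + b["c"])  # candidate,   Eq. (2.6)
    c_t = f_t * c_prev + i_t * c_tilde                            # cell state,  Eq. (2.7)
    h_t = o_t * np.tanh(c_t)                                      # hidden,      Eq. (2.8)
    return h_t, c_t

# Illustrative dimensions: input dim 3, hidden dim 4.
rng = np.random.default_rng(0)
n_in, n_hid = 3, 4
W = {k: 0.1 * rng.standard_normal((n_hid, n_in if k[0] == "x" else n_hid))
     for k in ["xi", "hi", "xf", "hf", "xo", "ho", "xc", "hc"]}
b = {k: np.zeros(n_hid) for k in ["i", "f", "o", "c"]}

h_t, c_t = np.zeros(n_hid), np.zeros(n_hid)
for x_t in rng.standard_normal((5, n_in)):
    h_t, c_t = lstm_step(x_t, h_t, c_t, W, b)
```

Note that the three gates and the candidate state all read the same pair (x_t, h_{t-1}); only their parameters and nonlinearities differ, and the element-wise products (* here, ⊙ in the equations) implement the selective remembering and forgetting described above.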

2.4.1.3 Gated Recurrent Units

Gated Recurrent Units (GRUs) were proposed by Cho et al. in 2014 [32] as a simplified alternative to LSTMs. Similar to LSTMs, GRUs are designed to address the vanishing gradient problem commonly encountered in traditional RNNs. GRUs


employ gating mechanisms to selectively remember and forget information, allowing them to capture long-range dependencies effectively. The main difference between LSTMs and GRUs is that the latter has a simpler structure, which results in fewer learnable parameters and consequently faster training.

A GRU consists of two gates: update and reset. These gates control the flow of information within the network and are defined by the following equations:

z_t = \sigma(W_{xz} x_t + W_{hz} h_{t-1} + b_z)    (2.9)

r_t = \sigma(W_{xr} x_t + W_{hr} h_{t-1} + b_r)    (2.10)

where z_t and r_t represent the update and reset gates, respectively, at time step t. The weight matrices W_{xz}, W_{hz}, W_{xr}, and W_{hr} are the learnable parameters of the gates, and b_z and b_r are the corresponding bias terms. The function σ denotes the sigmoid activation function, and x_t and h_{t-1} represent the input and previous hidden state, respectively. The candidate hidden state at time step t, denoted by \tilde{h}_t, is calculated as follows:

\tilde{h}_t = \tanh(W_{xh} x_t + W_{hh}(r_t \odot h_{t-1}) + b_h)    (2.11)

where ⊙ denotes element-wise multiplication, and W_{xh}, W_{hh}, and b_h are the learnable parameters and bias term associated with the candidate hidden state. Finally, the hidden state at time step t, h_t, is updated using the following equation:

h_t = (1 - z_t) \odot h_{t-1} + z_t \odot \tilde{h}_t    (2.12)

Compared to LSTMs, GRUs have demonstrated competitive performance, making them a popular choice for tasks that require modeling complex temporal patterns.
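For comparison with the LSTM sketch above, the following minimal NumPy example implements one GRU forward step as in Eqs. (2.9)–(2.12); the helper name gru_step and the chosen dimensions are again illustrative assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gru_step(x_t, h_prev, W, b):
    """One GRU forward step following Eqs. (2.9)-(2.12)."""
    z_t = sigmoid(W["xz"] @ x_t + W["hz"] @ h_prev + b["z"])              # update gate, Eq. (2.9)
    r_t = sigmoid(W["xr"] @ x_t + W["hr"] @ h_prev + b["r"])              # reset gate,  Eq. (2.10)
    h_tilde = np.tanh(W["xh"] @ x_t + W["hh"] @ (r_t * h_prev) + b["h"])  # candidate,   Eq. (2.11)
    h_t = (1.0 - z_t) * h_prev + z_t * h_tilde                            # hidden,      Eq. (2.12)
    return h_t

# Illustrative dimensions: input dim 3, hidden dim 4.
rng = np.random.default_rng(0)
n_in, n_hid = 3, 4
W = {k: 0.1 * rng.standard_normal((n_hid, n_in if k[0] == "x" else n_hid))
     for k in ["xz", "hz", "xr", "hr", "xh", "hh"]}
b = {k: np.zeros(n_hid) for k in ["z", "r", "h"]}

h_t = np.zeros(n_hid)
for x_t in rng.standard_normal((5, n_in)):
    h_t = gru_step(x_t, h_t, W, b)
```

Relative to the LSTM sketch, the GRU maintains only a hidden state (no separate cell state) and uses six weight matrices instead of eight, which is where its reduction in learnable parameters and training cost comes from.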