
Spatio-temporal Data Types and Instances

A Survey of Spatio-temporal Data Mining

2.2 Spatio-temporal Data Types and Instances

2.2.1 Data Types

In a wide range of real-world applications, various spatio-temporal (ST) data types are encountered, each differing in terms of data collection and representation methods. This leads to distinct categories of STDM problem formulations and necessitates identifying the appropriate ST data type for effectively utilizing STDM techniques. Building upon and expanding the classification presented by Atluri et al. [3], here we categorize spatio-temporal data into five distinct types:

(1) event data, representing discrete events occurring at specific locations and times, such as crime incidents within a city; (2) trajectory data, capturing the movement paths of objects, for example, the patrol routes of police surveillance vehicles; (3) point reference data, which involve measurements of continuous ST fields at dynamic spatio-temporal reference sites, such as surface temperature measurements collected by weather balloons; (4) raster data, wherein observations of ST fields are collected at fixed cells within an ST grid, such as fMRI brain activity scans; and (5) video data, capturing a continuous visual representation of events and objects unfolding in both space and time. While the first two data types (events and trajectories) document discrete events and objects, the latter three (point reference, rasters, and videos) collect information on continuous or discrete ST fields.

1) Event Data: Event data in the context of spatio-temporal analysis typically refer to discrete occurrences in both space and time, usually with well-defined spatial locations and specific timestamps. These events may stem from a variety of sources, including social media, sensor networks, or natural occurrences such as earthquakes and forest fires. For instance, a disease outbreak event can be characterized by a tuple (o_i, t_i, v_i), where o_i represents disease outbreak conditions such as number of people infected, v_i refers to the outbreak location, and t_i indicates the timestamp. Figure 1 provides a visual representation of event data. Analyzing event data frequently involves identifying patterns, trends, and anomalies to gain a deeper understanding of the underlying processes and interconnections between events [34].
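As a minimal sketch of the event tuple (o_i, t_i, v_i) described above, the following Python snippet stores each event as a small record. The class name `STEvent`, the field names, the latitude/longitude encoding of v_i, and all values are illustrative assumptions, not part of the survey.

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass(frozen=True)
class STEvent:
    """One event (o_i, t_i, v_i): observation, timestamp, location."""
    observation: dict     # o_i, e.g. {"infected": 120} (hypothetical attribute)
    timestamp: datetime   # t_i
    location: tuple       # v_i, assumed here to be (latitude, longitude)

# Made-up outbreak events, listed in no particular order.
events = [
    STEvent({"infected": 120}, datetime(2023, 3, 1), (40.71, -74.00)),
    STEvent({"infected": 45}, datetime(2023, 3, 4), (34.05, -118.24)),
    STEvent({"infected": 300}, datetime(2023, 2, 20), (41.88, -87.63)),
]

# Event data carry no inherent order, so sort by timestamp before temporal analysis.
events_by_time = sorted(events, key=lambda e: e.timestamp)
```

Keeping the observation o_i as a free-form mapping is one possible design choice; a fixed feature vector would work equally well when all events share the same attributes.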

2) Trajectory Data: Trajectory data represent the movement of objects in space and time and are often composed of sequences of spatial locations with associated timestamps [172]. Examples of trajectory data include the movement of vehicles, animals, or people captured using GPS or other tracking technologies. Analyzing trajectory data enables researchers to study various aspects of mobility, such as patterns, behaviors, and interactions among moving objects [2]. This analysis is crucial in various domains, including transportation, ecology, and urban planning.

3) Point Reference Data: Point reference data, also known as point pattern data, consist of a set of spatial points that represent the locations of objects or events in a study area [38]. These data usually contain spatial coordinates and may also include additional attributes or temporal information. Point reference data are commonly used in spatial statistics to model spatial processes, such as the distribution of plants, animals, or disease occurrences, and to identify clusters or hotspots of activity.

4) Raster Data: Raster data represent spatial information in a grid format, where each grid cell, or pixel, contains a value that corresponds to a specific attribute or measurement. Raster data can be generated from various sources, such as remote sensing imagery, digital elevation models, or interpolated point data. Spatio-temporal raster data are particularly useful in environmental studies, where they can be used to model and analyze changes in land cover, vegetation, and climate over time.

5) Videos: Video data can be considered a specific type of spatio-temporal data, as they capture both spatial and temporal information through sequences of images, or frames [81]. Video analysis has become increasingly important in various fields, such as surveillance, sports, and human activity recognition, as it allows researchers to extract valuable insights from the captured scenes. Deep learning techniques, particularly CNNs and RNNs, have been widely employed for video-based spatio-temporal analysis, enabling the detection and recognition of objects, actions, and events in complex and dynamic environments [149].


2.2.2 Data Instances

In data mining and machine learning algorithms, the fundamental unit of data, known as a data instance, typically comprises a collection of observed features, sometimes accompanied by supervised labels. However, when examining ST data, the definition of instances becomes more complex and multifaceted, resulting in various STDM formulations. This section delves into the five main categories of ST instances commonly encountered in STDM problems, including points, trajectories, time series, spatial maps, and ST rasters [3, 131]. These instances serve as the basis for analyzing a wide array of problems and methods in STDM, which will be explored in greater detail later. Figure 2.1 shows the connections between different data types, data instances, data representations and popular DL methods.

[Figure 2.1. Columns: ST Data Type (ST Events, Trajectories, ST Point Reference, ST Raster, Videos); ST Data Instance (Points, Trajectories, Time Series, Spatial Maps, ST Raster); Data Representation (Sequence, Graph, Matrix, Tensor); DL Method (CNN, GNN, RNN/LSTM/GRU, Transformer, Hybrid, Others).]

Figure 2.1: A diagram illustrating the various categories of ST data instances that can be derived from ST data types. Furthermore, the visual highlights the possible representations of each data instance and the popular DL methods employed.

1) Points: Points are tuples consisting of spatial and temporal details for individual observations, as well as any related variables such as the types of crimes or severity of epidemic outbreak. They are commonly used in STDM analyses for event data or point reference data and can also form trajectories. For instance, a trajectory can be divided into multiple discrete points to determine the number of trajectories that have traversed a specific area during a particular time period.
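The counting example above can be sketched as follows. This is a hedged illustration: the trajectory identifiers, coordinates, and the function name `count_traversals` are hypothetical, and the check only looks at the sampled points of each trajectory, so a path that crosses the area between two samples would be missed.

```python
# Each trajectory is a time-ordered list of points (x, y, t); all values are made up.
trajectories = {
    "car_a": [(0.5, 0.5, 0), (1.5, 0.5, 1), (2.5, 0.5, 2)],
    "car_b": [(2.5, 2.5, 0), (2.5, 1.5, 1), (2.5, 0.5, 5)],
    "car_c": [(0.5, 2.5, 0), (0.5, 1.5, 1)],
}

def count_traversals(trajectories, x_range, y_range, t_range):
    """Count trajectories with at least one sampled point inside the
    rectangle [x0, x1) x [y0, y1) during the time window [t0, t1]."""
    (x0, x1), (y0, y1), (t0, t1) = x_range, y_range, t_range
    count = 0
    for points in trajectories.values():
        if any(x0 <= x < x1 and y0 <= y < y1 and t0 <= t <= t1
               for x, y, t in points):
            count += 1
    return count

# Only car_a has a sampled point in the area x in [2, 3), y in [0, 1) during t in [0, 3].
n = count_traversals(trajectories, (2, 3), (0, 1), (0, 3))
```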

2) Trajectories: Trajectories are a distinct category of ST data instances that involve moving objects. They can be depicted as multi-dimensional sequences consisting of time-ordered lists of visited locations and any additional recorded data.

3) Time series: Time series act as data instances in two ST data situations: ST raster data, where observations at each spatial cell in the ST grid are examined, and trajectory data, where multiple dimensions represent spatial identifiers (e.g., location coordinates) followed by moving objects over time.

4) Spatial maps: Spatial map data instances encompass data observations from all sensors within the entire ST field at each time stamp. For instance, traffic speed readings from all loop sensors placed on an expressway at time t constitute a spatial map data instance. On the other hand, ST raster data instances include measurements covering the complete range of locations and time stamps. In essence, an ST raster is composed of a collection of spatial maps.

5) ST raster data: ST raster data can be regarded as individual data instances in STDM analyses, covering measurements for all locations and time stamps. Various data instances can be derived from ST raster data based on specific applications and analytical needs. Firstly, for time series mining tasks, the measurements at a specific cell of the ST grid can be regarded as a time series. Secondly, the measurements of an ST raster at each time stamp can be treated as a spatial map. Lastly, an alternative approach involves considering the entirety of the measurements across all locations and time stamps for analysis, in which case the ST raster data itself serves as a data instance.
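The three views of an ST raster can be illustrated with a small sketch. It assumes a toy raster stored as a nested list indexed as `raster[t][row][col]`; the helper names `time_series_at` and `spatial_map_at` and all values are hypothetical.

```python
# Toy ST raster indexed as raster[t][row][col]: 3 time stamps on a 2 x 2 grid.
raster = [
    [[10, 11], [12, 13]],   # t = 0
    [[20, 21], [22, 23]],   # t = 1
    [[30, 31], [32, 33]],   # t = 2
]

def time_series_at(raster, row, col):
    """Measurements at one grid cell across all time stamps."""
    return [frame[row][col] for frame in raster]

def spatial_map_at(raster, t):
    """Measurements at all grid cells for one time stamp (returned as a copy)."""
    return [list(r) for r in raster[t]]

series = time_series_at(raster, 0, 1)    # time series instance for cell (0, 1)
snapshot = spatial_map_at(raster, 1)     # spatial map instance at t = 1
# The full `raster` object is itself the third kind of instance.
```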

2.2.3 Data Representations

For the five types of ST data instances described above, four common data representations are employed as input for various deep learning models: sequence, graph, 2D matrix, and 3D tensor, as shown in yellow in Figure 2.1. Different deep learning models necessitate distinct data representation types as input. Consequently, the representation of ST data instances depends on the specific data mining task being examined and the chosen deep learning approaches.

A single ST point is typically represented as a tuple or, in the context of machine learning, a vector. However, when dealing with a set of points, they can be represented using tensors where the first axis represents the temporal dimension, the second axis represents the spatial dimension, and the third axis refers to the feature vector. In most cases of STDM, ST points are not found in isolation, but rather as part of a set, which forms raster data.
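A minimal sketch of this tensor layout is shown below, with axes ordered as time, location, and feature. The input format (time index, location index, feature vector), the function name `points_to_tensor`, and the choice to fill unobserved entries with 0.0 are assumptions made for illustration.

```python
def points_to_tensor(points, n_times, n_locations, n_features, fill=0.0):
    """Arrange (time_index, location_index, features) points into a nested
    list tensor[time][location][feature]; unobserved entries keep `fill`."""
    tensor = [[[fill] * n_features for _ in range(n_locations)]
              for _ in range(n_times)]
    for t, loc, features in points:
        tensor[t][loc] = list(features)
    return tensor

# Made-up points: (time index, location index, (temperature, humidity)).
points = [
    (0, 0, (21.5, 0.60)),
    (0, 1, (19.0, 0.72)),
    (1, 0, (22.1, 0.58)),
]

tensor = points_to_tensor(points, n_times=2, n_locations=2, n_features=2)
shape = (len(tensor), len(tensor[0]), len(tensor[0][0]))
```

How missing observations should be handled (zero fill, masking, or imputation) depends on the downstream model and is left open here.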


Both trajectories and time series can be represented as sequences, although trajectories are sometimes depicted as a matrix with two dimensions corresponding to the row and column IDs of the grid ST field. Each matrix entry value indicates whether the trajectory traverses the corresponding grid region, which is a representation often used to facilitate the application of CNN models [126, 95]. While graphs can also be represented as matrices, we distinguish between graph and image matrix as separate data representation types. This distinction arises because graph nodes do not lie on a regular Euclidean grid as the entries of image matrices do, leading to entirely different methods for handling each representation type.
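The grid-matrix representation of a trajectory can be sketched as follows. The function name `trajectory_to_grid`, the square cell size, and the sample path are hypothetical, and only sampled points are marked, with no interpolation between them.

```python
def trajectory_to_grid(points, n_rows, n_cols, cell_size=1.0):
    """Binary matrix whose entry [row][col] is 1 if any sampled point of the
    trajectory falls inside that grid cell, and 0 otherwise."""
    grid = [[0] * n_cols for _ in range(n_rows)]
    for x, y in points:
        row, col = int(y // cell_size), int(x // cell_size)
        if 0 <= row < n_rows and 0 <= col < n_cols:   # ignore points off the grid
            grid[row][col] = 1
    return grid

# Made-up path given as (x, y) coordinates on a 3 x 3 grid of unit cells.
path = [(0.2, 0.3), (1.4, 0.6), (2.7, 1.1), (2.9, 2.5)]
grid = trajectory_to_grid(path, n_rows=3, n_cols=3)
```

The resulting matrix has the same layout as a single-channel image, which is what makes it a convenient input for CNN models.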

Spatial maps can take the form of either graphs or matrices, depending on the specific use case. For instance, in predicting urban traffic flow, data from a city's transportation network might be depicted as a graph [79] or as a cell region-level traffic flow matrix [96, 115].
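To make the two options concrete, the sketch below encodes the same spatial map once as a graph and once as a region-level matrix. The sensor names, road connectivity, cell assignment, and speed values are all assumed for illustration.

```python
# One spatial map: speed readings from four hypothetical sensors at one time stamp.
speeds = {"s1": 62.0, "s2": 55.0, "s3": 30.0, "s4": 48.0}

# Graph view: nodes are sensors, edges follow (assumed) road connectivity.
edges = [("s1", "s2"), ("s2", "s3"), ("s2", "s4")]
adjacency = {s: [] for s in speeds}
for u, v in edges:
    adjacency[u].append(v)
    adjacency[v].append(u)

# Matrix view: each sensor is assigned to a cell of a 2 x 2 grid and readings
# are averaged per cell; cells without sensors are set to 0.0.
cell_of = {"s1": (0, 0), "s2": (0, 1), "s3": (1, 1), "s4": (1, 1)}
sums = [[0.0, 0.0], [0.0, 0.0]]
counts = [[0, 0], [0, 0]]
for sensor, (r, c) in cell_of.items():
    sums[r][c] += speeds[sensor]
    counts[r][c] += 1
flow_matrix = [[sums[r][c] / counts[r][c] if counts[r][c] else 0.0
                for c in range(2)] for r in range(2)]
```

The graph view keeps the network topology explicit, while the matrix view trades it for a regular grid layout.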

Raster data, on the other hand, are usually portrayed as 2D matrices or 3D tensors. When using a matrix, the two dimensions represent locations and time steps, whereas a tensor consists of three dimensions: region, time stamp, and features. Although matrices offer a simpler data representation format compared to tensors, they lack the ability to capture spatial correlation information among locations. Nevertheless, both formats are widely employed for representing raster data [83, 174].
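The loss of spatial structure in the matrix form can be seen in a small sketch, assuming a single-feature raster stored as `raster[t][row][col]` and flattened row by row; the function name `raster_to_matrix` and the values are hypothetical.

```python
def raster_to_matrix(raster):
    """Flatten raster[t][row][col] into matrix[location][t], enumerating
    grid cells in row-major order."""
    n_rows, n_cols = len(raster[0]), len(raster[0][0])
    return [[frame[r][c] for frame in raster]
            for r in range(n_rows) for c in range(n_cols)]

raster = [
    [[1, 2], [3, 4]],   # t = 0
    [[5, 6], [7, 8]],   # t = 1
]

matrix = raster_to_matrix(raster)   # 4 locations x 2 time steps
```

Cells (0, 0) and (1, 0) are vertical neighbours on the grid, but after flattening they become rows 0 and 2 of the matrix, so their adjacency is no longer visible from the row indices alone.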