Automation in Construction
Interaction analysis for vision-based activity identification of earthmoving excavators and dump trucks
Jinwoo Kim a, Seokho Chi a,b,⁎, Jongwon Seo c
a Department of Civil and Environmental Engineering, Seoul National University, 1 Gwanak-Ro, Gwanak-Gu, Seoul, Republic of Korea
b The Institute of Construction and Environmental Engineering (ICEE), 1 Gwanak-Ro, Gwanak-Gu, Seoul, Republic of Korea
c Department of Civil and Environmental Engineering, Hanyang University, 222 Wangsimni-Ro, Seongdong-Gu, Seoul, Republic of Korea
⁎ Corresponding author at: Department of Civil and Environmental Engineering, Seoul National University, 1 Gwanak-Ro, Gwanak-Gu, Seoul, Republic of Korea. E-mail addresses: [email protected] (J. Kim), [email protected] (S. Chi), [email protected] (J. Seo).
ARTICLE INFO
Keywords: Vision-based, Activity identification, Interaction, Earthmoving operation, Excavator, Dump truck
ABSTRACT
Activity identification is an essential step to measure and monitor the performance of earthmoving operations.
Many vision-based methods that automatically capture and explain activity information from image data have been developed, offering economic advantages and analysis efficiency. However, the previous methods failed to consider the interactive operations among equipment and thus had limited applicability to operation time estimation for productivity analysis. To address this drawback, this research developed a vision-based activity identification framework that incorporates the interactive aspects of earthmoving equipment operation. The framework includes four main processes: equipment tracking, action recognition of individual equipment, interaction analysis, and post-processing. The interactions between excavators and dump trucks were examined because of their significant impact on earthmoving operations. TLD (Tracking-Learning-Detection) was adapted to track the heavy equipment. Spatio-temporal reasoning and image differencing techniques were then implemented to categorize individual actions. Third, interactions were interpreted based on a knowledge-based system that evaluates equipment actions and the proximity between operating equipment. Lastly, outliers and noisy results were filtered out considering work continuity. To validate the proposed framework, two experiments were performed:
one with the interaction analysis and the other without it. A total of 11,513 image frames from actual earthmoving sites were tested. The resulting average precision of the activity analysis was enhanced from 75.68% to 91.27% after the interaction analysis was applied. In conclusion, this research contributes to identifying critical elements that explain interactive operations, characterizing the vision-based activity identification framework, and improving the applicability of the vision-based method for automated equipment operation analysis.
1. Introduction
Activity identification classifies types of equipment operations such as working, traveling, and idling. It is vital to monitor the performance of on-site earthmoving operations, as sequential information on equipment actions can serve as an indicator to calculate direct work rates and cycle durations [1–4]. Based on the information gained from activity identification, site managers can make project-related decisions (e.g., resource allocation, path planning, scheduling, and site layout analysis) more efficiently and effectively [5–9]. Such insight can also be utilized to assess productivity and reduce the idling time of earthmoving equipment; a slight reduction of the operation cycle has a significant impact on productivity because the operations are normally repetitive by nature [10].
In the past, activity identification and analysis have commonly been
performed manually. However, such a human-dependent approach has several salient shortcomings: it is expensive and time-consuming, and thus prone to yield an inconsistent set of data [5]. To overcome such limitations, automated activity identification systems have been introduced [11–12]. One of the most popular systems is the radio-based method, which obtains on-site data using RFID (Radio Frequency Identification), GPS (Global Positioning System), UWB (Ultra-Wideband), and BLE (Bluetooth Low Energy) [13–15]. The radio-based method categorizes equipment activities based on equipment types, locations, and movements (i.e., acceleration, velocity, orientation). On the other hand, many researchers also pay attention to vision-based methods that capture and explain activity information from image datasets [16–22]. Image data contains detailed information about the equipment's actions as well as its types and locations [23]. The action information is a critical cue for classifying operation types. That is the
information that radio signals have difficulty providing; a radio signal often cannot identify activity types precisely when equipment is located at the same position but performs different operations, such as a dump truck waiting for soil loading or an excavator rotating without a change in its center of gravity. Furthermore, the increased accessibility of image data, in line with the installation of CCTVs on construction sites for monitoring purposes [9,23], makes vision-based approaches more practical and feasible. For instance, in Korea, the installation of cameras on construction sites for on-site monitoring is prescribed by law [24].
Due to this increased attention and these inherent advantages, a range of vision-based methods have been developed, and they have shown promising performance. However, the previous methods did not fully investigate interactive operations among equipment, which is crucial for real-world activity identification and productivity analysis. To address this drawback, this research developed a vision-based activity identification framework that considers the interactive operations of earthmoving equipment. Fig. 1 illustrates the concept of interaction. An excavator and a dump truck on the right-hand side of Fig. 1 work together, loading and unloading soil, even though the dump truck 'stops'. If another dump truck arrives at the working area, as shown on the left-hand side, its activity should be categorized as 'idling' according to the concurrent working status of the other equipment. This indicates that the activity states of equipment can influence and be influenced by other equipment.
Many researchers have also perceived the importance of interactive operations, and several studies have made efforts to develop vision-based activity identification methods [10,25]. However, previous studies focused on certain qualities of interactions (e.g., interactions based on the equipment's proximity) under controlled environments (e.g., a fixed number of equipment in the image frame). This limited their findings from providing a comprehensive explanation of the complex interactions between multiple pieces of equipment. The present study aims to fill these existing knowledge gaps. First, this study reviews and identifies critical elements for understanding the interactive operations of excavators and dump trucks. Second, it proposes a vision-based framework for automated activity analysis of the heavy equipment through further consideration of the identified elements. Third, this framework can provide continuous information on the types of equipment operations, which is fundamental data for calculating cycle times and measuring equipment productivity. Finally, these approaches are expected to improve the practicality of vision-based activity analysis on actual earthmoving sites.
2. Theoretical background and related works
2.1. Earthmoving operations
Earthmoving operations are fundamental construction processes that relocate soil from one site to another [26–27]. A single earthmoving event involves a series of specific activities such as excavating, loading, hauling, and dumping the soil. In order to successfully perform such earthmoving operations, excavators and dump trucks are essential input resources.
Excavators are assigned to the construction site to cut soil and hand it over to dump trucks. During the operational process, excavators can carry out one of three types of activities, as illustrated in Fig. 2: 'working', 'traveling', and 'idling' [28–31]. Excavators are 'working' under the cycle of swinging, loading, hauling, and unloading; within this cycle, excavators go through specific individual actions, including 'scooping', 'rotating', and 'dropping' [32]. These sets of individual actions mean that excavators can either work alone (e.g., soil preparation and excavation) or interact with dump trucks (i.e., soil loading). The 'traveling' activity indicates that excavators are 'moving' to other positions, such as working areas or parking stations [33]. When excavators are neither 'working' nor 'traveling', it can be interpreted that they are 'idling' with 'stopping' actions [34].
Fig. 1. Examples of interactions between earthmoving equipment.
Fig. 2. Earthmoving activities and actions of earthmoving equipment (activities: work, travel, and idle; actions: move, scoop/rotate/drop, and stop).
In the meantime, dump trucks mainly move to deliver soil from one site to the other [31]. Two types of activities are involved; one is
'working' and the other is 'idling'. One crucial characteristic of dump trucks' operation is that their 'working' status can be both stationary and mobile. Unlike excavators, traveling of dump trucks falls under the 'working' category since dump trucks travel to relocate loads of soil [35]. The working process of dump trucks is composed of specific actions such as filling, traveling, and dumping [5,30–31,36]. During filling and dumping, trucks are 'stopping'. In other words, when they are 'working', they can either be 'moving' (for traveling) or 'stopping' (for filling and dumping) [37]. However, dump trucks may also take 'stopping' actions when their activity status shifts to 'idling' [38]; this can happen when they wait for the next soil-filling order while other excavators and dump trucks are busy (Figs. 1, 2). This implies that a dump truck's 'stopping' motion may occasionally indicate an 'idling' status instead of 'working'.
The actions and activities of excavators and dump trucks also imply that there is a significant amount of interaction between these two types of equipment. When excavators are 'working', particularly interacting with dump trucks, they unload soil onto dump trucks. Likewise, dump trucks fill with soil from excavators when they are 'working'. Both pieces of equipment are able to fully perform their tasks through these one-to-one interactions [39–40]. When excavators and dump trucks co-exist, their individual actions are also consistent, in that the excavator is 'scooping/rotating/dropping' while the dump truck is 'stopping'; to put it differently, action consistency between equipment needs to be met in this case to explain filling operations. When action inconsistency happens (e.g., both the excavator and the dump truck are stopping), the activity is classified as 'idling'. In addition to the consistency factor, another key concept to illustrate the interaction is the proximity between equipment. This means that excavators and dump trucks should stand within the effective distance of the excavators to hand over soil (i.e., arm's length). This aspect makes it possible to filter out unrealistic cases; for example, 'stopping' dump trucks 30 m apart from 'scooping/rotating/dropping' excavators, even though their actions are consistent. In addition to these one-to-one interactions, group interactions can also occur among three or more pieces of equipment involved in the operation process. When two dump trucks keep 'stopping' near an excavator, only one of them is normally being filled with soil. In this case, the two dump trucks take the same action ('stopping'); however, their activities are different ('working' and 'idling'), as shown in Fig. 1.
To summarize, excavators perform three operation types ('working', 'idling', and 'traveling'), while dump trucks perform two operation types ('working' and 'idling'). These activities for each equipment type are performed through diverse individual actions and through one-to-one or group interactions within the working area [36,41–44]. Under such interactive operations, the types of equipment activities can be effectively inferred using both individual aspects (i.e., object type, location, and individual action) and interactive characteristics (i.e., co-existence, proximity, and action consistency).
2.2. Related works and limitations
Numerous computer vision techniques have been developed in order to accurately recognize target objects' actions and activities.
There are three main categories: space-time, shape-based, and rule-based approaches [45]. Space-time approaches recognize actions by using spatio-temporal features such as optical flows and trajectory changes within consecutive image frames [46–48]. Shape-based approaches determine actions by applying gesture/appearance-based features such as oriented histograms, bags-of-rectangles, and skeleton models [49–53]. Lastly, rule-based approaches utilize pre-defined knowledge of behaviors (i.e., a set of if-then rules) to recognize actions [54–55].
Many researchers have made efforts to apply such computer vision approaches to construction sites, especially for equipment action and activity recognition. Zou and Kim [33], for instance, proposed a method to calculate a hydraulic excavator's idle time using the hue, saturation, and value color spaces. Kinematic features were also used to identify an excavator's activities by representing the articulated shapes of the excavator [28]. That study determined activity types by analyzing the distance and elevation changes of the detected parts (e.g., bucket, body, and joint) of the excavator. Gong and Caldas [56] analyzed the cycle time of concrete placement operations by detecting a concrete bucket and tracking its movement. The working cycle of a tower crane was also investigated by [57] through tracking the three-dimensional locations of its jibs and body. On the other hand, machine learning based methods (e.g., Bags-of-Features) were implemented by [58,5] for the purpose of recognizing equipment's individual actions. These methods, however, required a large amount of training data in order to learn robust action recognizers for various operation types. For analyzing the interactions between heavy machinery, Azar et al. [10] developed a vision-based approach to estimate the dirt-loading cycle of an excavator and a dump truck by detecting one-to-one interactions. However, they faced difficulties not only in continuously identifying the types of equipment operations but also in recognizing group interactions between multiple equipment. Recently, Bugler et al. [25] proposed an automated productivity assessment method for earthmoving operations based on photogrammetry and video analysis. It determined activity types by considering the proximity between equipment and their actions; nonetheless, the excavator's 'scooping/rotating/dropping' actions, which are fundamental information for classifying 'working' excavators even when their center of gravity is in a fixed position, could not be explained.
The aforementioned research showed promising results for construction applications and built a strong foundation for automated activity analysis based on vision techniques. In the early stages, researchers paid attention to identifying the actions of single pieces of equipment such as excavators and tower cranes. More recently, many researchers have made efforts to analyze site productivity based on the identified information of equipment actions. However, challenging issues still remain in the classification of equipment operations in complex, real-world working environments. One major issue is that most previous studies mainly focused on the individual characteristics (e.g., equipment shapes, orientations, and locations) of single equipment, but did not fully consider the interactions among earthmoving equipment. This issue limits the applicability of the previous findings to the actual construction context, since interactive aspects (e.g., co-existence, proximity, and action consistency between equipment) as well as individual features (e.g., object types, locations, and actions) collectively function as significant cues for activity identification. For instance, without the interaction analysis, it is difficult to explain whether a 'stopped' dump truck is 'working' (filling soil from an excavator) or 'idling' (waiting for the next soil-filling order due to an excavator-to-dump truck interaction). Such limitations have led previous studies to make only partial technical advancements.
3. Research framework
The research framework in Fig. 3 highlights the interaction analysis to classify types of equipment activities. The framework includes four main modules: equipment tracking, action recognition of individual equipment, interaction analysis, and post-processing. First, the locations of excavators and dump trucks are tracked over time to collect two types of data: equipment types and their trajectories. Second, individual actions of the tracked objects are recognized using spatio-temporal reasoning and the image differencing technique. Third, interactions of excavators and dump trucks are then analyzed based on factors such as co-existence, proximity, and action consistency. Finally, post-processing is conducted to reduce recognition errors by considering the continuity of equipment activities. Details of each module are described in this section.
3.1. Construction equipment tracking
The purpose of this module is to obtain information on equipment types and trajectories. An extended version of TLD (Tracking-Learning-Detection), developed by [59,60], was adapted and customized in this research to track multiple pieces of construction equipment over the long term. TLD consists of two main processes: functional integration and online learning [61]. Fig. 4 illustrates the methodology of the adapted TLD.
Functional integration localizes target objects using both a pre-trained detector and a tracker that analyzes sequential images. This makes it possible to track construction equipment with dynamic movements and high interclass/intraclass variations. The detector, learned from well-developed training data, can manage sudden changes of objects and environments, since it independently detects and tracks target objects in each frame [62–64]. Therefore, long-term tracking of equipment becomes possible even though construction equipment involves a series of abrupt characteristic changes in terms of colors, shapes, orientations, and velocities. However, it is very difficult to pre-develop high-quality training datasets [65]. Once new or unexpected events that are not part of the training data arise, detection errors may occur. On the other hand, the tracker analyzing sequential image frames can compensate for such detector errors, since the sequential analysis localizes the most similar region among the consecutive motion-based and feature-representation-based images. In this way, the tracker is able to adapt to gradual changes of object characteristics; hence, possible detection failures can be prevented by the adaptation of the tracker. Conversely, the tracker also has shortcomings that can be mediated by the detector:
it is sensitive to sudden changes of an object's movements and shapes.
Since the sequential analysis is based on the similarity of an object's motion and representation, noise (e.g., background shadow) has negative
impacts on recognition; hence, the tracker can easily miss target objects and fail to re-track them. Such failures can be counterbalanced by the detector's strengths, which are robustness to abrupt changes and dynamic movements.
Along with functional integration, online learning is also a key process for the long-term tracking of construction equipment [61]. It is a machine learning technique for generating and reinforcing a detector with sequentially updated training data [66–67]. To track construction objects using a detector, a major challenge is to collect high-quality training datasets that cover high interclass/intraclass variations [61]. In this study, the training data was newly generated on site through the functional integration process, in an attempt to address this challenge. To be specific, false positive and false negative errors of the detector were added to the training data as negative samples to prevent the occurrence of similar errors. On the other hand, true positive results were added as positive samples with higher weights for recognition.
In sum, the functional integration enhances the detection and tracking performance by compensating for the weaknesses of a detector and a tracker. The online learning trains and customizes a detector, and develops and updates training data in real time without pre-works. Based on these two processes, TLD is able to recognize and track construction equipment that has dynamic motion changes and various characteristics (e.g., shapes, colors, and textures). Since it was originally developed for flat objects such as human eyes or license plates in less changeable environments, the authors customized it to improve its practical applicability to the construction site. The technical details of the customization can be found in the authors' other publication, Kim and Chi [61].
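The authors implemented this module in C++ (Section 4.3); the following Python-style sketch is only an illustration of the per-frame loop described above, in which `detector`, `tracker`, and `iou` are hypothetical components standing in for a pre-trained equipment detector, a frame-to-frame tracker, and a bounding-box overlap measure.

```python
# Illustrative sketch of a TLD-style loop (not the authors' code).
def track_equipment(frames, detector, tracker, iou, iou_thresh=0.5):
    training_set = {"positive": [], "negative": []}
    results, prev_box = [], None
    for frame in frames:
        det_box = detector.detect(frame)            # independent per-frame detection
        trk_box = tracker.update(frame, prev_box)   # sequential (motion/feature) tracking

        # Functional integration: prefer agreement; otherwise fall back to either source.
        if det_box and trk_box and iou(det_box, trk_box) >= iou_thresh:
            box = det_box                                      # detection confirmed by the tracker
            training_set["positive"].append((frame, box))      # online learning: reinforce positives
        elif det_box and trk_box:
            box = trk_box                                      # disagreement: trust gradual tracking
            training_set["negative"].append((frame, det_box))  # treat the detection as a likely error
        else:
            box = det_box or trk_box                           # one source missed the target

        detector.retrain(training_set)              # sequentially updated training data
        results.append(box)
        prev_box = box
    return results
```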
Fig. 3. Research framework with interactive operations.
Fig. 4. Equipment tracking methodology.
3.2. Action recognition of individual equipment
This module recognizes individual actions of construction equipment using spatio-temporal reasoning and image differencing (Fig. 5).
For spatio-temporal reasoning, the object's centroid is compared across sequential images to classify 'moving' or 'non-moving' actions. Image differencing is then performed to detect shape changes of excavators and to further classify 'non-moving' actions into sub-action categories: 'scooping/rotating/dropping' and 'stopping'.
The centroid change between image frames can be calculated based on the tracking results (locations over time). The centroid indicates the center point of the 2D bounding box of an object, and its change ratio is calculated by Eq. (1) to account for bi-directional movements.
Additionally, in order to reduce the scale dependency on the equipment-to-camera distance, the centroid's coordinate change is divided by the diagonal length of the tracking bounding box (Eq. (1)).
\text{Centroid change ratio}_i = \frac{\sqrt{(x_{i+1} - x_i)^2 + (y_{i+1} - y_i)^2}}{L_i} \quad (1)

where x_i: the x-axis position of the centroid at the i-th tracking result; y_i: the y-axis position of the centroid at the i-th tracking result; L_i: the diagonal length of the bounding box at the i-th tracking result.
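As a concrete illustration of Eq. (1) (a sketch added here, not code from the paper), the snippet below computes the centroid change ratio for two consecutive bounding boxes given as (x, y, width, height) tuples:

```python
import math

def centroid_change_ratio(box_i, box_next):
    """Eq. (1): centroid displacement between consecutive tracking results,
    normalized by the diagonal length of the current bounding box.
    Boxes are (x, y, w, h) with (x, y) the top-left corner in pixels."""
    (x1, y1, w1, h1), (x2, y2, w2, h2) = box_i, box_next
    cx1, cy1 = x1 + w1 / 2.0, y1 + h1 / 2.0   # centroid of the i-th box
    cx2, cy2 = x2 + w2 / 2.0, y2 + h2 / 2.0   # centroid of the (i+1)-th box
    diagonal = math.hypot(w1, h1)             # L_i, diagonal of the i-th box
    return math.hypot(cx2 - cx1, cy2 - cy1) / diagonal

# Example: a 200 x 100 pixel box whose centroid shifts by 15 pixels gives a ratio of
# about 0.067 (6.7% per frame), below the 10% 'moving' threshold quoted for excavators.
print(centroid_change_ratio((100, 100, 200, 100), (115, 100, 200, 100)))
```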
A pre-defined threshold value then determines whether the tracked object is 'moving' or 'non-moving' by comparing it with the calculated centroid change ratio. The threshold value is defined based on the geometric relationship between the global (world) coordinates and the local (image) coordinates (Eq. (2.1)).
s \begin{bmatrix} x \\ y \\ 1 \end{bmatrix} = T \begin{bmatrix} X \\ Y \\ Z \\ 1 \end{bmatrix} \quad (2.1)

where s: scale factor; T: transformation matrix; [x, y]: pixel coordinates; [X, Y, Z]: world coordinates.
The equation explains that a 3D global point P(X, Y, Z) can be transformed to the point p(x, y) in the image plane by the transformation matrix T. With the decomposition of T, Eq. (2.1) can be formulated as below (Eq. (2.2)).
s \begin{bmatrix} x \\ y \\ 1 \end{bmatrix}
= K \, T_{\mathrm{pers}}(1) \, [R \,|\, t] \begin{bmatrix} X \\ Y \\ Z \\ 1 \end{bmatrix}
= \begin{bmatrix} f_x & 0 & c_x \\ 0 & f_y & c_y \\ 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix}
\begin{bmatrix} r_{11} & r_{12} & r_{13} & t_x \\ r_{21} & r_{22} & r_{23} & t_y \\ r_{31} & r_{32} & r_{33} & t_z \\ 0 & 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} X \\ Y \\ Z \\ 1 \end{bmatrix} \quad (2.2)

where [R|t]: rigid projection matrix; T_pers(1): normalization matrix; K: transformation matrix from normalized coordinates to pixel coordinates; f_x, f_y: focal length in each direction; c_x, c_y: principal point in each direction; r_ij: the rotated angle in the i–j direction; t_x, t_y, t_z: translation in each direction.
Fig. 5. Methodology for action recognition of individual equipment.
By integrating T_pers(1) and [R|t], Eq. (2.2) is expressed as Eq. (2.3).

s \begin{bmatrix} x \\ y \\ 1 \end{bmatrix} = K \, [R \,|\, t] \begin{bmatrix} X \\ Y \\ Z \\ 1 \end{bmatrix} \quad (2.3)

where K: intrinsic parameters; [R|t]: extrinsic parameters.
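To make the projection of Eqs. (2.1)–(2.3) concrete, the following Python sketch maps a world point into pixel coordinates. The calibration values (focal length, principal point, camera pose) are hypothetical choices made for illustration only, not values from the paper.

```python
import numpy as np

# Hypothetical calibration values for illustration only.
K = np.array([[800.0,   0.0, 640.0],    # f_x, 0,   c_x
              [  0.0, 800.0, 360.0],    # 0,   f_y, c_y
              [  0.0,   0.0,   1.0]])
R = np.eye(3)                            # camera aligned with the world axes
t = np.array([0.0, 0.0, 20.0])           # camera 20 m from the scene along Z

def project(point_world):
    """Eq. (2.3): s * [x, y, 1]^T = K [R|t] [X, Y, Z, 1]^T."""
    p_cam = R @ point_world + t          # extrinsic: world -> camera coordinates
    p_img = K @ p_cam                    # intrinsic: camera -> homogeneous pixel coordinates
    return p_img[:2] / p_img[2]          # divide by the scale factor s

# A 0.42 m horizontal displacement (3 km/h over one frame at 2 fps), seen at 20 m,
# moves the image point by roughly 17 pixels in this hypothetical setup.
print(project(np.array([0.0, 0.0, 0.0])), project(np.array([0.42, 0.0, 0.0])))
```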
Finally, the 3D global coordinates are transformed to the local image coordinates using the camera's intrinsic and extrinsic parameters, which can be obtained during calibration. According to the official equipment specifications of the Hyundai, Volvo, and Doosan manufacturers, the excavator's minimum moving speed and average diagonal length are assumed to be 3 km/h and 4–7 m; the range of the threshold value can then be calculated using Eqs. (1) and (2.3). A threshold of 10% per frame was finally optimized after the experiment with 11,513 images, given two frames per second. This frame rate is suitable not only for tracking the actions of an excavator (e.g., moving and non-moving) operating at a velocity of 3 km/h but also for increasing computational efficiency for real-time processing. After the spatio-temporal reasoning, image differencing is applied to further classify 'non-moving' into 'scooping/rotating/dropping' or 'stopping'. The shapes of excavators change during 'scooping/rotating/dropping', whereas they do not change during 'stopping'. Image differencing can detect such shape changes between consecutive image frames by calculating the sum of the absolute pixel value differences (Eq. (3)). Fig. 6 also illustrates this calculation process. To eliminate the effects of different scales, each image is first resized to the size of the previous image.
\text{Sum ratio of absolute differences} = \frac{\sum_j \sum_k \left| p_{jk}^{\,i+1} - p_{jk}^{\,i} \right|}{A_i} \quad (3)

where p_{jk}^{i}: the pixel value at row j and column k in the bounding box of the i-th tracking image; A_i: the area of the bounding box at the i-th tracking result.
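The sketch below illustrates Eq. (3) with OpenCV. It is an assumed implementation (the grayscale conversion and the function names are choices made here, not taken from the paper).

```python
import cv2
import numpy as np

def sum_ratio_of_absolute_differences(patch_i, patch_next):
    """Eq. (3): summed absolute pixel change between consecutive bounding-box crops,
    normalized by the bounding-box area of the i-th tracking result."""
    # Resize the newer patch to the older patch's size to remove scale effects.
    h, w = patch_i.shape[:2]
    patch_next = cv2.resize(patch_next, (w, h))
    gray_i = cv2.cvtColor(patch_i, cv2.COLOR_BGR2GRAY)
    gray_next = cv2.cvtColor(patch_next, cv2.COLOR_BGR2GRAY)
    diff = cv2.absdiff(gray_next, gray_i)        # |p^{i+1}_{jk} - p^{i}_{jk}|
    return float(np.sum(diff)) / (h * w)         # divide by the area A_i

# A 'scooping/rotating/dropping' excavator yields a noticeably larger ratio than a
# 'stopping' one; the ratio is compared against a pre-defined threshold (cf. Fig. 5).
```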
Dump trucks have two individual action types: 'moving' and 'stopping'. In this study, it was assumed that there is no critical shape change of the dump truck during 'stopping', so the analysis focused on the spatio-temporal reasoning. The centroid change rate is calculated and compared to the pre-defined threshold value of 20% per frame at two frames per second (Eqs. (1) and (2.3)), under consideration of the dump truck's velocity (6 km/h).
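As a rough back-of-envelope check of these thresholds (an illustration added here, not a calculation reported in the paper): at 2 frames per second, an excavator moving at its minimum speed of 3 km/h (≈ 0.83 m/s) travels about 0.42 m per frame, i.e.,

\frac{0.42\ \text{m}}{4\text{–}7\ \text{m}} \approx 6\text{–}10\%\ \text{of the diagonal length per frame},

which is consistent with the 10% threshold. For a dump truck at 6 km/h (≈ 1.67 m/s), the per-frame displacement is about 0.83 m; assuming a comparable on-screen diagonal (the truck's dimensions are not stated in the paper), this is on the order of 20% per frame, consistent with the 20% threshold.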
3.3. Interaction analysis
Using the extracted information on object types, locations, and individual actions, this module analyzes interactions among equipment.
Knowledge (rule)-based decision making was used to analyze both one-to-one and group interactions. In recognition of this interactive nature, factors such as the co-existence, proximity, and action consistency between excavators and dump trucks were investigated and adopted during the classification of equipment activity types. As a result, more precise performance indicators (e.g., cycle time and direct work rates) can be obtained; for instance, interaction information can clearly distinguish whether excavators work 'alone' or 'together' with dump trucks.
Fig. 7 illustrates the process of activity recognition. The 'moving' and 'stopping' actions of an excavator are defined as 'traveling' and 'idling', respectively (Fig. 7), because the excavator's individual action varies according to each of these activities. However, for actions such as 'scooping/rotating/dropping', the interaction analysis enables an activity identification system to decide whether the excavators are 'working' alone (e.g., soil preparation and excavation) or together (i.e., loading soil onto dump trucks). When excavators and dump trucks are 'working' together, the dump trucks should be placed within the operating boundary of the excavators (i.e., arm's length). If there is at least one dump truck within the same image frame, the distances between the centroids of the excavator and all the dump trucks are computed (Eq. (4)). Based on the distances, the dump truck nearest to the excavator is selected.
\text{Distance} = \sqrt{(x_{\mathrm{exc.}} - x_{\mathrm{dump}})^2 + (y_{\mathrm{exc.}} - y_{\mathrm{dump}})^2} \quad (4)

where x_object type: the x-axis position of the centroid of each object type; y_object type: the y-axis position of the centroid of each object type.
The proximity thresholds were optimized as 150 and 200 pixels after the experiments with 11,513 images; as with the previous Eqs. (2.1)–(2.3), determining the thresholds required the camera's intrinsic and extrinsic parameters, and the average length of excavators' arms was taken as between 8 and 15 m. Action consistency should also be considered, since co-existence and proximity alone are insufficient evidence to guarantee the interactions. Even if excavators are 'scooping/rotating/dropping', dump trucks can 'move'; in this case, no interaction takes place between the two pieces of equipment. Excavators and dump trucks are 'working' together only when they show action consistency for filling soil within the effective distance; in other words, the excavators are 'scooping/rotating/dropping' while the dump trucks are 'stopping' for the interaction. Based on this principle, the proposed framework can confidently decide whether a 'scooping/rotating/dropping' excavator is interacting with a dump truck or working alone (e.g., soil preparation).
'Moving' dump trucks can be classified as 'working'. Yet, 'stopping' dump trucks can be classified as either 'working' or 'idling' according to their interaction with excavators. Similar to the interaction analysis of excavators, the co-existence, proximity, and action consistency are examined in order. When no excavator is captured in the same image, 'stopping' dump trucks should be 'idling'. Otherwise, the distances between the dump trucks and the existing excavators are calculated in the 2D images (Eq. (4)). The distances are used to select the nearest dump truck within the operating boundary of each excavator. Only after selecting the nearest 'stopping' dump truck can the excavator's individual action be accurately confirmed. If all the conditions are met, the 'stopping' dump trucks are then classified as 'working', and vice versa. These processes enable the research framework to identify 'working' dump trucks interacting with excavators (one-to-one interactions) and 'idling' dump trucks waiting for their filling order due to an already interacting pair (group interactions).
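A minimal sketch of the rule-based classification just described, assuming a simple per-frame record for each tracked object (the dictionary structure and the 150-pixel default are illustrative assumptions; the paper reports thresholds of 150 and 200 pixels):

```python
def classify_activity(obj, others, operating_boundary=150):
    """Knowledge-based interaction rules of Fig. 7 (illustrative sketch).
    Each object is a dict: {'type': 'excavator'|'dump_truck',
                            'action': 'move'|'stop'|'scoop_rotate_drop',
                            'centroid': (x, y)}."""
    def dist(a, b):
        ax, ay = a['centroid']
        bx, by = b['centroid']
        return ((ax - bx) ** 2 + (ay - by) ** 2) ** 0.5        # Eq. (4)

    if obj['type'] == 'excavator':
        if obj['action'] == 'move':
            return 'traveling'
        if obj['action'] == 'stop':
            return 'idling'
        # 'scoop_rotate_drop': working alone or together, depending on nearby stopping trucks.
        trucks = [o for o in others if o['type'] == 'dump_truck']
        near = [t for t in trucks
                if dist(obj, t) < operating_boundary and t['action'] == 'stop']
        return 'working (interacting)' if near else 'working (alone)'

    # Dump truck: 'moving' is always 'working'; 'stopping' depends on the excavators.
    if obj['action'] == 'move':
        return 'working (travel)'
    for exc in (o for o in others if o['type'] == 'excavator'):
        trucks = [o for o in others if o['type'] == 'dump_truck'] + [obj]
        nearest = min(trucks, key=lambda t: dist(exc, t))
        if (nearest is obj and dist(exc, obj) < operating_boundary
                and exc['action'] == 'scoop_rotate_drop'):
            return 'working (filling)'     # one-to-one interaction
    return 'idling'                         # e.g., waiting in queue (group interaction)
```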
Fig. 6. Image differencing processes.
3.4. Post-processing
The post-processing is designed to filter out noise and other kinds of outliers from the analyzed results. The activities at each frame are classified independently by the previous modules. However, construction equipment obviously works continuously over time, which means that one activity takes place for a certain duration. Majority voting over the classified activity types (Eq. (5)) was therefore applied every 5 s to compensate for misclassification errors caused by the real-time discrete analysis and to optimize the classification results (Fig. 8).
\text{Post}\,S = \mathrm{mode}(S, t) \quad (5)

where S: the pre-classified activity states for all frames; t: the time interval for post-processing.
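An illustrative sketch of the majority-voting filter of Eq. (5), assuming classifications arrive at 2 frames per second so that a 5-s window spans 10 frames (the list-based representation is an assumption made here):

```python
from collections import Counter

def post_process(states, fps=2, window_s=5):
    """Eq. (5): replace each classification with the mode of its voting window."""
    window = int(fps * window_s)
    smoothed = []
    for start in range(0, len(states), window):
        chunk = states[start:start + window]
        majority = Counter(chunk).most_common(1)[0][0]   # most frequent activity
        smoothed.extend([majority] * len(chunk))
    return smoothed

# Example: a single spurious 'idle' inside a working period is voted out (cf. Fig. 8).
print(post_process(['work', 'work', 'idle', 'work', 'work', 'work'], fps=2, window_s=3))
```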
4. Experimental results
4.1. Data collection and description
The authors collected video stream data from five earthmoving sites using CCTVs and smartphones. The collected data included not only interactive operations among dump trucks and excavators but also operations in groups of multiple dump trucks working simultaneously (Fig. 9). To represent various appearances, actions, and activities of construction equipment, the stationary cameras were installed at distances of 10 to 30 m, heights of 0 to 3 m, and different viewpoints (i.e., front, back, side, and diagonal), at 11 different positions in total. A total of 11,513 image frames, approximately 150 min of operation, was collected at resolutions of 720 × 1280 and 1080 × 1920 pixels. The video data also included diverse equipment with different shapes, colors, and appearances manufactured by multiple corporations such as Caterpillar, Doosan, Hyundai, Scania, and Volvo.
4.2. Performance metrics
To quantify the performance of activity identification, two metrics were evaluated: precision and recall rate. Precision indicates the reliability of the predictions and is calculated by Eq. (6). However, even when most of an algorithm's predictions are correct (high precision), objects or activities can still be omitted.
Fig. 7. Interaction analysis methodology.
Fig. 8. Concept for post-processing.
Thus, this study also evaluated the recall rate (Eq. (7)) to check the stability of the proposed framework. Furthermore, the estimated working, traveling, and idling times were compared with the actual operation times.
\text{Precision} = \frac{\text{True Positive}}{\text{True Positive} + \text{False Positive}} \quad (6)

\text{Recall rate} = \frac{\text{True Positive}}{\text{True Positive} + \text{False Negative}} \quad (7)
4.3. Implementation and experimental setup
The C++ programming language with Visual Studio 2012 was used for the development of module 1, and MATLAB 2014b was used for the development of modules 2, 3, and 4. A desktop computer [Intel i7-4790 CPU @ 3.60 GHz, 32.0 GB RAM, Windows 7, 64 bit] was used for the series of experiments. Two main experiments, with and without the interaction analysis, were performed to investigate the advancement of the proposed solution. Additionally, the impacts of individual action recognition and post-processing were also analyzed.
4.4. Experimental results and analysis
4.4.1. Performance of the developed framework
Fig. 9. Examples of the collected data from actual earthmoving sites.
Fig. 10. Examples of experimental results. (a) Continuous classification over time. (b) One-to-one interactions. (c) Group interactions.
Fig. 10 shows examples of the classified activity types for all tracked
objects. The responses were displayed for each bounding box. The performance metrics showed average precision and recall rates of 91.27% and 92.42%, respectively (Tables 1, 2). The precision results indicated that 91 classifications out of 100 responses were correct. Besides, the recall rates meant that the implemented framework accurately classified 92 cases out of 100 actual occurrences for each activity type. The average processing time required for single-frame analysis was 0.1 to 0.3 s, satisfying real-time applications. Moreover, the average error rate of the estimated time for each activity was 5.4% compared to the original operation time; in other words, the model estimated 94.6 min as 'working' and 5.4 min as 'idling' when the equipment was actually 'working' for 100 min.
4.4.2. Performance without the interaction analysis
In addition to this validation, the significant impact of the interactive operations was also observed. To represent the classification without the interaction analysis, module 3 was opted out from the framework. In this case, 'stopping' dump trucks are regarded as 'idling' (Fig. 11). Without the interaction analysis, the average precision dropped by 15.59% to 75.68%, and the average recall rate decreased by 17.70% to 74.72% (Table 2). The performance loss was most salient for dump trucks; the interpretation errors for 'idling' increased remarkably for the dump trucks. The recall rates for dump trucks therefore dropped by 38.91%, and the 'idling' time was estimated with a 41% error rate. On the other hand, the precision and recall rates did not decrease for excavators.
4.4.3. Performance of the equipment tracking and individual action recognition
Since individual actions are one of the vital factors for activity identification, their accuracy is important in determining the classification performance of operation types. Table 3 shows the experimental results for module 2: 91.58% average precision and 93.39% average recall. The results implied that the module worked properly for recognizing individual actions. It was also observed that the precision for dump trucks was higher than that for excavators. This was consistent with the complexity of the individual actions, since dump trucks had only two action classes while excavators had one more. Regarding this issue, the framework occasionally experienced difficulty in further classifying 'scooping/rotating/dropping' and 'stopping'. This can be explained by the behavior of the image differencing technique. The technique detects change across two consecutive image frames by considering the changed pixel values. However, not all changing pixels could be considered if the excavator's bucket was not tracked; this occurred when the bounding box of the excavator object did not cover the location of the bucket. Conversely, noise (e.g., background clutter and non-target objects) also affected the recognition performance even when the target itself showed little change; for instance, when dump trucks moved within the bounding box of 'stopping' excavators. Despite such technical limitations, the method was still effective in recognizing individual actions. TLD successfully tracked equipment with average precision and recall rates of 90.96% and 92.23%. Thus, it was able to reduce the number of noisy bounding boxes and also to support the shape detection for the 'scooping/rotating/dropping' excavators.
4.4.4. Performance of the post-processing
Finally, the post-processing effectively filtered out noisy classification results. Module 4 increased the precision and recall rates by 3.88% and 4.15% compared to the classification without post-processing (Table 2). The filtering performance varied depending on the interval duration used for majority voting. The results revealed that 5 s was the optimum for the 11,513 tested images. Intervals longer than 5 s showed poor performance due to lingering effects when activity A had already switched to another activity B (e.g., 'idling' states kept being reported for several seconds although the equipment had already started to 'work'). Thus, the 5-s post-processing played a substantial role in the activity identification.
Table 1
Experimental results for performance metrics (%).

The proposed method (with interaction analysis):
              Excavator                Dump truck
              Precision   Recall       Precision   Recall
Idling        86.70       93.87        88.49       93.10
Traveling     93.12       89.97        –           –
Working       88.26       92.05        97.88       92.64
Average       89.36       91.96        93.19       92.87

Without interaction analysis:
              Excavator                Dump truck
              Precision   Recall       Precision   Recall
Idling        86.70       93.89        38.91       91.03
Traveling     93.12       89.97        –           –
Working       88.26       92.05        85.10       23.92
Average       89.36       91.96        62.01       57.48

Without post-processing:
              Excavator                Dump truck
              Precision   Recall       Precision   Recall
Idling        85.50       92.18        85.72       88.38
Traveling     83.82       83.36        –           –
Working       83.91       87.19        95.03       89.56
Average       84.41       87.58        90.38       88.97
Table 2
Performance evaluation of each module (%).

              The proposed        Without interaction analysis      Without post-processing
              framework (A)       Performance (B)      A − B        Performance (C)      A − C
Precision     91.27               75.68                15.59        87.39                3.88
Recall rate   92.42               74.72                17.70        88.27                4.15
Fig. 11. Predictions with (left) and without (right) the interaction analysis for the same image frame.
Table 3
Experimental results for individual action recognition (%).

                               Excavator                Dump truck
                               Precision   Recall       Precision   Recall
Stopping                       86.70       93.87        93.29       97.47
Moving                         93.12       89.97        94.30       92.18
Scooping/rotating/dropping     88.26       92.05        –           –
Average                        89.36       91.96        93.80       94.83
5. Results and discussion
The experimental results supported the feasibility and applicability of the developed method with acceptable performance criteria. It identified different types of equipment operations with 91.27% precision and 92.42% recall, and then estimated the 'working', 'traveling', and 'idling' durations with 5.4% error rates on average. The results also showed the significant impact of the interaction analysis on activity identification. Fig. 10 shows that the proposed framework (with the interaction analysis) is able to successfully classify the activity types of 'stopping' dump trucks, which is crucial for operation time estimation. As shown in Fig. 10(a), the activity types of equipment were continuously and correctly identified from image frame T = 248 to T = 308. Based on the co-existence, proximity, and action consistency conditions, the framework classified some of the 'stopping' dump trucks as 'working' (Fig. 10(b)). It was also possible to determine whether excavators 'work' alone or together. This insight can be used for match-factor (the ratio of truck arrivals to available excavator service rates) calculation and resource allocation. Moreover, Fig. 11 illustrates the advantage of considering the interactive operations between the two equipment types. While the 'stopping' dump truck being loaded was correctly classified as 'working' (filling soil from the excavator) with the interaction analysis, the same dump truck was incorrectly classified as 'idling' when module 3 was opted out. The results denoted that the developed framework was capable of interpreting the one-to-one interactions and enhancing the performance with the interaction analysis.
Group interactions could also be successfully analyzed with the developed approach. Fig. 10(c) displays that the approach correctly identified both one-to-one interactions and group interactions. At image frame T = 9, the excavator on the left-hand side handed soil over to the dump truck located within the operating proximity, and their activity types were determined as 'working'. The other excavator and its nearest dump truck co-existed within the pre-defined proximity; however, they were 'idling' since the individual action of the excavator was 'stopping'. Based on the identified interactions, although the dump truck arrived at the operating zone at image frame T = 103, the proposed method classified its operation type as 'idling'. The results showed that group interactions among multiple pieces of equipment can be effectively handled by module 3. Through these processes, the operation types were correctly determined with 91.27% precision and 92.42% recall. These results meant that the method was able to differentiate the nearest one from multiple dump trucks within the effective distance. As a result, 'working' and 'idling' dump trucks were correctly distinguished even when the criteria of co-existence, proximity, and action consistency were all met.
Despite the promising performance of the developed approach, three cases of errors were inevitably observed. Fig. 12(a) illustrates the first case: errors of interaction analysis. The dump truck took 'stopping' actions and the excavator was 'scooping' the soil, which meant that both the co-existence and action consistency conditions were met. Besides, their proximity was lower than the threshold value of the effective distance. In this case, the framework determined that they were 'working' with interaction. In the figure, however, the excavator was preparing to load soil onto another dump truck, not the tracked dump truck in the frame, which was already filled with soil. These possible side effects can be reduced by considering the maximum soil-filling durations; for instance, the duration of excavator-to-dump truck interactions cannot be longer than 5 min per arrival. Second, activity identification errors occasionally occurred when the tracking results were not sufficiently accurate. Outliers or noisy bounding boxes of the target objects resulted in abrupt changes of centroids; as a result, some 'moving' actions were classified as 'non-moving'. Image differencing was also affected, as previously discussed; bounding boxes were not able to fully cover all parts of excavators (e.g., bucket, arm, and main body) (Fig. 12(b)). In such cases, the shape changes of excavators were not detected during 'scooping/rotating/dropping', and the dump truck was also determined as 'idling' since the excavator was judged to be 'idling'. This limitation is expected to be addressed by using tracking algorithms based on CNNs (Convolutional Neural Networks) in further study. In recent research, CNN-based trackers have shown outstanding performance in localizing objects that have high intraclass/interclass variations. However, to ensure the powerful performance of CNNs, a large amount of training data needs to be collected and labeled, which is very human-intensive. Thus, more cost- and time-effective approaches such as crowdsourced labeling techniques [68] need to be investigated. Last, in the case of occlusions, the developed method inevitably missed target equipment and had difficulty continuously identifying operation types. When earthmoving equipment was occluded for a short time, the post-processing could compensate for the missed information by considering work continuity. For instance, even though 'working' excavators were missed for 2 s, the post-processing reasonably interpreted this short-term occlusion as 'working' time.
However, the post-processing was insufficient to handle long-term occlusions. When equipment disappeared for longer than the post-processing interval (5 s), it was difficult to classify the types of equipment operations. Although previous studies made efforts to effectively handle such occlusions [6,9,61], they primarily focused on detection and tracking of construction equipment rather than activity identification. Advanced reasoning processes (e.g., processes integrated with GPS) should be further studied to identify operation types continuously in case of occlusions.
Fig. 12. Error examples. (a) Errors of interaction analysis. (b) Effects of bounding box.
6. Conclusions
This research developed a vision-based activity identification framework that focuses on the interactive operations between excavators and dump trucks. This additional consideration had significant impacts on both the identification accuracy and the level of detail of the extracted information. The framework consisted of four main modules: equipment tracking, action recognition of individual equipment, interaction analysis, and post-processing. The experimental results (approximately 91.27% precision and 92.42% recall) supported not only the feasibility of the proposed method but also the significance of the interaction analysis. Based on the experimental results with and without the interaction analysis, the precision and recall rates increased by 15.59% and 17.70%, respectively, when interactive operations were integrated. The present study made contributions to the existing technology field and to construction management.
First, it identified the critical elements of interactive operations (i.e., co-existence, proximity, and action consistency). Second, it characterized the technical framework to detect, classify, and analyze the notable features from 2D video streams. Third, the activity types classified by the framework enable the measurement of performance indicators (e.g., direct work rates and cycle durations) of heavy equipment. Last, the developed framework can enhance the practicality of automated activity identification on actual earthmoving sites.
When considering on-site applications of these findings, however, some limitations remain. For instance, a CNN-based tracker is expected to markedly improve tracking performance, which would also improve the quality of action recognition of individual equipment. For module 2, using 3D information instead of 2D would make it possible to handle omni-directional movements of equipment. With 3D global locations, the proximity condition in the excavator-to-dump-truck interaction analysis (module 3) can become robust to viewpoint variations. To compensate for the intrinsic shortcomings of vision-based monitoring, integration with radio-based methods is also a necessary future research topic. Especially for analyzing dump trucks' operations, GPS can complement activity identification since dump trucks generally travel beyond earthmoving sites and outside the camera's field of view. Besides, brightness is a crucial variable for vision-based analysis; customizing vision-based systems to night-time environments (e.g., darkness) is required to extend their applicability, although construction operations are usually performed during the daytime.
Apart from these technical advancements, future research should also thoroughly consider and examine the practical applicability to actual construction sites. Using the work amounts and time-log data (e.g., 'working', 'idling', and 'traveling') provided by this research, equipment productivity and operational efficiency can be automatically analyzed with the further integration of operating conditions such as soil type, soil volume, driver's skill, and equipment specifications. Based on such information, site managers can make proper decisions on equipment allocation. With further achievements, it is expected that automated activity identification can be realized for the productivity analysis of earthmoving operations.
Acknowledgements
This research was supported by a grant (14SCIP-B079691-01) from the Smart Civil Infrastructure Research Program and a grant (16CTAP- C114956-01) from the Technology Advancement Research Program, funded by Ministry of Land, Infrastructure and Transport (MOLIT) of the Korean Government and the Korea Agency for Infrastructure Technology Advancement (KAIA), and Seoul National University Big Data Institute through the Data Science Research Project 2017.
References
[1] J.S. Bohn, J. Teizer, Benefits and Barriers of Construction Project Monitoring Using High-Resolution Automated Cameras, J. Constr. Eng. Manag. 136 (6) (2010) 632–640, http://dx.doi.org/10.1061/(ASCE)CO.1943-7862.0000164.
[2] T. Cheng, J. Teizer, G.C. Migliaccio, U.C. Gatti, Automated task-level activity analysis through fusion of real time location sensors and worker's thoracic posture data, Autom. Constr. 29 (2013) 24–39, http://dx.doi.org/10.1016/j.autcon.2012.08.003.
[3] J. Seo, S. Han, S. Lee, H. Kim, Computer vision techniques for construction safety and health monitoring, Adv. Eng. Inform. 29 (2015) 239–251, http://dx.doi.org/10.1016/j.aei.2015.02.001.
[4] J. Teizer, P.A. Vela, Personnel tracking on construction sites using video cameras, Adv. Eng. Inform. 23 (2009) 452–462, http://dx.doi.org/10.1016/j.aei.2009.06.011.
[5] M. Golparvar-Fard, A. Heydarian, J.C. Niebles, Vision-based action recognition of earthmoving equipment using spatio-temporal features and support vector machine classifiers, Adv. Eng. Inform. 27 (2013) 652–663, http://dx.doi.org/10.1016/j.aei.2013.09.001.
[6] M.-W. Park, I. Brilakis, Continuous localization of construction workers via integration of detection and tracking, Autom. Constr. 72 (2016) 129–142, http://dx.doi.org/10.1016/j.autcon.2016.08.039.
[7] M.-W. Park, A. Makhmalbaf, I. Brilakis, Comparative study of vision tracking methods for tracking of construction site resources, Autom. Constr. 20 (2011) 905–915, http://dx.doi.org/10.1016/j.autcon.2011.03.007.
[8] F. Pena-Mora, S. Han, S. Lee, M. Park, Strategic-Operational Construction Management: Hybrid System Dynamics and Discrete Event Approach, J. Constr. Eng. Manag. 134 (9) (2008) 701–710, http://dx.doi.org/10.1061/(ASCE)0733-9364(2008)134:9(701).
[9] Z. Zhu, X. Ren, Z. Chen, Visual Tracking of Construction Jobsites Workforce and Equipment with Particle Filtering, J. Comput. Civ. Eng. 04016023 (2016), http://dx.doi.org/10.1061/(ASCE)CP.1943-5487.0000573.
[10] E.R. Azar, S. Dickinson, B. McCabe, Server-Customer Interaction Tracker: Computer Vision-Based System to Estimate Dirt-Loading Cycles, J. Constr. Eng. Manag. 139 (7) (2013) 785–794, http://dx.doi.org/10.1061/(ASCE)CO.1943-7862.0000652.
[11] I. Brilakis, M.-W. Park, G. Jog, Automated vision tracking of project related entities, Adv. Eng. Inform. 25 (2011) 713–724, http://dx.doi.org/10.1016/j.aei.2011.01.003.
[12] J. Gong, C.H. Caldas, An object recognition, tracking, and contextual reasoning-based video interpretation method for rapid productivity analysis of construction operations, Autom. Constr. 20 (2011) 1211–1226, http://dx.doi.org/10.1016/j.autcon.2011.05.005.
[13] J.-W. Park, K. Kim, Y.K. Cho, Framework of Automated Construction-Safety Monitoring Using Cloud-Enabled BIM and BLE Mobile Tracking Sensors, J. Constr. Eng. Manag. 05016019 (2016), http://dx.doi.org/10.1061/(ASCE)CO.1943-7862.0001223.
[14] T. Omar, M.L. Nehdi, Data acquisition technologies for construction progress tracking, Autom. Constr. 70 (2016) 143–155, http://dx.doi.org/10.1016/j.autcon.2016.06.016.
[15] C. Zhang, A. Hammad, S. Rodriguez, Crane Pose Estimation Using UWB Real-Time Location System, J. Comput. Civ. Eng. 26 (5) (2012) 625–637, http://dx.doi.org/10.1061/(ASCE)CP.1943-5487.0000172.
[16] E.R. Azar, B. McCabe, Part based model and spatial-temporal reasoning to recognize hydraulic excavators in construction images and videos, Autom. Constr. 24 (2012) 194–202, http://dx.doi.org/10.1016/j.autcon.2012.03.003.
[17] S. Chi, C.H. Caldas, Automated Object Identification Using Optical Video Cameras on Construction Sites, Computer-Aided Civil and Infrastructure Engineering 26 (2011) 368–380, http://dx.doi.org/10.1111/j.1467-8667.2010.00690.x.
[18] S. Chi, C.H. Caldas, Image-Based Safety Assessment: Automated Spatial Safety Risk Identification of Earthmoving and Surface Mining Activities, J. Constr. Eng. Manag. 138 (3) (2012) 341–351, http://dx.doi.org/10.1061/(ASCE)CO.1943-7862.0000438.
[19] S. Chi, C.H. Caldas, D.Y. Kim, A Methodology for Object Identification and Tracking in Construction Based on Spatial Modeling and Image Matching Techniques, Computer-Aided Civil and Infrastructure Engineering 24 (2009) 199–211, http://dx.doi.org/10.1111/j.1467-8667.2008.00580.x.
[20] H. Kim, K. Kim, H. Kim, Vision-Based Object-Centric Safety Assessment Using Fuzzy Inference: Monitoring Struck-By Accidents with Moving Objects, J. Comput. Civ. Eng. 30 (4) (2016) 04015075, http://dx.doi.org/10.1061/(ASCE)CP.1943-5487.0000562.
[21] M.-W. Park, I. Brilakis, Construction worker detection in video frames for initializing vision trackers, Autom. Constr. 28 (2012) 15–25, http://dx.doi.org/10.1016/j.autcon.2012.06.001.
[22] C. Yuan, S. Li, H. Cai, Vision-Based Excavator Detection and Tracking Using Hybrid Kinematic Shapes and Key Nodes, J. Comput. Civ. Eng. 31 (1) (2017) 04016038, http://dx.doi.org/10.1061/(ASCE)CP.1943-5487.0000602.
[23] M. Memarzadeh, M. Golparvar-Fard, J.C. Niebles, Automated 2D detection of construction equipment and workers from site video streams using histograms of oriented gradients and colors, Autom. Constr. 32 (2013) 24–37, http://dx.doi.org/10.1016/j.autcon.2012.12.002.
[24] Korea Construction Technology Promotion Act, Enforcement Decree Articles 98 and 99, Statutes of the Republic of Korea, Republic of Korea, 2016. Available at http://www.law.go.kr/, Accessed date: 10 December 2017.
[25] M. Bugler, G. Ogunmakin, P.A. Vela, A. Borrmann, J. Teizer, Fusion of Photogrammetry and Video Analysis for Productivity Assessment of Earthwork Processes, Comput. Aided Civ. Inf. Eng. 32 (2017) 107–123, http://dx.doi.org/10.1111/mice.12235.
[26] J. Fu, Logistics of earthmoving operations—simulation and optimization, Department of Transport Science, KTH, Stockholm, Sweden, 978-91-87353-05-5, 2013.
[27] S. Han, Productivity analysis comparison of different types of earthmoving operations by means of various productivity measurements, J. Asian Archit. Build. Eng. 9 (1) (2010) 185–192, http://dx.doi.org/10.3130/jaabe.9.185.
[28] R. Bao, M.A. Sadeghi, M. Golparvar-Fard, Characterizing Construction Equipment Activities in Long Video Sequence of Earthmoving Operations via Kinematic