
Time Series and Temporal Analysis of Textual Twitter

From the document Big Data Analytics and Cloud Computing (pages 87-91)


5.5.1 Time Series and Temporal Analysis of Textual Twitter

to distinguish when particularly large spikes in activity are due to the inclusion of active users (marked by an increase in the number of tweets). Similarly, one may want to track the activities of a particular user or group in order to study how users react to communication and also to explore the types of users that tweet in multiple locations.

Time series metrics provide a clear picture of people's reactions to specific road incidents and of how their interest changes over time. This can be seen in the changing volume of tweets relating to a road incident, such as a fatal road accident: the volume reveals when people become interested in the incident (they post tweets) and when that interest fades (nobody tweets about the incident any more). Peaks in tweet volume can indicate the specific incident within a broader event that generated the most interest. Twitter is important for perceiving and understanding crowdsourced opinion on certain issues. Because it is an open application in which a tweet posted on a user's wall is public, and because every tweet is time-stamped, Twitter data can identify when a certain incident or event occurred.
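To illustrate how such a tweet-volume time series might be built, the following Python sketch counts tweets per fixed-width time bin. It is a minimal sketch and not the procedure used in this chapter: the function name `tweet_volume`, the 30-minute default bin width, and the sample timestamps are all assumptions made for the example.

```python
from collections import Counter
from datetime import datetime, timedelta


def tweet_volume(timestamps, interval_minutes=30):
    """Count tweets per fixed-width time bin, keyed by the bin's start time."""
    width = timedelta(minutes=interval_minutes)
    epoch = datetime(1970, 1, 1)
    counts = Counter()
    for ts in timestamps:
        # Floor each timestamp to the start of the bin it falls in.
        bin_start = epoch + ((ts - epoch) // width) * width
        counts[bin_start] += 1
    return dict(sorted(counts.items()))


# Made-up timestamps, for illustration only.
sample = [
    datetime(2015, 4, 20, 15, 16),
    datetime(2015, 4, 20, 15, 41),
    datetime(2015, 4, 20, 17, 14),
    datetime(2015, 4, 20, 17, 20),
]
volume = tweet_volume(sample)
```

Plotting the bin start times against the counts gives a volume curve of the kind discussed in this section; a narrower bin width shows short spikes more clearly, at the cost of a noisier curve.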

[Figure: number of tweets (y-axis, 0 to 24) over time from Saturday 18 to Thursday 23. Totals shown: 108 original tweets, 165 retweets, 3 replies, 25 links & pics.]

Fig. 5.5 The graph represents tweets in A720, Edinburgh

area rarely travelled by road users in Edinburgh; thus, people may want to know the situation and condition of the road so that they can estimate when the bus will arrive at its destination. The time interval used is in minutes; each point on the graph therefore represents the number of topic-relevant tweets gathered within minutes through the Twitter API tool Tweetbinder (discussed in previous sections). The time frame in Tweetbinder focuses only on spikes, so the time interval in Fig. 5.5 is not consistent. The graph in Fig. 5.5 is generated from a data set filtered with the binder component of Tweetbinder. Tweets about conditions on the carriageway (A720 – E/B at Sheriffhall roundabout) were searched from April 18th to April 23rd, 2015.

The size of a spike, large or small, depends on the number of tweets discussing an incident and the period over which they are posted. A large spike indicates something interesting that leads many people to tweet; large spikes also reflect an incident lasting a long time. Small spikes show a road incident in Edinburgh that may have occurred in a place rarely travelled by the public, so only a few people tweet about it. In the following analysis, small spikes also describe an incident that occurred over a very short period, so only a few people saw it and could tweet about it.
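The distinction between large and small spikes rests on two quantities, the number of tweets and the length of time they span. The sketch below shows one way this could be made explicit. It is only an assumption-laden illustration: the function names `find_spikes` and `label_spike`, the thresholds `large_total` and `large_bins`, and the sample counts are hypothetical and are not taken from the study.

```python
def find_spikes(counts, min_count=1):
    """Group consecutive bins holding at least min_count tweets into spikes.

    counts: per-bin tweet counts in time order.
    Returns a list of (start_index, end_index, total_tweets) tuples.
    """
    spikes = []
    start = None
    for i, c in enumerate(counts):
        if c >= min_count and start is None:
            start = i  # a new spike begins
        elif c < min_count and start is not None:
            spikes.append((start, i - 1, sum(counts[start:i])))
            start = None  # the spike has ended
    if start is not None:
        spikes.append((start, len(counts) - 1, sum(counts[start:])))
    return spikes


def label_spike(spike, large_total=20, large_bins=3):
    """Label a spike 'large' only if it is both high-volume and long-lasting."""
    start, end, total = spike
    duration = end - start + 1
    return "large" if total >= large_total and duration >= large_bins else "small"


# Hypothetical per-bin counts, for illustration only.
counts = [0, 2, 0, 0, 3, 9, 23, 6, 0]
spikes = find_spikes(counts)
labels = [label_spike(s) for s in spikes]
```

Requiring both conditions follows the reading above that a large spike reflects many tweets over a long period; other combinations of the two criteria are equally possible.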

This time frame determines whether adverse road conditions influence the graph's spikes. A large spike on the graph (Fig. 5.6) illustrates something interesting that causes people to tweet, here the road congestion on the A720 at Sheriffhall roundabout. Large spikes reflect incidents that lasted for quite a long period of time. At its peak, more than 19 original, relevant tweets and 47 retweets were sent in less than 3 h. The peak in interest occurred considerably after the incidents began on the A720 at Sheriffhall roundabout.

[Figure: number of tweets (y-axis, 0 to 12) from Saturday 18 to Thursday 23, with a detail view from 1:00 PM to 7:00 PM in 30-minute steps; points (a), (b) and (c) are referenced in the text.]

Fig. 5.6 Large spikes can provide specific evidence related to the incident time

The graph can help in identifying the overall trends of interest, as well as particular points of interest. If the graph focuses on a time-limited topic, as shown here, one observes an increase in the number of tweets at early time points, when the topic began to attract interest. Content analysis of tweets in the early stages may also show people's initial reactions. At this stage, it is important to identify and track the first tweets. Sometimes people share the same tweet but disseminate it at different times. To overcome this problem, we employed Tweetbinder to organise the data sets into segments by using its 'binder' feature. By creating 'binders' based on customised filters, a user can track in real time how Twitter followers react to incidents and events. A 'binder' can then identify the original tweet for a particular incident through the arrangement of tweets in that binder.

[Figure: number of tweets (y-axis, 0 to 24) from Saturday 25 to Tuesday 28, with a detail view from 1:00 PM to 7:00 PM; annotations: Start, Large Spike, Peak, Lose interest. Counts shown: 22 of 71 original, 33 of 110 retweets, 48 of 90 replies, 12 of 35 links & pics.]

Fig. 5.7 Large spike indicating a specific event of interest

The initial increase in tweet volume points to the time (3.15 p.m.) at which discussion of the incident gains interest. The content of tweets at this initial stage indicates people's initial reactions. The discussion culminates at approximately 5.15 p.m., when 23 people tweeted about the event. The peak of the discussion can be identified from the peak of the graph spikes (Fig. 5.6a), which can be interpreted as the topic receiving more attention in the public consciousness. The volume of tweets then gradually decreases (at approximately 6.12 p.m., only six tweets; Fig. 5.6c), highlighting when people started losing interest in the discussion and suggesting that the incident has ended.
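The three points read off the graph (when interest starts, when it peaks, and when it is lost) can also be located programmatically. The following is a minimal sketch under stated assumptions: the function name `interest_phases`, the parameters `start_level` and `fade_fraction`, and most of the sample counts are hypothetical. Only the peak of 23 and the later value of six echo figures quoted in the text.

```python
def interest_phases(counts, start_level=1, fade_fraction=0.3):
    """Return (start, peak, fade) bin indices for a single spike.

    start: first bin with at least start_level tweets
    peak:  bin with the most tweets (the first one if several tie)
    fade:  first bin after the peak whose count is at most
           fade_fraction times the peak count, or None if there is none
    """
    if not counts or max(counts) < start_level:
        return None
    start = next(i for i, c in enumerate(counts) if c >= start_level)
    peak = max(range(len(counts)), key=counts.__getitem__)
    limit = counts[peak] * fade_fraction
    fade = next(
        (i for i in range(peak + 1, len(counts)) if counts[i] <= limit),
        None,
    )
    return start, peak, fade


# Hypothetical half-hour counts, for illustration only.
counts = [0, 0, 4, 9, 15, 23, 12, 6, 1]
phases = interest_phases(counts)
```

With these sample counts the function returns the indices of the bins where tweeting begins, where it is highest, and where it first drops to 30% of the peak or below. The 30% cut-off is arbitrary and would need tuning against real data.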

A second spike at point (b) could be interpreted as new information about the event attracting attention. Alternatively, it could concern a new event. To resolve this ambiguity, the tweet content may need to be checked.
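One simple way to check tweet content across two spikes is to compare the vocabulary used in each. The sketch below uses Jaccard similarity of word sets as a rough signal. This technique is swapped in purely for illustration and is not the method used in the study; the function names, the 0.3 threshold, and the sample tweets are all hypothetical.

```python
import re


def vocabulary(tweets):
    """Collect the set of lower-cased alphanumeric tokens in a list of tweets."""
    words = set()
    for text in tweets:
        words.update(re.findall(r"[a-z0-9]+", text.lower()))
    return words


def jaccard(a, b):
    """Jaccard similarity of two sets: size of intersection over size of union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def same_event(first_spike_tweets, second_spike_tweets, threshold=0.3):
    """Guess whether two spikes discuss the same event from shared vocabulary."""
    overlap = jaccard(vocabulary(first_spike_tweets), vocabulary(second_spike_tweets))
    return overlap >= threshold


# Invented example tweets, for illustration only.
first = ["Queues on A720 at Sheriffhall roundabout"]
update = ["A720 Sheriffhall roundabout now clear"]
other = ["Accident on M8 near Harthill"]

is_update = same_event(first, update)
is_other = same_event(first, other)
```

A high overlap suggests the second spike carries new information about the same event, while a low overlap suggests a different event. Short tweets share few words, so a manual check of the content remains advisable.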

If the topic continues for a long time, such as interest in severe road congestion causing major travel disruption, then the tweet content at the start and at the end of the period is compared. This may reflect changes in interest as they take place. In Figs. 5.7 and 5.8, spikes indicate specific events of interest for the topic. To identify what is of interest in a specific event, the tweets retrieved from the spike must be consulted. This can be achieved through content analysis, if necessary.
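A comparison of content between the start and the end of a period could, for example, look at which terms become more or less frequent. The following sketch does this with simple word counts. It is an illustrative assumption rather than the content analysis used in the study: the function names `term_counts` and `shifted_terms` and the sample tweets are hypothetical.

```python
import re
from collections import Counter


def term_counts(tweets):
    """Count lower-cased alphanumeric tokens across a list of tweets."""
    counts = Counter()
    for text in tweets:
        counts.update(re.findall(r"[a-z0-9]+", text.lower()))
    return counts


def shifted_terms(start_tweets, end_tweets, top=3):
    """Return the terms that rise most and fade most between two periods."""
    start, end = term_counts(start_tweets), term_counts(end_tweets)
    # Counter subtraction keeps only terms with a positive difference.
    rising = (end - start).most_common(top)
    fading = (start - end).most_common(top)
    return rising, fading


# Invented example tweets, for illustration only.
start_tweets = ["long delays on A720", "delays building at Sheriffhall"]
end_tweets = ["A720 clear now", "traffic clear at Sheriffhall"]
rising, fading = shifted_terms(start_tweets, end_tweets, top=1)
```

Terms shared equally by both periods, such as the road name, cancel out, which leaves the words that mark the change in what people are saying.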

Fig. 5.8 Information relating to large spikes in Fig. 5.7

