

A.2 Topic Analysis

• choose a topic $z_i \sim \theta_d$

• choose a sentiment $\ell_i \sim \pi_{d,z_i}$

• choose a word $w_i$ from the multinomial distribution over words defined by $\ell_i$ and $z_i$ (parameter $\phi^{\ell_i}_{z_i}$, which is the per-corpus joint sentiment-topic word distribution).

The hyperparameter $\alpha$ in this case is the prior for the topic distribution. That is, it can be thought of as the prior distribution of topics before any documents are seen.

Similarly, $\gamma$ can be thought of as the prior count of sentiment-topic pairs before any documents are seen.
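As a rough illustration, the generative steps above can be sketched in Python. This is a minimal sketch under stated assumptions, not the estimation code used here: the symmetric Dirichlet priors, the word prior `beta`, the function names, and the toy vocabulary are all hypothetical, and the word distributions are drawn inside the function only to keep the example self-contained (in the model they are shared across the corpus).

```python
import random


def sample_dirichlet(concentration, size, rng):
    """Draw from a symmetric Dirichlet by normalising independent Gamma draws."""
    draws = [rng.gammavariate(concentration, 1.0) for _ in range(size)]
    total = sum(draws)
    return [d / total for d in draws]


def generate_document(n_words, n_topics, n_sentiments, vocab,
                      alpha=1.0, gamma=1.0, beta=1.0, seed=0):
    """Generate (topic, sentiment, word) triples for one document.

    alpha and gamma play the roles described in the text; beta is an
    assumed symmetric prior on the word distributions (not in the source).
    """
    rng = random.Random(seed)
    # theta_d: this document's distribution over topics.
    theta_d = sample_dirichlet(alpha, n_topics, rng)
    # pi_d[z]: this document's distribution over sentiments given topic z.
    pi_d = [sample_dirichlet(gamma, n_sentiments, rng) for _ in range(n_topics)]
    # phi[(z, l)]: word distribution for each sentiment-topic pair.
    phi = {(z, l): sample_dirichlet(beta, len(vocab), rng)
           for z in range(n_topics) for l in range(n_sentiments)}

    words = []
    for _ in range(n_words):
        z_i = rng.choices(range(n_topics), weights=theta_d)[0]        # topic
        l_i = rng.choices(range(n_sentiments), weights=pi_d[z_i])[0]  # sentiment
        w_i = rng.choices(vocab, weights=phi[(z_i, l_i)])[0]          # word
        words.append((z_i, l_i, w_i))
    return words
```

For example, `generate_document(10, n_topics=5, n_sentiments=3, vocab=["polic", "vote", "mask"])` returns ten triples, with the sentiment always drawn after, and conditional on, the topic.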

In order to estimate the model, we use the modified version of Phan's GibbsLDA++ package written by Lin for R.¹ We calibrate the number of topics using the model's coherence scores, searching over 2 to 30 topics (which, with three sentiments, corresponds to 6 to 90 senTopics). The results are shown in Figure A.1. For each metric, a higher value indicates a better fit; the precise definition of each metric can be found in the documentation for the text2vec R package.² These values led us to choose 5 topics. We left the number of sentiments at three, following Lin and He (2009). The most frequently used words in each senTopic are shown in Figure A.2, where the size of a word represents the number of tweets in which it appears.

In Table 2, we list the author-generated label for each topic-sentiment pair. For the remainder of the analysis, we focus on the 5 BLM-related senTopics, which cover BLM General, BLM George Floyd/Breonna Taylor, BLM Civil Rights, BLM Los Angeles News, and BLM Police Violence. This choice is validated in Appendix A.2.

Topic Choices

Figure A.1 shows the coherence scores for each number of topics considered. For more information on the different metrics, see https://rdrr.io/github/dselivanov/text2vec/man/coherence.html. These results were the main driver of our decision to choose 5 topics.
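To make the selection step concrete, the sketch below computes a simplified mean NPMI coherence for a topic's top words and then picks the topic count with the highest score. It is only an illustration under stated assumptions: it estimates probabilities from document-level co-occurrence, which is not necessarily how text2vec computes its metrics, and the function names and edge-case conventions are hypothetical.

```python
import math
from itertools import combinations


def mean_npmi(top_words, documents):
    """Mean normalised PMI over all pairs of a topic's top words.

    Probabilities are document frequencies: the share of documents that
    contain a word (or both words of a pair). This is an assumed, simplified
    estimator, not text2vec's implementation.
    """
    doc_sets = [set(doc) for doc in documents]
    n_docs = len(doc_sets)

    def prob(*words):
        return sum(all(w in d for w in words) for d in doc_sets) / n_docs

    scores = []
    for a, b in combinations(top_words, 2):
        p_a, p_b, p_ab = prob(a), prob(b), prob(a, b)
        if p_ab == 0:
            scores.append(-1.0)  # assumed convention: the pair never co-occurs
        elif p_ab == 1:
            scores.append(0.0)   # NPMI is undefined here; treated as 0 (assumption)
        else:
            pmi = math.log(p_ab / (p_a * p_b))
            scores.append(pmi / -math.log(p_ab))
    return sum(scores) / len(scores)


def choose_num_topics(coherence_by_k):
    """Return the topic count whose coherence score is highest."""
    return max(coherence_by_k, key=coherence_by_k.get)
```

With three sentiments, a chosen topic count `k` corresponds to `3 * k` senTopics. Because several coherence metrics are reported, a single `max` is a simplification of the judgment made across all of them.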

¹ See http://gibbslda.sourceforge.net/ and https://github.com/linron84/JST/

² https://rdrr.io/github/dselivanov/text2vec/man/coherence.html

[Figure A.1: six panels (mean_npmi_cosim, mean_npmi_cosim2, mean_pmi, mean_difference, mean_logratio, mean_npmi); x-axis: Topics, with ticks from 2 to 7]

Figure A.1: Coherence metrics for various numbers of topics.

Topic Overview

The word clouds for each senTopic can be seen in Figure A.2. These, together with a detailed reading of the tweets scoring highest in each sentiment-topic pair, led to the author-generated labels presented in Table 2.

[Figure A.2: word clouds of stemmed terms, one panel per senTopic. Panels: 2020 Pres. Election, Family, Anger/Frustration, Music, City News, Police Violence, Covid/Wear Masks, Political Confrontation, Sadness/Nostalgia, Voting, Pop Culture, Media, George Floyd/Breonna Taylor, BLM, Public Programs]

Figure A.2: senTopic word clouds.

Topic Validation

If we look at the distribution of each senTopic over time, it is clear that the BLM senTopics share the same structure while the others appear random. This can be seen in Figure A.3. Additionally, the pattern mimics the Google Trends structure of "BLM" searches over the same time period, as seen in Figure 1.

[Figure A.3: one panel per senTopic. Panels: 2020 Pres. Election, Family, Anger/Frustration, Music, BLM City News, BLM Police Violence, Covid Believers/Wear Masks, Political Confrontation, Sadness/Nostalgia, Vote General, Pop Culture, Media, BLM George Floyd/Breonna Taylor, BLM General, Public Programs; x-axis: Date (Jun to Oct); y-axis: Percent of Discussion; legend (Class): BLM, Not BLM]

Figure A.3: Average distribution of senTopics over time.

We also label the tweets originally found when searching for protesters, which therefore include at least one of our BLM-relevant keywords, as "protest tweets", and the rest of the tweets an individual user published over the summer as "timeline tweets". The box-and-whisker plot of the percent BLM topic for each city and each type of tweet can be seen in Figure A.4. In this way, we use hand-labeled BLM tweets to check their consistency with the unsupervised topic modeling technique. The clear separation between the two groups further increases our confidence in the model.

[Figure A.4: box-and-whisker plot; groups: Timeline Tweets, Protest Tweets; axis: Percent BLM (0 to 100); legend (City): Chicago, Houston, Los Angeles]

Figure A.4: Comparison of protest tweets and other tweets by protesters in an effort to validate BLM measure.

Finally, we selected the 400 tweets with the highest BLM rating, the 200 with the lowest BLM rating, and the 200 closest to 50%. These tweets were then hand coded by four individuals on a scale of 0 to 1 according to how related each tweet was to BLM. A boxplot of the mean hand coding for each tweet can be seen in Figure A.5. These responses show that the unsupervised method is in line with the hand codings done by the four individuals. In addition, Table A.5 reports the correlations between the scores of each person and the rJST model.
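The selection of tweets for hand coding can be sketched as follows. This is a minimal sketch, assuming each tweet has a single BLM rating between 0 and 1; the function name, the dictionary input, and the choice to exclude already-selected tweets from the middle group are assumptions rather than details given above. The default group sizes follow the text.

```python
def select_validation_sample(scores, n_high=400, n_low=200, n_mid=200, midpoint=0.5):
    """Pick tweets for hand coding from a mapping of tweet id -> BLM rating.

    Returns the highest-rated, lowest-rated, and closest-to-midpoint tweets.
    Keeping the three groups disjoint is an assumption of this sketch.
    """
    ranked = sorted(scores, key=scores.get)   # ascending by rating
    low = ranked[:n_low]
    high = ranked[len(ranked) - n_high:]
    chosen = set(low) | set(high)
    remaining = [t for t in ranked if t not in chosen]
    # Among the remaining tweets, take those closest to the midpoint.
    mid = sorted(remaining, key=lambda t: abs(scores[t] - midpoint))[:n_mid]
    return {"high": high, "low": low, "mid": mid}
```

Grouping the mean hand-coded score by these three keys would give the inputs for boxplots like those described for Figure A.5.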

[Figure A.5: boxplots; one axis: rJST group (0, 0.5, 1); other axis: Hand Coding (0.00 to 1.00)]

Figure A.5: Boxplots of the average hand-coded score for each tweet, grouped by whether the rJST model placed the tweet closest to 0, 50, or 100 percent related to BLM.

            BLM_topic  P1    P2    P3    P4    Avg
BLM_topic   1.00       0.78  0.76  0.61  0.76  0.80
P1          0.78       1.00  0.88  0.76  0.82  0.95
P2          0.76       0.88  1.00  0.72  0.75  0.92
P3          0.61       0.76  0.72  1.00  0.70  0.88
P4          0.76       0.82  0.75  0.70  1.00  0.90
Avg         0.80       0.95  0.92  0.88  0.90  1.00

Table A.5: Correlation matrix between the hand-coded responses of the four coders (P1 to P4), their average (Avg), and the rJST model (BLM_topic). The model's correlations with the individual coders (0.61 to 0.78) are comparable to the coders' correlations with one another (0.70 to 0.88). This helps to validate our model and lends support to the conclusions drawn using it.
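A correlation matrix of this shape can be computed as sketched below. The sketch assumes Pearson correlation, which is not stated above, and the scores in the usage example are invented toy values rather than the study's data; the function names are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def correlation_matrix(columns):
    """Pairwise correlations for a mapping of column name -> list of scores."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}


# Toy scores for five tweets (invented for illustration only).
toy = {
    "BLM_topic": [0.9, 0.1, 0.5, 0.8, 0.2],  # model rating
    "P1": [1.0, 0.0, 0.5, 1.0, 0.0],         # hand coder 1
    "P2": [1.0, 0.0, 0.0, 1.0, 0.5],         # hand coder 2
}
# Avg is the per-tweet mean across the hand coders.
toy["Avg"] = [(a + b) / 2 for a, b in zip(toy["P1"], toy["P2"])]
matrix = correlation_matrix(toy)
```

Comparing the `BLM_topic` row with the coder-to-coder entries is the check the caption describes: the model is validated to the extent that its correlations with the coders resemble the coders' correlations with each other.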