Report on Modeling Discrete Dynamic Topics

Zijing Zhang (Zing)

5/21/2019 Report on Modeling Discrete Dynamic Topics 1

Issue

How can topic trends be modeled? *

• Medium

• Academic paper

• Blog post

• Tweet

• Data Stream

• Continuous

• Discrete

• Change-detection

• Low delay

Dataset Input [1]

• Signal Media

• a publicly available dataset

• 985,867 articles

• Average length 405 words

• Published from September 1 to September 30, 2015

• Preprocessing: removed stop words, URLs, tokens not starting with an alphabetic letter, punctuation marks, and words occurring fewer than 5 times

• NFCWorld Twitter channel

• Covers emerging technologies

• Collected through the Twitter API

• 3,374 tweets

• Preprocessing: removed stop words, URLs, hashtag signs, tokens not starting with an alphabetic letter, punctuation marks, and words occurring fewer than 3 times
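The preprocessing steps listed above can be sketched roughly as follows. This is a minimal illustration, not the pipeline used in [1]: the stop-word list, the URL pattern, the punctuation set, and the function names `tokenize` and `preprocess` are all assumptions made for the example.

```python
import re
from collections import Counter

# Hypothetical, tiny stop-word list for illustration only.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "for", "on"}
# Assumed URL pattern; the original pipeline's pattern is not given in the slides.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
PUNCTUATION = ".,;:!?\"'()[]{}"

def tokenize(text):
    """Lowercase, drop URLs, strip hashtag signs and punctuation, filter tokens."""
    text = URL_RE.sub(" ", text.lower())
    tokens = []
    for raw in text.split():
        token = raw.lstrip("#").strip(PUNCTUATION)
        # Keep tokens that start with an alphabetic letter and are not stop words.
        if token and token[0].isalpha() and token not in STOP_WORDS:
            tokens.append(token)
    return tokens

def preprocess(docs, min_count):
    """Tokenize every document, then drop words seen fewer than min_count times."""
    tokenized = [tokenize(d) for d in docs]
    counts = Counter(t for doc in tokenized for t in doc)
    return [[t for t in doc if counts[t] >= min_count] for doc in tokenized]
```

Under these assumptions, `min_count=5` would correspond to the Signal Media setting and `min_count=3` to the NFCWorld setting.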

Vocabulary *

• Generative Model

A model of the joint probability distribution that describes how the data were generated.

• Dirichlet Distribution

A probability distribution over vectors of category probabilities that sum to 1.

• Kalman Filter

A linear quadratic estimator that fuses a series of noisy observations to estimate the hidden state of a variable.

• Logit

The logarithm of the odds, log(p/(1-p)).

• Logistic Normal Distribution

A probability distribution of a variable whose logit is normally distributed.

• Bayesian Information Criterion

A model selection criterion that trades off the likelihood of a model against its number of parameters; lower values are better.

• Latent Variable

A variable that is not observed directly but is inferred from other, observed variables.

• Hidden Markov Model

A model that infers the hidden state of one random variable from the emission sequence of another, observable random variable.

• Negative Log Likelihood

The surprisal of the observed data under a model.
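Several of these definitions can be made concrete in a few lines. The sketch below uses only the Python standard library; the function names are chosen for this example and do not come from [1].

```python
import math
import random

def logit(p):
    """Log-odds of a probability p in (0, 1)."""
    return math.log(p / (1.0 - p))

def sigmoid(x):
    """Inverse of logit: maps a real number back to a probability."""
    return 1.0 / (1.0 + math.exp(-x))

def sample_dirichlet(alpha, rng):
    """Draw category probabilities that sum to 1 by normalizing Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]

def sample_logistic_normal(mu, sigma, rng):
    """Draw a probability whose logit is Normal(mu, sigma)."""
    return sigmoid(rng.gauss(mu, sigma))

def negative_log_likelihood(probs):
    """Sum of surprisals -log(p) over the probabilities a model gave to observed events."""
    return -sum(math.log(p) for p in probs)
```

For example, `logit(0.5)` is 0 because the odds are even, and a model that assigns probability 0.5 to each of two observed events has a negative log likelihood of 2·ln 2.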

Latent Dirichlet Allocation [4]

A generative model that explains observations through unobserved (latent) groups, called topics. *

Each document is a distribution over topics. * Each topic is a distribution over words. *
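The two statements above describe the LDA generative story: draw a topic mixture for the document, then for each word position draw a topic and then a word from that topic. A minimal sketch, assuming fixed, hand-written topic-word distributions (in full LDA these are themselves drawn from a Dirichlet prior and learned from data); all names and values are illustrative.

```python
import random

def sample_dirichlet(alpha, rng):
    """Draw a probability vector by normalizing Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]

def generate_document(topics, vocab, alpha, length, rng):
    """Generate one document following the LDA generative story.

    topics: one word distribution per topic, aligned with vocab.
    alpha:  Dirichlet parameters for the document's topic mixture.
    """
    theta = sample_dirichlet(alpha, rng)  # document = distribution over topics
    words = []
    for _ in range(length):
        z = rng.choices(range(len(topics)), weights=theta)[0]  # pick a topic
        w = rng.choices(vocab, weights=topics[z])[0]  # topic = distribution over words
        words.append(w)
    return theta, words
```

Inference runs this story in reverse: given only the words, it estimates the topic mixtures and the topic-word distributions.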

Dynamic Topic Model [1]

A generative model that analyzes how topics evolve over consecutive time slices of a document stream. *

• Divides data into different time slices by using the document timestamps.

• Models the topics of each time slice with LDA, starting from the first slice.

• Uses a Kalman filter to compute the evolution of each topic over time.
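To illustrate the Kalman filter step, the sketch below tracks a single scalar across time slices, for instance the logit of one word's probability within a topic. It assumes a random-walk state with Gaussian noise and is a simplified stand-in, not the multivariate inference used in DTM; the function name and the variance parameters are hypothetical.

```python
def kalman_filter_1d(observations, process_var, obs_var, init_mean=0.0, init_var=1.0):
    """Scalar Kalman filter for a random-walk state observed with Gaussian noise.

    Returns the filtered estimate of the hidden state after each observation.
    """
    mean, var = init_mean, init_var
    estimates = []
    for y in observations:
        # Predict: a random walk keeps the mean and inflates the variance.
        var = var + process_var
        # Update: blend prediction and observation using the Kalman gain.
        gain = var / (var + obs_var)
        mean = mean + gain * (y - mean)
        var = (1.0 - gain) * var
        estimates.append(mean)
    return estimates
```

A small `process_var` makes the estimate change slowly between slices, which is one way to see why a chained model can lag when a new topic appears abruptly.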

Pros & Cons of Prior Work

• LDA [2]

• Pros:

• Effective assumptions for topic modeling

• Cons:

• The number of topics must be chosen in advance

• Does not detect correlations among topics

• DTM [1]

• Pros:

• Connects the same topics over time

• Cons:

• New topics are detected with a lag. [1]

• Assumes topics are continuous, that is, present in every time slice. [1]

• Unclear effectiveness of chaining topics with a Dirichlet prior *

Discrete Dynamic Topic Model

dDTM uses BIC to select the best number of topic chains. [1]
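BIC-based selection can be sketched with the standard formula BIC = k·ln(n) − 2·ln(L), where k is the number of free parameters, n the number of observations, and L the maximized likelihood. The slides do not say how dDTM counts parameters or observations, so the candidate values in the usage note are made up for illustration and the function names are hypothetical.

```python
import math

def bic(log_likelihood, num_params, num_obs):
    """Bayesian Information Criterion: k * ln(n) - 2 * ln(L). Lower is better."""
    return num_params * math.log(num_obs) - 2.0 * log_likelihood

def select_by_bic(candidates, num_obs):
    """Pick the model size with the lowest BIC.

    candidates maps a model size (e.g. a number of topic chains) to a
    (log_likelihood, num_params) pair obtained from fitting that model.
    """
    scores = {size: bic(ll, k, num_obs) for size, (ll, k) in candidates.items()}
    best = min(scores, key=scores.get)
    return best, scores
```

With hypothetical fits `{2: (-1200.0, 10), 3: (-1100.0, 15), 4: (-1095.0, 20)}` and 500 observations, size 3 wins: moving from 3 to 4 improves the likelihood only slightly, which does not offset the penalty for the extra parameters.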

Performance of dDTM

Result of Signal Media Dataset

Result of NFCWorld Dataset

Future [1]

• Apply dDTM to the analysis of trending topics and compare it against other methods developed in this domain

• Compare different methods for computing HMM state probabilities, such as Gaussian mixture models

• Investigate how the time slice length should be allocated

• Conduct a more in-depth analysis of dDTM, including an analysis of the quality of the topic chains with respect to human judgment

References

[1] https://dl.acm.org/citation.cfm?doid=3019612.3019673

[2] https://www.youtube.com/watch?v=DWJYZq_fQ2A

[3] https://www.sciencedirect.com/topics/pharmacology-toxicology-and-pharmaceutical-science/bayesian-information-criterion

[4] https://en.wikipedia.org/wiki/Latent_Dirichlet_allocation

• [num]: the intellectual property owner, as listed in the references

• *: the author's own interpretation
