• Tidak ada hasil yang ditemukan

Learning and Inference in Cartoon Videos using Deep Hypernetworks Concept Structure

N/A
N/A
Protected

Academic year: 2024

Membagikan "Learning and Inference in Cartoon Videos using Deep Hypernetworks Concept Structure"

Copied!
27
0
0

Teks penuh

(1)

Byoung-Tak Zhang, Kyung-Min Kim

Learning and Inference in Cartoon Videos using Deep Hypernetworks Concept Structure

2016-11-01

Introduction to Machine Learning

(2)

2

Definition – What is Concept?

Concept ‘Pororo’

High-level

blue, penguin, pororo, cute, playful

Visual and linguistic representation

Low-level

(3)

7. 딥개념신경망 (딥하이퍼넷)

w w w w w w v v v v v v

e e e e e e e e e e e e

c c c c c

“it”

SC2 SC4 SC5 SC6

“playful” “crong” “plane” “sky” “start”

Pororo

= {SC1, SC4, …}

SC1 = e_1

= {w1, w3, v1, v4}

SC4 = e_5

= {w1, w5, v2, v6}

Pororo

= {e_1, e_5, …}

= {w1, w3, w5, v1, v2, v4, v6, …}

SC1 SC3

Pororo

Population code

= a collection of sparse

microcodes Microcodes (sparse)

Crong Eddy

Sparse Population Coding model 1 (SPC)

1. Zhang B.-T., Ha J.-W., Kang M., Sparse Population Code Models of Word Learning in Concept Drift, Cogsci 2012

(4)

Deep Concept Hierarchies

4

r r r r w w w w w

c

1

h

x

Growing Growing and

shrinking

c

2

e1

e2

e3

e4

e5 𝑐 1 1 𝑐 2 1

𝑐 3 1 𝑐 4 1 𝑐 2 2 𝑐 1 2

(b) Hypergraph representation of (a) h

c

1

c

2

“it”

“playful”

“start”

“crong”

“sky”

“plane”

“robot”

(a) Example of deep concept hierarchy learned from Pororo videos r

1 2 1 2 1 2

( , | ) ( , | , ) ( | )

P = ∑ P P

h

r w c ,c r w h c ,c h c ,c

(5)

Algorithm: Learning of Concept Layers

 Three issues

 Determining the number of the nodes of the concrete concept layer c 1 (c 1 -nodes)

 Associating c 1 -nodes and the modality layer h

 Associating c 1 -nodes and the abstract concept nodes (c 2 -nodes)

 Number of c 1 -nodes and association of the c 1 layer and h

 A c 1 -node is associated with a subgraph of the hypernetwork

 A subgraph = a cluster of hyperedges

 The number of clusters changes depending on the closeness centrality using word2vec:

 If Sim (h m ) > θ max or Sim (h m ) < θ min then h m is split or merged.

 Association of c 1 and c 2 layers

 A hyperedge contains the c2 node information ( )

( )

| |

m m

m

Sim = Dist h

h h (h

m

: the hyperedge cluster associated with ) c 1 m

2

1 2

(c , )

(c , c )

i m j m

hm i j

i m hm

C h

ω α

α

= ∑

h h

(Mikolov et al. 2013)

(6)

Why Cartoon Videos?

 Children’s cartoon videos

 Multimodal

 Vision + language

 Simple grammars

 Explicit story lines

 Image processing

 Pseudo-real

 Educational

 Cognitive

6

(7)

Comparison to Previous Approaches

 Previous approaches to

knowledge acquisition and representation

 Semantic networks (Steyvers 2006)

 WordNet (Fellbaum 2010)

 Single-modality (usually linguistic)

 Here: Automated

construction of conceptual knowledge from videos

 Visually-grounded knowledge

 Dynamic

 Concept drift

 Multimodal

(8)

 Cartoon video data

 17 ‘Pororo’ DVDs, 183 episodes, 1232 minutes, 16000 scene- subtitle pairs

 Image

 R-CNN for extracting image patches

 CNN feature vector (4096-D vector)

 Color histogram (1000-D vector)

 Text

 Word set including functional words for sentence generation

 Total 3452 words

 Represent each word as a real vector by word2vec 1

Data Description (1/2)

1 Mikolov T., Sutskever I., Chen K., Corrado G., Dean J., Distributed Representation of Words and Phrases and Their Compositionality, NIPS 2013

(9)

Data Description (2/2)

 Image preprocessing: segmentation + descriptor

 Each scene image → a set of image patches (MSER)

 Image patch → CNN feature / RGB histogram

 Text preprocessing: real vectors by word2vec

9

(dim: 4096)

CNN feature vector

10

10 10

RGB histogram tensor Scene image

r 1 r 2

r 3

R-CNN

Patches Visual features (SIFT / RGB histogram)

1

v V

2

v V

3

v V

1

v C

2

v C

3

v C

CNN feature, Color

quantization

(nonnegative integer)

(10)

10

Concept Learning

(11)

Data Pipeline

i-th episode

Scene #1 Scene #2

….

Scene #n

Unit of input : an episode

Preprocessing Concept Networks

Kids Videome

v1 v2 v5

v2 v3 v7

v3 v4 v6

Hyperedge Large-scale

video

Preprocessing

HE Sampling

Network

construction

(12)

Image 개수 : Word 개수 :

300 50 350 80 400 110 450 140 500 170 550 210 600 240 650 270 700 300 750 330 800 360 850 390 900 420 950 450 1000 480 1050 510 1100 540 1150 570 1200 600 1250 630 1300 660 1400 690 1500 720 1600 750 1700 780 1800 810 1900 830 2000 850 3000 1600 4000 1700 7000 1800 10000 1900 2500 1000 Episode 개수 : 183 150 100 80 75 70 65 60 55 50 45 40 35 20 15 10 5 1 30 25

Evolution of Concept Map

(13)

Image Generation from Sentence

 Intermediate images generated from query sentences

13

Query sentences Episodes 1~52 (1 season) Episodes 1~104 (2 seasons) Episodes 1~183 (all seasons)

• Tongtong, please

change this book using magic.

• Kurikuri, Kurikuri- tongtong!

• I like cookies.

• It looks delicious

• Thank you,

loopy

(14)

질의문장

:

Hi Pororo

Let’s

Play

Hi Pororo Let’s

Image Generation from Sentence

(15)

질의 문장 :

Hi Pororo Let’s

Play!

Pororo Let’s

Play

Hi

Image Generation from Sentence

(16)

Sentence Generation from Image

16

Original subscript: clock, I have made another potion come and try it Generated subscripts

as i don't have the right magic potion come and try it was nice

ah, finished i finally made another potion come and try it we'll all alone?

Query = {i, try} Query = {thanks, drink}

Original subscript: Oh, thanks Make him drink this Generated subscripts

oh, thanks make him drink this bread

oh, thanks make him drink this forest

Query = {he, take}

Original: Tong. Tong Let. Me. Take. It. To. Clock Generated

take your magic to know what is he doing?

take your magic to avoid the house to know what is he keeps going in circles like this will turn you back to normal

Query = {ship, pulled}

Original: The ship is being pulled Generated

wow looks as if that's the ship is being pulled

the ship is being pulled

(17)

질의이미지

magic

making

i food

am

sorry

i am making magic sorry

i am making food

Sentence Generation from Image

(18)

Analysis of Constructed Concept Structures

 Distinguishable concept nodes

 Formation of the character properties (Petty and Loopy)

(a) Microcodes (b) Centroids of c

1

-nodes

• Tongtong

• Eddy

• Pororo

• Pororo

• Eddy

(a) Episode 1

• Loopy

• Crong

• Petty

(b) Episodes 1~52

• Loopy

• Crong

• Petty

Loopy:

- A female beaver - Girlish and shy - Likes cooking

Petty:

- A female penguin

- Boyish and active

- Likes sports

(19)

Additional Experiment 1

 Proposing 3 different graph search methods

Additional Experiment 1

(20)

Graph Monte-Carlo Method

 Learning strategy: The probability of selecting a vertex, P ( v ( x )), for generating hyperedges

 Uniform graph MC (UGMC)

 Same probability for all vertices

 Random selection

 Poorer-richer graph MC (PRGMC)

 Prefer more frequently appearing vertices in graphs

 Smaller and denser graph → fast but premature convergence

 Fair graph MC (FGMC)

 Prefer less frequently appearing vertices

 Larger and sparse graph → a diverse representation but slow convergence 20

( )

( ( )) 1

{ |

n

}

P v x

x x

+

= ∈ x ( : the set of variables with the positive value of the n-th instance) x ( )

+n

{ ( ( )) }

( ( )) , ( ( )) ( ( ), )

i t

i i

e G

R d v x

P v x

+

d v x α h v x e

= x = ∑

{ ( ( )) }

( ( )) R d v x P v x

= x

( R

+

(.): a ranking function of d(v(x)) in the ascending order)

( R

-

(.): a ranking function of d(v(x)) in the descending order)

(21)

© 2015, SNU CSE Biointelligence Lab., http://bi.snu.ac.kr

21

Scene-to-Subtitle Translation

Scenes Episodes 1~13 Episodes 1~36

Original And petty taught loopy how to ski.

UGMC

- did you ask me how to swim.

- the end how grateful I think she

is coming. UGMC

- Wow petty that how that is not always so loopy taught if i can do fly it well.

-How did you have to be that I could ski just.

PRGMC - end how was so happy

- the end how did you I would PRGMC - How did you pororo you.

- How about now you can I do not worry.

FGMC

- To show how big you found - The end how grateful I am petty

nice to lose careful

FGMC

- Harry realized that how that is it is dangerous

-I thought that how that I could ski just SPC

- But how do someone stop.

- The end how was it. SPC

- How about now you can you give me that how that is great.

- I will see let see how big.

Original Wow poby, you caught so many already.

UGMC

- Has been caught

UGMC

- Come out if you go in to hear you guys you have got a lot of fish I caught.

- You have caught a lot today did you see you later.

PRGMC

- Has been caught

PRGMC

-Everyone has caught a fish for dinner.

- You have caught a lot today did you ask me how.

FGMC

- What are you guys you have caught a lot.

- What happened to ten everyone has caught a lot.

FGMC

-Poby caught a boat a secret that all the wind is so big.

-You have caught a fish for the art diving.

SPC

- Pororo no pororo has caught - She caught the first place

SPC

- You come with his new friend has caught a very interesting book recently - What about pororo has caught a lot of

fish

(22)

 Proposing Dual Memory for Video Q&A

Additional Experiment 2

(23)

23

Video Q&A System (1/2)

Kids Videome

Concept learning

c1 c2 c3 c4 c5

r r r r r w w w w w

er er er ew ew ew

s

Short-term memory

Knowledge base

c1 c2 c3 c4 c5

r r r r r w w w w w

er er er ew ew ew

s

Long-term memory

Question

What did Pororo eat last night?

Question generation

Dinner Poby and Harry

Hypothesis answer set

Porongporong Dark castle

Hypothesis answer set

Answer

evaluation Dinner

Final answer

c1 c2 c3 c4 c5

r r r r r w w w w w

er er er ew ew ew

s

Image & Question multimodal learning

 Overview of Video Q&A System

(24)

sO is a scoring function to get a candidate hyperedge-answer set

In this work, cosine distance measuring between words in the question and words in hyperedges will be used as s

O

sR

is a retrieval function to give answers.

24

Video Q&A System (2/2)

It is the day that pororo and crong

are flying.

Video

4096D CNN-feature vector

200D real valued vector using RNN

Long-term memory

Q : What are they doing now?

Short-term memory constructed by HNs

Candidate HEs o = argmax sO(Q, Ei)

i=1,…,N

A : Greeting r = argmax sR([Q,E1,…EM], w)

wW

Q : What is pororo?

Candidate HEs o = argmax sO(Q, Ei)

i=1,…,N

A : Penguin

r = argmax sR([Q,E1,…EM], w)

wW

(25)

25

Dual Memory Networks

(26)

26

Evaluation

Sequence of Images Sequence of Images

Answers (S/L)

Baking / Making a new toy Stars / Ball

Because they were so loud / His tall height and great strength Yes / Yes

Questions

Can pororo swim out too far?

How can pororo swim well?

Answers (S/L) Questions

What did eddy trying to go to the playground all day?

What does eddy find in her sleep?

* S and L indicate short-term memory and long-term memory

 Examples of generated questions & retrieved answers

 Video turing test

(27)

Future Works : Human-machine Interactive system

비디오 학습

교육용 비디오

아이 교육용 로봇

비디오 개념학습

학습패턴 관찰에 따른사용자 모델링 피드백

1)Pororo likes cookie 2)Eddy goes

3)Petty plays

The ship is

being pulled

Referensi

Dokumen terkait

I didn't have time to complete my assignment… …I have no idea how to start it… …with the help of my course mate… Pathos …gone through a surgery and need to take a rest for a week at

The seven strategies that have not been implemented require teachers to be able to collaborate optimally, because it will affect students' understanding in doing the tasks In

What is Lorem Ipsum? Lorem Ipsum is simply dummy text of the printing and typesetting industry. Lorem Ipsum has been the industry's standard dummy text ever since the 1500s, when an unknown printer took a galley of type and scrambled it to make a type specimen book. It has survived not only five centuries, but also the leap into electronic typesetting, remaining essentially unchanged. It was popularised in the 1960s with the release of Letraset sheets containing Lorem Ipsum passages, and more recently with desktop publishing software like Aldus PageMaker including versions of Lorem Ipsum. Why do we use it? It is a long established fact that a reader will be distracted by the readable content of a page when looking at its layout. The point of using Lorem Ipsum is that it has a more-or-less normal distribution of letters, as opposed to using 'Content here, content here', making it look like readable English. Many desktop publishing packages and web page editors now use Lorem Ipsum as their default model text, and a search for 'lorem ipsum' will uncover many web sites still in their infancy. Various versions have evolved over the years, sometimes by accident, sometimes on purpose (injected humour and the like). Where does it come from? Contrary to popular belief, Lorem Ipsum is not simply random text. It has roots in a piece of classical Latin literature from 45 BC, making it over 2000 years old. Richard McClintock, a Latin professor at Hampden-Sydney College in Virginia, looked up one of the more obscure Latin words, consectetur, from a Lorem Ipsum passage, and going through the cites of the word in classical literature, discovered the undoubtable source. Lorem Ipsum comes from sections 1.10.32 and 1.10.33 of "de Finibus Bonorum et Malorum" (The Extremes of Good and Evil) by Cicero, written in 45 BC. This book is a treatise on the theory of ethics, very popular during the Renaissance. The first line of Lorem Ipsum, "Lorem ipsum dolor sit amet..", comes from a line in section 1.10.32. The standard chunk of Lorem Ipsum used since the 1500s is reproduced below for those interested. Sections 1.10.32 and 1.10.33 from "de Finibus Bonorum et Malorum" by Cicero are also reproduced in their exact original form, accompanied by English versions from the 1914 translation by H. Rackham. Where can I get some? There are many variations of passages of Lorem Ipsum available, but the majority have suffered alteration in some form, by injected humour, or randomised words which don't look even slightly believable. If you are going to use a passage of Lorem Ipsum, you need to be sure there isn't anything embarrassing hidden in the middle of text. All the Lorem Ipsum generators on the Internet tend to repeat predefined chunks as necessary, making this the first true generator on the Internet. It uses a dictionary of over 200 Latin words, combined with a handful of model sentence structures, to generate Lorem Ipsum which looks reasonable. The generated Lorem Ipsum is therefore always free from repetition, injected humour, or non-characteristic words etc. 5 paragraphs words bytes lists Start with 'Lorem ipsum dolor sit amet...' Translations: Can you help translate this site into a foreign language ? Please email

What is Lorem Ipsum? Lorem Ipsum is simply dummy text of the printing and typesetting industry. Lorem Ipsum has been the industry's standard dummy text ever since the 1500s, when an unknown printer took a galley of type and scrambled it to make a type specimen book. It has survived not only five centuries, but also the leap into electronic typesetting, remaining essentially unchanged. It was popularised in the 1960s with the release of Letraset sheets containing Lorem Ipsum passages, and more recently with desktop publishing software like Aldus PageMaker including versions of Lorem Ipsum. Why do we use it? It is a long established fact that a reader will be distracted by the readable content of a page when looking at its layout. The point of using Lorem Ipsum is that it has a more-or-less normal distribution of letters, as opposed to using 'Content here, content here', making it look like readable English. Many desktop publishing packages and web page editors now use Lorem Ipsum as their default model text, and a search for 'lorem ipsum' will uncover many web sites still in their infancy. Various versions have evolved over the years, sometimes by accident, sometimes on purpose (injected humour and the like). Where does it come from? Contrary to popular belief, Lorem Ipsum is not simply random text. It has roots in a piece of classical Latin literature from 45 BC, making it over 2000 years old. Richard McClintock, a Latin professor at Hampden-Sydney College in Virginia, looked up one of the more obscure Latin words, consectetur, from a Lorem Ipsum passage, and going through the cites of the word in classical literature, discovered the undoubtable source. Lorem Ipsum comes from sections 1.10.32 and 1.10.33 of "de Finibus Bonorum et Malorum" (The Extremes of Good and Evil) by Cicero, written in 45 BC. This book is a treatise on the theory of ethics, very popular during the Renaissance. The first line of Lorem Ipsum, "Lorem ipsum dolor sit amet..", comes from a line in section 1.10.32. The standard chunk of Lorem Ipsum used since the 1500s is reproduced below for those interested. Sections 1.10.32 and 1.10.33 from "de Finibus Bonorum et Malorum" by Cicero are also reproduced in their exact original form, accompanied by English versions from the 1914 translation by H. Rackham. Where can I get some? There are many variations of passages of Lorem Ipsum available, but the majority have suffered alteration in some form, by injected humour, or randomised words which don't look even slightly believable. If you are going to use a passage of Lorem Ipsum, you need to be sure there isn't anything embarrassing hidden in the middle of text. All the Lorem Ipsum generators on the Internet tend to repeat predefined chunks as necessary, making this the first true generator on the Internet. It uses a dictionary of over 200 Latin words, combined with a handful of model sentence structures, to generate Lorem Ipsum which looks reasonable. The generated Lorem Ipsum is therefore always free from repetition, injected humour, or non-characteristic words etc. 5 paragraphs words bytes lists Start with 'Lorem ipsum dolor sit amet...' Translations: Can you help translate this site into a foreign language ? Please email

What is Lorem Ipsum? Lorem Ipsum is simply dummy text of the printing and typesetting industry. Lorem Ipsum has been the industry's standard dummy text ever since the 1500s, when an unknown printer took a galley of type and scrambled it to make a type specimen book. It has survived not only five centuries, but also the leap into electronic typesetting, remaining essentially unchanged. It was popularised in the 1960s with the release of Letraset sheets containing Lorem Ipsum passages, and more recently with desktop publishing software like Aldus PageMaker including versions of Lorem Ipsum. Why do we use it? It is a long established fact that a reader will be distracted by the readable content of a page when looking at its layout. The point of using Lorem Ipsum is that it has a more-or-less normal distribution of letters, as opposed to using 'Content here, content here', making it look like readable English. Many desktop publishing packages and web page editors now use Lorem Ipsum as their default model text, and a search for 'lorem ipsum' will uncover many web sites still in their infancy. Various versions have evolved over the years, sometimes by accident, sometimes on purpose (injected humour and the like). Where does it come from? Contrary to popular belief, Lorem Ipsum is not simply random text. It has roots in a piece of classical Latin literature from 45 BC, making it over 2000 years old. Richard McClintock, a Latin professor at Hampden-Sydney College in Virginia, looked up one of the more obscure Latin words, consectetur, from a Lorem Ipsum passage, and going through the cites of the word in classical literature, discovered the undoubtable source. Lorem Ipsum comes from sections 1.10.32 and 1.10.33 of "de Finibus Bonorum et Malorum" (The Extremes of Good and Evil) by Cicero, written in 45 BC. This book is a treatise on the theory of ethics, very popular during the Renaissance. The first line of Lorem Ipsum, "Lorem ipsum dolor sit amet..", comes from a line in section 1.10.32. The standard chunk of Lorem Ipsum used since the 1500s is reproduced below for those interested. Sections 1.10.32 and 1.10.33 from "de Finibus Bonorum et Malorum" by Cicero are also reproduced in their exact original form, accompanied by English versions from the 1914 translation by H. Rackham. Where can I get some? There are many variations of passages of Lorem Ipsum available, but the majority have suffered alteration in some form, by injected humour, or randomised words which don't look even slightly believable. If you are going to use a passage of Lorem Ipsum, you need to be sure there isn't anything embarrassing hidden in the middle of text. All the Lorem Ipsum generators on the Internet tend to repeat predefined chunks as necessary, making this the first true generator on the Internet. It uses a dictionary of over 200 Latin words, combined with a handful of model sentence structures, to generate Lorem Ipsum which looks reasonable. The generated Lorem Ipsum is therefore always free from repetition, injected humour, or non-characteristic words etc. 5 paragraphs words bytes lists Start with 'Lorem ipsum dolor sit amet...' Translations: Can you help translate this site into a foreign language ? Please email