An Introduction to AI Deep Learning with Examples
陳瑞樂
2017/11/28
• Deep Learning and Machine Learning
• CNN
• RNN
• Opportunity
Machine Learning
≈ Looking for a (kernel) Function
• Image Recognition: f(image) = "cat"
• Speech Recognition: f(audio) = "How are you"
• Playing Go: f(board position) = "5-5" (next move)
• Dialogue System: f("Hello", what the user said) = "Hi" (system response)
Framework
Image Recognition: f(image) = "cat"
Model: a set of functions f1, f2, ……
For example, f1 maps one image to "cat" and another to "dog", while f2 maps the same images to "money" and "snake".
Framework
Image Recognition: f(image) = "cat"
Model: a set of functions f1, f2, ……
Training Data: pairs of function input (an image) and function output, i.e. a tag/label such as "monkey", "cat" or "dog". This is Supervised Learning.
Goodness of function f: measured on the Training Data, one function can be better than another.

Framework
Step 1: the Model, a set of functions f1, f2, ……
Step 2: the goodness of function f, evaluated on the Training Data ("monkey", "cat", "dog").
Step 3: pick the "best" function f*.
Steps 1 to 3 are Training; Testing means using f* on a new input, e.g. f*(image) = "cat".
Three Steps for Deep Learning
Step 1: define a set of functions
Step 2: goodness of function
Step 3: pick the best function
Deep Learning is so simple ……
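The same three steps map directly onto Keras (the toolkit used later in these slides). The following minimal sketch is only an illustration: the layer sizes, input dimension and random placeholder data are assumptions, not values from the slides.

# A minimal Keras sketch of the three steps (all sizes and data are placeholders).
import numpy as np
from keras.models import Sequential
from keras.layers import Dense

# Step 1: define a set of functions (the model architecture).
model = Sequential()
model.add(Dense(64, activation='relu', input_dim=100))
model.add(Dense(10, activation='softmax'))

# Step 2: define the goodness of a function (the loss to be minimized).
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])

# Step 3: pick the best function by fitting the model to labeled training data.
x_train = np.random.rand(32, 100)                       # placeholder inputs
y_train = np.eye(10)[np.random.randint(0, 10, 32)]      # placeholder one-hot labels
model.fit(x_train, y_train, epochs=2, batch_size=8)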
Variants of Neural Networks
Convolutional Neural Network (CNN): widely used in image processing
Recurrent Neural Network (RNN)

Convolutional Neural Network (CNN)
Source: https://www.youtube.com/watch?v=FrKWiRv254g&t=938s
Why CNN for Image?
• When processing an image, the first layer of a fully connected network would be very large: a 100 x 100 x 3 image already has 30,000 input dimensions, so with 1,000 neurons in the first hidden layer (followed by more layers and a softmax output) that single layer alone needs about 3 x 10^7 weights.
• Can the fully connected network be simplified by considering the properties of image recognition?
Why CNN for Image?
• Some patterns are much smaller than the whole image.
A neuron does not have to see the whole image to discover the pattern; a "beak" detector, for instance, only needs to look at a small region.
Connecting to a small region requires fewer parameters.
Why CNN for Image?
• The same patterns appear in different regions.
An "upper-left beak" detector and a "middle beak" detector do almost the same thing, so they can use the same set of parameters.
Why CNN for Image?
• Subsampling the pixels will not change the object: a subsampled bird is still a bird.
We can subsample the pixels to make the image smaller, so the network needs fewer parameters to process it.
The whole CNN
image → Convolution → Max Pooling → Convolution → Max Pooling (this pair can repeat many times) → Flatten → Fully Connected Feedforward network → "cat", "dog", ……
The whole CNN
• Property 1: some patterns are much smaller than the whole image → Convolution
• Property 2: the same patterns appear in different regions → Convolution
• Property 3: subsampling the pixels will not change the object → Max Pooling
The Convolution and Max Pooling pair can repeat many times before Flatten.
CNN – Convolution
6 x 6 image:
1 0 0 0 0 1
0 1 0 0 1 0
0 0 1 1 0 0
1 0 0 0 1 0
0 1 0 0 1 0
0 0 1 0 1 0

Filter 1 (3 x 3 matrix):    Filter 2 (3 x 3 matrix):    ……
 1 -1 -1                    -1  1 -1
-1  1 -1                    -1  1 -1
-1 -1  1                    -1  1 -1

Those are the network parameters to be learned.
Each filter detects a small pattern (3 x 3). (Property 1)
CNN – Convolution (stride = 1)
Placing Filter 1 on the top-left 3 x 3 patch of the 6 x 6 image gives 3; shifting the filter one pixel to the right (stride = 1) gives -1.
CNN – Convolution (if stride = 2)
With stride = 2 the filter jumps two pixels at a time, so the first two outputs are 3 and -3. We set stride = 1 below.
CNN – Convolution
Sliding Filter 1 over the whole 6 x 6 image with stride = 1 gives a 4 x 4 output:
 3 -1 -3 -1
-3  1  0 -3
-3 -3  0  1
 3 -2 -2 -1
The value 3 appears in both the top-left and the bottom-left corner because the same pattern occurs in different regions of the image. (Property 2)
CNN – Convolution
Doing the same process for Filter 2 (stride = 1) gives another 4 x 4 output:
-1 -1 -1 -1
-1 -1 -2  1
-1 -1 -2  1
-1  0 -4  3
Repeating this for every filter, the stacked 4 x 4 outputs form the Feature Map: a 4 x 4 image with one channel per filter.
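To make the sliding-filter computation concrete, here is a small NumPy sketch that reproduces the 4 x 4 output of Filter 1 above; the convolve helper is written out by hand for illustration, it is not a library function.

import numpy as np

# The 6 x 6 image and Filter 1 from the slides.
image = np.array([[1,0,0,0,0,1],
                  [0,1,0,0,1,0],
                  [0,0,1,1,0,0],
                  [1,0,0,0,1,0],
                  [0,1,0,0,1,0],
                  [0,0,1,0,1,0]])
filter1 = np.array([[ 1,-1,-1],
                    [-1, 1,-1],
                    [-1,-1, 1]])

def convolve(img, flt, stride=1):
    """Slide the filter over the image; at each position take the element-wise product and sum."""
    k = flt.shape[0]
    out_size = (img.shape[0] - k) // stride + 1
    out = np.zeros((out_size, out_size), dtype=int)
    for i in range(out_size):
        for j in range(out_size):
            patch = img[i*stride:i*stride+k, j*stride:j*stride+k]
            out[i, j] = np.sum(patch * flt)
    return out

print(convolve(image, filter1, stride=1))
# [[ 3 -1 -3 -1]
#  [-3  1  0 -3]
#  [-3 -3  0  1]
#  [ 3 -2 -2 -1]]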
CNN – Colorful image
A colorful image has three channels (R, G, B), so the input is three 6 x 6 matrices stacked together. Each filter then becomes a 3 x 3 x 3 cube: Filter 1 and Filter 2 each have one 3 x 3 slice per channel, and the convolution multiplies and sums over the whole cube at every position.
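As a small illustration of how a filter "becomes a cube" for a 3-channel image, the sketch below applies one 3 x 3 x 3 filter to one position of a random 6 x 6 x 3 image; all values here are placeholders.

import numpy as np

rng = np.random.RandomState(0)
image_rgb = rng.randint(0, 2, size=(6, 6, 3))     # 6 x 6 image with 3 colour channels
filter_rgb = rng.randint(-1, 2, size=(3, 3, 3))   # one filter: a 3 x 3 slice per channel

# One output value per position: multiply element-wise over the whole cube and sum,
# so the three channels are combined into a single number.
top_left = np.sum(image_rgb[0:3, 0:3, :] * filter_rgb)
print(top_left)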
Convolution v.s. Fully Connected
Flatten the 6 x 6 image into a 36-dimensional input vector x1, x2, …, x36. A fully connected neuron would connect to all 36 inputs; the convolution output 3 (Filter 1 placed at the top-left) connects to only 9 of them (pixels 1, 2, 3, 7, 8, 9, 13, 14 and 15), so it is not fully connected: fewer parameters!
The next convolution output, -1, connects to pixels 2, 3, 4, 8, 9, 10, 14, 15 and 16 and reuses exactly the same 9 weights of Filter 1 (shared weights), so the whole 4 x 4 output of Filter 1 needs only 9 parameters in total: even fewer parameters!
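The counting behind "fewer parameters" and "even fewer parameters" in this 6 x 6 example can be spelled out as follows (bias terms ignored for simplicity).

image_pixels = 6 * 6          # 36 inputs once the image is flattened
fc_per_neuron = image_pixels  # a fully connected neuron needs 36 weights
conv_per_output = 3 * 3       # a convolution output looks at a 3 x 3 patch: 9 weights
conv_whole_map = 3 * 3        # all 16 outputs of Filter 1 share those same 9 weights
print(fc_per_neuron, conv_per_output, conv_whole_map)   # 36 9 9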
CNN – Max Pooling
Start from the two 4 x 4 outputs produced by Filter 1 and Filter 2 in the convolution step.
Group each 4 x 4 output into 2 x 2 blocks and keep only the maximum of each block:
Filter 1:   3  0        Filter 2:  -1  1
            3  1                    0  3
Each filter becomes one channel of a new, smaller 2 x 2 image (Conv followed by Max Pooling).
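A small NumPy sketch of the 2 x 2 max-pooling step applied to Filter 1's output; the max_pool_2x2 helper is only for illustration.

import numpy as np

# The 4 x 4 output of Filter 1 from the convolution step.
feature_map1 = np.array([[ 3, -1, -3, -1],
                         [-3,  1,  0, -3],
                         [-3, -3,  0,  1],
                         [ 3, -2, -2, -1]])

def max_pool_2x2(fmap):
    """Keep the maximum of each non-overlapping 2 x 2 block."""
    h, w = fmap.shape
    return np.array([[fmap[i:i+2, j:j+2].max() for j in range(0, w, 2)]
                     for i in range(0, h, 2)])

print(max_pool_2x2(feature_map1))
# [[3 0]
#  [3 1]]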
The whole CNN
Each Convolution plus Max Pooling block turns its input into a new image that is smaller than the original and whose number of channels equals the number of filters (here a 2 x 2 image with two channels). This block can repeat many times.
The whole CNN
After the last Convolution and Max Pooling block, the resulting new image is flattened and fed into the Fully Connected Feedforward network, which produces the final outputs such as "cat" and "dog".
Flatten
The final 2 x 2 x 2 output is flattened into a vector of its 8 values (3, 0, 3, 1 from Filter 1 and -1, 1, 0, 3 from Filter 2), and this vector is the input to the Fully Connected Feedforward network.
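Putting the pieces together, here is a minimal Keras sketch of the Convolution → Max Pooling → Flatten → Fully Connected pipeline described above; the input size, filter counts and number of output classes are placeholders, not values taken from the slides.

from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Flatten, Dense

model = Sequential()
model.add(Conv2D(25, (3, 3), activation='relu', input_shape=(28, 28, 1)))  # Convolution
model.add(MaxPooling2D((2, 2)))                                            # Max Pooling
model.add(Conv2D(50, (3, 3), activation='relu'))                           # can repeat many times
model.add(MaxPooling2D((2, 2)))
model.add(Flatten())                                                       # Flatten
model.add(Dense(100, activation='relu'))                                   # Fully Connected Feedforward
model.add(Dense(2, activation='softmax'))                                  # e.g. "cat" / "dog"
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
model.summary()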
More Application: Playing Go
The input is the 19 x 19 board encoded as a 19 x 19 vector (black: 1, white: -1, none: 0), or equivalently a 19 x 19 matrix treated as an image. The network outputs the next move, one of the 19 x 19 positions.
A fully-connected feedforward network can be used, but a CNN performs much better.
More Application: Playing Go
Training data: records of previous plays, e.g. Black: 5-5 (五之五), White: tengen (天元), Black: 5-5 (五之5), ……
For each recorded move the CNN takes the board before the move as input, and the target is the position actually played: target "天元" = 1 and all other positions = 0, or target "五之5" = 1 and all other positions = 0.
Why CNN for playing Go?
• Some patterns are much smaller than the whole image.
• The same patterns appear in different regions.
AlphaGo uses 5 x 5 filters for its first layer.
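A sketch of how a board position could be encoded and fed to a CNN with 5 x 5 first-layer filters as mentioned above; the filter count and the rest of the network are assumptions for illustration only.

import numpy as np
from keras.models import Sequential
from keras.layers import Conv2D, Flatten, Dense

# Encode the 19 x 19 board: black = 1, white = -1, none = 0.
board = np.zeros((19, 19, 1))
board[3, 3, 0] = 1       # a black stone
board[9, 9, 0] = -1      # a white stone

model = Sequential()
model.add(Conv2D(32, (5, 5), padding='same', activation='relu', input_shape=(19, 19, 1)))
model.add(Flatten())
model.add(Dense(19 * 19, activation='softmax'))   # one probability per possible next move
model.compile(loss='categorical_crossentropy', optimizer='adam')
print(model.predict(board.reshape(1, 19, 19, 1)).shape)   # (1, 361)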
Variants of Neural Networks
Convolutional Neural Network (CNN)
Recurrent Neural Network (RNN): a neural network with memory
Recurrent Neural Network: the LSTM model, well suited to time-series forecasting.
Long Short-Term Memory (LSTM)
RNN Tutorial
• Air pollution forecasting
  – Prepare the source data
  – Prepare the base data
  – Build a multivariate LSTM forecasting model
  – References:
    • Multivariate Time Series Forecasting with LSTMs in Keras
    • 基於Keras的LSTM多變數時間序列預測 (an LSTM multivariate time-series forecasting tutorial in Keras)
Preparing the source data
Air Quality dataset
• We use the Beijing air quality dataset.
  – It was recorded at the US Embassy in Beijing over five years and reports weather and pollution values hourly.
  – The data include the date, the PM2.5 concentration, and weather information such as dew point, temperature, pressure, wind direction, wind speed and hours of precipitation.
• The full feature list of the raw data:
  • No: row number
  • year: year
  • month: month
  • day: day
  • hour: hour
  • pm2.5: PM2.5 concentration
  • DEWP: dew point temperature
  • TEMP: temperature
  • PRES: pressure
  • cbwd: combined wind direction
  • Iws: cumulative wind speed
  • Is: cumulative hours of snow
  • Ir: cumulative hours of rain
Preparing the base data
The raw data cannot be used as-is; we must process it first. Below are the first few rows of the raw dataset.
1. Consolidate the scattered date-time fields into a single datetime that we can use as the index.
2. There are also a few scattered "NA" values in the dataset; for now we mark them with 0.
Create 7 subplots showing five years of data for each variable.
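A pandas sketch of these preparation steps, assuming the raw file has been saved as raw.csv with the column names listed earlier (the file name itself is an assumption).

import pandas as pd
import matplotlib.pyplot as plt

# 1. Consolidate year/month/day/hour into a single datetime and use it as the index.
df = pd.read_csv('raw.csv')
df.index = pd.to_datetime(df[['year', 'month', 'day', 'hour']])
df = df.drop(['No', 'year', 'month', 'day', 'hour'], axis=1)

# 2. Mark the scattered "NA" values in the pm2.5 column with 0 for now.
df['pm2.5'] = df['pm2.5'].fillna(0)

# 3. Seven subplots, one per numeric variable, over the five years of data.
df[['pm2.5', 'DEWP', 'TEMP', 'PRES', 'Iws', 'Is', 'Ir']].plot(subplots=True, figsize=(10, 12))
plt.show()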
LSTM data preparation
• We frame the supervised learning problem as:
  – predicting the pollution at the current hour (t) from the pollution level and weather conditions of the previous hour.
• Further exploration and applications:
  – predict the next hour's pollution from the previous day's weather and pollution;
  – predict the next hour's pollution from the previous day's weather and pollution plus the "expected" weather conditions for the next hour.
1. Normalize the data.
2. Use the next record's pollution value as the prediction target (label) of the current record.
Plotting the RMSE (root-mean-square error) during training can help make the problem clearer; a sketch of these steps follows below.
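A Keras sketch of the framing above: normalize, use the next hour's pollution as the label, fit a small LSTM and report the RMSE. The random placeholder array stands in for the prepared dataset, and the layer sizes and training settings are assumptions, not the author's exact model.

import numpy as np
from sklearn.preprocessing import MinMaxScaler
from sklearn.metrics import mean_squared_error
from keras.models import Sequential
from keras.layers import LSTM, Dense

values = np.random.rand(2000, 8).astype('float32')   # placeholder for the prepared dataset
                                                     # (column 0 = pollution, others = weather)

# 1. Normalize all features to [0, 1].
scaled = MinMaxScaler().fit_transform(values)

# 2. Input = all features at hour t-1, label = pollution at hour t.
X, y = scaled[:-1], scaled[1:, 0]
X = X.reshape((X.shape[0], 1, X.shape[1]))           # LSTM expects (samples, timesteps, features)

# Split into training and test sets by time.
n_train = int(len(X) * 0.8)
train_X, train_y = X[:n_train], y[:n_train]
test_X, test_y = X[n_train:], y[n_train:]

# 3. A small LSTM followed by a single output value.
model = Sequential()
model.add(LSTM(50, input_shape=(train_X.shape[1], train_X.shape[2])))
model.add(Dense(1))
model.compile(loss='mae', optimizer='adam')
model.fit(train_X, train_y, epochs=10, batch_size=72,
          validation_data=(test_X, test_y), verbose=2)

# 4. RMSE on the test set (still on the normalized scale here).
pred = model.predict(test_X)[:, 0]
rmse = np.sqrt(mean_squared_error(test_y, pred))
print('Test RMSE: %.3f' % rmse)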
Opportunity
• The learning barrier is high: the theory is deep @@
  – neural networks, machine learning, classification / prediction / recognition, ….
• Finding problems @@
  – What problems can AI tools (TensorFlow + Keras) be brought in to solve?
• How to obtain the data source for the problem (key point!!)
  – Air pollution forecasting: you need air pollution measurements; where do you get them?
  – Face recognition: you need face photos with tags; how do you produce them?
My Target
Predict the stock index one day ahead: RMSE = 100 (train set: 2013/01/08~2014/8/3, test set: 2014/8/27~2017/7/23)
Predict the stock index two days ahead: RMSE = 110 (train set: 2013/01/08~2014/8/3, test set: 2014/8/27~2017/7/23)
Predict the stock index three days ahead: RMSE = 117 (train set: 2013/01/08~2014/8/3, test set: 2014/8/27~2017/7/23)