
A Study of Evaluation Function for Perfect Reversi by Deep Learning
ディープラーニングを用いたパーフェクト・リバーシの局面評価の研究

Graduation Research Report
Department of Electrical and Computer Engineering
岐阜工業高等専門学校 (National Institute of Technology, Gifu College)
Submitted February 16, 2018

Supervisor: Prof. 出口 利憲
Author: 2013E33 船橋 聡太

Abstract

The field of AI (Artificial Intelligence) has developed rapidly since the term first appeared at the Dartmouth Conference in 1956. AI is now applied in many areas, such as natural language processing, image recognition, and expert systems.

Within AI, neural networks have attracted particular attention in recent years, because hierarchical (layered) neural networks achieve exceptional performance on regression and classification problems. In this research, an AI for reversi played on a 10×10 board, so-called "perfect reversi", is trained by deep learning on an existing set of training data. A convolutional neural network is constructed with the TensorFlow machine-learning library, and the trained AI is then evaluated through matches against an AI that plays random moves. Because perfect reversi is a game with a great deal of uncertainty, determining evaluation values and choosing an optimization method involve many difficulties.

Contents

Abstract
1  Introduction
2  Neural Networks
   2.1  Neurons
   2.2  The neuron model
   2.3  Activation functions
3  Perceptrons
   3.1  History
   3.2  The single-layer perceptron
   3.3  The multilayer perceptron
4  Deep Learning
5  Learning Algorithms
   5.3  Backpropagation
   5.4  AdaGrad
   5.5  Adam
6  TensorFlow
   6.1  About TensorFlow
   6.2  Basic operations in TensorFlow
   6.3  The MNIST dataset
   6.4  Handwritten-digit recognition on MNIST
7  Experiments
   7.1  Experimental setup
   7.2  Experiment 1: 6×6 reversi AI
   7.3  Experiment 2: perfect reversi AI
8  Conclusion
   8.1  Future work
References

1 Introduction

The field of AI has developed rapidly since the term was coined at the Dartmouth Conference in 1956, and neural networks in particular have attracted renewed attention. In this research, an evaluation AI for reversi played on a 10×10 board, so-called perfect reversi, is trained by deep learning using the TensorFlow library, and its playing strength is evaluated through matches against an AI that plays random moves.

2 Neural Networks

2.1 Neurons

The human brain is a network of a vast number of interconnected nerve cells (neurons); Figure 2.1 shows the structure of a single neuron.1) Each neuron receives signals from other neurons and, when the accumulated input exceeds a threshold, fires and passes a signal on. Artificial neural networks model this behavior.

2.2 The neuron model

Figure 2.2 shows the formal neuron model.3) A neuron computes its output z from its inputs x_i, connection weights w_i, and a bias b, as in Eq. (2.1):

    z = \sum_{i=1}^{n} w_i x_i + b    (2.1)

Figure 2.1 Neuron2)
Figure 2.2 Neuron model3)

2.3 Activation functions

The weighted sum z of Eq. (2.1) is passed through an activation function to produce the neuron's output. Commonly used activation functions are:

• the step function (Figure 2.3)
• the sigmoid function (Figure 2.4)
• the ReLU function (Figure 2.5)

Figure 2.3 Step function4)
Figure 2.4 Sigmoid function4)
Figure 2.5 ReLU function4)
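As a small illustration (a sketch, not code from the thesis), Eq. (2.1) and the three activation functions above can be written in Python as:

import numpy as np

def step(z):
    return np.where(z > 0, 1.0, 0.0)   # step function (Figure 2.3)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))    # sigmoid function (Figure 2.4)

def relu(z):
    return np.maximum(0.0, z)          # ReLU function (Figure 2.5)

def neuron(x, w, b, activation=sigmoid):
    z = np.dot(w, x) + b               # Eq. (2.1): z = sum_i w_i x_i + b
    return activation(z)

x = np.array([0.5, -0.2, 0.1])         # example inputs
w = np.array([0.4, 0.7, -1.2])         # example weights
print(neuron(x, w, b=0.1))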

3 Perceptrons

3.1 History

The perceptron was proposed in 1957 by Frank Rosenblatt. It is the origin of today's neural networks, and it can realize simple logic circuits such as the AND and OR gates.

3.2 The single-layer perceptron

A single-layer perceptron (Figure 3.1) can represent the AND, NAND, and OR gates, but it cannot represent the XOR gate, because XOR is not linearly separable.

3.3 The multilayer perceptron

By stacking layers, that is, by combining single-layer perceptrons into a two-layer network, the XOR gate can be represented (Figure 3.2).

Figure 3.1 Single layer perceptron4)
Figure 3.2 Multilayer perceptron4)
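The construction of Figure 3.2 can be sketched in Python with fixed weights (an illustration based on the standard perceptron literature, not code preserved from the thesis; the particular weights and biases are one conventional choice):

def perceptron(x1, x2, w1, w2, b):
    # a single neuron: fires (returns 1) when w1*x1 + w2*x2 + b > 0
    return 1 if w1 * x1 + w2 * x2 + b > 0 else 0

def AND(x1, x2):  return perceptron(x1, x2,  0.5,  0.5, -0.7)
def NAND(x1, x2): return perceptron(x1, x2, -0.5, -0.5,  0.7)
def OR(x1, x2):   return perceptron(x1, x2,  0.5,  0.5, -0.2)

def XOR(x1, x2):
    # two-layer perceptron: XOR(x1, x2) = AND(NAND(x1, x2), OR(x1, x2))
    return AND(NAND(x1, x2), OR(x1, x2))

for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(a, b, XOR(a, b))   # prints 0, 1, 1, 0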

4 Deep Learning

Deep learning refers to machine learning with neural networks stacked to four or more layers. Building on the work of Geoffrey Everest Hinton and others, deep networks have produced remarkable results since around 2010, and they form the basis of the approach used in this research.

5 Learning Algorithms

Training a neural network means adjusting its weights and biases so that a loss function computed on the training data is minimized. This chapter describes the backpropagation algorithm and the optimizers used in this research.

5.3 Backpropagation

Backpropagation is the standard method for training networks with three or more layers; it was popularized in 1986 by David E. Rumelhart and colleagues.5) The error at the output layer is propagated backwards through the network, and each weight is updated in the direction that reduces the error.

Figure 5.1 Backpropagation5)

Consider the three-layer network of Figure 5.1, with inputs X_i, hidden-unit outputs H_j, and outputs O_k. W_ji denotes the weight from input i to hidden unit j, and W_kj the weight from hidden unit j to output unit k. The net input T_j of hidden unit j is given by Eq. (5.1):

    T_j = \sum_{i=1}^{n} W_{ji} X_i    (5.1)

and similarly U_k = \sum_j W_{kj} H_j is the net input of output unit k. With teacher signals t_i for the outputs O_i, the error E is defined by Eq. (5.2):

    E = \frac{1}{2} \sum_{i=1}^{n} (t_i - O_i)^2    (5.2)

For the output-layer weights W_kj, the update follows the gradient of E with respect to W_kj, expanded by the chain rule as Eq. (5.3):

    \Delta W_{kj} = -\eta \frac{\partial E}{\partial W_{kj}}
                  = -\eta \frac{\partial E}{\partial O_k} \frac{\partial O_k}{\partial U_k} \frac{\partial U_k}{\partial W_{kj}}    (5.3)

where \eta is the learning rate, usually set below 1. The three factors are computed in Eqs. (5.4) to (5.6):

    \frac{\partial E}{\partial O_k} = \frac{\partial}{\partial O_k} \frac{1}{2}(t_k - O_k)^2 = -(t_k - O_k)    (5.4)

    \frac{\partial O_k}{\partial U_k} = \frac{\partial}{\partial U_k} \frac{1}{1+e^{-U_k}}
        = \frac{e^{-U_k}}{(1+e^{-U_k})^2}
        = \frac{1}{1+e^{-U_k}} \left( 1 - \frac{1}{1+e^{-U_k}} \right)
        = O_k (1 - O_k)    (5.5)

    \frac{\partial U_k}{\partial W_{kj}} = \frac{\partial}{\partial W_{kj}} (H_1 W_{k1} + H_2 W_{k2} + \dots + H_j W_{kj} + \dots) = H_j    (5.6)

Substituting these into Eq. (5.3), \Delta W_{kj} becomes Eq. (5.7):

    \Delta W_{kj} = -\eta \{ -(t_k - O_k) O_k (1 - O_k) H_j \} = \eta \, \delta_k H_j    (5.7)

where \delta_k is defined by Eq. (5.8):

    \delta_k = (t_k - O_k) O_k (1 - O_k)    (5.8)

For the hidden-layer weights W_ji, the same procedure applies, as in Eq. (5.9):

    \Delta W_{ji} = -\eta \frac{\partial E}{\partial W_{ji}}    (5.9)

However, since every output O_k depends on H_j, the error contributions of all n output units must be summed, giving Eq. (5.10):

    \Delta W_{ji} = -\eta \left( \sum_{k=1}^{n} \frac{\partial E}{\partial O_k} \frac{\partial O_k}{\partial U_k} \frac{\partial U_k}{\partial H_j} \right) \frac{\partial H_j}{\partial T_j} \frac{\partial T_j}{\partial W_{ji}}    (5.10)

The derivative of U_k with respect to H_j is Eq. (5.11):

    \frac{\partial U_k}{\partial H_j} = \frac{\partial}{\partial H_j} (H_1 W_{k1} + H_2 W_{k2} + \dots + H_n W_{kn}) = W_{kj}    (5.11)

that of H_j with respect to T_j is Eq. (5.12):

    \frac{\partial H_j}{\partial T_j} = \frac{\partial}{\partial T_j} \frac{1}{1+e^{-T_j}}
        = \frac{1}{1+e^{-T_j}} \left( 1 - \frac{1}{1+e^{-T_j}} \right)
        = H_j (1 - H_j)    (5.12)

and that of T_j with respect to W_ji is Eq. (5.13):

    \frac{\partial T_j}{\partial W_{ji}} = \frac{\partial}{\partial W_{ji}} (X_1 W_{j1} + X_2 W_{j2} + \dots + X_n W_{jn}) = X_i    (5.13)

Combining Eqs. (5.10) to (5.13) yields Eq. (5.14):

    \Delta W_{ji} = \eta H_j (1 - H_j) X_i \sum_{k=1}^{n} W_{kj} (t_k - O_k) O_k (1 - O_k)
                  = \eta H_j (1 - H_j) X_i \sum_{k=1}^{n} W_{kj} \delta_k
                  = \eta \, \delta_j X_i    (5.14)

where \delta_j is defined by Eq. (5.15):

    \delta_j = H_j (1 - H_j) \sum_{k=1}^{n} W_{kj} \delta_k    (5.15)
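Under the notation above, one training step of Eqs. (5.7) and (5.14) can be sketched in Python with NumPy (illustrative code, not from the thesis; biases are omitted, as in the derivation):

import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def backprop_step(X, t, W_ji, W_kj, eta=0.1):
    # forward pass
    T = W_ji @ X                  # Eq. (5.1): net input of hidden units
    H = sigmoid(T)
    U = W_kj @ H                  # net input of output units
    O = sigmoid(U)
    # output-layer delta, Eq. (5.8)
    delta_k = (t - O) * O * (1 - O)
    # hidden-layer delta, Eq. (5.15)
    delta_j = H * (1 - H) * (W_kj.T @ delta_k)
    # weight updates, Eqs. (5.7) and (5.14)
    W_kj += eta * np.outer(delta_k, H)
    W_ji += eta * np.outer(delta_j, X)
    return W_ji, W_kj

# example: a randomly initialized 3-4-2 network, one step on one sample
rng = np.random.default_rng(0)
W_ji = rng.normal(scale=0.5, size=(4, 3))
W_kj = rng.normal(scale=0.5, size=(2, 4))
backprop_step(np.array([1.0, 0.0, 1.0]), np.array([1.0, 0.0]), W_ji, W_kj)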

5.4 AdaGrad

AdaGrad is an optimization method proposed in 2011 by John Duchi et al.3) Instead of keeping the learning rate fixed at its initial value \eta_0, AdaGrad adapts it per parameter: parameters that have accumulated large gradients so far have their learning rate reduced. A small constant \epsilon prevents division by zero. The AdaGrad update is given by Eqs. (5.16) to (5.19), where \nabla Q_i(w) is the gradient of the loss and \circ denotes the element-wise product:

    h_0 = \epsilon    (5.16)

    h_t = h_{t-1} + \nabla Q_i(w) \circ \nabla Q_i(w)    (5.17)

    \eta_t = \frac{\eta_0}{\sqrt{h_t}}    (5.18)

    w_{t+1} = w_t - \eta_t \nabla Q_i(w_t)    (5.19)

Because h_t only grows, the effective learning rate \eta_t decreases monotonically toward 0 as training proceeds. In the Chainer framework the default values are given by Eqs. (5.20) and (5.21):

    \epsilon = 10^{-8}    (5.20)

    \eta_0 = 0.001    (5.21)
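A minimal Python sketch of Eqs. (5.16) to (5.19) (illustrative only, not code from the thesis):

import numpy as np

class AdaGrad:
    # per-parameter learning rates, Eqs. (5.16)-(5.19)
    def __init__(self, eta0=0.001, eps=1e-8):
        self.eta0, self.h = eta0, eps            # h_0 = epsilon, Eq. (5.16)

    def update(self, w, grad):
        self.h = self.h + grad * grad            # Eq. (5.17)
        return w - self.eta0 / np.sqrt(self.h) * grad   # Eqs. (5.18)-(5.19)

# example: minimize f(w) = ||w||^2, whose gradient is 2w
opt = AdaGrad()
w = np.array([1.0, -2.0])
for _ in range(100):
    w = opt.update(w, 2 * w)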

5.5 Adam

Adam is an optimization method proposed in 2015 by Diederik P. Kingma and Jimmy Ba. It combines the ideas of Momentum and AdaGrad by keeping exponential moving averages of both the gradient and its element-wise square.3) The recommended hyperparameters of Adam are given by Eqs. (5.22) to (5.25):

    \alpha = 0.001    (5.22)

    \beta_1 = 0.9    (5.23)

    \beta_2 = 0.999    (5.24)

    \epsilon = 10^{-8}    (5.25)

The Adam update is given by Eqs. (5.26) to (5.30):

    m_{t+1} = \beta_1 m_t + (1 - \beta_1) \nabla Q_i(w)    (5.26)

    v_{t+1} = \beta_2 v_t + (1 - \beta_2) \nabla Q_i(w) \circ \nabla Q_i(w)    (5.27)

    \hat{m}_t = \frac{m_t}{1 - \beta_1^t}    (5.28)

    \hat{v}_t = \frac{v_t}{1 - \beta_2^t}    (5.29)

    w_{t+1} = w_t - \alpha \frac{\hat{m}_t}{\sqrt{\hat{v}_t} + \epsilon}    (5.30)

with the initial values of Eqs. (5.31) and (5.32):

    m_0 = 0    (5.31)

    v_0 = 0    (5.32)

The bias corrections of Eqs. (5.28) and (5.29) compensate for these zero initial values in the early steps. Adam is the optimizer used for training in this research.
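A corresponding Python sketch of Eqs. (5.26) to (5.30), again illustrative only:

import numpy as np

class Adam:
    # moving averages of the gradient and squared gradient, Eqs. (5.26)-(5.32)
    def __init__(self, alpha=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
        self.alpha, self.b1, self.b2, self.eps = alpha, beta1, beta2, eps
        self.m, self.v, self.t = 0.0, 0.0, 0     # m_0 = v_0 = 0

    def update(self, w, grad):
        self.t += 1
        self.m = self.b1 * self.m + (1 - self.b1) * grad          # Eq. (5.26)
        self.v = self.b2 * self.v + (1 - self.b2) * grad * grad   # Eq. (5.27)
        m_hat = self.m / (1 - self.b1 ** self.t)                  # Eq. (5.28)
        v_hat = self.v / (1 - self.b2 ** self.t)                  # Eq. (5.29)
        return w - self.alpha * m_hat / (np.sqrt(v_hat) + self.eps)  # Eq. (5.30)

opt = Adam()
w = np.array([1.0, -2.0])
for _ in range(100):
    w = opt.update(w, 2 * w)   # gradient of ||w||^2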

6 TensorFlow

6.1 About TensorFlow

TensorFlow is an open-source machine-learning library released by Google in November 2015 and used in many of Google's own products and services. APIs are provided for several languages, including C/C++, Python, Java, and Go. TensorFlow also ships with TensorBoard, a tool for visualizing the computation graph and the progress of training.6)

6.2 Basic operations in TensorFlow

6.2.1 Main API functions

The following functions are used in this research:

• tensorflow.constant() defines a constant tensor
• tensorflow.Variable() defines a variable, such as a weight or a bias
• tensorflow.global_variables_initializer() initializes all variables
• tensorflow.placeholder() defines a placeholder that is fed with data at run time
• tensorflow.nn.sigmoid() applies the sigmoid activation function
• tensorflow.nn.relu() applies the ReLU activation function
• tensorflow.train.AdamOptimizer() provides the Adam optimizer of Section 5.5

6.2.2 Sessions

A graph operation op is executed inside a session. A session is created, run, and closed as follows:

sess = tensorflow.Session()
sess.run(op)
sess.close()

6.2.3 Saving a model

The variables of a session sess are saved to a directory dir with a Saver object:

saver = tensorflow.train.Saver()
saver.save(sess, dir)

6.2.4 Restoring a model

Saved variables are restored into a session in the same way:

saver = tensorflow.train.Saver()
saver.restore(sess, dir)
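Putting these pieces together, a minimal TensorFlow 1.x session (the version in use at the time) might look as follows; this is an illustrative sketch, not code from the thesis:

import tensorflow as tf

# build a tiny graph: y = W * x + b
x = tf.placeholder(tf.float32)           # fed with data at run time
W = tf.Variable(2.0)                     # trainable variable
b = tf.Variable(1.0)
y = W * x + b

init = tf.global_variables_initializer()

sess = tf.Session()
sess.run(init)                           # initialize W and b
print(sess.run(y, feed_dict={x: 3.0}))  # prints 7.0

saver = tf.train.Saver()
saver.save(sess, "./model.ckpt")         # save the variables
sess.close()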

6.3 The MNIST dataset

MNIST (Mixed National Institute of Standards and Technology database) is a dataset of 28×28-pixel grayscale images of handwritten digits from 0 to 9. It contains 60,000 training images and 10,000 test images, each with its correct label. Examples of MNIST images are shown in Figure 6.1.

Figure 6.1 MNIST data examples6)

6.4 Handwritten-digit recognition on MNIST

As an exercise in building convolutional networks, the TensorFlow MNIST tutorial model is implemented in Python as follows.

6.4.1 Importing TensorFlow

The tensorflow package is imported into Python under the alias tf:5)

import tensorflow as tf

6.4.2 Weight and bias initialization

Weights are initialized from a truncated normal distribution with standard deviation 0.1, and biases to the constant 0.1. tf.Variable() wraps each initial value as a TensorFlow Variable:5)

def weight_variable(shape):
    initial = tf.truncated_normal(shape, stddev=0.1)
    return tf.Variable(initial)

def bias_variable(shape):
    initial = tf.constant(0.1, shape=shape)
    return tf.Variable(initial)

6.4.3 Convolutional layer

The convolution operation is defined as follows:5)

def conv2d(x, W):
    return tf.nn.conv2d(x, W, strides=[1, 1, 1, 1], padding='SAME')

Figure 6.2 illustrates the convolution operation: sliding a 3×3 filter over a 4×4 input produces a 2×2 output. Figure 6.3 shows the calculation step by step.5)

Figure 6.2 Convolution operation5)
Figure 6.3 Procedure of calculating convolution operation5)

The strides argument is a four-element list of the form [1, s, s, 1]: the first and fourth elements are fixed to 1, and the second and third give the step width of the filter in the vertical and horizontal directions (Figure 6.4). Here the stride is 1×1; for example, a stride of 2×2 applied to a 7×7 input would yield a 4×4 output. The padding='SAME' option pads the input with zeros so that the output has the same size as the input (Figure 6.5).5)

Figure 6.4 Stride5)
Figure 6.5 Padding5)

6.4.4 Pooling layer

Max pooling (Figure 6.6) reduces the input by taking the maximum value within each window. Here a 2×2 window is moved with a stride of 2×2:

def max_pool_2x2(x):
    return tf.nn.max_pool(x, ksize=[1, 2, 2, 1],
                          strides=[1, 2, 2, 1], padding='SAME')

Figure 6.6 Max pooling procedure5)

The ksize argument specifies the pooling window as [1, k, k, 1], analogously to strides, with the first and fourth elements fixed to 1. With both the window and the stride set to 2×2, each 2×2 block of the input is replaced by its maximum, halving the width and height.5)

6.4.5 Input and label placeholders

x = tf.placeholder("float", shape=[None, 784])
y_ = tf.placeholder("float", shape=[None, 10])

The input placeholder x holds MNIST images flattened to 784 (= 28×28) values, and y_ holds the correct labels as 10-dimensional one-hot vectors for the digits 0 to 9. The first dimension None means the batch size is arbitrary. For the convolutional layers, x is reshaped into a 4-D image tensor x_image of shape [-1, 28, 28, 1], that is, 28×28 pixels with one channel.

6.4.6 First convolutional layer

W_conv1 = weight_variable([5, 5, 1, 32])
b_conv1 = bias_variable([32])
h_conv1 = tf.nn.relu(conv2d(x_image, W_conv1) + b_conv1)

The first convolutional layer applies 32 filters of size 5×5 to the single-channel input image and passes the result through ReLU, which clips negative values to 0.5)

h_pool1 = max_pool_2x2(h_conv1)

The first pooling layer then halves the resolution, so its output is a 14×14 image with 32 channels.5)

6.4.7 Second convolutional layer

W_conv2 = weight_variable([5, 5, 32, 64])
b_conv2 = bias_variable([64])
h_conv2 = tf.nn.relu(conv2d(h_pool1, W_conv2) + b_conv2)

The second convolutional layer takes the 14×14, 32-channel output of the first pooling layer and applies 5×5 filters that map the 32 channels to 64 channels.5)

h_pool2 = max_pool_2x2(h_conv2)

The second pooling layer halves the resolution again, so its output is a 7×7 image with 64 channels.

6.4.8 First fully connected layer

The 7×7×64 output of the second pooling layer is flattened into a vector and fed into a fully connected layer with 1024 units:

W_fc1 = weight_variable([7*7*64, 1024])
b_fc1 = bias_variable([1024])
h_pool2_flat = tf.reshape(h_pool2, [-1, 7*7*64])
h_fc1 = tf.nn.relu(tf.matmul(h_pool2_flat, W_fc1) + b_fc1)

6.4.9 Dropout

To reduce overfitting, dropout is applied to the output of the fully connected layer. The keep probability is supplied through a placeholder so that it can be set below 1 during training and to 1 during evaluation:

keep_prob = tf.placeholder("float")
h_fc1_drop = tf.nn.dropout(h_fc1, keep_prob)

6.4.10 Second fully connected layer

W_fc2 = weight_variable([1024, 10])
b_fc2 = bias_variable([10])
y_conv = tf.nn.softmax(tf.matmul(h_fc1_drop, W_fc2) + b_fc2)

The second fully connected layer maps the 1024 units to 10 outputs, and the softmax function converts them into a probability distribution over the 10 digit classes that sums to 1.

6.4.11 Training

The loss is the cross entropy between the network output y_conv and the correct labels y_. It is minimized with the Adam optimizer, provided by TensorFlow as tf.train.AdamOptimizer, with a learning rate of 10^-4. Training runs for 20,000 iterations on mini-batches of 50 images, with a dropout keep probability of 0.5:

cross_entropy = -tf.reduce_sum(y_*tf.log(y_conv))
train_step = tf.train.AdamOptimizer(1e-4).minimize(cross_entropy)
sess.run(tf.initialize_all_variables())
for i in range(20000):
    batch = mnist.train.next_batch(50)
    train_step.run(feed_dict={x: batch[0], y_: batch[1], keep_prob: 0.5})
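After training, the accuracy on the test set can be evaluated as in the TensorFlow tutorial6) (this evaluation step is not part of the surviving thesis text; it continues the variables defined above):

# fraction of test images whose predicted class matches the label
correct_prediction = tf.equal(tf.argmax(y_conv, 1), tf.argmax(y_, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, "float"))
print(accuracy.eval(feed_dict={x: mnist.test.images,
                               y_: mnist.test.labels,
                               keep_prob: 1.0}))  # no dropout at test time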

7 Experiments

7.1 Experimental setup

The reversi AI program was structured as shown in the class diagram of Figure 7.1.

7.1.1 Board representation

The board is given to the network as a matrix in which an empty square is 0, a black piece is 1, and a white piece is -1. Tables 7.1 and 7.2 show a board state and its encoding.

7.2 Experiment 1: 6×6 reversi AI

7.2.1 Conditions

As a first experiment, an AI was trained for 6×6 reversi, which was solved in 1993 by Joel Feinstein. Training ran for 10,000 iterations with mini-batches of 100 positions. The network consists of four convolutional layers, 11 layers in total (a TensorFlow sketch of this architecture is given after the list):

• convolution, 5×5×1 filters, stride 1, 32 channels, padding 'same', ReLU
• max pooling, 2×2 window, 2×2 stride
• convolution, 5×5×32 filters, stride 1, 64 channels
• convolution, 5×5×64 filters, stride 1, 256 channels
• convolution, 5×5×256 filters, stride 1, 256 channels
• dropout with keep probability 0.5
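The sketch below builds the listed layers in TensorFlow 1.x. It is illustrative rather than the thesis code: the output head, a single evaluation value produced by a fully connected layer, is an assumption, since the surviving text does not specify it.

import tensorflow as tf

# board input: batch of 6x6 single-channel matrices (0 empty, 1 black, -1 white)
x = tf.placeholder(tf.float32, shape=[None, 6, 6, 1])
keep_prob = tf.placeholder(tf.float32)

def conv(inp, in_ch, out_ch):
    # 5x5 convolution, stride 1, 'SAME' padding, ReLU, as in the list above
    W = tf.Variable(tf.truncated_normal([5, 5, in_ch, out_ch], stddev=0.1))
    b = tf.Variable(tf.constant(0.1, shape=[out_ch]))
    return tf.nn.relu(tf.nn.conv2d(inp, W, strides=[1, 1, 1, 1],
                                   padding='SAME') + b)

h = conv(x, 1, 32)
h = tf.nn.max_pool(h, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1],
                   padding='SAME')            # 6x6 -> 3x3
h = conv(h, 32, 64)
h = conv(h, 64, 256)
h = conv(h, 256, 256)
h = tf.nn.dropout(h, keep_prob)

# assumed head: flatten and map to one evaluation value
flat = tf.reshape(h, [-1, 3 * 3 * 256])
W_out = tf.Variable(tf.truncated_normal([3 * 3 * 256, 1], stddev=0.1))
b_out = tf.Variable(tf.constant(0.1, shape=[1]))
value = tf.matmul(flat, W_out) + b_out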

Figure 7.1 Class diagram

7.2.2 Results

After training, the AI played 1000 matches against an AI that moves at random. Table 7.3 shows the average final piece counts, and Figure 7.2 shows the loss during training.

Table 7.1 Board state 1
Table 7.2 Board state 2 (numeric encoding: 0 = empty, 1 = black, -1 = white)

 0  0  0  0  0  0  0  0  0  0
 0  0  0  0  0  0  0  0  0  0
 0  0  0 -1  0  0  0  0  0  0
 0  0  0 -1 -1  0  1  0  0  0
 0  1  1 -1  1 -1  1  0  0  0
 0  0  0 -1  1 -1 -1 -1  0  0
 0  0  0  0 -1  0  0  0  0  0
 0  0  0  0  1 -1  1  0  0  0
 0  0  0  0  0  0 -1  0  0  0
 0  0  0  0  0  0  0  0  0  0

7.2.3 Discussion

6×6 reversi is known to be a win for the second player with perfect play.

Figure 7.2 Loss - 6x6

Table 7.3 Match result - 6x6 [pieces]
(average final piece counts, first mover / second mover)

                    2nd move: AI         2nd move: Random
1st move: AI        -                    17.782 / 18.152
1st move: Random    17.222 / 18.735      16.950 / 19.002

The random-versus-random baseline (16.950 / 19.002) confirms the second mover's advantage. Relative to this baseline, the trained AI gained close to one piece when moving first, but scored slightly lower, by about 0.2 pieces, when moving second, so no clear playing strength was obtained. Three possible causes were considered, among them properties peculiar to the 6×6 game, such as the second mover's built-in advantage.

7.3 Experiment 2: perfect reversi AI

7.3.1 Conditions

Following the 6×6 experiment, an AI was trained for 10×10 perfect reversi under the same procedure. Training again ran for 10,000 iterations, now with mini-batches of 1000 positions. The network consists of five convolutional layers, 13 layers in total:

• convolution, 5×5×1 filters, stride 1, 32 channels, padding 'same', ReLU
• max pooling, 2×2 window, 2×2 stride
• convolution, 5×5×32 filters, stride 1, 64 channels
• convolution, 5×5×64 filters, stride 1, 256 channels
• convolution, 5×5×256 filters, stride 1, 256 channels
• convolution, 5×5×256 filters, stride 1, 256 channels
• dropout with keep probability 0.5

7.3.2 Results

After training, the AI again played 1000 matches against the random AI. Table 7.4 shows the results, and Figure 7.3 shows the training loss.

Figure 7.3 Loss - 10x10

Table 7.4 Match result - 10x10 [pieces]
(average final piece counts, first mover / second mover)

                    2nd move: AI         2nd move: Random
1st move: AI        -                    49.216 / 50.555
1st move: Random    49.443 / 50.555      49.894 / 50.103

7.3.3 Discussion

In the random-versus-random baseline on the 10×10 board, the second mover scores only about 0.2 pieces more than the first, so the game itself is nearly balanced. Relative to this baseline, the trained AI scored about 0.7 pieces fewer when moving first and about 0.5 pieces more when moving second. A gain of 0.5 pieces is far too small to call the AI stronger than random play. Three possible causes were considered, among them the scale of the 10×10 game: with 100 squares, the space of positions in 10×10 perfect reversi is vastly larger than in 6×6 reversi, so far more training data and a larger network would likely be needed.

8 Conclusion

In this research, evaluation AIs for 6×6 reversi and for 10×10 perfect reversi were trained by deep learning and evaluated in matches against a random player. In both experiments the trained AI finished within about one piece of the random baseline, so no clear playing strength was obtained.

8.1 Future work

Possible directions include preparing larger sets of training data for the 10×10 game and reconsidering how the evaluation value is defined. On the TensorFlow side, model construction from a small number of API functions and visualization with the bundled TensorBoard worked well in this research; comparing TensorFlow with other frameworks such as Chainer and Keras also remains as future work.

References

1) , 2000
2) , 1995
3) TensorFlow, 2017
4) Deep Learning Python, 2017
5) , 2017
6) TensorFlow, https://www.tensorflow.org/versions/r1.1/get_started/mnist/beginners (accessed February 16, 2017)
7) , 2014
