(1)

Large Scale Data Analysis Using Deep Learning

Deep Feedforward Networks - 2

U Kang

(2)

In This Lecture

Backpropagation

Motivation

Main idea

Procedure

(3)

Computational Graphs

Formalizes computation

Each node indicates a variable

Each edge indicates an operation
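As a minimal illustration (not from the slides; the node names and operations are made up), a computational graph can be stored as a mapping from each node to the operation and parent nodes that produce it:

```python
import numpy as np

# Minimal computational-graph sketch: each node holds a variable's value, and
# each incoming edge corresponds to the operation that produced that value.
# Here we build the graph for z = (x * y) + y (an illustrative example).
graph = {
    "x": {"op": None, "parents": []},                  # input variable
    "y": {"op": None, "parents": []},                  # input variable
    "u": {"op": np.multiply, "parents": ["x", "y"]},   # u = x * y
    "z": {"op": np.add,      "parents": ["u", "y"]},   # z = u + y
}

values = {"x": 2.0, "y": 3.0}
for name in ["u", "z"]:                                # evaluate in topological order
    node = graph[name]
    values[name] = node["op"](*[values[p] for p in node["parents"]])

print(values["z"])   # 2*3 + 3 = 9
```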

(4)

Chain Rule of Calculus

Let x be a real number, and f and g be functions from R to R. Suppose y = g(x), and z = f(g(x)) = f(y). Then the chain rule states that

$$\frac{dz}{dx} = \frac{dz}{dy}\,\frac{dy}{dx}$$

Suppose $\boldsymbol{x} \in \mathbb{R}^m$, $\boldsymbol{y} \in \mathbb{R}^n$, $g: \mathbb{R}^m \to \mathbb{R}^n$, and $f: \mathbb{R}^n \to \mathbb{R}$. If $\boldsymbol{y} = g(\boldsymbol{x})$ and $z = f(\boldsymbol{y})$, then

$$\frac{\partial z}{\partial x_i} = \sum_j \frac{\partial z}{\partial y_j}\,\frac{\partial y_j}{\partial x_i}$$

In vector notation: $\nabla_{\boldsymbol{x}} z = \left(\frac{\partial \boldsymbol{y}}{\partial \boldsymbol{x}}\right)^{T} \nabla_{\boldsymbol{y}} z$, where $\frac{\partial \boldsymbol{y}}{\partial \boldsymbol{x}}$ is the $n \times m$ Jacobian matrix of $g$

E.g., suppose $z = \boldsymbol{w}^T \boldsymbol{y}$ and $\boldsymbol{y} = \nabla_{\boldsymbol{x}} g(\boldsymbol{x})$. Then $\nabla_{\boldsymbol{x}} z = \left(\frac{\partial \boldsymbol{y}}{\partial \boldsymbol{x}}\right)^{T} \nabla_{\boldsymbol{y}} z = \boldsymbol{H}^{T} \boldsymbol{w}$, where $\boldsymbol{H}$ is the Hessian matrix of $g$
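The vector form of the chain rule can be checked numerically. The following sketch (the choices of $f$ and $g$ are illustrative, not from the slides) compares $(\partial \boldsymbol{y}/\partial \boldsymbol{x})^T \nabla_{\boldsymbol{y}} z$ with a finite-difference estimate of $\nabla_{\boldsymbol{x}} z$:

```python
import numpy as np

# Numerical check of the vector chain rule: grad_x z = J^T grad_y z,
# where J = dy/dx is the Jacobian of g.  Illustrative choices of f and g.
def g(x):                       # g: R^2 -> R^2
    return np.array([x[0] * x[1], x[0] + x[1]])

def f(y):                       # f: R^2 -> R
    return y[0] ** 2 + 3.0 * y[1]

x = np.array([1.5, -0.5])
y = g(x)

J = np.array([[x[1], x[0]],     # Jacobian dy/dx of g at x
              [1.0,  1.0]])
grad_y = np.array([2.0 * y[0], 3.0])    # gradient of f with respect to y

grad_x_chain = J.T @ grad_y             # chain rule in vector form

# Compare with central finite differences of z = f(g(x))
eps = 1e-6
grad_x_fd = np.array([
    (f(g(x + eps * np.eye(2)[i])) - f(g(x - eps * np.eye(2)[i]))) / (2 * eps)
    for i in range(2)
])
print(grad_x_chain, grad_x_fd)          # the two should agree closely
```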

(5)

Repeated Subexpressions

Computing the same subexpression many times would be wasteful

(6)

Backpropagation - Overview

Backpropagation is an algorithm to compute partial derivatives efficiently in neural networks

The main idea is dynamic programming

Many gradient computations share the same intermediate sub-computations

Store the common computations in memory, and read them from memory when needed, without re-computing them from scratch
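A toy illustration of this idea (the function names are made up): a shared subexpression is evaluated once, cached, and then read back by every expression that needs it:

```python
from functools import lru_cache

calls = {"count": 0}

@lru_cache(maxsize=None)
def shared(w):
    # Stands in for an expensive shared subexpression.
    calls["count"] += 1
    return w * w + 1.0

def expr1(w):
    return 3.0 * shared(w)      # uses the shared quantity

def expr2(w):
    return shared(w) ** 2       # uses the same shared quantity

print(expr1(2.0), expr2(2.0))
print(calls["count"])           # 1: the shared subexpression was evaluated once
```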

(7)

Backpropagation - Overview

Assume a computational graph to compute a single scalar $u^{(n)}$

E.g., this can be the loss

Our goal is to compute the partial derivatives $\frac{\partial u^{(n)}}{\partial u^{(i)}}$ for all $i \in \{1, 2, \ldots, n_i\}$, where $u^{(1)}$ to $u^{(n_i)}$ are the parameters of the model

We assume that nodes are ordered such that we compute their outputs one after the other, starting at $u^{(n_i+1)}$ and going up to $u^{(n)}$

Each node $u^{(i)}$ is associated with an operation $f^{(i)}$ and is computed by $u^{(i)} = f^{(i)}(\mathbb{A}^{(i)})$, where $\mathbb{A}^{(i)}$ is the set of all parents of $u^{(i)}$

(8)

Forward Propagation

This procedure performs the computations mapping the $n_i$ inputs $u^{(1)}, \ldots, u^{(n_i)}$ to the output $u^{(n)}$
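The algorithm box on the original slide is not reproduced in the extracted text. The following is a minimal sketch of the forward-propagation procedure, with an illustrative three-node graph standing in for a real model:

```python
import numpy as np

# Sketch of forward propagation: nodes u(1)..u(n_i) are inputs, and each later
# node u(i) applies its operation f(i) to its parents A(i).  The graph below
# (two inputs, three computed nodes) is only an illustration.
ops = {
    3: (lambda a, b: a * b,    [1, 2]),   # u(3) = u(1) * u(2)
    4: (lambda a: np.tanh(a),  [3]),      # u(4) = tanh(u(3))
    5: (lambda a, b: a + b,    [4, 2]),   # u(5) = u(4) + u(2)
}

def forward(inputs):
    u = dict(inputs)                      # u[i] holds the value of node i
    for i in sorted(ops):                 # visit nodes in topological order
        f, parents = ops[i]
        u[i] = f(*[u[j] for j in parents])
    return u

u = forward({1: 0.5, 2: 2.0})
print(u[5])                               # value of the output node u(n)
```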

(9)

Back-Propagation

This procedure computes the partial derivatives of $u^{(n)}$ with respect to the variables $u^{(1)}, \ldots, u^{(n_i)}$
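Likewise, a minimal sketch of the back-propagation procedure on the same illustrative graph; the table grad[i] stores each $\partial u^{(n)} / \partial u^{(i)}$ once so it is never recomputed:

```python
import numpy as np

# Back-propagation on the small illustrative graph used above:
# u(3) = u(1)*u(2), u(4) = tanh(u(3)), u(5) = u(4) + u(2).
# Each edge contributes one local derivative, one multiplication, and one
# addition; grad[i] accumulates d u(5) / d u(i) (dynamic programming).
parents = {3: [1, 2], 4: [3], 5: [4, 2]}

def local_grad(i, j, u):
    """Partial derivative of node i with respect to its parent j."""
    if i == 3:
        return u[2] if j == 1 else u[1]        # d(u1*u2)/du1, d(u1*u2)/du2
    if i == 4:
        return 1.0 - np.tanh(u[3]) ** 2        # d tanh(u3)/du3
    if i == 5:
        return 1.0                             # d(u4 + u2) w.r.t. either parent

def backward(u, n=5):
    grad = {n: 1.0}                            # d u(n)/d u(n) = 1
    for i in sorted(parents, reverse=True):    # reverse topological order
        for j in parents[i]:
            grad[j] = grad.get(j, 0.0) + grad[i] * local_grad(i, j, u)
    return grad

u = {1: 0.5, 2: 2.0}
u[3] = u[1] * u[2]; u[4] = np.tanh(u[3]); u[5] = u[4] + u[2]
print(backward(u))                             # gradients for every node
```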

(10)

Back-Propagation Example

Back-propagation procedure (re-using shared computations rather than recomputing them):

Compute $\frac{\partial z}{\partial y}$ and $\frac{\partial z}{\partial x}$

$\frac{\partial z}{\partial w} \leftarrow \frac{\partial y}{\partial w}\frac{\partial z}{\partial y} + \frac{\partial x}{\partial w}\frac{\partial z}{\partial x}$

$\frac{\partial z}{\partial v} \leftarrow \frac{\partial w}{\partial v}\frac{\partial z}{\partial w}$

[Figure: computational graph with nodes $v \to w \to \{y, x\} \to z$]
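A small numeric instantiation of this example (the concrete functions $w = v^2$, $y = w + 1$, $x = 2w$, $z = xy$ are chosen only for illustration); note that $\partial z / \partial w$ is computed once and then re-used for $\partial z / \partial v$:

```python
# Numeric instantiation (illustrative functions) of the graph v -> w -> {y, x} -> z
# with w = v**2, y = w + 1, x = 2*w, z = x*y.
v = 3.0
w = v ** 2
y = w + 1.0
x = 2.0 * w
z = x * y

dz_dy = x                              # local derivatives at the output node
dz_dx = y
dz_dw = 1.0 * dz_dy + 2.0 * dz_dx      # dy/dw * dz/dy + dx/dw * dz/dx
dz_dv = (2.0 * v) * dz_dw              # dw/dv * dz/dw  (re-uses dz_dw)

print(dz_dw, dz_dv)
# Check against the closed form z = 2*v**4 + 2*v**2, so dz/dv = 8*v**3 + 4*v:
print(8.0 * v ** 3 + 4.0 * v)          # should match dz_dv
```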

(11)

Cost of Back-Propagation

The amount of computation scales linearly with the number of edges in the computational graph

Computation for each edge: computing a partial derivative, one multiplication, and one addition

[Figure: computational graph with nodes $w \to \{y, x\} \to z$]

Compute $\frac{\partial z}{\partial y}$ and $\frac{\partial z}{\partial x}$

$\frac{\partial z}{\partial w} \leftarrow \frac{\partial y}{\partial w}\frac{\partial z}{\partial y} + \frac{\partial x}{\partial w}\frac{\partial z}{\partial x}$

(12)

Back-Propagation in a Fully Connected MLP

Forward propagation

[Algorithm: forward propagation through a fully connected MLP; not reproduced in the extracted text, see the sketch below]
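The sketch below assumes ReLU hidden units and a linear output layer, which are illustrative choices rather than the slide's exact setup:

```python
import numpy as np

# Illustrative forward pass of a fully connected MLP.  ReLU hidden activations
# and a linear output layer are assumptions made for this sketch.
def relu(a):
    return np.maximum(a, 0.0)

def mlp_forward(x, weights, biases):
    """Returns the output and a cache of (pre-activation, activation) per layer."""
    h = x
    cache = []                                        # stored for the backward pass
    for k, (W, b) in enumerate(zip(weights, biases)):
        a = W @ h + b
        h = relu(a) if k < len(weights) - 1 else a    # linear output layer
        cache.append((a, h))
    return h, cache

rng = np.random.default_rng(0)
weights = [rng.standard_normal((4, 3)), rng.standard_normal((2, 4))]
biases = [np.zeros(4), np.zeros(2)]
x = rng.standard_normal(3)

y_hat, cache = mlp_forward(x, weights, biases)
print(y_hat)
```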

(13)

Back-Propagation in a Fully Connected MLP

Backward computation

[Algorithm: backward computation for a fully connected MLP; not reproduced in the extracted text, see the sketch below]

$\odot$ : element-wise product
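A matching sketch of the backward computation (again assuming ReLU hidden units and a squared-error loss, which are illustrative choices); the gradient g is propagated layer by layer, and the element-wise product implements $\boldsymbol{g} \odot f'(\boldsymbol{a})$:

```python
import numpy as np

# Illustrative backward computation for the fully connected MLP sketched above
# (ReLU hidden units and squared-error loss 0.5*||y_hat - y||^2 are assumptions).
def relu(a):
    return np.maximum(a, 0.0)

def mlp_backward(x, y, weights, cache):
    grads_W, grads_b = [None] * len(weights), [None] * len(weights)
    y_hat = cache[-1][1]
    g = y_hat - y                                  # gradient w.r.t. the output
    for k in reversed(range(len(weights))):
        a, _ = cache[k]
        if k < len(weights) - 1:                   # hidden layer: element-wise
            g = g * (a > 0)                        # product with relu'(a)
        h_prev = x if k == 0 else cache[k - 1][1]
        grads_W[k] = np.outer(g, h_prev)           # gradient w.r.t. W^(k)
        grads_b[k] = g                             # gradient w.r.t. b^(k)
        g = weights[k].T @ g                       # propagate to the layer below
    return grads_W, grads_b

# Tiny usage with the forward pass from the previous sketch:
rng = np.random.default_rng(0)
weights = [rng.standard_normal((4, 3)), rng.standard_normal((2, 4))]
biases = [np.zeros(4), np.zeros(2)]
x, y = rng.standard_normal(3), np.array([1.0, -1.0])

cache, h = [], x
for k, (W, b) in enumerate(zip(weights, biases)):
    a = W @ h + b
    h = relu(a) if k < len(weights) - 1 else a
    cache.append((a, h))

grads_W, grads_b = mlp_backward(x, y, weights, cache)
print([gW.shape for gW in grads_W])                # gradient shapes match W shapes
```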

(14)

Minibatch Processing

SGD (recap)

We sample a minibatch of examples $B = \{x^{(1)}, \ldots, x^{(m')}\}$ drawn uniformly from the training set

The estimate of the gradient is $g = \frac{1}{m'} \sum_{i=1}^{m'} \nabla_\theta L(x^{(i)}, y^{(i)}, \theta)$

Then the gradient descent update is $\theta \leftarrow \theta - \epsilon g$

SGD using back-propagation

For each instance $(x^{(i)}, y^{(i)})$, $i = 1, \ldots, m'$, we compute the gradient $\nabla_\theta J^{(i)}$ using back-propagation, where $J^{(i)} = L(x^{(i)}, y^{(i)}, \theta)$ is the $i$-th loss

The final gradient is $g = \frac{1}{m'} \sum_{i=1}^{m'} \nabla_\theta J^{(i)}$

Update $\theta \leftarrow \theta - \epsilon g$
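A sketch of one such SGD step on an illustrative linear model with squared-error loss; here the per-example gradient has a closed form, standing in for the gradient that back-propagation would return for a deep network:

```python
import numpy as np

# One SGD step as on the slide: per-example gradients are averaged over the
# minibatch B, then theta <- theta - epsilon * g.  The linear model and
# squared-error loss are illustrative choices.
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 5))                       # training inputs
Y = X @ np.array([1.0, -2.0, 0.5, 0.0, 3.0]) + 0.1 * rng.standard_normal(100)

theta = np.zeros(5)
epsilon = 0.05                                          # learning rate
m_prime = 8                                             # minibatch size m'

idx = rng.choice(len(X), size=m_prime, replace=False)   # sample minibatch B
grads = [2.0 * (x @ theta - y) * x                      # per-example gradient
         for x, y in zip(X[idx], Y[idx])]
g = np.mean(grads, axis=0)                              # g = (1/m') * sum_i grad_i
theta = theta - epsilon * g                             # gradient descent update
print(theta)
```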

(15)

Example

A feedforward neural network with one hidden layer

Forward propagation

$\boldsymbol{a} \leftarrow \boldsymbol{W}\boldsymbol{x}$

$\boldsymbol{h} \leftarrow \sigma(\boldsymbol{a})$ (elementwise)

$\hat{y} \leftarrow \boldsymbol{v}^T \boldsymbol{h}$

$J \leftarrow L(\hat{y}, y) + \lambda\Omega(\boldsymbol{W}, \boldsymbol{v}) = (\hat{y} - y)^2 + \lambda(\|\boldsymbol{W}\|_F^2 + \|\boldsymbol{v}\|_2^2)$

[Figure: computational graph $\boldsymbol{x} \xrightarrow{\boldsymbol{W}} \boldsymbol{a} \to \boldsymbol{h} \xrightarrow{\boldsymbol{v}} \hat{y} \to J$]
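A direct transcription of this forward pass into numpy; taking $\sigma$ to be the logistic sigmoid and picking small random shapes are the only assumptions:

```python
import numpy as np

# Forward pass of the one-hidden-layer example.  Sigma is taken to be the
# logistic sigmoid; J = (y_hat - y)^2 + lambda*(||W||_F^2 + ||v||_2^2).
def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

rng = np.random.default_rng(0)
W = rng.standard_normal((4, 3))   # hidden-layer weights
v = rng.standard_normal(4)        # output weights
x = rng.standard_normal(3)        # input
y = 1.0                           # target
lam = 0.01                        # weight-decay coefficient lambda

a = W @ x                         # a <- W x
h = sigmoid(a)                    # h <- sigma(a), elementwise
y_hat = v @ h                     # y_hat <- v^T h
J = (y_hat - y) ** 2 + lam * (np.sum(W ** 2) + np.sum(v ** 2))
print(y_hat, J)
```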

(16)

Example

Back propagation

$g \leftarrow \nabla_{\hat{y}} J = 2(\hat{y} - y)$

$\nabla_{\boldsymbol{v}} J \leftarrow \nabla_{\boldsymbol{v}}\left[(\hat{y} - y)^2 + \lambda(\|\boldsymbol{W}\|_F^2 + \|\boldsymbol{v}\|_2^2)\right] = g\boldsymbol{h} + 2\lambda\boldsymbol{v}$

$\boldsymbol{g} \leftarrow \nabla_{\boldsymbol{h}} J = \nabla_{\boldsymbol{h}}\left[(\hat{y} - y)^2 + \lambda(\|\boldsymbol{W}\|_F^2 + \|\boldsymbol{v}\|_2^2)\right] = g\boldsymbol{v}$

$\boldsymbol{g} \leftarrow \nabla_{\boldsymbol{a}} J = \boldsymbol{g} \odot \sigma'(\boldsymbol{a})$ (elementwise)

$\nabla_{\boldsymbol{W}} J \leftarrow \nabla_{\boldsymbol{W}}\left[(\hat{y} - y)^2 + \lambda(\|\boldsymbol{W}\|_F^2 + \|\boldsymbol{v}\|_2^2)\right] = \boldsymbol{g}\boldsymbol{x}^T + 2\lambda\boldsymbol{W}$
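The corresponding backward pass, implementing the slide's equations line by line (same illustrative sigmoid and shapes as in the forward sketch):

```python
import numpy as np

# Backward pass matching the forward sketch above.  For the sigmoid,
# sigma'(a) = sigma(a) * (1 - sigma(a)).
def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

rng = np.random.default_rng(0)
W = rng.standard_normal((4, 3)); v = rng.standard_normal(4)
x = rng.standard_normal(3); y = 1.0; lam = 0.01

a = W @ x; h = sigmoid(a); y_hat = v @ h          # forward pass (as above)

g = 2.0 * (y_hat - y)                             # g <- dJ/dy_hat
grad_v = g * h + 2.0 * lam * v                    # dJ/dv = g*h + 2*lambda*v
g_h = g * v                                       # dJ/dh = g*v
g_a = g_h * sigmoid(a) * (1.0 - sigmoid(a))       # dJ/da = g_h (elementwise) sigma'(a)
grad_W = np.outer(g_a, x) + 2.0 * lam * W         # dJ/dW = g_a x^T + 2*lambda*W

print(grad_v.shape, grad_W.shape)                 # shapes match v and W
```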

(17)

Higher-Order Derivatives

Some software frameworks (e.g., Theano and TensorFlow) support the use of higher-order derivatives

The Hessian is useful for optimizing parameters

For a function $f: \mathbb{R}^n \to \mathbb{R}$, the Hessian is an $n \times n$ matrix

In deep learning, $n$ is the number of parameters, which can be millions or billions; thus constructing the entire Hessian is intractable

The typical deep learning approach to compute a function of Hessian is to use Krylov methods: a set of iterative techniques for inverting a matrix or computing eigenvectors using only matrix-vector products

The Hessian-vector product can be computed efficiently by $\boldsymbol{H}\boldsymbol{v} = \nabla_{\boldsymbol{x}}\left[\left(\nabla_{\boldsymbol{x}} f(\boldsymbol{x})\right)^{T} \boldsymbol{v}\right]$, which is exactly the construction in the example on the Chain Rule of Calculus slide
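A small numerical illustration of a Hessian-vector product computed without ever forming $\boldsymbol{H}$; the quadratic $f(\boldsymbol{x}) = \frac{1}{2}\boldsymbol{x}^T A \boldsymbol{x}$ is an illustrative choice whose Hessian is known exactly, and the finite difference of gradients stands in for the double back-propagation a framework would perform:

```python
import numpy as np

# Hessian-vector product H v without forming H.  For the illustrative quadratic
# f(x) = 0.5 * x^T A x with symmetric A, the gradient is A x and the true
# Hessian is A, so the result can be checked exactly.
rng = np.random.default_rng(0)
A = rng.standard_normal((5, 5)); A = A + A.T   # symmetric matrix -> Hessian

def grad_f(x):
    return A @ x                               # gradient of 0.5 * x^T A x

x = rng.standard_normal(5)
v = rng.standard_normal(5)

eps = 1e-5
hv_approx = (grad_f(x + eps * v) - grad_f(x - eps * v)) / (2 * eps)
print(np.allclose(hv_approx, A @ v, atol=1e-6))   # True: matches the exact H v
```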

(18)

What you need to know

Backpropagation is an algorithm to compute partial derivatives efficiently in neural networks

Reuse partial derivatives

Procedure of backpropagation

Use of backpropagation in SGD

(19)

Questions?
