Lecture: Submodular functions and discrete optimization (EL 260)

• Combinatorial optimization in ML
• Submodular functions
• Maximizing monotone submodular functions
  - Greedy method
  - (1 - 1/e) approximation

TA session (HW 3): Monday, 18:00 hours
Reference: Francis Bach (monograph)
So far, we have seen convex optimization problems in ML:
classify '+' and '-' points by finding a separating hyperplane,
i.e. solve for the best vector w that minimizes the loss L(w)
and maximizes the size of the margin:

    w* = arg min_w L(w)

(Figure: '+' and '-' points separated by a hyperplane.)
Feature selection:
• Given random variables Y, X_1, ..., X_n
• Predict Y from a subset X_A = {X_{i_1}, ..., X_{i_k}}

(Figure: candidate feature subsets feeding a model, with features such as recent travel, fever, cough.)

We wish to select the 'k' most informative features:

    A* = arg max_A IG(X_A; Y)   s.t. |A| ≤ k
Information gain:

    IG(X_A; Y) = H(Y) - H(Y | X_A)

where H(Y) is the uncertainty before knowing X_A and H(Y | X_A) is the
uncertainty after knowing X_A.

This is a combinatorial problem!
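To make the combinatorial nature concrete, here is a brute-force sketch in Python that estimates IG(X_A; Y) = H(Y) - H(Y | X_A) from samples and enumerates all subsets of size ≤ k. The toy data and function names are my own, not from the lecture.

```python
import itertools
from collections import Counter
from math import log2

def entropy(rows):
    """Empirical entropy of a list of (hashable) outcomes."""
    n = len(rows)
    return -sum(c / n * log2(c / n) for c in Counter(rows).values())

def info_gain(data, A, y_idx):
    """IG(X_A; Y) = H(Y) - H(Y | X_A), estimated from samples."""
    y = [row[y_idx] for row in data]
    xa = [tuple(row[i] for i in A) for row in data]
    joint = list(zip(xa, y))
    # H(Y | X_A) = H(X_A, Y) - H(X_A)
    return entropy(y) - (entropy(joint) - entropy(xa))

# toy samples: feature columns X_0, X_1, X_2 and label Y (last column)
data = [(0, 0, 1, 0), (0, 1, 1, 1), (1, 0, 0, 1), (1, 1, 0, 0),
        (0, 0, 0, 0), (1, 1, 1, 1)]
k = 2
# exhaustive enumeration over all feature subsets of size <= k
best = max((frozenset(A) for r in range(1, k + 1)
            for A in itertools.combinations(range(3), r)),
           key=lambda A: info_gain(data, sorted(A), 3))
print(sorted(best), info_gain(data, sorted(best), 3))
```

The enumeration visits every subset of size at most k, which is exactly the combinatorial blow-up the greedy method will later avoid.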
Sensor placement (set cover problem):
How to place k sensors out of V candidate positions to maximize the
coverage? Nodes predict/measure values within some radius of coverage.

(Figure: candidate sensor locations as dots; each placed sensor covers a disc around it.)
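A minimal sketch of this problem in Python, with an exhaustive search over all C(|V|, k) placements. The grid, coverage radius, and names are illustrative assumptions, not from the lecture.

```python
import itertools

# candidate sensor positions on a 5x5 grid; each sensor covers points
# within Chebyshev distance 1 (its 3x3 neighbourhood)
points = [(x, y) for x in range(5) for y in range(5)]   # area to cover
candidates = points                                     # V: candidate positions
k = 3

def coverage(placement):
    """Number of points covered by at least one sensor."""
    return sum(any(max(abs(px - sx), abs(py - sy)) <= 1
                   for (sx, sy) in placement)
               for (px, py) in points)

# exhaustive search: C(|V|, k) placements -- combinatorial blow-up
best = max(itertools.combinations(candidates, k), key=coverage)
print(best, coverage(best))
```

Even on this toy grid the search examines C(25, 3) = 2300 placements; for realistic V and k, exhaustive search is hopeless, which motivates the greedy method below.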
Factoring distributions:
Given random variables X_1, ..., X_n, partition V into sets A and B = V \ A
that are as independent as possible:

    A* = arg min_A I(X_A; X_B)   s.t. 0 < |A| < n,  B = V \ A

where the mutual information is

    I(X_A; X_B) = H(X_B) - H(X_B | X_A)

(Figure: the variables X_1, ..., X_6 split into two groups A and B.)

Again, combinatorial!
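As a sketch, the partition search can be done by brute force on a small joint distribution. Here the pmf is constructed (my own toy example) so that X_0 is independent of (X_1, X_2), making A = {0} the best split.

```python
import itertools
from math import log2

# joint pmf over (X0, X1, X2): X0 independent of (X1, X2),
# while X1 and X2 are correlated
p_x0 = {0: 0.5, 1: 0.5}
p_x12 = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}
pmf = {(a, b, c): p_x0[a] * p_x12[(b, c)]
       for a in (0, 1) for (b, c) in p_x12}

def entropy_of(indices):
    """H(X_A): entropy of the marginal over the given variable indices."""
    marg = {}
    for x, p in pmf.items():
        key = tuple(x[i] for i in indices)
        marg[key] = marg.get(key, 0.0) + p
    return -sum(p * log2(p) for p in marg.values() if p > 0)

def mutual_info(A, B):
    # I(X_A; X_B) = H(X_A) + H(X_B) - H(X_A, X_B)
    return entropy_of(A) + entropy_of(B) - entropy_of(A + B)

V = (0, 1, 2)
# enumerate every nontrivial partition A, B = V \ A
best = min((A for r in range(1, 3) for A in itertools.combinations(V, r)),
           key=lambda A: mutual_info(A, tuple(i for i in V if i not in A)))
print(best)  # the split closest to independence
```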
Set functions:

    f : 2^V → R

→ Takes as input a set; inputs are subsets of the ground set V = {1, 2, ..., N}
→ 2^V is the power set (the set of all subsets)

Minimization (or maximization) of a set function:

    min_{A ⊆ V} f(A) = min_{A ∈ 2^V} f(A)   s.t. constraints on the subsets A
Reformulation as a Boolean function:

    min f(w),  w ∈ {0,1}^N,  with f(1_A) = f(A) for all A ⊆ V

(Figure: the hypercube {0,1}^3 with vertices (1,1,1) ~ {1,2,3}, (1,1,0) ~ {1,2},
(0,1,0) ~ {2}, (0,0,0) ~ {}.)

Example (modular): f(1_A) = Σ_{i=1}^N w_i (1_A)_i = Σ_{i ∈ A} w_i for fixed
weights w_i.

(P)   maximize f(w)   s.t. w ∈ {0,1}^N,  ||w||_0 ≤ k

To solve (P) optimally we have to exhaustively enumerate all k-sparse vectors.

Convex relaxation:

(P_c) maximize f(w)   s.t. w ∈ [0,1]^N (box constraint, 0 ≤ w_i ≤ 1),
                           ||w||_1 ≤ k (the ℓ_1-ball is the best convex
                           approximation of the ℓ_0-norm)
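The correspondence f(1_A) = f(A) between hypercube vertices and subsets can be sketched directly. The coverage function below is my own toy choice of f.

```python
from itertools import product

# a toy coverage function on ground set {0, 1, 2}
covers = {0: {"a", "b"}, 1: {"b", "c"}, 2: {"d"}}

def f_set(A):
    """f(A): number of elements covered by the sets indexed by A."""
    return len(set().union(*(covers[i] for i in A))) if A else 0

def f_bool(w):
    """f(w) = f(A) where A = {i : w_i = 1} -- the Boolean reformulation."""
    A = [i for i, wi in enumerate(w) if wi]
    return f_set(A)

# every vertex of the hypercube {0,1}^N corresponds to a subset
for w in product((0, 1), repeat=3):
    A = tuple(i for i, wi in enumerate(w) if wi)
    print(w, "->", A, "f =", f_bool(w))
```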
Key property: "diminishing returns"

(Figure: A ⊆ B. Example: a cash-back of Rs 50 means much more to a bank
account holding Rs 100 than to an account holding Rs 400,000.)
Submodular functions: A set function f is said to be submodular if and only if

    f(B ∪ {i}) - f(B) ≤ f(A ∪ {i}) - f(A)   for all A ⊆ B ⊆ V and i ∉ B

Equivalent definition:

    f(A) + f(B) ≥ f(A ∩ B) + f(A ∪ B)   for all A, B ⊆ V

• Equality leads to modular functions
• Normalization: f(∅) = 0
Proof (second definition ⟹ first): let A' = A ∪ {i} and B' = B. Then

    f(A ∪ {i}) + f(B) = f(A') + f(B')
                      ≥ f(A' ∩ B') + f(A' ∪ B')
                      = f((A ∪ {i}) ∩ B) + f(A ∪ {i} ∪ B)
                      = f(A) + f(B ∪ {i})

using A ⊆ B and i ∉ B in the last step; rearranging gives the
diminishing-returns inequality. □

Note: f is supermodular if and only if -f is submodular.
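The equivalent definition can be checked numerically. The sketch below verifies f(A) + f(B) ≥ f(A ∩ B) + f(A ∪ B) for every pair of subsets, using a toy coverage function of my own choosing (coverage functions are submodular).

```python
from itertools import combinations

covers = {0: {"a", "b"}, 1: {"b", "c"}, 2: {"c", "d"}, 3: {"d"}}
V = list(covers)

def f(A):
    """Coverage function: number of elements covered by the sets in A."""
    return len(set().union(*(covers[i] for i in A))) if A else 0

subsets = [frozenset(s) for r in range(len(V) + 1)
           for s in combinations(V, r)]

# check f(A) + f(B) >= f(A & B) + f(A | B) for every pair A, B
ok = all(f(A) + f(B) >= f(A & B) + f(A | B)
         for A in subsets for B in subsets)
print("coverage is submodular:", ok)
```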
→ Lovász extension of submodular functions (a convex extension used for
minimization).

Minimization of (differences of) submodular functions:
applications include active learning, feature clustering, structure learning,
MAP inference in Markov random fields, selection, ranking.

Analogy with differences of convex functions (DC programming):

    min_{x ∈ X} f(x) - g(x),   where f(x) and g(x) are convex.

(Figure: the curves f(x) and g(x), with g linearized around the current
iterate x_0.)

SCP: iteratively solve min_{x ∈ X} f(x) - g(x_0) - ∇g(x_0)ᵀ(x - x_0),
i.e. linearize g around the current iterate x_0.
Examples of submodular functions:
e.g. flows, set cover, differential entropy.

Example: Given p random variables X_1, ..., X_p, define f(A) as the joint
entropy of the variables (X_k)_{k ∈ A}. Claim: f(A) is submodular.

If A ⊆ B and k ∉ B:

    f(A ∪ {k}) - f(A) = H(X_A, X_k) - H(X_A)
                      = H(X_k | X_A)
                      ≥ H(X_k | X_B)       (conditioning reduces entropy)
                      = f(B ∪ {k}) - f(B)  □
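The entropy example can also be verified numerically: the sketch below checks the submodular inequality for the joint entropy of three binary variables under an arbitrary joint pmf (the pmf values are my own toy choice).

```python
import itertools
from math import log2

# a small joint pmf over three binary variables (X1, X2, X3)
pmf = {x: p for x, p in zip(itertools.product((0, 1), repeat=3),
                            (0.20, 0.05, 0.10, 0.15, 0.05, 0.20, 0.15, 0.10))}

def H(A):
    """Joint entropy f(A) = H(X_A) of the variables indexed by A."""
    marg = {}
    for x, p in pmf.items():
        key = tuple(x[i] for i in sorted(A))
        marg[key] = marg.get(key, 0.0) + p
    return -sum(p * log2(p) for p in marg.values() if p > 0)

V = range(3)
subsets = [frozenset(s) for r in range(4) for s in itertools.combinations(V, r)]
# submodularity: H(A) + H(B) >= H(A & B) + H(A | B) (up to float tolerance)
ok = all(H(A) + H(B) >= H(A & B) + H(A | B) - 1e-12
         for A in subsets for B in subsets)
print("joint entropy is submodular:", ok)
```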
Maximizing submodular functions:

    maximize f(A)   s.t. |A| ≤ k,  A ⊆ V

Nemhauser et al. (1978): if f is submodular, monotone increasing
(f(A ∪ {i}) ≥ f(A)), and normalized (f(∅) = 0), then the greedy algorithm

    A ← ∅
    for t = 1, 2, ..., k:
        i* ← arg max_{i ∉ A} [f(A ∪ {i}) - f(A)]
        A ← A ∪ {i*}
    return A
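The greedy method above can be sketched in a few lines of Python; the coverage function used to exercise it is my own toy example (coverage is monotone and submodular, so the guarantee applies).

```python
def greedy_max(f, V, k):
    """Greedy maximization of a monotone submodular f under |A| <= k."""
    A = set()
    for _ in range(k):
        # pick the element with the largest marginal gain f(A + {i}) - f(A)
        i = max((j for j in V if j not in A),
                key=lambda j: f(A | {j}) - f(A))
        A.add(i)
    return A

# example: coverage function (submodular and monotone)
covers = {0: {"a", "b"}, 1: {"b", "c"}, 2: {"c", "d", "e"}, 3: {"e"}}
f = lambda A: len(set().union(*(covers[i] for i in A))) if A else 0
print(greedy_max(f, covers.keys(), 2))
```

Each step costs one pass over the ground set, so the whole run is O(k |V|) evaluations of f, versus C(|V|, k) for exhaustive search.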
The above greedy method satisfies:

    f(A) ≥ (1 - 1/e) f(A_opt),   where f(A_opt) = max_{A ⊆ V, |A| ≤ k} f(A)
                                 (exhaustive search)

Although this bound is not that tight, greedy results are close to
exhaustive search in practice (whenever that is verifiable).
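The (1 - 1/e) guarantee can be checked on a small random instance where exhaustive search is still feasible. The random coverage construction below is an illustrative assumption, not from the lecture.

```python
import itertools
import math
import random

random.seed(0)
# random coverage instance: 8 candidate sets over a universe of 15 items
universe = range(15)
covers = {i: set(random.sample(universe, 4)) for i in range(8)}
f = lambda A: len(set().union(*(covers[i] for i in A))) if A else 0
k = 3

# greedy
A, V = set(), set(covers)
for _ in range(k):
    A.add(max(V - A, key=lambda i: f(A | {i}) - f(A)))

# exhaustive search over all C(8, 3) subsets
opt = max(f(set(S)) for S in itertools.combinations(V, k))
print(f(A), opt, f(A) >= (1 - 1 / math.e) * opt)
```

On instances like this the greedy value typically matches or nearly matches the exhaustive optimum, well above the worst-case bound.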
Claim: pick any A ⊆ V such that |A| < k. Then

    max_{i ∈ V} [f(A ∪ {i}) - f(A)] ≥ (1/k) (f(A_opt) - f(A))
Proof: let A_opt \ A = {i_1, ..., i_p}, so that p ≤ k. Then

    f(A_opt) ≤ f(A_opt ∪ A)                                        (monotonicity)
             = f(A) + Σ_{j=1}^p [f(A ∪ {i_1, ..., i_j}) - f(A ∪ {i_1, ..., i_{j-1}})]
             ≤ f(A) + Σ_{j=1}^p [f(A ∪ {i_j}) - f(A)]              (submodularity)
             ≤ f(A) + k max_{i ∉ A} [f(A ∪ {i}) - f(A)]            (p ≤ k)

Rearranging gives the claim. □
Approximation guarantee:

Let A^i be the solution of the greedy method at step i. From the previous
claim,

    f(A^i) - f(A^{i-1}) ≥ (1/k) (f(A_opt) - f(A^{i-1}))
    ⟺ f(A_opt) - f(A^i) ≤ (1 - 1/k) (f(A_opt) - f(A^{i-1}))

Combining over every iteration 1 ≤ i ≤ k:

    f(A_opt) - f(A^k) ≤ (1 - 1/k)^k [f(A_opt) - f(∅)]
                      = (1 - 1/k)^k f(A_opt)          (since f(∅) = 0)
    ⟹ f(A^k) ≥ f(A_opt) - (1 - 1/k)^k f(A_opt)

Using the fact that 1 - x ≤ e^{-x}, so (1 - 1/k)^k ≤ e^{-1}:

    f(A^k) ≥ (1 - 1/e) f(A_opt)  □
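The last step rests on (1 - 1/k)^k ≤ 1/e for every k ≥ 1; a quick numeric illustration:

```python
from math import e

# (1 - 1/k)^k increases towards 1/e from below as k grows
for k in (1, 2, 5, 10, 100):
    print(k, (1 - 1 / k) ** k, "<=", 1 / e)
```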