

5.4 Approach, Modification, and Algorithm

5.4.3 Our Modifications for the Anonymization System

5.4.3.1 Modification on Segmentation Generator's Loss: The goal of a segmentation generator $G$ is to generate semantic images with which a synthesis generator $G_s$ creates well-synthesized images.

Modification: In order to account for the performance of $G_s$ in the loss of $G$, we define the loss of $G$ as follows:

$$
L_G(G, F, D_Y, G_s) = \underbrace{L_{adv,G}(G, D_Y, X, Y) + \lambda_{cyc} L_{cyc,G}(F)}_{\text{the original loss of } G} + \underbrace{\lambda_s L_{G_s}(G_s, D^s_{X,1}, \dots, D^s_{X,M}, G) + \lambda_{dist} L_{dist}(G)}_{\text{the newly added term}}, \tag{113}
$$

where $L_{dist}(G) = \mathbb{E}_{x \sim p_{data}(x)}\left[\|G(x) - y\|_1\right]$, and $L_{dist}(G)$, together with $L_{cyc,G}(F)$, further reduces the space of possible mapping functions. $L_{G_s}$ is the loss of the synthesis generator $G_s$ and will be explained in detail in Section 5.4.3.2. In addition, $\lambda_{cyc}$, $\lambda_s$, and $\lambda_{dist}$ control the relative importance of each loss.
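As a minimal numerical sketch of how the terms of (113) combine, the following assumes the individual loss terms are already computed as scalars; the $\lambda$ defaults are illustrative placeholders, not the values used in this work:

```python
import numpy as np

def l_dist(gen_semantic, target_semantic):
    # L_dist(G) = E[||G(x) - y||_1]: mean absolute error between the
    # generated semantic map and the ground-truth semantic map.
    return np.mean(np.abs(gen_semantic - target_semantic))

def segmentation_generator_loss(l_adv, l_cyc, l_gs, l_d,
                                lam_cyc=10.0, lam_s=1.0, lam_dist=1.0):
    # Eq. (113): the original CycleGAN-style terms plus the newly added
    # synthesis-loss and distance terms. The lambda defaults here are
    # illustrative, not taken from the paper.
    return l_adv + lam_cyc * l_cyc + lam_s * l_gs + lam_dist * l_d
```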

By adding $L_{G_s}(G_s, D^s_{X,1}, \dots, D^s_{X,M}, G)$, the generator $G$ is trained to generate a semantic image that minimizes the loss of $G_s$.

Objective of Segmentation-Learning Part: Hence, the complete objective of the segmentation-learning part is defined as

$$
L_{seg}(G, F, D_X, D_Y, G_s) = L_G(G, F, D_Y, G_s) + \mathbb{E}_{y \sim p_{data}(y)}[D_Y(y)] + L_F(G, F, D_X) + \mathbb{E}_{x \sim p_{data}(x)}[D_X(x)], \tag{114}
$$

where $L_F(G, F, D_X) = L_{adv,F}(F, D_X, Y, X) + \lambda_{cyc} L_{cyc,F}(G)$.

The segmentation-learning part solves (114) as follows:

$$
G^*, F^* = \arg\min_{G, F} \max_{D_X, D_Y} L_{seg}(G, F, D_X, D_Y, G_s). \tag{115}
$$

5.4.3.2 Modifications of the Loss of the Synthesis Generator: There are two main challenges that hinder the learning of a synthesis generator that anonymizes faces. We first introduce the loss of a synthesis generator $G_s$, and then explain each challenge. To obtain a synthesis generator that achieves the objective of our system, we modify the loss of $G_s$ and the loss of $D^s_{X,k}, \forall k$. In addition, we modify the way the discriminator $D^s_{X,k}$ is trained.

In [104], by using (107), (111), and (112), the loss of a synthesis generator can be written as

$$
L_{G_s}(G_s, D^s_{X,1}, \dots, D^s_{X,M}, G) = \sum_{k=1}^{M} \frac{1}{M} \left\{ L_{adv,G_s}(G_s, D^s_{X,k}, Y, X, G(x)) + L_{FM,G_s}(D^s_{X,k}) \right\} + L_{VGG}(\psi, x, G_s(G(x))), \tag{116}
$$

where, for simplicity, we write $L_{FM,G_s}(D^s_{X,k}) = \mathbb{E}_{(y,x)}\left[\sum_{i=1}^{T} \frac{1}{N_i} \left\|D^{s,i}_{X,k}(x) - D^{s,i}_{X,k}(G_s(y))\right\|_1\right]$ from (111). In addition, $L_{adv,G_s}(G_s, D^s_{X,k}, Y, X, G(x))$ is redefined as

$$
L_{adv,G_s}(G_s, D^s_{X,k}, Y, X, G(x)) = \mathbb{E}_{y \sim p_{data}(y)}\left[\log\left(1 - D^s_{X,k}(G_s(G(x)))\right)\right], \tag{117}
$$

where $\hat{y} = G(x)$.

Challenge in VGG Perceptual Loss: In the synthesis-learning part, the loss (116) should be minimized in order to train the generator $G_s$. The minimization reduces $L_{VGG}(\psi, x, G_s(G(x)))$, and thus the distance between the features of $x$ and $G_s(G(x))$ also shrinks during the training of $G_s$. Consequently, the generator $G_s$ is trained to generate a photorealistic image $G_s(G(x))$ that can be almost identical to the original photorealistic image $x$. Such a generator cannot be utilized for our face-anonymizing system.

Modification on VGG Perceptual Loss: To prevent the distance between $G_s(G(x))$ and $x$ from shrinking to a very small value, we introduce margins into the VGG perceptual loss (112) in the following manner:

$$
L_{VGG}(\psi, x, G_s(y), \Upsilon) = \sum_{i \in S_I} \max\left(0,\; \frac{1}{C_i H_i W_i} \left\|\psi_i(G_s(y)) - \psi_i(x)\right\|_1 - \Upsilon_{m(i)}\right), \tag{118}
$$

where $\Upsilon = \{\Upsilon_1, \dots, \Upsilon_{|S_I|}\}$ is the set of margin values, $|S_I|$ is the number of elements in $S_I$, and $m(i)$ is a mapping function that finds, for $\Upsilon$, the index corresponding to VGG layer index $i \in S_I$.

Note that a margin value leaves the distance between the $i$-th VGG-layer features of $x$ and $G_s(G(x))$ unpenalized up to $\Upsilon_{m(i)}$, so the distance is not driven to zero. Hence, the photorealistic image $G_s(G(x))$ can have features different from those of the original image $x$, which could make $G_s(G(x))$ appear different from $x$.
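A minimal sketch of the margin-based perceptual loss (118), assuming the per-layer VGG features have already been extracted and using illustrative margin values:

```python
import numpy as np

def margin_vgg_loss(feats_fake, feats_real, margins):
    # Eq. (118), sketched: a hinge on each selected VGG layer's normalized
    # L1 feature distance. The term vanishes once the distance drops below
    # the layer's margin Upsilon_{m(i)}, so G_s(G(x)) is never forced to
    # match x exactly. feats_* are lists of per-layer feature arrays and
    # margins holds one value per layer; all inputs are placeholders.
    loss = 0.0
    for f_fake, f_real, margin in zip(feats_fake, feats_real, margins):
        dist = np.sum(np.abs(f_fake - f_real)) / f_fake.size  # 1/(C_i H_i W_i)
        loss += max(0.0, dist - margin)
    return loss
```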

Challenge in Adversarial Loss and Multi-Scale Discriminators' Feature Loss: To explain our additional modifications, we first need to understand how a discriminator $D^s_{X,k}$ works. Based on that understanding, we describe a hindrance to the learning of our synthesis generator $G_s$. We then modify the adversarial losses of $G_s$ and $D^s_{X,k}$, and the multi-scale discriminators' feature loss of $G_s$.

To minimize (117), $G_s$ should make a synthesized face $G_s(G(x))$ look like a face in the training dataset, and thus it tends to translate $G(x)$ back to $x$, which is exactly the image our system should anonymize.

Specifically, in (117), a discriminator $D^s_{X,k}$ examines $G_s(G(x))$ to determine whether $G_s(G(x))$ is from the training dataset $X$ or is arbitrarily generated. The generated image $G_s(G(x))$ contains an entire face, and thus $D^s_{X,k}$ is trained to determine whether the entire face in $G_s(G(x))$ is from the training dataset. Consequently, to deceive $D^s_{X,k}$, $G_s$ is trained to regenerate $x$ from $G(x)$.

Modifications on Adversarial Loss and Multi-Scale Discriminators’ Feature Loss:

In order to prevent $G_s$ from regenerating almost the same face as $x$, we modify the adversarial losses of $G_s$ and $D^s_{X,k}$. The reason $G_s$ reproduces $x$ is that the discriminator $D^s_{X,k}$ examines whether the entire face in $G_s(G(x))$ is from the training dataset, which includes $x$. To deceive a $D^s_{X,k}$ that checks an entire face, $G_s$ simply recreates a face from the domain $X$, which greatly reduces the space of possible mappings.

To expand the space of possible mappings, we restrict the discriminator $D^s_{X,k}$ to investigating each facial component rather than the entire face. Applying this idea, the adversarial loss is rewritten in the following manner:

$$
L^s_{adv}(G_s, D^s_{X,1}, \dots, D^s_{X,M}, Y, X, S_\xi) = \sum_{k=1}^{M} \frac{1}{M} \left\{ \mathbb{E}_{y \sim p_{data}(y)}\left[\sum_{i \in S_\xi} \frac{1}{|S_\xi|} \log\left(1 - D^s_{X,k}(\xi_i(G_s(\hat{y})))\right)\right] \right\} + \sum_{k=1}^{M} \frac{1}{M} \left\{ \mathbb{E}_{x \sim p_{data}(x)}\left[\sum_{i \in S_\xi} \frac{1}{|S_\xi|} \log\left(D^s_{X,k}(\xi_i(x))\right)\right] \right\}, \tag{119}
$$

where we denote the output of our segmentation generator as $\hat{y} = G(x)$, $\xi_i$ is the extractor that extracts the pixels corresponding to label index $i$, $S_\xi$ is the set of extracted labels' indices, and $|S_\xi|$ is the number of elements in $S_\xi$. For example, if $i = 2$ and label index 2 indicates the nose in a face, all pixels in $\xi_2(G_s(\hat{y}))$ become zero, except for the pixels corresponding to the nose.
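The extractor $\xi_i$ used in (119) can be sketched as a simple label-mask multiplication; the single-channel layout and the label-index convention below are illustrative simplifications:

```python
import numpy as np

def extract_component(image, label_map, label_index):
    # xi_i from Eq. (119), sketched: zero every pixel of `image` except
    # those whose entry in the semantic `label_map` equals `label_index`
    # (e.g. index 2 for the nose in the example above).
    return image * (label_map == label_index)
```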

According to (119), our modification makes a discriminator $D^s_{X,k}$ examine one part of a face at a time instead of observing all facial parts together. This approach lets the discriminators learn the distribution of each facial component instead of the distribution of an entire face. Hence, a discriminator attempts to distinguish each part of a face in $G_s(\hat{y})$ from that of a face in $x$, which could widen the space of possible mappings from the perspective of the entire face. In addition, by setting $|S_\xi| < N_f$, where $N_f$ denotes the number of labels in a face, we make the discriminators observe only certain parts of the face, and thus the generator $G_s$ gains a wider space of possible mappings for the parts that are not examined by the discriminators.

In the same vein, the multi-scale discriminators' feature loss can also be redefined as

$$
L_{FM,G_s}(D^s_{X,1}, \dots, D^s_{X,M}, S_\xi) = \sum_{k=1}^{M} \frac{1}{M} \underbrace{\mathbb{E}_{(y,x)}\left[\sum_{i=1}^{T} \frac{1}{N_i} \sum_{j \in S_\xi} \frac{1}{|S_\xi|} \left\|D^{s,i}_{X,k}(\xi_j(x)) - D^{s,i}_{X,k}(\xi_j(G_s(\hat{y})))\right\|_1\right]}_{L_{FM,G_s}(D^s_{X,k},\, S_\xi)}. \tag{120}
$$

Algorithm 3 Training Procedure for One Epoch
Output: Generators $G$ and $G_s$
for $i = 1 : N_{data}$
  1) Select $x$, $y$, $\tilde{x}$ from the datasets $X$ and $Y$
  2) Update $G$, $F$, $D_X$, $D_Y$ with $G_s$ via (114): $\arg\min_{G,F} \max_{D_X, D_Y} L_{seg}(G, F, D_X, D_Y, G_s)$
  3) Update $G_s$, $D^s_{X,k}, \forall k$ with $G$, $F$, $D_X$, $D_Y$ via (122): $\arg\min_{G_s} \max_{D^s_{X,k}, \forall k} L_{syn}(G_s, D^s_{X,1}, \dots, D^s_{X,M}, S_\xi, G)$

In spite of our modifications to $L^s_{adv}(\cdot)$ and $L_{FM,G_s}(\cdot)$, there is still room for $G_s$ to learn to regenerate $x$, since $L_{FM,G_s}(D^s_{X,k}, S_\xi)$ still compares features of $G_s(G(x))$ and $x$, which are extracted by $D^s_{X,k}$. Hence, we slightly modify (120) as follows.

$$
L_{FM,G_s}(D^s_{X,1}, \dots, D^s_{X,M}, S_\xi) = \sum_{k=1}^{M} \frac{1}{M} \underbrace{\mathbb{E}_{(y,x,\tilde{x})}\left[\sum_{i=1}^{T} \frac{1}{N_i} \sum_{j \in S_\xi} \frac{1}{|S_\xi|} \left\|D^{s,i}_{X,k}(\xi_j(\tilde{x})) - D^{s,i}_{X,k}(\xi_j(G_s(\hat{y})))\right\|_1\right]}_{L_{FM,G_s}(D^s_{X,k},\, S_\xi)}, \tag{121}
$$

where $\tilde{x} \neq x$ but $\tilde{x}$ is in the same training dataset as $x$. With this modification, $L_{FM,G_s}(D^s_{X,k}, S_\xi)$ compares the features of $G_s(G(x))$ to the features of $\tilde{x}$, which can help $G_s$ learn to generate a face different from $x$.
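As a sketch of the modified feature-matching comparison in (121) for a single discriminator, with placeholder feature lists and the per-layer $1/N_i$ weights folded into a mean for brevity:

```python
import numpy as np

def fm_loss_substitute(disc_feats_fake, disc_feats_xtilde):
    # Eq. (121), sketched for one discriminator D^s_{X,k}: match the
    # discriminator features of G_s(y_hat) to those of a *different* real
    # face x_tilde (not the source x), nudging G_s toward x_tilde's facial
    # appearance. Inputs are lists of per-layer feature arrays and are
    # illustrative placeholders.
    total = 0.0
    for f_fake, f_real in zip(disc_feats_fake, disc_feats_xtilde):
        total += np.mean(np.abs(f_fake - f_real))
    return total
```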

Objective of the Synthesis-Learning Part: Based on (118), (119), and (121), our complete objective of the synthesis-learning part is defined as

$$
L_{syn}(G_s, D^s_{X,1}, \dots, D^s_{X,M}, S_\xi, G) = L^s_{adv}(G_s, D^s_{X,1}, \dots, D^s_{X,M}, Y, X, S_\xi) + L_{FM,G_s}(D^s_{X,1}, \dots, D^s_{X,M}, S_\xi) + L_{VGG}(\psi, x, G_s(y), S_I) + L_{cyc,G_s}(G), \tag{122}
$$

where $L_{cyc,G_s}(G) = \mathbb{E}_{x \sim p_{data}(x)}\left[\|G(G_s(G(x))) - G(x)\|_1\right]$ enables a synthesized image $G_s(G(x))$ to maintain the shape and location of each facial part in $x$. Since (121) compares the features of $G_s(G(x))$ to those of $\tilde{x}$, the generator $G_s$ could make a synthesized image retain the shape and location of each facial part of $\tilde{x}$ rather than of $x$; $L_{cyc,G_s}(G)$ counteracts this by pushing $G_s$ to retain the shape and location of the facial parts of $x$. Hence, by $L_{cyc,G_s}(G)$ and (121), $G_s$ can generate a synthesized face $G_s(G(x))$ that includes the facial features of $\tilde{x}$ while maintaining the shape and location of the facial components of $x$.

Finally, the synthesis-learning part solves (122) in the following manner:

$$
(G_s)^* = \arg\min_{G_s} \max_{D^s_{X,k}, \forall k} L_{syn}(G_s, D^s_{X,1}, \dots, D^s_{X,M}, S_\xi, G). \tag{123}
$$
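The alternating procedure of Algorithm 3, which interleaves the segmentation-learning objective (115) and the synthesis-learning objective (123), can be sketched as the following control-flow skeleton; the two update helpers are hypothetical placeholders standing in for the actual min-max gradient steps:

```python
def train_one_epoch(dataset, update_segmentation, update_synthesis):
    # Control-flow sketch of Algorithm 3. `dataset` yields (x, y, x_tilde)
    # triples; `update_segmentation` and `update_synthesis` are hypothetical
    # callables that perform the real adversarial updates.
    for x, y, x_tilde in dataset:
        # Step 2): update G, F, D_X, D_Y via (115) with G_s held fixed.
        update_segmentation(x, y)
        # Step 3): update G_s and D^s_{X,k} via (123) with G held fixed,
        # using x_tilde for the modified feature-matching loss (121).
        update_synthesis(x, y, x_tilde)
```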