

5.4 Approach, Modification, and Algorithm

5.4.3 Our Modifications for the Anonymization System

5.4.3.1 Modification on Segmentation Generator's Loss: The goal of a segmentation generator $G$ is to generate semantic images with which a synthesis generator $G_s$ creates well-synthesized images.

Modification: In order to account for the performance of $G_s$ in the loss of $G$, we define the loss of $G$ as follows:

$$
L_G(G, F, D_Y, G_s) = \underbrace{L_{adv,G}(G, D_Y, X, Y) + \lambda_{cyc} L_{cyc,G}(F)}_{\text{the original loss of } G} + \underbrace{\lambda_s L_{G_s}(G_s, D^s_{X,1}, \dots, D^s_{X,M}, G) + \lambda_{dist} L_{dist}(G)}_{\text{the newly added term}}, \tag{113}
$$

where $L_{dist}(G) = \mathbb{E}_{x \sim p_{data}(x)}\left[\|G(x) - y\|_1\right]$, and $L_{dist}(G)$, together with $L_{cyc,G}(F)$, further reduces the space of possible mapping functions. $L_{G_s}$ is the loss of the synthesis generator $G_s$ and will be explained in detail in Section 5.4.3.2. In addition, $\lambda_{cyc}$, $\lambda_s$, and $\lambda_{dist}$ control the relative importance of each loss.
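As a minimal numerical sketch of how the terms of (113) combine, the following assumes the individual loss terms are already computed as scalars; the $\lambda$ defaults are illustrative placeholders, not the values used in this work:

```python
import numpy as np

def l_dist(gen_semantic, target_semantic):
    # L_dist(G) = E[||G(x) - y||_1]: mean absolute error between the
    # generated semantic map and the ground-truth semantic map.
    return np.mean(np.abs(gen_semantic - target_semantic))

def segmentation_generator_loss(l_adv, l_cyc, l_gs, l_d,
                                lam_cyc=10.0, lam_s=1.0, lam_dist=1.0):
    # Eq. (113): the original CycleGAN-style terms plus the newly added
    # synthesis-loss and distance terms. The lambda defaults here are
    # illustrative, not taken from the paper.
    return l_adv + lam_cyc * l_cyc + lam_s * l_gs + lam_dist * l_d
```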

By adding $L_{G_s}(G_s, D^s_{X,1}, \dots, D^s_{X,M}, G)$, the generator $G$ is trained to generate a semantic image that minimizes the loss of $G_s$.

Objective of Segmentation-Learning Part: Hence, the complete objective of the segmentation-learning part is defined as

$$
L_{seg}(G, F, D_X, D_Y, G_s) = L_G(G, F, D_Y, G_s) + \mathbb{E}_{y \sim p_{data}(y)}[D_Y(y)] + L_F(G, F, D_X) + \mathbb{E}_{x \sim p_{data}(x)}[D_X(x)], \tag{114}
$$

where $L_F(G, F, D_X) = L_{adv,F}(F, D_X, Y, X) + \lambda_{cyc} L_{cyc,F}(G)$.

The segmentation-learning part solves (114) as follows:

$$
G^*, F^* = \arg\min_{G, F} \max_{D_X, D_Y} L_{seg}(G, F, D_X, D_Y, G_s). \tag{115}
$$

5.4.3.2 Modifications of the Loss of the Synthesis Generator: There are two main challenges that hinder the learning of a synthesis generator that anonymizes faces. We first introduce the loss of a synthesis generator $G_s$, and then explain each challenge. To obtain a synthesis generator that achieves the objective of our system, we modify the loss of $G_s$ and the loss of $D^s_{X,k}, \forall k$. In addition, we modify the way the discriminator $D^s_{X,k}$ is trained.

In [104], by using (107), (111), and (112), the loss of a synthesis generator can be written as

$$
L_{G_s}(G_s, D^s_{X,1}, \dots, D^s_{X,M}, G) = \sum_{k=1}^{M} \frac{1}{M} \left\{ L_{adv,G_s}(G_s, D^s_{X,k}, Y, X, G(x)) + L_{FM,G_s}(D^s_{X,k}) \right\} + L_{VGG}(\psi, x, G_s(G(x))), \tag{116}
$$

where, for simplicity, we write $L_{FM,G_s}(D^s_{X,k}) = \mathbb{E}_{(y,x)}\left[\sum_{i=1}^{T} \frac{1}{N_i} \left\|D^{s,i}_{X,k}(x) - D^{s,i}_{X,k}(G_s(y))\right\|_1\right]$ from (111). In addition, $L_{adv,G_s}(G_s, D^s_{X,k}, Y, X, G(x))$ is redefined as

$$
L_{adv,G_s}(G_s, D^s_{X,k}, Y, X, G(x)) = \mathbb{E}_{y \sim p_{data}(y)}\left[\log\left(1 - D^s_{X,k}(G_s(G(x)))\right)\right], \tag{117}
$$

where $\hat{y} = G(x)$.

Challenge in VGG Perceptual Loss: In the synthesis-learning part, the loss (116) should be minimized in order to train the generator $G_s$. The minimization reduces $L_{VGG}(\psi, x, G_s(G(x)))$, and thus the distance between the features of $x$ and $G_s(G(x))$ also shrinks during the training of $G_s$. Consequently, the generator $G_s$ is trained to generate a photorealistic image $G_s(G(x))$ that can be almost identical to the original photorealistic image $x$. Such a generator cannot be utilized for our face-anonymizing system.

Modification on VGG Perceptual Loss: To prevent the distance between $G_s(G(x))$ and $x$ from shrinking to a very small value, we introduce margins into the VGG perceptual loss (112) in the following manner:

$$
L_{VGG}(\psi, x, G_s(y), \Upsilon) = \sum_{i \in S_I} \max\left(0,\; \frac{1}{C_i H_i W_i} \left\|\psi_i(G_s(y)) - \psi_i(x)\right\|_1 - \Upsilon_{m(i)}\right), \tag{118}
$$

where $\Upsilon = \{\Upsilon_1, \dots, \Upsilon_{|S_I|}\}$ is the set of margin values, $|S_I|$ is the number of elements in $S_I$, and $m(i)$ is a mapping function that finds, for $\Upsilon$, the index corresponding to VGG layer index $i \in S_I$.

Note that a margin value leaves the distance between the $i$-th VGG-layer features of $x$ and $G_s(G(x))$ unpenalized up to $\Upsilon_{m(i)}$, so the distance is not driven to zero. Hence, the photorealistic image $G_s(G(x))$ can have features different from those of the original image $x$, which could make $G_s(G(x))$ appear different from $x$.
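A minimal sketch of the margin-based perceptual loss (118), assuming the per-layer VGG features have already been extracted and using illustrative margin values:

```python
import numpy as np

def margin_vgg_loss(feats_fake, feats_real, margins):
    # Eq. (118), sketched: a hinge on each selected VGG layer's normalized
    # L1 feature distance. The term vanishes once the distance drops below
    # the layer's margin Upsilon_{m(i)}, so G_s(G(x)) is never forced to
    # match x exactly. feats_* are lists of per-layer feature arrays and
    # margins holds one value per layer; all inputs are placeholders.
    loss = 0.0
    for f_fake, f_real, margin in zip(feats_fake, feats_real, margins):
        dist = np.sum(np.abs(f_fake - f_real)) / f_fake.size  # 1/(C_i H_i W_i)
        loss += max(0.0, dist - margin)
    return loss
```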

Challenge in Adversarial Loss and Multi-Scale Discriminators' Feature Loss: To explain our additional modifications, we first need to understand how a discriminator $D^s_{X,k}$ works. Based on that understanding, we describe a hindrance to the learning of our synthesis generator $G_s$. We then modify the adversarial losses of $G_s$ and $D^s_{X,k}$, and the multi-scale discriminators' feature loss of $G_s$.

To minimize (117), $G_s$ should make a synthesized face $G_s(G(x))$ look like a face in the training dataset, and thus it tends to translate $G(x)$ back to $x$, which is exactly the image our system should anonymize.

Specifically, in (117), a discriminator $D^s_{X,k}$ examines $G_s(G(x))$ to determine whether $G_s(G(x))$ is from the training dataset $X$ or is arbitrarily generated. The generated image $G_s(G(x))$ contains an entire face, and thus $D^s_{X,k}$ is trained to determine whether the entire face in $G_s(G(x))$ is from the training dataset. Consequently, to deceive $D^s_{X,k}$, $G_s$ is trained to regenerate $x$ from $G(x)$.

Modifications on Adversarial Loss and Multi-Scale Discriminators’ Feature Loss:

In order to prevent $G_s$ from regenerating almost the same face as $x$, we modify the adversarial losses of $G_s$ and $D^s_{X,k}$. The reason $G_s$ reproduces $x$ is that the discriminator $D^s_{X,k}$ examines whether the entire face in $G_s(G(x))$ is from the training dataset, which includes $x$. To deceive a $D^s_{X,k}$ that checks an entire face, $G_s$ simply recreates a face from the domain $X$, which greatly reduces the space of possible mappings.

To expand the space of possible mappings, we restrict the discriminator $D^s_{X,k}$ to investigating each facial component rather than the entire face. Applying this idea, the adversarial loss is rewritten in the following manner:

$$
L^s_{adv}(G_s, D^s_{X,1}, \dots, D^s_{X,M}, Y, X, S_\xi) = \sum_{k=1}^{M} \frac{1}{M} \left\{ \mathbb{E}_{y \sim p_{data}(y)}\left[\sum_{i \in S_\xi} \frac{1}{|S_\xi|} \log\left(1 - D^s_{X,k}(\xi_i(G_s(\hat{y})))\right)\right] \right\} + \sum_{k=1}^{M} \frac{1}{M} \left\{ \mathbb{E}_{x \sim p_{data}(x)}\left[\sum_{i \in S_\xi} \frac{1}{|S_\xi|} \log\left(D^s_{X,k}(\xi_i(x))\right)\right] \right\}, \tag{119}
$$

where we denote the output of our segmentation generator as $\hat{y} = G(x)$, $\xi_i$ is the extractor that extracts the pixels corresponding to label index $i$, $S_\xi$ is the set of extracted labels' indices, and $|S_\xi|$ is the number of elements in $S_\xi$. For example, if $i = 2$ and label index 2 indicates the nose in a face, all pixels in $\xi_2(G_s(\hat{y}))$ become zero, except for the pixels corresponding to the nose.
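The extractor $\xi_i$ used in (119) can be sketched as a simple label-mask multiplication; the single-channel layout and the label-index convention below are illustrative simplifications:

```python
import numpy as np

def extract_component(image, label_map, label_index):
    # xi_i from Eq. (119), sketched: zero every pixel of `image` except
    # those whose entry in the semantic `label_map` equals `label_index`
    # (e.g. index 2 for the nose in the example above).
    return image * (label_map == label_index)
```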

According to (119), our modification makes a discriminator $D^s_{X,k}$ examine one part of a face at a time instead of observing all facial parts together. This approach lets the discriminators learn the distribution of each facial component instead of the distribution of an entire face. Hence, a discriminator attempts to distinguish each part of a face in $G_s(\hat{y})$ from that of a face in $x$, which could widen the space of possible mappings from the perspective of the entire face. In addition, by setting $|S_\xi| < N_f$, where $N_f$ denotes the number of labels in a face, we make the discriminators observe only certain parts of the face, and thus the generator $G_s$ gains a wider space of possible mappings for the parts that are not examined by the discriminators.

In the same vein, the multi-scale discriminators' feature loss can also be redefined as

$$
L_{FM,G_s}(D^s_{X,1}, \dots, D^s_{X,M}, S_\xi) = \sum_{k=1}^{M} \frac{1}{M} \underbrace{\mathbb{E}_{(y,x)}\left[\sum_{i=1}^{T} \frac{1}{N_i} \sum_{j \in S_\xi} \frac{1}{|S_\xi|} \left\|D^{s,i}_{X,k}(\xi_j(x)) - D^{s,i}_{X,k}(\xi_j(G_s(\hat{y})))\right\|_1\right]}_{L_{FM,G_s}(D^s_{X,k},\, S_\xi)}. \tag{120}
$$

Algorithm 3 Training Procedure for One Epoch
Output: Generators $G$ and $G_s$
for $i = 1 : N_{data}$
  1) Select $x$, $y$, $\tilde{x}$ from the datasets $X$ and $Y$
  2) Update $G$, $F$, $D_X$, $D_Y$ with $G_s$ via (114): $\arg\min_{G,F} \max_{D_X, D_Y} L_{seg}(G, F, D_X, D_Y, G_s)$
  3) Update $G_s$, $D^s_{X,k}, \forall k$ with $G$, $F$, $D_X$, $D_Y$ via (122): $\arg\min_{G_s} \max_{D^s_{X,k}, \forall k} L_{syn}(G_s, D^s_{X,1}, \dots, D^s_{X,M}, S_\xi, G)$

In spite of our modifications to $L^s_{adv}(\cdot)$ and $L_{FM,G_s}(\cdot)$, there is still room for $G_s$ to learn to regenerate $x$, since $L_{FM,G_s}(D^s_{X,k}, S_\xi)$ still compares features of $G_s(G(x))$ and $x$, which are extracted by $D^s_{X,k}$. Hence, we slightly modify (120) as follows.

$$
L_{FM,G_s}(D^s_{X,1}, \dots, D^s_{X,M}, S_\xi) = \sum_{k=1}^{M} \frac{1}{M} \underbrace{\mathbb{E}_{(y,x,\tilde{x})}\left[\sum_{i=1}^{T} \frac{1}{N_i} \sum_{j \in S_\xi} \frac{1}{|S_\xi|} \left\|D^{s,i}_{X,k}(\xi_j(\tilde{x})) - D^{s,i}_{X,k}(\xi_j(G_s(\hat{y})))\right\|_1\right]}_{L_{FM,G_s}(D^s_{X,k},\, S_\xi)}, \tag{121}
$$

where $\tilde{x} \neq x$ but $\tilde{x}$ is in the same training dataset as $x$. With this modification, $L_{FM,G_s}(D^s_{X,k}, S_\xi)$ compares the features of $G_s(G(x))$ to the features of $\tilde{x}$, which can help $G_s$ learn to generate a face different from $x$.
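As a sketch of the modified feature-matching comparison in (121) for a single discriminator, with placeholder feature lists and the per-layer $1/N_i$ weights folded into a mean for brevity:

```python
import numpy as np

def fm_loss_substitute(disc_feats_fake, disc_feats_xtilde):
    # Eq. (121), sketched for one discriminator D^s_{X,k}: match the
    # discriminator features of G_s(y_hat) to those of a *different* real
    # face x_tilde (not the source x), nudging G_s toward x_tilde's facial
    # appearance. Inputs are lists of per-layer feature arrays and are
    # illustrative placeholders.
    total = 0.0
    for f_fake, f_real in zip(disc_feats_fake, disc_feats_xtilde):
        total += np.mean(np.abs(f_fake - f_real))
    return total
```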

Objective of the Synthesis-Learning Part: Based on (118), (119), and (121), our complete objective of the synthesis-learning part is defined as

$$
L_{syn}(G_s, D^s_{X,1}, \dots, D^s_{X,M}, S_\xi, G) = L^s_{adv}(G_s, D^s_{X,1}, \dots, D^s_{X,M}, Y, X, S_\xi) + L_{FM,G_s}(D^s_{X,1}, \dots, D^s_{X,M}, S_\xi) + L_{VGG}(\psi, x, G_s(y), S_I) + L_{cyc,G_s}(G), \tag{122}
$$

where $L_{cyc,G_s}(G) = \mathbb{E}_{x \sim p_{data}(x)}\left[\|G(G_s(G(x))) - G(x)\|_1\right]$ enables a synthesized image $G_s(G(x))$ to maintain the shape and location of each facial part in $x$. Since (121) compares the features of $G_s(G(x))$ to those of $\tilde{x}$, the generator $G_s$ could make a synthesized image retain the shape and location of each facial part of $\tilde{x}$ rather than of $x$; $L_{cyc,G_s}(G)$ counteracts this by pushing $G_s$ to retain the shape and location of the facial parts of $x$. Hence, by $L_{cyc,G_s}(G)$ and (121), $G_s$ can generate a synthesized face $G_s(G(x))$ that includes the facial features of $\tilde{x}$ while maintaining the shape and location of the facial components of $x$.

Finally, the synthesis-learning part solves (122) in the following manner:

$$
(G_s)^* = \arg\min_{G_s} \max_{D^s_{X,k}, \forall k} L_{syn}(G_s, D^s_{X,1}, \dots, D^s_{X,M}, S_\xi, G). \tag{123}
$$
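The alternating procedure of Algorithm 3, which interleaves the segmentation-learning objective (115) and the synthesis-learning objective (123), can be sketched as the following control-flow skeleton; the two update helpers are hypothetical placeholders standing in for the actual min-max gradient steps:

```python
def train_one_epoch(dataset, update_segmentation, update_synthesis):
    # Control-flow sketch of Algorithm 3. `dataset` yields (x, y, x_tilde)
    # triples; `update_segmentation` and `update_synthesis` are hypothetical
    # callables that perform the real adversarial updates.
    for x, y, x_tilde in dataset:
        # Step 2): update G, F, D_X, D_Y via (115) with G_s held fixed.
        update_segmentation(x, y)
        # Step 3): update G_s and D^s_{X,k} via (123) with G held fixed,
        # using x_tilde for the modified feature-matching loss (121).
        update_synthesis(x, y, x_tilde)
```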