
VI.3 Evaluation

VI.3.1 Robustness and Sensitivity of Different Network Architectures

Max-pooling vs. average-pooling. Max-pooling is often preferred over average-pooling for training deep neural networks because of its nonlinear characteristics. We investigate whether max-pooling is actually better than average-pooling in terms of the accuracy and robustness of deep SSNs. Figures VI.1 and VI.2 illustrate the average robustness and sensitivities of the MNIST networks under different numbers of attacked pixels (Figure VI.1, 20 images) and different input sizes (Figure VI.2, 10 images). We focus on the first two networks, N1 and N2. These networks have the same architecture (21 layers); the only difference is that N1 uses average-pooling for down-sampling while N2 uses max-pooling (both networks use two transposed convolutional layers for up-sampling). After training, we found that N1 is more accurate than N2 (0.87 vs. 0.85 IoU, see Table VI.1). Interestingly, N1 is also more robust than N2: it has a larger average robustness value (Figure VI.1-(a)) and more robust pixels (Figure VI.1-(d)). One can also see that the average-pooling-based network is less sensitive to the attack than the max-pooling-based network (Figure VI.1-(b,e,f)). Notably, when more pixels are attacked or larger input sizes are used, the max-pooling-based network N2 produces more pixels with unknown robustness (Figures VI.1-(f), VI.2-(f), VI.3, and VI.4). Lastly, as the input size increases, the robustness of the max-pooling-based network drops more quickly than that of the average-pooling-based network (Figure VI.2-(a,d)) and its sensitivity grows faster (Figure VI.2-(b)). We believe the main reason the max-pooling-based network is more sensitive to the attack is its higher nonlinearity, caused by its max-pooling layers.

Figure VI.2: MNIST networks: RV, RS, VT vs. ∆ε (Nmax = 20).
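To make the nonlinearity argument concrete, the following minimal Python sketch (not the thesis' verification tool; the window values and attack sizes are illustrative) propagates an interval attack [x - ε, x + ε] through a single flattened 2x2 pooling window. Average-pooling is affine, so exact output bounds follow directly; for max-pooling, every pixel whose upper bound exceeds the best lower bound remains a candidate maximum, which forces an exact set-based analysis to branch, or an over-approximate one to report unknown pixels.

```python
import numpy as np

# Minimal sketch, assuming a box (interval) perturbation on one
# flattened 2x2 pooling window; the values below are illustrative.

def avg_pool_bounds(lb, ub):
    # Average-pooling is affine, so the exact output interval is just
    # the mean of the input bounds; no case splitting is ever needed.
    return lb.mean(), ub.mean()

def max_pool_candidates(lb, ub):
    # For max-pooling, any pixel whose upper bound exceeds the best
    # lower bound may still attain the maximum. An exact analysis
    # must branch once per candidate; an over-approximation instead
    # loses precision, yielding pixels with unknown robustness.
    return np.flatnonzero(ub > lb.max())

window = np.array([0.5200, 0.5185, 0.5210, 0.4000])
for eps in (0.0003, 0.003):
    lb, ub = window - eps, window + eps
    print(eps, avg_pool_bounds(lb, ub), max_pool_candidates(lb, ub))
# eps = 0.0003 -> one candidate: the max is decided, no branching.
# eps = 0.003 (the attack size of Figures VI.3-VI.4) -> three
# candidates, so the analysis branches, or over-approximates, at
# this single window alone.
```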

Figure VI.3: Example of Rf(N1), Nunknown = 6 (Nmax = 50, ∆ε = 0.003): the 2nd image's segmentation without attack, pixel-class reach set, and verified reach set.

Figure VI.4: Example of Rf(N2), Nunknown = 19 (Nmax = 50, ∆ε = 0.003): the 2nd image's segmentation without attack, pixel-class reach set, and verified reach set.

Accuracy vs. robustness; deeper networks and ReLU layers. Accuracy is one of the most important factors in evaluating deep neural networks. We investigate whether deeper, more accurate networks are also more robust than other architectures. To determine this, we analyze the robustness of two networks with different architectures and accuracies trained on the M2NIST data set. The first network, N4, is based on dilated convolution, with 16 layers and 0.62 IoU accuracy (Table VI.1). The second network, N5, is based on transposed convolution, with 22 layers and 0.75 IoU accuracy; it is thus deeper and more accurate than the first. We run the robustness analysis on these two networks on a set of 20 M2NIST images. The results are depicted in Figure VI.5. In terms of robustness, the deeper, more accurate network N5 is worse than the less accurate N4 (Figures VI.5-(a,d), VI.7, and VI.8) as the number of attacked pixels increases. Additionally, N5 is also more sensitive to the attack than N4 (Figure VI.5-(b,e)). The main reason for this result is that the more accurate network contains many more ReLU layers (8) than the less accurate one (3). As with max-pooling, using many ReLU layers increases the nonlinearity of the network, allowing it to capture complex image features; unfortunately, it also makes the network more sensitive to the attack.
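For reference, here is a hedged sketch of the mean intersection-over-union (IoU) accuracy quoted from Table VI.1; the exact per-class averaging scheme used in the thesis may differ, and the mask shapes, class count, and the mean_iou helper below are illustrative.

```python
import numpy as np

# A minimal sketch of mean IoU for semantic segmentation, assuming
# IoU is averaged over the classes present in either mask.

def mean_iou(pred, target, num_classes):
    """Mean per-class IoU over classes present in either mask."""
    ious = []
    for c in range(num_classes):
        p, t = pred == c, target == c
        union = np.logical_or(p, t).sum()
        if union == 0:          # class absent from both masks: skip
            continue
        ious.append(np.logical_and(p, t).sum() / union)
    return float(np.mean(ious))

# Toy example with 11 labels (ten digits plus background).
rng = np.random.default_rng(0)
target = rng.integers(0, 11, size=(64, 84))
pred = target.copy()
flip = rng.random(target.shape) < 0.10   # corrupt 10% of the pixels
pred[flip] = rng.integers(0, 11, size=flip.sum())
print(mean_iou(pred, target, num_classes=11))
```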

Figure VI.5: M2NIST networks: RV, RS, VT vs. NattackedPixels (∆ε = 10^-5).

Figure VI.6: Reach-times of M2NIST networks (Nmax = 25, ∆ε = 10^-5): per-layer reach-time for N4 and N5, and total reach-time per network.

Figure VI.7: Example of Rf(N4), Nunrobust = 43 (Nmax = 25, ∆ε = 10^-5): the 4th image's segmentation without attack, pixel-class reach set, and verified reach set.

Figure VI.8: Example of Rf(N5), Nunrobust = 51 (Nmax = 25, ∆ε = 10^-5): the 4th image's segmentation without attack, pixel-class reach set, and verified reach set.

Dilated convolution vs. transposed convolution. Dilated convolution and transposed convolution are typical choices for semantic segmentation tasks. We compare these techniques in terms of accuracy and robustness. On the MNIST networks, although the transposed-convolution networks N1 and N2 and the dilated-convolution network N3 have the same number of layers (21 layers with 3 ReLU layers), N3 is less accurate than N1 and N2 (0.83 vs. 0.87 and 0.85 IoU, see Table VI.1). In terms of robustness, N3 is less robust and more sensitive to the attack than N1 and N2 when the number of attacked pixels is smaller than 40 (Figure VI.1-(a,b,d,e)). When the input size increases, the dilated-convolution network is also less robust and more sensitive to the attack than the transposed-convolution networks (Figure VI.2-(a,b,d,e)). On the M2NIST networks, comparing the 21-layer (8 ReLU) transposed-convolution network N5 and the 24-layer (4 ReLU) dilated-convolution network N6, one can see that even with more layers, N6 is less accurate than N5 (0.72 vs. 0.75 IoU, see Table VI.1). In terms of robustness, N6 is also less robust and more sensitive to the attack than N5 (Figure VI.5-(a,b,d,e)).
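The architectural difference between the two styles can be sketched with two tiny PyTorch layers; the hyper-parameters below are assumptions for illustration, not those of N1-N6. A dilated convolution enlarges the receptive field while preserving spatial resolution, so no separate up-sampling stage is needed, whereas a transposed convolution restores the resolution lost to pooling.

```python
import torch
import torch.nn as nn

x = torch.randn(1, 1, 28, 28)  # one MNIST-sized input

# Dilated convolution: 3x3 taps spread over a 5x5 area (dilation=2),
# enlarging the receptive field while keeping the 28x28 resolution.
dilated = nn.Conv2d(1, 8, kernel_size=3, dilation=2, padding=2)
print(dilated(x).shape)        # torch.Size([1, 8, 28, 28])

# Transposed convolution: restores the resolution halved by pooling,
# as in the down-sample/up-sample pipelines of the MNIST networks.
pool = nn.AvgPool2d(2)                                # 28x28 -> 14x14
up = nn.ConvTranspose2d(1, 8, kernel_size=2, stride=2)  # 14x14 -> 28x28
print(up(pool(x)).shape)       # torch.Size([1, 8, 28, 28])
```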