
VI.3 Evaluation

VI.3.1 Robustness and Sensitivity of Different Network Architectures

Max-pooling vs. average-pooling. Max-pooling is often preferred over average-pooling for training deep neural networks because of its nonlinear characteristics. We investigate whether max-pooling is actually better than average-pooling in terms of the accuracy and robustness of deep SSNs. Figures VI.1 and VI.2 illustrate the average robustness and sensitivities of the MNIST networks under different numbers of attacked pixels (Figure VI.1, 20 images) and different input sizes (Figure VI.2, 10 images). We focus on the first two networks, N1 and N2. These networks have the same architecture (21 layers); the only difference is that N1 uses average-pooling for down-sampling while N2 uses max-pooling (both networks use two transposed convolutional layers for up-sampling). After training, we found that N1 is more accurate than N2 (0.87 vs. 0.85 IoU, see Table VI.1). Interestingly, N1 is also more robust than N2: it has a larger average robustness value (Figure VI.1-(a)) and more robust pixels (Figure VI.1-(d)). One can also see that the average-pooling-based network is less sensitive to the attack than the max-pooling-based network (Figure VI.1-(b,e,f)). Notably, when more pixels are attacked or larger input sizes are used, the max-pooling-based network N2 produces more pixels with unknown robustness (Figures VI.1-(f), VI.2-(f), VI.3, and VI.4). Lastly, as the input size increases, the robustness of the max-pooling-based network drops more quickly than that of the average-pooling-based network (Figure VI.2-(a,d)) and its sensitivity grows faster (Figure VI.2-(b)). We believe the main reason the max-pooling-based network is more sensitive to the attack is its higher nonlinearity, caused by its max-pooling layers.

Figure VI.2: MNIST networks: RV, RS, VT vs. ∆ε (Nmax = 20).
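To make the nonlinearity argument concrete, the following minimal Python sketch (not the thesis' verification tool; the window values and attack sizes are illustrative) propagates an interval attack [x - ε, x + ε] through a single flattened 2x2 pooling window. Average-pooling is affine, so exact output bounds follow directly; for max-pooling, every pixel whose upper bound exceeds the best lower bound remains a candidate maximum, which forces an exact set-based analysis to branch, or an over-approximate one to report unknown pixels.

```python
import numpy as np

# Minimal sketch, assuming a box (interval) perturbation on one
# flattened 2x2 pooling window; the values below are illustrative.

def avg_pool_bounds(lb, ub):
    # Average-pooling is affine, so the exact output interval is just
    # the mean of the input bounds; no case splitting is ever needed.
    return lb.mean(), ub.mean()

def max_pool_candidates(lb, ub):
    # For max-pooling, any pixel whose upper bound exceeds the best
    # lower bound may still attain the maximum. An exact analysis
    # must branch once per candidate; an over-approximation instead
    # loses precision, yielding pixels with unknown robustness.
    return np.flatnonzero(ub > lb.max())

window = np.array([0.5200, 0.5185, 0.5210, 0.4000])
for eps in (0.0003, 0.003):
    lb, ub = window - eps, window + eps
    print(eps, avg_pool_bounds(lb, ub), max_pool_candidates(lb, ub))
# eps = 0.0003 -> one candidate: the max is decided, no branching.
# eps = 0.003 (the attack size of Figures VI.3-VI.4) -> three
# candidates, so the analysis branches, or over-approximates, at
# this single window alone.
```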

Figure VI.3: Example of Rf(N1), Nunknown = 6 (Nmax = 50, ∆ε = 0.003): the 2nd image's segmentation without attack, pixel-class reach set, and verified reach set.

Figure VI.4: Example of Rf(N2), Nunknown = 19 (Nmax = 50, ∆ε = 0.003): the 2nd image's segmentation without attack, pixel-class reach set, and verified reach set.

Accuracy vs. robustness; deeper networks and ReLU layers. Accuracy is one of the most important factors in evaluating deep neural networks. We investigate whether deeper, more accurate networks are also more robust than other architectures. To determine this, we analyze the robustness of two networks with different architectures and accuracies trained on the M2NIST data set. The first network, N4, is based on dilated convolution, with 16 layers and 0.62 IoU accuracy (Table VI.1). The second network, N5, is based on transposed convolution, with 22 layers and 0.75 IoU accuracy; it is thus deeper and more accurate than the first. We run the robustness analysis on these two networks on a set of 20 M2NIST images. The results are depicted in Figure VI.5. In terms of robustness, the deeper, more accurate network N5 is worse than the less accurate N4 (Figures VI.5-(a,d), VI.7, and VI.8) as the number of attacked pixels increases. Additionally, N5 is also more sensitive to the attack than N4 (Figure VI.5-(b,e)). The main reason for this result is that the more accurate network contains many more ReLU layers (8) than the less accurate one (3). As with max-pooling, using many ReLU layers increases the nonlinearity of the network, allowing it to capture complex image features; unfortunately, it also makes the network more sensitive to the attack.
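For reference, here is a hedged sketch of the mean intersection-over-union (IoU) accuracy quoted from Table VI.1; the exact per-class averaging scheme used in the thesis may differ, and the mask shapes, class count, and the mean_iou helper below are illustrative.

```python
import numpy as np

# A minimal sketch of mean IoU for semantic segmentation, assuming
# IoU is averaged over the classes present in either mask.

def mean_iou(pred, target, num_classes):
    """Mean per-class IoU over classes present in either mask."""
    ious = []
    for c in range(num_classes):
        p, t = pred == c, target == c
        union = np.logical_or(p, t).sum()
        if union == 0:          # class absent from both masks: skip
            continue
        ious.append(np.logical_and(p, t).sum() / union)
    return float(np.mean(ious))

# Toy example with 11 labels (ten digits plus background).
rng = np.random.default_rng(0)
target = rng.integers(0, 11, size=(64, 84))
pred = target.copy()
flip = rng.random(target.shape) < 0.10   # corrupt 10% of the pixels
pred[flip] = rng.integers(0, 11, size=flip.sum())
print(mean_iou(pred, target, num_classes=11))
```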

Figure VI.5: M2NIST networks: RV, RS, VT vs. NattackedPixels (∆ε = 10^-5).

Figure VI.6: Reach-times of M2NIST networks (Nmax = 25, ∆ε = 10^-5): per-layer reach-time for N4 and N5, and total reach-time per network.

Figure VI.7: Example of Rf(N4), Nunrobust = 43 (Nmax = 25, ∆ε = 10^-5): the 4th image's segmentation without attack, pixel-class reach set, and verified reach set.

Figure VI.8: Example of Rf(N5), Nunrobust = 51 (Nmax = 25, ∆ε = 10^-5): the 4th image's segmentation without attack, pixel-class reach set, and verified reach set.

Dilated convolution vs. transposed convolution. Dilated convolution and transposed convolution are typical choices for semantic segmentation tasks. We compare these techniques in terms of accuracy and robustness. On the MNIST networks, although the transposed-convolution networks N1 and N2 and the dilated-convolution network N3 have the same number of layers (21 layers with 3 ReLU layers), N3 is less accurate than N1 and N2 (0.83 vs. 0.87 and 0.85 IoU, see Table VI.1). In terms of robustness, N3 is less robust and more sensitive to the attack than N1 and N2 when the number of attacked pixels is smaller than 40 (Figure VI.1-(a,b,d,e)). When the input size increases, the dilated-convolution network is also less robust and more sensitive to the attack than the transposed-convolution networks (Figure VI.2-(a,b,d,e)). On the M2NIST networks, comparing the 21-layer (8 ReLU) transposed-convolution network N5 and the 24-layer (4 ReLU) dilated-convolution network N6, one can see that even with more layers, N6 is less accurate than N5 (0.72 vs. 0.75 IoU, see Table VI.1). In terms of robustness, N6 is also less robust and more sensitive to the attack than N5 (Figure VI.5-(a,b,d,e)).
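The architectural difference between the two styles can be sketched with two tiny PyTorch layers; the hyper-parameters below are assumptions for illustration, not those of N1-N6. A dilated convolution enlarges the receptive field while preserving spatial resolution, so no separate up-sampling stage is needed, whereas a transposed convolution restores the resolution lost to pooling.

```python
import torch
import torch.nn as nn

x = torch.randn(1, 1, 28, 28)  # one MNIST-sized input

# Dilated convolution: 3x3 taps spread over a 5x5 area (dilation=2),
# enlarging the receptive field while keeping the 28x28 resolution.
dilated = nn.Conv2d(1, 8, kernel_size=3, dilation=2, padding=2)
print(dilated(x).shape)        # torch.Size([1, 8, 28, 28])

# Transposed convolution: restores the resolution halved by pooling,
# as in the down-sample/up-sample pipelines of the MNIST networks.
pool = nn.AvgPool2d(2)                                # 28x28 -> 14x14
up = nn.ConvTranspose2d(1, 8, kernel_size=2, stride=2)  # 14x14 -> 28x28
print(up(pool(x)).shape)       # torch.Size([1, 8, 28, 28])
```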