We consider two robustness verification problems.
Problem VI.2.1. Given an SSN $f$, an image $x$, and an UBAA $\mathcal{A}$, prove for every pixel $x_{ij} \in x$ that $x_{ij}$ is robust or unrobust to the attack $\mathcal{A}$.
Problem VI.2.2. Given an SSN $f$, a set of $N$ test images $\{x_1, \dots, x_N\}$, and an UBAA $\mathcal{A}$, compute the average robustness value $RV$ and the average robustness sensitivity $RS$ of the network (corresponding to $\mathcal{A}$).
In this section, we investigate a reachability-based algorithm to prove the robustness of an SSN $f$ under an UBAA $\mathcal{A}$ at the pixel level, i.e., Problem VI.2.1. The core of our approach is the computation of the pixel-class reachable set $R_f = \mathrm{Reach}(f, I)$, which contains all possible classes of every pixel in the input set $I$ constructed by applying the attack $\mathcal{A}$ to an image $x$ (Proposition VI.1.1). The pixel-class reachable set is computed by propagating the ImageStar input set through the layers of the network.
We analyze the robustness of an SSN that is composed of the following layers: convolution, max-pooling, average-pooling, batch normalization, ReLU, transposed convolution, dilated convolution, softmax, and pixel-classification. Reachability of the convolution, max-pooling, average-pooling, batch normalization, and ReLU layers has been developed previously [80]. In this section, we develop reachability techniques for the remaining layers: the up-sampling layers (transposed convolution and dilated convolution) and the pixel-classification layer. We note that the softmax layer can be neglected in the analysis [80].
VI.2.1 Reachability of a Transposed (Dilated) Convolutional Layer
Lemma VI.2.3. The reachable set of a transposed (dilated) convolutional layer with an ImageStar input set $I = \langle c, V, P \rangle$ is another ImageStar $I' = \langle c', V', P \rangle$, where $c' = \mathrm{Trans(Dil)Conv}(c)$ is the result of applying the transposed (dilated) convolution operation to the anchor image, and $V' = \{v'_1, \dots, v'_m\}$ with $v'_i = \mathrm{Trans(Dil)ConvolZeroBias}(v_i)$ is the result of applying the transposed (dilated) convolution operation with zero bias to the generator images, i.e., using only the weights of the layer.
Proof. Similar to the reachability of a convolutional layer [80], the transposed (dilated) convolution operation applied to an ImageStar is the combination of 1) the transposed (dilated) convolution operation with the bias on the anchor image, and 2) the transposed (dilated) convolution operation with zero bias on the generator images.
This combination results in a new ImageStar output set with the same predicate $P$. Computing the output set is straightforward because every image in the input set is a linear combination of the anchor and generator images, and the transposed (dilated) convolution is itself linear up to the bias term.
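The lemma can be checked numerically on a toy case. The following is a minimal sketch, assuming a 1-D signal in place of an image and a single-channel transposed convolution; the helper names `trans_conv1d`, `reach_trans_conv`, and `concretize` are hypothetical and are not part of any verification tool. It only illustrates that mapping the anchor with the bias and the generators without the bias reproduces the layer output for a sampled image of the set; the dilated case is not shown.

```python
import random

def trans_conv1d(x, w, bias=0.0, stride=2):
    """1-D transposed convolution: each input sample scatters a scaled copy of w."""
    out = [bias] * ((len(x) - 1) * stride + len(w))
    for i, xi in enumerate(x):
        for k, wk in enumerate(w):
            out[i * stride + k] += xi * wk
    return out

def reach_trans_conv(c, V, w, bias, stride=2):
    """Map an ImageStar <c, V, P> through the layer; the predicate P is unchanged."""
    c_out = trans_conv1d(c, w, bias, stride)               # bias applied to the anchor only
    V_out = [trans_conv1d(v, w, 0.0, stride) for v in V]   # zero bias on the generators
    return c_out, V_out

def concretize(c, V, alpha):
    """Image c + sum_i alpha_i * v_i for one assignment alpha of the predicate variables."""
    return [ci + sum(a * v[n] for a, v in zip(alpha, V)) for n, ci in enumerate(c)]

# Toy ImageStar: anchor c, two generators, predicate variables assumed to lie in [-1, 1].
c = [1.0, 2.0, 3.0]
V = [[1.0, 0.0, 0.0], [0.0, 0.5, -0.5]]
w, bias = [0.5, 1.0, -1.0], 0.25

c_out, V_out = reach_trans_conv(c, V, w, bias)

# Sample one image from the input set and compare both computation paths.
random.seed(0)
alpha = [random.uniform(-1.0, 1.0) for _ in V]
direct = trans_conv1d(concretize(c, V, alpha), w, bias)   # layer applied to the image
via_set = concretize(c_out, V_out, alpha)                 # same alpha in the output set
max_gap = max(abs(a - b) for a, b in zip(direct, via_set))
```

The two paths agree up to floating-point rounding, which is the content of the lemma for this toy layer: no new predicate variables or constraints are needed.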
VI.2.2 Reachability of a Pixel-classification Layer
The last layer in an SSN $f$ is a pixel-classification layer, which assigns a specific class (label) to each pixel of an input image. Given an $h \times w \times n_c$ input image, the size of the input $x$ to the pixel-classification layer is $h \times w \times N$, where $N$ is the number of classes (labels) of the network (the softmax layer can be neglected in the analysis). To assign a specific class $l$, $1 \le l \le N$, to a pixel $p_{ij} \in x$, $1 \le i \le h$, $1 \le j \le w$, the value of the pixel $p_{ij}$ at channel $l$, i.e., $x(i,j,l)$, needs to be the maximum among the $N$ channels. When the input to the network is an ImageStar set instead of a single image, the input to the pixel-classification layer is an $h \times w \times N$ ImageStar set. Depending on the values of the predicate variables in the input set, a pixel $p_{ij}$ in the set may be assigned to more than one class. For example, if $l_1, \dots, l_m$ are the cross-channel max-point candidates of the pixel $p_{ij}$ in the $N$ channels, the pixel-class reachable set of the layer at the considered pixel is $pc_{ij} = \{l_1, \dots, l_m\}$. By determining all cross-channel max-point candidates of all pixels in the input set, we obtain the pixel-class reachable set of the layer, which is also the reachable set of the network, $R_f = [pc_{ij}]_{h \times w}$.
Similar to the max-pooling layer [80], determining all cross-channel max-point candidates of all pixels in the input set can be done by solving linear programming (LP) optimization problems, which is time-consuming due to the number of LPs required (or, equivalently, the size of the LP). To reduce computation time, we estimate the lower and upper bounds of the ImageStar input to the layer using only the ranges of the predicate variables. These bounds are then used to predict all possible cross-channel max-point candidates of all pixels.
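The bound-based estimate can be sketched for a single pixel as follows. This is an illustration under stated assumptions, not the implementation used in this work: the function names `estimate_bounds` and `max_point_candidates` are hypothetical, the predicate is assumed to be reduced to a box $lb_i \le \alpha_i \le ub_i$, and the candidate rule (a class is kept when its upper bound is at least the largest lower bound across channels) is one natural sound choice that the text does not spell out.

```python
def estimate_bounds(c, V, pred_lb, pred_ub):
    """Interval bounds of c + sum_i alpha_i * V[i], with pred_lb[i] <= alpha_i <= pred_ub[i].

    c and each V[i] hold the N channel values of one pixel.
    """
    lb, ub = list(c), list(c)
    for v, lo, hi in zip(V, pred_lb, pred_ub):
        for n, vn in enumerate(v):
            lb[n] += min(vn * lo, vn * hi)   # smallest contribution of this generator
            ub[n] += max(vn * lo, vn * hi)   # largest contribution of this generator
    return lb, ub

def max_point_candidates(lb, ub):
    """Classes (1-based) whose upper bound reaches the largest lower bound (assumed rule)."""
    best_lb = max(lb)
    return {l + 1 for l, u in enumerate(ub) if u >= best_lb}

# One pixel, N = 3 classes, one predicate variable alpha.
c = [2.0, 1.5, 0.0]
V = [[0.1, 0.8, 0.2]]

# Large perturbation range: classes 1 and 2 can both attain the maximum.
wide = max_point_candidates(*estimate_bounds(c, V, [-1.0], [1.0]))

# Small perturbation range: only class 1 remains a candidate.
narrow = max_point_candidates(*estimate_bounds(c, V, [-0.1], [0.1]))
```

Because the bounds ignore any coupling between channels through shared predicate variables, the candidate set returned this way may contain classes that no image in the set actually attains; it never misses one that is attainable.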
VI.2.3 Verification Algorithm
Our reachability-based verification algorithm for an SSN is presented in Algorithm 7. The algorithm takes a network $f$, an input image $x$, an UBAA $\mathcal{A}$, and a reachability method (exact or approximate) as inputs and returns the pixel-class reachable set $R_f$, the robustness value $RV$, and the sensitivity $RS$ of the network. The algorithm works as follows. First, it constructs the input set corresponding to the attack using Proposition VI.1.1 (line 2). Then, it computes the pixel-class reachable set of the network using layer-by-layer reachability analysis (line 3). Using the pixel-class reachable set, it verifies the robustness of each pixel by comparing its classes with the non-attacked output segmentation image, i.e., $y = f(x)$ (line 4). If $R_f(i,j) = y(i,j)$, the pixel $p_{i,j}$ is robust under the attack (line 10). If $R_f(i,j) \neq y(i,j) \wedge y(i,j) \not\subset R_f(i,j)$, the pixel $p_{i,j}$ is unrobust under the attack (line 12). Otherwise, the robustness of the pixel $p_{i,j}$ is unknown (it may be robust or unrobust) due to over-approximation (line 13). Beyond verifying the robustness of each pixel in the reachable set, the algorithm also counts the numbers of 1) robust pixels $N_{robust}$ (line 10), 2) unrobust pixels $N_{unrobust}$ (line 12), and 3) pixels with unknown robustness $N_{unknown}$ (line 13). Finally, it computes the robustness value and sensitivity of the network (lines 14 and 15).
Average Robustness Value and Sensitivity. The robustness of a network under an UBAA should be evaluated on a set of test images (Problem VI.2.2). Suppose we have a test set of $N$ images $X = \{x_1, \dots, x_N\}$ over which we want to estimate the average robustness value $RV$ and sensitivity $RS$ of the network. This can be done by computing the robustness value $RV_k$ and sensitivity $RS_k$ on each image $x_k$ using Algorithm 7 and then taking the average of these values.

Algorithm 7 Robustness verification of a semantic segmentation network.
Input: $f$, $x$, $\mathcal{A}$, method ▷ network, input image, attack, reachability method
Output: $R_f$, $RV$, $RS$ ▷ pixel-class reachable set, robustness value, robustness sensitivity
1: procedure [$R_f$, $RV$, $RS$] = VERIFY($f$, $x$, $\mathcal{A}$, method)
2:     $I$ = constructInputSet($x$, $\mathcal{A}$) ▷ construct an ImageStar input set
3:     $R_f$ = Reach($f$, $I$, method) ▷ compute the pixel-class reachable set
4:     $y = f(x)$ ▷ compute the non-attacked output segmentation image
5:     $H$ = $x$.Height, $W$ = $x$.Width
6:     $N_{robust} = 0$, $N_{unrobust} = 0$, $N_{unknown} = 0$, $N_{attacked\ pixels} = 0$
7:     for $i = 1 : H$ do
8:         for $j = 1 : W$ do
9:             if $\mathcal{A}.\mathit{xnoise}(i,j) \neq 0$ then $N_{attacked\ pixels} = N_{attacked\ pixels} + 1$
10:            if $R_f(i,j) = y(i,j)$ then $N_{robust} = N_{robust} + 1$ ▷ the pixel $x_{ij}$ is robust under the attack
11:            else
12:                if $y(i,j) \not\subset R_f(i,j)$ then $N_{unrobust} = N_{unrobust} + 1$ ▷ the pixel $x_{ij}$ is unrobust
13:                else $N_{unknown} = N_{unknown} + 1$ ▷ the robustness of the pixel $x_{ij}$ is unknown
14:    $RV = (N_{robust} / (H \times W)) \times 100\%$ ▷ robustness value
15:    $RS = (N_{unrobust} + N_{unknown}) / N_{attacked\ pixels}$ ▷ robustness sensitivity
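The per-pixel counting of Algorithm 7 and the averaging over a test set can be sketched as follows. This is a minimal illustration, assuming the pixel-class reachable set is already available as a grid of Python sets; the name `verify_pixels` and the toy data are hypothetical, the condition $R_f(i,j) = y(i,j)$ is read as "the reachable set is the singleton containing the non-attacked class", and the error raised when no pixel is attacked is an added guard that Algorithm 7 does not specify.

```python
def verify_pixels(reach, y, attacked):
    """Return (RV in percent, RS) for one image.

    reach[i][j]    : set of classes reachable at pixel (i, j)
    y[i][j]        : class of pixel (i, j) in the non-attacked segmentation
    attacked[i][j] : True if the attack perturbs pixel (i, j)
    """
    height, width = len(y), len(y[0])
    n_robust = n_unrobust = n_unknown = n_attacked = 0
    for i in range(height):
        for j in range(width):
            if attacked[i][j]:
                n_attacked += 1
            if reach[i][j] == {y[i][j]}:
                n_robust += 1        # only the original class is reachable
            elif y[i][j] not in reach[i][j]:
                n_unrobust += 1      # the original class is not reachable at all
            else:
                n_unknown += 1       # original class plus others: undecided
    if n_attacked == 0:
        raise ValueError("no attacked pixels: sensitivity is undefined")
    rv = 100.0 * n_robust / (height * width)
    rs = (n_unrobust + n_unknown) / n_attacked
    return rv, rs

# Toy 2x2 data for two test images sharing the same labels and attacked pixel.
y = [[1, 2], [1, 3]]
attacked = [[True, False], [False, False]]
reach_a = [[{1}, {2, 3}], [{2}, {3}]]   # one unknown pixel, one unrobust pixel
reach_b = [[{1}, {2}], [{1}, {3}]]      # every pixel robust

results = [verify_pixels(r, y, attacked) for r in (reach_a, reach_b)]
avg_rv = sum(rv for rv, _ in results) / len(results)
avg_rs = sum(rs for _, rs in results) / len(results)
```

In this toy case the sensitivity of the first image exceeds 1 because a single attacked pixel leaves two pixels not provably robust, which is consistent with the ratio in line 15 of Algorithm 7.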
VI.3 Evaluation