Advances in Image Segmentation:


Earlier methods were developed under the edge-based approach, with which we will begin our discussion. Then we will look at models of the region-based approach, which emerged to overcome the disadvantages of the edge-based approach, such as not working well with noisy images and images containing multiple objects. Since image segmentation is one of the applications where deep learning techniques can outperform conventional state-of-the-art models, some important deep learning models will be introduced in detail and compared with the previously discussed edge-based and region-based models.

Finally, we will show practical results of the most recently developed of the models we have explained. The edge of an object in an image is where a discontinuity of the image intensity function appears. Due to the non-convexity of the model, it is difficult to solve the minimization problem directly.

Edge-based method

Active Contour Model

The typical external energy is −|∇I(x, y)|², which is designed to attract the contour towards the boundary of the object in the image. If the initial contour is not located near the true boundary of the object, a wrong result will be obtained. Edge detection, however, is meant to find the boundary of an object in an image under the assumption that the location of the boundary is unknown.

Because a concave part of the boundary makes the length and curvature of the curve large, it is difficult to extract concave parts by minimizing the energy functional of SNAKE.
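For reference, a standard form of the SNAKE energy functional (written in the usual notation, with elasticity and rigidity weights α and β and the edge-based external energy mentioned above) is

$$
E_{\text{snake}}(v) = \int_0^1 \frac{\alpha}{2}\,|v'(s)|^2 + \frac{\beta}{2}\,|v''(s)|^2 + E_{\text{ext}}(v(s))\, ds,
\qquad
E_{\text{ext}}(x, y) = -|\nabla I(x, y)|^2,
$$

so that minimizing the functional shortens and straightens the curve v(s) while pulling it toward strong image gradients; a deep concavity increases the first two terms and is therefore hard for the contour to enter.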

Gradient Vector Flow (GVF) Snake Model

The first row is the result with the SNAKE model, and the second row is the result with the GVF snake model. The blue contour is the initial contour, the green contours show the evolution towards the result, and the red contour in the right-hand column is the final result. Improved algorithms for the GVF snake have therefore been developed, one of which is introduced in the next section.
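For reference, the GVF field V(x, y) = (u(x, y), v(x, y)) is commonly defined as the minimizer of the functional

$$
E_{\text{GVF}}(V) = \iint \mu\,(u_x^2 + u_y^2 + v_x^2 + v_y^2) + |\nabla f|^2\,|V - \nabla f|^2 \, dx\, dy,
$$

where f is an edge map of the image (e.g. f = |∇I|²) and μ is a regularization weight. Near edges, where |∇f| is large, V stays close to ∇f; in homogeneous regions the first term smoothly extends the field, which is what allows the contour to be pulled into concave parts.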

Improved GVF Snake Model

Region-based method

  • Chan-Vese Model
  • Applications
  • Terminologies
  • Networks

The region-based model takes advantage of the similarity within regions, as opposed to the edge information used by the previous models. However, it is difficult to compute the solution of this minimization problem due to the discontinuities over the domains Ω\C and C. The Chan-Vese model seeks the best approximation of the image by a function taking only two constant values, by finding a local minimizer of the corresponding functional.

In this case, C is the boundary of a closed set and the approximation u can take one of two values. By minimizing the functional F with respect to c1 and c2, the optimal values of c1 and c2 turn out to be the averages of the intensity values inside and outside C, respectively. For the minimization with respect to the level set function φ, we use a C¹(Ω) regularization of the Heaviside function and its derivative, H_ε and δ_ε.
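For reference, the Chan-Vese functional has the standard form

$$
F(c_1, c_2, C) = \mu\,\mathrm{Length}(C) + \nu\,\mathrm{Area}(\mathrm{inside}(C))
+ \lambda_1 \int_{\mathrm{inside}(C)} |I - c_1|^2 \, dx
+ \lambda_2 \int_{\mathrm{outside}(C)} |I - c_2|^2 \, dx,
$$

and, for a fixed contour represented by a level set function φ with Heaviside function H,

$$
c_1 = \frac{\int_\Omega I\, H(\phi)\, dx}{\int_\Omega H(\phi)\, dx},
\qquad
c_2 = \frac{\int_\Omega I\,(1 - H(\phi))\, dx}{\int_\Omega (1 - H(\phi))\, dx},
$$

i.e. the mean intensities inside and outside C.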

The original method solves the model by deriving the Euler-Lagrange equation of the given functional and discretizing it. One of the major disadvantages of the multi-phase CV model is that the number of phases to be segmented must be fixed before solving the minimization problem. Texture image segmentation uses the texture tensor, which is associated with the derivatives of the image [1].
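In the level set formulation this leads to the standard gradient-descent equation

$$
\frac{\partial \phi}{\partial t} = \delta_\epsilon(\phi)\left[\,\mu\,\mathrm{div}\!\left(\frac{\nabla \phi}{|\nabla \phi|}\right) - \nu - \lambda_1 (I - c_1)^2 + \lambda_2 (I - c_2)^2 \right],
$$

which is discretized and iterated together with the updates of c1 and c2.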

For a given grayscale image I, the linear structure tensor (LST) of the image is J_σ = K_σ ∗ (∇I ∇I^T), where K_σ is the Gaussian kernel with standard deviation σ, ∗ denotes convolution, and ∇I = [I_x, I_y]^T. Note that convolution with the Gaussian kernel makes the matrix more robust to noise.

In artificial neural networks, the connection state of neurons in the brain is expressed by the connection weights of the nodes.

Receptive Field

A receptive field indicates from which region of the input image an output feature is affected.

Let's create a 36×36 matrix by adding two zeros to the border of the original input image.
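As a minimal sketch of this padding step (assuming a 32×32 input, which a two-pixel zero border on every side turns into 36×36):

```python
import numpy as np

# Hypothetical 32x32 input image; the values are random placeholders.
image = np.random.rand(32, 32)

# Add a border of two zeros on every side: 32 + 2 + 2 = 36.
padded = np.pad(image, pad_width=2, mode='constant', constant_values=0)
print(padded.shape)  # (36, 36)
```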

Semantic Segmentation

Fully Convolutional Network

In feature-level classification, each pixel of the feature map extracted in the previous step is classified. The classified results are very coarse, as shown in the cat-breed heatmap in Figure 3-2. So this coarse heatmap must be expanded back to the original image size.

The next step is therefore to upsample via the backward convolution, also called deconvolution, to refine the coarseness of the previous result and expand it to the original image size. In other words, when the width of the original image is W and the height is H, a dense heatmap of size W×H×(number of classes + 1) is obtained. However, when a coarse result is simply enlarged, the details of the original image are lost and only limited performance can be expected.
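As a minimal sketch of this upsampling step (hypothetical shapes: the coarse map is 1/32 of a 224×224 input and has 21 channels, i.e. 20 classes plus background), a backward convolution in TensorFlow could look like this:

```python
import tensorflow as tf

num_classes_plus_bg = 21                                     # e.g. 20 classes + background
coarse = tf.random.normal([1, 7, 7, num_classes_plus_bg])    # 224 / 32 = 7

# Transposed ("backward") convolution with stride 32 expands the coarse
# heatmap back to the original spatial size, giving a dense W x H x 21 map.
upsample = tf.keras.layers.Conv2DTranspose(
    filters=num_classes_plus_bg, kernel_size=64, strides=32, padding='same')

dense = upsample(coarse)
print(dense.shape)  # (1, 224, 224, 21)
```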

The result in the first row of Figure 3-3, FCN-32s, is obtained by simply upsampling the coarse map by a factor of 32, so that its size matches the input image. As shown in Figure 3-4, the FCN-32s result loses many details. In the last step, the previous result must be converted into a single image segmentation result.

In this step, the upsampled heatmap obtained for each class in the previous step is assembled into a segmentation image. To obtain denser results, the indices from max pooling are reused in the upsampling process instead of copying encoder features as in FCN; this network is called SegNet [19].
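A minimal NumPy sketch of this idea (not the actual SegNet code): during 2×2 max pooling we remember where each maximum came from, and during upsampling we copy each pooled value back to exactly that position.

```python
import numpy as np

def maxpool_with_indices(x, k=2):
    """2x2 max pooling that also records the position of each maximum."""
    H, W = x.shape
    pooled = np.zeros((H // k, W // k), dtype=x.dtype)
    idx = np.zeros((H // k, W // k, 2), dtype=int)
    for i in range(0, H, k):
        for j in range(0, W, k):
            window = x[i:i + k, j:j + k]
            r, c = np.unravel_index(np.argmax(window), window.shape)
            pooled[i // k, j // k] = window[r, c]
            idx[i // k, j // k] = (i + r, j + c)
    return pooled, idx

def max_unpool(pooled, idx, out_shape):
    """Place each pooled value back at its recorded location; the rest stays zero."""
    out = np.zeros(out_shape, dtype=pooled.dtype)
    out[idx[..., 0], idx[..., 1]] = pooled
    return out

x = np.random.rand(4, 4)
pooled, idx = maxpool_with_indices(x)
restored = max_unpool(pooled, idx, x.shape)   # sparse 4x4 map with maxima in place
```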

DeepLab V1 and V2

The first image shows the kernel of a 1-dilated convolution, the second a 2-dilated convolution, and the last a 4-dilated convolution. As described above, dilated convolution can cover a large receptive field without pooling, so there is little loss in the spatial dimension, and since most of the kernel entries are zero, the computation remains efficient. Due to these characteristics, dilated convolution is mainly used when a wide field of view is needed and it is not feasible to stack multiple convolutions with large kernels, as in semantic segmentation.
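A minimal TensorFlow sketch of the three kernels above (illustrative shapes): the same 3×3 kernel covers a 3×3, 5×5, and 9×9 receptive field as the dilation rate grows, while the spatial size of the output is preserved.

```python
import tensorflow as tf

x = tf.random.normal([1, 64, 64, 3])          # hypothetical input

for rate in (1, 2, 4):                        # 1-, 2-, and 4-dilated convolution
    conv = tf.keras.layers.Conv2D(filters=16, kernel_size=3,
                                  dilation_rate=rate, padding='same')
    y = conv(x)
    # Effective kernel extent: rate * (3 - 1) + 1  ->  3, 5, 9.
    print(rate, y.shape)                      # spatial size stays 64x64
```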

The dilated convolution introduced above is used to preserve resolution, but it does not solve the multi-scale problem. The final problem that the DeepLab team addressed is the limited spatial accuracy of Deep Convolutional Neural Networks (DCNNs). The DeepLab team solves this problem with a fully connected CRF (Conditional Random Field) [20] as post-processing.

The pairwise potential measures the compatibility of the labels at each pair of pixels and plays an important role in making detailed predictions, because it takes into account both the similarity between pixel values and the similarity between positions. The second term is the smoothness kernel, which applies smoothing depending only on the spatial proximity of the pixels. Also, as can be seen in Figure 3-9, the accuracy increases with the number of iterations.
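For reference, the pairwise potential of the fully connected CRF [20] is usually built from two Gaussian kernels, an appearance kernel (pixel positions p and colors I) and a smoothness kernel (positions only):

$$
k(\mathbf{f}_i, \mathbf{f}_j) =
w_1 \exp\!\left(-\frac{\lVert p_i - p_j\rVert^2}{2\theta_\alpha^2}
                -\frac{\lVert I_i - I_j\rVert^2}{2\theta_\beta^2}\right)
+ w_2 \exp\!\left(-\frac{\lVert p_i - p_j\rVert^2}{2\theta_\gamma^2}\right),
$$

where the first term encourages nearby pixels with similar colors to take the same label and the second term removes small isolated regions.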

To summarize the overall semantic segmentation process, we first obtain a coarse map through the CNN and expand it to the original image size through bilinear interpolation. The result gives the probability of each label at each pixel location, which serves as the unary term of the CRF.

DeepLab V3

As another improvement, the following figure compares a simple deepening process without atrous convolution to a deepening process with atrous convolution. Summarizing image information into small feature maps, as shown in Figure 3-12(a), is difficult to use for semantic segmentation because it loses detailed information. So the DeepLab team designed a model that applies atrous convolution in cascaded ResNet blocks and achieved better results by combining atrous convolution with a specified output stride.

In Figure 3-12(b), the network obtains the desired result by varying the dilation rate when the desired output stride is 16. As can be seen in the figure, a feature map of better resolution is obtained by using the duplicated blocks with atrous convolution. Figure 3-13 shows the result of DeepLab V3 for semantic segmentation in TensorFlow on the PASCAL VOC dataset.
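A minimal TensorFlow sketch of the output-stride idea (illustrative layer counts and channel sizes, not the actual DeepLab architecture): replacing the last stride-2 stage with a stride-1, dilated stage keeps the feature map at output stride 16 instead of 32.

```python
import tensorflow as tf

x = tf.random.normal([1, 224, 224, 3])        # hypothetical input

# (a) Plain deepening: five stride-2 convolutions give output stride 32.
h = x
for _ in range(5):
    h = tf.keras.layers.Conv2D(32, 3, strides=2, padding='same')(h)
print(h.shape)                                 # (1, 7, 7, 32)

# (b) Atrous deepening: keep the last stage at stride 1 and dilate it instead,
#     so the output stride stays at 16 while the receptive field still grows.
h = x
for _ in range(4):
    h = tf.keras.layers.Conv2D(32, 3, strides=2, padding='same')(h)
h = tf.keras.layers.Conv2D(32, 3, strides=1, dilation_rate=2, padding='same')(h)
print(h.shape)                                 # (1, 14, 14, 32)
```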

We can see that more objects are separated from the background and that the objects are even classified separately. However, if several cars are lined up, as in the following image, a problem arises because the individual cars cannot be separated from one another. To overcome this limitation of semantic segmentation, another segmentation technique is presented in the next section.

Other segmentation

In particular, the SNAKE model opened up a variational approach to image segmentation, from which diverse edge-based models have evolved. This gave rise to the so-called region-based approach, which complemented the shortcomings of the edge-based models. In addition, one of the important problems of the variational approach is the determination of the number of components, i.e. segmented regions.

For the development of image segmentation in the field of machine learning, semantic segmentation emerged through the development of deep learning, which can not only capture objects in images but also distinguish the types of objects.

References

J. Long, E. Shelhamer, and T. Darrell, "Fully convolutional networks for semantic segmentation," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015.

K. He, X. Zhang, S. Ren, and J. Sun, "Deep residual learning for image recognition," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016.

E. S. Brown, T. F. Chan, and X. Bresson, "A fully convex formulation of the Chan-Vese image segmentation model," International Journal of Computer Vision, 2012.

E. Bae and X.-C. Tai, "Efficient global minimization for multi-phase Chan-Vese image segmentation model," in International Workshop on Energy Minimization Methods in Computer Vision and Pattern Recognition, 2009.

X. Cai, R. Chan, and T. Zeng, "A two-stage image segmentation method using a convex variant of the Mumford-Shah model and thresholding," SIAM Journal on Imaging Sciences, 2013.

Y. Kee and J. Kim, "A convex relaxation of the Ambrosio-Tortorelli elliptic functionals for the Mumford-Shah functional," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2014.

L. Vese, "Mumford and Shah model and its applications to image segmentation and image restoration," in Handbook of Mathematical Methods in Imaging, Springer.
