- 5.1 Comparison (with MSE and CC) of local features (overlapping and non-overlapping regions) for different window sizes.
- 5.2 Comparison (by SSI and QI) of local features (overlapping and non-overlapping regions) for different window sizes.
Image Formation Models
Image formation in a single camera-based setup
Each transformation is a transition from one coordinate system to another, stated as follows: (i) the transformation from the point xw to xc is performed by translating the point by the vector t and then rotating it by the matrix R. In simple terms, the three-dimensional point is then projected onto the image plane by multiplying it by the camera matrix.
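To make this mapping concrete, the following minimal sketch follows the translate-then-rotate convention used in the text and then applies a camera (intrinsic) matrix K; the function name, the intrinsic values, and the example point are illustrative and not taken from the thesis.

```python
import numpy as np

def project_point(x_w, R, t, K):
    """Project a 3-D world point onto the image plane (pinhole model).

    x_w : (3,) world point, R : (3,3) rotation, t : (3,) translation,
    K   : (3,3) camera intrinsic matrix.
    """
    # World -> camera coordinates: translate by t, then rotate by R
    x_c = R @ (x_w + t)
    # Camera -> image plane: multiply by the camera matrix
    x_img_h = K @ x_c                    # homogeneous image coordinates
    return x_img_h[:2] / x_img_h[2]      # perspective division

# Example: a point 5 m in front of an axis-aligned camera
K = np.array([[800.0, 0, 320], [0, 800.0, 240], [0, 0, 1]])
u, v = project_point(np.array([0.1, 0.2, 5.0]), np.eye(3), np.zeros(3), K)
```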
Image formation in a stereo vision setup
- Epipolar geometry
- Rectification
- Triangulation
- Relation between depth information and disparity value
- Single camera-based depth estimation
The mapping of a point in one image plane to the corresponding epipolar line in the other image can be performed using epipolar geometry (the epipolar constraint). For a three-dimensional world point, its projection in one image constrains the corresponding point in the other image to lie on the associated epipolar line.
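As a small illustration of the relation between depth information and disparity value listed above, the sketch below assumes a rectified stereo pair with focal length f (in pixels) and baseline B (in metres); the numeric values are only an example.

```python
def depth_from_disparity(d, focal_length_px, baseline_m):
    """Depth (in metres) from disparity (in pixels) for a rectified stereo pair.

    Uses the standard relation Z = f * B / d, where f is the focal length in
    pixels, B the baseline between the two camera centres, and d the disparity.
    """
    if d <= 0:
        raise ValueError("disparity must be positive for a finite depth")
    return focal_length_px * baseline_m / d

# Example: f = 700 px, baseline = 0.12 m, disparity = 35 px  ->  Z = 2.4 m
z = depth_from_disparity(35, 700, 0.12)
```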
Applications of Stereo Vision
These two objects have different disparity values despite having the same color, as can be seen in Figure 1.7(c). The size of an object, the distance between two objects, and the distance of an object from the camera can be determined in a stereo vision setup.
Basics of Stereo Correspondence
Issues of Accurate Disparity Map Estimation
- Occlusions
- Photometric variations
- Image sensor noise
- Specularities and reflections
- Foreshortening effect
- Perspective distortions
- Textureless regions
- Repetitive structures
- Discontinuity
Hence, the disparity map obtained from stereo image pairs may not provide accurate information about specular surfaces. In general, stereo correspondence methods assume that the areas of objects are the same in both stereo images.
Organization of the Thesis
The matching costs of the local regions are then combined using a two-step filtering process to estimate the disparity map. Disparity map estimation is one of the techniques used to extract three-dimensional information of a scene.
The Basic Principle of Finding a Disparity Map
Global Algorithms
Data term
It is meaningless to compute the matching cost for occluded pixels, since these pixels are visible in one image but not in the other. According to this procedure, if two pixels of the left image correspond to the same pixel of the right image, then the pixel with the smaller disparity value is considered the occluded pixel.
Smoothness term
In equation (2.12), a higher weight is assigned if both pixels have the same color, and a lower weight otherwise. This problem can be overcome by considering more adjacent pixels, which slightly modifies the smoothness term given in equation (2.11).
Optimization
- Dynamic programming
- Graph cut
- Belief propagation
To overcome this, a heuristic method that incorporates vertical smoothness into the optimization technique has been proposed [45]. A message from pixel p to pixel q encodes the belief of p about q, i.e., the probability of q taking a particular disparity value.
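To make the message-passing idea concrete, the sketch below shows a single min-sum belief-propagation message update for a stereo MRF with L disparity labels. It is a generic textbook formulation rather than the exact scheme of [45]; the data term, smoothness term, and variable names are illustrative.

```python
import numpy as np

def bp_message(data_cost_p, incoming_msgs, smooth_cost):
    """One min-sum belief-propagation message from pixel p to neighbour q.

    data_cost_p   : (L,) data term D_p(d) for each of L disparity labels
    incoming_msgs : list of (L,) messages from p's other neighbours (excluding q)
    smooth_cost   : (L, L) pairwise term V(d_p, d_q)
    Returns the (L,) message m_{p->q}(d_q).
    """
    # Local evidence at p: data term plus messages from all neighbours except q
    h = data_cost_p + np.sum(incoming_msgs, axis=0)
    # For every candidate label d_q at q, minimise over the labels d_p at p
    msg = np.min(h[:, None] + smooth_cost, axis=0)
    # Normalise (messages are defined up to an additive constant)
    return msg - msg.min()

# Example with 16 disparity labels and a truncated linear smoothness term
L = 16
labels = np.arange(L)
V = np.minimum(np.abs(labels[:, None] - labels[None, :]), 3).astype(float)
m = bp_message(np.random.rand(L), [np.zeros(L)] * 3, V)
```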
Local Algorithms
Problem with different window sizes
- Choosing an appropriate window
- Adaptive window size and multi-resolution approaches
- Cost aggregation
On the other hand, a fixed window is applied to the pixels of the non-boundary areas. The bilateral filter depends on the histogram of the difference image, which is independent of the window size.
Problem of finding a disparity map for varying illumination
- Gabor phase-based stereo correspondence
- Adaptive normalized cross correlation
- Mutual information-based matching
Ambiguities in phase-based disparity map estimation arise due to the presence of singularities in the phase information. In [98], the phase information is obtained by convolving the input images with Gabor filters at different scales.
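As a simple, generic illustration of an illumination-robust matching cost, in the spirit of the correlation-based approaches listed above (though not the adaptive NCC of the cited work), the following sketch computes a zero-mean normalised cross-correlation between two patches; the epsilon guard is an arbitrary choice.

```python
import numpy as np

def zncc(patch_l, patch_r, eps=1e-8):
    """Zero-mean normalised cross-correlation between two image patches.

    Subtracting the mean and dividing by the standard deviations makes the
    score invariant to gain and offset changes between the two views, which
    is why correlation-type costs are common under varying illumination.
    Returns a value in [-1, 1]; 1 means a perfect match.
    """
    a = patch_l.astype(float) - patch_l.mean()
    b = patch_r.astype(float) - patch_r.mean()
    return float((a * b).sum() / (np.sqrt((a * a).sum() * (b * b).sum()) + eps))

# A matching cost can then be defined as, e.g., cost = 1 - zncc(p_l, p_r).
```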
Occlusion Detection and Filling
Occlusion detection
There is a sudden change in the disparity value of the occluded surface with respect to the background. In this method, if the disparity values of a pair of corresponding pixels are different, the pixel in the reference image is considered to be occluded. The x-coordinate pest1 of the estimated left pixel pest corresponding to the x-coordinate p′1 of pixel p′ in the right image is then computed.
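A minimal sketch of the left-right consistency (cross-checking) test just described is given below, assuming rectified images, the convention that a left pixel at column x with disparity d maps to column x - d in the right image, and an illustrative threshold of one pixel.

```python
import numpy as np

def lrc_occlusion_mask(disp_left, disp_right, threshold=1):
    """Left-right consistency check on a pair of disparity maps.

    A pixel (y, x) of the left map with disparity d is matched to pixel
    (y, x - d) of the right map; if the two disparity values differ by more
    than `threshold`, the left pixel is flagged as occluded.
    """
    h, w = disp_left.shape
    occluded = np.zeros((h, w), dtype=bool)
    for y in range(h):
        for x in range(w):
            d = int(round(disp_left[y, x]))
            xr = x - d
            if xr < 0 or xr >= w:
                occluded[y, x] = True          # no valid correspondence
            elif abs(disp_left[y, x] - disp_right[y, xr]) > threshold:
                occluded[y, x] = True          # inconsistent disparities
    return occluded
```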
Occlusion filling
It is worth noting that LRC has been the most widely used occlusion detection algorithm over the last few decades. Bimodality and goodness-of-fit jumps can detect border-occluded regions, while the LRC, ORD, and OCC algorithms can detect all semi-occluded regions [110]. Although the above-mentioned methods are able to detect occluded pixels, they also label many correctly matched pixels as occluded.
Summary
The degradation of disparity map accuracy is mainly due to the choice of a poor energy function. The performance of local methods becomes comparable to that of global methods after a cost aggregation step is incorporated into the stereo correspondence method. Ambiguity in the disparity map occurs mainly because the features used to find the matching pixels fail to distinguish between the matching pixel and its neighboring pixels.
Motivation of the Thesis
Objective of the Thesis
Stereo matching constraints and assumptions
This constraint states that the matching point of a pixel in the left image lies on the corresponding epipolar line in the right image. This constraint states that there is at most one matching pixel in the right image corresponding to every pixel in the left image. A segment Sl in the left image with spatial orientation θl corresponds to a segment Sr in the right image with spatial orientation θr if the following condition holds.
General steps of disparity map computation
- Matching cost computation
- Cost aggregation
- Disparity computation/optimization
- Disparity map refinement
This computation gives a cost value for each pixel in the reference image at every candidate disparity. Cost aggregation for a pixel p is performed by combining the cost values of all the pixels in its support region. The disparity map is then obtained by determining the disparity dp of every pixel p in the reference image.
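The steps above can be illustrated with a deliberately simple local pipeline: absolute-difference matching costs, box-filter cost aggregation, and winner-takes-all disparity selection (the refinement step is omitted). This is a generic sketch with simplified stand-ins, not the Gabor-feature costs or Kuwahara/median aggregation discussed later; max_disp and win are illustrative parameters.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def block_matching_disparity(left, right, max_disp=64, win=9):
    """Minimal local stereo matching: AD cost, box aggregation, WTA.

    left, right : rectified greyscale images as 2-D float arrays.
    Returns the left disparity map.
    """
    h, w = left.shape
    cost_volume = np.full((max_disp, h, w), np.inf)
    for d in range(max_disp):
        # 1) Matching cost computation: absolute difference at disparity d
        shifted = np.empty_like(right)
        shifted[:, d:] = right[:, : w - d]
        shifted[:, :d] = left[:, :d]          # no valid match in first d columns
        cost = np.abs(left - shifted)
        # 2) Cost aggregation: average the cost over a win x win support window
        cost_volume[d] = uniform_filter(cost, size=win)
    # 3) Disparity computation: winner-takes-all over the aggregated costs
    return np.argmin(cost_volume, axis=0)
```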
A brief overview of existing stereo correspondence algorithms
In addition, the size of the eigenvector depends on the number of histogram bins. The computational complexity of this method depends on the size of the image and the number of labels used. The authors of [120] proposed a cost aggregation method based on a linear model. Therefore, in the proposed method, PCA is used to reduce the dimensionality of the Gabor wavelet coefficients.
Proposed Local Stereo Matching Method
- Matching cost computation
- Cost aggregation
- Disparity computation
- Disparity map refinement
This improved performance of the proposed feature is due to the fact that the directional features are extracted at different orientations and scales. To show the performance of the proposed cost aggregation method, a guided filter is used instead of the Kuwahara and median filter combination for cost aggregation [11], and the comparative result is shown in the table. This is achieved by taking the index of the minimum value of the aggregated costs.
Datasets used for Evaluation
In the occlusion filling step, min(dl, dr) is assigned to an occluded pixel, where dl and dr are the disparity values of the neighboring non-occluded left and right pixels. The intermediate results include the disparity map computed without cost aggregation, the disparity map obtained after cost aggregation by the Kuwahara filter only, the disparity map obtained after cost aggregation using the combination of Kuwahara and median filters (before refinement), and the final disparity map obtained after refinement.
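The min(dl, dr) filling rule described above can be sketched as follows, assuming a boolean occlusion mask and a row-wise search for the nearest non-occluded neighbours; this is an illustrative implementation, not the thesis code.

```python
import numpy as np

def fill_occlusions(disparity, occluded):
    """Fill occluded pixels with min(dl, dr) of the nearest valid neighbours.

    disparity : 2-D disparity map, occluded : boolean mask of occluded pixels.
    For each occluded pixel, dl and dr are the disparities of the nearest
    non-occluded pixels to its left and right on the same row; the smaller
    of the two (usually the background disparity) is assigned.
    """
    filled = disparity.astype(float).copy()
    h, w = disparity.shape
    for y in range(h):
        for x in np.flatnonzero(occluded[y]):
            # search left and right for the nearest non-occluded pixels
            left = next((filled[y, i] for i in range(x - 1, -1, -1)
                         if not occluded[y, i]), None)
            right = next((filled[y, i] for i in range(x + 1, w)
                          if not occluded[y, i]), None)
            candidates = [v for v in (left, right) if v is not None]
            if candidates:
                filled[y, x] = min(candidates)
    return filled
```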
Evaluation Methodology
In this figure, the white color shows the whole image, while the black color is assigned to unknown regions. In this figure, the white color indicates the discontinuous regions, the black color refers to the occluded regions, and the gray color indicates the remaining regions of the image. The percentage of bad pixels is calculated in the three critical image regions above for four standard Middlebury stereo images.
Experimental Results
Variation of the Kuwahara filter window size: the error rate for the Tsukuba, Venus, Teddy, and Cones images is shown in Figure 3.23 for different Kuwahara filter window sizes. Variation of the median filter window size: Figure 3.24 shows the error rate for different median filter window sizes. Variation of the number of principal components: Figure 3.25 shows the error rate for different numbers of principal components used for local stereo correspondence for all the stereo images.
Summary
In most of the methods, occluded pixels are detected only after the estimation of an initial disparity map. In the bimodality method, the neighborhood of an occluded pixel contains disparity values from both non-occluded and occluded areas. This change in disparity values corresponds to the occluded pixels in the second image and vice versa.
Background
Occluded regions are visible in only one image of the stereo pair and invisible in the other. This invisibility is caused by the geometry of the scene and by self-occlusion and/or mutual occlusion of the objects in the scene. In this setup, CL and CR are the camera centers, and BL1, BL2, BR1, and BR2 are the background objects present in the scene.
Proposed Method for Occlusion Detection and Filling
- Matching cost computation
- Cost aggregation
- Disparity map computation
- Proposed linear regression-based asymmetric occlusion detection (LAOD) method 90
- Disparity refinement
A Gabor wavelet-based feature is used to find the corresponding matching pixel in the target image. Finally, the disparity value of the selected pixel is assigned to the pixel targeted for filling. In Equation (4.13), ∆cpq is the color difference of pixel q from pixel p, p is the pixel under consideration, q is a non-occluded pixel in the neighborhood Np, and γc is a constant.
Experimental Results
|Np| is the number of pixels in the window Np, and ε is a user-defined smoothness parameter. In the next section, we explain how our method is also suitable for detecting occluded pixels on horizontally inclined surfaces. In areas with horizontally inclined surfaces, many pixels in the reference image correspond to a single pixel in the target image.
Summary
In this chapter, two main characteristics of Gabor features are identified, namely: (i) the real coefficients of the Gabor wavelet are sufficient to represent the image; and (ii) local Gabor wavelet features with overlapping regions represent the image more accurately than global Gabor features and local features extracted from non-overlapping regions. Experimental results show that local Gabor wavelet features extracted from overlapping regions represent the image more effectively than global and non-overlapping region-based features. The performance of these local Gabor wavelet features is compared with that of global Gabor features.
Basics of Gabor Wavelet
It is again observed that local Gabor features for overlapping regions can represent an image more accurately than the other two counterparts. The robustness of all three Gabor features is analyzed for radiometric variations in a scene, and we found that the real coefficients of local Gabor features for overlapping regions are more robust than Gabor features extracted from the imaginary part or the magnitude information. This method also performs significantly better than the local Gabor features for non-overlapping regions and the global features.
Global Gabor Wavelet Feature (GGWF) Extraction
The input image, the image represented using only the real coefficients, the image represented using only the imaginary coefficients, and the image represented using the magnitude information are shown from left to right in this figure. In this figure, the input image, the image represented by the real coefficients, the image represented using the imaginary coefficients, and the image represented using the magnitude information for m = 2 and n = 2 in Equation (5.6) are shown from left to right. The input image, the image reconstructed using only the real coefficients, the image reconstructed using only the imaginary coefficients, and the image reconstructed using the magnitude information are shown from left to right in this figure.
Local Gabor Wavelet Feature (LGWF) Extraction
The images reconstructed using the real coefficients, the imaginary coefficients, and the magnitude information via Equation (5.9) are shown in the second row of Figure 5.2. The input image is divided into sub-regions of size u1 × v1 to compute LGWFs from non-overlapping regions. These equations, with small modifications, can also be used to represent and reconstruct the original image for non-overlapping regions.
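To illustrate how local Gabor wavelet features over overlapping regions might be computed, the sketch below builds the real part of a Gabor kernel, filters the image at a few orientations, and pools the responses over overlapping windows. The kernel parameterisation, window size, stride, and pooling are illustrative and not necessarily the exact LGWF construction of the thesis; setting step equal to win yields the non-overlapping variant.

```python
import numpy as np
from scipy.signal import convolve2d

def gabor_kernel_real(ksize=15, sigma=3.0, theta=0.0, wavelength=6.0):
    """Real part of a 2-D Gabor kernel at one orientation and scale."""
    half = ksize // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    xr = x * np.cos(theta) + y * np.sin(theta)
    yr = -x * np.sin(theta) + y * np.cos(theta)
    envelope = np.exp(-(xr ** 2 + yr ** 2) / (2 * sigma ** 2))
    return envelope * np.cos(2 * np.pi * xr / wavelength)

def local_gabor_features(image, win=16, step=8, n_orient=4):
    """Real Gabor coefficients pooled over overlapping win x win regions.

    With step < win the regions overlap; step = win gives the
    non-overlapping variant discussed in the text.
    """
    responses = [convolve2d(image, gabor_kernel_real(theta=np.pi * k / n_orient),
                            mode="same", boundary="symm")
                 for k in range(n_orient)]
    feats = []
    h, w = image.shape
    for y in range(0, h - win + 1, step):
        for x in range(0, w - win + 1, step):
            # one feature vector per (overlapping) region: mean response
            # of each orientation channel inside the window
            feats.append([r[y:y + win, x:x + win].mean() for r in responses])
    return np.asarray(feats)
```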
Experimental Results
- Different window sizes
- Different number of orientations
- Different number of scalings
- Synthetic illumination changes
- Real radiometric changes
- Performance evaluation of Gabor features for stereo correspondence
The performance of features extracted from overlapping regions decreases as the window size increases, while the performance of features extracted from non-overlapping regions increases with the window size. In the case of the local feature, a small number of orientations is sufficient to represent the pixel variations in the image patches. To evaluate the performance of these features under different lighting conditions, the intensity values of the pixels of the real image are varied.
Summary
A state-of-the-art review of the existing literature on disparity map estimation methods was presented. Therefore, finding the matching pixels for all pixels in the occluded areas is a difficult task. To calculate the real coefficients, only the real part of the Gabor filter needs to be stored in memory.
Possible Extensions
It has also been mentioned in the literature that the optimal performance of a two-dimensional Gabor filter can be achieved by using the real part of the filter. A plane wave with frequency (ξ0, ν0) propagates along the short axis of the elliptical Gaussian envelope. Each of these two families of Gabor wavelets can be created by rotating and dilating (via the affine group) the mother Gabor wavelet as follows:
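A commonly used parameterisation of such a rotated-and-dilated (self-similar) family, assuming a dilation factor a > 1, scale index m, and N equally spaced orientations, is sketched below; the exact form used in the thesis may differ.

```latex
% Self-similar Gabor wavelet family obtained by rotating and dilating a
% mother wavelet \psi (illustrative parameterisation):
\psi_{m,n}(x, y) = a^{-m}\, \psi\!\left(a^{-m} x'_n,\; a^{-m} y'_n\right), \qquad
\begin{aligned}
x'_n &= x\cos\theta_n + y\sin\theta_n,\\
y'_n &= -x\sin\theta_n + y\cos\theta_n,
\end{aligned}
\qquad \theta_n = \frac{n\pi}{N},\quad n = 0,\dots,N-1 .
```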
Linear Regression
- The geometry of a linear perspective camera system [1]
- Calculation of the x-coordinate of the projected point in the image plane by using
- Image formation in a stereo vision setup (Epipolar geometry)
- Stereo images rectification
- Stereo images before and after rectification. (a) Reference image before rectification;
- Elementary stereo geometry in the rectified configuration [1]
- Accurate image segmentation using three-dimensional information. (a) Left image; (b)
- Presence of occlusion is highlighted with red and yellow colour boxes in the Teddy stereo
- Photometric variations in a stereo image pair. (a) Left image; (b) Right image [5]
- Stereo images affected by noises. (a) Left image; (b) Right image [6]
- Specular surfaces in (a) Left image; (b) Right image [5]
- Specular reflections in (a) Left image; (b) Right image [5]
- Foreshortening areas for two different viewpoints [5]
- Stereo images having perspective distortions. (a) Left image; (b) Right image; (c)
- Presence of textureless regions in a stereo image pair [5]
- Presence of repetitive structures in a stereo image pair [5]
- Discontinuous regions in stereo images. (a) Left image; (b) Right image; (c) Discontin-
- General block diagram for computing a disparity map
- Pictorial illustration of census transform
- An example where uniqueness constraint fails
- Illustration of ordering constraint in two scenarios
- Cyclopean distance [8]
- General steps of stereo correspondence methods
- Matching cost computation
- Cost aggregation
- Disparity computation
- Block diagram of the proposed disparity map estimation method
- Gabor wavelet kernel (real part). (a)-(d) for scale 2; (e)-(h) for scale 5; (a) and (e) for
- Local Gabor wavelet feature extraction
- Role of real and imaginary coefficients of Gabor wavelet on disparity map. (a) Left
- Subregions of Kuwahara filtering
- Behaviour of Kuwahara filter at boundary regions
- Disparity space image filtering. (a) Disparity space image (d = 1); (b) Filtering by
- Intermediate results. (a) Disparity map computed without cost aggregation; (b) Dis-
- Middlebury stereo standard dataset. Left to right - Tsukuba, Venus, Teddy, and Cones
- Middlebury stereo dataset (2005). Left to right - Cloth1, Books, Dolls, Laundry,
- Middlebury stereo dataset showing non-occluded, all, and discontinuous regions
- Experimental results on 2005 Middlebury datasets - Cloth1, Books, Dolls, Laundry,
- Variations of local stereo window size. (a) Tsukuba, (b) Venus, (c) Teddy, and (d) Cones
- Variations of Median filter window size. (a) Tsukuba, (b) Venus, (c) Teddy, and (d)
- Variations of number of principal components. (a) Tsukuba, (b) Venus, (c) Teddy, and
- Variations of number of Gabor wavelet filter orientations. (a) Tsukuba, (b) Venus, (c)
- Variations of number of Gabor wavelet filter scaling. (a) Tsukuba, (b) Venus, (c) Teddy,
- Average percentage of bad pixels. (a) Variation of local stereo window size, (b) Variation
- General stereo vision set-up [14]
- Left disparity map showing the ground truth occluded pixels. Border occlusion (blue
- Different types of occlusion [14]
- Stereo vision setup for different types of occlusions [14]
- Block diagram of the proposed occlusion detection method
- Example showing the case where a pixel does not satisfy the continuity, ordering, and uniqueness
- Detected occluded pixels (shown by black colour) by proposed LAOD and LRC methods
- Block diagram of the proposed occlusion filling method
Zhang et al., "Shape from shading: A survey," IEEE Transactions on Pattern Analysis and Machine Intelligence.
Lempitsky et al., "Fusion moves for Markov random field optimization," IEEE Transactions on Pattern Analysis and Machine Intelligence.
Yang et al., "Stereo matching with color-weighted correlation, hierarchical belief propagation, and occlusion handling," IEEE Transactions on Pattern Analysis and Machine Intelligence.