This thesis entitled "NEW SIFT-BASED CALIBRATION METHODS FOR HYBRID CAMERA SYSTEM" was prepared by LOW YI QIAN and submitted in partial fulfillment of the requirements for the degree of Master of Engineering Sciences at Universiti Tunku Abdul Rahman.

Instead of using two static cameras, our hybrid camera system consists of a static wide-angle camera and a PTZ camera.
Background and Introduction
By comparing image information from multiple vantage points, we can implement 3-D reconstruction, scene analysis, and other depth-related applications (Dingrui Wan et al.). However, additional cameras also increase the cost, processing power, and architectural complexity of the application.
Motivation
For example, in a 15' x 15' room, as shown in Figure 1.3 below, a static wide-angle camera using a 4mm lens (green arrows) provides better wide-angle viewing coverage than a PTZ camera. As shown in Figure 1.4, hybrid camera geometry imposes constraints on finding the positions of the PTZ camera views.
Objective
Calibration Process for Hybrid Cameras: Each camera has a different type of optics and sensor resolution; although they can capture similar information, each obtains images with different distortions.

Automated Calibration Process: Instead of calibrating the hybrid camera manually, we developed a test-bed system with an algorithm and mechanism that performs calibration automatically under different environments and conditions.
Contribution
The challenge is how to improve the accuracy of the stitching process in the automated system. Finally, we implemented and evaluated the SIFT-based calibration method for the hybrid camera system on our test bed.
Outline of Thesis
Introduction
Computer vision is a field that involves the processing, analysis, and understanding of images, which are the extracted features of the 3-dimensional (3-D) world, to achieve results and effects similar to human vision. Stereovision refers to the fact that the two different perspectives of the human eyes lead to a slight relative displacement of objects in the two monocular views of a scene.
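The relative displacement (disparity) between the two views is what makes depth recovery possible. As a minimal sketch, assuming rectified views with a known focal length and baseline (the values below are illustrative, not from the thesis):

```python
# Depth from stereo disparity: a point seen at horizontal pixel
# positions x_left and x_right in two rectified views has
# disparity d = x_left - x_right, and depth Z = f * B / d.
def depth_from_disparity(x_left, x_right, focal_px, baseline_m):
    d = x_left - x_right  # disparity in pixels
    if d <= 0:
        raise ValueError("disparity must be positive for a point in front of the rig")
    return focal_px * baseline_m / d

# Example: f = 800 px, baseline = 0.1 m, disparity = 20 px gives Z = 4 m
depth = depth_from_disparity(120, 100, focal_px=800, baseline_m=0.1)
```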
Image Information with Geometry Perspective
The intersection of the light rays with the image plane forms the image of the object; this is called perspective projection. If the image plane is placed in front of the center of projection, no image inversion occurs.
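The projection described above can be sketched in a few lines. This is the standard pinhole model with focal length f, not code from the thesis:

```python
# Pinhole perspective projection: a 3-D point (X, Y, Z) in the camera
# frame projects onto an image plane at distance f from the center of
# projection as (f*X/Z, f*Y/Z). Placing the plane in front of the
# center of projection avoids image inversion.
def project(point3d, f):
    X, Y, Z = point3d
    if Z <= 0:
        raise ValueError("point must lie in front of the camera")
    return (f * X / Z, f * Y / Z)
```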
Stereo Camera Network Calibration
The approach requires training the neural network for a set of matched image points, the correspondence of which is known. Instead, the system is trained so that it learns to directly find the objects' correspondences.
Scale Invariant Features Transform (SIFT)
- Scale-space extrema detection
- Keypoint localization
- Orientation assignment
- Keypoint descriptor
 
It can rapidly determine the best match for a keypoint descriptor by searching for close neighbours in a large feature database. Principal curvatures are calculated from a Hessian matrix, following Harris and Stephens (1988), evaluated at the location and scale of each keypoint. Gradient samples are weighted by a Gaussian window whose width is set by the keypoint scale, which selects the level of Gaussian blur for the image (shown as the overlaid circle).
To achieve orientation invariance, the coordinates of the descriptor and gradient orientations are rotated relative to the keypoint orientation.
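The principal-curvature check mentioned above can be sketched as follows. This follows the standard SIFT edge-response test (the threshold r = 10 is Lowe's published value, not taken from this thesis):

```python
# Edge-response test used in SIFT keypoint localization: the 2x2
# Hessian of the difference-of-Gaussian image has principal curvatures
# proportional to its eigenvalues. Keypoints lying along edges have one
# large and one small curvature, and are rejected when the ratio
# Tr(H)^2 / Det(H) exceeds (r + 1)^2 / r.
def passes_edge_test(dxx, dyy, dxy, r=10.0):
    tr = dxx + dyy
    det = dxx * dyy - dxy * dxy
    if det <= 0:  # curvatures of opposite sign: discard
        return False
    return tr * tr / det < (r + 1.0) ** 2 / r
```

A well-localized corner-like keypoint (similar curvatures) passes, while an edge-like response (one dominant curvature) is rejected.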
Hough Transformation
The Hough transformation is used to identify all clusters with at least three entries in a bin. When a large number of votes fall into the correct bin, the Hough transformation is effective, because that bin can be easily detected among the background noise. In short, the quality of the input data has a major impact on the efficiency of the Hough transformation.
For the Hough transform to be efficient, the edges of the images must be properly detected.
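The cluster-of-three rule above can be sketched as a simple voting accumulator. This is a simplified illustration that bins matches only by orientation and scale; the bin widths are assumed values, not parameters from the thesis:

```python
from collections import defaultdict

# Hough-style clustering: each feature match votes for a quantized
# pose bin, and only bins collecting at least three votes are kept as
# cluster hypotheses, which suppresses isolated background noise.
def hough_clusters(matches, angle_bin=30.0, scale_bin=2.0, min_votes=3):
    bins = defaultdict(list)
    for m in matches:  # m = (angle_deg, scale, x, y)
        angle, scale, x, y = m
        key = (int(angle // angle_bin), int(scale // scale_bin))
        bins[key].append(m)
    return [votes for votes in bins.values() if len(votes) >= min_votes]

# Three consistent matches form a cluster; the lone outlier does not.
clusters = hough_clusters([(10, 1.0, 0, 0), (12, 1.2, 1, 1),
                           (15, 0.9, 2, 2), (200, 5.0, 3, 3)])
```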
Robust Homography Estimation using RANSAC
An advantage of RANSAC is its robust model parameter estimation: it can estimate the parameters with high accuracy even when the data set contains outliers. On the other hand, a disadvantage of RANSAC is that there is no upper bound on the time it takes to compute these parameters. Another disadvantage is that it requires problem-specific thresholds, a common drawback in most current image processing solutions.
However, the Hough transform is an alternative robust estimation technique that is useful when more than one model instance is present in the data set.
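The hypothesize-and-verify loop at the heart of RANSAC can be sketched with 2-D line fitting; for homography estimation the structure is identical but the minimal sample is four point correspondences and the model is a 3x3 matrix. Iteration count and inlier threshold below are illustrative assumptions:

```python
import random

# Minimal RANSAC loop: repeatedly fit a model to a random minimal
# sample, count points within a distance threshold (inliers), and keep
# the model with the largest consensus set.
def ransac_line(points, n_iters=200, thresh=0.5, seed=0):
    rng = random.Random(seed)
    best_inliers = []
    for _ in range(n_iters):
        (x1, y1), (x2, y2) = rng.sample(points, 2)       # minimal sample
        a, b, c = y2 - y1, x1 - x2, x2 * y1 - x1 * y2    # line ax+by+c=0
        norm = (a * a + b * b) ** 0.5
        if norm == 0:
            continue
        inliers = [p for p in points
                   if abs(a * p[0] + b * p[1] + c) / norm < thresh]
        if len(inliers) > len(best_inliers):
            best_inliers = inliers
    return best_inliers

# Ten collinear points plus two outliers: RANSAC recovers the line.
pts = [(i, i) for i in range(10)] + [(0, 8), (3, -5)]
inliers = ransac_line(pts)
```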
Affine Transformation
Perspective irregularities introduce geometric distortion: the position of the PTZ camera relative to the scene changes the apparent dimensions of the scene geometry. A uniformly distorted image can be corrected by applying an affine transformation, which maps measurements from the distorted coordinates to the desired ones and accounts for a variety of perspective distortions. The perspective problem can be overcome if we construct a shape description that is invariant to perspective projection.
Many interesting tasks in model-based computer vision can be performed without the use of Euclidean shape descriptions and use descriptions that involve relative measurements.
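An affine correction of the kind described above is just x' = Ax + t applied to image coordinates. The matrix and translation below are illustrative values, not the thesis's calibration result:

```python
import numpy as np

# Apply a 2-D affine transformation x' = A x + t to a set of points.
def apply_affine(points, A, t):
    pts = np.asarray(points, dtype=float)  # shape (N, 2)
    return pts @ np.asarray(A, dtype=float).T + np.asarray(t, dtype=float)

A = [[0.0, -1.0],
     [1.0,  0.0]]   # rotate 90 degrees counter-clockwise
t = [1.0, 0.0]      # then shift right by one unit
corrected = apply_affine([(1.0, 0.0), (0.0, 1.0)], A, t)
```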
Geometry Transformation based on ImTrans (MATLAB)
Introduction
Equipment Setup
Pan-Tilt-Zoom Camera
Pelco-D Protocol
The sense bit serves as an indicator that determines the meaning of bits 4 and 3 (Protocol Manual, 2011). When the sense bit is on and bits 4 and 3 are on, the command enables autoscan and turns the camera on. Conversely, when the sense bit is off and bits 4 and 3 are on, the command enables manual scan and turns the camera off.
The pan speed ranges from 0x00 (stop) to 0x3F (high speed), while 0xFF is the maximum ("turbo") speed.
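A Pelco-D command frame can be built as below. This follows the published Pelco-D layout (sync byte 0xFF, address, two command bytes, pan and tilt speed data bytes, and a modulo-256 checksum over bytes 2-6); the specific command values are illustrative:

```python
# Build a 7-byte Pelco-D command frame.
def pelco_d_frame(address, cmd1, cmd2, pan_speed, tilt_speed):
    body = [address, cmd1, cmd2, pan_speed, tilt_speed]
    checksum = sum(body) % 256
    return bytes([0xFF] + body + [checksum])

# Pan right at full normal speed for camera ID 1
# (cmd2 bit 1 set = pan right, 0x3F = high pan speed).
frame = pelco_d_frame(0x01, 0x00, 0x02, 0x3F, 0x00)
```

Sending this frame over the RS-485 link at 2400 bps commands the PTZ unit to pan right.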
RS 485 Transmitter
Then, the twisted pair on Rx+ and Rx- (Receive + and -) connects to the PTZ side. The appropriate protocol for the PTZ camera command interface must be set; for example, the baud rate is 2400 bps for Pelco-D, and the camera ID must be configured to match. At this point, the hybrid camera system is able to control the PTZ camera and also acquire images.
Introduction
Instead of capturing a single image as the static camera does, the PTZ camera captures images of the scene with different pan, tilt, and zoom values to produce more detail of the environment. Based on Figure 4.1, a relationship is defined between the static camera and the PTZ camera. In a hybrid camera system, the PTZ camera collects more than one image to compare with the static camera.
PTZ camera Geometry
Since a panorama image has a much wider field of view, we proceed with a SIFT matching process to estimate, within the master camera view, the positions of the images composing the panoramas. Due to the limitations of the matching technique, panoramas are generated at different zoom parameters to cover a large range of scale change. For example, Figures 4.5 and 4.6 show the dataset of images at different zoom parameters.
Low image robustness can cause mismatches during calibration between the master and slave cameras, because the texture and scale of an object may differ greatly between the two views.
Images Matching
For image matching and recognition, the new image is compared to the database based on their feature vectors. Matches are formed between keypoints, with each match selected by the smallest Euclidean distance between the feature vectors. This makes it possible to quickly identify the correct keypoint descriptors among close candidates in the feature database.
However, in a cluttered view, many features can affect the accuracy of correct matches and result in false matches of key points.
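The distance-based matching above, together with a guard against the false matches that clutter introduces, can be sketched as nearest-neighbour matching with a ratio test. The 0.8 ratio threshold is Lowe's published value, assumed here rather than taken from the thesis:

```python
import numpy as np

# Nearest-neighbour descriptor matching with a ratio test: each query
# descriptor is matched to its closest database descriptor by Euclidean
# distance, and the match is kept only if the closest neighbour is
# clearly better than the second closest.
def match_descriptors(query, database, ratio=0.8):
    q = np.asarray(query, dtype=float)
    db = np.asarray(database, dtype=float)   # needs at least 2 entries
    matches = []
    for i, d in enumerate(q):
        dists = np.linalg.norm(db - d, axis=1)
        order = np.argsort(dists)
        best, second = dists[order[0]], dists[order[1]]
        if best < ratio * second:            # unambiguous match only
            matches.append((i, int(order[0])))
    return matches

database = [[0.0, 0.0], [10.0, 10.0], [20.0, 0.0]]
matches = match_descriptors([[0.5, 0.0], [9.0, 9.5]], database)
```

Ambiguous descriptors, whose two nearest neighbours are nearly equidistant, are simply discarded rather than risked as false matches.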
Experiments and Results
Experiment I – SIFT algorithm detection
These images were subjected to the SIFT algorithm to identify corresponding key points between them. The resolution of the images was reduced by 10 percent in each experiment, down to a smallest resolution of 70x58. After the data set was processed by the SIFT algorithm, the key points in each image were identified.
Results Experiment I(a)(b)(c)
From the graph above, we found that the number of key points initially increases with the image resolution. Figure 4.12 also clearly shows that image (a), which has high texture and high contrast, yielded the most key points. In conclusion, the keypoint counts for the different resolutions are listed in Table 4.1 and plotted in Figure 4.12, and the corresponding processing times are listed in Table 4.2 and plotted in Figure 4.13.
The number of output key points is affected by the size of the resolution and also the texture of the image.
Experiment II - Image stitching based on Hough Transformation
First, we proceeded with the laboratory data set, which had better image quality than the corridor data set. The Hough transformation then identified the keypoint positions of the arbitrary shapes and classified them into inliers and outliers, as shown in Figure 4.15. However, although the Hough transformation identified the keypoint positions of the arbitrary shapes, robust affine alignment was hindered by poor texture information and overexposed image quality.
The images were wrongly aligned on the time stamp printed on the image, because its detected key points have high intensity and high shape similarity.
Robust estimation based on RANSAC and ImTrans
This procedure would have enabled both images to have similar perspective ratio as shown in figure 4.21 below.
Limitations and Solutions
The weakness of this technique is that as the threshold increases, the number of false key points also increases. The detection rate is affected by objects with a high degree of similarity, as shown in Figure 4.22 below. By calculating the gradient of the matched pairs, we successfully rejected most of the incorrect key points, as shown in Figure 4.23.
If the bin is too small, votes fall into neighboring bins, reducing the visibility of the main bin. This is why stitching the corridor dataset images failed.
Conclusion
Second, the geometric constraints for multiple views are stronger than their pairwise counterparts and this allows more incorrect matches to be rejected. Finally, we can exploit the probabilistic nature of an image matching problem by using known mismatches in a data-driven classifier.
Future Work
In this project, a further enhancement of image background clarity is required: a background learning process to distinguish between foreground and background objects. This technique would be able to resolve the occlusion of objects while they are moving.