Chapter 7
Robust 6 DOF Motion Estimation for Non-Overlapping Multi-Camera Rigs
Motion estimation of multi-camera systems has attracted growing interest as these systems are used to capture ground-based or indoor video data for reconstructing the whole surrounding environment [83]. Capture systems need to provide a large field of view horizontally and vertically to cover the upper hemisphere of the environment. An efficient system, for example, is often built from two wide field of view cameras rigidly coupled together.
Alternatively, each wide field of view camera is replaced by a camera cluster. To closely approximate the wide field of view camera, the optical centres of the cameras are placed as close together as possible and the cameras have no or very small overlap. This avoids parallax effects between the cameras. There are also systems that consist of only one camera cluster that captures the whole upper hemisphere. As we will show in our analysis, it is generally very challenging to recover a 6 degrees of freedom (DOF) motion for the latter type of system.
An example of a multi-camera system for the capture of ground-based video is shown in Figure 7.1. This system consists of two camera clusters on each side of a vehicle. The cameras are rigidly attached to the vehicle and therefore move together as a single rigid body. This system is used later for the experimental evaluation of our approach.
The next section discusses related work, and section 7.3 introduces our novel 6 degrees of freedom motion estimation method for non-overlapping multi-camera rigs. Section 7.5 presents experiments with synthetic and real data.
Figure 7.1: Example of a multi-camera system on a vehicle. (Courtesy of UNC-Chapel Hill)
7.1 Related work
The motion estimation of multi-camera systems has been studied extensively [58, 14, 80].
Some approaches use stereo or multi-camera systems to estimate the ego-motion of the camera system. Nistér et al. proposed a technique that uses a calibrated stereo camera system for visual navigation [58]. They used the stereo camera system to recover 3D points up to an unknown orientation. Frahm et al. introduced a 6 degrees of freedom estimation technique for a multi-camera system [14]. Their approach assumes overlapping views of the cameras in order to obtain the scale of the camera motion. Tariq and Dellaert proposed a 6 degrees of freedom tracker for a multi-camera system for head tracking using nonlinear optimization [80].
In this chapter, we propose an algorithm that estimates the 6 degrees of freedom motion of multi-camera systems. Unlike the approaches above, it requires neither overlapping views nor knowledge of the positions of the observed scene points. In other words, 3D structure reconstruction is not required to estimate the 6 degrees of freedom motion.
Another type of approach is based on the generalized camera model [18, 60]. A generalized camera is a camera whose rays may pass through different centres of projection, and a stereo or multi-camera system is one example. The model also includes the ordinary single central projection camera, in which all rays share one centre of projection, as a special case. Such central cameras are nowadays widely used in consumer products. Accordingly, multi-camera systems can be considered generalized cameras with multiple centres of projection, one for each physical camera [18, 60]. Figure 7.2 shows an illustration of an ordinary camera and a generalized camera.
Figure 7.2: (a) Ordinary camera and (b) Generalized camera.
The concept of generalized cameras was proposed by Grossberg and Nayar in [18]. Sturm presented a hierarchy of generalized camera models and multiview linear relations for generalized cameras [77]. A solution for the motion of a generalized camera was proposed by Stewénius et al. [74]. They showed that there are up to 64 solutions for the relative position of two generalized cameras given 6 point correspondences. Their method delivers the rotation, translation and scale of a freely moving generalized camera. One limitation of the approach is that the centres of projection cannot be collinear, which means that the method cannot solve for the motion in the axial case of generalized cameras. The definition of axial cameras is given in [77]. This limitation excludes all two-camera systems, as well as systems of two camera clusters in which the cameras of each cluster have approximately the same centre of projection. The approach of Stewénius et al. also cannot estimate the camera motion under pure translation; in that case the algorithm fails to give any result. Our method is also affected by pure translation and may then not return the full 6 degrees of freedom of the motion. However, in the pure translation case it can at least estimate a 5 degrees of freedom motion, that is, the motion without the scale of the translation. Our method also uses 6 points to estimate the 6 degrees of freedom motion. The next section introduces our novel approach for the 6 degrees of freedom motion estimation of a multi-camera system.
Figure 7.3: Examples of a generalized camera. (a) The six rays never meet; each ray can be regarded as being projected by a different camera. (b) Five rays meet at a single centre of projection and the sixth ray does not pass through that centre.
7.2 6 DOF multi-camera motion
The proposed approach addresses the motion estimation of multi-camera systems. Multi-camera systems may have multiple centres of projection. In other words, they may consist of multiple conventional (central projection) cameras. A stereo system is one example of a multi-camera system. However, the views of a multi-camera system may overlap very little, as in an omnidirectional camera such as the Ladybug™2 [32]. These multi-camera systems are examples of generalized cameras. The most general type of generalized camera may have no common centre of projection at all, as shown in Figure 7.3. That case is rare in real applications; in practice, multi-camera systems are used more frequently. Our technique assumes that we observe at least five correspondences from one of the cameras and one correspondence from any additional camera. In practice this assumption is not a limitation, because a reliable estimation of camera motion requires multiple correspondences anyway due to noise.
Suppose that there is a set of calibrated cameras moving from one position to another. An essential matrix, which describes the epipolar geometry of a calibrated camera, can be estimated from five point correspondences in one camera. Nistér proposed an efficient algorithm for this estimation in [56]. It delivers up to ten valid solutions for the epipolar geometry. The ambiguity can be eliminated with one additional point correspondence. A rotation and a translation (up to scale) of the motion of the camera can be extracted from the essential matrix.
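The last step, extracting a rotation and a translation direction from an essential matrix, can be illustrated with a minimal sketch. It uses the standard SVD-based decomposition and is not necessarily the implementation used in this work. The function names `skew` and `decompose_essential` are hypothetical, and the convention E = [t]x R (with a point moving as X2 = R X1 + t) is an assumption made for the example. The five-point estimation of E itself and the selection of the physically valid candidate are not shown.

```python
import numpy as np

def skew(t):
    """Return the 3x3 cross-product matrix [t]_x (hypothetical helper)."""
    return np.array([[0.0, -t[2], t[1]],
                     [t[2], 0.0, -t[0]],
                     [-t[1], t[0], 0.0]])

def decompose_essential(E):
    """Return the four (R, t) candidates for E ~ [t]_x R, with ||t|| = 1.

    Assumed convention for this sketch: E = [t]_x R.
    """
    U, _, Vt = np.linalg.svd(E)
    # E is only defined up to sign, so U and Vt can be flipped
    # to make both factors proper rotations (determinant +1).
    if np.linalg.det(U) < 0:
        U = -U
    if np.linalg.det(Vt) < 0:
        Vt = -Vt
    W = np.array([[0.0, -1.0, 0.0],
                  [1.0, 0.0, 0.0],
                  [0.0, 0.0, 1.0]])
    R1 = U @ W @ Vt
    R2 = U @ W.T @ Vt
    # Only the direction of the translation is recoverable; its length is not.
    t = U[:, 2]
    return [(R1, t), (R1, -t), (R2, t), (R2, -t)]

# Usage: build E from a known motion and decompose it again.
angle = 0.3
R_true = np.array([[np.cos(angle), -np.sin(angle), 0.0],
                   [np.sin(angle), np.cos(angle), 0.0],
                   [0.0, 0.0, 1.0]])
t_true = np.array([1.0, 2.0, 3.0])  # arbitrary length, lost in E
E = skew(t_true) @ R_true
candidates = decompose_essential(E)
```

One of the four candidates reproduces `R_true` together with the unit vector along `t_true`; the length of `t_true` cannot be recovered, which is the scale ambiguity of a single camera referred to above. Choosing among the candidates typically requires an additional check, for example that triangulated points lie in front of both views.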