Results - Extracting the Object - Video object segmentation and tracking.

Chapter 5: Extracting the Object

5.2 Results

Figure 5-6: Sample frame to illustrate sequence, ball rolling

Figure 5-7: Final models as generated for a selection of frames from sequence, ball rolling

Figure 5-8: Corresponding extracted objects for the same sequence.

Figure 5-6 illustrates a sample frame from the sequence ball rolling, which was used in the previous chapter. The video scene is that of a ball rolling across the view of a stationary camera. Figure 5-7 shows the final model for a series of frames and Figure 5-8 the corresponding extracted objects. The results show inherent problems of the algorithm regarding shadows, revealed background and objects moving out of the view of the camera.

The first four frames show the effect of shadows on the object extraction. Since the shadows are not compensated for, they are considered to be part of the object, and hence the parts of the background on which the shadows fall are also included as part of the object. In Chien et al. [Chien, 2002], a morphological gradient filter was applied to reduce the effect of shadows. This approach does prove effective when the background is smooth, as the shadow results in a gradual change of the luminance values of the background; hence the shadows are "removed" by the gradient filter. The approach fails, however, in the presence of a textured or cluttered background, as the effect of changing illumination is more prominent and comparable to the effect of motion itself, and hence is not effectively "removed" by the gradient filter.

The fifth and sixth frames show the distortion of the object by uncovered background objects. The uncovered object in question is the gutter outlet to the left of the ball. Without further a priori knowledge regarding the contour of the object, one cannot confidently exclude or remove the uncovered background from the extracted object. The last frame in the sequence shows the failure of the object extraction as the object moves out of the view of the camera: insufficient contour information is available in the scene for accurate object extraction.

The problems highlighted in the previous paragraph motivate the use of additional a priori information for reliable semantic object extraction. For example, if it were known a priori that the moving object was rigid, and a set of possible shapes were available, then a best-fit algorithm would yield more accurate object extraction than the results presented.
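The best-fit idea can be made concrete with a small sketch. It is not part of the method presented in this chapter: the candidate shapes, the use of intersection-over-union (IoU) as the fit score, and the small translation search are all assumptions chosen for illustration, and every function name is hypothetical.

```python
def disc(radius):
    """Pixel set of a filled disc centred on the origin."""
    r = range(-radius, radius + 1)
    return {(y, x) for y in r for x in r if y * y + x * x <= radius * radius}


def square(half_side):
    """Pixel set of a filled square centred on the origin."""
    r = range(-half_side, half_side + 1)
    return {(y, x) for y in r for x in r}


def centre(pixels):
    """Shift a pixel set so that its rounded centroid sits at the origin."""
    n = len(pixels)
    cy = round(sum(y for y, _ in pixels) / n)
    cx = round(sum(x for _, x in pixels) / n)
    return {(y - cy, x - cx) for y, x in pixels}


def iou(a, b):
    """Intersection over union of two pixel sets."""
    return len(a & b) / len(a | b)


def best_iou(target, shape, search=2):
    """Highest IoU of `shape` against `target` over small integer shifts."""
    best = 0.0
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            shifted = {(y + dy, x + dx) for y, x in shape}
            best = max(best, iou(target, shifted))
    return best


def best_fit(extracted, candidates):
    """Return (name, score) of the candidate shape that fits best."""
    target = centre(extracted)
    scores = {name: best_iou(target, centre(shape))
              for name, shape in candidates.items()}
    name = max(scores, key=scores.get)
    return name, scores[name]


# Toy "extracted object": a ball of radius 5 with a strip of shadow
# attached just below it, mimicking the shadow problem described above.
ball = {(y + 20, x + 30) for y, x in disc(5)}
shadow = {(26, x) for x in range(30, 38)}
extracted = ball | shadow

# Hypothetical a priori shape set.
candidates = {"disc_r5": disc(5), "disc_r8": disc(8), "square_5": square(5)}
name, score = best_fit(extracted, candidates)
print(name, round(score, 3))
```

In this toy case the radius-5 disc is selected, and replacing the extracted region with the fitted shape would discard the shadow pixels. A practical version would also have to search over scale and rotation, which is omitted here.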

Similar problems were observed in other sequences tested.

Figure 5-9: (a) Sample frame of MPEG 4 Sequence, Akiyo. (b) Final model of moving object. (c) Closed contour of final object model. (d) Extracted object

The results for a sample frame from the sequence Akiyo are presented in Figure 5-9. As may be observed from the extracted object in Figure 5-9(d), the bottom boundary of the frame forms the bottommost boundary of the object. Hence, "hollowing" out the bottom boundary would make the results worse rather than better. To overcome this problem, the choice to remove these pixels was changed to a user input parameter, as it depends on scene content. The Akiyo sequence is not affected by shadows and the background is neither textured nor very cluttered. The movement of the object is also within a limited range, with very little background being uncovered. Hence, the object extraction results for the sequence Akiyo are good. A sample of the extracted objects from the sequence is presented in Figure 5-10. Small portions of the background above her head are included, as this area is continuously occluded and revealed as she moves her head in the sequence.
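Once a closed contour such as that of Figure 5-9(c) is available, extraction amounts to keeping every pixel that lies on or inside the contour. The sketch below shows one way to do this, by flood-filling the background inwards from the frame border; it is an illustrative assumption, not necessarily the implementation used to produce the figures, and the function name and toy data are hypothetical. Contour pixels that lie on the frame border, as along the bottom of the Akiyo frame, are not used as background seeds, so the frame edge can act as part of the object boundary.

```python
from collections import deque


def extract_object(frame, contour):
    """Zero every pixel of `frame` that lies outside the closed contour.

    `frame` and `contour` are equal-sized 2D lists. `contour` holds 1 on
    contour pixels and 0 elsewhere. Pixels on or inside the contour keep
    their original value.
    """
    rows, cols = len(frame), len(frame[0])
    outside = [[False] * cols for _ in range(rows)]
    queue = deque()

    # Seed the fill with every frame-border pixel that is not on the contour.
    for r in range(rows):
        for c in range(cols):
            on_border = r in (0, rows - 1) or c in (0, cols - 1)
            if on_border and not contour[r][c]:
                outside[r][c] = True
                queue.append((r, c))

    # 4-connected breadth-first fill; it cannot cross contour pixels.
    while queue:
        r, c = queue.popleft()
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if (0 <= nr < rows and 0 <= nc < cols
                    and not outside[nr][nc] and not contour[nr][nc]):
                outside[nr][nc] = True
                queue.append((nr, nc))

    return [[0 if outside[r][c] else frame[r][c] for c in range(cols)]
            for r in range(rows)]


# Toy frame of constant luminance 7, with a contour whose lower edge
# coincides with the bottom row of the frame.
frame = [[7] * 6 for _ in range(5)]
contour = [
    [0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 1, 0],
    [0, 1, 0, 0, 1, 0],
    [0, 1, 0, 0, 1, 0],
    [0, 1, 1, 1, 1, 0],
]
extracted = extract_object(frame, contour)
for row in extracted:
    print(row)
```

The fill uses 4-connectivity so that it cannot leak through diagonal steps of a thin, 8-connected contour. Any break left in the contour would let the fill reach the interior, which is why contour closing has to precede extraction.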

Figure 5-10: Sample of extracted objects from MPEG 4 video test sequence, Akiyo

Figure 5-11 shows the final model for a sample of frames from the sequence Hall Monitor, and the corresponding extracted objects are presented in Figure 5-12. In this sequence the effect of a cluttered background is more prominent. As the man moves down the corridor, the background becomes covered and then uncovered. While the temporal filter discussed in the previous chapter does reduce the amount of background clutter, the portions of the uncovered background in the immediate vicinity of the moving object are extracted as part of the object. These portions of the background may only be removed with further knowledge of the object of interest.

Figure 5-11: Final models as generated for a selection of frames from MPEG 4 video test sequence, Hall Monitor

Figure 5-12: Corresponding extracted objects for the same sequence

In this chapter, a novel, intuitive method was presented for tracing the outer contour of the object, as obtained from the tracked model, and compensating for breaks in the contour. After contour closing, object extraction becomes trivial, and the results of object extraction for different sequences were presented. One may conclude from testing that the problem of object extraction is certainly an ill-posed one. While good results are obtained under constrained conditions, such as sequences where the object casts no shadows on the background and minimal occlusion and uncovering of the background occurs, the majority of sequences yield unsatisfactory results. Gradient filters have been used in the literature to eliminate the effect of shadows; however, their effectiveness is limited. It is difficult to envision an object extraction algorithm that is able to discern the exact contour of the object of interest without further knowledge of the object's features. Such functionality would require a shift from object detection to object recognition. The human visual system performs such segmentation by effectively having an enormous, high-speed database of semantically meaningful objects, and it relates low-level features like motion, shape and texture to the objects in the database.
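The limited effectiveness of gradient filtering against shadows can be illustrated with a toy example. The morphological gradient is the difference between a grey-scale dilation and erosion; the sketch below uses a 3x3 square structuring element and detects change by thresholding the difference between the gradient of the current frame and that of the background. The structuring element, the threshold, the multiplicative shadow model and the test patterns are all assumptions made for illustration, not details taken from [Chien, 2002].

```python
def morphological_gradient(image):
    """Dilation minus erosion with a 3x3 square structuring element."""
    rows, cols = len(image), len(image[0])
    out = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            window = [image[rr][cc]
                      for rr in range(max(r - 1, 0), min(r + 2, rows))
                      for cc in range(max(c - 1, 0), min(c + 2, cols))]
            out[r][c] = max(window) - min(window)
    return out


def count_changes(background, frame, threshold):
    """Count pixels whose gradient differs from the background gradient
    by more than `threshold`."""
    g_bg = morphological_gradient(background)
    g_fr = morphological_gradient(frame)
    return sum(abs(a - b) > threshold
               for row_bg, row_fr in zip(g_bg, g_fr)
               for a, b in zip(row_bg, row_fr))


def cast_shadow(background, attenuation_pct):
    """Scale each column of the background by a percentage (toy shadow)."""
    return [[value * pct // 100 for value, pct in zip(row, attenuation_pct)]
            for row in background]


# Shadow profile: luminance falls gradually to 50% and recovers.
shadow_pct = [100, 100, 90, 80, 70, 60, 50, 50, 50,
              50, 50, 50, 60, 70, 80, 90, 100, 100]

smooth_bg = [[100] * 18 for _ in range(3)]
textured_bg = [[60 if c % 2 == 0 else 140 for c in range(18)]
               for _ in range(3)]

THRESHOLD = 25  # hypothetical change-detection threshold
smooth_changes = count_changes(
    smooth_bg, cast_shadow(smooth_bg, shadow_pct), THRESHOLD)
textured_changes = count_changes(
    textured_bg, cast_shadow(textured_bg, shadow_pct), THRESHOLD)
print(smooth_changes, textured_changes)
```

On the smooth background the shadow changes the gradient only slightly, so no pixel exceeds the threshold. On the textured background the same shadow halves the local contrast, the gradient changes by an amount comparable to that caused by real motion, and shadow pixels are flagged as changed.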

A possible way forward at this point would be to constrain the scope of the problem to a specific application, form a database of possible objects that may be detected, and perform object extraction as the human visual system does. Such an approach, however, would be a divergence from the original aim of the research.
