______________________________________________________________________________________________
Design and Implementation of Stereo Vision System for Visual Tracking System Using FPGA
1Keerthi M.V, 2Mohan Kumar N, 3Shilpa N, 4Yashaswinini S, 5P.V. Rao,
1,2,3,4,5 Dept. of ECE, RRCE, Bengaluru
Email: 1[email protected],3[email protected], 4[email protected],
5[email protected]/[email protected]
Abstract - A tracking system observes persons or objects on the move and supplies a time-ordered sequence of location data to a model, e.g. one capable of depicting the motion on a display. Tracking objects from images is a difficult task. A stereo vision system is therefore incorporated in the proposed design: it helps to identify the states of an object, which is useful for tracking. Since tracking requires the motion of the object, motion estimation is also incorporated in the proposed design. To increase the speed of the design, an efficient hardware architecture is designed using an HDL. This paper presents how image processing can improve the efficiency of a tracking system, and the design and implementation of a visual servo system that uses a stereo machine vision system to guide an omnidirectional mobile robot for real-time object tracking.
However, the computational complexity and the large amount of data access make real-time processing of stereo vision challenging because of the inherent instruction-cycle delay of conventional computers. The entire stereo vision process, including rectification, stereo matching, and post-processing, is realized on a single field-programmable gate array (FPGA) without the need for any external devices. Stereo vision has traditionally been, and continues to be, one of the most extensively investigated topics in computer vision. Since stereo can provide depth information, it has potential uses in many visual domains such as autonomous navigation, 3D reconstruction, object recognition, and surveillance systems. It is probably most widely used for robot navigation, where accurate 3D information is crucial for reliability. FPGAs have already shown high performance for image processing tasks, especially in embedded systems, and offer great flexibility in implementing the algorithm.
Keywords: Stereo Vision, FPGA, Tracking System, Image Processing.
I. INTRODUCTION
Robotics in unstructured and dynamic environments has been studied extensively for decades. Robots applied to such challenging areas are often referred to as 'field robots', and they will become ever more important in the future. The applications of field robots are wide-ranging, including construction, forestry, agriculture, mining, subsea operations, intelligent highways, search and rescue, military, and space. Hazardous areas where humans cannot work, such as active volcanic fields, are especially important application areas. Robotic exploration of volcanic fields is highly desirable, since detailed on-site observations can provide valuable results for geoscientists to determine the mechanism of eruptions, and for rescuers to reduce the impact of disasters. In order to deploy robots into hazardous areas without human intervention, robotic autonomy is required not only to accomplish tasks but also to keep the robots safe. One of the important techniques is localization, i.e., identifying the robot's orientation and position.
Accurate localization enables a robot to explore efficiently, obtain precise data about the environment, and stay clear of collapsed obstacles and dangerous places. GPS (Global Positioning System) is widely used for this purpose, but it provides only position (not orientation) and cannot be used when the satellites are not visible, such as in a valley. Recently, localization approaches using vision sensors have been studied extensively.
This is called 'visual odometry (VO)', in contrast with conventional wheel odometry (WO), which uses wheel encoders to estimate motion and direction. The accuracy of WO is severely affected by the wheel slip a robot frequently experiences on unstructured natural terrain. VO, in contrast, is not degraded by slip;
hence it can provide accurate pose estimation even on slippery sandy terrain or steep slopes. The vision-based method may nevertheless suffer in several specific cases. One crucial problem is failure of the visual feature tracker. Recent VO systems extract visual features from the terrain and track them frame by frame [15, 4]. However, on feature-less terrain where few features are found on the ground surface, stable feature tracking can be difficult. The lack of well-tracked features results in poor accuracy or failure of motion estimation. Moreover, the performance of a feature tracker depends on parameters such as thresholds and window size. A field robot encounters various types of terrain while exploring natural scenes, so it is hard to find the parameters that best fit a new terrain. Another challenge is the high computational cost of dealing with 2D images. To perform real-time operations with limited on-board processors and memory, the efficiency of the algorithms must not be neglected, and there is a trade-off between computational efficiency and estimation accuracy. Stereo vision extracts 3D information about objects from digital images. Tracking is the act or process of following someone or something, and tracking objects from images is a difficult task. Therefore a stereo vision system is
incorporated in the proposed design. This stereo vision system helps to identify the states of an object, which is useful for tracking. Since tracking requires the motion of the object, motion estimation is also incorporated in the proposed design. To increase the speed of the design, an efficient hardware architecture is designed using an HDL.
II. LITERATURE SURVEY
A stereo vision system gives a mobile robot a reliable and effective way to extract range information from the environment. Stereo vision is a passive sensor, so there is no interference with other sensor devices (when multiple robots are present in the environment), and it is easily integrated with other vision routines, such as object recognition and tracking.
Implementations of stereo vision systems for mobile robots used in map building and navigation tasks can achieve frame rates of up to 5 Hz by making use of special-purpose hardware. A stereo system allows more flexibility: most of the work of producing a stereo range image is performed in software, which can easily be adapted to a variety of situations. The primary obstacle to stereo vision on fast vehicles is the time needed to compute a disparity image of sufficient resolution and reliability. Stereo vision can be tuned, through the choice of lenses and camera separation, to detect obstacles at a variety of ranges, even out to 100 meters or more.
There is limited literature on visual navigation of robotic vehicles. Real-time stereo systems are capable of generating dense depth maps at a high rate.
Disparities from high-resolution images are used for farther objects, while disparities from low-resolution ones are used for closer objects. Stereo vision achieves a useful level of performance when one is willing to trade resolution, image size, and accuracy for speed and reliability. As higher-performance computing becomes available on these vehicles, immediate advantage can be taken of it by increasing the dimensions and depth resolution of the images. Outdoor mobile robots typically share several requirements for a ranging system: reliable performance in unstructured environments, high speed, and physical robustness. These requirements appear in two applications: low-speed cross-country navigation and high-speed obstacle detection. Stereo vision can be used on a robot to locate an object in 3D space. It can also provide valuable information about that object, such as colour, texture, and patterns, that can be used by intelligent machines for classification. Hardware chosen for one such system includes auto-iris lenses for improved outdoor performance, S-Video cameras, and a four-channel frame-grabber PCI card for digitizing the analog S-Video signal. Software from SRI International was used for image rectification and the calculation of camera calibration parameters. An FPGA-based solution for vision-based fall detection has been shown to meet stringent real-time requirements with high accuracy. The research status of stereo matching software and hardware implementations, based on both local and global algorithm analysis, has also been surveyed.
A vision system was designed to tackle the specific problems associated with such vehicles and to be integrated into the CIMAR sensor architecture. An experimental approach to the building of a stereo vision system was achieved. The use of an FPGA provides specific hardware technologies that can be properly exploited to obtain a reconfigurable sensor system.
A realistic stereo vision system was implemented for mobile robot navigation: a stereo vision system for the autonomous vehicle designed for off-road navigation at CIMAR. That work focused on selecting the hardware and developing software for outputting CIMAR Smart Sensor traversability grids using stereo vision. It provided a hardware setup, an algorithm for computing traversability grids, and an optimal set of stereo processing parameters, a starting point for a more robust stereo vision system to be developed at CIMAR. Future work should attempt to increase the field of view and range of the system.
The simplest way to do this would be to add more cameras, positioned so as to capture data from different regions around the vehicle. Using multiple pairs of cameras with different focal lengths and different baselines would increase the range.
III. PROJECT DEFINITION
Fig.1. Proposed system
VISUAL SENSOR: The visual sensor is used to capture images from real-time scenarios.
FEATURE EXTRACTION: Feature extraction identifies the features in the images captured by the visual sensor.
COLOR EXTRACTION: Color extraction identifies the colors present at different coordinates in the captured images.
SHAPE EXTRACTION: Shape extraction identifies the shapes present at different coordinates in the captured images.
SIZE EXTRACTION: Size extraction identifies the sizes of the objects at different coordinates in the captured images.
MOTION EXTRACTION: Motion extraction identifies the motion present at different coordinates in the captured images.
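The coordinate, size, and shape extraction stages above can be sketched in software. The following Python/NumPy fragment is an illustrative stand-in (the paper's implementation uses MATLAB and HDL, and the `fill_ratio` shape cue is an assumption of this sketch, not a quantity named in the paper): given a binary object mask, it recovers the bounding coordinates, pixel area, and a simple shape descriptor.

```python
import numpy as np

def extract_features(mask):
    """Return coordinate, size, and shape features of the object pixels
    in a binary image (True = object pixel)."""
    ys, xs = np.nonzero(mask)
    if xs.size == 0:
        return None                     # no object detected
    x_min, x_max = int(xs.min()), int(xs.max())
    y_min, y_max = int(ys.min()), int(ys.max())
    width = x_max - x_min + 1
    height = y_max - y_min + 1
    area = int(xs.size)                 # pixel count = object size
    # Crude shape cue: 1.0 for a filled rectangle, lower for other shapes
    fill = area / (width * height)
    return {"x": (x_min, x_max), "y": (y_min, y_max),
            "width": width, "height": height,
            "area": area, "fill_ratio": fill}

# Example: a 4x4 square object inside an 8x8 frame
frame = np.zeros((8, 8), dtype=bool)
frame[2:6, 3:7] = True
print(extract_features(frame))
```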
_______________________________________________________________________________________________
IV. SEQUENTIAL FLOW
Fig.2. Sequential flow
IMAGE ACQUISITION: Real-time inputs, i.e., still images and continuous video, are captured from the camera.
IMAGE PERCEPTION: A Simulink model is developed for object and motion detection.
IMAGE EXTRACTION: MATLAB code is generated for both object and motion detection.
CONTROLLER DEVELOPMENT: A motor controller is implemented in Verilog as a finite state machine (FSM).
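The paper implements the motor controller as a Verilog FSM; the Python sketch below only mirrors the state-machine idea. The state names, frame width, and dead-band threshold are all assumptions of this illustration, not values from the paper: the controller steers toward the detected object's x coordinate.

```python
# Hypothetical state names and thresholds; the actual controller is a
# Verilog FSM, and this sketch only illustrates its transition logic.
IDLE, LEFT, RIGHT, FORWARD = "IDLE", "LEFT", "RIGHT", "FORWARD"

def next_state(object_x, frame_width=320, dead_band=20, detected=True):
    """One FSM transition: pick a motor command from the object's
    x coordinate relative to the image center."""
    if not detected:
        return IDLE                      # no object -> stop and wait
    center = frame_width // 2
    if object_x < center - dead_band:
        return LEFT                      # object left of center -> turn left
    if object_x > center + dead_band:
        return RIGHT                     # object right of center -> turn right
    return FORWARD                       # roughly centered -> drive straight

print(next_state(50))                    # object far to the left
print(next_state(160))                   # object centered
```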
V. FLOW CHART
Fig.3. Flow chart
The top, front, and side views of the object are captured as images in a real-time scenario. The object present in each of the three views is then detected using the Simulink model. The x and y coordinates of the objects present in all three views are extracted using the MATLAB code we developed. The colour of the object is extracted by comparing the pixel values of the original top-view image with the 24-bit colour encoding scheme. The size and shape of the object are extracted by calculating the area, length, and width from the top, side, and front views of the captured image. The motion of the object is extracted by taking two consecutive top-view images and comparing their x and y coordinates.
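The colour-extraction step, comparing a pixel value against the 24-bit colour encoding, can be sketched as follows. This is a simplified Python stand-in: the channel unpacking matches standard 24-bit RGB, but the dominant-channel labelling is an assumption of this sketch, since the paper does not specify which colours it distinguishes.

```python
def classify_color(pixel24):
    """Split a 24-bit RGB pixel value into its channels and name the
    dominant one (illustrative labels only)."""
    r = (pixel24 >> 16) & 0xFF   # bits 23..16: red
    g = (pixel24 >> 8) & 0xFF    # bits 15..8:  green
    b = pixel24 & 0xFF           # bits  7..0:  blue
    channels = {"red": r, "green": g, "blue": b}
    return max(channels, key=channels.get)

print(classify_color(0xFF2010))  # a predominantly red pixel
```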
VI. ARCHITECTURE DEVELOPMENT
Fig.4. Architecture of the system
As in the sequential flow, the top, front, and side views of the object are captured and the object in each view is detected using the Simulink model; the x and y coordinates of the objects in all three views are extracted with the MATLAB code we developed, and the colour is extracted by comparing the pixel values of the top-view image with the 24-bit colour encoding scheme. The size and shape of the object are extracted from the length, width, and area calculated from the x and y coordinate values.
The motion of the object is extracted by taking two consecutive top-view images and comparing their x and y coordinates.
VII. RESULT
Fig.5. Simulink model
The image captured in the real-time scenario is read from a file. This captured image is resized to reduce the pixel count, which in turn reduces the processing time. The captured image, which is in RGB, is converted to grayscale (intensity), because RGB uses 24 bits to represent one pixel, which increases the complexity; the conversion reduces it. The image is then thresholded to suppress the background. The threshold scaling factor is varied to better isolate the object with the highest intensity; thresholding by default takes a scaling factor of 0.9. The thresholded image is then subjected to dilation, which smooths the image so that the object to be detected
can be identified easily in comparison with the surrounding objects. Finally, the image is complemented.
A video viewer is used to display the processed image; to view the step-by-step processing, a video viewer is placed after each block. The Verilog simulation of the stereo vision system is shown in the figure, and the object length, area, and width have been observed from the simulation.
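The Simulink chain described above (resize, RGB-to-gray, threshold, dilation, complement) can be approximated in a few lines of NumPy. This is a stand-in sketch, not the paper's model: the luminance weights, the 3x3 structuring element, and the mean-based threshold level are assumptions of this illustration, with only the 0.9 scaling factor taken from the text.

```python
import numpy as np

def preprocess(rgb, scale=2, thresh=0.9):
    """Rough NumPy mirror of the Simulink chain: downsample, RGB->gray,
    threshold (scaling factor 0.9 as in the model), 3x3 dilation,
    then image complement."""
    small = rgb[::scale, ::scale]                      # crude resize
    gray = small @ np.array([0.299, 0.587, 0.114])     # luminance conversion
    level = thresh * gray.mean()                       # one simple threshold choice
    binary = gray > level
    padded = np.pad(binary, 1)
    dilated = np.zeros_like(binary)
    for dy in (-1, 0, 1):                              # 3x3 structuring element
        for dx in (-1, 0, 1):
            dilated |= padded[1 + dy:1 + dy + binary.shape[0],
                              1 + dx:1 + dx + binary.shape[1]]
    return ~dilated                                    # complement

# Example: a bright 4x4 square on a dark 12x12 background
rgb = np.zeros((12, 12, 3))
rgb[4:8, 4:8] = 1.0
print(preprocess(rgb))
```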
Fig.6. Final simulation
VIII. DISCUSSION
In this study, in addition to the distance computation, the image processing is also very substantial. It includes grayscale conversion, binarization, dilation, erosion, and background subtraction. Background subtraction makes the target clearer and the subsequent computation easier after the chain of image-processing steps. Distance is calculated by an epipolar geometry algorithm when the scene contains only one target. Both the image processing and the distance computation are thus important parts of this study. For future visual-system applications, we hope that more real-time image processing can be applied on the FPGA. Using an FPGA to implement an image processing system not only miniaturizes the system but also reduces power consumption.
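For a rectified stereo pair, the epipolar-geometry distance computation mentioned above reduces to triangulation: depth Z = f * B / d, where f is the focal length in pixels, B the camera baseline, and d the disparity. The sketch below illustrates this relation; the focal length and baseline values are placeholders, since the paper does not report its camera calibration.

```python
def depth_from_disparity(disparity_px, focal_px=700.0, baseline_m=0.12):
    """Triangulated depth Z = f * B / d for a rectified stereo pair.
    focal_px and baseline_m are placeholder calibration values."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    return focal_px * baseline_m / disparity_px

print(depth_from_disparity(28))   # larger disparity -> closer object
```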
IX. FPGA IMPLEMENTATION
The top-level block diagram of the stereo vision system is shown in the figure. It takes as input the object coordinates in the different views: top view, front view, and side view. The RTL schematic of the stereo vision system, shown in the figure below, shows the blocks used in the design.
Fig.7. Stereo vision
Fig.8. RTL Schematic
X. CONCLUSION
From the results obtained, it can be seen that with the help of stereo vision we can obtain the features of objects. This is helpful for extracting the 3D information of an object, and with this advancement we improve the efficiency of visual tracking for surveillance. As a result of this work, it can be stated that stereo vision is a good modality for obtaining three-dimensional information, and computer stereo vision has broad applications. A previous version of this project was realized on a PC with a programming language, but the result was limited in time: the PC carried out the image processing but could not achieve real-time operation. The FPGA implementation of the stereo vision system used in this study to derive 3D information improves on that approach.
The FPGA not only accomplishes real-time computing but also saves hardware space, allowing more functions in this research. At this point the work can be developed in two directions. One is firmware design, which eases the image processing but requires more memory space; in this research the memory of the DE2-70 board is insufficient, because the two cameras and one LCD require a large amount of memory. The other
is hardware design, which not only saves memory space but also supports real-time computing.
REFERENCES
[1] Lee, J.S., Seo, C.W., Kim, E.S.: Implementation of opto-digital stereo object tracking system. Optics Communications 200, 73–85 (2001)
[2] Yau, W.-Y., Wang, H.: Fast relative depth computation for an active stereo vision system. Real-Time Imaging 5(3), 189–202 (1999)
[3] Candocia, F., Adjouadi, M.: A similarity measure for stereo feature matching. IEEE Transactions on Image Processing 6, 1460–1464 (1997)
[4] Masrani, D.K., MacLean, W.: A Real-Time Large Disparity Range Stereo System using FPGAs. In: Fourth IEEE International Conference on Computer Vision Systems (ICVS 2006), p. 13 (2006)
[5] Hamblen, J.O.: Using an FPGA-based SOC approach for senior design projects. In: IEEE International Conference on Microelectronic Systems Education (2003)