The special problem entitled “Image Navigation and Manipulation in the Operating Room Using Hand Gesture Recognition,” prepared and submitted by Jaye Renzo L. Montejo in partial fulfillment of the requirements for the degree of Bachelor of Science in Computer Science, has been examined and is recommended for acceptance. Accepted and approved in partial fulfillment of the requirements for the degree of Bachelor of Science in Computer Science.
Mathematics and Computing Unit, Department of Physical Sciences and Mathematics

In the operating room, the surgeon uses computer-aided navigation systems to view medical images during surgery. However, since sterility must be maintained at all times to prevent disease transmission and the spread of infection, the surgeon cannot use a mouse, keyboard, or any other control device in cases where multiple images need to be reviewed or manipulated.
To provide sterile and natural human-computer interaction, this paper presents a gesture-guided image navigation and manipulation application that uses hand gesture recognition and the computer vision capabilities of the Kinect sensor to allow the surgeon to navigate and manipulate medical images.
Background of the Study
In 2010, Microsoft introduced a motion-sensing device called the Xbox 360 Kinect, an add-on peripheral for the Xbox 360 video game console. It was originally intended to enhance the gaming and entertainment capabilities of the Xbox 360, but since the release of the Kinect for Windows SDK, developers have begun building real-world Kinect applications, allowing the Kinect to enter areas beyond gaming such as robotics [10], biometrics [11], and communication [12]. With the help of the Kinect sensor, implementing gesture recognition and developing gesture-controlled systems and applications becomes easier.
Statement of the Problem
Consequently, an assistant is in charge of performing these actions on the computer while the surgeon gives instructions. Such interaction can be cumbersome and error-prone, and it can slow down the entire operation, since miscommunication between the surgeon and the assistant can occur. We envisage an operating room setting where the surgeon can perform image navigation and manipulation on the computer while maintaining sterility.
To allow the surgeon to interact directly with the computer without touching controller devices, we create a gesture-based image navigation and manipulation application designed for use by surgeons.
Objectives of the Study
The image(s) will be displayed on the screen and can be in .jpeg, .png, .bmp, or .dcm file format. The application tracks the user's body and its orientation through the Kinect sensor. A waving hand gesture tells the application that the user is ready to perform gesture commands on the images.
The application assigns control to the user who performed the waving hand gesture, so that it knows whose gestures to recognize when two or more users are present in the visible area.
Significance of the Project
In [2], it was observed that it took seven minutes and four people, including the surgeon, to perform a single click required to set up the navigation system. Since an assistant will no longer be needed, the operating room staff can be reduced while the same level of support is provided.
Scope and Limitations
Assumptions
Review of Related Literature
[11] proposed the use of hand gestures for user authentication, with the Kinect as the medium. To deal with such problems, they proposed developing new libraries that allow calibration and modification of the Kinect sensor. The Kinect has two main parts, the body and the base, as shown in Figure 3.
The green LED indicator tells the user that the computer has detected the Kinect and that the drivers have been loaded correctly. The body and the base of the Kinect are connected by a tilt motor, which is used to adjust the angle of the sensor so that the Kinect points to the user's desired visible area.
Through the tilt motor, the body of the Kinect can be tilted up or down by up to 27 degrees.
The Kinect for Windows SDK
The IR depth sensor locates and reads the light spots projected by the IR emitter. By calculating the distance between the sensor and each light spot, it captures the depth information of the object on which the spot falls. In addition to these components, the Kinect also requires an external power adapter and a USB adapter to connect to a computer.
After installing the Kinect for Windows SDK, an additional toolkit called the Kinect for Windows Developer Toolkit can also be installed. While the Kinect for Windows SDK helps the developer create Kinect applications, it does not provide built-in Application Programming Interfaces (APIs) specifically for gesture recognition. This means that it is up to the developer to define and formulate their own approach to gesture recognition.
Skeletal Tracking
To recognize an object as a human body, the Kinect compares each pixel of the raw depth data with machine-learned data. Once the raw depth data is matched with the machine-learned data, the Kinect recognizes the object as a human body and then identifies individual body parts using decision tree learning.
Once the identification is done, it places joints throughout the body. These joints serve as indicators that allow the Kinect sensor to track full-body movement. The Kinect for Windows SDK supports the detection of up to six users, of which only two can be tracked in detail.
This means that it can track the twenty joint points of each of the two fully tracked users, while only the overall positions of the remaining four are detected.
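As an illustration of how an application consumes this skeletal data, the following is a minimal sketch using the Kinect for Windows SDK (version 1.x); the sensor-selection logic and the joint being printed are illustrative choices, not part of the thesis.

using System;
using System.Linq;
using Microsoft.Kinect;

class SkeletonTrackingSketch
{
    static void Main()
    {
        // Use the first connected Kinect sensor, if any.
        KinectSensor sensor = KinectSensor.KinectSensors
            .FirstOrDefault(s => s.Status == KinectStatus.Connected);
        if (sensor == null) return;

        sensor.SkeletonStream.Enable();
        sensor.SkeletonFrameReady += (sender, e) =>
        {
            using (SkeletonFrame frame = e.OpenSkeletonFrame())
            {
                if (frame == null) return;

                // Up to six skeletons are reported, but only tracked ones carry full joint data.
                Skeleton[] skeletons = new Skeleton[frame.SkeletonArrayLength];
                frame.CopySkeletonDataTo(skeletons);

                foreach (Skeleton skeleton in skeletons)
                {
                    if (skeleton.TrackingState != SkeletonTrackingState.Tracked)
                        continue;

                    // A fully tracked skeleton exposes twenty joints, e.g. the right hand.
                    SkeletonPoint hand = skeleton.Joints[JointType.HandRight].Position;
                    Console.WriteLine("Right hand at ({0}, {1}, {2})", hand.X, hand.Y, hand.Z);
                }
            }
        };

        sensor.Start();
        Console.ReadLine();   // keep the console application alive while frames arrive
        sensor.Stop();
    }
}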
Approaches to Gesture Recognition
Algorithmic Gesture Recognition
To do this, we need to calculate the distances between the left and right joints. For example, to determine if both hands are raised, we need to check whether the y values of the left and right hand joints are greater than that of the head joint. Once the initial position is validated, we need to check each frame to determine whether the user is still performing the gesture.
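A per-frame check of this starting pose might look like the following sketch; the choice of the head joint as the reference follows the comparison described above, and the helper name is hypothetical.

using Microsoft.Kinect;

static class PoseChecks
{
    // Sketch: both hands count as raised when each hand joint is above
    // (has a greater y value than) the head joint in the current frame.
    public static bool AreBothHandsRaised(Skeleton skeleton)
    {
        float headY = skeleton.Joints[JointType.Head].Position.Y;
        float leftY = skeleton.Joints[JointType.HandLeft].Position.Y;
        float rightY = skeleton.Joints[JointType.HandRight].Position.Y;
        return leftY > headY && rightY > headY;
    }
}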
In the initial position, the left wrist is below the left elbow joint, and the right hand is below the right shoulder joint and above the right elbow joint. The right hand then moves from right to left, while the positions of the other joints are maintained. To validate this, we need to check whether the distance between the right hand and the left elbow joint is decreasing in each frame.
After a certain number of frames, we check whether the distance between the right hand and the left shoulder joint is smaller than it was at the initial position.
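A frame-by-frame sketch of this swipe validation is given below; the length of the observation window, the use of planar (x, y) distances, and the class and method names are assumptions made for illustration.

using System;
using Microsoft.Kinect;

class SwipeLeftDetector
{
    private const int FramesToObserve = 20;   // assumed window length
    private int framesSeen;
    private float previousDistance;
    private float initialShoulderDistance;

    private static float Distance(Joint a, Joint b)
    {
        float dx = a.Position.X - b.Position.X;
        float dy = a.Position.Y - b.Position.Y;
        return (float)Math.Sqrt(dx * dx + dy * dy);
    }

    // Initial pose: left wrist below the left elbow; right hand below the
    // right shoulder and above the right elbow.
    public bool IsInitialPose(Skeleton s)
    {
        return s.Joints[JointType.WristLeft].Position.Y < s.Joints[JointType.ElbowLeft].Position.Y
            && s.Joints[JointType.HandRight].Position.Y < s.Joints[JointType.ShoulderRight].Position.Y
            && s.Joints[JointType.HandRight].Position.Y > s.Joints[JointType.ElbowRight].Position.Y;
    }

    // Record the distances at the initial pose before observing the movement.
    public void Start(Skeleton s)
    {
        framesSeen = 0;
        previousDistance = Distance(s.Joints[JointType.HandRight], s.Joints[JointType.ElbowLeft]);
        initialShoulderDistance = Distance(s.Joints[JointType.HandRight], s.Joints[JointType.ShoulderLeft]);
    }

    // Call once per skeleton frame after Start(); returns true once the
    // swipe-left movement is confirmed.
    public bool Update(Skeleton s)
    {
        float toLeftElbow = Distance(s.Joints[JointType.HandRight], s.Joints[JointType.ElbowLeft]);
        if (toLeftElbow >= previousDistance)
        {
            framesSeen = 0;                  // the hand stopped moving left; restart the check
            previousDistance = toLeftElbow;
            return false;
        }

        previousDistance = toLeftElbow;
        framesSeen++;
        if (framesSeen < FramesToObserve)
            return false;

        // After the observation window, the right hand must be closer to the
        // left shoulder than it was at the initial position.
        return Distance(s.Joints[JointType.HandRight], s.Joints[JointType.ShoulderLeft])
               < initialShoulderDistance;
    }
}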
Template-based Gesture Recognition
Design and Implementation
Also, every time a user performs a gesture, the number of points that can be obtained changes. Therefore, we need to bring the points to the same reference that the gesture templates use and make the number of points the same. Having the same reference, the same number of points, and the same bounding box makes it easier to compare the input to each of the templates [21].
An example is shown in Figure 8. Furthermore, we need to convert the measure from distance to probability so that it is easier to decide which of the gesture patterns is closest to the input gesture. The total probability that the input gesture is approximately equal to gesture pattern j is given by Eq. (2), where n_j is the number of points in template j, P_j,i is the i-th point of template j, and P_i is the i-th point of the input gesture.
Since we resampled the input gesture to a fixed number of points during the normalization process, we are sure that the number of points of the input gesture is equal to n_j.
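Eq. (2) itself is not reproduced here, so the sketch below only illustrates one common way of turning the point-to-point distances into a matching probability (the average distance mapped linearly onto [0, 1]); the maxDistance normalizer and the method name are assumptions rather than the thesis's actual formula.

using System;
using System.Collections.Generic;
using System.Linq;
using System.Windows;   // Point and Vector (WPF); requires a reference to WindowsBase

static class TemplateMatcher
{
    // Sketch: both sequences are assumed to be normalized to the same
    // reference and resampled to the same number of points (n_j).
    public static double MatchProbability(IList<Point> input, IList<Point> template, double maxDistance)
    {
        if (input.Count != template.Count)
            throw new ArgumentException("Input and template must have the same number of points.");

        // Average point-to-point distance between the input gesture and template j.
        double averageDistance = input
            .Zip(template, (p, q) => (p - q).Length)
            .Average();

        // Map the distance to a probability in [0, 1]; identical sequences give 1.
        return Math.Max(0.0, 1.0 - averageDistance / maxDistance);
    }
}

The template with the highest resulting probability can then be taken as the recognized gesture.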
Class Diagram
The ToFixedPoints(), ScaleToReference(), and CenterToOrigin() methods implement the main data normalization steps; these methods are applied to the input stroke before classification.
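As a sketch of what these three normalization steps might do, the code below resamples the stroke to a fixed number of evenly spaced points, scales its bounding box to a common reference size, and translates its centroid to the origin; the method bodies, the default point count of 64, and the 250-unit reference size are assumptions, not the actual implementation.

using System;
using System.Collections.Generic;
using System.Linq;
using System.Windows;   // Point and Vector (WPF); requires a reference to WindowsBase

static class StrokeNormalization
{
    // ToFixedPoints(): resample the stroke to a fixed number of points,
    // evenly spaced along the path, so every input has the same length.
    public static List<Point> ToFixedPoints(IList<Point> stroke, int count = 64)
    {
        var points = new List<Point>(stroke);
        double interval = PathLength(points) / (count - 1);
        double accumulated = 0;
        var resampled = new List<Point> { points[0] };

        for (int i = 1; i < points.Count; i++)
        {
            double segment = (points[i] - points[i - 1]).Length;
            if (accumulated + segment >= interval && segment > 0)
            {
                // Interpolate a new point exactly one interval along the path.
                double t = (interval - accumulated) / segment;
                Point q = points[i - 1] + t * (points[i] - points[i - 1]);
                resampled.Add(q);
                points.Insert(i, q);   // continue measuring from the inserted point
                accumulated = 0;
            }
            else
            {
                accumulated += segment;
            }
        }
        while (resampled.Count < count)            // guard against rounding at the end
            resampled.Add(points[points.Count - 1]);
        return resampled.Take(count).ToList();
    }

    // ScaleToReference(): scale the stroke so its bounding box matches a
    // common reference square, making inputs of different sizes comparable.
    public static List<Point> ScaleToReference(IList<Point> stroke, double referenceSize = 250)
    {
        double width = Math.Max(stroke.Max(p => p.X) - stroke.Min(p => p.X), 1e-6);
        double height = Math.Max(stroke.Max(p => p.Y) - stroke.Min(p => p.Y), 1e-6);
        return stroke
            .Select(p => new Point(p.X * referenceSize / width, p.Y * referenceSize / height))
            .ToList();
    }

    // CenterToOrigin(): translate the stroke so its centroid lies at (0, 0),
    // giving every stroke and template the same reference point.
    public static List<Point> CenterToOrigin(IList<Point> stroke)
    {
        double cx = stroke.Average(p => p.X);
        double cy = stroke.Average(p => p.Y);
        return stroke.Select(p => new Point(p.X - cx, p.Y - cy)).ToList();
    }

    private static double PathLength(IList<Point> points)
    {
        double length = 0;
        for (int i = 1; i < points.Count; i++)
            length += (points[i] - points[i - 1]).Length;
        return length;
    }
}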
Flow Chart
When checking user stability, we need to determine the user's position for n consecutive frames. We then calculate the average position by averaging the positions from the first to the (n − 1)-th frame. By measuring the distance between this average position and the position in the n-th frame and checking whether it exceeds a certain threshold t, we can determine whether the user is stable.
If the distance between the average position and the n-th-frame position is too large, the user is probably moving too much and is therefore not yet ready to use the application. Here we check whether the positions of the left and right shoulder joints are approximately the same, and we must also check that the positions of the gesture points do not change.
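A sketch of this stability test is given below; the window length n, the threshold t, and the use of a single representative joint (such as the shoulder-center joint) are assumed values chosen for illustration.

using System;
using System.Collections.Generic;
using Microsoft.Kinect;

class StabilityChecker
{
    private readonly int windowSize;     // n consecutive frames (assumed value)
    private readonly float threshold;    // t, in meters of skeleton space (assumed value)
    private readonly Queue<SkeletonPoint> positions = new Queue<SkeletonPoint>();

    public StabilityChecker(int windowSize = 30, float threshold = 0.05f)
    {
        this.windowSize = windowSize;
        this.threshold = threshold;
    }

    // Call once per skeleton frame with a representative joint position,
    // e.g. skeleton.Joints[JointType.ShoulderCenter].Position.
    public bool Update(SkeletonPoint current)
    {
        positions.Enqueue(current);
        while (positions.Count > windowSize)
            positions.Dequeue();
        if (positions.Count < windowSize)
            return false;

        // Average the positions of the first n - 1 frames in the window.
        float ax = 0, ay = 0, az = 0;
        int counted = 0;
        foreach (SkeletonPoint p in positions)
        {
            if (counted == windowSize - 1) break;   // exclude the n-th (latest) frame
            ax += p.X; ay += p.Y; az += p.Z;
            counted++;
        }
        ax /= counted; ay /= counted; az /= counted;

        // Distance between the average position and the n-th frame position.
        float dx = current.X - ax, dy = current.Y - ay, dz = current.Z - az;
        double distance = Math.Sqrt(dx * dx + dy * dy + dz * dz);

        // If the distance exceeds t, the user is still moving too much.
        return distance <= threshold;
    }
}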
After validating the initial conditions, we then normalize the sequence of input gesture points to prepare them for template matching.
Technical Architecture
/// Notifies the main application to start the rotate-left animation.
/// Region constructor.