A project report submitted in partial fulfillment of the requirements for the award of the Bachelor of Engineering (Honours) in Electrical and Electronic Engineering.

In addition, the memory occupancy of the deep learning model is about 10 MB, which suits embedded systems with limited memory capacity.
General Introduction
Importance of the Study
Problem Statement
The divergence in electronic communication standards and protocols used by each meter makes it challenging to integrate them into an existing factory control system. In addition, most factory workers are accustomed to dealing with analog meters and have little knowledge of how to use and calibrate the more complex smart meters.
Aim and Objectives
In addition, different production processes require different types of smart meters from different suppliers. Thus, the need for additional worker training and investment becomes an obstacle to progress towards IR 4.0 (Rymarczyk, 2020).
Project Overview
Scope and Limitation of the Study
Contribution of the Study
Outline of the Report
Introduction
Image Recognition Concepts
- General Concept
- Digital Image Capturing
- Methods of Image Recognition
- General Algorithms
This stage consists of a set of algorithms that remove unwanted noise and improve image quality. Localization is then performed to detect and filter the contours or boundaries of the components in the image.
Deep Learning
Introduction
In general, the contours of useful components appear connected or clustered; thus, connected component analysis is performed to identify a region of interest (ROI). The labels act as reference responses for the training algorithm, which compares the predictions of the deep learning model against them.
Convolutional Neural Network (CNN)
The kernel is convolved with the pixels of the input image, returning a feature map that contains the extracted features such as edges, colour, and brightness. In the fully connected layer, each perceptron in one layer is connected to every perceptron in the adjacent layers.
Training Mechanism of CNN
- Forward Propagation
- Backward Propagation
Forward propagation refers to the process of generating a prediction based on the input parameters by passing through each layer in the neural network. In forward propagation, the input image will first be subjected to the convolution operation based on the defined kernel. The output of the convolution is the dot product between the kernel and the region of the input image covered by the kernel, as shown in Figure 2.8.
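The dot-product view of the convolution step can be sketched in a few lines of NumPy. The input array, kernel values, and function name below are illustrative, not taken from the report:

```python
import numpy as np

def convolve2d_valid(image, kernel):
    """Slide the kernel over the image; each output pixel is the
    dot product of the kernel and the region it currently covers."""
    kh, kw = kernel.shape
    oh = image.shape[0] - kh + 1
    ow = image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            region = image[i:i + kh, j:j + kw]
            out[i, j] = np.sum(region * kernel)  # dot product
    return out

image = np.arange(16, dtype=float).reshape(4, 4)
kernel = np.array([[1.0, 0.0], [0.0, -1.0]])  # simple difference kernel
fmap = convolve2d_valid(image, kernel)
print(fmap.shape)  # (3, 3)
```

In practice the convolution is performed by the deep learning framework itself; this loop only makes the per-position dot product explicit.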
Backpropagation is a mechanism for fine-tuning the weights of a neural network based on error values. The error is calculated by comparing the predicted output of the model with the labels, or reference responses. After obtaining the error value, a gradient descent algorithm calculates the change to be applied to the current weights.
The gradient descent algorithm is known as an optimizer in machine learning, as it calculates optimal values for the weights that maximize the accuracy of the model's predictions (Johnson, 2022). Finally, after successive training steps, the difference between the predicted and reference values is minimized and the model is able to make reliable predictions from the input image.
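The update rule can be sketched on a one-weight linear model; the data, learning rate, and iteration count below are illustrative, not from the report:

```python
import numpy as np

# Minimal gradient-descent sketch: fit y = w * x to labelled data.
x = np.array([1.0, 2.0, 3.0])
y_true = np.array([2.0, 4.0, 6.0])   # labels (reference responses)

w = 0.0        # initial weight
lr = 0.05      # learning rate
for _ in range(200):
    y_pred = w * x                    # forward propagation
    error = y_pred - y_true           # prediction vs. labels
    grad = 2 * np.mean(error * x)     # gradient of the mean squared error
    w -= lr * grad                    # weight update step

print(round(w, 3))  # converges towards 2.0
```

A CNN applies the same idea to millions of weights at once, with the gradients computed layer by layer via backpropagation.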
Image Processing Techniques
- OpenCV
- Grayscaling
- Image Filtering
- Thresholding
- Morphological Transformation
- Edge Detection
- Contour Detection
- Orientation Correction
- ROI Extraction
In simple thresholding, a threshold with a constant value is applied to each pixel. Morphological transformation is a set of image processing algorithms based on the shape of the components: each pixel of the input image is compared with its surrounding pixels to determine the value of the corresponding pixel in the output image (Sreedhar, 2012).
The erosion operation sets a pixel of the output image to zero (black) if any of the surrounding pixels under the kernel is zero, whereas the dilation operation sets a pixel of the output image to one (white) if any of the neighbouring pixels within the kernel is one. The operation must be selected depending on the background of the image (white or black).
The mask is an image array of the same size as the original image, with the ROI region in white (255) and the outer region in black (0), as shown in Figure 2.17. The bitwise AND operation returns the pixel value of the original image where the mask pixel is non-zero; otherwise the output pixel is 0 (black), as shown in Equation 2.10.
Character Recognition
Once the region of interest (ROI) has been identified, it must be extracted and cropped to the appropriate size (depending on the application) for further processing, such as character recognition. The AND operation is then performed between the mask and the original image, so the ROI of the original image is preserved and the outer region is replaced with black pixels.
Moreover, it is written and compiled in C and C++, so it can run on different platforms such as Linux, Windows, and macOS (Patel et al., 2012). Additionally, to run the Tesseract engine in Python, a wrapper library known as PyTesseract is required. It wraps the Tesseract engine in a Python class and can handle the various image types supported by the Pillow library, for example jpg and png.
Summary
Introduction
Methodology
Software
- Overview
- Python
- Google Colab
- Spyder IDE
- LabelImg
- Firebase
- Deep Learning Model
- Dataset
- Model Training
- Extraction of ROI
- Image Processing
- Optical Character Recognition
- Uploading Data to Cloud
- Additional Libraries
- Imutils
- Numpy
- Pytesseract
Each image labeled with the labelImg software produced an XML file containing information about the image and the coordinates of the labeling box, as shown in Figure 3.8. All XML files in the dataset were then combined into a single file by converting them to CSV format. In addition, a transfer learning approach was implemented using an object detection model pre-trained on the COCO dataset to save training time, reduce the size of the required dataset, and improve the performance of the neural network by transferring the knowledge obtained from previous training to the current task (Torrey and Shavlik, n.d.).
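The XML-to-CSV conversion can be sketched with the Python standard library. The tag names follow the Pascal VOC format that labelImg emits; the sample annotation string, file name, and class name are illustrative, not taken from the report's dataset:

```python
import csv
import io
import xml.etree.ElementTree as ET

# Illustrative Pascal VOC annotation such as labelImg produces.
sample_xml = """<annotation>
  <filename>meter_001.jpg</filename>
  <object>
    <name>reading</name>
    <bndbox><xmin>10</xmin><ymin>20</ymin><xmax>110</xmax><ymax>60</ymax></bndbox>
  </object>
</annotation>"""

def xml_to_rows(xml_text):
    """Yield one CSV row per labeled bounding box in the XML file."""
    root = ET.fromstring(xml_text)
    fname = root.findtext("filename")
    for obj in root.iter("object"):
        box = obj.find("bndbox")
        yield [fname, obj.findtext("name"),
               box.findtext("xmin"), box.findtext("ymin"),
               box.findtext("xmax"), box.findtext("ymax")]

buf = io.StringIO()
writer = csv.writer(buf)
writer.writerow(["filename", "class", "xmin", "ymin", "xmax", "ymax"])
for row in xml_to_rows(sample_xml):
    writer.writerow(row)
print(buf.getvalue().splitlines()[1])  # meter_001.jpg,reading,10,20,110,60
```

Looping this over every XML file in the dataset folder produces the single combined CSV used for training.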
Next, the positions of the vertices of the ROI were obtained using the NumPy min and max functions. Image processing is intended to remove noise and improve the quality of the extracted ROI. According to the documentation (Tesseract, n.d.), the Tesseract OCR engine prefers black-and-white characters, and the characters should be scaled to a resolution of at least 300 dpi.
Running the Tesseract engine in Python requires a wrapper library known as PyTesseract, which wraps the Tesseract engine in a Python class. Next, the page segmentation mode was set depending on the orientation of the characters (Tesseract, n.d.); in this case, psm 7 was used, meaning that the characters were treated as a single line of text.
Hardware
Configuration
It is the main image manipulation library for Python, providing features such as opening, enhancing, converting, and exporting images. The Tesseract character recognition engine only supports images in PIL format, so this library is needed to convert the image array from OpenCV format. Regular expression (re) is a module that allows the user to define rules or patterns for strings.
It is primarily used to search, split, or replace specific characters in a string by referring to the defined patterns. It enables running the Tesseract engine inside a Python IDE using a Python codebase and simplifies the developer's job.
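A pattern-based cleanup of the OCR output can be sketched with the `re` module. The raw string below, with a stray letter and unit text around the digits, is a hypothetical example, not an actual reading from the project:

```python
import re

# Hypothetical raw OCR output with stray characters around the digits.
raw = " 0O12 3.4 kWh\n"

# Keep only runs of digits and decimal points, discarding letters,
# whitespace, and unit text.
digits = re.findall(r"[\d.]+", raw)
reading = "".join(digits)
print(reading)  # 0123.4
```

In the actual pipeline the cleaned string would then be parsed as the numeric meter reading before upload.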
Raspberry Pi 4 Model B
- Operating System
It is a Debian-based OS developed by the Raspberry Pi Foundation and highly optimized for ARM CPUs. The OS has a built-in package manager offering various applications such as a web browser and a document editor. The OS image is flashed to an SD card from a Windows PC and then installed on the microcomputer.
Camera Module
Cost of Components
Planning and Milestones
Project Milestones
Project Schedule and Gantt Chart
Milestone  Activity  Description                              Duration
M6         A11       Integrate the software with hardware     4
M6         A12       On-site testing and results collection   5
-          A13       Documentation and final report writing   3
Summary
Introduction
Software Simulation
- Deep Learning Model
- Image Processing
- Character Recognition
- Setup
- Installation on Meter
Then the meter image was retrieved and a detection box was drawn on it using the viz_utils.visualize_boxes_and_labels_on_image_array function, as shown in Figure 4.3. It can be observed that the outline of the detection box was extracted in white (pixel value 255), while the other area was rendered black (pixel value 0). The extracted ROI then underwent image processing to remove noise and improve its quality.
First, the ROI was enlarged using the OpenCV resize function with the INTER_CUBIC interpolation method, as shown in Figure 4.10. Finally, optical character recognition was performed on the processed ROI using the Tesseract OCR engine. The numeric characters in the ROI were digitized and converted to the string data type, as shown in Figure 4.13.
Next, the microcomputer with the camera module was fixed in a plastic holder with a cutout, as shown in Figure 4.14. Additionally, the hardware was powered by a power bank to make it portable, as illustrated in Figure 4.15.
Cloud and Real-time Database
Accuracy Evaluation
From the evaluation process, it is noted that the accuracy of meter reading region detection by the deep learning model is affected by the size and resolution of the meter images. When the meter surface is small, occupying less than 30% of the entire image, the deep learning model has a higher chance of detecting a false area. It is therefore recommended to place the camera close to, and centered on, the face of the meter.
In addition, the sharpness and quality of the captured image are affected by vibrations caused by wind or vehicle movement. When the vibration occurs at the camera module, the photo capture will be blurry and the performance and accuracy of the optical character recognition will be greatly reduced. To overcome this problem, it is suggested to install the camera module on a stable and durable platform.
Summary
Conclusion