A project report submitted in partial fulfillment of the requirements for the award of the Bachelor of Engineering (Honours) in Electrical and Electronic Engineering.

In addition, the memory occupancy of the deep learning model is about 10 MB, which suits embedded systems with limited memory capacity.
General Introduction
Importance of the Study
Problem Statement
The divergence in electronic communication standards and protocols used by each meter makes it challenging to integrate them into an existing factory control system. In addition, most factory workers are accustomed to dealing with analog meters and have little knowledge of how to use and calibrate the more complex smart meters.
Aim and Objectives
In addition, different production processes require different types of smart meters from different suppliers. Thus, the need for additional worker training and investment becomes an obstacle to progress towards IR 4.0 (Rymarczyk, 2020).
Project Overview
Scope and Limitation of the Study
Contribution of the Study
Outline of the Report
Introduction
Image Recognition Concepts
- General Concept
- Digital Image Capturing
- Methods of Image Recognition
- General Algorithms
This stage consists of a set of algorithms that remove unwanted noise and improve image quality. Localization is then performed to detect and filter the contours or boundaries of the components in the image.
Deep Learning
Introduction
In general, the contours of useful components appear connected or clustered; thus, connected component analysis is performed to identify a region of interest (ROI). The labels act as reference responses for the training algorithm, which compares the predictions of the deep learning model against them.
Convolutional Neural Network (CNN)
The kernel is convolved with the pixels of the input image, returning a feature map that contains the extracted features such as edges, colour, and brightness. In the fully connected layer, each perceptron in one layer is connected to every perceptron in the adjacent layers.
Training Mechanism of CNN
- Forward Propagation
- Backward Propagation
Forward propagation refers to the process of generating a prediction based on the input parameters by passing through each layer in the neural network. In forward propagation, the input image will first be subjected to the convolution operation based on the defined kernel. The output of the convolution is the dot product between the kernel and the region of the input image covered by the kernel, as shown in Figure 2.8.
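The dot-product view of the convolution step can be sketched in a few lines of NumPy. The input array, kernel values, and function name below are illustrative, not taken from the report:

```python
import numpy as np

def convolve2d_valid(image, kernel):
    """Slide the kernel over the image; each output pixel is the
    dot product of the kernel and the region it currently covers."""
    kh, kw = kernel.shape
    oh = image.shape[0] - kh + 1
    ow = image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            region = image[i:i + kh, j:j + kw]
            out[i, j] = np.sum(region * kernel)  # dot product
    return out

image = np.arange(16, dtype=float).reshape(4, 4)
kernel = np.array([[1.0, 0.0], [0.0, -1.0]])  # simple difference kernel
fmap = convolve2d_valid(image, kernel)
print(fmap.shape)  # (3, 3)
```

In practice the convolution is performed by the deep learning framework itself; this loop only makes the per-position dot product explicit.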
Backpropagation is a mechanism for fine-tuning the weights of a neural network based on error values. The error is calculated by comparing the predicted output of the model with the labels, or reference responses. After obtaining the error value, a gradient descent algorithm calculates the change to be applied to the current weights.
The gradient descent algorithm is known as an optimizer in machine learning, as it calculates optimal values for the weights that maximize the accuracy of the model's predictions (Johnson, 2022). Finally, after successive training steps, the difference between the predicted and reference values is minimized and the model is able to make reliable predictions from the input image.
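The update rule can be sketched on a one-weight linear model; the data, learning rate, and iteration count below are illustrative, not from the report:

```python
import numpy as np

# Minimal gradient-descent sketch: fit y = w * x to labelled data.
x = np.array([1.0, 2.0, 3.0])
y_true = np.array([2.0, 4.0, 6.0])   # labels (reference responses)

w = 0.0        # initial weight
lr = 0.05      # learning rate
for _ in range(200):
    y_pred = w * x                    # forward propagation
    error = y_pred - y_true           # prediction vs. labels
    grad = 2 * np.mean(error * x)     # gradient of the mean squared error
    w -= lr * grad                    # weight update step

print(round(w, 3))  # converges towards 2.0
```

A CNN applies the same idea to millions of weights at once, with the gradients computed layer by layer via backpropagation.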
Image Processing Techniques
- OpenCV
- Grayscaling
- Image Filtering
- Thresholding
- Morphological Transformation
- Edge Detection
- Contour Detection
- Orientation Correction
- ROI Extraction
In simple thresholding, a threshold with a constant value is applied to each pixel. Morphological transformation is a set of image processing algorithms based on the shape of the components: each pixel of the input image is compared with its surrounding pixels to determine the value of the corresponding pixel in the output image (Sreedhar, 2012).
The erosion operation sets a pixel of the output image to zero (black) if any of the surrounding pixels under the kernel is zero, whereas the dilation operation sets a pixel of the output image to one (white) if any of the neighbouring pixels within the kernel is one. The operation must be selected depending on the background of the image (white or black).
The mask is an image array of the same size as the original image, with the ROI region in white (255) and the outer region in black (0), as shown in Figure 2.17. The bitwise AND operation returns the pixel value of the original image where the mask pixel is non-zero; otherwise the output pixel is 0 (black), as shown in Equation 2.10.
Character Recognition
Once the region of interest (ROI) has been identified, it must be extracted and cropped to the appropriate size (depending on the application) for further processing, such as character recognition. The AND operation is then performed between the mask and the original image, so the ROI of the original image is preserved and the outer region is replaced with black pixels.
Moreover, it is written and compiled in C and C++, so it can run on different platforms such as Linux, Windows, and macOS (Patel et al., 2012). Additionally, to run the Tesseract engine in Python, a wrapper library known as PyTesseract is required. It wraps the Tesseract engine in a Python class and can handle the various image types supported by the Pillow library, for example jpg and png.
Summary
Introduction
Methodology
Software
- Overview
- Python
- Google Colab
- Spyder IDE
- LabelImg
- Firebase
- Deep Learning Model
- Dataset
- Model Training
- Extraction of ROI
- Image Processing
- Optical Character Recognition
- Uploading Data to Cloud
- Additional Libraries
- Imutils
- Numpy
- Pytesseract
Each image labeled with the labelImg software produced an XML file containing information about the image and the coordinates of the labeling box, as shown in Figure 3.8. All XML files in the dataset were then combined into a single file by converting them to CSV format. In addition, a transfer learning approach was implemented using an object detection model pre-trained on the COCO dataset to save training time, reduce the size of the required dataset, and improve the performance of the neural network by transferring the knowledge obtained from previous training to the current task (Torrey and Shavlik, n.d.).
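The XML-to-CSV conversion can be sketched with the Python standard library. The tag names follow the Pascal VOC format that labelImg emits; the sample annotation string, file name, and class name are illustrative, not taken from the report's dataset:

```python
import csv
import io
import xml.etree.ElementTree as ET

# Illustrative Pascal VOC annotation such as labelImg produces.
sample_xml = """<annotation>
  <filename>meter_001.jpg</filename>
  <object>
    <name>reading</name>
    <bndbox><xmin>10</xmin><ymin>20</ymin><xmax>110</xmax><ymax>60</ymax></bndbox>
  </object>
</annotation>"""

def xml_to_rows(xml_text):
    """Yield one CSV row per labeled bounding box in the XML file."""
    root = ET.fromstring(xml_text)
    fname = root.findtext("filename")
    for obj in root.iter("object"):
        box = obj.find("bndbox")
        yield [fname, obj.findtext("name"),
               box.findtext("xmin"), box.findtext("ymin"),
               box.findtext("xmax"), box.findtext("ymax")]

buf = io.StringIO()
writer = csv.writer(buf)
writer.writerow(["filename", "class", "xmin", "ymin", "xmax", "ymax"])
for row in xml_to_rows(sample_xml):
    writer.writerow(row)
print(buf.getvalue().splitlines()[1])  # meter_001.jpg,reading,10,20,110,60
```

Looping this over every XML file in the dataset folder produces the single combined CSV used for training.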
Next, the positions of the vertices of the ROI were obtained using the NumPy min and max functions. Image processing is intended to remove noise and improve the quality of the extracted ROI. According to the documentation (Tesseract, n.d.), the Tesseract OCR engine prefers black-and-white characters, and the characters should be scaled to a resolution of at least 300 dpi.
Running the Tesseract engine in Python requires a wrapper library known as PyTesseract, which wraps the Tesseract engine in a Python class. Next, the page segmentation mode was set depending on the orientation of the characters (Tesseract, n.d.); in this case, psm 7 was used, meaning that the characters were treated as a single line of text.
Hardware
Configuration
It is the main image manipulation library for Python, providing features such as opening, enhancing, converting, and exporting images. The Tesseract character recognition engine only supports images in PIL format, so this library is needed to convert the image array from OpenCV format. Regular expression (re) is a module that allows the user to define rules or patterns for strings.
It is primarily used to search, split, or replace specific characters in a string by referring to the defined patterns. It enables running the Tesseract engine inside a Python IDE using a Python codebase and simplifies the developer's job.
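A pattern-based cleanup of the OCR output can be sketched with the `re` module. The raw string below, with a stray letter and unit text around the digits, is a hypothetical example, not an actual reading from the project:

```python
import re

# Hypothetical raw OCR output with stray characters around the digits.
raw = " 0O12 3.4 kWh\n"

# Keep only runs of digits and decimal points, discarding letters,
# whitespace, and unit text.
digits = re.findall(r"[\d.]+", raw)
reading = "".join(digits)
print(reading)  # 0123.4
```

In the actual pipeline the cleaned string would then be parsed as the numeric meter reading before upload.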
Raspberry Pi 4 Model B
- Operating System
It is a Debian-based OS developed by the Raspberry Pi Foundation and highly optimized for ARM CPUs. The OS has a built-in package manager offering various applications such as a web browser and a document editor. The OS image is flashed to an SD card from a Windows PC and then installed on the microcomputer.
Camera Module
Cost of Components
Planning and Milestones
Project Milestones
Project Schedule and Gantt Chart
Milestone  Activity  Description                              Duration
M6         A11       Integrate the software with hardware     4
M6         A12       On-site testing and results collection   5
-          A13       Documentation and final report writing   3
Summary
Introduction
Software Simulation
- Deep Learning Model
- Image Processing
- Character Recognition
- Setup
- Installation on Meter
Then the meter image was retrieved and a detection box was drawn on it using the viz_utils.visualize_boxes_and_labels_on_image_array function, as shown in Figure 4.3. It can be observed that the outline of the detection box was extracted in white (pixel value 255), while the other area was rendered black (pixel value 0). The extracted ROI then underwent image processing to remove noise and improve its quality.
First, the ROI was enlarged using the OpenCV resize function with the INTER_CUBIC interpolation method, as shown in Figure 4.10. Finally, optical character recognition was performed on the processed ROI using the Tesseract OCR engine. The numeric characters in the ROI were digitized and converted to the string data type, as shown in Figure 4.13.
Next, the microcomputer with the camera module was fixed in a plastic holder with a cutout, as shown in Figure 4.14. Additionally, the hardware was powered by a power bank to make it portable, as illustrated in Figure 4.15.
Cloud and Real-time Database
Accuracy Evaluation
From the evaluation process, it is noted that the accuracy of meter reading region detection by the deep learning model is affected by the size and resolution of the meter images. When the meter surface is small, occupying less than 30% of the entire image, the deep learning model has a higher chance of detecting a false area. It is therefore recommended to place the camera close to, and centered on, the face of the meter.
In addition, the sharpness and quality of the captured image are affected by vibrations caused by wind or vehicle movement. When the vibration occurs at the camera module, the photo capture will be blurry and the performance and accuracy of the optical character recognition will be greatly reduced. To overcome this problem, it is suggested to install the camera module on a stable and durable platform.
Summary
Conclusion