APPLICATION DEVELOPMENT FOR PRODUCT RECOGNITION ON-SHELF WITH DEEP LEARNING

General Introduction

Background of Problem

According to Ehrenthal (2012), retailers spend an average of 37% of their expenses on inventory management and 7% on inventory. Maintaining on-shelf availability (OSA) and lowering out-of-stock (OOS) rates help improve inventory management in retail stores, so both are indicators of a store's performance. The main goal of this project is to develop an application that acts as a platform for retail stores to manage their stock and shelves.

Problem Statement

Negligence of empty shelves

Three advanced deep learning models are compared in terms of recognizing products on the shelf. In addition, a notification system has been implemented to inform users when a shelf is running out of stock. A proper inventory management system with visual monitoring can help retailers minimize the losses caused by neglected empty shelves in brick-and-mortar stores.

High human intervention

Project Objectives

Project Solution

When an empty shelf is detected, the status of the product changes to OOS, and a notification is sent to the retailer to restock immediately.
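The status change can be sketched as a small state update. This is a minimal, hypothetical sketch rather than the application's code: it assumes each processed frame yields a list of detected class names, that the empty-space class is named "empty" (the label used for the YOLOv5 empty shelf model), and the function name `update_shelf_status` and the "AVAILABLE" status are illustrative. A notification is requested only on the transition into OOS, so the retailer is not sent a message for every frame while the shelf stays empty.

```python
from typing import Iterable, Tuple

# Hypothetical status labels for a shelf.
AVAILABLE = "AVAILABLE"
OOS = "OOS"


def update_shelf_status(previous: str, detected_labels: Iterable[str],
                        empty_label: str = "empty") -> Tuple[str, bool]:
    """Return (new_status, should_notify) for one processed frame.

    The shelf is marked OOS as soon as any detection carries the empty-space
    label. A notification is requested only when the status changes into OOS,
    not on every frame while the shelf remains empty.
    """
    new_status = OOS if empty_label in set(detected_labels) else AVAILABLE
    should_notify = new_status == OOS and previous != OOS
    return new_status, should_notify


if __name__ == "__main__":
    status = AVAILABLE
    # Illustrative per-frame detections.
    frames = [["bulb"], ["bulb", "empty"], ["empty"], ["bulb"]]
    for labels in frames:
        status, notify = update_shelf_status(status, labels)
        print(status, notify)
```

Only the second frame triggers a notification; the third keeps the OOS status silently, and the fourth returns the shelf to AVAILABLE once no empty space is detected.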

Project Approach

The next phase of the model is the planning phase, in which the set of tasks is written out from the document that lists all the requirements. Each iteration involves designing, implementing and testing the system, and takes 1 to 3 weeks to produce a release version of the software. When no open bugs remain in the system, the developer enters the retrospective phase, which marks the end of the iteration.

Figure 1.2: PXP process phases (Marthasari, Suharso & Ardiansyah, 2018).

Project Scope

Target User

System Scope

Introduction

Object Detection and Recognition

Object detection identifies instances of visual objects of certain classes (e.g. humans, animals, products, cars, buildings) in a photo or video frame. Penelope (2013) proposed that object detection shows the location of an object in an image, while object recognition labels it. Pawangfg (2020) argued that object detection performs both localization and classification, whereas object recognition aims at a broader understanding of the image.

Deep Transfer Learning

Hussain, Bird, and Faria (2018) focus their study on transfer learning with CNNs, especially for classification. With continuous progress in deep transfer learning research, it is becoming increasingly possible to build an accurate model from a small dataset. Deep transfer learning is expected to be useful in practical applications such as biometric passwords and facial recognition on any device, anywhere in the world.

Deep Learning in Product Recognition

Deep learning has become a popular solution for better product management and for predicting customer preferences in unmanned stores.

Two-Stage Algorithm

CNN, R-CNN and Fast R-CNN

Faster R-CNN

One-Stage Algorithm

YOLO

According to Redmon and Farhadi (2018), YOLOv3 replaces Darknet19 with Darknet53 as the feature extraction backbone, and YOLOv3 has been shown to be as accurate as SSD (Single Shot Detector) while running three times faster. In YOLOv4, Darknet53 is in turn replaced by CSPDarknet53, which improves both speed and accuracy.

Evaluation Metrics

Accuracy
Precision
Recall
Intersection Over Union (IoU)
Precision-Recall (PR) Curve
Mean Average Precision (mAP)

The goal is to classify all positive data points as positive without misclassifying negative data points as positive. Precision therefore reflects how reliable the model is when it classifies a data point as positive (Gad, 2020). A false negative, on the other hand, is a case where a ground-truth object is present but the model fails to detect it.
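The standard formulas behind precision, recall and accuracy can be sketched from confusion-matrix counts. The function name and the example counts are illustrative. In object detection a true negative is usually not well defined (there is no countable "correctly detected background"), so `tn` defaults to 0 here, which is one reason detectors are compared by mAP rather than accuracy.

```python
def classification_metrics(tp: int, fp: int, fn: int, tn: int = 0) -> dict:
    """Compute precision, recall and accuracy from confusion-matrix counts.

    precision = TP / (TP + FP): how reliable a positive prediction is.
    recall    = TP / (TP + FN): how many ground-truth objects were found.
    accuracy  = (TP + TN) / (TP + TN + FP + FN).
    A zero denominator yields 0.0 instead of raising ZeroDivisionError.
    """
    def safe_div(num: float, den: float) -> float:
        return num / den if den else 0.0

    return {
        "precision": safe_div(tp, tp + fp),
        "recall": safe_div(tp, tp + fn),
        "accuracy": safe_div(tp + tn, tp + tn + fp + fn),
    }


if __name__ == "__main__":
    # 8 correct detections, 2 spurious boxes, 4 missed products.
    print(classification_metrics(tp=8, fp=2, fn=4))
```

With these counts precision is 0.8 while recall is only about 0.67: the model is reliable when it does fire, but it misses a third of the products.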

Figure 2.2: Formula of IoU (Rath, 2020).
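The IoU formula in Figure 2.2 (area of overlap divided by area of union) can be sketched for two axis-aligned boxes in corner format. The function and type names are illustrative. A predicted box is commonly counted as a true positive when its IoU with a ground-truth box reaches a threshold such as 0.5, the threshold behind mAP@0.5.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def iou(box_a: Box, box_b: Box) -> float:
    """Intersection over Union of two axis-aligned boxes in corner format."""
    # Corners of the overlapping rectangle (its width/height may be negative).
    ix_min = max(box_a[0], box_b[0])
    iy_min = max(box_a[1], box_b[1])
    ix_max = min(box_a[2], box_b[2])
    iy_max = min(box_a[3], box_b[3])
    # Clamp at zero so disjoint boxes have no intersection area.
    inter = max(0.0, ix_max - ix_min) * max(0.0, iy_max - iy_min)

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


if __name__ == "__main__":
    # Two 2x2 boxes overlapping in a 1x1 square: IoU = 1 / 7.
    print(iou((0, 0, 2, 2), (1, 1, 3, 3)))
```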

System Development Methodology

Traditional Software Methodologies

Traditional software models used in the SDLC include the waterfall, incremental and spiral models. The waterfall model is so named because its phases are carried out one after another, flowing downstream like a waterfall (Malleswari, Kumar, Sathvika & Kumar, 2018). This model is suitable for understaffed situations, but it can also lead to scope creep in the SDLC (Subair, 2014).

Agile Software Methodology

PXP is often chosen as the SDLC model because it increases software quality while empowering autonomous developers. A study by Asri et al. (2017) showed that using PXP in software projects can help develop high-quality software in less time. As described by Agarwal and Umphress (2008), PXP is a customized alternative to XP for a one-person team, obtained by modifying the 12 basic principles of XP.

Conclusion

Unlike XP, in which the team follows a shared coding standard, PXP allows autonomous developers to write their source code using their own coding standards.

Comparison table of similar works and applications studied

This approach requires a large number of product proposals and intensive work to keep the dataset updated with seasonal products. When there are many classes, CNNs using a hierarchical approach classify them better.

SSIM: 85%

Conclusion
Introduction
Software Development Methodology

Requirements
Planning
Iteration Initialisation
Design
Implementation
System Testing
Retrospective

Development Tools and IDE

Visual Studio Code
ImageAI
PyTorch
Google Colab
MakeSense
GitHub
CUDA and cuDNN

Flowchart
Project Planning and Scheduling
Work Breakdown Structure
Application Development for Product Recognition On-Shelf
1.0 Planning
System Analysis and Design
2.1 Design Use Case Diagram
Introduction
Functional Requirements
Non-Functional Requirements

The proposed method is consumer-friendly, requiring only an RGB-D depth-sensor camera placed on top of the shelf to detect OSA. A fisheye camera mounted on top of the shelf enables product counting, but it requires a very specific camera angle, since the camera must be placed on top of the shelf.

This method is also limited to detecting OOS conditions and cannot identify products. … (2019) used a fisheye camera on top of the shelf to detect OOS conditions and OSA. This chapter illustrates each stage of the application development methodology, Personal Extreme Programming (PXP). In this project, functional requirements are collected from the lighting store employees, as they are the stakeholders of the software.

The second iteration builds on the first to perform object detection, classification and localization. In this project, ImageAI is used to support YOLOv3, one of the deep learning algorithms benchmarked in this project. As one of the most popular platforms for deep learning research, PyTorch is widely used in data science and AI.

The flowchart for the training of the deep learning model in this project is illustrated below.

Figure 3.1: Flowchart of Training Deep Learning Model.

Usability

The quality of a software system can be judged against its non-functional requirements.

Performance

Security

Compatibility

Reliability

Scalability

Supportability

Use Case Diagram
Use Case Description
Initial Prototype
Introduction
Comparison of Deep Learning Models

Data Collection
Data Preprocessing and Annotation
Data Splitting
Model Building and Fine Tuning
Model Deployment
Error Calculation and Plotting
YOLOv3 and YOLOv4 Results

Brief Description: This use case describes how users can create a user profile in the application. The system verifies the user account against the data stored in the system before generating a new account. For login, the system verifies the user account against the stored data before registering the user in a session.
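The verification step in the sign-up and login descriptions can be sketched as follows. This is a hypothetical sketch, not the application's implementation: an in-memory dictionary stands in for the system database, the function names are illustrative, and the PBKDF2 iteration count is an assumed work factor. Passwords are stored only as salted hashes, and login compares hashes in constant time.

```python
import hashlib
import hmac
import secrets
from typing import Dict, Tuple

# Hypothetical in-memory store: username -> (salt, password hash).
# A real deployment would use the application's database instead.
_USERS: Dict[str, Tuple[bytes, bytes]] = {}
_ITERATIONS = 100_000  # assumed PBKDF2 work factor


def _hash_password(password: str, salt: bytes) -> bytes:
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, _ITERATIONS)


def sign_up(username: str, password: str) -> bool:
    """Create an account unless the username already exists in the store."""
    if username in _USERS:
        return False  # verification against stored data failed
    salt = secrets.token_bytes(16)
    _USERS[username] = (salt, _hash_password(password, salt))
    return True


def log_in(username: str, password: str) -> bool:
    """Check the credentials against the stored salted hash."""
    record = _USERS.get(username)
    if record is None:
        return False
    salt, stored = record
    # Constant-time comparison avoids leaking timing information.
    return hmac.compare_digest(stored, _hash_password(password, salt))
```

A duplicate username makes `sign_up` return False, mirroring the check against stored data before a new account is generated.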

User – View the real-time video stream from the video surveillance tool associated with the application.

An initial project prototype has been developed to prove the feasibility of the object detection application. This chapter illustrates the comparison of the deep learning models to determine the best one for implementation, as well as the project's system architecture and database design.

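All annotation files are saved in YOLO format, as shown in the image below. Each line in a label file describes one object: the first value is the class index, and the remaining four values give the bounding box's center x and y coordinates, width and height, each normalized to the image dimensions. Objects that were not labeled for training are not recognized; for example, the box of tissues in the lower-right corner of the video is not recognized as one of the trained products.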
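Reading a YOLO label back into pixel coordinates can be sketched as below, for example to draw the annotated boxes over an image and check the labels. The function names are illustrative; the only assumption about the files is the standard YOLO line layout `class_id x_center y_center width height` with box values normalized to [0, 1].

```python
from typing import List, Tuple

Label = Tuple[int, float, float, float, float]  # class_id, x_min, y_min, x_max, y_max


def yolo_line_to_pixels(line: str, img_w: int, img_h: int) -> Label:
    """Convert one YOLO label line to a class id and pixel corner coordinates."""
    parts = line.split()
    if len(parts) != 5:
        raise ValueError(f"expected 5 fields, got {len(parts)}: {line!r}")
    class_id = int(parts[0])
    xc, yc, w, h = (float(v) for v in parts[1:])
    # Undo the normalization and move from center/size to corners.
    x_min = (xc - w / 2) * img_w
    y_min = (yc - h / 2) * img_h
    x_max = (xc + w / 2) * img_w
    y_max = (yc + h / 2) * img_h
    return class_id, x_min, y_min, x_max, y_max


def read_label_file(text: str, img_w: int, img_h: int) -> List[Label]:
    """Parse every non-blank line of a YOLO .txt label file."""
    return [yolo_line_to_pixels(ln, img_w, img_h) for ln in text.splitlines() if ln.strip()]


if __name__ == "__main__":
    # A box centered in a 640x480 image, a quarter of its width and half its height.
    print(yolo_line_to_pixels("0 0.5 0.5 0.25 0.5", 640, 480))
```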

For example, in the result below, the Philips 10W bulb is misrecognized as the Philips 9W bulb because of their similar packaging.

Figure 4.1: Use Case Diagram of the Application.

Overall 0.886

Metrics of YOLOv3 Training

Overall 0.988

Metrics of YOLOv4 First Training

YOLOv5 Results

Metrics of YOLOv4 Second Training

Metrics of YOLOv5 Training

Conclusion

YOLOv4 achieved a higher mAP than YOLOv3 but required a longer training time. The YOLOv5 result is also promising: YOLOv5 is able to recognize and distinguish between similar products in a frame.
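mAP, the metric on which the models are compared, can be sketched as the mean over classes of the area under each class's precision-recall curve. This sketch uses generic all-point interpolation (the PASCAL VOC style) and illustrative function names; the YOLO training scripts may use a different interpolation scheme, so its output is not expected to reproduce the reported values exactly.

```python
from typing import Sequence


def average_precision(recalls: Sequence[float], precisions: Sequence[float]) -> float:
    """All-point interpolated AP: area under the precision-recall curve.

    `recalls` and `precisions` are cumulative values obtained by walking the
    detections in order of decreasing confidence, so recall is non-decreasing.
    """
    # Sentinel points at recall 0 and recall 1.
    r = [0.0, *recalls, 1.0]
    p = [0.0, *precisions, 0.0]
    # Make precision monotonically non-increasing (the PR curve envelope).
    for i in range(len(p) - 2, -1, -1):
        p[i] = max(p[i], p[i + 1])
    # Sum rectangle areas; steps where recall does not change contribute 0.
    return sum((r[i + 1] - r[i]) * p[i + 1] for i in range(len(r) - 1))


def mean_average_precision(ap_per_class: Sequence[float]) -> float:
    """mAP is the mean of the per-class AP values."""
    return sum(ap_per_class) / len(ap_per_class) if ap_per_class else 0.0


if __name__ == "__main__":
    # Half the objects found at precision 1.0, all of them at precision 0.5.
    ap = average_precision([0.5, 1.0], [1.0, 0.5])
    print(ap, mean_average_precision([ap, 1.0]))
```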

Empty Shelf Detection Deep Learning Model

Data Collection

Test 1 – Dataset from SKU
Test 2 – Cogoo Lighting Empty Shelf Dataset

Data Preprocessing and Annotation
Data Splitting
Model Building and Fine Tuning
Model Deployment
Error Calculation and Plotting

As the next step, this project applies YOLOv5 to detecting empty shelves. To address the low accuracy obtained with the SKU-110K dataset, 92 still images of an empty shelf were taken at Cogoo Lighting Sdn. After the reflection problem was solved, the empty-shelf photos were taken with an Apple iPhone X mounted on a phone stand about 0.7 meters away from the shelf.

Only one class is required for training in YOLOv5: empty space on the shelf, labeled "empty". Of the 60 labeled images, 50 are used for training and 10 for validation. Google Colab again allocated an NVIDIA® T4 Tensor Core GPU for the training, as shown in the figure below.
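A reproducible 50/10 split of the 60 labeled images can be sketched as below. The report does not state how the training images were chosen, so the random shuffle with a fixed seed, the function name and the file names are assumptions. In a YOLO dataset each image's label .txt file must be moved together with its image.

```python
import random
from typing import List, Sequence, Tuple


def split_dataset(items: Sequence[str], n_train: int,
                  seed: int = 42) -> Tuple[List[str], List[str]]:
    """Shuffle items reproducibly and split them into train and validation lists."""
    if not 0 <= n_train <= len(items):
        raise ValueError("n_train must be between 0 and len(items)")
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # local RNG keeps the split reproducible
    return shuffled[:n_train], shuffled[n_train:]


if __name__ == "__main__":
    # Hypothetical file names for the 60 labeled empty-shelf images.
    images = [f"shelf_{i:03d}.jpg" for i in range(60)]
    train, val = split_dataset(images, n_train=50)
    print(len(train), len(val))
```

Using a dedicated `random.Random(seed)` instead of the global generator means rerunning the notebook yields the same split, so results from different training runs remain comparable.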

For YOLOv5, the training results can be tracked and visualized with Weights & Biases (WandB) using the API key from my account. After the configuration file and training images are uploaded and mounted in the Jupyter notebook, training can be started. The model showed promising results in empty shelf detection, even with occluding objects or a distracting black background above the products (which should not be marked as empty shelf).
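The configuration file and the training command can be sketched as follows. The dataset paths, the file name `empty_shelf.yaml`, the 640-pixel image size and the `yolov5s.pt` starting weights are assumptions; the epoch count (120) and batch size (64) are the values the conclusion reports as giving the best result. The YAML keys (`train`, `val`, `nc`, `names`) follow the YOLOv5 dataset format, and the flags are those of YOLOv5's `train.py`.

```python
from typing import List


def make_data_yaml(train_dir: str, val_dir: str, class_names: List[str]) -> str:
    """Build a YOLOv5-style dataset config: train/val paths, class count and names."""
    names = ", ".join(f"'{n}'" for n in class_names)
    return (
        f"train: {train_dir}\n"
        f"val: {val_dir}\n"
        f"nc: {len(class_names)}\n"
        f"names: [{names}]\n"
    )


def make_train_args(data_yaml: str, epochs: int, batch_size: int,
                    img_size: int = 640, weights: str = "yolov5s.pt") -> List[str]:
    """Command-line arguments for YOLOv5's train.py, run from the repository root."""
    return ["python", "train.py",
            "--img", str(img_size),
            "--batch-size", str(batch_size),
            "--epochs", str(epochs),
            "--data", data_yaml,
            "--weights", weights]


if __name__ == "__main__":
    # Hypothetical paths for the single-class empty shelf dataset.
    print(make_data_yaml("empty_shelf/images/train", "empty_shelf/images/val", ["empty"]))
    print(" ".join(make_train_args("empty_shelf.yaml", epochs=120, batch_size=64)))
```

In a notebook, the YAML text would be written to `empty_shelf.yaml` and the argument list passed to `subprocess.run` (or the printed command run in a shell cell).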

As shown in the result below, the model is able to detect the empty shelf even with a human body close to the shelf.

Figure 5.28: Comparison of related benchmarks for SKU-110K.

Metrics of YOLOv5 Empty Shelf Training

Conclusion

In conclusion, the last experiment, with 120 epochs and a batch size of 64, gave the best result for training the empty shelf detector. The best model is saved as "best.pt" and will be used in the empty shelf detection application in the next stage.

System Architecture Design

System Database Design

Introduction

Implementation of Sign Up Function

Implementation of Login Function

Implementation of View Real-Time Video Function

Implementation of Stop Camera Function

Implementation of Detect OOS Situation Function

To confirm that the video surveillance tool is connected to the application, the user receives a Telegram message notifying them that the camera is on. A timestamp is printed on the console at every interval to show that the application is still running. This function also sends a notification when a shelf with empty space is detected.

In addition to the console showing the location of the detected empty space, the user receives a message that reads "Empty shelf." An image of the empty shelf, with a bounding box marking the empty space, is also sent to Telegram as evidence of the OOS situation.
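The Telegram notifications can be sketched with the Telegram Bot API's `sendMessage` method using only the Python standard library. The bot token, chat ID, heartbeat wording and function names are hypothetical. The image evidence would be sent with the Bot API's `sendPhoto` method as a multipart upload, which is omitted here to keep the sketch short.

```python
import json
import urllib.parse
import urllib.request
from datetime import datetime
from typing import Optional, Tuple

API_BASE = "https://api.telegram.org"


def build_send_message(token: str, chat_id: str, text: str) -> Tuple[str, bytes]:
    """Return the URL and form-encoded body for the Bot API sendMessage method."""
    url = f"{API_BASE}/bot{token}/sendMessage"
    body = urllib.parse.urlencode({"chat_id": chat_id, "text": text}).encode("utf-8")
    return url, body


def send_message(token: str, chat_id: str, text: str, timeout: float = 10.0) -> bool:
    """POST the message to Telegram and return the 'ok' field of the JSON reply."""
    url, body = build_send_message(token, chat_id, text)
    # Passing data makes urlopen issue a form-encoded POST request.
    with urllib.request.urlopen(url, data=body, timeout=timeout) as resp:
        return bool(json.load(resp).get("ok"))


def log_heartbeat(now: Optional[datetime] = None) -> str:
    """Print the console line shown at every interval while monitoring."""
    ts = (now or datetime.now()).strftime("%Y-%m-%d %H:%M:%S")
    line = f"[{ts}] monitoring shelf"
    print(line)
    return line


if __name__ == "__main__":
    url, body = build_send_message("<BOT_TOKEN>", "<CHAT_ID>", "Empty shelf.")
    print(url, body)
    log_heartbeat()
```

Separating `build_send_message` from `send_message` keeps the request format checkable without network access; only `send_message` actually contacts Telegram.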

Figure 6.12: User received Telegram messages whenever the camera is connected to the app

Conclusion

Introduction

Unit Testing

Test Cases

Test Type: ☑ Unit Test ☐ Integration Test ☐ System Test
Test Case Summary: To test whether a new account can be registered.
Test Summary | Test Steps | Test Data | Expected Result | Actual Result | Status (PASS/FAIL)
Login to account with.

Test Type: ☑ Unit Test ☐ Integration Test ☐ System Test
Test Case Summary: To test whether video can be viewed in real time.
Prerequisites: The user must have at least one control tool associated with the application.
Test Summary | Test Steps | Test Data | Expected Result | Actual Result | Status (PASS/FAIL)
Watch the video in real time.

Integration Testing

Test Cases

Usability Testing

User Acceptance Test

Conclusions

Recommendations for Future Work

The model is trained specifically for a monitoring tool positioned at the front corner of the shelf. A deep learning model trained on more variations should be able to track and predict the product that belongs in an empty shelf slot. The system also takes slightly longer to load while detecting empty areas on the shelf.