Halal Food Identification from Product Ingredients using Machine Learning

Introduction

Overview

Problem Statement

Research Objectives

Outline of Thesis

Background and Literature Review

ROC Curve

An ROC curve is generated by plotting the true positive rate (TPR) against the false positive rate (FPR) at different classification thresholds. The false positive rate is the proportion of negative observations that are incorrectly predicted as positive (FP/(TN + FP)).

Confusion Matrix

The true positive rate (TP/(TP + FN)) is the proportion of all positive observations that were correctly predicted as positive. The confusion matrix is a useful machine learning tool for calculating recall, precision, accuracy and ROC curves.
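As a minimal sketch of how these rates follow from the four cells of a confusion matrix, the Python below counts TP, FP, TN and FN for a binary problem and derives TPR and FPR from them. The function names and the sample labels are illustrative assumptions, not data or code from the study.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count TP, FP, TN, FN for a binary classification problem."""
    tp = fp = tn = fn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive:
            if actual == positive:
                tp += 1
            else:
                fp += 1
        else:
            if actual == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, tn, fn


def rates(tp, fp, tn, fn):
    """Return (TPR, FPR): TPR = TP/(TP + FN), FPR = FP/(TN + FP)."""
    tpr = tp / (tp + fn) if (tp + fn) else 0.0
    fpr = fp / (tn + fp) if (tn + fp) else 0.0
    return tpr, fpr


# Illustrative labels only (1 = Halal, 0 = Haram); not data from the study.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 1, 0, 1, 0]
tp, fp, tn, fn = confusion_counts(y_true, y_pred)
tpr, fpr = rates(tp, fp, tn, fn)
```

With these sample labels the counts are TP = 3, FP = 1, TN = 3 and FN = 1, so TPR is 0.75 and FPR is 0.25.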

Precision, Recall, Accuracy and F1 Score

Accuracy is vital because it measures the overall proportion of correct predictions, indicating how reliably permissible (Halal) and forbidden (Haram) items are identified. A notable property of the F1 score is that if either component (precision or recall) falls to zero, the score is zero. The F1 score matters in this system because it balances the correct identification of permitted (Halal) and prohibited (Haram) items.
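The standard definitions of these metrics can be sketched in a few lines of Python. The function names and the example counts are assumptions chosen for illustration. The F1 score is computed as the harmonic mean of precision and recall, which is why it drops to zero whenever either component is zero.

```python
def precision_recall_f1(tp, fp, fn):
    """Return (precision, recall, F1) from confusion-matrix counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        f1 = 0.0  # harmonic mean is zero when both components are zero
    else:
        f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


def accuracy(tp, fp, tn, fn):
    """Return the share of all predictions that were correct."""
    total = tp + fp + tn + fn
    return (tp + tn) / total if total else 0.0


# Illustrative counts only; not results from the study.
precision, recall, f1 = precision_recall_f1(tp=8, fp=2, fn=2)
acc = accuracy(tp=8, fp=2, tn=8, fn=2)
```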

Literature Review

The system locates the ingredient list in the image; this region is later used to extract the ingredients as text and to train the machine learning models. Ingredients are labeled with binary values, where 1 represents Halal and 0 represents Haram. The model then outputs the list of Halal and Haram ingredients in the packaged food.
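The binary labeling scheme can be illustrated with a short Python sketch that splits a labeled ingredient list into Halal and Haram groups. The helper name split_by_label and the sample ingredients and labels are hypothetical and serve only to show the encoding (1 = Halal, 0 = Haram).

```python
HALAL, HARAM = 1, 0  # binary labels as described in the text


def split_by_label(ingredients, labels):
    """Group ingredient names by their binary Halal/Haram label."""
    halal = [name for name, label in zip(ingredients, labels) if label == HALAL]
    haram = [name for name, label in zip(ingredients, labels) if label == HARAM]
    return halal, haram


# Illustrative ingredients and labels only; not taken from the study's data set.
ingredients = ["salt", "sugar", "lard", "water"]
labels = [1, 1, 0, 1]
halal_items, haram_items = split_by_label(ingredients, labels)
```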

In my research, I have implemented the SVM algorithm for food ingredient classification. The legend shows the area under the ROC curve (AUC), a metric that summarizes the overall performance of the model. The proposed system considers only halal certification as a basis for determining the status of food.

Methodology

Data Acquisition

The retrieved data relates to food and beverages and includes product names, nutrition facts and ingredients. Images are collected from several sources, including online food stores, Google and mobile phone cameras. A total of 1040 ingredients are used to train the machine learning models to classify Halal and Haram foods.

Halal and Haram ingredients are collected from two certification providers: the Islamic Food and Nutrition Council of America (IFANCA) [30] and the Halal Foundation [31].

YOLOv5 for Ingredients Detection

Data validation is then performed to detect the ingredients in new images of packaged foods.

Image-To-Text Extraction using OCR

Data Preprocessing

Feature Extraction

Machine Learning Models

Random Forest Model
K-Nearest Neighbors (KNN) Model
Naive Bayes Model
Support Vector Machine Model
Decision Trees Model
Multi-Layer Perceptrons Model
Rule-Based Model

Support vector machines (SVMs) are supervised learning models used in machine learning to analyze data for classification and regression. Decision trees are among the most powerful and widely used tools for categorization and prediction. In my research, I implemented the decision tree algorithm for food ingredient classification.

The trained decision tree model has an accuracy of 93%, which indicates that it performed well on the test data. The goal is to create algorithms and data structures that can correctly classify Halal and Haram ingredients in food. In the rule-based model, a loop goes through the list of ingredients from the CSV file and compares each one to the list of Haram ingredients using the fuzz.token_set_ratio function.

If any Haram ingredients are found in the ingredient list, it will return "Haram".
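The rule described above can be sketched as follows. To keep the example free of third-party dependencies, it uses difflib.SequenceMatcher from the Python standard library as a stand-in for fuzz.token_set_ratio; the stand-in also returns a score from 0 to 100 but does not reproduce the token-set behavior of the FuzzyWuzzy function. The threshold of 90, the "Halal" fallback, the function names and the sample lists are all assumptions made for illustration.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Score from 0 to 100; a simple stand-in for fuzz.token_set_ratio."""
    ratio = SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()
    return round(100 * ratio)


def classify_product(ingredients, haram_list, threshold=90):
    """Return "Haram" if any ingredient closely matches a Haram entry.

    The threshold and the "Halal" fallback are assumptions of this sketch.
    """
    for ingredient in ingredients:
        for haram in haram_list:
            if similarity(ingredient, haram) >= threshold:
                return "Haram"
    return "Halal"


# Illustrative lists only; not the study's Haram ingredient list.
haram_list = ["pork fat", "lard"]
status_a = classify_product(["salt", "sugar", "Pork Fat"], haram_list)
status_b = classify_product(["salt", "water"], haram_list)
```

In this sketch the first product is classified as "Haram" because "Pork Fat" matches a Haram entry after case normalization, while the second product falls through to "Halal".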

Five-Fold-Cross Validation

fuzz.token_set_ratio is a function in the FuzzyWuzzy library that compares two strings and returns a score between 0 and 100 based on how similar the strings are. In terms of accuracy, Random Forest and Decision Tree achieve the highest score of 98%, with KNN also at 98%. Multilayer Perceptron also has a high accuracy score of 97%, but Naive Bayes and Support Vector Machine have lower accuracy values of 68%.

This implies that Random Forest, Decision Tree and KNN are more suitable for this data set than Naive Bayes and Support Vector Machine.
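A minimal sketch of five-fold cross-validation in plain Python is given below. It splits the data into five contiguous folds, trains on four and tests on the fifth, and averages the per-fold accuracy. The majority-class model is a hypothetical placeholder for the classifiers compared above, the sample labels are illustrative, and no shuffling or stratification is applied; the sketch also assumes there are at least as many samples as folds.

```python
def k_fold_indices(n_samples, k=5):
    """Yield (train_indices, test_indices) for k contiguous folds."""
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size


def cross_validate(samples, labels, fit, predict, k=5):
    """Return the accuracy on each fold and the mean accuracy."""
    scores = []
    for train_idx, test_idx in k_fold_indices(len(samples), k):
        model = fit([samples[i] for i in train_idx], [labels[i] for i in train_idx])
        correct = sum(predict(model, samples[i]) == labels[i] for i in test_idx)
        scores.append(correct / len(test_idx))
    return scores, sum(scores) / len(scores)


# Hypothetical placeholder model: always predicts the majority training label.
def fit_majority(train_samples, train_labels):
    return max(set(train_labels), key=train_labels.count)


def predict_majority(model, sample):
    return model


# Illustrative data only (1 = Halal, 0 = Haram); not the study's data set.
samples = list(range(10))
labels = [1, 1, 0, 1, 1, 1, 0, 1, 1, 1]
fold_scores, mean_score = cross_validate(samples, labels, fit_majority, predict_majority)
```

Applied to a data set of 1040 ingredients, this split would give five folds of 208 samples each.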

Prediction using New Data

The model achieved an overall accuracy of 97% on the test data when classifying Halal and Haram ingredients in food. The mean F1 score, which is the average of both classes' F1 scores, is 0.83, indicating that the model's performance on both classes was reasonably balanced.

Rule-based function: Haram, Haram, Haram, Haram, Haram, Haram, Haram, Haram
Ingredients: cauliflower, mozzarella cheese, salt, enzymes, Parmesan cheese, coconut flour
Rule-based function: Haram, Haram, Haram, Haram, Haram, Haram, Haram, Haram
Ingredients: flour blend, brown rice, sorghum and buckwheat, brown sugar semi-sweet chocolate chips, cane sugar, unsweetened chocolate, cocoa butter, concentrated fruit juices, pear, grapes, date paste, safflower oil, water, brown rice syrup, natural flavors, rice cake soda, salt, vanilla extract, xanthan gum, konjac gum, rosemary extract
Rule-based function: Haram, Halal, Haram, Haram, Haram, Haram, Haram, Haram
Ingredients: enriched corn flour corn flour, iron sulfate, niacin, thiamin mononitrate, riboflavin, folic acid, sunflower oil, cheddar cheese, milk, cheese cultures, salt, enzymes, whey, maltodextrin made from corn, sea salt, natural flavors, sour cream cultured cream, skim milk, torula, lactic acid, citric acid, yeast
Rule-based function: Haram, Halal, Haram, Haram, Halal, Halal, Haram, Haram
Ingredients: enriched unbleached flour, wheat flour, malted, barley, ascorbic acid, niacin, reduced iron, thiamin mononitrate, riboflavin, folic acid, acid, sugar, degerminated yellow corn flour, salt, rice baking soda, sodium acid pyrophosphate, soybean oil, powder; natural flavor, milk, eggs, soy and tree nut flour, honey
Rule-based function: Haram, Halal, Haram, Haram, Haram, Haram, Haram, Haram
Ingredients: whole wheat, mixed corn starch, sugar, vitamin e

Results & Discussion

Random Forest Model Evaluation

The performance of the model is summarized in the classification report, which shows the precision, recall and F1 score for the Halal and Haram classes. On the test data, the model reached 91% accuracy, indicating a relatively high level of overall performance.

KNN Model Evaluation

Naive Bayes Model Evaluation

SVM Model Evaluation

Decision Trees Model Evaluation

MLP Model Evaluation

Fuzzy Inference Rule

Accuracy Comparison between Open and Close Test Data

After training, the models are evaluated on a separate test set and achieve high test accuracy, as shown in the "Open Test Accuracy" column of the table. In addition, the models are tested on an unknown data set (unlabeled data), and their accuracy on this data set is given in the "Close Test Accuracy" column of the table. On the new data set, the K-Nearest Neighbors and Support Vector Machine models achieved 100% accuracy, while the Random Forest and Decision Tree models achieved 95% accuracy.

Overall, the findings suggest that the K-Nearest Neighbors and Support Vector Machine models are overfitted.

ROC Curve

The graph shows how successfully the classifier distinguishes between positive and negative samples at different thresholds. A perfect classifier has an ROC curve that passes through the upper left corner of the graph, where TPR is 1 and FPR is 0. The graph shows the ROC curves for the binary classification models Random Forest, K-Nearest Neighbors, Naive Bayes, Support Vector Machine, Decision Tree and Multilayer Perceptron.

The ROC curve shows how effectively each model discriminates between positive and negative data at different threshold values. The dashed line indicates a random classifier, whereas the solid blue line shows the ROC curve of the respective model (Random Forest, K-Nearest Neighbors, Naive Bayes, Support Vector Machine, Decision Tree or Multi-Layer Perceptron).
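As a sketch of how an ROC curve and its AUC can be computed from predicted scores, the Python below evaluates TPR and FPR at every distinct score threshold and integrates the curve with the trapezoidal rule. The function names, the sample labels and the sample scores are illustrative assumptions, and the sketch assumes that both classes are present in the labels.

```python
def roc_points(y_true, scores):
    """Return ROC points (FPR, TPR), one per distinct score threshold."""
    positives = sum(1 for y in y_true if y == 1)
    negatives = len(y_true) - positives
    points = [(0.0, 0.0)]
    for threshold in sorted(set(scores), reverse=True):
        # A sample is predicted positive when its score reaches the threshold.
        tp = sum(1 for y, s in zip(y_true, scores) if s >= threshold and y == 1)
        fp = sum(1 for y, s in zip(y_true, scores) if s >= threshold and y == 0)
        points.append((fp / negatives, tp / positives))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Illustrative labels and scores only; not outputs of the study's models.
y_true = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
curve = roc_points(y_true, scores)
area = auc(curve)

# A classifier that ranks every positive above every negative reaches (0, 1).
perfect_curve = roc_points([1, 1, 0, 0], [0.9, 0.8, 0.3, 0.1])
```

For the sample scores the area is 8/9, since eight of the nine positive-negative pairs are ranked correctly, while the perfectly ranked example passes through the upper left corner and has an area of 1.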

Confusion Matrix

Validation

The proposed system is designed to recognize text from ingredient images, but it may not be able to identify ingredients in languages other than those it was trained on. The system does not take into account other factors that may influence a Muslim's decision to consume a particular food item, such as cross-contamination with Haram foods. As a result, the usefulness of the system is limited to cases where Halal status is the main factor in the purchasing decision.

Future work will develop image quality improvement techniques, such as noise reduction or image sharpening, to improve the accuracy of the system. Language support will be expanded so that ingredients can be recognized for Muslim customers from different regions. The system will also be integrated with external databases, such as those of Halal certification bodies, to obtain updated and reliable Halal certification information.

Finally, the system will be extended to identify food status when the food label does not list every ingredient, for example when ingredients present in small percentages are omitted.

Table: Comparison between the machine learning models' decisions and the Islamic scholar's decision for the ingredients of ten randomly selected products.

Conclusion & Future Works

Conclusion

This study provides a unique approach for detecting Halal items using deep learning and machine learning techniques. The proposed system uses the YOLOv5 algorithm to inspect images of packaged food products and locate the product ingredients. The trained model is then tested with images to confirm that it can recognize the ingredients in a food product image.

The text in the ingredient image is then recognized using optical character recognition. After the text is extracted and redundant data is removed, the ingredient text is tested with different machine learning, neural network and rule-based models to determine the food status. A data set of labeled food ingredients is used to train the machine learning models, the artificial neural network and the fuzzy inference rules.

The results show that the proposed approach is effective and reliable in distinguishing between Halal and Haram items.

Limitations

The system's accuracy rate is 98%, which is high enough to make it an excellent tool for Muslim consumers to quickly and easily identify Halal food items, especially when traveling to new places or encountering unfamiliar products. However, the detection system is overfitted because the training data set is too small and the model is trained on this limited data for several epochs. Deep learning also has some disadvantages compared to traditional machine learning: it needs large amounts of data and computing resources to train and deploy, which is time-consuming.

The system does not take into account other elements that may be important to Muslim customers, such as ethical or health issues.
