Introduction
Overview
Problem Statement
Research Objectives
Outline of Thesis
Background and Literature Review
ROC Curve
An ROC curve is generated by plotting the true positive rate (TPR) against the false positive rate (FPR) at different classification thresholds. The false positive rate is the proportion of negative observations that are incorrectly predicted as positive (FP/(TN + FP)).
Confusion Matrix
The true positive rate (TP/(TP + FN)) is the proportion of all positive observations that are correctly predicted as positive. A confusion matrix is a useful machine learning tool for calculating recall, precision, accuracy and ROC curves.
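The two rates defined above can be computed directly from confusion-matrix counts. The following is a minimal sketch, assuming binary labels where 1 is the positive class; the function names and the sample labels are illustrative and do not come from the thesis.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count TP, FN, FP and TN for a binary classification task."""
    tp = fn = fp = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive and pred == positive:
            tp += 1
        elif truth == positive:
            fn += 1
        elif pred == positive:
            fp += 1
        else:
            tn += 1
    return tp, fn, fp, tn


def rates_from_counts(tp, fn, fp, tn):
    """Return (TPR, FPR); a rate is 0.0 when its denominator is zero."""
    tpr = tp / (tp + fn) if (tp + fn) else 0.0
    fpr = fp / (tn + fp) if (tn + fp) else 0.0
    return tpr, fpr


# Illustrative labels only (not thesis data).
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 0, 0, 1]
counts = confusion_counts(y_true, y_pred)
print(counts)                      # (3, 1, 1, 3)
print(rates_from_counts(*counts))  # (0.75, 0.25)
```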
Precision, Recall, Accuracy and F1 Score
Accuracy is vital because it measures the overall proportion of correct predictions, ensuring that permissible (Halal) and forbidden (Haram) items are correctly identified. A notable property of the F1 score is that if either component (precision or recall) falls to zero, the score is zero. The F1 score matters in this system because it balances the correct identification of permitted (Halal) and prohibited (Haram) items.
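As a small illustration of these metrics, the sketch below computes accuracy, precision, recall and F1 from confusion-matrix counts and shows that F1 is zero when either component is zero. The function names and the counts are assumptions made for this example, not values from the thesis.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall; 0.0 if both are zero."""
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0


def classification_scores(tp, fn, fp, tn):
    """Return accuracy, precision, recall and F1 from confusion counts."""
    n = tp + fn + fp + tn
    accuracy = (tp + tn) / n if n else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1_score(precision, recall),
    }


# Illustrative counts only (not thesis results).
scores = classification_scores(tp=3, fn=1, fp=1, tn=3)
print(scores)

# If either component is zero, the F1 score is zero.
print(f1_score(1.0, 0.0))  # 0.0
```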
Literature Review
The system detects the location of the ingredients in the image, which is later used to extract the ingredients as text and train the machine learning models. Ingredients are labeled with binary values, where 1 represents Halal and 0 represents Haram. The model then displays the predicted list of Halal and Haram ingredients in the packaged food.
In my research, I have implemented the SVM algorithm for food ingredient classification. The area under the ROC curve (AUC), a metric that summarizes the overall performance of the model, is shown in the legend. The proposed system considers only halal certification as a basis for determining the status of food.
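The binary labeling convention can be sketched as follows. This is only an illustration of mapping 1/0 predictions back to Halal and Haram lists; the function name and the placeholder ingredient names are hypothetical and are not taken from the thesis.

```python
# Label convention stated in the text: 1 = Halal, 0 = Haram.
LABEL_NAMES = {1: "Halal", 0: "Haram"}


def group_by_status(ingredients, predictions):
    """Group ingredient names under the status predicted for each one."""
    grouped = {"Halal": [], "Haram": []}
    for ingredient, label in zip(ingredients, predictions):
        grouped[LABEL_NAMES[label]].append(ingredient)
    return grouped


# Placeholder names and predictions, for illustration only.
ingredients = ["ingredient_a", "ingredient_b", "ingredient_c"]
predictions = [1, 0, 1]
result = group_by_status(ingredients, predictions)
print(result)
```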
Methodology
Data Acquisition
The data retrieved was food and beverage related and included product name, nutrition facts, and ingredients. Images are collected from various sources, for example, various online food stores, Google and also using mobile phone cameras. To classify Halal and Haram foods, a total of 1040 ingredients are used to train the machine learning models.
The Halal and Haram ingredient lists are collected from two Halal certification providers: the Islamic Food and Nutrition Council of America (IFANCA) [30] and the Halal Foundation [31].
YOLOv5 for Ingredients Detection
Data validation is then performed to detect the ingredients in new images of packaged foods.
Image-To-Text Extraction using OCR
Data Preprocessing
- Feature Extraction
Machine Learning Models
- Random Forest Model
- K-Nearest Neighbors (KNN) Model
- Naive Bayes Model
- Support Vector Machine Model
- Decision Trees Model
- Multi-Layer Perceptrons Model
- Rule-Based Model
Support vector machines (SVMs) are supervised learning models used in machine learning to analyze data for classification and regression. The decision tree is one of the most powerful and widely used tools for classification and prediction. In my research, I implemented the decision tree algorithm for food ingredient classification.
The trained decision tree model has an accuracy of 93%, which indicates that the decision tree model performed well on the test data. The goal is to create algorithms and data structures that can be used to correctly classify Halal and Haram ingredients in food. The loop searches the list of ingredients from the CSV file and compares them all to the list of Haram ingredients using the fuzz.token_set_ratio function.
If any Haram ingredients are found in the ingredient list, it will return "Haram".
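The rule-based check described above can be sketched as follows. To keep the example free of third-party dependencies, it uses a simplified stand-in for fuzz.token_set_ratio built on the standard-library difflib module; it is not the FuzzyWuzzy implementation and its scores may differ. The threshold of 90, the "Halal" fallback, and the sample Haram list are assumptions made for illustration, since the text does not state them.

```python
from difflib import SequenceMatcher


def _ratio(a, b):
    """Similarity of two strings as an integer between 0 and 100."""
    return round(100 * SequenceMatcher(None, a, b).ratio())


def token_set_ratio(s1, s2):
    """Simplified stand-in for fuzz.token_set_ratio (an approximation)."""
    t1, t2 = set(s1.lower().split()), set(s2.lower().split())
    common = " ".join(sorted(t1 & t2))
    combined1 = (common + " " + " ".join(sorted(t1 - t2))).strip()
    combined2 = (common + " " + " ".join(sorted(t2 - t1))).strip()
    return max(
        _ratio(common, combined1),
        _ratio(common, combined2),
        _ratio(combined1, combined2),
    )


def classify_product(ingredients, haram_list, threshold=90):
    """Return "Haram" if any ingredient matches an entry in haram_list."""
    for ingredient in ingredients:
        for haram in haram_list:
            if token_set_ratio(ingredient, haram) >= threshold:
                return "Haram"
    return "Halal"  # assumed fallback when nothing matches


# Hypothetical entries for illustration, not the thesis's Haram list.
haram_list = ["gelatin", "lard"]
print(classify_product(["sugar", "pork gelatin"], haram_list))  # Haram
print(classify_product(["sugar", "salt"], haram_list))          # Halal
```

Because the token sets are compared, an ingredient whose words include a listed entry (for example "pork gelatin" against "gelatin") scores 100 even though the full strings differ.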
Five-Fold Cross-Validation
fuzz.token_set_ratio is a function in the FuzzyWuzzy library that compares two strings and returns a score between 0 and 100 based on how similar the strings are. In terms of accuracy, Random Forest and Decision Tree achieve the highest score at 98%, followed closely by KNN, also at about 98%. Multilayer Perceptron also has a high accuracy score of 97%, but Naive Bayes and Support Vector Machine have lower accuracy values of 68%.
This implies that Random Forest, Decision Tree and KNN are more suitable for this data set than Naive Bayes and Support Vector Machine.
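The five-fold splitting behind these scores can be sketched with the standard library alone. The thesis does not state which implementation was used, so the function below is a hypothetical illustration; the shuffle seed is an assumption, and the sample count of 1040 is the number of training ingredients reported in the Data Acquisition section.

```python
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # fixed seed for reproducibility
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test_idx = folds[i]
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, test_idx


# Each of the five folds serves as the test set exactly once.
for fold, (train_idx, test_idx) in enumerate(k_fold_indices(1040), start=1):
    print(fold, len(train_idx), len(test_idx))  # 832 training, 208 test
```

A model would be trained on each training split and scored on the matching test split, and the five scores would then be averaged.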
Prediction using New Data
The model achieved an overall accuracy of 97% on the test data, correctly classifying Halal and Haram ingredients in food. The mean F1 score is 0.83, which is the average of both classes' F1 scores, indicating that the model's performance on both classes was reasonably balanced.

Predictions on new ingredient lists (eight predicted labels per list, in the column order of the original table):

Ingredients: cauliflower, mozzarella cheese, salt, enzymes, Parmesan cheese, coconut flour
Predictions: Haram, Haram, Haram, Haram, Haram, Haram, Haram, Haram

Ingredients: flour blend, brown rice, sorghum and buckwheat, brown sugar semi-sweet chocolate chips, cane sugar, unsweetened chocolate, cocoa butter, concentrated fruit juices, pear, grapes, date paste, safflower oil, water, brown rice syrup, natural flavors, rice cake soda, salt, vanilla extract, xanthan gum, konjac gum, rosemary extract
Predictions: Haram, Haram, Haram, Haram, Haram, Haram, Haram, Haram

Ingredients: enriched corn flour corn flour, iron sulfate, niacin, thiamin mononitrate, riboflavin, folic acid, sunflower oil, cheddar cheese, milk, cheese cultures, salt, enzymes, whey, maltodextrin made from corn, sea salt, natural flavors, sour cream cultured cream, skim milk, torula, lactic acid, citric acid, yeast
Predictions: Haram, Halal, Haram, Haram, Haram, Haram, Haram, Haram

Ingredients: enriched unbleached flour, wheat flour, malted, barley, ascorbic acid, niacin, reduced iron, thiamin mononitrate, riboflavin, folic acid, acid, sugar, degerminated yellow corn flour, salt, rice baking soda, sodium acid pyrophosphate, soybean oil, powder; natural flavor, milk, eggs, soy and tree nut flour, honey
Predictions: Haram, Halal, Haram, Haram, Halal, Halal, Haram, Haram

Ingredients: whole wheat, mixed corn starch, sugar, vitamin e
Predictions: Haram, Halal, Haram, Haram, Haram, Haram, Haram, Haram
Results & Discussion
Random Forest Model Evaluation
The performance of the model is summarized in the classification report, which shows the precision, recall and F1 score for the Halal and Haram classes. On the test data, the model achieved 91% accuracy, indicating a relatively high level of overall performance.
KNN Model Evaluation
Naive Bayes Model Evaluation
SVM Model Evaluation
Decision Trees Model Evaluation
MLP Model Evaluation
Fuzzy Inference Rule
Accuracy Comparison between Open and Close Test Data
After training, the models are evaluated on another test set and achieve high test accuracy, as shown in the "Open Test Accuracy" column in the table. In addition, the models are tested on an unknown dataset (unlabeled data), and their accuracy on this dataset is shown in the "Close Test Accuracy" column of the table. For the new dataset, the K-Nearest Neighbors and Support Vector Machine achieved 100% accuracy, while the Random Forest and Decision Tree models achieved 95% accuracy.
Overall, the findings show that the K-nearest Neighbors and Support Vector Machine models are overfitted.
ROC Curve
The graph shows how successfully the classifier can distinguish between positive and negative samples at different threshold levels. A perfect classifier has an ROC curve that passes through the upper left corner of the graph, where TPR is 1 and FPR is 0. The graph shows the ROC curve for the binary classification models Random Forest, K-Nearest Neighbors, Naive Bayes, Support Vector Machine, Decision Tree and Multilayer Perceptron.
The ROC curve shows how effectively the model discriminates between positive and negative data at different threshold values. The dashed line indicates a random classifier, whereas the solid blue line shows the ROC curve of the Random Forest, K-Nearest Neighbors, Naive Bayes, Support Vector Machine, Decision Tree or Multi-Layer Perceptron model.
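To make the threshold sweep concrete, the sketch below builds ROC points from predicted scores and computes the area under the curve with the trapezoidal rule. It is a minimal standard-library illustration, not the plotting code used in the thesis; the function names, labels and scores are assumptions made for the example.

```python
def roc_points(y_true, scores):
    """Return (FPR, TPR) points obtained by lowering the threshold step by step.

    Assumes y_true contains at least one positive (1) and one negative (0).
    """
    pos = sum(1 for y in y_true if y == 1)
    neg = len(y_true) - pos
    ranked = sorted(zip(scores, y_true), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Samples that share a score cross the threshold together.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / neg, tp / pos))
    return points


def auc(points):
    """Area under the ROC curve, computed with the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Illustrative labels and scores only (not thesis data).
y_true = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
print(round(auc(roc_points(y_true, scores)), 4))  # 0.8889

# A perfectly separating classifier passes through (FPR=0, TPR=1).
perfect = roc_points([1, 1, 0, 0], [0.9, 0.8, 0.3, 0.1])
print((0.0, 1.0) in perfect, auc(perfect))  # True 1.0
```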
Confusion Matrix
Validation
The proposed system is designed to recognize text from ingredient images, but it may not be able to identify ingredients in languages other than those it was trained on. The system does not take into account other factors that may influence a Muslim's decision to consume a particular food item, such as cross-contamination with Haram foods. As a result, the usefulness of the system is limited to identifying halal status as the main factor in purchasing food.
Future work will develop image quality improvement techniques, such as noise reduction or image sharpening, to improve the accuracy of the system. The system will expand its language support for ingredient recognition to better serve Muslim customers from different regions. The system will also be integrated with external databases, such as those of Halal certification bodies, to obtain updated and reliable Halal certification information.
Finally, the system will be developed to identify food status when the food label does not list all ingredients, for example when ingredients present in small percentages are omitted.
Conclusion & Future Works
Conclusion
This study provides a unique approach for detecting halal items using deep learning and machine learning techniques. The YOLOv5 algorithm is used in the proposed system to inspect images of packaged food products and identify product ingredients. The trained model is then tested with images so that it can recognize the ingredients from the food product image.
After that, the text in the ingredient image is recognized using optical character recognition. The text is extracted from the ingredient images and redundant data is removed; the resulting ingredient texts are then tested with different machine learning, neural network and rule-based models to determine food status. A dataset of labeled food ingredients is used to train the machine learning techniques, the artificial neural networks and the fuzzy inference rules.
The results show that the proposed approach is effective and reliable in distinguishing between Halal and Haram items.
Limitations
The system's accuracy rate is 98%, which is high enough for it to serve as an excellent tool for Muslim consumers to quickly and easily identify Halal food items, especially when traveling to new places or encountering unfamiliar products. However, the detection system is overfitted, because the training dataset is too small and the model trains on this limited data for several epochs. Deep learning also has some disadvantages compared to traditional machine learning: it needs a lot of data and computing resources to train and deploy, which is also time-consuming.
It does not take into account other elements that may be important to Muslim customers, such as ethical or health issues.