This project entitled "Bangladesh Local Flower Classification Using CNN", submitted by Ikhtiar Khan Sohan, ID, Jahid Hasan Shuvo, ID, and Ruhul Amin, ID, to the Department of Computer Science and Engineering, Daffodil International University, has been accepted as satisfactory in partial fulfillment of the requirements for the degree of B.Sc. in Computer Science and Engineering, Faculty of Science and Information Technology, Daffodil International University. We hereby declare that this project has been carried out by us under the supervision of Ms. Majidur Rahman, Lecturer, Department of CSE, Faculty of Science and Information Technology, Daffodil International University. We also declare that neither this project nor any part of it has been submitted elsewhere for the award of any degree or diploma. We are grateful to Professor Touhid Bhuiyan, Head of the CSE Department, for his kind assistance in completing our project, and also to the other faculty members and staff of the CSE Department of Daffodil International University.
We would like to thank all our coursemates at Daffodil International University who took part in discussions while we completed this work. Today's computer vision technology is powered by deep learning algorithms that make sense of images using a special type of neural network called a convolutional neural network (CNN). In deep learning, we can use CNNs to achieve state-of-the-art accuracy on various classification problems, such as ImageNet, CIFAR-100, CIFAR-10, and the MNIST dataset.
As a transfer-learning approach to classification, we retrained the last layers of established CNN architectures: MobileNet, Inception V3, and VGG16.
Introduction
- Introduction
- Motivation
- Rationale of the Study
- Research Questions
- Expected Outcome
- Layout of the Report
There are several parts to our task. For easy understanding, we divide the work into the following sections; for example, in Section 3 we describe our proposed methodology, including the implementation of our model, data collection, data augmentation, data preprocessing, defining a test set for evaluating our model, and training the model on the dataset. Technology is an integral part of every human life, because with technology we can make our everyday lives easier. When we look at an object, we gain a clear sense of that object in a short time.
When light from an object passes through our retina and travels along neurons to our brain, we can perceive and understand that object, but it takes some time to produce a result. From that idea, we can build a model that can classify an image based on what it learned during training. We can use artificial intelligence anywhere: on the internet, in the playground, at home, in the office, in the factory, and so on.
We can build a model that performs such recognition tasks with great success. To build one, we can use various machine learning approaches, such as deep learning with a convolutional neural network.
Background
- Introduction
- Related Works
- Research Summary
- Scope of the problem
- Challenges
The earliest machine learning algorithms were very weak, but machine learning algorithms are improving day by day. Machine learning is an artificial intelligence (AI) technology that gives systems the ability to automatically learn and improve from experience without being explicitly programmed. Machine learning focuses on creating computer programs that can access data and learn from it on their own.
Initially, we needed to collect a new dataset from this particular domain to train our model. Defining the number of epochs, preparing the training set, avoiding overfitting, and choosing the batch size were major challenges for us.
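As a rough illustration of how the epoch count and batch size interact, the skeleton below shows a minimal training loop in plain NumPy. The dataset size and hyperparameter values are placeholders for the example, not the values used in this project:

```python
import numpy as np

# Illustrative values only, not the project's actual settings.
num_samples = 1000      # images in the training set
batch_size = 32         # chosen batch size
num_epochs = 5          # chosen number of epochs

# Ceiling division: the last batch may be smaller than batch_size.
batches_per_epoch = (num_samples + batch_size - 1) // batch_size

rng = np.random.default_rng(0)
for epoch in range(num_epochs):
    # Reshuffle every epoch so batches differ between epochs,
    # which helps reduce overfitting.
    order = rng.permutation(num_samples)
    for b in range(batches_per_epoch):
        batch_idx = order[b * batch_size:(b + 1) * batch_size]
        # ... forward pass, loss, and weight update would go here ...

print(batches_per_epoch)  # 32 batches cover 1000 samples (last batch has 8)
```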
Research Methodology
Introduction
Research Subject and Instrumentation
Data Collection Procedure
Data Processing
- Data Augmentation
- Data Preparation
Convolutional Neural Network
- Convolutional layer
- Rectified Linear Units (ReLU)
- Pooling layer
- Flatten Layer
- Fully connected layer
- Dropout layer
- Softmax layer
To limit the computational cost and the chance of overfitting, we limited the augmentation and reduced the input images to a fixed size of 150 x 150 pixels. A CNN identifies an image through several types of layers: the input layer, the first layer of the CNN, reads the image pixel by pixel, and the image can be grayscale or RGB. A filter is then applied in the second step to extract features from the input image.
Kernel/filters: The kernel acts as a filter; it slides over the input data to extract distinctive features or patterns that improve classification efficiency. Stride: The stride determines the number of rows and columns the filter moves over the input matrix at each step. Padding: The convolution layer reduces the dimensions of the output matrix, but we can keep the output dimensions equal to those of the input matrix by using padding.
"Same" padding means the output matrix has the same dimensions as the input matrix: extra border blocks of zeros are allocated symmetrically around the input matrix to preserve its dimensions. The constant gradient of ReLUs results in faster learning, but at the same time ReLU does not activate all neurons: if the input is negative, it is mapped to zero and the neuron is not activated.
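The effect of the kernel size, stride, and padding on the output size can be checked with the standard convolution arithmetic. This small sketch (kernel size and stride values are illustrative) computes the output dimension for "valid" (no padding) and "same" padding on a 150 x 150 input:

```python
def conv_output_size(n, k, stride=1, padding=0):
    # n: input width/height, k: kernel size, padding: zeros added per side.
    # Standard convolution output-size formula.
    return (n + 2 * padding - k) // stride + 1

# 150x150 input, 3x3 kernel, stride 1
print(conv_output_size(150, 3))             # valid padding: 148
print(conv_output_size(150, 3, padding=1))  # same padding: 150
```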
The pooling layer is a non-linear layer that partitions the input, reduces the number of parameters, controls overfitting, and preserves the most important information. The pooling window size, which removes unnecessary features while keeping the required ones, can be specified. MaxPooling is most often used in CNN architectures to identify relevant features, because max pooling generally gives better results than average pooling or min pooling.
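The difference between max and average pooling can be seen on a tiny example. The sketch below applies 2x2 pooling with stride 2 to an illustrative 4x4 feature map (the values are made up for the example):

```python
import numpy as np

x = np.array([[1, 3, 2, 4],
              [5, 6, 1, 2],
              [7, 2, 9, 1],
              [3, 4, 5, 6]], dtype=float)

# 2x2 pooling, stride 2: split the map into non-overlapping 2x2 blocks.
# Resulting axes: (row_block, col_block, row_in_block, col_in_block).
blocks = x.reshape(2, 2, 2, 2).swapaxes(1, 2)

max_pool = blocks.max(axis=(2, 3))   # keeps the strongest activation per block
avg_pool = blocks.mean(axis=(2, 3))  # averages each block instead

print(max_pool)  # [[6. 4.]
                 #  [7. 9.]]
print(avg_pool)  # [[3.75 2.25]
                 #  [4.   5.25]]
```

Note how max pooling retains the single strongest response in each region, which is why it tends to preserve the most discriminative features.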
The flatten layer turns the feature maps into a single long continuous linear vector, which the final dense classification layer can use. In a CNN, the fully connected layer is the last layer and represents the feature vector of the input. The softmax layer is the output layer of the network and is used to determine the probabilities of multiple classes.
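The softmax computation at the output layer can be sketched in a few lines of NumPy; the logit values below are illustrative, not outputs of our model:

```python
import numpy as np

def softmax(z):
    # Subtract the max for numerical stability; probabilities sum to 1.
    e = np.exp(z - z.max())
    return e / e.sum()

logits = np.array([2.0, 1.0, 0.1])  # raw scores from the fully connected layer
probs = softmax(logits)
print(probs.round(3))  # [0.659 0.242 0.099]
```

The class with the largest logit receives the largest probability, and the model's prediction is simply the argmax of this vector.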
Test Set
Fully connected (FC) layers can be compared with a multilayer perceptron (MLP), in which each neuron is connected to all activations of the previous layer [11]. The dropout layer forces a neural network to learn more robust features that are useful in combination with many different random subsets of the other neurons [13]. The softmax function computes the probability of each target class and returns these values for the given inputs so that the target class can be evaluated [14].
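As a sketch of how a dropout layer behaves at training time, the snippet below uses the common "inverted dropout" formulation: each activation is zeroed with the given probability and the survivors are rescaled so the expected activation is unchanged. The rate and input values are illustrative:

```python
import numpy as np

def dropout(x, rate, rng):
    # Zero out each activation with probability `rate`, then scale the
    # survivors by 1/(1-rate) to keep the expected activation the same.
    mask = rng.random(x.shape) >= rate
    return x * mask / (1.0 - rate)

rng = np.random.default_rng(42)
activations = np.ones(10)
out = dropout(activations, rate=0.5, rng=rng)
print(out)  # roughly half the entries are 0.0, the rest are 2.0
```

Because a different random subset of neurons survives on every forward pass, no single neuron can be relied upon, which is what drives the network toward the more robust features described above.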
Training the Model
Execution Requirements
Introduction
- Number of Parameters
- Performance Evaluation
Similarly, the pooling layer reduces the image dimension from 75 to 37 and produces an activation size of 87,616 without generating any parameters. We applied the proposed CNN architecture to the above-mentioned dataset to identify best practices, achieving substantially better results with an average accuracy of 97% per class. In the training plot we see the loss decreasing and the validation accuracy increasing, so we can conclude that the model is learning well from the training data.
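The activation-size figure above can be verified with a quick calculation, assuming (as the stated shapes suggest) 2x2 pooling with stride 2 over a 75 x 75 feature map with 64 channels; the channel count is an assumption chosen to be consistent with the stated activation size:

```python
def pool_output_size(n, k=2, stride=2):
    # Output side length of a pooling layer with window k and the given stride.
    return (n - k) // stride + 1

side = pool_output_size(75)       # 75 -> 37
channels = 64                     # assumed, consistent with 87,616
activation_size = side * side * channels
print(side, activation_size)      # 37 87616; pooling itself adds 0 parameters
```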
Result Discussion
The final test of our model gives 97% accuracy, so our model generalizes well to unseen data.
Comparison
- MobileNet
- InceptionV3
- VGG16
The train set is the same for each trained model, but the validation and testing accuracy vary by model, as summarized in Table 2 [17]. From the table, we find that VGG16 provides 94 percent accuracy with little noise and at times comparable validation and training accuracy. MobileNet's accuracy is 92 percent, but its validation and training accuracy curves are very noisy.
However, with decent training accuracy and some noise, our CNN model provides the highest validation accuracy. Finally, we note that each model's accuracy is satisfactory under the same conditions, but our CNN performs best, with a validation accuracy of 95 percent. MobileNet offers a validation accuracy of 92 percent but exhibits very noisy training and validation accuracy curves.
However, we can see in Fig. 4.5.2.1 that the training accuracy is good and that the training and validation accuracy grow slowly. In Fig. 4.5.3.1 we see only a small difference between training and validation accuracy. Fig. 4.5.4.1 shows that training and validation accuracy are satisfactory, with only a small amount of noise, and that overfitting is successfully prevented.
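Retraining only the last layer of a pretrained network, as we did with MobileNet, Inception V3, and VGG16, amounts to training a fresh softmax classifier on the frozen network's feature vectors. The NumPy sketch below illustrates that idea on synthetic data; the random "features" stand in for a frozen base's output (e.g. a global-pooled MobileNet embedding), and all dimensions are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for features from a frozen pretrained base; in practice these
# come from the network's penultimate layer, not random data.
n, d, classes = 200, 32, 5
true_w = rng.normal(size=(d, classes))
feats = rng.normal(size=(n, d))
labels = np.argmax(feats @ true_w, axis=1)  # synthetic, linearly separable labels

# New trainable last layer: softmax regression via gradient descent.
w = np.zeros((d, classes))
onehot = np.eye(classes)[labels]
for _ in range(300):
    logits = feats @ w
    p = np.exp(logits - logits.max(axis=1, keepdims=True))
    p /= p.sum(axis=1, keepdims=True)
    w -= 0.1 * feats.T @ (p - onehot) / n   # only the last layer is updated

acc = (np.argmax(feats @ w, axis=1) == labels).mean()
print(acc)  # high training accuracy on this toy problem
```

Because the base network's weights never change, this converges quickly and with far less data than training a full CNN from scratch, which is why transfer learning suits a modest flower dataset.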
Conclusion and Future Works
Conclusions
Future Work
APPENDIX
"Performance Evaluation of Convolutional Neural Networks for Face Forgery Prevention." In 2019 International Joint Conference on Neural Networks (IJCNN), p.