CHAPTER 4 System Design
4.1 License Plate Localization using TensorFlow Object Detection API
Bachelor of Computer Science (Honours)
Faculty of Information and Communication Technology (Kampar Campus), UTAR
4.2 License Plate Recognition using Convolutional Recurrent Neural Network (CRNN)
4.2.1 Data Collection
4.2.2 Data Pre-processing
Input images are pre-processed to improve their quality for computational processing and visual perception.
i. Colour Augmentation and Contrast Enhancement
The multi-colour input image was first converted to grayscale. The grayscale image then went through contrast enhancement to make the characters on the car plate stand out more clearly. This effectively removed unnecessary information while keeping only the important features.
Figure 4.2.2.1 Raw car plate image (left), after grayscaling (middle), and after enhancing contrast (right)
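The two steps above can be sketched with NumPy. This is a minimal illustration only: the thesis does not specify its exact enhancement method, so a standard luminance conversion and min-max contrast stretching are assumed.

```python
import numpy as np

def to_grayscale_and_stretch(rgb):
    """Convert an RGB image (H, W, 3) to grayscale, then apply min-max
    contrast stretching. Illustrative sketch; the thesis's exact
    enhancement method may differ."""
    # standard ITU-R BT.601 luminance weights
    gray = rgb.astype(np.float64) @ np.array([0.299, 0.587, 0.114])
    lo, hi = gray.min(), gray.max()
    scale = 255.0 / max(hi - lo, 1e-9)          # guard against flat images
    return np.clip(np.rint((gray - lo) * scale), 0, 255).astype(np.uint8)
```

After stretching, the darkest pixel maps to 0 and the brightest to 255, so the plate characters occupy the full intensity range.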
ii. Fixed Alignment (x-axis)
Since the collected images may be taken from various angles (depending on the position of the camera), a tilted image must be rotated into a straightened position; otherwise, the tilt might lower the accuracy of the model in recognizing car plate characters.
Figure 4.2.2.2 Tilted image (left) and after straightening (right)
iii. Region-based Cropping
The exact location of the car plate was cropped from the captured image, retaining only the characters. The edges of the car plate were removed because the recognition process was based on pixel intensities, and the white edges might be confused with the white characters.
Figure 4.2.2.3 Car plate image before (left) and after region-based cropping (right)
iv. Spatial Resolution Standardization
Since the cropped car plate images may vary in dimension, all images were resized to follow the resolution of the original dataset: 240 px (width) x 120 px (height). With constant input dimensions, it was easier for the car plate recognition model to parallelize during training, thereby decreasing the computing time.
Figure 4.2.2.4 Resizing images to a standardized spatial resolution
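The resizing step can be sketched as follows. This is a minimal nearest-neighbour illustration; the thesis most likely used a library resizer (e.g. OpenCV), whose interpolation would differ.

```python
import numpy as np

def resize_nearest(img, out_w=240, out_h=120):
    """Resize a grayscale image to the fixed 240x120 resolution using
    nearest-neighbour sampling (illustrative sketch)."""
    h, w = img.shape[:2]
    ys = np.arange(out_h) * h // out_h   # source row for each output row
    xs = np.arange(out_w) * w // out_w   # source column for each output column
    return img[ys][:, xs]
```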
v. Adding Gaussian Noise
The training set was artificially expanded by adding noise to the known samples to create new samples. The addition of noise during training made the neural network model unlikely to memorize the training samples, since they were constantly changing, resulting in smaller network weights for faster learning and improved robustness with better generalization.
Figure 4.2.2.5 Image before adding (left) and after adding noise (right)
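The noise-injection step can be sketched with NumPy. The noise standard deviation below is an illustrative assumption; the thesis does not state its value.

```python
import numpy as np

def add_gaussian_noise(image, mean=0.0, sigma=10.0, seed=None):
    """Return a noisy copy of a grayscale uint8 image.
    sigma=10.0 is an assumed value for illustration only."""
    rng = np.random.default_rng(seed)
    noise = rng.normal(mean, sigma, image.shape)
    noisy = image.astype(np.float64) + noise
    # clip back into the valid 8-bit intensity range
    return np.clip(noisy, 0, 255).astype(np.uint8)
```

Each call with a different seed yields a slightly different sample from the same plate, which is what expands the effective training set.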
4.2.3 Create Lightning Memory-Mapped Database (LMDB) Dataset
In deep learning, when the batch size was large, PyTorch's DataLoader took a long time to load a batch of data, resulting in longer training time. To improve the performance of the input pipeline and reduce model training time, TensorFlow offers its own binary data format, TFRecord. However, there was no corresponding data format in PyTorch, so we decided to use LMDB to create our datasets.
LMDB was the database of choice for large datasets because of its simple file structure, which included only a data file and a lock file. Besides, the entire database was exposed through a memory map, and all data were returned directly from the mapped memory, giving much better I/O performance. Unlike HDF5 files, which were always read entirely into memory, there was no need to worry about the LMDB dataset exceeding the memory capacity. In short, LMDB required no page-caching layer of its own, had high I/O performance, and was memory efficient.
First, all the license plate images were organized in a folder and renamed with the file naming format label_number.jpg, where label represented the corresponding license plate number, while number was used to distinguish images with the same license plate number.
Figure 4.2.3.1 Samples of license plate images in the trainset folder
Next, the original training and testing license plate images were converted into the LMDB data format. The code below showed how the images and their corresponding groundtruth (actual license plate number) were read to create the LMDB dataset.
Figure 4.2.3.2 Code snippets for creating the LMDB dataset
The output below showed that both the LMDB training and testing datasets had been successfully created. To ensure that the LMDB dataset was labelled with the correct groundtruth, a license plate sample was randomly selected to check its labelled license plate number.
Figure 4.2.3.3 Creating the LMDB datasets with 30000 training and 3720 testing samples
Lastly, a PyTorch DataLoader function was defined to load the LMDB dataset during the training and testing process.
Figure 4.2.3.4 Code snippets for training and testing DataLoader
4.2.4 Model Configuration for Transfer Learning
We leveraged the CRNN architecture proposed by Cheang et al. [16] to perform transfer learning for license plate recognition. Since the existing CRNN license plate recognition model was trained on a dataset containing 80% W plates, we adopted their pretrained weights and continued training all the CRNN layers on our custom dataset.
The diagram below showed the network architecture of our proposed license plate recognition model. The output of the CNN layers had 26 timesteps, meaning the features of the input license plate image were extracted across 26 timesteps. Meanwhile, the output channel of the RNN was 37, which was exactly the number of character classes to be classified (26 letters "A-Z", 10 digits "0-9", and 1 blank label "-").
Figure 4.2.4.1 Network architecture of CRNN license plate recognition model
Figure 4.2.4.2 Label sequence of license plate number in 26 timesteps
In the proposed CRNN architecture, there were seven convolutional layers, each followed by a Rectified Linear Unit (ReLU) function to perform element-wise activation. There was a total of four MaxPooling layers to progressively reduce the dimensions of the convolutional feature maps and thereby reduce overfitting. Batch Normalization was also applied throughout the architecture to accelerate the training process by normalizing the activations. Furthermore, in the recurrent layers, LSTM was adopted instead of a conventional RNN to avoid the exploding gradient issue that occurs in conventional RNNs. Rather than using a single layer of LSTM, two bi-directional LSTMs were stacked together to form a deep bidirectional LSTM that captures more contextual information within the feature sequence, so as to increase the accuracy of per-frame predictions.
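The structure described above can be sketched in PyTorch as follows. The channel sizes and the 32-pixel input height are illustrative assumptions in the style of common CRNN implementations, not the thesis's exact values; what the sketch preserves is the stated structure of seven convolutional layers with ReLU, four pooling stages, batch normalization, and a stacked 2-layer bidirectional LSTM.

```python
import torch
import torch.nn as nn

class CRNN(nn.Module):
    """Sketch of a CRNN: 7 conv layers + ReLU, 4 max-pooling stages,
    batch norm in deeper layers, 2-layer bidirectional LSTM."""
    def __init__(self, n_classes=37, hidden=256):
        super().__init__()
        def conv(c_in, c_out, bn=False):
            layers = [nn.Conv2d(c_in, c_out, 3, 1, 1)]
            if bn:
                layers.append(nn.BatchNorm2d(c_out))
            layers.append(nn.ReLU(inplace=True))
            return layers
        self.cnn = nn.Sequential(                  # input: (N, 1, 32, W)
            *conv(1, 64), nn.MaxPool2d(2, 2),      # halve height and width
            *conv(64, 128), nn.MaxPool2d(2, 2),
            *conv(128, 256), *conv(256, 256),
            nn.MaxPool2d((2, 1), (2, 1)),          # halve height only
            *conv(256, 512, bn=True), *conv(512, 512, bn=True),
            nn.MaxPool2d((2, 1), (2, 1)),
            nn.Conv2d(512, 512, 2, 1, 0),          # 7th conv collapses height to 1
            nn.BatchNorm2d(512), nn.ReLU(inplace=True),
        )
        self.rnn = nn.LSTM(512, hidden, num_layers=2, bidirectional=True)
        self.fc = nn.Linear(hidden * 2, n_classes)  # 37 classes incl. blank

    def forward(self, x):
        f = self.cnn(x)                      # (N, 512, 1, T)
        f = f.squeeze(2).permute(2, 0, 1)    # (T, N, 512): one vector per timestep
        out, _ = self.rnn(f)
        return self.fc(out)                  # (T, N, n_classes) per-frame logits
```

With a 108-pixel-wide input, this particular sketch happens to produce 26 timesteps, matching the 26 timesteps described above.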
Note that there were only around 5.5 million trainable parameters in the CNN layers of our CRNN model architecture, which was only about 0.3% of the 2 billion parameters in the VGG network applied in Cheang's CRNN model. The number of parameters varied so much because our model abandoned the fully connected layers at the end of the CNN, which typically hold a large number of parameters. This reduction in the number of parameters was an advantage because it sped up the training process and shortened the prediction time, an important aspect of real-world license plate recognition.
Figure 4.2.4.3 Comparison of parameter size in the CNN layers between the proposed and existing CRNN models (proposed CRNN parameter size vs. Cheang's CRNN parameter size)
An optimization technique was applied to find the best weights that minimized the loss function when mapping inputs to outputs. Instead of manually setting a learning rate as in the conventional momentum (SGD) method, we used ADADELTA to automatically calculate per-dimension learning rates. Based on the graph below, we found that optimization using ADADELTA converged significantly faster than the momentum method.
Figure 4.2.4.4 Comparison of ADADELTA and SGD optimizer in loss convergence
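The point about not hand-tuning a learning rate can be illustrated with PyTorch's built-in ADADELTA optimizer. The 1-D quadratic below is only a toy objective, not the thesis's training setup.

```python
import torch

# ADADELTA adapts a per-dimension step size from running averages of past
# gradients and past updates, so no learning rate needs to be chosen.
w = torch.nn.Parameter(torch.tensor([5.0]))
opt = torch.optim.Adadelta([w])       # no lr argument required
for _ in range(200):
    opt.zero_grad()
    loss = (w ** 2).sum()             # minimum at w = 0
    loss.backward()
    opt.step()
```

By contrast, torch.optim.SGD requires an explicit lr value, and a poor choice can slow or destabilize convergence.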
4.3 Catch - Overspeeding Detection
In order to determine whether a car was overspeeding, we had to calculate the average speed based on the check-in and check-out timestamps of that particular car. The code snippets below showed how the car data was passed into the backend database. In this function, text represented the recognized license plate number, while cap_time represented the current time.
Both pieces of information were recorded into a csv file to be retrieved by the matching function later. Meanwhile, image_np_with_detections was the image of the car taken as it passed the camera, which served as evidence for issuing a fine if the car was caught overspeeding.
In simple words, whenever a license plate was recognized, the license plate number together with the current time was recorded in a csv file. Therefore, two separate csv files were created to record the check-in and check-out details.
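The recording step can be sketched as follows. The variable names text and cap_time follow the description above; the CSV column layout (plate, ISO timestamp) is an assumption for illustration, and saving the evidence image is omitted.

```python
import csv
from datetime import datetime

def record_plate(text, csv_path):
    """Append a recognized plate number with the current timestamp to a
    check-in or check-out CSV file (illustrative layout)."""
    cap_time = datetime.now()
    with open(csv_path, 'a', newline='') as f:
        csv.writer(f).writerow([text, cap_time.isoformat()])
```

Calling this with the check-in path at the entry camera and the check-out path at the exit camera yields the two files used by the matching function.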
Figure 4.3.1 Code snippets for saving the recognized car plate with registered timestamp
The code snippets below showed how the overspeeding detection in our system worked. First, four lists were loaded with the information retrieved from the csv files, namely check_in_plate, check_out_plate, check_in_time, and check_out_time. Next, the license plate numbers in check_in_plate and check_out_plate were compared. When the same license plate number was matched, the respective check_in_time and check_out_time were read to calculate the time difference, which indicated the duration taken to complete the predefined distance.
There were two ways to determine overspeeding: by time taken or by average speed. Take UTAR Kampar campus as an example. The total distance driving around the whole campus is about 3.3 km, while the standard speed limit is 25 km/h. Obeying the 25 km/h speed limit, the minimum time taken to complete the whole 3.3 km journey is 7 min 55 s (3.3 km / 25 km/h = 0.132 h, or about 475 s). This means that if a car completed the journey in less than 7 min 55 s, the car was flagged as overspeeding. Equivalently, if we took the time taken and the distance to calculate its average speed, the car was undoubtedly driving beyond 25 km/h.
In our project, we used the time taken to determine overspeeding, and at the same time we also output the average speed to show how much faster the car was driving than the speed limit.
Figure 4.3.2 Code snippets for overspeeding detection
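The matching and speed calculation can be sketched as follows. The CSV row layout (plate, ISO timestamp), the function name, and the use of dictionaries instead of the four lists described above are illustrative assumptions; the thesis's exact code is shown in Figure 4.3.2.

```python
import csv
from datetime import datetime

DISTANCE_KM = 3.3          # predefined distance around the campus
SPEED_LIMIT_KMH = 25.0     # standard campus speed limit

def find_overspeeding(check_in_csv, check_out_csv):
    """Match plates between check-in and check-out records and return
    (plate, average speed) for every car exceeding the limit."""
    def load(path):
        with open(path, newline='') as f:
            return {plate: datetime.fromisoformat(ts)
                    for plate, ts in csv.reader(f)}
    check_in, check_out = load(check_in_csv), load(check_out_csv)
    flagged = []
    for plate, t_in in check_in.items():
        t_out = check_out.get(plate)
        if t_out is None:
            continue                      # car has not checked out yet
        hours = (t_out - t_in).total_seconds() / 3600.0
        avg_speed = DISTANCE_KM / hours   # km/h over the predefined distance
        if avg_speed > SPEED_LIMIT_KMH:
            flagged.append((plate, round(avg_speed, 1)))
    return flagged
```

For the 3.3 km distance, a time difference under 7 min 55 s necessarily gives an average speed above 25 km/h, so the time-based and speed-based checks agree.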