SCALABLE AND DISTRIBUTED METHODS FOR LARGE-SCALE VISUAL COMPUTING

3.3 Comparison of the classification loss in the existing tree-based approach and the proposed Fast-BoW approach with respect to sequential BoW.
7.5 A comparison of the classification performance (%) for sequential SVM. (B) Performance at different mixtures of K-Gaussian distributions.

Feature representation techniques

Modeling Techniques

Various machine learning techniques already exist in the modeling literature. They are generally classified into three categories: 1) supervised learning techniques (e.g., support vector machine, neural network, decision tree, Bayesian network, Fisher linear discriminant analysis) that require knowledge of the ground truth, 2) unsupervised learning techniques (e.g., k-means, k-medoids) that do not require any ground truth, and 3) semi-supervised learning techniques that require ground truth for only a few samples. Widespread application of machine learning depends on overcoming major difficulties such as the need for large datasets, the need for ground truth, concept drift in dynamic environments, and computational complexity.

Challenges in large-scale visual computing

Therefore, it is very important to minimize the loss caused by model approximation in order to maintain performance. First, transferring data from the camera to the central facility will increase latency.

Issues addressed in this thesis

Existing distributed learning algorithms suffer from the large flow of data in the communication network, thus making the entire process a time-consuming task. Using a central computing device is not a good choice for performing the vision task for two reasons.

Organization of the thesis

Some of the techniques for aggregate feature representation are bag-of-visual-words (BoW), Fisher vector (FV), vector of locally aggregated descriptors (VLAD), sequential models, and dictionary learning with sparse coding. The bag-of-visual-words (BoW) approach is one of the most widely used techniques for aggregate feature representation due to its simplicity.

Large-scale modeling

Support vector machines (SVM)

However, the size of the subsets increases with the number of iterations, which contributes to the increased learning time. The authors of [14] proposed a MapReduce-based implementation of the same methodology in the cloud environment to improve the scalability and parallelism of the training phase by dividing the training data into smaller subsets as shown in Fig.

Figure 2.1: Schematic of cloud SVM architecture [14].

K-nearest neighbours (k-NNs)

Neural networks (NNs)

The distributed implementation of machine learning is one of the possible solutions to achieve learning on large-scale visual data. However, the next step towards the distributed implementation requires a trade-off between computational accuracy and communication overhead in the computing cluster.

System architecture for large-scale surveillance network

In the early phase, researchers randomly distributed tasks across multiple machines to increase performance. Ryu et al. [96] presented an extensible video processing framework on top of Apache Hadoop to perform parallel video processing in a cloud environment.

Large-scale visual computing applications

Abnormal activity recognition

Spatio-temporal interest points have recently been investigated for abnormal activity recognition in surveillance videos [20]. However, the above methods use a bag-of-words approach that does not consider the geometric relationships between salient points.

Helmetless motorcyclists detection

The extraction of normal interactions from training videos is formulated as the problem of efficiently finding the regular geometric relationships of nearby sparse spatio-temporal interest points (STIPs). Some of the reasons for the poor performance of existing approaches are: (i) the use of less effective hand-crafted features for object classification, (ii) the consideration of objects that are irrelevant to the goal of detecting motorcycle riders without a helmet, and (iii) the computational complexity of most existing methods, which makes them unsuitable for real-time use.

Accident detection

Modeling vehicle interactions: Inspired by sociological concepts, these methods model the interaction between vehicles and detect accidents [75, 110]. However, the need for a large amount of training data and the reliance on rate change information alone limit the performance of these methods.

Observations from the review

Many of the modeling techniques presented in the review, such as kernel SVM, are difficult to parallelize or distribute and cannot take advantage of distributed computing systems while learning from large-scale datasets. The core issues listed in the distributed learning of SVM are the use of sequential minimal optimization (SMO) algorithm, loss of global support vectors due to random partitioning, increased training time due to non-separable and complex distribution of the class data, and increased prediction time due to a large number of support vectors.

Summary

Learning probability distribution of the clusters

Since learning the clusters from a large number of local feature descriptors is time-consuming, we first use a small sample to learn the initial clusters {µ1, ···, µk} and later refine the centers by using the remaining points only once, as suggested by Raghunathan et al. Let µij be the center nearest to the local feature descriptor fi; then it updates µij as.
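The update equation itself is not reproduced in this excerpt. As a minimal sketch of a single-pass refinement, the code below assumes the common online k-means rule, in which the nearest center moves toward the new descriptor by a step of 1/n, with n the number of points assigned to that center so far. The function names and the step-size schedule are assumptions for illustration, not necessarily the rule used in the thesis.

```python
def nearest_center(centers, point):
    """Return the index of the center closest to `point` (squared Euclidean)."""
    best, best_dist = 0, float("inf")
    for idx, center in enumerate(centers):
        dist = sum((c - p) ** 2 for c, p in zip(center, point))
        if dist < best_dist:
            best, best_dist = idx, dist
    return best


def refine_centers(initial_centers, remaining_points, initial_counts=None):
    """Single pass over the remaining descriptors, updating centers online.

    Assumed update (running mean): mu <- mu + (f - mu) / n, where n is the
    number of points assigned to that center so far.
    """
    centers = [list(c) for c in initial_centers]
    counts = list(initial_counts) if initial_counts else [1] * len(centers)
    for point in remaining_points:
        j = nearest_center(centers, point)
        counts[j] += 1
        rate = 1.0 / counts[j]
        centers[j] = [c + rate * (p - c) for c, p in zip(centers[j], point)]
    return centers
```

Each remaining descriptor is touched exactly once, so the refinement cost is linear in the number of descriptors times the number of centers.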

Weight quantization and hashing

Input: Q: set of local feature descriptors used to create the vocabulary; m: cardinality of Q, i.e., |Q|; k: size of the vocabulary, i.e., |V|. Once θ∗ is obtained, the BoW histogram is generated using Algorithm 3.2: for each local feature descriptor, it first groups and sums the descriptor's values into the scatter bucket s according to the shared parameter indices in the corresponding θ∗j, and then multiplies the real-valued vector s by the quantized integer-valued vector h, which can be represented with fewer bits.
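One way to read the bucket-then-multiply step is as a dot product with shared (quantized) weights. The sketch below assumes that each weight vector is stored as a list of indices into a small codebook h, so that the sum over dimensions of h[index] times the descriptor value can be computed with one multiplication per bucket. The names `bucketed_dot`, `indices`, and `codebook` are hypothetical, and the layout is an interpretation rather than the exact structure of Algorithm 3.2.

```python
def bucketed_dot(descriptor, indices, codebook):
    """Compute sum_i codebook[indices[i]] * descriptor[i] via bucket sums."""
    buckets = [0.0] * len(codebook)
    # Scatter step: add each descriptor value into the bucket of its shared weight.
    for value, idx in zip(descriptor, indices):
        buckets[idx] += value
    # One multiplication per bucket instead of one per dimension.
    return sum(s * h for s, h in zip(buckets, codebook))


def naive_dot(descriptor, indices, codebook):
    """Reference: expand the shared weights and multiply element-wise."""
    return sum(codebook[idx] * value for value, idx in zip(descriptor, indices))
```

Both functions return the same value; the bucketed form replaces most multiplications with additions, which is where the saving would come from when the codebook is much smaller than the descriptor dimension.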

Hierarchical tree for hard BoW generation

Experiments and results

The results show that the proposed Fast-BoW approach significantly reduces the loss in effectiveness of the generated BoW compared to the hierarchical clustering-based tree approach. Thus, the proposed Fast-BoW retains the effectiveness of the BoW features better than the existing Tree-BoW approach.

Figure 3.2: Effect of the vocabulary size on classification accuracy and BoW generation time.

Summary

Detection of space-time interest points

The spatio-temporal interest points [59] are salient points, i.e., regions in f: R² × R → R with significant eigenvalues λ1, λ2, and λ3 of a spatio-temporal second-moment matrix µ, which is a 3-by-3 matrix composed of first-order spatial and temporal derivatives averaged using a Gaussian weighting function g(·; σi², τi²) with integration scales σi² (spatial variance) and τi² (temporal variance). In this way, the STIP feature descriptors include the appearance information captured by HoG and the motion information captured by HoF around the salient points.
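The criterion for "significant eigenvalues" is not spelled out in this excerpt. A common choice for space-time interest points is a Harris-style response, H = det(µ) − k·trace(µ)³, which is large only when all three eigenvalues are large. The sketch below assumes that response and an assumed constant k = 0.005; both are illustrative and may differ from the detector settings used in the thesis.

```python
def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def corner_response(mu, k=0.005):
    """Assumed response H = det(mu) - k * trace(mu)**3 for a 3x3 second-moment matrix."""
    trace = mu[0][0] + mu[1][1] + mu[2][2]
    return det3(mu) - k * trace ** 3
```

A matrix with one vanishing eigenvalue has zero determinant, so its response is negative and the point would be rejected under this criterion.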

Graph formulation of a video

Activity recognition

The random walk kernel [31] compares two graphs by counting the random walks they have in common. The number of common random walks of a given length is calculated using the direct product graph, because a random walk on the direct product graph is equivalent to a simultaneous random walk in both graphs [31].
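The counting argument can be sketched as follows. The code assumes node-labeled graphs given as adjacency matrices, builds the direct product graph over label-matching node pairs, and sums the number of walks up to a maximum length with a geometric decay. The truncation length, the decay factor, and the function names are assumptions made for illustration.

```python
def product_adjacency(adj1, labels1, adj2, labels2):
    """Adjacency matrix of the direct product graph.

    Nodes are pairs (u, v) with equal labels; (u, v) -> (u2, v2) is an edge
    when u -> u2 is an edge in graph 1 and v -> v2 is an edge in graph 2.
    """
    nodes = [(u, v) for u in range(len(adj1)) for v in range(len(adj2))
             if labels1[u] == labels2[v]]
    index = {node: i for i, node in enumerate(nodes)}
    size = len(nodes)
    matrix = [[0] * size for _ in range(size)]
    for (u, v), i in index.items():
        for u2 in range(len(adj1)):
            for v2 in range(len(adj2)):
                if adj1[u][u2] and adj2[v][v2] and (u2, v2) in index:
                    matrix[i][index[(u2, v2)]] = 1
    return matrix


def random_walk_kernel(adj1, labels1, adj2, labels2, max_length=3, decay=0.5):
    """Sum over l = 0..max_length of decay**l * (number of common walks of length l)."""
    a = product_adjacency(adj1, labels1, adj2, labels2)
    size = len(a)
    walks = [1] * size  # walks[i]: number of length-l walks starting at product node i
    total = 0.0
    for length in range(max_length + 1):
        total += (decay ** length) * sum(walks)
        walks = [sum(a[i][j] * walks[j] for j in range(size)) for i in range(size)]
    return total
```

Every walk in the product graph corresponds to one pair of label-matching walks, one in each input graph, which is why summing the entries of the powers of the product adjacency matrix counts the common walks.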

Experiments and results

The classification performance on the UCSDped2 dataset using the proposed approach is 90.13%, while the performance of the existing bag-of-words approach using STIP features is 75.82% on the same dataset. The classification performance on the UMN dataset using the proposed approach is 95.24%, while the performance of the existing bag of words approach using STIP features is 85.00% on the same dataset.

Figure 4.2: Illustration of normal and abnormal samples and the corresponding graphs from all datasets.

Summary

Random key encoding
Initial population generation
Fitness evaluation
Selection operator
Reproduction operator
Elitism

In the next section, we propose a solver for equation (5.1) using the genetic algorithm to obtain the best solution. To evaluate the fitness of a solution α, the objective function J(α) in equation (5.1) is used as the fitness function.
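Equation (5.1) is not reproduced in this excerpt. The sketch below assumes it is the standard SVM dual objective without a bias term, J(α) = Σ αi − ½ Σ Σ αi αj yi yj K(xi, xj) with 0 ≤ αi ≤ C, and maximizes it with a deliberately small genetic algorithm (real-valued encoding, truncation selection, uniform crossover, mutation, elitism). All function names, operators, and parameter values are assumptions for illustration and are simpler than the operators listed in the chapter.

```python
import random


def dual_objective(alpha, labels, kernel):
    """Assumed fitness: J(a) = sum(a) - 0.5 * sum_ij a_i a_j y_i y_j K_ij."""
    n = len(alpha)
    quad = sum(alpha[i] * alpha[j] * labels[i] * labels[j] * kernel[i][j]
               for i in range(n) for j in range(n))
    return sum(alpha) - 0.5 * quad


def genetic_maximize(labels, kernel, c=1.0, pop_size=30, generations=50,
                     elite=2, mutation_rate=0.1, seed=0):
    """Tiny genetic algorithm over alpha in [0, c]^n with elitism."""
    rng = random.Random(seed)
    n = len(labels)

    def fitness(a):
        return dual_objective(a, labels, kernel)

    population = [[rng.uniform(0.0, c) for _ in range(n)] for _ in range(pop_size)]
    history = []  # best fitness per generation
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        history.append(fitness(population[0]))
        next_gen = [list(a) for a in population[:elite]]  # elitism: keep the best as-is
        parents = population[:pop_size // 2]               # truncation selection
        while len(next_gen) < pop_size:
            p1, p2 = rng.sample(parents, 2)
            # Uniform crossover: take each gene from either parent.
            child = [x if rng.random() < 0.5 else y for x, y in zip(p1, p2)]
            # Mutation: occasionally resample a gene inside the box constraint.
            child = [rng.uniform(0.0, c) if rng.random() < mutation_rate else g
                     for g in child]
            next_gen.append(child)
        population = next_gen
    best = max(population, key=fitness)
    history.append(fitness(best))
    return best, history
```

Because the elite solutions are copied unchanged into each new generation, the best fitness recorded in `history` never decreases.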

Figure 5.1: Genetic-SVM operations. (A) The flow diagram of the steps in the genetic algorithm

Distributed execution of Genetic-SVM

Distributed Genetic-SVM

Each worker VM generates the initial population, performs the fitness evaluation, and sends its best solution to the master VM. The master VM collects all the local solutions in the global pool AG, selects the global best solution from them, and sends it to all worker VMs.

Distributed Genetic-SVM for large dataset

Furthermore, each worker VM prepares the next generation, which consists of the global best solution, the local best solution (if it is not the winning worker VM), child solutions reproduced from the previous generation, and randomly generated solutions. The N worker VMs transmit only their best solutions during this process, passing a total of N messages over the network after each generation.
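The communication pattern described above can be simulated in a single process. In this sketch, each worker's population is a plain list of candidate solutions, and the helper names `synchronize` and `seed_next_generation` are hypothetical; no real networking or VM management is involved.

```python
def synchronize(worker_populations, fitness):
    """One communication round: workers report local bests, master picks the global best.

    Returns (global_best, winner_index, uplink_messages).
    """
    # Each worker sends exactly one message: its best local solution.
    local_bests = [max(pop, key=fitness) for pop in worker_populations]
    uplink_messages = len(local_bests)
    # The master selects the global best from the pooled local bests.
    winner = max(range(len(local_bests)), key=lambda w: fitness(local_bests[w]))
    return local_bests[winner], winner, uplink_messages


def seed_next_generation(local_best, global_best, is_winner):
    """Solutions a worker carries over before adding children and random solutions."""
    carried = [list(global_best)]
    if not is_winner:
        # For the winning worker the local best already is the global best.
        carried.append(list(local_best))
    return carried
```

With N workers, the uplink cost per generation is N messages regardless of the population size on each worker, which matches the message count stated above.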

Experiments and results

Finally, when the complete pipeline of the algorithm runs on different datasets, the Genetic-SVM algorithm performs approximately 10–20 times faster than LIBSVM as shown in Table 5.3. The proposed Genetic-SVM outperforms existing partitioning-based distributed SVM approaches in terms of classification accuracy and time taken to train an SVM model.

Table 5.1: Details of datasets used to evaluate the performance of Genetic-SVM. Columns: Dataset, Dimensions, Training Size, Test Size.

Summary

Also, the statistical properties of the partitions {Dp}, p = 1, ..., P, are close to the statistical properties of the entire dataset D. The partitions formed using DPP are up to 103x closer to the mean and variance of the entire dataset than random partitions.

Fig. 6.1 illustrates an example of the full dataset D and one of its partitions Dp obtained using Algorithm 6.1 for the MNIST dataset

Distributed execution of DiP-SVM

Empirical evaluation

This demonstrates the suitability of DiP-SVM over existing clustering-based methods in [40, 132] for well-separated clusters. On the other hand, the LSVs produced by DiP-SVM, as shown in Fig. 6.3 (E), (F), and (G), are in close agreement.

Fig. 6.3 demonstrates the suitability of DiP-SVM over the existing clustering-based methods in [40, 132] for overlapping clusters using a Gaussian kernel

Experiments and results

It can be observed from the experiments that DiP-SVM consistently achieves better performance regardless of the distribution of. The results show that DiP-SVM training is approximately 9× faster than sequential SVM training for each dataset.

Figure 6.3: Comparison of DiP-SVM with the existing clustering-based methods in [40, 132] for local and global solutions using a non-linear kernel

Summary

To partition the dataset, it calculates the dominant eigenvector of the entire dataset using an iterative procedure. The direction of the maximum variance is given by the dominant eigenvector of dataset D.
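The iterative procedure is not named in this excerpt. A minimal sketch, assuming it is power iteration applied to the covariance matrix of D, is given below. The uniform starting vector, the tolerance, and the function names are assumptions for illustration.

```python
def covariance(data):
    """Covariance matrix (normalized by n) of the rows in `data`."""
    n, d = len(data), len(data[0])
    mean = [sum(row[j] for row in data) / n for j in range(d)]
    centered = [[row[j] - mean[j] for j in range(d)] for row in data]
    return [[sum(r[a] * r[b] for r in centered) / n for b in range(d)]
            for a in range(d)]


def dominant_eigenvector(matrix, iterations=200, tol=1e-12):
    """Power iteration: multiply and normalize until the direction stabilizes."""
    d = len(matrix)
    vec = [1.0 / d ** 0.5] * d
    for _ in range(iterations):
        nxt = [sum(matrix[i][j] * vec[j] for j in range(d)) for i in range(d)]
        norm = sum(x * x for x in nxt) ** 0.5
        if norm == 0.0:
            break  # zero matrix (or orthogonal start): no direction to follow
        nxt = [x / norm for x in nxt]
        if sum((a - b) ** 2 for a, b in zip(nxt, vec)) < tol:
            vec = nxt
            break
        vec = nxt
    return vec
```

Power iteration needs only matrix-vector products, which is convenient for large datasets. A uniform start fails if it happens to be orthogonal to the dominant eigenvector, so a random start would be the safer choice in practice.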

Figure 6.7: Classification performance for mini-batch training of DiP-SVM.

Training and prediction in distributed environment

Prediction using proposed distributed SVM

Leaf node with class label: If the current node is a leaf node with a class label, it assigns the class label of the leaf node as the predicted class of the test point x and terminates the procedure. Leaf node with SVM model: If the current node is a leaf node with a trained SVM model SM, then it uses that SVM model to predict the class of the test point x.
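The two leaf cases can be sketched as a simple tree walk. How an internal node routes a test point is not described in this excerpt, so the sketch assumes that each internal node projects x onto a stored direction and selects a branch by comparing the projected value with sorted thresholds. The dictionary layout, the key names, and the callable SVM model are hypothetical.

```python
from bisect import bisect_right


def predict(node, x):
    """Walk the tree from `node` until a leaf decides the class of `x`."""
    while True:
        if node["kind"] == "label":
            # Leaf with a class label: return it directly.
            return node["label"]
        if node["kind"] == "svm":
            # Leaf with a trained SVM model: delegate the decision to it.
            return node["model"](x)
        # Internal node (assumed form): project x and pick a branch by threshold.
        value = sum(w * xi for w, xi in zip(node["direction"], x))
        node = node["children"][bisect_right(node["thresholds"], value)]
```

An internal node with B branches stores B − 1 thresholds under this layout, so selecting the branch costs one projection plus a binary search.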

Figure 7.2: Block diagram of the Projection-SVM training over the cluster. Master node contains a sample tree model

Time complexity analysis

Input: x ∈ Rd: unlabeled data point; d: number of dimensions; B: maximum number of branches at each internal node; tree: trained tree model. The average case occurs when the decision tree creates balanced partitions and each partition contains data points from both classes.

Experiments and results

Sketches of correctness

Fig. 7.5(B) shows the classification performance comparison for sequential SVM and the proposed method on synthetic datasets. The proposed approach performs similarly to sequential SVM for low values of K, but for high values of K it achieves much better performance than sequential SVM.

Figure 7.3: Illustration of the working of proposed distributed SVM for well separable classes.

Comparison with state-of-the-art methods

Details of the various evaluation metrics used to evaluate the proposed approach are given in Table 7.2. The proposed distributed SVM approach reduces the loss in classification accuracy and the results are approximately equal to the results of sequential SVM.

Table 7.1: Classification performance (%) of the proposed distributed SVM and comparison with LIBSVM, DC-SVM, CA-SVM and DT-SVM.

Summary

Compute nodes are the embedded devices deployed on site near the cameras. End users can access the detected alerts from the central alert database through a web interface.

Figure 8.1: The proposed edge computing-based framework for traffic monitoring. The entire system architecture consists of three parts: compute nodes, central servers, and a client interface

Real-time detection of motorcyclists without helmet

Detection of motorcyclist using CNN based object detector
Localization of the rider’s head
Classification of head and helmet using CNN
Temporal consolidation of the alerts
Experiments and results

Also, there is an increase in the intensity of the activation values for the deeper layers. It can be observed from the scatter plots that the proposed model learns the distribution of both.

Figure 8.3: Sample images of the localized motorcycle riders with and without helmets of various styles from different viewpoints.

Deep spatio-temporal representation for detection of road accident

Spatio-temporal volume generation
Stacked denoising autoencoder (SDAE)
Detection of intersection points in trajectories
Accident score generation
Experiments and results

The final performance (AUC) of the accident detection based on the reconstruction error alone is and 76.28 for appearance, motion, and joint representations, respectively. The final performance (AUC) of the accident detection based on the intermediate representation using one-class SVM is and 74.21% for appearance, motion, and joint representations, respectively.
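For reference, AUC can be computed directly from per-sample accident scores as the fraction of (accident, normal) pairs in which the accident sample receives the higher score. The sketch below is a generic pairwise implementation with ties counted as one half; the function name and the 0/1 label encoding are assumptions, and it is not tied to the scoring functions used in the thesis.

```python
def roc_auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count half."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative sample")
    correct = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(positives) * len(negatives))
```

The pairwise form is quadratic in the number of samples, which is fine for small evaluations; a rank-based computation would be preferable for large test sets.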

Figure 8.10: The architecture of the proposed framework for accident detection. (A) Overview of the framework

Summary

We showed that the proposed methods were able to reduce the time complexity and the loss in effectiveness of various feature representation and modeling techniques. Further, we proposed DiP-SVM, a distribution preserving kernel SVM, which reduces the chance of missing significant global support vectors by preserving the first- and second-order statistics of the entire dataset in each of the partitions.

Directions for Further Research

Krishna Mohan, "A method and system for real-time detection of traffic violations by two-wheeler riders," All India Patent, Application no.
Krishna Mohan, "DiP-SVM: Distribution Preserving Kernel Support Vector Machine for Big Data," IEEE Transactions on Big Data, vol. 3, pp. 79-90, January 2017.