3.3 Comparison of the classification loss in the existing tree-based approach and the proposed Fast-BoW approach with respect to sequential BoW.
7.5 A comparison of the classification performance (%) for sequential SVM. (B) Performance at different mixtures of K-Gaussian distributions.
Feature representation techniques
Modeling Techniques
Various machine learning techniques exist in the modeling literature. They are generally classified into three categories: 1) supervised learning techniques (e.g. support vector machines, neural networks, decision trees, Bayesian networks, Fisher linear discriminant analysis), which require ground truth for the training samples; 2) unsupervised learning techniques (e.g. k-means, k-medoids), which do not require any ground truth; and 3) semi-supervised learning techniques, which require ground truth for only a few samples. Widespread application of machine learning depends on overcoming major difficulties such as the need for large datasets, the need for ground truth, concept drift in dynamic environments, and computational complexity.
Challenges in large-scale visual computing
Therefore, it is very important to minimize the loss caused by model approximation in order to maintain performance. First, transferring data from the camera to the central facility will increase latency.
Issues addressed in this thesis
Existing distributed learning algorithms suffer from the large flow of data in the communication network, thus making the entire process a time-consuming task. Using a central computing device is not a good choice for performing the vision task for two reasons.
Organization of the thesis
Techniques for aggregate feature representation include bag-of-visual-words (BoW), Fisher vector (FV), vector of locally aggregated descriptors (VLAD), sequential models, and dictionary learning with sparse coding. The bag-of-visual-words (BoW) approach is one of the most widely used of these techniques because of its simplicity.
Large-scale modeling
Support vector machines (SVM)
However, the size of the subsets grows with the number of iterations, which increases the learning time. The work in [14] proposed a MapReduce-based implementation of the same methodology in the cloud environment to improve the scalability and parallelism of the training phase by dividing the training data into smaller subsets as shown in Fig.
K-nearest neighbours (k-NNs)
Neural networks (NNs)
The distributed implementation of machine learning is one of the possible solutions to achieve learning on large-scale visual data. However, the next step towards the distributed implementation requires a trade-off between computational accuracy and communication overhead in the computing cluster.
System architecture for large-scale surveillance network
In the early phase, researchers randomly distributed tasks across multiple machines to increase performance. Ryu et al. [96] presented an extensible video processing framework built on the Apache Hadoop framework to perform parallel video processing in a cloud environment.
Large-scale visual computing applications
Abnormal activity recognition
Spatio-temporal interest points have recently been investigated for abnormal activity recognition in surveillance videos [20]. However, the above methods use a bag-of-words approach that does not consider the geometric relationships between salient points.
Helmetless motorcyclists detection
The extraction of normal interactions from training videos is formulated as the problem of efficiently finding the regular geometric relationships among nearby sparse spatio-temporal interest points (STIPs). Some of the reasons for the poor performance of existing approaches are: (i) the use of less effective hand-crafted features for object classification, (ii) the consideration of objects that are irrelevant to the goal of detecting motorcycle riders without a helmet, and (iii) the computational complexity of most existing methods, which makes them unsuitable for real-time use.
Accident detection
Modeling vehicle interactions: Inspired by sociological concepts, these methods model the interaction between vehicles and detect accidents [75, 110]. However, the need for a large amount of training data and the reliance on rate-of-change information alone limit the performance of these methods.
Observations from the review
Many of the modeling techniques presented in the review, such as kernel SVM, are difficult to parallelize or distribute and cannot take advantage of distributed computing systems while learning from large-scale datasets. The core issues identified in the distributed learning of SVM are the use of the sequential minimal optimization (SMO) algorithm, the loss of global support vectors due to random partitioning, increased training time due to the non-separable and complex distribution of the class data, and increased prediction time due to a large number of support vectors.
Summary
Learning probability distribution of the clusters
Since learning clusters from a large number of local feature descriptors is time-consuming, we first use a small sample to learn the initial clusters {µ1, ···, µk} and later refine the centers by using the remaining points only once, as suggested by Raghunathan et al. Let µij be the nearest center to the local feature descriptor fi; then it updates µij as.
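The single-pass refinement can be illustrated with a minimal sketch. It assumes a running-mean update, in which the nearest center moves toward each new descriptor with step size 1/count; this step size is an assumption for illustration and is not necessarily the update rule used in the thesis. The function names `nearest_center` and `refine_centers` are hypothetical.

```python
import math

def nearest_center(centers, f):
    """Return the index of the center closest to descriptor f (Euclidean)."""
    return min(range(len(centers)), key=lambda j: math.dist(centers[j], f))

def refine_centers(centers, counts, stream):
    """Refine centers in place, visiting each remaining descriptor exactly once."""
    for f in stream:
        j = nearest_center(centers, f)
        counts[j] += 1
        eta = 1.0 / counts[j]  # assumed step size: running mean of assigned points
        centers[j] = [c + eta * (x - c) for c, x in zip(centers[j], f)]
    return centers

# Toy initial centers, standing in for those learned from a small sample.
centers = [[0.0, 0.0], [10.0, 10.0]]
counts = [1, 1]  # each initial center is treated as one observed point
remaining = [[1.0, 1.0], [9.0, 11.0], [0.0, 2.0]]
refined = refine_centers(centers, counts, remaining)
```

With this step size each center ends up at the mean of its initial position and the descriptors assigned to it, so no descriptor has to be revisited.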
Weight quantization and hashing
Input: Q: set of local feature descriptors used to create the vocabulary; m: cardinality of Q, i.e. |Q|; k: size of the vocabulary, i.e. |V|. Once θ∗ is obtained, the BoW histogram is generated using Algorithm 3.2. For each local feature descriptor, it first groups the descriptor values that share the same parameter index in the corresponding θ∗j and sums each group into the bucket vector s. It then multiplies the real-valued vector s by the quantized integer-valued vector h, which can be represented with a smaller number of bits.
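A minimal sketch of this bucket-then-multiply step follows. It assumes that θ∗j can be read as a vector of bucket indices (one per descriptor dimension) and that h holds one quantized integer weight per bucket; both readings, and the name `bucketed_dot`, are illustrative assumptions rather than the thesis's exact Algorithm 3.2.

```python
def bucketed_dot(f, theta_j, h):
    """Compute sum_i h[theta_j[i]] * f[i] by accumulating per-bucket sums first."""
    s = [0.0] * len(h)
    for value, bucket in zip(f, theta_j):
        s[bucket] += value  # additions only, one per descriptor dimension
    # Only len(h) multiplications are needed, one per shared weight.
    return sum(hb * sb for hb, sb in zip(h, s))

f = [0.5, 1.5, 2.0, 1.0]   # toy local feature descriptor
theta_j = [0, 1, 0, 2]     # assumed bucket index for each dimension
h = [3, -1, 2]             # assumed quantized integer weights
score = bucketed_dot(f, theta_j, h)
direct = sum(h[b] * x for x, b in zip(f, theta_j))
```

Both `score` and `direct` compute the same value; the bucketed form replaces one multiplication per dimension with one multiplication per distinct weight.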
Hierarchical tree for hard BoW generation
Experiments and results
The results show that the proposed Fast-BoW approach significantly reduces the loss in the efficiency of the generated BoW compared to the hierarchical clustering based tree approach. Thus, Fast-BoW retains the efficiency of the BoW features better than the existing Tree-BoW approach.
Summary
Detection of space-time interest points
The spatio-temporal interest points [59] are salient points, which are the regions in f: R² × R → R with significant eigenvalues λ1, λ2, and λ3 of a spatio-temporal second-moment matrix µ. This is a 3-by-3 matrix composed of first-order spatial and temporal derivatives averaged using a Gaussian weighting function g(·; σi², τi²) with integration scales σi² (spatial variance) and τi² (temporal variance). In this way, the STIP feature descriptors capture appearance information through HoG and motion information through HoF around the salient points.
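For reference, the second-moment matrix described above is usually written as follows in the STIP literature. The symbols L_x, L_y, and L_t, denoting the first-order derivatives of the scale-space-smoothed video f along the two spatial axes and the temporal axis, are notation assumed here for illustration, and * denotes convolution.

```latex
\mu = g(\cdot;\, \sigma_i^2, \tau_i^2) *
\begin{pmatrix}
L_x^2   & L_x L_y & L_x L_t \\
L_x L_y & L_y^2   & L_y L_t \\
L_x L_t & L_y L_t & L_t^2
\end{pmatrix}
```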
Graph formulation of a video
Activity recognition
The random walk kernel [31] compares two graphs by counting the random walks that they have in common. The number of common random walks of a given length is computed on the direct product graph, because a random walk on the direct product graph is equivalent to a simultaneous random walk in both graphs [31].
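The direct product construction can be sketched as follows for unlabeled graphs given as 0/1 adjacency matrices. In that simplified setting the adjacency matrix of the direct product graph is the Kronecker product of the two adjacency matrices, and the number of common walks of a given length is the sum of the entries of its matrix power. Node and edge labels, which a video graph would normally carry, are ignored here, and all function names are hypothetical.

```python
def kron(a, b):
    """Kronecker product of two square adjacency matrices."""
    n, m = len(a), len(b)
    return [[a[i][j] * b[p][q] for j in range(n) for q in range(m)]
            for i in range(n) for p in range(m)]

def matmul(x, y):
    """Product of two square matrices of the same size."""
    size = len(x)
    return [[sum(x[i][t] * y[t][j] for t in range(size)) for j in range(size)]
            for i in range(size)]

def common_walks(a1, a2, length):
    """Count simultaneous walks of the given length (>= 1) in both graphs."""
    ax = kron(a1, a2)  # adjacency matrix of the direct product graph
    power = ax
    for _ in range(length - 1):
        power = matmul(power, ax)
    return sum(map(sum, power))

path = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]  # path graph on three vertices
edge = [[0, 1], [1, 0]]                   # single undirected edge
walks_1 = common_walks(path, edge, 1)
walks_2 = common_walks(path, edge, 2)
```

A full kernel would typically sum these counts over several walk lengths with a decay weight; that weighting is omitted from this sketch.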
Experiments and results
The classification performance on the UCSDped2 dataset using the proposed approach is 90.13%, while the performance of the existing bag-of-words approach using STIP features is 75.82% on the same dataset. The classification performance on the UMN dataset using the proposed approach is 95.24%, while the performance of the existing bag-of-words approach using STIP features is 85.00% on the same dataset.
Summary
- Random key encoding
- Initial population generation
- Fitness evaluation
- Selection operator
- Reproduction operator
- Elitism
In the next section, we propose a solver for equation (5.1) that uses the genetic algorithm to obtain the best solution. To evaluate the fitness of a solution α, the objective function J(α) in equation (5.1) is used as the fitness function.
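The components listed above (random key encoding, initial population, fitness evaluation, selection, reproduction, and elitism) can be combined as in the following minimal sketch. The objective `toy_objective` is a hypothetical stand-in for J(α) in equation (5.1), which is not reproduced here. The tournament selection, uniform crossover, mutation rate, and population sizes are illustrative assumptions and not the settings used in the thesis.

```python
import random

def toy_objective(alpha):
    """Hypothetical stand-in for J(alpha); its maximum is at alpha_i = 0.3."""
    return -sum((a - 0.3) ** 2 for a in alpha)

def decode(keys, upper=1.0):
    """Random-key decoding: map keys in [0, 1) to alpha values in [0, upper)."""
    return [upper * k for k in keys]

def evolve(fitness, dim, pop_size=30, generations=40, elite=2, seed=0):
    rng = random.Random(seed)
    # Initial population of random keys.
    pop = [[rng.random() for _ in range(dim)] for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        # Fitness evaluation: rank solutions from best to worst.
        pop.sort(key=lambda keys: fitness(decode(keys)), reverse=True)
        history.append(fitness(decode(pop[0])))
        nxt = [list(keys) for keys in pop[:elite]]  # elitism
        while len(nxt) < pop_size:
            # Selection: two size-3 tournaments pick the parents.
            p1 = max(rng.sample(pop, 3), key=lambda k: fitness(decode(k)))
            p2 = max(rng.sample(pop, 3), key=lambda k: fitness(decode(k)))
            # Reproduction: uniform crossover, then occasional mutation.
            child = [a if rng.random() < 0.5 else b for a, b in zip(p1, p2)]
            child = [rng.random() if rng.random() < 0.05 else g for g in child]
            nxt.append(child)
        pop = nxt
    best = max(pop, key=lambda keys: fitness(decode(keys)))
    return decode(best), history

best_alpha, history = evolve(toy_objective, dim=4)
```

Because the elite solutions are copied unchanged into the next generation, the best fitness recorded in `history` never decreases from one generation to the next.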
Distributed execution of Genetic-SVM
Distributed Genetic-SVM
Each worker VM generates the initial population, then performs the fitness evaluation and sends its best solution to the master VM. The master VM collects all the local solutions in the global pool AG, selects the global best solution from among them, and then sends it to all worker VMs.
Distributed Genetic-SVM for large dataset
Furthermore, each worker VM prepares the next generation, which consists of the global best solution, the local best solution (unless the worker VM is itself the winner), child solutions reproduced from the previous generation, and randomly generated solutions. The N worker VMs transmit only their best solutions during this process, so a total of N messages pass over the network after each generation.
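The message flow of one generation can be simulated in a single process as a rough sketch. Workers are represented as plain lists of solutions, the fitness function is a hypothetical placeholder, and the averaging crossover and the number of random solutions are illustrative assumptions; real VMs and network transport are not modeled.

```python
import random

def fitness(solution):
    """Hypothetical fitness function; higher is better."""
    return -sum((x - 0.5) ** 2 for x in solution)

def run_generation(worker_pops):
    """One synchronisation round: each of the N workers sends one solution."""
    local_bests = [max(pop, key=fitness) for pop in worker_pops]  # on workers
    messages_sent = len(local_bests)                              # N messages
    global_best = max(local_bests, key=fitness)                   # on master
    return global_best, local_bests, messages_sent

def next_population(pop, global_best, local_best, rng, n_random=2):
    """Compose a worker's next generation from the four sources in the text."""
    seeds = [global_best]
    if local_best != global_best:  # the winning worker skips the duplicate
        seeds.append(local_best)
    children = []
    while len(seeds) + len(children) + n_random < len(pop):
        a, b = rng.sample(pop, 2)
        children.append([(x + y) / 2 for x, y in zip(a, b)])  # simple crossover
    randoms = [[rng.random() for _ in global_best] for _ in range(n_random)]
    return seeds + children + randoms

rng = random.Random(1)
workers = [[[rng.random() for _ in range(3)] for _ in range(6)] for _ in range(4)]
best_per_round = []
for _ in range(5):
    g, local_bests, sent = run_generation(workers)
    best_per_round.append(fitness(g))
    workers = [next_population(p, g, lb, rng) for p, lb in zip(workers, local_bests)]
```

Only one solution per worker crosses the simulated network in each round, which is what keeps the communication cost at N messages per generation regardless of the population size.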
Experiments and results
Finally, when the complete pipeline of the algorithm runs on different datasets, the Genetic-SVM algorithm performs approximately 10–20 times faster than LIBSVM as shown in Table 5.3. The proposed Genetic-SVM outperforms existing partitioning-based distributed SVM approaches in terms of classification accuracy and time taken to train an SVM model.
Summary
Also, the statistical properties of the partitions {Dp}, p = 1, …, P, are close to the statistical properties of the entire dataset D. It is clear that the partitions formed using DPP are up to 103x closer to the mean and variance of the entire dataset than the random partitions.
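One simple way to quantify how close a partition is to the whole dataset is to compare first- and second-order statistics directly. The sketch below does this for scalar data; the toy values and the name `stat_gap` are hypothetical and do not reproduce the thesis's measurement protocol.

```python
from statistics import fmean, pvariance

def stat_gap(partition, dataset):
    """Absolute gaps in mean and variance between a partition and the full data."""
    return (abs(fmean(partition) - fmean(dataset)),
            abs(pvariance(partition) - pvariance(dataset)))

dataset = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
balanced = [1.0, 4.0, 5.0, 8.0]  # spans the full range of the data
skewed = [1.0, 2.0, 3.0, 4.0]    # covers only the lower half

balanced_gap = stat_gap(balanced, dataset)
skewed_gap = stat_gap(skewed, dataset)
```

In this toy case the balanced partition has smaller gaps in both mean and variance than the skewed one, which is the property a distribution-preserving partitioning aims for.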
Distributed execution of DiP-SVM
Empirical evaluation
This demonstrates the suitability of DiP-SVM over existing clustering-based methods in [40, 132] for well-separated clusters. On the other hand, the LSVs produced by DiP-SVM as shown in Fig. 6.3 (E), (F) and (G) are in close agreement.
Experiments and results
It can be observed from the experiments that DiP-SVM consistently achieves better performance regardless of the distribution of. The results show that DiP-SVM training is approximately 9× faster than sequential SVM training for each dataset.
Summary
To partition the dataset, it calculates the dominant eigenvector of the entire dataset using an iterative procedure. The direction of the maximum variance is given by the dominant eigenvector of dataset D.
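Power iteration on the covariance matrix is one common iterative procedure for finding a dominant eigenvector, and it is used in the sketch below as an assumption; the text does not state which procedure the thesis uses. The function names and the toy data are hypothetical.

```python
import math

def covariance(data):
    """Population covariance matrix of the rows of data."""
    n, d = len(data), len(data[0])
    mean = [sum(row[j] for row in data) / n for j in range(d)]
    centered = [[row[j] - mean[j] for j in range(d)] for row in data]
    return [[sum(r[a] * r[b] for r in centered) / n for b in range(d)]
            for a in range(d)]

def dominant_eigenvector(matrix, iterations=200, tol=1e-12):
    """Power iteration: repeatedly apply the matrix and renormalise."""
    d = len(matrix)
    v = [1.0 / math.sqrt(d)] * d  # starting guess; must not be orthogonal to the answer
    for _ in range(iterations):
        w = [sum(matrix[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        w = [x / norm for x in w]
        converged = max(abs(x - y) for x, y in zip(w, v)) < tol
        v = w
        if converged:
            break
    return v

# Toy points lying along the direction (2, 1).
data = [[0.0, 0.0], [2.0, 1.0], [4.0, 2.0], [6.0, 3.0]]
direction = dominant_eigenvector(covariance(data))
```

Projecting each point onto `direction` gives its coordinate along the axis of maximum variance, which can then serve as a basis for splitting the data.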
Training and prediction in distributed environment
Prediction using proposed distributed SVM
Leaf node with class label: If the current node is a leaf node with a class label, it assigns the class label of the leaf node as the predicted class of the test point x and terminates the procedure. Leaf node with SVM model: If the current node is a leaf node with a trained SVM model SM, it uses that SVM model to predict the class of the test point x.
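The two leaf cases can be sketched as a simple tree traversal. The `Node` class, the `route` callable that chooses a branch at internal nodes, and the use of a plain function in place of a trained SVM model are all hypothetical simplifications for illustration; how internal nodes route a point is not specified in this excerpt.

```python
class Node:
    def __init__(self, label=None, model=None, route=None, children=None):
        self.label = label              # set on leaves that hold a class label
        self.model = model              # set on leaves that hold a trained model
        self.route = route              # internal nodes: maps x to a child index
        self.children = children or []

def predict(node, x):
    """Walk from the root to a leaf and return the predicted class of x."""
    while True:
        if node.label is not None:      # leaf node with class label
            return node.label
        if node.model is not None:      # leaf node with SVM model
            return node.model(x)
        node = node.children[node.route(x)]  # internal node: descend one level

# Toy tree: the left branch is a pure leaf, the right branch holds a stand-in model.
pure_leaf = Node(label=-1)
model_leaf = Node(model=lambda x: 1 if x[1] > 0 else -1)
root = Node(route=lambda x: 0 if x[0] < 0 else 1, children=[pure_leaf, model_leaf])

left_prediction = predict(root, [-2.0, 5.0])
right_prediction = predict(root, [3.0, 5.0])
```

A point that reaches a pure leaf needs no kernel evaluations at all, so only points that land in mixed-class regions pay the cost of an SVM prediction.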
Time complexity analysis
Input: x ∈ R^d: unlabeled data point, d: #dimensions, B: #branches (max) at each internal node, tree: trained tree model. The average case occurs when the decision tree creates balanced partitions and each partition contains data points from both classes.
Experiments and results
Sketches of correctness
Fig. 7.5(B) shows the classification performance comparison for sequential SVM and the proposed method on synthetic datasets. The proposed approach performs similarly to sequential SVM for low values of K, but for high values of K it achieves much better performance than sequential SVM.
Comparison with state-of-the-art methods
Details of the various evaluation metrics used to evaluate the proposed approach are given in Table 7.2. The proposed distributed SVM approach reduces the loss in classification accuracy and the results are approximately equal to the results of sequential SVM.
Summary
Compute nodes are the embedded devices located on site near the cameras. End users can access the detected alerts from the central alert database through a web interface.
Real-time detection of motorcyclists without helmet
- Detection of motorcyclist using CNN based object detector
- Localization of the rider’s head
- Classification of head and helmet using CNN
- Temporal consolidation of the alerts
- Experiments and results
Also, there is an increase in the intensity of the activation values for the deeper layers. It can be observed from the scatter plots that the proposed model learns the distribution of both.
Deep spatio-temporal representation for detection of road accident
- Spatio-temporal volume generation
- Stacked denoising autoencoder (SDAE)
- Detection of intersection points in trajectories
- Accident score generation
- Experiments and results
The final performance (AUC) of the accident detection based on the reconstruction error alone is and 76.28 for appearance, motion, and joint representations, respectively. The final performance (AUC) of the accident detection based on the intermediate representation using one-class SVM is and 74.21% for appearance, motion, and joint representations, respectively.
Summary
We showed that the proposed methods were able to reduce the time complexity and the loss in effectiveness of various feature representation and modeling techniques. Further, we proposed DiP-SVM, a distribution-preserving kernel SVM, which reduces the chance of missing significant global support vectors by preserving the first- and second-order statistics of the entire dataset in each of the partitions.
Directions for Further Research
Krishna Mohan, "A method and system for real-time detection of traffic violations by two-wheeler riders," All India Patent, Application no.
Krishna Mohan, "DiP-SVM: Distribution Kernel Support Vector Machine for Big Data," IEEE Transactions on Big Data, vol. 3, pp. 79-90, January 2017.