Multimedia Big Data: Content Analysis and Retrieval
3.2 The MapReduce Framework and Multimedia Big Data
3.2.3 Multiple Multimedia Processing
As we have mentioned earlier, one aspect of big data is the variety of the data.
Yet the approaches we have described above largely focus on image analysis.
Of course, there is a large degree of variety within this image data, especially where it is mined from the web; however, it is just one media type, and generally only a single type of processing has been applied. What if we wished to use more than one processing type? One possibility is to use the MapReduce framework to handle multiple distinct multimedia processing techniques simultaneously.
Chen et al. [11] designed a parallel processing mechanism for multiple multimedia processing techniques based on the existing MapReduce framework for GPUs.
Programming interfaces for GPUs are generally designed for graphics applications, and ‘Mars’ is a MapReduce framework that hides the programming complexity of the GPU and allows programmers to use the more familiar MapReduce interface [12].
Targeting the MapReduce framework at GPUs rather than CPUs has advantages, as the GPU offers higher-bandwidth memory access, which results in more efficient performance for the MapReduce framework. More specifically, Chen et al. [11] focused on the deployment of more than one multimedia processing program on the GPU MapReduce framework and on how best to divide work units to create an efficient parallelization. A middleware manager is implemented in the Mars framework and is responsible for managing all the service requests of the multimedia processing programs; in fact, this middleware manager replaces the Mars Scheduler in the original Mars framework. Their results suggested that the average processing speed under the proposed mechanism is improved by 1.3 times over the previous framework. Continuing in the vein of using the MapReduce framework to carry out multiple tasks, [13] explores the framework for large-scale multimedia data mining.
They present empirical studies on image classification, video event detection, and near-duplicate video retrieval. They used a five-node Hadoop cluster to demonstrate the efficiency of the proposed MapReduce framework for large-scale multimedia data mining applications. The MapReduce framework can read input data in several different formats. However, many traditional Hadoop implementations are targeted at processing text data and thus may face difficulties in dealing with multimedia data types such as video and images [13]. To work around this problem, one ‘mapper’ was used to process an entire input file by customizing two classes in the standard Hadoop implementation (InputFormat and RecordReader). This resulted in image and video files being presented in a format that could be processed directly.
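A minimal sketch of such a whole-file input format, assuming Hadoop's newer Java API (org.apache.hadoop.mapreduce), might look like the following; the class names are illustrative rather than those used in [13]:

```java
// Illustrative Hadoop InputFormat that hands a whole file to a single mapper.
// Class names are hypothetical; [13] describes the idea but not this exact code.
import java.io.IOException;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.JobContext;
import org.apache.hadoop.mapreduce.RecordReader;
import org.apache.hadoop.mapreduce.TaskAttemptContext;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.FileSplit;

public class WholeFileInputFormat
        extends FileInputFormat<NullWritable, BytesWritable> {

    @Override
    protected boolean isSplitable(JobContext context, Path file) {
        return false;  // never split an image/video file across mappers
    }

    @Override
    public RecordReader<NullWritable, BytesWritable> createRecordReader(
            InputSplit split, TaskAttemptContext context) {
        return new WholeFileRecordReader();
    }
}

class WholeFileRecordReader
        extends RecordReader<NullWritable, BytesWritable> {

    private FileSplit split;
    private TaskAttemptContext context;
    private final BytesWritable value = new BytesWritable();
    private boolean processed = false;

    @Override
    public void initialize(InputSplit split, TaskAttemptContext context) {
        this.split = (FileSplit) split;
        this.context = context;
    }

    @Override
    public boolean nextKeyValue() throws IOException {
        if (processed) {
            return false;
        }
        // Read the entire media file into a single value.
        byte[] contents = new byte[(int) split.getLength()];
        Path file = split.getPath();
        FileSystem fs = file.getFileSystem(context.getConfiguration());
        try (FSDataInputStream in = fs.open(file)) {
            IOUtils.readFully(in, contents, 0, contents.length);
        }
        value.set(contents, 0, contents.length);
        processed = true;
        return true;
    }

    @Override
    public NullWritable getCurrentKey() { return NullWritable.get(); }

    @Override
    public BytesWritable getCurrentValue() { return value; }

    @Override
    public float getProgress() { return processed ? 1.0f : 0.0f; }

    @Override
    public void close() { }
}
```

A job would then set this format via job.setInputFormatClass(WholeFileInputFormat.class), so that each mapper receives one complete image or video file as its value.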
Processing of the multimedia content for feature extraction is done by harnessing third-party toolkits. During the Map phase, each multimedia file is passed to a corresponding compute node as a complete byte stream; each mapper receives the data and writes a new copy of the file to local disk [13].
At this point, executable programs are invoked and run natively to process the temporary file and return the extracted features. This process occurs in parallel across nodes without interference between them. Ultimately the extracted features are emitted as a key/value pair, with the name of the file as the key and the extracted features as the value. In the Reduce phase, all these key/value pairs are aggregated together to output a final image/video feature data set [13]. The multimedia was also processed for clustering (k-means clustering), generating a ‘bag of features’ that represents an image or a video as an orderless collection of keypoint features, as well as for image classification, video event detection, and near-duplicate video retrieval. Only experimental results from three kinds of applications are reported: image classification, video event detection, and near-duplicate video retrieval. However, these are common multimedia processing tasks.
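The Map/Reduce behaviour just described might be sketched as follows; the external command name "extract_features" and the class names are hypothetical stand-ins for whichever third-party toolkit is invoked:

```java
// Hypothetical sketch of the mapper/reducer pattern described in [13]: the mapper
// spills the incoming byte stream to local disk, invokes an external feature
// extraction tool, and emits (file name, features). The reducer aggregates pairs.
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileSplit;

public class FeatureExtractionMapper
        extends Mapper<NullWritable, BytesWritable, Text, Text> {

    @Override
    protected void map(NullWritable key, BytesWritable value, Context context)
            throws IOException, InterruptedException {
        // Name of the media file currently being processed.
        String fileName = ((FileSplit) context.getInputSplit()).getPath().getName();

        // Write the byte stream to a temporary file on the node's local disk.
        Path tmp = Files.createTempFile("media-", "-" + fileName);
        Files.write(tmp, value.copyBytes());

        // Invoke the third-party toolkit natively on the temporary file.
        // "extract_features" is a placeholder for whatever executable is used.
        Process p = new ProcessBuilder("extract_features", tmp.toString()).start();
        String features = new String(p.getInputStream().readAllBytes());
        try {
            p.waitFor();
        } finally {
            Files.deleteIfExists(tmp);
        }

        // Emit (file name, extracted features).
        context.write(new Text(fileName), new Text(features));
    }
}

class FeatureAggregationReducer extends Reducer<Text, Text, Text, Text> {

    @Override
    protected void reduce(Text fileName, Iterable<Text> features, Context context)
            throws IOException, InterruptedException {
        // Aggregate all feature records for a file into the final data set.
        for (Text f : features) {
            context.write(fileName, f);
        }
    }
}
```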
Extensive experimental results on several benchmark image/video data sets demonstrate that their approach can efficiently reduce the processing time for the three applications examined. Three scenarios were used to show the scalability of the approach: (1) only one node was used, (2) a three-node cluster was used, and (3) five computers were used as a five-node cluster. For the image classification application, processing time significantly decreased across each scenario, as it did for the video event detection tasks. Overall, the performance was consistent with the baselines for these tasks. Third-party toolkits were run natively to extract results for the Reduce part of the processing.
By doing so, they could leverage already existing external tools and did not have to modify algorithms/write code to take a MapReduce form. This approach may prove useful to research teams who use third-party toolkits to process multimedia data.
However, this approach still required modification of the standard MapReduce framework.
Similarly, [9] found that they had to port the eCP algorithm to fit into the MapReduce framework. The original algorithm creates clusters but uses information about disk I/O as part of the process, e.g., the cluster count is generally chosen such that the average cluster size will fit on one disk. Thus it would appear that any algorithm that takes account of disk I/O when implemented in a serial fashion will require additional work to fit into the MapReduce framework. They also highlighted a number of problems with using the MapReduce framework ‘out of the box’: multimedia files may be larger than the normal Hadoop block size. Clearly this was also a problem for [13], and it resulted in changing how files were read in their deployment.
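As one illustrative way to mitigate the block-size issue (this is not drawn from [9] or [13]), the HDFS block size can be raised on a per-file basis when large media files are written; the 256 MB figure below is an arbitrary example:

```java
// Illustrative only: writing a large media file to HDFS with a bigger block size
// (256 MB here, chosen arbitrarily) so that whole-file mappers span fewer blocks.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class MediaUploader {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        Path dst = new Path(args[0]);             // e.g. a destination video path
        long blockSize = 256L * 1024 * 1024;      // 256 MB instead of the default
        short replication = fs.getDefaultReplication(dst);
        int bufferSize = conf.getInt("io.file.buffer.size", 4096);

        // FileSystem.create lets the caller override the block size per file.
        try (FSDataOutputStream out =
                 fs.create(dst, true, bufferSize, replication, blockSize)) {
            // ... copy the local media file's bytes into 'out' ...
        }
    }
}
```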
In short, although the MapReduce framework has proven useful, it appears that additional effort is required to adapt it to multimedia data.
Thus far we have outlined recent results using the MapReduce framework. In contrast, and also potentially in addition, Apache Storm can be viewed as an in-memory parallelization framework. Mera et al. [14] empirically examine the choice of Storm versus the MapReduce framework for the task of extracting features for indexing. In order to test different workloads on both the MapReduce and the Storm frameworks, three workloads were created: (1) one with ten thousand images, (2) one with a hundred thousand images, and (3) one with all one million images. Overall it was found that, from the scalability point of view, the Storm approach scaled better in most cases [14], but the MapReduce framework scaled better than Storm when more resources and data were available.
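For readers unfamiliar with Storm, the following is a generic sketch of how a feature-extraction topology can be wired up; the spout, bolt, and helper names are placeholders and this is not the pipeline used in [14]:

```java
// Generic sketch of a Storm topology for image feature extraction. The component
// names, file paths, and the extractFeatures() stub are all placeholders.
import java.util.Map;
import org.apache.storm.Config;
import org.apache.storm.StormSubmitter;
import org.apache.storm.spout.SpoutOutputCollector;
import org.apache.storm.task.TopologyContext;
import org.apache.storm.topology.BasicOutputCollector;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.topology.TopologyBuilder;
import org.apache.storm.topology.base.BaseBasicBolt;
import org.apache.storm.topology.base.BaseRichSpout;
import org.apache.storm.tuple.Fields;
import org.apache.storm.tuple.Tuple;
import org.apache.storm.tuple.Values;

public class FeatureTopology {

    // Spout that streams image paths into the topology (fixed list for brevity).
    public static class ImagePathSpout extends BaseRichSpout {
        private SpoutOutputCollector collector;
        private final String[] paths = {"/data/img1.jpg", "/data/img2.jpg"};
        private int next = 0;

        @Override
        public void open(Map conf, TopologyContext context,
                         SpoutOutputCollector collector) {
            this.collector = collector;
        }

        @Override
        public void nextTuple() {
            if (next < paths.length) {
                collector.emit(new Values(paths[next++]));
            }
        }

        @Override
        public void declareOutputFields(OutputFieldsDeclarer declarer) {
            declarer.declare(new Fields("path"));
        }
    }

    // Bolt that turns an image path into a feature vector (stubbed out here).
    public static class FeatureExtractionBolt extends BaseBasicBolt {
        @Override
        public void execute(Tuple input, BasicOutputCollector collector) {
            String imagePath = input.getStringByField("path");
            double[] features = extractFeatures(imagePath);
            collector.emit(new Values(imagePath, features));
        }

        @Override
        public void declareOutputFields(OutputFieldsDeclarer declarer) {
            declarer.declare(new Fields("path", "features"));
        }

        // Placeholder: a real bolt would call a third-party toolkit here.
        private double[] extractFeatures(String path) {
            return new double[]{0.0};
        }
    }

    public static void main(String[] args) throws Exception {
        TopologyBuilder builder = new TopologyBuilder();
        builder.setSpout("image-spout", new ImagePathSpout(), 1);
        builder.setBolt("feature-bolt", new FeatureExtractionBolt(), 4)
               .shuffleGrouping("image-spout");

        StormSubmitter.submitTopology("feature-extraction", new Config(),
                                      builder.createTopology());
    }
}
```

Because the bolts hold their state in memory and process tuples as they stream through, there is no per-item job start-up or disk spill as in a batch MapReduce job, which is one reason Storm can behave differently at small scales.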
With the caveat that direct comparisons of the MapReduce framework with the Storm framework will require further empirical studies using larger data sets and more resources, [14] hypothesize that Storm scales better in small infrastructures, while the MapReduce framework scales better in larger infrastructures. In effect, as algorithms and techniques from computer vision, image processing, and machine learning are often the basis for multimedia data mining, general research from these areas is also relevant. We explore an important technique for machine learning and multimedia data in the next section.