38 Multimedia Search Reranking: A Literature Survey

For example, the best automatic video search achieves a mean average precision (MAP) of only about 10% in TRECVID 2008 [TRECVID 2013]. The most popular BoW model (without reranking) for object. The key issue, the relevance between the search results and the query, remains a challenging and open research problem. Research in visual search reranking has proceeded along four paradigms, distinguished by the knowledge used to extract relevant information: (1) self-reranking, which uses only the initial search results; (2) example-based reranking, which utilizes user-supplied query examples; (3) crowd reranking, which explores the crowdsourced knowledge available on the Internet (e.g., the many image and video search engines or websites, user-contributed online encyclopedias such as Wikipedia [2013];

Unlike metasearch, where the final search results are a combination of multiple lists, White et al. For example, a user is able to modify part of the search results (i.e., delete and highlight operations) in the Yamamoto et al. In summary, most existing reranking approaches first extract some prior knowledge K (i.e., the dominant patterns that are relevant to the query) and then perform reranking based on three widely accepted assumptions: (1) visual documents with dominant patterns are expected to be ranked higher than others, (2) visual documents with similar visual appearances should be ranked close to each other, and (3) documents ranked higher in the initial list are expected to be ranked relatively higher than other documents.

The challenges are that (1) the training set (i.e., the initial ranked list) is not labeled and, moreover, contains only a minority of "good" examples; (2) the modeling task is to classify the "training" set itself rather than new "testing" data; and (3) the model must deal with heterogeneous features, because even the "good" visual documents in the training set have high visual variance. Here, typicality (i.e., how visually representative a document is of a query) is a higher-level notion than relevance. 2008a], optimal pairs are identified based on low-level visual features from the initial search results.

To reveal hidden connections between video stories, video reranking is modeled as a random walk on a context graph, where nodes represent video documents connected by edges weighted by multimodal similarities (i.e., textual and visual similarities) [Hsu and Chang 2007; Wang et al. This is usually done only on a subset of the original ranked list (e.g., the top 1000 results). In the video search system developed by NUS [Chua et al. and Yan et al. [2004]) according to some rule-based classifications or machine learning techniques based on automatically extracted entities.
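The random-walk formulation mentioned above can be illustrated with a minimal sketch. It assumes a random walk with restart (in the style of personalized PageRank), in which the walker follows similarity-weighted edges with probability alpha and otherwise jumps back to a document in proportion to its initial score. The function name, the value of alpha, and the stopping rule are assumptions made for this example and are not taken from the cited work.

```python
import numpy as np

def random_walk_rerank(similarity, initial_scores, alpha=0.8, tol=1e-10, max_iter=1000):
    """Rerank documents by the stationary scores of a random walk with restart.

    similarity: (n, n) non-negative pairwise similarity matrix (assumed input).
    initial_scores: n positive scores from the initial ranked list.
    alpha: probability of following a graph edge (hypothetical default).
    Returns (order, scores), where order lists document indices best-first.
    """
    W = np.asarray(similarity, dtype=float).copy()
    np.fill_diagonal(W, 0.0)  # ignore self-similarity
    n = W.shape[0]
    row_sums = W.sum(axis=1, keepdims=True)
    safe_sums = np.where(row_sums > 0, row_sums, 1.0)
    # Row-normalize into transition probabilities; isolated nodes jump uniformly.
    P = np.where(row_sums > 0, W / safe_sums, 1.0 / n)

    v = np.asarray(initial_scores, dtype=float)
    v = v / v.sum()  # restart distribution from the initial ranking
    x = v.copy()
    for _ in range(max_iter):
        x_next = alpha * (P.T @ x) + (1.0 - alpha) * v
        converged = np.abs(x_next - x).sum() < tol
        x = x_next
        if converged:
            break
    order = np.argsort(-x, kind="stable")
    return order, x
```

In this sketch, a document that is ranked first initially but is visually dissimilar to the rest receives little score from its neighbors and tends to drop, while mutually similar documents reinforce one another.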

2009b], the object model learned from image search results from every existing search engine [Fergus et al. 2008; Wang and Forsyth 2008], and suggested queries added from the Web image collection [Zha et al. CrowdReranking extracts relevant visual patterns from the image search results of multiple search engines, and not just from the initial search result [Liu et al.

The user's intent in the visual feature space is localized by a discriminative dimensionality reduction algorithm [Tian et al. The former is limited to displaying only two fixed threads – the query result and time threads, while the latter shows users all possible matching threads for each captured snapshot.

Fig. 2. Examples of multimedia search reranking in some commercial search engines.

DATASETS AND PERFORMANCE EVALUATION

Datasets for Multimedia Search Reranking

Performance Metrics

The purpose of visual reranking is to improve multimedia search performance; thus, it makes sense to use the existing performance metrics of information retrieval and multimedia search to evaluate the performance of visual reranking. In addition, there are other performance metrics that are better suited to visual reranking. Precision and recall are the traditional measures of information retrieval [Baeza-Yates and Ribeiro-Neto 1999].

Success in the search task is measured with precision and recall, the central criteria for evaluating retrieval algorithms. Recall is the fraction of the relevant documents in the entire dataset that are retrieved, while precision is the fraction of the retrieved documents that are relevant. Another widely accepted performance measure is the average precision (AP) over a set of retrieved visual documents.
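As an illustration of these measures, the following minimal sketch computes precision and recall at a cutoff k, and non-interpolated average precision, from a ranked list of binary relevance labels. The function names, the binary-label input, and the normalization of AP by the total number of relevant documents in the dataset are assumptions made for this example; other AP conventions exist.

```python
def precision_recall_at_k(ranked_relevance, total_relevant, k):
    """Return (precision, recall) over the top-k results.

    ranked_relevance: 0/1 labels in ranked order (1 = relevant).
    total_relevant: number of relevant documents in the whole dataset.
    """
    if k <= 0 or total_relevant <= 0:
        raise ValueError("k and total_relevant must be positive")
    hits = sum(ranked_relevance[:k])
    return hits / k, hits / total_relevant


def average_precision(ranked_relevance, total_relevant):
    """Non-interpolated AP: mean of precision values at each relevant rank."""
    if total_relevant <= 0:
        raise ValueError("total_relevant must be positive")
    hits = 0
    precision_sum = 0.0
    for rank, is_relevant in enumerate(ranked_relevance, start=1):
        if is_relevant:
            hits += 1
            precision_sum += hits / rank  # precision at this relevant rank
    return precision_sum / total_relevant
```

MAP is then the mean of `average_precision` over a set of queries.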

The NDCG measures the utility, or gain, of a ranked list of documents based on their positions in the list. The gain is accumulated from the top of the ranked list to the bottom, with the gain of each result discounted at lower ranks. Because of its dynamic audiovisual nature, a multimedia search system may be evaluated more effectively in a live setting than with a static performance measurement.
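The accumulate-and-discount idea behind the NDCG can be sketched as follows. This example assumes the common form in which a result with graded relevance r at rank j contributes (2^r - 1) / log2(1 + j), and it normalizes by the score of the ideal ordering of the same labels. Both choices, and the function names, are assumptions for illustration; the exact gain and discount functions vary across papers.

```python
import math

def dcg_at_k(relevances, k):
    """Discounted cumulative gain over the top-k graded relevance labels."""
    return sum(
        (2 ** rel - 1) / math.log2(rank + 1)  # gain discounted by rank
        for rank, rel in enumerate(relevances[:k], start=1)
    )


def ndcg_at_k(relevances, k):
    """DCG normalized by the DCG of the ideal (descending) ordering."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0
```

A list that is already sorted by relevance scores 1.0, and any misordering lowers the score, with errors near the top penalized most.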

The impact of interaction mechanisms and advanced visualization is taken into account at VideOlympics: each participating search system acts as a client that independently communicates with the evaluation server. The scoring server immediately processes the incoming results, orders them by timestamp, compares them to the ground truth, and updates the list of results shown to the audience in real time. This requires that the response time of the search system (i.e., the time between when the user submits a query and when the system returns search results to the user) be as short as possible.

Due to the view-based nature of multimedia search results, a diverse presentation would enable users to quickly view and understand search results. Typicality is defined as the human perception of the degree of importance of a document in relation to a certain query or object category, which can be derived from two components: the similarity between this document and the other retrieved documents, and the dissimilarity to documents that are not in the ranked list. 2007], where is the ground truth feature score of the j-th document (labeled by human subjects), and is the j-highest feature score in the ground truth.

CONCLUSIONS AND FUTURE CHALLENGES

We need to analyze the ranking difficulty of the query as well as the performance of the initial ranked list (i.e., how noisy or satisfying the initial list is) before reranking. When a user performs a search task, she or he actually provides rich context to the search system (e.g., previous behavior in the same session, the web pages browsed if the search is triggered from browsing behavior, the user's geographical location and time, and social networks if the user remains logged in). Unless we have a common evaluation protocol, we cannot quantitatively compare the performance of the many reranking methods discussed in this article.

However, it can also be understood as one of the few attempts to solve one of the great challenges of multimedia information retrieval.