Interactive IR in Web Search Engine Environments
mixed strategy at the beginning, and selected a bottom-up strategy later for the specific fact-finding task. Simultaneously, they chose a top-down strategy for the exploratory task. The interactions among multiple variables make it difficult for researchers to uncover the relationships between tasks and searching behaviors.
Further research is needed to reveal direct relationships between tasks and search behaviors/strategies.
Usage Pattern: Patterns of Query Formulation and...
users were more sophisticated and able to specify the information needed or they used more complete sentences. Many users seemed to form a model that treated human-Web communication as human-human communication. Wang, Berry, and Yang (2003) analyzed longitudinal user queries submitted to an academic Web site during a 4-year period. They found that the patterns of user queries on the academic Web site were consistent with those on search engines such as Excite and AltaVista; for example, most of the queries were unique, short queries. The longitudinal data showed similar patterns across time, especially the problem of null output: thirty percent of queries consistently resulted in zero hits over the years. Lack of basic IR knowledge and misspellings contributed to the high number of zero hits.
Most of these studies focus on the identification of patterns of general query formulation; significantly fewer focus on patterns of query reformulation. Silverstein et al. (1999) reported that users did not modify their queries much (average 2.02 queries per session). Adding terms (7.1%), deleting terms (3.1%), and modifying operators only (1.4%) together accounted for about 12% of the query reformulations, while complete modifications of queries made up 35.2% of the query reformulations. That indicated that users had to refine or change their information need based on the results of their previous queries. Spink, Jansen, and Ozmultu (2001) examined the patterns of query reformulation by Excite users based on a data set of 1,369 queries from 191 user sessions. Users made limited use of query reformulation. The authors found that only one in five users reformulated queries, and that users who modified their queries entered an average of 6.67 queries. Users did not add or delete much in their reformulations. Changing a term was the most common query reformulation: about 35% of modified queries had the same number of terms as the preceding query. Among the remaining reformulations, roughly equal shares increased (52%) or decreased (48%) the number of terms. Spink et al.’s analysis also showed little subject change, as 73% of user sessions included one topic and 27% consisted of two topics.
These studies of query reformulations demonstrated limited query reformulations in the searching process, but they concentrated more on adding terms, deleting terms, and modifying operators.
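The term-level moves counted in these studies (adding terms, deleting terms, changing a term, repeating a query) can be approximated from a log by comparing the term sets of consecutive queries. The following is a minimal sketch under that assumption; it is not the coding scheme used by any of the cited studies, and the function names, labels, and sample session are hypothetical.

```python
def tokenize(query):
    """Lowercase a query and split it on whitespace."""
    return query.lower().split()


def classify_reformulation(previous, current):
    """Label the move from one query to the next by comparing term sets.

    The labels are illustrative, not taken from the cited studies.
    """
    prev_terms = set(tokenize(previous))
    curr_terms = set(tokenize(current))
    if not curr_terms:
        return "blank"
    if not prev_terms:
        return "new_query"
    if prev_terms == curr_terms:
        return "repeat"
    if prev_terms < curr_terms:       # every old term kept, some added
        return "add_terms"
    if curr_terms < prev_terms:       # some old terms dropped, none added
        return "delete_terms"
    if prev_terms & curr_terms:       # partial overlap: some terms swapped
        return "substitute_terms"
    return "new_query"                # no shared terms at all


# Invented session, for illustration only.
session = [
    "jaguar",
    "jaguar car",
    "jaguar car price",
    "jaguar car review",
    "python tutorial",
]
labels = [classify_reformulation(a, b) for a, b in zip(session, session[1:])]
print(labels)
```

Because the comparison is purely lexical, a sketch like this cannot tell a change of topic from a rewording of the same need; that distinction requires manual or semantic coding.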
Bruza and Dennis (1997) analyzed the logs of a prototype search engine, manually categorizing 1,040 Web queries into 11 query transformation types. They found that users frequently repeated a query that they had already submitted. Other main categories of reformulation were term substitutions, additions, and deletions, in order of frequency. The results also revealed that users did not often split compound terms; make changes to spelling, punctuation, or grammatical case; or use derivative forms of words and abbreviations. Based on these findings, Bruza and Dennis developed a hyperindex to aid users in query term additions and deletions by presenting more specific terms that often contain contextual information. Lau and Horvitz (1999) analyzed a data set of 4,960 queries on the Excite search engine.
They hand-tagged the data and partitioned queries into classes representing different search actions while focusing on a refinement strategy for query sequences. Seven
refinement classes were derived from the data: new, generalization, specialization, interruption, requests for additional results, duplicate queries, and blank queries.
Their analysis revealed that most actions are either new queries or requests for additional information. Relatively few users refined their searches by specialization, generalization, or reformulation.
Rieh and Xie (2001, 2006) examined query reformulation at a semantic level based on log data derived from Excite. They characterized the facets of query reformulation in Web searching and identified the patterns of multiple query reformulations in sequences. The data consist of 313 search sessions from two data sets randomly sampled over two time periods. Three facets of query reformulation as well as nine subfacets were derived from the data. Most query reformulations involve changes of content, which account for 80.3% of query reformulations. About 14.4% of the query reformulations are related to format alone, and only 2.8% of the modifications are associated with resource reformulation. More important, the analysis of modification sequences generated eight distinct patterns: specified, generalized, parallel, building-block, dynamic, multitasking, recurrent, and format reformulation.
Some of the identified reformulation patterns (for example, specified, parallel, generalized, recurrent, and building-block reformulation) are not necessarily new findings, as they have already been identified in previous studies (e.g., Bruza & Dennis, 1997; Lau & Horvitz, 1999). However, this study examined these patterns of query reformulation based on analysis of sequences of multiple queries rather than of just one query movement.
In addition, this study also identified new patterns of reformulation, such as dynamic and multitasking reformulation. Saracevic’s (1996, 1997) stratified model, especially his insight that there is a direct interplay between the surface and deeper levels of interaction, was adapted as a theoretical framework for the study.
The deeper-level cognitive, affective, and situational aspects are employed on the surface level to specify and modify queries. Query formulation and reformulation demonstrate the existence of this interplay. The deeper-level aspects of interaction can change frequently, which can lead to changes on the surface level, for example, changes in queries or tactics. Rieh and Xie (2006) further developed a model of Web query reformulation and suggested interactive query reformulation tools.
Studies of patterns of query formulation and query reformulation demonstrated that users take the least-effort approach in Web searching. At the same time, their query reformulation process is dynamic and varies across situations. It is imperative that the design of Web search engines support users’ query formulation and reformulation processes. Yang (2005) calls for the design of support features that can shift the cognitive burden from users to systems. One major limitation of the log analyses above is that researchers examined only log data, which provide an overview of usage patterns. Log analysis can account for what users have done, but it cannot answer what directs user actions, and why.
Patterns of Multimedia IR
Multimedia retrieval is much more complicated than text retrieval because of the multimodal context. Because multimedia searching is a complicated interaction, it is important to understand how users interact with IR systems to obtain nontextual information. Goodrum, Spink, and their associates conducted a series of studies exploring image searching and related behaviors and strategies and concluded that image searching is different from textual information searching. When Goodrum and Spink (2001) examined image queries of a major Internet search service, they found that users input few queries (average 3.36 image queries per user) with few terms (average 3.74 per query) for their image searching. Unique terms represented a large number of the image queries. However, their query analysis cannot account for the reasons behind the data. Similar results were also found in the study conducted by Spink and Jansen (2006) on multimedia searching. They also identified differences among search patterns for different types of collections. Users entered only one to two terms per image and audio query when submitting their queries to a metasearch engine. Audio searches had longer sessions with few queries per session. While the majority of users did not seek system Help, more users who looked for images and videos tried to find system Help.
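Figures such as the average number of queries per user and the average number of terms per query are simple ratios over a query log. The following is a minimal sketch of how they could be computed, assuming a non-empty log of (user, query) pairs and whitespace tokenization; the log contents, field layout, and function name are invented for illustration and are not the data or method of the cited studies.

```python
from collections import defaultdict

# Toy log invented for illustration: (user_id, query) pairs.
log = [
    ("u1", "red sports car"),
    ("u1", "red car photo"),
    ("u2", "eiffel tower night"),
    ("u2", "eiffel tower"),
    ("u2", "paris skyline picture"),
    ("u3", "cat"),
]


def summarize(log):
    """Return mean queries per user and mean terms per query.

    Assumes the log is non-empty; terms are whitespace-separated tokens.
    """
    queries_by_user = defaultdict(list)
    for user_id, query in log:
        queries_by_user[user_id].append(query.lower().split())

    n_users = len(queries_by_user)
    n_queries = sum(len(queries) for queries in queries_by_user.values())
    n_terms = sum(
        len(terms) for queries in queries_by_user.values() for terms in queries
    )
    return {
        "queries_per_user": n_queries / n_users,
        "terms_per_query": n_terms / n_queries,
    }


stats = summarize(log)
print(stats)
```

Such summaries describe what was typed, not why; as noted above, the reasons behind the numbers are outside what query analysis can recover.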
Not only did users exhibit different behaviors in searching for information in different media, but users in different regions also showed different behaviors in multimedia Web searching. Ozmutlu, Spink, and Ozmutlu (2002) compared multimedia Web searching on one U.S. (Excite) and one European (FAST) search engine. They found that while users of Excite submitted longer and more complicated queries than FAST users, FAST users spent more time on queries and sessions (except for audio queries) than Excite users. Goodrum, Bejune, and Siochi (2003) further identified image search patterns based on state transition analysis. Within the 198 patterns identified, there were two main characteristics of patterns of transitions. First, long strings with lengthier search times happened when users searched for images via text-only search tools that generated Web site surrogates instead of image surrogates.
Second, users inspected more image surrogates than Web site surrogates because relevance feedback needs to be judged based on the images themselves. The results of this study indicated that users did employ different types of tactics and search strategies in their image retrieval process.
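State transition analysis of this kind starts from counting how often one observed state follows another within a session. The following is a minimal sketch of that counting step, assuming each session has already been coded as an ordered list of state labels; the state names, sample sessions, and function names are hypothetical and do not reproduce the coding scheme of Goodrum, Bejune, and Siochi (2003).

```python
from collections import Counter


def count_transitions(sessions):
    """Count (from_state, to_state) pairs over all sessions."""
    counts = Counter()
    for states in sessions:
        # Pair each state with its successor within the same session.
        counts.update(zip(states, states[1:]))
    return counts


def transition_probabilities(counts):
    """Convert raw counts to P(to_state | from_state)."""
    totals = Counter()
    for (src, _dst), n in counts.items():
        totals[src] += n
    return {(src, dst): n / totals[src] for (src, dst), n in counts.items()}


# Invented coded sessions, for illustration only.
sessions = [
    ["query", "view_image_surrogate", "view_image_surrogate", "open_image"],
    ["query", "view_site_surrogate", "query", "view_image_surrogate"],
]

counts = count_transitions(sessions)
probs = transition_probabilities(counts)
for (src, dst), p in sorted(probs.items()):
    print(f"{src} -> {dst}: {p:.2f}")
```

Longer patterns can be studied the same way by counting sequences of three or more consecutive states instead of pairs.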
It seems that research in multimedia retrieval, in particular on how users interact with IR systems and multimedia information in their searching process, is still in the exploratory stage. More research is needed to learn not only how users search but also why they search as they do. In other words, further research needs to extend the analysis of user queries to diary analysis or think-aloud protocol analysis of the search process.