Potential Research Avenues Blending
encountered? A related question is how the process of decision making proceeds. Is it an incremental process, whereby evidence from data queries accumulates slowly, or is it more like an insight-based process in which an answer or solution emerges suddenly (Metcalf, 1986)? Are monotonic reasoning processes at work when analysts discover new insights, or do truth-maintenance systems slow down the emergence of new insights (Reichgelt, 1991)?
Another question pertains to the type of information that is most influential in a decision.
For example, there is a body of research that addresses the situation when a global look at a data set provides a different perspective than a more detailed view of the data. This is known as Simpson’s Paradox. Given that analysts make decisions regarding when to drill deeper, in what situations or in what kinds of task settings does this bias exist (Curley & Browne, 2001; Lin, 2002; Spellman, Price & Logan, 2001)? A related question asks whether there are aspects of an analyst’s decision process that are amenable to automation. Another question related to decision processes goes beyond the individual decision maker to settings where multiple analysts are working together on a single problem. A number of previous research streams suggest that group communication processes can either enhance or inhibit the KDD process (e.g., Choi & Kim, 1999; Lam & Schaubroeck, 2000; see also related work by Weldon & Bellinger, 1997).
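To make the Simpson’s Paradox situation mentioned above concrete, the following minimal Python sketch shows how an aggregated view and a drilled-down view of the same data can point in opposite directions. The counts, the campaign labels ("A", "B"), the segment labels ("new", "returning"), and all function names are hypothetical and chosen only for illustration; they do not come from any of the studies cited here.

```python
from collections import defaultdict

# Hypothetical counts, invented for illustration only:
# (campaign, customer segment) -> (conversions, customers contacted)
COUNTS = {
    ("A", "new"): (81, 87),
    ("A", "returning"): (192, 263),
    ("B", "new"): (234, 270),
    ("B", "returning"): (55, 80),
}


def segment_rates(counts):
    """Conversion rate for each (campaign, segment) cell: the detailed view."""
    return {key: s / n for key, (s, n) in counts.items()}


def overall_rates(counts):
    """Conversion rate per campaign after pooling all segments: the global view."""
    totals = defaultdict(lambda: [0, 0])
    for (campaign, _segment), (s, n) in counts.items():
        totals[campaign][0] += s
        totals[campaign][1] += n
    return {campaign: s / n for campaign, (s, n) in totals.items()}


def shows_reversal(counts, first="A", second="B"):
    """True if `first` beats `second` in every segment but loses overall."""
    by_segment = segment_rates(counts)
    segments = {segment for (_campaign, segment) in counts}
    wins_every_segment = all(
        by_segment[(first, seg)] > by_segment[(second, seg)] for seg in segments
    )
    overall = overall_rates(counts)
    return wins_every_segment and overall[first] < overall[second]


if __name__ == "__main__":
    for (campaign, segment), r in sorted(segment_rates(COUNTS).items()):
        print(f"campaign {campaign}, {segment:9s}: {r:.3f}")
    for campaign, r in sorted(overall_rates(COUNTS).items()):
        print(f"campaign {campaign}, overall  : {r:.3f}")
    print("reversal present:", shows_reversal(COUNTS))
```

In this invented data set, campaign A has the higher rate within each segment, yet campaign B has the higher pooled rate, because the two campaigns were applied to the segments in very different proportions. An analyst who stops at the global view and one who drills down would therefore reach opposite conclusions, which is the kind of decision point the questions above are concerned with.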
Theme 3: KDD Tool Use/Selection
Questions related to this theme revolve around the selection of analysis and visualization tools used to conduct the data query. What tools (analysis and visualization) do the best job of assisting the analyst (and other information foragers) in defining the usefulness of a data patch? Furthermore, are analysts predisposed toward KDD tools that yield useful information with minimal effort and energy expenditure? Do analysts naturally gravitate toward tools that facilitate optimal information foraging? Once within a data patch, how does the analyst make decisions about selecting appropriate visualization and presentation tools to convey the information contained within the patch? Also of interest is whether analysts modify their information foraging strategies and behaviors in response to KDD tools.
Theme 4: Individual Differences
Research questions within this theme focus on how analysts may differ in ways relevant to the other three themes. Of course, an obvious issue here is that of the wisdom/expertise of the data analyst. Do expert and novice analysts differ in their attention and decision processes as well as tool selection? Can appropriately structured tools reduce gaps between novice and expert in the quality of decisions they make in the KDD process?
Larger questions to be explored are the characteristics of wise KDD analysts and whether analyst wisdom is due solely to greater experience and familiarity with KDD processes and environments. If experience is a primary predictor of expert/novice differences, how does greater experience affect analyst interactions with data sets and other KDD processes?
A key question for inquiring organizations focuses on the predisposition of KDD analysts to use the different inquiring styles described by Churchman (1971). Kienholz (1999) describes the Inquiry Mode Questionnaire (InQ) that can be used to classify analysts as Synthesists (Hegelian inquirers), Idealists (Kantian inquirers), Analysts (Leibnizian inquirers), Realists (Lockean inquirers), or Pragmatists (Singerian inquirers).
The use of this questionnaire to assess expert/novice differences, information foraging strategies, and preferences for KDD tools could yield data with significant implications for inquiring organizations.
Undoubtedly, the above list of themes and questions ultimately falls short by not including other important topics. For example, many questions involve the interaction of the themes stated above. Analyst expertise, for instance, will most likely interact with attentional, decisional, and tool selection questions. In addition, there are issues that fail to fit cleanly into any of the above themes, such as how variables that are external to the analysis focus (e.g., time pressure, stress, task ambiguity) affect analyst behavior (Driskell & Salas, 1991; Inzana et al., 1996). However, the themes presented above constitute a new and arguably better path for inquiring organizations, one that does not overlook the notion that data mining and other KDD processes are human-oriented endeavors.
Theme 5: Ethical and Privacy Issues
As personal information about individuals becomes increasingly available to industry, issues of privacy come to the forefront in the minds of consumers, marketers, and researchers. Although there is some disagreement among these groups about the degree of threat associated with companies building data repositories that are replete with personal information, there is little dispute that the need for privacy protection is urgent (Estivill-Castro, Brankovic & Dowe, 1999). Even if one believes that the gathering institution owns the information in a consumer database, there is frequently a substantial level of mistrust on the part of customers about what companies will do with the data.
By recentering the focus of KDD research to the analyst and the analyst’s interactions with data sets, researchers can begin to explore how analysts manage ethical concerns pertaining to the data with which they work. An analyst may not be in a position to decide what kinds of information about customers should be captured, stored, or internally disseminated within the analyst’s employing organization. The analyst may not have any say in the KDD tasks that are assigned. Indeed, the analyst is likely to be given an assignment and implicitly (or explicitly) charged with the task of “digging as deep” as possible to find a desired answer or to unearth new insights. Despite these constraints, expecting analysts to be governed by ethical considerations as they engage in KDD activities is consistent with Singerian inquiry that Churchman (1971) favors above all others.
A host of ethical and empirical questions are highlighted when attention is turned to privacy issues in KDD (e.g., Brankovic & Estivill-Castro, 1999; O’Leary, 1995; Wahlstrom & Roddick, 2000). Much of this work supports the notion that analysts should be aware of the potentially sensitive nature of the data they are analyzing, while also realizing that a primary constraint on their work is not to violate privacy. Understanding how analysts manage the delicate balance between looking for answers and protecting privacy is another important area of research for analyst-focused KDD researchers.