To keep pace with the ever-growing importance of audiovisual media in reporting the world of politics to average citizens and political elites, audiovisual content analysis ought to be booming. Unfortunately, it is not. As is typical, it played a very minor part in our sample of content analysis studies reported in 2000 and 2001 (e.g., Dixon & Linz, 2000). Many political communication researchers acknowledge the importance of illustrating their studies of political content, such as election campaigns, with actual audiovisual examples. In fact, the journal Political Communication featured a special electronic issue titled “Multi-Media Politics” on a CD in 1998. Most of the issue was devoted to election campaign studies that presented and analyzed television clips (Boynton & Jamieson, 1998). Similarly, textbooks, including many American government texts, now include videos and CDs. An outstanding example of this genre is Video Rhetorics: Televised Advertising in American Politics, published in 1997 by John Nelson and G. Robert Boynton. The authors use a video to convey information that is beyond the reporting capacity of the printed text.
So what keeps political communication scholars from using audiovisual content analysis for research rather than treating television messages as if they were akin to radio, ignoring the visual content? The answers lie in the lingering myths that audiovisuals are too difficult to code, if they can be coded at all, because they do not constitute an analyzable language—an orderly way of using symbols to communicate specific thoughts and feelings to others (Hall, 1980).
The fact that the combination of words and pictures produces messages that differ from the meanings conveyed by the verbal or visual elements alone further complicates the analysis. Words commonly affect the meaning of pictures, and pictures alter the meanings of words. The picture of a soldier sprawled on the grass carries a different meaning when the words say that he is sleeping after a hard training exercise than when they say that he is dead in the wake of a sniper’s fire. A comment that a politician looked and sounded relaxed may keep the audience from noticing the nervous tapping of his fingers and the slight tremor in his voice. To complicate matters further, verbal syntax consists of discrete units like words and sentences that carry distinct denotations. In contrast, picture syntax is fluid, making it more difficult to determine the unit that conveys meaning. One picture can be the equivalent of many words and sentences (Salvaggio, 1980).
None of these obstacles to audiovisual content analysis refutes the fact that audiovisuals are a codable language, and both sophisticated and simple programs for the task abound (Hart, 2000; Van Leeuwen & Jewitt, 2001; Wang, Liu, & Huang, 2000). Moreover, coding audiovisuals is not exceptionally costly, as doubters often allege.
Contrary to the claim that audiovisuals do not carry shared meaning, news producers know that audiovisual language has an array of widely understood cues that can be coded. Without such shared understandings, it would be impossible to convey similar meanings to mass audiences. When television broadcasts use a limited vocabulary of familiar audiovisual clichés, audiences can grasp the predictable meanings quickly. Speed is essential because television news segments are short and fleeting, leaving no time for reflection, careful search for alternative meanings, or close scrutiny of pictures for hidden clues. When audiences are asked about the audiovisual cues they use to judge political figures and situations, they mention an array of similar cues (Graber, 2001). Likewise, print reports covering televised events, such as presidential inaugurations, show that the writers derived shared meanings from the audiovisuals.
Audiovisual cues used in television stories are based on producers’ understanding of average Americans’ past visual experiences and viewing tastes. To show the destruction of a bridge in war-torn Afghanistan, for example, a producer might show the actual bridge (an icon) or vehicles piling up where the road breaks off (an index). Or the producer might depict a road sign that is a symbol for an impassable highway. All of these pictorials are well understood by American audiences; they present no unusual coding problems.
Content analysis of television news shows that audiovisual language is stereotypical not only in terms of the types of images it uses but also in the overall framing. For example, “. . . the ‘game’ frame—reporting politics primarily in strategic terms—is predominant in mainstream news reporting of politics” (Lawrence, 2000, p. 93). Audiences have learned to associate this structure with the idea that politics is a constant, more or less exciting battle between sworn enemies. The game frame and other frames can be maintained easily for a variety of political situations because they can be assembled from disparate image and audio bites. The fact that audiovisual language is stereotypical does not mean that the framing and pictures are identical. Drought damage scenes from American farms, or sites in China or Argentina, undoubtedly resemble each other. Still, each set of pictures provides well-understood information about similar happenings in diverse locations.
Coding politically relevant television images can be an overwhelming task when one contemplates recording the large number of constantly changing pictures in an average broadcast, with thousands of potentially significant details recorded from different angles and distances and embedded in diverse contexts. How can analysts master such complexity in ways that make audiovisual coding a practical and affordable research procedure? One answer is to use the many available audiovisual coding programs. For example, televised legislative sessions can be analyzed with a program developed for use on video transcripts from the German parliament (Köhler, Biatov, Larson, Eckes, & Eickler, 2001). The fact that parliamentary proceedings involve a limited universe of verbal and visual discourse makes it possible to develop protocols for recognizing this particular audiovisual syntax and semantics.
Judging from the political communication literature, scholars in the subfield have largely shunned these automated methods in favor of manual content analysis. Their approaches have ranged from merely identifying, counting, and recording every visible object in a scene, such as the number of people and objects, to detailed descriptions of people and objects and their apparent activities (Emmison & Smith, 2000). Researchers have used these descriptions to make and code inferences about the meanings conveyed by the images. A scene showing a police officer surrounded by angry laborers involved in a strike may portend violence, whereas a police officer coaching a sporting event for gang members may portend a more peaceful neighborhood. Similarly, some scholars have concentrated on the psychological implications of production and editing techniques. They have recorded close-ups and middle-range and long-range shots as well as camera angles and zoom shots, fade-outs and fade-ins, and various types of cuts, surmising that such variations have conveyed respect or disrespect for political leaders and have left audiences more or less involved in the scenes that they have witnessed (Kepplinger, 1991; Lang, Geiger, Strickwerda, & Sumner, 1993).
Several scholars have skipped coding of individual verbal and visual images and tried instead to discern the general impression created by images. For example, they have searched for audiovisual content that conveyed rhetorical visions, political fantasies, or ritual dramas (Hallin & Gitlin, 1993; Hershey, 1993; Nimmo & Combs, 1989).
My own approach, gestalt coding, is a middle ground between descriptive coding of sentences and pictures and recording of broad general impressions (Graber, 2001). It is based on what neuroscientists and psychologists have discovered about how humans extract meaning from audiovisual texts. Scientists have demonstrated that average people do not absorb the full gamut of information that reaches them. Rather, people selectively extract salient kernels of information from the total flow to which they are exposed. Gestalt coding, therefore, records overall audiovisual impressions and their meanings, rather than analyzing discrete visual and verbal details. In the process, it captures what audiences actually absorb, leaving out what they ignore. It is thus oriented toward the meanings that telecasts convey to average audience members, rather than toward detecting every kernel of information that may be encapsulated in an audiovisual message, as most automated content analysis programs do.
Gestalt coding is not unduly impressionistic because it is grounded in tested knowledge about information processing. It is also quite similar to what is routinely done in verbal coding schemes, which rarely call for coding each word. Rather, they call for extracting common meanings from whole sentences and groups of sentences, considering the overall context of particular situations. Similarly, television viewers do not see and hear and interpret each word or picture separately. Rather, like readers of printed symbols or listeners to the spoken word, they first discern the overall thrust of the message, its general gestalt. Then they select a limited number of suitable details that capture the particular thrust of the message in light of their stored knowledge. As Salvaggio (1980) put it, “. . . the spectator’s attention in any given scene is not necessarily focused on what is visually signified, but is concerned with placing the signified into a gestalt—the entire area suggested by what is seen” (p. 41).
Detecting the gestalt for coding usually depends heavily on a television story’s verbal lead-in or on pictures clearly identifying a specific person, location, or situation. These cues tell coders the overall meaning and significance of the message elements that they are about to witness. For instance, a group of children receiving food packets could be verbally identified as participants in a school lunch program or orphans receiving food in a refugee camp. Coders, just like ordinary viewers, can then structure the audiovisuals into a meaningful story within the gestalt provided for them by the messages in question and the flow of prior messages.