Study
6. Film Analysis and Statistics: A Field Report
Charles O’Brien
Hidalgo, Santiago (ed.), Technology and Film Scholarship. Experience, Study, Theory. Amsterdam: Amsterdam University Press, 2018
doi: 10.5117/9789089647542/ch06
Abstract
This chapter examines the use of statistics in film analysis in light of possibilities and challenges stemming from online digital tools such as the Cinemetrics interface. First, digital resources for statistical analysis are briefly situated in the history of quantitative film analysis. Second, various points regarding the statistical study of film style are illustrated through examples drawn from research conducted by myself and others into the transition from silent to sound cinema, with a focus on issues raised in a recent debate on the fundamental question of how to measure the average shot length in films. Concluding the chapter is a discussion of the role of dialogue in sound films versus silent, which illustrates the advantages of employing different methods of computing average shot length comparatively.
Keywords: cinemetrics, digital tools, statistical analysis, sound cinema, shot length
The Digital Moment in the Analysis of Film Style
The quantitative analysis of film style has a history that long precedes the availability of the digital resources now being employed for film study. The logistics of film production invite numerical assessment, and as a consequence, quantitative analysis, in one form or another, has had a long history in filmmaking practice. Early examples include the efforts of Charles Pathé, the industrialist behind the Pathé-Frères media empire, who ordered the compilation of detailed figures on the lengths of the company’s films.1 The need to tabulate film length became routine by 1910, when laws such as the Payne-Aldrich Tariff, passed in the United States in 1909, charged duties for imported films based on the length of the footage rather than the weight of the celluloid.2 At around the same time, as films became longer and came to exhibit a wider variety of techniques of editing and shot composition, more fine-grained statistical analyses were practiced. Filmmakers began counting shots (ordinarily referred to as ‘scenes’, ‘views’, or ‘tableaux’), frames, and words in intertitles, and computing the averages, as did critics. Among the latter was the ‘Rev. Dr. Stockton’ discussed in a 1912 issue of Moving Pictures World, whose tools for comparing the average lengths of shots in a sample of 25 contemporary films included “a stop watch, a pocket counting machine, and electric flash lamp and a note book.”3 Such rudimentary forms of average shot length computation have been practiced ever since.
A major leap forward occurred in the early 1980s when computer-based methods for average shot length computation were introduced into film scholarship by Barry Salt, whose path-breaking book Film Style and Technology appeared in 1983.4 The book drew upon Salt’s statistical examination of shot lengths and framings for thousands of films, which encompassed a broad swath of American and European cinema from the late nineteenth century up through the time of the book’s publication. Salt’s book came at a time when film study was undergoing rapid expansion as an academic discipline in North America and Europe, and it remains an indispensable work of reference for investigations into the history of film style.5 But with respect to the use of statistics as a film-analytical method, relatively few film scholars followed Salt’s example, perhaps because film academics are typically trained in humanities disciplines where statistical research is rarely practiced.
In recent years, initiatives connected with the ‘digital humanities’ have made statistical inquiry somewhat more prominent in film scholarship. The essential development was the inauguration in 2006 of the cinemetrics website, designed by Gunars Civjans and Yuri Tsivian, which provides an easy-to-use interface for recording data, a database where the data is processed and displayed visually, and a forum where research results and theoretical questions are discussed and debated. Further developments include the Shot Logger technology designed by Jeremy Butler and the Lignes de temps system linked to the Pompidou Centre in Paris.6 The new tools and resources are enabling the generation of a wide variety of statistical calculations, including many that had been too difficult, time-consuming, and mathematically esoteric for film critics in the past to have produced.
The public accessibility of the online data is allowing for the cross-checking and confirmation of results, as well as the fashioning of statistics-based arguments by critics other than those who had counted the shots. The ready availability to anyone with internet access of a vast and ever-growing body of research data has brought about a notable increase in the number of scholars involved in the statistical study of film style. Moreover, in addition to the film scholars who come to cinemetrics without statistical backgrounds, participants now also include trained statisticians. The expanded participation has enabled unprecedented discussion and debate concerning methodological issues. How exactly might the new tools and ever-expanding database of results be employed in film study? What are the limitations and possibilities of statistical methods for film studies research?
Strengths and Limitations
It must be acknowledged at the outset that the utility of statistics for film analysis is extremely limited in certain respects. Statistical computations such as average shot lengths reveal aspects of a film’s formal structure at a very high level of abstraction. In film analysis as in other contexts, the meaning of a statistical measurement is never self-evident but requires contextualization and interpretation, so that the statistical study of film style inevitably extends beyond the realm of the statistics per se and into questions of film-historical context. The methodological upshot for cinema study is that statistical techniques can at most supplement rather than replace conventional research practices. Statistical methods amount to an extra set of tools to be used in conjunction with other critical methods, including old-school practices of mindful film viewing and the examination of non-filmic documentation.
Insofar as non-statistical methods remain integral to the research project, statistical techniques, in my understanding, can offer to film-historical study two distinct benefits. The first benefit is the increased precision that statistics bring to the study of film style. Virtually any film-style parameter can be related to editing, and editing lends itself to quantitative analysis, with the result that critics, for over one hundred years at least, have found irresistible the endeavor of counting shots and computing averages. Questions of norms are always in play in a stylistic analysis, and in this context statistics are invaluable. As music theorist Leonard Meyer observed, the statistical analysis of artistic style can be seen as inescapable because “all classification and all generalization about stylistic traits are based on some element of relative frequency.”7 Meyer was referring to the study of music, but the specification of the aesthetic norms operative in particular times and places is just as necessary for cinema study. Even if the principal object of analysis is a single film, the analysis will involve some consideration of how that film conforms to the norms manifest in a larger body of work. Statistics allow for exceptional detail in the identification of film-stylistic norms.
Statistics can also offer to film analysis a more fundamental benefit. Numerical data enable visual displays such as graphs, which have a way of casting new light on the object of study, bringing out important aspects of films, or bodies of films, that ordinarily go unnoticed. In drawing attention to otherwise invisible style patterns, statistical findings can stimulate the formation of new research questions.8 Here the major limitation of statistical methods for the study of film style (their indifference to the viewer’s experience of a film) can become a powerful advantage. In revealing aspects of a film’s construction that escape the viewer’s awareness, statistical findings can alter one’s sense of how a film is constructed, and this alone can provide a powerful stimulus for rethinking a film or body of films.
My Research as an Example
As a modest example of this sort of revelatory effect, I offer my experience conducting research into musical films of the early 1930s, which informs a book whose manuscript I am currently finishing. The project began with the compilation of data on three shot types I had devised for feature films made during 1927-1934: shots with synchronous speech, hereafter designated as ‘dialogue shots’; shots featuring singing performances, or ‘singing shots’; and, finally, ‘action shots’, which range from panoramic landscapes to people walking through doorways, trains arriving at stations, and inserts of clocks and signage, and include essentially any shot not involving synchronous vocals.9
I settled on the three types after having counted shots for dozens of sound films and experimenting with different shot labels and criteria of category membership. The experiments led to the choice of the categories of action, dialogue, and singing as the principal metric for three reasons. First, these categories were consonant with the editing practices I was seeing in cinema in the late 1920s/early 1930s, when shots with synchronous vocals, in almost all cases, run relatively long. The decision to distinguish between action shots and vocal shots responded to the particularities of the films I am investigating.
Second, these shot categories imply particular production methods that can be expected to have entered the awareness of the filmmakers, which allows for the possibility that changes in style can be correlated with changes in film technology and technique. Most fundamentally, while action shots in sound movies were often shot ‘wild’, as in the silent period, with the sound added in during the post-production phase, dialogue and singing shots typically involved ‘direct sound’, or the concurrent recording of the voices and the image of the actors, which entailed particular aesthetic, technical, and economic constraints. The latter were most formidable for the singing shots, which in the early 1930s ordinarily involved the presence of an orchestra on the set. In sum, the categories of action, dialogue, and singing gave my analysis a forensic dimension, allowing not only for the description of film style but for the drawing of inferences regarding the style’s causes.
The third basis for choosing the three shot categories is that they turned out to be relatively easy to distinguish, both for me and for other scholars. To ascertain this, I hired research assistants at various points to retrace the steps of my analysis by tabulating shots for the same films using the same categories. I wanted to ensure that other scholars following the same procedures could duplicate my results, and the shot categories I ended up choosing seemed to allow for this. Some of the research results are presented below in Figure 6.1, which displays the data for over 350 sound films of 1927-1934, all of whose shots were classified as one of three types: action, dialogue, and singing. (The intertitles that occasionally surface in early sound movies were excluded from the analysis.)
The findings presented in Figure 6.1 allude to the reality behind the frequent complaint that, with the introduction of recorded sound, the editing of motion pictures had become subordinated to the rhythm of the spoken dialogue. Dialogue shots in conversion-era films, Figure 6.1 shows, run, on average, more than double the length of the action shots. No wonder that critics at the time identified the handling of dialogue as the essential aesthetic problem with the talkies.
Fig. 6.1: ASLs for three shot types, based on an analysis of 355 sound films of 1928-1934.10
Less expected is the situation for the singing shots. Of the three types, the singing shots are the most complex. They exhibit the greatest range and variety in length, and their measurements imply the highest margins of error.11 Singing shots show someone singing, and occur in song-and-dance numbers proper as well as in ordinary scenes, as when an actor briefly whistles a song melody. If the melody is recognizable enough to prompt the viewer’s memory of the tune – a major commercial consideration circa 1930 – then it was counted as a singing shot. I had separated out the singing shots as a distinct category because recorded songs were prevalent in cinema circa 1930, and I was interested in how they functioned. My assumption was that the singing shots, like the dialogue shots, last longer, on average, than do the action shots, and the statistics ended up confirming this hypothesis. But the statistics also pointed to something I had not expected to find, which is that shots of singers consistently endure even longer than do shots of speakers. As Figure 6.1 indicates, while the dialogue shots, on average, last roughly twice the length of the action shots, the singing shots last nearly triple that duration.
The excessive length for the singing shots confirmed my sense that this shot category merited special attention. It also raised a new research question, namely: how is the extreme length of the shots of singers in films of the early 1930s to be explained? Answering the question necessarily involved an investigation into questions of film-historical context, and hence the use of additional, non-statistical critical methods. The statistical results merely drew attention to the singing-shot phenomenon. Nonetheless, this flagging of the phenomenon was crucial because it ended up stimulating a novel avenue of investigation.
Average Shot Length
The findings regarding the extra-long singing shots entail one of the most basic of film-statistical practices: the computation of the average shot lengths for numerous individual films that then become factored into an analysis of the norms pertinent to a large corpus of films. Pertinent to the project of specifying norms, whether for a single film or for a large body of films, is a recent debate on how average shot lengths might be calculated. If one takes the trouble to count a film’s shots, then the first thing one wants to know is the average length. The customary practice has been to compute the arithmetic mean, commonly known as the Average Shot Length, or ASL. With respect to my sample of shot types, the ASL is displayed in Figure 6.1. The ASL is easy to produce: all that is required is the running time of the film and the total number of shots. As was the case one hundred years ago, a clock, pen, and paper are the only tools needed. Other measures of central tendency, such as the median or MSL (Median Shot Length), which specifies the middle value in the data set, can require more effort to compute since separate measurements are required for each shot.12 Today, however, the need for extra labor has been diminished by the cinemetrics interface, which requires that the researcher spend no more time producing the MSL than the ASL. Both measurements, along with many others, are automatically calculated and displayed in the website’s database when an entry is submitted.
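The difference in effort between the two measures can be made concrete with a minimal sketch. The shot lengths below are hypothetical values chosen for illustration, not measurements from any film discussed in this chapter, and the function names are likewise my own. The ASL needs only two totals, whereas the MSL needs the full list of individual shot lengths.

```python
from statistics import mean, median


def asl_from_totals(running_time_seconds, shot_count):
    """ASL computed the 'clock, pen, and paper' way: running time / shot count."""
    return running_time_seconds / shot_count


def asl_and_msl(shot_lengths):
    """Return (ASL, MSL) from a list of individual shot lengths in seconds."""
    return mean(shot_lengths), median(shot_lengths)


# Hypothetical shot lengths in seconds (illustration only, not real film data).
shot_lengths = [3.0, 4.5, 2.0, 6.5, 5.0, 48.0, 3.5, 7.5]

# The ASL can be obtained from the totals alone ...
print(asl_from_totals(sum(shot_lengths), len(shot_lengths)))  # 10.0

# ... while the MSL requires every individual measurement.
print(asl_and_msl(shot_lengths))  # (10.0, 4.75)
```

In this invented sample a single 48-second shot is enough to pull the mean well above the median, which is the kind of disparity at issue in the discussion that follows.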
The ready availability of additional measurements, along with the statistical expertise now evident in the community of researchers, has made the difference between the ASL and MSL a focus of interest and contention. The MSL is said to be a superior measure for shot length because it is less affected than the ASL by the presence in a film of extra-long shots. Virtually any film includes some shots whose duration far exceeds the mean for the film as a whole. A feature film with an ASL of ten seconds, say, will very likely include some shots that run several minutes, with the consequence that the distribution of shots, when graphed, will exhibit a lop-sided pattern, with most of the results clustering on one side of the chart. Typical is the strong positive skew in Figure 6.2’s shot-length histogram for The Broadway Melody (dir. Harry Beaumont), the great MGM show musical of 1929:
Fig. 6.2: Shot-length distribution for The Broadway Melody
In Figure 6.2, the 477 shots comprising The Broadway Melody are gathered into separate bins, one for each five-second interval, so that shots running between zero and five seconds go into one bin, shots lasting between five and ten seconds go into the next, and so on. The bins on the x-axis are arranged so that they extend from the film’s briefest shots on the left to the extremely long ones on the right. Like other films, The Broadway Melody includes many short shots along with a small number of extra-long ones. The first column reveals that 171 of the film’s shots (roughly one-third of the total) last from zero to five seconds in duration; the second column lists 135 shots running from five to ten seconds; and so on. Most of the shots in The Broadway Melody thus run less than the 12.4-second mean for the film overall.
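The binning procedure behind a histogram of this kind can be sketched as follows. This is an illustration under stated assumptions, not the method used to produce Figure 6.2: the function name and the sample values are hypothetical, and the convention that a shot of exactly five seconds falls into the second bin is my own choice, since the boundary rule is not specified above.

```python
from collections import Counter


def bin_shot_lengths(shot_lengths, bin_width=5.0):
    """Count shots per fixed-width bin.

    Returns a dict mapping (lower_bound, upper_bound) to the number of shots
    whose length falls in that interval. Assumed convention: the lower bound
    is inclusive and the upper bound exclusive, so a 5.0-second shot is
    counted in the 5-10 bin.
    """
    counts = Counter(int(length // bin_width) for length in shot_lengths)
    return {
        (index * bin_width, (index + 1) * bin_width): counts[index]
        for index in sorted(counts)
    }


# Hypothetical shot lengths in seconds (illustration only, not real film data).
sample = [1.0, 4.9, 5.0, 7.2, 12.0, 61.0]
for (low, high), count in bin_shot_lengths(sample).items():
    print(f"{low:>5.1f}-{high:<5.1f} {count}")
```

Applied to the figures reported above, the first two bins together hold 171 + 135 = 306 of the 477 shots, or roughly 64 percent, so nearly two-thirds of the film's shots run under ten seconds and therefore under the 12.4-second mean.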
The curve superimposed over the chart represents the so-called normal distribution, which refers to cases where the data is distributed around a central value with no bias either to the left or right. The normal distribution surfaces in a wide variety of types of data, ranging from the heights of people to the sizes of machine-made artifacts, blood pressure measurements, and scores on examinations. But it is not ordinarily found in shot lengths for films, whose distributions are typically skewed in the manner of Figure 6.2, with most of a film’s shot values falling below the ASL. An explicit case against the utility of the ASL has been made by Nick Redfern, who has proposed as an alternative measure the median shot length or MSL, which “should be used in cinemetric analyses in place of the mean [i.e. the ASL].”13 The debate has evolved over several years, and has yielded a string of articles and online commentaries, some of which are available on the cinemetrics website.14 An important intervention came from Mike Baxter, who, in a detailed article, refuted some of the specific claims made against the ASL while also raising new questions regarding the validity of particular uses of statistics for film analysis.15 The arguments involve technical questions that go beyond my knowledge of statistical theory; and in any case, they are too complex to summarize in this short chapter. Nonetheless, certain of the key points can be illustrated with examples drawn from my research into similarities and differences between sound cinema and silent.
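The claim that the median is less affected than the mean by extra-long shots can be checked on a toy case. The sketch below uses invented shot lengths and a hypothetical helper function; it takes no position in the debate and shows only the arithmetic of adding one long take to an otherwise symmetrical set of shots.

```python
from statistics import mean, median


def summarize(shot_lengths):
    """Return (ASL, MSL, share of shots shorter than the ASL)."""
    asl = mean(shot_lengths)
    msl = median(shot_lengths)
    share_below_asl = sum(1 for s in shot_lengths if s < asl) / len(shot_lengths)
    return asl, msl, share_below_asl


# Hypothetical shot lengths in seconds (illustration only, not real film data).
base = [4.0, 5.0, 6.0, 7.0, 8.0]
with_long_take = base + [120.0]  # the same shots plus one two-minute take

print(summarize(base))            # ASL 6.0, MSL 6.0, 40% of shots below the ASL
print(summarize(with_long_take))  # ASL 25.0, MSL 6.5, about 83% below the ASL
```

One added long take more than quadruples the mean while moving the median by half a second, and it leaves five of the six shots below the ASL, the same lop-sided pattern described for Figure 6.2.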
The Long Take as Outlier
The tendency for the ASL for a feature film to be higher than the length of a randomly selected shot amounts to a universal phenomenon in my research. It holds for each of the 500-plus feature films that I have measured, all of whose ASLs are higher than the MSLs. A key factor behind the disparity, the data presented in Figure 6.1 suggests, is the frequent occurrence in early