
AFTERWORD

Big Data in Communication Research: Its Contents and Discontents

Malcolm R. Parks

Department of Communication, University of Washington, Seattle, WA, 98195, USA

doi:10.1111/jcom.12090

I had two goals in mind when I decided to dedicate a special issue of the Journal of Communication to “Big Data.” One was to provide an outlet for the growing number of excellent Big Data studies on mass communication, digital technologies, political communication, health communication, and many other areas of interest to our discipline. My focus was on empirical papers that made substantive contributions using new methods, rather than on explanations, endorsements, or critiques of the Big Data movement. The goal was to showcase the state of the art in recent research in computational communication science.

My second goal was to provide a benchmark for research innovation. Big Data research is still in its infancy in communication. Relatively little of the work done in this early stage will stand the test of time, but all of it will likely be critical in the ongoing process of conceptual and methodological advance. The articles featured in this issue represent the best of what is currently being done. Their strengths will guide future work, but so, too, will their limitations.

What is Big Data?

There is no one definition of Big Data. Thought about in simple terms, Big Data involves datasets that are far larger than those traditionally examined in journals like this one. Yet there has always been considerable variation in the size of datasets, ranging from small experimental studies to large samples involving census or polling data. Size alone is therefore an insufficient descriptor. In more substantive terms, the Big Data movement has been associated with the analysis of large social networks (including online networks such as Twitter), automated data aggregation and mining, web and mobile analytics, visualization of large datasets, sentiment analysis/opinion mining, machine learning, natural language processing, and computer-assisted content analysis of very large datasets. Several of these methods are featured in this issue.

As others have observed, the Big Data movement often brings along an ideology or mythology that asserts a special, transformative value (e.g., boyd & Crawford, 2012). In order to evaluate these claims, it is necessary to distinguish the true promise of Big Data from some of the overpromises of its most fervent proponents.

Separating promise from poses

Big Data methods and sources will become increasingly important because they offer data and insights that could not be obtained in other ways. These methods open research to work involving datasets of previously unimagined size. Indeed they often provide the only means of managing and analyzing digital datasets of increasing size and complexity. The entry by Baek, Park, and Cha, for instance, begins with a scan of approximately 1.7 billion tweets. Even after the most relevant data are selected by these and the other authors represented in this issue, the sample sizes typically remain in the hundreds of thousands. The ultimate value of Big Data, however, derives not from sheer size, but rather from two other factors.

First, because the Big Data movement is coupled with what is sometimes called “datafication,” that is, the creation of quantitative datasets from information that has not been viewed as data in the past (Mayer-Schönberger & Cukier, 2013), it leads to new research questions and new ways of thinking about existing questions. Among the many examples is the relatively large social network that Christakis and Fowler (2007, 2009) constructed from previously overlooked participant tracking information in the long-running Framingham heart study. In this issue, we might think of Giglietto and Selva’s creative analysis of messages tweeted by television viewers as an example of rarely examined discourse. We might also point to Hill and Shaw’s substantive appropriation of administrative data in wikis.

Big Data can open new doors in a second way as well. Its computational tools enhance researchers’ ability to bring together multiple datasets: datasets of different times, from different places, or gathered at different times. This ability has always existed on a small scale, but new data management and analytic capabilities make it possible to conduct research of unprecedented complexity and scope. Several of the studies here have done just that. One of the more striking examples is Jungherr’s analysis combining Twitter content, separate content analyses of print and television coverage, and public opinion polling related to the 2009 federal elections in Germany. Together, datafication (i.e., the construction and sharing of multifaceted datasets) and the development of new analytic tools to work on them hold dramatic promise for our discipline.

[...] Age of Big Data, that hypothesis testing and causal analysis will no longer be necessary to advance science (Mayer-Schönberger & Cukier, 2013). It is fair to say that such positions are intended to be provocative, often in service of the authors’ market interests. A more realistic view might be to acknowledge the value of large-scale datasets, while at the same time recognizing that the choice of data (even Big Data) always reflects at least an implicit theoretic model and that the desire for explanation will continue to lead scientists toward causal analysis and experimentation (even though some experiments may now become very large).

A more subtle, but still misleading view of Big Data is that it presents a sharp break from the past or possibly even a new science. The term “data science” is particularly unfortunate in this regard, both because of its redundancy, and because of the way it obscures the fact that Big Data’s value ultimately depends on disciplinary and interdisciplinary utility. Kuhn’s (1962) observation that substantive advances and methodological advances are more often intertwined than independent is no less true today than it was 50 years ago. This suggests that the impact of “data science” specialists will depend on their ability to create value for those engaged with substantive disciplinary and interdisciplinary issues.

Big Data is not so much a break from the past as simply the latest in a more or less steady flow of methodological advances that have transformed the social sciences over the past 100 years. These include the codification of experimental design, the development of systematic sampling and surveys, the advent of multivariate statistical analysis, the development of searchable compilations of media content, and video recording, to name just a few. We might also keep in mind that perceptions of bigness are themselves relative and historically bound. Several of the innovations mentioned above were the big data revolutions of their day.

Making the most of Big Data

Placing the Big Data movement in disciplinary and historical context enables us to attend to the issues that must be addressed if progress is to be made. Four issues would benefit from greater attention in my view.

Greater attention to questions of theoretic and social importance

We selected manuscripts for this issue with this third stage in mind. Although the chosen studies vary, each clearly grapples with an issue of interest within our research community. Studies by Jungherr, by Neuman and colleagues, and by Vargo and colleagues bring new approaches to understanding central questions regarding the nature and timing of influence between online social media and more traditional media. Colleoni and colleagues examine the theoretically important question of whether the structure of interaction on Twitter brings users into contact with diverse perspectives or merely creates an “echo chamber” of like-minded voices. Emery and her colleagues open a new window for considering the theoretic and socially important issue of how public health campaigns work.

Advancing toward this higher stage will inevitably bring changes in patterns of graduate education and collaboration. Just as media and communication researchers in the 1970s sought training in multivariate analysis from those outside the discipline, we now reach out to those with computational skills. But we need not go with hats in hand. It is clear that we have much to offer in terms of substance, substance often lacking in the demonstration projects so often found in computationally oriented work. Our contribution becomes even more critical when research sponsors begin to demand that the makers of new tools demonstrate their societal value.

Greater concern for validity of measurement

In many of the submissions we received, researchers selected the large-scale indicators they could and were then left in the position of trying to attribute broader conceptual meaning or importance to operational indicators of convenience rather than of choice. Even more difficult problems arise when a given operational indicator appears to be valid, but is too limited to capture the full richness of the concept it presumably measures.

Progress depends as well on providing stronger evidence to support the validity of automated coding systems, machine learning algorithms, sentiment analysis, and the other new tools rapidly entering the research sphere. The paper by Emery and colleagues offers a good example of what is necessary to validate machine-coding procedures. Other papers, including many of those we turned away, either relied on coding validation procedures that were not tailored to the specific research situation or the authors simply assumed that previous, often very limited, validation efforts were sufficient. Here we must guard against the error of equating very detailed technical descriptions of procedures with evidence of validity. Very detailed procedures and algorithms are not necessarily any more valid than more straightforward ones. Indeed, because more assumptions are made, there is more to go wrong.
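To make the idea of validating machine coding concrete, one common check is to compare an automated coder's labels with those of a human coder on the same sample of messages and to correct the raw agreement for chance. The sketch below uses Cohen's kappa for this purpose. It is an illustration only: the labels are invented, the function name is hypothetical, and nothing here is meant to describe the procedure that Emery and colleagues or any other authors in this issue actually followed.

```python
from collections import Counter


def cohens_kappa(human, machine):
    """Chance-corrected agreement between two coders of the same items.

    `human` and `machine` are equal-length sequences of category labels.
    """
    if len(human) != len(machine) or len(human) == 0:
        raise ValueError("Both coders must label the same non-empty set of items.")

    n = len(human)

    # Observed agreement: share of items given the same label by both coders.
    observed = sum(h == m for h, m in zip(human, machine)) / n

    # Expected agreement under independence, from each coder's label frequencies.
    human_counts = Counter(human)
    machine_counts = Counter(machine)
    expected = sum(human_counts[c] * machine_counts[c] for c in human_counts) / (n * n)

    if expected == 1.0:
        # Both coders used a single identical category; agreement is trivially perfect.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Invented example: eight tweets coded for sentiment by a person and by a classifier.
human_codes = ["pos", "pos", "neg", "neg", "pos", "neg", "neu", "neu"]
machine_codes = ["pos", "neg", "neg", "neg", "pos", "neg", "neu", "pos"]

kappa = cohens_kappa(human_codes, machine_codes)
print(f"Raw agreement: 0.75, Cohen's kappa: {kappa:.2f}")
```

In this invented sample the two coders agree on six of eight items, yet the chance-corrected figure is noticeably lower than the raw 75%, which is one reason a detailed technical description of an algorithm cannot substitute for a validation exercise tailored to the data at hand.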

Greater attention to sampling and representativeness

[...] example. Their dataset of tweets (N = 2.49 million) related to political talk shows for the 2012/2013 season is described as complete. Upon closer inspection, however, it is apparent that the dataset only contains tweets that included official or the most popular hashtags for the programs of interest. As Jungherr notes in his article, the choice to sample Twitter messages using hashtags may slant the sample toward more experienced users. Giglietto and Selva based their final analyses on a much smaller dataset intended to reflect tweets during peaks of activity. This is not intended to be critical and indeed, to their credit, the authors are quite candid about the limitations of the final dataset. The larger point is that even very large datasets often represent samples whose generalizability and representativeness are open to challenge. Bigness does not ensure quality.

It is striking that seven of the eight papers selected for this issue rely entirely or in part on Twitter data. Although Twitter users in the United States increasingly mirror its online population in basic demographic terms (Brenner & Smith, 2013), we know much less about the demographics of Twitter users in most other countries, particularly those in the developing world. As Baek and his colleagues acknowledge, this leaves cross-cultural comparisons of Twitter use and content open to concerns of sampling bias. Beyond this, however, there is no reason to assume that molar demographic similarities between Twitter users and the overall online population imply similarities in attitudes, issues discussed, or several of the other more specific issues addressed in this issue.

In addition to concerns about how representative Twitter users are, we should also be concerned about Twitter’s ability to represent social media platforms more generally. It is an appropriate choice on substantive grounds in some cases, but not in others, or at least not as a sole choice. Twitter was an excellent choice for Giglietto and Selva’s analysis of “second screen” interaction, though one might acknowledge that television viewers also interact with one another via direct texts, e-mail, and cellphone. As digital venues proliferate, it will become increasingly important to analyze more than one medium, just as those interested in media coverage of issues more generally now are encouraged to consider both broadcast and print media. The study by Neuman, Guggenheim, Jang, and Bae offers an outstanding example of an analysis using multiple traditional and social media. In some other cases, it is fair to ask if Twitter data were representative of the larger, more diverse media streams substantively related to the authors’ research questions. This is a legitimate question for any study that is based on a single digital media platform, again, regardless of the amount of data drawn from that platform.

Enhancing data access and ensuring data quality

[...] rich” or “data poor” (e.g., boyd & Crawford, 2012). These are legitimate fears and ought to be a source of alarm for everyone in the research community as more and more of our social life is conducted within commercially owned walled gardens.

But the rhetoric of digital divides fails to capture the full range of the danger. As communication researchers begin to work with the owners of social networking sites and other proprietary venues, they may well begin to experience the same challenges that biomedical researchers have experienced working with commercial entities making drugs and medical devices. Communication researchers may have to contend with the fact that companies will grant access only to data that they believe will reflect positively upon their commercial interests. They will discover, as biomedical researchers have, that sponsorship and assistance often come with strings. Sometimes these strings are explicit, as in the case of a company demanding the right to approve manuscripts before they are submitted for publication. Sometimes, the strings will be implicit, as in cases where researchers are biased by their own desire to please or to gain visibility through association with a trendy company or industry group. In extreme cases, there may be direct conflicts of financial interest when investigators have ownership or extensive consulting relationships with the companies whose products they study.

Significant challenges therefore face us as we move into the era of Big Data. Some are new, but fortunately most of them are the same challenges that have been faced with major methodological innovations in the past. Looking past claims of exceptionalism will help us recognize the road ahead. Moving forward holds the potential not only for examining existing questions in new ways, but also for positioning the discipline of communication at the heart of efforts to understand social and civic life in an increasingly mediated age. The challenges are familiar; the theoretic and practical potential is enormous.

References

Anderson, C. (2008). The end of theory: The data deluge makes the scientific method obsolete. Wired [WWW document]. Retrieved from http://www.wired.com/science/discoveries/magazine/16-07/pb_theory.

boyd, d., & Crawford, K. (2012). Critical questions for big data. Information, Communication & Society, 15, 662–679. doi:10.1080/1369118X.2012.678878.

Brenner, J., & Smith, A. (2013). 72% of online adults are social networking site users. Washington, DC: Pew Research Center’s Internet & American Life Project. Retrieved from http://pewinternet.org/~/media//Files/Reports/2013/PIP_Social_networking_sites_update_PDF.pdf.

Christakis, N., & Fowler, J. H. (2007). The spread of obesity in a large social network over 32 years. New England Journal of Medicine, 357, 370–379. doi:10.1056/NEJMsa066082.

Christakis, N., & Fowler, J. H. (2009). Connected: The surprising power of our social networks and how they shape our lives. New York, NY: Little, Brown.

Kuhn, T. (1962). The structure of scientific revolutions. Chicago, IL: University of Chicago Press.
