
DATA LANGUAGES


Each value in a tree can be reached from its one origin by a separate path that passes through a number of branching values. Trees are basic to the recording of linguistic representations and conform to one of the earliest theories of meaning.

Aristotle's notion of a definition, for example, requires naming the genus (the general class) to which the definiendum (the word to be defined) belongs and distinguishing the latter from all other species of that genus. Moving from genus to genus describes moving through the branching points of a tree. The system of categories in the Linnean classification in biology (not the organisms it classifies) constitutes a tree. Closer to content analysis, a reference to Europe is implicitly a reference to France, Italy, Germany, and so on. A reference to France is implicitly a reference to the regions of that country. The relation connecting "Europe," "France," and "Provence" is one of inclusion and defines a path or chain through a tree. France and England are on different paths, as neither includes the other.

Figure 8.5 Trees

[Figure: an organization chart drawn as a tree, with the President at the origin, Vice Presidents 1 and 2 on the next level, Department Heads A through D below them, and Supervisors i through x on the lowest level.]

Most content analyses fix the level of abstraction on which countries, populations, products, or mass-media programs are coded. Trees offer a richer alternative. Other examples include family trees, decision trees, telephone trees, the trees that the rules of a transformational grammar generate, and social hierarchies in business organizations, in the military, and in government. (I discuss the possible confusion of trees with groupings below, in section 8.6.1. Note here only that each branching point can be occupied by a value.)
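The inclusion relation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `PARENT` mapping and the helper names `path_to_origin` and `includes` are hypothetical, and the fragment treats "Europe" as the origin of the tree simply because the example in the text starts there.

```python
# Hypothetical fragment of a geographic tree. Each value maps to the value
# that directly includes it; "Europe" is treated as the origin here.
PARENT = {
    "France": "Europe",
    "Italy": "Europe",
    "Germany": "Europe",
    "England": "Europe",
    "Provence": "France",
}

def path_to_origin(value):
    """Return the chain of values leading from `value` up to the origin."""
    path = [value]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path

def includes(broader, narrower):
    """True if `broader` lies on the path from `narrower` to the origin."""
    return broader != narrower and broader in path_to_origin(narrower)

print(path_to_origin("Provence"))     # the chain Provence, France, Europe
print(includes("Europe", "Provence")) # Europe includes Provence
print(includes("France", "England"))  # neither includes the other
```

Because every value, not only the terminal ones, appears in the mapping, a coder could record "France" as readily as "Provence"; that is the sense in which each branching point can be occupied by a value.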

literature on metrics developed largely on chainlike variables and in the context of a measurement theory that distinguishes among nominal, ordinal, interval, and ratio scales (Stevens, 1946). These four metrics (listed here in the order of their increasing power) differ in the information they can represent and are therefore often called levels of measurement. I describe the three principal metrics below and define their mathematical properties in Table 8.2.

Ordinal Metrics

Ordinal metrics describe recording units in such relational terms as "larger than," "more than," "precedes," "causes," "is a condition of," "is a refinement of," "is contained in," "supervises"; in short, in terms of ranks. Ordinal scales (chains with ordinal metrics) are probably most common in the social sciences, largely because relationships between people and objects tend to occur in language, spoken or written, and are then also more easily recorded in words. When the stock market is said to "gain," an ordinal metric is implied. When it is said to "gain 5 points," an interval metric is invoked. Ordinal scales using 3, 5, and 7 points are most closely associated with language and hence natural in content analysis. Polar opposites lend themselves to 3-point scales (e.g., a scale from good to bad, with neutral as its midpoint); the addition of simple adjectives, such as more or less, results in 5-point scales, and the addition of superlatives (e.g., most and least) leads to 7-point scales.

In content analysis, ranks may be variously operationalized. Newspaper editors, for example, employ several typographical devices to express the importance they assign to the news items they publish. Suppose, after interviewing a sample of newspaper editors, a researcher found the following rank order to correlate highly with the editors' judgment of how important news items were (always, of course, relative to what happened that day):

1st: Largest multicolumn headline above the center fold of the front page
2nd: Any other headline above the center fold of the front page
3rd: Any headline below the center fold of the front page
4th: Any multicolumn headline on the second, third, or last page
5th: Any other headline above the center fold of any other inside page
6th: Any headline below the center fold of any other inside page
7th: Any other news item

Assuming that the editors' judgments are those of a somewhat stable journalistic culture, are used quite consistently, and have little variation, content analysts can use this construct to infer the importance of news items by ranking them with this 7-point ordinal scale.
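A coding rule of this kind can be written down as a small function. The following Python sketch is one possible reading of the seven ranks, not a definitive operationalization: the `NewsItem` fields and the function name `importance_rank` are hypothetical, and the handling of single-column headlines on the second, third, or last page (which the list does not mention explicitly) is an assumption flagged in the code.

```python
from dataclasses import dataclass

@dataclass
class NewsItem:
    page: str                      # "front", "second", "third", "last", or "inside"
    has_headline: bool = True
    multicolumn: bool = False
    above_fold: bool = False
    largest_on_page: bool = False

def importance_rank(item: NewsItem) -> int:
    """Return the ordinal rank of a news item (1 = most important, 7 = least)."""
    if not item.has_headline:
        return 7
    if item.page == "front":
        if not item.above_fold:
            return 3
        return 1 if (item.multicolumn and item.largest_on_page) else 2
    if item.page in ("second", "third", "last"):
        # Assumption: single-column headlines on these pages are not covered
        # by ranks 4 to 6, so they fall into the residual 7th rank.
        return 4 if item.multicolumn else 7
    return 5 if item.above_fold else 6

lead = NewsItem("front", multicolumn=True, above_fold=True, largest_on_page=True)
brief = NewsItem("inside", above_fold=False)
print(importance_rank(lead), importance_rank(brief))
```

Because the scale is ordinal, only comparisons such as `importance_rank(lead) < importance_rank(brief)` are meaningful; the numerical distance between two ranks carries no information.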


As Table 8.2 suggests, ordinal metrics are not limited to chains. Grouping an unordered set of values into conceptual categories introduces inequalities between the otherwise pairwise equal differences. Groupings suppose that the values within one group have more in common with each other than with the values in different groups. The above-mentioned standards for analyzing Freud's dream theory represent a grouping: The difference between A1 and A2 is smaller than the difference between A1 and B1, but nothing indicates by how much. Graham and Witschge (2003) introduced such differences in ranks by categorizing messages, the posts to online discussion groups, in four convenient phases, effectively grouping 21 categories on four levels. Figure 8.6 shows the researchers' process. In Phase 1, messages were distinguished into three groups, two of which were final categories. In Phase 2, messages that responded to previous messages were grouped into two kinds, depending on whether they manifested reasons. In Phase 3, the nonreasoned claims led to three final categories. The reasoned claims were divided into four types of responses and, in Phase 4, each led to four groups indicating the kind of evidence used in the arguments. Frequencies were obtained for each final category, which could be summed in the reverse order of the distinctions that led to them. Although the data are qualitative, showing no ordering, the grouping imposed a metric that assumes that categories of messages in the same group are more similar to each other than to categories of messages in different groups. Figure 8.6 suggests that messages that manifest reasons and those that do not are more different than messages that differ in whether they contain counterarguments, rebuttals, refusals to rebut, or rational affirmations.
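The ordinal metric that a grouping imposes can be made concrete by counting how many successive distinctions two final categories share. The Python sketch below assumes a handful of categories from Figure 8.6, each written as the sequence of decisions that leads to it; the short labels ("Response", "Non-Reasoned", "Reasoned") and the names `CATEGORY_PATHS` and `shared_levels` are illustrative shorthand, not the researchers' own notation.

```python
# Each final category is described by the decisions that lead to it.
# The labels are shorthand for the phases in Figure 8.6 (an assumption).
CATEGORY_PATHS = {
    "Initial": ("Initial",),
    "Irrelevant": ("Irrelevant",),
    "Counter-Assertion": ("Response", "Non-Reasoned", "Counter-Assertion"),
    "Counter-Argument by Experience":
        ("Response", "Reasoned", "Counter-Argument", "Experience"),
    "Rebuttal by Experience":
        ("Response", "Reasoned", "Rebuttal", "Experience"),
    "Rebuttal by Analogy/Example":
        ("Response", "Reasoned", "Rebuttal", "Analogy/Example"),
}

def shared_levels(a, b):
    """Count the leading decisions that categories `a` and `b` have in common."""
    count = 0
    for x, y in zip(CATEGORY_PATHS[a], CATEGORY_PATHS[b]):
        if x != y:
            break
        count += 1
    return count

print(shared_levels("Rebuttal by Experience", "Rebuttal by Analogy/Example"))
print(shared_levels("Rebuttal by Experience", "Counter-Argument by Experience"))
print(shared_levels("Rebuttal by Experience", "Counter-Assertion"))
```

More shared levels mean greater similarity, but, as with A1, A2, and B1, nothing in the grouping says by how much; the counts should be compared only as ranks.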

Groupings reflect conceptual hierarchies that are defined on top of an original set of values. When used repeatedly, any decision tree (for example, that depicted in Figure 8.6, but also the one in Figure 7.1) creates groupings. Decision trees proceed from rougher to finer distinctions and from larger and less differentiated sets of units of analysis to smaller and more specialized sets. One analytical implication of grouping is that it suggests the order in which frequencies of values may be summed, undoing decisions one by one. Hierarchical clustering procedures, for instance, proceed that way as well. They capitalize on unequal differences between elementary qualities to develop a hierarchy, represented by a dendrogram, that could explain the collection of these qualities as groupings (for instance, see Figure 10.10).
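Summing frequencies in the reverse order of the distinctions can be sketched as a recursive total over a nested structure. In the Python fragment below, the counts are purely illustrative and are not data from Graham and Witschge (2003); only two of the four reasoned response types are included to keep the example short, and the names `COUNTS`, `total`, and `report` are hypothetical.

```python
# Purely illustrative counts: NOT data from Graham and Witschge (2003).
# Only two of the four reasoned response types are shown.
COUNTS = {
    "Initial": 12,
    "Irrelevant": 3,
    "Response": {
        "Non-Reasoned": {
            "Response-Information": 5,
            "Response-Affirmation": 4,
            "Counter-Assertion": 6,
        },
        "Reasoned": {
            "Counter-Argument": {
                "Assertion/Assumption": 2,
                "Analogy/Example": 1,
                "Experience": 3,
                "Supported-by-Factual": 4,
            },
            "Rebuttal": {
                "Assertion/Assumption": 2,
                "Analogy/Example": 2,
                "Experience": 1,
                "Supported-by-Factual": 3,
            },
        },
    },
}

def total(node):
    """Sum the frequencies of all final categories below `node`."""
    if isinstance(node, int):
        return node
    return sum(total(child) for child in node.values())

def report(node, label="All messages", depth=0):
    """Print subtotals from the coarsest distinction down to the finest."""
    print("  " * depth + f"{label}: {total(node)}")
    if isinstance(node, dict):
        for child_label, child in node.items():
            report(child, child_label, depth + 1)

report(COUNTS)
```

Only the innermost entries are observed frequencies; every other number in the report is obtained by undoing one distinction at a time.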

Groupings and trees are easily confused, and as both are important in content analysis, I want to highlight their distinction. As I have said, groupings provide convenient conceptualizations of a given set of values, the terminal points of a decision tree, like the outline of a book in chapters and sections.

Groups do not constitute values in a grouping, however. An outline is not the text it organizes. In contrast, the values of a tree are not limited to the terminal values of the tree; they include its branches as well. Thus their values are not merely different; they may include each other, enabling the coding of different levels of inclusion, abstraction, or entailments. Take the above-mentioned Linnean classification system as an example. It groups organisms into classes and subclasses and provides concepts that label these groups on different levels.

Figure 8.6 A Grouping of Messages From Online Deliberations

[Figure: a decision tree in four phases. Phase 1, Message Type: Initial (Rational Argument), Irrelevant, and responses to previous messages. Phase 2, Reasoning: Non-Reasoned/Justified Claim and Reasoned/Justified Claim. Phase 3, Type of Response: Response-Information, Response-Affirmation, and Counter-Assertion for nonreasoned claims; Counter-Argument, Rebuttal, Refute-to-Rebuttal, and Rational Affirmation for reasoned claims. Phase 4, Evidence Used, for each reasoned type of response: Assertion/Assumption, Analogy/Example, Experience, and Supported-by-Factual.]

SOURCE: Adapted from Graham and Witschge (2003, p. 181, fig. 1).

For instance, mammal is not an organism but the name of a group that includes humans, whales, and mice. The Linnean system groups organisms but it defines a tree for the names of groups of organisms that describe organisms on different levels of commonalities.
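The difference between the two structures comes down to which values a coder may record. The following Python sketch assumes a tiny taxonomy fragment; "mammal", "human", "whale", and "mouse" come from the example above, whereas "animal", "bird", and "sparrow" are added only for illustration, and the function names are hypothetical.

```python
# Hypothetical taxonomy fragment: each name maps to the names it includes.
TAXONOMY = {
    "animal": {
        "mammal": {"human": {}, "whale": {}, "mouse": {}},
        "bird": {"sparrow": {}},
    },
}

def terminal_values(tree):
    """Values of a grouping: only the terminal points can be recorded."""
    values = []
    for name, children in tree.items():
        if children:
            values.extend(terminal_values(children))
        else:
            values.append(name)
    return values

def all_values(tree):
    """Values of a tree: branching points can be recorded as well."""
    values = []
    for name, children in tree.items():
        values.append(name)
        values.extend(all_values(children))
    return values

print(terminal_values(TAXONOMY))
print(all_values(TAXONOMY))
```

Read as a grouping, the structure offers four codable values, and "mammal" merely labels a set of them. Read as a tree, "mammal" is itself a value that a coder can assign when a text is no more specific than that.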

Interval Metrics

Interval metrics represent quantitative differences between recording units.

Measures of time, distance, and volume as well as changes in quantities and movement in space all assume meaningful intervals. When applied to chains, an interval metric creates interval scales and enables the addition or subtraction of differences between scale points. In psychological tests, subjects are often asked to use rating scales with equal intervals to answer questions. In content analysis, the semantic differential scales that are used to record judgments of biases, personality traits of characters, and so on are often conceptualized as equal-interval scales. Intervals do not need to be equal, however.

Interval data are the preferred kind in empirical social research, largely because of the wealth of statistical techniques that are available and accessible for them, especially techniques that require the calculation of differences, as in variance calculations, correlational methods, factor analyses, multidimensional scaling, and clustering. Interval metrics might well be an artifact of these techniques. In the natural sciences most measures, except for time, have ratio metric properties, and in content analysis interval scales tend not to be as reliable as data with a less powerful metric. For example, in research on the personality characteristics of fictional characters on television, semantic differential scales, which are treated as interval scales, have been notoriously unreliable. This has been so not only because language is rarely as precise as would be necessary for differences to be calculable, but mainly because personality characteristics that are irrelevant to a plot may not be present at all, causing coders to guess when they are forced to choose among interval values. Nevertheless, many secondary measures that content analysts provide (quantitative indices of phenomena, geometric depictions of findings) have valid interval qualities.

Ratio Metrics 8.6.3

Ratio metrics are defined from absolute zero points relative to which all differences between values are expressed. Lengths, weights, speeds, masses, and absolute temperatures in degrees Kelvin (but not in degrees Fahrenheit or Celsius) exemplify ratio scales in the physical sciences, none of which can go below its absolute zero point. There are also many examples of ratio-level measurements of text, such as column inches of newsprint, sizes of photographs, frequencies of publication, audience sizes, and Nielsen ratings, as well as amounts of information and costs. These have no negative values either. In content analysis, these measures may have less to do with what a text says or the role it plays in a particular context than with how prominent recording units are or how much they say to the analyst.
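A short numerical check may help separate the two metrics. The Python sketch below assumes three arbitrary Celsius readings and two arbitrary column lengths, all chosen only for illustration, and uses the standard conversions between Celsius and Fahrenheit and between inches and centimeters.

```python
def celsius_to_fahrenheit(c):
    """Affine change of unit: the zero point moves, so it is not absolute."""
    return c * 9 / 5 + 32

def inches_to_cm(x):
    """Proportional change of unit: the zero point stays where it is."""
    return x * 2.54

a, b, c = 10.0, 20.0, 30.0  # illustrative temperatures in degrees Celsius

# On an interval scale, ratios of values depend on the unit chosen...
ratio_celsius = b / a
ratio_fahrenheit = celsius_to_fahrenheit(b) / celsius_to_fahrenheit(a)

# ...whereas ratios of differences do not.
diff_ratio_celsius = (c - b) / (b - a)
diff_ratio_fahrenheit = (
    (celsius_to_fahrenheit(c) - celsius_to_fahrenheit(b))
    / (celsius_to_fahrenheit(b) - celsius_to_fahrenheit(a))
)

# On a ratio scale, such as column inches, ratios of values survive a
# change of unit.
story_a, story_b = 10.0, 20.0  # illustrative lengths in column inches
ratio_inches = story_b / story_a
ratio_cm = inches_to_cm(story_b) / inches_to_cm(story_a)

print(ratio_celsius, ratio_fahrenheit)
print(diff_ratio_celsius, diff_ratio_fahrenheit)
print(ratio_inches, ratio_cm)
```

Saying that one story is "twice as long" as another is therefore meaningful, whereas saying that 20 degrees Celsius is "twice as warm" as 10 degrees is not.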

The list of metrics is far from settled, and far more orderings are available than are relevant for content analysis. Regarding data languages (of which variables, orderings, and metrics are the most prominent features), it is probably most important to keep their Janus-faced character in mind. The data language must be appropriate to the phenomenon being recorded, and from this perspective, the best data language is the raw text itself. The data language must also render the data amenable to analysis. Given the currently available analytical techniques, the gap between the form in which texts are easily available and the forms these techniques require often seems large. For content analysts, the challenge is to develop computational techniques whose requirements are easily satisfied by naturally occurring texts and images.