7.3 Transitivity and the clustering coefficient

second number by the first to get a clustering coefficient C that lies in the range from zero to one:

\[
C = \frac{(\text{number of closed paths of length two})}{(\text{number of paths of length two})}. \tag{7.26}
\]

C = 1 implies perfect transitivity, i.e., a network whose components are all cliques. C = 0 implies no closed triads, which happens for various topologies, such as a tree (which has no closed loops of any kind; see Section 6.8) or a square lattice (which has closed loops with even numbers of nodes only but no closed triads).

Note that paths in networks, as defined in Section 6.11, have a direction (even in an undirected network). Thus uvw and wvu are considered distinct paths. The formula in Eq. (7.26) counts these paths separately, although in practice it would also be fine to count each path in only one direction: it would reduce both the numerator and the denominator by a factor of two, and the factors would cancel, leaving the value of C unchanged. Usually, however, and particularly when writing computer programs, it is easier to count paths in both directions, since it avoids having to remember which paths you have already counted.
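The counting in Eq. (7.26) can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the adjacency-set representation, the function name `clustering_by_paths`, the example graph, and the choice to return 0 when there are no paths of length two are all assumptions made here. Each path is counted in both directions, as described above.

```python
def clustering_by_paths(adj):
    """Eq. (7.26): closed paths of length two divided by all paths of length two.

    `adj` maps each node to the set of its neighbors (undirected, no self-loops).
    Every path u-v-w is counted in both directions (uvw and wvu).
    """
    closed = 0
    total = 0
    for v, nbrs in adj.items():      # v is the middle node of the path
        for u in nbrs:
            for w in nbrs:
                if u == w:
                    continue         # the two end nodes must be distinct
                total += 1
                if w in adj[u]:      # edge (u, w) closes the path
                    closed += 1
    # Assumed convention: a network with no paths of length two gets C = 0.
    return closed / total if total else 0.0


# Hypothetical example: a triangle a-b-c with a pendant node d attached to c.
example = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c"},
}
print(clustering_by_paths(example))
```

In this small example there are ten paths of length two counted in both directions, six of which are closed, so the ratio is 6/10.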

An alternative way to write the clustering coefficient is

\[
C = \frac{(\text{number of triangles}) \times 6}{(\text{number of paths of length two})}. \tag{7.27}
\]

Why the factor of six? It arises because each triangle in a network contains six paths of length two. Suppose we have a triangle uvw. Then there are six paths of length two in it: uvw, vwu, wuv, wvu, vuw, and uwv. Each of these six is closed, so the number of closed paths is six times the number of triangles, and using this result in Eq. (7.26) then gives Eq. (7.27).
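The factor of six can be checked numerically. The sketch below counts each triangle once and counts paths of length two in both directions, then applies Eq. (7.27). The helper names, the brute-force triangle search, and the example graph are assumptions chosen for illustration; the path count uses the fact that a middle node of degree k has k(k − 1) ordered pairs of distinct neighbors.

```python
from itertools import combinations


def count_triangles(adj):
    """Count each triangle once, as a set of three mutually adjacent nodes.

    Brute force over all node triples; fine for small illustrative graphs only.
    """
    return sum(
        1
        for u, v, w in combinations(sorted(adj), 3)
        if v in adj[u] and w in adj[u] and w in adj[v]
    )


def count_paths_of_length_two(adj):
    """Paths counted in both directions: k*(k-1) per middle node of degree k."""
    return sum(len(nbrs) * (len(nbrs) - 1) for nbrs in adj.values())


def clustering_by_triangles(adj):
    """Eq. (7.27): six times the number of triangles over paths of length two."""
    paths = count_paths_of_length_two(adj)
    # Assumed convention: C = 0 when there are no paths of length two.
    return 6 * count_triangles(adj) / paths if paths else 0.0


# Hypothetical example: a triangle a-b-c with a pendant node d attached to c.
example = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c"},
}
print(count_triangles(example), count_paths_of_length_two(example))
print(clustering_by_triangles(example))
```

Here there is one triangle and ten paths of length two, so Eq. (7.27) gives 6 × 1 / 10.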

[Figure: A triangle contains six distinct paths of length two, all of them closed.]

Another way to write the clustering coefficient would be to note that if we have a path of length two, uvw, then u and w have a common neighbor in v (they share a mutual acquaintance, in social network terms). If the path uvw is closed then u and w are also themselves acquainted, so the clustering coefficient can be thought of also as the fraction of pairs of people with a common friend who are themselves friends, or equivalently as the mean probability that two people with a common friend are themselves friends.

This is perhaps the most common way of defining the clustering coefficient.

In mathematical notation:

\[
C = \frac{(\text{number of triangles}) \times 3}{(\text{number of connected triples})}. \tag{7.28}
\]


Here a “connected triple” means three nodes uvw with edges (u,v) and (v,w). (The edge (u,w) can be present or not.) The factor of three in the numerator arises because each triangle gets counted three times when we count the connected triples in the network. The triangle uvw, for instance, contains the triples uvw, vwu, and wuv. In the older social networks literature the clustering coefficient is sometimes called the “fraction of transitive triples,” which is a reference to this definition of the coefficient.
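The connected-triple form of Eq. (7.28) can be illustrated by listing the triples explicitly. In this sketch, which assumes an adjacency-set representation and uses hypothetical function names, each triple is identified by its center node v and an unordered pair of v's neighbors, so a triangle shows up three times, once for each choice of center.

```python
from itertools import combinations


def connected_triples(adj):
    """List triples (u, v, w) with edges (u, v) and (v, w); v is the center.

    Each unordered pair {u, w} of neighbors of v contributes one triple.
    """
    return [
        (u, v, w)
        for v, nbrs in adj.items()
        for u, w in combinations(sorted(nbrs), 2)
    ]


def clustering_by_triples(adj):
    """Eq. (7.28): closed triples (three per triangle) over all connected triples."""
    triples = connected_triples(adj)
    closed = sum(1 for u, _, w in triples if w in adj[u])
    # Assumed convention: C = 0 when there are no connected triples.
    return closed / len(triples) if triples else 0.0


# Hypothetical example: a triangle a-b-c with a pendant node d attached to c.
example = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c"},
}
print(connected_triples(example))
print(clustering_by_triples(example))
```

The example has five connected triples, three of which belong to the single triangle, giving 3/5.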

Social networks tend to have quite high values of the clustering coefficient.

For example, the network of film actor collaborations discussed earlier in this chapter has C = 0.20 [354]; a network of collaborations between biologists was found to have C = 0.09 [349]; a network of who sends email to whom in a large university had C = 0.16 [156]. These are typical values for social networks. Some denser networks have even higher values, as high as 0.5 or 0.6. (Technological and biological networks by contrast tend to have somewhat lower values. The Internet at the autonomous system level, for instance, has a clustering coefficient of only about 0.01. This point is discussed in more detail in Section 10.6.)

In what sense are the clustering coefficients for social networks high? Let us assume, to make things simple, that everyone in a network has about the same number c of friends and let us suppose that everyone picks their friends completely at random from the whole population, meaning that they have the same probability of being friends with every person in the network. That probability is simply equal to c/(n−1), where n is the total number of people in the network. But in that case the probability of two of my friends being acquainted, which is by definition the clustering coefficient, is also c/(n−1): my friends have the same probability of being acquainted as everyone else.

[Margin note: Of course it is not normally the case that everyone in a network has the same number of friends. We will see later how to perform better calculations of the clustering coefficient (Section 12.3), but this simple calculation will serve our purposes for now.]

For the networks cited above, the value of c/(n−1) is 0.0003 (film actors), 0.00001 (biology collaborations), and 0.00002 (email messages). Thus the real clustering coefficients are much larger than our simple calculation would suggest. The calculation does ignore any variation in the number of friends people have, but the disparity between calculated and observed clustering coefficients is so large that it seems unlikely it could be eliminated just by allowing the number of friends to vary. A more likely explanation is that we were wrong to assume that everyone has the same probability of knowing everyone else. The numbers suggest that there is a much greater chance that two people will be acquainted if they have another common acquaintance than if they don’t. We discuss this point at greater length in Section 10.6.
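To put a number on "much larger," the short calculation below divides each observed clustering coefficient by the corresponding value of c/(n−1). It uses only the figures quoted in the text; the dictionary layout and the function name `excess_clustering` are illustrative assumptions.

```python
# Values quoted in the text: (observed C, c/(n-1) under random choice of friends).
networks = {
    "film actors": (0.20, 0.0003),
    "biology collaborations": (0.09, 0.00001),
    "email messages": (0.16, 0.00002),
}


def excess_clustering(observed, expected):
    """How many times larger the observed C is than the random-choice value."""
    return observed / expected


ratios = {
    name: excess_clustering(obs, exp) for name, (obs, exp) in networks.items()
}
for name, ratio in ratios.items():
    print(f"{name}: observed C is roughly {ratio:,.0f} times the random value")
```

The ratios come out in the hundreds to thousands, which is why allowing the number of friends to vary seems unlikely to close the gap.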

Some social networks, such as the email network mentioned earlier, are directed networks. In calculating clustering coefficients for directed networks, scientists have typically just ignored their directed nature and applied Eq. (7.28) as if the edges were undirected. It is however possible to generalize transitivity to take account of directed links. If we have a directed relation between nodes such as “u likes v” then we can say that a triple of nodes is closed or transitive if u likes v, v likes w, and also u likes w. One can calculate a clustering coefficient in the obvious fashion for the directed case, counting all directed paths of length two that are closed and dividing by the total number of directed paths of length two. To date, however, such measurements have not often appeared in the literature.

[Figure: A transitive triple of nodes u, v, w in a directed network.]
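One possible reading of the directed calculation is sketched below. The representation (a dictionary mapping each node to the set of nodes it points to), the function name, and the decision to skip paths of the form u → v → u (a reciprocated edge rather than a triple of three nodes) are assumptions made for this illustration and are not specified in the text.

```python
def directed_clustering(succ):
    """Fraction of directed paths u -> v -> w that are closed by an edge u -> w.

    `succ` maps each node to the set of nodes it points to (no self-loops).
    """
    closed = 0
    total = 0
    for u, outs in succ.items():
        for v in outs:
            for w in succ.get(v, ()):
                if w == u:
                    continue      # assumption: u -> v -> u is not a triple
                total += 1
                if w in outs:     # "u likes w" makes the triple transitive
                    closed += 1
    # Assumed convention: C = 0 when there are no directed paths of length two.
    return closed / total if total else 0.0


# Hypothetical "likes" relations.
transitive = {"u": {"v", "w"}, "v": {"w"}, "w": set()}   # u->v, v->w, u->w
cycle = {"u": {"v"}, "v": {"w"}, "w": {"u"}}             # u->v, v->w, w->u
print(directed_clustering(transitive))
print(directed_clustering(cycle))
```

Under this definition the first example is fully transitive, while the directed cycle contains three paths of length two and none of them is closed, even though it would count as a triangle if the directions were ignored.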

7.3.1 Local clustering and redundancy

The clustering coefficient of the previous section is a property of an entire network. It quantifies the extent to which pairs of nodes with a common neighbor are also themselves neighbors, averaged over the whole network. It is, however, also sometimes useful to define a clustering coefficient for a single node. For a node i, we can define

\[
C_i = \frac{(\text{number of pairs of neighbors of } i \text{ that are connected})}{(\text{number of pairs of neighbors of } i)}. \tag{7.29}
\]

That is, to calculate C_i we go through all distinct pairs of nodes that are neighbors of i, count the number of such pairs that are connected to each other, and divide by the total number of pairs, which is ½ k_i (k_i − 1), where k_i is the degree of i. C_i is sometimes called the local clustering coefficient and it represents the average probability that a pair of i’s friends are friends of one another. (For nodes with degree zero or one the number of pairs of neighbors is zero and Eq. (7.29) is not well defined. Conventionally in this case we say that C_i = 0.)

[Margin note: In this book we use the notation C_i for both the local clustering coefficient and the closeness centrality. Care must be taken not to confuse the two.]
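As a minimal sketch of Eq. (7.29), assuming the network is stored as a dictionary of neighbor sets (the function name and example graph are hypothetical), the local clustering coefficient of a single node might be computed as follows.

```python
from itertools import combinations


def local_clustering(adj, i):
    """Eq. (7.29): fraction of pairs of neighbors of i that are connected."""
    nbrs = adj[i]
    k = len(nbrs)
    if k < 2:
        return 0.0   # convention from the text for degree zero or one
    connected = sum(1 for u, w in combinations(nbrs, 2) if w in adj[u])
    return connected / (k * (k - 1) / 2)   # total pairs: k(k-1)/2


# Hypothetical example: a triangle a-b-c with a pendant node d attached to c.
example = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c"},
}
for node in sorted(example):
    print(node, local_clustering(example, node))
```

Node c has three pairs of neighbors, of which only the pair (a, b) is connected, so its local clustering coefficient is 1/3, while nodes a and b each have a single pair of neighbors that is connected.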

Local clustering is interesting for several reasons. First, in many networks it is found empirically to have a rough dependence on degree, nodes with higher degree having a lower local clustering coefficient on average. This point is discussed in detail in Section 10.6.1.

[Figure: Structural holes. When the neighbors of a node are not connected to one another we say the network contains “structural holes.”]

Second, local clustering can be used as an indicator of so-called “structural holes” in a network. While it is common in many networks, especially social networks, for the neighbors of a node to be connected among themselves, it does happen sometimes that these expected connections between neighbors are missing. The missing links are called structural holes and were first studied in this context by Burt [89]. If we are interested in the efficient spread of information or other traffic around a network then structural holes are a bad thing: they reduce the number of alternative routes information can take through the network. On the other hand, structural holes could be a good thing for the node whose neighbors lack connections, because they give that

From the document Networks, Second Edition (pages 196-200)