
Groups of nodes

From the document Networks, Second Edition (pages 190-196)

about one of our friends not from that friend directly but from another mutual acquaintance: the message has passed along a path of length two via the mutual acquaintance, rather than along the direct (shortest) path of length one.

Flow betweenness is a variant of betweenness centrality that uses edge-independent paths between node pairs rather than shortest paths [192]. If there is more than one possible choice of independent paths between a pair of nodes, the contribution to the betweenness of any node for that pair is defined to be the maximum over all choices. (See Section 6.13 for a discussion of independent paths.)

Another variant is random-walk betweenness [356], which imagines messages performing random walks across the network between every possible starting point and destination, and the betweenness is defined as the average number of such messages that pass through each node. (See Section 6.14.3 for a discussion of random walks.) Random-walk betweenness would be an appropriate betweenness measure for traffic that traverses a network with no idea of where it is going; it simply wanders around at random until it reaches its destination. Conventional shortest-path betweenness is the exact opposite: it is the appropriate measure for information that knows exactly where it is going and takes the most direct route to get there. It seems likely that most real-world situations fall somewhere in between these two extremes.

It is found in practice, however, that the two measures often give quite similar results [356], in which case one can with reasonable justification assume that the "correct" answer, which presumably lies between the limits set by the shortest-path and random-walk measures, is similar to both. In cases where the two differ by a larger margin, however, we should be wary of attributing too much authority to either measure, since there is no guarantee that either is telling us a great deal about true information flow in the network.

Other generalizations of betweenness are also possible, based on other models of diffusion, transmission, or flow along network edges. We refer the interested reader to the article by Borgatti [76], which draws together many of the possibilities into a broad general framework for betweenness measures.

7.2 Groups of nodes

Many networks, including social and other networks, divide naturally into groups or communities. Networks of people divide into groups of friends, co-workers, or business partners; the World Wide Web divides into groups of related web pages; biochemical networks divide into functional modules, and so forth. The definition and analysis of groups within networks is a large and fruitful area of network theory. In Chapter 14 we discuss some of the sophisticated computer methods that have been developed for dividing networks into their constituent groups, such as modularity-based methods and maximum likelihood methods. In this section we discuss some simpler concepts of network groups that can be useful for probing and describing the local structure of networks. The primary constructs we look at are cliques, k-cores, and k-components.

7.2.1 Cliques

A clique is a set of nodes within an undirected network such that every member of the set is connected by an edge to every other. Thus a set of four nodes in a network would be a clique if (and only if) each of the four is directly connected by edges to the other three. Note that cliques can overlap, meaning that they can share one or more of the same nodes.
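As a minimal sketch of this definition, the check below tests whether a given set of nodes forms a clique. The adjacency-set representation, the function name `is_clique`, and the small example network are illustrative assumptions, not something taken from the text.

```python
from itertools import combinations

def is_clique(adj, nodes):
    """Return True if every pair of distinct nodes in `nodes` shares an edge."""
    return all(v in adj[u] for u, v in combinations(nodes, 2))

# Hypothetical undirected network stored as {node: set of neighbors}.
# Nodes 0-3 are all mutually connected; node 4 is attached only to node 3.
example_adj = {
    0: {1, 2, 3},
    1: {0, 2, 3},
    2: {0, 1, 3},
    3: {0, 1, 2, 4},
    4: {3},
}

four_clique = is_clique(example_adj, [0, 1, 2, 3])   # all six edges present
not_a_clique = is_clique(example_adj, [1, 2, 3, 4])  # edges 1-4 and 2-4 missing
```

Checking a candidate set this way takes time proportional to the number of node pairs in the set; finding all cliques in a network is a much harder problem and is not attempted here.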

The occurrence of a clique in an otherwise sparsely connected network is normally an indication of a highly cohesive subgroup. In a social network, for instance, one might encounter a set of individuals each of whom was acquainted with each of the others, and such a clique would probably indicate that the individuals in question are closely connected: the members of a family, for example, or a set of co-workers in an office.

[Margin figure: A clique of four nodes within a network.]

[Margin figure: Two overlapping cliques. Nodes A and B in this network both belong to two cliques of four nodes.]

However, it's also the case that many circles of acquaintances form only near-cliques, rather than perfect cliques. There may be some members of a group who are unacquainted, even if most members know one another. The requirement that every possible edge be present within a clique is a very stringent one and limits the usefulness of the clique concept. There are, however, some circumstances in which cliques do crop up and play an important role.

An example is the one-mode projection of a bipartite network introduced in Section 6.6.1. Recall that bipartite networks (also called affiliation networks in sociology) are commonly used to represent the membership of people or objects in groups of some kind. The one-mode projection creates a network that is naturally composed of cliques, one for each group (see Fig. 6.6 on page 117).
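The following sketch illustrates why a one-mode projection is made of cliques: every pair of members of the same group receives an edge, so each group becomes a clique. The membership data and the function name `one_mode_projection` are hypothetical and chosen only for illustration.

```python
from itertools import combinations

def one_mode_projection(membership):
    """Project a bipartite membership map {group: set of members} onto members.

    Two members are joined by an edge if they share at least one group.
    Returns an adjacency map {member: set of neighbors}.
    """
    adj = {}
    for members in membership.values():
        for u in members:
            adj.setdefault(u, set())
        for u, v in combinations(members, 2):
            adj[u].add(v)
            adj[v].add(u)
    return adj

# Hypothetical affiliation data: two groups that share the member "C".
membership = {
    "group1": {"A", "B", "C"},
    "group2": {"C", "D"},
}
projection = one_mode_projection(membership)

# Every group should appear as a clique in the projection.
groups_are_cliques = all(
    v in projection[u]
    for members in membership.values()
    for u, v in combinations(members, 2)
)
```

In this example the two cliques overlap at "C", which is the kind of overlap between cliques mentioned in Section 7.2.1.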

7.2.2 Cores

For many purposes a clique is too stringent a notion of grouping to be useful and it is natural to ask how one might define something more flexible. One possibility is the k-core. By contrast with a clique, where each node is joined to all the others, a k-core is a connected set of nodes where each is joined to at least k of the others. Thus, in a 2-core, for instance, every node is joined to at least two others in the set. Figure 7.4 shows the k-cores in a small network. (Note that a 1-core is the same thing as an ordinary component.)

The k-core is not the only possible relaxation of a clique, but it is a particularly useful one for the very practical reason that k-cores are easy to find.


Figure 7.4: The k-cores in a small network. The shaded regions (labeled 1-core, 2-core, and 3-core) denote the k-cores for k = 1, 2, and 3 in this small network. There are no k-cores for k > 3 in this case. Note how the k-cores are nested within one another, the 3-core inside the 2-core, which is in turn inside the 1-core.

A simple way to find them is to start with a given network and remove from it any nodes that have degree less than k, along with their attached edges, since clearly such nodes cannot under any circumstances be members of a k-core. In so doing, one will normally reduce the degrees of some other nodes in the network, namely those that were connected to the nodes just removed. So we then go through the network again to see if there are any additional nodes that now have degree less than k and remove those too. And so we proceed, repeatedly pruning the network to remove nodes with degree less than k until no such nodes remain. What is left over will, by definition, be a k-core or a set of k-cores, since each node is connected to at least k others. Note that we are not necessarily left with a single k-core: there's no guarantee that the network will be connected once we are done pruning it, even if it was connected to start with.

(There is a close connection between k-cores and the concept of "complex contagion," which is used to model the spread of ideas or information in social networks. See the discussion in Sections 16.1.9 and 16.3.5 and footnote 12 on page 640. Another closely related process, bootstrap percolation, has been studied extensively in statistical physics; see Refs. [7, 99, 210].)
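The pruning procedure described above can be sketched in a few lines. This is one possible implementation under stated assumptions: the network is stored as an adjacency map, the helper names (`build_adj`, `k_core_nodes`, `k_cores`) are hypothetical, and the example network (two four-node cliques joined through a single bridging node) is invented to show that pruning a connected network can leave more than one k-core.

```python
def build_adj(edges):
    """Build an undirected adjacency map {node: set of neighbors} from an edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def k_core_nodes(adj, k):
    """Return the nodes left after repeatedly removing nodes of degree < k."""
    remaining = set(adj)
    degree = {u: len(adj[u]) for u in adj}
    to_remove = [u for u in remaining if degree[u] < k]
    while to_remove:
        u = to_remove.pop()
        if u not in remaining:
            continue  # already pruned earlier
        remaining.remove(u)
        for v in adj[u]:
            if v in remaining:
                degree[v] -= 1  # v has lost the neighbor u
                if degree[v] < k:
                    to_remove.append(v)
    return remaining

def k_cores(adj, k):
    """Split the pruned node set into its connected pieces, one per k-core."""
    nodes = k_core_nodes(adj, k)
    cores = []
    while nodes:
        start = nodes.pop()
        core, stack = {start}, [start]
        while stack:
            u = stack.pop()
            for v in adj[u]:
                if v in nodes:
                    nodes.remove(v)
                    core.add(v)
                    stack.append(v)
        cores.append(core)
    return cores

# Hypothetical network: cliques {0,1,2,3} and {5,6,7,8} joined via node 4.
clique_a = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
clique_b = [(5, 6), (5, 7), (5, 8), (6, 7), (6, 8), (7, 8)]
adj = build_adj(clique_a + clique_b + [(3, 4), (4, 5)])

three_cores = sorted(sorted(core) for core in k_cores(adj, 3))
```

For k = 3 the bridging node 4 has degree 2 and is pruned, which leaves the two cliques as separate 3-cores even though the starting network was connected. Because each node and edge is handled a bounded number of times, this version runs in time roughly proportional to the size of the network.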

For any given network, there is a maximum value of k for the k-cores. It is clear, for instance, that no k-cores can exist when k exceeds the highest degree in the network, since in that case no node could have k connections to others. The k-cores of a network also have the property of being nested within one another: the 2-cores are subsets of the 1-cores, the 3-cores subsets of the 2-cores, and so forth (see Fig. 7.4). This must be the case since one could, if one wished, compute the 3-cores by first removing all nodes with degree less than 2, thereby creating the 2-cores, then removing all nodes with degree less than 3 from those, creating the 3-cores. Thus, the breakdown of a network into cores for all values of k provides an onion-like decomposition into layers within layers: 1-cores, then 2-cores, then 3-cores, and so forth, culminating at the highest value of k for which cores exist. This decomposition is sometimes used as a measure of core–periphery structure in networks: nodes that lie within the highest-k cores are "core" nodes within the network, while nodes outside those cores are "peripheral" nodes. (See Section 14.7.3 for further discussion of core–periphery structure.) In this sense, the cores define a kind of centrality measure, and they are sometimes used that way. In the social networks literature, for instance, it is sometimes hypothesized that core actors in a network, defined in this sense, may be more powerful or influential, or have better access to information or resources, although this is only a hypothesis: there is in most cases no formal reason to suppose that k-cores are closely linked with node roles or behaviors [462].

Figure 7.5: The k-components in a small network. The shaded regions (labeled 1-component, 2-component, and 3-component) denote the k-components in this small network, which has a single 1-component, two 2-components, one 3-component, and no k-components for any higher value of k. Note that the k-components are nested within one another, the 2-components falling inside the 1-component and the 3-component falling inside one of the 2-components.
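The nested, onion-like decomposition into k-cores can be computed by continuing to prune at k = 1, 2, 3, and so on, recording for each node the largest k at which it survives. The sketch below does this; the function name `core_numbers` and the example network are hypothetical, and the per-node value it returns is simply a convenient way of recording which layer of the decomposition a node falls in.

```python
def core_numbers(adj):
    """Map each node to the largest k for which it belongs to a k-core.

    `adj` is an undirected adjacency map {node: set of neighbors}.
    Isolated nodes get the value 0.
    """
    degree = {u: len(adj[u]) for u in adj}
    remaining = set(adj)
    core = {u: 0 for u in adj}
    k = 1
    while remaining:
        # Prune nodes of degree < k from what is left of the (k-1)-cores.
        stack = [u for u in remaining if degree[u] < k]
        while stack:
            u = stack.pop()
            if u not in remaining:
                continue
            remaining.remove(u)
            for v in adj[u]:
                if v in remaining:
                    degree[v] -= 1
                    if degree[v] < k:
                        stack.append(v)
        # Whatever survives belongs to a k-core.
        for u in remaining:
            core[u] = k
        k += 1
    return core

# Hypothetical network: a four-node clique {0,1,2,3}, a triangle {3,4,5}
# attached to it at node 3, and a pendant node 6 hanging off node 5.
layered_adj = {
    0: {1, 2, 3},
    1: {0, 2, 3},
    2: {0, 1, 3},
    3: {0, 1, 2, 4, 5},
    4: {3, 5},
    5: {3, 4, 6},
    6: {5},
}
layers = core_numbers(layered_adj)
```

In this example the clique forms the innermost layer (k = 3), the rest of the triangle sits in the 2-core only, and the pendant node belongs only to the 1-core, which matches the nesting described in the text.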

7.2.3 Components and k-components

In Section 6.12 we introduced the concept of a component. A component in an undirected network is a (maximal) set of nodes such that each is reachable by some path from each of the others. A useful generalization of this concept is the k-component. A k-component (sometimes also called a k-connected component) is a set of nodes such that each is reachable from each of the others by at least k node-independent paths (see Fig. 7.5). (Recall that two paths are said to be node-independent if they share none of the same nodes except the starting and ending nodes; see Section 6.13.) For the common special cases k = 2 and k = 3, k-components are also called bicomponents and tricomponents respectively.

Figure 7.6: A small network with one 2-core but two 2-components. The whole of this network constitutes a single 2-core, since each of its nodes is connected to at least two of the others. But the network contains two separate 2-components, as indicated by the two shaded circles, proving that 2-cores and 2-components are not the same thing.

A 1-component by this definition is just an ordinary component, in which there is at least one path between every pair of nodes, and, like the k-cores of the previous section, k-components are nested within each other. A 2-component or bicomponent, for example, is necessarily a subset of a 1-component, since any pair of nodes that are connected by at least two paths are also connected by at least one path. Similarly a tricomponent is necessarily a subset of a bicomponent, and so forth. (See Fig. 7.5 again.)

At first sight, k-components seem rather similar to k-cores, but there are important differences. Consider Fig. 7.6, which shows a small network that is composed of a single 2-core (every node in the network is connected to at least two of the others), yet there are two separate 2-components in the network. The left and right halves of the network are connected by only one independent path in the middle, so they are separate 2-components.

As discussed in Section 6.13, the number of node-independent paths between two nodes is equal to the size of the minimum node cut set between the same two nodes, i.e., the number of nodes that would have to be removed in order to disconnect the two. So another way of defining a k-component would be to say that it is a subset of a network in which no pair of nodes can be disconnected from each other by removing fewer than k other nodes.
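One way to count node-independent paths between two nodes, sketched here as an illustration rather than as the book's method, is to run a unit-capacity maximum-flow computation on a "split-node" version of the network: each node v becomes an arc from (v, "in") to (v, "out") with capacity 1, so that at most one path can pass through it. The function name, the split-node construction, and the example network (two end nodes "a" and "b" joined only through three middle nodes, in the spirit of the description of Fig. 7.7) are assumptions made for this sketch.

```python
from collections import deque

def node_independent_paths(adj, s, t):
    """Count node-independent paths between distinct nodes s and t.

    `adj` is an undirected adjacency map {node: set of neighbors} without
    self-loops. Uses breadth-first augmenting paths on a split-node graph.
    """
    cap = {}

    def add_arc(a, b):
        # Forward arc of capacity 1 plus a zero-capacity residual arc.
        cap.setdefault(a, {})
        cap.setdefault(b, {})
        cap[a][b] = cap[a].get(b, 0) + 1
        cap[b].setdefault(a, 0)

    for v in adj:
        add_arc((v, "in"), (v, "out"))       # at most one path through v
        for w in adj[v]:
            add_arc((v, "out"), (w, "in"))   # one direction of edge v-w

    source, sink = (s, "out"), (t, "in")
    flow = 0
    while True:
        # Breadth-first search for an augmenting path in the residual graph.
        parent = {source: None}
        queue = deque([source])
        while queue and sink not in parent:
            a = queue.popleft()
            for b, c in cap[a].items():
                if c > 0 and b not in parent:
                    parent[b] = a
                    queue.append(b)
        if sink not in parent:
            return flow
        # Push one unit of flow back along the path found.
        b = sink
        while parent[b] is not None:
            a = parent[b]
            cap[a][b] -= 1
            cap[b][a] += 1
            b = a
        flow += 1

# Hypothetical five-node network: "a" and "b" are each joined to 1, 2, and 3.
k23_adj = {
    "a": {1, 2, 3},
    "b": {1, 2, 3},
    1: {"a", "b"},
    2: {"a", "b"},
    3: {"a", "b"},
}
paths_a_b = node_independent_paths(k23_adj, "a", "b")
paths_1_2 = node_independent_paths(k23_adj, 1, 2)
```

In this example "a" and "b" are joined by three node-independent paths, one through each middle node, while any two middle nodes are joined by only two, so only "a" and "b" satisfy the condition for a tricomponent. By the equality stated above, the same numbers give the sizes of the minimum node cut sets for pairs that are not directly connected.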

A variant of the k-component can also be defined using edge-independent paths, so that nodes are in the same k-component if they are connected by k or more edge-independent paths, or equivalently if they cannot be disconnected by the removal of fewer than k edges. In principle this variant could be useful in certain circumstances but in practice it is rarely used.

The idea of a k-component is a natural one in network analysis, being connected with the idea of network robustness. For instance, in a data network such as the Internet, the number of node-independent paths between two nodes is also the number of independent routes that data might take between the same two nodes, and the size of the cut set between them is the number of nodes in the network (typically routers) that would have to fail or otherwise be knocked out to sever the data connection between the two endpoints. Thus a pair of nodes connected by two independent paths cannot be disconnected from one another by the failure of any single router. A pair of nodes connected by three paths cannot be disconnected by the failure of any two routers. And so forth. A k-component with k ≥ 2 in a network like the Internet is a subset of the network that has robust connectivity in this sense. One would hope, for instance, that most of the network backbone, the system of high-volume world-spanning links that carry long-distance data (see Section 2.1), is a k-component with high k, so that it would be difficult for points on the backbone to lose connection with one another.

Figure 7.7: A non-contiguous tricomponent. The two highlighted nodes in this network form a tricomponent, even though they are not directly connected to each other. The other three nodes are not in the tricomponent.

One disadvantage of k-components as a definition of node groups is that for k ≥ 3 they can be non-contiguous (see Fig. 7.7). Ordinary components (1-components) and 2-components are always contiguous, but 3-components and above may not be. Within the social networks literature, where non-contiguous components are often considered undesirable, k-components are sometimes defined slightly differently, to be a set of nodes such that every pair in the set is connected by at least k node-independent paths that themselves are contained entirely within the subset. This definition rules out non-contiguous k-components, but it is also mathematically and computationally more difficult to work with than the standard definition. For this reason, and because there are also plenty of cases in which it is appropriate to count non-contiguous k-components, the standard definition remains the one most widely used.

There are a number of other definitions of node groups that find occasional use, particularly in the social networks literature, such as k-plexes and k-clubs. See the book by Wasserman and Faust [462] for a detailed discussion. There are also various definitions that avoid the use of a parameter k. For instance, Flake et al. [181] proposed a definition of a group as a set of nodes that each has at least as many connections inside the set as outside. Radicchi et al. [395] proposed a weaker definition where a group is a set of nodes such that the total number of connections between nodes inside the set is greater than the total number to nodes outside it. The use of these measures is, however, relatively rare and we will not consider them further here.
