How to Choose the Most Appropriate Centrality Measure? A Decision Tree Approach

(1)

0 IEEE TRANSACTIONS ON XXX, VOL. XX, NO. XX, XXXX 2024

This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible.

(2)

How to Choose the Most Appropriate Centrality Measure? A Decision Tree Approach

Pavel Chebotarev and Dmitry Gubanov

Abstract—Centrality metrics play a crucial role in network analysis, while the choice of specific measures significantly influences the accuracy of conclusions as each measure represents a unique concept of node importance. Among over 400 proposed indices, selecting the most suitable ones for specific applications remains a challenge. Existing approaches—model-based, data- driven, and axiomatic—have limitations, requiring association with models, training datasets, or restrictive axioms for each specific application. To address this, we introduce the culling method, which relies on the expert concept of centrality behavior on simple graphs. The culling method involves forming a set of candidate measures, generating a list of as small graphs as possible needed to distinguish the measures from each other, constructing a decision-tree survey, and identifying the measure consistent with the expert’s concept. We apply this approach to a diverse set of 40 centralities, including novel kernel-based indices, and combine it with the axiomatic approach. Remarkably, only 13 small 1-trees are sufficient to separate all 40 measures, even for pairs of closely related ones. By adopting simple ordinal axioms like Self-consistency or Bridge axiom, the set of measures can be drastically reduced making the culling survey short.

Applying the culling method provides insightful findings on some centrality indices, such as PageRank, Bridging, and dissimilarity- based Eigencentrality measures, among others. The proposed approach offers a cost-effective solution in terms of labor and time, complementing existing methods for measure selection, and providing deeper insights into the underlying mechanisms of centrality measures.

Index Terms—Network, Centrality measure, Decision tree, Axiomatic approach

I. INTRODUCTION

U

NDERSTANDING networks relies heavily on centrality metrics. However, the vast number of proposed point centrality measures, which now exceeds 400 and continues to grow [1] (cf. [2]–[4]), poses a challenge when it comes to selecting the most appropriate centrality indices for specific applications.

In some cases, researchers have a mathematical model that accurately captures the influence of nodes in the process of interest, naturally suggesting a measure of node importance that can be interpreted as centrality [5]–[11].

The work of P. Chebotarev was supported by the European Union (ERC, GENERALIZATION, 101039692). Views and opinions expressed are however those of the authors only and do not necessarily reflect those of the European Union or the European Research Council Executive Agency. Neither the European Union nor the granting authority can be held responsible for them.

P. Chebotarev is with the Technion–Israel Institute of Technology, Haifa, 3200003 Israel (e-mail: [email protected]).

D. Gubanov is with the Institute of Control Sciences, RAS, 65 Profsoyuz- naya str., Moscow, 117997 Russia (e-mail: [email protected]).

However, in many cases, no detailed model is available, while the centrality of network nodes needs to be estimated.

To address this, researchers rely on studying the typology of the application and theoretical taxonomy of centrality indices [2]–[4], [6], [12]–[14] to narrow down the options. Further, specific measures can be compared experimentally by studying correlations, regressions, and appropriate distances between them and external network characteristics [2], [14]–[21] and (or) using Principal Components Analysis (PCA) [12], [22].

On the other hand, such studies yield varying results across datasets within the same subject area [23] and do not uncover the underlying causes of correlations.

Hence, to supplement the experimental perspective, it is essential to focus on the properties of centralities and the conditions they satisfy. The quintessence of this is the axiomatic approach, which seeks to characterize a measure by a minimal set of axioms it uniquely satisfies (we refer to [24]–[30] for several examples). In [14], typology is incorporated into this approach by the use of nodal statistics.

However, the axiomatic approach has its limitations. Some of them are as follows.

1. Only a minority of measures are axiomatized, since constructing a suitable set of axioms is a non-trivial task.

2. For parametric families of measures (such as those in [31]–[33]), selecting the parameter value rarely has an axiomatic solution.

3. In many axiomatics, there is at least one technical axiom, which is not appealing in itself and rather specifies the func- tional form of a centrality (see, e.g., Closeness, Degree, and Decay axioms in [27], Linearity in [28], Neighbor separability in [34]). It can be argued that adopting such an axiom does not fundamentally differ from adopting a measure at once.

4. Some non-technical axioms have rather sophisticated for- mulations, which makes it difficult to assess their desirability.

5. Comparing multiple axiomatics can be laborious, with no guarantee of reaching a definitive choice among them.

6. The rankings of nodes in simple networks provided by a measure satisfying a set of axioms may be counterintuitive to users, despite the initial appeal of the axioms.

For example, the Weighted degree centrality[35]

f(u) = X

v∈V, v̸=u

D(v) d(u, v),

whereV is the node set of a graphG, D(v)is the degree of nodev,d(u, v)being the shortest path distance betweenuand v,implements the reasonable idea of measuring the centrality

(3)

ofuby the sum of the degrees of all other nodes with weights decreasing with distance from u. This measure satisfies four out of six axioms considered in [35] (while Betweenness or Eigenvector centrality satisfy three axioms and no measure under study satisfies five or six) and its characterization is likely to be obtained by adding a suitable axiom of an algebraic nature. However, for any star(a graph with n >2nodes and n−1edges incident to the same node called the center), this index considers the center to be less central than the leaves.

It follows that (i) the Weighted degree measures something other than centrality; (ii) the performance of a measure on simple examples is no less informative than the underlying heuristics or axioms it satisfies.

The main observation that prompted this study was that professionals familiar with a particular application often rely on their experience (frequently based on objective results) to compare node centrality in simple networks. In this case, we can select a centrality measure that ranks the nodes in the same way. Furthermore, such a centrality can be sought in the set of measures that satisfy the axioms that the expert considers important.

As another instructive example, consider the PageRank centrality [36]–[38], which was originally proposed for directed reference networks (see [5], [39]) and has axiomatizations [30], [40]. Note that it sets f(4) > f(5) (node 4 is more central than node 5) for the graph G5 in Fig. 1. As well as some other inequalities it produces (namely, f(4)> f(0) for G2; f(1)> f(0) for G7; f(0)> f(2) andf(1)> f(5) forG9), this seems rather counterintuitive. On the other hand, an analysis of the mechanism of this centrality reveals [41]

some types of real-world undirected networks that may require such a measure. This kind of singularity of measures may go unnoticed in an axiomatic study, but it is easy to detect when centralities are tested on a set of specially selected graphs.

Taking into account the above limitations of the axiomatic strategy, in this paper we propose an alternative approach based on expert beliefs on how the desired centrality should act on test graphs.

Purpose of the Study

We consider centrality measures on undirected graphs. Given a graph G = (V, E) with node set V = V(G) such that

|V(G)| > 3 and edge set E = E(G), a centrality measure f associates a real number f(v) to each node v ∈ V(G).

This number is interpreted as follows: the larger f(v), the more central v is considered. While f(v) depends on G, for simplicity we do not reflect this in the notation as only one graph is considered at a time. We confine ourselves to connected graphs, since there are centralities defined only for them. Let F ={f1, . . . , fm} be a set of centrality measures, and each fi is applicable to any graph of the above type.

In this paper, we are mainly interested in the order of centrality values, not the numerical values themselves. Let us say that a graph Gdistinguishes (or separates) measures fi

and fj on a pair of nodes{u, v} (u, v∈V(G))whenever sign(f_i(u)−f_i(v))̸= sign(fj(u)−f_j(v)). (1)

In words (1) means that f_i and f_j disagree in comparing the centrality ofuandv.

Centrality measures f_i and f_j are rank equivalent if and only if there exists no graph that distinguishes them. We suppose that no distinct measures in F are rank equivalent.

Eliminating rank equivalent “twins” of some measures can be done by considering a large enough series of large enough graphs: if no graph distinguishes a couple of measures, we remove one of them, even if we have not been able to rigorously prove that they are rank equivalent.

The aim of this work is to propose a method that allows the expert to select a centrality measure that matches their application-specific understanding of how such a measure should act on a collection of test graphs.

Given a set of measuresF, we need to construct:

• a collection of fairly simple graphs that contains a distinguishing graph for every pair of distinct measures fi, fj ∈ F;

• a questionnaire such that every set of responses to its questions determines a unique measure.

Ideally, these collection and questionnaire should be generated and modified automatically whenever a new set F is provided or an existing setF is updated.

The method that consists in constructing such collections and questionnaires and passing the corresponding surveys to select the most appropriate centrality measures will be called culling. We develop this method in the following section. The computer code that implements it is provided athttps://github.

com/DAGubanov/centralities_dec_tree. II. RESULTS

We now present the main procedures of the culling method and then illustrate its application.

A. Generating Graphs that Distinguish Measures

Let a set F of rank-nonequivalent centrality measures be given. We aim to construct a listG= (G1, . . . , GK)of graphs that separate all measures in F.

The distinguishing graphs should be simple enough for experts to answer which pairwise centrality order, i.e.,f(u)>

f(v) or f(u) < f(v) or f(u) = f(v) is most consistent with their application-specific understanding based on their experience and previous studies.

For our first survey, we opted for the class of unicyclic graphs (also called unicycles, 1-trees, or augmented trees).

These are connected graphs with the number of edges equal to the number of nodes. Such a graph can be obtained from a tree by connecting any two non-adjacent nodes with an edge.

The advantage of1-trees is that they are simple and can be even more comfortable to perceive and compare the centrality of nodes than trees, thanks to an eye-catching cycle (Fig. 1).

The procedure of constructing the list G = (G₁, . . . , G_K) of separating1-trees is as follows.

1. At the beginning, the listG is empty.

2. We sequentially generate unlabeled trees, starting from the trees with four nodes and increasing the number of nodes.

(4)

Fig. 1. Distinguishing graphs for the test setF⁴⁰of 40 centrality measures.

2.1. For each tree, we produce various 1-trees by adding different edges that connect non-adjacent nodes.

2.1.1. For each unlabeled 1-tree G, we look for the couplesf_i, f_j∈ F such that the current listG (let

|G| =k) still contains no graph that separates f_i andfj,whileGseparates them on some pair{u, v}

of its nodes.

2.1.1.1. If{fi, fj} is such a couple, then we addGto the list G as Gk+1 and associate a separating triple(Gk+1, u, v)with{fi, fj}.

3. We stop the generation process when a separating triple (G_k+1, u, v) is associated with every pair of distinct centrality measures in F.

This simple procedure was sufficient to separate all measures in the set F⁴⁰ of 40 centralities (see Section II-D) by means of13 1-trees with at most8nodes. Among them, there are five graphs with 7 nodes and only one with 8 nodes.

Therefore, the “separating power” of 1-trees on 8 nodes is far from exhausted and can potentially separate a much larger number of measures. Moreover, for our test survey we limited ourselves to 1-trees with node degrees at most four, which is not obligatory in the above procedure.

However, there is no guarantee that all couples of different measures in F can be separated by 1-trees. To guarantee a complete separation, we propose the following general procedure that iterates over the entire set of connected graphs.

It includes the above procedure as the first stage to make the most of1-trees.

1. At the beginning, the listG is empty.

2. Increase n starting with 3, and while there are unseparated measures in F and n ≤ 8, sequentially use all connected graphs with nnodes andn edges (1-trees) to separate measures inF as in the previous procedure.

3. If there are still unseparated measures in F, proceed as

in Item 2 with trees (i.e., with connected graphs having n−1 edges).

4. If there are still unseparated measures in F, proceed as in Item 2 with connected graphs havingn+ 1edges.

5. Continue in the following way until all measures in F are separated:

5.1. For each increasingn >8:

5.1.1. Sequentially use, to separate the not yet separated measures inF,all connected graphs withnnodes and: (a)nedges (1-trees); (b)n−1 edges (trees);

(c)n+ 1 to2n−8 edges;

5.1.2. Generate and use: (d) all connected graphs with n^′ ≤ n nodes and n^′ +n −7 edges, starting with the smallestn^′ allowing that many edges and increasing n^′.

In this general procedure, with each incremented n > 8, we increase the maximum difference between the number of edges and nodes to (n^′+n−7)−n^′ = n−7 (see (d) in 5.1.2 above) and check all graphs with this difference and the number of nodes at mostn.This maximum difference grows slowly to keep the test graphs simple enough by keeping their density as low as possible.

The next task is constructing a questionnaire that allows the expert to choose the centrality measure that suits them best.

B. Constructing a Decision Tree

We now present a procedure for constructing a questionnaire, which takes the form of a decision tree.

In every next question, the expert is asked to compare the centrality of two nodesuandv in some graphGk ∈ G.This graph separates certain measures that match all the expert’s responses already received. The new response isf(u)> f(v), or f(u)< f(v), orf(u) =f(v)and it narrows down the set of suitable measures keeping only those f_i ∈ F for which

(5)

this response is correct. The survey continues until only one measure remains.

Thus the questions in the survey depend on the responses to the previous ones. Therefore, the questionnaire has the structure of a rooted directed tree. The expert navigates the tree answering the questions associated with the root and intermediate vertices. Each answer is identified with an arc directed from the vertex corresponding to the question. The leaves, i.e., the vertices of the tree that have no outgoing arcs, are identified with the measures.

We now present a procedure for constructing such a tree.

Let us have a set F (|F | > 1) of centrality measures and a list of distinguishing graphs G = (G1, . . . , GK). For each couple of distinct measures f_i, f_j ∈ F, a triple (G_k, u, v) is associated with this couple such thatG_k ∈ Gseparatesf_iand f_j on the pair{u, v} of its nodes.

At the initial step of the procedure, we are at the root of the tree. There are no other vertices in the tree yet, and no question is associated with the root.

On eachstep of the procedure, we are at some vertexxof the tree. A standard stepof the procedure is any step, except for “finish.”

The standard step consists of the following.

• Consider the path from the root to x. Each arc of this path corresponds to some condition: it is a response to the question associated with the vertex this arc is directed from. Let Fx ⊆ F be the set of measures satisfying all conditions in this path (if xis the root, thenFx=F).

• If no question has yet been associated with x and

|Fx|>1, then takeGk ∈ Gwith the smallestksuch that Gk separates some measures in Fx on a certain pair of its nodes{u, v}.Choose a suitable pair{u, v}arbitrarily and associate the question “What inequality (equality) holds true for f(u)andf(v)inGk?” withx.Draw two or three arcs directed from xto newly created vertices, depending on which of the three conditionsf(u)> f(v), f(u)< f(v), or f(u) =f(v) are met by the measures inFx. Assign the conditions that can be satisfied by the measures in Fxto these arcs. Mark these arcs as “new.”

• If there is at least one “new” arc directed fromx,choose any such arc, move to the vertex this arc is directed to, and mark this arc as “old.”

• If |Fx| = 1, then associate with xthe unique centrality measure that belongs to Fx.

• If (a) there is no “new” arc directed fromx,while there is a question associated withx,andxis not the tree root or (b)|F_x|= 1,then make one move back fromxtoward the root.

• If there is no “new” arc directed from x, while there is a question associated with x and x is the root, then finish: the decision tree is built. Otherwise, make another standard step from x.

A high-level pseudocode of the standard step is given in Algorithm 1.

C. Extending a Decision Tree

Suppose now that several new centrality measures (for example, recently appeared in the literature) are added to an existing

Algorithm 1 Recursive Construction of a Decision Tree.

Input:F, a set of centrality measures;G= (G1, . . . , GK), a list of distinguishing graphs; a set of distinguishing triples of the form(Gk, u, v).

Result:Decision tree for choosing a centrality measure procedure STANDARDSTEP(x)

if |F_x|>1 then

(G_k, u, v)←distinguishMeasures(F_x) x.associateQuestion(“f(u)vs f(v)inG_k”) x.addNewArcsAndChilds(Gk, u, v)

whilex.hasNewArcs()do arc←x.getAnyNewArc() markArcAsOld(arc) standardStep(arc.target) else

x.associateUniqueMeasure(Fx)

setF. Should we rebuild the decision tree from the beginning?

The answer is no.

First consider the simplest case where only one measuref^′ is added. There are two possibilities:

(a) the new measure f^′ satisfies the same set of conditions (answers to the questions of the current decision tree) as one of the “old” measures;

(b) in the current decision tree applied to the extended set F^′=F ∪ {f^′},there is a vertexxsuch thatf^′∈ F_x^′,but only two arcs are directed fromx,while the third possible answer to the question associated withxis correct forf^′. In case (b), it remains to add a third arc directed from x, i.e., attach the third possible answer to the question associated withx,after which an updated tree is built.

In case (a), we need to distinguishf^′from an “old” measure f_i that satisfies the same set of conditions. Since f_i and f^′ end up on the same leaf of the current decision tree, we need to make this leaf an intermediate vertex by associating a new question with it. To formulate this question, we first check ifG contains a graph that separates fi andf^′ on some pair of its nodes {u, v}. If such a graph G exists, then the required question is: “Which inequality (or equality) is true for f(u) andf(v)inG?” Otherwise, we need a new graph that separatesfi andf^′,so we call the graph generation procedure described in Section II-A for the set of measures{fi, f^′}.

Thus, adding a new measure f^′ leads to the addition of /either a question or a possible answer to an existing question.

So the total number of questions in the questionnaire does not exceed m−1, wherem is the number of measures. If three arcs are directed fromkvertices of the tree, then the number of questions is obviouslym−k−1.

Suppose now thatseveralmeasures have been added toF.

Then we need the following:

1. For each new measuref^′,check the above condition (b);

if it is satisfied, then add the lacking arc to the decision tree;

2. Attach the new measures for which condition (b) is violated to the existing leaves of the tree, as for the above case (a);

(6)

TABLE I

THE TEST SETF⁴⁰OF CENTRALITY MEASURES 1. Betweenness [42] 21. Bonacich [43]

2. Closeness [44] 22. Total communicability [45]

3. Connectivity [46] 23. Communicability(Kij)[47]

4. Connectedness power [34] 24. Walk(Kij)[48]

5. Degree [44] 25. Walk(Kii)[48]

6. Coreness [49] 26. Estrada [47]

7. Bridging [50] 27. Eigencentrality (Dice dissimilarity [51]) 8. PageRank [36] 28. Eigencentrality (Jaccard dissimilar. [51]) 9. Harmonic closeness [52] 29. Closeness (Forest [53])

10. Eccentricity [44] 30. Closeness (Heat [54])

11.p-Means(p=−2)[33] 31. Closeness (logarithmic Forest [55]) 12.p-Means(p= 0)[33] 32. Closeness (logarithmic Walk [56]) 13.p-Means(p= 2)[33] 33. Closeness (logarithmic Heat [54]) 14. Beta current flow [57] 34. Closeness (log-Communicability [58]) 15. Weighted degree [35] 35. Eccentricity (Forest [53])

16. Decaying degree [35] 36. Eccentricity (Heat [54])

17. Decay [59] 37. Eccentricity (logarithmic Forest [55]) 18. Generalized degree [32] 38. Eccentricity (logarithmic Walk [56]) 19. Katz [48] 39. Eccentricity (logarithmic Heat [54]) 20. Eigenvector [60] 40. Eccentricity (log-Communicability [58])

3. Check whetherG contains graphs that separate the measures associated with the same leaf and add the separating questions whenever such graphs exist;

4. For the couples of measures that cannot be distinguished by any graphs in G,generate new graphs as described in Section II-A and add the required questions.

The above procedures of the culling method allow one to build a decision tree for selecting the most appropriate centralities for any set of measures F.

D. Test Survey

In Materials and Methods, we present several parametric classes of centralities based on similarity/dissimilarity measures for network nodes. Representatives of these classes are included in the set F⁴⁰ designed to test the culling method.

This set consists of the centralities listed in Table I.

The setF⁴⁰contains several pairs of closely related centralities. We include them to check whether they can be separated by sufficiently small low-density graphs, or whether large or dense graphs are necessary for this purpose.

The list G of distinguishing graphs generated for F⁴⁰ consists of 13 1-trees shown in Fig. 1. The culling decision tree involves 33 questions in total. The length of a particular survey ranges from 1 to 12, with an average of 6.6.

The survey begins with the question about the centrality of nodes 1 and 2 in G₁ (Fig. 2). If the expert chooses f(2)>

f(1),then the second question concerns the centrality of nodes 0 and 1 in the same graph. If the answer is f(0) > f(1), then the third question is about the centrality of nodes 0 and 4 in G2. Otherwise, the expert who chooses f(0) =f(1) is asked to comparef(0)andf(2)inG2,whereas in the case of answer f(0)< f(1),the third question is about the centrality of nodes 0 and 3 inG3. If the expert choosesf(2) =f(1)for the first question, then the survey ends yielding the Weighted degree [35] centrality.

The beginning of the survey is represented by the tree of Fig. 2. The whole tree is shown in Fig. 3 and Fig. 4.

1 vs 2 in 𝐺1

0 vs 1 in 𝐺1

0 vs 4 in 𝐺2 0 vs 2 in 𝐺2 0 vs 3 in 𝐺3 Weighted

degree

Fig. 2. The beginning of the decision tree for the test setF⁴⁰.

The corresponding survey is implemented using Google Forms [61]. It allows everyone to choose from F⁴⁰ the measure that best suits their concept of centrality. Each survey form (one of which is shown in Fig. 5) contains the instruction:

“Choose the answer that best matches your feelings or your professional judgment... It would be optimal if you consider your specific application when choosing an answer.”

Of particular interest will be the statistics of measure selection by a significant number of experts. The leaders of the early limited statistics are Degree, Closeness (Heat), and Closeness (logarithmic Forest).

Note that a decision tree for the entire set of known centralities can be built by extending the tree (or, in other terminology, the dendrogram) presented in Figures 3 and 4.

E. Combination with the Axiomatic Approach

It can be seen that some questions in the above test survey are quite simple. Say, comparison of the centrality of nodes 1 and 2 inG₁ (Fig. 2) should be clearly in favor of 2. Therefore, the first question along with the Weighted degree index assigning equal centrality to nodes 1 and 2 can be eliminated, which reduces the number of questions in each survey by one.

The second question on the comparison of f(0) and f(1) in G1 is less obvious, but also fairly simple. Indeed, the answer f(0) < f(1) is unlikely to be popular, and most experts would probably choose f(0) > f(1). Based on this, the Eigencentrality measures based on Dice dissimilarity and Jaccard dissimilarity can be eliminated as they suggest f(0)< f(1).The answerf(0) =f(1) may have some basis and F⁴⁰contains nine centralities that match it. These are the well-known Bridging and Betweenness measures and also the Eccentricity measures based on the Shortest path, Forest, Heat, logarithmic Forest, logarithmic Heat, logarithmic Walk, and logarithmic Communicability proximities. The decision tree of Fig. 3 makes the expert realize that choosing either of them implies adopting that nodes 0 and 1 inG1are equally central.

Furthermore, the subsequent question (comparing nodes 0 and 2 inG2) reveals that Bridging states thatf(0)> f(2),which is not very easy to accept;f(0) =f(2)leads to Eccentricity; the remaining seven measures on this branch of the tree suggest f(0)< f(2).

Presumably the most popular answer, f(0)> f(1), to the second question is followed by the comparison of f(0) and

(7)

1 vs 2 in G1

Weighted degree f(1) = f(2)

0 vs 1 in G1

f(2) > f(1)

0 vs 3 in G₃

f(1) > f(0) 0 vs 4 in G₂

f(0) > f(1)

0 vs 2 in G₂ f(1) = f(0)

Eigencentrality (Dice dissimilarity)

f(0) > f(3) Eigencentrality (Jaccard dissimilarity)

f(0) = f(3) 0 vs 5 in G6

f(0) > f(4) PageRank (0.85)

f(4) > f(0) 0 vs 2 in G1 f(4) = f(0)

Bridging f(0) > f(2)

1 vs 4 in G2

f(2) > f(0) Eccentricity

f(0) = f(2)

Betweenness f(4) > f(1)

4 vs 5 in G5 Callout B f(4) = f(1)

3 vs 6 in G8 Callout A

f(0) = f(5)

2 vs 5 in G6

f(0) > f(5)

0 vs 5 in G5 f(5) > f(0)

Degree f(2) > f(0)

Coreness f(0) = f(2)

1 vs 4 in G₅ Callout C

f(5) > f(2)

1 vs 5 in G5

f(2) > f(5)

3 vs 6 in G8

f(0) > f(5)

Connectivity f(0) = f(5)

1 vs 6 in G11

f(5) > f(1) 0 vs 4 in G12

f(1) > f(5)

Closeness (Heat)

f(1) > f(6)

Closeness (Forest)

f(6) > f(1)

3 vs 5 in G9 Callout D

f(0) > f(4)

Generalized degree f(4) > f(0) Connectedness power

f(3) = f(6) Beta current

flow f(3) > f(6)

Fig. 3. The decision tree for the test setF⁴⁰; the callouts A to D are shown in Fig. 4.

f(4) inG2. The rather specific answer f(0)< f(4)leads to the PageRank measure (its peculiarities, which occur for all α ∈ (0; 1), were discussed in Section I and in more detail in [41]),f(0) =f(4)to Degree and Coreness, while f(0)>

f(4)to the remaining 25 measures on this branch of the tree.

The presence of questions, the answer to which is prompted by common sense or an application-specific understanding, suggests the formulation of the corresponding properties of centrality measures in the form of normative conditions (axioms). If the expert deems some of them necessary, this can drastically reduce the set of measures to be compared (cf. [62]). Based on this, we can offer the expert an abbreviated survey on the corresponding reduced set F^′⊂ F.

This leads to a combination of culling with the axiomatic approach. In the next section, we discuss two conditions that can be used in this combination.

F. Abbreviated Surveys

Among the axioms used in the literature to characterize centrality measures, the ones that are particularly appealing are those based on ordinal conditions. These axioms allow us to compare the centrality of certain nodes, but they do not provide explicit guidelines on how to calculate centrality in the general case.

Let us consider two such axioms and show how they can help significantly reduce the culling survey.

First, one may believe that the vector of centrality values of the neighbors(adjacent nodes) of any nodeucarries a lot of information about the centrality of uitself (cf. Consistency in [28]). A refinement of this requirement is that the greater

the centrality values of the neighbors of a node, the greater the centrality of the node itself.

The following axiom is based on this idea. In the case of directed graphs, it appeared in [63]; for undirected graphs, in [35], [64] under the name of Structural consistency.

LetNu denote the set of neighbors of nodeuinG.

Self-consistent monotonicity. If for u, v ∈ V, there is an injection from N_u to N_v such that every element of N_u is, according tof(·), no more central than the corresponding element ofN_v, thenf(u)≤f(v).If,additionally,|N_u|<|N_v| or “no more” is actually “less” at least once,thenf(u)< f(v).

For the subsequent analysis, we choose the following weaker and simpler axiom (cf. [65]).

Self-consistency. If for u, v ∈V,there exists a bijection be- tweenNu andNv such that each element ofNuis, according to f(·), no more central than the corresponding element of Nv, thenf(u)≤f(v).If “no more” is actually “less” at least once, thenf(u)< f(v).

The idea of another axiom [29] is quite different. It allows to compare the centrality of two endpoints of a bridge. Recall that we consider centralities defined for all connected graphs Gwith|V(G)|>3.

Bridge axiom. If the removal of edge {u, v} from E(G) separates G into two connected components with node sets denoted byV_u∋uandV_v∋v (i.e., {u, v} is abridge), then

|V_u|<|V_v| ⇔f(u)< f(v).

A strengthening of this axiom is the Ratio property [46], which holds when for a positivef(·),under the same premise, f(u)/f(v) =|Vu|/|Vv|.

(8)

p-Means (p = -2) Decay (δ = 0.9) Closeness p-Means (p = 2) 3 vs 6

in G8

2 vs 3 in G13

f(3) > f(6)

p-Means (p = 0) f(3) = f(6)

0 vs 5 in G5 f(6) > f(3)

f(3) > f(2)

Harmonic closeness

f(2) = f(3) f(0) > f(5) f(0) = f(5) f(5) > f(0)

4 vs 5 in G5

Eccentricity (Forest)

f(4) = f(5)

2 vs 3 in G3

f(5) > f(4)

3 vs 5 in G6

f(2) > f(3)

0 vs 1 in G7 f(3) > f(2)

Eccentricity (Heat)

f(3) = f(5)

Eccentricity (logWalk)

f(3) > f(5)

Eccentricity (logCommunicability)

f(0) > f(1)

0 vs 6 in G8 f(0) = f(1)

Eccentricity (logHeat)

f(0) > f(6) Eccentricity (logForest)

f(6) > f(0)

Callout A Callout B

1 vs 4 in G5

0 vs 3 in G4

f(1) > f(4)

2 vs 5 in G7 f(4) > f(1)

Decaying degree

f(0) > f(3)

0 vs 4 in G9

f(3) > f(0)

Closeness (logForest)

f(5) > f(2)

Closeness (logHeat) f(2) > f(5)

Closeness (logWalk)

f(0) > f(4)

Closeness (logCommunicability)

f(4) > f(0) Walk(Kij) Bonacich

3 vs 5 in G9

Eigenvector f(5) > f(3)

3 vs 5 in G8

f(3) > f(5)

0 vs 1 in G10

f(5) > f(3) 0 vs 1 in G10 f(3) > f(5)

Communicability (Kij)

f(1) > f(0) Total communicability

f(0) > f(1) 2 vs 6 in G11

f(0) > f(1) 2 vs 6 in G11 f(1) > f(0)

Katz f(6) > f(2)

1 vs 4 in G13

f(2) > f(6) f(6) > f(2) f(2) > f(6)

Estrada f(4) > f(1)

Walk(Kii) f(1) > f(4)

Callout C Callout D

Fig. 4. Callouts A to D in Fig. 3.

The Self-consistency and Bridge axioms are studied in more detail in [41], where a necessary and sufficient condition for Self-consistency and a sufficient condition for the Bridge axiom are obtained. Moreover, it is shown that in the presence of Monotonicity and Equivalence, these axioms are incompatible.

Table II presents the results of verification of the Self-

Fig. 5. A sample survey form in [61].

consistency and Bridge axioms for the centralities in F⁴⁰. It turns out that only four measures satisfy Self-consistency.

Four other measures satisfy the Bridge axiom, including two measures that do so after a slight modification. When a measure violates an axiom, a record of the form ‘−_(u,v)G_k’ indicates that the axiom is violated, in particular, for the ordered pair of nodes (u, v) in graphGk. GraphsG1 toG13

are shown in Fig. 1; two additional graphs found in Table II, G14andG15,are shown in Fig. 6. GraphG^′₅is obtained from G5 by replacingE(G5)withE(G5)∖{{1,2}}.

The following propositions describe the positive results in Table II.

Proposition 1 ([41]). The Generalized degree, Eigenvector, Katz, and Bonacich centralities satisfy Self-consistency.

The second result includes Closeness^∗(logForest) and Closeness^∗(logWalk) introduced in Materials and Methods.

They are based on cutpoint additive distances as distinct from Closeness(logForest) and Closeness(logWalk) included inF⁴⁰. This property enforces the Bridge axiom.

Proposition 2 ([41]). The Closeness, Closeness^∗(logForest), Closeness^∗(logWalk),and Connectivity centralities satisfy the Bridge axiom.

(9)

(a)

(b)

Fig. 6. Graphs used to test the Self-consistency and Bridge axioms.

(a) CaterpillarG14. (b) The Renaissance Florentine families marriage network G15[66].

It follows from Lemma 1 in [41] that other strictly positive transitional measures [67] and cutpoint additive distances also produce centralities that satisfy the Bridge axiom.

Suppose the expert believes that all appropriate centralities must satisfy Self-consistency (or Bridge axiom). This leads to a drastic reduction of the set F⁴⁰ and the culling survey.

The decision trees for the subsets of centralities satisfying the Self-Consistency or Bridge axiom are shown in Fig. 7 and Fig. 8, respectively.

0 vs 4 in 𝐺12

3 vs 5 in 𝐺9

0 vs 1

in 𝐺₁₀ Katz

Bonacich

Eigenvector Generalized degree

Fig. 7. A decision tree for the measures inF⁴⁰satisfying Self-consistency.

0 vs 5 in 𝐺₆

5 vs 6 in 𝐺13

Closeness*

(log Forest) Closeness*

(log Walk)

Closeness Connectivity

Fig. 8. A decision tree for the four measures satisfying the Bridge axiom.

The first one can be extracted from the general tree of Figures 3 and 4. The second tree involves graphs that separate Closeness^∗(logForest) and Closeness^∗(logWalk) from each other and the other measures that satisfy the Bridge axiom.

Some of the questions in the survey of Fig. 7 may not be easy to answer. This is due to the fact that the measures sat-

TABLE II

VERIFICATION OF THESELF-CONSISTENCY ANDBRIDGE AXIOMS FOR THE MEASURES INF⁴⁰. ‘+’INDICATES THAT THE MEASURE SATISFIES

THE AXIOM; ‘+/−’THAT IT IS SATISFIED BY A SLIGHTLY MODIFIED MEASURE; ‘−’THAT THE AXIOM IS VIOLATED. IN THE LATTER CASE,A

GRAPH AND AN ORDERED PAIR OF NODES ARE INDICATED FOR WHICH THE AXIOM IS VIOLATED. “COMM.”IS SHORT FORCOMMUNICABILITY. Centrality measures Self-consistency Bridge axiom

1. Betweenness −_(0,3)G₄ −_(0,5)G′

5

2. Closeness −_(0,3)G₄ +

3. Connectivity −_(0,3)G₄ +

4. Connectedness power −_(0,3)G₄ −_(0,5)G₅

5. Degree −_(0,4)G₂ −_(0,5)G₅

6. Coreness −_(0,1)G₇ −_(0,5)G₅

7. Bridging −_(0,3)G₄ −_(0,5)G₅

8. PageRank −_(0,3)G₄ −_(0,5)G₅

9. Harmonic closeness −_(0,3)G₄ −_(0,5)G₅

10. Eccentricity −_(0,4)G₂ −_(0,5)G₅

11.p-Means, p=−2 −_(0,3)G₄ −_(0,5)G₅ 12.p-Means, p= 0 −_(0,3)G₄ −_(0,5)G₅ 13.p-Means, p= 2 −_(0,3)G₄ −_(0,5)G₅ 14. Beta current flow −_(0,3)G₄ −_(0,5)G₅

15. Weighted degree −_(0,4)G₂ −_(1,2)G₁

16. Decaying degree −_(1,5)G

5 −_(0,5)G

5

17. Decay −_(0,3)G₄ −_(0,5)G₅

18. Generalized degree + −_(0,5)G₅

19. Katz + −_(0,5)G₅

20. Eigenvector + −_(0,5)G

5

21. Bonacich + −_(0,5)G₅

22. Total communicability −_(8,9)G₁₄ −_(0,5)G₅ 23. Communicability(Kij) −_(8,9)G₁₄ −_(0,5)G₅

24. Walk(Kij) −_(8,9)G₁₄ −_(0,5)G₅

25. Walk(Kii) −(Bischeri,Peruzzi)G15 −_(0,5)G₅ 26. Estrada −(Bischeri,Peruzzi)G₁₅ −_(0,5)G₅ 27. Eigencentrality(Dice) −_(3,6)G₉ −_(0,5)G₅ 28. Eigencentrality(Jaccard) −_(3,6)G₉ −_(0,5)G₅ 29. Closeness(Forest) −_(1,5)G₅ −_(0,5)G₅

30. Closeness(Heat) −_(1,5)G₅ −_(0,5)G₅

31. Closeness(logForest) −_(0,3)G₄ +/−_(0,5)G₅ 32. Closeness(logWalk) −_(0,3)G₄ +/−_(0,5)G₅ 33. Closeness(logHeat) −_(0,3)G₄ −_(0,5)G₅ 34. Closeness(logComm.) −_(0,3)G₄ −_(0,5)G₅ 35. Eccentricity(Forest) −_(0,1)G

7 −_(0,5)G

5

36. Eccentricity(Heat) −_(0,1)G₇ −_(0,5)G₅ 37. Eccentricity(logForest) −_(0,1)G₇ −_(0,5)G₅ 38. Eccentricity(logWalk) −_(0,1)G₇ −_(0,5)G₅ 39. Eccentricity(logHeat) −_(0,1)G

7 −_(0,5)G

5

40. Eccentricity(logComm.) −_(0,1)G₇ −_(0,5)G₅

isfying Self-consistency, as Lemma 2 in [41] shows, have the same operating principle based on supporting neighborhood representations, so the difference between them can be quite subtle. The comparison “0 vs 1 inG10” can be replaced by “2 vs 6 inG11” withf(6)> f(2)leading to Katz centrality and f(2)> f(6)yielding Bonacich centrality (which is a variation of Katz centrality) with the parameter values specified in Materials and Methods.

III. DISCUSSION

In this paper, we proposed a method called culling for selecting a point centrality measure based on expert understanding of how a measure suitable for a particular application should act on test graphs. The method consists of the following steps:

(1) Choosing a finite set F of candidate measures, among which there are no rank equivalent measures;

(10)

(2) Generating a list of small (if possible) test graphs that separate all measures inF;

(3) Compiling a decision-tree survey composed of questions about the comparative centrality of test nodes in the test graphs;

(4) Completing the survey to obtain a centrality measure consistent with the expert’s responses.

The developed procedures enable one to apply the culling method to any finite set F of centrality measures.

The conclusions are as follows.

1. Centrality measures based on rational ideas and satisfying reasonable axioms may behave counterintuitively (in the logic of a particular application or even in general) on small test graphs. This was seen in the examples of Weighted degree, PageRank, dissimilarity based Eigencentralities, and some other measures. Therefore, examining the performance of measures on test graphs is a good complement to typological, axiomatic, and data-driven methods of analysis. Poor results of such tests indicate inherent weaknesses in the measures and leave little chance for high-quality processing of real data.

2. The culling method relies on the expert’s ability to judge with some certainty which of two nodes in a fairly simple graph has a higher centrality. We assume that this ability can be based upon both the findings of prior studies and cumulative experience. Moreover, the certainty of such a judgment is often stronger than the appeal of the heuristics behind a measure.

Given a set of centralities, the culling method generates a decision-tree survey for selecting the measure that best fits the expert concept of centrality. The implementation of the method varies depending on the choice of graphs for separating measures. In this paper, we studied a test set F⁴⁰ consisting of forty centralities, which included, among others, several kernel-based closeness, eccentricity and other measures introduced in Materials and Methods. It contained many couples of closely related measures, and we were interested in whether large or dense graphs are necessary to separate them. The answer is no: we opted for1-trees (unicycles), and 13 graphs of this class of order 8 (just one graph) or lower with node degrees at most4were sufficient to separate all the measures inF⁴⁰. For the case of an arbitrary set of measures, a general procedure was proposed.

3. If a set of measures F is updated to a new set F^′ ⊃ F, then the culling method adapts by extending the existing decision tree to accommodate the changes.

4. The culling method generates a collection of triples in the form of a graph and a pair of its nodes that distinguish each measure in the set F from others. The centrality inequalities/equalities on these triples form the “profile” of a measure characterizing its behavior and helping to decide whether to accept or discard it. They can also be used to classify measures. In this way, culling allows one to reveal some perplexing features of centralities. Examples have been mentioned in Item 1 of the conclusions. As another example consider the Bridging centrality [50]. The expert who chooses it must approve the following claims of this measure (see Fig. 1): f(0)> f(2)inG2;f(3)> f(4)inG4;f(5)> f(3) in G₆ and G₇; f(5)> f(0)> f(1) in G₈; f(4) > f(2) in

G₉; f(5)> f(1)inG₁₁, andf(0) =f(1) inG₁.

5. A decision tree for the entire set of known centralities can be built by extending the tree presented above (Fig- ures 3 and 4). Such a dendrogram may be useful (along with correlation-based methods [3]) for hierarchical clustering of measures. This is suggested by the fact that the tree for F⁴⁰ contains several subtrees whose leaves correspond to closely related centralities. For example, Callout B (Fig. 4) consists of Eccentricity measures; Callout C comprises Closeness measures and the related Decaying degree; three measures satisfying Self-consistency belong to Callout D, while the fourth one is a “sister” of it. The reason for this is that the survey begins with the simplest graphs, while subtle differences between kindred measures are distinguished by more complex graphs that appear later in the survey.

6. On the other hand, relying solely on culling for selecting a single output measure from an extensive set of input measures may not be entirely robust. Indeed, the corresponding survey may contain rather difficult questions, the answers to which may not be confident enough, which can lead to an unstable result. To enhance reliability, option “not sure” can be added to the survey forms, potentially leading to multiple outcome.

The choice among the resulting measures can be made by analyzing their application to real networks and checking their adherence to credible axioms. Moreover, the expert may take the survey multiple times to obtain a comprehensive picture by using relevant centralities of several types (cf. [12], [21]).

7. A promising application of the culling method is to combine it with a normative approach by constructing decision trees for subsets of measures that satisfy the most credible axioms. The paper presents short surveys for the measures in F⁴⁰ (or their slight modifications) that meet the Self- consistency or Bridge axioms.

Thus, the recommended approach to address the question posed in the paper’s title is to utilize a strategy that encom- passes several elements: verification of compliance with the most important axioms, culling, consideration of computa- tional complexity factors, experimental studies involving sta- tistical methods and machine learning, as well as a typological perspective that connects network dynamics to the underlying assumptions and concepts of each centrality measure.

Among these components, the culling method stands out as one of the least labor-intensive and time-consuming. It relies on expert knowledge while its procedures are transparent.

This study identifies several directions for future research.

Here are some of those topics:

• Explore larger culling surveys;

• Collect and analyze statistics of survey results from diverse experts to obtain a measure attractiveness rating;

• Construct decision trees for subsets of measures that satisfy crucial axioms, such as monotonicity conditions, and their combinations;

• Adapt the culling method to centrality measures designed for directed networks;

• Utilize decision tree theory [68] to develop variations of the proposed procedure for constructing culling trees;

• Enhance the flexibility of culling surveys by (a) allowing the “not sure” option, (b) employing all graphs in G

(11)

that separate specific pairs of measures, (c) allowing experts to utilize their own test graphs to select measures, (d) enabling experts to report confidence levels in their answers;

• Create an interactive web application implementing culling surveys for any chosen subset of centrality measures;

• Apply the culling approach to the problem of selecting scoring methods for unbalanced tournaments [63].

IV. MATERIALS ANDMETHODS

A. Closeness and Eccentricity Centralities Induced by Graph Kernels

In this section, we present several new classes of centrality measures used in Test Survey and subsequent sections. More information about them can be found in [41], [69].

Let d(u, v) be theshortest path distance between nodesu andvof a graphG, i.e., the length of a shortest path between u andv.Two popular distance based centrality measures are the [shortest path] Closeness[44]

f(u) = X

v∈V

d(u, v)−1

, u∈V (2)

and [shortest path] Eccentricity[44]

f(u) = (max

v∈V d(u, v))⁻¹, u∈V. (3) General classes of Closeness and Eccentricity centralities are defined by (2) and (3) with d(u, v) being arbitrary distances for graph nodes. In the literature, several classes of such distances and, more generally, dissimilarity measures have been proposed (see, e.g., [69]–[71]). Many of them are defined via graph kernels. Let us consider four classes of them.

1. The parametric Katz [48] kernels (also referred to as Walk [72] orNeumann diffusion[71]kernels) are defined as

P^Walk(t) =

∞

X

k=0

(tA)^k= (I−tA)⁻¹ (4) with 0 < t < (ρ(A))⁻¹, whereA = (aij) is the adjacency matrix ofGandρ(A)is the spectral radius ofA.

2. TheCommunicability kernels [47] are P^Comm(t) =

∞

X

k=0

(tA)^k

k! = exp(tA), t >0.

Two other classes are defined similarly via the Laplacian matrix

L= diag(A1)−A,

where 1 = (1, . . . ,1)^T and diag(x) is the diagonal matrix with vector xon the diagonal.

3. TheForest kernels, orregularized Laplacian kernels[73]

are

P^For(t) = (I+tL)⁻¹, where t >0. (5) 4. TheHeat kernelsare the Laplacian exponential diffusion kernels [54]

P^Heat(t) =

∞

X

k=0

(−tL)^k

k! = exp(−tL), t >0.

By Schoenberg’s theorem [74], if matrix P = (p_uv) is a kernel (i.e., is positive semidefinite), then it produces a Euclidean distanced(u, v)by means of the transformation

d(u, v) = ¹₂(puu+pvv−puv−pvu)¹₂

, u, v∈V, (6) where ¹₂ is the scaling factor.

Thus, all Walk, Communicability, Forest, and Heat kernels with appropriate parameter values provide distances of the form (6) whose substitution in (2) and (3) yields Closeness and Eccentricity centralities. We will denote them by Close- ness(Kernel) and Eccentricity(Kernel) with the corresponding kernels substituted.

Furthermore, if P_n×n = (p_uv) determines a proximity measure(which means that for anyx, y, z∈V, pxy+pxz− pyz ≤pxx, and the inequality is strict whenever z = y and y̸=x), then [75] transformation

d(u, v) =¹₂(puu+pvv−puv−pvu), u, v∈V (7) provides a distance function that satisfies the axioms of a metric. The Forest kernel with anyt >0produces a proximity measure, while kernels in the remaining three classes do so whentis sufficiently small [69]. The centralities obtained from aProximitymeasure by transformation (7) and substitution of the resulting distance into (2) and (3) are denoted in the present paper byCloseness^∗(Proximity)andEccentricity^∗(Proximity), respectively.

Moreover, if P represents a strictly positive transitional measure onG(i.e., pxypyz ≤pxzpyy for all nodesx, y,and z,with pxypyz =pxzpyy whenever every path in Gfrom x toz visitsy), then transformation

ˆ

puv = lnpuv, u, v∈V (8) produces [67], [70] a proximity measure. In this case, (7) applied toPˆ = (ˆp_uv)reduces to

d(u, v) = ¹₂(lnp_uu+ lnp_vv−lnp_uv−lnp_vu) (9) and generates [70] a cutpoint additive distance d(u, v), viz., such a distance that d(u, v) +d(v, w) = d(u, w) whenever v is a cutpoint between u and w in G (or, equivalently, whenever all paths connectinguandwvisitv). The centralities obtained from any Transitional_Measure by transformation (9) and substitution of the resulting distance into (2) and (3) are denoted by Closeness^∗(logTransitional_Measure)and Eccentricity^∗(logTransitional_Measure), respectively.

Since the Walk and Forest kernels determine [67] strictly positive transitional measures, transformation (9) applied to them generates cutpoint additive distances. Substituting them into (2) and (3) produces Closeness^∗(logForest) and Closeness^∗(logWalk), as well as the correspondingEccentric- ity^∗(·)measures.

Thus, based on the above properties, we define Closeness and Eccentricity centralities obtained by substituting the:

• Forest kernel;

• Heat kernel;

• logarithmic Forest kernel;

• logarithmic Walk kernel;

• logarithmic Heat kernel, and