• Tidak ada hasil yang ditemukan

The external memory soft heap data structure is useful for finding exact and approximate medians, and for approximate sorting [19]. Each of these computations take O(N/B) I/Os:

1. To compute the median in a set of N numbers, insert the numbers in an EMSH with error rate . Next, perform N Deletemins. The largest number e deleted has a rank between N and 2N. Partition the set using e as the pivot in O(N/B) I/Os. We can now recurse with a partition of size at most max{,(1−2)}N. The median can be found in O(N/B) I/Os. This is an alternative to the algorithm in [79] which also requires O(N/B) I/Os.

2. To approximately sort N items, insert them into an EMSH with error rate , and perform N Deletemins consecutively. Each element can form an inversion with at mostN of the items remaining in the EMSH at the time of its deletion. The output sequence, therefore, has at most N2 inversions. We can also use EMSH to near sort N numbers in O(N/B) I/Os such that the rank of each number in the output sequence differs from its true rank by at most N; the in-core algorithm given in [19] suffices.

Chapter 4

The Minimum Cut Problem

4.1 Introduction

The minimum cut problem on an undirected unweighted graph is to partition the vertices into two sets while minimizing the number of edges from one side of the partition to the other. This is an important combinatorial optimisation problem. Efficient in-core and parallel algorithms for the problem are known. For a recent survey see [14, 52, 53, 69]. This problem has not been explored much from the perspective of massive data sets. However, it is shown in [3, 77] that the minimum cut can be computed in a polylogarithmic number of passes using only a polylogarithmic sized main memory on the streaming and sort model.

In this chapter we design an external memory algorithm for the problem on an undi- rected unweighted graph. We further use this algorithm for computing a data structure which represents all cuts of size at most α times the size of the minimum cut, where α < 3/2. The data structure answers queries of the following form: A cut X (defined by a vertex partition) is given; find whether X is of size at most α times the size of the minimum cut. We also propose a randomised algorithm that is based on our deterministic algorithm, and improves the I/O complexity. An approximate minimum cut algorithm which performs fewer I/Os than our exact algorithm is also presented.

4.1.1 Definitions

For an undirected unweighted graph G = (V, E), a cut X = (S, V −S) is defined as a partition of the vertices of the graph into two nonempty setsS and V −S. An edge with one endpoint in S and the other endpoint in (V −S) is called a crossing edge ofX. The valuec of the cutX is the total number of crossing edges of X.

The minimum cut (mincut) problem is to find a cut of minimum value. On un- weighted graphs, the minimum cut problem is sometimes referred to as the edge-connectivity problem. We assume that the input graph is connected, since otherwise the problem is trivial and can be computed by any connected components algorithm. A cut in G is α-minimum, for α >0, if its value is at most α times the minimum cut value of G.

A tree packing is a set of spanning trees, each with a weight assigned to it, such that the total weight of the trees containing a given edge is at most one. The value of a tree packing is the total weight of the trees in it. A maximum tree packing is a tree packing of largest value. (When there is no ambiguity, will use “maximum tree packing” to refer also to the value of a maximum tree packing, and a “mincut” to the value of a mincut.) A graph Gis called a δ-fat graph for δ >0, if the maximum tree packing of Gis at least

(1+δ)c 2 [52].

After [52], we say that a cutX k-respects a treeT (equivalently,T k-constrains X), if E[X]∩E[T]≤k, whereE(X) is the set of crossing edges ofX, and E(T) is the set of edges of T.

4.1.2 Previous Results

Several approaches have been tried in designing in-core algorithms for the mincut problem.

See the results in [33, 35, 40, 43, 54, 52, 68, 70, 71, 80]. Significant progress has been made in designing parallel algorithms as well [38, 51, 52, 53]. The mincut problem on weighted directed graphs is shown to be P-complete for LOGSPACE reductions [38, 51].

For weighted undirected graphs, the problem is shown to be in NC [53]. We do not know any previous result for this problem on the external memory model.

However, the current best deterministic in-core algorithm on an unweighted graph computes the minimum cut in O(E+c2V log(V /c)) time, and was given by Gabow [35].

Gabow’s algorithm uses the matroid characterisation of the minimum cut problem. Ac-

cording to this characterisation, the minimum cut in a graphG is equal to the maximum number of disjoint spanning trees in G. The algorithm computes the minimum cut on the given undirected graph in two phases. In first phase, it partitions the edges E into spanning forestsFi,i= 1, . . . , N inO(E) time using the algorithm given in [68]. This algo- rithm requiresO(1) I/O for each edge in the external memory model. In the second phase, the algorithm computes maximum number of disjoint spanning trees inO(c2V log(N/c)) time. To compute maximum number of disjoint spanning trees, augmenting paths like in flow based algorithms are computed. The procedures used in computing augmenting paths access the edges randomly and require O(1) I/Os for each edge in the external memory model. Thus, this algorithm, when executed on the external memory model, performs O(E+c2V log(V /c)) I/Os.

4.1.3 Our Results

We present a minimum cut algorithm that runs in O(c(MSF(V, E) logE + VBSort(V))) I/Os, and performs better on dense graphs than the algorithm of [35], which requires O(E+c2V log(V /c)) I/Os, where MSF(V, E) is the number of I/Os required in computing a minimum spanning tree. For a δ-fat graph, our algorithm computes a minimum cut in O(c(MSF(V, E) logE + Sort(E))) I/Os. Furthermore, we use our algorithm to construct a data structure that represents all α-minimum cuts, for α < 3/2. The construction of the data structure requires an additionalO(Sort(k)) I/Os, wherek is the total number of α-minimum cuts. Our data structure answers anα-minimum cut query inO(V /B) I/Os.

The query is to verify whether a given cut (defined by a vertex partition), isα-minimum or not.

Next, we show that the minimum cut problem can be computed with high probability inO(c·MSF(V, E) logE+Sort(E) log2V+VBSort(V) logV) I/Os. We also present a (2+)- minimum cut algorithm that requiresO((E/V)MSF(V, E)) I/Os and performs better on sparse graphs than our exact minimum cut algorithm.

All our results are summarised in Table 4.1.

Problems on EM model Lower/Upper Bounds mincut of an undirected Ω(EVSort(V))

unweighted graph O(c(MSF(V, E) logE+VBSort(V))) mincut of a δ-fat graph O(c(MSF(V, E) logE+ Sort(E))) Monte Carlo mincut algorithm O(c·MSF(V, E) logE+

with probability 1−1/V Sort(E) log2V +VBSort(V) logV) (2 +)-approx. mincut O((E/V)MSF(V, E))

Data Structure for O(c(MSF(V, E) logE+VBSort(V)) + Sort(k)) allα-mincuts answers a query in O(V /B) I/Os

Table 4.1: Our Results

4.1.4 Organisation of This Chapter

The rest of the chapter is organised as follows. In Section 4.2, we define some notations used in this chapter. In Section 4.3, we give a lower bound result for the minimum cut problem. In Section 4.4, we present an external memory algorithm for the minimum cut problem. Section 4.5 describes the construction of a data structure that stores all α-minimum cuts, for α < 3/2. In Section 4.6, we improve the I/O complexity of our minimum cut algorithm by using randomisation. In Section 4.7, we discuss a special class of graphs for which a minimum cut can be computed very efficiently. In Section 4.8, we present a (2 +)-minimum cut algorithm.