• Tidak ada hasil yang ditemukan

PDF COL866: Foundations of Data Science - IIT Delhi

N/A
N/A
Protected

Academic year: 2024

Membagikan "PDF COL866: Foundations of Data Science - IIT Delhi"

Copied!
11
0
0

Teks penuh

(1)

Ragesh Jaiswal, IITD

(2)

Algorithms for Massive Data Problems: Streaming algorithm

(3)

Distinct elements in a stream

Problem

Design a streaming algorithm for computing the number of distinct ai’s in the sequencea1, ...,an.

Algorithm

- LetM>m be a prime number - LetH ={ha,b|a,b∈ {0,1, ...,M−1}}

Distinct(a1, ...,an) - Pick a randomh fromH - Initialisemin=h(a1)

- Fori >1: updatemintoh(ai) iffh(ai)<min - return(minM )

Theorem

Let d be the number of distinct elements. With probability at least

(2/3−d/M), we have d6minM ≤6d where M and min are as defined

in the algorithm.

(4)

Streaming Algorithms

Distinct elements in a stream

Algorithm

- LetM>mbe a prime number - LetH={ha,b|a,b∈ {0,1, ...,M1}}

Distinct(a1, ...,an) - Pick a randomhfromH - Initialisemin=h(a1)

- Fori >1: updatemintoh(ai) iffh(ai)<min - return(minM)

Theorem

Let d be the number of distinct elements. With probability at least (2/3d/M), we have d6 minM 6d where M and min are as defined in the algorithm.

Proof sketch

Letb1, ...,bd be the distinct values that appear in the input.

LetS={h(b1), ....,h(bd)}andmin= min(S).

Claim 1: Pr[minM >6d]<16+Md.

(5)

Distinct elements in a stream

Algorithm

- LetM>mbe a prime number - LetH={ha,b|a,b∈ {0,1, ...,M1}}

Distinct(a1, ...,an) - Pick a randomhfromH - Initialisemin=h(a1)

- Fori>1: updatemintoh(ai) iffh(ai)<min - return(minM )

Theorem

Let d be the number of distinct elements. With probability at least

(2/3d/M), we have d6minM 6d where M and min are as defined

in the algorithm.

Proof sketch

Letb1, ...,bdbe the distinct values that appear in the input.

LetS={h(b1), ....,h(bd)}andmin= min(S).

Claim 1:Pr[minM >6d]<16+Md. Claim 2:Pr[[minM <d6]<16.

The theorem follows from the above two claims.

(6)

Streaming Algorithms

Counting number of occurences

Problem

Design an algorithm for counting the number of occurrences of a given element in the stream.

This can clearly be done using a deterministic algorithm that uses O(logn) space.

Question: Can a deterministic algorithm do any better?

(7)

Counting number of occurences

Problem

Design an algorithm for counting the number of occurrences of a given element in the stream.

This can clearly be done using a deterministic algorithm that uses O(logn) space.

Question: Can a deterministic algorithm do any better?

Question: Suppose you are allowed some slack with respect to the answer (say constant factor) and allowed randomness. Can you do better?

Here is a streaming algorithm:

Start withk= 0.

On every occurrence of the given element increment the counter with probability 1/2k.

(This is to keep the value of k so that2k is approximately the count.)

At the end of the stream, output 2k−1.

(8)

Streaming Algorithms

Counting number of occurences

Problem

Design an algorithm for counting the number of occurrences of a given element in the stream.

This can clearly be done using a deterministic algorithm that uses O(logn) space.

Question: Can a deterministic algorithm do any better?

Question: Suppose you are allowed some slack with respect to the answer (say constant factor) and allowed randomness. Can you do better?

Here is a streaming algorithm:

Start withk= 0.

On every occurrence of the given element increment the counter with probability 1/2k.

(This is to keep the value of k so that2k is approximately the count.)

At the end of the stream, output 2k1.

Claim: The above streaming algorithm usesO(log logn) space and in expectation outputs an answer within a factor 2 of the correct count.

(9)

Majority and frequent elements

Problem

Design an algorithm for finding the majority element (in case there exists one).

We want a time the element that appears in the stream more thann/2 times.

Claim Any deterministic algorithm requires Ω(min(n,m)) space.

We can do better if we relax our requirement in the following manner:

In case there is a majority element, then the algorithm should output this element.

In case there is no majority element, the algorithm is allowed to output any element.

(10)

Streaming Algorithms

Majority and frequent elements

Problem

Design an algorithm for finding the majority element (in case there exists one).

Claim Any deterministic algorithm requires Ω(min(n,m)) space.

We can do better if we relax our requirement in the following manner:

In case there is a majority element, then the algorithm should output this element.

In case there is no majority element, the algorithm is allowed to output any element.

Algorithm

Majority(a1, ...,an) -sa1andctr1 - Fori= 2 ton

- if (ai=s)ctrctr+ 1 - elseif (ctr6= 0)ctrctr1 - else{sai;ctr1}

- return(s)

(11)

Referensi

Dokumen terkait

First we run a quantum counting ( Q# ) procedure to find the number of repeated occurrences of M in the database DB. Finally a projective measurement in the computational basis

8.1 Approaching Data and the Object of Study, Mental Processes In this chapter, the key theme is that of mathematical representations of Matte Blanco’s bi-logic,

Give natural deduction proofs for the following propositional formulas under the premises given in each case.. Write each proof step on a numbered line of its own, and indicate for each

Isomorphism testing of constant-free read-once polynomials can be done in time polynomial in the number of variables in the input formulas.. Given Lemma 12, the algorithm is obvious: on

The course focuses on a set of countries, which followed clearly diverse trajectories and patterns of growth to achieve their industrial transition and compares the outcomes of these

The course focuses on a set of countries, which followed clearly diverse trajectories and patterns of growth to achieve their industrial transition and compares the outcomes of these

Partition Oriented Frame Based Fair Scheduler POFBFS: Our third algorithm is a frame-based work-conserving proportional fair scheduler that allows much higher reduction in the number of

The main components of the following architecture are: Counter and Carry Units, G Transformation, Next State Function, Extraction Scheme, Constants Unit and the Data Transformation