
NPTEL MOOC: Algorithms for Big Data Date: July 31st, 2016

Assignment 3 Total Marks: 20

Name: Roll No.:

Question: 1 2 3 4 5 6 7 8 9 10 11 12 Total

Marks: 2 2 2 2 2 2 2 1 1 2 1 1 20

Score:

1. (2 marks) Consider a random walk on a linear arrangement of nodes as described in the first segment. Suppose that instead of starting at node 0, the random walker starts at node n/2. The expected number of steps the walker will take before reaching the nth node is:

A. Θ(n).

B. Θ(n log n).

C. Θ(n²).

D. Θ(n³).

2. (2 marks) What is the cover time of a complete graph on n vertices (i.e., a graph on n vertices with edges between every pair of vertices)? Hint: use ideas from the coupon collector’s problem.

A. Θ(n).

B. Θ(n log n).

C. Θ(n²).

D. Θ(n³).

3. (2 marks) In the context of streaming algorithms, what does (ε, δ)-approximation mean?

A. For all ε > 0 there is a δ such that the algorithm will give an ε-approximate solution.

B. The algorithm achieves an error of less than ε if we sample at least a δ fraction of the entire data.

C. The algorithm achieves an error of less than ε with probability at least 1−δ.

D. None of these.

4. (2 marks) Consider the Markov chain defined on two states 0 and 1 with the following transition matrix.

P = \begin{pmatrix} p & 1-p \\ 1-p & p \end{pmatrix}

What is the stationary distribution probability of state 0?

A. 0 B. 1/2

C. 1/3 D. 1/4

5. (2 marks) Consider the Markov chain defined in Problem 4. Let p_t denote the probability that the Markov chain that started in state 0 was back in state 0 at time step t. (Hint: think about the two ways in which the Markov chain can reach state 0

A. p if t is odd, else p−1


B. p_t = p_{t-2}^2 + (1 - p_{t-2})^2

C. p_t = p_{t-1}^2 + (1 - p_{t-1})^2

D. none of these.

6. (2 marks) We want to sample k items uniformly at random (without replacement) from a stream. Normally, we would use the reservoir sampling algorithm. Suppose we modify reservoir sampling in the following manner due to some engineering requirement: for each new data point seen, we choose one random item from our current sample of k items and replace it with the new data point.

A. This is a valid sampling scheme producing a uniform sample of size n.

B. This sampling scheme is biased towards old elements.

C. This sampling scheme is biased towards new elements.

D. Nothing can be said about this scheme.
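
One way to explore this question empirically is to simulate both schemes and compare how often each stream position ends up in the final sample. The sketch below is illustrative only: the function names, the stream `range(n)`, and the parameters in the usage note are assumptions, not part of the assignment. It places standard reservoir sampling (Algorithm R) next to the modified always-replace scheme described above.

```python
import random


def reservoir_sample(stream, k, rng):
    """Standard reservoir sampling (Algorithm R)."""
    sample = []
    for i, item in enumerate(stream):
        if i < k:
            sample.append(item)           # fill the reservoir first
        else:
            j = rng.randrange(i + 1)      # uniform index in [0, i]
            if j < k:
                sample[j] = item          # keep the new item with probability k/(i+1)
    return sample


def always_replace_sample(stream, k, rng):
    """Modified scheme from the question: every new item evicts a random slot."""
    sample = []
    for i, item in enumerate(stream):
        if i < k:
            sample.append(item)
        else:
            sample[rng.randrange(k)] = item
    return sample


def inclusion_counts(sampler, n, k, trials, seed=0):
    """Count how often each position of the stream 0..n-1 lands in the sample."""
    rng = random.Random(seed)
    counts = [0] * n
    for _ in range(trials):
        for item in sampler(range(n), k, rng):
            counts[item] += 1
    return counts
```

For example, `inclusion_counts(always_replace_sample, n=20, k=5, trials=4000)` returns one count per stream position. Under a uniform scheme, every position should appear in roughly `trials * k / n` samples, so comparing the counts for early and late positions shows whether a scheme is biased.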

7. (2 marks) If state i is recurrent and state i does not communicate with state j (i.e., there is either no path from i to j or no path from j to i), then what can you say about P_ij?

A. P_ij = 0.

B. P_ij ≥ 0.

C. P_ij > 0.

D. P_ij = 1.

8. (1 mark) An n×n matrix P is called stochastic if all entries are non-negative and if the sum of the entries in each row is 1. It is called doubly stochastic if, additionally, the sum of the entries in each column is 1. (Hint: you may want to construct a simple doubly stochastic matrix and test out the choices below.)

A. The uniform distribution is a stationary distribution for a doubly stochastic matrix.

B. The uniform distribution is not a stationary distribution for a doubly stochastic matrix.

C. There is no stationary distribution for a doubly stochastic matrix.

D. None of these.
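
Following the hint, a small doubly stochastic matrix can be built and tested directly. The sketch below is only an illustration: the 3×3 matrix `P` and the helper names are hypothetical choices, and exact fractions are used to avoid rounding issues. A distribution π is stationary when πP = π, so the check is to compare `step(uniform, P)` with `uniform`.

```python
from fractions import Fraction as F


def is_doubly_stochastic(P):
    """True if all entries are non-negative and every row and column sums to 1."""
    n = len(P)
    rows_ok = all(sum(row) == 1 for row in P)
    cols_ok = all(sum(P[i][j] for i in range(n)) == 1 for j in range(n))
    nonneg = all(x >= 0 for row in P for x in row)
    return rows_ok and cols_ok and nonneg


def step(pi, P):
    """One step of the chain: returns the row vector pi * P."""
    n = len(P)
    return [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]


# Hypothetical example matrix (an assumption, not from the assignment).
P = [
    [F(1, 2), F(1, 3), F(1, 6)],
    [F(1, 3), F(1, 6), F(1, 2)],
    [F(1, 6), F(1, 2), F(1, 3)],
]
uniform = [F(1, 3)] * 3

print(is_doubly_stochastic(P))
print(step(uniform, P) == uniform)
```

The same two helpers can be reused with any other candidate matrix or distribution to test the remaining choices.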

9. (1 mark) A professor continually gives exams to his students. He can give three possible types of exams, and his class is graded as either having done well or badly. Let p_i denote the probability that the class does well on a type i exam, and suppose that p_1 = 0.3, p_2 = 0.6, and p_3 = 0.9. If the class does well on an exam, then the next exam is equally likely to be any of the three types. If the class does badly, then the next exam is always type 1. What proportion of exams over the years are type i, i = 1, 2, 3?

A. Type 1 = 3/7, Type 2 = 1/7 and Type 3 = 3/7

B. Type 1 = 3/7, Type 2 = 2/7 and Type 3 = 2/7

C. Type 1 = 4/7, Type 2 = 2/7 and Type 3 = 1/7

D. None of the above

10. (2 marks) Consider the Approximate Median algorithm, in which we used t ∈ Θ(ε⁻² log(1/δ)) samples. If the input stream is sorted, then:

A. our algorithm will fail because the sorted order is the worst case input.

B. our algorithm is oblivious to the order of the input, and therefore succeeds in producing an ε-approximate median with probability 1−δ.

C. our algorithm will need 2t samples to be drawn.

D. None of these.

Solution: The algorithm samples uniformly at random regardless of the order of the input, so there is no bad input; a randomized algorithm only has bad luck.
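
The order-obliviousness can be checked with a small experiment. The sketch below is a simplified stand-in, not necessarily the exact algorithm from the lectures: it reservoir-samples t items and returns their median. The constant hidden in the Θ is not given here, so `sample_size` assumes the Hoeffding-style choice t = ⌈ln(2/δ) / (2ε²)⌉. All function names are hypothetical, and the stream is assumed to hold distinct values.

```python
import math
import random


def sample_size(eps, delta):
    """Assumed constant: t = ceil(ln(2/delta) / (2 * eps^2))."""
    return math.ceil(math.log(2 / delta) / (2 * eps ** 2))


def sampled_median(stream, t, rng):
    """Median of t items drawn uniformly (without replacement) via reservoir sampling."""
    sample = []
    for i, x in enumerate(stream):
        if i < t:
            sample.append(x)
        else:
            j = rng.randrange(i + 1)
            if j < t:
                sample[j] = x
    sample.sort()
    return sample[len(sample) // 2]


def failure_rate(stream, eps, t, trials, seed=0):
    """Fraction of trials whose answer has rank more than eps*n away from n/2."""
    rng = random.Random(seed)
    n = len(stream)
    rank = {v: r for r, v in enumerate(sorted(stream))}  # requires distinct values
    failures = 0
    for _ in range(trials):
        m = sampled_median(stream, t, rng)
        if abs(rank[m] - n / 2) > eps * n:
            failures += 1
    return failures / trials
```

Calling `failure_rate` once on `list(range(1000))` and once on a shuffled copy, with the same `eps` and `t`, lets the two empirical failure rates be compared against δ.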

11. (1 mark) Given a sorted stream of numbers of unknown size, can a streaming algorithm with a fixed memory budget of t values always produce the correct median?


A. Yes B. No

Solution: Since our sample cannot exceed t values, we cannot always produce the correct result. If, during streaming, we discard some new element x, then there always exists a stream in which x is the median, and hence we would give the wrong answer.

12. (1 mark) A fair coin is tossed repeatedly and tossing stops the moment we get 3 consecutive tails. For example, a possible sequence of tosses could look like HHTTHTHHTHHTTT. What is the expected number of tosses? (Hint: one can model this as a Markov chain. Feel free to discuss your ideas in the forum. If you discover the answer, you can drop hints in the forum, but don’t just give away the solution.)

A. 3 B. 6 C. 8 D. 14

Solution: model this as a Markov chain.
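
A minimal sketch of that model, under the assumption that state i records the current run of i consecutive tails: from state i < k, a tail moves to state i+1 and a head resets to state 0, so the expected remaining tosses satisfy E_i = 1 + p·E_{i+1} + (1 − p)·E_0 with E_k = 0. Writing E_i = a_i + b_i·E_0 and working backwards from state k gives E_0 = a_0 / (1 − b_0). The function name and its parameters are hypothetical.

```python
from fractions import Fraction


def expected_tosses_until_run(k, p_tail=Fraction(1, 2)):
    """Expected number of tosses until k consecutive tails appear.

    Represents E_i as a + b * E_0 and iterates from state k down to state 0.
    """
    a, b = Fraction(0), Fraction(0)           # E_k = 0 (absorbing state)
    for _ in range(k):
        # E_i = 1 + p * E_{i+1} + (1 - p) * E_0
        a, b = 1 + p_tail * a, p_tail * b + (1 - p_tail)
    return a / (1 - b)                        # solve E_0 = a_0 + b_0 * E_0


print(expected_tosses_until_run(3))
```

Exact fractions are used so that the result can be compared directly with the answer choices; passing a different `p_tail` handles a biased coin.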
