Lecture 18 Parallel Computing and Big Data

(1)

Lecture 18

Parallel Computing and Big Data

Course: Algorithms for Big Data Instructor: Hossein Jowhari

Department of Computer Science and Statistics Faculty of Mathematics

K. N. Toosi University of Technology

Spring 2021

(2)

Parallel Processing: Basic Idea

Speeding up computation by breaking a large task into (independent) sub-tasks and executing them in parallel.

▸ Sequential time (stime):

Time to finish the job in a sequential manner

▸ Parallel time (ptime, wall-clock time):

Time to finish the job using parallel processes
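These two quantities are usually compared through the speedup ratio stime/ptime. A minimal sketch (the four-worker split and the assumption of perfectly divisible, communication-free work are illustrative, not from the lecture):

```python
def speedup(stime, ptime):
    """Speedup is sequential time divided by parallel (wall-clock) time."""
    return stime / ptime

# Example: 100 units of work, each taking one time step.
stime = 100        # one worker does everything
ptime = 100 / 4    # four workers share the load evenly, no overhead
print(speedup(stime, ptime))  # 4.0
```

In practice communication and synchronization overhead keep the speedup below the number of workers.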

(3)

Models for parallel computing

▸ PRAM (shared memory)

▸ Multi-threading (Java, Python, ...)

▸ OpenMP (C, C++, Fortran)

▸ CUDA (GPU)

▸ Message Passing (local memory)

▸ MPI (C++)

▸ MapReduce, Hadoop

(4)

Theoretical Models for Parallel Computing

▸ Circuit Complexity

▸ PRAM (Parallel Random Access Model)

▸ MPC (Massively Parallel Computing)

▸ ...

(5)

Circuit Complexity

aka Non-uniform complexity

How many AND, OR, and NOT gates are needed to compute a boolean function f : {0,1}^n → {0,1}?

total number of gates: time complexity
depth of the circuit: parallel time complexity

(6)

[Theorem] For any boolean function f : {0,1}^n → {0,1}, there is a circuit with depth 4 and 2^(n+1) gates.

Related Complexity Classes:

▸ AC: functions solvable with (unbounded fan-in) circuits of log^O(1) n depth and n^O(1) gates.

unbounded fan-in gate: a gate with an arbitrary number of inputs

▸ AC^0: functions solvable with (unbounded fan-in) circuits of O(1) depth and n^O(1) gates.

▸ NC: functions solvable with (bounded fan-in) circuits of log^O(1) n depth and n^O(1) gates.

▸ NC^k: functions solvable with (bounded fan-in) circuits of O(log^k n) depth and n^O(1) gates.

(7)

Circuit Complexity

Examples:

▸ AND of n bits: f(x_1, ..., x_n) = x_1 ∧ ... ∧ x_n

AC: depth = 1, number of gates = 1 (a single unbounded fan-in AND)

NC: depth = ⌈log n⌉, number of gates = n − 1 (a balanced binary tree of fan-in-2 ANDs)
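The NC bounds can be checked by simulating the balanced binary tree of fan-in-2 AND gates level by level. The function below is an illustrative sketch, not part of the lecture; it returns the gate count and depth alongside the result:

```python
def and_tree(bits):
    """Evaluate the AND of n bits with a balanced tree of fan-in-2 AND
    gates. Returns (result, gates_used, depth) to illustrate the NC
    bounds: n - 1 gates and ceil(log2 n) depth."""
    level = list(bits)
    gates = 0
    depth = 0
    while len(level) > 1:
        nxt = []
        for i in range(0, len(level) - 1, 2):
            nxt.append(level[i] and level[i + 1])  # one fan-in-2 AND gate
            gates += 1
        if len(level) % 2 == 1:
            nxt.append(level[-1])  # odd element passes through this level
        level = nxt
        depth += 1
    return level[0], gates, depth

print(and_tree([1] * 8))  # (1, 7, 3): n - 1 = 7 gates, log2(8) = 3 depth
```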

(8)

Circuit Complexity

Examples:

▸ XOR of n bits: f(x_1, ..., x_n) = x_1 + ... + x_n mod 2

AC: depth = O(log n / log log n), number of gates = n^O(1)

NC: depth = O(log n), number of gates = O(n)

Planar Perfect Matching ∈ NC.  Perfect Matching ∈ NC?

(9)

Abstract Theoretical Models: PRAM

▸ Exclusive read exclusive write (EREW)

▸ Concurrent read exclusive write (CREW)

▸ Exclusive read concurrent write (ERCW)

▸ Concurrent read concurrent write (CRCW)

(10)

PRAM: Prefix sums

∀k ∈ [n],  S_k = ∑_{i=1}^{k} a[i]

n processors

log n parallel time
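One standard way to realize this bound is a doubling scan (Hillis–Steele style): in round d, every processor adds the value 2^d positions to its left. The code below is a sequential simulation of the parallel rounds; the function name and round counter are illustrative:

```python
def prefix_sums_pram(a):
    """Simulate the O(log n) PRAM scan. In each round, processor k adds
    the value d positions to its left, with d doubling every round.
    After ceil(log2 n) rounds, s[k] holds the inclusive prefix sum
    a[0] + ... + a[k]."""
    s = list(a)
    n = len(s)
    d = 1
    rounds = 0
    while d < n:
        # all processors read the old values, then write simultaneously
        s = [s[k] + (s[k - d] if k >= d else 0) for k in range(n)]
        d *= 2
        rounds += 1
    return s, rounds

sums, rounds = prefix_sums_pram([3, 1, 4, 1, 5, 9, 2, 6])
print(sums)    # [3, 4, 8, 9, 14, 23, 25, 31]
print(rounds)  # 3 = log2(8) rounds
```

Each round is O(1) parallel time with n processors, giving the stated log n bound.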

(11)

PRAM: Merging sorted arrays

n processors: P_1, ..., P_n

P_i binary searches array B to find how many numbers in B are greater than A[i]. This requires log n comparisons.

Having this information, processor P_i puts A[i] in its correct position in the output array.

log n parallel time
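The rank computation can be simulated sequentially, with Python's bisect playing the role of each processor's binary search. The sketch assumes all elements are distinct (an assumption added here so that output positions are unambiguous):

```python
import bisect

def pram_merge(A, B):
    """Simulate the O(log n) PRAM merge of two sorted arrays. Processor
    P_i ranks A[i] in B by binary search; its rank in A is just i. The
    output position is the sum of the two ranks. Assumes distinct
    elements across both arrays."""
    out = [None] * (len(A) + len(B))
    for i, x in enumerate(A):   # conceptually, all P_i run in parallel
        out[i + bisect.bisect_left(B, x)] = x
    for j, y in enumerate(B):   # symmetric processors for B's elements
        out[j + bisect.bisect_left(A, y)] = y
    return out

print(pram_merge([1, 4, 7], [2, 3, 9]))  # [1, 2, 3, 6, 7, 9] is wrong; prints [1, 2, 3, 4, 7, 9]
```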

(12)

PRAM (CRCW): find max

n^2 processors; n distinct numbers a[1], ..., a[n].

For each i ∈ [n], processor P_i writes m[i] = T.

Processor P_ij compares a[i] and a[j]; if a[i] < a[j], it writes m[i] = F. (Concurrent writes are allowed, and all writers to m[i] write the same value F.)

At the end, the unique index i with m[i] = T holds the maximum.

O(1) parallel time
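A sequential simulation of the two CRCW rounds (an illustrative sketch; the nested loops stand in for the n^2 comparison processors running simultaneously):

```python
def crcw_find_max(a):
    """Simulate the O(1) CRCW find-max on distinct numbers. Round 1:
    each P_i writes m[i] = True. Round 2: each P_ij clears m[i] when
    a[i] < a[j]; concurrent writes never conflict because every writer
    writes the same value False."""
    n = len(a)
    m = [True] * n                 # round 1
    for i in range(n):             # round 2: conceptually all in parallel
        for j in range(n):
            if a[i] < a[j]:
                m[i] = False
    return a[m.index(True)]        # the unique survivor is the maximum

print(crcw_find_max([5, 1, 9, 3]))  # 9
```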

(13)

Abstract Theoretical Models: MPC

Massively Parallel Computing

▸ N: input data size, M: number of machines, S: memory per machine

(14)

At first, the input data is distributed among the machines; each machine gets a part of the input. We assume S = o(N). Typically S = O(N^ε) for some ε < 1.

If S ≥ N, we could give all the data to a single machine, which could then solve the entire problem locally.

Computation is done in rounds.

In each round, the machines do local computations on their share of the input. Then, in a synchronous manner, they communicate with each other.

Each machine transmits at most O(S) words in each round!

With these assumptions, the real bottleneck is the number of rounds.
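A toy simulation of one MPC computation, summing N numbers, illustrates the round structure. The data split and function are illustrative, and we assume M ≤ S so that machine 0 can receive all the partial sums:

```python
def mpc_sum(data, M):
    """Toy MPC simulation: N numbers split across M machines, each with
    memory S ~ N/M. Round 1: each machine sums its local share. Round 2:
    every machine sends its one-word partial sum to machine 0, which
    adds them up. Requires M <= S so machine 0 can hold all partials."""
    shards = [data[i::M] for i in range(M)]      # initial distribution
    partials = [sum(shard) for shard in shards]  # round 1: local work
    # round 2: M words of communication, all received by machine 0
    return sum(partials), 2                      # (answer, rounds used)

total, rounds = mpc_sum(list(range(100)), M=10)
print(total, rounds)  # 4950 2
```

Both rounds respect the O(S)-words-per-machine communication limit, so the cost of the whole computation is just two rounds.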

(15)

MPC for graph problems

▸ Big graph G = (V, E) with m edges on n vertices

▸ Input size N = O(m)

▸ Each machine gets a subset of the edges (possibly with repetitions)

▸ We assume the number of machines is M = O(m/S)

▸ Strongly super-linear memory: S = O(n^(1+ε)) for some ε ∈ (0,1]

▸ Near-linear memory: S = Õ(n)

▸ Strongly sub-linear memory: S = O(n^ε) for some ε < 1

(16)

Broadcasting trees

Every machine sends at most S words in each round.

Suppose machine number 0 wants to broadcast n words to all M machines. How many rounds does this take?

Suppose S = n^(1+ε) (the super-linear case). Since machine 0 can send the n-word message to at most S/n = n^ε other machines per round, if M ≥ n broadcasting cannot be done in one round.

Using a broadcast tree, however, this can be done in O(1) rounds.

(17)

In the first round, machine 0 sends the n words to machines 1, ..., n^ε (the second level of the tree).

In the second round, each machine in the second level sends the n words to n^ε new machines (the third level), so after r rounds roughly n^(rε) machines are informed.

After O(1) rounds, all M ≤ n^2 machines have received the words.
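The round count can be sketched numerically: with per-round fan-out S/n = n^ε, the number of informed machines grows geometrically. The function below is an illustrative back-of-the-envelope calculation, not part of the lecture:

```python
def broadcast_rounds(M, n, eps):
    """Rounds needed by a broadcast tree with memory S = n^(1+eps): each
    informed machine can forward the n-word message to S/n = n^eps
    others per round, so after r rounds about n^(r*eps) machines are
    informed. Returns the first r with n^(r*eps) >= M."""
    fanout = n ** eps
    informed = 1.0
    rounds = 0
    while informed < M:
        informed *= fanout
        rounds += 1
    return rounds

# Reaching M = n^2 = 256 machines with fan-out n^0.5 = 4 takes 4 rounds.
print(broadcast_rounds(M=256, n=16, eps=0.5))  # 4
```

For constant ε, reaching M ≤ n^2 machines takes about 2/ε rounds, i.e. O(1) rounds as stated above.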
