Dr. Noha MM.
Computer Science Department, Thebes Academy
High Performance Computing (HPC)
Lecture 2
Concepts Covered
Parallel Processing
Concurrency
Multiprogramming
Multiprocessing
Multitasking
Distributed Systems
Sequential Computer Architecture
[Diagram: a CPU containing a Control Unit, Registers, and an Arithmetic & Logic Unit, connected to Dynamic Random Access Memory (DRAM) and an Input/Output (I/O) Controller]
Computer Performance Metrics
Processor speed: the rate at which the processor executes instructions (e.g., clock rate)
RAM bandwidth: the rate at which data is fetched from RAM (bytes or words per cycle)
Latency: the time between the processor issuing a memory request and the memory returning the data
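As a quick illustration (the numbers below are hypothetical, not taken from the lecture), peak RAM bandwidth follows directly from the transfer width per cycle and the memory clock rate:

```latex
\text{Peak bandwidth} = \text{bytes per cycle} \times \text{cycles per second}
                      = 8~\tfrac{\text{B}}{\text{cycle}} \times 3\times 10^{9}~\tfrac{\text{cycles}}{\text{s}}
                      = 24~\text{GB/s}
```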
Parallel Processing – What is it?
A parallel computer is a computer system that uses multiple processing elements
simultaneously in a cooperative manner to solve a computational problem
Parallel processing includes techniques and
technologies that make it possible to compute in parallel
❍ Hardware, networks, operating systems, parallel
libraries, languages, compilers, algorithms, tools, …
Concurrency
Consider multiple tasks to be executed in a computer simultaneously.
Tasks are concurrent with respect to each other if
They can execute at the same time (concurrent execution)
Implies that there are no dependencies between the tasks
Dependencies
If a task requires results produced by other tasks in order to execute correctly, the task’s execution is dependent
Producer/Consumer, Parent/Child
If two tasks are dependent, they are not concurrent
Some form of synchronization must be used to enforce (satisfy) dependencies
Lock/Unlock
Concurrency is fundamental to computer science
Operating systems, databases, networking, …
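A minimal sketch (plain Python with the standard threading module; the shared dictionary and value are only illustrative) of using a synchronization primitive to enforce a producer/consumer dependency:

```python
import threading

result = {}                         # data produced by one task, consumed by another
ready = threading.Event()           # synchronization object that enforces the dependency

def producer():
    result["value"] = 21 * 2        # produce the result the consumer depends on
    ready.set()                     # signal that the dependency is satisfied

def consumer():
    ready.wait()                    # block until the producer has finished
    print("consumed:", result["value"])

t_prod = threading.Thread(target=producer)
t_cons = threading.Thread(target=consumer)
t_cons.start(); t_prod.start()      # start order does not matter: the Event serializes them
t_prod.join(); t_cons.join()
```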
Concurrency and Parallelism
Concurrent is not the same as parallel! Why?
Parallel execution
Concurrent tasks actually execute at the same time
Multiple (processing) resources have to be available
Parallelism = concurrency + “parallel” hardware
Both are required
Find concurrent execution opportunities
Develop application to execute in parallel
Run application on parallel hardware
Parallelism
There are granularities of parallelism (parallel execution) in programs
Processes, threads, routines, statements, instructions, …
Think about which software elements execute concurrently
These must be supported by hardware resources
Processors, cores, … (execution of instructions)
Memory, DMA, networks, … (other associated operations)
All aspects of computer architecture offer opportunities for parallel hardware execution
Concurrency is a necessary condition for parallelism
Where can you find concurrency?
How is concurrency expressed to exploit parallel systems?
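One common way to express it (a sketch using Python's standard concurrent.futures; the square function is purely illustrative) is to hand independent work items to a pool of worker processes:

```python
from concurrent.futures import ProcessPoolExecutor

def square(x):
    return x * x                    # work items are independent: no dependencies between them

if __name__ == "__main__":
    # the pool exploits parallel hardware if multiple cores are available
    with ProcessPoolExecutor(max_workers=4) as pool:
        print(list(pool.map(square, range(8))))   # [0, 1, 4, 9, 16, 25, 36, 49]
```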
Parallel Computing Elements
[Diagram: layered view of a multi-processor computing system: Applications run on Programming Paradigms, which use a Threads Interface above a Microkernel/Operating System on the Hardware (multiple processors P); labels relate Process, Processor, and Thread]
Process Vs. Thread
Thread: a portion of a program that shares processor resources with other threads (also called a lightweight process).
Process: a program that is running on the computer. A process reserves its own computer resources, such as memory space and registers.
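A minimal sketch (Python's standard threading and multiprocessing modules; the work function and counter are only illustrative) contrasting a thread, which shares its parent process's memory, with a separate process, which gets its own:

```python
import threading
import multiprocessing

counter = 0                          # lives in this process's memory

def work():
    global counter
    counter += 1                     # a thread updates the shared counter; a child process updates its own copy

if __name__ == "__main__":
    t = threading.Thread(target=work)          # shares this process's address space
    p = multiprocessing.Process(target=work)   # reserves its own memory space
    t.start(); p.start()
    t.join(); p.join()
    print(counter)                   # prints 1: only the thread's update is visible here
```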
Why use parallel processing?
Two primary reasons (performance)
Faster time to solution (response time)
Solve bigger computing problems (in same time)
Other factors motivate parallel processing
Effective use of machine resources (resource Utilization)
Cost efficiencies
Overcoming memory constraints
Serial machines have inherent limitations
Processor speed, memory, …
Parallelism has become the future of computing
Perspectives on Parallel Processing
Parallel computer architecture
Hardware needed for parallel execution?
Computer system design
(Parallel) Operating system
How to manage system aspects in a parallel computer
Parallel programming
Libraries (low-level, high-level)
Languages
Software development environments
Parallel algorithms
Parallel performance evaluation
Parallel tools
Performance, analytics, visualization, …
Why Use Parallel Computing
Save time –many processors work together
Solve Larger Problems – larger than one processor’s CPU and memory can handle
Provide Concurrency – do multiple things at the same time:
E.g., online access to databases, search engines
Google’s 4,000 PC servers form one of the largest clusters in the world
Types of Concurrent Systems
Multiprogramming
Multiprocessing
Multitasking
Distributed Systems
Multiprogramming
Share a single CPU among many users or tasks.
May have a time-shared algorithm or a priority algorithm for determining which task to run next
Give the illusion of simultaneous processing through rapid swapping of tasks (interleaving)
[Diagram: one CPU time-shared between User 1 and User 2, both resident in memory]
Multiprocessing
Executes multiple tasks at the same time
Uses multiple processors to accomplish the tasks
Each processor may also timeshare among several tasks
Has a shared memory that is used by all the tasks
[Diagram: multiple CPUs sharing one memory that holds tasks from User 1 and User 2]
Multitasking
A single user can have multiple tasks running at the same time.
Can be done with one or more processors.
Used to be rare and only for expensive multiprocessing systems, but now most modern operating systems can do it.
[Diagram: one CPU and memory holding Task1, Task2, and Task3, all belonging to User 1]
Distributed Systems
Multiple computers working together with no central program “in charge”.
[Diagram: a distributed banking system in which a Central Bank communicates with ATMs at Buford, Perimeter, Student Ctr, and North Ave]
Parallelism
Using multiple processors to solve a single task.
Involves:
◦ Breaking the task into meaningful pieces
◦ Doing the work on many processors
◦ Coordinating and putting the pieces back together.
Pipeline Processing
Repeating a sequence of operations or pieces of a task (Interleaving).
Allocating each piece to a separate processor ( or Function Unit) and chaining them together produces a pipeline, completing tasks faster.
Example:
◦ Instruction cycle in modern computers: Fetch Instruction → Decode → Fetch Operand → Execute → Store Result
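A minimal sketch (a hypothetical three-stage pipeline built from Python generators; a real hardware pipeline runs its stages simultaneously, this only shows how the stages are chained):

```python
def fetch(program):
    for instr in program:            # stage 1: fetch the next instruction
        yield instr

def decode(instrs):
    for instr in instrs:             # stage 2: decode into opcode and operand
        op, arg = instr.split()
        yield op, int(arg)

def execute(decoded):
    acc = 0
    for op, arg in decoded:          # stage 3: execute and store the result
        acc = acc + arg if op == "ADD" else acc - arg
        yield acc

program = ["ADD 5", "ADD 3", "SUB 2"]
print(list(execute(decode(fetch(program)))))   # [5, 8, 6]
```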
Task Queues
[Diagram: processors P1, P2, P3, …, Pn all drawing work from a single supervisor task queue]
A supervisor processor maintains a queue of tasks to be performed in shared memory.
Each processor queries the queue, dequeues the next task, and performs it.
Task execution may involve adding more tasks to the task queue.
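A minimal sketch (Python's standard multiprocessing module; the squaring task and the number of workers are only illustrative) of worker processes repeatedly dequeuing tasks from a shared queue filled by a supervisor:

```python
import multiprocessing as mp

def worker(tasks, results):
    while True:
        n = tasks.get()              # query the queue and dequeue the next task
        if n is None:                # sentinel value: no more work
            break
        results.put(n * n)           # perform the task

if __name__ == "__main__":
    tasks, results = mp.Queue(), mp.Queue()
    workers = [mp.Process(target=worker, args=(tasks, results)) for _ in range(4)]
    for w in workers:
        w.start()
    for n in range(10):              # the supervisor enqueues the tasks
        tasks.put(n)
    for _ in workers:                # one sentinel per worker
        tasks.put(None)
    for w in workers:
        w.join()
    print(sorted(results.get() for _ in range(10)))
```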
Parallelizing Algorithms
How much gain (performance) can we get from parallelizing an algorithm?
Number of Processors
Processors are limited by hardware.
Usually: the number of processors is a constant factor, 2^K.
Imaginable: networked computers joined as needed.
Adding Processors
A program on one processor
◦ Runs in time X
Adding another processor
◦ Runs in no more than X/2 time
◦ Realistically, it will run in more than X/2 time because of overhead
At some point, adding processors will not help and could degrade performance.
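One standard way to state this (generic speedup notation, not taken from the slides): if $T_1$ is the one-processor time and $T_o(p)$ the coordination overhead on $p$ processors, then

```latex
T_p \;\ge\; \frac{T_1}{p} + T_o(p),
\qquad
S(p) \;=\; \frac{T_1}{T_p} \;\le\; p .
```

Because $T_o(p)$ typically grows with $p$, $T_p$ eventually stops shrinking and can even increase, which is why adding processors can degrade performance.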
Overhead of Parallelization
Processors must be controlled and coordinated.
We need a way to govern which processor does what work; this involves extra work.
The program must be written in a special programming language for parallel systems.
A program parallelized for one machine (with, say, 2^K processors) doesn’t work on other machines (with, say, 2^L processors)
What We Know about Tasks
Relatively isolated units of computation
Should be roughly equal in duration
Duration of the unit of work must be much greater than the overhead time (i.e., a high computation-to-communication ratio, CCR)
Policy decisions and coordination required for shared data
von Neumann Architecture
Common machine model
◦ for over 50 years
Stored-program concept
CPU executes a stored program
A sequence of read and write operations on the memory (RAM)
Order of operations is sequential
A More Detailed Architecture based on von Neumann Model
Motivations for Parallel Computing
Fundamental limits on single processor speed
Disparity between CPU & memory speed
◦ Performance Mismatch Problem
Distributed data communications
Need for very large scale computing platforms
Performance Mismatch Problem
Possible Solutions
A hierarchy of successively faster memory devices (multilevel caches)
Locality of data reference (data locality)
Efficient programming can be an issue
Parallel systems may provide
1. larger aggregate cache
2. higher aggregate bandwidth to the memory system
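A minimal sketch (plain Python with a nested list standing in for a row-major 2-D array; the cache effect itself only shows up clearly with contiguous arrays, e.g., in C or NumPy) of how access order determines data locality:

```python
N = 1000
a = [[1.0] * N for _ in range(N)]    # row-major layout: each row is stored contiguously

def sum_row_major(a):
    s = 0.0
    for i in range(N):               # visit elements in storage order: good locality
        for j in range(N):
            s += a[i][j]
    return s

def sum_col_major(a):
    s = 0.0
    for j in range(N):               # jump to a different row on every access: poor locality
        for i in range(N):
            s += a[i][j]
    return s
```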
Why Use Parallel Computing (cont.)
Taking advantage of non-local resources – using computing resources on a wide area network, or even the Internet (Grid & Cloud Computing)
◦ Remote Access Resources
Cost savings – using multiple “cheap” computing resources instead of a high-end CPU
Overcoming memory constraints –for large problems, using memories of multiple computers may overcome the memory constraint obstacle