(1)

Dr. Noha MM.

Computer Science Department Thebes Academy

High Performance Computing (HPC)

Lecture 2

(2)

Concepts Covered

Parallel Processing

Concurrency

Multiprogramming

Multiprocessing

Multitasking

Distributed Systems

(3)

Sequential Computer Architecture

[Figure: a sequential computer: the CPU (control unit, registers, arithmetic & logic unit), Dynamic Random Access Memory (DRAM), and an Input/Output (I/O) controller, linked by an instruction stream and a data stream.]

Computer Performance Metrics

Processor speed: rate at which the CPU executes instructions

RAM Bandwidth: rate at which data is fetched from RAM (bytes or words per cycle)

Latency: time taken by the memory to receive a request from the processor and return the data to it
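As a back-of-the-envelope illustration of how these metrics combine, the short calculation below fetches one cache-line-sized block; the bus width, clock rate, latency, and block size are assumed example figures, not numbers from the lecture.

# Illustrative memory performance calculation (all figures are assumptions)
bytes_per_cycle = 8          # bus delivers 8 bytes per cycle
bus_clock_hz = 200e6         # 200 MHz memory bus
bandwidth = bytes_per_cycle * bus_clock_hz              # bytes per second
print(f"Peak RAM bandwidth: {bandwidth / 1e9:.1f} GB/s")  # 1.6 GB/s
latency_s = 100e-9           # 100 ns before the first byte arrives
line_bytes = 64              # one cache-line-sized request
fetch_time = latency_s + line_bytes / bandwidth
print(f"Time to fetch a 64-byte line: {fetch_time * 1e9:.0f} ns")  # ~140 ns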

(4)

Parallel Processing – What is it?

A parallel computer is a computer system that uses multiple processing elements

simultaneously in a cooperative manner to solve a computational problem

Parallel processing includes techniques and

technologies that make it possible to compute in parallel

Hardware, networks, operating systems, parallel

libraries, languages, compilers, algorithms, tools, …

(5)

Concurrency

Consider multiple tasks to be executed in a computer simultaneously.

Tasks are concurrent with respect to each other if

They can execute at the same time (concurrent execution)

Implies that there are no dependencies between the tasks

Dependencies

If a task requires results produced by other tasks in order to execute correctly, the task’s execution is dependent

Producer/Consumer, Parent/Child

If two tasks are dependent, they are not concurrent

Some form of synchronization must be used to enforce (satisfy) dependencies

Lock/Unlock (see the sketch below)

Concurrency is fundamental to computer science

Operating systems, databases, networking, …
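A minimal sketch of a dependency enforced by synchronization, using Python's standard threading module: the consumer task cannot proceed until the producer task has published its result, so a condition variable (a lock plus wait/notify) satisfies the dependency. The task bodies and values are illustrative.

import threading

result = {}                           # data produced by one task, needed by another
ready = threading.Condition()         # lock + wait/notify to enforce the dependency

def producer():
    value = sum(range(1_000_000))     # the work the consumer depends on
    with ready:                       # lock
        result["value"] = value
        ready.notify()                # dependency satisfied; unlock on block exit

def consumer():
    with ready:
        while "value" not in result:  # cannot proceed until the producer is done
            ready.wait()              # releases the lock while waiting
    print("consumer uses:", result["value"])

tasks = [threading.Thread(target=consumer), threading.Thread(target=producer)]
for t in tasks:
    t.start()
for t in tasks:
    t.join()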

(6)

Concurrency and Parallelism

Concurrent is not the same as parallel! Why?

Parallel execution

Concurrent tasks actually execute at the same time

Multiple (processing) resources have to be available

Parallelism = concurrency + “parallel” hardware

Both are required

Find concurrent execution opportunities

Develop application to execute in parallel

Run application on parallel hardware

(7)

Parallelism

There are granularities of parallelism (parallel execution) in programs

Processes, threads, routines, statements, instructions, …

Think about which software elements execute concurrently

These must be supported by hardware resources

Processors, cores, … (execution of instructions)

Memory, DMA, networks, … (other associated operations)

All aspects of computer architecture offer opportunities for parallel hardware execution

Concurrency is a necessary condition for parallelism

Where can you find concurrency?

How is concurrency expressed to exploit parallel systems?

(8)

Parallel Computing Elements

[Figure: elements of a parallel computing system, layered from hardware to applications — a multi-processor computing system (processors P … P) at the bottom, the operating system / microkernel managing processes, threads, and processors, a threads interface above it, and applications with their programming paradigms at the top.]

(9)

Process Vs. Thread

Thread: a portion of a program that shares processor resources with other threads (called a lightweight process).

Process: a program that is running on the computer. A process reserves its own computer resources, such as memory space and registers.
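A small Python sketch of the difference, assuming the standard threading and multiprocessing modules: the thread shares the parent's memory space, so its update is visible afterwards; the process gets its own memory space, so its update stays inside the child.

import threading
import multiprocessing

counter = 0

def bump():
    global counter
    counter += 1

if __name__ == "__main__":
    t = threading.Thread(target=bump)        # lightweight: shares this memory space
    t.start(); t.join()
    print("after thread:", counter)          # 1 – the thread's change is visible
    p = multiprocessing.Process(target=bump) # heavyweight: own memory and registers
    p.start(); p.join()
    print("after process:", counter)         # still 1 – the change stayed in the child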

(10)

Why use parallel processing?

Two primary reasons (performance)

Faster time to solution (response time)

Solve bigger computing problems (in same time)

Other factors motivate parallel processing

Effective use of machine resources (resource Utilization)

Cost efficiencies

Overcoming memory constraints

Serial machines have inherent limitations

Processor speed, memory, …

Parallelism has become the future of computing

(11)

Perspectives on Parallel Processing

Parallel computer architecture

Hardware needed for parallel execution?

Computer system design

(Parallel) Operating system

How to manage systems aspects in a parallel computer

Parallel programming

Libraries (low-level, high-level)

Languages

Software development environments

Parallel algorithms

Parallel performance evaluation

Parallel tools

Performance, analytics, visualization, …

(12)

Why Use Parallel Computing

Save time –many processors work together

Solve Larger Problems – larger than one processor’s CPU and memory can handle

Provide Concurrency – do multiple things at the same time:

E.g., online access to databases, search engines

Google's cluster of 4,000 PC servers is one of the largest clusters in the world

(13)

Types of Concurrent Systems

Multiprogramming

Multiprocessing

Multitasking

Distributed Systems

(14)

Multiprogramming

Share a single CPU among many users or tasks.

May have a time-shared algorithm or a priority algorithm for determining which task to run next

Give the illusion of simultaneous processing through rapid swapping of tasks (interleaving)

[Figure: a single CPU and a memory holding both User 1 and User 2 programs.]

(15)

Multiprocessing

Executes multiple tasks at the same time

Uses multiple processors to accomplish the tasks

Each processor may also timeshare among several tasks

Has a shared memory that is used by all the tasks

[Figure: multiple CPUs sharing one memory that holds User 1: Task1, User 1: Task2, and User 2: Task1.]
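A minimal sketch of this model with Python's multiprocessing module: several processes run at the same time and update a value placed in memory shared by all the tasks. The work split and the count of four processes are illustrative.

import multiprocessing

def task(i, total):
    partial = sum(range(i * 1000, (i + 1) * 1000))   # this task's share of the work
    with total.get_lock():                           # serialize updates to shared memory
        total.value += partial

if __name__ == "__main__":
    total = multiprocessing.Value("q", 0)            # integer placed in shared memory
    procs = [multiprocessing.Process(target=task, args=(i, total)) for i in range(4)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    print("result from 4 processes:", total.value)   # == sum(range(4000)) == 7998000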

(16)

Multitasking

A single user can have multiple tasks running at the same time.

Can be done with one or more processors.

Used to be rare and found only on expensive multiprocessing systems, but now most modern operating systems can do it.

[Figure: a single CPU and a memory holding User 1: Task1, Task2, and Task3.]

(17)

Distributed Systems

Multiple computers working together with no central program “in charge”.

[Figure: ATMs at Buford, Perimeter, Student Ctr, and North Ave all communicating with a Central Bank.]

(18)

Parallelism

Using multiple processors to solve a single task.

Involves:

Breaking the task into meaningful pieces

Doing the work on many processors

Coordinating and putting the pieces back together.

(19)

Pipeline Processing

Repeating a sequence of operations or pieces of a task (Interleaving).

Allocating each piece to a separate processor ( or Function Unit) and chaining them together produces a pipeline, completing tasks faster.

Example:

The instruction cycle in modern computers:

Fetch Instruction → Decode → Fetch Operand → Execute → Store Result
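A toy software pipeline in Python, assuming the standard queue and threading modules: each stage runs on its own worker, like a separate function unit, and items are chained from one stage to the next. The three stages stand in for decode / execute / store; the actual operations are placeholders.

import queue
import threading

def stage(fn, inbox, outbox):
    # One pipeline stage: repeatedly take an item, process it, pass it on.
    while True:
        item = inbox.get()
        if item is None:          # sentinel: drain the pipeline and stop
            outbox.put(None)
            return
        outbox.put(fn(item))

fetch_q, decode_q, exec_q, done_q = (queue.Queue() for _ in range(4))
workers = [
    threading.Thread(target=stage, args=(lambda x: ("decoded", x), fetch_q, decode_q)),
    threading.Thread(target=stage, args=(lambda d: (d[0], d[1] * 2), decode_q, exec_q)),
    threading.Thread(target=stage, args=(lambda d: d, exec_q, done_q)),   # "store"
]
for w in workers:
    w.start()

for instr in range(5):            # "fetch" five instructions into the pipeline
    fetch_q.put(instr)
fetch_q.put(None)

item = done_q.get()
while item is not None:
    print("stored:", item)
    item = done_q.get()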

(20)

Task Queues

[Figure: processors P1, P2, P3, …, Pn all serving one supervisor task queue in shared memory.]

A supervisor processor maintains a

queue of tasks to be performed in shared memory.

Each processor queries the queue,

dequeues the next task and performs it.

Task execution may involve adding more tasks to the task queue.
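A minimal sketch of this pattern with Python's queue and threading modules (the worker threads stand in for processors P1..Pn): workers dequeue tasks from a shared queue, and performing a task may enqueue further tasks. The task itself — splitting a number until unit-sized pieces remain — is purely illustrative, and CPython threads interleave rather than run truly in parallel.

import queue
import threading

task_queue = queue.Queue()           # the supervisor's shared task queue
finished = []                        # unit tasks that were actually performed

def worker():
    while True:
        n = task_queue.get()
        if n is None:                # shutdown signal from the supervisor
            task_queue.task_done()
            return
        if n > 1:                    # doing a task may add more tasks to the queue
            task_queue.put(n // 2)
            task_queue.put(n - n // 2)
        else:
            finished.append(n)
        task_queue.task_done()

workers = [threading.Thread(target=worker) for _ in range(4)]   # "P1..P4"
for w in workers:
    w.start()

task_queue.put(16)                   # the supervisor seeds the queue
task_queue.join()                    # wait until every task and subtask is done
for _ in workers:
    task_queue.put(None)
for w in workers:
    w.join()
print("unit tasks performed:", len(finished))    # 16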

(21)

Parallelizing Algorithms

How much gain can we get from parallelizing an algorithm?

{Performance}

(22)

Number of Processors

Processors are limited by hardware.

Usually: the number of processors is a constant factor, e.g. 2^K

Imaginable: networked computers joined as needed.

(23)

Adding Processors

A program on one processor

Runs in X time

Adding another processor

Runs in no more than X/2 time

Realistically it will run in somewhat more than X/2 time because of overhead

At some point, adding processors will not help and could degrade performance.
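A toy model of this effect (all numbers are assumptions, not measurements): the ideal time X/p is offset by a coordination cost that grows with the number of processors, so the total time has a sweet spot and then degrades.

X = 100.0            # time on one processor, in seconds (assumed)
overhead = 2.0       # coordination cost per extra processor, in seconds (assumed)

def run_time(p):
    return X / p + overhead * (p - 1)    # ideal speedup plus control/coordination cost

for p in (1, 2, 4, 8, 16, 32):
    print(f"{p:2d} processors: {run_time(p):6.1f} s   (ideal {X / p:6.2f} s)")
# Best near p = sqrt(X / overhead) ≈ 7; beyond that, adding processors slows things down.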

(24)

Overhead of Parallelization

Processors must be controlled and coordinated.

We need a way to govern which processor does what work; this involves extra work.

The program must be written in a special programming language for parallel systems.

A parallelized program for one machine (with, say, 2^K processors) doesn't work on other machines (with, say, 2^L processors)

(25)

What We Know about Tasks

Relatively isolated units of computation

Should be roughly equal in duration

Duration of the unit of work must be much greater than the overhead time (the computation-to-communication ratio, CCR)

Policy decisions and coordination required for shared data
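A quick arithmetic check of the granularity rule (the numbers are assumed): the computation-to-communication ratio compares useful work per task to the overhead of scheduling and coordinating it.

work_per_task = 50e-3        # seconds of useful computation per task (assumed)
overhead_per_task = 1e-3     # seconds of scheduling/communication per task (assumed)
ccr = work_per_task / overhead_per_task
efficiency = work_per_task / (work_per_task + overhead_per_task)
print(f"CCR = {ccr:.0f}, efficiency = {efficiency:.1%}")   # CCR = 50, efficiency = 98.0%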

(26)

von Neumann Architecture

Common machine model

for over 50 years

Stored-program concept

CPU executes a stored program

A sequence of read and write operations on the memory (RAM)

Order of operations is sequential

(27)

A More Detailed Architecture based on von Neumann Model

(28)

Motivations for Parallel Computing

Fundamental limits on single processor speed

Disparity between CPU & memory speed

Performance Mismatch Problem

Distributed data communications

Need for very large scale computing platforms

(29)

Performance Mismatch Problem

(30)

Possible Solutions

A hierarchy of successively fast memory devices (multilevel caches)

Locality of data reference (data locality)

Efficient programming can be an issue

Parallel systems may provide

1. larger aggregate cache

2. higher aggregate bandwidth to the memory system

(31)

Why Use Parallel Computing

Save time –many processors work together

Solve Larger Problems –larger than one processor’s CPU and memory can handle

Provide Concurrency – do multiple things at the same time:

E.g., online access to databases, search engines

Google's cluster of 4,000 PC servers is one of the largest clusters in the world

(32)

Cont.

Taking advantage of non-local resources – using computing resources on a wide area network, or even the internet (Grid & Cloud Computing)

Remote access to resources

Cost savings – using multiple “cheap” computing resources instead of a high-end CPU

Overcoming memory constraints – for large problems, using the memories of multiple computers may overcome the memory constraint obstacle

(33)
