Dr. Noha MM.
Computer Science Department, Thebes Academy
High Performance Computing (HPC)
Lecture 2
Concepts Covered
Parallel Processing
Concurrency
Multiprogramming
Multiprocessing
Multitasking
Distributed Systems
Sequential Computer Architecture
[Diagram: a CPU containing a Control Unit, Registers, and an Arithmetic & Logic Unit, connected to Dynamic Random Access Memory (DRAM) and an Input/Output (I/O) Controller]
Computer Performance Metrics
Processor speed: the rate at which the processor executes instructions (e.g., clock rate)
RAM bandwidth: the rate at which data is fetched from RAM (bytes or words per cycle)
Latency: the time between the processor issuing a memory request and the memory returning the data
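As a quick illustration (the numbers below are hypothetical, not taken from the lecture), peak RAM bandwidth follows directly from the transfer width per cycle and the memory clock rate:

```latex
\text{Peak bandwidth} = \text{bytes per cycle} \times \text{cycles per second}
                      = 8~\tfrac{\text{B}}{\text{cycle}} \times 3\times 10^{9}~\tfrac{\text{cycles}}{\text{s}}
                      = 24~\text{GB/s}
```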
Parallel Processing – What is it?
A parallel computer is a computer system that uses multiple processing elements
simultaneously in a cooperative manner to solve a computational problem
Parallel processing includes techniques and
technologies that make it possible to compute in parallel
❍ Hardware, networks, operating systems, parallel
libraries, languages, compilers, algorithms, tools, …
Concurrency
Consider multiple tasks to be executed in a computer simultaneously.
Tasks are concurrent with respect to each other if
They can execute at the same time (concurrent execution)
Implies that there are no dependencies between the tasks
Dependencies
If a task requires results produced by other tasks in order to execute correctly, the task’s execution is dependent
Producer/Consumer, Parent/Child
If two tasks are dependent, they are not concurrent
Some form of synchronization must be used to enforce (satisfy) dependencies
Lock/Unlock
Concurrency is fundamental to computer science
Operating systems, databases, networking, …
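A minimal sketch (plain Python with the standard threading module; the shared dictionary and value are only illustrative) of using a synchronization primitive to enforce a producer/consumer dependency:

```python
import threading

result = {}                         # data produced by one task, consumed by another
ready = threading.Event()           # synchronization object that enforces the dependency

def producer():
    result["value"] = 21 * 2        # produce the result the consumer depends on
    ready.set()                     # signal that the dependency is satisfied

def consumer():
    ready.wait()                    # block until the producer has finished
    print("consumed:", result["value"])

t_prod = threading.Thread(target=producer)
t_cons = threading.Thread(target=consumer)
t_cons.start(); t_prod.start()      # start order does not matter: the Event serializes them
t_prod.join(); t_cons.join()
```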
Concurrency and Parallelism
Concurrent is not the same as parallel! Why?
Parallel execution
Concurrent tasks actually execute at the same time
Multiple (processing) resources have to be available
Parallelism = concurrency + “parallel” hardware
Both are required
Find concurrent execution opportunities
Develop application to execute in parallel
Run application on parallel hardware
Parallelism
There are granularities of parallelism (parallel execution) in programs
Processes, threads, routines, statements, instructions, …
Think about which software elements execute concurrently
These must be supported by hardware resources
Processors, cores, … (execution of instructions)
Memory, DMA, networks, … (other associated operations)
All aspects of computer architecture offer opportunities for parallel hardware execution
Concurrency is a necessary condition for parallelism
Where can you find concurrency?
How is concurrency expressed to exploit parallel systems?
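One common way to express it (a sketch using Python's standard concurrent.futures; the square function is purely illustrative) is to hand independent work items to a pool of worker processes:

```python
from concurrent.futures import ProcessPoolExecutor

def square(x):
    return x * x                    # work items are independent: no dependencies between them

if __name__ == "__main__":
    # the pool exploits parallel hardware if multiple cores are available
    with ProcessPoolExecutor(max_workers=4) as pool:
        print(list(pool.map(square, range(8))))   # [0, 1, 4, 9, 16, 25, 36, 49]
```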
Parallel Computing Elements
[Diagram: layered view of a multi-processor computing system: Applications run on Programming Paradigms, which use a Threads Interface above a Microkernel/Operating System on the Hardware (multiple processors P); labels relate Process, Processor, and Thread]
Process Vs. Thread
Thread: a portion of a program that shares processor resources with other threads (also called a lightweight process).
Process: a program that is running on the computer. A process reserves its own computer resources, such as memory space and registers.
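A minimal sketch (Python's standard threading and multiprocessing modules; the work function and counter are only illustrative) contrasting a thread, which shares its parent process's memory, with a separate process, which gets its own:

```python
import threading
import multiprocessing

counter = 0                          # lives in this process's memory

def work():
    global counter
    counter += 1                     # a thread updates the shared counter; a child process updates its own copy

if __name__ == "__main__":
    t = threading.Thread(target=work)          # shares this process's address space
    p = multiprocessing.Process(target=work)   # reserves its own memory space
    t.start(); p.start()
    t.join(); p.join()
    print(counter)                   # prints 1: only the thread's update is visible here
```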
Why use parallel processing?
Two primary reasons (performance)
Faster time to solution (response time)
Solve bigger computing problems (in same time)
Other factors motivate parallel processing
Effective use of machine resources (resource Utilization)
Cost efficiencies
Overcoming memory constraints
Serial machines have inherent limitations
Processor speed, memory, …
Parallelism has become the future of computing
Perspectives on Parallel Processing
Parallel computer architecture
Hardware needed for parallel execution?
Computer system design
(Parallel) Operating system
How to manage system aspects in a parallel computer
Parallel programming
Libraries (low-level, high-level)
Languages
Software development environments
Parallel algorithms
Parallel performance evaluation
Parallel tools
Performance, analytics, visualization, …
Why Use Parallel Computing
Save time –many processors work together
Solve Larger Problems – larger than one processor’s CPU and memory can handle
Provide Concurrency – do multiple things at the same time:
E.g., online access to databases, search engines
Google’s 4,000 PC servers form one of the largest clusters in the world
Types of Concurrent Systems
Multiprogramming
Multiprocessing
Multitasking
Distributed Systems
Multiprogramming
Share a single CPU among many users or tasks.
May have a time-shared algorithm or a priority algorithm for determining which task to run next
Give the illusion of simultaneous processing through rapid swapping of tasks (interleaving)
[Diagram: one CPU time-shared between User 1 and User 2, both resident in memory]
Multiprocessing
Executes multiple tasks at the same time
Uses multiple processors to accomplish the tasks
Each processor may also timeshare among several tasks
Has a shared memory that is used by all the tasks
[Diagram: multiple CPUs sharing one memory that holds tasks from User 1 and User 2]
Multitasking
A single user can have multiple tasks running at the same time.
Can be done with one or more processors.
Used to be rare and only for expensive multiprocessing systems, but now most modern operating systems can do it.
[Diagram: one CPU and memory holding Task1, Task2, and Task3, all belonging to User 1]
Distributed Systems
Multiple computers working together with no central program “in charge”.
[Diagram: a distributed banking system in which a Central Bank communicates with ATMs at Buford, Perimeter, Student Ctr, and North Ave]
Parallelism
Using multiple processors to solve a single task.
Involves:
◦ Breaking the task into meaningful pieces
◦ Doing the work on many processors
◦ Coordinating and putting the pieces back together.
Pipeline Processing
Repeating a sequence of operations or pieces of a task (Interleaving).
Allocating each piece to a separate processor ( or Function Unit) and chaining them together produces a pipeline, completing tasks faster.
Example:
◦ Instruction cycle in modern computers: Fetch Instruction → Decode → Fetch Operand → Execute → Store Result
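A minimal sketch (a hypothetical three-stage pipeline built from Python generators; a real hardware pipeline runs its stages simultaneously, this only shows how the stages are chained):

```python
def fetch(program):
    for instr in program:            # stage 1: fetch the next instruction
        yield instr

def decode(instrs):
    for instr in instrs:             # stage 2: decode into opcode and operand
        op, arg = instr.split()
        yield op, int(arg)

def execute(decoded):
    acc = 0
    for op, arg in decoded:          # stage 3: execute and store the result
        acc = acc + arg if op == "ADD" else acc - arg
        yield acc

program = ["ADD 5", "ADD 3", "SUB 2"]
print(list(execute(decode(fetch(program)))))   # [5, 8, 6]
```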
Task Queues
[Diagram: processors P1, P2, P3, …, Pn all drawing work from a single supervisor task queue]
A supervisor processor maintains a queue of tasks to be performed in shared memory.
Each processor queries the queue, dequeues the next task, and performs it.
Task execution may involve adding more tasks to the task queue.
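A minimal sketch (Python's standard multiprocessing module; the squaring task and the number of workers are only illustrative) of worker processes repeatedly dequeuing tasks from a shared queue filled by a supervisor:

```python
import multiprocessing as mp

def worker(tasks, results):
    while True:
        n = tasks.get()              # query the queue and dequeue the next task
        if n is None:                # sentinel value: no more work
            break
        results.put(n * n)           # perform the task

if __name__ == "__main__":
    tasks, results = mp.Queue(), mp.Queue()
    workers = [mp.Process(target=worker, args=(tasks, results)) for _ in range(4)]
    for w in workers:
        w.start()
    for n in range(10):              # the supervisor enqueues the tasks
        tasks.put(n)
    for _ in workers:                # one sentinel per worker
        tasks.put(None)
    for w in workers:
        w.join()
    print(sorted(results.get() for _ in range(10)))
```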
Parallelizing Algorithms
How much gain (performance) can we get from parallelizing an algorithm?
Number of Processors
Processors are limited by hardware.
Usually: the number of processors is a constant factor, 2^K.
Imaginable: networked computers joined as needed.
Adding Processors
A program on one processor
◦ Runs in time X
Adding another processor
◦ Runs in no more than X/2 time
◦ Realistically, it will run in more than X/2 time because of overhead
At some point, adding processors will not help and could degrade performance.
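One standard way to state this (generic speedup notation, not taken from the slides): if $T_1$ is the one-processor time and $T_o(p)$ the coordination overhead on $p$ processors, then

```latex
T_p \;\ge\; \frac{T_1}{p} + T_o(p),
\qquad
S(p) \;=\; \frac{T_1}{T_p} \;\le\; p .
```

Because $T_o(p)$ typically grows with $p$, $T_p$ eventually stops shrinking and can even increase, which is why adding processors can degrade performance.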
Overhead of Parallelization
Processors must be controlled and coordinated.
We need a way to govern which processor does what work; this involves extra work.
The program must be written in a special programming language for parallel systems.
A program parallelized for one machine (with, say, 2^K processors) doesn’t work on other machines (with, say, 2^L processors)
What We Know about Tasks
Relatively isolated units of computation
Should be roughly equal in duration
Duration of the unit of work must be much greater than the overhead time (i.e., a high computation-to-communication ratio, CCR)
Policy decisions and coordination required for shared data
von Neumann Architecture
Common machine model
◦ for over 50 years
Stored-program concept
CPU executes a stored program
A sequence of read and write operations on the memory (RAM)
Order of operations is sequential
A More Detailed Architecture based on von Neumann Model
Motivations for Parallel Computing
Fundamental limits on single processor speed
Disparity between CPU & memory speed
◦ Performance Mismatch Problem
Distributed data communications
Need for very large scale computing platforms
Performance Mismatch Problem
Possible Solutions
A hierarchy of successively faster memory devices (multilevel caches)
Locality of data reference (data locality)
Efficient programming can be an issue
Parallel systems may provide
1. larger aggregate cache
2. higher aggregate bandwidth to the memory system
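A minimal sketch (plain Python with a nested list standing in for a row-major 2-D array; the cache effect itself only shows up clearly with contiguous arrays, e.g., in C or NumPy) of how access order determines data locality:

```python
N = 1000
a = [[1.0] * N for _ in range(N)]    # row-major layout: each row is stored contiguously

def sum_row_major(a):
    s = 0.0
    for i in range(N):               # visit elements in storage order: good locality
        for j in range(N):
            s += a[i][j]
    return s

def sum_col_major(a):
    s = 0.0
    for j in range(N):               # jump to a different row on every access: poor locality
        for i in range(N):
            s += a[i][j]
    return s
```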
Why Use Parallel Computing (cont.)
Taking advantage of non-local resources – using computing resources on a wide area network, or even the Internet (Grid & Cloud Computing)
◦ Remote Access Resources
Cost savings – using multiple “cheap” computing resources instead of a high-end CPU
Overcoming memory constraints –for large problems, using memories of multiple computers may overcome the memory constraint obstacle