Dr. Noha MM.
Computer Science Department, Thebes Academy
High Performance Computing (HPC)
Lecture 1
Content
Performance and Metrics
Anatomy of A Supercomputer
Serial Computing vs. Parallel Computing
Elementary Steps in Parallel Programming
Introduction
Why HPC
Introduction
• High Performance Computing (HPC) refers to the practice of aggregating computing power to deliver much higher performance than a single desktop computer can provide, in order to solve large problems in science, engineering, simulation, modeling, and big data analytics.
• HPC is really a collection of multiple interrelated disciplines, each providing an important aspect of the total field.
• HPC is a field that relates to all facets of technology, methodology, and application associated with achieving the greatest computing capability possible at any given point in time.
Why HPC
• Scientific simulation and modelling of complex physical phenomena, complex biological systems, and complex social behaviors drive the need for ever greater computing power.
• Single-core processors cannot be built with enough resources for the simulations we need to run.
• Making processors with faster clock speeds is difficult due to cost and power/heat limitations.
• It is expensive to attach huge amounts of memory to a single processor.
Serial Computing vs. Parallel Computing
Solution: parallel computing, which divides the work among numerous linked systems.
[Figure: serial/sequential execution on one machine vs. distributed-memory parallel execution]
Cont.
• In the serial case, one machine carries the whole load (memory), which can be heavy, and the task takes a long time.
• In the parallel case, one task is divided into small blocks that run on different computers.
• Parallel processing + Scientific computing = High Performance Scientific Computing
[Figure: the same task executed serially and in parallel]
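To make the contrast concrete, here is a minimal sketch in C (an illustration only, not the lecture's own code; it assumes an OpenMP-capable compiler) in which one loop's work is split across threads instead of being executed serially:

    #include <stdio.h>
    #include <omp.h>

    #define N 1000000

    int main(void) {
        static double a[N];
        double sum = 0.0;

        /* Serially, one processor would walk through all N elements.  */
        /* With OpenMP, the same loop is split across several threads. */
        #pragma omp parallel for reduction(+:sum)
        for (int i = 0; i < N; i++) {
            a[i] = i * 0.5;   /* each thread fills its own block of the array */
            sum += a[i];      /* partial sums are combined by the reduction   */
        }

        printf("sum = %f (up to %d threads)\n", sum, omp_get_max_threads());
        return 0;
    }

Compiled without OpenMP support the pragma is ignored and the loop runs serially; compiled with, e.g., gcc -fopenmp, the same source runs in parallel.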
Elementary Steps in Parallel Programming
• How many computers are doing the work? (Degree of parallelism)
• What is needed to begin the work? (Initialization)
• Who does what? (Work distribution)
• How is each worker's part of the data accessed? (Data I/O access)
• Do the workers need information from each other to finish their own jobs? (Communication)
• When are they all done? (Synchronization)
• What needs to be done to collect the result?
• Common tools for expressing these steps include the Message Passing Interface (MPI), Open Multi-Processing (OpenMP), etc. (see the sketch below).
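As an illustration (a sketch only; summing the integers 1..1000 is a hypothetical workload), a minimal MPI program in C touches most of these steps: initialization, degree of parallelism, work distribution, communication, result collection, and synchronization:

    #include <stdio.h>
    #include <mpi.h>

    int main(int argc, char **argv) {
        MPI_Init(&argc, &argv);                   /* Initialization */

        int rank, size;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);     /* who am I?             */
        MPI_Comm_size(MPI_COMM_WORLD, &size);     /* degree of parallelism */

        /* Work distribution: each rank sums a different block of 1..1000. */
        int n = 1000, chunk = n / size;
        int start = rank * chunk + 1;
        int end   = (rank == size - 1) ? n : start + chunk - 1;

        long local = 0;
        for (int i = start; i <= end; i++)
            local += i;

        /* Communication + result collection: combine partial sums on rank 0. */
        long total = 0;
        MPI_Reduce(&local, &total, 1, MPI_LONG, MPI_SUM, 0, MPI_COMM_WORLD);

        MPI_Barrier(MPI_COMM_WORLD);              /* synchronization (illustrative) */
        if (rank == 0)
            printf("sum 1..%d = %ld\n", n, total);

        MPI_Finalize();
        return 0;
    }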
Anatomy of A Supercomputer
• A modern supercomputer: Titan was, for several years, one of the fastest computers in the world.
• A general-purpose supercomputer can be viewed as a layered hierarchy of many physical and logical components.
• The system hardware layer represents the physical resources that are the most visible (and audible) aspect of a supercomputer like Titan. Even in this high-level view, the principal components of the system can be identified.
• These include the processors that perform the calculations and the memory that stores both the data and the program code that operates on it.
• They also include the interconnection network that integrates potentially many thousands, and eventually millions, of such processor/memory "nodes" into a single supercomputer.
Cont.
• The first levels of software that control the hardware and manage these resources are associated with the operating system.
• Each node has a local instance of an operating system controlling the node's physical memory and processor resources, as well as its interface to the system area network external to the node.
• The overall work management layer involves several support capabilities, including the programming languages (e.g., Fortran, C, C++), additional libraries, often for parallelism (e.g., MPI, OpenMP), and compilers that translate and optimize the user code into machine-readable code for the processor cores.
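For example (assuming a typical Linux cluster with the GNU toolchain and an MPI implementation installed; mycode.c is a hypothetical file name), user code is translated into optimized machine code and launched with commands such as:

    gcc -O2 -fopenmp mycode.c -o mycode    # OpenMP program
    mpicc -O2 mycode.c -o mycode           # MPI program
    mpirun -np 64 ./mycode                 # run on 64 MPI processes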
Performance and Metrics
• 1.23958456606 + 4.2254568978 = ??
• For HPC (i.e., for scientific and technical programming) the most widely used metric is "floating-point operations per second", or flops.
• Modern supercomputers are measured in PFLOPS (petaflops).
• Kilo = 10^3, Mega = 10^6, Giga = 10^9, Tera = 10^12, Peta = 10^15
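• For instance (an illustrative calculation): a machine rated at 2 PFLOPS performs about 2 × 10^15 floating-point operations, such as the addition above, every second; a single 2 GFLOPS desktop core would need roughly 10^6 seconds, or about 11.5 days, for the same count.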
• Two basic measures are employed, individually or in combination and in differing contexts, to formulate the values used to represent the quality of a supercomputer.
• These two fundamental measures are "time" and "number of operations performed", both under prescribed conditions.
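• In their simplest combination (an illustrative relation, not the only one in use): performance (flops) = number of floating-point operations / execution time in seconds. For example, 4 × 10^15 operations completed in 2 seconds correspond to 2 × 10^15 flops, i.e., 2 PFLOPS.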
• In practice, achieved performance falls short of peak performance. This degradation is summarized by the acronym SLOW, which identifies its sources as Starvation, Latency, Overhead, and Waiting for contention.
Starvation (delay)
Relates to a critical source of performance, parallelism: there is not enough concurrent work to keep all processing resources busy.
Latency
The time it takes for information to travel from one part of a system to another.
Overhead
The additional work required to manage and coordinate the computation, beyond what a pure sequential processor would need.
Waiting
Waiting of threads of action for shared resources, due to contention for access, degrades performance.
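As a small illustration (a hypothetical workload, not from the lecture), the following C/OpenMP sketch makes Waiting and Overhead visible: every thread must queue for the same critical section, so adding threads may not speed it up at all.

    #include <stdio.h>
    #include <omp.h>

    int main(void) {
        long counter = 0;
        const long n = 10000000;

        double t0 = omp_get_wtime();
        #pragma omp parallel for
        for (long i = 0; i < n; i++) {
            /* Waiting:  threads contend for this single lock.       */
            /* Overhead: locking work a sequential loop never needs. */
            #pragma omp critical
            counter++;
        }
        double t1 = omp_get_wtime();

        printf("counter = %ld, time = %.3f s\n", counter, t1 - t0);
        return 0;
    }

Replacing the critical section with a reduction(+:counter) clause removes most of that waiting and overhead.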
Techniques to Improve Performance
• Parallel algorithms
• Performance monitoring
• Hardware scaling
• Task granularity control (illustrated in the sketch below)
• Work and data distribution
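As one illustration of task granularity control and work/data distribution (a sketch under the assumption that loop iterations have uneven cost; the work() function is hypothetical), the chunk size in an OpenMP schedule clause sets how much work each thread takes at a time:

    #include <stdio.h>
    #include <omp.h>

    /* Hypothetical uneven workload: later iterations cost more. */
    static double work(int i) {
        double x = 0.0;
        for (int k = 0; k < i; k++)
            x += k * 1e-9;
        return x;
    }

    int main(void) {
        double sum = 0.0;

        /* schedule(dynamic, 64): idle threads grab 64 iterations at a time. */
        #pragma omp parallel for schedule(dynamic, 64) reduction(+:sum)
        for (int i = 0; i < 20000; i++)
            sum += work(i);

        printf("sum = %f\n", sum);
        return 0;
    }

A larger chunk size lowers scheduling overhead; a smaller one balances the uneven work better.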