Dr. Noha MM.
Computer Science Department, Thebes Academy
High Performance Computing (HPC)
Lecture 1
Content
Performance and Metrics
Anatomy of A Supercomputer
Serial Computing vs. Parallel Computing
Elementary Steps in Parallel Programming
Introduction
Why HPC
Introduction
• High Performance Computing (HPC) refers to the practice of aggregating computing power to deliver much higher performance than a single desktop computer can provide, in order to solve large problems in science, engineering, simulation, modeling, and big data analytics.
• HPC is really a collection of multiple interrelated disciplines, each providing an important aspect of the total field.
• HPC is a field that relates to all facets of technology, methodology, and application associated with achieving the greatest computing capability possible at any given point in time.
Why HPC
• Scientific simulation and modelling of complex physical phenomena, complex biological systems, and complex social behaviors drive the need for ever greater computing power.
• Single-core processors cannot be built with enough resources for the simulations we need to run.
• Making processors with faster clock speeds is difficult due to cost and power/heat limitations.
• It is expensive to attach huge amounts of memory to a single processor.
Serial Computing vs. Parallel Computing
Solution: parallel computing, which divides the work among numerous linked systems.
[Figure: serial/sequential execution on one machine vs. distributed-memory parallel execution]
Cont.
• In the serial case, one machine carries the whole load (memory), which can be heavy, and the task takes a long time.
• In the parallel case, one task is divided into small blocks that run on different computers.
• Parallel processing + Scientific computing = High Performance Scientific Computing
[Figure: the same task executed serially and in parallel]
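To make the contrast concrete, here is a minimal sketch in C (an illustration only, not the lecture's own code; it assumes an OpenMP-capable compiler) in which one loop's work is split across threads instead of being executed serially:

    #include <stdio.h>
    #include <omp.h>

    #define N 1000000

    int main(void) {
        static double a[N];
        double sum = 0.0;

        /* Serially, one processor would walk through all N elements.  */
        /* With OpenMP, the same loop is split across several threads. */
        #pragma omp parallel for reduction(+:sum)
        for (int i = 0; i < N; i++) {
            a[i] = i * 0.5;   /* each thread fills its own block of the array */
            sum += a[i];      /* partial sums are combined by the reduction   */
        }

        printf("sum = %f (up to %d threads)\n", sum, omp_get_max_threads());
        return 0;
    }

Compiled without OpenMP support the pragma is ignored and the loop runs serially; compiled with, e.g., gcc -fopenmp, the same source runs in parallel.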
Elementary Steps in Parallel Programming
• How many computers are doing the work? (Degree of parallelism)
• What is needed to begin the work? (Initialization)
• Who does what? (Work distribution)
• How is each worker's part of the data accessed? (Data I/O access)
• Do the workers need information from each other to finish their own jobs? (Communication)
• When are they all done? (Synchronization)
• What needs to be done to collect the result?
• Common tools for expressing these steps include the Message Passing Interface (MPI), Open Multi-Processing (OpenMP), etc. (see the sketch below).
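As an illustration (a sketch only; summing the integers 1..1000 is a hypothetical workload), a minimal MPI program in C touches most of these steps: initialization, degree of parallelism, work distribution, communication, result collection, and synchronization:

    #include <stdio.h>
    #include <mpi.h>

    int main(int argc, char **argv) {
        MPI_Init(&argc, &argv);                   /* Initialization */

        int rank, size;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);     /* who am I?             */
        MPI_Comm_size(MPI_COMM_WORLD, &size);     /* degree of parallelism */

        /* Work distribution: each rank sums a different block of 1..1000. */
        int n = 1000, chunk = n / size;
        int start = rank * chunk + 1;
        int end   = (rank == size - 1) ? n : start + chunk - 1;

        long local = 0;
        for (int i = start; i <= end; i++)
            local += i;

        /* Communication + result collection: combine partial sums on rank 0. */
        long total = 0;
        MPI_Reduce(&local, &total, 1, MPI_LONG, MPI_SUM, 0, MPI_COMM_WORLD);

        MPI_Barrier(MPI_COMM_WORLD);              /* synchronization (illustrative) */
        if (rank == 0)
            printf("sum 1..%d = %ld\n", n, total);

        MPI_Finalize();
        return 0;
    }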
Anatomy of A Supercomputer
• A modern supercomputer: Titan was, for several years, one of the fastest computers in the world.
• A general-purpose supercomputer can be viewed as a layered hierarchy of many physical and logical components.
• The system hardware layer represents the physical resources that are the most visible (and audible) aspect of a supercomputer like Titan. Even in this high-level view, the principal components of the system can be identified.
• These include the processors that perform the calculations and the memory that stores both the data and the program code that operates on it.
• They also include the interconnection network that integrates potentially many thousands, and eventually millions, of such processor/memory "nodes" into a single supercomputer.
Cont.
• The first levels of software that control the hardware and manage these resources are associated with the operating system.
• Each node has a local instance of an operating system controlling the node's physical memory and processor resources, as well as its interface to the system area network external to the node.
• The overall work management layer involves several support capabilities, including the programming languages (e.g., Fortran, C, C++), additional libraries, often for parallelism (e.g., MPI, OpenMP), and compilers that translate and optimize the user code into machine-readable code for the processor cores.
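For example (assuming a typical Linux cluster with the GNU toolchain and an MPI implementation installed; mycode.c is a hypothetical file name), user code is translated into optimized machine code and launched with commands such as:

    gcc -O2 -fopenmp mycode.c -o mycode    # OpenMP program
    mpicc -O2 mycode.c -o mycode           # MPI program
    mpirun -np 64 ./mycode                 # run on 64 MPI processes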
Performance and Metrics
• 1.23958456606 + 4.2254568978 = ??
• For HPC (i.e., for scientific and technical programming) the most widely used metric is "floating-point operations per second", or flops.
• Modern supercomputers are measured in PFLOPS (petaflops).
• Kilo = 10^3, Mega = 10^6, Giga = 10^9, Tera = 10^12, Peta = 10^15
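• For instance (an illustrative calculation): a machine rated at 2 PFLOPS performs about 2 × 10^15 floating-point operations, such as the addition above, every second; a single 2 GFLOPS desktop core would need roughly 10^6 seconds, or about 11.5 days, for the same count.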
• Two basic measures are employed, individually or in combination and in differing contexts, to formulate the values used to represent the quality of a supercomputer.
• These two fundamental measures are "time" and "number of operations performed", both under prescribed conditions.
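• In their simplest combination (an illustrative relation, not the only one in use): performance (flops) = number of floating-point operations / execution time in seconds. For example, 4 × 10^15 operations completed in 2 seconds correspond to 2 × 10^15 flops, i.e., 2 PFLOPS.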
• In practice, achieved performance falls short of peak performance. This degradation is summarized by the acronym SLOW, which identifies its sources as Starvation, Latency, Overhead, and Waiting for contention.
Starvation (delay)
Relates to a critical source of performance, parallelism: there is not enough concurrent work to keep all processing resources busy.
Latency
The time it takes for information to travel from one part of a system to another.
Overhead
The additional work required to manage and coordinate the computation, beyond what a pure sequential processor would need.
Waiting
Waiting of threads of action for shared resources, due to contention for access, degrades performance.
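As a small illustration (a hypothetical workload, not from the lecture), the following C/OpenMP sketch makes Waiting and Overhead visible: every thread must queue for the same critical section, so adding threads may not speed it up at all.

    #include <stdio.h>
    #include <omp.h>

    int main(void) {
        long counter = 0;
        const long n = 10000000;

        double t0 = omp_get_wtime();
        #pragma omp parallel for
        for (long i = 0; i < n; i++) {
            /* Waiting:  threads contend for this single lock.       */
            /* Overhead: locking work a sequential loop never needs. */
            #pragma omp critical
            counter++;
        }
        double t1 = omp_get_wtime();

        printf("counter = %ld, time = %.3f s\n", counter, t1 - t0);
        return 0;
    }

Replacing the critical section with a reduction(+:counter) clause removes most of that waiting and overhead.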
Techniques to Improve Performance
• Parallel algorithms
• Performance monitoring
• Hardware scaling
• Task granularity control (illustrated in the sketch below)
• Work and data distribution
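As one illustration of task granularity control and work/data distribution (a sketch under the assumption that loop iterations have uneven cost; the work() function is hypothetical), the chunk size in an OpenMP schedule clause sets how much work each thread takes at a time:

    #include <stdio.h>
    #include <omp.h>

    /* Hypothetical uneven workload: later iterations cost more. */
    static double work(int i) {
        double x = 0.0;
        for (int k = 0; k < i; k++)
            x += k * 1e-9;
        return x;
    }

    int main(void) {
        double sum = 0.0;

        /* schedule(dynamic, 64): idle threads grab 64 iterations at a time. */
        #pragma omp parallel for schedule(dynamic, 64) reduction(+:sum)
        for (int i = 0; i < 20000; i++)
            sum += work(i);

        printf("sum = %f\n", sum);
        return 0;
    }

A larger chunk size lowers scheduling overhead; a smaller one balances the uneven work better.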