Chapter 6-2:
M l i S f
Multiprocessor Software
Soo-Ik Chae Soo Ik Chae
High Performance Embedded Computing 1
Topics p
Multiprocessor scheduling.
Middleware and software services.
Design verification.
6.3.3 Scheduling with dynamic tasks g y
C ’t t th t ll t k b h dl d
Can’t guarantee that all tasks can be handled.
Can’t guarantee start time for a process.
Unless the source of the tasks limits itself
Unless the source of the tasks limits itself
In a real time system, once we start a process, we want to guarantee its completion time. g p
Admission control determines what processes can execute based on resources, load.
For dynamic systems with more than one processors and/or tasks that have exclusion
t i t ti l l ti d t i t
constraints, an optimal solution does not exists
Use heuristics
High Performance Embedded Computing 3
Ramarithram et al. myopic scheduling y p g
Assumptions:
Tasks are nonperiodic.
Tasks are executed non-preemptively.
No data dependencies between tasks
No data dependencies between tasks.
Task characterized by
arrival time: Ta
arrival time: Ta
Deadline: Td
worst-case processing time: Tp
resource requirements {Tr}
Original heuristic algorithm: O(n
2)
At each step, it checks whether the current partial schedule is strongly-feasible and if so, it applies the H heuristics
Myopic scheduling algorithm y p g g
Partial sched le is strongl feasible if the sched le
Partial schedule is strongly feasible if the schedule itself is feasible and every possible next choice for a task also gives a feasible schedule. g
O(nk) version: myopic (short-sighted) scheduling algorithm
C t t ti l h d l
Constructs partial schedules.
Check strongly feasibility by evaluating an H (search metric) function
Sh t t d dli fi t h t t i ti
Shortest deadline first, shortest processing time, ..
Search includes backtracking.
Add a task to a partial schedule.
Searches only first k tasks of the remaining tasks pre- sorted by deadlines.
High Performance Embedded Computing 5
Myopic scheduling algorithm y p g g
{tasks_remaining}: arranged in the order of increasing deadlines
Nr: # of tasks in {tasks_remaining}
K: maximum number of tasks in tasks remaining
K: maximum number of tasks in tasks_remaining considered by myopic algorithm
Nk: actual number oftasks in tasks remaining considered
Nk: actual number oftasks in tasks_remaining considered by the myopic algorithm at each step of scheuling
Nk = min (k,Nr)
Nk min (k,Nr)
{tasks_considered}: the first Nk tasks in {taks_remaining}
Load balancingg
It is a form of dynamic task allocation y
Move tasks to new processing element during execution.
Task migration moves an executing task:
Homogeneous MP with shared memory: Just move the g y task’s activation record from the old PE to the new PE
Heterogeneous MP with shared memory:Two versions of code should be available in both old and new PEs. Harder on heterogeneous multiprocessor.
MP with non shared memory
MP with non-shared memory
Need to copy all program data. Harder still if memory is not shared
High Performance Embedded Computing 7
shared.
Load balancing scheduling g g
Shin and Chang: schedule using a buddy list for each processing element.
List of other processing elements with which it can share tasks.
Subdivided into preferred list, ordered by communication distance to the buddy. y
When moving a job, search the buddy list in
order checking load until a satisfactory node
order, checking load until a satisfactory node
is found.
Load balancing scheduling g g
High Performance Embedded Computing 9
6.4 Middleware and software services
Operating systems provide services for shared resources in uniprocessors.
resources in uniprocessors.
Must generalize this notion for multiprocessors.
Need distributed information about resource state
Need distributed information about resource state.
Middleware provides services in distributed systems.
Generic services such as data transport
Generic services such as data transport.
Application-specific services such as signal processing.
Uses of middleware
Middl i d f l t t
Middleware: coined for general-purpose systems to describe software that provides services for
applications in distributed systems and applications in distributed systems and multiprocessors.
It is not the application for itself nor does it describe
It is not the application for itself, nor does it describe primitive services provided by the operating system.
Provides fairly generic data services such as data transport y g p among the processor that may have different endianness or other data formatting issues.
It l id li ti ifi i
It can also provide application-specific services
High Performance Embedded Computing 11
Purposes of middleware p
S i ll li ti t b d l d
Services allow applications to be developed more quickly.
Which may be tied to a particular PE or an I/O device
Which may be tied to a particular PE or an I/O device.
Alternatively, they may provide high-level communication services.
services.
Simplifies porting application to a new platform.
Middleware standards are particularly useful
Middleware standards are particularly useful
Ensures that key functions are correct and efficient.
Rather than rely on users to directly implement all functions
Rather than rely on users to directly implement all functions, a vendor may provide middleware that showcases the
Middleware vs. libraries
Traditional software libraries may provide functions but don’t manage resources like OS.
but don t manage resources like OS.
Middleware need to know global state, have
privileges to manage resources by giving request to
p g g y g g q
the OS on each processor to implement those decisions.
Resources must be managed dynamically when requests come in dynamically.
Statically designing the system for worst case costs too
Statically designing the system for worst-case costs too much.
High Performance Embedded Computing 13
Embedded vs. general-purpose middleware
Embedded middleware must be very efficient:
Small software footprint.
Low latency.
Predictable performance.
E b dd d iddl id ti l ithi
Embedded middleware may reside entirely within a chip or may communicate with other systems-on- chips
chips.
Middleware makes use of general standards:
I t t t l (IP) ft d
Internet protocol (IP): often used
CORBA: widely used for distributed embedded services
CORBA
Common Object Request Broker Architecture is widely used in business oriented software
widely used in business-oriented software.
It is not a specific protocol
Rather meta model sing an object oriented ser ices
Rather meta-model using an object-oriented services.
CORBA services are provided by objects that combine functional interfaces as well as data combine functional interfaces as well as data.
An interface to an object is defined in an interactive data language (IDL)
data language (IDL)
It is language independent
Can be implemented in any programming language
Can be implemented in any programming language.
Objects and their variables are typed.
High Performance Embedded Computing 15
What is the purpose / goals of CORBA? p p g
Enable the b ilding of pl g and pla component
Enable the building of plug and play component software environment
Enable the development of portable, object oriented,
Enable the development of portable, object oriented, interoperable code that is hardware, operating
system, network, and programming language independent
independent
How to meet the goals
Interface Definition Language (IDL)
Interface Definition Language (IDL)
Object Request Broker (ORB)
Interface Definition Language (IDL) Interface Definition Language (IDL)
Language Independence
Defines Object Interfaces j
Hides underlying object implementation
L i i t f C C J
Language mappings exist for C, C++, Java, Cobol, Smalltalk, and Ada
High Performance Embedded Computing 17
Interface Definition Language (IDL) g g ( )
module <identifier>
Defines a container
{
interface <identifier> [:inheritance]
{
e es co e
(namespace)
{
<type declarations>;
<constant declarations>;
Defines a CORBA object
<exception declarations>;
<attribute declarations>;
[<op_type>] <identifier>(<parameters>) [raises exception][context];
}
D fi
}
}
Defines a
method
IDL Compiler p
IDL
Definitions
1. Define objects using IDLIDL
2. Run IDL file through IDL compiler
3 Compiler uses language
IDL Compiler
3. Compiler uses language mappings to generate programming language specific stubs and
specific stubs and skeletons
Stubs Skeletons
High Performance Embedded Computing 19
What is ORB?
Implementation of CORBA
Application specification
Middleware product pp
Conceptual Software Bus
Hides location and Middleware
implementation details
about objects H d
OS
Driversabout objects Hardware
ORB Architecture
Interface IDL Compiler Implementation
Repository Repository
Client OBJ Object (servant)
Ref
IDL DSI
Skeleton DSI ORB
Interface IDL Stub DII
Object Adapter ORB Core
GIOP/IIOP
High Performance Embedded Computing 21
CORBA requests q
Requests handled by object request broker (ORB).q ( )
Client and object may be on different machines.
ORBs may communicate.y Thread pool
A request to a remote machine can invkoe
multiple ORBs Client Object
Object Thread pool
The stub on the client-side provides the interface for the client while the skeleton is the interface to the object
Stub request Stub
interface to the object.
A given service appears as an object but may be
implemented with a thread
Object request broker
implemented with a thread pool.
Each object instance has a unique object referenceq j
ORB-to-ORB communication
•
Provides mechanism for transparently
communicating client requests to target object communicating client requests to target object implementations
•
Makes client requests appear to be local procedure q pp p calls
•
GIOP – General Inter-ORB Protocol
•
IIOP – Internet Inter-ORB Protocol
• The client’s ORB and object’s ORB must agree on a common agree on a common protocol : IIOP
High Performance Embedded Computing 23
IDL Stub
Static invocation
Static invocation interface (SII)
Marshals (encode)
Client( )
application data into a common packet-level
t ti representation
Network byte order (little- endian or big-endian) IDL
St b endian or big endian)
Size of data types Stub
IDL Skeleton
• Demarshals (decode) the packet level representation
Object (servant)
packet-level representation back into typed data that is
meaningful to an
Object (servant)meaningful to an application
Network byte order (little
IDL Skeleton –
Network byte order (little-
endian or big-endian)
–
Size of data types Size of data types
SkeletonHigh Performance Embedded Computing 25
Dynamic Invocation Interface y
•
Dynamically issue requests to objects without requiring IDL stubs to be linked in
Client
•
Clients discover interfaces at run- time and learn how to call them
ClientSteps:
1. Obtain interface name
2. Obtain method description (from interface repository)
3 Create argument list
DII
Object Adapter
3. Create argument list 4. Create request 5. Invoke request
Dynamic Skeleton Interface y
Object (servant)
• Server side analogue to DII
• Allows an ORB to deliver
Object (servant)requests to an object
implementation that does
DSI
not have compile-time knowledge of the type of
Object Adapter
object it is implementing
High Performance Embedded Computing 27
ARMADA
Motivated by the requirements of large embedded application such as command and control,
application such as command and control,
automated flight, shipboard computing, and radar data processing.
Traditionally constructed from special-purpose hardware and software.
A recent trend: build embedded systems using
commercial-over-the-shelf (COTS) components
ARMADA
Middleware system for fault tolerance and QoS.
Low-level real-time communication support
Middleware for group communication and fault tolerance.
Dependability evaluation and validation tools
High Performance Embedded Computing 29
A command and control application pp
General service implementation approach p pp
High Performance Embedded Computing 31
General service implementation approach p pp
Real-time communication service architecture
High Performance Embedded Computing 33
Real-time communication service architecture
Architectural requirements for QoS
Performance isolation between connections such that errors in one channel does not starve another channel.
S i diff ti ti ( i it )
Service differentiation (priority)
Graceful degradation in the presence of overload.
CLIPS a comm nication librar for implementing
CLIPS: a communication library for implementing priority semantics
Provide resource management mechanism
Provide resource management mechanism
A clip is an object that guarantees a certain throughput in terms of the number of packets sent via it per period. And terms of the number of packets sent via it per period. And implements a configurable buffer to accommodate bursty sources.
Real-time communication service architecture
RTCOP: real-time connection ordination protocol
Manages requests to create and destroy connections.
A clip is created for each end of the channel.
Each clip includes a message queue at the interface to the objects, a communication handler that schedules
operations and a packet queue at the interface to the operations, and a packet queue at the interface to the channel.
The communication handler is scheduled is scheduled using an EDF policy.
High Performance Embedded Computing 35
MPI ( multiprocessor interface) ( p )
A specification for middleware interface for multiprocessor communication.
Widely used in scientific clusters.
Decouples architectural parameters (# PEs) from algorithmic parameters (# data elements)
parameters (# data elements).
Six basic MPI functions:
MPI_Init().
MPI_Comm_rank(): get the index of this node
MPI_Comm_size(): get the total number of nodes
MPI Send()
MPI_Send().
MPI_Recv().
MPI_Finalize(): clean up
The basic MPI communication functions (MPI_send(), MPI_recv())
Point-to-point, blocking communication
Software stacks in MPSoCs
MPS C lt d i ti f t iddl th t
MPSoC resulted in a new generation of custom middleware that relies less on standard services and models.
SoC middleware has been designed from scratch for several reasons.
Often power or energy constrained. Any services must be implemented very efficiently.p y y
Although SOC may be committed with outside standard services, they are not constrained to use standards within the chip.
Today’s SoC are composed of a relatively small number of
Today s SoC are composed of a relatively small number of processors. ( simple yet)
Software stack manages resources, abstracts hardware details.
P f i di h k h i
Performance, power requirements dictate a shorter stack than in general-purpose systems.
High Performance Embedded Computing 37
Typical MPSoC stack yp
Application layer provides
Application layer provides user function.
Application-specific libraries
t il d t id
are tailored to provide utilities for computation or communication specific to the application
Applications Application-specific
libraries
the application
Interprocess communication provides services across multiprocessor
Interprocess communication libraries
multiprocessor.
RTOS controls basic system functions.
HAL if l b t t
Real-time operating system
HAL uniformly abstracts
basic hardware services. Hardware abstraction layer
Multiflex programming env (Paulin) p g g ( )
Paulin et al.: uses hardware accelerators plus
ft t id lti i ti
software to provide multiprocessor communication.
Supports two parallel programming models:
DSOC: Distributed system object component
DSOC: Distributed system object component
SMP: Symmetric multiprocessing
DSOC is an object-oriented model. j
It is a message-passing model and it supports a very simple CORBA-like interface definition (dubbed SIDL)
Client marshals data for call.
Client marshals data for call.
Server side unmarshals data for use.
SMP engine uses memory-mapped reads/writes.
Supports concurrent threads accessing shared memory
The implementation performs scheduling, and includes support for threads, monitors, conditions and semaphores.
High Performance Embedded Computing 39
pp , , p
MultiFlex platform p
MultiFlex platform p
High Performance Embedded Computing 41
MultiFlex platform mapping p pp g
MultiFlex concurrency engine y g
High Performance Embedded Computing 43
Hardware concurrency engine y g
Hardware concurrency engine y g
High Performance Embedded Computing 45
Monitor
Example: OMAP software platform p p
MM services, plug-ins, protocols
CSL: chip support library DDK: driver development kit MM OS server
MM services, plug ins, protocols Multimedia APIs
DDK: driver development kit
High-
DSP Gateway components DSP SW components
App- specific
High Level
OS
Device
DSP RTOS Device
DSP/BIOS DSP Bridge
DDAPI API DDAPI
Device Drivers
Device Drivers DSP/BIOS
Bridge CSLAPI
ARM CSL (OS-independent) DSP CSL (OS-independent)
High Performance Embedded Computing 47
OMAP software architecture
DSPBridge g
Abstracts the DSP software architecture for the general-purpose software environment.
APIs include driver interfaces and application interfaces:
Initiate and control DSP tasks.
Exchange messages with DSP.
St d t t /f DSP
Stream data to/from DSP.
Check status.
High Performance Embedded Computing 49
Resource manager g
Provide API interface to the DSP.
Loads, initiates, and controls DSP applications.
Keeps track of resources:
CPU time, memory pool, utilization, etc.
Controls:
Tasks.
Data streams between DSP and CPU.
Memory allocation.
Multimedia messaging service g g
Mi i i t f
Minimum requirement from spec:
JPEG, MIME text with SMS, GSM AMR, H.263, SVG for graphics
graphics.
Optional: AAC, MP3, MIDI, MP4, and GIF.
Must provide: MM presentation user notification
Must provide: MM presentation, user notification, MM message retrieval.
Additional functions: MM composition, MM p , submission, MM message storage,
encryption/decryption, user profile management.
High Performance Embedded Computing 51
Quality-of-service
Q y
QoS must be measured system-wide.
One component can destroy system QoS characteristics
One component can destroy system QoS characteristics.
QoS modeling:
Contract specifies resources that will be provided
Contract specifies resources that will be provided.
Protocol manages the contract.
Scheduler implements the contract.
Scheduler implements the contract.
Resources must be available to deliver on the contract.
contract.
Design verification g
Verifying multiprocessors is hard:
Observe and control data.
Drive part of the system into a desired state.
Generate and test timing effects.
Generate and test timing effects.
CoMET simulator
H ll t d
Hellestrand
High Performance Embedded Computing 53
High Performance Embedded Computing 55
High Performance Embedded Computing 57
CoMET simulator
A new, fast, and accurate processor model
Two parts
M d l f th b h i f i t ti ti
Models for the behavior of instruction execution
Models for the dynamic parts of the processor
Virtual processor model (VPM) describes function of the application running on the processor
application running on the processor.
The code executed by a VPM may be C or C++ code
Very fast, like static timing analysis
I/O parts that communicate between the processor and the
I/O parts that communicate between the processor and the hardware
Interrupts, cache, virtual memory, DMA, bus signals.
Speed is limited by the level of detail modeled and by the speed of
Speed is limited by the level of detail modeled and by the speed of the hardware simulator effecting communication
Simulation backplane connects processor models and hardware models.
Many VPMs and a single HDL simulator
The back plane kernel mediates the maintenance of causality
between the domains that defines their own relative frames of space and time
and time
High Performance Embedded Computing 59
High Performance Embedded Computing 61