

3.1.2 Basic Overview of GEMS

GEMS has three major modules: Ruby, Garnet [99], and Orion [100]. Ruby is used for modeling the entire memory system of any CMP-based architecture.

Each component, such as an L1 cache, the L2 banks, the memory banks, and the directories, can be modeled in Ruby. These individual components are called “machines” in Ruby.

Each machine is given a unique id called the machineID, which is required to uniquely identify a machine during on-chip communication. All the machines in a CMP communicate over the underlying NoC. The NoC in GEMS is managed by Garnet. Any CMP-based architecture can be modeled in Ruby, and the connections between its different components (machines) can be designed with Garnet, which simulates the realistic timing of communicating a message through the NoC. Orion is used for modeling the energy consumed by the simulated system.

The Garnet version we used follows X-Y routing. The on-chip communication cost is accounted for in all the experimental analyses done in this thesis.
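For reference, X-Y (dimension-order) routing first moves a message along the X dimension until the destination column is reached, and only then along Y, which keeps routing on a mesh deadlock-free. The following is a minimal sketch of the idea; the type and function names are ours for illustration and do not correspond to Garnet's actual source.

```cpp
// Output directions of a mesh router (illustrative names, not Garnet's).
enum class Dir { East, West, North, South, Local };

// X-Y (dimension-order) routing: correct the X coordinate first,
// then the Y coordinate; deliver locally once both match.
Dir route_xy(int curX, int curY, int dstX, int dstY) {
    if (dstX != curX)
        return (dstX > curX) ? Dir::East : Dir::West;
    if (dstY != curY)
        return (dstY > curY) ? Dir::North : Dir::South;
    return Dir::Local;  // message has arrived at its destination tile
}
```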

The block requests (load, store, fetch) from Simics are passed to the Ruby module of GEMS. The first-level cache in Ruby determines whether the block is a hit or a miss. If it is a hit, Simics continues its execution; otherwise the request from the issuing core is stalled and GEMS simulates the cache miss. Ruby determines the timing-dependent functional simulation in Simics by controlling the timing of when Simics advances.
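Conceptually, the interplay behaves like the loop below. This is our own simplification of the Simics/Ruby handshake; the names and the fixed miss latency are purely illustrative, and the real interface between the two simulators is different.

```cpp
// Our conceptual simplification of the Simics/Ruby interplay; the
// real interface between the two simulators is different.
struct RubyModel {
    int pendingCycles = 0;              // cycles left on the current miss

    // First-level cache lookup: on a miss, schedule a fixed-latency
    // completion (the latency value is purely illustrative).
    bool access(bool hit) {
        if (hit) return true;
        pendingCycles = 20;
        return false;
    }
    void tick()            { if (pendingCycles > 0) --pendingCycles; }
    bool completed() const { return pendingCycles == 0; }
};

// A load/store/fetch from Simics: on a hit Simics simply continues;
// on a miss the issuing core stalls until Ruby finishes simulating it.
void issueRequest(RubyModel& ruby, bool hit) {
    if (ruby.access(hit))
        return;                          // hit: execution continues
    while (!ruby.completed())
        ruby.tick();                     // stall while Ruby advances time
}
```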

Each L1 cache is attached to a sequencer, which is responsible for managing the requests from the corresponding core. The sequencer sends each request to the L1. In a CMP there can be multiple L1s and L2s. Each machine has a controller, and all the communications and operations of a machine are performed by its controller. GEMS provides a domain-specific language called SLICC to model the controllers. The purpose of a controller is to manage all the operations of a machine as well as its communication with other machines. Maintaining coherence is a major concern in a CMP-based cache structure, hence a coherence protocol must be implemented. The controllers of the different modules realize the coherence protocol by communicating with each other through message passing. The messages are communicated from one module to another through the NoC (modeled by Garnet). SLICC is also used to specify the coherence protocol, since the protocol is the combined task of all the controllers.
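To make the controller idea concrete: a SLICC specification essentially enumerates (state, event) → (actions, next state) transitions for a machine. The C++ sketch below mimics a tiny slice of a MESI-style L1 controller; the states, events, and actions are our own simplified reduction for illustration and are not the actual MESI-CMP specification.

```cpp
// Simplified flavor of what a SLICC-specified L1 controller does:
// each (state, event) pair triggers some actions and a state change.
// This is our own illustrative reduction, not the real MESI-CMP code.
enum class State { I, S, E, M, IM };       // IM: transient, awaiting data
enum class Event { Load, Store, DataAck };

struct L1Controller {
    State state = State::I;

    void trigger(Event e) {
        switch (state) {
        case State::I:
            if (e == Event::Store) {        // write miss: need exclusive copy
                issueGETX();                // send a request over the NoC
                state = State::IM;          // wait in a transient state
            }
            break;
        case State::IM:
            if (e == Event::DataAck) {      // data and ownership arrived
                state = State::M;           // block is now Modified
            }
            break;
        default:
            break;                          // remaining transitions omitted
        }
    }

    void issueGETX() { /* enqueue a coherence message to the directory */ }
};
```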

3.1.2.1 CMP Architectures Supported by GEMS

GEMS only has T-SNUCA (cf. Section 2.2.3.1) implemented in it. The T-SNUCA architecture provided by GEMS is very robust and can be configured in many different ways. For example, the cache size, number of banks, number of tiles, cache access latency, miss penalty, hit time, network protocols, etc. can be changed simply by editing their values in a configuration file. Many other parameters can also be reset based on the configuration demands, such as the number of virtual networks, block size, cache associativity, replacement policy, network flit size, etc. Multiple coherence protocols are also available to support T-SNUCA. One of them is MESI-CMP, which is the baseline protocol of our designs. The MESI-CMP protocol is based on the MESI protocol proposed in [46]. Different network topologies can be designed by changing the network configuration file of Garnet.
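The sketch below groups the kind of parameters such a configuration exposes into one structure, purely for illustration. The actual GEMS configuration file is a flat list of key–value entries, and the field names and default values here are ours, not the exact GEMS parameter keys.

```cpp
// Illustrative grouping of T-SNUCA configuration knobs; the real GEMS
// config file uses flat key=value entries with different names.
struct TSnucaConfig {
    int numTiles        = 16;    // number of tiles (cores)
    int numL2Banks      = 16;    // one LLC bank per tile
    int l2BankSizeKB    = 512;   // capacity of each L2 bank
    int blockSizeBytes  = 64;    // cache block (line) size
    int l1Assoc         = 4;     // L1 associativity
    int l2Assoc         = 16;    // L2 associativity
    int l2HitLatency    = 10;    // bank access latency (cycles)
    int numVirtualNets  = 5;     // virtual networks for the protocol
    int flitSizeBytes   = 16;    // network flit size
};
```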

Any other architecture behaving differently from T-SNUCA has to be implemented by modifying the modules of GEMS. Additional modules may also be required. For example, implementing T-DNUCA (cf. Section 6.1) in GEMS is not possible by just changing the configuration parameters. A modified architecture may require changes to the SLICC-based coherence files and to other Ruby modules such as the cache memory and the replacement policy. The mapping and replacement policies mentioned in Chapter 2 also need to be changed for different design requirements. The cache memory architecture must also be reprogrammed if its new behavior differs from that of a conventional cache. For example, to implement Z-Cache [20] or CMP-SVR (cf. Section 1.5.2) we need to design our own cache memory in GEMS. For our proposed architectures like T-DNUCA and TLD-NUCA (cf. Section 1.5.4) we have to model our own coherence protocols in SLICC. Rigorous testing has been done to guarantee consistency and liveness. Separate modules have to be added for additional hardware support like the TGS (in CMP-VR) and the TLD (in TLD-NUCA). The modified GEMS must be re-compiled to generate the new architectures.

The operation of the entire NoC can be modified according to the design requirements. In this work we consider the existing mesh-based NoC (shown in Figure 2.9) as the NoC architecture for all the designs.

3.1.2.2 Result Analysis

Since GEMS is a full-system simulator (together with Simics), it can run real workloads on the simulated (modeled) CMP architecture. During the execution it records various statistics. Some important information that GEMS records during execution is:

• Total cycles executed: The cycles executed across all the cores. The cycles executed in each core are also recorded.

• Total instructions executed: The instructions executed across all the cores. The instructions executed in each core are also recorded.

• Total L1 accesses/misses: The accesses of all the L1 caches taken together. The accesses of each individual L1 cache are also recorded. The L1 misses are recorded similarly.

• Total L2 accesses/misses: The total number of accesses to the LLC (L2) across all banks. The bank-wise distribution is not provided in the original GEMS, though it can be implemented easily when required (a sketch of such counters is given at the end of this section). The L2 misses are recorded in a similar manner.

• Average network latency: The average number of cycles required for a message to travel through the NoC.

• NoC energy consumption: The total energy consumed by the NoC during the execution.

Other important statistics provided by GEMS are: cycles per instruction (CPI), instructions per cycle (IPC), average memory access latency (including on-chip/off-chip communication time), average link utilisation, etc. Misses per kilo instructions (MPKI) are also provided by GEMS.
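For clarity, these derived metrics follow directly from the raw counts listed above:

```latex
\mathrm{CPI} = \frac{\text{total cycles}}{\text{total instructions}}, \qquad
\mathrm{IPC} = \frac{1}{\mathrm{CPI}}, \qquad
\mathrm{MPKI} = \frac{\text{misses} \times 1000}{\text{total instructions}}
```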

GEMS has a special module called the Profiler for managing all the results during execution. The Profiler can be initialized at any time, so that the results can be recorded for any specific period of the execution. The Profiler can also be modified as per user requirements. For example, we added a feature to the Profiler for recording the misses of every set in the banks, as sketched below.
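As a rough sketch of this kind of extension, per-set (and hence per-bank) miss recording only needs a counter array inside the Profiler that is incremented on every miss. The class and method names below are ours for illustration; the real GEMS Profiler has a different interface.

```cpp
#include <vector>
#include <cstdint>

// Our illustrative reduction of per-set miss counting inside the
// Profiler; GEMS's actual Profiler class has a different interface.
class SetMissProfiler {
    // misses_[bank][set] counts the misses of one set in one L2 bank
    std::vector<std::vector<uint64_t>> misses_;
public:
    SetMissProfiler(int numBanks, int setsPerBank)
        : misses_(numBanks, std::vector<uint64_t>(setsPerBank, 0)) {}

    // Called from the cache-miss path with the bank and set indices.
    void recordMiss(int bank, int set) { ++misses_[bank][set]; }

    // Per-bank total, i.e. the bank-wise distribution of L2 misses.
    uint64_t bankMisses(int bank) const {
        uint64_t total = 0;
        for (uint64_t m : misses_[bank]) total += m;
        return total;
    }
};
```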