1.7 Thesis Contributions
1.7.4 Victim Caching to Improve the Performance of NVM Caches
To bridge the performance gap created by the costly write operations of the NVM cache, we integrate a victim cache with the main cache. Employing a victim cache alongside an NVM cache requires effective victim migration and retention, because migrating a block from the victim cache back to the NVM-based main cache costs extra clock cycles and incurs additional write energy. Considering these facts, we have developed a strategy that serves write-intensive blocks directly from the victim cache without placing them back into the main cache. In addition, we propose a replacement policy for the victim cache that makes its eviction decision based on the number and type of requests served by a block in the victim cache, as well as the timestamp of its last access. The results of the proposed scheme are compared against an existing hybrid cache architecture, RWHCA [28], and against the baseline SRAM and STT-RAM caches on a quad-core system.
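To make the idea concrete, the following is a minimal C++ sketch of how such a replacement decision could be scored. The fields of VictimBlock, the double weighting of write hits, and the aging formula are illustrative assumptions, not the exact policy defined in Chapter 7.

// Minimal sketch of a victim-cache replacement decision, assuming each block
// tracks how many read and write requests it has served from the victim cache
// and the timestamp of its last access.
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

struct VictimBlock {
    uint64_t tag;
    uint32_t readsServed;    // read hits served from the victim cache
    uint32_t writesServed;   // write hits served from the victim cache
    uint64_t lastAccessTime; // timestamp of the most recent access
    bool     valid;
};

// Pick the block to evict: prefer blocks that served few requests (write hits
// weighted higher, since serving them spares costly NVM writes) and that were
// accessed least recently.
std::size_t selectVictim(const std::vector<VictimBlock>& set, uint64_t now) {
    std::size_t victim = 0;
    double lowestScore = std::numeric_limits<double>::max();
    for (std::size_t i = 0; i < set.size(); ++i) {
        if (!set[i].valid) return i;  // free slot available
        double usefulness = set[i].readsServed + 2.0 * set[i].writesServed; // assumed weights
        double age = static_cast<double>(now - set[i].lastAccessTime);
        double score = usefulness / (1.0 + age);  // low score => good eviction candidate
        if (score < lowestScore) { lowestScore = score; victim = i; }
    }
    return victim;
}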
The proposed technique improves performance by 5.88% over the STT-RAM baseline and by 3.45% over RWHCA, while the energy improvements over the STT-RAM baseline, the SRAM baseline, and the prior work RWHCA are 8%, 93.5%, and 78.85%, respectively.
All these improvements come at a marginal storage and area overhead.
The full description of this work is given in Chapter 7.
1.7.5 Victim Caching to Improve the Performance of Hybrid Caches
In this proposal, we add a victim cache to compensate for the performance loss that arises from partitioning the hybrid cache into multiple regions and from the applied block placement policy, both of which increase the miss rate. Employing a victim cache with a hybrid cache requires two mechanisms: an effective policy for placing a block into the appropriate region of the hybrid cache upon a hit in the victim cache, and a region-based dynamic partitioning policy that gives a substantial share of the victim cache to the victims evicted from each region. These requirements motivate an access-based victim block placement policy and a dynamic region-based victim cache partitioning strategy. The placement policy considers the type of access as well as the dirty status of the victim block before deciding which region of the hybrid cache receives it, whereas the partitioning approach dynamically divides the victim cache based on the number of victims evicted from the smaller region of the hybrid cache. These eviction counts are accumulated by dividing the application's execution into multiple intervals. The results of the proposed scheme are compared with the SRAM and STT-RAM baselines, a hybrid cache with no specific placement policy, and the prior hybrid cache architecture RWHCA [28] on a quad-core system.
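The interval-driven partitioning idea can be illustrated with a short C++ sketch. The structure below is a simplification: the interval length, the proportional way-allocation rule, and the guarantee of one way per region are illustrative assumptions, not the exact policy of Chapter 7.

// Illustrative sketch of interval-driven victim-cache partitioning, assuming the
// victim cache is shared between victims evicted from the SRAM region and the
// STT-RAM region of the hybrid cache.
#include <algorithm>
#include <cstdint>

struct VictimPartitionController {
    uint64_t intervalLength;      // accesses per adaptation interval (assumed)
    uint64_t accessesThisInterval = 0;
    uint64_t sramEvictions = 0;   // victims evicted from the smaller (SRAM) region
    uint64_t sttEvictions  = 0;   // victims evicted from the STT-RAM region
    unsigned totalWays;           // victim-cache associativity (assumed >= 2)
    unsigned sramWays;            // ways currently reserved for SRAM-region victims

    void recordEviction(bool fromSramRegion) {
        if (fromSramRegion) ++sramEvictions; else ++sttEvictions;
    }

    // Called on every victim-cache access; repartition at interval boundaries.
    void onAccess() {
        if (++accessesThisInterval < intervalLength) return;
        uint64_t total = sramEvictions + sttEvictions;
        if (total > 0) {
            unsigned share = static_cast<unsigned>((sramEvictions * totalWays) / total);
            // Always leave at least one way for each region.
            sramWays = std::clamp(share, 1u, totalWays - 1);
        }
        accessesThisInterval = sramEvictions = sttEvictions = 0;  // start a new interval
    }
};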
The proposed work improves the performance of the hybrid cache by 4.43% over the STT-RAM baseline, 3.03% over the baseline HCA, and 2.32% over RWHCA, while the respective energy improvements are 41.3% over SRAM, 34.1% over STT-RAM, 24.3% over HCA, and 15% over RWHCA. All these improvements come at a marginal storage and area overhead.
More details about this work are given in Chapter 7.
1.8 Summary
To satisfy the high data demands of modern CMPs, which integrate a large number of processing cores, large multi-level on-chip caches are employed. Among these levels, the LLC plays an essential role in sustaining system performance. However, a large LLC fabricated from traditional memory technologies occupies considerable wafer real estate and accounts for significant leakage power consumption. The recent emergence of Non-Volatile Memory (NVM) technologies has shifted this paradigm, and computer architects now view them as an alternative choice in the memory hierarchy. Compared with traditional memory technologies, NVMs enable on-chip LLCs that are highly dense, non-volatile, low in static energy, and better in scalability. However, when employed in a cache, NVMs incur extra write energy and consume extra clock cycles for write operations. In addition, NVM caches have a limited lifetime due to their weak write endurance and the non-uniform write distribution, both across cache sets (inter-set) and across the blocks within a set (intra-set), generated by the higher-level caches. In this dissertation, we aim to enhance the longevity of NVM-based LLCs by addressing these challenges, making them a capable candidate for the cache hierarchy.
To overcome the costly write operations, we initially employ a hybrid cache architecture (HCA) in which a large NVM portion is combined with a small SRAM portion that absorbs the costly writes. In such an HCA, block placement is critical for energy efficiency. We therefore present a block placement technique that distinguishes private blocks according to their memory access behaviour and uses a predictor to place blocks effectively in the different regions of the HCA. To compensate for the performance gap caused by the costly write operations of the NVM cache and by the increased miss rate of the HCA, we employ a victim cache with both architectures. For the NVM cache, we present a selective victim retention and caching policy for write-intensive blocks. For the HCA, we propose an access-aware placement technique that decides which region of the HCA receives a block on a victim cache hit, together with a dynamic region-based victim cache partitioning approach that reserves substantial space for the victims evicted from each region of the HCA.
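As an illustration of the access-aware placement decision on a victim cache hit, the sketch below steers write accesses and dirty blocks to the SRAM region and clean read hits to the STT-RAM region. This simple rule is an assumed approximation, not the exact policy described in Chapter 7.

// Minimal sketch of an access-aware placement decision when a block hits in the
// victim cache and must be reinstated into the hybrid cache.
enum class Region { SRAM, STT_RAM };

Region placeOnVictimHit(bool isWriteAccess, bool isDirty) {
    // Write accesses and dirty blocks are steered to SRAM to avoid the high
    // write energy and latency of the STT-RAM region.
    if (isWriteAccess || isDirty)
        return Region::SRAM;
    // Clean, read-only blocks tolerate the STT-RAM region well.
    return Region::STT_RAM;
}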
To improve the lifetime of NVM caches affected by non-uniform write distributions, we propose wear-leveling approaches at both the intra-set and inter-set levels. For intra-set wear leveling, four approaches, SWWR, DWWR, DWAWR, and MWWR, are presented; they build on the basic concept of write restriction. For inter-set wear leveling, two strategies, FSSRP and FSDRP, are proposed; they build on the ideas of fellow sets and dynamic associativity management. Both wear-leveling proposals improve the lifetime significantly with a negligible impact on performance.
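The write-restriction idea behind the intra-set schemes can be sketched as follows. The per-way write counters, the restriction threshold, and the rule for lifting restrictions are assumptions made for illustration; they do not reproduce the exact SWWR, DWWR, DWAWR, or MWWR algorithms presented in this thesis.

// Illustrative sketch of write-restriction based intra-set wear leveling:
// ways whose write count crosses a threshold stop receiving new writes,
// pushing subsequent writes toward less-worn ways in the same set.
#include <algorithm>
#include <cstdint>
#include <vector>

class WriteRestrictedSet {
public:
    WriteRestrictedSet(unsigned ways, uint32_t threshold)
        : writeCount_(ways, 0), restricted_(ways, false), threshold_(threshold) {}

    // Choose a way for an incoming write, skipping write-restricted ways;
    // restricted ways remain readable but take no new writes.
    int chooseWayForWrite() {
        int best = -1;
        for (unsigned w = 0; w < writeCount_.size(); ++w) {
            if (restricted_[w]) continue;
            if (best < 0 || writeCount_[w] < writeCount_[best]) best = w;
        }
        if (best < 0) {  // every way restricted: lift restrictions and retry
            std::fill(restricted_.begin(), restricted_.end(), false);
            return chooseWayForWrite();
        }
        return best;
    }

    void recordWrite(unsigned way) {
        if (++writeCount_[way] >= threshold_) restricted_[way] = true;
    }

private:
    std::vector<uint32_t> writeCount_;  // per-way write counters
    std::vector<bool>     restricted_;  // ways currently write-restricted
    uint32_t              threshold_;   // writes allowed before a way is restricted
};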