5.2 Proposed Wear Leveling Technique: MWWR
5.2.2 Operation
The operation of the proposed wear leveling technique is elaborated through Algorithm 4. In the algorithm, the parameter I acts as a tunable value for the predefined interval (line 1). The variable m represents the number of sub-ways in each module that are treated as write-restricted (line 2). The variable M represents the total number of modules inside a cache bank (line 3).
ModulewaysList is a list of lists of size M * m. It contains the ids of the m sub-ways of each of the M modules that are treated as write-restricted in the current period of the regular interval I (line 4). Thus, ModulewaysList is repopulated in each interval. The counter associated with each sub-way of a module is represented by the variable Cij (line 5).
For the initial I cycles of process execution, the cache bank is treated as a normally available bank (line 6). Once the process has executed for I cycles, different sub-ways of the different modules are treated as write-restricted for the next I cycles (line 7). The selection of sub-ways in each module is based upon the counters Cij associated with the sub-ways of the module. In particular, from each module, the m sub-ways with the maximum counter values are chosen and placed into the ModulewaysList entry of that particular module (lines 12 and 13). In this way, the proposed technique prevents the heavily written sub-ways of a module from receiving further writes in the next interval. Once the m sub-ways of a module are selected for write restriction, the counters associated with them are reset to zero (line 14).
When the interval I ends, the write-restricted sub-way list is prepared again for each module, and the process continues until the end of execution.
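The per-interval selection described above can be sketched as follows. This is a minimal illustration, assuming a plain list-of-lists layout for the Cij counters; the counter values in the example are made up and not taken from the text.

```python
# Sketch of the per-interval selection of write-restricted sub-ways.
# counters[i][j] plays the role of C_ij from Algorithm 4 (lines 12-14).

def select_restricted_subways(counters, m):
    """For each module, pick the ids of the m most heavily written
    sub-ways and reset their counters to zero, as done at the start
    of each interval I."""
    moduleways_list = []
    for module in counters:
        # sub-way ids ordered by descending write count; take the top m
        top = sorted(range(len(module)), key=lambda j: module[j], reverse=True)[:m]
        for j in top:
            module[j] = 0          # reset the chosen sub-ways' counters (line 14)
        moduleways_list.append(top)
    return moduleways_list

# Example: 2 modules, 8 sub-ways each (cache_assoc = 8), restrict m = 2 per module.
C = [[3, 9, 1, 0, 2, 7, 4, 5],
     [0, 1, 6, 2, 8, 3, 2, 1]]
restricted = select_restricted_subways(C, m=2)
print(restricted)   # [[1, 5], [4, 2]]
```

Because the restriction set is recomputed from the counters every interval, a sub-way that was heavily written in one interval is shielded in the next, which is exactly the leveling effect the text describes.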
Chapter 5. Intra-Set Wear Leveling using Horizontal Partitions

Algorithm 4 Wear Leveling Algorithm - MWWR
1: I: Predefined interval.
2: m: Number of sub-ways in each module that are treated as read-only or write-restricted.
3: M: Number of modules in the cache.
4: List<integer, List<integer>> ModulewaysList: List of write-restricted sub-ways in each module. Size of list is M * m.
5: Cij: Sub-way counter for the i-th module and j-th sub-way that records the number of write accesses from the L1 cache to that particular sub-way. 0 ≤ i < M, 0 ≤ j < cache_assoc.
6: Run the application for I cycles treating the whole cache as a normally available cache.
7: After the I cycles, treat m ways of each module as read-only or write-restricted.
8: repeat
9:   for every interval I do
10:    for k ← 0 to M − 1 do
11:      for l ← 0 to m − 1 do
12:        Let Ckj be the maximum counter among all the counters in module k of the cache, 0 ≤ j < cache_assoc
13:        ModulewaysList[k][l] = j    ▷ create module list of heavily written ways
14:        Ckj = 0
15:      end for
16:    end for
17:    for each request R from the L1 cache to a block B in the L2 cache during I cycles do
18:      if R = ReadHit then
19:        Perform the read operation as in the conventional cache.
20:      else if R = WriteHit then
21:        if the request R is for a block B that belongs to the current ModulewaysList then
22:          Redirect the write request for block B to another location L in the same cache set, where L ∉ ModulewaysList.
23:          Increment the corresponding sub-way counter Cij of location L.
24:        else
25:          Perform the write on B as in the conventional cache. Increment the counter Cij of the way where block B is present.
26:        end if
27:      else
28:        Forward the request R to main memory to fetch the block. Place the newly arrived block in a location L such that L ∉ ModulewaysList.    ▷ cache miss
29:      end if
30:    end for
31:  end for
32: until the end of the execution
Meanwhile, during each interval, for each request R coming from the L1 cache to the L2 cache, a tag lookup is performed for that request in the L2 cache (line 17). Based upon the result of the lookup and the type of the request, different operations are performed in the L2 cache, as follows (note that PUTX in the following cases represents the write-back of dirty data from the L1 cache to the L2 cache):
• Read Hit: The read operation for the requested block B is performed as in the conventional cache (lines 18 and 19).
• Write Hit (Write-back or PUTX) and block B not in ModulewaysList: The write request R is served from the original location of block B. Once the request is served, the sub-way counter of the module where block B is located is incremented (lines 24 to 26).
• Write Hit (Write-back or PUTX) and block B in ModulewaysList: If the requested block B is present in the cache at a location T that belongs to ModulewaysList, the write request R (from the L1 cache) is redirected to the first invalid way of the same cache set, excluding the ways belonging to ModulewaysList. If no invalid entry is present in the other sub-ways of the same cache set, a Least Recently Used (LRU) victim block v is picked from a location L, where L is a location other than those contained in ModulewaysList for that particular module. Once the victim entry v is selected, a write-back operation is scheduled for v according to the status of its dirty bit. Subsequently, the request R is redirected to location L and the block is written there. Once the request is served, block B is invalidated from its location T. The counter of the sub-way to which the write was redirected (L in this case) is incremented (lines 20 to 23).
• Cache Miss: The request R from the L1 cache is forwarded to the next level of memory. When the requested block arrives, it is placed in a location other than those belonging to ModulewaysList for that particular module (lines 27 to 29).
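The write-hit handling above can be sketched as follows. This is a hedged sketch under several assumptions: the set layout is a plain Python list (None marks an invalid way), and the LRU choice is a placeholder (the first eligible way), since the thesis does not specify how the LRU state is maintained.

```python
# Sketch of the write-hit case of MWWR request handling.
# cache_set: blocks of one set in a module (None = invalid way);
# counters:  C_ij counters for that module;
# restricted: sub-way ids currently in ModulewaysList for the module;
# way:       the way T where block B currently resides.

def handle_write_hit(cache_set, counters, restricted, way):
    """Returns the way that finally absorbs the write."""
    if way not in restricted:
        counters[way] += 1              # serve in place and bump C_ij (lines 24-26)
        return way
    # Redirect to the first invalid way outside the restricted list ...
    for l, blk in enumerate(cache_set):
        if blk is None and l not in restricted:
            cache_set[l] = cache_set[way]
            cache_set[way] = None       # invalidate the old location T
            counters[l] += 1
            return l
    # ... otherwise pick an unrestricted victim (placeholder LRU choice);
    # a real implementation would first write the victim back if dirty.
    victims = [l for l in range(len(cache_set)) if l not in restricted and l != way]
    v = victims[0]
    cache_set[v] = cache_set[way]
    cache_set[way] = None
    counters[v] += 1
    return v

# Example: 8-way set, sub-ways 1 and 5 restricted, block B5 sits in restricted way 5.
cache_set = ["B0", "B1", None, "B3", "B4", "B5", None, "B7"]
counters = [0] * 8
new_way = handle_write_hit(cache_set, counters, restricted=[1, 5], way=5)
print(new_way)   # 2: the write is redirected to the first invalid, unrestricted way
```

Note how the redirected write increments the counter of the destination sub-way, not the restricted one, so future interval selections see where the writes actually landed.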
A working example of the proposed MWWR wear leveling technique is depicted in fig. 5.3. As shown in fig. 5.3 (a), at time-stamp t1, sub-ways 1 and 5 of module-2 (M2) are treated as write-restricted for the current interval (ModulewaysList[2] = {1, 5}). Three cases are considered in part (a) to demonstrate the method for M2, i.e., set-4 and set-5 of a cache bank. In the first case, a read request (arrow-1) to sub-way 2 of set-4 is served normally by the cache (arrow-2). In the second case, a write request (dotted arrow-3) from the L1 cache to sub-way 7 of set-4 is served normally by the L2 cache (arrow-4). Once the write operation completes, the respective counter C27 is incremented. In the last case, a write request (arrow-5) from the L1 cache to sub-way 5 of set-5 (a sub-way treated as write-restricted in the current interval) is redirected (arrow-6) to one of the other ways (3 and 6 in our case) based upon the availability and the victim entry location in the same cache set. Depending
[Figure 5.3 appears here.]

Figure 5.3: Working example of the proposed MWWR wear leveling technique. (a) Status of module-2 at time-stamp t1. (b) Status of module-2 at time-stamp t2.
Components  | Parameters
Processor   | 2 GHz, Quad Core, X86
L1 Cache    | Private, 32 KB SRAM split I/D caches, 4-way set associative, 64 B block, 1-cycle latency, LRU, write-back policy
L2 Cache    | Shared, 64 B block, LRU, write-back policy
Main Memory | 2 GB, 160-cycle latency
Protocol    | MESI CMP Directory

Table 5.1: System parameters
upon where the write is redirected, the respective sub-way counter is incremented, and the data block in sub-way 5 of set-5 is invalidated because of its relocation. If the write is redirected to sub-way 3, then the counter C23 is incremented. At the end of the time-stamp t1 interval, for the time-stamp t2 interval (shown in fig. 5.3 (b)), different sub-ways (3 and 7) are selected for write restriction based upon the values of the write counters.
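The t1-to-t2 transition of fig. 5.3 can be reproduced with a small sketch. The counter values here are illustrative assumptions (the figure does not give numbers); only the selection rule, picking the m largest counters, comes from the text.

```python
# Illustrative re-creation of the fig. 5.3 interval transition for module-2.
# Hypothetical C20..C27 values at the end of the t1 interval; sub-ways 1 and 5
# were restricted during t1, so their counters stayed at zero.
C2 = [0, 0, 1, 6, 2, 0, 3, 5]
m = 2

# Select the m sub-ways with the largest write counters for the t2 interval.
top = sorted(range(len(C2)), key=lambda j: C2[j], reverse=True)[:m]
print(sorted(top))   # [3, 7]: sub-ways 3 and 7 become write-restricted at t2
```

With these (assumed) counts, sub-ways 3 and 7 accumulated the most writes during t1, so they are the ones shielded at t2, matching fig. 5.3 (b).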
L2 Configuration | Leakage Power (mW) | Hit Energy (nJ) | Miss Energy (nJ) | Write Energy (nJ) | Hit Latency (ns) | Miss Latency (ns) | Write Latency (ns)
32 MB, 16-way    | 454.35 | 0.486 | 0.094 | 4.215 | 5.047 | 1.616 | 12.425
16 MB, 32-way    | 423.03 | 0.534 | 0.193 | 6.296 | 4.19  | 1.522 | 11.974
16 MB, 16-way    | 406.22 | 0.432 | 0.092 | 4.162 | 4.227 | 1.560 | 11.974
16 MB, 8-way     | 405.40 | 0.387 | 0.047 | 3.189 | 4.225 | 1.558 | 11.974
8 MB, 16-way     | 136.2  | 0.329 | 0.096 | 4.164 | 3.605 | 1.479 | 11.81

Table 5.2: Timing and energy parameters for STT-RAM L2 cache