Chapter 4. Intra-Set Wear Leveling using Vertical Partitions 95
[Figure 4.1: General overview of contribution in chapter 4 (windows W0 to W3; write restricted window: Sec. 4.2.1 / Sec. 4.2.2; write restricted way: Sec. 4.2.3)]
Figure 4.1 presents the general overview of the proposed contributions in this chapter.
The chapter is organized as follows: the proposed wear leveling techniques are presented in section 4.2. Section 4.3 describes the experimental methodology. Results and analysis are presented in section 4.4. Section 4.5 reports the parameter comparison analysis. Finally, section 4.6 summarizes the chapter.
4.2 Proposed Wear Leveling Techniques
4.2.1 Static Window Write Restriction (SWWR)
predefined interval (I). In each interval, one window of the cache is selected and treated as a write restricted (i.e., read-only) window. In particular, during the interval, all writes coming from the L1/L2 cache, i.e., the Upper-Level Cache (ULC), to the Write Restricted Window (WRW) of the L2/L3 cache (i.e., the LLC) are redirected to other windows of the same cache set. At the end of each interval, the next window of the cache is selected as the write restricted window, and the process continues until the end of execution.
4.2.1.2 Algorithm
Algorithm 1 details the working of SWWR. In our case, the L2/L3 cache (the LLC) is the non-volatile STT-RAM/ReRAM-based cache. In the algorithm, the tunable parameter I denotes the predefined interval (line 3), and m denotes the total number of logical partitions, or windows, in the LLC (line 4). Each window is represented by the variable Wi (line 5), where i ranges from 0 to m−1.
For the first I cycles of execution, the whole cache is treated as a normally available cache (line 6). After I cycles, one window is treated as write restricted for the next interval I (line 7), and at every subsequent interval a new window is selected in round-robin order (lines 9 to 11). The process continues until the end of execution (line 26).
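The interval schedule of lines 6 to 11 can be sketched in Python as below. This is a minimal illustration, not the authors' implementation: Algorithm 1 does not state the initial value of i, so the sketch assumes, as in the working example, that W0 is the first write restricted window, which is equivalent to initialising i to m−1 before the first call to WinSelect. The helper names `win_select` and `restricted_window` are hypothetical.

```python
from typing import Optional


def win_select(i: int, m: int) -> int:
    """WinSelect (lines 27-30): advance the WRW index in round-robin order."""
    return (i + 1) % m


def restricted_window(cycle: int, interval: int, m: int) -> Optional[int]:
    """Index of the write restricted window active at `cycle` (None during warm-up).

    Assumption: W0 is the first WRW, i.e. i starts at m - 1 so that the
    first call to win_select() returns 0.
    """
    if cycle < interval:
        return None  # line 6: the first I cycles use the whole cache normally
    i = m - 1
    # One WinSelect call at the start of every interval after the warm-up one.
    for _ in range(cycle // interval):
        i = win_select(i, m)
    return i


# Example: I = 1000 cycles, m = 4 windows.
# Prints "500 None", "1000 0", "2500 1", "4999 3", "5000 0" on separate lines.
for t in (500, 1000, 2500, 4999, 5000):
    print(t, restricted_window(t, 1000, 4))
```

Because WinSelect is a plain round-robin counter, the same schedule has the closed form (⌊t/I⌋ − 1) mod m for any cycle t ≥ I; the loop is kept only to mirror lines 9 to 11.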
When a request R arrives at the LLC from the ULC, a tag lookup is performed. Depending on the result of the lookup for the requested block B, the following actions are taken:
• Read Hit: The requested block B in the LLC is served normally to the upper-level cache, irrespective of its location in the cache (lines 13 and 14).
• Write Hit (PUTX or write-back) and requested block B in Wi: If B belongs to the selected window Wi and there are invalid line(s) in the other windows of the same cache set, the request R from the ULC is redirected to the first invalid line, and block B is invalidated from its location in the write restricted window. Otherwise, when the other windows of the same cache set contain no invalid line, the LRU line among them is selected as the victim v, and v is written back if its dirty bit is set; the location of v, denoted L, therefore lies outside the write restricted window. Once v is evicted from the LLC, the write request from the ULC is redirected to location L, and block B is invalidated from its location in the WRW (lines 15 to 17).
• Write Hit (PUTX or write-back) and B not in Wi: The requested block B in the LLC is written normally by the ULC (lines 18 to 20).
• LLC miss: When the requested block B is not present in the LLC, the request R from the ULC is forwarded to the next level of the memory hierarchy (i.e., main memory in our case). The newly arrived block is placed in a location outside the WRW Wi (lines 21 to 23).
[Figure 4.2: Working example of proposed SWWR wear leveling policy (ULC: L1/L2 cache; LLC: 16-way L2/L3 cache, ways 0 to 15 grouped into windows W0 to W3 with W0 as the write restricted window; arrows 1 to 5 mark the request and response sequence)]
Algorithm 1 Static Window Write Restriction (SWWR)
1: ULC: Upper Level Cache, i.e., L1/L2.
2: LLC: Last Level Cache, i.e., L2/L3.
3: I: Predefined interval.
4: m: Number of logical partitions or windows.
5: Wi: ith logical partition or window, treated as read-only (i.e., write restricted) in the current interval; 0 ≤ i < m.
6: Run the application for I cycles, treating the whole cache as a normally available cache.
7: After I cycles, treat one window at a time as write restricted and rotate turns in round-robin fashion.
8: repeat
9:   for every interval I do
10:    i = WinSelect(i, m)
11:    Window Wi is selected as the Write Restricted Window (WRW) for the current interval I.
12:    for each request R from ULC to the block B in LLC during I cycles do
13:      if R = ReadHit then
14:        NormOpr(R, B)
15:      else if R = WriteHit then
16:        if B ∈ Wi then
17:          WriteRedirect(R, B, WRW)
18:        else
19:          NormOpr(R, B)
20:        end if
21:      else
22:        ProcessCacheMiss(R, WRW)   ▷ cache miss
23:      end if
24:    end for
25:  end for
26: until the end of the execution
Write Restricted Window Selection for SWWR
27: function WinSelect(i, m)
28:   i = (i + 1) % m
29:   return i
30: end function
Functions used by the Algorithms
31: function NormOpr(R, B)
32:   Request R is served normally from block B, as in a conventional cache.
33: end function
34: function WriteRedirect(R, B, WRW)
35:   Write the block B to another location L in the same cache set. Note that the location L does not belong to the currently selected WRW.
36:   return L
37: end function
38: function ProcessCacheMiss(R, WRW)
39:   Forward the request R to main memory to fetch the block. Place the newly arrived block in a location other than the WRW.
40: end function
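The per-request handling of Algorithm 1 (lines 12 to 23) can be illustrated with a toy model of a single LLC set. This is a sketch under stated assumptions, not the simulator used in this chapter: the class and method names are hypothetical, the WRW index is set directly by the caller instead of by the interval logic, replacement outside the WRW is true LRU, and every fill or write that programs a line is counted as one cell write for wear accounting.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Line:
    tag: Optional[int] = None  # None marks an invalid line
    dirty: bool = False
    last_use: int = 0          # timestamp used for LRU ordering
    writes: int = 0            # cell writes absorbed by this way (wear)


class SWWRSet:
    """Toy model of one LLC set under SWWR (hypothetical, not the thesis simulator)."""

    def __init__(self, ways: int, m: int) -> None:
        if ways % m:
            raise ValueError("ways must split evenly into m windows")
        self.lines: List[Line] = [Line() for _ in range(ways)]
        self.window_size = ways // m
        self.wrw: Optional[int] = None  # None: warm-up, no window is restricted
        self.clock = 0
        self.memory_writebacks = 0      # dirty victims written back to main memory

    def window_of(self, way: int) -> int:
        # Vertical partitioning: consecutive ways form one window.
        return way // self.window_size

    def _find(self, tag: int) -> Optional[int]:
        return next((w for w, line in enumerate(self.lines) if line.tag == tag), None)

    def _location_outside_wrw(self) -> int:
        """First invalid way outside the WRW, otherwise the LRU way outside it."""
        allowed = [w for w in range(len(self.lines)) if self.window_of(w) != self.wrw]
        for w in allowed:
            if self.lines[w].tag is None:
                return w
        victim = min(allowed, key=lambda w: self.lines[w].last_use)
        if self.lines[victim].dirty:
            self.memory_writebacks += 1  # write back the dirty victim before reuse
        return victim

    def _install(self, way: int, tag: int, dirty: bool) -> None:
        line = self.lines[way]
        line.tag, line.dirty, line.last_use = tag, dirty, self.clock
        line.writes += 1  # a fill or a redirected write programs the cells

    def access(self, tag: int, is_write: bool) -> int:
        """Serve one ULC request and return the way holding the block afterwards."""
        self.clock += 1
        way = self._find(tag)
        if way is not None and not is_write:  # read hit (lines 13-14)
            self.lines[way].last_use = self.clock
            return way
        if way is not None and self.window_of(way) == self.wrw:  # lines 15-17
            new_way = self._location_outside_wrw()
            self._install(new_way, tag, dirty=True)
            self.lines[way] = Line(writes=self.lines[way].writes)  # invalidate B
            return new_way
        if way is not None:  # write hit outside the WRW (lines 18-20)
            line = self.lines[way]
            line.dirty, line.last_use = True, self.clock
            line.writes += 1
            return way
        new_way = self._location_outside_wrw()  # LLC miss (lines 21-23)
        self._install(new_way, tag, dirty=is_write)
        return new_way
```

For example, after 16 read misses fill a 16-way set (m = 4) during warm-up, setting `wrw = 0` and writing to the block held in way 2 moves that block to way 4, the LRU way outside W0, and invalidates way 2; the write counters of ways 0 to 3 then stay unchanged for as long as W0 remains restricted.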
4.2.1.3 Working Example
[Figure 4.3: Write counts for four different workloads in the SWWR (x-axis: way indexes; y-axis: write counts on a log2 scale, 1 to 8192; workloads: Mix3, Mix4, Swap, Body)]
Type     Bench   H-Win (1)   H-Win (2)   H-Win (Avg.)
PARSEC   Body    14.4%       3.01%       17.44%
PARSEC   Cann    2.54%       -           2.54%
PARSEC   Ded     6.03%       1.41%       7.44%
PARSEC   Swap    17.2%       7.27%       24.5%
PARSEC   X264    10.04%      -           10.04%
SPEC     Mix1    11.17%      3.71%       14.88%
SPEC     Mix2    9.34%       4.11%       13.45%
SPEC     Mix3    9.88%       4.26%       14.14%
SPEC     Mix4    12.4%       4.74%       17.2%
MEAN             10.35%      3.17%       13.51%
Table 4.1: Percentage of times a Heavily written Window (H-Win) is available in the cache
Figure 4.2 depicts the working of SWWR. In the figure, a 16-way set-associative LLC is partitioned into four (m = 4) equal-sized windows (W0, W1, W2 and W3). Each window contains four ways, and the write restricted window is W0, i.e., way-0 to way-3. The arrows show the requests and responses exchanged between the ULC and the LLC. A read request to way-0 of the LLC (arrow 1) is served normally by the LLC (arrow 2). A write request from the ULC to a block in W0 of the LLC (arrow 3) is redirected to one of the free cache ways of W1, W2 and W3. If all of these ways are occupied, the LRU victim among them is selected, its write-back is scheduled if the victim is dirty, and the write request is redirected to the freed way (arrow 4). Once the write is performed, the write-back acknowledgment is sent to the ULC (arrow 5), and the requested block is invalidated from W0.
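The window layout in this example follows directly from the vertical partitioning. Writing A for the associativity (a symbol introduced here only for illustration), each window holds A/m consecutive ways, so way w belongs to window ⌊w·m/A⌋:

```latex
\begin{aligned}
W_j &= \{\, w \mid j\,A/m \le w < (j+1)\,A/m \,\}, \qquad j = 0,\dots,m-1,\\
A = 16,\; m = 4 &\;\Rightarrow\; W_0 = \{0,\dots,3\},\; W_1 = \{4,\dots,7\},\; W_2 = \{8,\dots,11\},\; W_3 = \{12,\dots,15\},\\
\lvert \text{ways outside } W_0 \rvert &= A - A/m = 16 - 4 = 12.
\end{aligned}
```

A write hit to a block in the write restricted window W0 can therefore be redirected to any of the 12 ways of W1, W2 and W3.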
4.2.1.4 Limitation of SWWR
Figure 4.3 shows the write counts of the different ways inside the cache sets after applying SWWR. As the figure shows, the maximum write count is reduced compared with that of the non-volatile cache without any wear-leveling support (Figure 2.9). Details of the experimental setup are reported in section 4.3. However, SWWR does not consider the write intensity of the other windows when selecting the write restricted window.
Because applications generate varying write patterns, the write intensity of each window changes over time: during execution, a lightly written window can become heavily written, and vice versa. By selecting such heavily written windows as write restricted during execution, we can further improve the relative lifetime and reduce the coefficient of intra-set write variation observed for SWWR in Figure 4.3. Table 4.1 shows the percentage of times a heavily written window (H-Win) is available during the window selection process in SWWR.
From the table, we observe that, on average, a heavily written window other than the selected write restricted window is available in the cache 13.51% of the time.
This motivates us to identify such windows and improve the lifetime further.
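The 13.51% figure is the arithmetic mean of the last column of Table 4.1 over the nine workloads, which can be checked with a few lines of Python. The values are copied from the table; the dictionary name is illustrative only.

```python
from statistics import mean

# Last column of Table 4.1 (percentage of times an H-Win is available), per workload.
h_win_avg = {
    "Body": 17.44, "Cann": 2.54, "Ded": 7.44, "Swap": 24.5, "X264": 10.04,
    "Mix1": 14.88, "Mix2": 13.45, "Mix3": 14.14, "Mix4": 17.2,
}

overall = mean(h_win_avg.values())
print(f"mean over {len(h_win_avg)} workloads: {overall:.2f}%")  # mean over 9 workloads: 13.51%
```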