4.4 Experimental Setup
4.4.1 Simulation Results and Analysis
CMP-VR is designed on top of the baseline TCMP (T-SNUCA), as shown in Figure 4.6. Note that this figure is identical to Figure 3.1; it is repeated here for completeness. The concept of CMP-VR is applied to each bank of the baseline TCMP. The size of the LLC in CMP-VR is equal to the size of the LLC in the baseline.
The number of L2 misses in CMP-VR and in the baseline can be calculated as follows:
• Total L2 misses in CMP-VR = Total L1 misses - (Total NT hits + Total RT hits).
• Total L2 misses in baseline = Total L1 misses - Total L2 hits.
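The two formulas above can be sketched as follows. This is a minimal illustration only: the function names and the sample counter values are hypothetical and are not taken from the simulator. The `mpki` helper assumes the usual definition of MPKI as misses per thousand instructions.

```python
def l2_misses_cmp_vr(l1_misses: int, nt_hits: int, rt_hits: int) -> int:
    """L2 misses in CMP-VR: L1 misses that hit in neither NT nor RT."""
    return l1_misses - (nt_hits + rt_hits)


def l2_misses_baseline(l1_misses: int, l2_hits: int) -> int:
    """L2 misses in the baseline: L1 misses that do not hit in L2."""
    return l1_misses - l2_hits


def mpki(misses: int, instructions: int) -> float:
    """Misses per thousand instructions (assumed standard definition)."""
    return misses * 1000.0 / instructions


# Hypothetical counter values, for illustration only.
example_cmp_vr = l2_misses_cmp_vr(l1_misses=10_000, nt_hits=6_000, rt_hits=1_500)
example_baseline = l2_misses_baseline(l1_misses=10_000, l2_hits=6_500)
```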
The term xMyW represents a baseline TCMP with an LLC of size x MB, organized as 16 equal-sized banks, each of which is y-way set associative. Similarly, the term xMyW-zR represents a TCMP with an LLC of size x MB and sixteen y-way set-associative banks in which z% of the y ways (i.e., (z/100)*y ways) of each set are reserved for RT. For example, 4M4W-25R represents a 4MB LLC with sixteen equal-sized banks; each bank is 4-way set associative, and 25% of the ways (i.e., 0.25*4 = 1 way) of each set are reserved for RT.
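The naming convention can be made concrete with a small parser. The sketch below is illustrative only: `LLCConfig` and `parse_config` are hypothetical names, and the per-bank size is derived by assuming the LLC is split evenly across the 16 banks mentioned above.

```python
import re
from dataclasses import dataclass

NUM_BANKS = 16  # the text states that the LLC has 16 equal-sized banks


@dataclass(frozen=True)
class LLCConfig:
    size_mb: int       # total LLC size in MB (the "x" in xMyW-zR)
    ways: int          # associativity of each bank (the "y")
    reserved_pct: int  # percentage of ways reserved for RT (the "z"); 0 for a baseline

    @property
    def bank_size_kb(self) -> int:
        # Assumes the LLC capacity is divided evenly among the banks.
        return self.size_mb * 1024 // NUM_BANKS

    @property
    def reserved_ways(self) -> int:
        # (z / 100) * y ways of each set are reserved for RT.
        return self.ways * self.reserved_pct // 100


def parse_config(name: str) -> LLCConfig:
    """Parse names such as '2M4W' (baseline) or '4M4W-25R' (CMP-VR)."""
    match = re.fullmatch(r"(\d+)M(\d+)W(?:-(\d+)R)?", name)
    if match is None:
        raise ValueError(f"unrecognized configuration name: {name}")
    size_mb, ways, reserved = match.groups()
    return LLCConfig(int(size_mb), int(ways), int(reserved or 0))
```

For instance, `parse_config("4M4W-25R").reserved_ways` evaluates to 1, in line with the worked example in the text.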
The performance of CMP-VR is analyzed with different associativity and reserve storage areas. The complete list is given below:
2M-4W: This category has a 2MB 4-way set-associative LLC as baseline (2M4W) to compare with the following CMP-VR configurations.
• 2M4W-25R: 2MB 4-way set associative LLC having 25% ways (1 way) from each set as reserve.
• 2M4W-50R: 2MB 4-way set associative LLC having 50% ways (2 ways) from each set as reserve.
4M-4W: This category has a 4MB 4-way set-associative LLC as baseline (4M4W), to compare with the following CMP-VR configurations.
• 4M4W-25R: 4MB 4-way set associative LLC having 25% ways (1 way) from each set as reserve.
• 4M4W-50R: 4MB 4-way set associative LLC having 50% ways (2 ways) from each set as reserve.
4.4.1.1 2M-4W
The size of each L2 bank in this configuration is 128KB and the total L2 cache size is 2MB (128KB*16). Figure 4.7 shows the performance comparison of CMP-VR with the baseline design. The results for CMP-VR are given for two different configurations: 2M4W-25R and 2M4W-50R. Each graph in the figure shows a performance metric normalized to its value in the baseline design.
(a) MPKI. (b) CPI.
Figure 4.7: Normalized performance comparison of CMP-VR (2M4W-25R and 2M4W-50R) with baseline (2M4W) design.
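| Benchmarks | CPI 25R-O-B | CPI 50R-O-B | CPI 50R-O-25R | MPKI 25R-O-B | MPKI 50R-O-B | MPKI 50R-O-25R |
|---|---|---|---|---|---|---|
| swp | 6.25 | 8.04 | 1.91 | 9.61 | 21.21 | 12.84 |
| face | 8.07 | 14.62 | 7.13 | 18.08 | 38.37 | 24.76 |
| vp4 | 10.00 | 20.31 | 11.46 | 23.36 | 30.89 | 9.82 |
| frrt | 3.86 | 10.51 | 6.92 | 28.92 | 35.83 | 9.73 |
| body | 4.27 | 7.83 | 3.72 | 13.33 | 27.94 | 16.85 |
| flud | 6.18 | 9.46 | 3.50 | 34.34 | 41.55 | 10.98 |
| x264 | 7.05 | 12.29 | 5.64 | 15.91 | 21.83 | 7.04 |
| blck | 4.82 | 6.82 | 2.11 | 16.10 | 23.46 | 8.77 |
| Average | 6.33 | 11.33 | 5.34 | 20.35 | 30.52 | 12.77 |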
Table 4.3: Performance improvement (in %) chart for 2M-4W. 25R-O-B means percentage of improvement in 2M4W-25R over baseline (2M4W); 50R-O-B means percentage of improvement in 2M4W-50R over baseline (2M4W); and 50R-O-25R means percentage of improvement in 2M4W-50R over 2M4W-25R.
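The 50R-O-25R column can be related to the other two columns. The sketch below assumes that each improvement is the relative reduction of the metric, (old - new) / old, which the text does not state explicitly; `relative_improvement` is a hypothetical helper name. Under that assumption, the swp and vp4 CPI entries of the table are reproduced up to rounding.

```python
def relative_improvement(a_over_base_pct: float, b_over_base_pct: float) -> float:
    """Improvement (in %) of configuration B over configuration A, given the
    improvement (in %) of each over a common baseline.

    Assumes improvement = (old - new) / old for a lower-is-better metric.
    """
    a_norm = 1.0 - a_over_base_pct / 100.0  # metric of A normalized to baseline
    b_norm = 1.0 - b_over_base_pct / 100.0  # metric of B normalized to baseline
    return (1.0 - b_norm / a_norm) * 100.0


# CPI entries for swp and vp4: 25R-O-B and 50R-O-B taken from the table.
swp_cpi = relative_improvement(6.25, 8.04)
vp4_cpi = relative_improvement(10.00, 20.31)
```

Rounded to two decimals, `swp_cpi` and `vp4_cpi` agree with the tabulated 50R-O-25R values of 1.91 and 11.46.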
Figure 4.7(a) shows that 2M4W-25R achieves a 9.61% to 34.34% reduction in MPKI, with an average of 20.35%. In other words, on average, 20.35% of the accesses that miss in the baseline design instead find their block in RT and hit. Similarly, 2M4W-50R achieves a 21.21% to 41.55% reduction in MPKI, with an average of 30.52%. Figure 4.7(b) shows the performance comparison in terms of CPI. For 2M4W-25R, CPI improves by 3.86% to 10.00%, with an average improvement of 6.33%; for 2M4W-50R, the improvement is 6.82% to 20.31%, with an average of 11.33%.
Table 4.3 shows the performance improvement in terms of both MPKI and CPI for each benchmark. Both proposed configurations (2M4W-25R and 2M4W-50R) are compared with the baseline (2M4W) and with each other.
The results show that, for some benchmarks, a large improvement in MPKI is not matched by a comparable improvement in CPI. This is because of the larger number of indirect-hits, each of which takes twice as long as a direct-hit. Figure 4.8 shows the distribution of direct-hits and indirect-hits for each benchmark. Although CPI does not improve in proportion to MPKI, there is still a performance gain over the baseline.
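The effect of indirect-hits on hit latency can be illustrated with a simple weighted average. This is a simplified sketch, not the simulator's timing model: the function name, the unit latency, and the hit counts are hypothetical, and the only fact taken from the text is that an indirect-hit takes twice as long as a direct-hit.

```python
def average_hit_latency(direct_hits: int, indirect_hits: int,
                        direct_latency: float = 1.0) -> float:
    """Average latency per hit, in units of one direct-hit latency by default."""
    indirect_latency = 2.0 * direct_latency  # an indirect-hit takes twice as long
    total = direct_hits + indirect_hits
    if total == 0:
        raise ValueError("no hits recorded")
    weighted = direct_hits * direct_latency + indirect_hits * indirect_latency
    return weighted / total


# Hypothetical hit counts, for illustration only.
all_direct = average_hit_latency(direct_hits=1000, indirect_hits=0)
mixed = average_hit_latency(direct_hits=600, indirect_hits=400)
```

With the same number of hits, a larger share of indirect-hits raises the average hit latency, which is one way to see why a reduction in MPKI need not translate into a proportional reduction in CPI.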
Figure 4.8: Distribution of direct-hits and indirect-hits in CMP-VR.
Comparison with higher associativity baseline: Increasing the associativity of the baseline (keeping the cache size the same) improves performance because it reduces the number of conflict misses. However, CMP-VR outperforms even a baseline with double the associativity (2M8W) of the original baseline (2M4W).
| Metric | 25R-O-B | 50R-O-B | 25R-O-2B | 50R-O-2B |
|---|---|---|---|---|
| Average CPI improvement (%) | 6.33 | 11.33 | 3.02 | 8.20 |
| Average MPKI improvement (%) | 20.35 | 30.52 | 9.50 | 21.06 |
Table 4.4: Average performance improvement (in %) of CMP-VR over all eight benchmarks. 25R-O-B means improvement of 2M4W-25R over the first baseline (2M4W) and 25R-O-2B means improvement of 2M4W-25R over the second baseline (2M8W). 50R-O-B and 50R-O-2B are defined analogously for 2M4W-50R.
(a) MPKI. (b) CPI.
Figure 4.9: Normalized performance comparison of CMP-VR (2M4W-25R and 2M4W-50R) with baseline (2M8W) having higher associativity.
The performance of both 2M4W-25R and 2M4W-50R is compared with an 8-way set-associative baseline cache (2M8W). Figure 4.9(a) shows this comparison in terms of MPKI: 2M4W-25R improves MPKI by 2.06% to 17.80%, with an average of 9.50%, while 2M4W-50R improves it by 12.82% to 27.40%, with an average of 21.06%. Figure 4.9(b) shows the improvement in terms of CPI: 2M4W-25R improves CPI by 1.91% to 4.12%, with an average of 3.02%, and 2M4W-50R improves CPI by 4.45% to 14.42%, with an average of 8.20%.
Table 4.4 shows the average improvement of both 2M4W-25R and 2M4W-50R over the two baseline designs (2M4W and 2M8W). From the analysis, it is clear that CMP-VR with 25% reserve storage (25R) improves on the baseline design, and that CMP-VR with 50% reserve storage (50R) improves on it even more.
4.4.1.2 4M-4W
The bank size in this configuration is 256KB and the total L2 cache size is 4MB (256KB*16). Figure 4.10 shows the performance comparison of CMP-VR with the baseline design. The two CMP-VR configurations used in this section are 4M4W-25R and 4M4W-50R.
(a) MPKI. (b) CPI.
Figure 4.10: Normalized performance comparison of CMP-VR (4M4W-25R and 4M4W-50R) with baseline (4M4W) design.
Figure 4.10(a) shows that 4M4W-25R achieves a 9.28% to 32.92% reduction in MPKI, with an average of 21.55%; for 4M4W-50R the reduction is 23.67% to 43.25%, with an average of 36.29%. The performance comparison in terms of CPI is shown in Figure 4.10(b). Here, 4M4W-25R improves CPI by 1.12% to 12.29%, with an average of 7.75%, and 4M4W-50R improves CPI by 2.88% to 23.03%, with an average of 13.31%. Table 4.5 shows the performance improvement in terms of both MPKI and CPI for each benchmark.
| Benchmarks | CPI 25R-O-B | CPI 50R-O-B | CPI 50R-O-25R | MPKI 25R-O-B | MPKI 50R-O-B | MPKI 50R-O-25R |
|---|---|---|---|---|---|---|
| swp | 7.84 | 10.54 | 2.93 | 13.67 | 23.14 | 10.97 |
| face | 10.39 | 18.71 | 9.28 | 29.04 | 41.46 | 17.50 |
| vp4 | 11.88 | 23.03 | 12.65 | 26.13 | 32.26 | 8.30 |
| frrt | 4.46 | 16.29 | 12.38 | 14.03 | 43.25 | 33.98 |
| body | 5.79 | 9.99 | 4.46 | 9.28 | 25.88 | 18.30 |
| flud | 12.29 | 14.18 | 2.15 | 28.14 | 40.12 | 16.67 |
| x264 | 1.12 | 2.88 | 1.78 | 15.71 | 40.24 | 29.10 |
| blck | 7.67 | 9.27 | 1.73 | 32.92 | 40.86 | 11.83 |
| Average | 7.75 | 13.31 | 6.02 | 21.55 | 36.29 | 18.78 |
Table 4.5: Performance improvement (in %) chart for 4M-4W. 25R-O-B means percentage of improvement in 4M4W-25R over 4M4W (baseline); 50R-O-B means percentage of improvement in 4M4W-50R over 4M4W; and 50R-O-25R means percentage of improvement in 4M4W-50R over 4M4W-25R.
Comparison with higher associativity baseline: The comparison of 4M4W-25R and 4M4W-50R with the higher-associativity baseline (4M8W) is shown in Figure 4.11.
(a) MPKI. (b) CPI.
Figure 4.11: Normalized performance comparison of CMP-VR (4M4W-25R and 4M4W-50R) with baseline (4M8W) having higher associativity.
| Metric | 25R-O-B | 50R-O-B | 25R-O-2B | 50R-O-2B |
|---|---|---|---|---|
| Average CPI improvement (%) | 7.75 | 13.31 | 2.69 | 8.55 |
| Average MPKI improvement (%) | 21.55 | 36.29 | 10.11 | 27.00 |
Table 4.6: Average performance improvement (in %) of CMP-VR over all eight benchmarks. 25R-O-B means improvement of 4M4W-25R over the first baseline (4M4W) and 25R-O-2B means improvement of 4M4W-25R over the second baseline (4M8W). 50R-O-B and 50R-O-2B are defined analogously for 4M4W-50R.
Figure 4.11(a) shows that 4M4W-25R improves MPKI by 10.11% on average, while 4M4W-50R improves it by 27.00%. Figure 4.11(b) shows that 4M4W-25R improves CPI by 2.69% on average and 4M4W-50R improves CPI by 8.55%.
Table 4.6 shows the average improvement of both 4M4W-25R and 4M4W-50R over the two baseline designs: 4M4W and 4M8W.