5.6 Experimental Evaluation
5.6.2 Comparisons for C1
5.6.2.1 Comparison with baseline
Figure 5.7 shows the improvement of FS-DAM over the baseline. Three configurations of FS-DAM, differing in fellow-group size, are used for the comparison. The terms 2F, 4F and 8F denote FS-DAM with fellow-group sizes of 2, 4 and 8 respectively. The improvement in terms of MPKI is given in Figure 5.7(a). The average improvements are 26.35%, 27.94% and 28.97% for 2F, 4F and 8F respectively. A reduction (improvement) in MPKI means more cache hits and hence fewer expensive main-memory accesses.
Reduction in main-memory accesses improves the average memory access time (AMAT). Figure 5.7(b) shows the improvements of FS-DAM in terms of AMAT over baseline. The improvements are 11.23% (2F), 12.55% (4F) and 12.95% (8F).
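The link between miss reduction and AMAT can be illustrated with a minimal sketch. It assumes the textbook single-level model (AMAT = hit time + miss rate x miss penalty) and a relative-reduction definition of "improvement"; the evaluation in this chapter may use a more detailed model. The function names and all numbers below are hypothetical and are not taken from the experiments.

```python
def percent_improvement(baseline: float, proposed: float) -> float:
    """Relative reduction of a lower-is-better metric (MPKI, AMAT, CPI), in percent."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return 100.0 * (baseline - proposed) / baseline


def amat(hit_time: float, miss_rate: float, miss_penalty: float) -> float:
    """Textbook single-level model: AMAT = hit time + miss rate * miss penalty."""
    return hit_time + miss_rate * miss_penalty


# Illustrative values only (assumed, in cycles): lowering the miss rate
# lowers AMAT because fewer accesses pay the main-memory penalty.
base = amat(hit_time=10.0, miss_rate=0.20, miss_penalty=200.0)
new = amat(hit_time=10.0, miss_rate=0.15, miss_penalty=200.0)
print(f"AMAT improvement: {percent_improvement(base, new):.2f}%")
```

Because the hit time is unaffected by the miss rate, a given relative reduction in misses yields a smaller relative reduction in AMAT under this model.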
Improvement in MPKI as well as in AMAT means improvement in CPI. Figure 5.7(c) shows that the average CPI improvements of FS-DAM over the baseline are 10.82% (2F), 12.04% (4F) and 12.58% (8F).
           F = 2                F = 4                F = 8
Benchmark  MPKI   AMAT   CPI    MPKI   AMAT   CPI    MPKI   AMAT   CPI
vips       17.86  05.21  02.93  19.83  05.55  05.12  20.01  05.34  05.79
face       16.54  06.61  06.21  17.45  06.66  06.50  17.40  06.80  06.38
bdy2       14.52  06.11  04.76  15.82  06.35  05.18  16.42  06.82  06.16
flud       10.38  05.32  05.47  11.81  05.04  06.52  18.33  05.05  06.94
frq16      13.00  05.67  04.54  13.76  06.37  04.58  14.85  06.32  04.86
ferrt      16.07  05.39  07.42  14.26  06.17  07.60  14.37  06.70  07.46
mix1       11.90  08.71  05.95  10.45  08.25  05.92  10.83  08.23  05.95
mix2       11.21  05.27  05.27  12.46  05.92  06.21  12.77  06.09  07.84
frt16      10.37  03.02  05.48  10.87  03.46  05.97  10.73  03.47  05.86
blk4       14.04  08.11  06.91  13.69  08.80  07.17  13.97  10.05  07.97
freq       11.14  05.29  05.10  11.62  06.43  05.40  11.82  07.01  05.73
Average    13.13  05.69  05.32  13.56  06.11  05.95  14.38  06.32  06.38
Table 5.5: Improvements (in %) of FS-DAM over CMP-SVR. The improvements are shown in terms of MPKI, AMAT and CPI for three different fellow-group sizes (F). The cache configuration is C1.
The improvement of FS-DAM over the baseline comes from better utilisation of the storage allotted to each bank. Heavily used sets can store more blocks than the bank associativity allows. The hit rate of FS-DAM is observed to be better than that of the baseline, and this improvement is due to the indirect hits of FS-DAM. A heavily used set can use the reserve ways of underused sets and hence generates indirect hits; in the baseline these accesses would have been misses even though there are idle ways in other sets.
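The effect of indirect hits can be illustrated with a toy model. The sketch below is not the FS-DAM hardware organisation: it only assumes one fellow-group of LRU-managed sets in which, when sharing is enabled, a set may hold more blocks than its own ways as long as the group as a whole has room. The class name, the eviction rule (evict the LRU block of the currently largest set) and the access trace are all hypothetical. A hit is counted as indirect when the block lies deeper in its set's LRU order than the set's own associativity, that is, when a plain LRU set of the same associativity would have missed.

```python
from collections import OrderedDict


class ToyFellowGroup:
    """Toy fellow-group of LRU sets; with share=True idle ways can be borrowed."""

    def __init__(self, n_sets: int, ways: int, share: bool):
        self.ways = ways
        self.share = share
        self.capacity = n_sets * ways
        # One dict of resident tags per set, ordered oldest (LRU) first.
        self.sets = [OrderedDict() for _ in range(n_sets)]
        self.direct_hits = 0
        self.indirect_hits = 0
        self.misses = 0

    def access(self, set_idx: int, tag: int) -> None:
        home = self.sets[set_idx]
        if tag in home:
            # Rank 0 is the most recently used block of this set.
            rank = list(reversed(home)).index(tag)
            if rank < self.ways:
                self.direct_hits += 1      # would also hit without sharing
            else:
                self.indirect_hits += 1    # resident only thanks to a borrowed way
            home.move_to_end(tag)
            return
        self.misses += 1
        if not self.share:
            # Baseline: a set never holds more than its own ways.
            if len(home) >= self.ways:
                home.popitem(last=False)
        elif sum(len(s) for s in self.sets) >= self.capacity:
            # Group is full: evict the LRU block of the largest set
            # (ties are resolved in favour of the requesting set).
            victim = max(self.sets, key=lambda s: (len(s), s is home))
            victim.popitem(last=False)
        home[tag] = True


def run_trace(share: bool, rounds: int = 10):
    """Set 0 cycles over 6 blocks (heavy); set 1 reuses 2 blocks (light)."""
    group = ToyFellowGroup(n_sets=2, ways=4, share=share)
    for _ in range(rounds):
        for tag in range(6):
            group.access(0, tag)
        for tag in (100, 101):
            group.access(1, tag)
    return group.misses, group.direct_hits, group.indirect_hits


print("baseline (misses, direct, indirect):", run_trace(share=False))
print("sharing  (misses, direct, indirect):", run_trace(share=True))
```

In this trace the heavy set thrashes in the baseline, because its working set of 6 blocks exceeds its 4 ways, while the light set leaves two ways idle. With sharing enabled, those idle ways absorb the overflow and the repeated accesses of the heavy set become indirect hits.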
5.6.2.2 Comparison with CMP-SVR
FS-DAM is proposed to handle the issues in CMP-SVR. The index-based static fellow-group creation in CMP-SVR restricts the utilisation enhancement in the banks. FS-DAM creates fellow-groups based on the set loads and also re-groups them dynamically if required. Table 5.5 shows the improvement of FS-DAM over CMP-SVR for 2F, 4F and 8F. The same fellow-group size is used for both FS-DAM and CMP-SVR in each comparison. The table shows that FS-DAM gives improvements of 13.13% (2F), 13.56% (4F) and 14.38% (8F) in terms of MPKI as compared to CMP-SVR. The improvements in AMAT are 5.69% (2F), 6.11% (4F) and 6.32% (8F). The improved MPKI and AMAT result in CPI improvements of 5.32% (2F), 5.95% (4F) and 6.38% (8F).
Figure 5.8: MPKI improvements in FS-DAM with and without re-grouping. The cache configuration considered for this experiment is C2.
Figure 5.9: Load distribution among the fellow-groups of FS-DAM.
The improvement of FS-DAM over CMP-SVR is due to two reasons: (a) better load distribution among the fellow-groups and (b) re-organisation of the fellow-groups based on the current demand. Figure 5.9 shows the load distribution among the fellow-groups of FS-DAM. It uses the same benchmark and the same bank as Figure 5.3, which showed the non-uniform load distribution among the fellow-sets of CMP-SVR. In comparison with Figure 5.3, the distribution in Figure 5.9 is uniform. The importance of re-grouping in FS-DAM, which is not present in CMP-SVR, can be observed from Figure 5.8. The figure shows the MPKI improvement of FS-DAM over the baseline with and without re-grouping. The improvement is only 20.77% without re-grouping, whereas with re-grouping it is 32.56%. Note that the improvement without re-grouping is also better than that of CMP-SVR because of the set-load-based fellow-group creation.
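The difference between index-based and load-based fellow-group creation can be sketched as follows. This is only an illustration under stated assumptions: the static rule (consecutive set indices form a group) and the greedy balancing heuristic (heaviest sets first, each placed in the lightest group that still has room) are hypothetical stand-ins, not the grouping policies of CMP-SVR or FS-DAM, and the per-set loads are invented.

```python
def index_groups(loads, size):
    """Static grouping: consecutive set indices share a group (assumed rule)."""
    if len(loads) % size != 0:
        raise ValueError("number of sets must be a multiple of the group size")
    return [list(range(i, i + size)) for i in range(0, len(loads), size)]


def load_groups(loads, size):
    """Greedy balancing: heaviest sets first, each into the lightest open group."""
    if len(loads) % size != 0:
        raise ValueError("number of sets must be a multiple of the group size")
    n_groups = len(loads) // size
    groups = [[] for _ in range(n_groups)]
    totals = [0] * n_groups
    by_load = sorted(range(len(loads)), key=lambda i: loads[i], reverse=True)
    for idx in by_load:
        # Only groups that are not yet full can accept another set.
        open_groups = [g for g in range(n_groups) if len(groups[g]) < size]
        target = min(open_groups, key=lambda g: totals[g])
        groups[target].append(idx)
        totals[target] += loads[idx]
    return groups


def group_loads(loads, groups):
    """Total load carried by each group."""
    return [sum(loads[i] for i in members) for members in groups]


# Hypothetical per-set access counts for eight sets, fellow-group size 2.
loads = [90, 80, 5, 10, 70, 60, 15, 20]
print("index-based:", group_loads(loads, index_groups(loads, 2)))
print("load-based: ", group_loads(loads, load_groups(loads, 2)))
```

In this sketch, re-grouping would amount to calling `load_groups` again with updated loads whenever the demand shifts; a static index-based grouping has no such step.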
5.6.2.3 Different fellow-group sizes
A larger fellow-group size shows more improvement, as the sets get more chances to distribute their load. But as the fellow-group size increases, the associativity of FS-TGS also increases, and hence so does the hardware overhead. Section 5.6.5 gives a detailed discussion of the hardware overheads of FS-DAM. Since the improvements shown for 4F and 8F in the above comparisons are almost the same, most of the remaining comparisons in this chapter use only 4F to compare FS-DAM with other techniques.
(a) MPKI (C1).
(b) AMAT (C1).
(c) CPI (C1).
Figure 5.10: The improvements of FS-DAM, V-Way, Z-Cache and SBC over the baseline. The cache configuration is C1.
5.6.2.4 Comparison with other techniques
FS-DAM is also compared with V-Way, Z-Cache and SBC. The same cache configuration is used for all the experiments. For V-Way, the Tag-to-Data Ratio (TDR) is taken as 2, which means the associativity of each set can at most double. For Z-Cache, three levels of replacement candidates are considered.
Figure 5.10 shows the improvements of FS-DAM, V-Way, Z-Cache and SBC over the baseline. The improvements in MPKI are shown in Figure 5.10(a). The average improvement of FS-DAM over the baseline is 27.94%, while the other techniques improve by 15.76% (V-Way), 17.15% (Z-Cache) and 17.32% (SBC). The AMAT improvements are shown in Figure 5.10(b). The average improvements in terms of AMAT are 12.55% (FS-DAM), 5.79% (V-Way), 6.59% (Z-Cache) and 6.86% (SBC). The improvements in terms of CPI (Figure 5.10(c)) are 12.04% (FS-DAM), 5.98% (V-Way), 6.85% (Z-Cache) and 6.75% (SBC). FS-DAM thus performs better than the other three existing techniques. The performance of V-Way, Z-Cache and SBC is almost similar to that of CMP-SVR. The load-based fellow-group creation and the dynamic re-grouping based on requirements give FS-DAM its larger improvements. The performance enhancement of FS-DAM over these existing techniques is discussed in Section 5.6.4.