7.3 TLD-NUCA
7.4.6 Hardware Overheads
TLD-NUCA incurs additional hardware overheads because of its TLD. These overheads are quantified in terms of energy, storage and area.
7.4.6.1 Energy Overhead
The energy model used in this thesis is adapted from [72]. The total energy consumed by the LLC while executing a particular application is given in Equation 7.1, where the term $E_X$ denotes the energy consumed by component X.
$$E_{total} = E_{cache} + E_{ntw} + E_{tld} \tag{7.1}$$
The total energy is calculated as the energy consumed by four different components: cache banks (cache), NoC (ntw), TLD (tld) and off-chip accesses. Each component consumes two types of energy: static (st) and dynamic (dy).
Dynamic energy is consumed on every access, whereas static energy is due to leakage power. The total energy consumption of the cache banks is calculated from Equations 7.2, 7.3 and 7.4. The term $E_X^{p}$ denotes the energy consumed by X per access, and $P_X$ denotes the leakage power consumed by X.
$$E_{cache} = E_{cache}^{dy} + E_{cache}^{st} \tag{7.2}$$
$$E_{cache}^{dy} = \mathit{totalBankAccesses} \times E_{bank}^{p} \tag{7.3}$$
$$E_{cache}^{st} = (\text{no. of banks} \times P_{bank}) \times \mathit{execTime} \tag{7.4}$$
The values of $E_{bank}^{p}$ and $P_{bank}$ are obtained from CACTI 6.0 [2], and the remaining values are obtained from full-system simulation. The term $\mathit{execTime}$ denotes the total execution time of TLD-NUCA. The NoC energy consumption is obtained from two modules of GEMS, Garnet and Orion. Equations 7.5, 7.6 and 7.7 give the total energy consumption of the TLD.
$$E_{tld} = E_{tld}^{dy} + E_{tld}^{st} \tag{7.5}$$
$$E_{tld}^{dy} = \text{no. of requests} \times E_{tld}^{p} \tag{7.6}$$
$$E_{tld}^{st} = (\text{no. of tlds} \times P_{tld}) \times \mathit{execTime} \tag{7.7}$$
Although each tld-part logically has very high associativity, it is physically segmented into multiple t-segs, each with the same associativity as a bank. Each t-seg duplicates the tag-array contents of S/p sets of a particular bank, where S is the number of sets per bank and p is the number of tld-parts. Hence, the energy consumption of a t-seg is taken to be the energy consumption of the tag array of a bank with S/p sets. The values of $E_{tld}^{p}$ and $P_{tld}$ are obtained by summing the energy consumption of all the t-segs in a tld-part.
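The following minimal Python sketch shows how Equations 7.1 to 7.7 combine, including the t-seg summation that yields $E_{tld}^{p}$ and $P_{tld}$. The dataclass and all field and function names are hypothetical. The numeric inputs are made-up placeholders, not CACTI or GEMS outputs. The example assumes 16 identical t-segs per tld-part, which corresponds to one t-seg per bank in a 16-bank LLC. It also takes "no. of tlds" to be the 4 tld-parts.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class EnergyInputs:
    # Illustrative container; every value is supplied by the user.
    exec_time_s: float           # execTime: total execution time (s)
    total_bank_accesses: int     # totalBankAccesses, from full-system simulation
    e_bank_p_j: float            # E_bank^p: energy per bank access (J), e.g. from CACTI
    p_bank_w: float              # P_bank: leakage power of one bank (W), e.g. from CACTI
    num_banks: int               # no. of banks
    tld_requests: int            # no. of requests to the TLD
    tseg_e_p_j: Sequence[float]  # per-access energy of each t-seg in one tld-part (J)
    tseg_p_w: Sequence[float]    # leakage power of each t-seg in one tld-part (W)
    num_tlds: int                # "no. of tlds" in Equation 7.7 (assumed: tld-parts)
    e_ntw_j: float               # E_ntw: NoC energy, e.g. from Garnet/Orion (J)


def cache_energy(x: EnergyInputs) -> float:
    e_dy = x.total_bank_accesses * x.e_bank_p_j          # Equation 7.3
    e_st = (x.num_banks * x.p_bank_w) * x.exec_time_s    # Equation 7.4
    return e_dy + e_st                                   # Equation 7.2


def tld_energy(x: EnergyInputs) -> float:
    # E_tld^p and P_tld: sums over all t-segs of one tld-part.
    e_tld_p = sum(x.tseg_e_p_j)
    p_tld = sum(x.tseg_p_w)
    e_dy = x.tld_requests * e_tld_p                      # Equation 7.6
    e_st = (x.num_tlds * p_tld) * x.exec_time_s          # Equation 7.7
    return e_dy + e_st                                   # Equation 7.5


def total_energy(x: EnergyInputs) -> float:
    return cache_energy(x) + x.e_ntw_j + tld_energy(x)   # Equation 7.1


# Hypothetical inputs chosen only to exercise the formulas (not measured data).
example = EnergyInputs(
    exec_time_s=0.01, total_bank_accesses=1_000_000, e_bank_p_j=0.5e-9,
    p_bank_w=0.02, num_banks=16, tld_requests=1_000_000,
    tseg_e_p_j=[1e-11] * 16, tseg_p_w=[1e-3] * 16, num_tlds=4, e_ntw_j=2e-3,
)
print(f"E_cache={cache_energy(example):.2e} J  E_tld={tld_energy(example):.2e} J  "
      f"E_total={total_energy(example):.2e} J")
```

The TLD terms are computed per tld-part and then scaled. This mirrors the text's modelling choice that a t-seg costs the same as a bank tag array with S/p sets.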
Figure 7.16(a) shows the total energy overhead of T-DNUCA and TLD-NUCA relative to T-SNUCA. Although T-DNUCA issues many multicast requests and performs many block movements, its energy consumption remains the same as that of T-SNUCA. The main reason is its smart placement policy. This policy reduces the chance of a remote search, and hence reduces both the network communication and the number of additional bank accesses required for remote search. On average, TLD-NUCA has an energy overhead of 3.08% compared to T-SNUCA. This overhead is caused by the centralised TLD. Figure 7.16(b) shows the energy breakdown of the different components of TLD-NUCA; the off-chip energy is not shown. On average, the TLD accounts for 8% of the total energy consumed by TLD-NUCA. Although the TLD requires 8% of the energy, the total energy overhead of TLD-NUCA (Figure 7.16(a)) is only 3.08% because of the reduction in network energy consumption. TLD-NUCA requires no mandatory communication between the requesting core and the home-bank, and hence requires less communication than T-DNUCA. In addition, no multicast search request needs to traverse the NoC. Overall, TLD-NUCA requires 13.4% less NoC energy than T-SNUCA.

Figure 7.16: Comparison of different energy consumption parameters of TLD-NUCA and T-DNUCA with T-SNUCA: (a) total energy overhead compared to T-SNUCA; (b) energy breakdown of TLD-NUCA; (c) EDP gain compared to T-SNUCA. The experiments use a 4MB LLC in which each bank is 256KB and 4-way associative. For both T-DNUCA and TLD-NUCA, the cascading number is 3 and migration is enabled.
| Part | ID | Parameter | Value |
|---|---|---|---|
| LLC | A | LLC size | 4MB |
| LLC | B | Number of banks | 16 |
| LLC | C | Size of each bank | A/B = 256KB |
| LLC | D | Block size | 64 bytes |
| LLC | E | Address space | 64 bits |
| LLC | F | Bank associativity | 4-way |
| LLC | G | Sets per bank | C/(F×D) = 1024 |
| LLC | H | Blocks per bank | G×F = 4096 |
| LLC | I | Tags per LLC | H×B = 65536 |
| LLC | J | Tag bits | 44 bits |
| LLC | K | Sharer bits | 4 bits |
| LLC | L | Dirty bit | 1 bit |
| LLC | M | Total tag size | J+K+L = 49 bits |
| LLC | N | Total tag-array size | I×M = 3211264 bits |
| LLC | O | Counter bits (per block) | 10 bits |
| LLC | P | Counter storage | O×H×B = 655360 bits |
| LLC | Q | Total LLC size | N+A+P = 37421056 bits (A = 33554432 bits) |
| TLD | R | Number of tld-parts | 4 |
| TLD | S | Sets per tld-part | G/R = 256 |
| TLD | T | TLD associativity | F×B = 64 |
| TLD | U | Total TLD entries | R×S×T = 65536 |
| TLD | V | Bank-id bits | 4 bits |
| TLD | W | Dirty bit | 1 bit |
| TLD | X | Bits per entry | J+V+W = 49 bits |
| TLD | Y | Total TLD size | U×X = 3211264 bits |
| Total | | TLD-NUCA storage | Q+Y = 40632320 bits |
| Total | | T-SNUCA storage | 36765696 bits |
| Total | | Storage overhead of TLD-NUCA | 10.5% |

Table 7.5: Storage overhead calculation of TLD-NUCA over T-SNUCA.

Although TLD-NUCA has an additional energy overhead, it achieves a better EDP (energy × CPI) than T-DNUCA. Figure 7.16(c) shows the total EDP gain of T-DNUCA and TLD-NUCA compared to T-SNUCA. The better EDP results from the improvement in CPI.
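EDP is the product of energy and CPI. The average 3.08% energy overhead of TLD-NUCA over T-SNUCA therefore turns into an EDP gain only if the CPI falls by more than a certain margin. The Python sketch below computes that break-even point. Defining the gain as 1 − EDP_new/EDP_base is an assumption about how Figure 7.16(c) normalises its results. The 10% CPI reduction is purely hypothetical. The function names are illustrative.

```python
def edp(energy: float, cpi: float) -> float:
    """Energy-delay product as defined in the text: energy x CPI."""
    return energy * cpi


def edp_gain(e_new: float, cpi_new: float, e_base: float, cpi_base: float) -> float:
    """Fractional EDP reduction relative to a baseline (positive means better)."""
    return 1.0 - edp(e_new, cpi_new) / edp(e_base, cpi_base)


def breakeven_cpi_ratio(energy_ratio: float) -> float:
    """Largest CPI_new / CPI_base for which EDP is no worse than the baseline."""
    return 1.0 / energy_ratio


# Energy normalised to T-SNUCA = 1.0; 1.0308 encodes the 3.08% average overhead.
energy_ratio = 1.0308
print(f"break-even CPI ratio: {breakeven_cpi_ratio(energy_ratio):.4f}")  # 0.9701
# Hypothetical 10% CPI reduction (not a measured result):
print(f"EDP gain: {edp_gain(energy_ratio, 0.90, 1.0, 1.0):.4f}")          # 0.0723
```

With a 3.08% energy overhead, the CPI must fall below about 97.0% of the T-SNUCA CPI (a reduction of roughly 3%) before the EDP improves. Because 3.08% is an average, the break-even point differs per benchmark.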
7.4.6.2 Storage and area overhead
TLD-NUCA requires additional storage for the TLD. Table 7.5 details the LLC storage in TLD-NUCA. It shows that a TLD-NUCA with a 4MB LLC, 16 4-way associative banks and 4 tld-parts has a storage overhead of 10.5% compared to T-SNUCA. The overhead remains nearly the same for all LLC sizes and associativities. The area overhead of TLD-NUCA with a 4MB LLC divided into 16 4-way associative banks is 11.47%, while that with a 2MB LLC is 10.6%.
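The Table 7.5 arithmetic can be reproduced with a small Python function, sketched here under two stated assumptions. First, the tag width J is derived as address bits minus block-offset, set-index and bank-select bits; this gives the 44 bits of Table 7.5 for the 4MB configuration. Second, the T-SNUCA storage is taken to be the tag array plus data only (no counters and no TLD); this reproduces the 36765696 bits in the table. The function and parameter names are hypothetical.

```python
def log2_exact(n: int) -> int:
    """log2 of a power of two; rejects other values."""
    if n <= 0 or n & (n - 1):
        raise ValueError(f"{n} is not a power of two")
    return n.bit_length() - 1


def storage_bits(llc_bytes: int, num_banks: int, bank_assoc: int,
                 block_bytes: int = 64, addr_bits: int = 64, sharer_bits: int = 4,
                 dirty_bits: int = 1, counter_bits: int = 10,
                 tld_parts: int = 4) -> tuple[int, int]:
    """Return (TLD-NUCA bits, T-SNUCA bits) following the steps of Table 7.5."""
    bank_bytes = llc_bytes // num_banks                          # C = A/B
    sets_per_bank = bank_bytes // (bank_assoc * block_bytes)     # G
    blocks = sets_per_bank * bank_assoc * num_banks              # I = H x B
    # Assumption: tag = address - offset - set-index - bank-select bits (J).
    tag_bits = (addr_bits - log2_exact(block_bytes)
                - log2_exact(sets_per_bank) - log2_exact(num_banks))
    tag_array = blocks * (tag_bits + sharer_bits + dirty_bits)   # N = I x M
    data = llc_bytes * 8                                         # A in bits
    counters = counter_bits * blocks                             # P
    llc_total = tag_array + data + counters                      # Q
    sets_per_part = sets_per_bank // tld_parts                   # S = G/R
    tld_entries = tld_parts * sets_per_part * bank_assoc * num_banks  # U = R x S x T
    entry_bits = tag_bits + log2_exact(num_banks) + dirty_bits   # X = J + V + W
    tld_total = tld_entries * entry_bits                         # Y = U x X
    tsnuca_total = tag_array + data   # assumption: no counters and no TLD
    return llc_total + tld_total, tsnuca_total


tld_nuca, tsnuca = storage_bits(4 * 2**20, num_banks=16, bank_assoc=4)
print(tld_nuca, tsnuca, f"{(tld_nuca - tsnuca) / tsnuca:.1%}")  # 40632320 36765696 10.5%
```

Under the same assumptions, a 2MB LLC with 16 4-way banks gives about 10.7%. The overhead shifts slightly only because the tag width changes with the number of sets.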
7.4.6.3 Energy overhead of highly associative TLD
The total energy consumption of TLD-NUCA (Section 7.4.6.1) is only 3.08% higher than that of T-SNUCA. However, this overhead grows with the TLD associativity, which in turn grows with the bank associativity. For example, in a 16-core TLD-NUCA with 8-way associative banks, the TLD associativity is 128, and hence each t-seg is 8-way associative. Experimental analysis shows that the energy overhead of such a TLD-NUCA is 3.6% over the corresponding T-SNUCA.
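The relationship between bank associativity, TLD associativity and t-seg shape can be sketched in a few lines of Python. The sketch assumes one t-seg per bank in each tld-part. This follows from a tld-part associativity of F × B being built from F-way t-segs. The 512 sets per bank in the 8-way case assume 256KB banks, as in Table 7.5. The function name is hypothetical.

```python
def tld_geometry(num_banks: int, bank_assoc: int, sets_per_bank: int,
                 tld_parts: int) -> dict[str, int]:
    """Logical TLD associativity and the shape of the t-segs in one tld-part."""
    if sets_per_bank % tld_parts:
        raise ValueError("sets per bank must be divisible by the number of tld-parts")
    return {
        "tld_assoc": bank_assoc * num_banks,          # T = F x B
        "tsegs_per_part": num_banks,                  # assumed: one t-seg per bank
        "tseg_assoc": bank_assoc,                     # t-seg matches bank associativity
        "sets_per_tseg": sets_per_bank // tld_parts,  # S/p sets duplicated per t-seg
    }


four_way = tld_geometry(num_banks=16, bank_assoc=4, sets_per_bank=1024, tld_parts=4)
eight_way = tld_geometry(num_banks=16, bank_assoc=8, sets_per_bank=512, tld_parts=4)
print(four_way)   # Table 7.5 configuration: 64-way TLD built from 4-way t-segs
print(eight_way)  # 8-way banks: 128-way TLD built from 8-way t-segs
```

Doubling the bank associativity doubles the logical TLD associativity. Each t-seg stays as wide as a bank, so the added cost appears as wider t-segs in every tld-part.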
Figure 7.17: The static energy required per cycle in (a) the LLC (all banks) and (b) the TLD (all tld-parts). The values are calculated using CACTI.

Figure 7.18: The increase in average network latency for TLD-NUCA (single bankset).

Figure 7.19: The CPI improvements of TLD-NUCA (single bankset) over T-SNUCA.
The energy consumption of the TLD also increases with the number of cores in TLD-NUCA. Figure 7.17 shows the total static energy consumed per cycle by the LLC and the TLD for different numbers of tiles. Although the energy consumption of the TLD increases, the consumption of the LLC increases as well. This relative increase limits the overhead of the TLD during execution. However, other constraints reduce the performance of TLD-NUCA with more than 16 cores. Section 7.5 discusses these constraints in detail and proposes an alternative TLD-NUCA design that supports a large number of cores without degrading performance.